Advancing Federated Learning for NLP in Healthcare

Problem and Need for the Study

Natural language processing (NLP) algorithms make it possible to conduct research with clinical text data. Federated learning is a decentralized approach to algorithm training: researchers train models on smaller, localized datasets and share only model updates, so sensitive health information stays at the site that holds it. However, a key challenge arises when the datasets are imbalanced, meaning some categories have significantly more training data than others. A model trained on imbalanced data can appear accurate overall while performing poorly on the rare categories, which is especially misleading in critical applications like NLP in healthcare.
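As a rough illustration of the decentralized training described above, the following Python sketch shows federated averaging (FedAvg), one common federated learning scheme. It is not the project's method: the two sites, their toy data, the linear model, and all function names are hypothetical and chosen only to keep the example small.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(weights: Vector, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Vector:
    """Train on one site's data only: fit y ~ w0 + w1 * x by gradient descent."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error (up to a constant factor).
        g0 = sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def federated_average(updates: List[Vector], sizes: List[int]) -> Vector:
    """Combine site models, weighting each by its number of local examples."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


def train(sites: List[Dataset], rounds: int = 100) -> Vector:
    """Run FedAvg: only model weights travel between sites and the server."""
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Two hypothetical sites, each holding private samples of the line y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0)]
site_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

model = train([site_a, site_b])
print(model)  # intercept and slope learned without pooling the raw data
```

The raw `(x, y)` pairs are only ever read inside `local_update`; the server sees nothing but weight vectors, which is the property that makes the approach attractive for sensitive clinical text.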

Innovation and Impact

This project continues the research team's previous work on federated learning for medical images.

The primary goals are:

  1. To develop and validate federated learning methods for NLP algorithms that extract and classify data from clinical text.
  2. To develop new federated learning methods that are compatible with imbalanced learning frameworks and capable of working under constraints.

Currently, federated learning algorithms are limited to solving problems without constraints. The federated learning methods developed in this study will be the first to work with constraints and be compatible with frameworks that account for imbalanced clinical data.
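To make concrete why imbalanced clinical data needs to be accounted for, here is a minimal Python sketch with made-up labels (95 notes without a condition, 5 with it) and a hypothetical classifier that always predicts the majority class. The data and function names are assumptions for illustration, not results from the project.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of all examples that were labeled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: List[int], y_pred: List[int], label: int) -> float:
    """Fraction of examples with the given true label that were found."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for t, p in relevant if p == label) / len(relevant)


# Hypothetical imbalanced dataset: label 1 (condition present) is rare.
y_true = [0] * 95 + [1] * 5
# A degenerate model that ignores its input and always predicts label 0.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rare_recall = recall(y_true, y_pred, label=1)
balanced_acc = (recall(y_true, y_pred, 0) + recall(y_true, y_pred, 1)) / 2

print(acc)           # 0.95: looks strong
print(rare_recall)   # 0.0: every rare case is missed
print(balanced_acc)  # 0.5: no better than chance across classes
```

Overall accuracy rewards the model for the majority class alone, while per-class recall and balanced accuracy expose the failure on the rare category.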

Key Personnel

Ju Sun
Assistant Professor, Department of Computer Science & Engineering, College of Science and Engineering
Rui Zhang, PhD, FAMIA
Professor and Chief, Division of Computational Health Sciences

Performance Sites

University of Minnesota

  • Multiple Principal Investigators: Ju Sun and Rui Zhang

Grant Details

  • This project is funded by Cisco Systems, Inc.
  • Project dates: January 1, 2023 to December 31, 2023