Federated and Imbalanced Learning for Medical NLP and Imaging
Problem & Need for the Study
Natural Language Processing (NLP) algorithms offer a way to conduct research with clinical text data. Federated learning is a decentralized approach to training such algorithms: separate copies of a model are trained in independent sessions on different, smaller data sets, and the trained copies are then uploaded to a server and integrated into a single centralized model. This matters for developing clinical algorithms because it lets researchers train models without sharing sensitive health information. However, the data sets used for training may be imbalanced, meaning some categories have far more training data than others, which can produce misleading results.
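The train-locally-then-integrate process described above can be sketched as a minimal federated-averaging loop. This is an illustrative toy (a linear model on synthetic data, with hypothetical helper names like `local_train` and `federated_average`), not the project's actual method: each simulated client trains its own copy of the model, and the server combines the uploaded weights, weighted by local data volume, without ever seeing the raw data.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=20):
    """One client's independent session: gradient descent on squared error."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: integrate uploaded copies, weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
global_w = np.zeros(2)

# Simulate three sites with differently sized local data sets (never pooled).
clients = []
for n in (200, 50, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

for _ in range(10):  # communication rounds
    local_ws = [local_train(global_w, X, y) for X, y in clients]
    global_w = federated_average(local_ws, [len(y) for _, y in clients])

print(np.round(global_w, 2))
```

The size-weighted average mirrors the common FedAvg heuristic; under imbalanced data, this weighting is exactly where naive aggregation can skew toward data-rich sites.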
Innovation & Impact
This project is a continuation of the research team's previous work on federated learning for medical images.
The primary goals are:
- To develop and validate federated learning for NLP algorithms that extract and classify data from clinical text.
- To develop new federated learning methods that can work with constraints and are compatible with imbalanced learning frameworks.
To date, federated learning algorithms have been developed only for learning problems without constraints. The federated learning methods developed in this study will be the first to handle constraints and to be compatible with frameworks that account for imbalanced data.
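As one concrete example of the kind of imbalanced-learning frameworks mentioned above, a common remedy is to reweight each category inversely to its frequency so rare classes contribute as much to the loss as common ones. The sketch below is an illustrative assumption (the function name and toy label distribution are hypothetical), not the project's actual approach:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weight: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Toy clinical-label distribution: 90 routine notes, 10 adverse-event notes.
labels = ["negative"] * 90 + ["adverse-event"] * 10
weights = inverse_frequency_weights(labels)
print(weights)  # the rare class receives a 9x larger weight
```

Such per-class weights would typically be folded into each client's local loss, which is one reason federated aggregation and imbalance handling need to be designed to be compatible.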
Key Personnel & Performance Sites
University of Minnesota
- Principal Investigators: Ju Sun and Rui Zhang
This project is funded by Cisco Systems, Inc.
Project dates: 01-January-2023 to 31-December-2023