Federated learning is an important concept in developing and validating artificial intelligence (AI) models for clinical applications, and CLHSS and our programs are conducting several key projects that rely on it. Here we describe what federated learning is and why it’s important to our work to improve healthcare practice.

Before discussing federated learning, it helps to understand generalizability. AI models are used to make predictions, such as the price of a house next year or whether someone has a disease. One challenge every model faces (AI or not) is that it needs to work in places other than where it was developed.

For example, if you build a model that predicts house prices using only Minnesota housing data, that model may work well in Minnesota, but it may not work as well if you use it to predict housing prices in Florida. Different factors may affect housing prices in Florida, such as whether a home is in a flood zone. A model’s ability to work well outside the setting where it was developed is called “generalizability”: a housing model that worked well in every state would be said to “generalize” to the entire country.

To address this problem in medicine, many hospitals need to share their data so that the models developed from it work broadly across the country. This is similar to combining data from multiple states to build a single model that predicts housing prices nationwide. However, housing data is readily available and is not subject to privacy rules such as HIPAA, so combining it is much easier.

In healthcare, we have historically set up agreements that allow the hospital leading model development to receive data from all other participating hospitals in order to train the model. Sending data from multiple hospitals to a single institution, which then develops the model, is known as a “centralized” approach, because all of the data is gathered in one place.

The centralized approach has several problems, including a greater risk of exposing protected health information and the difficulty of accounting for the nuances in each hospital’s data. Transferring all of the data can also take a long time, and only the central institution’s computers can be used, rather than taking advantage of the computing power available at each institution.

To address these problems, in 2017 Google popularized a “decentralized” approach known as federated learning. With this method, each hospital keeps its data within its own institution, behind its own firewalls. Federated learning addresses many of the limitations of the centralized approach and is being adopted rapidly in healthcare and beyond.

With federated learning, the institution leading model development sends a copy of the model to each hospital. That model can be as simple as a tabulation: each hospital counts how many of its patients are over 65 years old and sends only the resulting percentage back to the lead institution. The lead institution then uses an algorithm to combine the percentages from each site into an average percentage across all sites.
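The tabulation example can be sketched in a few lines of Python. This is a minimal illustration, not the CLHSS implementation: the site names, the patient ages, and the function names `local_percentage_over_65` and `combine_percentages` are hypothetical, and the data is made up purely to show the flow. Each site runs the local function on its own records and returns a single number, so the lead institution only ever sees those numbers.

```python
from statistics import mean

# Hypothetical, made-up patient ages held at each site.
# In a real federated setup these records never leave the hospital.
SITE_AGES = {
    "site_a": [34, 70, 81, 45, 66],
    "site_b": [72, 59, 68],
    "site_c": [25, 40, 67, 90],
}


def local_percentage_over_65(ages):
    """Runs inside one hospital: returns only the percentage of patients over 65."""
    over_65 = sum(1 for age in ages if age > 65)
    return 100.0 * over_65 / len(ages)


def combine_percentages(percentages):
    """Runs at the lead institution: unweighted average of the site percentages."""
    return mean(percentages)


# Each site computes its own summary; the lead institution only sees these numbers.
site_results = {site: local_percentage_over_65(ages) for site, ages in SITE_AGES.items()}
overall = combine_percentages(list(site_results.values()))

print(site_results)
print(f"Average across sites: {overall:.1f}%")
```

A plain average treats every site equally. With the made-up numbers above it gives about 58.9%, while the pooled percentage across all 12 patients would be 7/12, about 58.3%; computing the pooled figure would require each site to also report its number of patients, so the combining rule depends on what the sites agree to share. Either way, only site-level summaries travel to the lead institution.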

This is conceptually the same thing that happens with AI model training, where each site sends back only summary information, such as updated model parameters. Whether the model is simple or complex, no individual patient information or identifying information is ever sent back, nor are any raw counts. Only aggregated results, such as percentages or model parameters, are sent back, and they are always encrypted.
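Model training follows the same pattern, with numbers that describe a model in place of a percentage. The sketch below is a simplified illustration of federated averaging, the approach Google researchers described for federated learning, assuming a one-feature linear model and synthetic data. The function names, learning rate, number of rounds, and data are hypothetical choices, not the configuration of any CLHSS project. Each site trains the shared model on its own data and returns only two parameters, `w` and `b`; the lead institution averages them and sends the updated model back out for the next round.

```python
import random


def local_train(w, b, data, lr=0.02, epochs=50):
    """Runs inside one hospital: gradient descent on the local data only.

    Fits y = w * x + b by minimizing mean squared error, then returns just the
    two parameters. The (x, y) records themselves are never returned.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_round(w, b, sites):
    """Runs at the lead institution: send the model out, then average the results.

    Every site gets equal weight here, which is a simplification.
    """
    updates = [local_train(w, b, data) for data in sites]
    avg_w = sum(u_w for u_w, _ in updates) / len(updates)
    avg_b = sum(u_b for _, u_b in updates) / len(updates)
    return avg_w, avg_b


def make_synthetic_site(rng, n=20):
    """Hypothetical stand-in for one hospital's data: y = 2x + 1 plus small noise."""
    xs = [rng.uniform(0, 5) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


rng = random.Random(0)
sites = [make_synthetic_site(rng) for _ in range(3)]

w, b = 0.0, 0.0  # initial model sent to every site
for _ in range(20):
    w, b = federated_round(w, b, sites)
print(f"w = {w:.2f}, b = {b:.2f}")  # should end up close to w = 2, b = 1
```

For simplicity this sketch gives every site equal weight; the original federated averaging algorithm weights each site’s parameters by its number of training examples. Production systems also add the encryption and access controls described above, which are omitted here.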

CLHSS is part of a partnership that aims to foster federated learning collaboration among multiple partner sites (University of Florida, Emory University, Indiana University, Medical University of South Carolina, and University of North Carolina). The group currently operates without a federated learning server, which makes validating AI models more difficult.

Our Program for Clinical AI is working with the partner sites to validate a model that predicts the probability that a patient has a rib fracture. However, without a server, the program has to develop a customized validation process for each individual site.

Faculty and staff within CLHSS have been working to build a federated learning server within the University of Minnesota infrastructure to address these issues. The server will allow the University to collaborate with other hospitals on machine learning projects without sharing patient data in any form. It will also allow CLHSS to use the same streamlined process to validate models at each of our federated learning partner sites.
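The benefit of a shared server is that the same validation code can run at every partner site. The sketch below illustrates that idea under stated assumptions: `validate_locally`, the stand-in model, the 0.5 threshold, and the records are all hypothetical, and the metrics shown (sensitivity and specificity) are examples rather than the measures the rib fracture validation actually uses. Each site runs the identical function on its own data and returns only summary metrics.

```python
def validate_locally(predict, records, threshold=0.5):
    """Runs inside a partner hospital, with the same code at every site.

    `predict` maps one patient's features to a probability; `records` is a list
    of (features, true_label) pairs that never leave the site. Only summary
    metrics are returned.
    """
    tp = fp = tn = fn = 0
    for features, label in records:
        predicted = predict(features) >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    # None signals that a metric is undefined (e.g., no positive cases at a site).
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return {"sensitivity": sensitivity, "specificity": specificity}


# Hypothetical stand-in for a model; a real model would be far more complex.
def stand_in_model(features):
    return features["risk_score"]


# Made-up records for two hypothetical sites: (features, has_fracture).
site_data = {
    "site_a": [({"risk_score": 0.9}, True), ({"risk_score": 0.2}, False),
               ({"risk_score": 0.6}, False), ({"risk_score": 0.4}, True)],
    "site_b": [({"risk_score": 0.8}, True), ({"risk_score": 0.1}, False),
               ({"risk_score": 0.3}, False)],
}

report = {site: validate_locally(stand_in_model, records)
          for site, records in site_data.items()}
for site, metrics in report.items():
    print(site, metrics)
```

Because every site runs identical code, the results are directly comparable across sites, in contrast to the customized, site-by-site validation process needed without a server.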