Protecting Privacy and Improving Personalization in Cross-Silo Federated Learning
As the world braces for the next pandemic, patient data collected by hospitals and health agencies continues to be a critical resource for monitoring and preventing the spread of disease. However, this data is highly sensitive and is often siloed across dozens or even hundreds of hospitals and health agencies. Federated learning (FL) is a decentralized learning technique that can tackle such challenges by performing cross-silo learning while preserving individual privacy. In this article, we discuss our research on differential privacy in cross-silo FL and its application in pandemic forecasting.
Part 1: Tackling Privacy Challenges in Cross-Silo Federated Learning
1.1. Protecting Privacy using Differential Privacy
FL involves a central server sending a model to participating clients for training with their own local data. In cross-silo FL, each "client" is itself an organization holding records from many individuals, so client-level differential privacy (DP), which protects whether an entire client participated, is a poor fit: what actually needs protection is each individual's record within a silo. Instead, silo-specific example-level DP, where each silo sets its own (ε, δ) target for its local data records, is better aligned with real-world use cases.
DP adds randomness to a query on a dataset to create uncertainty about whether any single data point contributed to the query output. This statistical notion of privacy provides a formal guarantee and helps mitigate attacks such as membership inference and data poisoning. Concretely, we propose that each silo run DP-SGD for its local gradient steps, with noise calibrated to its own privacy target.
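To make this concrete, below is a minimal sketch of one silo's DP-SGD update in PyTorch. The function name and hyperparameters are illustrative, and we assume the silo's noise_multiplier has already been calibrated offline (e.g., with a privacy accountant) to its own (ε, δ) target; this is a sketch of the general recipe, not our competition implementation.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y,
                lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One (hypothetical) DP-SGD step for a single silo's local training."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Compute and clip each example's gradient so that no single record
    # can move the model by more than clip_norm.
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Add Gaussian noise calibrated to this silo's (epsilon, delta) target,
    # then take an averaged gradient step.
    n = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = noise_multiplier * clip_norm * torch.randn_like(s)
            p.add_(-(lr / n) * (s + noise))
```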
1.2. Improving Model Utility using Personalization
Model personalization is a technique used to improve model performance in FL when there is data heterogeneity across silos. We propose mean-regularized multi-task learning (MR-MTL) as a form of personalization that balances privacy and utility trade-offs.
MR-MTL asks each client to train its own local model, regularize it towards the mean of all clients' models via a penalty term, and keep its personalized model across rounds. The mean model is recomputed at every round and serves as the regularization anchor for the next round. MR-MTL has the following properties in private cross-silo FL (see the sketch after this list):
– Noise reduction is attained throughout training via the soft proximity constraint towards an averaged model;
– The mean-regularization itself adds no privacy overhead, since the mean model is a post-processing of updates that are already privatized; and
– A single hyperparameter, λ, provides a smooth interpolation along the personalization spectrum, from purely local training (λ = 0) to FedAvg-style federated training (λ → ∞).
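As a sketch of the mechanics (not our exact implementation): with regularization strength λ, each silo k minimizes its local loss plus (λ/2)·‖w_k − w̄‖², where w̄ is the average of the silos' models from the previous round. The example below runs one such round in PyTorch; the names are illustrative, and the local update is shown without DP noise for brevity (in the private setting it would use the DP-SGD step sketched above).

```python
import torch

def mr_mtl_round(silo_models, silo_batches, loss_fn, lam=0.1, lr=0.1):
    """One (hypothetical) MR-MTL round: each silo keeps its own model."""
    # Mean model from the previous round (a server-side average of silo weights).
    with torch.no_grad():
        mean_params = {
            name: torch.stack(
                [dict(m.named_parameters())[name] for m in silo_models]
            ).mean(dim=0)
            for name, _ in silo_models[0].named_parameters()
        }

    for model, (x, y) in zip(silo_models, silo_batches):
        loss = loss_fn(model(x), y)
        # Soft proximity penalty pulling this silo's weights toward the mean.
        prox = sum(
            ((p - mean_params[name]) ** 2).sum()
            for name, p in model.named_parameters()
        )
        total = loss + 0.5 * lam * prox
        grads = torch.autograd.grad(total, list(model.parameters()))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.add_(-lr * g)

    # Each silo keeps its personalized model across rounds;
    # only the mean is recomputed at the start of the next round.
    return silo_models
```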
Part 2: Application in Pandemic Forecasting
Using our research on DP in cross-silo FL and MR-MTL for model personalization, we recently won first place and a $100k prize in a competition hosted by the U.S. and U.K. governments on privacy-preserving pandemic forecasting.
Our analysis points to MR-MTL as a simple yet particularly suitable form of personalization under privacy constraints in cross-silo FL. Personalization methods such as finetuning, clustering, and MR-MTL navigate the personalization spectrum in different ways, each trading off privacy overhead against robustness to data heterogeneity.
Part 3: Future Work
Ongoing research on differential privacy and model personalization in cross-silo FL aims to explore how these techniques can be applied to data collaboration across multiple industries and organizations. This includes investigating privacy challenges in other decentralized settings, such as fully distributed learning and secure multi-party computation, and exploring how to handle non-IID data silos, where data distributions differ substantially across silos.
Conclusion
Using our research on differential privacy in cross-silo federated learning, we won first place in a competition on privacy-preserving pandemic forecasting. We proposed silo-specific example-level DP as a privacy solution and mean-regularized multi-task learning (MR-MTL) as a personalization technique to handle data heterogeneity across silos. Our ongoing research focuses on applying these techniques across sectors to improve collaboration without compromising individual privacy.