
2020 | Book

Federated Learning

Privacy and Incentive


About this book

This book provides a comprehensive and self-contained introduction to Federated Learning, ranging from basic knowledge and theories to key applications, with privacy and incentives as the focus throughout. The book is timely, as Federated Learning has grown in popularity since the release of the General Data Protection Regulation (GDPR). Federated Learning enables a machine learning model to be trained collaboratively without any party exposing its private data to the others, a setting that adheres to data privacy regulations such as the GDPR.

This book contains three main parts. First, it introduces privacy-preserving methods for protecting a Federated Learning model against different types of attacks, such as data leakage and data poisoning. Second, it presents incentive mechanisms that aim to encourage individuals to participate in Federated Learning ecosystems. Last but not least, it describes how Federated Learning can be applied in industry and business to address data-silo and privacy-preservation problems. The book is intended for readers from both academia and industry who would like to learn Federated Learning from scratch, practice its implementation, and apply it in their own business.

Readers are expected to have a basic understanding of linear algebra, calculus, and neural networks. Additionally, domain knowledge in FinTech and marketing is preferred.

Table of Contents

Frontmatter

Privacy

Frontmatter
Threats to Federated Learning
Abstract
As data are increasingly stored in separate silos and society becomes more aware of data privacy issues, the traditional centralized approach to training artificial intelligence (AI) models is facing strong challenges. Federated learning (FL) has recently emerged as a promising solution under this new reality. Existing FL protocol designs have been shown to exhibit vulnerabilities which can be exploited by adversaries both within and outside of the system to compromise data privacy. It is thus of paramount importance to make FL system designers aware of the implications of future FL algorithm design on privacy preservation. Currently, there is no survey on this topic. In this chapter, we bridge this important gap in the FL literature. Through a concise introduction to the concept of FL and a unique taxonomy covering threat models and two major attacks on FL, (1) poisoning attacks and (2) inference attacks, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks, and discuss promising future research directions towards more robust privacy preservation in FL.
Lingjuan Lyu, Han Yu, Jun Zhao, Qiang Yang
Deep Leakage from Gradients
Abstract
Exchanging model updates is a widely used method in modern federated learning systems. For a long time, people believed that gradients were safe to share, i.e., less informative than the training data. However, information is hidden in the gradients; moreover, it is even possible to reconstruct private training data from publicly shared gradients. This chapter discusses techniques that reveal the information hidden in gradients and validates their effectiveness on common deep learning tasks. It is important to raise awareness of the need to rethink the safety of sharing gradients. Several possible defense strategies are also discussed to prevent such privacy leakage.
Ligeng Zhu, Song Han
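
To make the attack concrete, below is a minimal sketch of the gradient-matching idea in PyTorch: the attacker initializes dummy data and a soft label, then optimizes them so that the gradient they induce matches the victim's shared gradient. The toy linear model, dimensions, and optimizer settings are illustrative assumptions, not the chapter's exact setup.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 2)                   # toy shared model

# The victim computes a gradient on one private example and shares it.
x_real = torch.randn(1, 20)
y_real = torch.tensor([1])
loss = F.cross_entropy(model(x_real), y_real)
true_grads = torch.autograd.grad(loss, model.parameters())

# The attacker optimizes dummy data and a soft label to match that gradient.
x_fake = torch.randn(1, 20, requires_grad=True)
y_logit = torch.randn(1, 2, requires_grad=True)
opt = torch.optim.LBFGS([x_fake, y_logit])

def closure():
    opt.zero_grad()
    pred = model(x_fake)
    fake_loss = torch.sum(-F.softmax(y_logit, dim=-1) * F.log_softmax(pred, dim=-1))
    fake_grads = torch.autograd.grad(fake_loss, model.parameters(),
                                     create_graph=True)
    diff = sum(((fg - tg) ** 2).sum() for fg, tg in zip(fake_grads, true_grads))
    diff.backward()
    return diff

for _ in range(10):
    opt.step(closure)
print(torch.dist(x_fake, x_real).item())  # shrinks as the dummy data converges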
Rethinking Privacy Preserving Deep Learning: How to Evaluate and Thwart Privacy Attacks
Abstract
This chapter investigates the capabilities of Privacy-Preserving Deep Learning (PPDL) mechanisms against various forms of privacy attacks. First, we propose to quantitatively measure the trade-off between model accuracy and the privacy losses incurred by reconstruction, tracing, and membership attacks. Second, a novel Secret Polarization Network (SPN) is proposed to thwart privacy attacks, which is highly competitive against existing PPDL methods. Extensive experiments show that model accuracies improve on average by 5–20% compared with baseline mechanisms, in regimes where data privacy is satisfactorily protected.
Lixin Fan, Kam Woh Ng, Ce Ju, Tianyu Zhang, Chang Liu, Chee Seng Chan, Qiang Yang
Task-Agnostic Privacy-Preserving Representation Learning via Federated Learning
Abstract
The availability of various large-scale datasets benefits the advancement of deep learning. These datasets are often crowdsourced from individual users and contain private information such as gender and age. Because rich private information is embedded in the raw data, users raise concerns about privacy leakage from the shared data. Such privacy concerns can hinder the creation and use of crowdsourced datasets and starve new deep learning applications of training data. In this work, we present TAP, a task-agnostic privacy-preserving representation learning framework that protects data privacy with an anonymized intermediate representation. The goal of this framework is to learn a feature extractor that hides private information from the intermediate representations while maximally retaining the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks. We adopt the federated learning paradigm to train the feature extractor, so that learning the extractor itself is performed in a privacy-respecting fashion. We extensively evaluate TAP and compare it with existing methods on two image datasets and one text dataset. Our results show that TAP offers a good privacy-utility tradeoff.
Ang Li, Huanrui Yang, Yiran Chen
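
The adversarial flavor of such frameworks can be illustrated with a short alternating-training sketch: a simulated attacker learns to predict the private attribute from the intermediate representation, while the feature extractor learns to keep task accuracy high and attacker accuracy low. The networks, loss weighting, and update schedule below are assumptions for illustration, not TAP's actual architecture or its federated training loop.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
enc = torch.nn.Linear(32, 16)    # feature extractor (produces the shared representation)
task = torch.nn.Linear(16, 4)    # downstream task head
adv = torch.nn.Linear(16, 2)     # simulated privacy attacker

opt_main = torch.optim.Adam(list(enc.parameters()) + list(task.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-2)

x = torch.randn(64, 32)
y_task = torch.randint(0, 4, (64,))
y_priv = torch.randint(0, 2, (64,))   # private attribute to hide

for step in range(100):
    # (1) train the attacker to infer the private attribute from features
    opt_adv.zero_grad()
    adv_loss = F.cross_entropy(adv(enc(x).detach()), y_priv)
    adv_loss.backward()
    opt_adv.step()

    # (2) train the extractor: keep the task accurate, confuse the attacker
    opt_main.zero_grad()
    z = enc(x)
    util_loss = F.cross_entropy(task(z), y_task)
    confuse = -F.cross_entropy(adv(z), y_priv)   # maximize the attacker's loss
    (util_loss + confuse).backward()
    opt_main.step()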
Large-Scale Kernel Method for Vertical Federated Learning
Abstract
In today's real-world data mining tasks, multiple data holders often maintain different feature subsets of common data, known as vertically partitioned data. With the emerging demand for privacy preservation, it is hard to mine this kind of vertically partitioned data with legacy machine learning methods. Given the limited literature on non-linear learning with kernels in this setting, in this chapter we propose a vertical federated kernel learning (VFKL) method to train on vertically partitioned data. Specifically, we first approximate the kernel function with the random feature technique, and then federatedly update the prediction function via a specially designed doubly stochastic gradient, without leaking privacy in either the data or the model. Theoretically, VFKL provides a sublinear convergence rate and guarantees data security under the common semi-honest assumption. We conduct extensive experiments on various datasets to demonstrate the effectiveness and superiority of the proposed VFKL method.
Zhiyuan Dang, Bin Gu, Heng Huang
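
The random feature step mentioned above can be sketched as follows: for an RBF kernel, random Fourier features turn kernel evaluation into an inner product of explicit feature maps, and under vertical partitioning each party can compute its share of the projection locally. The plain addition of partial projections below stands in for a secure aggregation step; the split, constants, and names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
gamma = 0.5        # RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
D = 2000           # number of random features

# Random Fourier features: z(x) = sqrt(2/D) * cos(Wx + b),
# with W ~ N(0, 2*gamma*I) matching the kernel's spectral density.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

# Vertical partition: party A holds features 0..4, party B holds 5..9.
# Each party computes its partial projection locally; only partial sums
# (not raw features) would be exchanged, ideally via secure aggregation.
proj_A = X[:, :5] @ W[:, :5].T
proj_B = X[:, 5:] @ W[:, 5:].T
Z = np.sqrt(2.0 / D) * np.cos(proj_A + proj_B + b)

# Sanity check against the exact kernel on one pair of points.
k_exact = np.exp(-gamma * np.sum((X[0] - X[1]) ** 2))
k_approx = Z[0] @ Z[1]
print(k_exact, k_approx)   # close for large D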
Towards Byzantine-Resilient Federated Learning via Group-Wise Robust Aggregation
Abstract
Federated learning (FL) is a distributed machine learning approach in which many participants collaboratively train a model while keeping the training data decentralized. The distributed setting makes FL vulnerable to Byzantine failures and malicious participants, which can lead to unusable models or models compromised with backdoors. Although several methods based on robust statistics have been proposed to make FL robust to Byzantine failures, recent work has shown their defenses to be insufficient, because the non-i.i.d. data distributions and high gradient variance among participants in practical FL settings violate their common assumptions. To address this problem, we propose a simple but efficient group-wise robust aggregation framework, which clusters model parameters into different groups and applies robust aggregation within each cluster. We apply our framework to a number of popular Byzantine-robust aggregation methods and evaluate its resiliency against attacks that successfully circumvent these methods in their original settings. Our experimental results demonstrate that group-wise robust aggregation effectively improves robustness against Byzantine failures, and they highlight the role of clustering in closing the gap between practical FL and the theoretical assumptions of defenses based on robust statistics.
Lei Yu, Lingfei Wu
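
One plausible reading of the group-wise idea is sketched below: cluster the participants' flattened updates, robustly aggregate within each cluster with a coordinate-wise median, and then combine the cluster estimates robustly. The clustering method, group count, and combination rule are assumptions; the chapter's exact procedure may differ.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.3, size=(18, 50))     # 18 honest updates
byzantine = rng.normal(loc=-8.0, scale=0.3, size=(2, 50))  # 2 attackers
updates = np.vstack([honest, byzantine])

# Group the flattened updates, then aggregate robustly within each group.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(updates)
cluster_medians = np.array([np.median(updates[labels == c], axis=0)
                            for c in range(4)])
aggregated = np.median(cluster_medians, axis=0)   # robust combination step

# Compare with plain averaging, which the attackers drag toward -8.
print(updates.mean(axis=0).mean(), aggregated.mean())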
Federated Soft Gradient Boosting Machine for Streaming Data
Abstract
Federated learning has recently received wide attention in both the academic and industrial communities. Designing federated learning models for streaming data has attracted growing interest, since the data stored by each participant often varies over time. Building on recent advances in the soft gradient boosting machine, in this work we propose a federated soft gradient boosting machine framework for streaming data. Compared with traditional gradient boosting methods, where base learners are trained sequentially, each base learner in the proposed framework can be trained efficiently in a parallel and distributed fashion. Experiments validate the effectiveness of the proposed method in terms of accuracy and efficiency on streaming data, compared with other federated ensemble methods as well as its centralized counterparts.
Ji Feng, Yi-Xuan Xu, Yong-Gang Wang, Yuan Jiang
Dealing with Label Quality Disparity in Federated Learning
Abstract
Federated Learning (FL) is highly useful for applications that suffer from data silos and require privacy preservation, such as healthcare, finance, and education. Existing FL approaches generally do not account for disparities in the quality of local data labels, yet participants tend to suffer from label noise due to annotators' varying skill levels, biases, or malicious tampering. In this chapter, we propose an alternative approach to address this challenge. It maintains a small set of benchmark samples on the FL coordinator and quantifies the credibility of each participant's local data, without directly observing it, by computing the mutual cross-entropy between the performance of the FL model on the local datasets and that of the participant's local model on the benchmark dataset. A credit-weighted orchestration then adjusts the weight assigned to each participant in the FL model based on its credibility value. Experimental evaluations on both synthetic and real-world data show that the proposed approach effectively identifies participants with noisy labels and reduces their impact on FL model performance, significantly outperforming existing FL approaches.
Yiqiang Chen, Xiaodong Yang, Xin Qin, Han Yu, Piu Chan, Zhiqi Shen
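
A toy sketch of the credibility computation may help: each participant's score combines the cross-entropy of the global model on its local labels with the cross-entropy of its local model on the coordinator's benchmark, and scores are normalized into aggregation weights. The exponential weighting and the random probability tables below are assumptions for illustration, not the chapter's exact formulas.

import numpy as np

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

rng = np.random.default_rng(0)
participants = 3
credibility = np.zeros(participants)
for i in range(participants):
    y_local = rng.integers(0, 2, size=50)     # participant i's local labels
    y_bench = rng.integers(0, 2, size=20)     # coordinator's benchmark labels
    p_global = rng.dirichlet([2, 2], size=50) # global model's preds on local data
    p_local = rng.dirichlet([2, 2], size=20)  # local model's preds on benchmark
    mce = cross_entropy(p_global, y_local) + cross_entropy(p_local, y_bench)
    credibility[i] = np.exp(-mce)             # lower mutual CE -> higher credit

weights = credibility / credibility.sum()     # credit-weighted orchestration
print(weights)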

Incentive

Frontmatter
FedCoin: A Peer-to-Peer Payment System for Federated Learning
Abstract
Federated learning (FL) is an emerging collaborative machine learning method for training models on distributed datasets with privacy concerns. To properly incentivize data owners to contribute, the Shapley Value (SV) is often adopted to fairly and quantitatively assess their contributions. However, calculating SVs is time-consuming and computationally costly. In this chapter, we propose FedCoin, a blockchain-based peer-to-peer payment system for FL that enables feasible SV-based profit distribution. In FedCoin, blockchain consensus entities calculate SVs, and a new block is created based on the Proof of Shapley (PoSap) protocol. This is in contrast to the popular Bitcoin network, where consensus entities "mine" new blocks by solving meaningless puzzles. Based on the computed SVs, we propose a scheme for dividing incentive payoffs among FL participants with non-repudiation and tamper-resistance properties. Experimental results on real-world data show that FedCoin can promote the contribution of high-quality data from FL participants by accurately computing SVs, with an upper bound on the computational resources required to reach block consensus. It also opens opportunities for non-data owners to play a role in FL.
Yuan Liu, Zhengpeng Ai, Shuai Sun, Shuangfeng Zhang, Zelei Liu, Han Yu
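
Since the computational burden of the SV is central here, the following sketch shows the standard Monte Carlo permutation estimator that consensus entities could, in spirit, compute: sample random participant orderings and average each participant's marginal contribution. The toy additive utility function is an assumption standing in for "model quality when trained on a coalition".

import random

participants = ["A", "B", "C"]

def utility(coalition):
    # toy stand-in for the accuracy of a model trained on this coalition
    base = {"A": 0.30, "B": 0.25, "C": 0.10}
    return sum(base[p] for p in coalition)

def monte_carlo_shapley(players, utility, rounds=2000, seed=0):
    rng = random.Random(seed)
    sv = {p: 0.0 for p in players}
    for _ in range(rounds):
        order = players[:]
        rng.shuffle(order)
        coalition, prev = [], 0.0
        for p in order:
            coalition.append(p)
            val = utility(coalition)
            sv[p] += val - prev      # marginal contribution of p in this order
            prev = val
    return {p: v / rounds for p, v in sv.items()}

print(monte_carlo_shapley(participants, utility))
# With an additive utility, the estimates converge to the base contributions.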
Efficient and Fair Data Valuation for Horizontal Federated Learning
Abstract
Availability of big data is crucial for modern machine learning applications and services. Federated learning is an emerging paradigm that unites different data owners for machine learning on massive datasets without worrying about data privacy. Yet data owners may still be reluctant to contribute unless their datasets are fairly valued and paid for. In this work, we adapt the Shapley value, a widely used data valuation metric, to the valuation of data providers in federated learning. Prior data valuation schemes for machine learning incur high computation costs because they require training extra models on all dataset combinations. For efficient data valuation, we approximately construct the models necessary for valuation from the gradients recorded while training a single model, rather than training an exponential number of models from scratch. On this basis, we devise three methods for efficient contribution index estimation. Evaluations show that our methods accurately approximate the contribution index while notably accelerating its calculation.
Shuyue Wei, Yongxin Tong, Zimu Zhou, Tianshu Song
A Principled Approach to Data Valuation for Federated Learning
Abstract
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources. In order to sustain long-term participation of data owners, it is important to fairly appraise each data source and compensate data owners for their contribution to the training process. The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion. It has been increasingly used for valuing training data in centralized learning. However, computing the SV requires exhaustively evaluating the model performance on every subset of data sources, which incurs prohibitive communication cost in the federated setting. Besides, the canonical SV ignores the order of data sources during training, which conflicts with the sequential nature of FL. This chapter proposes a variant of the SV amenable to FL, which we call the federated Shapley value. The federated SV preserves the desirable properties of the canonical SV while it can be calculated without incurring extra communication cost and is also able to capture the effect of participation order on data value. We conduct a thorough empirical study of the federated SV on a range of tasks, including noisy label detection, adversarial participant detection, and data summarization on different benchmark datasets, and demonstrate that it can reflect the real utility of data sources for FL and has the potential to enhance system robustness, security, and efficiency. We also report and analyze “failure cases” and hope to stimulate future research.
Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song
A Gamified Research Tool for Incentive Mechanism Design in Federated Learning
Abstract
Federated Learning (FL) enables multiple participants to collaboratively train AI models in a privacy-preserving manner, which incurs costs during the training process. This can be a significant issue, especially for business participants [8]. These costs include communication, technical and compliance costs, the risk of market-share erosion, and free-riding problems (i.e., participants may join FL training only with low-quality data) [6]. Motivating participants to continuously contribute high-quality data and maintain a healthy FL ecosystem is a challenging problem. The key to achieving this goal is effective and fair incentive schemes. When designing such schemes, it is important for researchers to understand how FL participants react under different schemes and situations.
In this chapter, we present a multi-player game, FedGame, to help researchers study federated learning incentive schemes (a demonstration video of the platform can be found at https://youtu.be/UhAMVx8SOE8; additional resources about the platform will be made available over time at http://www.federated-learning.org/), extending our previous work in [5]. FedGame allows human players to role-play as FL participants under various conditions. It serves as a tool for researchers and incentive mechanism designers to study the emergent behaviors of FL participants under different incentive schemes. It is useful for eliciting human behavior patterns in FL and identifying potential loopholes in a proposed incentive scheme. After learning the behavior patterns, FedGame can in turn test a given incentive scheme's competitiveness against schemes based on real decision patterns.
Zichen Chen, Zelei Liu, Kang Loon Ng, Han Yu, Yang Liu, Qiang Yang
Budget-Bounded Incentives for Federated Learning
Abstract
We consider federated learning settings with independent, self-interested participants. As all contributions are made privately, participants may be tempted to free-ride, providing redundant or low-quality data while still enjoying the benefits of the FL model. In federated learning this is especially harmful, as low-quality data can degrade the quality of the FL model.
Free-riding can be countered by giving participants incentives to provide truthful data. While game-theoretic schemes exist for rewarding truthful data, they do not take into account the redundancy of data with respect to previous contributions. This creates arbitrage opportunities where participants can gain rewards for redundant data, and the federation may be forced to pay out more in incentives than is justified by the value of the FL model.
We show how a scheme based on influence can both guarantee that the incentive budget is bounded in proportion to the value of the FL model, and that truthfully reporting data is the dominant strategy of the participants. We show that under reasonable conditions, this result holds even when the testing data is provided by participants.
Adam Richardson, Aris Filos-Ratsikas, Boi Faltings
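
A minimal sketch of an influence-style reward follows: each contribution is paid in proportion to the reduction in test loss it causes, so redundant data earns little and the total payout telescopes to at most the payment rate times the overall loss reduction. The nearest-centroid "model" and payment rate are toy assumptions, not the chapter's mechanism.

import numpy as np

rng = np.random.default_rng(0)
test_X = rng.normal(size=(200, 5))
test_y = (test_X[:, 0] > 0).astype(int)   # ground-truth rule: sign of x0

def model_loss(X, y):
    # toy "model": nearest-centroid classifier; returns 0/1 loss on the test set
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (((test_X - c1) ** 2).sum(1) < ((test_X - c0) ** 2).sum(1)).astype(int)
    return float(np.mean(pred != test_y))

# start from a noisy pool with arbitrary labels (both classes present)
X_pool = rng.normal(size=(10, 5))
y_pool = np.array([0, 1] * 5)

rate = 10.0   # payment per unit of loss reduction; rewards telescope, so the
              # total payout is bounded by rate * (initial test loss)
for i in range(3):
    new_X = rng.normal(size=(40, 5))
    new_y = (new_X[:, 0] > 0).astype(int)   # truthfully labelled contribution
    before = model_loss(X_pool, y_pool)
    X_pool = np.vstack([X_pool, new_X])
    y_pool = np.concatenate([y_pool, new_y])
    influence = before - model_loss(X_pool, y_pool)   # marginal improvement
    print(f"contribution {i}: reward = {rate * max(0.0, influence):.3f}")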
Collaborative Fairness in Federated Learning
Abstract
In current deep learning paradigms, local training (the Standalone framework) tends to result in overfitting and thus low utility. This problem can be addressed by distributed or federated learning (FL), which leverages a parameter server to aggregate local model updates. However, existing FL frameworks overlook an important aspect of participation: collaborative fairness. In particular, all participants can receive the same or similar models, even those who contribute relatively little and, in extreme cases, nothing. To address this issue, we propose a novel Collaborative Fair Federated Learning (CFFL) framework, which uses reputation to make participants converge to different models, thus achieving fairness and accuracy at the same time. Extensive experiments on benchmark datasets demonstrate that CFFL achieves high fairness, performs comparably to the Distributed framework, and outperforms the Standalone framework.
Lingjuan Lyu, Xinyi Xu, Qian Wang, Han Yu
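
The reputation mechanism can be caricatured in a few lines: reputations are updated from the per-round quality of each participant's upload, and each participant receives back a share of the global update scaled by its reputation, so persistent low contributors end up with weaker models. The update rule and quality scores below are illustrative assumptions, not CFFL's exact scheme.

import numpy as np

participants = 3
reputation = np.ones(participants) / participants
for rnd in range(5):
    # stand-in for the per-round validation quality of each participant's upload
    quality = np.array([0.9, 0.7, 0.3])
    reputation = 0.8 * reputation + 0.2 * (quality / quality.sum())
    reputation /= reputation.sum()
    # each participant gets back a fraction of the aggregated update,
    # scaled relative to the most reputable contributor
    share = reputation / reputation.max()
    print(f"round {rnd}: shares = {np.round(share, 2)}")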
A Game-Theoretic Framework for Incentive Mechanism Design in Federated Learning
Abstract
Federated learning (FL) has great potential for coalescing isolated data islands. It enables privacy-preserving collaborative model training and addresses security and privacy concerns. Beyond the booming technological breakthroughs in this field, better commercialization of FL in the business world also requires sufficient monetary incentives for data providers. The problem of FL incentive mechanism design is therefore to find the optimal organization and payment structure for the federation. This problem can be tackled with game theory.
In this chapter, we set up a research framework for reasoning about FL incentive mechanism design. We introduce key concepts and their mathematical notations under the federated machine learning (FML) environment, thereby proposing a precise definition of the FML incentive mechanism design problem. Then, we break the problem down into a demand-side problem and a supply-side problem. Based on different settings and objectives, we provide a checklist for FL practitioners to choose an appropriate FL incentive mechanism without deep knowledge of game theory.
As examples, we introduce the Crémer-McLean mechanism to solve the demand-side problem and present a VCG-based mechanism, PVCG, to solve the supply-side problem. Both mechanisms guarantee truthfulness, i.e., they encourage participants to truthfully report their private information and offer all their data to the federation. The Crémer-McLean mechanism, together with PVCG, attains allocative efficiency, individual rationality, and weak budget balance at the same time, easing the well-known tension among these objectives in the mechanism design literature.
Mingshu Cong, Han Yu, Xi Weng, Siu Ming Yiu

Applications

Frontmatter
Federated Recommendation Systems
Abstract
Recommender systems are heavily data-driven. In general, the more data the recommender systems use, the better the recommendation results are. However, due to privacy and security constraints, directly sharing user data is undesirable. Such decentralized silo issues commonly exist in recommender systems. There have been many pilot studies on protecting data privacy and security when utilizing data silos, but most still require users' private data to leave the local data repository. Federated learning is an emerging technology that tries to bridge the data silos and build machine learning models without compromising user privacy and data security. In this chapter, we introduce the new notion of federated recommender systems, an instantiation of federated learning for decentralized recommendation. We formally define the problem of federated recommender systems. Then, we categorize and review current approaches from the perspective of federated learning. Finally, we put forward several promising future research challenges and directions.
Liu Yang, Ben Tan, Vincent W. Zheng, Kai Chen, Qiang Yang
Federated Learning for Open Banking
Abstract
Open banking enables individual customers to own their banking data, which provides fundamental support for the emergence of a new ecosystem of data marketplaces and financial services. In the near future, it is foreseeable that the finance sector will have decentralized data ownership built on federated learning, a just-in-time technology that can train intelligent models in a decentralized manner. The most attractive aspect of federated learning is its ability to decompose model training into a centralized server and distributed nodes without collecting private data. This kind of decomposed learning framework has great potential for protecting users' privacy and sensitive data. Federated learning therefore combines naturally with open banking data marketplaces. This chapter discusses the possible challenges of applying federated learning in the context of open banking and explores corresponding solutions.
Guodong Long, Yue Tan, Jing Jiang, Chengqi Zhang
Building ICU In-hospital Mortality Prediction Model with Federated Learning
Abstract
In-hospital mortality prediction is a crucial task in clinical settings. Nevertheless, individual hospitals often have only a limited amount of local data to build a robust model, so domain transfer of a mortality prediction model built on a publicly accessible dataset is usually conducted. The study in [6] shows quantitatively that as more datasets from different hospitals are shared, the generalizability and performance of domain transfer improve. We see this as an area where Federated Learning can help: it enables collaborative modelling in a decentralized manner, without aggregating all datasets in one place. This chapter reports a recent pilot of building an in-hospital mortality model with Federated Learning. It empirically shows that Federated Learning achieves a level of performance similar to centralized training, with the additional benefit that no datasets are exchanged among hospitals. It also empirically compares two common federated aggregation algorithms, FedAvg and FedProx, in the Intensive Care Unit (ICU) setting.
Trung Kien Dang, Kwan Chet Tan, Mark Choo, Nicholas Lim, Jianshu Weng, Mengling Feng
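
For readers unfamiliar with the two aggregation algorithms compared in this chapter, here is a minimal sketch: FedAvg forms the global model as a dataset-size-weighted average of client models, while FedProx keeps the same aggregation but adds a proximal term mu/2 * ||w - w_global||^2 to each client's local objective. The toy weights and the mu value are illustrative.

import numpy as np

client_weights = [np.array([1.0, 2.0]), np.array([1.5, 1.0]),
                  np.array([0.5, 3.0])]
n_samples = np.array([100, 300, 50])

# FedAvg: weighted average of client models, proportional to local data size
global_w = sum(n * w for n, w in zip(n_samples, client_weights)) / n_samples.sum()
print(global_w)

# FedProx (conceptual, client side): the gradient of the proximal term
# mu/2 * ||w - w_global||^2 is added to the local task gradient, pulling
# each client's weights toward the current global model.
mu = 0.01
w_local = client_weights[0]
prox_grad = mu * (w_local - global_w)
print(prox_grad)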
Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction
Abstract
To meet the standard of differential privacy, noise is usually added to the original data, which inevitably deteriorates the predictive performance of subsequent learning algorithms. In this chapter, motivated by the success of ensemble learning in improving predictive performance, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done with either sample-based or feature-based partitioning; however, we prove that with the same privacy budget, feature-based partitioning requires fewer samples than sample-based partitioning, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we demonstrate the effectiveness of our method on two benchmark datasets, MNIST and NEWS20, and apply it to a real application of cross-organizational diabetes prediction on the RUIJIN dataset, where privacy is of significant concern.
Xiawei Guo, Quanming Yao, James Kwok, Weiwei Tu, Yuqiang Chen, Wenyuan Dai, Qiang Yang
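
A small sketch of the noise-addition step the abstract alludes to, under a local-DP-style input perturbation assumption: keep features in a bounded range, add Laplace noise with scale sensitivity/epsilon, and hand disjoint feature subsets to the base models of the stack. The epsilon value, sensitivity, and two-way split are illustrative assumptions, not the chapter's calibrated mechanism.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 5))   # features pre-scaled to [-1, 1]

epsilon = 1.0        # per-feature privacy budget (illustrative)
sensitivity = 2.0    # range of a feature bounded in [-1, 1]
noisy_X = X + rng.laplace(scale=sensitivity / epsilon, size=X.shape)

# Feature-based partitioning, as the chapter advocates: each base model of
# the stack sees a disjoint feature subset, so the budget is not split
# across overlapping views of the same attributes.
subsets = np.array_split(np.arange(X.shape[1]), 2)
views = [noisy_X[:, idx] for idx in subsets]
print([v.shape for v in views])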
Backmatter
Metadata
Title
Federated Learning
Editors
Qiang Yang
Lixin Fan
Han Yu
Copyright Year
2020
Electronic ISBN
978-3-030-63076-8
Print ISBN
978-3-030-63075-1
DOI
https://doi.org/10.1007/978-3-030-63076-8
