Introduction
Contributions
- We developed an evaluation algorithm called DCEM (Data Composite Evaluation Method) to assess the quality of participants' data. Based on a composite data-quality rating, we control the degree to which each participant's data takes part in the global iterations of federated training, mitigating the negative impact of low-quality data on model accuracy;
- By using a single-cloud outsourcing approach, we offload complex computations to a cloud server, reducing the computational burden on participants and improving the model's inclusiveness for participants with limited computing capabilities;
- By utilizing a distributed Paillier homomorphic encryption mechanism, we introduce a federated-learning multi-party aggregation scheme that allows partially low-quality data to participate in training. The scheme effectively ensures the privacy and security of participants' private data while making it easy for participants to join or exit midway. Experimental results show that the scheme exhibits excellent overall performance.
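The aggregation scheme builds on the additive homomorphism of the Paillier cryptosystem: the product of two ciphertexts decrypts to the sum of the plaintexts, so a server can aggregate encrypted values without ever decrypting them. A minimal single-key sketch with toy parameters (not the distributed variant used in the paper, and far from secure key sizes):

```python
import math, random

# Toy Paillier keypair (tiny primes; illustration only, NOT secure)
p, q = 311, 317
n = p * q
n2 = n * n
g = n + 1                      # standard simplified generator g = n + 1
lam = math.lcm(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)           # mu = L(g^lambda mod n^2)^-1 = lambda^-1 when g = n + 1

def encrypt(m):
    # c = g^m * r^n mod n^2, with r random and coprime to n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts
c1, c2 = encrypt(12), encrypt(30)
print(decrypt((c1 * c2) % n2))  # 42
```

In the distributed variant described in the paper, no single party holds `lam`; decryption instead combines partial decryptions from at least t key-group members.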
Organization
Related work
Preliminaries
Federated learning
Distributed Paillier cryptosystem
System model
System architecture
Threat model and privacy objectives
- Single malicious participant attack: A single malicious participant in federated training may be curious about the model parameters uploaded by other participants and attempt to infer their private information.
- Cloud server attack: During global model iteration, the cloud server may launch inference attacks on participants' private data by learning and reasoning over the local parameters they upload. This can lead to privacy breaches and allow malicious actors to infer sensitive information about the participants.
- Collusion between some participants and the cloud server: When malicious participants collude with the server to launch an attack, they may exploit shared model parameters to infer the private information of other participants, causing security issues such as privacy leaks among participants.
- Privacy of participants' local gradients: Attackers may obtain the local gradients that participants compute during training on their own data and use them to reconstruct the original data, leading to privacy breaches (e.g., medical records, locations, visited websites). To protect data privacy, we require participants to encrypt their local gradients before uploading them to the cloud server. The cloud server performs aggregation and computation only on the encrypted data, ensuring the privacy of the original data.
- Privacy protection of aggregated evaluation values: The aggregated evaluation value of each participant must also be kept private. If it leaks, participants with lower data quality may face discrimination during training. It is therefore crucial to keep the aggregated evaluation values confidential to uphold the fairness and integrity of the training process.
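Real-valued gradients must first be mapped into the integer message space of an additively homomorphic scheme such as Paillier before they can be encrypted and aggregated. A hedged sketch of one common approach, fixed-point encoding; the `SCALE` and `MOD` values are illustrative assumptions, not parameters from the paper:

```python
# Additively homomorphic schemes operate on integers mod some modulus,
# so float gradients are scaled to fixed-point integers before encryption.
SCALE = 10**6   # fixed-point precision (assumed)
MOD = 2**64     # stands in for the Paillier plaintext modulus n (assumed)

def encode(grad):
    # Map each float to a non-negative residue; negatives wrap around,
    # two's-complement style.
    return [int(round(g * SCALE)) % MOD for g in grad]

def decode(agg):
    out = []
    for v in agg:
        v %= MOD
        if v >= MOD // 2:   # recover negative sums
            v -= MOD
        out.append(v / SCALE)
    return out

# Each participant encodes (and would then encrypt) its local gradient;
# the server's homomorphic addition of ciphertexts corresponds to this
# modular addition of encodings.
g1 = [0.5, -1.25, 3.0]
g2 = [-0.5, 0.25, 1.0]
agg = [(a + b) % MOD for a, b in zip(encode(g1), encode(g2))]
print(decode(agg))  # [0.0, -1.0, 4.0]
```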
Our proposed scheme
Composite evaluations containing participants with low quality data
Component orientation evaluation
Component dispersion evaluation
Data evaluation aggregation
Aggregated value update
Secure aggregation protocol (SAP)
The establishment of the PPFL-LQDP
Correctness
Security analysis
- Single malicious participant attack: A single malicious participant seeks to obtain other participants' private information from collected data. In our SAP protocol, no single participant knows all the private key shares. Therefore, even if the malicious participant collects the encrypted gradients that other participants send to the cloud server, it cannot recover the plaintext.
- Semi-honest cloud server: Participants send their locally trained gradients to the cloud server in encrypted form so that large-scale computation can be offloaded to a third party. In our setting, the cloud server performs all computations as specified but tries to infer participants' information from the knowledge it acquires. Throughout the interaction, the cloud server does not possess the private key, which is held jointly by the participants in the key group, and decryption requires the collaboration of t participants. The cloud server can therefore obtain only aggregated information and cannot access participants' private data.
- Malicious participants colluding with the cloud server: In a collusion attack, malicious participants can hand their partial private keys to the cloud server. If the cloud server collects partial keys from t or more participants, it can decrypt all participants' data. Since distributed decryption requires at least t partial decryption results before the final result can be aggregated, the proposed scheme resists collusion as long as the number of malicious participants is at most \(t - 1\). The scheme thus possesses a certain level of resilience against collusion attacks.
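The t-out-of-n argument above can be illustrated with Shamir secret sharing, a standard building block for distributed Paillier key setups: any t shares reconstruct the key, while t - 1 shares reveal essentially nothing. A toy sketch; the field prime and parameters are illustrative assumptions:

```python
import random

# Toy Shamir t-of-n sharing over a prime field (illustrative parameters)
P = 2**61 - 1          # field prime
t, n_parties = 3, 5    # threshold and number of key-group members

def share(secret):
    # Random degree-(t-1) polynomial with f(0) = secret;
    # party x receives the point (x, f(x)).
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n_parties + 1)]

def reconstruct(points):
    # Lagrange interpolation at x = 0
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret

key = 123456789
shares = share(key)
print(reconstruct(shares[:t]) == key)      # True: t shares suffice
print(reconstruct(shares[:t - 1]) == key)  # almost surely False: t-1 shares
                                           # interpolate the wrong polynomial
```

A colluding coalition of at most t - 1 parties holds too few points to determine the degree-(t - 1) polynomial, mirroring the collusion bound stated above.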
Performance analysis
Functionality
| Scheme | Participant privacy protection | Supports users dropping out or joining midway | All data participates in training | Threat model | Server type |
|---|---|---|---|---|---|
| SecProbe | \(\checkmark\) | \(\times\) | \(\checkmark\) | Semi-honest | Single-server |
| PPFDL | \(\checkmark\) | \(\times\) | \(\checkmark\) | Semi-honest | Dual-server |
| EPPFL | \(\checkmark\) | \(\checkmark\) | \(\times\) | Semi-honest | Single-server |
| PPFL-LQDP | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | Semi-honest | Single-server |