Introduction
- Labeled data from a single social network are limited, and training data strongly influence how well a machine learning model learns, so the amount of data needs to be expanded while data security is preserved.
- The growing number of people with depression places higher demands on model accuracy.
- With the development of big data, privacy preservation has attracted public attention. How to train models on massive data while improving data security and maintaining model efficiency is an important open problem.
- We propose a new asynchronous federated optimization algorithm with provable convergence for non-convex problems, evaluated on Weibo users' data (a minimal aggregation sketch follows this list).
- We show that the proposed method effectively protects users' privacy while maintaining prediction accuracy.
- We train the model jointly across 900 users, which improves resource utilization and model performance.
- The proposed algorithm reduces communication overhead.
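For intuition, here is a minimal sketch of a staleness-aware asynchronous server update in the spirit of the algorithm described above. The mixing weight, the polynomial staleness decay, and all names are illustrative assumptions following the generic asynchronous-federated formulation, not the paper's exact CAFed update rule.

```python
import numpy as np

def async_server_update(w_global, w_local, tau_fetched, t_now, alpha=0.6, a=0.5):
    """Mix a (possibly stale) local model into the global one.

    The local model was trained from the global model fetched in epoch
    `tau_fetched`; its weight decays polynomially with its staleness
    t_now - tau_fetched, so very stale updates move the server model less.
    """
    staleness = t_now - tau_fetched
    alpha_t = alpha * (1.0 + staleness) ** (-a)   # assumed staleness weighting
    return (1.0 - alpha_t) * w_global + alpha_t * w_local

# Toy usage: a worker pushes an update that is 3 epochs stale.
w = async_server_update(np.zeros(4), np.ones(4), tau_fetched=7, t_now=10)
```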
Related work
Machine learning
Natural language processing
Federated learning
- Non-IID: data distributions differ across devices [18], i.e., the overall distribution cannot be learned from the data on any single device (see the partition sketch after this list).
- Imbalanced data: data can be biased toward certain labels [17], e.g., users may have different habits, or edge devices may monitor different locations.
- Heterogeneity: data size and device performance may vary across local devices [19].
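To make the non-IID point concrete, here is a toy label-skew partition; the function name and the shard-by-class scheme are illustrative assumptions, not the paper's data split.

```python
import numpy as np

def label_skew_partition(labels, n_devices, classes_per_device=1, seed=0):
    """Give each device samples from only a few labels, so no single
    device's data reflects the overall label distribution (non-IID)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    shards = {c: rng.permutation(np.flatnonzero(labels == c)) for c in classes}
    partition = []
    for i in range(n_devices):
        own = rng.choice(classes, size=classes_per_device, replace=False)
        partition.append(np.concatenate([shards[c][i::n_devices] for c in own]))
    return partition

# Example: two labels spread over three devices, one label class per device.
print(label_skew_partition(np.array([0, 0, 0, 1, 1, 1, 0, 1]), 3))
```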
Notation/term | Description |
---|---|
\(n\) | Number of devices |
\(T\) | Number of global epochs |
\({H}_{\tau }^{i}\) | Number of local iterations in the \(\tau\)th epoch on the \(i\)th device |
\({w}_{t}\) | Global model in the \(t\)th epoch on the server |
\({w}_{t}^{k}\) | The \(k\)th entry of \({w}_{t}\) |
\({w}_{\tau ,h}^{i}\) | Model initialized from \({w}_{\tau }\), updated in the \(h\)th local iteration, on the \(i\)th device |
\({w}_{back}^{i}\) | The model before \({w}_{\mathrm{new}}^{i}\) is updated, on the \(i\)th device |
\(c\) | A hyper-parameter |
\(r\) | A random number |
\(\vartheta \) | A small constant |
\(\beta \) | A magnitude coefficient |
\({D}^{i}\) | Dataset on the \(i\)th device |
\({z}_{t,h}^{i}\) | Minibatch sampled from \({D}^{i}\) |
\(\gamma \) | Learning rate |
Server | The place where the global model is maintained and aggregated |
Worker | One worker per device; the process that trains the model locally |
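To tie the notation together, here is a sketch of one worker's local pass: the model is initialized from the global model \({w}_{\tau }\) and updated for \({H}_{\tau }^{i}\) local SGD iterations on minibatches drawn from \({D}^{i}\). The plain SGD rule and the `grad` placeholder are assumptions for illustration.

```python
import numpy as np

def local_pass(w_tau, D_i, grad, H, gamma, batch_size=32, seed=0):
    """One worker's epoch: w_{tau,0}^i <- w_tau, then H local SGD steps
    w_{tau,h+1}^i = w_{tau,h}^i - gamma * grad(w, z) on minibatches z from D^i."""
    rng = np.random.default_rng(seed)
    w = np.array(w_tau, dtype=float)                    # initialize from global model
    for h in range(H):
        z = D_i[rng.choice(len(D_i), size=batch_size)]  # minibatch z_{t,h}^i
        w -= gamma * grad(w, z)                         # gamma is the learning rate
    return w
```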
Differential privacy
Methods
Initial model
- CNN-rand: all word vectors are randomly initialized and then modified during training.
- CNN-static: all words are represented by pre-trained word vectors (unknown words are randomly initialized); all vectors are kept static, and only the other parameters of the model are learned.
- CNN-non-static: the same as above, but the pre-trained vectors are fine-tuned during training (all three schemes are sketched below).
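A sketch of the three embedding schemes in PyTorch; `pretrained` is assumed to be a tensor of pre-trained word vectors in which unknown words have already been filled with random vectors, and the helper name is illustrative.

```python
import torch
import torch.nn as nn

def make_embedding(pretrained: torch.Tensor, mode: str) -> nn.Embedding:
    """Build the embedding layer for CNN-rand / CNN-static / CNN-non-static."""
    if mode == "rand":
        # CNN-rand: random initialization, trained together with the model.
        return nn.Embedding(pretrained.size(0), pretrained.size(1))
    # CNN-static freezes the pre-trained vectors; CNN-non-static fine-tunes them.
    return nn.Embedding.from_pretrained(pretrained, freeze=(mode == "static"))
```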
FedAvg model
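For reference, the standard FedAvg aggregation rule (McMahan et al.): the server averages the workers' models, weighting each by its local sample count. The sketch below is a generic restatement, not this paper's implementation.

```python
import numpy as np

def fedavg_aggregate(local_models, local_sizes):
    """w = sum_k (n_k / n) * w_k, where n_k is worker k's dataset size."""
    total = float(sum(local_sizes))
    return sum((n_k / total) * w_k for w_k, n_k in zip(local_models, local_sizes))

# Toy usage: two workers with 60 and 40 samples respectively.
w = fedavg_aggregate([np.ones(3), np.zeros(3)], [60, 40])  # -> [0.6, 0.6, 0.6]
```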
CAFed model
Assumptions and lemmas
Convergence guarantees
Experiments
Data analysis
Total sample | Depression samples (training) | Depression samples (test) | Normal users (training) | Normal users (test) |
---|---|---|---|---|
900 | 253 | 74 | 467 | 106 |
Experiment results
Method | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) |
---|---|---|---|---|
CNN + rand | 91.67 | 80.88 | 85.94 | 86.11 |
CNN + static | 90.01 | 83.33 | 86.96 | 83.33 |
CNN + non-static | 87.50 | 100 | 93.33 | 87.50 |
Method | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) |
---|---|---|---|---|
FedAvg + rand | 89.47 | 85.00 | 87.18 | 83.33 |
FedAvg + static | 81.82 | 81.82 | 81.82 | 73.33 |
FedAvg + non-static | 100 | 81.82 | 90.00 | 86.67 |
Method | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) |
---|---|---|---|---|
CAFed + rand | 86.90 | 85.25 | 73.20 | 67.92 |
CAFed + static | 80.00 | 89.75 | 84.59 | 80.00 |
CAFed + non-static | 90.00 | 81.82 | 85.26 | 86.67 |
Randomization | CAFed-rand (%) | CAFed-static (%) | CAFed-non-static (%) |
---|---|---|---|
Nonrandomized (\(\beta \) = 0) | 85.27 | 80.00 | 86.67 |
Randomized (\(\beta \) = 0.0001) | 83.67 | 78.00 | 84.00 |
Randomized (\(\beta \) = 0.001) | 81.25 | 76.27 | 81.62 |
Randomized (\(\beta \) = 0.01) | 78.63 | 74.00 | 78.00 |
Randomized (\(\beta \) = 0.05) | 73.00 | 71.23 | 74.00 |
Randomization | FedAvg (%) | CAFed (%) |
---|---|---|
Nonrandomized (\(\beta \) = 0) | 72.23 | 83.26 |
Randomized (\(\beta \) = 0.0001) | 70.26 | 81.75 |
Randomized (\(\beta \) = 0.001) | 68.14 | 80.00 |
Randomized (\(\beta \) = 0.01) | 67.75 | 78.13 |
Randomized (\(\beta \) = 0.05) | 65.00 | 76.00 |
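The tables above show accuracy falling as the magnitude coefficient \(\beta \) grows. Below is a sketch of the kind of randomization they suggest: before upload, the local model is perturbed by a random vector \(r\) scaled by \(\beta \), with \(\beta \) = 0 recovering the nonrandomized case. The uniform noise distribution here is an assumption; the paper's exact mechanism may differ.

```python
import numpy as np

def randomize_before_upload(w_local, beta, seed=None):
    """Perturb the local model with noise of magnitude beta before sending it
    to the server: larger beta gives stronger privacy but lower accuracy."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(-1.0, 1.0, size=np.shape(w_local))  # random perturbation r
    return w_local + beta * r                           # beta = 0 => unchanged
```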
Analysis of experimental results
Frameworks | Technique used | Accuracy (%) | Precision (%) | Recall (%) | F-measure (%) |
---|---|---|---|---|---|
Centralized framework | CNN + rand | 86.11 | 91.67 | 80.88 | 85.94 |
Centralized framework | CNN + static | 83.33 | 90.01 | 83.33 | 86.96 |
Centralized framework | CNN + non-static | 87.50 | 87.50 | 100 | 93.33 |
Distributed framework | FedAvg + rand | 83.33 | 89.47 | 85.00 | 87.18 |
Distributed framework | FedAvg + static | 73.33 | 81.82 | 81.82 | 81.82 |
Distributed framework | FedAvg + non-static | 86.67 | 100 | 81.82 | 90.00 |
Distributed framework | CAFed + rand | 67.92 | 86.90 | 85.25 | 73.20 |
Distributed framework | CAFed + static | 80.00 | 80.00 | 89.75 | 84.59 |
Distributed framework | CAFed + non-static | 86.67 | 90.00 | 81.82 | 85.26 |