2021 | OriginalPaper | Chapter

Efficient Privacy Preserving Distributed K-Means for Non-IID Data

Authors: André Brandão, Ricardo Mendes, João P. Vilela

Published in: Advances in Intelligent Data Analysis XIX

Publisher: Springer International Publishing

Abstract

Privacy is becoming a crucial requirement in many machine learning systems. In this paper we introduce an efficient and secure distributed K-Means algorithm that is robust to non-IID data. The basic idea is that each client runs K-Means locally, with a variable number of clusters. The server then applies K-Means again to the resulting local centroids to discover the global centroids. To preserve the clients' privacy, homomorphic encryption and secure aggregation are used while learning the global centroids. The algorithm is efficient and reduces transmission costs, since only the local centroids are used to find the global centroids. In our experimental evaluation, we demonstrate that our strategy achieves performance similar to the centralized version, even when the data is distributed in an extreme non-IID manner.
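The two-level scheme described in the abstract can be illustrated with a minimal plaintext sketch. It is not the authors' implementation: homomorphic encryption and secure aggregation are left out entirely, and the farthest-point initialization, the synthetic data, the per-client cluster counts, and all function names are assumptions made only for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def init_farthest(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=50):
    """Plain Lloyd-style K-Means; returns a list of k centroids."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def distributed_kmeans(client_data, local_ks, global_k):
    """Each client clusters locally; the server clusters the local centroids."""
    local_centroids = []
    for data, k in zip(client_data, local_ks):
        # Only these centroids would leave the client (in plaintext here).
        local_centroids.extend(kmeans(data, k))
    return kmeans(local_centroids, global_k)

# Extreme non-IID toy setting: every client only holds points from one cluster.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
client_data = []
for client in range(6):
    cx, cy = true_centers[client % 3]
    client_data.append(
        [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)) for _ in range(40)]
    )

local_ks = [2, 3, 2, 3, 2, 3]  # variable number of local clusters per client
global_centroids = distributed_kmeans(client_data, local_ks, global_k=3)
print(global_centroids)
```

In this toy setting the server sees only 15 local centroids rather than the 240 raw points, which is the source of the reduced transmission cost mentioned in the abstract.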

https://github.com/deric/clustering-benchmark.

We assume a constant time complexity for multiplication between the encrypted centroids and the plaintext global centroids, according to [20].

1. Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017)

2. Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40(1), 200–210 (2013)

3. Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. In: 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), pp. 88–93. IEEE (2013)

4. Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, pp. 15–19 (2020)

5. Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37682-5_1

6. Hu, X., et al.: Privacy-preserving K-means clustering upon negative databases. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) ICONIP 2018. LNCS, vol. 11304, pp. 191–204. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04212-7_17

7. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

8. Jahangiri, A., Rakha, H.A.: Applying machine learning techniques to transportation mode recognition using mobile phone sensor data. IEEE Trans. Intell. Transp. Syst. 16(5), 2406–2417 (2015)

9. Januzaj, E., Kriegel, H.P., Pfeifle, M.: Towards effective and efficient distributed clustering. In: Workshop on Clustering Large Data Sets (ICDM2003), vol. 60 (2003)

10. Jiang, Z.L., et al.: Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing. Inf. Sci. 518, 168–180 (2020)

11. Liu, B., et al.: Follow my recommendations: a personalized privacy assistant for mobile app permissions. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), pp. 27–41 (2016)

12. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

13. Lu, Z., Shen, H.: A convergent differentially private k-means clustering algorithm. In: Yang, Q., Zhou, Z.-H., Gong, Z., Zhang, M.-L., Huang, S.-J. (eds.) PAKDD 2019. LNCS (LNAI), vol. 11439, pp. 612–624. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16148-4_47

14. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 54, pp. 1273–1282. PMLR (2017)

15. Navidi, W., Murphy Jr., W.S., Hereman, W.: Statistical methods in surveying by trilateration. Comput. Stat. Data Anal. 27(2), 209–227 (1998)

16. Palacio-Niño, J., Berzal, F.: Evaluation metrics for unsupervised learning algorithms. arXiv preprint arXiv:1905.05667 (2019)

17. Sarker, I.H., Hoque, M.M., Uddin, M.K., Alsanoosy, T.: Mobile data science and intelligent apps: concepts, AI-based modeling and research directions. Mob. Netw. Appl. 1–19 (2020). https://doi.org/10.1007/s11036-020-01650-z

18. Schellekens, V., Chatalic, A., Houssiau, F., De Montjoye, Y.A., Jacques, L., Gribonval, R.: Differentially private compressive k-means. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7933–7937. IEEE (2019)

19. Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 1177–1178. Association for Computing Machinery (2010)

20. Microsoft SEAL (release 3.5), Microsoft Research, Redmond, WA (2020)

21. Soliman, A., Girdzijauskas, S., Bouguelia, M.-R., Pashami, S., Nowaczyk, S.: Decentralized and adaptive K-means clustering for non-IID data using hyperLogLog counters. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 343–355. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47426-3_27

22. Su, D., Cao, J., Li, N., Bertino, E., Jin, H.: Differentially private k-means clustering. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 26–37. ACM (2016)

23. Thiagarajan, A., et al.: Vtrack: accurate, energy-aware road traffic delay estimation using mobile phones. In: Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys 2009, pp. 85–98. Association for Computing Machinery (2009)

24. Triebe, O.J., Rajagopal, R.: Federated K-Means: clustering algorithm and proof of concept (2020)

25. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 206–215 (2003)

26. Xing, K., Hu, C., Yu, J., Cheng, X., Zhang, F.: Mutual privacy preserving k-means clustering in social participatory sensing. IEEE Trans. Ind. Inform. 13(4), 2066–2076 (2017)

27. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

28. Yin, H., Zhang, J., Xiong, Y., Huang, X., Deng, T.: PPK-means: achieving privacy-preserving clustering over encrypted multi-dimensional cloud data. Electronics 7(11), 310 (2018)

29. Yuan, C., Yang, H.: Research on k-value selection method of k-means clustering algorithm. J. Multi. Sci. J. 2(2), 226–235 (2019)

30. Yuan, J., Tian, Y.: Practical privacy-preserving mapreduce based k-means clustering over large-scale dataset. IEEE Trans. Cloud Comput. 7(2), 568–579 (2019)

31. Zhang, W., Li, C., Peng, G., Chen, Y., Zhang, Z.: A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 100, 439–453 (2018)

Title: Efficient Privacy Preserving Distributed K-Means for Non-IID Data
Authors: André Brandão, Ricardo Mendes, João P. Vilela
Publisher: Springer International Publishing
Book: Advances in Intelligent Data Analysis XIX
Print ISBN: 978-3-030-74250-8
Electronic ISBN: 978-3-030-74251-5

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-74251-5_35
