Skip to main content

2021 | OriginalPaper | Buchkapitel

Efficient Privacy Preserving Distributed K-Means for Non-IID Data

verfasst von : André Brandão, Ricardo Mendes, João P. Vilela

Erschienen in: Advances in Intelligent Data Analysis XIX

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Privacy is becoming a crucial requirement in many machine learning systems. In this paper we introduce an efficient and secure distributed K-Means algorithm, that is robust to non-IID data. The base idea of our proposal consists in each client computing the K-Means algorithm locally, with a variable number of clusters. The server will use the resultant centroids to apply the K-Means algorithm again, discovering the global centroids. To maintain the client’s privacy, homomorphic encryption and secure aggregation is used in the process of learning the global centroids. This algorithm is efficient and reduces transmission costs, since only the local centroids are used to find the global centroids. In our experimental evaluation, we demonstrate that our strategy achieves a similar performance to the centralized version even in cases where the data follows an extreme non-IID form.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
2
We assume a constant time complexity for multiplication between the encrypted centroids and the plaintext global centroids, according to [20].
 
Literatur
1.
Zurück zum Zitat Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017) Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017)
2.
Zurück zum Zitat Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40(1), 200–210 (2013)CrossRef Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40(1), 200–210 (2013)CrossRef
3.
Zurück zum Zitat Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. In: 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), pp. 88–93. IEEE (2013) Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. In: 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), pp. 88–93. IEEE (2013)
4.
Zurück zum Zitat Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, pp. 15–19 (2020) Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, pp. 15–19 (2020)
7.
Zurück zum Zitat Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)CrossRef Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)CrossRef
8.
Zurück zum Zitat Jahangiri, A., Rakha, H.A.: Applying machine learning techniques to transportation mode recognition using mobile phone sensor data. IEEE Trans. Intell. Transp. Syst. 16(5), 2406–2417 (2015)CrossRef Jahangiri, A., Rakha, H.A.: Applying machine learning techniques to transportation mode recognition using mobile phone sensor data. IEEE Trans. Intell. Transp. Syst. 16(5), 2406–2417 (2015)CrossRef
9.
Zurück zum Zitat Januzaj, E., Kriegel, H.P., Pfeifle, M.: Towards effective and efficient distributed clustering. In: Workshop on Clustering Large Data Sets (ICDM2003), Vol. 60 (2003) Januzaj, E., Kriegel, H.P., Pfeifle, M.: Towards effective and efficient distributed clustering. In: Workshop on Clustering Large Data Sets (ICDM2003), Vol. 60 (2003)
10.
Zurück zum Zitat Jiang, Z.L., et al.: Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing. Inf. Sci. 518, 168–180 (2020)MathSciNetCrossRef Jiang, Z.L., et al.: Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing. Inf. Sci. 518, 168–180 (2020)MathSciNetCrossRef
11.
Zurück zum Zitat Liu, B., et al.: Follow my recommendations: a personalized privacy assistant for mobile app permissions. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), pp. 27–41 (2016) Liu, B., et al.: Follow my recommendations: a personalized privacy assistant for mobile app permissions. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), pp. 27–41 (2016)
14.
Zurück zum Zitat McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 54, pp. 1273–1282. PMLR (2017) McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 54, pp. 1273–1282. PMLR (2017)
15.
Zurück zum Zitat Navidi, W., Murphy Jr., W.S., Hereman, W.: Statistical methods in surveying by trilateration. Comput. Stat. Data Anal. 27(2), 209–227 (1998)CrossRef Navidi, W., Murphy Jr., W.S., Hereman, W.: Statistical methods in surveying by trilateration. Comput. Stat. Data Anal. 27(2), 209–227 (1998)CrossRef
16.
18.
Zurück zum Zitat Schellekens, V., Chatalic, A., Houssiau, F., De Montjoye, Y.A., Jacques, L., Gribonval, R.: Differentially private compressive k-means. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7933–7937. IEEE (2019) Schellekens, V., Chatalic, A., Houssiau, F., De Montjoye, Y.A., Jacques, L., Gribonval, R.: Differentially private compressive k-means. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7933–7937. IEEE (2019)
19.
Zurück zum Zitat Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web. p. 1177–1178. WWW 2010, Association for Computing Machinery (2010) Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web. p. 1177–1178. WWW 2010, Association for Computing Machinery (2010)
20.
Zurück zum Zitat Microsoft SEAL (release 3.5), Microsoft Research, Redmond, WA (2020) Microsoft SEAL (release 3.5), Microsoft Research, Redmond, WA (2020)
21.
Zurück zum Zitat Soliman, A., Girdzijauskas, S., Bouguelia, M.-R., Pashami, S., Nowaczyk, S.: Decentralized and adaptive K-means clustering for non-IID data using hyperLogLog counters. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 343–355. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47426-3_27CrossRef Soliman, A., Girdzijauskas, S., Bouguelia, M.-R., Pashami, S., Nowaczyk, S.: Decentralized and adaptive K-means clustering for non-IID data using hyperLogLog counters. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 343–355. Springer, Cham (2020). https://​doi.​org/​10.​1007/​978-3-030-47426-3_​27CrossRef
22.
Zurück zum Zitat Su, D., Cao, J., Li, N., Bertino, E., Jin, H.: Differentially private k-means clustering. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 26–37. ACM (2016) Su, D., Cao, J., Li, N., Bertino, E., Jin, H.: Differentially private k-means clustering. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 26–37. ACM (2016)
23.
Zurück zum Zitat Thiagarajan, A., et al.: Vtrack: accurate, energy-aware road traffic delay estimation using mobile phones. In: Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys 2009, pp. 85–98. Association for Computing Machinery (2009) Thiagarajan, A., et al.: Vtrack: accurate, energy-aware road traffic delay estimation using mobile phones. In: Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys 2009, pp. 85–98. Association for Computing Machinery (2009)
24.
Zurück zum Zitat Triebe, O.J., Rajagopal, R.: Federated K-Means: clustering algorithm and proof of concept (2020) Triebe, O.J., Rajagopal, R.: Federated K-Means: clustering algorithm and proof of concept (2020)
25.
Zurück zum Zitat Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. KDD 2003 (2003) Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. KDD 2003 (2003)
26.
Zurück zum Zitat Xing, K., Hu, C., Yu, J., Cheng, X., Zhang, F.: Mutual privacy preserving \( k \)-means clustering in social participatory sensing. IEEE Trans. Ind. Inform. 13(4), 2066–2076 (2017)CrossRef Xing, K., Hu, C., Yu, J., Cheng, X., Zhang, F.: Mutual privacy preserving \( k \)-means clustering in social participatory sensing. IEEE Trans. Ind. Inform. 13(4), 2066–2076 (2017)CrossRef
27.
Zurück zum Zitat Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)CrossRef Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)CrossRef
28.
Zurück zum Zitat Yin, H., Zhang, J., Xiong, Y., Huang, X., Deng, T.: PPK-means: achieving privacy-preserving clustering over encrypted multi-dimensional cloud data. Electronics 7(11), 310 (2018)CrossRef Yin, H., Zhang, J., Xiong, Y., Huang, X., Deng, T.: PPK-means: achieving privacy-preserving clustering over encrypted multi-dimensional cloud data. Electronics 7(11), 310 (2018)CrossRef
29.
Zurück zum Zitat Yuan, C., Yang, H.: Research on k-value selection method of k-means clustering algorithm. J. Multi. Sci. J. 2(2), 226–235 (2019) Yuan, C., Yang, H.: Research on k-value selection method of k-means clustering algorithm. J. Multi. Sci. J. 2(2), 226–235 (2019)
30.
Zurück zum Zitat Yuan, J., Tian, Y.: Practical privacy-preserving mapreduce based k-means clustering over large-scale dataset. IEEE Trans. Cloud Comput. 7(2), 568–579 (2019)CrossRef Yuan, J., Tian, Y.: Practical privacy-preserving mapreduce based k-means clustering over large-scale dataset. IEEE Trans. Cloud Comput. 7(2), 568–579 (2019)CrossRef
31.
Zurück zum Zitat Zhang, W., Li, C., Peng, G., Chen, Y., Zhang, Z.: A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 100, 439–453 (2018)CrossRef Zhang, W., Li, C., Peng, G., Chen, Y., Zhang, Z.: A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 100, 439–453 (2018)CrossRef
Metadaten
Titel
Efficient Privacy Preserving Distributed K-Means for Non-IID Data
verfasst von
André Brandão
Ricardo Mendes
João P. Vilela
Copyright-Jahr
2021
DOI
https://doi.org/10.1007/978-3-030-74251-5_35