Skip to main content

2020 | OriginalPaper | Buchkapitel

Towards a Practical Cluster Analysis over Encrypted Data

verfasst von : Jung Hee Cheon, Duhyeong Kim, Jai Hyun Park

Erschienen in: Selected Areas in Cryptography – SAC 2019

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Cluster analysis is one of the most significant unsupervised machine learning methods, and it is being utilized in various fields associated with privacy issues including bioinformatics, finance and image processing. In this paper, we propose a practical solution for privacy-preserving cluster analysis based on homomorphic encryption (HE). Our work is the first HE solution for the mean-shift clustering algorithm. To reduce the super-linear complexity of the original mean-shift algorithm, we adopt a novel random sampling method called dust sampling approach, which perfectly suits with HE and achieves the linear complexity. We also substitute non-polynomial kernels by a new polynomial kernel so that it can be efficiently computed in HE.
The HE implementation of our modified mean-shift clustering algorithm based on the approximate HE scheme HEAAN shows prominent performance in terms of speed and accuracy. It takes approx. 30 min with \(99\%\) accuracy over several public datasets with hundreds of data, and even for the dataset with 262, 144 data, it takes 82 min only when SIMD operations in HEAAN is applied. Our results outperform the previously best known result (SAC 2018) by over 400 times.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
3.
Zurück zum Zitat Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. 9(3), 169–203 (2015)MathSciNetCrossRef Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. 9(3), 169–203 (2015)MathSciNetCrossRef
4.
Zurück zum Zitat Almutairi, N., Coenen, F., Dures, K.: K-means clustering using homomorphic encryption and an updatable distance matrix: secure third party data clustering with limited data owner interaction. In: Bellatreche, L., Chakravarthy, S. (eds.) DaWaK 2017. LNCS, vol. 10440, pp. 274–285. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64283-3_20CrossRef Almutairi, N., Coenen, F., Dures, K.: K-means clustering using homomorphic encryption and an updatable distance matrix: secure third party data clustering with limited data owner interaction. In: Bellatreche, L., Chakravarthy, S. (eds.) DaWaK 2017. LNCS, vol. 10440, pp. 274–285. Springer, Cham (2017). https://​doi.​org/​10.​1007/​978-3-319-64283-3_​20CrossRef
7.
Zurück zum Zitat Bunn, P., Ostrovsky, R.: Secure two-party k-means clustering. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS 2007, New York, NY, USA, pp. 486–497. ACM (2007) Bunn, P., Ostrovsky, R.: Secure two-party k-means clustering. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS 2007, New York, NY, USA, pp. 486–497. ACM (2007)
11.
Zurück zum Zitat Cheon, J.H., et al.: Toward a secure drone system: flying with real-time homomorphic authenticated encryption. IEEE Access 6, 24325–24339 (2018)CrossRef Cheon, J.H., et al.: Toward a secure drone system: flying with real-time homomorphic authenticated encryption. IEEE Access 6, 24325–24339 (2018)CrossRef
14.
Zurück zum Zitat Cheon, J.H., Kim, D., Kim, D., Lee, H.H., Lee, K.: Numerical methods for comparison on homomorphically encrypted numbers. Cryptology ePrint Archive, Report 2019/417 (2019). https://eprint.iacr.org/2019/417, To appear ASIACRYPT 2019 Cheon, J.H., Kim, D., Kim, D., Lee, H.H., Lee, K.: Numerical methods for comparison on homomorphically encrypted numbers. Cryptology ePrint Archive, Report 2019/417 (2019). https://​eprint.​iacr.​org/​2019/​417, To appear ASIACRYPT 2019
15.
Zurück zum Zitat Cheon, J.H., Kim, D., Kim, Y., Song, Y.: Ensemble method for privacy-preserving logistic regression based on homomorphic encryption. IEEE Access 6, 46938–46948 (2018)CrossRef Cheon, J.H., Kim, D., Kim, Y., Song, Y.: Ensemble method for privacy-preserving logistic regression based on homomorphic encryption. IEEE Access 6, 46938–46948 (2018)CrossRef
18.
Zurück zum Zitat Cho, H., Wu, D.J., Berger, B.: Secure genome-wide association analysis using multiparty computation. Nat. Biotechnol. 36(6), 547 (2018)CrossRef Cho, H., Wu, D.J., Berger, B.: Secure genome-wide association analysis using multiparty computation. Nat. Biotechnol. 36(6), 547 (2018)CrossRef
19.
Zurück zum Zitat Crawford, J.L., Gentry, C., Halevi, S., Platt, D., Shoup, V.: Doing real work with FHE: the case of logistic regression (2018) Crawford, J.L., Gentry, C., Halevi, S., Platt, D., Shoup, V.: Doing real work with FHE: the case of logistic regression (2018)
20.
Zurück zum Zitat Dhillon, I.S., Marcotte, E.M., Roshan, U.: Diametrical clustering for identifying anti-correlated gene clusters. Bioinformatics 19(13), 1612–1619 (2003)CrossRef Dhillon, I.S., Marcotte, E.M., Roshan, U.: Diametrical clustering for identifying anti-correlated gene clusters. Bioinformatics 19(13), 1612–1619 (2003)CrossRef
21.
Zurück zum Zitat Doganay, M.C., Pedersen, T.B., Saygin, Y., Savaş, E., Levi, A.: Distributed privacy preserving k-means clustering with additive secret sharing. In: Proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society, PAIS 2008, New York, NY, USA, pp. 3–11. ACM (2008) Doganay, M.C., Pedersen, T.B., Saygin, Y., Savaş, E., Levi, A.: Distributed privacy preserving k-means clustering with additive secret sharing. In: Proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society, PAIS 2008, New York, NY, USA, pp. 3–11. ACM (2008)
22.
Zurück zum Zitat Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2012)MATH Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2012)MATH
23.
Zurück zum Zitat Freedman, D., Kisilev, P.: Fast mean shift by compact density representation. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1818–1825. IEEE (2009) Freedman, D., Kisilev, P.: Fast mean shift by compact density representation. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1818–1825. IEEE (2009)
25.
Zurück zum Zitat Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016) Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016)
26.
Zurück zum Zitat Goldschmidt, R.E.: Applications of division by convergence. Ph.D. thesis, Massachusetts Institute of Technology (1964) Goldschmidt, R.E.: Applications of division by convergence. Ph.D. thesis, Massachusetts Institute of Technology (1964)
27.
Zurück zum Zitat Han, K., Hong, S., Cheon, J.H., Park, D.: Logistic regression on homomorphic encrypted data at scale (2019)CrossRef Han, K., Hong, S., Cheon, J.H., Park, D.: Logistic regression on homomorphic encrypted data at scale (2019)CrossRef
28.
Zurück zum Zitat Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD 2005, New York, NY, USA, pp. 593–599. ACM (2005) Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD 2005, New York, NY, USA, pp. 593–599. ACM (2005)
30.
Zurück zum Zitat Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.H.: Logistic regression model training based on the approximate homomorphic encryption. BMC Med. Genomics 11(4), 83 (2018)CrossRef Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.H.: Logistic regression model training based on the approximate homomorphic encryption. BMC Med. Genomics 11(4), 83 (2018)CrossRef
31.
Zurück zum Zitat Kim, M., Song, Y., Wang, S., Xia, Y., Jiang, X.: Secure logistic regression based on homomorphic encryption: design and evaluation. JMIR Med. Inform. 6(2), e19 (2018)CrossRef Kim, M., Song, Y., Wang, S., Xia, Y., Jiang, X.: Secure logistic regression based on homomorphic encryption: design and evaluation. JMIR Med. Inform. 6(2), e19 (2018)CrossRef
33.
Zurück zum Zitat Liu, X., et al.: Outsourcing two-party privacy preserving k-means clustering protocol in wireless sensor networks. In: 2015 11th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), pp. 124–133. IEEE (2015) Liu, X., et al.: Outsourcing two-party privacy preserving k-means clustering protocol in wireless sensor networks. In: 2015 11th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), pp. 124–133. IEEE (2015)
34.
Zurück zum Zitat Malik, M.B., Ghazi, M.A., Ali, R.: Privacy preserving data mining techniques: current scenario and future prospects. In: 2012 Third International Conference on Computer and Communication Technology (ICCCT), pp. 26–32. IEEE (2012) Malik, M.B., Ghazi, M.A., Ali, R.: Privacy preserving data mining techniques: current scenario and future prospects. In: 2012 Third International Conference on Computer and Communication Technology (ICCCT), pp. 26–32. IEEE (2012)
35.
Zurück zum Zitat Meskine, F., Nait-Bahloul, S.: Privacy preserving k-means clustering: a survey research. Int. Arab J. Inf. Technol. 9, 03 (2012) Meskine, F., Nait-Bahloul, S.: Privacy preserving k-means clustering: a survey research. Int. Arab J. Inf. Technol. 9, 03 (2012)
36.
Zurück zum Zitat Pouget, F., Dacier, M., et al.: Honeypot-based forensics. In: AusCERT Asia Pacific Information Technology Security Conference (2004) Pouget, F., Dacier, M., et al.: Honeypot-based forensics. In: AusCERT Asia Pacific Information Technology Security Conference (2004)
37.
Zurück zum Zitat Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)CrossRef Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)CrossRef
39.
Zurück zum Zitat Samet, S., Miri, A., Orozco-Barbosa, L.: Privacy preserving k-means clustering in multi-party environment, January 2007 Samet, S., Miri, A., Orozco-Barbosa, L.: Privacy preserving k-means clustering in multi-party environment, January 2007
40.
Zurück zum Zitat Su, C., Bao, F., Zhou, J., Takagi, T., Sakurai, K.: Privacy-preserving two-party k-means clustering via secure approximation. In: Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 01, AINAW 2007, Washington, DC, USA, pp. 385–391. IEEE Computer Society (2007) Su, C., Bao, F., Zhou, J., Takagi, T., Sakurai, K.: Privacy-preserving two-party k-means clustering via secure approximation. In: Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 01, AINAW 2007, Washington, DC, USA, pp. 385–391. IEEE Computer Society (2007)
41.
Zurück zum Zitat Sugar, C.A., James, G.M.: Finding the number of clusters in a dataset: an information-theoretic approach. J. Am. Stat. Assoc. 98(463), 750–763 (2003)MathSciNetCrossRef Sugar, C.A., James, G.M.: Finding the number of clusters in a dataset: an information-theoretic approach. J. Am. Stat. Assoc. 98(463), 750–763 (2003)MathSciNetCrossRef
43.
Zurück zum Zitat Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 206–215. ACM (2003) Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 206–215. ACM (2003)
44.
Zurück zum Zitat Vinoth, K.J., Santhi, V.: A brief survey on privacy preserving techniques in data mining. IOSR J. Comput. Eng. (IOSR-JCE) 18, 47–51 (2016) Vinoth, K.J., Santhi, V.: A brief survey on privacy preserving techniques in data mining. IOSR J. Comput. Eng. (IOSR-JCE) 18, 47–51 (2016)
45.
Zurück zum Zitat Wang, S., et al.: HEALER: homomorphic computation of exact logistic regression for secure rare disease variants analysis in GWAS. Bioinformatics 32(2), 211–218 (2016) Wang, S., et al.: HEALER: homomorphic computation of exact logistic regression for secure rare disease variants analysis in GWAS. Bioinformatics 32(2), 211–218 (2016)
46.
Zurück zum Zitat Wang, Y.: Notes on two fully homomorphic encryption schemes without bootstrapping. IACR Cryptology ePrint Archive, 2015:519 (2015) Wang, Y.: Notes on two fully homomorphic encryption schemes without bootstrapping. IACR Cryptology ePrint Archive, 2015:519 (2015)
Metadaten
Titel
Towards a Practical Cluster Analysis over Encrypted Data
verfasst von
Jung Hee Cheon
Duhyeong Kim
Jai Hyun Park
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-38471-5_10

Premium Partner