Published in: Peer-to-Peer Networking and Applications 5/2016

01.09.2016

A semi-supervised privacy-preserving clustering algorithm for healthcare

Written by: Meiyu Huang, Yiqiang Chen, Bo-Wei Chen, Junfa Liu, Seungmin Rho, Wen Ji

Abstract

With the proliferation of healthcare data, cloud mining technology for e-health services and applications has become an active research topic. At the same time, these rapidly evolving cloud mining technologies and their deployment in healthcare systems pose potential threats to the privacy of patients' data. To address the privacy problem in cloud mining, this paper proposes a semi-supervised privacy-preserving clustering algorithm. Using a small amount of supervised information, the method first learns a Large Margin Nearest Cluster metric by convex optimization. It then applies a multiplicative perturbation to the original data according to the trained metric. The perturbation changes the shape of the data distribution, which protects private information while keeping the data highly usable. Experimental results on the brain fiber dataset provided by the 2009 PBC demonstrate that the proposed method not only protects data privacy against security attacks but also improves clustering purity.
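
The abstract does not state the form of the perturbation or the objective used to learn the metric. The following is only a rough sketch of the general idea of a metric-driven multiplicative perturbation, under two assumptions that are not taken from the paper: the learned metric is a symmetric positive-definite matrix `M`, and the perturbation multiplies each record by a Cholesky factor of `M`. The matrix `M` below is a hand-picked placeholder, not a trained Large Margin Nearest Cluster metric, and all function names are hypothetical.

```python
import math
import random


def cholesky(M):
    """Return lower-triangular C with M = C C^T (M symmetric positive-definite)."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(M[i][i] - s)
            else:
                C[i][j] = (M[i][j] - s) / C[j][j]
    return C


def perturb(X, M):
    """Multiply every record (row of X) by the factor C of M.

    Assumption for this sketch: the released data are Y = X C, so that
    Euclidean distances in Y equal the M-distances in the original X.
    """
    C = cholesky(M)
    n = len(M)
    return [[sum(x[k] * C[k][j] for k in range(n)) for j in range(n)] for x in X]


def metric_sq_dist(u, v, M):
    """Squared distance (u - v)^T M (u - v) under the metric M."""
    d = [a - b for a, b in zip(u, v)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def euclid_sq_dist(u, v):
    """Ordinary squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Placeholder metric: a hand-picked positive-definite matrix standing in
# for a learned metric (hypothetical values, not from the paper).
M = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]

# Synthetic records standing in for sensitive data.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]

# Perturbed records that would be released in place of X.
Y = perturb(X, M)
```

Under these assumptions, a distance-based clustering algorithm run with plain Euclidean distance on `Y` sees the same pairwise distances that the metric `M` assigns to the original records, while the released coordinates differ from the originals. Whether a transform of this kind resists a given attack is a separate question that this sketch does not address.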

Metadata
Title
A semi-supervised privacy-preserving clustering algorithm for healthcare
Written by
Meiyu Huang
Yiqiang Chen
Bo-Wei Chen
Junfa Liu
Seungmin Rho
Wen Ji
Publication date
01.09.2016
Publisher
Springer US
Published in
Peer-to-Peer Networking and Applications / Issue 5/2016
Print ISSN: 1936-6442
Electronic ISSN: 1936-6450
DOI
https://doi.org/10.1007/s12083-015-0356-9
