2016 | OriginalPaper | Book chapter

Cloud Based K-Means Clustering Running as a MapReduce Job for Big Data Healthcare Analytics Using Apache Mahout

Written by: Sreekanth Rallapalli, R. R. Gondkar, Golajapu Venu Madhava Rao

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer India

Abstract

Growing data volumes and the need for analytics have driven innovation in big data. Models such as NoSQL have emerged to speed up query responses. Virtualized platforms that run Hadoop on commodity hardware help small and mid-sized companies use cloud environments, which lowers the cost of data processing and analytics. Because health care generates data in large volume and variety, parallel algorithms are needed that can handle petabytes of data using Hadoop and MapReduce parallel processing. K-means clustering is one algorithm that can be parallelized in this way. Building an accurate system requires large data sets, but memory requirements grow with data set size and algorithms become slow. Mahout's scalable algorithms work better with huge data sets and improve system performance. Mahout is open source and can be used to solve problems that arise with huge data sets. This paper proposes cloud-based K-means clustering running as a MapReduce job. We use health care data in the cloud for clustering. We then compare the results across various measures to determine the best measure for finding the number of vectors in a given cluster.
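To make the idea of K-means as a MapReduce job more concrete, the following is a minimal single-machine sketch in plain Python. It is not the authors' implementation and does not use Mahout or Hadoop. The map step assigns each vector to its nearest centroid, and the reduce step averages the vectors in each group. The function names, the sample points, and the reading of "measures" as distance measures (Euclidean and Manhattan here) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def map_phase(points, centroids, distance):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: distance(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # empty clusters keep their old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    sizes = {idx: len(members) for idx, members in groups.items()}
    return new_centroids, sizes


def kmeans(points, centroids, distance=euclidean, max_iter=20, tol=1e-6):
    """Repeat map and reduce until the centroids stop moving."""
    sizes = {}
    for _ in range(max_iter):
        new_centroids, sizes = reduce_phase(
            map_phase(points, centroids, distance), centroids)
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, sizes


# Made-up two-dimensional vectors, not data from the paper.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
initial = [(1.0, 1.0), (8.0, 8.0)]

centroids, sizes = kmeans(points, initial, distance=euclidean)
centroids_m, sizes_m = kmeans(points, initial, distance=manhattan)
print(sizes, sizes_m)  # number of vectors per cluster under each measure
```

On a real cluster, each map task would process one split of the input and the reducers would receive the grouped vectors per centroid; the sketch keeps both phases in one process so the data flow is easy to follow. Swapping the `distance` argument is one way to compare how many vectors land in each cluster under different measures.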


Metadata
Title
Cloud Based K-Means Clustering Running as a MapReduce Job for Big Data Healthcare Analytics Using Apache Mahout
Written by
Sreekanth Rallapalli
R. R. Gondkar
Golajapu Venu Madhava Rao
Copyright year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2755-7_14
