
2015 | OriginalPaper | Chapter

A Novel Clustering Approach Using Hadoop Distributed Environment

Authors: Nagesh Vadaparthi, P. Srinivas Rao, Y. Srinivas, M. Athmaja

Published in: Computational Intelligence Techniques for Comparative Genomics

Publisher: Springer Singapore


Abstract

Information retrieval plays a vital role by allowing users to retrieve documents of interest based on a relevance score. Such systems can be implemented on either distributed or parallel systems to achieve high throughput. If such a framework is deployed in a cloud, grouping relevant documents is essential for retrieving the documents of interest. Hence, an efficient and scalable clustering method is required to process a huge volume of documents. Apache Hadoop, with its powerful MapReduce feature, handles large document collections efficiently and scales as the processing load grows. Hence, in this paper, a novel approach is proposed that is capable of clustering bulk data with high throughput. The paper also demonstrates the need for a parallel caching approach to obtain effective results.

Metadata
Title
A Novel Clustering Approach Using Hadoop Distributed Environment
Authors
Nagesh Vadaparthi
P. Srinivas Rao
Y. Srinivas
M. Athmaja
Copyright Year
2015
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-287-338-5_9
