
2015 | Original Paper | Book Chapter

Optimized Distributed Text Document Clustering Algorithm

Written by: J. E. Judith, J. Jayakumari

Published in: Artificial Intelligence and Evolutionary Algorithms in Engineering Systems

Publisher: Springer India


Abstract

Scientific progress has created a variety of challenges in the field of information retrieval (IR), driven by the growing use of large volumes of data. This data is spread across large-scale distributed networks, and centralizing it for analysis is difficult. Distributed text document clustering algorithms are therefore needed that overcome the two main challenges in clustering: accuracy and quality. This paper proposes an optimized distributed text document clustering algorithm in which a distributed particle swarm optimization (DPSO) algorithm generates and optimizes the initial centroids for the distributed K-means (DKMeans) clustering algorithm, which improves clustering quality. Similarity is measured with the Jaccard coefficient, which produces coherent clusters and thereby improves the accuracy of the proposed algorithm. The effectiveness of the algorithm is demonstrated through extensive simulation-based evaluations on the Reuters-21578 and 20 Newsgroups data sets.
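The abstract does not give the algorithm's details, so the following is only a minimal single-node sketch of the general idea: PSO searches for good initial centroids, and K-means then refines them, with similarity measured by the Jaccard coefficient. Several things here are assumptions rather than the authors' method: the extended (Tanimoto) form of the Jaccard coefficient for term-weight vectors, the fitness function (mean similarity of each document to its closest centroid), the PSO parameters `w`, `c1`, `c2`, and all function names. The distributed parts (DPSO and DKMeans across network nodes) are not modeled.

```python
import random

def jaccard(a, b):
    """Extended Jaccard (Tanimoto) coefficient of two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0

def assign(docs, centroids):
    """Label each document with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda c: jaccard(d, centroids[c]))
            for d in docs]

def fitness(docs, centroids):
    """Assumed fitness: mean similarity of each document to its best centroid."""
    return sum(max(jaccard(d, c) for c in centroids) for d in docs) / len(docs)

def pso_centroids(docs, k, particles=10, iters=30, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Search for k initial centroids; each particle encodes k centroids as one flat list."""
    rng = random.Random(seed)
    dim = len(docs[0])

    def unpack(p):
        return [p[i * dim:(i + 1) * dim] for i in range(k)]

    # Start every particle from k randomly chosen documents.
    pos = [[float(x) for d in rng.sample(docs, k) for x in d] for _ in range(particles)]
    vel = [[0.0] * (k * dim) for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pfit = [fitness(docs, unpack(p)) for p in pos]
    g = max(range(particles), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]

    for _ in range(iters):
        for i in range(particles):
            for j in range(k * dim):
                # Standard velocity update: inertia + personal pull + global pull.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            f = fitness(docs, unpack(pos[i]))
            if f > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return unpack(gbest)

def kmeans(docs, centroids, iters=20):
    """Refine centroids with K-means, using Jaccard similarity for assignment."""
    centroids = [list(c) for c in centroids]
    labels = assign(docs, centroids)
    for _ in range(iters):
        for c in range(len(centroids)):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(docs, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def cluster(docs, k):
    """PSO-seeded K-means: PSO proposes centroids, K-means refines them."""
    return kmeans(docs, pso_centroids(docs, k))
```

Calling `cluster(docs, k)` on a list of equal-length term-weight vectors returns a label per document and the refined centroids. Seeding K-means from the swarm's best position, rather than from random documents, is what makes the result less sensitive to a poor starting point.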

Metadata
Title
Optimized Distributed Text Document Clustering Algorithm
Written by
J. E. Judith
J. Jayakumari
Copyright year
2015
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2135-7_60
