2018 | OriginalPaper | Book chapter

An Efficient Ranking-Centered Density-Based Document Clustering Method

Authors: Wathsala Anupama Mohotti, Richi Nayak

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

Document clustering is a popular method for discovering useful information in text data. This paper proposes a hybrid document clustering method based on the concepts of ranking, density and shared neighborhood. We use the ranked documents returned by a search engine to build a graph of shared relevant documents. The high-density regions of the graph are processed to form initial clusters. The clustering decisions are then refined using shared neighborhood information. Empirical analysis shows that the proposed method produces accurate and efficient solutions compared to relevant benchmark methods.
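The abstract names three ingredients (ranking, density, shared neighborhood) without giving the algorithm itself. The toy sketch below illustrates only the general idea and is not the authors' method: the term-overlap ranker standing in for a search engine, the function names `rank_top_k` and `cluster_by_shared_relevance`, and the parameters `k`, `min_shared` and `min_density` are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_top_k(query_id, docs, k):
    """Hypothetical stand-in for a search engine: top-k docs by term overlap."""
    query = docs[query_id]
    scored = [(len(query & d), -i, i)
              for i, d in enumerate(docs) if i != query_id and query & d]
    scored.sort(reverse=True)  # highest overlap first, ties broken by lower index
    return {i for _, _, i in scored[:k]}


def cluster_by_shared_relevance(docs, k=2, min_shared=3, min_density=2):
    n = len(docs)
    # 1. Ranking: each document is used as a query; keep its top-k results.
    relevant = [rank_top_k(i, docs, k) | {i} for i in range(n)]

    # 2. Graph of shared relevant documents: edge weight = size of the overlap.
    shared = defaultdict(dict)
    for i in range(n):
        for j in range(i + 1, n):
            w = len(relevant[i] & relevant[j])
            if w > 0:
                shared[i][j] = shared[j][i] = w

    # 3. Density: a document is dense if it has enough strong neighbors.
    strong = {i: {j for j, w in shared[i].items() if w >= min_shared}
              for i in range(n)}
    dense = {i for i in range(n) if len(strong[i]) >= min_density}

    # 4. Initial clusters: connected components of dense docs over strong edges.
    labels = {}
    next_label = 0
    for seed in sorted(dense):
        if seed in labels:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            u = stack.pop()
            for v in strong[u]:
                if v in dense and v not in labels:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    # 5. Refinement: attach each remaining doc to the cluster whose dense
    #    members share the most relevant documents with it; -1 marks outliers.
    for i in range(n):
        if i in labels:
            continue
        votes = defaultdict(int)
        for j, w in shared[i].items():
            if j in dense:
                votes[labels[j]] += w
        labels[i] = max(votes, key=votes.get) if votes else -1

    return [labels[i] for i in range(n)]


# Toy corpus: each document is a set of terms.
docs = [
    {"cluster", "density", "graph"},
    {"cluster", "density", "rank"},
    {"density", "graph", "rank"},
    {"soccer", "goal", "league"},
    {"soccer", "league", "coach"},
    {"goal", "coach", "league"},
    {"graph", "recipe"},
    {"weather"},
]
labels = cluster_by_shared_relevance(docs)
print(labels)
```

In this sketch a document that is not dense itself can still join a cluster through the refinement step, while a document that shares no retrieved results with any dense document is left unassigned.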

Title: An Efficient Ranking-Centered Density-Based Document Clustering Method
Authors: Wathsala Anupama Mohotti, Richi Nayak
Publisher: Springer International Publishing
Book: Advances in Knowledge Discovery and Data Mining
Print ISBN: 978-3-319-93039-8
Electronic ISBN: 978-3-319-93040-4
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-93040-4_35
