2016 | OriginalPaper | Chapter

A Semantic Overlapping Clustering Algorithm for Analyzing Short-Texts

Authors: Lipika Dey, Kunal Ranjan, Ishan Verma, Abir Naskar

Published in: Rough Sets

Publisher: Springer International Publishing

Abstract

The rise in the volume of digitized short texts, such as tweets or customer complaints and opinions about products and services, poses new challenges to established text-analytics methods because the text is both sparse and noisy. In this paper we present a new semantic clustering algorithm that first discovers frequently occurring semantic concepts within a repository and then clusters the documents around these concepts, based on the distribution of concepts within them. The method produces overlapping clusters, which give a far more accurate view of the content embedded in real-life communication texts. We have compared the clustering results with LSH-based clustering and show that the proposed method produces fewer clusters overall, with more semantic coherence within each cluster.
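
The two-stage idea in the abstract (find frequent concepts, then group documents around them so that one document may belong to several clusters) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how concepts are extracted or scored, so the sketch assumes that a "concept" is simply a non-stopword term that appears in at least `min_support` documents. The function names, the stopword list, and the sample complaints are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "my", "and", "to", "was", "of", "in"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def frequent_concepts(docs, min_support=2):
    """Stage 1 (simplified): terms found in at least `min_support` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {term for term, n in doc_freq.items() if n >= min_support}

def overlapping_clusters(docs, min_support=2):
    """Stage 2 (simplified): map each concept to the documents containing it."""
    concepts = frequent_concepts(docs, min_support)
    clusters = defaultdict(list)
    for i, doc in enumerate(docs):
        # A document joins every cluster whose concept it mentions,
        # which is what makes the clusters overlap.
        for term in sorted(set(tokenize(doc)) & concepts):
            clusters[term].append(i)
    return dict(clusters)

# Hypothetical customer complaints.
docs = [
    "Battery drains fast and screen flickers",
    "Screen cracked after a week",
    "Battery is dead",
    "Delivery was late",
]
clusters = overlapping_clusters(docs)
print(clusters)
```

In this toy run the first complaint mentions both frequent terms and therefore lands in two clusters, while the last complaint shares no frequent term and stays unclustered. A faithful implementation would replace the term-frequency step with a real semantic concept extractor and use the concept distribution within each document to decide membership.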

Metadata
Title
A Semantic Overlapping Clustering Algorithm for Analyzing Short-Texts
Authors
Lipika Dey
Kunal Ranjan
Ishan Verma
Abir Naskar
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-47160-0_43