2020 | OriginalPaper | Chapter

Research on Chinese Short Text Clustering Ensemble via Convolutional Neural Networks

Authors: Haowen Wan, Bo Ning, Xiaoyu Tao, Jianfei Long

Published in: Artificial Intelligence in China

Publisher: Springer Singapore

Abstract

Unlike traditional text, short texts are characterized by high dimensionality, sparseness, and large scale. At the same time, some existing clustering ensemble algorithms treat every cluster equally, which degrades the quality of the clustering results. To solve this problem, this paper proposes a short text clustering ensemble algorithm based on a convolutional neural network (CNN). First, the word2vec model is used to preserve the semantic relationships between words and obtain a multi-dimensional word vector representation; second, features are extracted from the original vectors with the CNN; third, clustering methods are applied to the resulting vectors; finally, the Gini coefficient is used to measure the reliability of the clustering, and the final clustering ensemble is carried out.
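
The abstract does not say how the Gini coefficient is computed or how it enters the ensemble, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that the reliability of a cluster is its average purity (one minus the Gini impurity) as judged by the other base clusterings, and that the ensemble is built from a reliability-weighted co-association matrix. The function names `gini_impurity`, `cluster_reliability`, and `weighted_co_association` are hypothetical, and the word2vec and CNN stages are omitted; the base clusterings are taken as given label lists.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a label list; 0.0 means all labels agree."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def cluster_reliability(members, other_partitions):
    """Assumed reliability: mean purity of one cluster under the other base clusterings."""
    if not other_partitions:
        return 1.0
    purities = [
        1.0 - gini_impurity([partition[i] for i in members])
        for partition in other_partitions
    ]
    return sum(purities) / len(purities)


def weighted_co_association(partitions):
    """Co-association matrix in which each cluster votes with its reliability weight."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for idx, partition in enumerate(partitions):
        others = partitions[:idx] + partitions[idx + 1:]
        # Group sample indices by the cluster label assigned in this partition.
        clusters = {}
        for i, label in enumerate(partition):
            clusters.setdefault(label, []).append(i)
        for members in clusters.values():
            weight = cluster_reliability(members, others)
            for i in members:
                for j in members:
                    matrix[i][j] += weight / len(partitions)
    return matrix


if __name__ == "__main__":
    # Three toy base clusterings of four short texts (labels are illustrative only).
    base_partitions = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    for row in weighted_co_association(base_partitions):
        print([round(value, 3) for value in row])
```

In this toy input, texts 2 and 3 are grouped together by all three base clusterings, so their entry ends up larger than that of texts 0 and 1, which the third clustering separates. A final consensus partition could then be obtained by running any similarity-based clustering method on the matrix; that step is left out here because the abstract does not specify it.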

Metadata

Title: Research on Chinese Short Text Clustering Ensemble via Convolutional Neural Networks
Authors: Haowen Wan, Bo Ning, Xiaoyu Tao, Jianfei Long
Copyright Year: 2020
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-15-0187-6_74