Published in: Neural Computing and Applications 4/2015

01.05.2015 | Original Article

Text clustering using VSM with feature clusters

Authors: Cao Qimin, Guo Qiao, Wang Yongliang, Wu Xianghua

Published in: Neural Computing and Applications | Issue 4/2015

Abstract

Document representation is the basis of clustering systems. Moreover, non-contiguous phrases appear increasingly often in text in the Web 2.0 era, and such phrases can affect text clustering results. To improve the quality of text clustering, this paper proposes a feature cluster-based vector space model (FC-VSM), which represents documents with a feature-cluster co-occurrence matrix, and proposes identifying non-contiguous phrases in the text preprocessing stage. Compared with the traditional VSM-based model, our method reduces the dimensionality of the feature space. It identifies non-contiguous phrases, uses distributed representations of features, and builds feature clusters. Despite their simplicity, these methods are surprisingly effective, and the experimental results show that they significantly improve clustering accuracy.
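To make the general recipe in the abstract concrete, the following is a minimal sketch of a feature cluster-based document representation: learn distributed word representations, group the words into a small number of feature clusters, and represent each document by counts over those clusters instead of over individual terms. The library choices (gensim, scikit-learn), the toy corpus, the cluster count k, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a feature-cluster-based VSM (FC-VSM-style representation).
# Assumptions: gensim for word embeddings, scikit-learn KMeans for feature
# clustering, and a toy tokenized corpus; none of this is the paper's code.

from gensim.models import Word2Vec
from sklearn.cluster import KMeans
import numpy as np

# Tokenized corpus: one list of tokens per document (toy example).
docs = [
    ["text", "clustering", "vector", "space", "model"],
    ["word", "embedding", "distributed", "representation"],
    ["document", "clustering", "feature", "cluster", "model"],
]

# 1. Distributed representations of features (words).
w2v = Word2Vec(docs, vector_size=50, window=5, min_count=1, epochs=50, seed=1)
vocab = list(w2v.wv.index_to_key)
word_vecs = np.array([w2v.wv[w] for w in vocab])

# 2. Group words into k feature clusters (k much smaller than the vocabulary).
k = 4
km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(word_vecs)
word2cluster = dict(zip(vocab, km.labels_))

# 3. Represent each document as a k-dimensional vector of cluster counts;
#    this is the reduced-dimension vector fed to the document clustering step.
def fc_vsm(doc):
    vec = np.zeros(k)
    for token in doc:
        if token in word2cluster:
            vec[word2cluster[token]] += 1
    return vec

doc_matrix = np.vstack([fc_vsm(d) for d in docs])
print(doc_matrix.shape)  # (num_docs, k) instead of (num_docs, vocabulary size)
```

The document matrix produced this way can then be passed to any standard clustering algorithm; the dimensionality reduction comes from replacing the term axis with the much smaller feature-cluster axis.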


Metadata
Title
Text clustering using VSM with feature clusters
Authors
Cao Qimin
Guo Qiao
Wang Yongliang
Wu Xianghua
Publication date
01.05.2015
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2015
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-014-1792-9
