Published in: International Journal of Machine Learning and Cybernetics, Issue 12/2018

26.04.2017 | Original Article

Self-organizing weighted incremental probabilistic latent semantic analysis

Authors: Ning Li, Wenjuan Luo, Kun Yang, Fuzhen Zhuang, Qing He, Zhongzhi Shi

Abstract

PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique that has been widely applied in text mining to discover the underlying topics embedded in a corpus. However, because data keep growing and changing over time, it is necessary to discover dynamic topics and to process large datasets incrementally. Moreover, PLSA has difficulty inferring topics for new documents. To overcome these problems, in this paper we propose a novel Weighted Incremental PLSA algorithm, called WIPLSA, that dynamically discovers topics and incrementally learns them from new documents. The experiments verify that WIPLSA can capture the dynamic topics hidden in a continuously updated corpus. Compared with PLSA, MAP PLSA and QB PLSA, WIPLSA achieves lower (better) perplexity on a large dataset, which makes it applicable to big data mining. In addition, WIPLSA performs well in document categorization.
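For readers unfamiliar with the baseline, the sketch below illustrates the three standard ingredients the abstract refers to: plain PLSA fitted by EM (Hofmann's asymmetric formulation, P(w|d) = sum_z P(z|d) P(w|z)), the usual "folding-in" step that estimates P(z|d) for an unseen document while holding P(w|z) fixed, and perplexity. It is a minimal pure-Python sketch under those textbook assumptions, not the authors' WIPLSA algorithm, whose weighting and self-organizing scheme is not described here. The function names `plsa_em`, `fold_in` and `perplexity` and the toy count matrix are hypothetical.

```python
import math
import random


def _normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """Fit plain PLSA (not WIPLSA) by EM on a document-word count matrix.

    counts[d][w] is n(d, w). Returns (p_z_d, p_w_z) with
    p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                post = _normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Expected counts n(d,w) * P(z|d,w) feed the M-step
                for z in range(n_topics):
                    exp_w_z[z][w] += n * post[z]
                    exp_z_d[d][z] += n * post[z]
        # M-step: renormalize expected counts into conditional distributions
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def fold_in(doc_counts, p_w_z, n_iter=100):
    """Estimate P(z|d_new) for an unseen document, keeping P(w|z) fixed."""
    n_topics = len(p_w_z)
    p_z = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        acc = [0.0] * n_topics
        for w, n in enumerate(doc_counts):
            if n == 0:
                continue
            post = _normalize([p_z[z] * p_w_z[z][w] for z in range(n_topics)])
            for z in range(n_topics):
                acc[z] += n * post[z]
        # Only P(z|d_new) is re-estimated; the topics themselves stay frozen
        p_z = _normalize(acc)
    return p_z


def perplexity(counts, p_z_d, p_w_z):
    """exp(-sum n(d,w) log P(w|d) / sum n(d,w)), P(w|d) = sum_z P(z|d) P(w|z)."""
    log_lik, total = 0.0, 0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue
            p_w_d = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
            # Floor avoids log(0) for words the topics never generated
            log_lik += n * math.log(max(p_w_d, 1e-12))
            total += n
    return math.exp(-log_lik / total)


# Toy usage: four documents over a 4-word vocabulary, two latent topics.
docs = [
    [4, 3, 0, 0],
    [5, 2, 0, 1],
    [0, 0, 4, 5],
    [1, 0, 3, 4],
]
p_z_d, p_w_z = plsa_em(docs, n_topics=2)
theta_new = fold_in([0, 0, 6, 2], p_w_z)
train_ppl = perplexity(docs, p_z_d, p_w_z)
```

Folding-in is the conventional way plain PLSA handles new documents, and it leaves P(w|z) untouched, so topics never adapt to the new data; this is the limitation that incremental variants such as the one proposed here aim to address. Lower perplexity means the model assigns higher probability to the observed words.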

Metadata
Title
Self-organizing weighted incremental probabilistic latent semantic analysis
Authors
Ning Li
Wenjuan Luo
Kun Yang
Fuzhen Zhuang
Qing He
Zhongzhi Shi
Publication date
26.04.2017
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 12/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-017-0681-9
