2017 | OriginalPaper | Chapter

A Lexicon LDA Model Based Solution to Theme Extraction of Chinese Short Text on the Internet

Authors: Xu Wang, Jing Zhou

Published in: Bio-inspired Computing: Theories and Applications

Publisher: Springer Singapore

Abstract

Chinese short text has become the main content of the Internet. Accurately extracting thematic terms is the basis of content analysis, query suggestion, document classification, text clustering, and other tasks on Chinese short text from the Internet. Because such text is short, unbalanced, and poor in context information, traditional text clustering models are not directly applicable. This paper presents a simple and generic theme model for Chinese short text on the Internet, named Lexicon LDA, which uses the sentence structure within a document to enrich the semantic context of common Chinese words. The words of each sentence, as delimited by punctuation marks, form a word set. Unlike previous methods, the model assigns a theme to each word set rather than to each document. Experiments show that when the data set exhibits a strong theme distribution, this approach significantly improves the effect of the theme model. The conclusion is that extracting thematic terms from Chinese short text on the Internet depends both on the word itself and on the sentence in which the word is located.
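The word-set construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text has already been word-segmented with whitespace between tokens (Chinese segmentation itself is out of scope here), and the punctuation set in `BOUNDARY`, the function name `word_sets`, and the sample sentence are all hypothetical choices made for illustration. Each punctuation-delimited span yields one word set, which is the unit to which a theme would be assigned.

```python
import re

# Assumed set of Chinese and ASCII punctuation marks used as span boundaries.
BOUNDARY = re.compile(r"[，。！？；：、,.!?;:]+")


def word_sets(segmented_text):
    """Return one word list per punctuation-delimited span.

    `segmented_text` is assumed to be pre-segmented: tokens separated by
    whitespace, with punctuation marks appearing as their own tokens.
    """
    spans = BOUNDARY.split(segmented_text)
    # Drop empty spans, e.g. the one after a trailing punctuation mark.
    return [span.split() for span in spans if span.strip()]


# Hypothetical pre-segmented short text with three punctuation-delimited spans.
doc = "今天 天气 很 好 ， 我们 去 公园 散步 。 公园 里 人 很 多 ！"
sets = word_sets(doc)
for i, ws in enumerate(sets):
    print(i, ws)
```

Lists are used here instead of Python sets so that repeated words within a span are preserved; whether duplicates should be kept is an assumption this sketch does not settle.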

Title: A Lexicon LDA Model Based Solution to Theme Extraction of Chinese Short Text on the Internet
Authors: Xu Wang, Jing Zhou
Publisher: Springer Singapore
Book: Bio-inspired Computing: Theories and Applications
Print ISBN: 978-981-10-7178-2

Electronic ISBN: 978-981-10-7179-9

Copyright Year: 2017
DOI: https://doi.org/10.1007/978-981-10-7179-9_17
