2019 | OriginalPaper | Book chapter

Chinese News Keyword Extraction Algorithm Based on TextRank and Topic Model

Abstract

TextRank tends to select frequent words as a document's keywords, yet infrequent words can also be keywords. To address this, the paper proposes LDA-TextRank, a Chinese news keyword extraction algorithm that combines TextRank with the LDA topic model. The algorithm is unsupervised and operates on a single document. It defines the diffusivity of two candidate words and uses it to construct a new weight formula for the edges of the text graph. It also uses the LDA topic model to compute each word's topic relevance to the document and adjusts the TextRank damping factor accordingly. Experiments on a Chinese corpus show that LDA-TextRank improves on TextRank in precision, recall, and F1-measure.
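To make the moving parts of the abstract concrete, here is a minimal Python sketch of weighted TextRank over a word co-occurrence graph, with an optional per-word damping factor. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the diffusivity-based edge weight formula nor the rule that maps topic relevance to a damping factor, so the sketch uses plain co-occurrence counts as edge weights and accepts a caller-supplied `damping` dictionary as a hypothetical stand-in. The function names, the `window` parameter, and the assumption that the input is already segmented into candidate words are all choices made for this example.

```python
from collections import defaultdict


def build_graph(words, window=2):
    """Build an undirected co-occurrence graph from a list of segmented words.

    Edge weight = number of times two distinct words co-occur within `window`
    positions. (Assumption: the paper replaces this count with a
    diffusivity-based weight whose formula is not given in the abstract.)
    """
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v != w:
                weights[w][v] += 1.0
                weights[v][w] += 1.0
    return weights


def textrank(weights, damping=None, default_d=0.85, iters=50):
    """Run weighted TextRank for a fixed number of iterations.

    `damping` optionally maps a word to its own damping factor. This is a
    hypothetical hook for a topic-relevance adjustment; words not listed
    fall back to `default_d`.
    """
    damping = damping or {}
    scores = {w: 1.0 for w in weights}
    # Total edge weight leaving each node, used to normalise contributions.
    out_sum = {w: sum(nbrs.values()) for w, nbrs in weights.items()}
    for _ in range(iters):
        new_scores = {}
        for w in weights:
            d = damping.get(w, default_d)
            rank = sum(weights[v][w] / out_sum[v] * scores[v] for v in weights[w])
            new_scores[w] = (1 - d) + d * rank
        scores = new_scores
    return scores
```

Calling `textrank(build_graph(words))` on a segmented document and sorting the result by score gives a plain TextRank ranking; passing a `damping` dictionary shows how a per-word adjustment shifts individual scores without changing the graph.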

Metadata
Title
Chinese News Keyword Extraction Algorithm Based on TextRank and Topic Model
Authors
Ao Xiong
Qing Guo
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-22968-9_29