2017 | OriginalPaper | Book chapter

Keyphrase Extraction Using Knowledge Graphs

Authors: Wei Shi, Weiguo Zheng, Jeffrey Xu Yu, Hong Cheng, Lei Zou

Published in: Web and Big Data

Publisher: Springer International Publishing

Abstract

Automatically extracting keyphrases from documents is an important task because keyphrases provide a quick summary of a document. Although much effort has gone into keyphrase extraction, most existing methods (co-occurrence-based and statistics-based) do not take semantics fully into account. Co-occurrence-based methods depend heavily on the co-occurrence relations between words in the input document and may therefore miss many semantic relations. Statistics-based methods use an external text corpus to enrich the document, which inevitably introduces unrelated relations. In this paper, we propose a novel approach that extracts keyphrases using knowledge graphs, which lets us detect latent relations between two keyterms (i.e., nouns and named entities) without introducing much noise. Extensive experiments on real data show that our method outperforms state-of-the-art methods, including graph-based co-occurrence methods and statistics-based clustering methods.
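
To make the limitation of co-occurrence-based methods concrete, the following is a minimal sketch of that baseline family, not the approach proposed in the paper. It assumes candidate terms have already been filtered to nouns; the function names, the window size, and the token list are hypothetical choices made for illustration. Terms that never appear near each other receive no edge, however closely related they are in meaning.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct tokens that appear within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph."""
    nodes = list(graph)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_score = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(score[nb] / len(graph[nb]) for nb in graph[node])
            new_score[node] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


# Hypothetical pre-filtered candidate terms (nouns only), in document order.
tokens = ["keyphrase", "extraction", "knowledge", "graph", "keyphrase",
          "document", "graph", "semantics", "keyphrase", "graph"]
graph = cooccurrence_graph(tokens)
scores = pagerank(graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, "extraction" and "semantics" never fall within the same window, so the graph holds no edge between them. A knowledge graph could still connect such terms through relations stored outside the document, which is the gap the abstract describes.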

Metadata
Title
Keyphrase Extraction Using Knowledge Graphs
Authors
Wei Shi
Weiguo Zheng
Jeffrey Xu Yu
Hong Cheng
Lei Zou
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-63579-8_11