2017 | OriginalPaper | Book chapter

Extracting Keyphrases Using Heterogeneous Word Relations

Written by: Wei Shi, Zheng Liu, Weiguo Zheng, Jeffrey Xu Yu

Published in: Databases Theory and Applications

Publisher: Springer International Publishing

Abstract

Extracting keyphrases from documents to provide a quick and insightful summary is an interesting and important task that has attracted considerable research effort. Most existing methods can be categorized as co-occurrence-based, statistics-based, or semantics-based. Co-occurrence-based methods do not fully consider word relations other than co-occurrence. Statistics-based methods inevitably introduce unrelated noise because they draw on an external text corpus, while semantics-based methods depend heavily on the semantic meanings of words. In this paper, we propose a novel graph-based approach that extracts keyphrases by considering heterogeneous latent word relations (co-occurrence and semantics). The random walk model underlying the proposed approach is made possible and reasonable by exploiting nearest-neighbor documents. Extensive experiments on real data show that our method outperforms state-of-the-art methods.
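
To make the general idea concrete, the following is a minimal sketch of a random walk over a word graph whose edge weights mix two relation types. It is not the algorithm from the chapter: the function names, the `alpha` mixing weight, the window size, and the caller-supplied `semantic_sim` dictionary are all assumptions made for illustration, and the nearest-neighbor-document step mentioned in the abstract is omitted. The walk itself is a plain PageRank-style iteration with damping.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_weights(tokens, window=2):
    """Count how often two distinct words occur within `window` tokens of each other."""
    weights = defaultdict(float)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                weights[frozenset((word, other))] += 1.0
    return weights


def rank_words(tokens, semantic_sim, alpha=0.5, damping=0.85, iters=50, window=2):
    """Rank words by a damped random walk over a graph mixing two relations.

    `semantic_sim` is a hypothetical dict mapping frozenset({u, v}) to a
    similarity in [0, 1]; `alpha` (an assumed parameter) weights
    co-occurrence against semantics.
    """
    co = cooccurrence_weights(tokens, window)
    max_co = max(co.values(), default=1.0)
    vocab = sorted(set(tokens))

    # Build symmetric edge weights from both relation types.
    edges = defaultdict(dict)
    for u, v in combinations(vocab, 2):
        key = frozenset((u, v))
        weight = alpha * co.get(key, 0.0) / max_co
        weight += (1 - alpha) * semantic_sim.get(key, 0.0)
        if weight > 0:
            edges[u][v] = weight
            edges[v][u] = weight

    # Total outgoing weight per word, used to normalize transitions.
    out_weight = {u: sum(nbrs.values()) for u, nbrs in edges.items()}

    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        updated = {}
        for w in vocab:
            incoming = sum(
                score[u] * edges[u][w] / out_weight[u] for u in edges.get(w, {})
            )
            updated[w] = (1 - damping) / len(vocab) + damping * incoming
        score = updated

    ranking = sorted(vocab, key=lambda w: score[w], reverse=True)
    return ranking, score


if __name__ == "__main__":
    text = "graph ranking keyphrase graph ranking random walk keyphrase extraction"
    sims = {frozenset(("keyphrase", "extraction")): 0.8}  # made-up similarity
    ranking, score = rank_words(text.split(), sims)
    for word in ranking:
        print(f"{word}\t{score[word]:.3f}")
```

Words with no edges at all only receive the teleport term in this sketch, so their probability mass is not redistributed; a fuller implementation would handle such dangling nodes explicitly.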

Metadata
Title
Extracting Keyphrases Using Heterogeneous Word Relations
Written by
Wei Shi
Zheng Liu
Weiguo Zheng
Jeffrey Xu Yu
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-68155-9_13