
2018 | OriginalPaper | Book chapter

Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases

Authors: Nils Witt, Tobias Milz, Christin Seifert

Published in: Discovery Science

Publisher: Springer International Publishing


Abstract

Automatic keyphrase extraction attempts to capture keywords that describe a document accurately and extensively while remaining comprehensive. Unsupervised algorithms for extractive keyphrase extraction, i.e. those that select keyphrases from the text without external knowledge, generally suffer from low precision and low recall. In this paper, we propose a post-processing step that scores the extracted keyphrases and reranks the list of extracted phrases. The goal is to improve precision and recall, particularly for the top-ranked phrases. The scoring is based on the tf-idf score of the keyphrases. It is agnostic of the method used for the initial extraction. Experiments show an F1 increase of up to 14% at 5 keyphrases on the most difficult of the 4 corpora. We also show that this increase is mostly due to improvements on documents with very low F1 scores. Thus, our scoring and aggregation approach appears to be a promising route to robust, unsupervised keyphrase extraction with a special focus on the most important keyphrases.
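The abstract states only that the reranking score is based on the tf-idf of the keyphrases and is independent of the extractor. The following is a minimal sketch of that idea. Several assumptions in it are ours, not the paper's:

* a simple regex tokenizer;
* the smoothed idf ln((1 + N) / (1 + df)) + 1;
* tf as a token's relative frequency in the document;
* a phrase score equal to the mean tf-idf of its tokens.

The helper names `idf_table`, `rerank` and `f1_at_k` are hypothetical. The F1@k helper uses exact matching after lowercasing, which is one common evaluation convention among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased alphanumeric tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(corpus):
    """Smoothed idf, ln((1 + N) / (1 + df)) + 1, for every token in the corpus."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each token once per document
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1 for tok, d in df.items()}
    unseen_idf = math.log(1 + n_docs) + 1  # idf of a token with df = 0
    return idf, unseen_idf


def rerank(candidates, doc, idf, unseen_idf):
    """Reorder candidate phrases from any extractor by a tf-idf phrase score.

    Hypothetical aggregation: a phrase scores the mean tf-idf of its tokens,
    where tf is the token's relative frequency in `doc`.
    """
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1

    def score(phrase):
        toks = tokenize(phrase)
        if not toks:
            return 0.0
        return sum(counts[t] / total * idf.get(t, unseen_idf) for t in toks) / len(toks)

    # sorted() is stable, so ties keep the extractor's original order
    return sorted(candidates, key=score, reverse=True)


def f1_at_k(predicted, gold, k=5):
    """F1 of the top-k predictions against the gold set (exact match after normalization)."""
    top = {" ".join(tokenize(p)) for p in predicted[:k]}
    ref = {" ".join(tokenize(g)) for g in gold}
    hits = len(top & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    corpus = [
        "keyphrase extraction selects phrases from a document",
        "a document about phrases in web search",
        "a document about ranking",
    ]
    idf, unseen_idf = idf_table(corpus)
    extracted = ["document", "phrases", "keyphrase extraction"]  # extractor output
    ranked = rerank(extracted, corpus[0], idf, unseen_idf)
    gold = ["keyphrase extraction", "document ranking"]
    print(ranked)  # ['keyphrase extraction', 'phrases', 'document']
    print(f1_at_k(extracted, gold, k=2), f1_at_k(ranked, gold, k=2))  # 0.0 0.5
```

The reranker consumes only a list of strings, so it can follow any extractor. This mirrors the abstract's claim that the approach is agnostic of the initial extraction method. In the toy example, the term "document" occurs in every document and therefore drops to the bottom, which raises F1@2. Trying a different aggregation, such as sum or max instead of mean, would change only `score`.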

Footnotes
1
Throughout the document, we use the umbrella term keyphrase to refer to both keywords and keyphrases, as defined in the Introduction.
 
Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01771-2_24