
2022 | Original Paper | Book Chapter

From Fundamentals to Recent Advances: A Tutorial on Keyphrasification

Authors: Rui Meng, Debanjan Mahata, Florian Boudin

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing


Abstract

Keyphrases capture the most important information in a text and often serve as a surrogate for efficiently summarizing documents. With the advancement of deep neural networks, recent years have witnessed rapid development in the automatic identification of keyphrases. The performance of keyphrase extraction methods has been greatly improved by progress in natural language understanding, which enables models to predict relevant phrases not mentioned in the text. We name the task of summarizing texts with phrases keyphrasification.
In this half-day tutorial, we provide a comprehensive overview of keyphrasification as well as hands-on practice with popular models and tools. The tutorial covers topics ranging from the basics of the task to advanced topics and applications. By the end of the tutorial, participants will have a better understanding of 1) classical and state-of-the-art keyphrasification methods, 2) current evaluation practices and their issues, and 3) current trends and future directions in keyphrasification research. Tutorial-related resources are available at https://keyphrasification.github.io/.


Metadata
Title
From Fundamentals to Recent Advances: A Tutorial on Keyphrasification
Authors
Rui Meng
Debanjan Mahata
Florian Boudin
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_73