Published in: Cognitive Computation, Issue 2/2022

21 January 2022

HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains

Authors: Zakariae Alami Merrouni, Bouchra Frikh, Brahim Ouhbi


Abstract

Keyphrases capture the main content of a free-text document. Automatic keyphrase extraction (AKPE) plays a significant role in retrieving and summarizing valuable information from documents in different domains. Various techniques have been proposed for this task. However, supervised AKPE requires large amounts of annotated data and is domain-dependent. An alternative is a domain-independent method that can be applied to several domains (such as medical or social). In this paper, we tackle keyphrase extraction from single documents with HAKE, a novel unsupervised method that mines linguistic, statistical, structural, and semantic text features simultaneously to select the most relevant keyphrases in a text. HAKE achieves higher F-scores than state-of-the-art unsupervised systems on standard datasets and is suitable for real-time processing of large amounts of Web and text data across different domains. With HAKE, we also explicitly increase coverage and diversity among the selected keyphrases by introducing a novel technique (based on a parse tree approach, part-of-speech tagging, and filtering) for candidate keyphrase identification and extraction. This technique allows us to generate a comprehensive and meaningful list of candidate keyphrases and to reduce the size of the candidate set without increasing the computational complexity. HAKE's effectiveness is compared with twelve recent state-of-the-art unsupervised approaches, as well as with some supervised approaches. Experimental analysis is conducted to validate the proposed method using five of the top available benchmark corpora from different domains and shows that HAKE significantly outperforms both the existing unsupervised and supervised methods. Our method does not require training on a particular set of documents, nor does it depend on external corpora, dictionaries, domain, or text size. Our experiments confirm that HAKE's candidate selection model and its ranking model are effective.
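
The F-score used to compare keyphrase extractors combines precision and recall over the extracted and reference keyphrases. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `f_score`, the exact-match criterion, and the lowercasing and whitespace normalisation are assumptions made for illustration, since the abstract does not describe the matching protocol.

```python
def f_score(extracted, gold):
    """Return (precision, recall, F1) for two lists of keyphrases.

    Assumption: a phrase counts as correct only on an exact match after
    lowercasing and collapsing whitespace. Published evaluations often
    also apply stemming or a top-k cutoff, which is not modelled here.
    """
    def normalise(phrase):
        return " ".join(phrase.lower().split())

    extracted_set = {normalise(p) for p in extracted}
    gold_set = {normalise(p) for p in gold}
    correct = len(extracted_set & gold_set)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, chosen only to exercise the function.
extracted = ["keyphrase extraction", "Parse Tree", "web data", "neural network"]
gold = ["keyphrase extraction", "parse tree", "unsupervised method"]
precision, recall, f1 = f_score(extracted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here two of the four extracted phrases match the three reference phrases, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.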


Literatur
1.
Zurück zum Zitat Sarkar K. A hybrid approach to extract keyphrases from medical documents. Int J Comput Appl. 2013;63(18):14–19. Sarkar K. A hybrid approach to extract keyphrases from medical documents. Int J Comput Appl. 2013;63(18):14–19.
2.
Zurück zum Zitat Gutwin C, Paynter G, Witten I, Nevill-Manning C, Frank E. Improving browsing in digital libraries with keyphrase indexes. Decis Support Syst. 1999;27(1–2):81–104.CrossRef Gutwin C, Paynter G, Witten I, Nevill-Manning C, Frank E. Improving browsing in digital libraries with keyphrase indexes. Decis Support Syst. 1999;27(1–2):81–104.CrossRef
3.
Zurück zum Zitat Jones S, Staveley MS. PHRASIER: a system for interactive document retrieval using keyphrases. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. 1999. pp. 160–167. ACM. Jones S, Staveley MS. PHRASIER: a system for interactive document retrieval using keyphrases. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. 1999. pp. 160–167. ACM.
4.
Zurück zum Zitat D’Avanzo E, Magnini B. A keyphrase-based approach to summarization: the LAKE system at DUC-2005. In: Proceedings of DUC. 2005. D’Avanzo E, Magnini B. A keyphrase-based approach to summarization: the LAKE system at DUC-2005. In: Proceedings of DUC. 2005.
5.
Zurück zum Zitat Zha H. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In: Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval. 2002. pp. 113–120. ACM. Zha H. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In: Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval. 2002. pp. 113–120. ACM.
6.
Zurück zum Zitat Zhang Y, Zincir-Heywood N, Milios E. World Wide Web site summarization. Web Intelligence and Agent Systems: An International Journal. 2004;2(1), 39–53. Zhang Y, Zincir-Heywood N, Milios E. World Wide Web site summarization. Web Intelligence and Agent Systems: An International Journal. 2004;2(1), 39–53.
7.
Zurück zum Zitat Hammouda KM, Matute DN, Kamel MS. COREPHRASE: keyphrase extraction for document clustering. In: International workshop on machine learning and data mining in pattern recognition. 2005. pp. 265–274. Hammouda KM, Matute DN, Kamel MS. COREPHRASE: keyphrase extraction for document clustering. In: International workshop on machine learning and data mining in pattern recognition. 2005. pp. 265–274.
8.
Zurück zum Zitat Han J, Kim T, Choi J. Web document clustering by using automatic keyphrase extraction. In: 2007 IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology - workshops. 2007. pp. 56–59. IEEE. Han J, Kim T, Choi J. Web document clustering by using automatic keyphrase extraction. In: 2007 IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology - workshops. 2007. pp. 56–59. IEEE.
9.
Zurück zum Zitat Hulth A, Megyesi BB. A study on automatically extracted keywords in text categorization. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2006. pp. 537–544. Hulth A, Megyesi BB. A study on automatically extracted keywords in text categorization. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2006. pp. 537–544.
10.
Zurück zum Zitat Berend G. Opinion expression mining by exploiting keyphrase extraction. In: proceedings of the 5th international joint conference on natural language processing. Asian Federation of Natural Language Processing. 2011. Berend G. Opinion expression mining by exploiting keyphrase extraction. In: proceedings of the 5th international joint conference on natural language processing. Asian Federation of Natural Language Processing. 2011.
11.
Zurück zum Zitat Dashtipour K, Gogate M, Cambria E, Hussain A. A novel context-aware multimodal framework for Persian sentiment analysis. Neurocomputing. 2021. Dashtipour K, Gogate M, Cambria E, Hussain A. A novel context-aware multimodal framework for Persian sentiment analysis. Neurocomputing. 2021.
12.
Zurück zum Zitat Chen M, Sun JT, Zeng HJ, Lam KY. A practical system of keyphrase extraction for web pages. In: Proceedings of the 14th ACM international conference on information and knowledge management; 2005. pp. 277–278. ACM. Chen M, Sun JT, Zeng HJ, Lam KY. A practical system of keyphrase extraction for web pages. In: Proceedings of the 14th ACM international conference on information and knowledge management; 2005. pp. 277–278. ACM.
13.
Zurück zum Zitat Turney PD. Coherent keyphrase extraction via web mining. CORR ArXiv Preprint Cs/0308033. 2003. Turney PD. Coherent keyphrase extraction via web mining. CORR ArXiv Preprint Cs/0308033. 2003.
14.
Zurück zum Zitat Ferrara F, Pudota N, Tasso C. A keyphrase-based paper recommender system. In: Italian research conference on digital libraries; 2011, pp. 14–25. Ferrara F, Pudota N, Tasso C. A keyphrase-based paper recommender system. In: Italian research conference on digital libraries; 2011, pp. 14–25.
15.
Zurück zum Zitat Do N, Ho L. Domain-specific keyphrase extraction and near-duplicate article detection based on ontology. In: International conference on computing & communication technologies, research, innovation, and vision for the future (RIVF). 2015; pp. 123–126. IEEE. Do N, Ho L. Domain-specific keyphrase extraction and near-duplicate article detection based on ontology. In: International conference on computing & communication technologies, research, innovation, and vision for the future (RIVF). 2015; pp. 123–126. IEEE.
16.
Zurück zum Zitat El Idrissi O, Frikh B, Ouhbi B. HCHIRSIMEX: an extended method for domain ontology learning based on conditional mutual information. In: Third IEEE international colloquium in information science and technology (CIST); 2014. pp. 91–95. El Idrissi O, Frikh B, Ouhbi B. HCHIRSIMEX: an extended method for domain ontology learning based on conditional mutual information. In: Third IEEE international colloquium in information science and technology (CIST); 2014. pp. 91–95.
17.
Zurück zum Zitat Fortuna B, Grobelnik M, Mladeni’c D. Semi-automatic data-driven ontology construction system. In: Proceedings of the 9th international multi-conference information society; 2006, pp. 223–226. Fortuna B, Grobelnik M, Mladeni’c D. Semi-automatic data-driven ontology construction system. In: Proceedings of the 9th international multi-conference information society; 2006, pp. 223–226.
18.
Zurück zum Zitat Frikh B, Djaanfar AS, Ouhbi B. A new methodology for domain ontology construction from the Web. Int J Artif Intell Tools. 2011;20(06):1157–70.CrossRef Frikh B, Djaanfar AS, Ouhbi B. A new methodology for domain ontology construction from the Web. Int J Artif Intell Tools. 2011;20(06):1157–70.CrossRef
19.
Zurück zum Zitat Merrouni ZA, Frikh B, Ouhbi B. Automatic keyphrase extraction: a survey and trends. Journal of Intelligent Information Systems. 2019. pp. 1–34. Springer. Merrouni ZA, Frikh B, Ouhbi B. Automatic keyphrase extraction: a survey and trends. Journal of Intelligent Information Systems. 2019. pp. 1–34. Springer.
20.
Zurück zum Zitat You W, Fontaine D, Barth’es JP. An automatic keyphrase extraction system for scientific documents. Knowl Inf Syst. 2013;34(3), 691–724. You W, Fontaine D, Barth’es JP. An automatic keyphrase extraction system for scientific documents. Knowl Inf Syst. 2013;34(3), 691–724.
21.
Zurück zum Zitat Kim SN, Medelyan O, Kan MY, Baldwin T. SEMEVAL-2010 Task 5: automatic keyphrase extraction from scientific articles. In: Proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 21–26. Kim SN, Medelyan O, Kan MY, Baldwin T. SEMEVAL-2010 Task 5: automatic keyphrase extraction from scientific articles. In: Proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 21–26.
22.
Zurück zum Zitat Liu Z, Huang W, Zheng Y, Sun M. Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics. 2010. pp. 366–376. Liu Z, Huang W, Zheng Y, Sun M. Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics. 2010. pp. 366–376.
23.
Zurück zum Zitat Boudin F. Reducing over-generation errors for automatic keyphrase extraction using integer linear programming. In: ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction; 2015. Boudin F. Reducing over-generation errors for automatic keyphrase extraction using integer linear programming. In: ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction; 2015.
24.
Zurück zum Zitat Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG. Domain-specific keyphrase extraction. In: Proceedings of the sixteenth international joint conference on artificial intelligence, IJCAI ‘99. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 1999 pp. 668–673. http://dl.acm.org/citation.cfm?id=646307.687591. Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG. Domain-specific keyphrase extraction. In: Proceedings of the sixteenth international joint conference on artificial intelligence, IJCAI ‘99. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 1999 pp. 668–673. http://​dl.​acm.​org/​citation.​cfm?​id=​646307.​687591.
25.
Zurück zum Zitat Turney PD. Learning algorithms for keyphrase extraction. Inf Retrieval. 2000;2(4):303–36.CrossRef Turney PD. Learning algorithms for keyphrase extraction. Inf Retrieval. 2000;2(4):303–36.CrossRef
26.
Zurück zum Zitat Witten IH, Paynter GW, Frank E, Gutwin C, Nevill-Manning CG. KEA: Practical automatic keyphrase extraction. In: Proceedings of the fourth ACM conference on digital libraries. 1999. pp. 254–255. ACM. Witten IH, Paynter GW, Frank E, Gutwin C, Nevill-Manning CG. KEA: Practical automatic keyphrase extraction. In: Proceedings of the fourth ACM conference on digital libraries. 1999. pp. 254–255. ACM.
27.
Zurück zum Zitat Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 2013. (pp. 3111–3119) Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 2013. (pp. 3111–3119)
28.
Zurück zum Zitat Huang C, Tian Y, Zhou Z, Ling CX, Huang T. Keyphrase extraction using semantic networks structure analysis. In: Sixth international conference on data mining (ICDM’06). 2006; pp. 275–284. IEEE. Huang C, Tian Y, Zhou Z, Ling CX, Huang T. Keyphrase extraction using semantic networks structure analysis. In: Sixth international conference on data mining (ICDM’06). 2006; pp. 275–284. IEEE.
29.
Zurück zum Zitat Liu F, Pennell D, Liu F, Liu Y. Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the Association for Computational Linguistics. Association for Computational Linguistics. 2009. pp. 620–628. Liu F, Pennell D, Liu F, Liu Y. Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the Association for Computational Linguistics. Association for Computational Linguistics. 2009. pp. 620–628.
30.
Zurück zum Zitat Campos R, Mangaravite V, Pasquali A, Jorge A, Nunes C, Jatowt A. YAKE! Keyword extraction from single documents using multiple local features. Inf Sci. 2020;509:257–89.CrossRef Campos R, Mangaravite V, Pasquali A, Jorge A, Nunes C, Jatowt A. YAKE! Keyword extraction from single documents using multiple local features. Inf Sci. 2020;509:257–89.CrossRef
31.
Zurück zum Zitat Haddoud M, Abdeddaïm S. Accurate keyphrase extraction by discriminating overlapping phrases. J Inf Sci. 2014; 40(4), 488–500. Haddoud M, Abdeddaïm S. Accurate keyphrase extraction by discriminating overlapping phrases. J Inf Sci. 2014; 40(4), 488–500.
32.
Zurück zum Zitat Hulth A. Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 conference on empirical methods in natural language processing. Association for Computational Linguistics. 2003. pp. 216–223 Hulth A. Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 conference on empirical methods in natural language processing. Association for Computational Linguistics. 2003. pp. 216–223
33.
Zurück zum Zitat Wan X, Xiao J. Single document keyphrase extraction using neighborhood knowledge. In: AAAI. 2008. vol. 8, pp. 855–860. Wan X, Xiao J. Single document keyphrase extraction using neighborhood knowledge. In: AAAI. 2008. vol. 8, pp. 855–860.
34.
Zurück zum Zitat Barker K, Cornacchia N. Using noun phrase heads to extract document keyphrases. In: conference of the canadian society for computational studies of intelligence. 2000; pp. 40–52. Springer. Barker K, Cornacchia N. Using noun phrase heads to extract document keyphrases. In: conference of the canadian society for computational studies of intelligence. 2000; pp. 40–52. Springer.
35.
Zurück zum Zitat Nguyen TD, Kan MY. Keyphrase extraction in scientific publications. In: International conference on asian digital libraries. 2007. pp. 317–326. Springer. Nguyen TD, Kan MY. Keyphrase extraction in scientific publications. In: International conference on asian digital libraries. 2007. pp. 317–326. Springer.
36.
Zurück zum Zitat Grineva M, Grinev M, Lizorkin D. Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th international conference on World Wide Web, 2009. pp. 661–670. Grineva M, Grinev M, Lizorkin D. Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th international conference on World Wide Web, 2009. pp. 661–670.
37.
Zurück zum Zitat El-Beltagy SR, Rafea A. KP-MINER: a keyphrase extraction system for English and Arabic documents. Inf Syst. 2009;34(1):132–44.CrossRef El-Beltagy SR, Rafea A. KP-MINER: a keyphrase extraction system for English and Arabic documents. Inf Syst. 2009;34(1):132–44.CrossRef
38.
Zurück zum Zitat Newman D, Koilada N, Lau JH, Baldwin T. Bayesian text segmentation for index term identification and keyphrase extraction. Proceedings of COLING. 2012;2012:2077–92. Newman D, Koilada N, Lau JH, Baldwin T. Bayesian text segmentation for index term identification and keyphrase extraction. Proceedings of COLING. 2012;2012:2077–92.
39.
Zurück zum Zitat Medelyan O, Witten IH. Thesaurus based automatic keyphrase indexing. In: Proceedings of the 6th ACM/IEEE-CS joint conference on digital libraries. 2006. pp. 296–297. ACM. Medelyan O, Witten IH. Thesaurus based automatic keyphrase indexing. In: Proceedings of the 6th ACM/IEEE-CS joint conference on digital libraries. 2006. pp. 296–297. ACM.
40.
Zurück zum Zitat Mihalcea R, Tarau P. TEXTRANK: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004. Mihalcea R, Tarau P. TEXTRANK: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004.
41.
Zurück zum Zitat Mahata D, Kuriakose J, Shah R, Zimmermann R. Key2vec: automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2008. Volume 2 (Short Papers) (pp. 634–639) Mahata D, Kuriakose J, Shah R, Zimmermann R. Key2vec: automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2008. Volume 2 (Short Papers) (pp. 634–639)
42.
Zurück zum Zitat Medelyan O, Frank E, Witten IH. Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the 2009 conference on empirical methods in natural language processing. 2009. p. 1318–1327. Medelyan O, Frank E, Witten IH. Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the 2009 conference on empirical methods in natural language processing. 2009. p. 1318–1327.
43.
Zurück zum Zitat You W, Fontaine D, Barthes JP. Automatic keyphrase extraction with a refined candidate set. In: Proceedings of the 2009 IEE/WIC/ACM International joint conference on web intelligence and intelligent agent technology. IEEE Computer Society. 2009. volume 01, pp. 576–579. You W, Fontaine D, Barthes JP. Automatic keyphrase extraction with a refined candidate set. In: Proceedings of the 2009 IEE/WIC/ACM International joint conference on web intelligence and intelligent agent technology. IEEE Computer Society. 2009. volume 01, pp. 576–579.
44.
Zurück zum Zitat Liu Z, Li P, Zheng Y, Sun M. Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 conference on empirical methods in natural language processing. Association for Computational Linguistics. 2009. volume 1, pp. 257–266. Liu Z, Li P, Zheng Y, Sun M. Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 conference on empirical methods in natural language processing. Association for Computational Linguistics. 2009. volume 1, pp. 257–266. 
45.
Zurück zum Zitat Rose S, Engel D, Cramer N, Cowley W. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory. 2010;1:1–20. Rose S, Engel D, Cramer N, Cowley W. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory. 2010;1:1–20.
46.
Zurück zum Zitat Gollapalli SD, Caragea C. Extracting keyphrases from research papers using citation networks. In: AAAI. 2014. pp. 1629–1635. Gollapalli SD, Caragea C. Extracting keyphrases from research papers using citation networks. In: AAAI. 2014. pp. 1629–1635.
47.
Zurück zum Zitat Yang S, Lu W, Yang D, Li X, Wu C, Wei B. KEYPHRASEDS: automatic generation of survey by exploiting keyphrase information. Neurocomputing. 2017;224:58–70.CrossRef Yang S, Lu W, Yang D, Li X, Wu C, Wei B. KEYPHRASEDS: automatic generation of survey by exploiting keyphrase information. Neurocomputing. 2017;224:58–70.CrossRef
48.
Zurück zum Zitat Xie F, Wu X, Zhu X. Efficient sequential pattern mining with wildcards for keyphrase extraction. Knowl-Based Syst. 2017;115:27–39.CrossRef Xie F, Wu X, Zhu X. Efficient sequential pattern mining with wildcards for keyphrase extraction. Knowl-Based Syst. 2017;115:27–39.CrossRef
49.
Zurück zum Zitat Rafiei-Asl J, Nickabadi A. Tsake: a topical and structural automatic keyphrase extractor. Appl Soft Comput. 2017;58:620–30.CrossRef Rafiei-Asl J, Nickabadi A. Tsake: a topical and structural automatic keyphrase extractor. Appl Soft Comput. 2017;58:620–30.CrossRef
50.
Zurück zum Zitat Danesh S, Sumner T, Martin JH. SGRANK: Combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. In: Proceedings of the fourth joint conference on lexical and computational semantics; 2015. pp. 117–126. Danesh S, Sumner T, Martin JH. SGRANK: Combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. In: Proceedings of the fourth joint conference on lexical and computational semantics; 2015. pp. 117–126.
51.
Zurück zum Zitat Rabby G, Azad S, Mahmud M, Zamli KZ, Rahman MM. A flexible keyphrase extraction technique for academic literature. Procedia Computer Science. 2018;135:553–63.CrossRef Rabby G, Azad S, Mahmud M, Zamli KZ, Rahman MM. A flexible keyphrase extraction technique for academic literature. Procedia Computer Science. 2018;135:553–63.CrossRef
52.
Zurück zum Zitat Matsuo Y, Ishizuka M. Keyword extraction from a single document using word co-occurrence statistical information. Int J Artif Intell Tools. 2004;13(01):157–69.CrossRef Matsuo Y, Ishizuka M. Keyword extraction from a single document using word co-occurrence statistical information. Int J Artif Intell Tools. 2004;13(01):157–69.CrossRef
53.
Zurück zum Zitat Li Y, Luo C, Chung SM. Text clustering with feature selection by using statistical data. IEEE Trans Knowl Data Eng. 2008;20(5):641–52.CrossRef Li Y, Luo C, Chung SM. Text clustering with feature selection by using statistical data. IEEE Trans Knowl Data Eng. 2008;20(5):641–52.CrossRef
54.
Zurück zum Zitat Wang J, Peng H. Keyphrases extraction from web document by the least squares support vector machine. In: The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI’05). 2005 pp. 293–296. IEEE. Wang J, Peng H. Keyphrases extraction from web document by the least squares support vector machine. In: The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI’05). 2005 pp. 293–296. IEEE.
55.
Zurück zum Zitat Kumar N, Srinathan K. Automatic keyphrase extraction from scientific documents using n-gram filtration technique. In: Proceedings of the eighth ACM symposium on document engineering. 2008. pp. 199–208. ACM. Kumar N, Srinathan K. Automatic keyphrase extraction from scientific documents using n-gram filtration technique. In: Proceedings of the eighth ACM symposium on document engineering. 2008. pp. 199–208. ACM.
56.
Zurück zum Zitat Berend G, Farkas R. SZTERGAK: feature engineering for keyphrase extraction. In: proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 186–189. Berend G, Farkas R. SZTERGAK: feature engineering for keyphrase extraction. In: proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 186–189.
57.
Zurück zum Zitat Adar E, Datta S. Building a scientific concept hierarchy database (schbase). In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015. vol. 1, pp. 606–615. Adar E, Datta S. Building a scientific concept hierarchy database (schbase). In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015. vol. 1, pp. 606–615.
58.
Zurück zum Zitat Florescu C, Caragea C. A new scheme for scoring phrases in unsupervised keyphrase extraction. In: Proceedings of the 39th European Conference on Information Retrieval (ECIR’17), Aberdeen, Scotland. April 9–13. 2017. pp. 477–483. Florescu C, Caragea C. A new scheme for scoring phrases in unsupervised keyphrase extraction. In: Proceedings of the 39th European Conference on Information Retrieval (ECIR’17), Aberdeen, Scotland. April 9–13. 2017. pp. 477–483.
59.
Zurück zum Zitat Tomokiyo T, Hurst M. A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 workshop on multiword expressions: analysis, acquisition and treatment. Association for Computational Linguistic. 2003. volume 18, pp. 33–40. Tomokiyo T, Hurst M. A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 workshop on multiword expressions: analysis, acquisition and treatment. Association for Computational Linguistic. 2003. volume 18, pp. 33–40.
60.
Zurück zum Zitat Rabby G, Azad S, Mahmud M, Zamli KZ, Rahman MM. TeKET: a tree-based unsupervised keyphrase extraction technique. Cogn Comput. 2020. 1–23. Rabby G, Azad S, Mahmud M, Zamli KZ, Rahman MM. TeKET: a tree-based unsupervised keyphrase extraction technique. Cogn Comput. 2020. 1–23.
61.
Zurück zum Zitat Bougouin A, Boudin F, Daille B. Topicrank: graph-based topic ranking for keyphrase extraction. In: Proc IJCNLP; 2013. p. 543–551. Bougouin A, Boudin F, Daille B. Topicrank: graph-based topic ranking for keyphrase extraction. In: Proc IJCNLP; 2013. p. 543–551.
62.
Zurück zum Zitat Sterckx L, Demeester T, Deleu J, Develder C. Topical word importance for fast keyphrase extraction. In Proceedings of the 24th International Conference on World Wide Web; 2015. (pp. 121–122). Sterckx L, Demeester T, Deleu J, Develder C. Topical word importance for fast keyphrase extraction. In Proceedings of the 24th International Conference on World Wide Web; 2015. (pp. 121–122).
63.
Zurück zum Zitat Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003; 3, 993–1022. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003; 3, 993–1022.
64.
Zurück zum Zitat Florescu C, Caragea C. Positionrank: an unsupervised approach to keyphrase extraction from scholarly documents. In: Proc. ACL; 2017. p. 1105–1115. Florescu C, Caragea C. Positionrank: an unsupervised approach to keyphrase extraction from scholarly documents. In: Proc. ACL; 2017. p. 1105–1115.
65.
Zurück zum Zitat Boudin F. Unsupervised keyphrase extraction with multipartite graphs. In: Proc NAACL: Human language technologies; 2018. p. 667–672. Boudin F. Unsupervised keyphrase extraction with multipartite graphs. In: Proc NAACL: Human language technologies; 2018. p. 667–672.
66.
Zurück zum Zitat Baroni M, Dinu G, Kruszewski G. Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL (1). 2014 p. 238–247. Baroni M, Dinu G, Kruszewski G. Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL (1). 2014 p. 238–247.
67.
Zurück zum Zitat Pennington J, Socher R, Manning CD. Glove: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. (pp. 1532–1543). Pennington J, Socher R, Manning CD. Glove: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. (pp. 1532–1543).
68.
Zurück zum Zitat Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics. 2017;5:135–46.CrossRef Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics. 2017;5:135–46.CrossRef
69.
Zurück zum Zitat Papagiannopoulou E, Tsoumakas G. Local word vectors guiding keyphrase extraction. Inf Process Manage. 2018;54(6):888–902.CrossRef Papagiannopoulou E, Tsoumakas G. Local word vectors guiding keyphrase extraction. Inf Process Manage. 2018;54(6):888–902.CrossRef
70.
Zurück zum Zitat Bennani-Smires K, Musat C, Hossmann A, et al. Simple unsupervised keyphrase extraction using sentence embeddings. In: Proceedings of the 22nd Conference on Computational Natural Language Learning. 2018. p. 221–229. Bennani-Smires K, Musat C, Hossmann A, et al. Simple unsupervised keyphrase extraction using sentence embeddings. In: Proceedings of the 22nd Conference on Computational Natural Language Learning. 2018. p. 221–229.
71.
Zurück zum Zitat Sun Y, Qiu H, Zheng Y, Wang Z, Zhang C. SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model. IEEE Access. 2020;8:10896–906.CrossRef Sun Y, Qiu H, Zheng Y, Wang Z, Zhang C. SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model. IEEE Access. 2020;8:10896–906.CrossRef
72.
Cohen JD. Highlights: Language- and domain-independent automatic indexing terms for abstracting. J Am Soc Inf Sci. 1995;46(3):162–74.
73.
Nguyen TD, Luong MT. WINGNUS: keyphrase extraction utilizing document logical structure. In: Proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 166–169.
74.
Ong TH, Chen H. Updateable pat-tree approach to Chinese keyphrase extraction using mutual information: A linguistic foundation for knowledge management. 1999.
75.
Ramos J, et al. Using tf-idf to determine word relevance in document queries. In: Proceedings of the first instructional conference on machine learning. Piscataway, NJ. 2003. vol. 242, pp. 133–142.
76.
Barzilay R, Elhadad M. Using lexical chains for text summarization. In: Advances in automatic text summarization. 1999. pp. 111–121.
77.
Krapivin M, Autayeu A, Marchese M, Blanzieri E, Segata N. Keyphrases extraction from scientific documents: improving machine learning approaches with natural language processing. In: International Conference on Asian Digital Libraries. 2010. pp. 102–111.
78.
Krapivin M, Marchese M, Yadrantsau A, Liang Y. Unsupervised key-phrases extraction from scientific papers using domain and linguistic knowledge. In: 2008 Third International Conference on Digital Information Management. IEEE. 2008. pp. 105–112.
79.
Le TTN, Le Nguyen M, Shimazu A. Unsupervised keyphrase extraction: Introducing new kinds of words to keyphrases. In: Australasian Joint Conference on Artificial Intelligence. Springer. 2016. pp. 665–671.
80.
Salton G, Singhal A, Mitra M, Buckley C. Automatic text structuring and summarization. Inf Process Manage. 1997;33(2):193–207.
81.
Lopez P, Romary L. HUMB: automatic key term extraction from scientific articles in GROBID. In: Proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 248–251.
82.
Chua S, Kulathuramaiyer N. Semantic feature selection using wordnet. In: IEEE/WIC/ACM International Conference on Web Intelligence (WI’04). 2004. pp. 166–172.
83.
Dagan I, Marcus S, Markovitch S. Contextual word similarity and estimation from sparse data. In: Proceedings of the 31st annual meeting on Association for Computational Linguistics. Association for Computational Linguistics. 1993. pp. 164–171.
84.
Kelleher D, Luz S. Automatic hypertext keyphrase detection. In: IJCAI. 2005;5:1608–9.
85.
Li CH, Park SC. Combination of modified bpnn algorithms and an efficient feature selection method for text categorization. Inf Process Manage. 2009;45(3):329–40.
86.
Song W, Liang JZ, He XL, Chen P. Taking advantage of improved resource allocating network and latent semantic feature selection approach for automated text categorization. Appl Soft Comput. 2014;21:210–20.
87.
Frantzi KT, Ananiadou S, Tsujii J. The C-VALUE/NC-VALUE method of automatic recognition for multi-word terms. In: International conference on theory and practice of digital libraries. 1998. pp. 585–604.
89.
Zhang K, Xu H, Tang J, Li J. Keyword extraction using Support Vector Machine. In: International conference on web-age information management. Springer. 2006. pp. 85–96.
92.
De Marneffe MC, MacCartney B, Manning CD, et al. Generating typed dependency parses from phrase structure parses. In: LREC. 2006. vol. 6, pp. 449–454.
93.
Toutanova K, Klein D, Manning CD, Singer Y. Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology. Association for Computational Linguistics. 2003. vol. 1, pp. 173–180.
94.
Sotoca JM, Pla F. Supervised feature selection by clustering using conditional mutual information-based distances. Pattern Recogn. 2010;43(6):2068–81.
95.
Ding Z, Zhang Q, Huang X. Keyphrase extraction from online news using binary integer programming. In: Proceedings of 5th International Joint Conference on Natural Language Processing. 2011. pp. 165–173.
97.
Medelyan O, Witten IH, Milne D. Topic indexing with Wikipedia. In: Proceedings of the AAAI WikiAI workshop. July 2008. vol. 1, pp. 19–24.
98.
Krapivin M, Autaeu A, Marchese M. Large dataset for keyphrases extraction. University of Trento. Tech Report # DISI-09-055. 2009.
99.
Chen W, Chan HP, Li P, Bing L, King I. An integrated approach for keyphrase generation via exploring the power of retrieval and extraction. In: NAACL-HLT (1). 2019.
100.
Kim SN, Medelyan O, Kan MY, Baldwin T. Automatic keyphrase extraction from scientific articles. Lang Resour Eval. 2013;47(3):723–42.
101.
Boudin F. PKE: an open-source python-based keyphrase extraction toolkit. In: Proc COLING. 2016. pp. 69–73.
102.
Jones KS. A statistical interpretation of term specificity and its application in retrieval. J Document. 1972;28(1):11–21.
103.
Zesch T, Gurevych I. Approximate matching for evaluating keyphrase extraction. In: Proceedings of the international conference RANLP. 2009. pp. 484–489.
104.
Porter MF. An algorithm for suffix stripping. Program. 1980;14(3):130–7.
105.
Pal T, Banka H, Mitra P, Das B. Linguistic knowledge based supervised key-phrase extraction. In: Proceedings of national conference on future trends in information & communication technology & applications, Bhubaneswar, India. 2011.
106.
Kim SN, Baldwin T, Kan MY. Evaluating n-gram based evaluation metrics for automatic keyphrase extraction. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics. 2010. pp. 572–580.
107.
Kim SN, Kan MY. Re-examining automatic keyphrase extraction approaches in scientific articles. In: Proceedings of the workshop on multiword expressions: identification, interpretation, disambiguation and applications. Association for Computational Linguistics. 2009. pp. 9–16.
108.
Pianta E, Tonelli S. Kx: A flexible system for keyphrase extraction. In: Proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics. 2010. pp. 170–173.
Metadata
Title
HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains
Authors
Zakariae Alami Merrouni
Bouchra Frikh
Brahim Ouhbi
Publication date
21 January 2022
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2022
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-021-09979-7
