2017 | OriginalPaper | Chapter

Unsupervised Extraction of Conceptual Keyphrases from Abstracts

Authors: Philipp Ludwig, Marcus Thiel, Andreas Nürnberger

Published in: Semantic Keyword-Based Search on Structured Data Sources

Publisher: Springer International Publishing

Abstract

The extraction of meaningful keyphrases is important for a variety of applications, such as recommender systems, tools for browsing literature, and automatic categorization of documents. Since this task is not trivial, a large number of approaches have been introduced in the past, either focusing on single aspects of the process or exploiting the characteristics of a certain type of document. Precise keyphrases are especially helpful when supporting the user in grasping the topics of a document (e.g., in the display of search results). However, in such situations usually only the abstract or a short excerpt is available, which most approaches do not take into account. First, methods based on word frequency are not appropriate in this case, since short texts do not contain sufficient word statistics for a frequency analysis. Second, many existing methods are supervised and therefore depend on domain knowledge or manually annotated data, which are unavailable in many scenarios. We therefore present an unsupervised graph-based approach for extracting meaningful keyphrases from abstracts of scientific articles. We show that, even though our method does not rely on manually annotated data or corpora, it works surprisingly well.
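
The abstract does not spell out how the graph is built or scored, so the following is only a generic sketch of unsupervised graph-based keyword ranking, not the authors' method. It assumes a word co-occurrence graph over a sliding window and degree centrality as the score; the stop-word list, the window size, and all function names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not taken from the chapter).
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
              "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def cooccurrence_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_by_degree(graph):
    """Rank words by degree centrality: number of neighbours / (n - 1)."""
    n = len(graph)
    if n < 2:
        return [(w, 0.0) for w in graph]
    scores = {w: len(nbrs) / (n - 1) for w, nbrs in graph.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "graph based keyphrase extraction from abstracts uses a word graph"
    for word, score in rank_by_degree(cooccurrence_graph(tokenize(sample))):
        print(f"{word}: {score:.2f}")
```

No corpus statistics or annotated data are needed here, since the score depends only on the structure of the single input text. On a text as short as one sentence many words tie, which is one reason practical systems tend to combine the graph with further cues such as part-of-speech filtering or other centrality measures.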

DOI: https://doi.org/10.1007/978-3-319-53640-8_4