2017 | OriginalPaper | Chapter

Network-Enabled Keyword Extraction for Under-Resourced Languages

Authors: Slobodan Beliga, Sanda Martinčić-Ipšić

Published in: Semantic Keyword-Based Search on Structured Data Sources

Publisher: Springer International Publishing

Abstract

In this paper we discuss the advantages of network-enabled keyword extraction from texts in under-resourced languages. Network-enabled methods are briefly introduced, while the focus of the paper is on the difficulties that such methods must overcome when dealing with content in under-resourced languages (difficulties that mainly manifest as a lack of natural language processing resources: corpora and tools). Additionally, the paper discusses how to circumvent the lack of NLP tools with a network-enabled method such as the selectivity-based keyword extraction (SBKE) method.
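As a rough illustration of why a network-enabled approach needs few language-specific tools, the sketch below ranks words by node selectivity on a word co-occurrence network. It assumes selectivity is a node's strength (sum of link weights) divided by its degree (number of distinct neighbours), uses plain whitespace tokenization, and builds an undirected network from adjacent words. The function names are hypothetical, and the published SBKE method may differ in details such as link direction, filtering, and the expansion of keywords into longer sequences.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens):
    """Return undirected link weights: how often two distinct words are adjacent."""
    weights = defaultdict(int)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops from repeated words
            weights[frozenset((left, right))] += 1
    return weights


def selectivity(weights):
    """Assumed definition: node strength divided by node degree."""
    strength = defaultdict(int)  # sum of weights on a node's links
    degree = defaultdict(int)    # number of distinct neighbours
    for pair, weight in weights.items():
        for node in pair:
            strength[node] += weight
            degree[node] += 1
    return {node: strength[node] / degree[node] for node in strength}


def rank_keywords(text, top_k=3):
    """Rank words by selectivity; ties are broken alphabetically."""
    # Whitespace tokenization only: no stemmer, tagger, or stop-word list.
    tokens = text.lower().split()
    scores = selectivity(build_cooccurrence_network(tokens))
    return sorted(scores, key=lambda node: (-scores[node], node))[:top_k]


if __name__ == "__main__":
    print(rank_keywords("a b a b a c", top_k=2))
```

Nothing in the sketch depends on a lemmatizer, part-of-speech tagger, or annotated corpus. Only the text itself is required, which is the property that makes this family of methods attractive when such resources are missing.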

Metadata

Copyright Year: 2017

DOI: https://doi.org/10.1007/978-3-319-53640-8_11