Published in: Quality & Quantity 3/2017

11 June 2016

A semantic annotation framework for scientific publications

Author: Yuchul Jung

Abstract

Given the growing volume of scientific literature, techniques that automatically detect informational entities in research articles may help extend scientific knowledge and its practical uses. Although there have been several efforts to extract informative entities from patents and biomedical research articles, few attempts have been made for other scientific literature. In this paper, we introduce an automatic semantic annotation framework for research articles based on entity recognition techniques. Our approach comprises tag set modeling for semantic annotation, a semi-automatic annotation tool, manual annotation to prepare training data, and supervised machine learning to develop an entity type recognition module. For the experiments, we chose two domains, information and communication technology and chemical engineering, because both are widely used. In addition, we present three application scenarios that show how our annotation framework can be used and extended further. They are intended to guide researchers who want to link their own content with external data.
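To make the training data preparation step concrete, the following is a minimal sketch of how manually annotated entity spans can be turned into per-token labels for a supervised sequence labeler. It uses the common BIO scheme, which is an assumption here: the abstract does not say which encoding the framework uses. The tag name METHOD, the function spans_to_bio, and the example sentence are all hypothetical and are not taken from the paper's tag set.

```python
from typing import List, Tuple

# Hypothetical span annotation: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level span annotations into BIO labels.

    B-<type> marks the first token of an entity, I-<type> marks the
    following tokens of the same entity, and O marks everything else.
    """
    labels = ["O"] * len(tokens)
    for start, end, etype in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, etype)}")
        if any(label != "O" for label in labels[start:end]):
            raise ValueError(f"overlapping span: {(start, end, etype)}")
        labels[start] = f"B-{etype}"
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"
    return labels


# Illustrative sentence and annotation (both made up for this sketch).
tokens = ["We", "train", "a", "structural", "SVM", "on", "annotated", "abstracts"]
spans = [(3, 5, "METHOD")]
labels = spans_to_bio(tokens, spans)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

The resulting token and label pairs are the kind of input a supervised entity recognizer is typically trained on; rejecting overlapping spans keeps each token with exactly one label, which flat BIO tagging requires.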

Metadata
Title
A semantic annotation framework for scientific publications
Author
Yuchul Jung
Publication date
11 June 2016
Publisher
Springer Netherlands
Published in
Quality & Quantity / Issue 3/2017
Print ISSN: 0033-5177
Electronic ISSN: 1573-7845
DOI
https://doi.org/10.1007/s11135-016-0369-3
