2017 | Original Paper | Book Chapter

Semantic Annotation of Data Processing Pipelines in Scientific Publications

Written by: Sepideh Mesbah, Kyriakos Fragkeskos, Christoph Lofi, Alessandro Bozzon, Geert-Jan Houben

Published in: The Semantic Web

Publisher: Springer International Publishing

Abstract

Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about data processing pipelines. We describe a method designed to classify sentences according to the nature of the information they contain (i.e. scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.
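The abstract outlines two steps: assigning sentences to one of five categories and publishing the result as linked data. The following is a minimal sketch of that general idea only, not the authors' method. The cue phrases, the function names, and the `ex:` prefix and `ex:hasCategory` property are all hypothetical placeholders; in particular, they are not terms from the DMS ontology, and named-entity extraction is not covered here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical cue phrases per category; illustrative only, not taken from the paper.
CUES: Dict[str, List[str]] = {
    "objective": ["we aim", "our goal", "in this paper"],
    "dataset": ["dataset", "corpus", "collection of"],
    "method": ["we use", "we apply", "algorithm", "classifier"],
    "software": ["implemented in", "library", "toolkit"],
    "result": ["accuracy", "precision", "recall", "outperforms"],
}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_sentence(sentence: str) -> List[str]:
    """Return every category whose cue phrases occur in the sentence."""
    lowered = sentence.lower()
    return [label for label, cues in CUES.items() if any(c in lowered for c in cues)]


def annotate(paper_id: str, text: str) -> List[Tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples using a placeholder 'ex:' vocabulary."""
    triples = []
    for index, sentence in enumerate(split_sentences(text)):
        for label in classify_sentence(sentence):
            triples.append(
                (f"ex:{paper_id}/sentence/{index}", "ex:hasCategory", f"ex:{label}")
            )
    return triples


if __name__ == "__main__":
    sample = "We evaluate on the TREC dataset. The model reaches 0.9 accuracy."
    for triple in annotate("p1", sample):
        print(triple)
```

A keyword lookup is used here purely because it is short and easy to inspect; a trained classifier could be swapped in behind `classify_sentence` without changing the triple-generation step.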

Metadata
Title
Semantic Annotation of Data Processing Pipelines in Scientific Publications
Written by
Sepideh Mesbah
Kyriakos Fragkeskos
Christoph Lofi
Alessandro Bozzon
Geert-Jan Houben
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-58068-5_20
