
2018 | Book

Semantics, Analytics, Visualization

3rd International Workshop, SAVE-SD 2017, Perth, Australia, April 3, 2017, and 4th International Workshop, SAVE-SD 2018, Lyon, France, April 24, 2018, Revised Selected Papers

Edited by: Alejandra González-Beltrán, Francesco Osborne, Dr. Silvio Peroni, Sahar Vahdati

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 3rd International Workshop, SAVE-SD 2017, held in Perth, Australia, in April 2017, and the 4th International Workshop, SAVE-SD 2018, held in Lyon, France, in April 2018. The 6 full, 2 position and 4 short papers were selected from 16 submissions. The papers describe multiple ways in which scholarly dissemination can be improved: creating structured data, providing methods for semantic computational analysis, and designing systems for navigation. This allows a variety of stakeholders to understand research dynamics, predict trends and evaluate the quality of research.

Table of contents

Frontmatter
Towards a Cloud-Based Service for Maintaining and Analyzing Data About Scientific Events
Abstract
We propose the new cloud-based service OpenResearch for managing and analyzing data about scientific events such as conferences and workshops in a persistent and reliable way. This includes data about scientific articles, participants, acceptance rates, submission numbers and impact values, as well as organizational details such as program committees, chairs, fees and sponsoring. OpenResearch is a centralized repository for scientific events and supports researchers in collecting, organizing, sharing and disseminating information about scientific events in a structured way. An additional feature currently under development is the possibility to archive web pages along with the extracted semantic data in order to lift the burden of maintaining new and old conference web sites from public research institutions. However, the main advantage is that this cloud-based repository enables a comprehensive analysis of conference data. Based on extracted semantic data, it is possible to determine quality estimations, scientific communities and research trends, as well as the development of acceptance rates, fees and number of participants, in a continuous way complemented by projections into the future. Furthermore, data about research articles can be systematically explored using a content-based analysis as well as citation linkage. All data maintained in this crowd-sourcing platform is made freely available through an open SPARQL endpoint, which allows for analytical queries in a flexible and user-defined way.
Andreas Behrend, Sahar Vahdati, Christoph Lange, Christiane Engels
ILastic: Linked Data Generation Workflow and User Interface for iMinds Scholarly Data
Abstract
Enriching scholarly data with metadata enhances the publications’ meaning. Unfortunately, different publishers of overlapping or complementary scholarly data neglect general-purpose solutions for metadata and instead use their own ad-hoc solutions. This leads to duplicate efforts and entails non-negligible implementation and maintenance costs. In this paper, we propose a reusable Linked Data publishing workflow that can be easily adjusted by different data owners to (i) generate and publish Linked Data, and (ii) align scholarly data repositories with enrichments over the publications’ content. As a proof-of-concept, the proposed workflow was applied to the iMinds research institute data warehouse, which was aligned with publications’ content derived from Ghent University’s digital repository. Moreover, we developed a user interface to help lay users with the exploration of the iLastic Linked Data set. Our proposed approach relies on a general-purpose workflow. This way, we manage to reduce the development and maintenance costs and increase the quality of the resulting Linked Data.
Anastasia Dimou, Gerald Haesendonck, Martin Vanbrabant, Laurens De Vocht, Ruben Verborgh, Steven Latré, Erik Mannens
About a BUOI: Joint Custody of Persistent Universally Unique Identifiers on the Web, or, Making PIDs More FAIR
Abstract
The findability and interoperability of some persistent identifiers (PIDs) in use on the internet and their compliance with the FAIR data principles [12, 35] are explored. It is suggested that the wide distribution and findability (e.g. by simple ‘googling’) on the internet may be more important for the usefulness of identifiers than the resolvability of links by one single authority, purportedly guaranteeing their permanence and authenticity. The prevalence of phenomena such as link rot implies that the permanence of URLs, PURLs or URIs cannot be trusted. By contrast, the well-distributed but seldom directly resolvable ISBN identifier has proved remarkably resilient, with far-reaching persistence, inherent structural meaning and good validatability, by means of fixed string length, pattern recognition, restricted character set and check digit. Adding context and meaning to identifiers through namespace prefixes and object types is also suggested. Arguing for a wide distribution of validatable identifiers, the conclusion resembles the experience of the boy Marcus in the novel-based film About a Boy, from living with a suicidal mother: It’s not sufficient to rely on one source only for sustenance. You need more than that. You need backup, in case something happens [20, 32].
Joakim Philipson
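The validation properties the abstract attributes to the ISBN (fixed string length, restricted character set, check digit) can be illustrated with a minimal sketch. The function name `is_valid_isbn13` is chosen here for illustration and does not come from the paper; the check uses the standard ISBN-13 rule in which digits are weighted alternately by 1 and 3 and the total must be divisible by 10.

```python
import re

# Restricted character set: digits, optionally separated by hyphens or spaces.
_ISBN13_CHARS = re.compile(r"[0-9][0-9 -]*[0-9]")


def is_valid_isbn13(candidate: str) -> bool:
    """Return True if `candidate` is a well-formed ISBN-13 with a correct check digit."""
    if not _ISBN13_CHARS.fullmatch(candidate):
        return False
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    # Fixed length: exactly 13 digits once separators are removed.
    if len(digits) != 13:
        return False
    # Check digit: weights alternate 1, 3, 1, 3, ...; the weighted sum
    # (including the final check digit) must be a multiple of 10.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

Applied to the two ISBNs listed in this volume's metadata, `978-3-030-01379-0` and `978-3-030-01378-3`, the function returns `True`; changing the last digit of either makes it return `False`.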
Extending ScholarlyData with Research Impact Indicators
Abstract
ScholarlyData is the reference linked dataset of the Semantic Web community about papers, people, organisations, and events related to its academic conferences. In this paper we present an extension of such a linked dataset and its associated ontology (i.e. the conference ontology) in order to represent research impact indicators. The latter includes both traditional (e.g. citation count) and alternative indicators (e.g. altmetrics).
Andrea Giovanni Nuzzolese, Valentina Presutti, Aldo Gangemi, Paolo Ciancarini
Geographical Trends in Research: A Preliminary Analysis on Authors’ Affiliations
Abstract
In the last decade, research literature reached an enormous volume with an unprecedented current annual increase of 1.5 million new publications. As research gets ever more global and new countries and institutions, either from academia or corporate environment, start to contribute with their share, it is important to monitor this complex scenario and understand its dynamics.
We present a study on a conference proceedings dataset extracted from Springer Nature Scigraph that illustrates insightful geographical trends and highlights the unbalanced growth of competitive research institutions worldwide. Results emerging from our micro and macro analysis show that the distributions among countries of institutions and papers follow a power law, and thus very few countries keep producing most of the papers accepted by high-tier conferences. In addition, we found that the annual and overall turnover rate of the top 5, 10 and 25 countries is extremely low, suggesting a very static landscape in which new entries struggle to emerge. Finally, we highlight the presence of an increasing gap between the number of institutions initiating and overseeing research endeavours (i.e. first and last authors’ affiliations) and the total number of institutions participating in research. As a consequence of our analysis, the paper also discusses our experience in working with affiliations: an utterly simple matter at first glance that is instead revealed to be a complex research and technical challenge, still far from being solved.
Andrea Mannocci, Francesco Osborne, Enrico Motta
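To make the notion of top-k turnover concrete, here is a minimal sketch. The abstract does not define the turnover rate, so the definition used here is an assumption: the fraction of countries in the current year's top k that were absent from the previous year's top k. The function names and the paper counts in the usage note are hypothetical.

```python
def top_k(paper_counts: dict[str, int], k: int) -> set[str]:
    """Return the k countries with the most papers (ties broken alphabetically)."""
    ranked = sorted(paper_counts, key=lambda country: (-paper_counts[country], country))
    return set(ranked[:k])


def turnover_rate(previous: dict[str, int], current: dict[str, int], k: int) -> float:
    """Assumed definition: share of the current top k that is new relative to the previous top k."""
    before = top_k(previous, k)
    after = top_k(current, k)
    if not after:
        return 0.0
    return len(after - before) / len(after)
```

With hypothetical counts `{"A": 50, "B": 30, "C": 20, "D": 5}` for one year and `{"A": 55, "B": 28, "C": 10, "D": 12}` for the next, one of the three top-3 countries is replaced, so the top-3 turnover is one third, while the top-2 turnover is zero.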
A Web Application for Creating and Sharing Visual Bibliographies
Abstract
The amount of information provided by peer-reviewed scientific literature citation indexes such as Scopus, Web of Science (WOS), CrossRef and OpenCitations is huge: it offers users a lot of metadata about publications, such as the list of papers written by a specific author, the editorial and content details of a paper, and the list of references and citations. However, a researcher may also be interested in: extracting these data in real time in order to create bibliographies, for example, by starting with a small set of significant papers or a restricted number of authors and progressively enriching them by exploring cited/citing references; arranging them in a graphical and aggregate representation; and easily sharing them with other interested researchers.
With these main intents, we modelled and realized VisualBib, a Web application prototype, which enables the user to select sets of papers and/or authors in order to create customized bibliographies, and visualize them in real time, aggregating data from different sources in a comprehensive, holistic graphical view.
The bibliographies are displayed using time-based visualizations, called narrative views, which contain explicit representations of the authorship and citing relations. These views may help users to: describe a research area; disseminate research on a specific topic and share personal opinions; present or evaluate the entire production of a researcher or research groups in a fresh way.
Marco Corbatto, Antonina Dattolo
Optimized Machine Learning Methods Predict Discourse Segment Type in Biological Research Articles
Abstract
To define salient rhetorical elements in scholarly text, we have earlier defined a set of Discourse Segment Types: semantically defined spans of discourse at the level of a clause with a single rhetorical purpose, such as Hypothesis, Method or Result. In this paper, we use machine learning methods to predict these Discourse Segment Types in a corpus of biomedical research papers. The initial experiment used features related to verb type and form, obtaining F-scores ranging from 0.41–0.65. To improve our results, we explored a variety of methods for balancing classes, before applying classification algorithms. We also performed an ablation study and stepwise approach for feature selection. Through these feature selection processes, we were able to reduce our 37 features to the 9 most informative ones, while maintaining F1 scores in the range of 0.63–0.65. Next, we performed an experiment with a reduced set of target classes. Using only verb tense features, logistic regression, a decision tree classifier and a random forest classifier, we predicted that a segment type was either a Result/Method or a Fact/Implication, with F1 scores above 0.8. Interestingly, findings from this machine learning approach are in line with a reader experiment, which found a correlation between verb tense and a biomedical reader’s interpretation of discourse segment type. This suggests that experimental and concept-centric discourse in biology texts can be distinguished by humans or machines, using verb tense as a key feature.
Jessica Cox, Corey A. Harper, Anita de Waard
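The F1 scores quoted in the abstract are per-class harmonic means of precision and recall. As a minimal sketch of how such scores are computed from gold and predicted labels, the helper below uses the equivalent form F1 = 2TP / (2TP + FP + FN). The function name and the label sequences in the usage note are hypothetical and are not taken from the paper's corpus or code.

```python
from collections import Counter


def per_class_f1(gold: list[str], predicted: list[str]) -> dict[str, float]:
    """Compute the F1 score of each label from parallel gold/predicted sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed
    scores = {}
    for label in set(gold) | set(predicted):
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denominator if denominator else 0.0
    return scores
```

For the hypothetical gold labels `["Result", "Result", "Method", "Fact"]` and predictions `["Result", "Method", "Method", "Fact"]`, both `Result` and `Method` score two thirds, and `Fact` scores 1.0.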
EVENTS: A Dataset on the History of Top-Prestigious Events in Five Computer Science Communities
Abstract
Information emanating from scientific events, journals, organizations, institutions and scholars is becoming increasingly available online. Therefore, there is a great demand to assess, analyze and organize this huge amount of data produced every day, or even every hour. In this paper, we present a dataset (EVENTS) of scientific events, containing historical data about the publications, submissions, start date, end date, location and homepage for 25 top-prestigious event series (718 editions in total) in five computer science communities. The dataset is publicly available online in three different formats (i.e., CSV, XML, and RDF). It is of primary interest to the steering committees or program chairs of the events, who can assess the progress of their event over time and compare it to competing events in the same field, and to potential authors looking for events in which to publish their work. In addition, we shed light on these events by analyzing their metadata over the last 50 years. Our transferable analysis is based on exploratory data analysis.
Said Fathalla, Christoph Lange
OSCAR: A Customisable Tool for Free-Text Search over SPARQL Endpoints
Abstract
SPARQL is a very powerful query language for RDF data, which can be used to retrieve data following specific patterns. In order to foster the availability of scholarly data on the Web, several projects and institutions make Web interfaces to SPARQL endpoints available, so that users can search for information in the RDF datasets they expose using SPARQL. However, SPARQL is quite complex to learn, and usually it is fully accessible only to experts in Semantic Web technologies, remaining completely obscure to ordinary Web users. In this paper we introduce OSCAR, the OpenCitations RDF Search Application, which is a user-friendly search platform that can be used to search any RDF triplestore providing a SPARQL endpoint, while hiding the complexities of SPARQL. We present its main features and demonstrate how it can be adapted to work with different SPARQL endpoints containing scholarly data, namely those provided by OpenCitations, ScholarlyData and Wikidata. We conclude by discussing the results of a user testing session that reveal the usability of the OSCAR search interface when employed to access information within the OpenCitations Corpus.
Ivan Heibi, Silvio Peroni, David Shotton
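The general idea of hiding SPARQL behind a free-text box can be sketched as a function that turns the user's text into a query string. This is not OSCAR's implementation or configuration format: the function name `free_text_to_sparql`, the choice of matching against `rdfs:label`, and the default limit are all assumptions made for illustration. Only standard SPARQL 1.1 functions (`CONTAINS`, `LCASE`, `STR`) are used.

```python
def free_text_to_sparql(text: str, limit: int = 10) -> str:
    """Build a SPARQL query that finds resources whose label contains `text`."""
    # Lower-case and collapse all whitespace (including newlines) to single spaces.
    normalised = " ".join(text.lower().split())
    # Escape backslashes and double quotes so the text is a safe string literal.
    escaped = normalised.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource ?label WHERE {\n"
        "  ?resource rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{escaped}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

The resulting string could then be sent to any SPARQL endpoint using the HTTP client of your choice; which properties to match against would depend on the vocabulary of the dataset being searched.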
Storing Combustion Data Experiments: New Requirements Emerging from a First Prototype
Position Paper
Abstract
Repositories for scientific and scholarly data are valuable resources to share, search, and reuse data by the community. Such repositories are essential in data-driven research based on experimental data. In this paper we focus on the case of combustion kinetic modeling, where the goal is to design models typically validated by means of comparisons with a large number of experiments.
We discuss new requirements emerging from the analysis of an existing data collection prototype and its associated services. The new requirements, elaborated in the paper, include the acquisition of new experiments, the automatic discovery of new sources, semantic exploration of information and multi-source integration, and the selection of data for model validation.
These new requirements set the need for a new representation of scientific data and associated metadata. This paper describes the scenario, the requirements and outlines an initial architecture to support them.
Gabriele Scalia, Matteo Pelucchi, Alessandro Stagni, Tiziano Faravelli, Barbara Pernici
Investigating Facets to Characterise Citations for Scholars
Abstract
Citations within academic literature keep gaining importance, both for the work of scholars and for improving tools and services related to digital libraries. We present in this article the preliminary results of an investigation on the characterisations of citations, whose objective is to propose a framework for globally enriching citations with explicit information about their nature, role and characteristics. This article focuses on the set of properties we are studying to support the automatic analysis of large corpora of citations. This model is grounded on a literature review, also detailed here, and has been submitted to a group of several hundred scholars of all disciplines in the form of a survey. The results confirm that these properties are perceived as useful.
Angelo Di Iorio, Freddy Limpens, Silvio Peroni, Agata Rotondi, Georgios Tsatsaronis, Jorgos Achtsivassilis
Striving for Semantics of Plant Phenotyping Data
Abstract
Addressing the goal of the workshop, i.e. to bridge the gap between academic and industrial aspects in regard to scholarly data, we inspect the case of plant phenotyping data publishing. We discuss how the publishers could foster advancements in the field of plant research and data analysis methods by warranting good quality phenotypic data with foreseeable semantics.
An examination of the policies of a set of life-science journals with respect to plant phenotyping data publication shows that this type of resource seems largely overlooked by the data policy-makers. The current lack of recognition, and the resulting lack of recommended standards and repositories for plant phenotypic data, leads to depreciation of such datasets and their dispersion within general-purpose, unstructured data stores. The absence of a clear incentive for individual researchers to follow data description and deposition guidelines makes it difficult to develop and promote new approaches and tools utilising public phenotypic data resources.
Hanna Ćwiek-Kupczyńska
Backmatter
Metadata
Title
Semantics, Analytics, Visualization
Edited by
Alejandra González-Beltrán
Francesco Osborne
Dr. Silvio Peroni
Sahar Vahdati
Copyright year
2018
Electronic ISBN
978-3-030-01379-0
Print ISBN
978-3-030-01378-3
DOI
https://doi.org/10.1007/978-3-030-01379-0