
About this book

This book constitutes thoroughly reviewed and selected papers presented at the Workshops and Doctoral Consortium of the 24th East-European Conference on Advances in Databases and Information Systems, ADBIS 2020, the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, and the 16th Workshop on Business Intelligence and Big Data, EDA 2020, held in August 2020. Due to the COVID-19 pandemic, the joint conference and satellite events were held online.

The 26 full papers and 5 short papers were carefully reviewed and selected from 56 submissions. This volume presents the papers accepted for the following satellite events: the Workshop on Intelligent Data - From Data to Knowledge, DOING 2020; the Workshop on Modern Approaches in Data Engineering and Information System Design, MADEISD 2020; the Workshop on Scientific Knowledge Graphs, SKG 2020; the Workshop of BI & Big Data Applications, BBIGAP 2020; the International Symposium on Data-Driven Process Discovery and Analysis, SIMPDA 2020; the International Workshop on Assessing Impact and Merit in Science, AIMinScience 2020; and the Doctoral Consortium.

Table of Contents

Frontmatter

ADBIS, TPDL and EDA 2020 Common Workshops

Frontmatter

Databases and Information Systems in the AI Era: Contributions from ADBIS, TPDL and EDA 2020 Workshops and Doctoral Consortium

Research on database and information technologies has been evolving rapidly over the last couple of years. This evolution has been led by three major forces: Big Data, AI, and the Connected World, which open the door to innovative research directions and challenges in four main areas: (i) computational and storage resource modeling and organization; (ii) new programming models; (iii) processing power; and (iv) new applications emerging in fields such as health, environment, education, cultural heritage, and banking. The 24th East-European Conference on Advances in Databases and Information Systems (ADBIS 2020), the 24th International Conference on Theory and Practice of Digital Libraries (TPDL 2020), and the 16th Workshop on Business Intelligence and Big Data (EDA 2020), held during August 25–27, 2020, in Lyon, France, together with their associated satellite events, aimed at covering emerging issues related to database and information system research in these areas. The aim of this paper is to present these events, their motivations, and their topics of interest, as well as to briefly outline the papers selected for presentation, which are included in the remainder of this volume.

Ladjel Bellatreche, Fadila Bentayeb, Mária Bieliková, Omar Boussaid, Barbara Catania, Paolo Ceravolo, Elena Demidova, Mirian Halfeld Ferrari, Maria Teresa Gomez Lopez, Carmem S. Hara, Slavica Kordić, Ivan Luković, Andrea Mannocci, Paolo Manghi, Francesco Osborne, Christos Papatheodorou, Sonja Ristić, Dimitris Sacharidis, Oscar Romero, Angelo A. Salatino, Guilaine Talens, Maurice van Keulen, Thanasis Vergoulis, Maja Zumer

1st Workshop on Intelligent Data - From Data to Knowledge (DOING 2020)

Frontmatter

Extraction of a Knowledge Graph from French Cultural Heritage Documents

Cultural heritage in Quebec is often represented as collections of French documents that contain a lot of valuable, yet unstructured, data. One of the current aims of the Quebec Ministry of Culture and Communications (MCCQ) is to learn a knowledge graph from unstructured documents to offer an integrated semantic portal on Quebec’s cultural heritage. In the context of this project, we describe a machine learning and open information extraction approach that leverages named entity extraction and open relation extraction in English to extract a knowledge graph from French documents. We also enhance the generic entities that can be recognized in texts with domain-related types. Our results show that our method leads to a substantial enrichment of the knowledge graph based on the initial corpus provided by the MCCQ.

Erwan Marchand, Michel Gagnon, Amal Zouaq

Natural Language Querying System Through Entity Enrichment

This paper focuses on a domain expert querying system over databases. It presents a solution designed for a French enterprise interested in offering a natural language interface for its clients. The approach, based on entity enrichment, aims at translating natural language queries into database queries. In this paper, the database is treated through a logical paradigm, suggesting the adaptability of our approach to different database models. Preliminary experiments show the good precision of our method.

Joshua Amavi, Mirian Halfeld Ferrari, Nicolas Hiot

Public Riots in Twitter: Domain-Based Event Filtering During Civil Unrest

Civil unrest consists of public demonstrations in which people manifest their position on different causes. Sometimes, violent events or riots break out during such demonstrations, and these can be revealed from tweets posted by the people involved. This study describes a methodology to detect riots during a protest in order to identify potential adverse developments from tweets. Using two datasets of our own, related to a violent and a non-violent protest in Peru, we applied temporal clustering to obtain events and identify a tweet headline per cluster. We then extracted relevant terms for the scoring and ranking process using domain and contrast corpora built from different sources. Finally, we filtered the relevant events for the violence domain using a contrast evaluation between the two datasets. The obtained results highlight the adequacy of the proposed approach.

Arturo Oncevay, Marco Sobrevilla, Hugo Alatrista-Salas, Andrés Melgar

Classification of Relationship in Argumentation Using Graph Convolutional Network

Argument relationship prediction is one of the tasks of argumentation mining that aims at finding connections between arguments (or parts thereof). This task is considered one of the most complex stages of argumentation mining. At the same time, the Graph Convolutional Network (GCN) has been successfully applied to graph-based applications. In this study, we join the relationship prediction challenge with the classification ability of GCNs. We propose ArgGCN, a framework based on the GCN method applied to the classification of relationships between arguments. The arguments are treated as short texts, and we abstract away the recognition of unitary elements within them (such as claims and evidence). We achieved promising results on the UKP Aspect, AFS, and Microtext corpora.

Dimmy Magalhães, Aurora Pozo

Recursive Expressions for SPARQL Property Paths

Regular expressions are used in SPARQL property paths to query RDF graphs. However, regular expressions can only define the most limited class of languages, called regular languages. Context-free languages are a wider class containing all regular languages. There are no context-free expressions to define them, so it is necessary to write grammars. We propose an extension of regular expressions, called recursive expressions, to support the definition of a subset of context-free languages. The goal of our work is therefore to provide simple operators allowing the definition of languages as close as possible to context-free languages.

Ciro Medeiros, Umberto Costa, Semyon Grigorev, Martin A. Musicante
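As a concrete illustration of why regular path expressions fall short, the language {aⁿbⁿ | n ≥ 1} over edge labels is context-free but not regular. The following toy sketch (not the authors’ recursive-expression syntax; the graph and function names are invented for illustration) checks whether such a path exists in a small labelled graph by recursive descent on the grammar S → a S b | a b:

```python
# Toy edge-labelled directed graph: (source, label) -> set of targets.
GRAPH = {
    (0, "a"): {1},
    (1, "a"): {2},
    (2, "b"): {3},
    (3, "b"): {4},
}

def a_targets(u):
    """Nodes reachable from u through one 'a'-labelled edge."""
    return GRAPH.get((u, "a"), set())

def b_preds(v):
    """Nodes that reach v through one 'b'-labelled edge."""
    return {u for (u, lab), ts in GRAPH.items() if lab == "b" and v in ts}

def matches_anbn(start, end, max_depth=20):
    """True if some path from start to end spells a word of {a^n b^n, n >= 1},
    evaluated with the grammar S -> a S b | a b (depth-bounded recursion)."""
    def s(u, v, depth):
        if depth == 0:
            return False
        for x in a_targets(u):        # consume a leading 'a'
            for y in b_preds(v):      # ...and a matching trailing 'b'
                if x == y or s(x, y, depth - 1):
                    return True
        return False
    return s(start, end, max_depth)
```

Here the path 0→1→2→3→4 spells "aabb", which the recursive definition accepts, while no regular expression can enforce that the number of a’s equals the number of b’s.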

Healthcare Decision-Making Over a Geographic, Socioeconomic, and Image Data Warehouse

Geographic, socioeconomic, and image data enrich the range of analyses that can be performed in healthcare decision-making. In this paper, we focus on these complex data with the support of a data warehouse. We propose three star-schema designs to store them: jointed, split, and normalized. We consider healthcare applications that require data sharing and manage huge volumes of data, where the use of frameworks like Spark is needed. To this end, we propose SimSparkOLAP, a Spark strategy to efficiently process analytical queries extended with geographic, socioeconomic, and image similarity predicates. Performance tests showed that the normalized schema provided the best results, followed closely by the jointed schema, which in turn outperformed the split schema. We also present examples of semantic queries and discuss their importance for healthcare decision-making.

Guilherme M. Rocha, Piero L. Capelo, Cristina D. A. Ciferri

OMProv: Provenance Mechanism for Objects in Deep Learning

Deep learning technology is widely used in industry and academia nowadays. Several kinds of objects are involved in deep learning workflows, including algorithms, models, and labeled datasets. How effectively the relationships among these objects are organized and understood determines the efficiency of development and production. This paper proposes OMProv, a provenance mechanism for recording the lineage within each kind of object and the relationships among different kinds of objects in the same execution. A version graph abstraction based on weighted directed acyclic graphs and a version inference algorithm are proposed; they are deliberately designed to fit the characteristics of deep learning scenarios. OMProv has been implemented in OMAI, an all-in-one deep learning platform for the cloud. OMProv helps users organize objects effectively and intuitively, and efficiently understand the root causes of changes in job results such as performance or accuracy. The management of deep learning lifecycles and related data assets can also be simplified by using OMProv.

Jian Lin, Dongming Xie

Exploiting IoT Data Crossings for Gradual Pattern Mining Through Parallel Processing

Today, with the proliferation of Internet of Things (IoT) applications in almost every area of our society comes the challenge of deducing relevant information from real-time time-series data (from different sources) for decision making. In this paper, we propose a fuzzy temporal approach for crossing such data sets with the ultimate goal of exploiting them for temporal gradual pattern mining. A temporal gradual pattern may take the form: “the higher the humidity, the lower the temperature, almost 15 min later”. In addition, we apply parallel processing to our implementation and measure its computational performance.

Dickson Odhiambo Owuor, Anne Laurent, Joseph Onderi Orero
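The pattern quoted above can be made concrete with a toy sketch. This is not the authors’ fuzzy method: it simply shifts one series by a lag and counts the fraction of index pairs where one attribute rises while the other falls (all data and names are illustrative):

```python
def gradual_support(x, y, lag=1):
    """Fraction of index pairs (i, j), i < j, satisfying the gradual pattern
    'the higher x, the lower y' after shifting y by `lag` time steps."""
    x, y = x[:len(x) - lag], y[lag:]          # align: y is observed `lag` steps later
    pairs = [(i, j) for i in range(len(x)) for j in range(i + 1, len(x))]
    concordant = sum(1 for i, j in pairs
                     if (x[j] - x[i]) * (y[j] - y[i]) < 0)  # x rises while y falls
    return concordant / len(pairs)

# Humidity rises over time; temperature falls roughly one step later.
humidity    = [40, 45, 50, 60, 70, 80]
temperature = [30, 30, 28, 26, 25, 23]
print(gradual_support(humidity, temperature, lag=1))  # 1.0: fully concordant
```

A support close to 1.0 would indicate that the lagged gradual dependency holds across the crossed series; the paper’s fuzzy temporal approach generalizes the crisp lag used here.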

Cooking Related Carbon Footprint Evaluation and Optimisation

The carbon footprint of food has been a major concern for the past few years: food production is responsible for a quarter of all GHG (greenhouse gas) emissions. Many food carbon footprint calculators can be found online, but most of them give individual results per ingredient and do not offer a perspective on the whole recipe’s carbon footprint. Many factors have to be taken into account in this calculation, such as the origin of the food and the location of the cook, but also the way ingredients are cooked and assembled. In this paper, we present CROPPER (CaRbon fOotprint reciPe oPtimizER), which improves an input recipe by updating its ingredients (origin, type) and its cooking procedures to reduce its carbon footprint while keeping it savory.

Damien Alvarez de Toledo, Laurent d’Orazio, Frederic Andres, Maria C. A. Leite

2nd Workshop on Modern Approaches in Data Engineering and Information System Design (MADEISD 2020)

Frontmatter

CrEx-Wisdom Framework for Fusion of Crowd and Experts in Crowd Voting Environment – Machine Learning Approach

In recent years, crowd-voting and crowd-sourcing systems have been attracting increased attention in research and industry. As part of computational social choice (COMSOC), crowd-voting and crowd-sourcing address important societal problems (e.g. participatory budgeting), but also many industry problems (e.g. sentiment analysis, data labeling, ranking and selection). However, decisions based on the aggregation of crowd votes do not guarantee high-quality results. Moreover, in many cases the majority of crowd voters may not be satisfied with the final decisions if the votes are highly heterogeneous. On the other hand, in many crowd-voting problems and settings it is possible to acquire and formalize knowledge and/or opinions from domain experts. Integrating expert knowledge with the “wisdom of the crowd” should lead to high-quality decisions that satisfy crowd opinion. In this research, we address the problem of integrating experts’ domain knowledge with the “wisdom of the crowd” by proposing a machine-learning-based framework that enables ranking and selection of alternatives as well as quantification of the quality of crowd votes. The framework weights crowd votes with respect to expert knowledge and provides procedures for modeling the trade-off between crowd and expert satisfaction with the final decisions (ranking or selection).

Ana Kovacevic, Milan Vukicevic, Sandro Radovanovic, Boris Delibasic

Temporal Network Analytics for Fraud Detection in the Banking Sector

A new temporal-network methodology is presented for fraud detection systems in the banking sector. Standard approaches to fraud monitoring mainly focus on individual client data. Our approach concentrates on the hidden information produced by the network underlying the transaction database. The methodology is based on a cycle detection method with which important patterns can be identified, as shown by tests on real data. Our solution is integrated into the financial fraud system of a bank; the experimental results are demonstrated by a real-world case study.

László Hajdu, Miklós Krész
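The core graph operation the abstract relies on can be sketched in a few lines. This is a plain colour-marking depth-first search for directed cycles (accounts as nodes, transfers as edges); the paper’s temporal method is richer, and the data below is invented for illustration:

```python
def has_cycle(graph):
    """Detect a directed cycle (e.g. money returning to its source account)
    using a colour-marking depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}

    def dfs(v):
        colour[v] = GREY                      # node is on the current DFS path
        for w in graph.get(v, []):
            if colour.get(w, WHITE) == GREY:  # back edge closes a cycle
                return True
            if colour.get(w, WHITE) == WHITE and dfs(w):
                return True
        colour[v] = BLACK                     # fully explored, no cycle through v
        return False

    return any(colour[v] == WHITE and dfs(v) for v in list(graph))

# A -> B -> C -> A is a suspicious round trip; D only sends money into it.
transfers = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(has_cycle(transfers))  # True
```

In a temporal setting such as the paper’s, the edges would additionally carry timestamps so that only time-respecting cycles count.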

Abdominal Aortic Aneurysm Segmentation from Contrast-Enhanced Computed Tomography Angiography Using Deep Convolutional Networks

One of the most common imaging methods for diagnosing an abdominal aortic aneurysm and detecting endoleaks is computed tomography angiography. In this paper, we address the problem of aorta and thrombus semantic segmentation, which is a mandatory step in estimating aortic aneurysm diameter. Three end-to-end convolutional neural networks were trained and evaluated. Finally, we propose an ensemble of deep neural networks with underlying U-Net, ResNet, and VBNet frameworks. Our results show that we are able to outperform state-of-the-art methods by 3% on the Dice metric without any additional post-processing steps.

Tomasz Dziubich, Paweł Białas, Łukasz Znaniecki, Joanna Halman, Jakub Brzeziński

Automated Classifier Development Process for Recognizing Book Pages from Video Frames

One of the latest developments made by publishing companies is introducing mixed and augmented reality into their printed media (e.g. to produce augmented books). An important computer vision problem they face is the classification of book pages from video frames. The problem is non-trivial, especially considering that typical training data is limited to only one digital original per book page, while the trained classifier should be suitable for real-time use on mobile devices, where the camera can be exposed to highly diverse conditions and computing resources are limited. In this paper we address this problem by proposing an automated classifier development process that allows training classification models that run in real time, with high usability, on low-end mobile devices and achieve an average accuracy of 88.95% on our in-house test set consisting of over 20 000 frames from real videos of 5 children’s books. At the same time, deployment tests reveal that the classifier development process time is reduced approximately 16-fold.

Adam Brzeski, Jan Cychnerski, Karol Draszawka, Krystyna Dziubich, Tomasz Dziubich, Waldemar Korłub, Paweł Rościszewski

1st Workshop on Scientific Knowledge Graphs (SKG 2020)

Frontmatter

DINGO: An Ontology for Projects and Grants Linked Data

We present DINGO (Data INtegration for Grants Ontology), an ontology that provides a machine-readable, extensible framework to model data about projects, funding, actors, and, notably, funding policies in the research landscape. DINGO is designed to yield high modeling power and elasticity to cope with the huge variety in funding, research, and policy practices, which makes it applicable also to areas beyond research where funding is an important aspect. We discuss its main features, the principles followed in its development, its community uptake, and its maintenance and evolution.

Diego Chialva, Alexis-Michel Mugabushaka

Open Science Graphs Must Interoperate!

Open Science Graphs (OSGs) are Scientific Knowledge Graphs whose intent is to improve the overall FAIRness of science, by enabling open access to graph representations of metadata about people, artefacts, and institutions involved in the research lifecycle, as well as the relationships between these entities, in order to support stakeholder needs, such as discovery, reuse, reproducibility, statistics, trends, monitoring, impact, validation, and assessment. The represented information may span across entities such as research artefacts (e.g. publications, data, software, samples, instruments) and items of their content (e.g. statistical hypothesis tests reported in publications), research organisations, researchers, services, projects, and funders. OSGs include relationships between such entities and sometimes formalised (semantic) concepts characterising them, such as machine-readable concept descriptions for advanced discoverability, interoperability, and reuse. OSGs are generally valuable individually, but would greatly benefit from information exchange across their collections, thereby improving their efficacy in serving stakeholder needs. They could, therefore, reuse and exploit the data aggregation and added value that characterise each OSG, decentralising the effort and capitalising on synergies, as no one-size-fits-all solution exists. The RDA IG on Open Science Graphs for FAIR Data is investigating the motivation and challenges underpinning the realisation of an Interoperability Framework for OSGs. This work describes the key motivations for (i) the definition of a classification for OSGs to compare their features and identify commonalities, differences, and added value, and (ii) the definition of an Interoperability Framework, specifically an information model and APIs that enable a seamless exchange of information across graphs.

Amir Aryani, Martin Fenner, Paolo Manghi, Andrea Mannocci, Markus Stocker

WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia

Domain-specific classification schemas (or subject heading vocabularies) are often used to identify, classify, and disambiguate concepts that occur in scholarly articles. In this work, we develop, apply, and evaluate a human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, and then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the initial tree. This work resulted in WikiCSSH, a large-scale, hierarchically organized subject heading vocabulary for the domain of computer science (CS). Our evaluation suggests that WikiCSSH outperforms alternative CS vocabularies in terms of coverage of CS terms that occur in research articles. WikiCSSH can further distinguish between coarse-grained and fine-grained CS concepts. The outlined workflow can serve as a template for building hierarchically organized subject heading vocabularies for other domains that are covered in Wikipedia.

Kanyao Han, Pingjing Yang, Shubhanshu Mishra, Jana Diesner

Integrating Knowledge Graphs for Analysing Academia and Industry Dynamics

Academia and industry are constantly engaged in a joint effort for producing scientific knowledge that will shape the society of the future. Analysing the knowledge flow between them and understanding how they influence each other is a critical task for researchers, governments, funding bodies, investors, and companies. However, current corpora are unfit to support large-scale analysis of the knowledge flow between academia and industry since they lack a good characterization of research topics and industrial sectors. In this short paper, we introduce the Academia/Industry DynAmics (AIDA) Knowledge Graph, which characterizes 14M papers and 8M patents according to the research topics drawn from the Computer Science Ontology. 4M papers and 5M patents are also classified according to the type of the authors’ affiliations (academia, industry, or collaborative) and 66 industrial sectors (e.g., automotive, financial, energy, electronics) obtained from DBpedia. AIDA was generated by an automatic pipeline that integrates several knowledge graphs and bibliographic corpora, including Microsoft Academic Graph, Dimensions, English DBpedia, the Computer Science Ontology, and the Global Research Identifier Database.

Simone Angioni, Angelo A. Salatino, Francesco Osborne, Diego Reforgiato Recupero, Enrico Motta

A Philological Perspective on Meta-scientific Knowledge Graphs

This paper discusses knowledge graphs and networks on the scientific process from a philological viewpoint. Relevant themes are: the smallest entities of scientific discourse; the treatment of documents or artefacts as texts and commentaries in discourse; and links between context, (co)text, and data points. As an illustration of the micro-level approach, version control of linguistic examples is discussed as a possible field of application in this discipline. This underlines the claim for data points to be treated like unique entities, which possess metadata of the datum and any text generating knowledge from it.

Tobias Weber

2nd Workshop of BI and Big Data Applications (BBIGAP 2020)

Frontmatter

A Scored Semantic Cache Replacement Strategy for Mobile Cloud Database Systems

Current mobile cloud database systems are widespread and require special considerations for mobile devices. Although many systems rely on numerous metrics for use and optimization, few systems leverage metrics for decisional cache replacement on the mobile device. In this paper we introduce the Lowest Scored Replacement (LSR) policy—a novel cache replacement policy based on a predefined score which leverages contextual mobile data and user preferences for decisional replacement. We show an implementation of the policy using our previously proposed MOCCAD-Cache as our decisional semantic cache and our Normalized Weighted Sum Algorithm (NWSA) as a score basis. Our score normalization is based on the factors of query response time, energy spent on mobile device, and monetary cost to be paid to a cloud provider. We then demonstrate a relevant scenario for LSR, where it excels in comparison to the Least Recently Used (LRU) and Least Frequently Used (LFU) cache replacement policies.

Zachary Arani, Drake Chapman, Chenxiao Wang, Le Gruenwald, Laurent d’Orazio, Taras Basiuk
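The eviction rule described above can be sketched minimally. This is not the paper’s MOCCAD-Cache or NWSA implementation: the class name, the fixed weights, and the three “savings” inputs are illustrative stand-ins for the normalized time, energy, and monetary factors the abstract mentions:

```python
class LSRCache:
    """Toy lowest-scored-replacement cache: when capacity is exceeded,
    the entry with the lowest precomputed score is evicted."""

    def __init__(self, capacity, weights=(0.5, 0.3, 0.2)):
        self.capacity, self.weights = capacity, weights
        self.store = {}  # key -> (value, score)

    def score(self, time_saved, energy_saved, money_saved):
        # Weighted sum of normalized savings (an NWSA-style score).
        w_t, w_e, w_m = self.weights
        return w_t * time_saved + w_e * energy_saved + w_m * money_saved

    def put(self, key, value, time_saved, energy_saved, money_saved):
        self.store[key] = (value, self.score(time_saved, energy_saved, money_saved))
        if len(self.store) > self.capacity:
            victim = min(self.store, key=lambda k: self.store[k][1])
            del self.store[victim]           # drop the least valuable entry

    def get(self, key):
        entry = self.store.get(key)
        return entry[0] if entry else None

cache = LSRCache(capacity=2)
cache.put("q1", "result1", 0.9, 0.8, 0.7)   # high-value cached query
cache.put("q2", "result2", 0.1, 0.1, 0.1)   # low-value entry, evicted next
cache.put("q3", "result3", 0.5, 0.5, 0.5)
```

Unlike LRU or LFU, the victim here is chosen by the score rather than by recency or frequency, which is the scenario in which the paper shows LSR excelling.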

Grid-Based Clustering of Waze Data on a Relational Database

In the urban environment, data collected from traffic events can serve as elements of study for city planning. The challenge is to transform this raw data into knowledge of mobility. Events are usually stored as individual records in a database system, and urban planning involves costly spatial queries. In this paper, we investigate the effect of a grid-based clustering on the performance of such queries, using an off-the-shelf relational database and index structure. We report on the results of this approach using data collected from Waze over a period of one year. We compare the performance of our grid-based approach with a clustered R-tree index over the geometric attribute. The results of this study are of interest to developers of applications that involve spatial data over a specific geographic area, using an existing database management system.

Mariana M. G. Duarte, Rebeca Schroeder, Carmem S. Hara
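The grid-based idea can be illustrated with a toy sketch: snap each event’s coordinates to a fixed-size cell so that a costly spatial query becomes an equality lookup on an indexable cell key. The cell size and the coordinates below are illustrative, not taken from the paper:

```python
from collections import defaultdict

CELL = 0.01  # cell size in degrees (roughly 1 km); illustrative value

def cell_key(lat, lon, cell=CELL):
    """Map a coordinate to its grid cell; the pair can be stored as an
    indexed column in an off-the-shelf relational database."""
    return (int(lat // cell), int(lon // cell))

def cluster(events, cell=CELL):
    """Group (lat, lon, kind) traffic events by grid cell."""
    groups = defaultdict(list)
    for lat, lon, kind in events:
        groups[cell_key(lat, lon, cell)].append(kind)
    return dict(groups)

events = [(-25.4284, -49.2733, "JAM"),
          (-25.4285, -49.2731, "ACCIDENT"),  # falls in the same cell as the first
          (-25.5000, -49.3000, "JAM")]
```

With the cell key materialized as a column, a region query reduces to a `WHERE cell = ?` predicate on a standard B-tree index, which is what the paper compares against a clustered R-tree.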

Your Age Revealed by Facebook Picture Metadata

Facebook users unknowingly reveal personal information that may help attackers to perpetrate malicious actions. In this paper, we show how sensitive age information of a given target user can be predicted from his/her online pictures. More precisely, we perform age inference attacks by leveraging picture metadata such as (i) alt-texts automatically generated by Facebook to describe the picture content, and (ii) picture reactions (comments and emojis) of other Facebook users. We investigate whether the target’s age affects other users’ reactions to his/her pictures. Our experiments show that age information can be inferred with an AUC of 62% by using only alt-texts and with an AUC of 89% by using a combination of alt-texts and users’ reactions. Additionally, we present a detailed analysis of the Spearman correlation between the reactions of Facebook users and age.

Sanaz Eidizadehakhcheloo, Bizhan Alipour Pijani, Abdessamad Imine, Michaël Rusinowitch

Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios

This paper proposes a study of existing environments used to enact data science pipelines applied to graphs. Data science pipelines are a new form of queries combining classic graph operations with artificial intelligence graph analytics operations. A pipeline defines a data flow consisting of tasks for querying, exploring, and analysing graphs. Different environments and systems can be used for enacting pipelines. They range from graph NoSQL stores, through programming languages extended with libraries providing graph processing and analytics functions, to full machine learning and artificial intelligence studios. The paper describes these environments and the design principles they promote for enacting data science pipelines intended to query, process, and explore data collections, particularly graphs.

Genoveva Vargas-Solar, José-Luis Zechinelli-Martini, Javier A. Espinosa-Oviedo

International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA 2020)

Frontmatter

Towards the Detection of Promising Processes by Analysing the Relational Data

Business process discovery provides mechanisms to extract the general process behaviour from event observations. However, the logs are not always available and must sometimes be extracted from repositories, such as relational databases. Owing to the references that exist between the relational tables, there are several possible combinations of traces of events that can be extracted from a relational database. Different traces can be extracted depending on which attribute represents the case_id, which attributes represent the execution of an activity, or how the timestamp defining the order of the events is obtained. This paper proposes a method to analyse the wide range of possible traces that could be extracted from a relational database, based on measuring how interesting it would be to extract a trace log that is later used for process discovery. The analysis is done by means of a set of proposed metrics before the traces are generated and the process is discovered, which helps to reduce the computational cost of process discovery. For each possible case_id, every possible trace is analysed and measured. To validate our proposal, we used a real relational database and compared the detection of the most and least promising processes.

Belén Ramos-Gutiérrez, Luisa Parody, María Teresa Gómez-López
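The basic extraction step the paper builds on can be sketched as follows: given rows from a relational table, choosing one column as the case_id and another as the timestamp yields one candidate trace log, and different column choices yield different candidate logs to be scored. The table, column names, and rows below are invented for illustration:

```python
from operator import itemgetter

def build_traces(rows, case_col, activity_col, time_col):
    """Group rows by the chosen case_id column and order events by timestamp,
    producing one candidate trace log."""
    traces = {}
    for row in sorted(rows, key=itemgetter(time_col)):  # ISO timestamps sort as text
        traces.setdefault(row[case_col], []).append(row[activity_col])
    return traces

rows = [
    {"order": 1, "activity": "create", "ts": "2020-01-01"},
    {"order": 1, "activity": "pay",    "ts": "2020-01-03"},
    {"order": 2, "activity": "create", "ts": "2020-01-02"},
    {"order": 1, "activity": "ship",   "ts": "2020-01-04"},
]
print(build_traces(rows, "order", "activity", "ts"))
```

Each alternative choice of `case_col` or `time_col` produces a different candidate log; the paper’s contribution is scoring these candidates with metrics before generating the traces and running discovery.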

Analysis of Language Inspired Trace Representation for Anomaly Detection

A great concern for organizations is to detect anomalous process instances within their business processes. For that, conformance checking performs model-aware analysis by comparing process logs to business models to detect anomalous process executions. However, in several scenarios a model is either unavailable or its generation is costly, which requires alternative methods to obtain a confident representation of traces. This work supports language-inspired process analysis grounded in the word2vec encoding algorithm. We argue that natural language encodings correctly model the behavior of business processes, supporting a proper distinction between common and anomalous behavior. In the experiments, we compared accuracy and time cost among different word2vec setups and classic encoding methods (token-based replay and alignment features), addressing seven different anomaly scenarios. Feature importance values and the impact of the different anomalies in seven event logs were also evaluated to bring insights into the trace representation problem. Results show that the proposed encoding overcomes the representational capability of traditional conformance metrics for the anomaly detection task.

Gabriel Marques Tavares, Sylvio Barbon

The 1st International Workshop on Assessing Impact and Merit in Science (AIMinScience 2020)

Frontmatter

Exploring Citation Networks with Hybrid Tree Pattern Queries

Scientific impact of publications is often measured using citation networks. However, traditional measures typically rely on direct citations only. To fully leverage citation networks for assessing scientific impact, it is necessary to investigate also indirect scientific influence, which is captured by citation paths. Further, the analysis and exploration of citation networks requires the ability to efficiently evaluate expressive queries on them. In this paper, we propose to use hybrid query patterns to query citation networks. These allow for both edge-to-edge and edge-to-path mappings between the query pattern and the graph, thus being able to extract both direct and indirect relationships. To efficiently evaluate hybrid pattern queries on citation graphs, we employ a pattern matching algorithm which exploits graph simulation to prune nodes that do not appear in the final answer. Our experimental results on citation networks show that our method not only allows for more expressive queries but is also efficient and scalable.

Xiaoying Wu, Dimitri Theodoratos, Dimitrios Skoutas, Michael Lan

ArtSim: Improved Estimation of Current Impact for Recent Articles

As the number of published scientific papers continuously increases, the need to assess paper impact becomes more valuable than ever. In this work, we focus on citation-based measures that try to estimate the popularity (current impact) of an article. State-of-the-art methods in this category calculate estimates of popularity based on paper citation data. However, with respect to recent publications, only limited data of this type are available, rendering these measures prone to inaccuracies. In this work, we present ArtSim, an approach that exploits paper similarity, calculated using scholarly knowledge graphs, to better estimate paper popularity for recently published papers. Our approach is designed to be applied on top of existing popularity measures, to improve their accuracy. We apply ArtSim on top of four well-known popularity measures and demonstrate through experiments its potential in improving their popularity estimates.

Serafeim Chatzopoulos, Thanasis Vergoulis, Ilias Kanellos, Theodore Dalamagas, Christos Tryfonopoulos

Link Prediction in Bibliographic Networks

Analysing bibliographic networks is important for understanding the process of scientific publications. A bibliographic network can be studied using the framework of Heterogeneous Information Networks (HINs). In this paper, we comparatively evaluate two different algorithms for link prediction in HINs on an instance of a bibliographic network. These two algorithms represent two distinct categories: algorithms that use path-related features of the graph and algorithms that use node embeddings. The results suggest that the path-based algorithms achieve significantly better performance on bibliographic networks.

Pantelis Chronis, Dimitrios Skoutas, Spiros Athanasiou, Spiros Skiadopoulos

Open Science Observatory: Monitoring Open Science in Europe

Monitoring and evaluating Open Science (OS) practices and research output in a principled and continuous way is recognised as one of the necessary steps towards its wider adoption. This paper presents the Open Science Observatory, a prototype online platform which combines data gathered from OpenAIRE e-Infrastructure and other public data sources and informs users via rich visualizations on different OS indicators in Europe.

George Papastefanatos, Elli Papadopoulou, Marios Meimaris, Antonis Lempesis, Stefania Martziou, Paolo Manghi, Natalia Manola

Skyline-Based University Rankings

University rankings comprise a significant tool for making decisions in our modern educational process. In this paper, we propose a novel university ranking method based on the skyline operator, which is applied to multi-dimensional objects to extract the non-dominated (i.e. “prevailing”) ones. Our method has several advantages: it is transparent, reproducible, free of arbitrarily selected parameters, and based only on the research output of universities rather than on untraceable or arbitrary questionnaires. It does not provide meaningless absolute rankings; rather, it ranks universities grouped into equivalence classes. We evaluate our method experimentally with data extracted from Microsoft Academic.

Georgios Stoupas, Antonis Sidiropoulos, Dimitrios Katsaros, Yannis Manolopoulos
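The skyline operator at the heart of the abstract can be sketched in a few lines: an object is kept if no other object is at least as good in every dimension and strictly better in at least one. The universities and their (papers, citations)-style score vectors below are invented for illustration:

```python
def dominates(a, b):
    """a dominates b: at least as good in every dimension (higher is better)
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(items):
    """Return the non-dominated (i.e. 'prevailing') items."""
    return {name: v for name, v in items.items()
            if not any(dominates(w, v) for w in items.values() if w != v)}

# Illustrative two-dimensional score vectors, not real ranking data.
unis = {"U1": (90, 40), "U2": (60, 80), "U3": (50, 30), "U4": (70, 70)}
print(sorted(skyline(unis)))  # ['U1', 'U2', 'U4']: U3 is dominated by U1 and U4
```

Repeatedly removing the current skyline and recomputing it on the remainder yields successive layers, which is one way to obtain the equivalence classes the method ranks instead of an absolute ordering.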

Doctoral Consortium

Frontmatter

Supervised Machine Learning Model to Help Controllers Solving Aircraft Conflicts

When two or more airplanes find themselves less than a minimum distance apart on their trajectories, it is called a conflict situation. To solve a conflict, air traffic controllers use various types of information and decide on actions pilots have to apply on the fly. As air traffic increases, so does the controllers’ workload, and making quick and accurate decisions becomes more and more complex for humans. Our research aims at reducing the controllers’ workload and helping them make the most appropriate decisions. More specifically, our PhD goal is to develop a model that learns the best possible action(s) to solve aircraft conflicts based on past decisions or examples. As the first steps in this work, we present a Conflict Resolution Deep Neural Network (CR-DNN) model, the evaluation framework we will follow to evaluate it, and a data set we developed for evaluation.

Md Siddiqur Rahman

Handling Context in Data Quality Management

Data Quality Management (DQM) concerns a wide range of tasks and techniques, largely used by companies and organizations for assessing and improving the quality of their data. Data Quality (DQ) is defined as fitness for use and naturally depends on the application context and usage needs. Moreover, context is embedded in DQM tasks, for example, in the definition of DQ metrics, in the discovery of DQ rules, or in the elicitation of DQ requirements. However, despite its recognized importance for DQM, the literature only manages obvious contextual aspects of data and lacks proposals for context definition, specification, and usage within major DQM tasks. This PhD thesis is at the junction of these two main topics: Data Quality and Context. Our objective is to model context for DQM, exploiting the contextual nature of data at each phase of the DQM process. We aim to provide a general model of context for DQM, an approach for using the model within a DQM project, and a proof of concept in the domain of Digital Government.

Flavia Serra

Backmatter
