
2015 | Book

New Trends in Databases and Information Systems

ADBIS 2015 Short Papers and Workshops, BigDap, DCSA, GID, MEBIS, OAIS, SW4CH, WISARD, Poitiers, France, September 8-11, 2015. Proceedings


About this book

This book constitutes the thoroughly refereed short papers and workshop papers of the 19th East European Conference on Advances in Databases and Information Systems, ADBIS 2015, held in Poitiers, France, in September 2015. The 31 revised full papers and 18 short papers presented were carefully selected and reviewed from 135 submissions. The papers are organized in topical sections on ADBIS Short Papers; Second International Workshop on Big Data Applications and Principles, BigDap 2015; First International Workshop on Data Centered Smart Applications, DCSA 2015; Fourth International Workshop on GPUs in Databases, GID 2015; First International Workshop on Managing Evolving Business Intelligence Systems, MEBIS 2015; Fourth International Workshop on Ontologies Meet Advanced Information Systems, OAIS 2015; First International Workshop on Semantic Web for Cultural Heritage, SW4CH 2015; First International Workshop on Information Systems for AlaRm Diffusion, WISARD 2015.

Table of Contents

Frontmatter

ADBIS Short Papers

Frontmatter
Revisiting the Definition of the Relational Tuple Calculus

The tuple relational calculus is based on classical predicate logic. Databases, however, are not exactly representable in this calculus: they are finite, and this finiteness results in a different semantics. The result of a query must be finite as well and must be based on the values in the database and in the query. Under this constraint, negation and disjunction of query expressions must be defined in a different way. Classical theory has addressed this by restricting the tuple relational calculus, for example to safe formulas.

This paper takes a different turn. We introduce a different definition of the tuple relational calculus and show that this calculus is equivalent to the relational algebra and thus equivalent to the safe tuple relational calculus.
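As a reminder of the finiteness problem such a redefinition must avoid, here is a standard textbook illustration (not taken from the paper itself): a formula whose negation ranges over the whole, possibly infinite, domain, contrasted with a safe variant guarded by a finite relation.

```latex
% Unsafe: the answer contains every tuple of the (possibly infinite) domain
% that is not in R, so it is not determined by the database contents alone.
\{\, t \mid \neg R(t) \,\}

% Safe variant: negation is guarded by the finite relation S, so the result
% is built only from values that actually occur in the database.
\{\, t \mid S(t) \wedge \neg R(t) \,\}
```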

Bader AlBdaiwi, Bernhard Thalheim
A Requirements Specification Framework for Big Data Collection and Capture

The ad hoc processes of data gathering used by most organizations nowadays are proving to be inadequate in an ever-expanding world of information. As a consequence, users are often unable to obtain relevant information from large-scale data collections. Current practice tends to collect bulks of data that most often (1) contain large portions of useless data and (2) lead to longer analysis time frames and thus a longer time to insights. The premise of this paper is that big data analytics can only be successful when it is able to digest captured data and deliver valuable information. Therefore, this paper introduces ‘big data scenarios’ to the domain of data collection. It contributes to a paradigm shift in big data collection through the development of a conceptual model. In times of mass content creation, this model aids in a structured approach to gathering scenario-relevant information from various domain contexts.

Noufa Al-Najran, Ajantha Dahanayake
AutoScale: Automatic ETL Scale Process

In this paper we investigate the problem of providing automatic scalability and data freshness to data warehouses while at the same time dealing with high-rate data efficiently. In general, data freshness is not guaranteed in those contexts, since data loading, transformation and integration are heavy tasks that are performed only periodically, instead of row by row.

Many current data warehouses are designed to be deployed and to work on a single server. However, for many applications, problems related to data volumes, processing times and data rates, together with requirements for fresh and fast responses, require new solutions to be found.

The solution is to use/build parallel architectures and mechanisms to speed-up data integration and to handle fresh data efficiently.

Ideally, users developing data warehouses should concentrate solely on the conceptual and logical design (e.g. business-driven requirements, logical warehouse schemas, workload and ETL process), while physical details, including mechanisms for scalability, freshness and integration of high-rate data, should be left to automated tools.

We propose a universal data warehouse parallelization solution, that is, an approach that enables the automatic scalability and freshness of any data warehouse and ETL process. Our results show that the proposed system scales to provide the desired processing speed.

Pedro Martins, Maryam Abbasi, Pedro Furtado
Using a Domain-Specific Language to Enrich ETL Schemas

Today it is easy to find many tools for defining data migration schemas among different types of information systems. Data migration processes tend to be implemented across a very diverse range of applications, ranging from conventional operational systems to data warehousing platforms. The implementation of a data migration process often involves serious planning, including the development of conceptual migration schemas at early stages. Such schemas help architects and engineers to plan and discuss the most adequate way to migrate data between two different systems. In this paper we present and discuss a way of enriching data migration conceptual schemas in BPMN using a domain-specific language, demonstrating how to convert such enriched schemas to a first corresponding physical representation (a skeleton) in a conventional ETL implementation tool like Kettle.

Orlando Belo, Claudia Gomes, Bruno Oliveira, Ricardo Marques, Vasco Santos
Continuous Query Processing Over Data, Streams and Services: Application to Robotics

Developing applications involving mobile robots is a difficult task which requires technical skills spanning different areas of expertise, mainly computer science, robotics and electronics. In this paper, we propose a SQL-like declarative approach based on data management techniques. The basic idea is to see a multi-robot environment as a set of data, streams and services which can be described at a high level of abstraction. A continuous query processing engine is used in order to optimize data acquisition and data consumption. We propose different scenarios to classify the difficulty of such an integration and a principled approach to deal with the development of multi-robot applications. We provide our first results using a SQL-like language showing that such applications can be devised easily with a few continuous queries.
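To make the idea concrete, here is a minimal, purely illustrative sketch of a multi-robot environment queried declaratively; the engine, stream names and SQL-like syntax are assumptions for illustration, not the language or API actually defined in the paper.

```python
# Hypothetical sketch: robots, sensors and services exposed as streams and
# queried declaratively. QueryEngine, the stream names and the SQL-like syntax
# are illustrative assumptions, not the paper's actual language or API.

class QueryEngine:
    """Toy continuous-query engine: notifies registered queries on new tuples."""

    def __init__(self):
        self.queries = []

    def register(self, sql_like, callback):
        self.queries.append((sql_like, callback))

    def push(self, stream, tuple_):
        # A real engine would parse the query, plan data acquisition and join
        # streams with services; here every tuple of a referenced stream is
        # simply forwarded to the query's callback.
        for sql_like, callback in self.queries:
            if stream in sql_like:
                callback(tuple_)


engine = QueryEngine()
engine.register(
    "SELECT r.id, c.frame FROM robots r, STREAM camera c "
    "WHERE c.robot = r.id AND r.battery > 20",
    callback=lambda t: print("new observation:", t),
)
engine.push("camera", {"robot": 1, "frame": "img_0001"})
```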

Vasile-Marian Scuturici, Yann Gripay, Jean-Marc Petit, Yutaka Deguchi, Einoshin Suzuki
Database Querying in the Presence of Suspect Values

In this paper, we consider the situation where a database may contain suspect values, i.e. precise values whose validity is not certain. We propose a database model based on the notion of possibilistic certainty to deal with such values. The operators of relational algebra are extended in this framework. A very interesting aspect is that queries have the same data complexity as in a classical database context.

Olivier Pivert, Henri Prade
Context-Awareness and Viewer Behavior Prediction in Social-TV Recommender Systems: Survey and Challenges

This paper surveys the landscape of current personalized TV recommender systems and introduces challenges in context-awareness and viewer behavior prediction applied to social TV recommender systems. Real data related to viewers' behaviors and the social context have been collected in real time through a social TV platform. We highlight the future benefits of analyzing viewer behavior and exploiting social influence on viewers' preferences to improve recommendation with respect to changing TV content.

Mariem Bambia, Rim Faiz, Mohand Boughanem
Generalized Bichromatic Homogeneous Vicinity Query Algorithm in Road Network Distance

This paper proposes a bichromatic homogeneous vicinity query method and an efficient algorithm for it. Several query algorithms have been proposed individually, including the set k nearest neighbor (NN) query, ordered NN query, bichromatic reverse k NN query, and distance range query. When these types of queries are performed in road network distance, all of them take a long processing time. The algorithm proposed in this paper gives a unified procedure that can be applied to these queries, with a different query condition for each, and it reduces the processing time drastically. The basic idea of the algorithm is to expand the region on the road network gradually while verifying the query condition. This expansion is the most time-consuming process, and to reduce the cost of the verification step an efficient road network distance search method is proposed. Extensive experiments show that the proposed algorithm improves processing time by nearly two orders of magnitude.
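The gradual region expansion can be pictured as a Dijkstra-style frontier that stops as soon as the query condition is satisfied. The sketch below is an illustrative reconstruction of that idea; the graph layout, the condition interface and the k-nearest-facilities example are assumptions, not the paper's actual algorithm.

```python
import heapq

def incremental_expansion(graph, source, condition):
    """graph: {vertex: [(neighbor, edge_length), ...]} (edges listed one-way for brevity).
    condition: object whose add(vertex, distance) returns False once the
    query is answered, which stops the expansion."""
    dist = {source: 0.0}
    frontier = [(0.0, source)]
    while frontier:
        d, v = heapq.heappop(frontier)
        if d > dist.get(v, float("inf")):
            continue                      # stale queue entry
        if not condition.add(v, d):       # verify the query condition
            break                         # explored region is large enough, stop
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(frontier, (nd, u))
    return condition

class KNearestFacilities:
    """Example condition: collect the k nearest vertices hosting a facility."""
    def __init__(self, facilities, k):
        self.facilities, self.k, self.found = set(facilities), k, []

    def add(self, vertex, distance):
        if vertex in self.facilities:
            self.found.append((vertex, distance))
        return len(self.found) < self.k

graph = {1: [(2, 3.0), (3, 1.0)], 2: [(4, 2.0)], 3: [(4, 5.0)], 4: []}
result = incremental_expansion(graph, 1, KNearestFacilities({2, 4}, k=2))
print(result.found)   # [(2, 3.0), (4, 5.0)]
```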

Yutaka Ohsawa, Htoo Htoo, Naw Jacklin Nyunt, Myint Myint Sein
OLAP4Tweets: Multidimensional Modeling of Tweets

Twitter, a popular microblogging platform, is at the epicenter of the social media explosion, with millions of users able to create and publish short posts, referred to as tweets, in real time. Applying OLAP (On-Line Analytical Processing) to large volumes of tweets is a challenge that would allow the extraction of information, and especially knowledge, such as user behavior, newly emerging issues and trends. In this paper, we pursue the goal of providing a generic multidimensional model dedicated to the OLAP of tweets. The proposed model accounts for specifics such as recursive references between tweets and calculated attributes.

Maha Ben Kraiem, Jamel Feki, Kaîs Khrouf, Franck Ravat, Olivier Teste
Data Warehouse Design Methods Review: Trends, Challenges and Future Directions for the Healthcare Domain

In the context of secondary data use, traditional data warehouse design methods do not address many of today's challenges, particularly in the healthcare domain, where semantics plays an essential role in achieving an effective and implementable integration of heterogeneous data while satisfying core requirements. Forty papers were selected based on seven core requirements: data integrity, sound temporal schema design, query expressiveness, heterogeneous data integration, knowledge/source evolution integration, traceability and guided automation. The proposed methods were compared based on twenty-two comparison criteria. Analysis of the results shows important trends and challenges, among them: (1) a growing number of methods unify knowledge with source structure to obtain a well-defined data warehouse schema built on semantic integration; (2) none of the published methods covers all the core requirements as a whole; and (3) their potential in the real world has not yet been demonstrated.

Christina Khnaisser, Luc Lavoie, Hassan Diab, Jean-Francois Ethier
Incrementally Maintaining Materialized Temporal Views in Column-Oriented NoSQL Databases with Partial Deltas

Unlike relational database systems, each column in a column-oriented NoSQL database (CoNoSQLDB) maintains multiple data versions, each of which carries an explicit timestamp (TS). In this paper, we study how to maintain materialized temporal views (MTVs) in CoNoSQLDBs with partial temporal data changes (deltas). We first review our previous work and show that not all change-data capture (CDC) approaches are able to provide complete deltas. Then, we propose approaches for maintaining MTVs with partial deltas.

Yong Hu, Stefan Dessloch
Towards Self-management in a Distributed Column-Store System

In this paper, we discuss a self-managed distributed column-store system that would adapt its physical design to changing workloads. Architectural novelties of column-stores hold great promise for the construction of an efficient self-managed database. First, we present a short survey of existing self-managed systems. Then, we provide some views on the organization of a self-managed distributed column-store system. We discuss its three core components: the alerter, the reorganization controller and the set of physical design options (actions) available to such a system. We present possible approaches to each of these components and evaluate them. This study is the first step towards the creation of an adaptive distributed column-store system.

George Chernishev
Relational-Based Sensor Data Cleansing

Today, sensors are widely used in many monitoring applications. Due to random environmental effects and/or sensing failures, the collected sensor data is typically noisy. Thus, it is critical to cleanse the data before using it for answering queries or for data analysis. Popular data cleansing approaches, such as classification, prediction and moving averages, are not suited to embedded sensor devices because of their limited storage and processing capabilities. In this paper, we propose a sensor data cleansing approach using relational-based technologies, including constraints, triggers and granularity-based data aggregation. The proposed approach is simple but effective for cleansing different types of dirty data, including delayed data, incomplete data, incorrect data, duplicate data and missing data. We evaluate the proposed strategy to verify its efficiency and effectiveness.
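A minimal sketch of what such rule-based cleansing combined with granularity-based aggregation could look like; the field names, value range and five-minute granularity are illustrative assumptions rather than the paper's actual rules.

```python
from statistics import mean

VALID_RANGE = (-40.0, 85.0)      # assumed plausible bounds for a temperature sensor
GRANULARITY = 300                # aggregate readings into 5-minute buckets (assumption)

def cleanse(readings):
    """readings: iterable of dicts {'ts': int, 'value': float}, possibly dirty."""
    seen, clean = set(), []
    for r in sorted(readings, key=lambda r: r['ts']):            # reorder delayed data
        if r.get('value') is None:                               # incomplete data
            continue
        if not VALID_RANGE[0] <= r['value'] <= VALID_RANGE[1]:   # incorrect data
            continue
        if (r['ts'], r['value']) in seen:                        # duplicate data
            continue
        seen.add((r['ts'], r['value']))
        clean.append(r)
    return clean

def aggregate(clean):
    """Granularity-based aggregation: one averaged value per time bucket;
    empty buckets remain visible as gaps (missing data) for later imputation."""
    buckets = {}
    for r in clean:
        buckets.setdefault(r['ts'] // GRANULARITY, []).append(r['value'])
    return {b * GRANULARITY: mean(vs) for b, vs in buckets.items()}

readings = [{'ts': 10, 'value': 21.5}, {'ts': 10, 'value': 21.5},
            {'ts': 40, 'value': None}, {'ts': 70, 'value': 300.0},
            {'ts': 400, 'value': 22.0}]
print(aggregate(cleanse(readings)))   # {0: 21.5, 300: 22.0}
```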

Nadeem Iftikhar, Xiufeng Liu, Finn Ebertsen Nordbjerg
Avoiding Ontology Confusion in ETL Processes

Extract-Transform-Load (ETL) is a crucial phase in the Data Warehouse (DW) design life-cycle that copes with many issues: data provenance, data heterogeneity, process automation, data refreshment, execution time, etc. Ontologies and Semantic Web technologies have been largely used in the ETL phase. Ontologies are a buzzword used by many research communities, such as Databases, Artificial Intelligence (AI) and Natural Language Processing (NLP), where each community has its own type of ontologies: conceptual canonical ontologies (for databases), conceptual non-canonical ontologies (for AI), and linguistic ontologies (for NLP). In ETL approaches, these three types of ontologies are considered. However, these studies do not take into account the type of the used ontology, which usually affects the quality of the managed data. We propose in this paper a semantic ETL approach which considers both canonical and non-canonical layers. To evaluate the effectiveness of our approach, experiments are conducted using Oracle semantic databases referencing the LUBM benchmark ontology.

Selma Khouri, Sabrina Abdellaoui, Fahima Nader
Towards a Generic Approach for the Management and the Assessment of Cooperative Work

Cooperative work, teamwork and the networking of individuals and collective labor have become key elements in modern organizations. These new forms of work organization have favored the birth of Computer-Supported Cooperative Work (CSCW) tools, which study individual mechanisms as well as group-work collectives and investigate how to assemble many actors with various skills and different prerequisites and make them cooperate. However, despite the enormous benefits of CSCW, few of these tools focus on the assessment aspect. Thus, this paper proposes a generic approach which combines Workflow Management Systems (WfMS) with generic design patterns in order to generate a model for the management, and specifically the assessment, of cooperative processes and their final rendering.

Amina Cherouana, Amina Aouine, Abdelaziz Khadraoui, Latifa Mahdaoui
MLES: Multilayer Exploration Structure for Multimedia Exploration

Traditional content-based retrieval approaches usually use flat querying, where the whole multimedia database is searched for the result of a similarity query with a user-specified query object. However, there are retrieval scenarios (e.g., multimedia exploration) where users may not have a clear search intent in mind; they just want to inspect the content of the multimedia collection. In such scenarios, flat querying is not suitable for the first phases of browsing, because it retrieves the most similar objects and does not offer views of parts of the multimedia space from different perspectives. Therefore, we define a new Multilayer Exploration Structure (MLES) that enables exploration of a multimedia collection at different levels of detail. Using the MLES, we formally define popular exploration operations (zoom-in/out, pan) to enable horizontal and vertical browsing in the explored space, and we discuss several problems related to the area of multimedia exploration.

Juraj Moško, Jakub Lokoč, Tomáš Grošup, Přemysl Čech, Tomáš Skopal, Jan Lánský
SLA Ontology-Based Elasticity in Cloud Computing

A Service Level Agreement (SLA) is the principal means of control for defining Quality of Service (QoS) requirements in cloud computing. These requirements have to be guaranteed in order to avoid costly SLA violations. However, elasticity strategies, which have not yet been deeply considered in SLA documents, may significantly improve the QoS. Therefore, in this paper, our aim is to guarantee the QoS by introducing the semantic meaning of the elasticity strategies into the SLA. In this regard, we propose an ontology-based elasticity approach which provides an elastic cloud service by dynamically applying corrective actions. These corrective actions are the elasticity strategies applied following a violation or in anticipation of a violation. Our approach yields an interactive and flexible SLA document in order to maintain reliable QoS and respect the SLA parameters.

Taher Labidi, Achraf Mtibaa, Faiez Gargouri
Bi-objective Optimization for Approximate Query Evaluation

The problem of effective and efficient approximate query evaluation is addressed as a special case of multi-objective optimization with two criteria: computational resources and result quality. The proposed optimization and execution model provides for an interactive trade of quality for speed.

Anna Yarygina, Boris Novikov

Second International Workshop on Big Data Applications and Principles (BigDap 2015)

Frontmatter
Cross-Checking Data Sources in MapReduce

Fact checking from multiple sources is investigated from different and diverse angles, and the complexity and diversity of the problem call for a wide range of methods and techniques [1]. Fact checking tasks are not easy to perform and, most importantly, it is not clear what kind of computations they involve. Fact checking usually involves a large number of data sources that talk about the same thing, but we are not sure which holds the correct information, or which has any information at all about the query we care for [2]. A join among all or some data sources can guide us through a fact checking process. However, when we want to perform this join in a distributed computational environment such as MapReduce, it is not obvious how to distribute the records of the data sources to the reduce tasks efficiently in order to join any subset of them in a single MapReduce job. In this paper, we show that the nature of such sources (i.e., the fact that they talk about similar things) offers this opportunity, i.e., to distribute the records with low replication. We also show that the multiway algorithm in [3] can be implemented efficiently in MapReduce when the relations in the join have large overlaps in their schemas (i.e., they share a large number of attributes).
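To illustrate the kind of record distribution involved, the sketch below simulates, in plain Python, a share-based map phase in which each reducer corresponds to a cell of a grid indexed by hash buckets of the shared attributes, and a record is replicated only along the attributes its relation lacks. The relations R(A), S(A,B), T(B), the shares and the data are illustrative assumptions, not the setting studied in the paper.

```python
from collections import defaultdict
from itertools import product

SHARES = {'A': 2, 'B': 2}   # hash buckets per shared attribute (assumed)
ATTRS = ['A', 'B']          # order of attributes indexing the reducer grid

def bucket(value, attr):
    return hash(value) % SHARES[attr]

def map_record(relation, record):
    """Yield (reducer_coordinates, (relation, record)) pairs.
    A record goes to one bucket for each attribute it contains and is
    replicated across all buckets of the attributes it lacks."""
    choices = [[bucket(record[a], a)] if a in record else range(SHARES[a])
               for a in ATTRS]
    for coords in product(*choices):
        yield coords, (relation, record)

def reduce_join(payloads):
    """At a single reducer: join the local fragments of R(A), S(A,B), T(B)."""
    by_rel = defaultdict(list)
    for rel, rec in payloads:
        by_rel[rel].append(rec)
    for r, s, t in product(by_rel['R'], by_rel['S'], by_rel['T']):
        if r['A'] == s['A'] and s['B'] == t['B']:
            yield {**r, **s, **t}

# Simulated shuffle: group map output by reducer coordinates, then reduce.
data = [('R', {'A': 1}), ('S', {'A': 1, 'B': 7}), ('T', {'B': 7})]
reducers = defaultdict(list)
for rel, rec in data:
    for coords, payload in map_record(rel, rec):
        reducers[coords].append(payload)
joined = [row for payloads in reducers.values() for row in reduce_join(payloads)]
print(joined)   # [{'A': 1, 'B': 7}]
```

The replication factor of a relation is the product of the shares of the attributes it does not contain, which is why replication stays low when the relations share a large number of attributes.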

Foto Afrati, Zaid Momani, Nikos Stasinopoulos
CLUS: Parallel Subspace Clustering Algorithm on Spark

Subspace clustering techniques were proposed to discover hidden clusters that only exist in certain subsets of the full feature space. However, the time complexity of such algorithms is at most exponential with respect to the dimensionality of the dataset. In addition, datasets are generally too large to fit in a single machine under current big data scenarios. The extremely high computational complexity, which results in poor scalability with respect to both the size and the dimensionality of these datasets, gives us strong motivation to propose a parallelized subspace clustering algorithm able to handle large high-dimensional data. To the best of our knowledge, there are no other parallel subspace clustering algorithms that run on top of new-generation big data distributed platforms such as MapReduce and Spark. In this paper we introduce CLUS: a novel parallel subspace clustering solution based on the SUBCLU algorithm. CLUS uses a new dynamic data partitioning method specifically designed to continuously optimize the varying size and content of the data required for each iteration in order to take full advantage of Spark's in-memory primitives. This method minimizes communication cost between nodes, maximizes their CPU usage, and balances the load among them. Consequently, the execution time is significantly reduced. Finally, we conduct several experiments with a series of real and synthetic datasets to demonstrate the scalability, accuracy and nearly linear speedup of the implementation with respect to the number of nodes.

Bo Zhu, Alexandru Mara, Alberto Mozo
Massively Parallel Unsupervised Feature Selection on Spark

High-dimensional data sets pose important challenges such as the curse of dimensionality and increased computational cost. Dimensionality reduction is therefore a crucial step for most data mining applications, and feature selection techniques allow us to achieve such a reduction. However, it is nowadays common to deal with huge data sets, and most existing feature selection algorithms are designed to work in a centralized fashion, which makes them non-scalable. Moreover, some of them require the selection process to be validated against some target, which constrains their applicability to the supervised learning setting. In this paper we propose a novel parallel, scalable, exact implementation of an existing centralized, unsupervised feature selection algorithm on Spark, an efficient big data framework for large-scale distributed computation that outperforms MapReduce when applied to multi-pass algorithms. We validate the efficiency of the implementation using 1 GB of real Internet traffic captured at a medium-sized ISP.

Bruno Ordozgoiti, Sandra Gómez Canaval, Alberto Mozo
Unsupervised Network Anomaly Detection in Real-Time on Big Data

Network anomaly detection relies on intrusion detection systems based on knowledge databases. However, building this knowledge may take time, as it requires manual inspection by experts. Current detection systems are unable to deal with 0-day attacks or new user behaviors, and as a consequence they may fail to detect intrusions correctly. Unsupervised network anomaly detectors overcome this issue as no previous knowledge is required. On the other hand, these systems may be very slow as they need to learn the traffic's patterns in order to acquire the knowledge necessary to detect anomalous flows. To improve speed, these systems are often only exposed to sampled traffic, so harmful traffic may avoid the detector's examination. In this paper, we propose to take advantage of new distributed computing frameworks in order to speed up an Unsupervised Network Anomaly Detector Algorithm, UNADA. The evaluation shows that the execution time can be improved by a factor of 13, allowing UNADA to process large traces of traffic in real time.

Juliette Dromard, Gilles Roudière, Philippe Owezarski
NPEPE: Massive Natural Computing Engine for Optimally Solving NP-complete Problems in Big Data Scenarios

Networks of Evolutionary Processors (NEP) is a bio-inspired computational model defining theoretical computing devices able to solve NP-complete problems in an efficient manner. Networks of Polarized Evolutionary Processors (NPEP) is an evolution of the NEP model that presents a simpler and more natural filtering strategy to simulate the communication between cells. Up to now, it has not been possible to implement these models either in vivo or in vitro. Therefore, the only way to analyze and execute NPEP devices is by means of ultra-scalable simulators able to encapsulate the inherent parallelism in their computations. Nowadays, there is a lack of such simulators able to handle the size of non-trivial problems in a massively distributed computing environment. We propose NPEPE, a novel high-scalability engine that runs NPEP descriptions using Apache Giraph on top of Hadoop platforms. Giraph is the open source counterpart of Google Pregel, an iterative graph processing system built for high scalability. NPEPE takes advantage of the inherent parallelism and scalability of Giraph and Hadoop to deploy and run massive networks of NPEPs. We show several experiments to demonstrate that NPEP descriptions can be easily deployed and run using the NPEPE engine on a Giraph+Hadoop platform. To this end, the well-known 3-colorability NP-complete problem is described as a network of NPEPs and run on a 10-node cluster.

Sandra Gómez Canaval, Bruno Ordozgoiti Rubio, Alberto Mozo
Andromeda: A System for Processing Queries and Updates on Big XML Documents

In this paper we present Andromeda, a system for processing queries and updates on large XML documents. The system is based on the idea of statically and dynamically partitioning the input document, so as to distribute the computing load among the machines of a Map/Reduce cluster.

Nicole Bidoit, Dario Colazzo, Carlo Sartiani, Alessandro Solimando, Federico Ulliana
Fast and Effective Decision Support for Crisis Management by the Analysis of People’s Reactions Collected from Twitter

The impact of human behavior during a crisis is a crucial factor that should always be taken into account by emergency managers. An early estimation of people's reactions can be performed through information posted on social networks. This paper proposes a platform for the extraction of real-time information about an ongoing crisis from social networks, in order to understand the main concerns raised by users involved in the crisis. Such information is combined with other contextual data in order to estimate the impact of the different alternative actions that decision makers can undertake.

Antonio Attanasio, Louis Jallet, Antonio Lotito, Michele Osella, Francesco Ruà
Adaptive Quality of Experience: A Novel Approach to Real-Time Big Data Analysis in Core Networks

Mobile networks are more and more a key element in how users access services. Users' perception of access to services is difficult to grasp, and difficult to enhance once degraded. Adaptive Quality of Experience (AQoE) aims to provide a comprehensive framework for predicting and preventing QoE degradation situations. This paper provides a high-level architectural model for managing communication service providers' (CSPs) mobile networks. AQoE relies on real-time big data analysis techniques to measure and anticipate QoE, allowing CSPs to take action to maintain a certain level of QoE. Thus, CSPs can adapt users' QoE dynamically and pro-actively in order to deal with service experience degradations in mobile networks.

Alejandro Bascuñana, Manuel Lorenzo, Miguel-Ángel Monjas, Patricia Sánchez
A Review of Scalable Approaches for Frequent Itemset Mining

Frequent Itemset Mining is a popular data mining task with the aim of discovering frequently co-occurring items and, hence, correlations, hidden in data. Many attempts to apply this family of techniques to Big Data have been presented. Unfortunately, few implementations proved to efficiently scale to huge collections of information. This review presents a comparison of a carefully selected subset of the most efficient and scalable approaches. Focusing on Hadoop and Spark platforms, we consider not only the analysis dimensions typical of the data mining domain, but also criteria to be valued in the Big Data environment.

Daniele Apiletti, Paolo Garza, Fabio Pulvirenti

First International Workshop on Data Centered Smart Applications (DCSA 2015)

Frontmatter
A Mutual Resource Exchanging Model and Its Applications to Data Analysis in Mobile Environment

In this paper, a mutual resource exchanging model for the mobile computing environment and its application to data analysis are introduced. Resource exchanging is a key issue in data analysis because the only way to use the limited resources of a mobile environment efficiently is to exchange them with each other. This paper also demonstrates the applicability of the model through applications of (1) a universal battery, (2) bandwidth sharing, (3) 3D movie production by collective intelligence, and (4) mobile data analysis for motion sensors. The results of the data analysis clarify the applicability of this model.

Naofumi Yoshida
Detection of Trends and Opinions in Geo-Tagged Social Text Streams

This paper describes an application of social media, database and data-mining techniques to the analysis of conflicting trends and opinions in a spatial area. This setup was used to demonstrate the distribution of interests during a global event and can be used for several social media data-mining tasks, such as trend prediction, sentiment analysis and social-psychological feedback tracing. To this end, the application clusters trends in social text media streams, such as Twitter, and detects opinion differences within a single trend based on the temporal, spatial and semantic-pragmatic dimensions. The data is stored in a multidimensional space, expanded over time, to detect correlations and combine similar trends into clusters. The result of this work is a system focused on tracing several clusters of conflict within the same trend, as opposed to the common approach of tag-based filtering and sorting by occurrence count.

Jevgenij Jakunschin, Andreas Heuer, Antje Raab-Düsterhöft
Software Architecture for Collaborative Crowd-Storming Applications

Diversity in crowdsourcing systems in terms of processes, participants, workflows and technologies is quite problematic, as there is no standard guidance to inform the design process of such crowdsourcing ideation systems. To build a well-engineered crowdsourcing system with different ideation and collaboration components, a software architecture model is needed to guide the design of the collaborative ideation process. Within this general context, this paper focuses on creating a vision for the architectural design of crowdsourced collaborative idea generation, and attempts to provide the features required for coordinating and aggregating the ideation of individual participants into collective solutions.

Nouf Jaafar, Ajantha Dahanayake
Gamification in Saudi Society: A Framework to Develop Human Values for Early Generations

In a technology era where evolution is faster than we can imagine, there is no doubt that our daily life is mostly driven by technology: in learning, in arranging to-do lists, in our cars and mobile phones, and even in games. Gaming nowadays is no longer used only for entertainment; there is a revolution in a young field called gamification, which combines learning with entertainment. It is used in many fields such as social media, schools, sports, marketing and even at NASA, and gamification has spread so quickly in the past few years that even Oxford has added the word “gamification” to its dictionary. We aim to use gamification to teach values to the young generation. The research title will be “Using Gamification to Develop Human Values for Children in Saudi Arabia”.

Alia AlBalawi, Bariah AlSaawi, Ghada AlTassan, Zaynab Fakeerah

Fourth International Workshop on GPUs in Databases (GID 2015)

Frontmatter
Big Data Conditional Business Rule Calculations in Multidimensional In-GPU-Memory OLAP Databases

The ability to handle Big Data is one of the key requirements of today's database systems. Calculating conditional business rules in OLAP scenarios means creating virtual cube cells out of previously stored database entries and precalculated aggregates based on a given condition. It requires several steps such as source data filtering, aggregation and conditional analysis, each involving the storage of intermediate results, which can easily get very large. Therefore, algorithms that stream data instead of calculating the results in one step are essential for processing big sets of data without exceeding the hardware limitations. This paper shows how the evaluation of conditional business rules can be accelerated using GPUs and massively data-parallel streaming algorithms written in CUDA.

Alexander Haberstroh, Peter Strohm
Optimizing Sorting and Top-k Selection Steps in Permutation Based Indexing on GPUs

Permutation-based indexing is one of the most popular techniques for the approximate nearest-neighbor search problem in high-dimensional spaces. Due to the exponential increase of multimedia data, the time required to index this data has become a serious constraint of current techniques. One of the possible steps towards faster index construction is the utilization of massively parallel platforms such as the GPGPU architectures. In this paper, we have focused on two particular steps of permutation index construction – the selection of top-k nearest pivot points and sorting these pivots according to their respective distances. Even though these steps are integrated into a more complex algorithm, we address them selectively since they may be employed individually for different indexing techniques or query processing algorithms in multimedia databases. We also provide a discussion of alternative approaches that we have tested but which have proved less efficient on present hardware.
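For readers unfamiliar with the data structure, the two steps can be sketched on the CPU as follows; the Euclidean metric, the data layout and the use of a heap for selection are illustrative assumptions, since the paper's contribution lies in GPU kernels for these steps.

```python
import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permutation_prefix(obj, pivots, k):
    """Return the ids of the k pivots nearest to obj, ordered by distance."""
    # Distance computation against all pivot points.
    dists = [(euclidean(obj, p), i) for i, p in enumerate(pivots)]
    # Top-k selection of the nearest pivots (a separate GPU kernel in the
    # paper; heapq.nsmallest happens to return them already ordered).
    top_k = heapq.nsmallest(k, dists)
    # Sort the selected pivots by distance to form the permutation prefix.
    top_k.sort()
    return [pivot_id for _, pivot_id in top_k]

pivots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(permutation_prefix((0.2, 0.1), pivots, k=3))   # -> [0, 1, 2]
```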

Martin Kruliš, Hasmik Osipyan, Stéphane Marchand-Maillet

First International Workshop on Managing Evolving Business Intelligence Systems (MEBIS 2015)

Frontmatter
E-ETL Framework: ETL Process Reparation Algorithms Using Case-Based Reasoning

External data sources (EDSs) being integrated in a data warehouse (DW) frequently change their structures/schemas. As a consequence, in many cases, an already deployed ETL workflow stops its execution, yielding errors. Since structural changes of EDSs are frequent, an automatic reparation of an ETL workflow after such a change is of high importance. In this paper we present a framework, called E-ETL, for handling the evolution of an ETL layer. In the framework, an ETL workflow is semi-automatically or automatically (depending on the case) repaired as the result of structural changes in data sources, so that it works with the changed data sources. E-ETL supports three different reparation methods, but in this paper we discuss the one that is based on case-based reasoning. The proposed framework is being developed as a module external to an ETL engine, so that it can work with any engine that provides an API for manipulating ETL workloads.

Artur Wojciechowski
Handling Evolving Data Warehouse Requirements

A data warehouse is a dynamic environment and its business requirements tend to evolve over time; therefore, it is necessary not only to handle changes in data warehouse data, but also to adjust the data warehouse schema in accordance with changes in requirements. In this paper, we propose an approach to propagate modified data warehouse requirements into data warehouse schemata. The approach supports versions of data warehouse schemata and employs a requirements formalization metamodel and a multiversion data warehouse metamodel to identify the necessary changes in a data warehouse.

Darja Solodovnikova, Laila Niedrite, Natalija Kozmina
Querying Multiversion Data Warehouses

Data warehouses (DWs) change in their content and structure due to changes in the feeding sources, business requirements, the modeled reality, and legislation, to name a few. Keeping the history of changes in the content and structure of a DW enables the user to analyze the state of the business world retrospectively or prospectively. Multiversion data warehouses (MVDWs) keep the history of content and structure changes by creating multiple data warehouse versions. Querying such DWs is complex as data is stored in multiple schema versions. In this paper, we discuss various schema changes in a multidimensional model, and elaborate their impact on the queries. Further, we also propose a system to support querying MVDWs.

Waqas Ahmed, Esteban Zimányi
CUDA-Powered CTBE Algorithm for Zero-Latency Data Warehouse

Systems dedicated to Zero-Latency Data Warehouses must meet growing requirements for the most up-to-date data. The sequential algorithms currently used are not suited to dealing with the pressure for receiving the freshest data. The one-module architecture implemented in current solutions limits development opportunities and increases the risk of critical system failure. In this paper we propose a new, innovative, multi-modular system based on the parallel Choose Transaction by Election (CTBE) algorithm. Additionally, we utilize the CUDA architecture to boost system efficiency, using the computing power of multi-core graphics processors. The aim of this paper is to highlight the pros and cons of such a solution. The tests performed and their results show the potential and capabilities of the multi-modular system using the CUDA architecture.

Marcin Gorawski, Damian Lis, Anna Gorawska

Fourth International Workshop on Ontologies Meet Advanced Information Systems (OAIS 2015)

Frontmatter
Mobile Co-Authoring of Linked Data in the Cloud

The powerful evolution of the hardware, software and data connectivity of mobile devices (such as smartphones and tablets) stimulates people to publish and share their personal data (like social network information or sensor readings) independently of spatial and temporal constraints. To do this, an efficient semantic web collaborative editor for mobile devices is needed. However, collaboratively editing a shared semantic web document in real time through ad-hoc peer-to-peer mobile networks requires increasing amounts of computation, data storage and network communication. In this paper, we propose a new cloud service-based model that allows real-time co-authoring of Linked Data (LD) as an RDF (Resource Description Framework) graph using mobile devices. Our model is built upon two layers: (i) a cloning engine that enables users to clone their mobile devices in the cloud to delegate the overload of collaborative tasks, and that provides peer-to-peer networks where users can create ad-hoc groups; (ii) a collaborative engine that allows a shared RDF graph to be updated freely and concurrently in a peer-to-peer fashion without requiring a central server. This work represents a step forward toward a practical and flexible co-authoring environment for LD.

Moulay Driss Mechaoui, Nadir Guetmi, Abdessamad Imine
Ontology Based Linkage Between Enterprise Architecture, Processes, and Time

In a highly dynamic social and business environment, time becomes one of the most treasured resources of companies and individuals. However, not many research works are devoted to the explicit analysis of time issues. Therefore, despite a large number of enterprise and business process representation and analysis tools, it is still impossible to address time to its full extent in systems modeling and analysis. This paper proposes a linkage between enterprise architecture, process, and time models to promote the development of methods for managing time issues in systems development, maintenance and change management. The linkage is rooted in Bunge's systems ontology and concerns the time ontology edited by Hobbs and Pan.

Marite Kirikova, Ludmila Penicina, Andrejs Gaidukovs
Fuzzy Inference-Based Ontology Matching Using Upper Ontology

Bio-ontologies are characterized by large sizes, and a large number of smaller ontologies are derived from them. Determining semantic correspondences across these smaller ones can be based on such an “upper” ontology. To this end, we introduce a new fuzzy inference-based ontology matching approach exploiting upper ontologies as semantic bridges in the matching process. The approach comprises two main steps: first, a fuzzy inference-based matching method is used to determine the confidence values in the ontology matching process; to learn the fuzzy system parameters and to enhance the adaptability of the fuzzy membership function parameters, we exploit a gradient discriminate learning technique. Second, the achieved results are composed and combined to derive the final match result. The experimental results show that the performance of the proposed approach is acceptable compared to one of the well-known benchmarks.

S. Hashem Davarpanah, Alsayed Algergawy, Samira Babalou
An Ontology-Based Approach for Handling Explicit and Implicit Knowledge over Trajectories

Current information systems manage several different and huge databases. The data can be temporal, spatial, or belong to other application domains with specific knowledge. For these reasons, new approaches must be designed to fully exploit data expressiveness and heterogeneity while taking into account the application's needs. As part of ontology-based information system design, this paper proposes an ontology modeling approach for trajectories of moving objects. Considering domain, temporal and spatial knowledge adds complexity to our system, and we propose optimizations for annotating data with this knowledge.

Rouaa Wannous, Cécile Vincent, Jamal Malki, Alain Bouju
Interpretation of DD-LOTOS Specification by C-DATA*

The DD-LOTOS language is defined for the formal specification of distributed real-time systems. The peculiarity of this language compared to existing languages is that it takes into account the distributed aspect of real-time systems. DD-LOTOS has been defined on a semantic model of true concurrency, i.e. the semantics of maximality. Our work focuses on the translation of DD-LOTOS specifications into an adequate semantic model. The destination model is a communicating timed automaton with durations of actions and temporal constraints that supports communication between localities; this model is called C-DATA*.

Maarouk Toufik Messaoud, Saidouni Djamel Eddine, Mahdaoui Rafik, Houassi Hichem

First International Workshop on Semantic Web for Cultural Heritage (SW4CH 2015)

Frontmatter
Knowledge Representation in EPNet

Semantic technologies are rapidly changing historical research. This paper focuses on the knowledge representation and data modelling initiative that has characterised the first year of the EPNet project in the context of historical research. The so-called EPNet CRM and Ontology are introduced here and put in connection with existing modelling standards. The formal specification of the domain experts' knowledge is the preliminary step toward the design of an innovative ‘Virtual Research Environment’ for scholars of the Roman Empire. The potential and actual benefits of the knowledge representation initiative are also discussed, with the main aim of encouraging experts in the humanities to embrace an innovative paradigm whose proven efficacy can positively affect their current research practices.

Alessandro Mosca, José Remesal, Martin Rezk, Guillem Rull
A Pattern-Based Framework for Best Practice Implementation of CRM/FRBRoo

The CIDOC Conceptual Reference Model and extensions of this model, such as FRBRoo, are important for semantic interoperability in the area of cultural heritage documentation. However, the real-life use of such reference models is challenging due to their complexity as well as their extensive and detailed nature. In this paper we present a framework for sharing best practice knowledge related to the use of these models, inspired by the use of design patterns in software engineering. The main contribution is a framework for sharing and promoting knowledge and solutions for best practice that supplements existing documentation and is adapted to the needs of developers.

Trond Aalberg, Audun Vennesland, Maliheh Farrokhnia
Application of CIDOC-CRM for the Russian Heritage Cloud Platform

This paper describes the usage of the CIDOC-CRM ontology for the online representation of cultural heritage data based on class templates. It also describes the motivation for choosing the CIDOC-CRM ontology as the basis for the Russian Heritage Cloud project, a recent collaboration between ITMO University and a number of museums in Russia.

Eugene Cherny, Peter Haase, Dmitry Mouromtsev, Alexey Andreev, Dmitry Pavlov
Designing for Inconsistency – The Dependency-Based PERICLES Approach

The rise of the Semantic Web has provided cultural heritage researchers and practitioners with several tools for ensuring semantic-rich representations and interoperability of cultural heritage collections. Although indeed offering a lot of advantages, these tools, which come mostly in the form of ontologies and related vocabularies, do not provide a conceptual model for capturing contextual and environmental dependencies contributing to long-term digital preservation. This paper presents one of the key outcomes of the PERICLES FP7 project, the Linked Resource Model, for modelling dependencies as a set of evolving linked resources. The proposed model is evaluated via a domain-specific representation involving digital video art.

Jean-Yves Vion-Dury, Nikolaos Lagos, Efstratios Kontopoulos, Marina Riga, Panagiotis Mitzias, Georgios Meditskos, Simon Waddington, Pip Laurenson, Ioannis Kompatsiaris
A Semantic Exploration Method Based on an Ontology of 17th Century Texts on Theatre: la Haine du Théâtre

This paper proposes a method to explore a collection of texts with an ontology built for a particular point of view. The first part of the paper points out the characteristics of the corpus, composed of 17th century French texts. The second part explains the methodology used to isolate the discriminant terms for creating the ontology. Finally, the paper shows how the ontology is projected onto the texts and how the corpus can be explored through the resulting semantic fields.

Chiara Mainardi, Zied Sellami, Vincent Jolivet
Combining Semantic and Collaborative Recommendations to Generate Personalized Museum Tours

Our work takes place in the field of support systems for museum visits and access to cultural heritage. Visitors of museums are often overwhelmed by the information available in the space they are exploring, so finding relevant artworks to see in a limited amount of time is a difficult task. Our goal is to design a recommender system for mobile devices that adapts to the users' preferences and is sensitive to their contexts (location, time, expertise, etc.). This system aims to improve the visitors' experience and to help them build their tours on-site according to their preferences and constraints. In this paper we describe our recommendation framework, a hybrid recommender system that combines a semantic approach to the representation of museum knowledge, using ontologies and thesauri, with a semantically-enhanced collaborative filtering method. A contextual post-filtering enables the generation of a highly personalized tour based on the physical environment, the location of the visitors and the time they want to spend in the museum. This work is applied to the Compiègne Imperial Palace museum in Picardy.

Idir Benouaret, Dominique Lenne
A Novel Vision for Navigation and Enrichment in Cultural Heritage Collections

In the cultural heritage domain, there is a huge interest in utilizing semantic web technology and building services that enable users to query, explore and access the vast body of cultural heritage information created over decades by memory institutions. For a successful conversion of existing data into semantic web data, however, there is often a need to enhance and enrich the legacy data in order to validate and align it with other resources and reveal its full potential. In this visionary paper, we describe a framework for semantic enrichment that relies on the creation of thematic knowledge bases, i.e., knowledge bases about a given topic. These knowledge bases aggregate information by exploiting structured resources (e.g., the Linked Open Data cloud) and by extracting new relationships from streams (e.g., Twitter) and textual documents (e.g., web pages). Our focused application in this paper is how this approach can be utilized when transforming library records into semantic web data based on the FRBR model, in the process commonly called FRBRization.

Joffrey Decourselle, Audun Vennesland, Trond Aalberg, Fabien Duchateau, Nicolas Lumineau
Improving Retrieval of Historical Content with Entity Linking

The relevance of Named-Entity Recognition and Entity Linking for cultural heritage institutions is evaluated through a case-study involving the semantic enrichment of historical periodicals. A language-independent approach is proposed in order to improve the search experience of end-users with the mapping of entities to the Linked Open Data (LOD) cloud. Preliminary results show that a precision rate of almost 90% can be achieved with very little fine-tuning, while an increase in recall remains necessary.

Max De Wilde
Disambiguation of Named Entities in Cultural Heritage Texts Using Linked Data Sets

This paper proposes a graph-based algorithm baptized REDEN for the disambiguation of authors’ names in French literary criticism texts and scientific essays from the 19th century. It leverages knowledge from different Linked Data sources in order to select candidates for each author mention, then performs fusion of DBpedia and BnF individuals into a single graph, and finally decides the best referent using the notion of graph centrality. Some experiments are conducted in order to identify the best size of disambiguation context and to assess the influence on centrality of specific relations represented as edges. This work will help scholars to trace the impact of authors’ ideas across different works and time periods.
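A minimal sketch of the centrality-based decision step, using networkx for illustration; the graph here is hard-coded with made-up URIs, whereas the actual method builds it by fusing DBpedia and BnF resources for all mentions found in the disambiguation context.

```python
import networkx as nx

def pick_referent(graph, candidate_uris):
    """Choose the candidate with the highest degree centrality in the fused graph."""
    centrality = nx.degree_centrality(graph)
    return max(candidate_uris, key=lambda uri: centrality.get(uri, 0.0))

# Toy fused graph: made-up URIs standing in for DBpedia/BnF individuals linked
# by relations found around the author mention being disambiguated.
g = nx.Graph()
g.add_edges_from([
    ("ex:Hugo_the_novelist", "ex:Les_Miserables"),
    ("ex:Hugo_the_novelist", "ex:Romanticism"),
    ("ex:Hugo_the_novelist", "bnf:ex_Hugo"),
    ("ex:Hugo_the_painter", "ex:Painting"),
])
mention_candidates = ["ex:Hugo_the_novelist", "ex:Hugo_the_painter"]
print(pick_referent(g, mention_candidates))   # -> ex:Hugo_the_novelist
```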

Carmen Brando, Francesca Frontini, Jean-Gabriel Ganascia

First International Workshop on Information Systems for AlaRm Diffusion (WISARD 2015)

Frontmatter
Abduction for Analysing Data Exchange Policies

This paper addresses the question of checking the quality of the data exchange policies which exist in organizations to regulate data exchanges between members. More particularly, we address the question of generating the situations that are compliant with the policy but in which a given property is unsatisfied. We show that this reduces to a problem of abduction, and we propose to use an algorithm based on SOL-resolution. Our contributions are illustrated on a case study. (This research was supported by ONERA.)

Laurence Cholvy
ADMAN: An Alarm-Based Mobile Diabetes Management System for Mobile Geriatric Teams

In this article, we introduce ADMAN, an alarm-based diabetes management system at the disposal of Mobile Geriatric Teams (MGT). The system aims to provide a form of remote monitoring in order to control the diabetes rate of elderly patients. The system is multidimensional in that it resides on the patient's mobile device on one side and on the doctor's mobile device on the other, and it can be connected to any other entity of the MGT handling the case (e.g. a dietitian).

Dana Al Kukhun, Bouchra Soukkarieh, Florence Sèdes
An Architectural Roadmap Towards Building an Alarm Diffusion System

Alarm Diffusion Systems (ADS) have complex, exacting and critical requirements. In this work we aim to provide a software architecture perspective on ADS. We look at both functional and quality requirements for an ADS, identify certain quality attributes specific to an ADS, and provide a set of architectural tactics to realise them. We also propose a Reference Architecture for designing such systems. We provide ample examples to support our inferences and take a deeper look at a case study of the Traffic Collision Avoidance System (TCAS) in aircraft.

Sumit Kalra, T. V. Prabhakar, Saurabh Srivastava
Information Exchange Policies at an Organisational Level: Formal Expression and Analysis

This paper starts from a logical framework intended to define and analyse information exchange policies for critical information systems. A layer is introduced to express organisational information exchange policies at an abstract level. Properties are defined within this organisational layer, in particular information permeability through organisations. Greater efficiency is expected for policy expression and analysis.

Claire Saurel
Critical Information Diffusion Systems

Today, individuals, companies, organizations and national agencies are increasingly interconnected, forming complex and decentralized information systems. In some of these systems, the very fact of exchanging information can constitute a safety-critical concern in itself. Take for instance the prevention of natural disasters: a set of actors share their observation data and information in order to better manage crises by warning the most competent authorities. The aim of this article is to define this kind of system, which we name Critical Information Diffusion Systems. In addition, we discuss why Critical Information Diffusion Systems need information exchange policies.

Rémi Delmas, Thomas Polacsek
A Case Study on the Influence of the User Profile Enrichment on Buzz Propagation in Social Media: Experiments on Delicious

The user is the main contributor of information in social media, and in these media users are influenced by the information shared through the network. In a social context, there is the so-called “buzz”, a technique for making noise around an event. This technique causes several users to become interested in the event at a given time t; a buzz is thus information that is popular at a specific time. A buzz may be a fact (true information) or a rumour (fake, false information). We are interested in studying buzz propagation through time in the social network Delicious. We also study the influence of the enriched user profiles that we proposed [2] on the propagation of buzz in the same social network. In this paper, we present a case study on information from the social network Delicious, which contains social annotations (tags) provided by users. These tags contribute to influencing other users to follow this information or to use it. The study relies on three main axes: (1) we focus on tags considered as buzz and analyse their propagation through time; (2) we consider a user profile as the set of tags provided by the user, and we use the result of our previous work on dynamic user profile enrichment to analyse the influence of this enrichment on buzz propagation; (3) we analyse each enriched user profile in order to show whether the enrichment approach anticipates buzz propagation. In this way, we can see the interest of filtering the information in order to avoid potential rumours and thus propose relevant results to the user (e.g. avoid “bad” recommendations).

Manel Mezghani, Sirinya On-at, André Péninou, Marie-Françoise Canut, Corinne Amel Zayani, Ikram Amous, Florence Sedes
Backmatter
Metadata
Title
New Trends in Databases and Information Systems
Editors
Tadeusz Morzy
Patrick Valduriez
Ladjel Bellatreche
Copyright Year
2015
Electronic ISBN
978-3-319-23201-0
Print ISBN
978-3-319-23200-3
DOI
https://doi.org/10.1007/978-3-319-23201-0
