
2015 | Book

Data Management Technologies and Applications

Third International Conference, DATA 2014, Vienna, Austria, August 29-31, 2014, Revised Selected Papers

About This Book

This book constitutes the thoroughly refereed proceedings of the Third International Conference on Data Management Technologies and Applications, DATA 2014, held in Vienna, Austria, in August 2014.

The 12 revised full papers were carefully reviewed and selected from 87 submissions. The papers cover the following topics: databases, data warehousing, data mining, data management, data security, knowledge and information systems and technologies, and advanced applications of data.

Table of Contents

Frontmatter
Using Information Visualization to Support Open Data Integration
Abstract
Data integration has always been a major problem in computer science. The more heterogeneous, large, and distributed data sources become, the more difficult the data integration process is. Nowadays, more and more information is being made available on the Web, especially through the Open Data (OD) movement. Large quantities of datasets are published and accessible. Besides size, heterogeneity grows as well: datasets exist in different formats and shapes (tabular files, plain-text files, and so on). The ability to efficiently interpret and integrate such datasets is of paramount importance for their potential users. Information Visualization may be an important tool to support this OD integration process. This article presents problems that can be encountered in the data integration process and, more specifically, in the OD integration process. It also describes how Information Visualization can support the OD integration process and make it more effective, friendlier, and faster.
Paulo Carvalho, Patrik Hitzelberger, Benoît Otjacques, Fatma Bouali, Gilles Venturini
Security Issues in Distributed Data Acquisition and Management of Large Data Volumes
Abstract
The Internet is faced with new application scenarios such as smart homes, smart traffic control and guidance systems, smart power grids, and smart buildings. They all have in common that they require a high degree of robustness, reliability, scalability, safety, and security. This paper provides a list of criteria for these properties and focuses on the aspect of data exchange and management. It introduces a security concept for scalable and easy-to-use Secure Generic Data Services, called SeGDS, which covers application scenarios ranging from embedded field devices for data acquisition to large-scale generic data applications and data management. Our concept is based largely on proven standard solutions and uses standard enterprise hardware. The first application deals with the transport and management of mass data originating from high-resolution electrical data devices, which measure parameters of the electrical grid at a high sample rate.
Alexander Kramer, Wilfried Jakob, Heiko Maaß, Wolfgang Süß
Large-Scale Residential Energy Maps: Estimation, Validation and Visualization. Project SUNSHINE - Smart Urban Services for Higher Energy Efficiency
Abstract
This paper illustrates the preliminary results of a research project focused on the development of a Web 2.0 system designed to compute and visualize large-scale building energy performance maps. The system uses emerging platform-independent technologies such as WebGL for data presentation, an extended version of the EU-funded project TABULA/EPISCOPE for the automatic calculation of building energy parameters, and the OGC CityGML standard as data container. The proposed architecture will allow citizens, public administrations, and government agencies to perform city-wide analyses of the energy performance of building stocks.
Umberto di Staso, Luca Giovannini, Marco Berti, Federico Prandi, Piergiorgio Cipriano, Raffaele De Amicis
A Researcher’s View on (Big) Data Analytics in Austria: Results from an Online Survey
Abstract
We present results from questionnaire data collected from leading data analytics researchers and experts across Austria. The online survey addresses pressing questions in the area of (big) data analysis. Our findings provide valuable insights into what top Austrian data scientists think about data analytics, which application areas they consider can benefit most from big data and data processing, the challenges of the future, how soon these challenges will become relevant, and the potential research topics of tomorrow. We visualize the results, summarize our findings, and suggest a roadmap for future decision making.
Ralf Bierig, Allan Hanbury, Florina Piroi, Marita Haas, Helmut Berger, Mihai Lupu, Michael Dittenbach
Accurate Data Cleansing through Model Checking and Machine Learning Techniques
Abstract
Most researchers agree that the quality of real-life data archives is often very poor, which makes the definition and realisation of automatic data-cleansing techniques a relevant issue. In this scenario, the Universal Cleansing framework has recently been proposed to automatically identify the most accurate cleansing alternatives among those synthesised through model-checking techniques. However, the identification of some values of the cleansed instances still relies on rules defined by domain experts and common practice, due to the difficulty of deriving them automatically (e.g. the date value of an event to be added).
In this paper we extend this framework by including well-known machine learning algorithms, trained on the data recognised as consistent, to identify the information that the model-based cleanser could not produce. The proposed framework has been implemented and successfully evaluated on a real dataset describing the working careers of a population.
Roberto Boselli, Mirko Cesarini, Fabio Mercorio, Mario Mezzanzanica
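As a rough illustration of the extension described above, the sketch below trains a standard classifier on records already recognised as consistent and uses it to predict a field value that a model-based cleanser cannot derive. All feature names and data are invented for the example; the paper’s actual feature set and learning algorithms are not specified here.

```python
# Hypothetical sketch: impute a missing field by training a classifier
# on records already recognised as consistent. Features and values are
# illustrative placeholders, not taken from the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Consistent career records: (previous_state, event_type) -> contract_kind
X_consistent = np.array([[0, 1], [0, 2], [1, 1], [1, 2], [2, 1]])
y_consistent = np.array([1, 0, 1, 0, 1])

model = DecisionTreeClassifier().fit(X_consistent, y_consistent)

# For a record the model-based cleanser cannot complete, predict the value.
incomplete_record = np.array([[2, 2]])
print(model.predict(incomplete_record))
```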
Social Influence and Influencers Analysis: A Visual Perspective
Abstract
Identifying influencers is an important step towards understanding how information spreads within a network. Social networks follow a power-law degree distribution of nodes, with a few hub nodes and a long tail of peripheral nodes. While consolidated approaches exist for the identification and characterization of hub nodes, research on the analysis of the multi-layered distribution of peripheral nodes is limited. In social media, hub nodes represent social influencers. However, the literature provides evidence of the multi-layered structure of influence networks, emphasizing the distinction between influencers and influence. Information seems to spread following multi-hop paths across nodes in peripheral network layers. This paper proposes a visual approach to the graphical representation and exploration of peripheral layers and clusters by exploiting the theory of k-shell decomposition analysis. The core concept of the proposed approach is to partition the node set of a graph into pre-defined hub and peripheral nodes. Then, a power-law-based modified force-directed method is applied to clearly display local multi-layered neighborhood clusters around hub nodes, based on a characterization of the content of messages that we refer to as content specificity. We put forward three hypotheses that allow the graphical identification of the peripheral nodes that are most likely to be influential and contribute to the spread of information. The hypotheses are tested on a large sample of tweets from the tourism domain.
Chiara Francalanci, Ajaz Hussain
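The k-shell decomposition at the heart of this approach can be computed with off-the-shelf graph libraries. The sketch below, assuming networkx and a synthetic power-law-like graph, partitions nodes into hubs (highest shell) and peripheral nodes (lower shells); the threshold choice is an assumption, not the paper’s exact rule.

```python
# Minimal sketch of k-shell decomposition; the hub/peripheral split
# (hubs = maximal shell) is an illustrative assumption.
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3, seed=42)  # power-law-like degrees
shell_index = nx.core_number(G)                 # k-shell index per node

k_max = max(shell_index.values())
hubs = [n for n, k in shell_index.items() if k == k_max]
peripheral = [n for n, k in shell_index.items() if k < k_max]
print(len(hubs), "hub nodes,", len(peripheral), "peripheral nodes")
```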
Using Behavioral Data Mining to Produce Friend Recommendations in a Social Bookmarking System
Abstract
Social recommender systems have been developed to filter the large amounts of data generated by social media systems. A type of social media known as a social bookmarking system allows users to tag bookmarks of interest and to share them. Although the popularity of these systems is increasing, and even though users can connect both by following other users and by adding them as friends, no friend recommender system has been proposed in the literature. Behavioral data mining is a useful tool to extract information by analyzing the behavior of the users of a system. In this paper we first perform a preliminary analysis showing that behavioral data mining is effective at discovering how similar the preferences of two users are. Then, we exploit the analysis of user behavior to produce friend recommendations, by analyzing the resources tagged by a user and the frequency of each used tag. Experimental results highlight that, by analyzing both the tagging and bookmarking behaviors of a user, our approach is able to mine preferences more accurately than a state-of-the-art approach that considers only the tags.
Matteo Manca, Ludovico Boratto, Salvatore Carta
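One simple way to quantify “how similar the preferences of two users are” from tagging behaviour is a cosine similarity over tag-frequency vectors. The sketch below illustrates that general idea with invented users and tags; it is not the authors’ exact measure.

```python
# Illustrative sketch: user similarity from tag-usage frequencies.
from collections import Counter
import math

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two tag-frequency vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Frequency of each tag over the resources each user bookmarked (made up)
alice = Counter({"python": 5, "recipes": 2, "ml": 3})
bob = Counter({"python": 4, "ml": 1, "travel": 2})

# The most behaviourally similar users become friend candidates.
print(cosine(alice, bob))
```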
Developing a Pedagogical Cybersecurity Ontology
Abstract
We present work on a hybrid method for developing a pedagogical cybersecurity ontology augmented with teaching and learning-related knowledge in addition to the domain content knowledge. The intended use of this ontology is to support students in the process of learning. The general methodology for developing the pedagogical cybersecurity ontology combines the semi-automatic classification and acquisition of domain content knowledge with pedagogical knowledge. The hybrid development method involves the use of a seed ontology, an electronically readable textbook with a back-of-the-book index, semi-automatic steps based on pattern matching, and the cyberSecurity Ontology eXpert tool (SOX) for an expert to fill the knowledge gaps. Pedagogical knowledge elements include importance, difficulty, prerequisites and likely misunderstandings. The pedagogical cybersecurity ontology can be more useful for students than an ontology that contains only domain content knowledge.
Soon Ae Chun, James Geller
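The semi-automatic pattern-matching step could, for instance, match normalised back-of-the-book index terms against seed-ontology concept labels, leaving unmatched terms for the expert to resolve in SOX. The sketch below is a hypothetical illustration; the terms, concepts, and matching rule are invented.

```python
# Hypothetical sketch of pattern matching between index terms and a
# seed ontology; unmatched terms are routed to the expert (SOX).
import re

def normalise(term: str) -> str:
    """Lowercase and strip punctuation for crude string matching."""
    return re.sub(r"[^a-z0-9 ]", "", term.lower()).strip()

seed_concepts = {"encryption", "firewall", "public key infrastructure"}
index_terms = ["Encryption, symmetric", "Firewalls",
               "PKI (public key infrastructure)", "Phishing attacks"]

matched, gaps = [], []
for term in index_terms:
    norm = normalise(term)
    if any(c in norm for c in seed_concepts):
        matched.append(term)
    else:
        gaps.append(term)  # knowledge gap left for the expert to fill

print("matched:", matched)
print("expert review:", gaps)
```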
A Reverse Engineering Process for Inferring Data Models from Spreadsheet-based Information Systems: An Automotive Industrial Experience
Abstract
Nowadays, Spreadsheet-based Information Systems are widely used in industry to support different phases of production processes. The intensive use of spreadsheets in industry is mainly due to their ease of use, which allows the development of Information Systems even by inexperienced programmers. The development of such systems is further aided by integrated scripting languages (e.g. Visual Basic for Applications, LibreOffice Basic, JavaScript, etc.) that offer features for the implementation of Rapid Application Development processes. Although Spreadsheet-based Information Systems can be developed with a very short time to market, they are usually poorly documented or, in some cases, not documented at all. As a consequence, they are very difficult to comprehend, maintain, or migrate towards other architectures, such as Database-Oriented Information Systems or Web Applications. The abstraction of a data model from the source spreadsheet files represents a fundamental activity of the migration process towards different architectures. In our work we present a heuristic-based reverse engineering process for inferring a data model from an Excel-based information system. The process is fully automatic and is based on seven sequential steps. Both the applicability and the effectiveness of the proposed process have been assessed by an experiment we conducted in the automotive industrial context. The process was successfully used to obtain the UML class diagrams representing the conceptual data models of three different Spreadsheet-based Information Systems. The paper presents the results of the experiment and the lessons we learned from it.
Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Vincenzo De Simone, Giancarlo Di Mare, Stefano Scala
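One of the heuristics such a process might apply is inferring a class with typed attributes from a sheet’s header row and a sample of cell values. The sketch below illustrates that single step with simplified, invented heuristics; it does not reproduce the paper’s seven-step process.

```python
# Illustrative sketch of one heuristic: derive a class (from the sheet
# name) with typed attributes (from headers and sampled cell values).

def infer_type(values):
    """Guess an attribute type from sample cell values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Real"
    return "String"

sheet_name = "Suppliers"  # invented example sheet
header = ["Code", "Name", "Discount"]
rows = [["101", "ACME", "0.15"], ["102", "Bolt & Co", "0.10"]]

attributes = {col: infer_type([row[i] for row in rows])
              for i, col in enumerate(header)}
print(f"class {sheet_name}: {attributes}")
# -> class Suppliers: {'Code': 'Integer', 'Name': 'String', 'Discount': 'Real'}
```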
Validation Approaches for a Biological Model Generation Describing Visitor Behaviours in a Cultural Heritage Scenario
Abstract
In this paper we propose a biologically inspired mathematical model to simulate the personalized interactions of users with cultural heritage objects. The main idea is to measure the interest of a spectator w.r.t. an artwork by means of a model able to describe the behaviour dynamics. In this approach, the user is assimilated to a computational neuron, and their interests are deduced by counting potential spike trains generated by external currents. The key idea of this paper consists in comparing a consolidated validation approach for neural networks based on classification with our novel proposal based on clustering; indeed, clustering allows the discovery of natural groups in the data, which are used to verify the neuronal response and to tune the computational model.
Preliminary experimental results, based on a phantom database and obtained from a real-world scenario, are shown. They underline the accuracy improvements achieved by the clustering-based approach in supporting the tuning of the model parameters.
Salvatore Cuomo, Pasquale De Michele, Giovanni Ponti, Maria Rosaria Posteraro
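The abstract does not give the model equations, but the idea of counting spike trains generated by external currents can be illustrated with a standard leaky integrate-and-fire neuron: stronger external input (a proxy for stronger interest) produces more spikes. The sketch below is that generic textbook model, not the authors’ specific one.

```python
# Generic leaky integrate-and-fire sketch (assumed model, not the
# paper's): dv/dt = (-(v - v_rest) + r*I) / tau, spike on threshold.
import numpy as np

def count_spikes(current, dt=1e-3, tau=0.02, v_rest=0.0, v_th=1.0, r=1.0):
    """Integrate the membrane potential and count threshold crossings."""
    v, spikes = v_rest, 0
    for i_t in current:
        v += dt * (-(v - v_rest) + r * i_t) / tau
        if v >= v_th:
            spikes += 1
            v = v_rest  # reset after a spike
    return spikes

# External current as a stand-in for the visitor's interaction strength:
# stronger interest -> stronger input -> more spikes counted.
weak = np.full(2000, 0.8)    # 2 s of weak input: stays below threshold
strong = np.full(2000, 1.5)  # 2 s of strong input: repeated spiking
print(count_spikes(weak), count_spikes(strong))
```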
A Method for Topic Detection in Great Volumes of Data
Abstract
Topic extraction has become increasingly important due to its effectiveness in many tasks, including information filtering, information retrieval, and the organization of document collections in digital libraries. Topic detection consists in finding the most significant topics within a document corpus. In this paper we explore the adoption of a feature-reduction methodology to underline the most significant topics within a document corpus. We use an approach based on a clustering algorithm (X-means) over the tf-idf matrix calculated from the corpus, which describes the frequency of the terms, represented by the columns, that occur in the documents, represented by the rows. To extract the topics, we build n binary problems, where n is the number of clusters produced by an unsupervised clustering approach, and we perform a supervised feature selection over them, considering the top features as the topic descriptors. We show the results obtained on two different corpora, both in Italian: the first collection consists of documents of the University of Naples Federico II, the second of a collection of medical records.
Flora Amato, Francesco Gargiulo, Alessandro Maisto, Antonino Mazzeo, Serena Pelosi, Carlo Sansone
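The described pipeline (tf-idf matrix, unsupervised clustering, one binary feature-selection problem per cluster) can be sketched with scikit-learn. Note that scikit-learn has no X-means implementation, so plain KMeans with a fixed k stands in for it below, and the corpus is a toy stand-in.

```python
# Sketch of the pipeline: tf-idf -> clustering -> per-cluster binary
# feature selection. KMeans substitutes for X-means (an assumption).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, chi2

docs = [
    "database indexing query optimisation",
    "query planning database transactions",
    "patient diagnosis clinical record",
    "clinical treatment patient therapy",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)  # rows = documents, columns = terms
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

terms = tfidf.get_feature_names_out()
for c in range(2):
    y = (labels == c).astype(int)          # binary problem: cluster c vs. rest
    selector = SelectKBest(chi2, k=3).fit(X, y)
    topic = [terms[i] for i in selector.get_support(indices=True)]
    print(f"topic {c}: {topic}")           # top features = topic descriptors
```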
A Framework for Real-Time Evaluation of Medical Doctors’ Performances While Using a Cricothyrotomy Simulator
Abstract
Cricothyrotomy is a life-saving procedure performed when an airway cannot be established through less invasive techniques. One of the main challenges for the research community in this area is to design and build a low-cost simulator that teaches essential anatomy and provides a method of data collection for performance evaluation and guided instruction.
In this paper, we present a framework designed and developed for activity detection in the medical context. More specifically, it first acquires data in real time from a cricothyrotomy simulator used by medical doctors, then stores the acquired data in a scientific database, and finally exploits an Activity Detection Engine to find expected activities, in order to evaluate the medical doctors’ performances in real time, which is essential for this kind of application: promptly detecting incorrect use of the simulator can save the patient’s life. Experiments conducted with real data show the efficiency and effectiveness of the approach. We also received positive feedback from the medical personnel who used our prototype.
Daniela D’Auria, Fabio Persia
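A minimal way to picture the real-time evaluation is a monitor that compares the incoming event stream from the simulator against an expected activity sequence and raises an alert as soon as it deviates. The sketch below is hypothetical; the event names and the matching rule are invented for illustration.

```python
# Hypothetical sketch: match a live event stream against the expected
# activity sequence and alert on any deviation, in arrival order.
EXPECTED = ["locate_membrane", "incise", "insert_tube", "ventilate"]

def monitor(event_stream):
    step = 0
    for event in event_stream:
        if step < len(EXPECTED) and event == EXPECTED[step]:
            step += 1                       # activity arrived in order
        else:
            yield f"ALERT: unexpected '{event}' at step {step}"
    if step < len(EXPECTED):
        yield f"ALERT: procedure incomplete, stopped before '{EXPECTED[step]}'"

# Example: the trainee skipped the incision step.
for alert in monitor(["locate_membrane", "insert_tube"]):
    print(alert)
```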
Backmatter
Metadata
Title
Data Management Technologies and Applications
Edited by
Markus Helfert
Andreas Holzinger
Orlando Belo
Chiara Francalanci
Copyright Year
2015
Electronic ISBN
978-3-319-25936-9
Print ISBN
978-3-319-25935-2
DOI
https://doi.org/10.1007/978-3-319-25936-9