
2014 | Book

New Trends in Databases and Information Systems

17th East European Conference on Advances in Databases and Information Systems

Edited by: Barbara Catania, Tania Cerquitelli, Silvia Chiusano, Giovanna Guerrini, Mirko Kämpf, Alfons Kemper, Boris Novikov, Themis Palpanas, Jaroslav Pokorný, Athena Vakali

Publisher: Springer International Publishing

Book series: Advances in Intelligent Systems and Computing

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

This book reports on state-of-the-art research and applications in the field of databases and information systems. It includes fourteen selected short contributions presented at the East-European Conference on Advances in Databases and Information Systems (ADBIS 2013, September 1-4, Genova, Italy) as well as twenty-six papers from ADBIS 2013 satellite events. The short contributions from the main conference are collected in the first part of the book, which covers a wide range of topics, such as data management, similarity searches, spatio-temporal and social network data, data mining, data warehousing, and data management on novel architectures, including graphics processing units, parallel database management systems, cloud and MapReduce environments. The contributions from the satellite events are organized in five parts, according to their respective ADBIS satellite event: BiDaTA 2013 – Special Session on Big Data: New Trends and Applications; GID 2013 – The Second International Workshop on GPUs in Databases; OAIS 2013 – The Second International Workshop on Ontologies Meet Advanced Information Systems; SoBI 2013 – The First International Workshop on Social Business Intelligence: Integrating Social Content in Decision Making; and, last but not least, the Doctoral Consortium, a forum for Ph.D. students. The book, which addresses academics and professionals alike, provides readers with a comprehensive and timely overview of new trends in database and information systems research, and promotes new ideas and collaborations among the different research communities of the eastern European countries and the rest of the world.

Table of Contents

Frontmatter

New Trends in Databases and Information Systems: Contributions from ADBIS 2013

Research on database and information system technologies has been rapidly evolving over the last few years. Advances concern new data types, new management issues, and new kinds of architectures and systems. The 17th East-European Conference on Advances in Databases and Information Systems (ADBIS 2013), held on September 1–4, 2013 in Genova, Italy, and its associated satellite events aimed at covering some emerging issues concerning such new trends in database and information system research. The aim of this paper is to present these events, their motivations and topics of interest, and to briefly outline the papers selected for presentation. The selected papers are included in the remainder of this volume.

Yamine Ait Ameur, Witold Andrzejewski, Ladjel Bellatreche, Barbara Catania, Tania Cerquitelli, Silvia Chiusano, Matteo Golfarelli, Giovanna Guerrini, Krzysztof Kaczmarski, Mirko Kämpf, Alfons Kemper, Tobias Lauer, Boris Novikov, Themis Palpanas, Jaroslav Pokorný, Stefano Rizzi, Athena Vakali

ADBIS Short Contributions

Frontmatter

New Ontological Alignment System Based on a Non-monotonic Description Logic

The choice of the formalism used to represent the knowledge manipulated on the Web through ontologies is a crucial point that can condition their use. We must be able to ensure maximum expressiveness in the definition of ontologies' concepts and relations by taking into account both the normal context and the exceptional one. The second very important point is the simultaneous use of several ontologies for the purpose of sharing information. This use has become possible thanks to ontology alignment, which is based on the syntactic, semantic and structural similarities of the different input ontologies. To write the ontology concepts, we propose in this work a non-monotonic description logic whose semantics is algebraic-based. Then, based on this representation formalism, we show how to improve the measure used to compute the structural similarity.

Ratiba Guebaili-Djider, Aicha Mokhtari, Farid Nouioua, Narhimene Boustia, Karima Akli Astouati

Spatiotemporal Co-occurrence Rules

The discovery of spatiotemporal co-occurrence rules (STCORs) is an important problem in many application domains such as weather monitoring and solar physics, which is our application focus. In this paper, we present a general framework to identify STCORs for continuously evolving spatiotemporal events that have extended spatial representations. We also analyse a set of anti-monotone (monotonically non-increasing) and non-anti-monotone measures to identify STCORs. We then validate and evaluate our framework on a real-life data set and compare the number of candidates needed to discover actual patterns, the memory usage, and the number of STCORs discovered using the anti-monotone and non-anti-monotone measures.

Karthik Ganesan Pillai, Rafal A. Angryk, Juan M. Banda, Tim Wylie, Michael A. Schuh

R++-Tree: An Efficient Spatial Access Method for Highly Redundant Point Data

We present a new spatial index belonging to the R-tree family. Since our new index comes out from the R+-tree and holds the concept of non-overlapping nodes, we call it the R++-tree. The original R+-tree was designed for both point and spatial data. Using the R+-tree for indexing spatial data is very inefficient. In our research we face the problem of indexing product catalogue data that can be represented as point data. Therefore we suggest the R++-tree for point data only. We present the dynamic R++-tree index as an improvement of the R+-tree. In the tests we show that the R++-tree offers even better search efficiency than the R+-tree when highly redundant point data is considered. Moreover, the construction time of the R++-tree is shorter than that of the R+-tree.

Martin Šumák, Peter Gurský

Labeling Association Rule Clustering through a Genetic Algorithm Approach

Among the approaches to post-processing association rules, a promising one is clustering. When an association rule set is clustered, the user is provided with an improved presentation of the mined patterns, since he can have a view of the domain to be explored. However, to take advantage of this organization, it is essential that good labels be assigned to the groups in order to guide the user during the exploration process. Few works have explored and proposed labeling methods for this context. Therefore, this paper proposes a labeling method, named GLM (Genetic Labeling Method), for association rule clustering. The method is a genetic algorithm approach that aims to balance the values of the measures that are used to evaluate labeling methods in this context. In the experiments, GLM showed good performance and better results than some other methods already explored.

Renan de Padua, Veronica Oliveira de Carvalho, Adriane Beatriz de Souza Serapião

Time Series Queries Processing with GPU Support

In recent years, an increased interest in processing and exploration of time-series has been observed. Due to the growing volumes of data, extensive studies have been conducted in order to find new and effective methods for storing and processing data. Research has been carried out in different directions, including hardware based solutions or NoSQL databases. We present a prototype query engine based on GPGPU and NoSQL database plus a new model of data storage using lightweight compression. Our solution improves the time series database performance in all aspects and after some modifications can be also extended to general-purpose databases in the future.

Piotr Przymus, Krzysztof Kaczmarski

Rule-Based Multi-dialect Infrastructure for Conceptual Problem Solving over Heterogeneous Distributed Information Resources

An approach is presented for combining semantically different rule-based languages for interoperable conceptual programming over various rule-based systems (RS), relying on the logic program transformation technique recommended by the W3C Rule Interchange Format (RIF). This approach is coherently combined with heterogeneous database integration applying semantic rule mediation. The basic functions of the infrastructure that implements the multi-dialect conceptual specifications by means of the interoperable RS and mediator programs are defined. References to the detailed description of the application of the infrastructure to solving a complex combinatorial problem are given. The research results show the usability of the approach and of the infrastructure for declarative, resource-independent and reusable data analysis in various application domains.

Leonid Kalinichenko, Sergey Stupnikov, Alexey Vovchenko, Dmitry Kovalev

Distributed Processing of XPath Queries Using MapReduce

In this paper we investigate the problem of efficiently evaluating XPath queries over large XML data stored in a distributed manner. We propose a MapReduce algorithm based on a query decomposition which computes all expected answers in one MapReduce step. The algorithm can be applied over large XML data which is given either as a single distributed document or as a collection of small XML documents.

Matthew Damigos, Manolis Gergatsoulis, Stathis Plitsos

A Query Language for Workflow Instance Data

In our simulation project ProHTA (Prospective Health Technology Assessment), we want to estimate the outcome of new medical innovations. To this end, we employ agent-based simulations that require workflow definitions with associated data about workflow instances. For example, to optimize the clinical pathways of patients with stroke we need the time and associated costs of each step in the clinical pathway. We adapt an existing conceptual model to store workflow definitions and instance data in RDF. This paper presents a query language to aggregate and query workflow instance data. That way, we support domain experts in analyzing simulation input and output. We present a heuristic algorithm for efficient query processing. Finally, we evaluate the performance of our query processing algorithm and compare it to SPARQL.

Philipp Baumgärtel, Johannes Tenschert, Richard Lenz

When Too Similar Is Bad: A Practical Example of the Solar Dynamics Observatory Content-Based Image-Retrieval System

Measuring interest and relevance has always been one of the main concerns when analyzing the results of a Content-Based Image-Retrieval (CBIR) system. In this work, we present a unique problem that the Solar Dynamics Observatory (SDO) CBIR system encounters: too many highly similar images. With over 70,000 images of the Sun produced per day, the problem of finding similar images is transformed into the problem of finding similar solar events based on image similarity. However, the most similar images of our dataset are temporal neighbors capturing the same event instance. Therefore a traditional CBIR system will return highly repetitive images rather than similar but distinct events. In this work we outline the problem in detail and present several approaches tested in order to solve this important image data mining and information retrieval issue.

Juan M. Banda, Michael A. Schuh, Tim Wylie, Patrick McInerney, Rafal A. Angryk

Viable Systems Model Based Information Flows

In information systems engineering it is important to ensure that all essential information flows are properly identified and supported. Usually the set of relevant information flows is detected using knowledge acquisition techniques such as interviews and document analysis. Since flexibility, adaptability, and agility are nowadays important features of enterprises for their operational strength in a highly turbulent environment, the research question arises whether there is a generic set of requirements that information systems have to meet to obtain and sustain these enterprise features. To answer this question, the Viable Systems Model (VSM) is experimentally used as a basis for identifying a set of information flows that should be present in VSM-compliant enterprises. Specific constructs expressed in an enterprise architecture description language are proposed for consistent integration of the detected flows into enterprise information systems.

Marite Kirikova, Mara Pudane

On Materializing Paths for Faster Recursive Querying

Recursive data structures are often used in business applications. They store data on, e.g., corporate hierarchies, product categories and bills of material. Therefore, recursive queries as introduced by SQL:1999 or formerly implemented by Oracle constitute a useful facility for application programmers. Unfortunately, recursive queries are not implemented by a number of database systems, with MySQL as the most prominent example. If an application has such a database as its backend storage, recursive queries will usually be hard-coded on the client side. This is not efficient. In this paper we propose using redundant data structures to answer recursive queries more quickly. Such structures must be synchronized in response to updates of data, which means a significant processing overhead for updates. We present an experimental evaluation to show the losses and gains caused by our solution for various usage scenarios. The results demonstrate the feasibility of our proposal. We show our proof-of-concept implementation as a part of the Hibernate framework. Thus, application programmers are shielded from all internals of the necessary database objects and triggers. These are created automatically by Hibernate generators.

Aleksandra Boniewicz, Piotr Wiśniewski, Krzysztof Stencel
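
The idea of answering recursive queries from a redundant, materialized structure can be illustrated with a minimal sketch. It is not the authors' Hibernate implementation: the table name `category`, the `path` column and the helper functions are hypothetical, and the path is maintained here in application code rather than by the database triggers the abstract mentions. SQLite is used only because it ships with Python.

```python
import sqlite3

def make_db():
    """Create an in-memory table with a redundant materialized-path column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE category ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, path TEXT)"
    )
    return conn

def add_node(conn, node_id, parent_id, name):
    """Insert a node and store its full path, e.g. '/1/2/4/' (hypothetical layout)."""
    if parent_id is None:
        path = f"/{node_id}/"
    else:
        (parent_path,) = conn.execute(
            "SELECT path FROM category WHERE id = ?", (parent_id,)
        ).fetchone()
        path = f"{parent_path}{node_id}/"
    conn.execute(
        "INSERT INTO category VALUES (?, ?, ?, ?)",
        (node_id, parent_id, name, path),
    )

def descendants(conn, node_id):
    """Return all descendants with a single prefix match instead of a recursive query."""
    (path,) = conn.execute(
        "SELECT path FROM category WHERE id = ?", (node_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT name FROM category WHERE path LIKE ? AND id <> ? ORDER BY path",
        (path + "%", node_id),
    )
    return [name for (name,) in rows]
```

The trade-off described in the abstract is visible even here: reads become a single non-recursive query, while every insert (and, in a full implementation, every move or delete) has to keep the `path` column consistent.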

XSLTMark II – A Simple, Extensible and Portable XSLT Benchmark

In this paper we focus on the problem of XSLT benchmarking. Although it is a straightforward task, there currently exists only a single XSLT benchmark, which is obsolete and no longer supported. Hence we propose a novel tool called XSLTMark II, which has several important features such as simplicity, portability, extensibility, and wide parametrization. It allows for generating test cases from test templates, running tests, generating XML reports, transforming reports into HTML format and testing different XSLT processors. The basic set of templates was created on the basis of an analysis of real-world XSLT scripts. Last but not least, a proof of concept is provided by applying the benchmark to a selected set of XSLT processors.

Viktor Mašíček, Irena Holubová (Mlýnková)

ReMoSSA: Reference Model for Specification of Self-adaptive Service-Oriented-Architecture

Specification of SOA has been used to decrease the complexity of service development and to describe self-adaptive applications. On the one hand, it provides an appropriate vocabulary for describing self-adaptive applications. On the other hand, it captures the key architectural characteristics of self-adaptive services in highly changing environments. In this paper, we present ReMoSSA, a formal reference model for specifying self-adaptive Service-Based Applications (SBA). ReMoSSA integrates self-adaptation mechanisms and strategies to provide autonomic and adaptable services. It provides dynamic monitoring and dynamic adaptation in the design phase. ReMoSSA reduces the cost and effort of maintenance.

Sihem Cherif, Raoudha Ben Djemaa, Ikram Amous

DSD: A DaaS Service Discovery Method in P2P Environments

Exposing data sources through DaaS (Data as a Service) services is becoming increasingly important. DaaS service discovery constitutes a real challenge in P2P environments. Although several data source discovery methods take into account the semantic heterogeneity problems by using several domain ontologies (DOs), most of them impose a topology on the graph formed by DOs and mapping links. In this paper, we propose a DaaS Service Discovery (DSD) method that does not impose any topology on this graph. Peers using a common DO are grouped in a Virtual Organization (VO) and connected in a Distributed Hash Table (DHT). Lookups within the same VO then consist of a classical search in a DHT. Regarding the inter-VO discovery process, we propose an addressing system, based on the existing mapping links between DOs, to interconnect VOs. Furthermore, lazy maintenance is adopted in order to reduce the number of messages required to update the system.

Riad Mokadem, Franck Morvan, Chirine Ghedira Guegan, Djamal Benslimane

Special Session on Big Data: New Trends and Applications

Frontmatter

Designing Parallel Relational Data Warehouses: A Global, Comprehensive Approach

The process of designing a parallel data warehouse has two main steps: (1) fragmentation and (2) allocation of the generated fragments to the various nodes. Usually, we split the data warehouse horizontally, allocate fragments over nodes, and finally balance the load over the nodes of the parallel machine. The main drawback of such a design approach is the high communication cost. Therefore, Data Replication (DR) has become a requirement, for availability on the one hand and for minimizing the communication cost on the other. In this paper, we present a redundant allocation algorithm for designing shared-nothing parallel relational data warehouses, which is based on the well-known fuzzy k-means clustering algorithm.

Soumia Benkrid, Ladjel Bellatreche, Alfredo Cuzzocrea
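
Why fuzzy k-means lends itself to redundant allocation can be seen from its membership formula: each item receives a degree of membership in every cluster rather than a single hard assignment. The sketch below computes the standard fuzzy c-means memberships for one point. How the paper maps memberships to fragment replicas is not stated in the abstract, so the reading "replicate a fragment on every node whose membership exceeds a threshold" is an assumption made only for illustration, and the function name is hypothetical.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of `point` in each cluster center.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the Euclidean
    distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]
```

For a point at distance 1 from one center and distance 2 from another, with `m=2.0`, the memberships are 0.8 and 0.2. A replication threshold of, say, 0.15 (an assumed value) would then place the fragment on both nodes.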

Big Data New Frontiers: Mining, Search and Management of Massive Repositories of Solar Image Data and Solar Events

This work presents one of the many emerging research domains where big data analysis has become an immediate need to process the massive amounts of data being generated each day: solar physics. While building a content-based image retrieval system for NASA’s Solar Dynamics Observatory mission, we have discovered research problems that can be addressed by the use of big data processing techniques and in some cases require the development of novel techniques. With over one terabyte of solar data being generated each day, and ever more missions on the horizon that expect to generate petabytes of data each year, solar physics presents many exciting opportunities. This paper presents the current status of our work with solar image data and events, our shift towards using big data methodologies, and future directions for big data processing in solar physics.

Juan M. Banda, Michael A. Schuh, Rafal A. Angryk, Karthik Ganesan Pillai, Patrick McInerney

Extraction, Sentiment Analysis and Visualization of Massive Public Messages

This paper describes the design and implementation of tools to extract, analyze and explore an arbitrarily large number of public messages from diverse sources. The aim of our work is to flexibly support sentiment analysis by quickly adapting to different use cases, languages, and message sources. First, a highly parallel scraper has been implemented, allowing the user to customize its behavior with scripting technologies and thus to manage dynamically loaded content. Then, a novel framework is developed to support agile programming, building and validating a classifier for sentiment analysis. Finally, a web application allows the real-time selection and projection of the analysis results in different dimensions in an OLAP fashion.

Jacopo Farina, Mirjana Mazuran, Elisa Quintarelli

Desidoo, a Big-Data Application to Join the Online and Real-World Marketplaces

The paper presents a big-data application in the context of an innovative marketplace service running in the cloud. The marketplace aims at bridging the gap between the online e-commerce world and the offline physical places and shops. Experiences from the system startup, design patterns and challenges to scale the platform are discussed.

Daniele Apiletti, Fabio Forno

GraphDB – Storing Large Graphs on Secondary Memory

The volume of complex network data has increased exponentially in recent years, making graph mining the focus of many research efforts. Most algorithms for mining this kind of data assume, however, that the complex network fits in primary memory. Unfortunately, this assumption does not always hold. Even considering that, in some cases, using big computer clusters (in a MapReduce fashion, for instance) might be a suitable way to circumvent part of the difficulties of mining big data, efficiently storing and retrieving complex network data is still a great challenge. Thus the main goal of this work is to introduce a new data structure, called GraphDB-tree, that can be used to efficiently store and retrieve complex networks and that also allows efficient queries over large complex networks.

Lucas Fonseca Navarro, Ana Paula Appel, Estevam Rafael Hruschka Junior

Hadoop on a Low-Budget General Purpose HPC Cluster in Academia

In the last decade, we witnessed an increasing interest in High Performance Computing (HPC) infrastructures, which play an important role in both academic and industrial research projects. At the same time, due to the increasing amount of available data, we also witnessed the introduction of new frameworks and applications based on the MapReduce paradigm (e.g., Hadoop). Traditional HPC systems are usually designed for CPU- and memory-intensive applications. However, the use of already available HPC infrastructures for data-intensive applications is an interesting topic, in particular in academia where the budget is usually limited and the same cluster is used by many researchers with different requirements. In this paper, we investigate the integration of Hadoop, and its performance, in an already existing low-budget general purpose HPC cluster characterized by heterogeneous nodes and a low amount of secondary memory per node.

Paolo Garza, Paolo Margara, Nicolò Nepote, Luigi Grimaudo, Elio Piccolo

Discovering Contextual Association Rules in Relational Databases

Contextual association rules represent co-occurrences between contexts and properties of data, where the context is a set of environmental or personal user features employed to customize an application. Due to their particular structure, these rules can be very tricky to mine, and if the process is not carried out with care, an unmanageable set of insignificant rules may be extracted. In this paper we survey two existing algorithms for relational databases and present a novel algorithm that merges the two proposals, overcoming their limitations.

Elisa Quintarelli, Emanuele Rabosio

Challenges and Issues on Collecting and Analyzing Large Volumes of Network Data Measurements

This paper presents the main challenges and issues faced when collecting and analyzing a large volume of network data measurements. We refer in particular to data collected by means of Neubot, an open source project that uses active probes on the client side to measure the evolution of key network parameters over time to better understand the performance of end-users’ Internet connections. The measured data are already freely accessible and stored on Measurement Lab (M-Lab), an organization that provides dedicated resources to perform network measurements and diagnostics in the Internet. Given the ever increasing amount of data collected by the Neubot project as well as other similar projects hosted by M-Lab, it is necessary to improve the platform to efficiently handle the huge amount of data that is expected to come in the very near future, so that it can be used by researchers and end-users themselves to gain a better understanding of network behavior.

Enrico Masala, Antonio Servetti, Simone Basso, Juan Carlos De Martin

Second International Workshop on GPUs in Databases

Frontmatter

GPU-Accelerated Query Selectivity Estimation Based on Data Clustering and Monte Carlo Integration Method Developed in CUDA Environment

Query selectivity is a parameter that allows the size of the data satisfying a query condition to be estimated. For a complex range query condition it may be defined as a multiple integral over a multivariate probability density function (PDF). The PDF describes a multidimensional attribute value distribution and may be estimated using the known approach based on a superposition of Gaussian clusters. However, there remains the problem of efficiently integrating the multivariate PDF. This may be solved by applying the Monte Carlo (MC) method, which shows its advantages in high dimensions. To satisfy the time constraint of selectivity calculation, a parallelized MC integration method is proposed in the paper. The implementation of the method is based on CUDA technology. The paper also describes the application designed for obtaining the time-optimal parameter values of the method.

Dariusz Rafal Augustyn, Lukasz Warchal
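
The selectivity integral described in the abstract above can be sketched sequentially in a few lines. This is an illustrative CPU version, not the paper's CUDA implementation: it assumes axis-aligned (diagonal-covariance) Gaussian clusters, a hyper-rectangular range condition, and plain uniform sampling, and all function names are hypothetical.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of an axis-aligned Gaussian (diagonal covariance) at point x."""
    p = 1.0
    for xi, m, s in zip(x, mean, std):
        p *= math.exp(-0.5 * ((xi - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return p

def mixture_pdf(x, clusters):
    """Superposition of Gaussian clusters given as (weight, mean, std) triples."""
    return sum(w * gaussian_pdf(x, mean, std) for w, mean, std in clusters)

def mc_selectivity(clusters, lows, highs, n_samples=20000, seed=0):
    """Monte Carlo estimate of the PDF integral over the box [lows, highs]."""
    rng = random.Random(seed)
    volume = 1.0
    for lo, hi in zip(lows, highs):
        volume *= hi - lo
    total = 0.0
    for _ in range(n_samples):
        # Draw a point uniformly inside the query box and evaluate the PDF there.
        point = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        total += mixture_pdf(point, clusters)
    # Mean density times box volume approximates the integral.
    return volume * total / n_samples
```

Each sample is independent of the others, which is what makes the method a natural fit for the GPU parallelization the paper proposes: the loop body could be executed by one thread per sample, followed by a parallel sum.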

Exploring the Design Space of a GPU-Aware Database Architecture

The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In the past years, there have been many approaches to making use of GPUs at different levels of a database system. In this paper, we summarize the major findings of the literature on GPU-accelerated data processing. Based on this survey, we present key properties, important trade-offs and typical challenges of GPU-aware database architectures, and identify major open research questions.

Sebastian Breß, Max Heimel, Norbert Siegmund, Ladjel Bellatreche, Gunter Saake

Dynamic Compression Strategy for Time Series Database Using GPU

Nowadays, we can observe increasing interest in the processing and exploration of time series. Growing volumes of data and the need for efficient processing have pushed research in new directions. GPU devices combined with fast compression and decompression algorithms open new horizons for data-intensive systems. In this paper we present an improved cascaded compression mechanism for time series databases built on a Big Table–like solution. We achieved extremely fast compression methods with a good compression ratio.

Piotr Przymus, Krzysztof Kaczmarski
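
To make the notion of cascaded lightweight compression concrete, the sketch below chains two simple schemes so that the output of one becomes the input of the next. The abstract does not name the schemes the authors cascade, so the choice of delta encoding followed by run-length encoding is purely an assumption for illustration, as are all function names; the code runs on the CPU and says nothing about the GPU implementation.

```python
def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        out.append(total)
    return out

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Invert rle_encode by expanding each run."""
    return [v for v, n in runs for _ in range(n)]

def compress(values):
    """Cascade: delta encoding first, run-length encoding on its output."""
    return rle_encode(delta_encode(values))

def decompress(runs):
    """Undo the cascade in reverse order."""
    return delta_decode(rle_decode(runs))
```

Regularly sampled timestamps show why cascading pays off: `compress([1000, 1010, 1020, 1030, 1040])` yields `[(1000, 1), (10, 4)]`, because delta encoding turns the steadily increasing values into a constant run that run-length encoding then collapses.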

Online Document Clustering Using GPUs

An algorithm for performing online clustering on the GPU is proposed. It makes heavy use of the atomic operations available on the GPU in order to cluster multiple documents in parallel, in a way that can saturate all the parallel threads on the GPU. The algorithm achieves up to a 3X speedup on a real-time news document data set, as well as on randomly generated data, compared to a baseline GPU algorithm that clusters only one document at a time.

Benjamin E. Teitler, Jagan Sankaranarayanan, Hanan Samet, Marco D. Adelfio

Second International Workshop on Ontologies Meet Advanced Information Systems

Frontmatter

Using the Semantics of Texts for Information Retrieval: A Concept- and Domain Relation-Based Approach

Our hypothesis is that assessing the relevance of a document with respect to a query is equivalent to assessing the conceptual similarity between the terms of the query and those of the document. In this article, we therefore propose a method of calculating conceptual similarity. Our information retrieval strategy is based on exploring an ontology and domain relations between concepts marked by verbal forms. Our approach overall is implemented by a prototype and the results obtained are evaluated. We thus show that a semantic IR system based on concepts improves recall with respect to a classic IR system and that a semantic IR system based on concepts and domain relations improves precision with respect to IR based on concepts alone.

Davide Buscaldi, Marie-Noëlle Bessagnet, Albert Royer, Christian Sallaberry

A Latent Semantic Indexing-Based Approach to Determine Similar Clusters in Large-scale Schema Matching

Schema matching plays a central role in identifying the semantic correspondences across shared-data applications, such as data integration. Due to the increasing size and widespread use of XML schemas and different kinds of ontologies, coping with large-scale schema matching has become highly challenging. Clustering-based matching is a major step towards a more significant reduction of the search space and thus improved efficiency. However, the methods used to identify similar clusters depend on literally matching terms. To improve this situation, this paper proposes a new approach that uses Latent Semantic Indexing, which allows retrieving the conceptual meaning shared between clusters. The experimental evaluations show encouraging results towards building efficient large-scale matching approaches.

Seham Moawed, Alsayed Algergawy, Amany Sarhan, Ali Eldosouky, Gunter Saake

$\mathcal{P}$oss-$\mathcal{SROIQ(D)}$: Possibilistic Description Logic Extension toward an Uncertain Geographic Ontology

The use of the description logic (DL) formalism to represent geographical knowledge has received a lot of attention recently. Nevertheless, classical DLs are not suitable for representing incomplete and uncertain knowledge, which arises in several situations in the geographic domain. In addition, they cannot represent the spatio-temporal information usually present in geographical applications. In this paper, we propose a possibilistic extension of the very expressive description logic $\mathcal{SROIQ(D)}$, the basis of the language OWL2, called $\mathcal{P}$oss-$\mathcal{SROIQ(D)}$, as a solution for handling uncertainty and dealing with inconsistency in geographical applications. Both the syntax and semantics of $\mathcal{P}$oss-$\mathcal{SROIQ(D)}$ are considered. Illustrative examples are given.

Safia Bal Bourai, Aicha Mokhtari, Faiza Khellaf

Ontology-Based Context-Aware Social Networks

Due to the increasing progress of context-aware platforms combined with social networks, many works use these platforms with mobile devices. In this paper, we therefore present a survey of some of these works. Knowledge representation in social networks is of great interest for obtaining a set of meaningful information. We thus present the state of the art in knowledge extraction using ontologies in social networks. We then propose an approach that combines these technologies (context, mobile and ontology) to obtain a contextualized ontology that assists a mobile user in retrieving information from the social network. We conclude by outlining our future work.

Maha Maalej, Achraf Mtibaa, Faïez Gargouri

Diversity in a Semantic Recommender System

In this paper, we introduce the notion of diversity in recommender systems (RS). The aim is to provide the user not only with the most relevant contents, but also with the most diversified ones. To do this, we have developed a diversification algorithm that we have implemented on a semantic RS. The latter performs the matching between the description of the contents and the user profile. A comparison of our algorithm with the diversity algorithm Swap, in terms of relevance and diversity, has shown better results for our algorithm.

Latifa Baba-Hamed, Magloire Namber

Ontology-Driven Observer Pattern

We propose an ontology-driven observer pattern which mitigates not only the drawbacks identified in the GoF observer pattern but also the drawbacks which occur in the general usage of patterns. We separate and encapsulate the pattern logic in an ontology component, which increases the reusability of the pattern at the implementation level as well. The proposed solution makes it possible to change the classes participating in the pattern even at runtime. Even users who are not programmers can make changes in the pattern to change the application behavior. It enables the identification of a pattern present in the code and also allows easy change, addition or removal of the pattern in the code. The proposed pattern also decouples the participant classes from each other, thereby enhancing the reusability and modifiability of each of the participant classes.

Amrita Chaturvedi, Prabhakar T.V.
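
The idea of keeping the subject-observer wiring outside the participant classes, so that it can be changed at runtime, can be sketched as follows. This is only an illustration under strong assumptions: a plain dictionary stands in for the ontology component, and the `Dispatcher` class, the event name, and the handler names are hypothetical rather than taken from the paper.

```python
class Dispatcher:
    """Routes events using wiring held outside the participant classes.

    The `wiring` dict is a stand-in for the ontology component described
    in the abstract (an assumption made for illustration): it maps an
    event name to the names of the handlers that should be notified.
    """

    def __init__(self, wiring):
        self.wiring = wiring      # event name -> list of handler names
        self.handlers = {}        # handler name -> callable

    def register(self, name, handler):
        self.handlers[name] = handler

    def publish(self, event, payload):
        # Look the observers up at publish time, so edits to the wiring
        # take effect without touching subject or observer code.
        for name in self.wiring.get(event, []):
            self.handlers[name](payload)


log = []
dispatcher = Dispatcher({"price_changed": ["display"]})
dispatcher.register("display", lambda p: log.append(("display", p)))
dispatcher.register("audit", lambda p: log.append(("audit", p)))

dispatcher.publish("price_changed", 10)
# Rewire at runtime: the audit handler now observes the same event.
dispatcher.wiring["price_changed"].append("audit")
dispatcher.publish("price_changed", 12)
```

Neither handler knows about the other or about the publisher; only the external wiring decides who is notified.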

First International Workshop on Social Business Intelligence: Integrating Social Content in Decision Making

Frontmatter

Towards a Semantic Data Infrastructure for Social Business Intelligence

The tremendous popularity of web-based social media is drawing the attention of industry to the massive availability of sentiment data, which is considered of high value for Business Intelligence (BI). So far, BI has been mainly concerned with corporate data, paying little or no attention to the external world. However, for BI analysts, taking into account the Voice of the Customer (VoC) and the Voice of the Market (VoM) is crucial for putting the results of their analyses in context. Recent advances in Opinion Mining and Sentiment Analysis have made it possible to effectively extract and summarize sentiment data from these massive social media. As a consequence, VoC and VoM can now be listened to through web-based social media (e.g., blogs, review forums, social networks, and so on). However, new challenges arise when attempting to integrate traditional corporate data and external sentiment data. This paper aims to introduce these issues and to devise potential solutions for the near future. More specifically, the paper focuses on the proposal of a semantic data infrastructure for BI aimed at providing new opportunities for integrating traditional and social BI.

Rafael Berlanga, María José Aramburu, Dolores M. Llidó, Lisette García-Moya

Subjective Business Polarization: Sentiment Analysis Meets Predictive Modeling

The growth of the Internet and of information technology has brought about major changes in communication between subjects, which nowadays takes place through social media or thematic forums. This has produced a surge of freely available information, which gives companies the opportunity to evaluate their credibility and to monitor the "mood" of their markets. Sentiment Analysis (SA) has been proposed as a way to extract, via objective rules, positive or negative opinions from (unstructured) texts. The communication literature, instead, highlights how such polarization derives from a subjective evaluation of the texts by the receivers. In business applications, the receiver (i.e., the marketing manager) is led by the values and the mission of the company. In this paper, we propose a strategy to align brand image and company values with a subjective SA: a probabilistic kernel classifier is employed to obtain the discrimination rule and to rank the classification results.

Caterina Liberati, Furio Camillo

Sentiment Analysis and City Branding

The Web is a huge virtual space in which to express and share individual opinions, influencing every aspect of life, with implications for marketing and communication alike. Social media are already an important marketing arena.

This paper describes, on the one hand, the characteristics of Sentiment Analysis and, on the other hand, the results of its application to an empirical study of the city of Bologna and of its brand perception on the Web.

On the international scene, a growing number of cities compete with each other to attract investors and foreign companies, different types of tourists, and new residents.

City branding can be considered the starting point for developing an effective city marketing policy. The Bologna City Branding Project aims at increasing the effectiveness of the territorial marketing policies carried out by the municipality of Bologna.

This study partially confirms and partially rejects what many sectors of the city would have expected from the perception of Bologna on the Web. From the point of view of academic research, it has shown the potential of Sentiment Analysis in the study of the perception of the city brand. Further investigations should be made to integrate this approach with more qualitative and quantitative techniques. From the point of view of the place marketing of cities, the results of this research have shown that place marketing is a complex activity and that, in order to be more effective, an integrated plurality of approaches has to be promoted and used.

Roberto Grandi, Federico Neri

A Case Study for a Collaborative Business Environment in Real Estate

In line with the recent vision of Web 2.0, this paper explores the prospect of implementing a business environment that enables users to be more agile in capturing and evaluating information about real estate offers. A cloud infrastructure hosts the business environment and introduces commercial services in a web community made up of a set of actors (e.g., citizens, enterprises, professionals, and companies). Users explore, change, and share both quantitative and spatial information by means of a social network, the common venue within which they interact. Being offered as a cloud service, the business environment supports efficient and scalable data management of loosely structured information that is captured from web resources. A prototype is presented that provides users with a geographic representation of real estate offers and related statistics about the price trend.

Nicoletta Dessì, Gianfranco Garau

OLAP on Information Networks: A New Framework for Dealing with Bibliographic Data

In the context of decision making, data warehouses support OLAP technology and have proved very useful for the efficient analysis of structured data. For several years, OLAP has also been used to analyze and visualize more complex data. Many data sets of interest can now be described as a linked collection of interrelated objects. They can be represented as heterogeneous information networks, in which there are multiple object and link types. In this paper, we focus on bibliographic data. This type of data constitutes a rich source that is the starting point of research in the bibliometrics and scientometrics domains. In this context, we discuss the interest of combining information networks, OLAP, and data mining technologies. We propose a framework to materialize this combination and discuss the main challenges in building this framework. The basic idea is to be able to analyze various networks built from the bibliographic data, representing different points of view (author networks, citation networks, ...), and their dynamics.

Wararat Jakawat, Cécile Favre, Sabine Loudcher
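
To make the combination of information networks and OLAP more concrete, the sketch below builds a co-author network from a few bibliographic records and rolls it up from the author level to the institution level. It is not the framework proposed in the paper; the records, the author-to-institution hierarchy, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bibliographic records: each paper lists its authors.
papers = [
    {"title": "P1", "authors": ["ann", "bob"]},
    {"title": "P2", "authors": ["ann", "bob", "cy"]},
    {"title": "P3", "authors": ["cy", "dee"]},
]
# Hypothetical dimension hierarchy: author -> institution.
institution = {"ann": "U1", "bob": "U1", "cy": "U2", "dee": "U2"}


def coauthor_network(records):
    """Edge weights = number of papers two authors wrote together."""
    edges = Counter()
    for record in records:
        for a, b in combinations(sorted(set(record["authors"])), 2):
            edges[(a, b)] += 1
    return edges


def roll_up(edges, hierarchy):
    """Aggregate an author-level network to the parent level."""
    rolled = Counter()
    for (a, b), weight in edges.items():
        pair = tuple(sorted((hierarchy[a], hierarchy[b])))
        rolled[pair] += weight
    return rolled


author_level = coauthor_network(papers)
institution_level = roll_up(author_level, institution)
```

The roll-up plays the role of an OLAP aggregation along a dimension hierarchy, except that the measure is attached to network edges rather than to cells of a cube.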

Doctoral Consortium

Frontmatter

Spatial Indexes for Simplicial and Cellular Meshes

We address the problem of performing spatial and topological queries on simplicial and cellular meshes. These arise in several application domains, including 3D GIS, scientific visualization, and finite element analysis. First, we present a family of spatial indexes for tetrahedral meshes, which we call tetrahedral trees. Then, we present the PR-star octree, a combined spatial data structure for performing efficient topological queries on simplicial meshes. Finally, we propose to extend these frameworks to arbitrary dimensions and to a larger class of meshes, such as non-simplicial meshes.

Riccardo Fellegara

Mathematical Methods of Tensor Factorization Applied to Recommender Systems

On the Internet today, an overabundance of information can be accessed, making it difficult for users to process and evaluate options and make appropriate choices. This phenomenon is known as information overload. Over time, various methods of information filtering have been introduced to assist users in choosing what may be of interest to them. Recommender Systems (RS) [14] are information filtering techniques that play an important role in e-commerce, advertising, e-mail filtering, etc. RS are therefore an answer, though a partial one, to the problem of information overload. Recommendation algorithms need to be continuously updated because of the constant increase in both the quantity of information and the ways of accessing it, which define the different contexts of information use. The search for methods more effective and more efficient than those currently known in the literature is also stimulated by the interest of industrial research in this field, as demonstrated by the Netflix Prize contest, the open competition for the best algorithm to predict user ratings for films based on previous ratings. The contest showed the superiority of mathematical methods that discover the latent factors driving user-item similarity over classical collaborative filtering algorithms. With the ever-increasing information available in digital archives and textual databases, the challenge of implementing personalized filters has become the challenge of designing algorithms able to manage huge amounts of data for the elicitation of user needs and preferences. In recent years, matrix factorization techniques have proved to be a quite promising solution to the problem of designing efficient filtering algorithms in the Big Data era. The main contribution of this paper is an analysis of these methods, which focuses on tensor factorization techniques, as well as the definition of a method for tensor factorization suitable for recommender systems.

Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro
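
The latent-factor idea mentioned in the abstract above can be sketched with plain matrix factorization trained by stochastic gradient descent. This is the two-way (user by item) case only, not the tensor factorization method defined in the paper; the ratings, the hyperparameters, and the function names are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, n_factors=2,
              epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user and item factor vectors by stochastic gradient descent.

    ratings: list of (user, item, rating) triples (hypothetical data).
    Returns (P, Q) so that the predicted rating is dot(P[u], Q[i]).
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given ratings."""
    total = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
                for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical observed ratings: (user index, item index, rating 1..5).
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = factorize(observed, 3, 3, epochs=0)   # random initial factors
P1, Q1 = factorize(observed, 3, 3)             # trained factors
error_before = rmse(observed, P0, Q0)
error_after = rmse(observed, P1, Q1)
```

A tensor variant would add a third factor matrix, for example one for context, and predict from the product of three factor vectors instead of two.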

Extended Dynamic Weighted Majority Using Diversity to Handle Drifts

Concept drift is a recent trend in online data: the distribution underlying the data changes with time. Many algorithms have been developed in the literature to handle such drifting concepts. In this paper, we outline the framework of our new approach to handling drifts, which is based on the concept of diversity. Diversity is the measure of variation in the predictive accuracy of ensemble members. Our approach would apply the diversity concept for the first time to an online approach that does not explicitly use a mechanism to handle drifts. This type of online approach would give better accuracy at the cost of a slight increase in running time and memory. We also outline the main objectives behind our research and the state of the art in data stream mining.

Parneeta Sidhu, M. P. S Bhatia
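
As background for the title above, the weighted-majority idea can be sketched as an ensemble whose members lose weight whenever they predict wrongly, so that the vote follows the data after a drift. This is a simplified illustration, not the authors' extended approach: the experts are fixed (hypothetical) rules, whereas Dynamic Weighted Majority also trains, adds, and removes experts online, and no diversity measure is computed here.

```python
def weighted_vote(experts, weights, x):
    """Return the label with the largest total weight behind it."""
    totals = {}
    for expert, w in zip(experts, weights):
        label = expert(x)
        totals[label] = totals.get(label, 0.0) + w
    return max(sorted(totals), key=totals.get)  # ties go to the smaller label


def run_stream(experts, stream, beta=0.5):
    """Predict each example, then penalise the experts that were wrong."""
    weights = [1.0] * len(experts)
    predictions = []
    for x, y in stream:
        predictions.append(weighted_vote(experts, weights, x))
        weights = [w * beta if expert(x) != y else w
                   for expert, w in zip(experts, weights)]
        top = max(weights)
        weights = [w / top for w in weights]  # keep the best weight at 1.0
    return predictions, weights


# Two hypothetical fixed experts and a stream whose label drifts from 0 to 1.
always_zero = lambda x: 0
always_one = lambda x: 1
stream = [(None, 0)] * 2 + [(None, 1)] * 4
predictions, final_weights = run_stream([always_zero, always_one], stream)
```

After the drift, the ensemble keeps predicting the old label for a few steps until the penalised expert's weight falls below that of the expert that matches the new concept.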

Backmatter

Title: New Trends in Databases and Information Systems
Edited by: Barbara Catania
Tania Cerquitelli
Silvia Chiusano
Giovanna Guerrini
Mirko Kämpf
Alfons Kemper
Boris Novikov
Themis Palpanas
Jaroslav Pokorný
Athena Vakali
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-01863-8
Print ISBN: 978-3-319-01862-1
DOI: https://doi.org/10.1007/978-3-319-01863-8


