
About this Book

This book constitutes the refereed proceedings of the 8th International Workshop on Databases in Networked Information Systems, DNIS 2013, held in Aizu-Wakamatsu, Japan, in March 2013. The 22 revised full papers presented were carefully reviewed and selected for inclusion in the book. The workshop focuses mainly on data semantics and infrastructure for information management and interchange. The papers are organized in topical sections on cloud-based database systems; information and knowledge management; information extraction from data resources; bio-medical information management; and networked information systems: infrastructure.



Access to Information Resources

Logic-Oriented Confidentiality Policies for Controlled Interaction Execution

Controlled Interaction Execution is a specific concept of inference control for logic-oriented information systems. Interactions include query answering, update processing and data publishing, and operational control is based on declarative policies. Complementing a previous survey of this concept, in this work we treat various forms of confidentiality policies with regard to their syntax, semantics, algorithms and pragmatics. In each case, we consider an information provider’s interest in confidentiality as an exception to his general willingness and agreement to share information with a client.
Joachim Biskup

Cloud-Based Database Systems

Managing Geo-replicated Data in Multi-datacenters

Over the past few years, cloud computing and the growth of global large-scale computing systems have led to applications that require data management across multiple datacenters. Initially, the models provided single-row transactions with eventual consistency. Although protocols based on these models provide high availability, they are not ideal for applications needing a consistent view of the data. There has since been a gradual shift toward transactions with strong consistency, as in Google’s Megastore and Spanner. We propose protocols for providing full transactional support while replicating data in multi-datacenter environments. First, we present an extension of Megastore that uses optimistic concurrency control. Second, we put forward a contrasting method that uses a gossip-based protocol to provide distributed transactions across datacenters. Our aim is to propose and evaluate different approaches to geo-replication that may benefit diverse applications.
Divyakant Agrawal, Amr El Abbadi, Hatem A. Mahmoud, Faisal Nawab, Kenneth Salem
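The abstract does not spell out the validation rule used by the Megastore extension. As a rough illustration only, the sketch below shows generic optimistic concurrency control on a single site: reads record the version they saw, writes are buffered, and commit aborts if any read version has changed in the meantime. The `Store` and `Transaction` names are hypothetical, and replication across datacenters is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A toy single-site key-value store with per-key version numbers."""
    data: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)


class Transaction:
    """Generic optimistic transaction (hypothetical sketch, single-threaded)."""

    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at read time
        self.write_set = {}  # key -> new value, buffered until commit

    def read(self, key):
        if key in self.write_set:  # read your own buffered write
            return self.write_set[key]
        self.read_set[key] = self.store.versions.get(key, 0)
        return self.store.data.get(key)

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation phase: abort if any key we read has since been overwritten.
        for key, seen in self.read_set.items():
            if self.store.versions.get(key, 0) != seen:
                return False
        # Write phase: install buffered writes and bump their versions.
        for key, value in self.write_set.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
        return True


s = Store()
t1, t2 = Transaction(s), Transaction(s)
t1.read("x")
t2.read("x")
t1.write("x", 1)
t2.write("x", 2)
t1_ok = t1.commit()  # first committer wins
t2_ok = t2.commit()  # x changed after t2 read it, so t2 aborts
```

Because validation and writing are not made atomic here, the sketch assumes a single thread; a geo-replicated protocol would additionally have to agree on the commit order across datacenters.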

MapReduce Algorithms for Big Data Analysis

Applications are increasingly expected to deal with big data that usually does not fit in the main memory of a single machine, which makes big data analysis a challenging problem today. For such data-intensive applications, the MapReduce framework has recently attracted considerable attention and is being investigated as a cost-effective option for implementing scalable parallel algorithms for big data analysis that can handle petabytes of data for millions of users. MapReduce is a programming model that allows easy development of scalable parallel applications that process big data on large clusters of commodity machines. Google’s MapReduce and its open-source equivalent, Hadoop, are powerful tools for building such applications.
In this tutorial, we introduce the MapReduce framework based on Hadoop and present the state of the art in MapReduce algorithms for query processing, data analysis and data mining. The intended audience is professionals who plan to design and develop MapReduce algorithms and researchers who want to be aware of the state-of-the-art MapReduce algorithms available today for big data analysis.
Kyuseok Shim
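To make the programming model concrete, here is the classic word-count job written as plain Python functions that imitate the map, shuffle, and reduce phases inside one process. It does not use Hadoop's API; on a real cluster the framework would run many mappers and reducers in parallel and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def run_job(documents):
    """Run map -> shuffle -> reduce sequentially over a dict of documents."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


docs = {"d1": "big data big clusters", "d2": "data on commodity machines"}
counts = run_job(docs)
print(counts["big"], counts["data"])  # prints: 2 2
```

Mappers and reducers only see their own inputs, which is what lets the framework distribute them freely.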

Architectural Design of a Compensation Mechanism for Long Lived Transactions

Together with making Cloud computing reliable and dependable, there is a need to create a mature definition of Service Level Agreements (SLAs) for the quality assurance of services. It is also necessary to implement mechanisms for maintaining the SLA. In particular, inter-Cloud environments involving multiple Cloud computing entities have evolved, and the Composite Web Service (CWS) is a promising candidate for realizing them. However, a CWS, which consists of multiple elemental services, has the features of a transactional workflow that adopts the notion of a long lived transaction (LLT). A compensation transaction is required to handle the occurrence of an exception, but its design methodology has remained immature. In particular, work on clarifying a concrete architecture that integrates Transaction-Aware mechanisms with the SLA maintenance mechanisms categorized as QoS-Aware is still ongoing. In this paper, we present our initial consideration of the architectural design, and its principles, for SLA management that carries out the compensations. We also present the architectural aspects of an intelligent function that will be required in the next generation of scalable workflows.
Shinji Kikuchi
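The abstract does not give the architecture's interfaces, so the following is only a generic sketch of how compensation works in a long lived transaction: each step of a composite service is paired with a compensating action, and when a step fails (for example, because a provider breaks its SLA), the steps already completed are undone in reverse order. The step names and the `run_with_compensation` function are hypothetical.

```python
from typing import Callable, List, Tuple

# (step name, forward action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; if one raises, undo completed steps in reverse order."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed {name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated {done_name}")
            return False
        log.append(f"done {name}")
        completed.append((name, compensate))
    return True


def charge_card():
    # Hypothetical failing elemental service.
    raise RuntimeError("payment service violated its SLA")


log: List[str] = []
steps: List[Step] = [
    ("book_flight", lambda: None, lambda: None),
    ("reserve_hotel", lambda: None, lambda: None),
    ("charge_card", charge_card, lambda: None),
]
ok = run_with_compensation(steps, log)
```

Undoing in reverse order matters because later steps may depend on the effects of earlier ones.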

Information and Knowledge Management I

The Palomar Transient Factory Data Archive

The Palomar Transient Factory (PTF) is a multi-epoch robotic survey of the northern sky for the scientific study of transient astronomical phenomena. The camera and telescope provide wide-field imaging in two optical bands. The system has been in operation since December 2008. The image data are sent to the Infrared Processing and Analysis Center (IPAC) for processing and archiving. The archived science products are astrometrically and photometrically calibrated images, extracted source catalogs, and coadded reference images. Relational databases track these products in operations and in the data archive. The fully automated system has benefited from lessons learned in past IPAC projects and includes advantageous features that could potentially be incorporated into other ground-based observatories. Both off-the-shelf and in-house software have been used for economy and rapid development. The PTF data archive is curated by the NASA/IPAC Infrared Science Archive (IRSA). A state-of-the-art custom web interface has been deployed for downloading the raw images, processed images, and source catalogs from IRSA. A public release of this science-rich archive is planned.
Wei Mi, R. Laher, J. Surace, C. Grillmair, S. Groom, D. Levitan, B. Sesar, G. Helou, T. Prince, S. Kulkarni

Information Extraction from Data Resources

Making Transaction Execution the Bottleneck (Instead of All the Overheads)
Traditional database systems rely on a proven set of tools to guarantee ACID properties without compromising performance: a buffer manager to mediate the transfer of data between fast in-memory processing and slow disk-based persistent storage, latching and locking to coordinate concurrent access to data, and logging to enable the recovery, verification, and repair of committed data. These tools are built on code bases that are 10 to 30 years old and were designed for hardware assumptions of nearly the same age. Modern hardware technologies such as fast persistent memories and multicore processors break those assumptions, turning the traditional proven tools into the new bottlenecks. Our goal is to rethink these tools so that they no longer become bottlenecks. Here, we review some of the concurrency-related bottlenecks facing modern transactional storage management systems and survey state-of-the-art techniques that allow the traditional tools to provide their intended functionality without becoming bottlenecks.
Harumi Kuno, Goetz Graefe, Hideaki Kimura

Performance Evaluation of Similar Sentences Extraction

Similar sentence extraction is an important issue because it is the basis of many applications. In this paper, we conduct comprehensive experiments to evaluate the performance of similar sentence extraction in a general framework. Effectiveness and efficiency are explored on three real datasets, considering different factors such as data size and the top-k value. Moreover, WordNet is incorporated into the framework as an additional semantic resource, and we thoroughly explore the performance of the extended framework for similar sentence extraction.
Yanhui Gu, Zhenglu Yang, Miyuki Nakano, Masaru Kitsuregawa
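The abstract does not name the similarity measures in the framework. As an assumed stand-in, the sketch below ranks sentences by the Jaccard similarity of their word sets and returns the top-k with `heapq.nlargest`; a small hand-written synonym map plays the role WordNet plays in the paper by letting "automobile" match "car". Both the measure and the synonym map are illustrative assumptions.

```python
import heapq


def tokens(sentence, synonyms=None):
    """Lower-cased word set; optionally map each word to a canonical synonym."""
    synonyms = synonyms or {}
    return {synonyms.get(w, w) for w in sentence.lower().split()}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(query, corpus, k, synonyms=None):
    """Return the k (score, sentence) pairs most similar to the query."""
    q = tokens(query, synonyms)
    scored = ((jaccard(q, tokens(s, synonyms)), s) for s in corpus)
    return heapq.nlargest(k, scored)


corpus = ["the car is fast", "the automobile is fast", "cats sleep a lot"]
plain = top_k_similar("a fast car", corpus, k=2)
with_syn = top_k_similar("a fast car", corpus, k=2, synonyms={"automobile": "car"})
```

Without the synonym map, "the automobile is fast" shares only "fast" with the query; with it, that sentence ties with "the car is fast", which mirrors why an extra semantic resource can change the ranking.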

Using Neural Networks for Forecasting of Commodity Time Series Trends

Time series of commodity prices are investigated on two scales: across commodities, for a portfolio of items available from the database of the International Monetary Fund on a monthly-average scale, and for high-quality trade event tick data of a crude oil futures contract from the market in Japan. The degree of causality is analyzed for both types of data using a feed-forward neural network architecture. Within the portfolio of commodities, predictability varies widely, from stochastic behavior consistent with the efficient market hypothesis up to prediction rates of ninety percent. For crude oil in Japan, we analyze a one-month (January 2000) series of a mid-year delivery contract with 25,210 events, using several schemes for causality extraction. Both the event-driven sequence grid and a second-wide implied time grid are used as input data for the neural network. Using half of the data for network training and the rest for validation, we find that, in general, the degree of trend extraction for the single next event is in the sixty percent range; it can increase to the ninety percent range when a symbolization technique is introduced to denoise the underlying data of normalized log returns. An auxiliary analysis incorporates trading volumes as extra input information. The time distribution of trading event arrivals exhibits interesting features consistent with several modes of trading strategies.
Akira Sato, Lukáš Pichl, Taisei Kaizoji
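The preprocessing the abstract mentions can be sketched as follows: compute log returns r_t = ln(p_t / p_{t-1}), normalize them, map each value to a symbol, and cut the symbol stream into sliding windows whose next symbol is the training target for a feed-forward network. The three-level symbol alphabet, the 0.5 threshold, and the function names are assumptions for illustration; the paper's exact symbolization scheme and the network itself are not shown.

```python
import math
from statistics import mean, pstdev


def normalized_log_returns(prices):
    """r_t = ln(p_t / p_{t-1}), z-score normalized (all zeros if the variance is 0)."""
    r = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mu, sigma = mean(r), pstdev(r)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in r]


def symbolize(returns, threshold=0.5):
    """Map each normalized return to +1 (up), -1 (down) or 0 (flat) to suppress noise."""
    return [1 if x > threshold else -1 if x < -threshold else 0 for x in returns]


def make_windows(symbols, width):
    """Pairs of (previous `width` symbols, next symbol) for supervised training."""
    return [(symbols[i:i + width], symbols[i + width])
            for i in range(len(symbols) - width)]


# Hypothetical tick prices, not data from the paper.
prices = [100.0, 101.0, 100.5, 102.0, 101.0, 101.1]
symbols = symbolize(normalized_log_returns(prices))
windows = make_windows(symbols, width=3)
```

Each window could then be fed to a classifier, with half of the windows used for training and the rest for validation, as in the abstract.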

Finding Similar Legal Judgements under Common Law System

Legal judgements are complex in nature and contain citations of other judgements. Research efforts are under way to develop methods for efficiently searching relevant legal information by extending popular approaches from information retrieval and web search research. In the literature, it has been shown that similar judgements can be found by exploiting citations or links. In this paper, we propose an approach that uses the notion of a “paragraph-link” to improve the efficiency of the link-based similarity method. Experiments on a real-world data set and a user evaluation study show encouraging results.
Sushanta Kumar, P. Krishna Reddy, V. Balakista Reddy, Malti Suri

Knowledge Visualization of the Deductive Reasoning for Word Problems in Mathematical Economics

In solving word problems in mathematical economics, such as national income determination problems and various financial problems, two different knowledge bases are required: a database of math formulas and a database of economics theories. We have developed these knowledge bases and use them to offer our students an effective education support system for economics word math problems. Solving a word math problem is nothing more or less than conducting a process of deductive reasoning to find the unknown of the problem. Constructing this deductive reasoning process means collecting the missing pieces of information from the knowledge bases to bridge the gap between the given data and the unknown. To promote students’ use of the formula and theory knowledge bases in our educational support system, we have visualized the reasoning processes as solution plan graphs and collected these graphs into a content center for teaching materials. The paper shows that a solution plan graph can serve as a good user interface for accessing the knowledge bases. We illustrate a solution plan graph and the annotation technique used to construct it.
Yukari Shirota, Takako Hashimoto, Pamela Stanworth

Developing Re-usable Components Based on the Virtual-MVC Design Pattern

In modern complex enterprise applications, the reusability and interoperability of software components are of vital importance because of increasingly heterogeneous development platforms and the variety of end-user devices on which computational services need to be exposed. The need for reusable code solutions led to the development of Design Patterns, which encapsulate proven solutions to recurrent problems and provide developers with a common language for abstracting the logic embedded in source code implementation structures. In this article we focus on the Model-View-Controller (MVC) design pattern. Although it represented a step forward for component reusability, the model and view remain coupled, which compromises the business logic and adds complexity to application development. We discuss two main variations of the MVC pattern that aim to completely decouple the Model from the View, as well as platforms that support the development of MVC-based applications. Our research is based on the original Virtual-MVC design pattern, in which we model the controller as middleware to achieve full decoupling of the model from the view. Our main contribution is to demonstrate the development process of reusable components in the framework of the V-MVC pattern, through a development platform that supports Virtual-MVC based applications.
Ruth Cortez, Alexander Vazhenin

Information and Knowledge Management II

Real-Time Traffic Video Analysis Using Intel Viewmont Coprocessor

Vision-based traffic flow analysis is getting more attention due to its non-intrusive nature. However, real-time video processing techniques are CPU-intensive, so the accuracy of the traffic flow data they extract may be sacrificed in practice. Moreover, traffic measurements extracted from cameras have rarely been validated against real datasets because of the limited availability of real-world traffic data. This study provides a case study that demonstrates the performance enhancement of a vision-based traffic flow data extraction algorithm using a hardware device, the Intel Viewmont video analytics coprocessor, and evaluates the accuracy of the extracted data by comparing it with real data from traffic loop detector sensors in Los Angeles County. Our experimental results show that traffic flow data comparable to existing sensor data can be obtained cost-effectively with the Viewmont hardware.
Seon Ho Kim, Junyuan Shi, Abdullah Alfarrarjeh, Daru Xu, Yuwei Tan, Cyrus Shahabi

Multimedia Framework for Application of Spatial Auditory Information to Learning Materials

We have been investigating a tabletop interface that presents spatial auditory information as well as visual information. This paper describes a case study of using this platform in a classroom to enrich learning materials. Hyakunin-Isshu, a famous Japanese anthology of one hundred waka poems by one hundred poets that is also known as an intellectual game at home and at school, is selected as the subject. Digital cards, each showing the second half of a poem, are spread out at random on the tabletop. The user tries to take the card that matches the first half of the poem read aloud by a reciter (speech synthesis software). When the learner cannot find the right card, an auditory cue is also given as a hint to its position. The learning material was developed through user testing with education specialists. This multimedia framework is expected to help learners keep their interest in learning Hyakunin-Isshu, and other school subjects in general.
Ryuji Yamaguchi, Ami Sakoi, Masahito Hirakawa

F-Modeling Environment: Acquisition Techniques for Obtaining Special-Purpose Features

Programming based on algorithmic pictures is an approach in which pictures and moving pictures are used as super-characters for representing and explaining features of computational algorithms. Generic pictures are used to define compound pictures, and compound pictures are assembled into special series for representing algorithmic features. Programming in algorithmic pictures is supported by the F-modeling environment, whose functionality supports knowledge/experience acquisition through special galleries and libraries of an open type. Such acquisition continually enhances the intelligent aspects of the environment in general and allows it to obtain the features required by special applications. In this paper, a case study of transferring knowledge/experience into the F-modeling environment is considered. It is based on introducing a set of new picture-based constructs for programming systems of robotic and embedded types.
Yutaka Watanobe, Nikolay Mirenkov

Bio-Medical Information Management

Quasi-Relational Query Language Interface for Persistent Standardized EHRs: Using NoSQL Databases

Interoperability of health data for information exchange is an area of growing concern. Various new standards such as CEN 13606, HL7 and OpenEHR have been proposed. The OpenEHR standard provides a Standardized Electronic Health Records (EHRs) schema that uses dual-level modelling for information exchange. The complex structured EHRs and the archetypes form the domain knowledge of the model. This raises the issue of an efficient and scalable persistence mechanism for these standardized EHRs. Further, it is desirable to support in-depth querying on them, since a standardized EHR database can support a wide range of user queries. In this paper, a persistence mechanism that uses a NoSQL database to store the standardized EHRs is proposed. Further, a high-level QBE-like AQBE (Archetype based Query-By-Example) interface has been developed for the EHR data repository.
Aastha Madaan, Wanming Chu, Yaginuma Daigo, Subhash Bhalla
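As a rough picture of query-by-example over a document store, the sketch below treats each EHR entry as a nested, JSON-like dictionary and returns the entries that contain every field of an example document. It supports only equality on (possibly nested) fields; the record layout, field names, and functions are hypothetical, and the paper's archetype-based AQBE interface is richer than this.

```python
def matches(doc, example):
    """True if every field in `example` occurs in `doc` with an equal value.

    Nested dictionaries in the example are matched recursively, so an example
    only needs to mention the fields the user cares about.
    """
    if isinstance(example, dict):
        return isinstance(doc, dict) and all(
            key in doc and matches(doc[key], value) for key, value in example.items())
    return doc == example


def query_by_example(collection, example):
    """Return all documents in the collection that match the example."""
    return [doc for doc in collection if matches(doc, example)]


# Hypothetical records shaped loosely after archetype-based EHR entries.
records = [
    {"ehr_id": "e1", "archetype": "openEHR-EHR-OBSERVATION.blood_pressure.v1",
     "data": {"systolic": 150, "diastolic": 95}},
    {"ehr_id": "e2", "archetype": "openEHR-EHR-OBSERVATION.blood_pressure.v1",
     "data": {"systolic": 120, "diastolic": 80}},
    {"ehr_id": "e3", "archetype": "openEHR-EHR-OBSERVATION.body_weight.v1",
     "data": {"weight": 70}},
]
bp = query_by_example(records, {"archetype": "openEHR-EHR-OBSERVATION.blood_pressure.v1"})
high = query_by_example(records, {"data": {"systolic": 150}})
```

A document store holds such nested records without flattening them into many relational tables, which is one reason the authors' choice of a NoSQL backend fits deeply structured EHRs.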

Aspect Oriented Programming for Modularization of Concerns for Improving Interoperability in Healthcare Application

Service Oriented Architecture (SOA) is an ideal Web Services based solution for achieving efficient healthcare interoperability. However, incorporating non-functional specifications such as logging, authorization and transactions into a web services based interoperable healthcare information system leads to code tangling (significant dependencies between system components) and code scattering (code duplication), which hinder the revision and reuse of web services. Aspect Oriented Software Development is an emerging development approach that uses modularization to support rapid data interchange among various healthcare providers in a heterogeneous distributed environment. The visionary promise of Aspect Oriented Programming (AOP) is to increase the overall quality of software design and implementation by reducing code scattering and code tangling while maintaining a high level of abstraction in enterprise application integration. The introduction of aspects substantially increases modularity and helps achieve a cleaner modularization of concerns. In this research, we propose introducing aspects into healthcare systems and show how AOP yields a cleaner design and substantial code savings in SOA-based healthcare interoperability by modularizing crosscutting concerns.
Usha Batra, Saurabh Mukherjee, Shelly Sachdeva, Pulkit Mehndiratta
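Python has no AspectJ-style pointcuts, but decorators give a rough analogue of "around" advice and show the modularization idea: logging and authorization live in reusable wrappers instead of being scattered through every service. This is an illustrative sketch, not the authors' implementation; `get_patient_record`, the role name, and the shape of the user dictionary are hypothetical.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("audit")


def logged(func):
    """Logging aspect: record entry and exit around any service function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("enter %s", func.__name__)
        result = func(*args, **kwargs)
        logger.info("exit %s", func.__name__)
        return result
    return wrapper


def authorized(role):
    """Authorization aspect: reject callers whose user record lacks `role`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.get("roles", ()):
                raise PermissionError(f"{user.get('name')} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@logged
@authorized("clinician")
def get_patient_record(user, patient_id):
    # Business logic only: no logging or access-control code is tangled in here.
    return {"patient_id": patient_id, "status": "ok"}


record = get_patient_record({"name": "alice", "roles": ["clinician"]}, "p-17")
try:
    get_patient_record({"name": "bob", "roles": []}, "p-17")
    denied = False
except PermissionError:
    denied = True
```

Adding the same concerns to another service is one decorator line per concern, which is the code-saving effect the abstract attributes to aspects.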

Enhancing Access to Standardized Clinical Application for Mobile Interfaces

As Electronic Health Records (EHRs) become more prevalent in health care, research is needed to understand the efficacy of a standard-based health application within clinical contexts. The current research explores the use of ‘Opereffa’ on handheld mobile devices. Opereffa stands for openEHR REFerence Framework and Application. It is a project for creating an open source clinical application, which will be driven by the Clinical Review Board of openEHR [2]. It is based on the openEHR standard, which combines the structure of archetypes with terminology codes. This is the first effort to explore it on mobile devices. The aim is to generate an application programming interface for Android-based mobile devices and to test it on a sample set of archetypes. Later, we will extend this research to other mobile operating systems. The study aims to increase the usability and reachability of EHRs. It enhances data sharing through mobile devices for standardized EHRs (through the use of archetypes).
Hem Jyotsana Parashar, Shelly Sachdeva, Shivani Batra

Networked Information Systems (NIS): Infrastructure

Conversation Quantization as a Foundation of Conversational Intelligence

The long-term goal of this research is to build artificial conversational intelligence that can set up or participate in fluent conversational interactions as well as people can, to their mutual benefit. This paper discusses conversation quanta as a foundation of conversational intelligence. In contrast to conversational systems that place much emphasis on symbolic processing and algorithms, our approach is data-intensive, allowing the conversational system to acquire depth and proficiency in interaction incrementally, in addition to broad coverage and robustness.
Toyoaki Nishida

Securing a B+tree for Use with Secret-Shared Databases

Information disclosures from databases may result not only from intrusions by external attackers but also from malicious actions by employees and even database administrators. A promising new approach to this problem is the use of secret-shared databases. In this approach, information is divided into unreadable snippets, and the snippets are stored in separate subdatabases, which makes it difficult for external and internal attackers to steal the original information. A secret-shared database is secure unless k or more database administrators collude, where k is a predefined threshold. Any query that is executable on a conventional database is executable on the corresponding secret-shared database. However, retrieving (i.e., selecting) a record from a secret-shared database takes O(m) time, where m is the number of records stored in the database. We used a B+tree, a standard data structure for efficiently retrieving data from conventional databases, to develop a secret-shared B+tree that enables data retrieval from secret-shared databases in O(log m) time while maintaining the security provided by secret sharing.
Yutaka Nishiwaki, Ryo Kato, Hiroshi Yoshiura
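The abstract does not name the sharing scheme, but the "secure unless k or more administrators collude" property is what a (k, n) threshold scheme such as Shamir's secret sharing provides. The sketch below assumes Shamir's scheme over a prime field: a value is split into n shares, one per subdatabase, any k of which reconstruct it by Lagrange interpolation, while fewer than k reveal nothing about it. It illustrates only the sharing layer; the secret-shared B+tree's search procedure is not reproduced, and the field modulus and function names are assumptions.

```python
import random

PRIME = 2**61 - 1  # assumed field modulus (a Mersenne prime); secrets must be smaller


def split(secret, k, n, rng=random.SystemRandom()):
    """Shamir (k, n) sharing: random degree-(k-1) polynomial f with f(0) = secret."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    # Share x goes to subdatabase x; x = 0 is never used because f(0) is the secret.
    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME); needs at least k shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME  # Python 3.8+
    return secret


shares = split(42, k=3, n=5)
print(reconstruct(shares[:3]), reconstruct(shares[2:]))  # prints: 42 42
```

Any three of the five shares recover the value, so losing up to two subdatabases is tolerated, and any two colluding administrators learn nothing.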

Mining Popular Places in a Geo-spatial Region Based on GPS Data Using Semantic Information

The increasing availability of Global Positioning System (GPS) enabled devices has created an opportunity to learn patterns of human behavior from GPS traces. This paper describes how to extract popular and significant places (locations) by analyzing the GPS traces of multiple users. In contrast to existing techniques, this approach takes into account the semantic aspects of places in order to find interesting places in a geo-spatial region. GPS traces of multiple users are used to mine the places that are frequently visited by multiple users, and semantic meanings, such as ‘historical monument’ or ‘traffic signal’, can further improve the ranking of popular places. The end result is a ranked list of popular places in a given geo-spatial region. This information can be useful for recommending interesting places to tourists, planning locations for advertisement hoardings, traffic planning, and so on.
Sunita Tiwari, Saroj Kaushik
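The abstract does not describe the clustering or the ranking function. As an assumed, simplified illustration, the sketch below snaps GPS points to a square grid, scores each cell by its number of distinct visitors, and multiplies that count by an optional per-cell semantic weight (for example, a boost for a cell containing a historical monument). The grid approach, the multiplicative weighting, and all names are hypothetical.

```python
from collections import defaultdict


def popular_places(points, cell_size=0.001, semantic_weight=None, top=3):
    """Rank grid cells by distinct visitors, scaled by an optional semantic weight.

    points: iterable of (user_id, lat, lon)
    semantic_weight: optional dict mapping a cell (row, col) to a multiplier
    """
    visitors = defaultdict(set)
    for user, lat, lon in points:
        cell = (int(lat // cell_size), int(lon // cell_size))
        visitors[cell].add(user)  # count each user at most once per cell
    semantic_weight = semantic_weight or {}
    scores = {cell: len(users) * semantic_weight.get(cell, 1.0)
              for cell, users in visitors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical traces; a coarse 1-degree grid keeps the example readable.
points = [("u1", 10.2, 20.3), ("u2", 10.7, 20.9), ("u1", 10.5, 20.5), ("u3", 11.5, 20.5)]
by_visits = popular_places(points, cell_size=1.0)
with_semantics = popular_places(points, cell_size=1.0, semantic_weight={(11, 20): 3.0})
```

Counting distinct users rather than raw points keeps one user who lingers in a cell from dominating the ranking.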

Scalable Method for k Optimal Meeting Points (k-OMP) Computation in the Road Network Databases

Given a set of points Q on a road network G = (V,E), an optimal meeting point (OMP) query returns the point on the road network with the smallest sum-of-distances (SoD) to all the points in Q. For example, a travel agency may issue an OMP query to decide where a tourist bus should pick up the tourists, thus minimizing their total travel cost. The OMP problem has been well studied in Euclidean space. The algorithms currently available for this problem on road networks are still not efficient enough for practical applications, and they are in-memory algorithms that do not guarantee scalability for large road databases. Further, most of the research has focused on single-point OMP; the k-OMP problem in the road network setting is still unexplored. In this paper, we propose multiple variants of scalable external-memory algorithms for computing the optimal meeting point. The proposed grid-based algorithms come in three main variants: Basic Grid based, Hierarchical Grid based and Greedy Centroid based OMP search. We then use the single-point OMP as a starting point to explore the k-point OMP using breadth-first search. The I/O-optimized spatial grids are loaded from secondary storage as and when required, so the I/O complexity is reduced to O(N/B), as opposed to O(N) in existing methods, where B is the average number of road vertices in a grid block. Extensive experiments are conducted on both real and synthetic datasets.
Shivendra Tiwari, Saroj Kaushik
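To make the sum-of-distances objective concrete, the sketch below computes it by brute force: one Dijkstra search per query point, then the k graph vertices with the smallest SoD. This is the simple in-memory baseline the abstract argues does not scale, not the proposed external-memory grid algorithms; restricting candidates to vertices and reading k-OMP as "the k best vertices" are simplifying assumptions.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source):
    """Shortest-path distances from `source` over non-negative edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def k_omp(edges, queries, k=1):
    """Return the k (SoD, vertex) pairs with the smallest sum of distances to `queries`."""
    adj = defaultdict(list)
    for u, v, w in edges:  # undirected road segments
        adj[u].append((v, w))
        adj[v].append((u, w))
    sod = defaultdict(float)
    reachable = set(adj)
    for q in queries:
        dist = dijkstra(adj, q)
        reachable &= dist.keys()  # keep vertices reachable from every query point
        for v, d in dist.items():
            sod[v] += d
    return heapq.nsmallest(k, ((sod[v], v) for v in reachable))


# A hypothetical path network A - B - C - D with unit-length segments.
edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0)]
best = k_omp(edges, ["A", "D", "B"], k=2)
print(best)  # prints: [(3.0, 'B'), (4.0, 'A')]
```

Each query point triggers a full-graph search, so the cost grows with |Q| times the network size; this is the kind of overhead the paper's grid partitioning and on-demand block loading are meant to reduce.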

Skyline Queries for Sets of Spatial Objects by Utilizing Surrounding Environments

A skyline query finds the objects that are not dominated by any other object in a given set of objects. Skyline queries help us filter out unnecessary information efficiently and provide clues for various decision-making tasks. However, conventional skyline query algorithms do not consider surrounding environments in skyline computations, even though surrounding environments are as important as other attributes. Moreover, they cannot protect individuals’ privacy and are not well suited for group choices.
Considering the above facts, in this paper we consider skyline queries for sets of spatial objects that take surrounding environments into account. Our proposed method can retrieve sets of spatial objects without disclosing individual records’ values. We provide an extensive experimental evaluation of computational cost to show the effectiveness of our approach.
Mohammad Shamsul Arefin, Yasuhiko Morimoto
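The dominance relation behind skyline queries can be sketched in a few lines. Below, smaller values are better in every attribute, and a surrounding-environment factor (here a hypothetical noise level) is simply treated as one more attribute. The set-level variant summarizes each candidate set by summing its members' attributes; that aggregate is an assumption for illustration, and the paper's privacy-preserving computation is not reproduced.

```python
from itertools import combinations


def dominates(a, b):
    """a dominates b if it is no worse in every attribute and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(objects):
    """Naive O(n^2) skyline: keep each object that no other object dominates."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]


def set_skyline(objects, size):
    """Skyline over all `size`-element sets, each summarized by summed attributes."""
    candidates = {s: tuple(map(sum, zip(*s))) for s in combinations(objects, size)}
    return [s for s, agg in candidates.items()
            if not any(dominates(other, agg) for other in candidates.values())]


# Hypothetical hotels: (price, distance to station, surrounding noise level).
hotels = [(100, 5, 3), (80, 7, 2), (120, 4, 1), (130, 6, 4)]
best_hotels = skyline(hotels)
```

Here (130, 6, 4) drops out because (100, 5, 3) is no worse on every attribute and better on at least one; the remaining hotels each win on some trade-off.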
