2022 | Book

Software Foundations for Data Interoperability

5th International Workshop, SFDI 2021, Copenhagen, Denmark, August 16, 2021, Revised Selected Papers

About this book

This book constitutes selected papers presented at the 5th International Workshop on Software Foundations for Data Interoperability, SFDI 2021, held in Copenhagen, Denmark, in August 2021.
The 4 full papers and 1 short paper were thoroughly reviewed and selected from 8 submissions. They discuss research and development in software foundations for data interoperability as well as applications in real-world systems such as data markets.

Table of Contents

Frontmatter

Invited Papers

Frontmatter
Theory and Practice of Networks of Models
Abstract
Separating concerns into multiple data sources, such as multiple models of a software system under development, enables people to work in parallel. However, concerns must also be re-integrated, and this gives rise to many interesting problems, both theoretical and practical. In my keynote talk I will discuss some of them; in this accompanying paper I give some background to my work in recent years and summarise some key points and definitions.
Perdita Stevens
Bidirectional Collaborative Frameworks for Decentralized Data Management
Abstract
Along with the continuous evolution of data management systems for the new market requirements, we are moving from centralized systems towards decentralized systems, where data are maintained in different sites with autonomous storage and computation capabilities. There are two fundamental issues with such decentralized systems: local privacy and global consistency. By local privacy, the data owner wishes to control what information should be exposed and how it should be used or updated by other peers. By global consistency, the systems wish to have a globally consistent and integrated view of all data. In this paper, we report the progress of our BISCUITS (Bidirectional Information Systems for Collaborative, Updatable, Interoperable, and Trusted Sharing) project that attempts to systematically solve these two issues in distributed systems. We present a new bidirectional transformation-based approach to control and share distributed data, propose several distributed architectures for data integration via bidirectional updatable views, and demonstrate the applications of these architectures in ride-sharing alliances and gig job sites.
Yasuhito Asano, Yang Cao, Soichiro Hidaka, Zhenjiang Hu, Yasunori Ishihara, Hiroyuki Kato, Keisuke Nakano, Makoto Onizuka, Yuya Sasaki, Toshiyuki Shimizu, Masato Takeichi, Chuan Xiao, Masatoshi Yoshikawa
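To make the idea of a bidirectional updatable view in the abstract above more concrete, here is a minimal sketch in Python. It is not the BISCUITS implementation: the record layout, the private `phone` field, and the `get`/`put` function names are all assumptions made for illustration. The forward function exposes only part of the source (local privacy), and the backward function propagates view edits into the source while preserving the hidden data. The checks at the end correspond to the standard GetPut and PutGet round-tripping laws from the bidirectional transformation literature.

```python
# Hypothetical source: records keyed by id, where "phone" is private to the owner.

def get(source):
    """Forward transformation: expose only the 'city' field of each record."""
    return {rid: {"city": rec["city"]} for rid, rec in source.items()}

def put(source, view):
    """Backward transformation: apply view edits, keeping hidden fields.

    Records missing from the view are deleted from the source; records
    that are new in the view get an empty (None) hidden field.
    """
    updated = {}
    for rid, vrec in view.items():
        hidden = source.get(rid, {}).get("phone")
        updated[rid] = {"city": vrec["city"], "phone": hidden}
    return updated

source = {
    1: {"city": "Osaka", "phone": "555-0100"},
    2: {"city": "Tokyo", "phone": "555-0101"},
}

# A peer edits the shared view without ever seeing the phone numbers.
view = get(source)
view[1] = {"city": "Kyoto"}
new_source = put(source, view)

# GetPut: writing back an unmodified view leaves the source unchanged.
assert put(source, get(source)) == source
# PutGet: after a write-back, the view reflects exactly what was written.
assert get(new_source) == view
```

In this sketch the hidden field survives the update because `put` consults the old source, which is what lets the owner share an editable view without exposing everything.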

Contributed Papers

Frontmatter
Robust Cardinality Estimator by Non-autoregressive Model
Abstract
In database systems, cardinality estimation is a fundamental technology that significantly impacts query performance. Recently, machine-learning techniques that learn dependencies among attributes have been employed for cardinality estimation. However, their estimation accuracy is unstable and their inference is slow. In this paper, we propose a stable and fast cardinality estimation method that learns dependencies among attributes with a non-autoregressive model and, at the inference phase, performs estimation in fewer steps and in a proper order according to the given query.
Ryuichi Ito, Chuan Xiao, Makoto Onizuka
Conflict Resolution for Data Updates by Multiple Bidirectional Transformations
Abstract
Bidirectional transformation (BX) is a robust mechanism to propagate changes of data across the transformation while maintaining consistency between two or more data sources. Recently, systems that coordinate multiple BX have been proposed. However, conflicts that arise when multiple BX update the same source have not yet been well studied. In this paper, we propose a new conflict resolution method for BX based on an algorithm of Operational Transformation (OT). We apply the algorithm of OT to the backward transformation of the tree duplication primitive \(\mathrm{Dup}\) of an existing BX language X proposed by Hu et al. By embedding X into Inv, which is capable of maintaining structured documents such as XML through BX, X has been shown to allow more flexible operations through Inv's bidirectionality-satisfying operations. Using this mechanism, we propose a new backward transformation function \(mput\) that can resolve conflicts between updates on two views. Our implementation of \(mput\) can simultaneously satisfy the properties GETPUTGET and PUTGETPUT, which are the round-tripping properties inherited from X, and TP1, which is a confluence property inherited from OT for server-client environments.
Mikiya Habu, Soichiro Hidaka
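As a rough illustration of the TP1 confluence property mentioned in the abstract above, the following Python sketch uses the textbook OT setting of concurrent character insertions into a string. It is not the paper's tree-based \(mput\): the `Insert` type, the `site` tie-breaker, and the function names are assumptions introduced only for this example. TP1 requires that applying one operation followed by the transformed other gives the same document regardless of which operation goes first.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int    # index at which the text is inserted
    text: str   # inserted text
    site: int   # hypothetical site id, used only to break ties

def apply(doc, op):
    """Apply an insert operation to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has been applied."""
    if op.pos < against.pos or (op.pos == against.pos and op.site < against.site):
        return op  # `against` landed to the right of op, so no shift is needed
    return Insert(op.pos + len(against.text), op.text, op.site)

def both_orders(doc, a, b):
    """Return the documents obtained by applying a-then-b' and b-then-a'."""
    a_first = apply(apply(doc, a), transform(b, a))
    b_first = apply(apply(doc, b), transform(a, b))
    return a_first, b_first

# Two clients insert at the same position concurrently.
a = Insert(pos=1, text="X", site=1)
b = Insert(pos=1, text="Y", site=2)
left, right = both_orders("abc", a, b)
assert left == right  # TP1: both application orders converge
```

The site id is needed because two inserts at the same position would otherwise be ordered differently depending on which one arrives first, and the two replicas would diverge.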
Entity Matching with String Transformation and Similarity-Based Features
Abstract
Entity matching, the task of determining whether two records refer to the same real-world entity, is an important part of common data cleaning and data integration problems. Many studies use string similarity as a feature for entity matching, but the power of similarity may be weakened by hard-to-classify pairs: pairs that are actually different entities but have a high similarity, or the same entity with a low similarity. String transformation is a good way to reconcile different representations between two domains of datasets, such as abbreviations, misspellings, and other expressions.
In this paper, we propose two powerful features, similarity gain and dissimilarity gain, that enable us to discriminate whether two records refer to the same entity after string transformation. The similarity gain is defined as the maximum similarity increase among the variations in similarity before and after applying string transformations. The dissimilarity gain is defined as the maximum similarity decrease. Moreover, the similarity gain and dissimilarity gain can also be used for selecting valuable samples under a limited labeling budget. In our experiments, our method with the proposed features improves the best accuracy in most cases.
Kazunori Sakai, Yuyang Dong, Masafumi Oyamada, Kunihiro Takeoka, Takeshi Okadome
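The definitions of similarity gain and dissimilarity gain in the abstract above can be sketched as follows. This is only one possible reading, not the authors' implementation: the similarity measure (`difflib.SequenceMatcher.ratio`), the two transformations, the abbreviation table, and the choice to apply each transformation to both strings are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table used by one of the transformations.
ABBREVIATIONS = {"corp": "corporation", "intl": "international"}

def similarity(a, b):
    """String similarity in [0, 1]; SequenceMatcher is an assumed stand-in."""
    return SequenceMatcher(None, a, b).ratio()

def lowercase(s):
    return s.lower()

def expand_abbreviations(s):
    return " ".join(ABBREVIATIONS.get(word, word) for word in s.split())

TRANSFORMATIONS = [lowercase, expand_abbreviations]

def gains(a, b, transformations=TRANSFORMATIONS):
    """Return (similarity gain, dissimilarity gain) for a pair of strings.

    Each transformation is applied to both strings, and the change in
    similarity relative to the untransformed pair is recorded.
    """
    base = similarity(a, b)
    deltas = [similarity(t(a), t(b)) - base for t in transformations]
    similarity_gain = max(0.0, max(deltas))       # largest increase
    dissimilarity_gain = max(0.0, -min(deltas))   # largest decrease
    return similarity_gain, dissimilarity_gain

# Same entity written differently: lowercasing makes the strings identical.
sim_gain, _ = gains("ACME Corp", "acme corp")

# Different entities that look alike: expanding "corp" pulls them apart.
_, dis_gain = gains("corp", "corn")
```

Under these assumptions, a large similarity gain suggests that a low raw similarity was caused by surface variation, while a large dissimilarity gain suggests that a high raw similarity was accidental.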
Towards Automatic Synthesis of View Update Programs on Relations
Abstract
Automatic synthesis of bidirectional programs on relations has not been well studied yet. As an attempt to solve the problem, we propose an approach to synthesizing view update strategies on relations written in Datalog from examples and data schemes. Our approach has been implemented and used to successfully synthesize various view update tasks on relations.
Bach Nguyen Trong, Zhenjiang Hu
Adaptive SQL Query Optimization in Distributed Stream Processing: A Preliminary Study
Abstract
Distributed stream processing is widely adopted for real-time data analysis and management. SQL is becoming a common language for robust streaming analysis due to the introduction of time-varying relations and event time semantics. However, query optimization in state-of-the-art stream processing engines (SPEs) remains limited: runtime adjustments to execution plans are not applied. This fact restricts the optimization capabilities because SPEs lack the statistical data properties before query execution begins. Moreover, streaming queries are often long-lived, and these properties can change over time.
Adaptive optimization, used in databases for queries with insufficient or unknown data statistics, can fit the streaming scenario. In this work, we explore the main challenges that SPEs face during the adjustment of adaptive optimization, such as predicting statistical properties of streams and execution graph migration. We demonstrate potential performance gains of our approach within an extension of the NEXMark streaming benchmark and outline our further work.
Darya Sharkova, Alexander Chernokoz, Artem Trofimov, Nikita Sokolov, Ekaterina Gorshkova, Igor Kuralenok, Boris Novikov
Backmatter
Metadata
Title
Software Foundations for Data Interoperability
Edited by
Dr. George Fletcher
Keisuke Nakano
Dr. Yuya Sasaki
Copyright Year
2022
Electronic ISBN
978-3-030-93849-9
Print ISBN
978-3-030-93848-2
DOI
https://doi.org/10.1007/978-3-030-93849-9