2015 | Original Paper | Book Chapter
Repairing Functional Dependency Violations in Distributed Data
Authors: Qing Chen, Zijing Tan, Chu He, Chaofeng Sha, Wei Wang
Published in: Database Systems for Advanced Applications
One of the problems central to data consistency is data repairing. Given a database $D$ that violates a set $\Sigma$ of data dependencies serving as data quality rules, the goal is to modify $D$ into a new relation $D'$ that satisfies $\Sigma$. When $D$
is a centralized database, a host of methods have been proposed to address this problem. In practice, however, a database may be fragmented and distributed across multiple sites, an approach advocated in distributed systems for better scalability and readily supported by commercial systems. This paper makes a first effort to develop techniques for repairing functional dependency violations in a horizontally partitioned database. (1) Based on a message-passing distributed computing model and two complexity measures for distributed algorithms (parallel time and data shipment), we study data repairing with equivalence classes in the distributed setting. We show that it is NP-complete to build equivalence classes when the data is horizontally partitioned and the aim is to minimize either data shipment or parallel computation time. (2) Despite this intractability, we propose efficient distributed algorithms and optimization techniques for data repairing based on equivalence classes. (3) We experimentally verify the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.
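To make the idea of equivalence-class repair concrete, the following is a minimal sketch and not the algorithm proposed in the chapter. It assumes a single functional dependency `lhs -> rhs`, simulates the horizontal fragments as in-memory lists on one machine (so it ignores data shipment and parallel time entirely), and resolves each class by majority vote, which is only one possible cost heuristic. The function name `repair_fd`, the fragment layout, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def repair_fd(fragments, lhs, rhs):
    """Repair violations of the FD lhs -> rhs across horizontal fragments.

    fragments: list of fragments, each a list of dict tuples (assumed layout).
    lhs: tuple of attribute names; rhs: a single attribute name.
    Returns repaired copies; the input fragments are not modified.
    """
    # Tuples that agree on lhs must agree on rhs, so their rhs cells
    # form one equivalence class, regardless of which site holds them.
    classes = defaultdict(list)
    for site, fragment in enumerate(fragments):
        for idx, row in enumerate(fragment):
            key = tuple(row[a] for a in lhs)
            classes[key].append((site, idx))

    repaired = [[dict(row) for row in fragment] for fragment in fragments]
    for cells in classes.values():
        values = Counter(fragments[s][i][rhs] for s, i in cells)
        if len(values) > 1:  # the class contains a violation
            # Assumption: pick the most frequent value as the repair target.
            target = values.most_common(1)[0][0]
            for s, i in cells:
                repaired[s][i][rhs] = target
    return repaired


# Two hypothetical sites holding fragments of the same relation.
site1 = [{"zip": "10001", "city": "NYC"}, {"zip": "10001", "city": "NYC"}]
site2 = [{"zip": "10001", "city": "New York"}, {"zip": "60601", "city": "Chicago"}]
repaired = repair_fd([site1, site2], ("zip",), "city")
```

In this example the violation only becomes visible once tuples from both sites are compared, which is why a distributed setting has to decide which tuples to ship where.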