2022 | OriginalPaper | Chapter

2. Functional-Dependency-Based Truth Discovery for Isomorphic Data

Authors: Chen Ye, Hongzhi Wang, Guojun Dai

Published in: Knowledge Discovery from Multi-Sourced Data

Publisher: Springer Nature Singapore

Abstract

Errors in databases are unavoidable. Causes include recording errors, stale data, and even intentionally introduced errors. Such mistakes may have serious consequences, yet correcting them manually is infeasible at scale; in fact, it is hard for people even to detect them. However, because errors tend to occur more or less at random, they often cause inconsistencies within a database and conflicts among multiple databases from different sources. These inconsistencies and conflicts are easy to detect but hard to repair. In this chapter, we first discuss two lines of work that deal with these inconsistencies and conflicts, namely data repairing and truth discovery. We then introduce the idea of conducting functional-dependency-based truth discovery over multi-source data [1], which combines the advantages of data repairing and truth discovery. Specifically, Sect. 2.1 discusses how existing methods resolve conflicts and inconsistencies and motivates our approach. Section 2.2 defines the functional-dependency-based truth discovery problem, i.e., the multi-source data repairing problem. Section 2.3 describes the overall framework and the details of each of its components, followed by a brief summary in Sect. 2.4.
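To make the claim that such conflicts are easy to detect concrete, the following is a minimal sketch in Python. It is not the chapter's method. It assumes a hypothetical functional dependency zip → city, a hypothetical helper `fd_violations`, and made-up records from three sources `s1` to `s3`. It only reports which left-hand-side values map to more than one right-hand-side value; deciding which value is the true one is the hard repairing step that the chapter addresses.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the lhs keys that are associated with more than one rhs value.

    rows: list of dicts (records pooled from all sources)
    lhs:  list of attribute names on the left-hand side of the dependency
    rhs:  attribute name on the right-hand side of the dependency
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key].add(row[rhs])
    # A key violates lhs -> rhs when it maps to several distinct rhs values.
    return {key: values for key, values in groups.items() if len(values) > 1}


# Hypothetical records describing the same entities, reported by three sources.
rows = [
    {"source": "s1", "zip": "10001", "city": "New York"},
    {"source": "s2", "zip": "10001", "city": "New York"},
    {"source": "s3", "zip": "10001", "city": "Newark"},
    {"source": "s1", "zip": "60601", "city": "Chicago"},
]

# Assumed dependency for illustration only: zip -> city.
violations = fd_violations(rows, lhs=["zip"], rhs="city")
print(violations)
```

Detection here is a single grouping pass over the pooled records. The sketch deliberately stops short of choosing between the conflicting values, since that choice depends on source reliability and the constraints, which is where repairing and truth discovery come in.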

Footnotes
1
In this chapter, the constraints we consider are functional dependencies. However, other types of constraints can also be adopted.
 
Literature
1. Ye, C., Li, Q., Zhang, H., Wang, H., Gao, J., Li, J.: AutoRepair: an automatic repairing approach over multi-source data. Knowl. Inf. Syst. 61(1), 227–257 (2019)
2. Beskales, G., Ilyas, I.F., Golab, L.: Sampling the repairs of functional dependency violations under hard constraints. Proc. VLDB Endow. 3(1–2), 197–207 (2010)
3. Bohannon, P., Flaster, M., Fan, W., Rastogi, R.: A cost-based model and effective heuristic for repairing constraints by value modification. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, June 14–16, pp. 143–154 (2005)
4. Kolahi, S., Lakshmanan, L.V.S.: On approximating optimum repairs for functional dependency violations. In: Proceedings of the 12th International Conference on Database Theory, ICDT, March 23–25, pp. 53–62 (2009)
5. Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 33(2), 6:1–6:48 (2008)
6. Fan, W., Li, J., Ma, S., Tang, N., Yu, W.: Towards certain fixes with editing rules and master data. Proc. VLDB Endow. 3(1–2), 173–184 (2010)
7. Chiang, F., Miller, R.J.: A unified model for data and constraint repair. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11–16, Hannover, pp. 446–457 (2011)
8. Dallachiesa, M., Ebaid, A., Eldawy, A., Elmagarmid, A.K., Ilyas, I.F., Ouzzani, M., Tang, N.: NADEEF: a commodity data cleaning system. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, June 22–27, pp. 541–552 (2013)
9. Geerts, F., Mecca, G., Papotti, P., Santoro, D.: The LLUNATIC data-cleaning framework. Proc. VLDB Endow. 6(9), 625–636 (2013)
10. Rekatsinas, T., Chu, X., Ilyas, I.F., Ré, C.: HoloClean: holistic data repairs with probabilistic inference. Proc. VLDB Endow. 10(11), 1190–1201 (2017)
11. Yin, X., Han, J., Yu, P.S.: Truth discovery with multiple conflicting information providers on the web. IEEE Trans. Knowl. Data Eng. 20(6), 796–808 (2008)
12. Dong, X.L., Berti-Equille, L., Srivastava, D.: Truth discovery and copying detection in a dynamic world. Proc. VLDB Endow. 2(1), 562–573 (2009)
13. Li, Q., Li, Y., Gao, J., Zhao, B., Fan, W., Han, J.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, Snowbird, June 22–27, pp. 1187–1198 (2014)
14. Li, Y., Li, Q., Gao, J., Su, L., Zhao, B., Fan, W., Han, J.: On the discovery of evolving truth. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Aug 10–13, pp. 675–684 (2015)
15. Zhao, B., Rubinstein, B.I.P., Gemmell, J., Han, J.: A Bayesian approach to discovering truth from conflicting sources for data integration. Proc. VLDB Endow. 5(6), 550–561 (2012)
16. Pasternack, J., Roth, D.: Making better informed trust decisions with generalized fact-finding. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, Barcelona, July 16–22, pp. 2324–2329 (2011)
17. Li, Q., Li, Y., Gao, J., Su, L., Zhao, B., Demirbas, M., Fan, W., Han, J.: A confidence-aware approach for truth discovery on long-tail data. Proc. VLDB Endow. 8(4), 425–436 (2014)
DOI
https://doi.org/10.1007/978-981-19-1879-7_2
