2022 | OriginalPaper | Chapter

3. Denial-Constraint-Based Truth Discovery for Isomorphic Data

Authors: Chen Ye, Hongzhi Wang, Guojun Dai

Published in: Knowledge Discovery from Multi-Sourced Data

Publisher: Springer Nature Singapore

Abstract

Aggregating accurate information from conflicting multi-source data is crucial. A common approach to this problem is voting or averaging. However, such methods often fail to produce correct results because they assume that all sources are equally reliable. In practice, information quality varies widely across sources, owing to different levels of recording errors, outdated data, and even intentional errors in each source. Based on this observation, a research topic called truth discovery has been proposed. Because relations among entities and attributes are common in real-world applications, this chapter introduces the constrained truth discovery problem [1]. We incorporate denial constraints, a universally quantified first-order logic formalism that can express a large number of effective and widely occurring relations among entities, into the truth discovery process. Specifically, we give a motivating example and define the problem in Sects. 3.1 and 3.2, respectively. In Sect. 3.3, we investigate the constrained optimization problem and provide solutions to it. Finally, we conclude the chapter in Sect. 3.4.
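
To make the motivation concrete, here is a minimal sketch of why per-value voting can clash with a relation among entities. It is not the chapter's algorithm. The entities, attributes, claimed values, and the constraint itself are hypothetical and chosen only for illustration. The assumed denial constraint says that no entity with a higher level may have a lower salary than another entity: not (t1.level > t2.level and t1.salary < t2.salary).

```python
from collections import Counter

# Hypothetical claims: (entity, attribute) -> values reported by three sources.
# All names and numbers are invented for illustration.
observations = {
    ("alice", "level"): [3, 3, 3],
    ("alice", "salary"): [50000, 50000, 70000],
    ("bob", "level"): [2, 2, 2],
    ("bob", "salary"): [60000, 60000, 40000],
}


def majority_vote(values):
    """Return the most frequently claimed value, treating all sources equally."""
    return Counter(values).most_common(1)[0][0]


def violates_dc(t1, t2):
    """Assumed denial constraint over two entities:
    not (t1.level > t2.level and t1.salary < t2.salary).
    Returns True when the forbidden combination occurs."""
    return t1["level"] > t2["level"] and t1["salary"] < t2["salary"]


# Resolve each (entity, attribute) cell independently by voting.
voted = {}
for (entity, attribute), values in observations.items():
    voted.setdefault(entity, {})[attribute] = majority_vote(values)

# Check whether the independently voted values are jointly consistent.
conflict = violates_dc(voted["alice"], voted["bob"])
print(voted)
print("violates constraint:", conflict)
```

In this toy input each cell has a clear majority, yet the voted values for the two entities together contradict the assumed constraint. That is the kind of inconsistency that motivates taking constraints into account during truth discovery rather than resolving each value in isolation.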

Footnotes
1
The information is obtained according to the US federal salary minimum for exemption.
 
2
Note that in this chapter, we are only interested in DCs with at most two entities. DCs involving more entities are less likely in real life and incur larger predicate space to find the truths [13].
 
Literature
1. Ye, C., Wang, H., Zheng, K., Kong, Y.K., Zhu, R., Gao, J., Li, J.: Constrained truth discovery. IEEE Trans. Knowl. Data Eng. 34(1), 205–218 (2022)
2. Yin, X., Han, J., Yu, P.S.: Truth discovery with multiple conflicting information providers on the web. IEEE Trans. Knowl. Data Eng. 20(6), 796–808 (2008)
3. Li, Y., Gao, J., Meng, C., Li, Q., Su, L., Zhao, B., Fan, W., Han, J.: A survey on truth discovery. SIGKDD Explor. 17(2), 1–16 (2015)
4. Dong, X.L., Berti-Equille, L., Srivastava, D.: Truth discovery and copying detection in a dynamic world. Proc. VLDB Endow. 2(1), 562–573 (2009)
5. Galland, A., Abiteboul, S., Marian, A., Senellart, P.: Corroborating information from disagreeing views. In: Proceedings of the 3rd International Conference on Web Search and Web Data Mining, WSDM 2010, New York, Feb 4–6, pp. 131–140 (2010)
6. Li, Q., Li, Y., Gao, J., Su, L., Zhao, B., Demirbas, M., Fan, W., Han, J.: A confidence-aware approach for truth discovery on long-tail data. Proc. VLDB Endow. 8(4), 425–436 (2014)
7. Li, X., Dong, X.L., Lyons, K.B., Meng, W., Srivastava, D.: Truth finding on the deep web: is the problem solved? Proc. VLDB Endow. (2013)
8. Rekatsinas, T., Joglekar, M., Garcia-Molina, H., Parameswaran, A.G., Ré, C.: SLiMFast: guaranteed results for data fusion and source reliability. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, May 14–19, pp. 1399–1414 (2017)
9. Xiao, H., Gao, J., Li, Q., Ma, F., Su, L., Feng, Y., Zhang, A.: Towards confidence in the truth: a bootstrapping based truth discovery approach. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13–17, pp. 1935–1944 (2016)
10. Zhao, B., Han, J.: A probabilistic model for estimating real-valued truth from conflicting sources. In: International Workshop on Quality in Databases (2012)
11. Zhao, B., Rubinstein, B.I.P., Gemmell, J., Han, J.: A Bayesian approach to discovering truth from conflicting sources for data integration. Proc. VLDB Endow. 5(6), 550–561 (2012)
12. Li, Q., Li, Y., Gao, J., Zhao, B., Fan, W., Han, J.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, Snowbird, June 22–27, pp. 1187–1198 (2014)
13. Chu, X., Ilyas, I.F., Papotti, P.: Discovering denial constraints. Proc. VLDB Endow. 6(13), 1498–1509 (2013)
14. Garey, M.R., Johnson, D.S.: Computers and Intractability. W. H. Freeman (1979)
15. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)
16. Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning. Oper. Res. 39(3), 378–406
17. Andersen, M., Dahl, J., Liu, Z., Vandenberghe, L.: Interior-point methods for large-scale cone programming. Optim. Mach. Learn. 5583, 55–83 (2012)
18. Kozlov, M.K., Tarasov, S.P., Khachiyan, L.G.: The polynomial solvability of convex quadratic programming. USSR Comput. Math. Math. Phys. 20(5), 223–228 (1980)
19. Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 33(2), 6:1–6:48 (2008)
20. Bleifuß, T., Kruse, S., Naumann, F.: Efficient denial constraint discovery with Hydra. Proc. VLDB Endow. 11(3), 311–323 (2017)
21. Lin, X., Chen, L.: Domain-aware multi-truth discovery from conflicting sources. PVLDB 11(5), 635–647 (2018)
22. Li, Y., Li, Q., Gao, J., Su, L., Zhao, B., Fan, W., Han, J.: On the discovery of evolving truth. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Aug 10–13, pp. 675–684 (2015)

Metadata
Title: Denial-Constraint-Based Truth Discovery for Isomorphic Data
Authors: Chen Ye, Hongzhi Wang, Guojun Dai
Copyright Year: 2022
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-19-1879-7_3
