Published in: World Wide Web 4/2019

22.08.2018

SmartVote: a full-fledged graph-based model for multi-valued truth discovery

Authors: Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Dianhui Chu, Anne H. H. Ngu


Abstract

In the era of Big Data, truth discovery has emerged as a fundamental research topic: it estimates data veracity by determining the reliability of multiple, often conflicting data sources. Although considerable research effort has been devoted to this topic, most current approaches assume that each object has only one true value. In reality, objects with multiple true values are common, and existing approaches that handle multi-valued objects still lack accuracy. In this paper, we propose a full-fledged graph-based model, SmartVote, which models two types of source relations with additional quantification to precisely estimate source reliability for effective multi-valued truth discovery. Two graphs are constructed and then used to derive different aspects of source reliability (i.e., positive precision and negative precision) via random walk computations. To improve the accuracy of truth discovery, our model incorporates four important implications: two types of source relations, object popularity, loose mutual exclusion, and the long-tail phenomenon in source coverage. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.
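As a rough illustration of the random-walk step mentioned in the abstract, the following Python sketch computes the stationary distribution of a damped random walk over a weighted source graph. It is a generic PageRank-style power iteration, not SmartVote's exact formulation: the edge weights, the uniform smoothing jump (loosely echoing the smoothing links mentioned in footnote 3), the damping factor of 0.85, and the function name `random_walk_scores` are all assumptions made for illustration. Under these assumptions, one could run the same routine once per graph to obtain two separate score vectors.

```python
from typing import List


def random_walk_scores(weights: List[List[float]], damping: float = 0.85,
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Stationary distribution of a damped random walk over a source graph.

    Hypothetical sketch: weights[i][j] is an assumed non-negative weight on the
    edge from source i to source j. With probability `damping` the walker
    follows an edge; otherwise it jumps to a uniformly random source (a
    PageRank-style smoothing jump), so the walk stays well defined even when
    two sources share no values.
    """
    n = len(weights)
    # Row-normalize weights into transition probabilities; a source with no
    # outgoing weight (dangling node) jumps uniformly at random instead.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    scores = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleportation mass plus mass flowing along weighted edges.
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new[j] += damping * scores[i] * trans[i][j]
        # Stop once successive iterates differ by less than tol (L1 norm).
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores
```

For example, with three sources where the first two share many values and the third shares few, `random_walk_scores([[0, 3, 1], [3, 0, 1], [1, 1, 0]])` assigns equal scores to the first two sources and a lower score (about 0.216) to the third. The uniform jump is the simplest smoothing choice; a non-uniform jump vector would be an equally valid variant.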


Footnotes
1
In this paper, we focus on the parent-child relation in the dataset because it corresponds to multi-valued objects.
 
2
Note that this probability is conditioned on the prior knowledge that s1 and s2 each provide a true value, which differs from the probability that two sources s1 and s2 independently provide the same true value.
 
3
Here we neglect the smoothing links: if two sources share no common value, there is no link between them in the graphs.
 
4
We neglect the confidence scores of each source and omit the dependence score normalization step in this example.
 
6
Such values are then normalized to represent probabilities.
 
7
For Voting, we predict the number of true values as the number with the highest vote count.
 
8
Note that these categories overlap. For example, Investment belongs to both the Web-link-based methods and the iterative methods.
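Footnote 7 describes how the Voting baseline is extended to multi-valued objects. One plausible reading, sketched below in Python, is a two-step majority vote: sources first vote on how many values the object has, and the most-voted values are then kept up to that count. The function name `vote_multi_truth`, the input format (a mapping from source to its set of claimed values), and the tie-breaking rule are assumptions; the baseline used in the paper may differ in its details.

```python
from collections import Counter
from typing import Dict, List, Set


def vote_multi_truth(claims: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical multi-valued Voting baseline for a single object.

    claims maps each source to the set of values it provides for the object.
    Step 1: each source votes for the number of values it lists; the count
    with the most votes becomes the predicted number of true values k.
    Step 2: each source votes for every value it lists; the k values with the
    highest vote counts are returned. Ties keep first-seen order (assumption).
    """
    if not claims:
        return []
    # Step 1: predict the number of true values by majority vote on set sizes.
    size_votes = Counter(len(values) for values in claims.values())
    k = size_votes.most_common(1)[0][0]
    # Step 2: count votes per value (sorted for deterministic tie-breaking).
    value_votes = Counter(v for values in claims.values() for v in sorted(values))
    return [value for value, _ in value_votes.most_common(k)]
```

For instance, if four sources list the children of the same (hypothetical) person as {Anna, Ben}, {Anna, Ben}, {Anna}, and {Anna, Ben, Carl}, the most common list length is 2, so the sketch returns `["Anna", "Ben"]`.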
 
Metadata
Title
SmartVote: a full-fledged graph-based model for multi-valued truth discovery
Authors
Xiu Susie Fang
Quan Z. Sheng
Xianzhi Wang
Dianhui Chu
Anne H. H. Ngu
Publication date
22.08.2018
Publisher
Springer US
Published in
World Wide Web / Issue 4/2019
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-018-0629-3
