
2016 | OriginalPaper | Chapter

An Ensemble Approach for Better Truth Discovery

Authors : Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing


Abstract

Truth discovery is a hot research topic in the Big Data era. Its goal is to identify the true values among the conflicting data that multiple sources provide for the same data items. Many methods have been proposed to tackle this problem. However, none of the existing methods is a clear winner that consistently outperforms the others, owing to the varied characteristics of the different methods. Moreover, in some cases an improved method may not even beat its original version, because of the bias introduced by limited ground truths or by the differing features of the datasets to which it is applied. To achieve better and more robust overall performance, we propose to fully leverage the advantages of existing methods by extracting truth from the prediction results of these existing truth discovery methods. In particular, we first distinguish between the single-truth and multi-truth discovery problems and formally define the ensemble truth discovery problem. We then analyze the feasibility of the ensemble approach and derive two models, a serial model and a parallel model, to implement the approach and to tackle both types of truth discovery problems. Extensive experiments over three large real-world datasets and various synthetic datasets demonstrate the effectiveness of our approach.
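The abstract does not spell out how the serial and parallel models combine the base methods' outputs. As a point of reference only, the following minimal Python sketch shows the simplest form of ensembling truth discovery results: unweighted voting over the predictions of several base methods, with a majority threshold for the multi-truth case and a plurality vote for the single-truth case. The function names, the input format, the 0.5 threshold, and the tie-breaking rule are assumptions for illustration; this is not the authors' serial or parallel model.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set

# Hypothetical input format (an assumption, not the paper's): each base truth
# discovery method yields a mapping from every object (data item) to the set
# of values it predicts as true for that object.
MultiPrediction = Dict[Hashable, Set[Hashable]]
# Single-truth variant: each base method predicts exactly one value per object.
SinglePrediction = Dict[Hashable, Hashable]


def ensemble_multi_truth(predictions: List[MultiPrediction],
                         threshold: float = 0.5) -> MultiPrediction:
    """Keep a value when more than `threshold` of the base methods predict it as true.

    Illustrative majority-vote ensemble only; not the paper's parallel model.
    """
    n_methods = len(predictions)
    objects = set().union(*(p.keys() for p in predictions))
    result: MultiPrediction = {}
    for obj in objects:
        # Count how many base methods predict each candidate value of obj.
        votes = Counter(v for p in predictions for v in p.get(obj, set()))
        result[obj] = {v for v, c in votes.items() if c / n_methods > threshold}
    return result


def ensemble_single_truth(predictions: List[SinglePrediction]) -> SinglePrediction:
    """Pick, per object, the value predicted by the most base methods (plurality vote)."""
    objects = set().union(*(p.keys() for p in predictions))
    result: SinglePrediction = {}
    for obj in objects:
        votes = Counter(p[obj] for p in predictions if obj in p)
        # Ties are broken by the smallest string form of the value (an arbitrary choice).
        result[obj] = min(votes, key=lambda v: (-votes[v], str(v)))
    return result
```

For example, if three hypothetical base methods predict the author sets {Alice, Bob}, {Alice}, and {Alice, Carol} for one book, `ensemble_multi_truth` keeps only Alice, since Bob and Carol each receive one vote out of three. A real ensemble could instead weight each base method, for instance by its observed precision, rather than counting all votes equally.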


Footnotes
1. If a source claims one or more values for a certain object, it implicitly votes against the other candidate values of this object.
2. Hereafter, we call the revised methods the modified single-truth discovery methods.
3. Such values are then normalized to represent probabilities.
4. We chose this order because it ranks these four methods by increasing precision, as measured on three real-world datasets in [15].
5. A random ground truth distribution per source means that the number of true positive claims per source is random.
6. An 80-pessimistic ground truth distribution per source means that 80 % of the sources provide 20 % true positive claims, while the remaining 20 % of the sources provide 80 % true positive claims.
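Footnotes 1 and 3 describe how votes are counted and then turned into probabilities, without giving formulas. The following minimal Python sketch shows one way to realize that behavior under stated assumptions: the claims format, the function names, and the scoring rule (each value's share of positive votes among the sources that make claims about the object, then rescaled per object to sum to one) are illustrative choices, not the formulas used in the chapter.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical input format: claims[source][obj] is the set of values that
# `source` claims for data item `obj`.
Claims = Dict[str, Dict[str, Set[str]]]
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def vote_counts(claims: Claims) -> Counts:
    """Return {obj: {value: (positive_votes, negative_votes)}}.

    A source that claims at least one value for `obj` votes for each value it
    claims and implicitly votes against every other candidate value of `obj`.
    """
    candidates: Dict[str, Set[str]] = defaultdict(set)  # obj -> values claimed by any source
    voters: Dict[str, Set[str]] = defaultdict(set)      # obj -> sources with a claim on obj
    for source, objs in claims.items():
        for obj, values in objs.items():
            if values:
                candidates[obj] |= values
                voters[obj].add(source)

    counts: Counts = {}
    for obj, values in candidates.items():
        n_voters = len(voters[obj])
        counts[obj] = {}
        for v in values:
            pos = sum(1 for s in voters[obj] if v in claims[s][obj])
            # Every other source with a claim on obj is an implicit negative vote.
            counts[obj][v] = (pos, n_voters - pos)
    return counts


def to_probabilities(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Score each value by its share of positive votes, then normalize per object to sum to 1."""
    probs: Dict[str, Dict[str, float]] = {}
    for obj, per_value in counts.items():
        scores = {v: pos / (pos + neg) for v, (pos, neg) in per_value.items()}
        total = sum(scores.values())  # > 0, because every candidate has at least one positive vote
        probs[obj] = {v: s / total for v, s in scores.items()}
    return probs
```

For example, if s1 claims {a}, s2 claims {a, b}, and s3 claims {c} for object o, `vote_counts` yields (2, 1) for a and (1, 2) for both b and c, and `to_probabilities` assigns about 0.5 to a and about 0.25 each to b and c. With this particular score the normalized probability reduces to each value's share of all positive votes on the object; a method that weights votes by source reliability would produce different numbers.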
 
Literature
1. Berti-Equille, L.: Data veracity estimation with ensembling truth discovery methods. In: IEEE Big Data Workshop on Data Quality Issues in Big Data (2015)
2. Dietterich, T.G.: Ensemble methods in machine learning. In: Proceedings of the First International Workshop on Multiple Classifier Systems (MCS 2000), Cagliari, Italy (2000)
3. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29(2), 103–130 (1997)
4. Dong, X.L., et al.: From data fusion to knowledge fusion. In: Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014), Hangzhou, China (2014)
5. Dong, X.L., et al.: Integrating conflicting data: the role of source dependence. VLDB Endowment (PVLDB) 2(1), 550–561 (2009)
6. Galland, A., Abiteboul, S., Marian, A., Senellart, P.: Corroborating information from disagreeing views. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), New York, NY, USA (2010)
7. Goasdoué, F., et al.: Fact checking and analyzing the web. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), New York, NY, USA (2013)
8. Li, Q., et al.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD 2014), Snowbird, Utah, USA (2014)
9. Li, Q., et al.: A confidence-aware approach for truth discovery on long-tail data. VLDB Endowment (PVLDB) 8(4), 425–436 (2015)
10. Li, X., et al.: Truth finding on the deep web: is the problem solved? VLDB Endowment (PVLDB) 6(2), 97–108 (2013)
11. Li, Y., Gao, J., Meng, C., Li, Q., Su, L., Zhao, B., Fan, W., Han, J.: A survey on truth discovery. ACM SIGKDD Explor. Newsl. (2016)
12. Pasternack, J., Roth, D.: Knowing what to believe (when you already know something). In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Stroudsburg, PA, USA (2010)
13. Pasternack, J., Roth, D.: Latent credibility analysis. In: Proceedings of the 22nd International World Wide Web Conference (WWW 2013), Rio de Janeiro, Brazil (2013)
14. Waguih, D.A., Berti-Equille, L.: Truth discovery algorithms: an experimental evaluation. CoRR abs/1409.6428 (2014)
15. Wang, X., et al.: An integrated Bayesian approach for effective multi-truth discovery. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Melbourne, Australia (2015)
16. Yin, X., Tan, W.: Semi-supervised truth discovery. In: Proceedings of the 20th International World Wide Web Conference (WWW 2011), Hyderabad, India (2011)
17. Yin, X., et al.: Truth discovery with multiple conflicting information providers on the web. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), San Jose, California, USA (2007)
18. Yu, D., et al.: The wisdom of minority: unsupervised slot filling validation based on multi-dimensional truth-finding. In: Proceedings of the International Conference on Computational Linguistics (COLING 2014), Dublin, Ireland (2014)
19. Zhao, B., et al.: A Bayesian approach to discovering truth from conflicting sources for data integration. VLDB Endowment (PVLDB) 5(6), 550–561 (2012)
20. Zhao, B., Han, J.: A probabilistic model for estimating real-valued truth from conflicting sources. In: Proceedings of the 10th International Workshop on Quality in Databases (QDB 2012), Istanbul, Turkey (2012)
Metadata
Title
An Ensemble Approach for Better Truth Discovery
Authors
Xiu Susie Fang
Quan Z. Sheng
Xianzhi Wang
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49586-6_20
