ABSTRACT
Truth discovery is the problem of detecting true values from the conflicting data provided by multiple sources on the same data items. Since the reliability of sources is unknown a priori, a truth discovery method usually estimates source reliability as part of the truth discovery process. A major limitation of existing truth discovery methods is that they commonly assume exactly one true value for each data item and therefore cannot deal with the more general case in which a data item may have multiple true values (the multi-truth case). Since the number of true values may vary from data item to data item, truth discovery methods must be able to detect a varying number of true values from the multi-source data. In this paper, we propose a multi-truth discovery approach that addresses the above challenges by providing a generic framework for enhancing existing truth discovery methods. In particular, we treat the number of true values as an important clue for facilitating multi-truth discovery. We present the procedure and components of our approach, and propose three models, namely the byproduct model, the joint model, and the synthesis model, to implement it. We further propose two extensions that leverage the implications of similar numerical values and the co-occurrence of values in sources' claims to improve truth discovery accuracy. Experimental studies on real-world datasets demonstrate the effectiveness of our approach.
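To make the problem setting concrete, the following is a minimal sketch of iterative multi-truth discovery in which the number of true values per data item is estimated and used as a clue. It is an illustration only and is not the byproduct, joint, or synthesis model of the paper: the function name `discover_multi_truth`, the uniform initial reliability of 0.8, the reliability-weighted average used to estimate the number of true values, and the precision-based reliability update are all assumptions made for this example.

```python
from collections import defaultdict


def discover_multi_truth(claims, iterations=10):
    """Toy multi-truth discovery (hypothetical helper, not the paper's models).

    claims: {source: {item: set of claimed values}}
    Returns ({item: set of predicted true values}, {source: reliability}).
    """
    # Assumption: every source starts with the same reliability.
    reliability = {source: 0.8 for source in claims}
    truths = {}
    for _ in range(iterations):
        scores = defaultdict(lambda: defaultdict(float))
        size_sum = defaultdict(float)
        weight_sum = defaultdict(float)
        # Reliability-weighted voting on values and on the size of each claim.
        for source, items in claims.items():
            for item, values in items.items():
                for value in values:
                    scores[item][value] += reliability[source]
                size_sum[item] += reliability[source] * len(values)
                weight_sum[item] += reliability[source]
        # Predict the number of true values k per item, then keep the top-k values.
        truths = {}
        for item, value_scores in scores.items():
            k = max(1, round(size_sum[item] / weight_sum[item]))
            ranked = sorted(value_scores, key=lambda v: (-value_scores[v], str(v)))
            truths[item] = set(ranked[:k])
        # Update each source's reliability as the precision of its claims.
        for source, items in claims.items():
            total = sum(len(values) for values in items.values())
            hits = sum(len(values & truths.get(item, set()))
                       for item, values in items.items())
            # A small floor keeps the weights strictly positive.
            reliability[source] = max(hits / total, 0.01) if total else 0.01
    return truths, reliability


# Hypothetical toy data: three sources claiming the authors of two books.
claims = {
    "s1": {"book1": {"Alice", "Bob"}, "book2": {"Carol"}},
    "s2": {"book1": {"Alice", "Bob"}, "book2": {"Carol"}},
    "s3": {"book1": {"Alice"}, "book2": {"Dave"}},
}
truths, reliability = discover_multi_truth(claims)
print(truths)
print(reliability)
```

On this toy input the sketch predicts two true values for `book1` and one for `book2`, and source `s3` ends with a lower reliability than `s1` and `s2`. A single-truth method would be forced to return only one author for `book1`, which is the limitation the abstract describes.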
Index Terms
- Empowering Truth Discovery with Multi-Truth Prediction