ABSTRACT
In crowdsourced data aggregation tasks, the answers that large numbers of sources provide for the same set of questions often conflict. The central challenge is to estimate source reliability and select the answers provided by high-quality sources. Existing work addresses this problem by simultaneously estimating sources' reliability and inferring the questions' true answers (i.e., the truths). However, these methods assume that a source is equally reliable on all questions, ignoring the fact that a source's reliability may vary significantly across topics. To capture differing levels of expertise on different topics, we propose FaitCrowd, a fine-grained truth discovery model for aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the generation of question content and of the answers that sources provide in a single probabilistic model, so that topical expertise and true answers are estimated simultaneously. This yields a more precise estimate of source reliability and, in turn, a better ability to recover the true answers than existing approaches. Experimental results on two real-world datasets show that FaitCrowd significantly reduces the aggregation error rate compared with state-of-the-art multi-source aggregation approaches, owing to its ability to learn topical expertise from question content and collected answers.
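To make the idea of topic-dependent reliability concrete, the following is a minimal sketch and not the FaitCrowd model itself. It assumes each question's topic is already known (FaitCrowd learns topics from question content) and replaces the probabilistic inference with a simple alternation between weighted voting and smoothed per-topic accuracy. The function name `topic_weighted_vote`, the `smoothing` parameter, and the toy data are all illustrative assumptions.

```python
def topic_weighted_vote(answers, topics, n_iter=10, smoothing=1.0):
    """Alternate between weighted voting and per-topic expertise updates.

    answers: {question: {source: answer}}
    topics:  {question: topic}  (assumed given here, not learned)
    """
    expertise = {}  # (source, topic) -> weight; missing keys default to 1.0
    truths = {}
    for _ in range(n_iter):
        # Step 1: pick each question's answer by expertise-weighted vote.
        for q, by_source in answers.items():
            scores = {}
            for s, a in by_source.items():
                w = expertise.get((s, topics[q]), 1.0)
                scores[a] = scores.get(a, 0.0) + w
            # Ties are broken alphabetically so the result is deterministic.
            truths[q] = max(sorted(scores), key=scores.get)
        # Step 2: re-estimate expertise as smoothed accuracy per topic.
        correct, total = {}, {}
        for q, by_source in answers.items():
            for s, a in by_source.items():
                key = (s, topics[q])
                total[key] = total.get(key, 0.0) + 1.0
                correct[key] = correct.get(key, 0.0) + (a == truths[q])
        expertise = {
            k: (correct[k] + smoothing) / (total[k] + 2.0 * smoothing)
            for k in total
        }
    return truths, expertise


# Toy data: source A tends to agree with the majority on "sports",
# source B on "math"; questions s3 and m3 start out as two-way ties.
answers = {
    "s1": {"A": "b", "B": "b", "C": "b"},
    "s2": {"A": "b", "B": "a", "C": "b"},
    "s4": {"A": "b", "B": "a", "C": "b"},
    "s3": {"A": "b", "B": "a"},
    "m1": {"A": "b", "B": "b", "C": "b"},
    "m2": {"A": "a", "B": "b", "C": "b"},
    "m4": {"A": "a", "B": "b", "C": "b"},
    "m3": {"A": "a", "B": "b"},
}
topics = {q: ("sports" if q.startswith("s") else "math") for q in answers}

truths, expertise = topic_weighted_vote(answers, topics)
print(truths["s3"], truths["m3"])
```

In this toy run the two tied questions are resolved in favor of the source that has been more accurate on that topic: A on "sports" and B on "math". A single per-source reliability score could not express that distinction.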
Index Terms
- FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation