research-article

FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation

Published: 10 August 2015

ABSTRACT

In crowdsourced data aggregation tasks, the answers that large numbers of sources provide to the same set of questions often conflict. The key challenge is to estimate each source's reliability and select the answers provided by high-quality sources. Existing work addresses this problem by simultaneously estimating source reliability and inferring the true answers to the questions (i.e., the truths). However, these methods assume that a source has the same degree of reliability on all questions, ignoring the fact that a source's reliability may vary significantly across topics. To capture different levels of expertise on different topics, we propose FaitCrowd, a fine-grained truth discovery model for aggregating conflicting data collected from multiple users or sources. FaitCrowd jointly models the generation of question content and of the sources' answers in a single probabilistic model, estimating topical expertise and true answers simultaneously. This yields a more precise estimate of source reliability, so FaitCrowd is better able to recover the true answers to the questions than existing approaches. Experimental results on two real-world datasets show that, because it learns topical expertise from both question content and the collected answers, FaitCrowd significantly reduces the aggregation error rate compared with state-of-the-art multi-source aggregation approaches.
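The alternating loop behind this family of truth discovery methods (infer truths from reliability-weighted answers, then re-estimate reliability from those truths) can be illustrated with a minimal Python sketch that keeps a separate reliability score per (source, topic) pair. This is not FaitCrowd's probabilistic model: FaitCrowd learns topics from question content and infers expertise and truths jointly, whereas the hypothetical `topical_truth_discovery` function below assumes each question's topic label is already known and substitutes simple weighted voting with add-one smoothed accuracy estimates. The toy `answers` data and topic labels are invented for illustration.

```python
from collections import defaultdict


def topical_truth_discovery(answers, topic_of, n_iter=20):
    """Hypothetical sketch: jointly estimate truths and per-topic expertise.

    answers:  {question: {source: answer}}
    topic_of: {question: topic}  (assumed given here, not learned)
    Returns (truths, expertise) with expertise[(source, topic)] in (0, 1).
    """
    expertise = {}  # (source, topic) -> estimated accuracy; empty means uniform trust
    truths = {}
    for _ in range(n_iter):
        # Step 1: infer each question's truth by expertise-weighted voting.
        new_truths = {}
        for q, votes in answers.items():
            topic = topic_of[q]
            score = defaultdict(float)
            for src, ans in votes.items():
                score[ans] += expertise.get((src, topic), 1.0)
            new_truths[q] = max(score, key=score.get)
        # Step 2: re-estimate each source's accuracy on each topic,
        # with add-one (Laplace) smoothing so no weight reaches 0 or 1.
        correct = defaultdict(int)
        total = defaultdict(int)
        for q, votes in answers.items():
            topic = topic_of[q]
            for src, ans in votes.items():
                total[(src, topic)] += 1
                correct[(src, topic)] += int(ans == new_truths[q])
        expertise = {k: (correct[k] + 1) / (total[k] + 2) for k in total}
        if new_truths == truths:  # truths stopped changing: converged
            break
        truths = new_truths
    return truths, expertise


# Toy data (invented): A is reliable on math but not history; B and C the reverse.
answers = {f"m{i}": {"A": "1", "B": "2", "C": "3", "D": "1"} for i in range(1, 5)}
answers["m5"] = {"A": "1", "B": "2", "C": "2"}
answers["h1"] = {"A": "x", "B": "y", "C": "y"}
answers["h2"] = {"A": "x", "B": "y", "C": "y"}
topic_of = {q: ("math" if q.startswith("m") else "history") for q in answers}

truths, expertise = topical_truth_discovery(answers, topic_of)
print(truths["m5"])  # → 1 (plain majority voting would give 2)
print(round(expertise[("A", "math")], 3), expertise[("A", "history")])  # → 0.857 0.25
```

On this toy data, majority voting picks "2" for `m5`, but after one round the per-topic weights let source A, which agrees with the consensus on every other math question, outvote B and C, whose learned math accuracy is low even though their history accuracy is high. A single global reliability per source would blend the two topics together and lose this distinction.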

    Published in

      ACM Conferences
      KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      KDD '15 paper acceptance rate: 160 of 819 submissions (20%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
