ABSTRACT
Collective intelligence, which aggregates information shared by large crowds, is often degraded by unreliable sources that provide low-quality data. This is a barrier to the effective use of collective intelligence in a variety of applications. To address this issue, we propose a probabilistic model that jointly assesses the reliability of sources and finds the true data. We observe that different sources are often not independent of each other. Instead, sources tend to influence one another, which makes them dependent when they share information. High dependency between sources makes collective intelligence vulnerable to the overuse of redundant (and possibly incorrect) information from the dependent sources. We therefore reveal the latent group structure among dependent sources and aggregate information at the group level rather than from individual sources directly. This prevents collective intelligence from being inappropriately dominated by dependent sources. We also explicitly reveal the reliability of groups and minimize the negative impact of unreliable groups. Experimental results on real-world data sets show the effectiveness of the proposed approach compared with existing algorithms.
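The difference between source-level and group-level aggregation can be illustrated with a toy sketch. This is not the probabilistic model proposed in the paper: the groups here are assigned by hand rather than inferred as latent structure, group reliability is ignored, and a plain majority vote stands in for the inference step. The names `source_level_vote`, `group_level_vote`, `claims`, and `group_of` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def majority(values):
    """Return the most common value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def source_level_vote(claims):
    """Every source gets one vote, so dependent sources are counted repeatedly."""
    return majority(list(claims.values()))


def group_level_vote(claims, group_of):
    """Each group first settles on one value, then every group gets one vote."""
    by_group = defaultdict(list)
    for source, value in claims.items():
        by_group[group_of[source]].append(value)
    group_votes = [majority(values) for values in by_group.values()]
    return majority(group_votes)


# Assumed toy data: s1, s2, s3 are mutually dependent (one group),
# while s4 and s5 report independently (one group each).
claims = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B"}
group_of = {"s1": "g1", "s2": "g1", "s3": "g1", "s4": "g2", "s5": "g3"}

print(source_level_vote(claims))           # the dependent group dominates
print(group_level_vote(claims, group_of))  # each group counts once
```

In this toy data the three dependent sources outvote the two independent ones at the source level, while at the group level their shared claim counts only once. Which answer is actually correct depends on the reliability of each group, which the sketch does not model.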