Maximum likelihood analysis of conflicting observations in social sensing

Published: 31 January 2014
Abstract

This article addresses the challenge of truth discovery from noisy social sensing data. The work is motivated by the emergence of social sensing as a data collection paradigm of growing interest, where humans perform sensory data collection tasks. Unlike the case with well-calibrated and well-tested infrastructure sensors, humans are less reliable, and the likelihood that participants' measurements are correct is often unknown a priori. Given a set of human participants of unknown trustworthiness together with their sensory measurements, we pose the question of whether one can use this information alone to determine, in an analytically founded manner, the probability that a given measurement is true. In our previous conference paper, we offered the first maximum likelihood solution to the aforesaid truth discovery problem for corroborating observations only. In contrast, this article extends the conference paper and provides the first maximum likelihood solution to handle the cases where measurements from different participants may be conflicting. The article focuses on binary measurements. The approach is shown to outperform our previous work used for corroborating observations, the state-of-the-art fact-finding baselines, as well as simple heuristics such as majority voting.
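
As a point of reference for the simplest baseline named above, majority voting over conflicting binary observations can be sketched as follows. This is an illustrative sketch only and is not the article's maximum likelihood method; the report layout (participant, claim, value) and the function name `majority_vote` are assumptions made for the example.

```python
from collections import defaultdict


def majority_vote(reports):
    """Decide each binary claim by a simple majority of the reports about it.

    `reports` is an iterable of (participant, claim, value) tuples, where
    `value` is True if the participant asserts the claim and False if the
    participant disputes it. This layout is hypothetical, for illustration.
    Returns a dict mapping claim -> True, False, or None (tie).
    """
    tally = defaultdict(int)  # net votes per claim: +1 for True, -1 for False
    for _participant, claim, value in reports:
        tally[claim] += 1 if value else -1
    return {claim: (None if net == 0 else net > 0) for claim, net in tally.items()}


# Made-up reports in which participants disagree about the same claims.
reports = [
    ("alice", "road_closed", True),
    ("bob", "road_closed", True),
    ("carol", "road_closed", False),
    ("alice", "bridge_open", False),
    ("bob", "bridge_open", True),
]
decisions = majority_vote(reports)
```

Every report carries equal weight in this sketch, which is the limitation the abstract points to: participant reliability is unknown a priori, so a plain vote count cannot distinguish a careful observer from a careless one.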
