ABSTRACT
The proliferation of mobile sensing and communication devices in the possession of the average individual has generated much recent interest in social sensing applications. Significant advances have been made on the problem of uncovering ground truth from observations made by participants of unknown reliability. The problem, also called fact-finding, commonly arises in applications where unvetted individuals may opt in to report phenomena of interest. For example, the reliability of individuals might be unknown when they can join a participatory sensing campaign simply by downloading a smartphone app. This paper extends past social sensing literature by offering a scalable approach for exploiting dependencies between observed variables to increase fact-finding accuracy. Prior work assumed that reported facts are independent, or incurred exponential complexity when dependencies were present. In contrast, this paper presents the first scalable approach for accommodating dependency graphs between observed states. The approach is tested using extensive simulations as well as real-life data on the availability of gas, food, and medical supplies collected in the aftermath of Hurricane Sandy. Evaluation shows that combining expected correlation graphs (of outages) with reported observations of unknown reliability results in a much more reliable reconstruction of ground truth from the noisy social sensing data. We also show that correlation graphs can help test hypotheses regarding underlying causes when different hypotheses are associated with different correlation patterns. For example, an observed outage profile can be attributed to a supplier outage or to excessive local demand. The two differ in expected correlations in observed outages, enabling joint identification of both the actual outages and their underlying causes.