research-article

Beyond Artificial Reality: Finding and Monitoring Live Events from Social Sensors

Authors:
- Calton Pu, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Abhijit Suprem, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Rodrigo Alves Lima, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Aibek Musaev, Department of Computer Science, University of Alabama, Tuscaloosa, AL, USA
- De Wang, Sunmi US Inc, USA
- Danesh Irani, Google, USA
- Steve Webb, Web Gnomes, USA
- Joao Eduardo Ferreira, Department of Computer Science, University of Sao Paulo, Sao Paulo, Brazil

ACM Transactions on Internet Technology, Volume 20, Issue 1, Article No. 2, pp. 1–21. https://doi.org/10.1145/3374214

Published: 02 March 2020


Abstract

With billions of active social media accounts and millions of live video cameras, these new, live big data offer many opportunities for smart applications. However, the main consumers of this new big data have so far been humans. We envision research on live knowledge: automatically acquiring real-time, validated, and actionable information. Live knowledge presents two significant and diverging technical challenges: big noise and concept drift. We describe the EBKA (evidence-based knowledge acquisition) approach, illustrated by the LITMUS landslide information system. LITMUS achieves both high accuracy and wide coverage, demonstrating the feasibility and promise of the EBKA approach for achieving live knowledge.


Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications



Published in
ACM Transactions on Internet Technology Volume 20, Issue 1
Visions and Regular Papers
February 2020
135 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3381410
Editor:
Ling Liu
Georgia Institute of Technology, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 March 2020
- Revised: 1 November 2019
- Accepted: 1 November 2019
- Received: 1 August 2019

Author Tags
Artificial reality
concept drift
evidence-based knowledge acquisition
live knowledge
real-time event detection
true novelty
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 5
- Total Downloads: 403
- Downloads (last 12 months): 23
- Downloads (last 6 weeks): 5
