research-article

Beyond Artificial Reality: Finding and Monitoring Live Events from Social Sensors

Published: 02 March 2020

Abstract

With billions of active social media accounts and millions of live video cameras, new live big data offer many opportunities for smart applications. However, the main consumers of this new big data have been humans. We envision research on live knowledge that automatically acquires real-time, validated, and actionable information. Live knowledge presents two significant and diverging technical challenges: big noise and concept drift. We describe the EBKA (evidence-based knowledge acquisition) approach, illustrated by the LITMUS landslide information system. LITMUS achieves both high accuracy and wide coverage, demonstrating the feasibility and promise of the EBKA approach for achieving live knowledge.



      • Published in

        ACM Transactions on Internet Technology, Volume 20, Issue 1
        Visions and Regular Papers
        February 2020
        135 pages
        ISSN: 1533-5399
        EISSN: 1557-6051
        DOI: 10.1145/3381410
        • Editor: Ling Liu

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 March 2020
        • Revised: 1 November 2019
        • Accepted: 1 November 2019
        • Received: 1 August 2019


        Qualifiers

        • research-article
        • Research
        • Refereed
