Abstract
With billions of active social media accounts and millions of live video cameras, new live big data offer many opportunities for smart applications. However, the main consumers of these data have so far been humans. We envision research on live knowledge: the automatic acquisition of real-time, validated, and actionable information. Live knowledge presents two significant and diverging technical challenges: big noise and concept drift. We describe the EBKA (evidence-based knowledge acquisition) approach, illustrated by the LITMUS landslide information system. LITMUS achieves both high accuracy and wide coverage, demonstrating the feasibility and promise of the EBKA approach for achieving live knowledge.
Index Terms
- Beyond Artificial Reality: Finding and Monitoring Live Events from Social Sensors