Published in: Data Mining and Knowledge Discovery 3/2018

30.08.2017

Mining urban events from the tweet stream through a probabilistic mixture model

Authors: Joan Capdevila, Jesús Cerquides, Jordi Torres

Abstract

The geographical identification of content in social networks has made it possible to bridge the gap between online social platforms and the physical world. Although vast amounts of data in such networks stem from breaking news or global occurrences, local events witnessed by users in situ are also present in these streams and are of great importance to many city entities. Nowadays, unsupervised machine learning techniques, such as Tweet-SCAN, are able to retrospectively detect these local events from tweets. However, these approaches have a limited ability to reason about unseen observations in a principled way because they lack a proper probabilistic foundation. Probabilistic models have also been proposed for the task, but their event identification capabilities are far from those of Tweet-SCAN. In this paper, we identify two key factors which, when combined, boost the accuracy of such models. As a first key factor, we notice that the large amount of meaningless social data requires explicitly modeling non-event observations. Therefore, we propose to incorporate a background model that captures spatio-temporal fluctuations of non-event tweets. As a second key factor, we observe that the shortness of tweets hampers the application of traditional topic models. Thus, we integrate event detection and topic modeling, assigning topic proportions to events instead of to individual tweets. As a result, we propose Warble, a new probabilistic model and learning scheme for retrospective event detection that incorporates these two key factors. We evaluate Warble on a data set of tweets located in Barcelona during its festivities. The empirical results show that the model outperforms other state-of-the-art techniques in detecting various types of events while relying on a principled probabilistic framework that enables reasoning under uncertainty.
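The first key factor, explicitly modeling non-event tweets, can be illustrated with a generic mixture in which a background component competes with the event components for every observation. The sketch below is not Warble: it assumes diagonal Gaussian event components over hypothetical (x, y, t) coordinates and a uniform background density over a space-time region of hypothetical volume `bg_volume`, whereas the paper's background model captures spatio-temporal fluctuations. All function and parameter names are illustrative assumptions.

```python
import math


def gaussian_logpdf(x, mean, var):
    # Log-density of a diagonal Gaussian evaluated at point x (e.g. (x, y, t)).
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, events, weights, bg_weight, bg_volume):
    """Posterior probability that observation x was generated by each
    event component or by the background (background is the last entry).

    events:    list of (mean, var) tuples, one per hypothetical event.
    weights:   mixing weights of the events; bg_weight is the background's.
    bg_volume: volume of the space-time region, so the (assumed uniform)
               background density is 1 / bg_volume.
    """
    log_scores = [
        math.log(w) + gaussian_logpdf(x, mean, var)
        for w, (mean, var) in zip(weights, events)
    ]
    log_scores.append(math.log(bg_weight) - math.log(bg_volume))
    # Normalise in log space (log-sum-exp trick) to avoid underflow.
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    events = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
              ((10.0, 10.0, 10.0), (1.0, 1.0, 1.0))]
    near = responsibilities((0.0, 0.0, 0.0), events, [0.1, 0.1], 0.8, 1000.0)
    far = responsibilities((50.0, 50.0, 50.0), events, [0.1, 0.1], 0.8, 1000.0)
    print(near, far)
```

With these toy values, a tweet at the centre of the first event is attributed mostly to that event, while a tweet far from both events is attributed almost entirely to the background. Without the background component, the far tweet would be forced into whichever event happened to be closest.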

Footnotes
1
This is an extended version of an unpublished paper that was presented at the ICML Anomaly Detection Workshop 2016 (Capdevila et al. 2016a). The present work also incorporates event summaries, evaluation in terms of BCubed metrics, further details on the model and learning algorithm as well as the release of the Warble code and “La Mercè” datasets.
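The footnote mentions evaluation in terms of BCubed metrics (Bagga and Baldwin 1998; Amigó et al. 2009). The sketch below follows the standard item-wise definition and is not the paper's evaluation code: for each tweet, precision is the fraction of tweets in its predicted cluster that share its ground-truth label, recall is the fraction of tweets with its label that fall in its predicted cluster, and both are averaged over all tweets. How the paper treats non-event tweets in this computation is not stated here, so the hypothetical `bcubed` function simply treats every label, including any non-event label, as an ordinary class.

```python
from collections import Counter


def bcubed(clusters, labels):
    """BCubed precision, recall and F1 for a flat clustering.

    clusters[i] is the predicted cluster id of item i and labels[i] its
    ground-truth class. Every item counts itself, so precision and recall
    are always strictly positive.
    """
    if not clusters or len(clusters) != len(labels):
        raise ValueError("clusters and labels must be non-empty and of equal length")
    cluster_size = Counter(clusters)           # items per predicted cluster
    label_size = Counter(labels)               # items per ground-truth class
    pair_size = Counter(zip(clusters, labels)) # items sharing both
    n = len(clusters)
    precision = sum(pair_size[(c, l)] / cluster_size[c]
                    for c, l in zip(clusters, labels)) / n
    recall = sum(pair_size[(c, l)] / label_size[l]
                 for c, l in zip(clusters, labels)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    print(bcubed([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))
```

Unlike pair-counting or purity-based scores, BCubed averages per item, which is one reason it is often preferred for comparing clusterings of unequal cluster sizes (Amigó et al. 2009).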
 
References
Akbari M, Hu X, Liqiang N, Chua TS (2016) From tweets to wellness: wellness event detection from Twitter streams. In: Proceedings of the 30th AAAI conference on artificial intelligence
Allan J, Carbonell JG, Doddington G, Yamron J, Yang Y (1998) Topic detection and tracking pilot study final report. In: Proceedings of the DARPA broadcast news transcription and understanding workshop
Amigó E, Gonzalo J, Artiles J, Verdejo F (2009) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retr 12(4):461–486
Atefeh F, Khreich W (2015) A survey of techniques for event detection in Twitter. Comput Intell 31(1):132–164
Bagga A, Baldwin B (1998) Algorithms for scoring coreference chains. In: Proceedings of the first international conference on language resources and evaluation workshop on linguistics coreference, pp 563–566
Becker H, Naaman M, Gravano L (2011) Beyond trending topics: real-world event identification on Twitter. In: Proceedings of the 5th international AAAI conference on weblogs and social media (ICWSM), vol 11, pp 438–441
Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial-temporal data. Data Knowl Eng 60(1):208–221
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
Boettcher A, Lee D (2012) Eventradar: a real-time local event detection scheme using Twitter stream. In: Proceedings of the IEEE international conference on green computing and communications (GreenCom), IEEE, pp 358–367
Capdevila J, Cerquides J, Torres J (2016a) Recognizing warblers: a probabilistic model for event detection in Twitter. In: The anomaly detection workshop in the international conference on machine learning (ICML)
Capdevila J, Cerquides J, Nin J, Torres J (2017) Tweet-SCAN: an event discovery technique for geo-located tweets. Pattern Recognit Lett 93:58–68
Cheng T, Wicks T (2014) Event detection using Twitter: a spatio-temporal approach. PLoS ONE 9(6):1–10
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. Proc Second Int Conf Knowl Discov Data Min (KDD) 96:226–231
Fox CW, Roberts SJ (2012) A tutorial on variational Bayesian inference. Artif Intell Rev 38(2):85–95
Ghahramani Z, Beal MJ (2001) Propagation algorithms for variational Bayesian learning. In: Proceedings of the advances in neural information processing systems (NIPS)
Gomide J, Veloso A, Meira W Jr., Almeida V, Benevenuto F, Ferraz F, Teixeira M (2011) Dengue surveillance based on a computational model of spatio-temporal locality of Twitter. In: Proceedings of the 3rd international web science conference (WebSci), pp 3:1–3:8
Hong L, Davison BD (2010) Empirical study of topic modeling in Twitter. In: Proceedings of the first workshop on social media analytics, pp 80–88
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
Krumm J, Horvitz E (2015) Eyewitness: identifying local events via space-time signals in Twitter feeds. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, ACM, pp 20:1–20:10
Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F (2005) A space-time permutation scan statistic for disease outbreak detection. PLoS Med 2(3):e59
Lee CH (2012) Mining spatio-temporal information on microblogging streams using a density-based online clustering method. Expert Syst Appl 39(10):9623–9641
Lee R, Sumiya K (2010) Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. In: Proceedings of the 2nd ACM SIGSPATIAL international workshop on location based social networks (LBSN), pp 1–10
Li J, Cardie C (2014) Timeline generation: tracking individuals on Twitter. In: Proceedings of the 23rd international conference on World Wide Web (WWW), pp 643–652
Li L, Goodchild MF, Xu B (2013) Spatial, temporal, and socioeconomic patterns in the use of twitter and flickr. Cartogr Geogr Inf Sci 40(2):61–77
Li Z, Wang B, Li M, Ma WY (2005) A probabilistic model for retrospective news event detection. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp 106–113
Long R, Wang H, Chen Y, Jin O, Yu Y (2011) Towards effective event detection, tracking and summarization on microblog data. In: Web-age information management, Springer, pp 652–663
McInerney J, Blei DM (2014) Discovering newsworthy tweets with a geographical topic model. In: The news publishing workshop in the 20th ACM SIGKDD conference on knowledge discovery and data mining (KDD)
Newman N (2011) Mainstream media and the distribution of news in the age of social discovery. Reuters Institute for the Study of Journalism, University of Oxford
Pan CC, Mitra P (2011) Event detection with spatial latent Dirichlet allocation. In: Proceedings of the 11th annual international ACM/IEEE joint conference on digital libraries, pp 349–358
Panagiotou N, Katakis I, Gunopulos D (2016) Detecting events in online social networks: definitions, trends and challenges. Springer International Publishing, Cham, pp 42–84
Petrović S, Osborne M, Lavrenko V (2010) Streaming first story detection with application to Twitter. In: Human language technologies: the 2010 annual conference of the North American chapter of the association for computational linguistics, pp 181–189
Quan X, Kit C, Ge Y, Pan SJ (2015) Short and sparse text topic modeling via self-aggregation. In: Proceedings of the 24th international conference on artificial intelligence (IJCAI)
Ritter A, Etzioni O, Clark S, et al. (2012) Open domain event extraction from Twitter. In: Proceedings of the 18th international conference on knowledge discovery and data mining (KDD), pp 1104–1112
Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the 19th international conference on World Wide Web (WWW), pp 851–860
Singh S (2015) Spatial temporal analysis of social media data. Master’s thesis, Technische Universität München
Tamura K, Ichimura T (2013) Density-based spatiotemporal clustering algorithm for extracting bursty areas from georeferenced documents. In: Proceedings of the IEEE international conference on systems, man, and cybernetics (SMC), pp 2079–2084
Wang X, Grimson E (2008) Spatial latent Dirichlet allocation. In: Advances in neural information processing systems (NIPS)
Weng J, Lee BS (2011) Event detection in Twitter. In: Proceedings of the 5th international AAAI conference on weblogs and social media (ICWSM)
Wong WK, Neill DB (2009) Tutorial on event detection. In: The international conference on knowledge discovery and data mining (KDD)
Yang Y, Pierce T, Carbonell J (1998) A study of retrospective and on-line event detection. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval
Zheng Y (2012) Tutorial on location-based social networks. In: The 21st international conference on World Wide Web (WWW)
Metadata
Title
Mining urban events from the tweet stream through a probabilistic mixture model
Authors
Joan Capdevila
Jesús Cerquides
Jordi Torres
Publication date
30.08.2017
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 3/2018
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-017-0541-y
