2016 | OriginalPaper | Book chapter

TwitterNews+: A Framework for Real Time Event Detection from the Twitter Data Stream

Authors: Mahmud Hasan, Mehmet A. Orgun, Rolf Schwitter

Published in: Social Informatics

Publisher: Springer International Publishing

Abstract

In recent years, substantial research effort has gone into investigating different approaches to detecting events in real time from the Twitter data stream. Most of these approaches, however, suffer from a high computational cost and are not evaluated on a publicly available corpus, which makes them difficult to compare properly. In this paper, we propose a scalable event detection system, TwitterNews+, to detect and track newsworthy events in real time. TwitterNews+ uses a novel approach to cluster event-related tweets from Twitter at a significantly lower computational cost than existing state-of-the-art approaches. Finally, we evaluate the effectiveness of TwitterNews+ using a publicly available corpus and its associated ground truth data set of newsworthy events. The evaluation results show a significant improvement, in terms of recall and precision, over the baselines we have used.
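For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed against a ground truth set. It is not the authors' evaluation procedure: it assumes each detected event has already been matched to a ground-truth event identifier, and the function name and the sample identifiers are hypothetical.

```python
def precision_recall(detected, relevant):
    """Return (precision, recall) for a set of detected event IDs.

    Assumption: detected events have already been mapped to the same
    identifiers used by the ground truth; the paper's own matching
    procedure is not reproduced here.
    """
    detected = set(detected)
    relevant = set(relevant)
    true_positives = len(detected & relevant)
    # Precision: share of detected events that are in the ground truth.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of ground-truth events that were detected.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
detected_events = {"e1", "e2", "e3", "e7"}
ground_truth_events = {"e1", "e2", "e4", "e5", "e6"}

precision, recall = precision_recall(detected_events, ground_truth_events)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, two of the four detected events appear in the ground truth and two of the five ground-truth events are found, so a system can score differently on the two metrics.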

Metadata
Title
TwitterNews+: A Framework for Real Time Event Detection from the Twitter Data Stream
Authors
Mahmud Hasan
Mehmet A. Orgun
Rolf Schwitter
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-47880-7_14