Published in: Journal of Intelligent Information Systems 3/2020

5 September 2019

WebKey: a graph-based method for event detection in web news

Authors: Elham Rasouli, Sajjad Zarifzadeh, Amir Jahangard Rafsanjani

Abstract

With the rapid, large-scale publication of news on the Internet, there is a surge of interest in detecting the underlying hot events in online news streams. Event detection poses two main challenges: accuracy and scalability. In this paper, we propose a fast and efficient method for detecting events in news websites. First, we identify bursty terms, that is, terms that suddenly appear in many news documents. Then, we construct a novel co-occurrence graph between terms, in which nodes and edges are weighted by important features such as click and document frequency within burst intervals. Finally, a weighted community detection algorithm is used to cluster the terms and find events. We also propose a couple of techniques to reduce the size of the graph. Our evaluations show that the proposed method achieves much higher precision and recall than previous methods, improving their harmonic mean by at least 40%. Moreover, it reduces running time and memory usage by a factor of at least 2.
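
The three stages named in the abstract (bursty terms, a term co-occurrence graph, clustering of terms into events) can be illustrated with a minimal sketch. This is not the authors' implementation: the burst test (a simple document-frequency ratio), the edge weight (raw co-occurrence counts, with click data omitted), and the clustering step (connected components over strong edges, standing in for weighted community detection) are all simplifying assumptions, and every function name, parameter, and data value below is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bursty_terms(history, current, ratio=3.0, min_df=2):
    """Assumed burst test: a term is bursty if its document frequency in the
    current window is at least `ratio` times its mean over past windows."""
    past = Counter()
    for window in history:
        for doc in window:
            past.update(set(doc))
    now = Counter()
    for doc in current:
        now.update(set(doc))
    n_hist = max(len(history), 1)
    return {t for t, df in now.items()
            if df >= min_df and df >= ratio * (past[t] / n_hist)}


def cooccurrence_graph(current, terms):
    """Edge weight = number of current documents containing both terms."""
    edges = Counter()
    for doc in current:
        present = sorted(set(doc) & terms)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def communities(edges, min_weight=2):
    """Stand-in for weighted community detection: keep edges with weight
    >= min_weight and return the connected components."""
    adj = defaultdict(set)
    for (a, b), w in edges.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy data: each document is a list of already tokenized terms.
history = [
    [["market", "stocks"], ["weather", "rain"]],
    [["market", "bank"], ["weather", "sun"]],
]
current = [
    ["earthquake", "rescue", "city"],
    ["earthquake", "rescue", "market"],
    ["election", "vote", "market"],
    ["election", "vote", "weather"],
]

terms = bursty_terms(history, current)
events = communities(cooccurrence_graph(current, terms))
print(events)
```

On this toy data, "market" and "weather" are frequent but not bursty, so they never enter the graph; restricting the graph to bursty terms is one simple way of keeping it small.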

Metadata
Title
WebKey: a graph-based method for event detection in web news
Authors
Elham Rasouli
Sajjad Zarifzadeh
Amir Jahangard Rafsanjani
Publication date
5 September 2019
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2020
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-019-00576-7
