Published in: Information Systems Frontiers 5/2014

1 November 2014

Finding story chains in newswire articles using random walks

Authors: Xianshu Zhu, Tim Oates


Abstract

Massive amounts of information about news events are published on the Internet every day in online newspapers, blogs, and social network messages. Although search engines such as Google help users retrieve information by keyword, the large volume of unstructured results they return makes it hard to track how an event evolves. A story chain is a set of news articles that reveals hidden relationships among different events, and traditional keyword-based search engines provide limited support for finding such chains. In this paper, we propose a random walk based algorithm to find story chains. Because many media outlets report the same event when news breaks, the algorithm includes two pruning mechanisms that automatically exclude redundant articles from the story chain and keep the algorithm efficient. We further explore how named entities and word relevance can help find relevant news articles, and we improve the algorithm's efficiency by building a co-clustering-based correlation graph. Experimental results show that the proposed algorithm can generate coherent story chains without redundancy, and that running it on the correlation graph significantly improves its efficiency.
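The abstract does not spell out the algorithm, so the following is only a rough, hypothetical illustration of the two ideas it names: scoring article relevance with a random walk and pruning redundant articles. It uses a generic random walk with restart over an article-term bipartite graph to score how related each article is to a starting article, then skips articles whose cosine similarity to an already chosen article exceeds a threshold. The graph construction, the restart probability, the threshold `max_sim`, the chain length, and every function name here are assumptions for illustration; this is not the authors' method or their pruning rules.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: generic random walk with restart (RWR) on an
# article-term bipartite graph plus a cosine-similarity redundancy filter.
# NOT the authors' algorithm; all names and parameters are assumptions.

def build_bipartite(docs):
    """Map each article and each term to a node; edge weight = term count."""
    adj = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term, count in Counter(terms).items():
            adj[("doc", doc_id)][("term", term)] = count
            adj[("term", term)][("doc", doc_id)] = count
    return adj

def random_walk_with_restart(adj, start, restart=0.15, iters=50):
    """Power iteration: at each step the walker jumps back to `start` with
    probability `restart`, otherwise follows an edge in proportion to its
    weight. Assumes `start` has at least one term (i.e. is a node in `adj`)."""
    score = {node: 0.0 for node in adj}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, mass in score.items():
            if mass == 0.0:
                continue
            total = sum(adj[node].values())
            for neighbor, weight in adj[node].items():
                nxt[neighbor] += (1 - restart) * mass * weight / total
        nxt[start] += restart
        score = nxt
    return score

def cosine(a, b):
    """Cosine similarity of two bag-of-words term lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def story_chain(docs, dates, start, length=3, max_sim=0.8):
    """Rank articles by RWR relevance to `start`, skip near-duplicates of
    articles already chosen, and return the chain in chronological order."""
    scores = random_walk_with_restart(build_bipartite(docs), ("doc", start))
    ranked = sorted((d for d in docs if d != start),
                    key=lambda d: scores.get(("doc", d), 0.0), reverse=True)
    chain = [start]
    for doc_id in ranked:
        if len(chain) == length or scores.get(("doc", doc_id), 0.0) == 0.0:
            break  # chain is full, or remaining articles are unreachable
        if any(cosine(docs[doc_id], docs[c]) > max_sim for c in chain):
            continue  # redundant: near-duplicate coverage of a chosen article
        chain.append(doc_id)
    return sorted(chain, key=lambda d: dates[d])

# Toy corpus (invented for illustration): a2 duplicates a1, a5 is unrelated.
docs = {
    "a1": ["storm", "coast", "warning"],
    "a2": ["storm", "coast", "warning"],
    "a3": ["storm", "flood", "damage"],
    "a4": ["flood", "relief", "aid"],
    "a5": ["election", "vote", "poll"],
}
dates = {"a1": 1, "a2": 1, "a3": 2, "a4": 3, "a5": 2}
print(story_chain(docs, dates, "a1"))  # ['a1', 'a3', 'a4']
```

In this toy run the near-duplicate `a2` scores highest but is pruned, and the unrelated `a5` never receives any walk probability. The restart probability controls how local the relevance scores are: a higher value keeps the walk closer to the starting article.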


Metadata
Title
Finding story chains in newswire articles using random walks
Authors
Xianshu Zhu
Tim Oates
Publication date
1 November 2014
Publisher
Springer US
Published in
Information Systems Frontiers / Issue 5/2014
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-013-9420-2
