ABSTRACT
We have developed an application called Wikipedia Live Monitor that monitors article edits on different language versions of Wikipedia as they happen in real time. Wikipedia articles in different languages are highly interlinked. For example, the English article "en:2013_Russian_meteor_event", on the February 15 meteoroid that exploded over the region of Chelyabinsk Oblast, Russia, is interlinked with "ru:Падение_метеорита_на_Урале_в_2013_году", the Russian article on the same topic. Because we monitor multiple language versions of Wikipedia in parallel, we can exploit this fact to detect concurrent edit spikes of Wikipedia articles covering the same topic, both within a single language and across languages. We treat such concurrent edit spikes as signals for potential breaking news events, whose plausibility we then check with full-text cross-language searches on multiple social networks. Unlike the reverse approach of monitoring social networks first and potentially checking plausibility on Wikipedia second, the approach proposed in this paper is less prone to false-positive alerts while being equally sensitive to true-positive events, at only a fraction of the processing cost. A live demo of our application is available online at http://wikipedia-irc.herokuapp.com/, and the source code is available under the terms of the Apache 2.0 license at https://github.com/tomayac/wikipedia-irc.
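The idea of a concurrent edit spike across interlinked language versions can be illustrated with a minimal sketch. This is not the application's actual code: the class name `SpikeDetector`, the `interlinks` mapping, and all three thresholds are assumptions made for illustration, since the abstract does not state the values or data structures used. The sketch groups edits by topic through the language links, keeps a sliding time window per topic, and reports a breaking news candidate once enough edits by enough distinct editors fall inside the window. The subsequent plausibility check on social networks is not shown.

```python
from collections import defaultdict, deque

# Hypothetical thresholds; the abstract does not state the values used.
WINDOW_SECONDS = 300  # how far back edits count towards a spike
MIN_EDITS = 5         # edits inside the window needed for a spike
MIN_EDITORS = 2       # distinct editors needed for edits to count as concurrent


class SpikeDetector:
    """Sliding-window spike detection over interlinked articles (illustrative)."""

    def __init__(self, interlinks):
        # interlinks maps "lang:Title" to a shared topic key, so that
        # different language versions of one article are counted together.
        self.interlinks = interlinks
        # topic key -> deque of (timestamp, language, editor)
        self.edits = defaultdict(deque)

    def add_edit(self, article, editor, timestamp):
        """Record one edit; return a candidate dict if the topic spikes, else None."""
        topic = self.interlinks.get(article, article)
        lang = article.split(":", 1)[0]
        window = self.edits[topic]
        window.append((timestamp, lang, editor))
        # Evict edits that have fallen out of the time window.
        while window and timestamp - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        editors = {e for _, _, e in window}
        if len(window) >= MIN_EDITS and len(editors) >= MIN_EDITORS:
            return {
                "topic": topic,
                "edits": len(window),
                "editors": len(editors),
                "languages": sorted({l for _, l, _ in window}),
            }
        return None


if __name__ == "__main__":
    links = {
        "en:2013_Russian_meteor_event": "meteor",
        "ru:Падение_метеорита_на_Урале_в_2013_году": "meteor",
    }
    detector = SpikeDetector(links)
    stream = [
        ("en:2013_Russian_meteor_event", "alice", 0),
        ("ru:Падение_метеорита_на_Урале_в_2013_году", "boris", 30),
        ("en:2013_Russian_meteor_event", "alice", 60),
        ("en:2013_Russian_meteor_event", "carol", 90),
        ("ru:Падение_метеорита_на_Урале_в_2013_году", "boris", 120),
    ]
    for article, editor, ts in stream:
        candidate = detector.add_edit(article, editor, ts)
        if candidate:
            print(candidate)
```

Keying the window by topic rather than by article is what lets edits to the English and Russian articles reinforce each other; with per-article windows, neither article in the usage example would reach the edit threshold on its own.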
Index Terms
- MJ no more: using concurrent wikipedia edit spikes with social network plausibility checks for breaking news detection