ABSTRACT
Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to identifying content that spreads widely and then fades over time scales on the order of days, which is the time scale at which we perceive news and events.
We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text. By developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. Tracking 1.6 million mainstream media sites and blogs over a period of three months, comprising a total of 90 million articles, we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs, with divergent behavior around the overall peak and a "heartbeat"-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.
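The abstract does not spell out how textual variants of a phrase are grouped. The following is a minimal, simplified sketch of one way to do it, not the authors' algorithm. It assumes a shorter phrase is a variant of a longer one only when it appears as a contiguous run of words inside it; real quotations also differ by small edits, which this sketch ignores. Each phrase is attached to its longest containing phrase, so every cluster ends up rooted at a single longest phrase. The function names `is_contiguous_subphrase` and `cluster_variants` and the "attach to the longest container" rule are hypothetical choices made for illustration.

```python
from collections import defaultdict


def words(phrase):
    """Lower-case and split a phrase into word tokens."""
    return phrase.lower().split()


def is_contiguous_subphrase(short, long):
    """True if token list `short` occurs as a consecutive run inside the strictly longer `long`."""
    n, m = len(short), len(long)
    if n == 0 or n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def cluster_variants(phrases):
    """Group phrases into clusters, each rooted at a longest phrase (hypothetical heuristic)."""
    phrases = list(dict.fromkeys(phrases))  # drop exact duplicates, keep input order
    toks = {p: words(p) for p in phrases}
    # Process longest phrases first: any containing phrase is strictly longer,
    # so its root is already known when a shorter phrase is reached.
    order = sorted(phrases, key=lambda p: len(toks[p]), reverse=True)
    root_of = {}
    for p in order:
        parents = [q for q in order if is_contiguous_subphrase(toks[p], toks[q])]
        if parents:
            # Assumption: attach to the longest containing phrase and inherit its root.
            best = max(parents, key=lambda q: len(toks[q]))
            root_of[p] = root_of[best]
        else:
            root_of[p] = p  # no container: this phrase roots its own cluster
    clusters = defaultdict(list)
    for p in phrases:
        clusters[root_of[p]].append(p)
    return dict(clusters)


if __name__ == "__main__":
    quotes = [
        "lipstick on a pig",
        "you can put lipstick on a pig but it's still a pig",
        "you can put lipstick on a pig",
        "yes we can",
    ]
    for root, members in cluster_variants(quotes).items():
        print(root, "->", members)
```

In this example, the two shorter "lipstick" quotes fall into the cluster rooted at the full sentence, and "yes we can" forms its own cluster. Resolving a phrase that sits inside several unrelated long phrases by attaching it to the longest one is the simplest possible tie-breaking rule. A more careful method would weigh those alternatives, for example by how often each phrase occurs.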
Index Terms
- Meme-tracking and the dynamics of the news cycle