DOI: 10.1145/1935826.1935863
research-article

Patterns of temporal variation in online media

Published: 9 February 2011

ABSTRACT

Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies these dynamics. However, the temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored.

We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web depends on many factors and unfolds on very different time scales and at different resolutions. To uncover the temporal dynamics of online content, we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm, which effectively finds cluster centroids under this similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets.
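The abstract does not spell out the similarity metric. A common way to make a time series distance invariant to scaling and shifting, and the formulation assumed here (the full paper gives the exact definition), is d(x, y) = min over alpha and q of ||x - alpha * y_(q)|| / ||x||, where y_(q) is y shifted by q time steps. The Python sketch below is illustrative only: the helper names `shift` and `invariant_distance`, the zero-filled shifts, and the `max_shift` search window are our own assumptions, not the paper's code. For each shift the optimal scale has a closed form, so only the shift needs a search.

```python
import math
from typing import List, Sequence, Tuple


def shift(y: Sequence[float], q: int) -> List[float]:
    """Shift y by q steps (q > 0 moves it later in time), zero-filling vacated slots.

    Zero-filling is an assumption of this sketch.
    """
    n = len(y)
    return [float(y[t - q]) if 0 <= t - q < n else 0.0 for t in range(n)]


def invariant_distance(
    x: Sequence[float], y: Sequence[float], max_shift: int
) -> Tuple[float, float, int]:
    """Return (d, alpha, q) minimizing ||x - alpha * shift(y, q)|| / ||x||.

    For each candidate shift q in [-max_shift, max_shift], the best scale has
    the closed form alpha = <x, y_q> / <y_q, y_q> (one-coefficient least squares).
    """
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be the all-zero series")
    best = (math.inf, 0.0, 0)
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        # If the shift pushed all of y out of the window, only alpha = 0 is possible.
        alpha = sum(a * b for a, b in zip(x, yq)) / yy if yy > 0 else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, yq)))
        d = residual / norm_x
        if d < best[0]:
            best = (d, alpha, q)
    return best


if __name__ == "__main__":
    spike = [0, 0, 1, 4, 2, 1, 0, 0]
    later_and_bigger = [0, 0, 0, 3, 12, 6, 3, 0]  # spike delayed 1 step, scaled by 3
    print(math.dist(spike, later_and_bigger))  # about 11.66: Euclidean sees them as far apart
    print(invariant_distance(spike, later_and_bigger, max_shift=2))  # d ~ 0, alpha ~ 1/3, q = -1
```

The brute-force loop over shifts is chosen for clarity; it shows why two series with the same shape but different volume and timing end up at distance close to zero under this metric, while a plain Euclidean distance (as used by K-means) keeps them far apart.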

We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention for online content. We also present a simple model that reliably predicts the shape of attention using information about only a small number of participants. Our analyses offer insight into common temporal patterns of content on the Web and broaden the understanding of the dynamics of human attention.
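Besides the distance, the algorithm's name suggests where K-SC departs from K-means: the centroid update. A minimal sketch, assuming the distance d(x, mu) = min over alpha of ||x - alpha * mu|| / ||x|| on series that are already aligned (the shift search is omitted): substituting the optimal alpha gives sum_i d(x_i, mu)^2 = n - (mu^T S mu) / (mu^T mu) with S = sum_i u_i u_i^T and u_i = x_i / ||x_i||, so the minimizing centroid is the top eigenvector of S (equivalently, the smallest eigenvector of M = sum_i (I - u_i u_i^T)). K-means instead takes the arithmetic mean, which a single high-volume series can dominate. The function names and the power-iteration solver below are hypothetical choices for illustration, not taken from the paper.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(a * a for a in v))
    if n == 0:
        raise ValueError("cannot normalize the all-zero series")
    return [a / n for a in v]


def kmeans_centroid(series: Sequence[Sequence[float]]) -> List[float]:
    """K-means update: the plain arithmetic mean, so high-volume series dominate."""
    dim = len(series[0])
    return [sum(x[t] for x in series) / len(series) for t in range(dim)]


def spectral_centroid(series: Sequence[Sequence[float]], iters: int = 500) -> List[float]:
    """Unit-norm mu maximizing sum_i <x_i / ||x_i||, mu>^2, i.e. the top eigenvector
    of S = sum_i u_i u_i^T with u_i = x_i / ||x_i||, found by power iteration.

    Assumes the series are already aligned (the shift search is omitted).
    """
    units = [normalize(x) for x in series]
    dim = len(units[0])
    # Start from the mean of the unit-norm series; for nonnegative attention
    # counts this start is in practice not orthogonal to the top eigenvector.
    mu = normalize([sum(u[t] for u in units) for t in range(dim)])
    for _ in range(iters):
        # Apply S to mu without forming S: S mu = sum_i u_i * <u_i, mu>.
        proj = [sum(a * b for a, b in zip(u, mu)) for u in units]
        mu = normalize([sum(p * u[t] for p, u in zip(proj, units)) for t in range(dim)])
    return mu


if __name__ == "__main__":
    series = [
        [0, 100, 10, 0],  # one very popular item peaking at t = 1
        [0, 1, 5, 1],     # two low-volume items peaking at t = 2
        [0, 1, 5, 1],
    ]
    print(kmeans_centroid(series))    # peak at t = 1, set by the popular item
    print(spectral_centroid(series))  # peak at t = 2, the shape most series share
```

Because every series is normalized before it contributes to S, each item's shape counts equally regardless of its volume, which is one plausible reading of why a scale-invariant centroid separates shapes better than the K-means mean.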

Supplemental Material

wsdm2011_yang_tvo_01.mp4 (MP4 video, 169.2 MB)


Published in

WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
February 2011
870 pages
ISBN: 9781450304931
DOI: 10.1145/1935826

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

WSDM '11 Paper Acceptance Rate: 83 of 372 submissions, 22%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
