ABSTRACT
Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies this process. However, the temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored.
We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. To uncover the temporal dynamics of online content, we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm, which effectively finds cluster centroids under our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large datasets.
We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of content on the Web and broaden the understanding of the dynamics of human attention.
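To make the idea of a similarity metric that is "invariant to scaling and shifting" concrete, here is a minimal sketch in Python. The abstract does not give the formula, so everything below is an assumption for illustration: the distance is taken to be the smallest relative residual between one series and a rescaled, time-shifted copy of the other, shifts are zero-padded, and the names `shift` and `invariant_distance` are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def shift(series: Sequence[float], q: int) -> List[float]:
    """Shift a series by q steps, padding with zeros (q > 0 delays it)."""
    n = len(series)
    if q == 0:
        return list(series)
    if abs(q) >= n:
        return [0.0] * n
    if q > 0:
        return [0.0] * q + list(series[: n - q])
    return list(series[-q:]) + [0.0] * (-q)


def invariant_distance(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, int]:
    """Return (distance, best_scale, best_shift) between x and y.

    Assumed definition (not stated in the abstract):
        min over scale a and shift q of ||x - a * y_shifted(q)|| / ||x||
    For a fixed shift, the best scale has the closed form
        a = <x, y_q> / <y_q, y_q>.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0.0:
        raise ValueError("x must not be all zeros")

    n = len(x)
    # Scale 0 always gives a relative residual of exactly 1, so start there.
    best = (1.0, 0.0, 0)
    for q in range(-(n - 1), n):
        y_q = shift(y, q)
        yy = sum(v * v for v in y_q)
        if yy == 0.0:
            continue  # shifted series is empty; scale 0 already covers it
        alpha = sum(a * b for a, b in zip(x, y_q)) / yy
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        d = residual / norm_x
        if d < best[0]:
            best = (d, alpha, q)
    return best


# Same spike shape, but y is twice as tall and arrives two steps later.
x = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 2.0, 8.0, 2.0]
print(invariant_distance(x, y))
```

In this example `y` is `x` doubled in volume and delayed by two steps, so the sketch reports a distance of zero with a scale of 0.5 and a shift of -2. A plain Euclidean distance, as used in standard K-means, would treat the two series as far apart even though their shapes are identical.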
Index Terms
- Patterns of temporal variation in online media