Published in: Knowledge and Information Systems 1/2016

01-10-2016 | Regular Paper

CPB: a classification-based approach for burst time prediction in cascades

Authors: Senzhang Wang, Zhao Yan, Xia Hu, Philip S. Yu, Zhoujun Li, Biao Wang



Abstract

Studying the bursty nature of cascades in social media is practically important in many real applications such as product sales prediction, disaster relief, and stock market prediction. Although both cascade size prediction and the burst patterns of cascades have been extensively studied, how to predict when a burst will come remains an open problem. It is challenging for traditional time-series-based models such as regression models to address this task directly. First, time-series-based prediction models focus on predicting future values from previously observed ones, so they are hard to apply to predicting the time of a burst with the “quick rise-and-fall” pattern. Second, besides cascade popularity, a lot of side information such as user profiles and social relations is available in social media. Although the potential utility of such information can be high, it is hard for time-series-based models to capture and seamlessly integrate this rich information in its diverse formats. This paper proposes a classification-based approach for burst time prediction that exploits rich knowledge in information diffusion. In particular, we first propose a time-window-based transformation to predict in which time window the burst will appear. By dividing the time span of every cascade into the same number K of time windows, cascades with diverse time spans can be handled uniformly. To exploit the rich and heterogeneous information in social media, we next propose a feature extraction framework that models this heterogeneous knowledge in a scale-independent manner. Systematic evaluations are conducted on the Sina Weibo reposting dataset and the MemeTracker dataset. Besides the superior performance of the proposed approach, we also observe that: (1) surprisingly, social/structure knowledge is more indicative of bursts than cascade popularity information, especially for bursts occurring further in the future; (2) larger cascades are harder to predict, as the spreading process of cascades with higher popularity is usually more diverse and fluctuating; and (3) the proposed approach is robust in the sense that the result is not very sensitive to the popularity of the training cascades.
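
The time-window-based transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `burst_window` is hypothetical, and the burst is assumed here to be simply the window containing the most events, since the abstract does not give the paper's exact burst definition. The sketch shows only how dividing each cascade's own time span into K equal windows turns burst time prediction into a K-class labelling problem that does not depend on the cascade's absolute duration.

```python
from typing import Sequence


def burst_window(timestamps: Sequence[float], k: int) -> int:
    """Return the 0-based index of the window holding the most events.

    Assumption (not from the paper): the "burst" is the busiest of the
    k equal-width windows spanning the cascade's first to last event.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not timestamps:
        raise ValueError("cascade has no events")

    start, end = min(timestamps), max(timestamps)
    span = end - start
    counts = [0] * k
    for t in timestamps:
        # Relative position of the event in [0, 1]; a cascade whose
        # events all share one timestamp falls entirely into window 0.
        pos = (t - start) / span if span > 0 else 0.0
        # Clamp so the final event lands in the last window, not at index k.
        idx = min(int(pos * k), k - 1)
        counts[idx] += 1
    # Ties are resolved in favour of the earliest window.
    return counts.index(max(counts))


# Two cascades with the same shape but very different durations.
short_cascade = [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]
long_cascade = [t * 3600 for t in short_cascade]

label_short = burst_window(short_cascade, k=5)
label_long = burst_window(long_cascade, k=5)
```

Because the windows are defined relative to each cascade's own span, `label_short` and `label_long` are the same class label even though one cascade is 3600 times longer than the other, which is the sense in which cascades with diverse time spans are handled uniformly.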


Metadata
Title: CPB: a classification-based approach for burst time prediction in cascades
Authors: Senzhang Wang, Zhao Yan, Xia Hu, Philip S. Yu, Zhoujun Li, Biao Wang
Publication date: 01-10-2016
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 1/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-015-0899-3
