ABSTRACT
How predictable is success in complex social systems? Despite a recent profusion of prediction studies that exploit online social and information network data, this question remains unanswered, in part because it has not been adequately specified. In this paper we attempt to clarify the question by presenting a simple stylized model of success that attributes prediction error to one of two generic sources: insufficiency of the available data and/or models on the one hand, and inherent unpredictability of complex social systems on the other. We then use this model to motivate an illustrative empirical study of information cascade size prediction on Twitter. Despite an unprecedented volume of information about users, content, and past performance, our best-performing models can explain less than half of the variance in cascade sizes. This result in turn suggests that even with unlimited data, predictive performance would be bounded well below deterministic accuracy. Finally, we explore this potential bound theoretically using simulations of a diffusion process on a random scale-free network similar to Twitter. We show that although higher predictive power is possible in theory, such performance requires a homogeneous system and perfect ex ante knowledge of it: even a small degree of uncertainty in estimating product quality, or slight variation in quality across products, leads to substantially more restrictive bounds on predictability. We conclude that realistic bounds on predictive accuracy are similar to those we have obtained empirically, and that such bounds are likely even lower for other complex social systems for which data are more difficult to obtain.
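The idea of a ceiling on explained variance can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes a hypothetical additive decomposition in which success equals a perfectly known quality term plus independent noise, both Gaussian, and all function and parameter names are illustrative. Under that assumption, even the ideal predictor reaches an R^2 of only 1 / (1 + noise_sd^2), so the bound falls as the noise term grows.

```python
import random
import statistics


def simulate_r2(noise_sd, n=20000, seed=0):
    """R^2 of the ideal predictor when success = quality + noise.

    Assumption (hypothetical, for illustration only): quality ~ N(0, 1)
    is known exactly in advance, and noise ~ N(0, noise_sd^2) is
    independent of quality and cannot be observed ahead of time.
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    success = [q + rng.gauss(0.0, noise_sd) for q in quality]

    # The best possible prediction of success given quality is quality itself.
    mean_success = statistics.fmean(success)
    ss_total = sum((s - mean_success) ** 2 for s in success)
    ss_residual = sum((s - q) ** 2 for s, q in zip(success, quality))
    return 1.0 - ss_residual / ss_total


def theoretical_r2(noise_sd):
    """Closed-form bound for the same toy setup: Var(quality) / Var(success)."""
    return 1.0 / (1.0 + noise_sd ** 2)


if __name__ == "__main__":
    for sd in (0.5, 1.0, 2.0):
        print(f"noise_sd={sd}: simulated={simulate_r2(sd):.3f}, "
              f"theoretical={theoretical_r2(sd):.3f}")
```

In this toy setup, a noise term as large as the quality term (noise_sd = 1.0) already caps R^2 at one half, regardless of how much data or modeling effort is applied.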
Index Terms
- Exploring Limits to Prediction in Complex Social Systems