ABSTRACT
We analyze a massive social network, gathered from the records of a large mobile phone operator, with more than a million users and tens of millions of calls. We examine the distributions of the number of phone calls per customer; the total talk minutes per customer; and the distinct number of calling partners per customer. We find that these distributions are skewed, and that they significantly deviate from what would be expected by power-law and lognormal distributions.
To analyze our observed distributions (of number of calls, distinct call partners, and total talk time), we propose PowerTrack , a method which fits a lesser known but more suitable distribution, namely the Double Pareto LogNormal (DPLN) distribution, to our data and track its parameters over time. Using PowerTrack , we find that our graph changes over time in a way consistent with a generative process that naturally results in the DPLN distributions we observe. Furthermore, we show that this generative process lends itself to a natural and appealing social wealth interpretation in the context of social networks such as ours. We discuss the application of those results to our model and to forecasting.
Supplemental Material
- L. A. Adamic and B. A. Huberman. The Web-shidden order. Communications of the ACM, 44(9):55--60, 2001. Google ScholarDigital Library
- L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley. Classes of small-world networks. Proceedings of the National Academy of Sciences, 97(21):11149--11152, 2000.Google ScholarCross Ref
- A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.Google ScholarCross Ref
- Z. Bi, C. Faloutsos, and F. Korn. The DGX distribution for mining massive, skewed data. In Proceedings of ACM KDD, pages 17--26, New York, NY, 2001. ACM Press. Google ScholarDigital Library
- P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Structural properties of the African Web. In International World Wide Web Conference, New York, NY, 2002. ACM Press.Google Scholar
- A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In International World Wide Web Conference, New York, NY, 2000. ACM Press. Google ScholarDigital Library
- A. Clauset. Power law distributions in empirical data. http://www.santafe.edu/~aaronc/powerlaws/ Google ScholarDigital Library
- A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. ArXiv e-print 0706.1062v1, 2007.Google Scholar
- C. Cortes, D. Pregibon, and C. Volinsky. Communities of interest. In IDA '01: Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis, pages 105--114, London, UK, 2001. Springer-Verlag. Google ScholarDigital Library
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proceedings of ACM SIGCOMM, pages 251--262, New York, NY, 1999. Google ScholarDigital Library
- R. Govindan and H. Tangmunarunkit. Heuristics for Internet map discovery. In IEEE INFOCOM, pages 1371--1380, Los Alamitos, CA, March 2000. IEEE Computer Society Press.Google ScholarCross Ref
- J.-P. Onnela, J. Saramaäki, J. Hyvöven, G. Szabó, M. Argollo de Menezes, K. Kaski, and A.-L. Barabási. Structure and Tie Strengths in Mobile Communication Networks. New Journal of Physics, 9, 2007.Google Scholar
- S. Keshav. Why cell phones will dominate the future Internet. Computer Communications Review, 35(2), April 2005. Google ScholarDigital Library
- S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. VLDB, pages 639--650, 1999. Google ScholarDigital Library
- M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics, 1(2):226--251.Google ScholarCross Ref
- M. Mitzenmacher. Dynamic Models for File Sizes and Double Pareto Distributions. Internet Mathematics, 1(3):305--334, 2004.Google ScholarCross Ref
- A. A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and A. Joshi. On the structural properties of massive telecom call graphs : findings and implications. Proc. of 15th ACM Conference on Information and Knowledge Management, pages 435--444, 2006. Google ScholarDigital Library
- M. E. J. Newman. Power laws, pareto distributions and Zipf's law. Contemporary Physics, 46:323--351, 2005.Google ScholarCross Ref
- M. E. J. Newman. Power laws, pareto distributions and Zipf's law. Contemporary Physics, 46:323--351, 2005.Google ScholarCross Ref
- V. Pareto. Oeuvres Completes. Droz, Geneva, 1896.Google Scholar
- D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, and C. L. Giles. Winners don't take all: Characterizing the competition for links on the Web. Proceedings of the National Academy of Sciences, 99(8):5207--5211, 2002.Google ScholarCross Ref
- S. Redner. How popular is your paper? an empirical study of the citation distribution. The European Physics Journal B, 4:131--134, 1998.Google ScholarCross Ref
- W. Reed and M. Jorgensen. The double pareto-lognormal distribution - a new parametric model for size distribution. Communications in Statistics -Theory and Methods, 33(8):1733--1753, 2004.Google ScholarCross Ref
- R.Gibrat. inégalités économiques. Librarie du Recuil Sirey, 1931.Google Scholar
- G. Szabo and A.-L. Barabasi. Network effects in service usage. ArXiv e-prints physics/0611177, November 2006.Google Scholar
Index Terms
- Mobile call graphs: beyond power-law and lognormal distributions
Recommendations
Mobile Phone Graph Evolution: Findings, Model and Interpretation
ICDMW '11: Proceedings of the 2011 IEEE 11th International Conference on Data Mining WorkshopsWhat are the features of mobile phone graph along the time? How to model these features? What are the interpretation for the evolutional graph generation process? To answer the above challenging problems, we analyze a massive who-call-whom networks as ...
Uniform approximation of the distribution for the number of retransmissions of bounded documents
SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer SystemsRetransmission-based failure recovery represents a primary approach in existing communication networks, on all protocol layers, that guarantees data delivery in the presence of channel failures. Contrary to the traditional belief that the number of ...
A transition of limiting distributions of large matchings in random graphs
We study the asymptotic distribution of the number of matchings of size = ( n ) in G ( n , p ) for a wide range of p = p ( n ) ( 0 , 1 ) and for every 1 n / 2 . We prove that this distribution changes from normal to log-normal as increases, and we ...
Comments