Published in: Annals of Data Science, Issue 3/2017

10 June 2017

Flexible Heavy Tailed Distributions for Big Data

Authors: Yuanyuan Zhang, Saralees Nadarajah


Abstract

The Pareto type I distribution (also known as the power law distribution and Zipf's law) appears to be the main distribution used to model heavy-tailed phenomena in the big data literature. As one of the oldest heavy-tailed distributions, however, it is not very flexible. Here, we show the flexibility of four other heavy-tailed distributions in modeling four big data sets from social networks. For none of the data sets does the Pareto type I distribution provide the best fit, or even an adequate one.
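To make the Pareto type I baseline concrete, here is a minimal Python sketch (standard library only) of fitting it by maximum likelihood and scoring the fit. It assumes continuous, strictly positive observations, with the scale x_m estimated as the sample minimum. The function names are hypothetical, and using AIC and the Kolmogorov-Smirnov distance as fit measures is an illustrative assumption, not a description of the paper's own procedure.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_pareto_type1(data: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood estimates (x_m, alpha) for a Pareto type I sample.

    Density: f(x) = alpha * x_m**alpha / x**(alpha + 1) for x >= x_m > 0.
    """
    xs = [float(x) for x in data]
    if len(xs) < 2 or min(xs) <= 0:
        raise ValueError("need at least two strictly positive observations")
    x_m = min(xs)  # MLE of the scale is the sample minimum
    log_sum = sum(math.log(x / x_m) for x in xs)
    if log_sum == 0:
        raise ValueError("all observations equal; shape is not identifiable")
    alpha = len(xs) / log_sum  # MLE of the shape: n / sum(log(x_i / x_m))
    return x_m, alpha


def pareto_type1_loglik(data: Sequence[float], x_m: float, alpha: float) -> float:
    """Log-likelihood of the data under Pareto type I(x_m, alpha)."""
    if any(x < x_m for x in data):
        return float("-inf")  # an observation below the scale has zero density
    n = len(data)
    return (n * math.log(alpha) + n * alpha * math.log(x_m)
            - (alpha + 1) * sum(math.log(x) for x in data))


def pareto_type1_aic(data: Sequence[float]) -> float:
    """Akaike information criterion, 2k - 2 log L, with k = 2 fitted parameters."""
    x_m, alpha = fit_pareto_type1(data)
    return 2 * 2 - 2 * pareto_type1_loglik(data, x_m, alpha)


def ks_distance(data: Sequence[float], x_m: float, alpha: float) -> float:
    """Kolmogorov-Smirnov distance between the empirical and fitted CDFs."""
    xs: List[float] = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - (x_m / x) ** alpha  # Pareto type I CDF for x >= x_m
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Synthetic heavy-tailed sample: random.paretovariate draws Pareto I with x_m = 1.
    rng = random.Random(42)
    sample = [rng.paretovariate(2.5) for _ in range(10_000)]
    x_m_hat, alpha_hat = fit_pareto_type1(sample)
    print(f"x_m = {x_m_hat:.4f}, alpha = {alpha_hat:.3f}")
    print(f"AIC = {pareto_type1_aic(sample):.1f}")
    print(f"KS distance = {ks_distance(sample, x_m_hat, alpha_hat):.4f}")
```

On real data, the same AIC would be compared with the AIC of each alternative heavy-tailed candidate, with lower values preferred. Because the parameters are estimated from the same data, standard Kolmogorov-Smirnov p-values do not apply directly; a parametric bootstrap is one common way to turn the distance into a formal adequacy test.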

Metadata
Publisher: Springer Berlin Heidelberg
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI: https://doi.org/10.1007/s40745-017-0113-4
