Published in: Annals of Telecommunications 1-2/2011

01-02-2011

A new statistical approach to estimate global file populations from local observations in the eDonkey P2P file sharing system

Authors: Patrick Brown, Sanja Petrovic


Abstract

In this paper, we propose a new statistical approach, known in biology as capture–recapture methods, to estimate global population statistics from local observations. Evaluating population sizes in P2P systems has received much attention lately, as these sizes may be useful to set system parameters, to derive other system statistics, or to predict system performance. Because these systems are very large, encompassing several million users, and highly distributed, estimating population sizes is a challenging task. More precisely, we are interested in estimating the number of file replicas in the system, i.e., the size of the population of users possessing given files. To this end, we propose a capture–recapture method that is both computationally efficient and accurate. The proposed method allows global population statistics to be derived from local and time-limited observations. We apply the method to a measurement data set collected over several days on a residential network. We compare the results obtained from direct counting procedures with those derived with the proposed methodology.
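To make the capture–recapture idea concrete, the sketch below shows the classical two-sample Lincoln–Petersen estimator and Chapman's bias-corrected variant. This is a textbook illustration only, not the estimator developed in the paper, which the abstract does not specify. The function names and the idea of treating two sets of observed peer identifiers as the two "captures" are assumptions made for the example.

```python
from typing import Hashable, Iterable


def lincoln_petersen(sample_a: Iterable[Hashable],
                     sample_b: Iterable[Hashable]) -> float:
    """Classical two-sample estimate N ~ n1 * n2 / m.

    n1, n2 are the numbers of distinct individuals in each sample and
    m is the number seen in both. Hypothetical helper for illustration.
    """
    a, b = set(sample_a), set(sample_b)
    recaptured = len(a & b)
    if recaptured == 0:
        # With no overlap the ratio is undefined (division by zero).
        raise ValueError("no overlap between samples; estimate undefined")
    return len(a) * len(b) / recaptured


def chapman(sample_a: Iterable[Hashable],
            sample_b: Iterable[Hashable]) -> float:
    """Chapman's variant: (n1 + 1)(n2 + 1) / (m + 1) - 1.

    It stays defined when the samples do not overlap and is less
    biased for small samples.
    """
    a, b = set(sample_a), set(sample_b)
    recaptured = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Made-up peer identifiers seen at two observation points.
    seen_first = {"peer1", "peer2", "peer3", "peer4"}
    seen_second = {"peer3", "peer4", "peer5"}
    print(lincoln_petersen(seen_first, seen_second))
    print(chapman(seen_first, seen_second))
```

Both estimators rest on the assumption that every individual is equally likely to appear in a sample and that the samples are independent; when the overlap is small, the estimate becomes very sensitive to a single additional recapture.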

Footnotes
1
Assumption \(({\cal H}1)\) is crucial in traditional population estimation, as the samples are taken in successive time periods. In the current context, we apply analogous methods to samples taken over identical time periods. If the population varies during the observation and assumption \(({\cal H}2)\) is valid, we estimate the total number of peers that belonged to the population at any time during the observation, rather than the average population. Both values are close if the average time in the system is long compared to the observation period.
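As a rough illustration of why the two values are close, consider a simplified model that is an assumption of this note and not taken from the paper: a stationary population with peer arrival rate \(\lambda\), mean time in system \(S\), and an observation window of length \(T\). By Little's law the average population is \(\lambda S\). The expected number of distinct peers present at some point during the window is the number present at its start plus the number arriving during it:

\[
N_{\text{total}} \approx \lambda S + \lambda T = \lambda S \left( 1 + \frac{T}{S} \right).
\]

Under these assumptions the total exceeds the average population by a relative amount \(T/S\), which is small when \(S \gg T\).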
 
Metadata
Title: A new statistical approach to estimate global file populations from local observations in the eDonkey P2P file sharing system
Authors: Patrick Brown, Sanja Petrovic
Publication date: 01-02-2011
Publisher: Springer-Verlag
Published in: Annals of Telecommunications, Issue 1-2/2011
Print ISSN: 0003-4347
Electronic ISSN: 1958-9395
DOI: https://doi.org/10.1007/s12243-010-0202-2
