skip to main content
research-article

On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs

Authors Info & Claims
Published:02 July 2009Publication History
Skip Abstract Section

Abstract

Understanding the graph structure of the Internet is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining this graph structure can be a surprisingly difficult task, as edges cannot be explicitly queried. For instance, empirical studies of the network of Internet Protocol (IP) addresses typically rely on indirect methods like traceroute to build what are approximately single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a paper by Lakhina et al. [2003] found empirically that the resulting sample is intrinsically biased. Further, in simulations, they observed that the degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson.

In this article, we study the bias of traceroute sampling mathematically and, for a very general class of underlying degree distributions, explicitly calculate the distribution that will be observed. As example applications of our machinery, we prove that traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.

References

  1. Alderson, D., Li, L., Willinger, W., and Doyle, J. 2005. Understanding Internet topology: Principles, models, and validation. IEEE/ACM Trans. Netw. 13, 1205--1218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Amini, L., Shaikh, A., and Schulzrinne, H. 2002. Issues with inferring Internet topological attributes. In SPIE IT-Com.Google ScholarGoogle Scholar
  3. Barford, P., Bestavros, A., Byers, J., and Crovella, M. 2001. On the marginal utility of network topology measurements. In SIGCOMM Internet Measurement Workshop. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Blondel, V., Guillaume, J.-L., Hendrickx, J., and Jungers, R. 2007. Distance distribution in random graphs and application to network exploration. Phys. Rev. E 76, 066101.Google ScholarGoogle ScholarCross RefCross Ref
  5. Bollobás, B. 2001. Random Graphs, 2nd ed. Cambridge University Press, Cambridge.Google ScholarGoogle Scholar
  6. Bollobás, B., and Chung, F. 1988. The diameter of a cycle plus a random matching. SIAM J. Disc. Math. 1, 328--333. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Chen, Q., Chang, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2002. The origin of power laws in Internet topologies revisited. In Proceedings of the 21st ACM SIGCOMM Conference. ACM, New York.Google ScholarGoogle Scholar
  8. Clauset, A., and Moore, C. 2005. Accuracy and scaling phenomena in Internet mapping. Phys. Rev. Lett. 94, 018701.Google ScholarGoogle ScholarCross RefCross Ref
  9. Cohen, R., Gonen, M., and Wool, A. 2007. Bounding the bias of tree-like sampling in ip topologies. In Proceedings of the European Conference on Complex Systems (ECCS).Google ScholarGoogle Scholar
  10. Dall'Asta, L. 2007. Dynamic exploration of networks: From general principles to the traceroute process. Europ. Phys. J. B 60, 123--133.Google ScholarGoogle ScholarCross RefCross Ref
  11. Dall'Asta, L., Alvarez-Hamelin, I., Barrat, A., Vázquez, A., and Vespignani, A. 2006. Exploring networks with traceroute-like probes: Theory and simulations. Theoret. Comput. Sci. 355, 6--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Erdős, P., and Rényi, A. 1960. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17--61.Google ScholarGoogle Scholar
  13. Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. 2002. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In Proceedings of the 29th International Colloguium on Automata, Languages and Programming. 110--122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the Internet topology. In Proceedings of the 18th ACM SIGCOMM Conference. ACM, New York. 251--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Flaxman, A., and Vera, J. 2007. Bias reduction in traceroute sampling: Towards a more accurate map of the internet. In Proceedings of the 5th Workshop on Algorithms and Models for the Web-Graph (WAW2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Govindan, R., and Tangmunarunkit, H. 2000. Heuristics for Internet map discovery. In Proceedings of the 19th ACM SIGCOMM Conference. ACM, New York.Google ScholarGoogle Scholar
  17. Guillaume, J.-L., Latapy, M., and Magoni, D. 2006. Relevance of massively distributed explorations of the internet topology: Qualitative results. Comput. Netw. 50, 3197--3224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. J. ASA 58, 13--30.Google ScholarGoogle Scholar
  19. Kim, J. H. 2006. Poisson cloning model for random graphs. Preprint.Google ScholarGoogle Scholar
  20. Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. The web as a graph: Measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Lakhina, A., Byers, J., Crovella, M., and Xie, P. 2003. Sampling biases in IP topology measurements. In Proceedings of the 22nd IEEE INFOCOM Conference. IEEE Computer Society Press, Los Alamitos.Google ScholarGoogle Scholar
  22. Leguay, J., Latapy, M., Friedman, T., and Salamatian, K. 2007. Describing and simulating Internet routes. Comput. Netw. 51, 2067--2087. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. McDiarmid, C. 1998. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, Eds. Springer-Verlag, Berlin, Germany, 195--247.Google ScholarGoogle Scholar
  24. Mihail, M., Papadimitriou, C., and Saberi, A. 2003. On certain connectivity properties of the Internet topology. In Proceedings of the 35th ACM Symposium on Theory of Computing. ACM, New York.Google ScholarGoogle Scholar
  25. Molloy, M., and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Rand. Struct. Algor. 6, 161--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Molloy, M., and Reed, B. 1998. The size of the largest component of a random graph on a fixed degree sequence. Combin. Probab. Comput. 7, 295--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Motwani, R., and Raghavan, P. 1990. Randomized Algorithms. Cambridge University Press. Cambridge. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Pansiot, J.-J., and Grad, D. 1998. On routers and multicast trees in the Internet. ACM SIGCOMM Commun. Rev. 28, 41--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Petermann, T., and De Los Rios, P. 2004. Exploration of scale-free networks: Do we measure the real exponents? Euro. Phys. J. B 38, 201--204.Google ScholarGoogle ScholarCross RefCross Ref
  30. Seaborn, J. 1991. Hypergeometric Functions and Their Applications. Springer-Verlag, Berlin, Germany. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Shavitt, Y., and Shir, E. 2005. Dimes: Let the Internet measure itself. ACM SIGCOMM Comput. Commun. Rev. 35, 71--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Spanier, J., and Oldham, K. 1987. The exponential integral ei(x) and related functions. In An Atlas of Functions. Hemisphere, Chapter 37, 351--360.Google ScholarGoogle Scholar
  33. Spring, N., Mahajan, R., Wetherall, D., and Anderson, T. 2004. Measuring ISP topologies with rocketfuel. IEEE/ACM Trans. Netw. 12, 2--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Tangmunarunkit, H., Govindan, R., Shenker, S., and Estrin, D. 2001. The impact of policy on Internet paths. In Proceedings of the 20th IEEE INFOCOM Conference. IEEE Computer Society Press, Los Alamitos.Google ScholarGoogle Scholar
  35. Viger, F., Augustin, B., Cuvellier, X., Magnien, C., Latapy, M., Friedman, T., and Teixeira, R. 2006. Detection, understanding, and prevention of traceroute measurement artifacts. In Proceedings of the 6th Internet Measurement Conference (IMC'06).Google ScholarGoogle Scholar
  36. Wilf, H. 1994. Generatingfunctionology. Academic Press, Orlando, FL. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Wormald, N. 1999. Models of random regular graphs. In London Math. Soc. Lecture Note Series, J. Lamb and D. Preece, Eds. Vol. 276. Cambridge University Press, Cambridge. 239--298.Google ScholarGoogle Scholar

Index Terms

  1. On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Journal of the ACM
            Journal of the ACM  Volume 56, Issue 4
            June 2009
            195 pages
            ISSN:0004-5411
            EISSN:1557-735X
            DOI:10.1145/1538902
            Issue’s Table of Contents

            Copyright © 2009 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 2 July 2009
            • Accepted: 1 February 2008
            • Revised: 1 January 2008
            • Received: 1 July 2007
            Published in jacm Volume 56, Issue 4

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader