Abstract
Understanding the graph structure of the Internet is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining this graph structure can be a surprisingly difficult task, as edges cannot be explicitly queried. For instance, empirical studies of the network of Internet Protocol (IP) addresses typically rely on indirect methods like traceroute to build what are approximately single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a paper by Lakhina et al. [2003] found empirically that the resulting sample is intrinsically biased. Further, in simulations, they observed that the degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson.
In this article, we study the bias of traceroute sampling mathematically and, for a very general class of underlying degree distributions, explicitly calculate the distribution that will be observed. As example applications of our machinery, we prove that traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.
- Alderson, D., Li, L., Willinger, W., and Doyle, J. 2005. Understanding Internet topology: Principles, models, and validation. IEEE/ACM Trans. Netw. 13, 1205--1218. Google ScholarDigital Library
- Amini, L., Shaikh, A., and Schulzrinne, H. 2002. Issues with inferring Internet topological attributes. In SPIE IT-Com.Google Scholar
- Barford, P., Bestavros, A., Byers, J., and Crovella, M. 2001. On the marginal utility of network topology measurements. In SIGCOMM Internet Measurement Workshop. Google ScholarDigital Library
- Blondel, V., Guillaume, J.-L., Hendrickx, J., and Jungers, R. 2007. Distance distribution in random graphs and application to network exploration. Phys. Rev. E 76, 066101.Google ScholarCross Ref
- Bollobás, B. 2001. Random Graphs, 2nd ed. Cambridge University Press, Cambridge.Google Scholar
- Bollobás, B., and Chung, F. 1988. The diameter of a cycle plus a random matching. SIAM J. Disc. Math. 1, 328--333. Google ScholarDigital Library
- Chen, Q., Chang, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2002. The origin of power laws in Internet topologies revisited. In Proceedings of the 21st ACM SIGCOMM Conference. ACM, New York.Google Scholar
- Clauset, A., and Moore, C. 2005. Accuracy and scaling phenomena in Internet mapping. Phys. Rev. Lett. 94, 018701.Google ScholarCross Ref
- Cohen, R., Gonen, M., and Wool, A. 2007. Bounding the bias of tree-like sampling in ip topologies. In Proceedings of the European Conference on Complex Systems (ECCS).Google Scholar
- Dall'Asta, L. 2007. Dynamic exploration of networks: From general principles to the traceroute process. Europ. Phys. J. B 60, 123--133.Google ScholarCross Ref
- Dall'Asta, L., Alvarez-Hamelin, I., Barrat, A., Vázquez, A., and Vespignani, A. 2006. Exploring networks with traceroute-like probes: Theory and simulations. Theoret. Comput. Sci. 355, 6--24. Google ScholarDigital Library
- Erdős, P., and Rényi, A. 1960. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17--61.Google Scholar
- Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. 2002. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In Proceedings of the 29th International Colloguium on Automata, Languages and Programming. 110--122. Google ScholarDigital Library
- Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the Internet topology. In Proceedings of the 18th ACM SIGCOMM Conference. ACM, New York. 251--262. Google ScholarDigital Library
- Flaxman, A., and Vera, J. 2007. Bias reduction in traceroute sampling: Towards a more accurate map of the internet. In Proceedings of the 5th Workshop on Algorithms and Models for the Web-Graph (WAW2007). Google ScholarDigital Library
- Govindan, R., and Tangmunarunkit, H. 2000. Heuristics for Internet map discovery. In Proceedings of the 19th ACM SIGCOMM Conference. ACM, New York.Google Scholar
- Guillaume, J.-L., Latapy, M., and Magoni, D. 2006. Relevance of massively distributed explorations of the internet topology: Qualitative results. Comput. Netw. 50, 3197--3224. Google ScholarDigital Library
- Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. J. ASA 58, 13--30.Google Scholar
- Kim, J. H. 2006. Poisson cloning model for random graphs. Preprint.Google Scholar
- Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. The web as a graph: Measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing. Google ScholarDigital Library
- Lakhina, A., Byers, J., Crovella, M., and Xie, P. 2003. Sampling biases in IP topology measurements. In Proceedings of the 22nd IEEE INFOCOM Conference. IEEE Computer Society Press, Los Alamitos.Google Scholar
- Leguay, J., Latapy, M., Friedman, T., and Salamatian, K. 2007. Describing and simulating Internet routes. Comput. Netw. 51, 2067--2087. Google ScholarDigital Library
- McDiarmid, C. 1998. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, Eds. Springer-Verlag, Berlin, Germany, 195--247.Google Scholar
- Mihail, M., Papadimitriou, C., and Saberi, A. 2003. On certain connectivity properties of the Internet topology. In Proceedings of the 35th ACM Symposium on Theory of Computing. ACM, New York.Google Scholar
- Molloy, M., and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Rand. Struct. Algor. 6, 161--180. Google ScholarDigital Library
- Molloy, M., and Reed, B. 1998. The size of the largest component of a random graph on a fixed degree sequence. Combin. Probab. Comput. 7, 295--306. Google ScholarDigital Library
- Motwani, R., and Raghavan, P. 1990. Randomized Algorithms. Cambridge University Press. Cambridge. Google ScholarDigital Library
- Pansiot, J.-J., and Grad, D. 1998. On routers and multicast trees in the Internet. ACM SIGCOMM Commun. Rev. 28, 41--50. Google ScholarDigital Library
- Petermann, T., and De Los Rios, P. 2004. Exploration of scale-free networks: Do we measure the real exponents? Euro. Phys. J. B 38, 201--204.Google ScholarCross Ref
- Seaborn, J. 1991. Hypergeometric Functions and Their Applications. Springer-Verlag, Berlin, Germany. Google ScholarDigital Library
- Shavitt, Y., and Shir, E. 2005. Dimes: Let the Internet measure itself. ACM SIGCOMM Comput. Commun. Rev. 35, 71--74. Google ScholarDigital Library
- Spanier, J., and Oldham, K. 1987. The exponential integral ei(x) and related functions. In An Atlas of Functions. Hemisphere, Chapter 37, 351--360.Google Scholar
- Spring, N., Mahajan, R., Wetherall, D., and Anderson, T. 2004. Measuring ISP topologies with rocketfuel. IEEE/ACM Trans. Netw. 12, 2--16. Google ScholarDigital Library
- Tangmunarunkit, H., Govindan, R., Shenker, S., and Estrin, D. 2001. The impact of policy on Internet paths. In Proceedings of the 20th IEEE INFOCOM Conference. IEEE Computer Society Press, Los Alamitos.Google Scholar
- Viger, F., Augustin, B., Cuvellier, X., Magnien, C., Latapy, M., Friedman, T., and Teixeira, R. 2006. Detection, understanding, and prevention of traceroute measurement artifacts. In Proceedings of the 6th Internet Measurement Conference (IMC'06).Google Scholar
- Wilf, H. 1994. Generatingfunctionology. Academic Press, Orlando, FL. Google ScholarDigital Library
- Wormald, N. 1999. Models of random regular graphs. In London Math. Soc. Lecture Note Series, J. Lamb and D. Preece, Eds. Vol. 276. Cambridge University Press, Cambridge. 239--298.Google Scholar
Index Terms
- On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs
Recommendations
The Study of Sampling Bias in Traceroute Topology Measurement
IMCCC '11: Proceedings of the 2011 First International Conference on Instrumentation, Measurement, Computer, Communication and ControlThe current study of Internet topology measurement is mainly based on trace route which brings the problem of sampling bias. This paper makes a model of the sampling process, and analyzes the bias of node degree in sampling probe. We find that the probe ...
On the bias of traceroute sampling: or, power-law degree distributions in regular graphs
STOC '05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computingUnderstanding the structure of the Internet graph is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining its graph structure is a surprisingly difficult task, as edges cannot ...
Measuring the Impacts of Sampling Bias on Internet AS-Level Topology Inference
CMC '09: Proceedings of the 2009 WRI International Conference on Communications and Mobile Computing - Volume 03In this paper, we carried out a systematic qualitative study of the impacts of sampling bias over Internet AS-level topology inference. Based on the relatively complete Chinese Internet AS graph derived from our large-scale measurement effort, we ...
Comments