Abstract
Communication networks are complex systems whose operation relies on a large number of components that work together to provide services to end users. As the quality of these services depends on different parameters, understanding how each of them impacts the final performance of a service is a challenging but important problem. However, intervening on individual factors to evaluate the impact of the different parameters is often impractical due to the high cost of intervention in a network. It is, therefore, desirable to adopt a formal approach to understand the role of the different parameters and to predict how a change in any of these parameters will impact performance.
The approach of causality pioneered by J. Pearl provides a powerful framework to investigate these questions. Most of the existing theory is non-parametric and does not make any assumption on the nature of the system under study. However, most of the implementations of causal model inference algorithms and most of the examples of usage of a causal model to predict intervention rely on assumptions such linearity, normality, or discrete data.
In this article, we present a methodology to overcome the challenges of working with real-world data and extend the application of causality to complex systems in the area of telecommunication networks, for which assumptions of normality, linearity and discrete data do no hold. Specifically, we study the performance of TCP, which is the prevalent protocol for reliable end-to-end transfer in the Internet. Analytical models of the performance of TCP exist, but they take into account the state of network only and disregard the impact of the application at the sender and the receiver, which often influences TCP performance. To address this point, we take as application the file transfer protocol (FTP), which uses TCP for reliable transfer. Studying a well-understood protocol such as TCP allows us to validate our approach and compare its results to previous studies.
We first present and evaluate our methodology using TCP traffic obtained via network emulation, which allows us to experimentally validate the prediction of an intervention. We then apply the methodology to real-world TCP traffic sent over the Internet. Throughout the article, we compare the causal approach for studying TCP performance to other approaches such as analytical modeling or simulation and and show how they can complement each other.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, A Causal Approach to the Study of TCP Performance
- D. W. K. Andrews and M. Buchinsky. 2001. Evaluation of a three-step method for choosing the number of bootstrap repetitions. Journal of Econometrics 103, 1--2 (July 2001), 345--386.Google ScholarCross Ref
- J. Audrius, L. Jukka-Pekka, H. Matti, and W. Alf Inge. 2011. An empirical study of NetEm network emulation functionalities.. In ICCCN. IEEE, 1--6.Google Scholar
- G. Borboudakis and I. Tsamardinos. 2012. Incorporating causal prior knowledge as path-constraints in bayesian networks and maximal ancestral graphs. ArXiv e-prints (June 2012).Google Scholar
- C. Chow and C. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14, 3 (May 1968), 462--467. DOI:http://dx.doi.org/10.1109/TIT.1968.1054142 Google ScholarDigital Library
- A. C. Davison and D.V. Hinkley. 1997. Bootstrap Methods and their Application. Cambridge University Press.Google Scholar
- M. Dischinger, A. Mislove, A. Haeberlen, and K. P. Gummadi. 2008. Detecting bittorrent blocking. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC’08). ACM, 3--8. Google ScholarDigital Library
- T. En-Najjary and G. Urvoy-Keller. 2006. PPrate: A passive capacity estimation tool. In E2EMON. 82--89.Google Scholar
- P. Ford, A. Shelest, and N. Srinivas. 2002. Method for automatic tuning of TCP receive window. (Aug. 15 2002). US Patent App. 09/736,988.Google Scholar
- S. Fortin-Parisi and B. Sericola. 2004. A Markov model of TCP throughput, goodput and slow start. Performance Evaluation 58, 2+3 (Nov. 2004), 89--108. Google ScholarDigital Library
- FreeBSD. 2009. FreeBSD File Formats Manual. Retrieved from http://www.freebsd.org/cgi/man.cgi?query=tar.Google Scholar
- A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. 2007. A kernel statistical test of independence. In NIPS. Curran Associates, Inc.Google Scholar
- N. Handigol, B. Heller, V. Jeyakumar, B. Lantz, and N. McKeown. 2012. Reproducible network experiments using container-based emulation. In CoNEXT’12. ACM, New York, NY, 253--264. Google ScholarDigital Library
- H. Hours, E. Biersack, and P. Loiseau. 2014a. A causal study of an emulated network. In 10ème Atelier en Evaluation de Performances.Google Scholar
- H. Hours, E. Biersack, and P. Loiseau. 2014b. Causal study of network performance. In ALGOTEL 2014, 16èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications.Google Scholar
- P. O. Hoyer and A. Hyttinen. 2009. Bayesian discovery of linear acyclic causal models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI’09). AUAI Press, Arlington, Virginia, 240--248. Google ScholarDigital Library
- P. Jaworski, F. Durante, W. K. Härdle, and T. Rychlik. 2010. Copula Theory and Its Applications. Springer.Google Scholar
- I. Kaj and J. Olsén. 2001. Throughput modeling and simulation for single connection TCP-Tahoe. In ITC-I7. Teletraffic Science and Engineering, Vol. 4. Elsevier, 705--718.Google Scholar
- M. Kalisch, M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann. 2012. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software 47, 11 (2012), 1--26.Google ScholarCross Ref
- M. Koivisto. 2006. Advances in exact Bayesian structure discovery in Bayesian networks. In UAI.Google Scholar
- M. Koivisto and K. Sood. 2004. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research 5 (Dec. 2004), 549--573. Google ScholarDigital Library
- J. F. Kurose and K. W. Ross. 2009. Computer Networking: A Top-Down Approach (5th ed.). Addison-Wesley. Google ScholarDigital Library
- B. Lantz, B. Heller, and N. McKeown. 2010. A network in a laptop: Rapid prototyping for software-defined networks. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks (Hotnets-IX). ACM, New York, NY, Article 19, 6 pages. Google ScholarDigital Library
- P. Loiseau, P. Gonçalves, J. Barral, and P. Vicat-Blanc Primet. 2010. Modeling TCP throughput: An elaborated large-deviations-based model and its empirical validation. In Proceedings of IFIP Performance. Google ScholarDigital Library
- M. Mathis, J. Heffner, and R. Reddy. 2003. Web100: Extended TCP instrumentation for research, education and diagnosis. Computer Communication Review 33, 3 (2003), 69--79. Google ScholarDigital Library
- D. C. Maxwell. 2002. Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research 2 (March 2002), 445--498. Google ScholarDigital Library
- C. Meek. 1995. Causal inference and causal explanation with background knowledge. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI’95). Morgan Kaufmann, San Francisco, CA, 403--410. Google ScholarDigital Library
- M. Mellia, R. Lo Cigno, and F. Neri. 2005. Measuring IP and TCP behavior on edge nodes with Tstat. Computer Networks 47, 1 (Jan. 2005), 1--21. Google ScholarDigital Library
- M. Mirza, J. Sommers, P. Barford, and X. Zhu. 2007. A machine learning approach to TCP throughput prediction. In SIGMETRICS’07. ACM, New York, NY, 97--108. Google ScholarDigital Library
- K. P. Murphy. 2001. The Bayes net toolbox for MATLAB. Computing Science and Statistics 33 (2001), 2001.Google Scholar
- M. Nagode and M. Fajdiga. 2011. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods 40, 5 (2011), 876--892.Google ScholarCross Ref
- J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. 1998. Modeling TCP throughput: A simple model and its empirical validation. SIGCOMM Computer Communication Review 28, 4 (Oct. 1998), 303--314. Google ScholarDigital Library
- J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA. Google ScholarDigital Library
- J. Pearl. 2009. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA. Google ScholarDigital Library
- J. Pearl. 2013. Direct and indirect effects. CoRR abs/1301.2300 (2013). Google ScholarDigital Library
- J. Ramsey. 2014. A scalable conditional independence test for nonlinear, non-Gaussian data. CoRR abs/1401.5031 (2014).Google Scholar
- J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko, R. A. Poldrack, and C. Glymour. 2010. Six problems for causal inference from fMRI. Neuroimage 49, 2 (2010), 1545--1558.Google ScholarCross Ref
- S. M. H. Shah, A. u. Rehman, A. N. Khan, and M. A. Shah. 2007. TCP throughput estimation: A new neural networks model. In Proceedings of the International Conference on Emerging Technologies (ICET’07). 94--98.Google Scholar
- M. Siekkinen, E. W. Biersack, G. Urvoy-Keller, V. Goebel, and T. Plagemann. 2005. InTraBase: Integrated traffic analysis based on a database management system. In E2EMON. IEEE, 32--46. Google ScholarDigital Library
- M. Siekkinen, G. Urvoy-Keller, E. W. Biersack, and D. Collange. 2008. A root cause analysis toolkit for TCP. Computer Networks 52, 9 (2008), 56. Google ScholarDigital Library
- B. Sikdar, S. Kalyanaraman, and K. S. Vastola. 2001. Analytic models for the latency and steady-state throughput of TCP Tahoe, Reno and SACK. In Proceedings of the IEEE GLOBECOM. 25--29.Google Scholar
- P. Spirtes and C. Glymour. 1991. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review 9, 1 (1991), 62--72.Google ScholarCross Ref
- P. Spirtes, C. Glymour, and R. Scheines. 2001. Causation, Prediction, and Search (second ed.). MIT Press, Cambridge, MA, USA.Google Scholar
- P. Spirtes, C. Meek, and T. Richardson. 1995. Causal inference in the presence of latent variables and selection bias. In Proceedings of 11th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 499--506. Google ScholarDigital Library
- M. Tariq, M. Motiwala, N. Feamster, and M. Ammar. 2009. Detecting network neutrality violations with causal inference. In CoNEXT. ACM, 289--300. Google ScholarDigital Library
- M. Tariq, A. Zeitoun, V. Valancius, N. Feamster, and M. Ammar. 2008. Answering what-if deployment and configuration questions with wise. In SIGCOMM’08. ACM, New York, NY, USA, 99--110. Google ScholarDigital Library
- B. Thomas, R. Jurdak, and I. Atkinson. 2012. SPDYing up the web. Commun. ACM 55, 12 (Dec. 2012), 64--73. Google ScholarDigital Library
- R. Tillman, A. Gretton, and P. Spirtes. 2009. Nonlinear directed acyclic structure learning with weakly additive noise models. In Advances in Neural Information Processing Systems (NIPS).Google Scholar
- M. E. Tipping. 2001. Sparse bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1 (2001), 211--244. Google ScholarDigital Library
- M. Voortman and M. J. Druzdzel. 2008. Insensitivity of constraint-based causal discovery algorithms to violations of the assumption of multivariate normality. In Proceedings of the FLAIRS Conference. AAAI Press, 690--695.Google Scholar
- K. Winstein and H. Balakrishnan. 2013. Tcp ex machina: Computer-generated congestion control. In ACM SIGCOMM Computer Communication Review, Vol. 43. 123--134. Google ScholarDigital Library
- K. Zhang, J. Peters, D. Janzing, and B. Schölkopf. 2012. Kernel-based conditional independence test and application in causal discovery. CoRR abs/1202.3775 (2012).Google Scholar
- Y. Zhang, L Breslau, V. Paxson, and S. Shenker. 2002. On the characteristics and origins of internet flow rates. In Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM’02). ACM, 309--322. Google ScholarDigital Library
Index Terms
- A Causal Approach to the Study of TCP Performance
Recommendations
Enhancing TCP performance in wide-area cellular wireless networks: transport level approaches
Wireless communications systems and networksInternet technology-based architectures and protocols for supporting multimedia traffic over wireless networks are evolving. Since TCP (Transmission Control Protocol)/IP (Internet Protocol) is the standard network protocol stack on the Internet, its use ...
An Extended Review of Techniques for Enhancing TCP Performance
Transmission Control Protocol (TCP) is considered one of the most important protocols in the Internet. An important mechanism in TCP is the congestion control mechanism which controls TCP sending rate and makes TCP react to congestion signals. Nowadays ...
The incremental deployability of RTT-based congestion avoidance for high speed TCP Internet connections
SIGMETRICS '00: Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systemsOur research focuses on end-to-end congestion avoidance algorithms that use round trip time (RTT) fluctuations as an indicator of the level of network congestion. The algorithms are referred to as delay-based congestion avoidance or DCA. Due to the ...
Comments