ABSTRACT
Many network applications have stringent end-to-end latency requirements, including VoIP and interactive video conferencing, automated trading, and high-performance computing---where even microsecond variations may be intolerable. The resulting fine-grain measurement demands cannot be met effectively by existing technologies, such as SNMP, NetFlow, or active probing. We propose instrumenting routers with a hash-based primitive that we call a Lossy Difference Aggregator (LDA) to measure latencies down to tens of microseconds and losses as infrequent as one in a million.
Abstractly, such measurement is an instance of what we call a coordinated streaming problem, which is fundamentally harder than standard streaming problems because values must be coordinated between nodes. We describe a compact data structure that efficiently computes the average and standard deviation of latency and loss rate in a coordinated streaming environment. Our theoretical results translate to an efficient hardware implementation at 40 Gbps using less than 1% of a typical 65-nm 400-MHz networking ASIC. When compared to Poisson-spaced active probing with similar overheads, our LDA mechanism delivers orders of magnitude smaller relative error; active probing requires 50--60 times as much bandwidth to deliver similar levels of accuracy.
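To make the coordinated streaming idea concrete, here is a minimal Python sketch of a hash-based difference aggregator. It is an illustration under stated assumptions, not the design from the paper. The assumptions are that the sender and receiver each keep the same array of buckets, that each bucket holds a timestamp sum and a packet count, and that both sides hash a packet identifier to pick the bucket. The class name `LDA`, the helper `estimate`, the SHA-256 hash, and the bucket count are all hypothetical. Only buckets whose counts agree on both sides contribute to the delay estimate, since a lost packet corrupts the timestamp sum of its bucket. Refinements for high loss rates and for standard deviation are omitted.

```python
import hashlib


class LDA:
    """Simplified single-bank aggregator (illustrative assumption, not the paper's exact design)."""

    def __init__(self, num_buckets):
        self.ts_sum = [0] * num_buckets   # sum of timestamps per bucket (microseconds)
        self.count = [0] * num_buckets    # number of packets per bucket

    def _bucket(self, packet_id):
        # Both nodes must use the same hash so a packet lands in the same bucket.
        digest = hashlib.sha256(packet_id).digest()
        return int.from_bytes(digest[:8], "big") % len(self.count)

    def update(self, packet_id, timestamp_us):
        b = self._bucket(packet_id)
        self.ts_sum[b] += timestamp_us
        self.count[b] += 1


def estimate(sender, receiver):
    """Return (average delay in microseconds or None, number of lost packets)."""
    delay_total = 0
    usable_packets = 0
    for b in range(len(sender.count)):
        # A bucket is usable only if no packet hashed to it was lost.
        if sender.count[b] > 0 and sender.count[b] == receiver.count[b]:
            delay_total += receiver.ts_sum[b] - sender.ts_sum[b]
            usable_packets += sender.count[b]
    lost = sum(sender.count) - sum(receiver.count)
    avg = delay_total / usable_packets if usable_packets else None
    return avg, lost


# Usage: 1000 packets, a constant 25-microsecond delay, three packets dropped.
sender, receiver = LDA(64), LDA(64)
for i in range(1000):
    pid = i.to_bytes(4, "big")
    sender.update(pid, i * 10)
    if i not in (7, 400, 901):
        receiver.update(pid, i * 10 + 25)

print(estimate(sender, receiver))  # prints (25.0, 3)
```

Each node keeps only a fixed number of counters regardless of how many packets pass, and the two nodes exchange just those counters at the end of a measurement interval. That is the sense in which the state is compact.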
Every microsecond counts: tracking fine-grain latencies with a lossy difference aggregator. SIGCOMM '09.