research-article

Low latency via redundancy

Authors:
Ashish Vulimiri

UIUC, Urbana, IL, USA

UIUC, Urbana, IL, USA
View Profile

,
Philip Brighten Godfrey

UIUC, Urbana, IL, USA

UIUC, Urbana, IL, USA
View Profile

,
Radhika Mittal

UC Berkeley, Berkeley, CA, USA

UC Berkeley, Berkeley, CA, USA
View Profile

,
Justine Sherry

UC Berkeley, Berkeley, CA, USA

UC Berkeley, Berkeley, CA, USA
View Profile

,
Sylvia Ratnasamy

UC Berkeley, Berkeley, CA, USA

UC Berkeley, Berkeley, CA, USA
View Profile

,
Scott Shenker

UC Berkeley and ICSI, Berkeley, CA, USA

UC Berkeley and ICSI, Berkeley, CA, USA
View Profile

CoNEXT '13: Proceedings of the ninth ACM conference on Emerging networking experiments and technologiesDecember 2013Pages 283–294https://doi.org/10.1145/2535372.2535392

Published:09 December 2013Publication History

CoNEXT '13: Proceedings of the ninth ACM conference on Emerging networking experiments and technologies

Pages 283–294

ABSTRACT

Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency --- especially the tail of the latency distribution --- can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.

References

M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: dynamic ow scheduling for data center networks. In Proceedings of the 7th USENIX conference on Networked systems design and implementation, NSDI'10, pages 19--19, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarDigital Library
M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center TCP (DCTCP). In SIGCOMM, 2010. Google ScholarDigital Library
M. Alizadeh, S. Yang, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. Deconstructing datacenter packet transport. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pages 133--138, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Why let resources idle? Aggressive cloning of jobs with Dolly. In USENIX HotCloud, 2012. Google ScholarDigital Library
D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. N. Rao. Improving web availability for clients with MONET. In USENIX NSDI, pages 115--128, Berkeley, CA, USA, 2005. USENIX Association. Google ScholarDigital Library
S. Asmussen. Applied Probability and Queues. Wiley, 1987.Google Scholar
D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel. Finding a needle in haystack: facebook's photo storage. In Proceedings of the 9th USENIX conference on Operating systems design and implementation, OSDI'10, pages 1--8, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarDigital Library
T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, pages 267--280, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
J. Brutlag. Speed matters for Google web search, June 2009. http://services.google.com/fh/files/blogs/google_delayexp.pdf.Google Scholar
Apache Cassandra. http://cassandra.apache.org.Google Scholar
E. W. Chan, X. Luo, W. Li, W. W. Fok, and R. K. Chang. Measurement of loss pairs in network paths. In IMC, pages 88--101, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
J. Chu. Tuning TCP parameters for the 21st century. http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July 2009.Google Scholar
J. Dean and L. A. Barroso. The tail at scale. Commun. ACM, 56(2):74--80, Feb. 2013. Google ScholarDigital Library
P. Dixon. Shopzilla site redesign -- we get what we measure, June 2009. http://www.slideshare.net/shopzilla/shopzillas-you-get-what-you-measure-velocity-2009.Google Scholar
C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Trans. Comput., 21(12):1411--1415, Dec. 1972. Google ScholarDigital Library
Google AppEngine datastore: memcached cache. https://developers.google.com/appengine/docs/python/memcache/usingmemcache#Pattern.Google Scholar
W. Gray and D. Boehm-Davis. Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior. Journal of Experimental Psychology: Applied, 6(4):322, 2000.Google ScholarCross Ref
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: a scalable and exible data center network. In ACM SIGCOMM, pages 51--62, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
D. Han, A. Anand, A. Akella, and S. Seshan. RPT: re-architecting loss protection for content-aware networks. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, NSDI'12, pages 6--6, Berkeley, CA, USA, 2012. USENIX Association. Google ScholarDigital Library
C. Hopps. Computing TCP's retransmission timer (RFC 6298), 2000.Google Scholar
S. Jain, M. Demmer, R. Patra, and K. Fall. Using redundancy to cope with failures in a delay tolerant network. In ACM SIGCOMM, 2005. Google ScholarDigital Library
J. Li, J. Stribling, R. Morris, and M. Kaashoek. Bandwidth-efficient management of DHT routing tables. In NSDI, 2005. Google ScholarDigital Library
D. S. Myers and M. K. Vernon. Estimating queue length distributions for queues with random arrivals. SIGMETRICS Perform. Eval. Rev., 40(3):77--79, Jan. 2012. Google ScholarDigital Library
M. Olvera-Cravioto, J. Blanchet, and P. Glynn. On the transition from heavy-traffic to heavy-tails for the m/g/1 queue: The regularly varying case. Annals of Applied Probability, 21:645--668, 2011.Google ScholarCross Ref
S. Ramachandran. Web metrics: Size and number of resources, May 2010. https://developers.google.com/speed/articles/web-metrics.Google Scholar
K. Sigman. Appendix: A primer on heavy-tailed distributions. Queueing Systems, 33(1-3):261--275, 1999. Google ScholarDigital Library
E. Soljanin. Reducing delay with coding in (mobile) multi-agent information transfer. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 1428--1433. IEEE, 2010.Google ScholarCross Ref
S. Souders. Velocity and the bottom line. http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html.Google Scholar
A. Vulimiri, P. B. Godfrey, and S. Shenker. A cost-benefit analysis of low latency via added utilization, June 2013. http://web.engr.illinois.edu/~vulimir1/benchmark.pdf.Google Scholar
A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is less: Reducing latency via redundancy. In Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI), October 2012. Google ScholarDigital Library
X. Wu and X. Yang. Dard: Distributed adaptive routing for datacenter networks. In Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, ICDCS '12, pages 32--41, Washington, DC, USA, 2012. IEEE Computer Society. Google ScholarDigital Library
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In USENIX OSDI, pages 29--42, Berkeley, CA, USA, 2008. Google ScholarDigital Library
A. P. Zwart. Queueing Systems With Heavy Tails. PhD thesis, Technische Universiteit Eindhoven, September 2001.Google Scholar

Index Terms

Low latency via redundancy
1. Hardware
  1. Communication hardware, interfaces and storage
2. Networks

Recommendations

Fine-grained latency and loss measurements in the presence of reordering
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

Modern trading and cluster applications require microsecond latencies and almost no losses in data centers. This paper introduces an algorithm called FineComb that can estimate fine-grain end-to-end loss and latency measurements between edge routers in ...
Read More
Fine-grained latency and loss measurements in the presence of reordering
Performance evaluation review

Modern trading and cluster applications require microsecond latencies and almost no losses in data centers. This paper introduces an algorithm called FineComb that can estimate fine-grain end-to-end loss and latency measurements between edge routers in ...
Read More
Performance analysis of an ATM switch with multiple paths
ICNP '95: Proceedings of the 1995 International Conference on Network Protocols

An ATM switch based on multistage interconnection networks with multiple paths can support higher bandwidth than that of buffered networks with single path by passing multiple packets to the same destination simultaneously. These multiple packets are ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CoNEXT '13: Proceedings of the ninth ACM conference on Emerging networking experiments and technologies
December 2013
454 pages
ISBN:9781450321013
DOI:10.1145/2535372
General Chairs:
Kevin Almeroth
UC Santa Barbara, USA
,
Laurent Mathy
Université de Liège, Belgium
,
Program Chairs:
Konstantina Papagiannaki
Telefonica Research, Spain
,
Vishal Misra
Columbia University
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 December 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
latency
performance
reliability
Qualifiers
- research-article
Conference

Acceptance Rates
CoNEXT '13 Paper Acceptance Rate44of226submissions,19%Overall Acceptance Rate198of789submissions,25%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 180
  Total Citations
  View Citations
- 740
  Total Downloads
- Downloads (Last 12 months)64
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.