research-article

DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers

Published: 17 August 2008

Abstract

A fundamental challenge in data center networking is how to efficiently interconnect an exponentially increasing number of servers. This paper presents DCell, a novel network structure with many desirable features for data center networking. DCell is a recursively defined structure in which a high-level DCell is constructed from many low-level DCells, and DCells at the same level are fully connected with one another. The number of servers DCell can interconnect grows doubly exponentially as the node degree increases. DCell is fault tolerant: it has no single point of failure, and its distributed fault-tolerant routing protocol performs near-shortest-path routing even in the presence of severe link or node failures. DCell also provides higher network capacity than the traditional tree-based structure for various types of services. Furthermore, DCell can be expanded incrementally, and a partial DCell provides the same appealing features. Results from theoretical analysis, simulations, and experiments show that DCell is a viable interconnection structure for data centers.
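The abstract describes the recursive construction and its doubly exponential growth only in words. The Python sketch below makes both concrete under stated assumptions: it follows the construction rules of the full paper as we understand them (a DCell_0 is n servers on one n-port mini-switch; a DCell_l is built from g_l = t_{l-1} + 1 copies of DCell_{l-1}; sub-DCells i < j are joined by a single link between server uid j-1 of sub-DCell i and server uid i of sub-DCell j). The function names `dcell_sizes`, `build_dcell`, `uid_to_addr`, and `dcell_route` and the tuple addressing are hypothetical. The routing function is a simplified, failure-free hierarchical routing sketch; it does not model the paper's distributed fault-tolerant protocol.

```python
from itertools import combinations


def dcell_sizes(n, k):
    """Return [(g_l, t_l) for l = 0..k] (assumed recurrence, see lead-in).

    g_l: number of DCell_{l-1}s inside a DCell_l (g_0 = 1 by convention).
    t_l: number of servers in a DCell_l; t_0 = n servers on one n-port switch.
    """
    g, t = 1, n
    sizes = [(g, t)]
    for _ in range(k):
        g = t + 1  # one more sub-DCell than a sub-DCell has servers
        t = g * t  # t_l = g_l * t_{l-1} = t_{l-1} * (t_{l-1} + 1)
        sizes.append((g, t))
    return sizes


def build_dcell(n, k):
    """Build DCell_k and return (servers, server_links).

    A server is a tuple (a_k, ..., a_1, a_0) of per-level indices. Servers
    sharing (a_k, ..., a_1) hang off the same mini-switch (their DCell_0),
    so switch links are implicit and not listed.
    """
    sizes = dcell_sizes(n, k)
    server_links = []

    def build(prefix, level):
        if level == 0:
            return [prefix + (i,) for i in range(n)]
        g = sizes[level][0]
        cells = [build(prefix + (i,), level - 1) for i in range(g)]
        # Fully connect the g sub-DCells with exactly one link per pair:
        # server uid j-1 of sub-DCell i <-> server uid i of sub-DCell j (i < j).
        for i, j in combinations(range(g), 2):
            server_links.append((cells[i][j - 1], cells[j][i]))
        return [s for cell in cells for s in cell]  # list index == uid

    return build((), k), server_links


def uid_to_addr(uid, level, sizes):
    """Convert a uid inside a DCell_level to its (a_level, ..., a_0) suffix."""
    addr = []
    for l in range(level, 0, -1):
        t_prev = sizes[l - 1][1]
        addr.append(uid // t_prev)
        uid %= t_prev
    addr.append(uid)
    return tuple(addr)


def dcell_route(src, dst, n, k):
    """Failure-free hierarchical routing sketch; returns servers src..dst.

    Consecutive servers are joined either by a server link or by their shared
    DCell_0 switch. Hop count obeys T(0) = 1, T(m) = 2*T(m-1) + 1, so a path
    has at most 2**(k+1) - 1 server-to-server hops.
    """
    sizes = dcell_sizes(n, k)

    def route(a, b):
        if a == b:
            return [a]
        p = next(i for i in range(k + 1) if a[i] != b[i])  # first differing index
        m = k - p  # a and b share a DCell_m but lie in different DCell_{m-1}s
        if m == 0:
            return [a, b]  # same DCell_0: one hop through the mini-switch
        pre, i, j = a[:p], a[p], b[p]
        lo, hi = min(i, j), max(i, j)
        # Endpoints of the unique link joining sub-DCells lo and hi.
        end_lo = pre + (lo,) + uid_to_addr(hi - 1, m - 1, sizes)
        end_hi = pre + (hi,) + uid_to_addr(lo, m - 1, sizes)
        n1, n2 = (end_lo, end_hi) if i < j else (end_hi, end_lo)
        return route(a, n1) + route(n2, b)

    return route(src, dst)
```

For example, `dcell_sizes(6, 3)` yields server counts 6, 42, 1806, and 3263442: because t_l = t_{l-1}(t_{l-1} + 1), the count roughly squares at each level, which is the doubly exponential growth the abstract refers to. In `build_dcell(2, 2)`, `dcell_route((0, 0, 0), (6, 1, 1), 2, 2)` returns `[(0, 0, 0), (0, 0, 1), (0, 2, 0), (0, 2, 1), (6, 0, 0), (6, 1, 0), (6, 1, 1)]`: it first crosses to the right DCell_1 over the single level-2 link, then resolves each side recursively.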

      Published in

        ACM SIGCOMM Computer Communication Review, Volume 38, Issue 4
        October 2008
        436 pages
        ISSN: 0146-4833
        DOI: 10.1145/1402946

        SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication
        August 2008
        452 pages
        ISBN: 9781605581750
        DOI: 10.1145/1402958

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States
