skip to main content
10.1145/2487575.2487581acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

Published:11 August 2013Publication History

ABSTRACT

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi, has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at ``http://wshan.net/turbograph}" as executable files.

References

  1. Yahoo webscope. yahoo! altavista web page hyperlink connectivity graph. http://webscope.sandbox.yahoo.com.Google ScholarGoogle Scholar
  2. L. Addario-Berry, W. S. Kennedy, A. D. King, Z. Li, and B. A. Reed. Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discrete Applied Mathematics, 158(7):765--770, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. W.-S. Han, J. Lee, and J.-H. Lee. TurboISO: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. igraph: a framework for comparisons of disk-based graph indexing techniques. Proc. VLDB Endow., 3(1--2):449--459, Sept. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-marl: a dsl for easy and efficient graph analysis. In ASPLOS, pages 349--362, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. Gbase: a scalable and general graph management system. In KDD, pages 1091--1099, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. gbase: an efficient analysis platform for large graphs. VLDB J., 21(5):637--650, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. In ICDM, pages 229--238, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. Kwak, C. Lee, H. Park, and S. B. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Kyrola, G. Blelloch, and C. Guestrin. Graphchi: large-scale graph computation on just a pc. In OSDI, pages 31--46, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2):133--144, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning in the cloud. PVLDB, 5(8):716--727, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Maserrat and J. Pei. Neighbor query friendly compression of social networks. In KDD, pages 533--542, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Shao, H. Wang, and Y. Li. The trinity graph engine. Technical Report 161291, Microsoft Research, 2012.Google ScholarGoogle Scholar
  20. Y. Shiloach and U. Vishkin. An o(logn) parallel connectivity algorithm. Journal of Algorithms, 3(1):57--67, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Stonebraker et al. C-store: a column-oriented dbms. In VLDB, pages 553--564, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Zhao and J. Han. On graph query optimization in large networks. PVLDB, 3(1):340--351, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2013
        1534 pages
        ISBN:9781450321747
        DOI:10.1145/2487575

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 August 2013

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '13 Paper Acceptance Rate125of726submissions,17%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader