skip to main content
10.1145/2020408.2020580acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
poster

GBASE: a scalable and general graph management system

Published:21 August 2011Publication History

ABSTRACT

Graphs appear in numerous applications including cyber-security, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are common-place. How to store such large graphs efficiently? What are the core operations/queries on those graph? How to answer the graph queries quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space, as well as accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.

References

  1. Hadoop information. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  2. Daniel J. Abadi, Peter A. Boncz, and Stavros Harizopoulos. Column oriented database systems. PVLDB, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Daniel J. Abadi, Samuel Madden, and Nabil Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD Conference, pages 967--980, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Louigi Addario-Berry, W. Sean Kennedy, Andrew D. King, Zhentao Li, and Bruce A. Reed. Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discrete Applied Mathematics, 158(7):765--770, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Charu C. Aggarwal, Yan Xie, and Philip S. Yu. Gconnect: A connectivity index for massive disk-resident graphs. PVLDB, 2(1):862--873, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Leman Akoglu, Mary McGlohon, and Christos Faloutsos. oddball: Spotting anomalies in weighted graphs. In PAKDD (2), pages 410--421, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. k-core decompositions: A tool for the visualization of large scale networks. http://arxiv.org/abs/cs.NI/0504107.Google ScholarGoogle Scholar
  8. Periklis Andritsos, Renée J. Miller, and Panayiotis Tsaparas. Information-theoretic tools for mining database structure from large data sets. In SIGMOD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. David A. Bader, Shiva Kintali, Kamesh Madduri, and Milena Mihail. Approximating betweenness centrality. WAW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Paolo Boldi and Sebastiano Vigna. The webgraph framework i: compression techniques. In WWW, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Scope: easy and efficient parallel processing of massive data sets. VLDB, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In KDD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI'04, December 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Erdos and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290--297, 1959.Google ScholarGoogle ScholarCross RefCross Ref
  15. Robert L. Grossman and Yunhong Gu. Data mining using high performance data clouds: experimental studies using sector and sphere. KDD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, and Peter A. Boncz. Positional update handling in column stores. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Michael Isard and Yuan Yu. Distributed data-parallel computing using a high-level programming language. In SIGMOD Conference, pages 987--994, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Milena Ivanova, Martin L. Kersten, Niels J. Nes, and Romulo Goncalves. An architecture for recycling intermediates in a column-store. In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. U Kang, C.E Tsourakakis, Ana Paula Appel, C Faloutsos, and Jure Leskovec. Radius plots for mining tera-byte scale graphs: Algorithms, patterns, and observations. SIAM International Conference on Data Mining, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  20. U Kang, C.E Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. ICDM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. In DAC, pages 343--348, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, and Christos Faloutsos. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. PKDD, pages 133--145, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ching-Yung Lin, Nan Cao, Shixia Liu, Spiros Papadimitriou, Jimeng Sun, and Xifeng Yan. Smallblue: Social network analysis for expertise search and collective intelligence. In ICDE, pages 1483--1486, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Chao Liu, Fan Guo, and Christos Faloutsos. Bbm: bayesian browsing model from petabyte-scale data. In KDD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference, pages 135--146, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Hossein Maserrat and Jian Pei. Neighbor query friendly compression of social networks. In KDD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08, pages 1099--1110, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google ScholarGoogle Scholar
  30. Spiros Papadimitriou and Jimeng Sun. Disco: Distributed co-clustering with map-reduce. ICDM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with sawzall. Scientific Programming Journal, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Purnamrita Sarkar and Andrew W. Moore. Fast nearest-neighbor search in disk-resident graphs. In KDD, pages 513--522, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. C-store: A column-oriented dbms. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM, pages 418--425, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In ICDM, pages 613--622, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Silke Trißl and Ulf Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB, pages 709--720, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu. Mining significant graph patterns by leap search. In SIGMOD Conference, pages 433--444, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Peixiang Zhao, Jeffrey Xu Yu, and Philip S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GBASE: a scalable and general graph management system

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2011
      1446 pages
      ISBN:9781450308137
      DOI:10.1145/2020408

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 August 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • poster

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader