poster

GBASE: a scalable and general graph management system

Authors:
U. Kang

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

,
Hanghang Tong

IBM T.J. Watson, Hawthorne, NY, USA

IBM T.J. Watson, Hawthorne, NY, USA
View Profile

,
Jimeng Sun

IBM T.J. Watson, Hawthorne, NY, USA

IBM T.J. Watson, Hawthorne, NY, USA
View Profile

,
Ching-Yung Lin

IBM T.J. Watson, Hawthorne, NY, USA

IBM T.J. Watson, Hawthorne, NY, USA
View Profile

,
Christos Faloutsos

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2011Pages 1091–1099https://doi.org/10.1145/2020408.2020580

Published:21 August 2011Publication History

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1091–1099

ABSTRACT

Graphs appear in numerous applications including cyber-security, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are common-place. How to store such large graphs efficiently? What are the core operations/queries on those graph? How to answer the graph queries quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space, as well as accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.

References

Hadoop information. http://hadoop.apache.org/.Google Scholar
Daniel J. Abadi, Peter A. Boncz, and Stavros Harizopoulos. Column oriented database systems. PVLDB, 2009. Google ScholarDigital Library
Daniel J. Abadi, Samuel Madden, and Nabil Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD Conference, pages 967--980, 2008. Google ScholarDigital Library
Louigi Addario-Berry, W. Sean Kennedy, Andrew D. King, Zhentao Li, and Bruce A. Reed. Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discrete Applied Mathematics, 158(7):765--770, 2010. Google ScholarDigital Library
Charu C. Aggarwal, Yan Xie, and Philip S. Yu. Gconnect: A connectivity index for massive disk-resident graphs. PVLDB, 2(1):862--873, 2009. Google ScholarDigital Library
Leman Akoglu, Mary McGlohon, and Christos Faloutsos. oddball: Spotting anomalies in weighted graphs. In PAKDD (2), pages 410--421, 2010. Google ScholarDigital Library
I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. k-core decompositions: A tool for the visualization of large scale networks. http://arxiv.org/abs/cs.NI/0504107.Google Scholar
Periklis Andritsos, Renée J. Miller, and Panayiotis Tsaparas. Information-theoretic tools for mining database structure from large data sets. In SIGMOD, 2004. Google ScholarDigital Library
David A. Bader, Shiva Kintali, Kamesh Madduri, and Milena Mihail. Approximating betweenness centrality. WAW, 2007. Google ScholarDigital Library
Paolo Boldi and Sebastiano Vigna. The webgraph framework i: compression techniques. In WWW, 2004. Google ScholarDigital Library
Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Scope: easy and efficient parallel processing of massive data sets. VLDB, 2008. Google ScholarDigital Library
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In KDD, 2009. Google ScholarDigital Library
Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI'04, December 2004. Google ScholarDigital Library
P. Erdos and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290--297, 1959.Google ScholarCross Ref
Robert L. Grossman and Yunhong Gu. Data mining using high performance data clouds: experimental studies using sector and sphere. KDD, 2008. Google ScholarDigital Library
Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, and Peter A. Boncz. Positional update handling in column stores. In SIGMOD, 2010. Google ScholarDigital Library
Michael Isard and Yuan Yu. Distributed data-parallel computing using a high-level programming language. In SIGMOD Conference, pages 987--994, 2009. Google ScholarDigital Library
Milena Ivanova, Martin L. Kersten, Niels J. Nes, and Romulo Goncalves. An architecture for recycling intermediates in a column-store. In SIGMOD, 2009. Google ScholarDigital Library
U Kang, C.E Tsourakakis, Ana Paula Appel, C Faloutsos, and Jure Leskovec. Radius plots for mining tera-byte scale graphs: Algorithms, patterns, and observations. SIAM International Conference on Data Mining, 2010.Google ScholarCross Ref
U Kang, C.E Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. ICDM, 2009. Google ScholarDigital Library
G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999. Google ScholarDigital Library
George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. In DAC, pages 343--348, 1999. Google ScholarDigital Library
Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, and Christos Faloutsos. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. PKDD, pages 133--145, 2005. Google ScholarDigital Library
Ching-Yung Lin, Nan Cao, Shixia Liu, Spiros Papadimitriou, Jimeng Sun, and Xifeng Yan. Smallblue: Social network analysis for expertise search and collective intelligence. In ICDE, pages 1483--1486, 2009. Google ScholarDigital Library
Chao Liu, Fan Guo, and Christos Faloutsos. Bbm: bayesian browsing model from petabyte-scale data. In KDD, 2009. Google ScholarDigital Library
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference, pages 135--146, 2010. Google ScholarDigital Library
Hossein Maserrat and Jian Pei. Neighbor query friendly compression of social networks. In KDD, 2010. Google ScholarDigital Library
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08, pages 1099--1110, 2008. Google ScholarDigital Library
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google Scholar
Spiros Papadimitriou and Jimeng Sun. Disco: Distributed co-clustering with map-reduce. ICDM, 2008. Google ScholarDigital Library
Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with sawzall. Scientific Programming Journal, 2005. Google ScholarDigital Library
Purnamrita Sarkar and Andrew W. Moore. Fast nearest-neighbor search in disk-resident graphs. In KDD, pages 513--522, 2010. Google ScholarDigital Library
Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. C-store: A column-oriented dbms. In VLDB, 2005. Google ScholarDigital Library
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM, pages 418--425, 2005. Google ScholarDigital Library
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In ICDM, pages 613--622, 2006. Google ScholarDigital Library
Silke Trißl and Ulf Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD, 2007. Google ScholarDigital Library
D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB, pages 709--720, 2005. Google ScholarDigital Library
Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu. Mining significant graph patterns by leap search. In SIGMOD Conference, pages 433--444, 2008. Google ScholarDigital Library
Peixiang Zhao, Jeffrey Xu Yu, and Philip S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, 2007. Google ScholarDigital Library

Index Terms

GBASE: a scalable and general graph management system
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

gbase: an efficient analysis platform for large graphs

Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are common-place. How to ...
Read More
Equimatchable Graphs on Surfaces

A graph G is equimatchable if each matching in G is a subset of a maximum-size matching and it is factor critical if G-v has a perfect matching for each vertex v of G. It is known that any 2-connected equimatchable graph is either bipartite or factor ...
Read More
Acyclic 3-choosability of sparse graphs with girth at least 7

Every planar graph is known to be acyclically 7-choosable and is conjectured to be acyclically 5-choosable (Borodin et al. 2002 [4]). This conjecture if proved would imply both Borodin's acyclic 5-color theorem (1979) and Thomassen's 5-choosability ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408
General Chair:
Chid Apte
IBM Research
,
Program Chairs:
Joydeep Ghosh
UT Austin
,
Padhraic Smyth
UC Irvine
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 August 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
compression
distributed computing
graph
indexing
Qualifiers
- poster
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 83
  Total Citations
  View Citations
- 821
  Total Downloads
- Downloads (Last 12 months)31
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

GBASE: a scalable and general graph management system

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

gbase: an efficient analysis platform for large graphs

Equimatchable Graphs on Surfaces

Acyclic 3-choosability of sparse graphs with girth at least 7