A partitioning framework for aggressive data skipping

Authors:
Liwen Sun, Sanjay Krishnan, Reynold S. Xin, Michael J. Franklin
AMPLab, UC Berkeley

Proceedings of the VLDB Endowment, Volume 7, Issue 13, pp. 1617–1620
https://doi.org/10.14778/2733004.2733044

Published: 01 August 2014

Abstract

We propose to demonstrate a fine-grained partitioning framework that reorganizes data tuples into small blocks at data loading time. The goal is to let queries skip as many data blocks as possible. The partitioning framework consists of four steps: (1) workload analysis, which extracts features from a query workload; (2) augmentation, which augments each data tuple with a feature vector; (3) reduce, which succinctly represents a set of data tuples using a set of feature vectors; and (4) partitioning, which runs a clustering algorithm over the feature vectors and uses the clustering result to guide the actual data partitioning. Our experiments show that these techniques improve query response time by 3-7x over traditional range partitioning, owing to more effective data skipping.
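
The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say what a feature is or which clustering algorithm is used, so the sketch assumes that features are frequently occurring filter predicates, that a feature vector is a bit vector recording which features a tuple satisfies, and that a simple greedy bottom-up merge stands in for the clustering step. The predicate names, the `min_count` threshold, and every function name are hypothetical.

```python
from collections import Counter

# Hypothetical filter predicates, keyed by name (not taken from the paper).
PREDICATES = {
    "price_gt_100": lambda t: t["price"] > 100,
    "region_eu": lambda t: t["region"] == "EU",
    "year_2014": lambda t: t["year"] == 2014,
}

def analyze_workload(workload, min_count=2):
    """Step 1: keep predicates used by at least min_count queries as features."""
    counts = Counter(name for query in workload for name in set(query))
    return sorted(name for name, c in counts.items() if c >= min_count)

def augment(tuples, features):
    """Step 2: pair each tuple with a bit vector; bit i is 1 if it satisfies feature i."""
    return [(tuple(int(PREDICATES[f](t)) for f in features), t) for t in tuples]

def reduce_vectors(augmented):
    """Step 3: represent the tuples by their distinct feature vectors and counts."""
    return Counter(vec for vec, _ in augmented)

def block_cost(union, size):
    """Tuples scanned if each feature is queried once: every feature whose bit
    is set in the block's union vector forces a scan of the whole block."""
    return sum(union) * size

def partition(vector_counts, num_blocks):
    """Step 4 (assumed stand-in): greedily merge the pair of clusters whose
    merge adds the least scan cost until num_blocks clusters remain."""
    clusters = [([vec], vec, n) for vec, n in sorted(vector_counts.items())]
    while len(clusters) > max(num_blocks, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (_, ui, ni), (_, uj, nj) = clusters[i], clusters[j]
                union = tuple(a | b for a, b in zip(ui, uj))
                delta = (block_cost(union, ni + nj)
                         - block_cost(ui, ni) - block_cost(uj, nj))
                if best is None or delta < best[0]:
                    best = (delta, i, j, union)
        _, i, j, union = best
        merged = (clusters[i][0] + clusters[j][0], union,
                  clusters[i][2] + clusters[j][2])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    assignment = {vec: b for b, (members, _, _) in enumerate(clusters)
                  for vec in members}
    unions = {b: union for b, (_, union, _) in enumerate(clusters)}
    return assignment, unions

def blocks_to_scan(feature, features, unions):
    """A query filtering on `feature` can skip every block whose union bit is 0."""
    idx = features.index(feature)
    return sorted(b for b, union in unions.items() if union[idx])

# Toy workload and data, invented for illustration.
workload = [["year_2014", "region_eu"], ["year_2014"],
            ["region_eu", "price_gt_100"], ["price_gt_100"]]
tuples = [
    {"year": 2014, "region": "EU", "price": 50},
    {"year": 2014, "region": "EU", "price": 60},
    {"year": 2014, "region": "US", "price": 50},
    {"year": 2013, "region": "US", "price": 150},
    {"year": 2013, "region": "US", "price": 200},
]
features = analyze_workload(workload)
augmented = augment(tuples, features)
vector_counts = reduce_vectors(augmented)
assignment, unions = partition(vector_counts, num_blocks=2)

# Use the clustering result to guide the actual placement of tuples into blocks.
blocks = {}
for vec, t in augmented:
    blocks.setdefault(assignment[vec], []).append(t)
```

In this toy run the three 2014 tuples land in one block and the two 2013 tuples in the other, so a query filtering on `price_gt_100` needs to scan only the second group. Clustering the distinct feature vectors rather than the raw tuples keeps the input to the clustering step small, which is presumably the purpose of the reduce step.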

Published in
Proceedings of the VLDB Endowment, Volume 7, Issue 13
August 2014
466 pages
ISSN: 2150-8097
Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)
Publisher: VLDB Endowment
