Blink and it's done: interactive queries on very large data

Authors:
Sameer Agarwal (UC Berkeley)
Anand P. Iyer (UC Berkeley)
Aurojit Panda (UC Berkeley)
Samuel Madden (MIT CSAIL)
Barzan Mozafari (MIT CSAIL)
Ion Stoica (UC Berkeley)

Proceedings of the VLDB Endowment, Volume 5, Issue 12, pp. 1902–1905. https://doi.org/10.14778/2367502.2367533

Published: 01 August 2012

Abstract

In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join, and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150x faster than Hive on MapReduce and 10–150x faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2–10%.
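To make the idea of sampling-based approximate answers with error bounds concrete, the following is a minimal sketch in Python. It is not BlinkDB's implementation: it assumes a single in-memory column, a uniform random sample, and a normal-approximation confidence interval, and it ignores the finite-population correction. The function name `approximate_mean`, the synthetic `session_times` data, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width), where half_width is the half-width of
    an approximate confidence interval from the normal approximation
    (z = 1.96 corresponds to roughly 95% confidence).
    """
    rng = random.Random(seed)
    # Uniform sample without replacement (assumption: the data fits in memory).
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean.
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


# Synthetic stand-in for one column of a large table (hypothetical data).
data_rng = random.Random(42)
session_times = [data_rng.expovariate(1 / 30.0) for _ in range(200_000)]

# Answer an AVG-style query from a 1% sample, then compare with the exact scan.
estimate, half_width = approximate_mean(session_times, sample_size=2_000)
exact = statistics.fmean(session_times)
print(f"approx = {estimate:.2f} +/- {half_width:.2f}, exact = {exact:.2f}")
```

The sketch reads only 1% of the rows, and the reported interval narrows roughly with the square root of the sample size, which is the basic trade-off between response time and error that an approximate query engine exposes.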

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 12
August 2012
340 pages
ISSN: 2150-8097
Publisher: VLDB Endowment