Dremel: interactive analysis of web-scale datasets

Authors:
Sergey Melnik, Google, Inc.
Andrey Gubarev, Google, Inc.
Jing Jing Long, Google, Inc.
Geoffrey Romer, Google, Inc.
Shiva Shivakumar, Google, Inc.
Matt Tolton, Google, Inc.
Theo Vassilakis, Google, Inc.

Communications of the ACM, Volume 54, Issue 6, June 2011, pp. 114–123. https://doi.org/10.1145/1953122.1953148

Published: 1 June 2011


Abstract

Dremel is a scalable, interactive ad hoc query system for analysis of read-only nested data. By combining multilevel execution trees and a columnar data layout, it can run aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on instances of the system with a few thousand nodes.
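The abstract names a columnar representation for nested records without describing it. In the PVLDB 2010 version of this paper, each entry in a column carries a repetition level (at which repeated field on the column's path the value repeated; 0 starts a new record) and a definition level (how many optional or repeated fields on the path are actually present). The following is a minimal Python sketch of computing those levels for one column path, not the authors' column-striping implementation. It assumes records are plain dicts with repeated fields stored as lists, and it uses a hypothetical schema encoding; the names `Schema`, `stripe` and `column` are illustrative, and the sample records `R1` and `R2` are modeled on the `Document` example from that paper.

```python
from typing import Any

# Hypothetical schema encoding (not a Dremel or Protocol Buffers API):
# field name -> (cardinality, sub-schema), where cardinality is "required",
# "optional" or "repeated", and the sub-schema is None for leaf fields.
Schema = dict[str, tuple[str, Any]]
# (value or None for NULL, repetition level, definition level)
Triple = tuple[Any, int, int]

LANGUAGE: Schema = {"Code": ("required", None), "Country": ("optional", None)}
NAME: Schema = {"Language": ("repeated", LANGUAGE), "Url": ("optional", None)}
LINKS: Schema = {"Backward": ("repeated", None), "Forward": ("repeated", None)}
DOCUMENT: Schema = {
    "DocId": ("required", None),
    "Links": ("optional", LINKS),
    "Name": ("repeated", NAME),
}

# Sample records: repeated fields are lists, absent fields are omitted.
R1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
R2 = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}


def stripe(record: dict, schema: Schema, path: list[str]) -> list[Triple]:
    """Emit (value, r, d) triples for one column path of one record."""
    out: list[Triple] = []

    def walk(node: dict, sch: Schema, i: int, r: int, d: int, depth: int) -> None:
        # r: repetition level to attach to the next emitted entry
        # d: definition level reached so far on the path
        # depth: number of repeated fields seen so far on the path
        card, sub = sch[path[i]]
        if card == "repeated":
            values = node.get(path[i], [])
        else:
            values = [node[path[i]]] if path[i] in node else []
        if not values:
            # Field absent: emit NULL recording how much of the path was defined.
            out.append((None, r, d))
            return
        d_here = d if card == "required" else d + 1
        depth_here = depth + 1 if card == "repeated" else depth
        for k, value in enumerate(values):
            # The first occurrence inherits r; later ones repeat at this field.
            r_here = r if k == 0 else depth_here
            if i == len(path) - 1:
                out.append((value, r_here, d_here))
            else:
                walk(value, sub, i + 1, r_here, d_here, depth_here)

    walk(record, schema, 0, 0, 0, 0)
    return out


def column(records: list[dict], schema: Schema, path: list[str]) -> list[Triple]:
    """Concatenate the stripes of several records into one column."""
    return [t for rec in records for t in stripe(rec, schema, path)]


if __name__ == "__main__":
    for p in (["DocId"], ["Links", "Forward"], ["Name", "Language", "Code"]):
        print(".".join(p), column([R1, R2], DOCUMENT, p))
```

For the Name.Language.Code column, the two records yield (en-us, 0, 2), (en, 2, 2), (NULL, 1, 1), (en-gb, 1, 2), (NULL, 0, 1). The NULL entries are what make the nesting recoverable: a definition level below the column's maximum (2 here) marks where the path stopped being defined. The sketch walks each record once per column path to stay short; an implementation that stripes all columns in a single pass per record would likely be preferable at scale.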

Published in
Communications of the ACM, Volume 54, Issue 6, June 2011
134 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/1953122

Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Popular
- Refereed
Article Metrics
- 72 total citations
- 9,063 total downloads
- 875 downloads (last 12 months)
- 63 downloads (last 6 weeks)