research-article

Processing a trillion cells per mouse click

Authors:
Alexander Hall

Google, Inc.

Google, Inc.
View Profile

,
Olaf Bachmann

Google, Inc.

Google, Inc.
View Profile

,
Robert Büssow

Google, Inc.

Google, Inc.
View Profile

,
Silviu Gănceanu

Google, Inc.

Google, Inc.
View Profile

,
Marc Nunkesser

Google, Inc.

Google, Inc.
View Profile

Proceedings of the VLDB Endowment Volume 5 Issue 11pp 1436–1446https://doi.org/10.14778/2350229.2350259

Published:01 July 2012Publication History

Proceedings of the VLDB Endowment

Abstract

Column-oriented database systems have been a real game changer for the industry in recent years. Highly tuned and performant systems have evolved that provide users with the possibility of answering ad hoc queries over large datasets in an interactive manner.

In this paper we present the column-oriented datastore developed as one of the central components of PowerDrill. It combines the advantages of columnar data layout with other known techniques (such as using composite range partitions) and extensive algorithmic engineering on key data structures. The main goal of the latter being to reduce the main memory footprint and to increase the efficiency in processing typical user queries. In this combination we achieve large speed-ups. These enable a highly interactive Web UI where it is common that a single mouse click leads to processing a trillion values in the underlying dataset.

References

D. J. Abadi. Query Execution in Column-Oriented Database Systems. PhD thesis, MIT, 2008. Google ScholarDigital Library
D. J. Abadi, P. A. Boncz, and S. Harizopoulos. Column oriented database systems. PVLDB, 2(2):1664--1665, 2009. Google ScholarDigital Library
D. J. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD Conference, pages 671--682, 2006. Google ScholarDigital Library
D. J. Abadi, S. Madden, and N. Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD Conference, pages 967--980, 2008. Google ScholarDigital Library
S. Abiteboul and N. Bidoit. Non first normal form relations to represent hierarchically organized data. In PODS, pages 191--200, 1984. Google ScholarDigital Library
Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1--10, 2002. Google ScholarDigital Library
H. Bast and I. Weber. The completesearch engine: Interactive, efficient, and towards ir& db integration. In CIDR, pages 88--95, 2007.Google Scholar
S. Bischof, T. Schickinger, and A. Steger. Load balancing using bisectors - a tight average-case analysis. In ESA, pages 172--183, 1999. Google ScholarDigital Library
P. A. Boncz, M. Zukowski, and N. Nes. Monetdb/x100: Hyper-pipelining query execution. In CIDR, pages 225--237, 2005.Google Scholar
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, 2008. Google ScholarDigital Library
S. Chaudhuri and U. Dayal. An overview of data warehousing and olap technology. SIGMOD Record, 26(1):65--74, 1997. Google ScholarDigital Library
J. Dean. Challenges in building large-scale information retrieval systems: invited talk. In WSDM, page 1, 2009. Google ScholarDigital Library
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarDigital Library
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182--209, 1985. Google ScholarDigital Library
S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP, pages 29--43, 2003. Google ScholarDigital Library
Google Web Toolkit. code.google.com/webtoolkit.Google Scholar
S. Idreos, M. L. Kersten, and S. Manegold. Self-organizing tuple reconstruction in column-stores. In SIGMOD Conference, pages 297--308, 2009. Google ScholarDigital Library
D. S. Johnson, S. Krishnan, J. Chhugani, S. Kumar, and S. Venkatasubramanian. Compressing large boolean matrices using reordering techniques. In VLDB, pages 13--23, 2004. Google ScholarDigital Library
T. Johnson and D. Shasha. 2q: A low overhead high performance buffer management replacement algorithm. In VLDB, pages 439--450, 1994. Google ScholarDigital Library
M. L. Kersten, S. Idreos, S. Manegold, and E. Liarou. The researcher's guide to the data deluge: Querying a scientific database in just a few seconds. PVLDB, 4(12):1474--1477, 2011.Google ScholarDigital Library
D. Lemire and O. Kaser. Reordering columns for smaller indexes. Inf. Sci., 181(12):2550--2570, 2011. Google ScholarDigital Library
N. Megiddo and D. S. Modha. Outperforming lru with an adaptive replacement cache algorithm. IEEE Computer, 37(4):58--65, 2004. Google ScholarDigital Library
S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: interactive analysis of web-scale datasets. Commun. ACM, 54(6):114--123, 2011. Google ScholarDigital Library
G. Moerkotte. Small materialized aggregates: A light weight index structure for data warehousing. In VLDB, pages 476--487, 1998. Google ScholarDigital Library
MonetDB. www.monetdb.org.Google Scholar
Netezza. www.netezza.com.Google Scholar
Page replacement algorithms. en.wikipedia.org/wiki/Page_replacement_algorithm.Google Scholar
Partition databases (wiki). en.wikipedia.org/wiki/Partition_(database).Google Scholar
Protocol Buffers: Developer Guide. code.google.com/apis/protocolbuffers/docs/overview.html.Google Scholar
QlikTech. www.qlikview.com.Google Scholar
R. Sedgewick and K. Wayne. Algorithms. Addison Wesley, 2011. Google ScholarDigital Library
D. Slezak, J. Wroblewski, V. Eastwood, and P. Synak. Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB, 1(2):1337--1345, 2008. Google ScholarDigital Library
Snappy compression algorithm. code.google.com/p/snappy.Google Scholar
M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-store: A column-oriented dbms. In VLDB, pages 553--564, 2005. Google ScholarDigital Library
L. Trevisan. When hamming meets euclid: The approximability of geometric tsp and mst (extended abstract). In STOC, pages 21--29, 1997. Google ScholarDigital Library
VectorWise. www.actian.com/products/vectorwise.Google Scholar

Recommendations

Session based click features for recency ranking
AAAI'10: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence

Recency ranking refers to the ranking of web results by accounting for both relevance and freshness. This is particularly important for "recency sensitive" queries such as breaking news queries. In this study, we propose a set of novel click features to ...
Read More
Tool for detecting webpage usability problems from mouse click coordinate logs
HCI'07: Proceedings of the 12th international conference on Human-computer interaction: interaction design and usability

In this paper, we propose a method that detects inconsistencies between user interaction logs of a task and desired sequences for the task based on mouse click coordinate logs. The proposed method models two successive clicks as a vector and thus a ...
Read More
Click patterns: an empirical representation of complex query intents
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

Understanding users' search intents is critical component of modern search engines. A key limitation made by most query log analyses is the assumption that each clicked web result represents one unique intent. However, there are many search tasks, such ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 5, Issue 11
July 2012
608 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2012
Published in pvldb Volume 5, Issue 11
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 30
  Total Citations
  View Citations
- 281
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Processing a trillion cells per mouse click

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Session based click features for recency ranking

Tool for detecting webpage usability problems from mouse click coordinate logs

Click patterns: an empirical representation of complex query intents

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Processing a trillion cells per mouse click

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Session based click features for recency ranking

Tool for detecting webpage usability problems from mouse click coordinate logs

Click patterns: an empirical representation of complex query intents

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media