research-article

Stale view cleaning: getting fresh answers from stale materialized views

Authors:
Sanjay Krishnan

UC Berkeley

UC Berkeley
View Profile

,
Jiannan Wang

UC Berkeley

UC Berkeley
View Profile

,
Michael J. Franklin

UC Berkeley

UC Berkeley
View Profile

,
Ken Goldberg

UC Berkeley

UC Berkeley
View Profile

,
Tim Kraska

Brown University

Brown University
View Profile

Proceedings of the VLDB Endowment Volume 8 Issue 12pp 1370–1381https://doi.org/10.14778/2824032.2824037

Published:01 August 2015Publication History

Proceedings of the VLDB Endowment

Abstract

Materialized views (MVs), stored pre-computed results, are widely used to facilitate fast queries on large datasets. When new records arrive at a high rate, it is infeasible to continuously update (maintain) MVs and a common solution is to defer maintenance by batching updates together. Between batches the MVs become increasingly stale with incorrect, missing, and superfluous rows leading to increasingly inaccurate query results. We propose Stale View Cleaning (SVC) which addresses this problem from a data cleaning perspective. In SVC, we efficiently clean a sample of rows from a stale MV, and use the clean sample to estimate aggregate query results. While approximate, the estimated query results reflect the most recent data. As sampling can be sensitive to long-tailed distributions, we further explore an outlier indexing technique to give increased accuracy when the data distributions are skewed. SVC complements existing deferred maintenance approaches by giving accurate and bounded query answers between maintenance. We evaluate our method on a generated dataset from the TPC-D benchmark and a real video distribution application. Experiments confirm our theoretical results: (1) cleaning an MV sample is more efficient than full view maintenance, (2) the estimated results are more accurate than using the stale MV, and (3) SVC is applicable for a wide variety of MVs.

References

Conviva. http://www.conviva.com/.Google Scholar
S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. I. Jordan, S. Madden, B. Mozafari, and I. Stoica. Knowing when you're wrong: building fast and reliable approximate query processing systems. In SIGMOD Conference, pages 481--492, 2014. Google Scholar
S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: queries with bounded errors and bounded response times on very large data. In EuroSys, pages 29--42, 2013. Google Scholar
A. Chalamalla, I. F. Ilyas, M. Ouzzani, and P. Papotti. Descriptive and prescriptive data cleaning. In SIGMOD Conference, pages 445--456, 2014. Google Scholar
S. Chaudhuri, G. Das, M. Datar, R. Motwani, and V. R. Narasayya. Overcoming limitations of sampling for aggregation queries. In ICDE, pages 534--542, 2001. Google Scholar
S. Chaudhuri and V. Narasayya. TPC-D data generation with skew. ftp.research.microsoft.com/users/viveknar/tpcdskew.Google Scholar
R. Chirkova and J. Yang. Materialized views. Foundations and Trends in Databases, 4(4):295--405, 2012.Google Scholar
A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661--703, 2009. Google Scholar
G. Cormode, M. N. Garofalakis, P. J. Haas, and C. Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 4(1-3):1--294, 2012. Google Scholar
D. R. Cox and D. V. Hinkley. Theoretical statistics. CRC Press, 1979.Google Scholar
Y. Cui and J. Widom. Lineage tracing for general data warehouse transformations. VLDB J., 12(1):41--58, 2003. Google Scholar
M. N. Garofalakis and P. B. Gibbons. Approximate query processing: Taming the terabytes. In VLDB, 2001. Google Scholar
P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst., 27(3):261--298, 2002. Google Scholar
A. Gupta and I. S. Mumick. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Eng. Bull., 18(2):3--18, 1995.Google Scholar
H. Gupta and I. S. Mumick. Incremental maintenance of aggregate and outerjoin expressions. Information Systems, 31(6):435--464, 2006. Google Scholar
C. Henke, C. Schmoll, and T. Zseby. Empirical evaluation of hash functions for packetid generation in sampled multipoint measurements. In Passive and Active Network Measurement, pages 197--206. Springer, 2009. Google Scholar
S. Joshi and C. M. Jermaine. Materialized sample views for database approximation. IEEE Trans. Knowl. Data Eng., 20(3):337--351, 2008. Google Scholar
C. Koch, Y. Ahmad, O. Kennedy, M. Nikolic, A. Nötzli, D. Lupei, and A. Shaikhha. Dbtoaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J., 23(2):253--278, 2014. Google Scholar
S. Krishnan, J. Wang, M. J. Franklin, K. Goldberg, and T. Kraska. Stale view cleaning: Getting fresh answers from stale materialized views. http://www.ocf.berkeley.edu/~sanjayk/pubs/svc-2014.pdf, 2014.Google Scholar
P.-A. Larson, W. Lehner, J. Zhou, and P. Zabback. Cardinality estimation using sample views with quality assurance. In SIGMOD, pages 175--186, 2007. Google Scholar
P.-Å. Larson and H. Z. Yang. Computing queries from derived relations. In VLDB, pages 259--269, 1985. Google Scholar
P. L'Ecuyer and R. Simard. Testu01: Ac library for empirical testing of random number generators. ACM Transactions on Mathematical Software (TOMS), 33(4):22, 2007. Google Scholar
Z. Liu, B. Jiang, and J. Heer. imMens: Real-time visual querying of big data. Comput. Graph. Forum, 32(3):421--430, 2013.Google Scholar
A. Meliou, W. Gatterbauer, S. Nath, and D. Suciu. Tracing data errors with view-conditioned causality. In SIGMOD Conference, pages 505--516, 2011. Google Scholar
M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2):226--251, 2003.Google Scholar
S. Nirkhiwale, A. Dobra, and C. M. Jermaine. A sampling algebra for aggregate estimation. PVLDB, 6(14):1798--1809, 2013. Google Scholar
F. Olken. Random sampling from databases. PhD thesis, University of California, 1993.Google Scholar
F. Olken and D. Rotem. Simple random sampling from relational databases. In VLDB, pages 160--169, 1986. Google Scholar
F. Olken and D. Rotem. Maintenance of materialized views of sampling queries. In ICDE, pages 632--641, 1992. Google Scholar
E. Rahm and H. H. Do. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull., 23(4):3--13, 2000.Google Scholar
V. Srinivasan and M. J. Carey. Compensation-based on-line query processing. In SIGMOD Conference, pages 331--340, 1992. Google Scholar
J. Wang, S. Krishnan, M. J. Franklin, K. Goldberg, T. Kraska, and T. Milo. A sample-and-clean framework for fast and accurate query processing on dirty data. In SIGMOD Conference, pages 469--480, 2014. Google Scholar
K. Weil. Rainbird: Real-time analytics at twitter. In Strata, 2011.Google Scholar
E. Wu and S. Madden. Scorpion: Explaining away outliers in aggregate queries. PVLDB, 6(8):553--564, 2013. Google Scholar
K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In SIGMOD, pages 277--288, 2014. Google Scholar

Index Terms

Stale view cleaning: getting fresh answers from stale materialized views
1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

View-based query containment
PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Query containment is the problem of checking whether for all databases the answer to a query is a subset of the answer to a second query. In several data management tasks, such as data integration, mobile computing, etc., the data of interest are only ...
Read More
Keyword query cleaning

Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task.

...
Read More
View-based query processing: On the relationship between rewriting, answering and losslessness

As a result of the extensive research in view-based query processing, three notions have been identified as fundamental, namely rewriting, answering, and losslessness. Answering amounts to computing the tuples satisfying the query in all databases ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 8, Issue 12
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
August 2015
728 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2015
Published in pvldb Volume 8, Issue 12
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 10
  Total Citations
  View Citations
- 152
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Stale view cleaning: getting fresh answers from stale materialized views

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

View-based query containment

Keyword query cleaning

View-based query processing: On the relationship between rewriting, answering and losslessness

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Stale view cleaning: getting fresh answers from stale materialized views

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

View-based query containment

Keyword query cleaning

View-based query processing: On the relationship between rewriting, answering and losslessness

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media