research-article

Factorized Databases

Authors:
Dan Olteanu

University of Oxford

University of Oxford
View Profile

,
Maximilian Schleich

University of Oxford

University of Oxford
View Profile

Authors Info & Claims

ACM SIGMOD Record Volume 45 Issue 2June 2016pp 5–16https://doi.org/10.1145/3003665.3003667

Published:28 September 2016Publication History

ACM SIGMOD Record

Abstract

This paper overviews factorized databases and their application to machine learning. The key observation underlying this work is that state-of-the-art relational query processing entails a high degree of redundancy in the computation and representation of query results. This redundancy can be avoided and is not necessary for subsequent analytics such as learning regression models.

References

S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748, 2008. Google ScholarDigital Library
N. Bakibayev, T. Kociský, D. Olteanu, and J. Závodný. Aggregation and ordering in factorised databases. PVLDB, 6(14):1990--2001, 2013. Google ScholarDigital Library
N. Bakibayev, D. Olteanu, and J. Závodný. FDB: A query engine for factorised relational databases. PVLDB, 5(11):1232--1243, 2012. Google ScholarDigital Library
C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006. Google ScholarDigital Library
R. Ciucanu and D. Olteanu. Worst-case optimal join at a time. Technical report, Oxford, Nov 2015.Google Scholar
G. Gottlob. On minimal constraint networks. Artif. Intell., 191-192:42--60, 2012. Google ScholarDigital Library
T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31--40, 2007. Google ScholarDigital Library
M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: Questions Asked Frequently, CoRR:1504.04044, April 2015.Google Scholar
P. Koutris, P. Beame, and D. Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT, pages 8:1--8:18, 2016.Google Scholar
H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. In PODS, pages 37--48, 2012. Google ScholarDigital Library
D. Olteanu and J. Huang. Using OBDDs for efficient query evaluation on probabilistic databases. In SUM, pages 326--340, 2008. Google ScholarDigital Library
D. Olteanu, C. Koch, and L. Antova. World-set decompositions: Expressiveness and efficient algorithms. TCS, 403(2-3):265--284, 2008. Google ScholarDigital Library
D. Olteanu and M. Schleich. F: Regression models over factorized views. PVLDB, 9(10), 2016. Google ScholarDigital Library
D. Olteanu and J. Závodný. On factorisation of provenance polynomials. In TaPP, 2011.Google Scholar
D. Olteanu and J. Závodný. Factorised representations of query results: size bounds and readability. In ICDT, pages 285--298, 2012. Google ScholarDigital Library
D. Olteanu and J. Závodný. Size bounds for factorised representations of query results. TODS, 40(1):2, 2015. Google ScholarDigital Library
J. Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, 1989. Google ScholarDigital Library
M. Schleich, D. Olteanu, and R. Ciucanu. Learning linear regression models over factorized joins. In SIGMOD, 2016. Google ScholarDigital Library
P. Sen, A. Deshpande, and L. Getoor. Read-once functions and query evaluation in probabilistic databases. PVLDB, 3(1):1068--1079, 2010. Google ScholarDigital Library
J. Shute, R. Vingralek, B. Samwel, B. Handy, C. Whipkey, E. Rollins, M. Oancea, K. Littlefield, D. Menestrina, S. Ellner, J. Cieslewicz, I. Rae, T. Stancescu, and H. Apte. F1: A distributed SQL database that scales. PVLDB, 6(11):1068--1079, 2013. Google ScholarDigital Library
Simons Institute for the Theory of Computing, UC Berkeley. Workshop on "Succinct Data Representations and Applications", September 2013.Google Scholar
T. L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google Scholar

Recommendations

Demonstration of the FDB query engine for factorised databases

FDB is an in-memory query engine for factorised databases, which are relational databases that use compact factorised representations at the physical layer to reduce data redundancy and boost query performance.

We demonstrate FDB using real data sets ...
Read More
Aggregation and ordering in factorised databases

A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on the main-memory ...
Read More
Integrated non-factorized variational inference
NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2

We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. Our approach unifies the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGMOD Record Volume 45, Issue 2
June 2016
66 pages
ISSN:0163-5808
DOI:10.1145/3003665
Editors:
Yanlei Diao
University of Massachusetts Amherst
,
Vanessa Braganholo
Universidade Federal Fluminense
,
Marco Brambilla
Politecnico di Milano
,
Chee Yong Chan
National University of Singapore
,
Rada Chirkova
North Carolina State University
,
Zackary Ives
University of Pennsylvania
,
Anastasios Kementsietsidis
Google Research
,
Jeffrey Naughton
University of Wisconsin-Madison
,
Frank Neven
Hasselt University
,
Olga Papaemmanoui
Brandeis Univesity
,
Aditya Parameswaran
University of Illinois
,
Anish Das Sarma
Google Research
,
Alkis Simitsis
HP Labs
,
Wang-Chiew Tan
University of California Santa Cruz
,
Nesime Tatbul
MIT CSAIL
,
Marianne Winslett
University of Illinois
,
Jun Yang
Duke University
Issue’s Table of Contents
Copyright © 2016 Authors
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 September 2016
Check for updates
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 54
  Total Citations
  View Citations
- 427
  Total Downloads
- Downloads (Last 12 months)85
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Factorized Databases

ACM SIGMOD Record

Abstract

References

Cited By

Recommendations

Demonstration of the FDB query engine for factorised databases

Aggregation and ordering in factorised databases

Integrated non-factorized variational inference