Abstract
This paper overviews factorized databases and their application to machine learning. The key observation underlying this work is that state-of-the-art relational query processing entails a high degree of redundancy in the computation and representation of query results. This redundancy can be avoided and is not necessary for subsequent analytics such as learning regression models.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748, 2008. Google ScholarDigital Library
- N. Bakibayev, T. Kociský, D. Olteanu, and J. Závodný. Aggregation and ordering in factorised databases. PVLDB, 6(14):1990--2001, 2013. Google ScholarDigital Library
- N. Bakibayev, D. Olteanu, and J. Závodný. FDB: A query engine for factorised relational databases. PVLDB, 5(11):1232--1243, 2012. Google ScholarDigital Library
- C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006. Google ScholarDigital Library
- R. Ciucanu and D. Olteanu. Worst-case optimal join at a time. Technical report, Oxford, Nov 2015.Google Scholar
- G. Gottlob. On minimal constraint networks. Artif. Intell., 191-192:42--60, 2012. Google ScholarDigital Library
- T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31--40, 2007. Google ScholarDigital Library
- M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: Questions Asked Frequently, CoRR:1504.04044, April 2015.Google Scholar
- P. Koutris, P. Beame, and D. Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT, pages 8:1--8:18, 2016.Google Scholar
- H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. In PODS, pages 37--48, 2012. Google ScholarDigital Library
- D. Olteanu and J. Huang. Using OBDDs for efficient query evaluation on probabilistic databases. In SUM, pages 326--340, 2008. Google ScholarDigital Library
- D. Olteanu, C. Koch, and L. Antova. World-set decompositions: Expressiveness and efficient algorithms. TCS, 403(2-3):265--284, 2008. Google ScholarDigital Library
- D. Olteanu and M. Schleich. F: Regression models over factorized views. PVLDB, 9(10), 2016. Google ScholarDigital Library
- D. Olteanu and J. Závodný. On factorisation of provenance polynomials. In TaPP, 2011.Google Scholar
- D. Olteanu and J. Závodný. Factorised representations of query results: size bounds and readability. In ICDT, pages 285--298, 2012. Google ScholarDigital Library
- D. Olteanu and J. Závodný. Size bounds for factorised representations of query results. TODS, 40(1):2, 2015. Google ScholarDigital Library
- J. Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, 1989. Google ScholarDigital Library
- M. Schleich, D. Olteanu, and R. Ciucanu. Learning linear regression models over factorized joins. In SIGMOD, 2016. Google ScholarDigital Library
- P. Sen, A. Deshpande, and L. Getoor. Read-once functions and query evaluation in probabilistic databases. PVLDB, 3(1):1068--1079, 2010. Google ScholarDigital Library
- J. Shute, R. Vingralek, B. Samwel, B. Handy, C. Whipkey, E. Rollins, M. Oancea, K. Littlefield, D. Menestrina, S. Ellner, J. Cieslewicz, I. Rae, T. Stancescu, and H. Apte. F1: A distributed SQL database that scales. PVLDB, 6(11):1068--1079, 2013. Google ScholarDigital Library
- Simons Institute for the Theory of Computing, UC Berkeley. Workshop on "Succinct Data Representations and Applications", September 2013.Google Scholar
- T. L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google Scholar
Recommendations
Demonstration of the FDB query engine for factorised databases
FDB is an in-memory query engine for factorised databases, which are relational databases that use compact factorised representations at the physical layer to reduce data redundancy and boost query performance.
We demonstrate FDB using real data sets ...
Aggregation and ordering in factorised databases
A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on the main-memory ...
Integrated non-factorized variational inference
NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. Our approach unifies the ...
Comments