skip to main content
research-article

Factorized Databases

Published:28 September 2016Publication History
Skip Abstract Section

Abstract

This paper overviews factorized databases and their application to machine learning. The key observation underlying this work is that state-of-the-art relational query processing entails a high degree of redundancy in the computation and representation of query results. This redundancy can be avoided and is not necessary for subsequent analytics such as learning regression models.

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Bakibayev, T. Kociský, D. Olteanu, and J. Závodný. Aggregation and ordering in factorised databases. PVLDB, 6(14):1990--2001, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Bakibayev, D. Olteanu, and J. Závodný. FDB: A query engine for factorised relational databases. PVLDB, 5(11):1232--1243, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Ciucanu and D. Olteanu. Worst-case optimal join at a time. Technical report, Oxford, Nov 2015.Google ScholarGoogle Scholar
  7. G. Gottlob. On minimal constraint networks. Artif. Intell., 191-192:42--60, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31--40, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: Questions Asked Frequently, CoRR:1504.04044, April 2015.Google ScholarGoogle Scholar
  10. P. Koutris, P. Beame, and D. Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT, pages 8:1--8:18, 2016.Google ScholarGoogle Scholar
  11. H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. In PODS, pages 37--48, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Olteanu and J. Huang. Using OBDDs for efficient query evaluation on probabilistic databases. In SUM, pages 326--340, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Olteanu, C. Koch, and L. Antova. World-set decompositions: Expressiveness and efficient algorithms. TCS, 403(2-3):265--284, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. Olteanu and M. Schleich. F: Regression models over factorized views. PVLDB, 9(10), 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Olteanu and J. Závodný. On factorisation of provenance polynomials. In TaPP, 2011.Google ScholarGoogle Scholar
  16. D. Olteanu and J. Závodný. Factorised representations of query results: size bounds and readability. In ICDT, pages 285--298, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Olteanu and J. Závodný. Size bounds for factorised representations of query results. TODS, 40(1):2, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Schleich, D. Olteanu, and R. Ciucanu. Learning linear regression models over factorized joins. In SIGMOD, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Sen, A. Deshpande, and L. Getoor. Read-once functions and query evaluation in probabilistic databases. PVLDB, 3(1):1068--1079, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Shute, R. Vingralek, B. Samwel, B. Handy, C. Whipkey, E. Rollins, M. Oancea, K. Littlefield, D. Menestrina, S. Ellner, J. Cieslewicz, I. Rae, T. Stancescu, and H. Apte. F1: A distributed SQL database that scales. PVLDB, 6(11):1068--1079, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Simons Institute for the Theory of Computing, UC Berkeley. Workshop on "Succinct Data Representations and Applications", September 2013.Google ScholarGoogle Scholar
  23. T. L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader