Abstract
This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages, based on the observation that 'no one size fits all'. To address this shift, we propose a polystore architecture designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data.
Index Terms
- The BigDAWG Polystore System