Abstract
Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period when the row is/was valid. We develop novel techniques for reducing index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without being a heavy burden on transaction throughput. To achieve this end, we re-design persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid state disks that can support very fast random I/Os, so that traversing the extra level of indirection incurs a relatively small overhead.
The extra level of indirection dramatically reduces the number of magnetic disk I/Os that are needed for index updates, and localizes maintenance to indexes on updated attributes. Further, we batch insertions within the indirection layer in order to reduce physical disk I/Os for indexing new records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used Generalized Search Tree (GiST) open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can significantly reduce index maintenance and/or query processing cost, by a factor of 3. For insertions of new records, our novel batching technique can save up to 90% of the insertion time.
- BioPostgres: Data management for computational biology. http://www.biopostgres.org/.Google Scholar
- IBM DB2 Database for Linux, UNIX, and Windows. www.ibm.com/software/data/db2/linux-unix-windows/.Google Scholar
- IBM DB2 with BLU Acceleration. www.ibm.com/software/data/ db2/linux-unix-windows/db2-blu-acceleration/.Google Scholar
- OpenFTS: Open source full text search engine. http://openfts.sourceforge.net/.Google Scholar
- PostGIS: Geographic information systems. http://postgis.refractions.net/.Google Scholar
- PostgreSQL: Open source object-relational database system. http://www.postgresql.org/.Google Scholar
- YAGO2: High-quality knowledge base. http://www.mpi-inf.mpg.de/yago-naga/yago/.Google Scholar
- D. Agrawal, D. Ganesan, R. K. Sitaraman, Y. Diao, and S. Singh. Lazy-adaptive tree: An optimized index structure for flash devices. PVLDB, 2(1):361-372, 2009. Google Scholar
- C.-H. Ang and K.-P. Tan. The interval B-tree. Inf. Process. Lett., 53(2):85-89, Jan. 1995. Google Scholar
- R. Arpaci-Dusseau and A. Arpaci-Dusseau. Operating Systems: Three Easy Pieces. Arpaci-Dusseau Books, 0.5 edition, 2012.Google Scholar
- M. Athanassoulis, S. Chen, A. Ailamaki, P. B. Gibbons, and R. Stoica. MaSM: efficient online updates in data warehouses. In SIGMOD Conference, pages 865-876, 2011. Google Scholar
- B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer. An asymptotically optimal multiversion B-Tree. VLDB J., 5(4):264-275, 1996. Google Scholar
- B. Bhattacharjee, L. Lim, T. Malkemus, G. Mihaila, K. Ross, S. Lau, C. McArthur, Z. Toth, and R. Sherkat. Efficient index compression in DB2 LUW. Proc. VLDB Endow., 2(2):1462-1473, Aug. 2009. Google Scholar
- B. Bhattacharjee, T. Malkemus, S. Lau, S. Mckeough, J.-A. Kirton, R. V. Boeschoten, and J. Kennedy. Efficient bulk deletes for multi dimensionally clustered tables in DB2. In VLDB, pages 1197-1206, 2007. Google Scholar
- B. Bhattacharjee, K. A. Ross, C. A. Lang, G. A. Mihaila, and M. Banikazemi. Enhancing recovery using an SSD buffer pool extension. In DaMoN, pages 10-16, 2011. Google Scholar
- T. Bozkaya and M. Özsoyoǧlu. Indexing valid time intervals. Lecture Notes in Computer Science, 1460:541-550, 1998. Google Scholar
- M. Canim, B. Bhattacharjee, G. A. Mihaila, C. A. Lang, and K. A. Ross. An object placement advisor for DB2 using solid state storage. PVLDB, 2(2):1318-1329, 2009. Google Scholar
- M. Canim, G. A. Mihaila, B. Bhattacharjee, K. A. Ross, and C. A. Lang. SSD bufferpool extensions for database systems. PVLDB, 3(2):1435-1446, 2010. Google Scholar
- S. Chaudhuri and V. Narasayya. Automating statistics management for query optimizers. IEEE Trans. on Knowl. and Data Eng., 13(1):7-20, Jan. 2001. Google Scholar
- S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB '97, pages 146-155, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc. Google Scholar
- F. Chen, T. Luo, and X. Zhang. CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In FAST, pages 77-90, 2011. Google Scholar
- S. Chen. Time travel query or bi-temporal. In DB2 for z/OS Technical Forum, 2010.Google Scholar
- J. Do, D. Zhang, J. M. Patel, D. J. DeWitt, J. F. Naughton, and A. Halverson. Turbocharging DBMS buffer pool using SSDs. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, SIGMOD '11, pages 1113-1124, New York, NY, USA, 2011. ACM. Google Scholar
- A. J. Dou, S. Lin, and V. Kalogeraki. Real-time querying of historical data in flash-equipped sensor devices. In IEEE Real-Time Systems Symposium, pages 335-344, 2008. Google Scholar
- G. Drossel. Methodologies for calculating SSD usable life. In Storage Developer Conference, 2009.Google Scholar
- R. Elmasri, G. T. J. Wuu, and V. Kouramajian. The time index and the monotonic B+-tree. In Temporal Databases, pages 433-456. 1993.Google Scholar
- Fusion-io breaks one billion IOPS barrier. http://www.fusionio.com/press-releases/fusion-io-breaks-one-billion-iops-barrier/.Google Scholar
- H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall Press, Upper Saddle River, NJ, USA, 2 edition, 2008. Google Scholar
- The GiST indexing project. http://gist.cs.berkeley.edu/.Google Scholar
- H. Gunadhi and A. Segev. Efficient indexing methods for temporal relations. IEEE Trans. on Knowledge and Data Eng., 5(3):496, June 1993. Google Scholar
- J. M. Hellerstein, J. F. Naughton, and A. Pfeffer. Generalized search trees for database systems. In Proceedings of the 21th International Conference on Very Large Data Bases, VLDB '95, pages 562-573, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc. Google Scholar
- F. D. Hinshaw, C. S. Harris, and S. K. Sarin. Controlling visibility in multi-version database systems, 2007. US 7305386 Patent, Netezza Corporation.Google Scholar
- D. Hitz, J. Lau, and M. Malcolm. File system design for an NFS file server appliance. In Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, WTEC'94, pages 19-19, Berkeley, CA, USA, 1994. USENIX Association. Google Scholar
- DB2 10 for z/OS. ftp://public.dhe.ibm.com/software/systemz/whitepapers/DB210_for_zOS_Upgrade_ebook.pdf.Google Scholar
- W. H. Inmon. Building the Operational Data Store. John Wiley & Sons, Inc., New York, NY, USA, 2nd edition, 1999. Google Scholar
- K. Jouini and G. Jomier. Indexing multiversion databases. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, CIKM '07, pages 915-918, New York, NY, USA, 2007. ACM. Google Scholar
- W.-H. Kang, S.-W. Lee, and B. Moon. Flash-based extended cache for higher throughput and faster recovery. PVLDB, 5(11):1615-1626, 2012. Google Scholar
- P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. Proc. VLDB Endow., 5(4):298-309, Dec. 2011. Google Scholar
- J. J. Levandoski, D. B. Lomet, and S. Sengupta. The Bw-Tree: A B-tree for new hardware platforms. In Proceedings of the 2013 IEEE 29th International Conference on Data Engineering, ICDE '13, Washington, DC, USA, 2013. IEEE Computer Society. Google Scholar
- A. Leventhal. Flash storage memory. Commun. ACM, 51(7):47-51, July 2008. Google Scholar
- Y. Li, B. He, Q. Luo, and K. Yi. Tree indexing on flash disks. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE '09, pages 1303-1306, Washington, DC, USA, 2009. IEEE Computer Society. Google Scholar
- D. Lomet, R. Barga, M. F. Mokbel, G. Shegalov, R. Wang, and Y. Zhu. Immortal DB: transaction time support for SQL server. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD '05, pages 939-941, New York, NY, USA, 2005. ACM. Google Scholar
- D. Lomet, M. Hong, R. Nehme, and R. Zhang. Transaction time indexing with version compression. Proc. VLDB Endow., 1(1):870-881, Aug. 2008. Google Scholar
- E. Omiecinski, W. Liu, and I. F. Akyildiz. Analysis of a deferred and incremental update strategy for secondary indexes. Inf. Syst., 16(3):345-356, 1991. Google Scholar
- P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The log-structured merge-tree (LSM-Tree). Acta Inf., 33(4):351-385, 1996. Google Scholar
- Oracle database 11g workspace manager overview. http://www.oracle.com/technetwork/database/twp-appdev-workspace-manager-11g-128289.pdf.Google Scholar
- Oracle total recall/flashback data archive. http://www.oracle.com/technetwork/issue-archive/2008/08-jul/flashback-data-archive-whitepaper-129145.pdf.Google Scholar
- M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst., 10(1):26-52, Feb. 1992. Google Scholar
- Salzberg and Tsotras. Comparison of access methods for time-evolving data. CSURV: Computing Surveys, 31, 1999. Google Scholar
- C. M. Saracco, M. Nicola, and L. Gandhi. A matter of time: Temporal data management in DB2 for z/OS, 2010.Google Scholar
- R. Sears and R. Ramakrishnan. bLSM: a general purpose log structured merge tree. In SIGMOD Conference, pages 217-228, 2012. Google Scholar
- H. Shen, B. Chin, and O. H. Lu. The TP-Index: A dynamic and efficient indexing mechanism for temporal databases. In In Proceedings of the Tenth International Conference on Data Engineering, pages 274-281. IEEE, 1994. Google Scholar
- R. T. Snodgrass. A case study of temporal data, 2010. Teradata Corporation.Google Scholar
- TPC-H, decision support benchmark. http://www.tpc.org/tpch/.Google Scholar
- H. T. Vo, S. Wang, D. Agrawal, G. Chen, and B. C. Ooi. LogBase: A scalable log-structured database system in the cloud. PVLDB, 5(10):1004-1015, 2012. Google Scholar
- C.-H. Wu, T.-W. Kuo, and L.-P. Chang. An efficient B-tree layer implementation for flash-memory storage systems. ACM Trans. Embedded Comput. Syst., 6(3), 2007. Google Scholar
Index Terms
- Making updates disk-I/O friendly using SSDs
Recommendations
DCD—disk caching disk: a new approach for boosting I/O performance
ISCA '96: Proceedings of the 23rd annual international symposium on Computer architectureThis paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk, referred to as cache-disk, as a secondary disk cache to optimize ...
DCD—disk caching disk: a new approach for boosting I/O performance
Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk, referred to as cache-disk, as a secondary disk cache to optimize ...
Performance of Two-Disk Failure-Tolerant Disk Arrays
RAID5 disk arrays use the rebuild process to reconstruct the contents of a failed disk on a spare disk, but this process is unsuccessful if latent sector failures are encountered or a second disk failure occurs. The high cost of data loss has led to two-...
Comments