Abstract
The data revolution is fueled by advances in machine learning, databases, and hardware design. Programmable accelerators are making their way into each of these areas independently, yet there is a lack of solutions that enable hardware acceleration at the intersection of these disjoint fields. This paper takes an initial step toward a unifying solution for in-Database Acceleration of Advanced Analytics (DAnA). Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of advanced analytics queries to an FPGA accelerator. The accelerator implementation is generated for a User Defined Function (UDF), expressed as part of an SQL query using a Python-embedded Domain-Specific Language (DSL). To realize an efficient in-database integration, DAnA accelerators contain a novel hardware structure, Striders, that directly interfaces with the buffer pool of the database. Striders extract, cleanse, and process the training data tuples that are consumed by a multi-threaded FPGA engine executing the analytics algorithm. We integrate DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, 8.3× end-to-end speedup on real datasets, with a maximum of 28.2×. Moreover, DAnA-enhanced PostgreSQL is, on average, 4.0× faster than the multi-threaded Apache MADlib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in ≈30–60 lines of Python.
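To give a sense of the scale of specification the abstract describes, the sketch below shows a comparable algorithm written in roughly the same number of lines of plain Python. This is an illustration only: DAnA's actual DSL syntax is not reproduced here, and all function and variable names below are hypothetical. It implements stochastic gradient descent for linear regression, one of the ML algorithms a data scientist might express as a UDF.

```python
# Illustrative sketch only: a plain-Python analogue of the ~30-60 line
# algorithm specification the paper describes. The real DAnA DSL is not
# shown; names here (sgd_linear_regression, tuples, lr) are hypothetical.

def sgd_linear_regression(tuples, num_features, lr=0.05, epochs=200):
    """Train linear-regression weights over (features, label) tuples
    using stochastic gradient descent."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for x, y in tuples:
            # prediction = dot(w, x)
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # gradient step: w <- w - lr * err * x
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# Toy training data sampled from y = 2*x0 + 1*x1; in the DAnA setting,
# these tuples would instead stream from the database buffer pool.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
weights = sgd_linear_regression(data, num_features=2)
```

In DAnA's flow, a specification of this shape would be registered as a UDF, invoked from an SQL query, and compiled down to the FPGA accelerator rather than executed by the Python interpreter.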