research-article

In-RDBMS hardware acceleration of advanced analytics

Published: 01 July 2018

Abstract

The data revolution is fueled by advances in machine learning, databases, and hardware design. Programmable accelerators are making their way into each of these areas independently. As such, there is a void of solutions that enable hardware acceleration at the intersection of these disjoint fields. This paper sets out to be the initial step towards a unifying solution for in-Database Acceleration of Advanced Analytics (DAnA). Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of advanced analytics queries to an FPGA accelerator. The accelerator implementation is generated for a User Defined Function (UDF), expressed as part of an SQL query using a Python-embedded Domain-Specific Language (DSL). To realize an efficient in-database integration, DAnA accelerators contain a novel hardware structure, Striders, which interface directly with the buffer pool of the database. Striders extract, cleanse, and process the training data tuples, which are then consumed by a multi-threaded FPGA engine that executes the analytics algorithm. We integrate DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, 8.3× end-to-end speedup for real datasets, with a maximum of 28.2×. Moreover, DAnA-enhanced PostgreSQL is, on average, 4.0× faster than the multi-threaded Apache MADlib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in ≈30-60 lines of Python.
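To give a rough sense of the scale of algorithm the abstract has in mind, the sketch below shows an iterative training loop of about that length in plain Python. It is not DAnA's DSL, whose syntax this page does not show; the function names (`partial_gradient`, `merge`, `train`), the choice of linear regression, and the simulated "workers" are all assumptions made for illustration. The structure loosely mirrors the description above: training tuples are split across several workers, each computes a partial gradient, and the partial results are merged before the model is updated.

```python
from typing import List, Sequence, Tuple

# One training tuple: (features, label). Hypothetical layout for illustration.
Row = Tuple[Sequence[float], float]


def partial_gradient(model: Sequence[float], rows: Sequence[Row]) -> List[float]:
    """Sum of squared-error gradients over one worker's share of the tuples."""
    grad = [0.0] * len(model)
    for features, label in rows:
        error = sum(w * x for w, x in zip(model, features)) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad


def merge(partials: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-worker gradients by element-wise summation."""
    return [sum(column) for column in zip(*partials)]


def train(rows: Sequence[Row], n_features: int, workers: int = 2,
          lr: float = 0.3, epochs: int = 200) -> List[float]:
    """Batch gradient descent; the workers are simulated sequentially here."""
    model = [0.0] * n_features
    # Round-robin split of the tuples across the (simulated) workers.
    shards = [rows[i::workers] for i in range(workers)]
    for _ in range(epochs):
        grad = merge([partial_gradient(model, shard) for shard in shards])
        model = [w - lr * g / len(rows) for w, g in zip(model, grad)]
    return model


if __name__ == "__main__":
    # Synthetic tuples from y = 2*x0 + 1*x1, where x1 = 1.0 acts as a bias term.
    data = [((x, 1.0), 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
    print([round(w, 2) for w in train(data, n_features=2)])
```

In a system like the one described, the per-tuple update and the merge step would be what the user specifies, while tuple extraction from database pages and parallel execution would be handled by the generated hardware rather than by a Python loop.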

Published in

Proceedings of the VLDB Endowment, Volume 11, Issue 11
July 2018
507 pages
ISSN: 2150-8097

Publisher: VLDB Endowment
