research-article

Database Meets Deep Learning: Challenges and Opportunities

Published: 28 September 2016

Abstract

Deep learning has recently become very popular on account of its remarkable success in many complex data-driven applications, including image classification and speech recognition. The database community has worked on data-driven applications for many years and should therefore be playing a leading role in supporting this new wave. However, databases and deep learning differ in both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements to deep learning systems from a database perspective and analyze database applications that may benefit from deep learning techniques.

