research-article

Deep bilateral learning for real-time image enhancement

Published: 20 July 2017
Abstract

Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher.
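
The runtime steps described in the abstract (coefficients stored in a bilateral grid, edge-aware slicing, per-pixel affine transform) can be illustrated with a minimal, dependency-free sketch. It makes several simplifying assumptions that are not taken from the paper: the image is grayscale with values in [0, 1], each grid cell holds a scalar gain and offset `(a, b)` rather than a full color matrix, the guide is the pixel's own intensity, and the grid is filled by hand instead of being predicted by a network. The names `_neighbors`, `slice_grid`, and `apply_grid` are hypothetical.

```python
def _neighbors(pos, size):
    """Two nearest cell indices along one grid axis, with linear weights."""
    pos = min(max(pos, 0.0), size - 1.0)  # clamp to the grid extent
    lo = int(pos)                          # floor, since pos >= 0
    hi = min(lo + 1, size - 1)
    t = pos - lo
    return ((lo, 1.0 - t), (hi, t))


def slice_grid(grid, x, y, guide, width, height):
    """Trilinearly interpolate the affine coefficients (a, b) for one pixel.

    grid[iy][ix][iz] holds a tuple (a, b); the third axis is indexed by the
    guide value, which is what makes the upsampling edge-aware.
    """
    gh, gw, gd = len(grid), len(grid[0]), len(grid[0][0])
    # Map pixel centers and the guide value to continuous grid coordinates.
    gy = (y + 0.5) / height * gh - 0.5
    gx = (x + 0.5) / width * gw - 0.5
    gz = guide * (gd - 1)
    a = b = 0.0
    for iy, wy in _neighbors(gy, gh):
        for ix, wx in _neighbors(gx, gw):
            for iz, wz in _neighbors(gz, gd):
                w = wy * wx * wz
                cell_a, cell_b = grid[iy][ix][iz]
                a += w * cell_a
                b += w * cell_b
    return a, b


def apply_grid(image, grid):
    """Slice the grid at every full-resolution pixel and apply out = a*v + b."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            v = image[y][x]
            # Assumption: the pixel's own intensity serves as the guide.
            a, b = slice_grid(grid, x, y, v, width, height)
            row.append(a * v + b)
        out.append(row)
    return out


# Hypothetical 2x2x2 grid: cells at guide 0 apply a gain of 2,
# cells at guide 1 output a constant 0.5.
grid = [[[(2.0, 0.0), (0.0, 0.5)] for _ in range(2)] for _ in range(2)]
image = [[0.0, 0.5], [0.5, 1.0]]
enhanced = apply_grid(image, grid)
```

Because the grid is small and is only sampled at full resolution, the costly work of predicting coefficients can happen on a low-resolution copy of the image, while the per-pixel work reduces to an interpolation and one multiply-add.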

Supplemental Material

papers-0027.mp4 (MP4, 217.8 MB)

Published in

ACM Transactions on Graphics, Volume 36, Issue 4
August 2017
2155 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3072959

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States
