
Darkroom: compiling high-level image processing code into hardware pipelines

Published: 27 July 2014

Abstract

Specialized image signal processors (ISPs) exploit the structure of image processing pipelines to minimize memory bandwidth using the architectural pattern of line-buffering, where all intermediate data between each stage is stored in small on-chip buffers. This provides high energy efficiency, allowing long pipelines with tera-op/sec. image processing in battery-powered devices, but traditionally requires painstaking manual design in hardware. Based on this pattern, we present Darkroom, a language and compiler for image processing. The semantics of the Darkroom language allow it to compile programs directly into line-buffered pipelines, with all intermediate values in local line-buffer storage, eliminating unnecessary communication with off-chip DRAM. We formulate the problem of optimally scheduling line-buffered pipelines to minimize buffering as an integer linear program. Finally, given an optimally scheduled pipeline, Darkroom synthesizes hardware descriptions for ASIC or FPGA, or fast CPU code. We evaluate Darkroom implementations of a range of applications, including a camera pipeline, low-level feature detection algorithms, and deblurring. For many applications, we demonstrate gigapixel/sec. performance in under 0.5 mm² of ASIC silicon at 250 mW (simulated on a 45 nm foundry process), real-time 1080p/60 video processing using a fraction of the resources of a modern FPGA, and tens of megapixels/sec. of throughput on a quad-core x86 processor.
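The line-buffering pattern described in the abstract can be illustrated with a minimal Python sketch. This is not Darkroom code or compiler output; the function names `box3_full` and `box3_line_buffered` and the 3x3 box-sum stencil are assumptions chosen for illustration. The streaming version consumes one image row at a time and keeps only the three most recent rows, standing in for a small on-chip buffer, while the reference version holds the whole frame in memory.

```python
from collections import deque


def box3_full(image):
    """Reference: 3x3 box sum with the whole frame in memory (valid region only)."""
    h, w = len(image), len(image[0])
    return [
        [sum(image[y + dy][x + dx] for dy in range(3) for dx in range(3))
         for x in range(w - 2)]
        for y in range(h - 2)
    ]


def box3_line_buffered(rows):
    """Streaming 3x3 box sum: rows arrive one at a time, only 3 rows are kept."""
    # The "line buffer": once full, appending a new row evicts the oldest one.
    window = deque(maxlen=3)
    for row in rows:
        window.append(row)
        if len(window) == 3:
            # Enough rows are buffered to emit one output row.
            yield [
                sum(line[x + dx] for line in window for dx in range(3))
                for x in range(len(row) - 2)
            ]


# Hypothetical 5x6 test image; any rectangular list of numbers works.
image = [[(3 * y + x) % 7 for x in range(6)] for y in range(5)]
streamed = list(box3_line_buffered(iter(image)))
assert streamed == box3_full(image)
```

Both versions produce the same output, but the streaming one never needs more than three rows of input at once, regardless of image height. A hardware line buffer would typically be tighter still (roughly two rows plus a few pixels for a 3x3 stencil); holding three whole rows is a simplification made here for readability.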


Supplemental Material

a144-sidebyside.mp4 (MP4, 25.3 MB)



Published in ACM Transactions on Graphics, Volume 33, Issue 4, July 2014, 1366 pages

ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2601097

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States
