DOI: 10.1145/1542275.1542337
research-article

Using many-core hardware to correlate radio astronomy signals

Published: 8 June 2009

ABSTRACT

A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. Correlation is especially challenging because the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will require even more: on the order of exaflops of computation and petabits/s of I/O. A recent trend is to correlate in software instead of in dedicated hardware, which increases flexibility and reduces development effort. Examples include e-VLBI and LOFAR.
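To make the quadratic growth concrete, here is a minimal sketch of cross-correlation over station pairs. It is an illustration only, not the production correlator: the function names, the input layout (`samples[s][t]`), and the choice to count autocorrelations (giving n(n+1)/2 pairs) are assumptions made for this example.

```python
from itertools import combinations_with_replacement


def num_baselines(n_stations: int) -> int:
    """Number of station pairs, autocorrelations included: n(n+1)/2."""
    return n_stations * (n_stations + 1) // 2


def correlate(samples):
    """Cross-correlate every pair of stations.

    samples[s][t] is the complex sample of station s at time t
    (hypothetical layout chosen for this sketch).
    Returns {(a, b): sum over t of samples[a][t] * conj(samples[b][t])}
    for all pairs with a <= b.
    """
    visibilities = {}
    for a, b in combinations_with_replacement(range(len(samples)), 2):
        acc = 0j
        for x, y in zip(samples[a], samples[b]):
            acc += x * y.conjugate()  # one complex multiply-add per pair per sample
        visibilities[(a, b)] = acc
    return visibilities
```

Doubling the number of stations roughly quadruples the work: `num_baselines(64)` is 2080, while `num_baselines(128)` is 8256.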

In this paper, we evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems which cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general.

The results show that the processing power and memory bandwidth of current GPUs are highly imbalanced for correlation purposes. While the production correlator on the Blue Gene/P achieves a superb 96% of the theoretical peak performance, the figure is only 14% on ATI GPUs and 26% on NVIDIA GPUs. The Cell/B.E. processor, in contrast, achieves an excellent 92%. We found that the Cell/B.E. is also the most energy-efficient solution; it runs the correlator 5-7 times more energy efficiently than the Blue Gene/P. The research presented is an important pathfinder for next-generation telescopes.
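The imbalance between processing power and memory bandwidth can be illustrated with a simple bandwidth-bound (roofline-style) estimate. This is a sketch under stated assumptions, not the paper's own analysis: samples are assumed to be complex numbers stored as two 32-bit floats, the device numbers in the usage note are hypothetical, and the tiling function only models the general idea of reusing loaded samples across several station pairs.

```python
FLOPS_PER_CMAC = 8    # complex multiply-add: 4 multiplies + 4 additions
BYTES_PER_SAMPLE = 8  # assumption: one complex sample = two 32-bit floats


def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Upper bound on throughput: compute-bound or bandwidth-bound, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def tiled_intensity(tile: int) -> float:
    """Flops per byte loaded when a tile x tile block of station pairs
    shares its loads: 2*tile samples feed tile*tile complex multiply-adds."""
    flops = FLOPS_PER_CMAC * tile * tile
    bytes_loaded = BYTES_PER_SAMPLE * 2 * tile
    return flops / bytes_loaded
```

With a hypothetical device offering 1000 GFLOP/s and 100 GB/s, `attainable_gflops(1000, 100, tiled_intensity(1))` gives 50.0, or 5% of peak. Larger tiles raise the intensity linearly (`tiled_intensity(2)` is 1.0), which is why data reuse in registers or local memory matters so much for this kind of workload.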


        • Published in

          ICS '09: Proceedings of the 23rd international conference on Supercomputing
          June 2009
          544 pages
          ISBN:9781605584980
          DOI:10.1145/1542275

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States




          Acceptance Rates

          Overall Acceptance Rate: 584 of 2,055 submissions, 28%
