research-article

Using many-core hardware to correlate radio astronomy signals

Authors:
Rob V. van Nieuwpoort, ASTRON, Dwingeloo, Netherlands
John W. Romein, ASTRON, Dwingeloo, Netherlands

ICS '09: Proceedings of the 23rd international conference on Supercomputing, June 2009, Pages 440–449
https://doi.org/10.1145/1542275.1542337

Published: 08 June 2009

ABSTRACT

A recent development in radio astronomy is to replace traditional dishes with many small antennas, whose signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will require even more: on the order of exaflops of computation and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, which increases flexibility and reduces development effort. Examples include e-VLBI and LOFAR.
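To make the quadratic growth concrete, the following is a minimal sketch of a cross-correlation loop over all pairs of input streams. It is an illustration only, not the production LOFAR correlator: the function names `correlate` and `pair_count`, the list-of-lists sample layout, and the choice to include each stream's autocorrelation are all assumptions made for this example.

```python
from itertools import combinations_with_replacement


def correlate(samples):
    """Cross-correlate every pair of input streams (hypothetical helper).

    samples[i] is a list of complex samples from stream i; all streams
    are assumed to have the same length. The result maps each pair
    (i, j) with i <= j to the sum over time of
    samples[i][t] * conj(samples[j][t]).
    """
    n_streams = len(samples)
    visibilities = {}
    # One accumulator per pair of streams, autocorrelations included.
    for i, j in combinations_with_replacement(range(n_streams), 2):
        acc = 0j
        for a, b in zip(samples[i], samples[j]):
            acc += a * b.conjugate()
        visibilities[(i, j)] = acc
    return visibilities


def pair_count(n_streams):
    """Number of stream pairs, counting each stream paired with itself."""
    return n_streams * (n_streams + 1) // 2


if __name__ == "__main__":
    streams = [[1 + 1j, 2 + 0j], [0 + 1j, 1 - 1j], [1 + 0j, 1 + 0j]]
    result = correlate(streams)
    print(len(result), "pairs for", len(streams), "streams")
    # Doubling the number of streams roughly quadruples the work.
    print(pair_count(64), pair_count(128))
```

Under these assumptions the number of accumulators is n(n+1)/2 for n streams, so doubling the stream count roughly quadruples the arithmetic, while the input data volume only doubles.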

In this paper, we evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems which cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general.

The results show that the processing power and memory bandwidth of current GPUs are highly imbalanced for correlation purposes. While the production correlator on the Blue Gene/P achieves a superb 96% of the theoretical peak performance, the figure is only 14% on ATI GPUs and 26% on NVIDIA GPUs. The Cell/B.E. processor, in contrast, achieves an excellent 92%. We found that the Cell/B.E. is also the most energy-efficient solution: it runs the correlator 5-7 times more energy-efficiently than the Blue Gene/P. The research presented is an important pathfinder for next-generation telescopes.
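One common way to reason about an imbalance between processing power and memory bandwidth is a roofline-style bound, sketched below. This is a generic model and not necessarily the analysis used in the paper. The function names and all numbers in the example are hypothetical and are not measurements from any of the architectures discussed.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style upper bound on sustained performance.

    Performance is limited either by the compute peak or by how fast
    memory can feed the arithmetic units (bandwidth times the number
    of floating-point operations performed per byte moved).
    """
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def fraction_of_peak(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Attainable performance as a fraction of the compute peak."""
    bound = attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte)
    return bound / peak_gflops


if __name__ == "__main__":
    # Hypothetical device: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    # At 2 flops per byte the kernel is bandwidth bound.
    print(fraction_of_peak(1000.0, 100.0, 2.0))
    # At 16 flops per byte the same device becomes compute bound.
    print(fraction_of_peak(1000.0, 100.0, 16.0))
```

In this model, a device with a high compute peak but comparatively low bandwidth can only approach its peak if the kernel reuses each loaded byte many times, for example by keeping data in registers or on-chip memory.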

Published in
ICS '09: Proceedings of the 23rd international conference on Supercomputing
June 2009
544 pages
ISBN:9781605584980
DOI:10.1145/1542275
General Chairs: Michael Gschwind (IBM TJ Watson Research Center, USA), Alex Nicolau (UC Irvine, USA)
Program Chairs: Valentina Salapura (IBM TJ Watson Research Center, USA), José Moreira (IBM TJ Watson Research Center, USA)
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
correlator
lofar
many-core