2020 | Original Paper | Book Chapter

HDF5-Based I/O Optimization for Extragalactic HI Data Pipeline of FAST

Authors: Yiming Ji, Ce Yu, Jian Xiao, Shanjiang Tang, Hao Wang, Bo Zhang

Published in: Algorithms and Architectures for Parallel Processing

Publisher: Springer International Publishing

Abstract

The Five-hundred-meter Aperture Spherical Radio Telescope (FAST), the largest single-dish radio telescope in the world, produces very large volumes of data at high rates. It therefore requires a high-performance data pipeline to convert the huge volume of raw observational data into science data products. However, the pipeline solutions widely used in radio data processing cannot handle this situation efficiently. This paper proposes a pipeline architecture for FAST based on the HDF5 format, together with several I/O optimization strategies. First, we design a workflow engine that drives the various tasks in the pipeline efficiently. Second, we design a common radio data storage specification on top of the HDF5 format and develop a fast converter that maps the original FITS format to the new HDF5 format. Third, we apply several concrete strategies to optimize I/O operations, including chunked storage, parallel reading/writing, on-demand dumping, and stream processing. In an experiment processing 700 GB of FAST data, the HDF5-based data structure alone, without further optimization, was 1.7 times faster than the original FITS format. With chunked storage and parallel I/O optimization applied, overall performance can reach 4.5 times that of the original. Moreover, owing to its extensibility and flexibility, our FAST pipeline solution can be adapted to other radio telescopes.
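The abstract does not state FAST's array dimensions or the chunk shapes the authors chose, so the following is only a minimal sketch of why chunked storage can cut I/O volume. It uses a toy cost model that assumes every chunk a read touches is loaded in full, which holds for compressed (filtered) HDF5 chunks; the real library also has a chunk cache that this model ignores. The helper names `chunks_touched` and `bytes_read`, the 1024 × 65536 float32 block, and both chunk shapes are hypothetical and are not taken from the paper.

```python
"""Toy cost model for chunked storage (hypothetical, not the authors' code).

Assumes every chunk a selection touches is read in full, as happens for
filtered (compressed) HDF5 chunks; the real library also uses a chunk cache.
"""
from itertools import product
from math import prod
from typing import Iterator, Sequence, Tuple

Range = Tuple[int, int]  # half-open [start, stop) along one dimension


def chunks_touched(selection: Sequence[Range],
                   chunk_shape: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield the grid index of every chunk overlapped by a hyperslab."""
    per_dim = []
    for (start, stop), extent in zip(selection, chunk_shape):
        if stop <= start:
            return  # empty selection: no chunks touched
        # First and last chunk index covered along this dimension.
        per_dim.append(range(start // extent, (stop - 1) // extent + 1))
    yield from product(*per_dim)


def bytes_read(selection: Sequence[Range], chunk_shape: Sequence[int],
               itemsize: int) -> int:
    """Bytes read if each touched chunk is loaded whole."""
    n_chunks = sum(1 for _ in chunks_touched(selection, chunk_shape))
    return n_chunks * prod(chunk_shape) * itemsize


if __name__ == "__main__":
    # Hypothetical spectrum block: 1024 time samples x 65536 channels, float32.
    # Access pattern: a 256-channel window around a line, across all time.
    window = [(0, 1024), (1000, 1256)]
    for name, chunks in [("time-row chunks", (16, 65536)),
                         ("channel-block chunks", (1024, 256))]:
        mib = bytes_read(window, chunks, itemsize=4) / 2**20
        print(f"{name:>20}: {mib:.0f} MiB read")
```

Under these assumptions, reading a 256-channel window across all 1024 time samples (1 MiB of useful data) costs 256 MiB with row-shaped chunks of (16, 65536) but only 2 MiB with channel-block chunks of (1024, 256). The point of the sketch is that the chunk shape should follow the dominant access pattern; a real pipeline would choose it from its own read patterns and confirm the choice by benchmarking.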

Metadata
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-38961-1_55