
2017 | OriginalPaper | Book chapter

Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales

Authors: Ian Foster, Mark Ainsworth, Bryce Allen, Julie Bessac, Franck Cappello, Jong Youl Choi, Emil Constantinescu, Philip E. Davis, Sheng Di, Wendy Di, Hanqi Guo, Scott Klasky, Kerstin Kleese Van Dam, Tahsin Kurc, Qing Liu, Abid Malik, Kshitij Mehta, Klaus Mueller, Todd Munson, George Ostouchov, Manish Parashar, Tom Peterka, Line Pouchard, Dingwen Tao, Ozan Tugluk, Stefan Wild, Matthew Wolf, Justin M. Wozniak, Wei Xu, Shinjae Yoo

Published in: Euro-Par 2017: Parallel Processing

Publisher: Springer International Publishing


Abstract

A growing disparity between supercomputer computation speeds and I/O rates makes it increasingly infeasible for applications to save all results for offline analysis. Instead, applications must analyze and reduce data online so as to output only those results needed to answer target scientific question(s). This change in focus complicates application and experiment design and introduces algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of supercomputer systems. We review these challenges and describe methods and tools that we are developing to enable experimental exploration of algorithmic, software, and system design alternatives.


Metadata
Title
Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales
Authors
Ian Foster
Mark Ainsworth
Bryce Allen
Julie Bessac
Franck Cappello
Jong Youl Choi
Emil Constantinescu
Philip E. Davis
Sheng Di
Wendy Di
Hanqi Guo
Scott Klasky
Kerstin Kleese Van Dam
Tahsin Kurc
Qing Liu
Abid Malik
Kshitij Mehta
Klaus Mueller
Todd Munson
George Ostouchov
Manish Parashar
Tom Peterka
Line Pouchard
Dingwen Tao
Ozan Tugluk
Stefan Wild
Matthew Wolf
Justin M. Wozniak
Wei Xu
Shinjae Yoo
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-64203-1_1
