Published in: Cluster Computing, Issue 1/2015

01.03.2015

In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows

Authors: Solomon Lasluisa, Fan Zhang, Tong Jin, Ivan Rodero, Hoang Bui, Manish Parashar



Abstract

Emerging scientific simulations on leadership-class systems are generating huge amounts of data, and processing this data efficiently and in a timely manner is critical for generating insights from the simulations. However, the increasing gap between computation and disk I/O speeds makes traditional data analytics pipelines based on post-processing cost-prohibitive and often infeasible. In this paper, we investigate an alternative approach that aims to bring the analytics closer to the data by executing data analysis operations in situ. Specifically, we present the design, implementation, and evaluation of a framework that can support in-situ feature-based object tracking on distributed scientific datasets. Central to this framework is a scalable, decentralized online clustering and cluster tracking algorithm, which executes in situ (on separate cores) in parallel with the simulation processes and retrieves data from the simulations directly via on-chip shared memory. The results of our experimental evaluation demonstrate that the in-situ approach significantly reduces the cost of data movement, that the presented framework can support scalable feature-based object tracking, and that it can be used effectively for in-situ analytics in large-scale simulations.
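The abstract does not spell out how features are identified or tracked. As a rough, serial illustration of what feature-based tracking across time steps involves, the sketch below uses a common overlap-based approach: threshold a 2-D field, label 4-connected regions as features, and match each feature at step t to the feature at step t+1 that shares the most grid cells with it. This is not the paper's decentralized online clustering algorithm and runs on a single process; the function names `extract_features` and `track_features`, the 4-connectivity, and the threshold value are hypothetical choices made for illustration only.

```python
from collections import deque, Counter


def extract_features(grid, threshold):
    """Label 4-connected regions of cells whose value exceeds `threshold`.

    Hypothetical helper for illustration. Returns a dict mapping
    (row, col) -> feature id, with ids assigned 0, 1, 2, ... in
    row-major scan order of each region's first cell.
    """
    rows, cols = len(grid), len(grid[0])
    labels = {}
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] > threshold and (r, c) not in labels:
                # Breadth-first flood fill from this unlabeled seed cell.
                labels[(r, c)] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] > threshold
                                and (ny, nx) not in labels):
                            labels[(ny, nx)] = next_id
                            queue.append((ny, nx))
                next_id += 1
    return labels


def track_features(prev_labels, curr_labels):
    """Match each feature at step t to the step t+1 feature it overlaps most.

    Hypothetical helper for illustration. Returns a dict prev_id -> curr_id;
    features with no overlapping cell are omitted (treated as dissipated).
    """
    overlap = Counter()
    for cell, prev_id in prev_labels.items():
        curr_id = curr_labels.get(cell)
        if curr_id is not None:
            overlap[(prev_id, curr_id)] += 1
    best = {}
    for (prev_id, curr_id), count in overlap.items():
        # Keep the successor with the largest shared-cell count.
        if prev_id not in best or count > overlap[(prev_id, best[prev_id])]:
            best[prev_id] = curr_id
    return best


if __name__ == "__main__":
    # Step t: a 2x2 blob (values 5) and a vertical bar (values 7).
    t0 = [[0, 0, 0, 0, 0, 0],
          [0, 5, 5, 0, 0, 0],
          [0, 5, 5, 0, 7, 0],
          [0, 0, 0, 0, 7, 0]]
    # Step t+1: the blob moved down one row, the bar stayed put, and a new
    # single-cell feature appeared in the top-right corner.
    t1 = [[0, 0, 0, 0, 0, 9],
          [0, 0, 0, 0, 0, 0],
          [0, 5, 5, 0, 7, 0],
          [0, 5, 5, 0, 7, 0]]
    prev, curr = extract_features(t0, 1), extract_features(t1, 1)
    print(track_features(prev, curr))  # {0: 1, 1: 2}
```

Because feature ids are reassigned at every step, the mapping carries identity forward: blob 0 becomes feature 1 and bar 1 becomes feature 2, while the new feature 0 at t+1 has no predecessor. Overlap matching is cheap, but it assumes a feature moves less than its own extent between steps, and this sketch keeps only one successor per feature, so splits and merges are not handled.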

Metadata
Title
In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows
Authors
Solomon Lasluisa
Fan Zhang
Tong Jin
Ivan Rodero
Hoang Bui
Manish Parashar
Publication date
01.03.2015
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-014-0396-6