research-article

An Intelligent Parallel Distributed Streaming Framework for near Real-time Science Sensors and High-Resolution Medical Images

Authors:
Samit Shivadekar, UMBC, United States of America
Jayalakshmi Mangalagiri, UMBC, United States of America
Phuong Nguyen, UMBC, OpenKneck Inc, United States of America
David Chapman, UMBC, United States of America
Milton Halem, UMBC, United States of America
Rahul Gite, UMBC, United States of America

ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop, August 2021, Article No.: 7, Pages 1–9, https://doi.org/10.1145/3458744.3474039

Published: 23 September 2021

ABSTRACT

Our goal is to address the challenges of latency, scalability, throughput, and heterogeneous data sources that arise in streaming analytics and deep learning pipelines for science sensor and medical imaging applications. We present a prototype Intelligent Parallel Distributed Streaming Framework (IPDSF) that supports both distributed stream processing and distributed deep learning training in batch mode. IPDSF is designed to run streaming Artificial Intelligence (AI) analytic tasks using data parallelism, including partitioning of multiple streams of short-time sensing data and high-resolution 3D medical images, and fine-grained task distribution. We show the implementation of IPDSF for two real-world applications: (i) an Air Quality Index based on near real-time streaming of aerosol Lidar backscatter and (ii) data generation of Covid-19 Computed Tomography (CT) scans using deep learning. We evaluate latency, throughput, and scalability, and quantitatively evaluate training and prediction against a single-instance baseline. The results show that IPDSF scales to process thousands of streaming science sensors in parallel for the Air Quality Index application. For the medical imaging application, IPDSF trains a novel 3D conditional Generative Adversarial Network (cGAN) on parallel distributed Graphics Processing Unit (GPU) nodes to generate realistic high-resolution 3D CT scans of Covid-19 patient lungs. We show that IPDSF reduces cGAN training time linearly with the number of GPUs.
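The data-parallel idea in the abstract, partitioning multiple sensor streams and processing the partitions concurrently, can be illustrated with a minimal Python sketch. This is not the IPDSF implementation: the function names, the CRC32-based partitioning, the per-stream mean used as a placeholder analytic, and the local thread pool standing in for distributed workers are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from zlib import crc32


def partition_for(stream_id: str, num_partitions: int) -> int:
    """Map a stream id to a partition index with a stable hash (CRC32)."""
    return crc32(stream_id.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (stream_id, value) records by partition index."""
    partitions = defaultdict(list)
    for stream_id, value in records:
        index = partition_for(stream_id, num_partitions)
        partitions[index].append((stream_id, value))
    return partitions


def summarize_partition(partition):
    """Placeholder analytic (hypothetical): mean value per stream."""
    by_stream = defaultdict(list)
    for stream_id, value in partition:
        by_stream[stream_id].append(value)
    return {sid: mean(vals) for sid, vals in by_stream.items()}


def process_in_parallel(records, num_partitions=4):
    """Run the placeholder analytic on every partition concurrently.

    A local thread pool stands in for distributed workers here.
    """
    partitions = partition_records(records, num_partitions)
    results = {}
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(summarize_partition, partitions.values()):
            results.update(partial)
    return results


if __name__ == "__main__":
    sample = [("sensor-a", 1.0), ("sensor-b", 4.0), ("sensor-a", 3.0)]
    print(process_in_parallel(sample, num_partitions=2))
```

Because every record of a given stream hashes to the same partition, the per-partition results never overlap and can be merged without coordination. That property is what lets this style of processing scale out as workers are added.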

Published in

ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop
August 2021
314 pages
ISBN: 9781450384414
DOI: 10.1145/3458744

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computed Tomography cGAN
Data Parallelism
Deep Learning
Distributed streaming
Lidar backscatter
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 91 of 313 submissions, 29%