DOI: 10.1145/3458744.3474039
research-article

An Intelligent Parallel Distributed Streaming Framework for near Real-time Science Sensors and High-Resolution Medical Images

Published: 23 September 2021

ABSTRACT

We address the challenges that streaming analytics and deep learning pipelines face in science-sensor and medical-imaging applications: latency, scalability, throughput, and heterogeneous data sources. We present a prototype Intelligent Parallel Distributed Streaming Framework (IPDSF) that supports distributed stream processing as well as distributed deep learning training in batch mode. IPDSF is designed to run streaming Artificial Intelligence (AI) analytic tasks using data parallelism, including partitioning of multiple streams of short-time sensing data and high-resolution 3D medical images, and fine-grained task distribution. We show the implementation of IPDSF for two real-world applications: (i) an Air Quality Index based on near real-time streaming of aerosol Lidar backscatter and (ii) data generation of Covid-19 Computed Tomography (CT) scans using deep learning. We evaluate latency, throughput, and scalability, and quantitatively evaluate training and prediction against a single-instance baseline. The results show that IPDSF scales to process thousands of streaming science sensors in parallel for the Air Quality Index application. IPDSF uses novel 3D conditional Generative Adversarial Network (cGAN) training on parallel distributed Graphics Processing Unit (GPU) nodes to generate realistic high-resolution 3D CT scans of Covid-19 patient lungs. We show that IPDSF reduces cGAN training time linearly with the number of GPUs.
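
The data-parallel partitioning of multiple sensor streams mentioned in the abstract can be illustrated with a minimal Python sketch. This is not IPDSF's implementation: the round-robin assignment, the per-stream mean used as a stand-in for an analytic task, and the names `partition_streams`, `summarize_partition`, and `process_in_parallel` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def partition_streams(stream_ids, num_workers):
    """Assign stream ids to workers round-robin (hypothetical scheme)."""
    partitions = [[] for _ in range(num_workers)]
    for index, stream_id in enumerate(sorted(stream_ids)):
        partitions[index % num_workers].append(stream_id)
    return partitions


def summarize_partition(partition, windows):
    """Placeholder analytic task: mean of each stream's current window."""
    return {stream_id: mean(windows[stream_id]) for stream_id in partition}


def process_in_parallel(windows, num_workers):
    """Run one worker per partition and merge the per-stream results."""
    partitions = partition_streams(windows.keys(), num_workers)
    merged = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda p: summarize_partition(p, windows), partitions)
        for result in results:
            merged.update(result)
    return merged


if __name__ == "__main__":
    # Made-up readings for three hypothetical sensors.
    windows = {"s1": [1.0, 3.0], "s2": [2.0, 4.0], "s3": [10.0]}
    print(process_in_parallel(windows, num_workers=2))
```

Each stream belongs to exactly one partition, so workers share no state and the merge step is a plain dictionary update. A real deployment would replace the thread pool with distributed workers and the mean with the actual analytic task.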

        Published in

          ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop
          August 2021
          314 pages
          ISBN: 9781450384414
          DOI: 10.1145/3458744

          Copyright © 2021 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 91 of 313 submissions, 29%
