Published in: Journal of Intelligent Information Systems 3/2020

14-08-2019

Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications

Author: Yoji Yamato

Abstract

To overcome the high cost of developing IoT (Internet of Things) services by vertically integrating devices and services, Open IoT has been developed; it enables various IoT services to be built by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing, a technology that discovers, on demand, the devices that can provide the data users need and uses them dynamically. We have also proposed an automatic GPU (graphics processing unit) offloading method as an elementary technology of Tacit Computing. However, that method improves the performance of only a limited number of applications because it optimizes only the extraction of parallelizable loop statements. In this paper, to improve the performance of more applications automatically, we therefore propose an improved GPU offloading method that reduces the number of data transfers between the CPU and GPU. We evaluate the proposed method by applying it to Darknet and Fourier Transform, which are large, general applications written for CPUs, and find that, within a 10-hour tuning time, it processes them 3 times and 5 times as quickly, respectively, as CPU-only execution.
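The abstract does not say how the number of CPU-GPU transfers is reduced. As a rough illustration only, the sketch below assumes one common approach: when the host does not touch the data between offloaded loops, the copies can be hoisted out of the enclosing loop. `ToyDevice`, `scale`, `naive_offload`, and `hoisted_offload` are hypothetical names for a toy cost model that merely counts copies; nothing here runs on a real GPU or reproduces the author's method.

```python
# Toy cost model (hypothetical; not the paper's implementation).
class ToyDevice:
    """Pretends to be a GPU and counts host<->device copies."""

    def __init__(self):
        self.transfers = 0

    def to_device(self, data):
        self.transfers += 1
        return list(data)

    def to_host(self, data):
        self.transfers += 1
        return list(data)


def scale(values, factor):
    """Stand-in for a parallelizable loop body run on the device."""
    return [v * factor for v in values]


def naive_offload(device, data, steps):
    """Copy in and out around every offloaded loop."""
    for _ in range(steps):
        on_device = device.to_device(data)
        on_device = scale(on_device, 2)
        data = device.to_host(on_device)
    return data


def hoisted_offload(device, data, steps):
    """Copy once before the outer loop and once after it."""
    on_device = device.to_device(data)
    for _ in range(steps):
        on_device = scale(on_device, 2)
    return device.to_host(on_device)


naive_dev, hoisted_dev = ToyDevice(), ToyDevice()
naive_result = naive_offload(naive_dev, [1, 2, 3], steps=4)
hoisted_result = hoisted_offload(hoisted_dev, [1, 2, 3], steps=4)
```

Both variants compute the same result, but in this toy model the naive variant performs two copies per step while the hoisted variant performs two in total. Whether such hoisting is valid in a real program depends on the host not reading or writing the data between the offloaded loops.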

Metadata
Title
Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications
Author
Yoji Yamato
Publication date
14-08-2019
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2020
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-019-00575-8
