To overcome the high cost of developing IoT (Internet of Things) services by vertically integrating devices and services, Open IoT has been developed so that various IoT services can be built by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing technology, which discovers on demand the devices that can provide the data users need and uses them dynamically. We have also proposed an automatic GPU (graphics processing unit) offloading method as an elementary technology of Tacit Computing. However, that method can improve only a limited number of applications because it optimizes only the extraction of parallelizable loop statements. Therefore, in this paper, to automatically improve the performance of more applications, we propose an improved GPU offloading method that reduces data transfers between the CPU and GPU and can improve the performance of many IoT applications. We evaluate the proposed method by applying it to Darknet and Fourier Transform, which are general large applications for CPUs, and find that, within a 10-hour tuning time, it processes them 3 times and 5 times as quickly, respectively, as using only CPUs.
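Why fewer CPU-GPU transfers matter can be shown with a simple counting model. The sketch compares two schemes. In the first, each offloaded loop copies its own arrays to and from the GPU on every iteration of an enclosing loop. In the second, arrays stay resident on the GPU for the whole enclosing loop, in the spirit of an OpenACC data region. This is a minimal sketch under stated assumptions, not the authors' method. The `Kernel` class, `naive_transfers`, `hoisted_transfers`, and the two-kernel Jacobi-style loop are hypothetical. Each transfer is counted as one whole-array copy regardless of size. The CPU is also assumed not to touch the arrays inside the enclosing loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """Hypothetical offloaded loop: the arrays it reads and writes on the GPU."""
    name: str
    reads: frozenset
    writes: frozenset


def naive_transfers(kernels, outer_iters):
    """Per-kernel offload: every launch copies its inputs host->device and its
    outputs device->host, on every iteration of the enclosing loop."""
    per_iter = sum(len(k.reads) + len(k.writes) for k in kernels)
    return per_iter * outer_iters


def hoisted_transfers(kernels, host_needs):
    """One data region around the whole enclosing loop.

    Assumption: the CPU does not access these arrays inside the loop, so they
    stay resident on the GPU. An array is copied in once if its first device
    access is a read, and copied out once if the host needs it afterwards.
    """
    copy_in, seen = set(), set()
    for k in kernels:                  # kernel order within one iteration
        copy_in |= k.reads - seen      # read before any device-side write
        seen |= k.reads | k.writes
    written = set().union(*(k.writes for k in kernels))
    copy_out = written & set(host_needs)
    return len(copy_in) + len(copy_out)


# Hypothetical Jacobi-style loop: compute Anew from A, then copy Anew back to A.
jacobi = [
    Kernel("stencil", frozenset({"A"}), frozenset({"Anew"})),
    Kernel("swap", frozenset({"Anew"}), frozenset({"A"})),
]
print(naive_transfers(jacobi, outer_iters=1000))   # 4000
print(hoisted_transfers(jacobi, host_needs={"A"}))  # 2
```

Under these assumptions, the transfer count drops from one that grows with the iteration count to a constant. This illustrates why reducing data transfers can widen the set of applications that benefit from offloading. Real gains also depend on array sizes and kernel run times, which this model ignores.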
- Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications
- Springer US
Journal of Intelligent Information Systems
Integrating Artificial Intelligence and Database Technologies
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675