
2015 | Book

GPU Computing and Applications

About this book

This book presents a collection of state-of-the-art research on GPU computing and applications. Most of the chapters are selected from work presented at the 2013 Symposium on GPU Computing and Applications, held at Nanyang Technological University, Singapore (Oct 9, 2013). The book covers three major domains of GPU application: (1) engineering design and simulation; (2) biomedical sciences; and (3) interactive and digital media. It also addresses fundamental issues in GPU computing, with a focus on big data processing. Researchers and developers in GPU computing and applications will benefit from this book, and training professionals and educators can use it to learn about possible applications of GPU technology in various areas.

Table of contents

Frontmatter
Chapter 1. A GPU-Enabled Parallel Genetic Algorithm for Path Planning of Robotic Operators
Abstract
Genetic algorithms (GAs) are a class of global optimization algorithms inspired by Darwinian biological evolution. They are widely applied in robotic path planning. Parallel GA (PGA) is a subclass of GA that can achieve good solutions in a short time. This chapter discusses the use of a PGA to determine collision-free paths for robotic operators. GPU-style genetic operators are designed to speed up the GA process while improving the quality of solutions. GPU parallelization of a master–slave parallel GA (MSPGA) is implemented by parallelizing the selection, crossover, and mutation operators.
Panpan Cai, Yiyu Cai, Indhumathi Chandrasekaran, Jianmin Zheng
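The master–slave structure described in Chapter 1 can be illustrated with a minimal serial sketch. It is not the chapter's GPU implementation: the 2D waypoint encoding, the single circular obstacle, binary tournament selection, one-point crossover, and Gaussian mutation are all assumptions chosen for brevity. The comments mark the per-individual work that a GPU version could assign to one thread each.

```python
import math
import random

START, GOAL = (0.0, 0.0), (10.0, 0.0)
OBSTACLE = (5.0, 0.0, 2.0)  # hypothetical circular obstacle: (cx, cy, radius)
N_WAYPOINTS = 4             # hypothetical encoding: y-offsets at fixed x positions


def decode(genes):
    """Turn a chromosome (waypoint y-offsets) into a polyline START -> GOAL."""
    xs = [GOAL[0] * (i + 1) / (len(genes) + 1) for i in range(len(genes))]
    return [START] + list(zip(xs, genes)) + [GOAL]


def cost(genes, samples=20, penalty=100.0):
    """Path length plus a penalty per sampled point lying inside the obstacle."""
    pts = decode(genes)
    cx, cy, r = OBSTACLE
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
        for s in range(samples + 1):
            t = s / samples
            if math.hypot(x0 + t * (x1 - x0) - cx, y0 + t * (y1 - y0) - cy) < r:
                total += penalty
    return total


def tournament(rng, pop, scores):
    """Binary tournament selection: the cheaper of two random individuals wins."""
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[a] if scores[a] < scores[b] else pop[b]


def evolve(pop_size=40, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(N_WAYPOINTS)]
           for _ in range(pop_size)]
    history = []  # best cost in each generation
    for gen in range(generations + 1):
        # Fitness evaluation is independent per individual ("slaves");
        # the master collects the scores.
        scores = [cost(g) for g in pop]
        best = min(range(pop_size), key=scores.__getitem__)
        history.append(scores[best])
        if gen == generations:
            break
        children = [list(pop[best])]  # elitism: carry the best over unchanged
        while len(children) < pop_size:
            # Selection, crossover and mutation only read the old population,
            # so each child could be produced by its own GPU thread.
            p1, p2 = tournament(rng, pop, scores), tournament(rng, pop, scores)
            cut = rng.randrange(1, N_WAYPOINTS)
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [y + rng.gauss(0.0, 0.5) if rng.random() < mutation_rate else y
                     for y in child]     # Gaussian mutation
            children.append(child)
        pop = children
    return pop[best], history
```

Because of elitism, the best cost recorded in `history` can never increase from one generation to the next.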
Chapter 2. Real-Time Deformation of Constrained Meshes Using GPU
Abstract
Constrained meshes play an important role in free-form architectural design, as they can represent panel layouts on free-form surfaces. It is challenging to perform real-time manipulation on such meshes, because all constraints need to be respected during the deformation while the shape quality needs to be maintained. This usually leads to nonlinear constrained optimization problems, which are challenging to solve in real time. In this chapter, we present a GPU-based shape manipulation tool for constrained meshes, using the parallelizable algorithm proposed in Deng et al. (Computer-Aided Design, 2014). We discuss the main challenges and solutions for the GPU implementation and provide timing comparisons against CPU implementations of the algorithm. Our GPU implementation significantly outperforms the CPU version, allowing real-time handle-based deformation of large constrained meshes.
Alexandre Kaspar, Bailin Deng
Chapter 3. GPU-Based Real-Time Volume Interaction for Scientific Visualization Education
Abstract
In this chapter, we introduce the interaction methods of VisEdu, a visual teaching system we developed to teach scientific visualization courses at Beijing Normal University. VisEdu provides real-time visualization and interaction for midsize CT datasets at interactive frame rates via CUDA-based volume rendering. We describe various rendering methods based on plane, superquadric, and virtual lens tools, which offer different views of the same dataset. These tools help students better understand the features of virtual content and the core algorithms of the scientific visualization course, such as volume rendering and volume interaction.
Yanlin Luo, Zhongke Wu, Zuying Luo, Yanhong Luo
Chapter 4. Real-Time Separable Subsurface Scattering for Animated Virtual Characters
Abstract
In this chapter, we present our real-time, GPU-accelerated separable subsurface scattering method for interactive, skeletal-based deformable animated virtual characters. Our screen-space implementation is based on state-of-the-art algorithms, and we propose specific algorithmic and implementation extensions so that these algorithms can be applied to real-time virtual characters. We have created a physically principled real-time rendering framework featuring a series of rendering effects based on widely available open-source tools such as Open Scene Graph, C++, and GLSL, so that it can be easily integrated into modern rendering engines and scene graphs on commodity graphics hardware.
P. Papanikolaou, G. Papagiannakis
Chapter 5. Adaptive NURBS Tessellation on GPU
Abstract
This chapter presents a method for adaptively tessellating NURBS surfaces on the GPU. The method involves tessellation interval estimation, conversion from NURBS to rational Bézier patches, and gap-free tessellation of rational Bézier patches. All computations are performed on the GPU. The main contributions of the chapter are twofold: (1) we improve Zheng and Sederberg’s tessellation interval estimation for rational curves and surfaces to give larger tessellation intervals and thus produce fewer triangles, and (2) we propose an adaptive tessellation strategy that tessellates each rational Bézier patch on the GPU independently while avoiding gaps between adjacent patches. Using the GPU, complicated NURBS models can be easily rendered in real time.
Yusha Li, Xingjiang Lu, Wenjing Zhang, Guozhao Wang
Chapter 6. Graphics Native Approach to Identifying Surface Atoms of Macromolecules
Abstract
Classifying the atoms of proteins or other macromolecules as “surface atoms” or “interior atoms” is significant for many biochemical tasks, particularly molecular docking. We present a simple, easy-to-implement algorithm for distinguishing the surface atoms of macromolecules from interior atoms. Unlike existing methods based on geometric computations, our approach takes advantage of graphics hardware, and most of the computations are performed on the graphics processing unit (GPU). The algorithm can easily be incorporated into visualization applications for macromolecules to remove interior atoms from a macromolecular structure, thus simplifying graphics display and manipulation.
Huagen Wan, Yunqing Guan, Yiyu Cai
Chapter 7. A Scalable Software Framework for Stateful Stream Data Processing on Multiple GPUs and Applications
Abstract
During the past few years, the increase of computational power has been realized using more processors with multiple cores and specific processing units like graphics processing units (GPUs). Also, the introduction of programming languages such as CUDA and OpenCL makes it easy, even for non-graphics programmers, to exploit the computational power of massively parallel processors available in current GPUs. Although CUDA and OpenCL relieve programmers from considering many low-level details of parallel programming on multiple cores on a single GPU, the same support at a higher level of parallelization for multiple GPUs is still under research. In particular, fundamental issues of memory management and synchronization must be dealt with directly by the programmer. In this chapter, we introduce concepts for CUDA-based frameworks which are designed for stateful stream data processing for graph-like arrangements of processing modules on two or more GPUs in a single compute node. We evaluate these concepts and further elaborate on the approach of our choice. Our approach relieves the programmer from error-prone chores of memory management and synchronization. The chapter presents detailed evaluation results which demonstrate the scalability of the proposed framework. To demonstrate the usability of our framework, we utilize it for demanding online processing in the areas of crystallographic structure detection and video decryption.
Farhoosh Alghabi, Ulrich Schipper, Andreas Kolb
Chapter 8. The Design of SkyPACS: A High-Performance Mobile Medical Imaging Solution
Abstract
A shortage of radiologists is a problem in many parts of the world, and radiologists need to work long hours for multiple hospitals. SkyPACS was designed to improve the quality of healthcare: it is a mobile solution that allows radiologists to work more conveniently. SkyPACS is a low-cost, customizable medical image viewer that can be used for prognosis. The solution is designed as an assistive technology focused on simplicity, flexibility, and user experience. The architecture of SkyPACS is based on a service-oriented Model-View-Controller design. Customers can freely choose the back-end services: cloud computing and storage on a public cloud, a private server, or a hybrid system. The compute-intensive modules are deployed on a GPU server and exploit data parallelism using CUDA. The main features include all standard tools for viewing and diagnosis in 2D and 3D, convenient collaboration tools, and case management. In addition, advanced functions such as automatic tumor detection and reconstruction and bone/skin/muscle segmentation are provided. This chapter describes the details of SkyPACS’s design, as well as its implementation and initial deployment. We believe that SkyPACS will soon be available to a broad range of users in Thailand and AEC countries and will be able to reduce the cost of the healthcare platform in the near future.
Tananan Pattanangkur, Sikana Tanupabrungson, Katchaguy Areekijseree, Sarunya Pumma, Tiranee Achalakul
Chapter 9. Collision Detection Based on Fuzzy Scene Subdivision
Abstract
We present a novel approach to performing collision detection queries between rigid and/or deformable models. Our method can handle arbitrary deformations, even discontinuous ones. To this end, we subdivide the whole scene, with all its objects, into connected but fully independent parts using a fuzzy clustering algorithm. Then, for every part, our algorithm performs a Principal Component Analysis to find the best sweep direction for the sweep-plane step, which greatly reduces the number of false positives. Our collision detection algorithm performs all computations without a bounding volume hierarchy or any other acceleration data structure. A major advantage of this is that our method handles both the broad phase and the narrow phase within a single framework. Because the algorithm works directly on all primitives of the whole scene, its implementation is simpler and it can be integrated much more easily into other applications. We can compute inter-object and intra-object collisions of rigid and deformable objects consisting of many tens of thousands of triangles in a few milliseconds on a modern computer. We have evaluated its performance on common benchmarks.
David Mainzer, Gabriel Zachmann
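The PCA-guided sweep in Chapter 9 can be sketched for a single scene part as below. This is an illustration under assumptions, not the authors' algorithm: the fuzzy clustering step is omitted, the principal axis is found by plain power iteration on the 3×3 covariance of triangle centroids, and the sweep only reports candidate pairs whose projected intervals overlap (exact triangle–triangle tests would follow).

```python
import math


def centroid(tri):
    return tuple(sum(v[k] for v in tri) / 3.0 for k in range(3))


def principal_axis(points, iters=50):
    """Dominant eigenvector of the points' covariance matrix (power iteration)."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    v = [1.0, 0.7, 0.3]  # arbitrary start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate point cloud: any axis will do
            return [1.0, 0.0, 0.0]
        v = [x / norm for x in w]
    return v


def sweep_candidates(triangles):
    """Sweep along the principal axis; report pairs whose projections overlap."""
    axis = principal_axis([centroid(t) for t in triangles])
    intervals = []
    for idx, tri in enumerate(triangles):
        proj = [sum(a * b for a, b in zip(v, axis)) for v in tri]
        intervals.append((min(proj), max(proj), idx))
    intervals.sort()
    pairs, active = [], []
    for lo, hi, idx in intervals:
        active = [(h, j) for h, j in active if h >= lo]  # drop finished intervals
        pairs.extend((min(j, idx), max(j, idx)) for _, j in active)
        active.append((hi, idx))
    return axis, sorted(pairs)
```

Sweeping along the direction of largest spread keeps the projected intervals as separated as possible, which is the intuition behind using PCA to reduce false positives.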
Chapter 10. Smoothed Particle Hydrodynamics Applied to Cartilage Deformation
Abstract
Modelling of the cartilage within the acetabulum is necessary for determination of stresses in preoperative simulation of femoral acetabular impingement (FAI), a condition that is considered a primary cause of osteoarthritis. Presented is a previously proven method for elastic solid deformation using smoothed particle hydrodynamics (SPH). Smoothed particle hydrodynamics is a mesh-free method that has advantages in computational speed and accuracy over other graphical methods and as such is attractive for medical simulations that require high degrees of precision and real-time operability. A complete formulation of the method of polar decomposition as devised for smoothed particle hydrodynamics is outlined with the inclusion of a corotational formulation for accurate rotation handling. Modifications to the existing method include boundary and collision handling using an adapted virtual particle method, as well as an algorithm for parallel implementation on the GPU using NVIDIA’s CUDA framework. The method is verified through testing with a range of material parameters within the provided elastic solid framework. Employing CUDA for calculations is found to dramatically increase the computational speed of the simulation. The results of an indenter analysis of cartilage modelled as a purely elastic solid are presented and evaluated, with the conclusion that with further refinement the presented method is promising for use in cartilage simulations.
Philip Boyer, Sean LeBlanc, Chris Joslin
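A basic SPH building block relevant to Chapter 10 is the density summation \(\rho_i = \sum_j m_j W(\lVert x_i - x_j \rVert, h)\). The sketch below uses the poly6 kernel of Müller et al. (2003) as an assumed smoothing kernel; the chapter's elastic-solid formulation (polar decomposition, corotation, virtual boundary particles) is not reproduced. Each particle's sum is independent, which is why such loops map naturally onto one CUDA thread per particle.

```python
import math


def poly6(r, h):
    """Poly6 smoothing kernel, normalized so its integral over 3D space is 1."""
    if r >= h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, masses, h):
    """rho_i = sum_j m_j * W(|x_i - x_j|, h); each i is independent of the others."""
    rho = []
    for xi in positions:
        s = 0.0
        for xj, mj in zip(positions, masses):
            s += mj * poly6(math.dist(xi, xj), h)
        rho.append(s)
    return rho
```

This brute-force version is O(n²); practical GPU implementations normally restrict the inner loop to neighbours found through a spatial grid.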
Chapter 11. A GPU-Based Real-Time Algorithm for Virtual Viewpoint Rendering from Multi-video
Abstract
In this chapter, we propose a novel GPU-based algorithm capable of generating free viewpoints from a network of fixed HD video cameras. This free-viewpoint TV system consists of two main subsystems: a real-time depth estimation subsystem, which extracts a disparity map from a network of cameras, and a synthetic viewpoint generation subsystem, which uses the disparity map to interpolate new views between the cameras. To estimate depth information, we use a space-sweep algorithm, which is amenable to parallel implementation. The viewpoint generation subsystem generates new synthetic images from 3D vertices and renders them from an arbitrary viewpoint specified by the user. Both steps are computationally intensive, but the computations can easily be divided into independent parts and can thus be efficiently implemented in parallel using CUDA. The framework is tested on publicly available image sequences published by Microsoft, and experimental results are presented.
Kyrylo Shegeda, Pierre Boulanger
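For a rectified camera pair, the space-sweep idea in Chapter 11 reduces to sweeping over candidate disparities. The following sketch works on a single scanline and assumes a sum-of-absolute-differences (SAD) window cost with winner-take-all selection; the chapter's multi-camera sweep and its cost function may differ. Every (pixel, disparity) cost is independent, which is the parallelism a CUDA version would exploit.

```python
def disparity_sweep(left, right, max_disp, radius):
    """Winner-take-all disparity per pixel on one rectified scanline.

    For each candidate disparity d, a window around left[x] is compared with
    the window around right[x - d] using SAD. Pixels where no disparity has a
    fully valid window are left as None.
    """
    disp = [None] * len(left)
    for x in range(len(left)):
        best_cost, best_d = float("inf"), None
        lo, hi = x - radius, x + radius
        for d in range(max_disp + 1):
            if lo < 0 or hi >= len(left) or lo - d < 0 or hi - d >= len(right):
                continue  # window falls outside one of the images
            cost = sum(abs(left[k] - right[k - d]) for k in range(lo, hi + 1))
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp
```

A usage example: with `right = left[3:]`, every pixel whose windows fit inside both images for all candidate disparities is assigned disparity 3, because that is the only shift with zero cost when the scanline values are distinct.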
Chapter 12. A Middleware Framework for Programmable Multi-GPU-Based Big Data Applications
Abstract
Current applications of GPUs to parallel computing tasks show excellent speedups compared with CPUs. However, no existing middleware framework enables automatic distribution of data and processing across heterogeneous computing resources for structured and unstructured Big Data applications. We therefore propose a middleware framework for “Big Data” analytics that provides mechanisms for automatic data segmentation, distribution, execution, and information retrieval across multiple cards (CPU and GPU) and machines; a modular design for easily adding new GPU kernels at both the analytic and processing layers; and information presentation. We present the architecture and components of the framework, such as multi-card data distribution and execution, data structures for efficient memory access, and algorithms for parallel GPU computation, together with results for various test configurations. Our results show that the proposed middleware framework provides users with an alternative, cheaper HPC solution. Data cleansing algorithms on the GPU achieve a speedup of over two orders of magnitude compared with the same operation in MySQL on a multi-core machine. Our framework can also process more than 120 million health data records within 11 s.
Ettikan K. Karuppiah, Yong Keh Kok, Keeratpal Singh
Chapter 13. On the Efficient Implementation of a Real-Time Kd-Tree Construction Algorithm
Abstract
The kd tree is one of the most commonly used spatial data structures in a variety of graphics applications because of its reliably high acceleration performance. Several years ago, Zhou et al. devised an effective kd-tree construction algorithm that runs entirely on a GPU. In this chapter, we present improved GPU programming techniques for implementing the algorithm more efficiently on current GPUs. One of the major ideas is to reduce the number of necessary kernel functions by replacing the essential segmented-scan and reduction computations with simpler per-block atomic operations, thereby alleviating the overhead of multiple synchronous kernel calls. Combined with an efficient implementation of intrablock scan and reduction using recently introduced intrinsic functions, these changes substantially improve the performance of the kd-tree construction process. Through an example of real-time ray tracing for dynamic scenes of nontrivial complexity, we demonstrate that the proposed GPU techniques can be exploited effectively in various real-time applications.
Byungjoon Chang, Woong Seo, Insung Ihm
Chapter 14. Fast Approximate k-Nearest Neighbours Search Using GPGPU
Abstract
The k-nearest neighbours (k-NN) search is one of the most critical non-parametric methods used in data retrieval and similarity tasks. In recent years, fast k-NN processing of large amounts of high-dimensional data has been increasingly in demand. Locality-sensitive hashing is a viable solution for computing fast approximate nearest neighbours (ANN) with reasonable accuracy. This chapter presents a novel parallelisation of the locality-sensitive hashing method using GPGPU, considering the multi-probe variant. The method was implemented on the CUDA platform to construct a k-ANN graph. It was compared with the state-of-the-art CPU-based k-ANN method and two GPU-based k-NN methods on large, multidimensional data sets. The experimental results showed that the proposed method achieves a speed-up of 30× or more compared with the CPU-based approximate method, whilst retaining a high recall rate.
Niko Lukač, Borut Žalik
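The multi-probe LSH idea in Chapter 14 can be sketched serially as follows. As an assumption, random-hyperplane (sign) hashing is used as the hash family, and multi-probing checks the query's own bucket plus all buckets one bit-flip away; the chapter's hash family, probing sequence, and parameters may differ. Candidates are re-ranked by exact Euclidean distance. Each query is independent, so a GPU version could process many queries in parallel.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, bits, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(v, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    code = 0
    for i, p in enumerate(planes):
        if sum(a * b for a, b in zip(v, p)) >= 0.0:
            code |= 1 << i
    return code


def build_table(points, planes):
    table = defaultdict(list)
    for idx, v in enumerate(points):
        table[signature(v, planes)].append(idx)
    return table


def knn_candidates(q, points, planes, table, k):
    """Multi-probe query: own bucket plus every bucket one bit-flip away."""
    code = signature(q, planes)
    probes = [code] + [code ^ (1 << i) for i in range(len(planes))]
    cand = {idx for c in probes for idx in table.get(c, [])}
    return sorted(cand, key=lambda i: math.dist(q, points[i]))[:k]


def knn_graph(points, planes, table, k):
    """Approximate k-NN graph: each point's k best candidates, excluding itself."""
    return {i: [j for j in knn_candidates(p, points, planes, table, k + 1) if j != i][:k]
            for i, p in enumerate(points)}
```

Probing neighbouring buckets lets a small number of hash tables reach recall that would otherwise require many more tables, at the cost of a few extra lookups per query.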
Chapter 15. Soft Computing Methods for Big Data Problems
Abstract
Big data computing generally deals with massive, high-dimensional data such as DNA microarray data, financial data, medical imagery, satellite imagery, and hyperspectral imagery. It therefore needs advanced technologies or methods that reduce computational time while extracting valuable information without information loss. In this context, machine learning (ML) algorithms have generally been used to learn and find useful and valuable information from large volumes of data. However, ML algorithms such as neural networks are computationally expensive, and the central processing unit (CPU) typically cannot cope with these requirements. Faster hardware, such as the graphics processing unit (GPU), is therefore needed. GPUs provide remarkable performance gains compared with CPUs, and they are relatively inexpensive, widely available, and scalable. Since 2006, NVIDIA has simplified the GPU programming model with the Compute Unified Device Architecture (CUDA), which provides accessible programming interfaces and supports industry-standard languages such as C and C++. Since then, ML algorithms running on general-purpose graphics processing units (GPGPUs) have been applied in various applications, including signal and image pattern classification in the biomedical area. The importance of fast analysis for distinguishing cancer from non-cancer cases motivates this study. Accordingly, we propose soft computing methods, self-organizing map (SOM) and multiple back-propagation (MBP), for big data, particularly for biomedical classification problems. Big data such as gene expression datasets are processed on a high-performance computer and on Fermi architecture graphics hardware. In our experiments, MBP and SOM on the GPU-Tesla achieved faster computing times than on the high-performance computer, with feasible results in terms of speed and classification performance.
Shafaatunnur Hasan, Siti Mariyam Shamsuddin, Noel Lopes
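To make the SOM method named in Chapter 15 concrete, here is a minimal serial training loop. The linear decay of the learning rate and neighbourhood width, the Gaussian neighbourhood, and the random initialization are assumptions, not the chapter's settings. The best-matching-unit search and the per-node update are independent across map nodes, which is the part a GPU implementation would typically parallelize.

```python
import math
import random


def best_matching_unit(weights, x):
    """Node whose weight vector is closest to x (one distance per node)."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)               # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                g = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * g * (x[k] - w[k])  # pull node towards the sample
            step += 1
    return weights
```

Since every update moves a weight vector part of the way towards a sample, weights initialized in [0, 1) stay within [0, 1] when the data does.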
Chapter 16. Numerical Solution of BVP on GPU with Application to Path Planning
Abstract
Path planning in a virtual environment is a widely researched area, with applications in fields such as robotics, simulations, and computer games. This chapter focuses on comparing numerical methods for solving boundary value problems (BVPs) for partial differential equations on the GPU with NVIDIA CUDA, as used in path planning for virtual characters based on potential fields. The most commonly used methods for computing potential fields on the GPU are compared in terms of time consumption.
Lumír Janošek, Martin Němec, Radoslav Fasuga
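As a small illustration of potential-field path planning via a BVP (Chapter 16), the sketch below solves Laplace's equation on a grid with Dirichlet boundary values (walls and obstacles fixed at 1, goal fixed at 0) using Gauss–Seidel iteration, then follows the steepest descent from the start. Gauss–Seidel is just one assumed solver here; it is inherently sequential, so GPU versions usually favour Jacobi or red-black ordering. Grid layout and iteration count are illustrative.

```python
def solve_potential(grid, goal, iters=2000):
    """Gauss-Seidel iterations for Laplace's equation; grid[r][c] == 1 is an obstacle."""
    rows, cols = len(grid), len(grid[0])

    def is_fixed(r, c):
        # Dirichlet boundary: outer wall and obstacles (value 1), goal (value 0)
        return (r in (0, rows - 1) or c in (0, cols - 1)
                or grid[r][c] == 1 or (r, c) == goal)

    u = [[0.0 if (r, c) == goal else 1.0 for c in range(cols)] for r in range(rows)]
    for _ in range(iters):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if not is_fixed(r, c):
                    # each free cell becomes the average of its 4 neighbours
                    u[r][c] = 0.25 * (u[r - 1][c] + u[r + 1][c] + u[r][c - 1] + u[r][c + 1])
    return u


def descend(u, start, goal, max_steps=10_000):
    """Follow the steepest descent of the potential from start towards goal."""
    path = [start]
    r, c = start
    while (r, c) != goal and len(path) < max_steps:
        nr, nc = min(((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)),
                     key=lambda p: u[p[0]][p[1]])
        if u[nr][nc] >= u[r][c]:
            break  # local minimum; a converged harmonic field has none off the goal
        r, c = nr, nc
        path.append((r, c))
    return path
```

The choice of a harmonic (Laplace) potential matters: by the maximum principle it has no local minima in free space, so the greedy descent cannot get stuck before reaching the goal.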
Chapter 17. Fast Multi-Keyword Range Search Using GPGPU
Abstract
Large organisations are constantly challenged by the need to handle big data. Big data sizes are a constantly moving target, ranging as of 2013 from a few dozen terabytes to many petabytes of data. The data is usually stored in very large databases that are often indexed off-line to accelerate on-line searches. More recently, the p-ary algorithm has been proposed to exploit the massively parallel architecture of graphics processors (GPUs) to substantially accelerate search operations on such large databases. In this chapter, we present a multi-keyword range search technique that efficiently exploits index data structures to search for multiple text keywords in large databases. The multi-keyword range search is an extension of the p-ary algorithm originally developed by Kaldewey et al., which we enhanced to support multi-keyword search on GPGPU. We compare the response time, throughput, and speed-ups of CPU and GPGPU implementations. The performance benchmarks demonstrate that our algorithm achieves speed-ups of up to 25× and 6× on a Tesla K20c GPU card compared with single-core and multicore CPU implementations, respectively.
Amirul Abdullah, Amril Nazir, Mohanavelu Senapan, Soo Saw Meng, Ettikan Karuppiah
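The p-ary search idea underlying Chapter 17 can be simulated serially: each round splits the current interval into p parts, p − 1 "threads" each compare one boundary key with the target, and the number of true flags identifies the part containing the answer. The multi-keyword range search below is an assumed reading (one lower-bound pair per half-open keyword range), not necessarily the authors' exact extension.

```python
def p_ary_lower_bound(keys, target, p=4):
    """Index of the first element >= target in sorted keys (p >= 2)."""
    lo, hi = 0, len(keys)
    while hi - lo > p:
        n = hi - lo
        bounds = [lo + (k * n) // p for k in range(p + 1)]  # bounds[0]=lo, bounds[p]=hi
        # p - 1 independent comparisons; on a GPU, one thread per boundary
        flags = [keys[bounds[k]] < target for k in range(1, p)]
        m = sum(flags)  # flags are monotone because keys are sorted
        if m > 0:
            lo = bounds[m] + 1
        hi = bounds[m + 1]
    for i in range(lo, hi):  # short linear tail
        if keys[i] >= target:
            return i
    return hi


def multi_keyword_range_search(keys, ranges, p=4):
    """For each half-open range [low, high), return the (start, end) slice of matches."""
    return [(p_ary_lower_bound(keys, low, p), p_ary_lower_bound(keys, high, p))
            for low, high in ranges]
```

Each round shrinks the search interval by roughly a factor of p instead of 2, trading extra comparisons per round (cheap on a GPU) for fewer dependent rounds.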
Backmatter
Metadata
Title
GPU Computing and Applications
Edited by
Yiyu Cai
Simon See
Copyright year
2015
Publisher
Springer Singapore
Electronic ISBN
978-981-287-134-3
Print ISBN
978-981-287-133-6
DOI
https://doi.org/10.1007/978-981-287-134-3
