1 Introduction
1.1 Contribution
- A multi-level evaluation (MLE) method for computing parametric non-rational Bézier tensor-product surfaces of arbitrary degree. The method can also be applied to other formulations (e.g., rational Bézier) and to tensor products of higher order than surfaces.
- We propose different techniques to map MLE onto different hardware platforms, including central processing units (CPUs), discrete and integrated graphics processing units (GPUs), and mobile integrated GPUs, the last of which remain poorly explored in the literature.
- As the latest trends in computing move towards hybrid systems (in which more than one kind of processor is present), we also propose CPU–GPU cooperation mechanisms, including the exploitation of heterogeneous computing system (HCS) models with different properties.
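For reference, a non-rational tensor-product Bézier surface of degree \(m\times n\) is commonly written as \(S(u,v)=\sum_{i=0}^{m}\sum_{j=0}^{n}B_i^m(u)\,B_j^n(v)\,P_{ij}\), where \(B_i^m\) are the Bernstein basis polynomials and \(P_{ij}\) the control points. The sketch below evaluates this definition directly in plain Python. It is only a baseline illustration of the quantity being computed, not the MLE method itself; the names `bernstein`, `bezier_surface_point` and `patch` are assumptions made for this example.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t) of degree n."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def bezier_surface_point(control, u, v):
    """Evaluate a non-rational tensor-product Bezier surface at (u, v).

    control[i][j] is an (x, y, z) control point; a surface of degree
    m x n has an (m + 1) x (n + 1) grid of control points.
    """
    m = len(control) - 1
    n = len(control[0]) - 1
    # The basis values depend only on u or v, so compute them once.
    bu = [bernstein(m, i, u) for i in range(m + 1)]
    bv = [bernstein(n, j, v) for j in range(n + 1)]
    point = [0.0, 0.0, 0.0]
    for i in range(m + 1):
        for j in range(n + 1):
            w = bu[i] * bv[j]
            for k in range(3):
                point[k] += w * control[i][j][k]
    return tuple(point)


# Hypothetical degree 1 x 1 patch: the flat unit square in the z = 0 plane.
patch = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
center = bezier_surface_point(patch, 0.5, 0.5)
```

Because the basis values for \(u\) and \(v\) are independent of each other, they can be computed once per parameter value and reused across the whole control grid, which is the property that makes this kind of evaluation amenable to parallel hardware.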
2 Background
3 Related Work
Publication | Bézier formulation | Max. degree evaluated | Optimization strategies: algorithmic | Optimization strategies: hardware | Implementation: shaders | Implementation: GPGPU | Rendering |
---|---|---|---|---|---|---|---|
[29] | Non-rational | \(4\times 4\) | \(\bullet\) | \(\bullet\) | | | |
[27] | Non-rational | \(3\times 3\) | \(\bullet\) | \(\bullet\) | \(\bullet\) | | |
[33] | Non-rational | \(3\times 3\) | \(\bullet\) | \(\bullet\) | | | |
[12] | Non-rational | \(3\times 3\) | \(\bullet\) | | | | |
[8] | Rational | \(3\times 3\) | \(\bullet\) | \(\bullet\) | | CUDA | \(\bullet\) |
[34] | Rational | \(3\times 3\) | \(\bullet\) | | | CUDA | \(\bullet\) |
[38] | Rational | \(3\times 3\) | \(\bullet\) | \(\bullet\) | \(\bullet\) | | |
[30] | Non-rational | N/A | \(\bullet\) | \(\bullet\) | \(\bullet\) | | |
[35] | Rational | N/A | | | | CUDA | |
[7] | Non-rational | \(3\times 3\) | \(\bullet\) | \(\bullet\) | \(\bullet\) | | |
[25] | Non-rational | \(3\times 3\) | \(\bullet\) | \(\bullet\) | \(\bullet\) | | |
[4] | Non-rational | N/A | \(\bullet\) | \(\bullet\) | | CUDA | \(\bullet\) |
Our work | Non-rational | \(12\times 12\) | \(\bullet\) | \(\bullet\) | | CUDA/OpenCL | * |
4 Multi-Level Evaluation of Bézier Surfaces
5 Parallel Implementations
#pragma omp parallel for
5.1 GPU Parallel Computing
5.2 Heterogeneous Parallel Computing
5.3 Rendering and Graphics Interoperability
6 Performance Evaluation and Results
Device | Codename | Type | Year | Implementation |
---|---|---|---|---|
Intel® Core™ i7 930 2.80 GHz | Nehalem | CPU | 2008 | OpenMP |
NVIDIA® GTX™ 460 | Fermi | Discrete GPU | 2010 | CUDA |
NVIDIA® GTX™ 980 | Maxwell | Discrete GPU | 2014 | CUDA |
NVIDIA® Jetson™ TK1 | Logan/Kepler | Integrated GPU\(^{\mathrm{a}}\) | 2014 | CUDA/OpenMP |
AMD® A10-7850K | Kaveri | Integrated GPU\(^{\mathrm{b}}\) | 2014 | OpenCL |
Both single-precision (float) and double-precision (double) data types are considered. For many applications the choice of data type is an important consideration, since it affects performance, precision and numerical stability. Computer-graphics applications, for instance, are mostly concerned with performance and often use single precision as the data type of choice. In simulation applications, on the other hand, precision and numerical stability are of paramount importance, and hence a double-precision data type is preferred.
6.1 Evaluation on CPU
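The single- versus double-precision trade-off noted for the evaluation can be illustrated with a small numerical experiment. The sketch below sums the degree-12 Bernstein basis at one parameter value, which should give exactly 1, once in Python's native double precision and once with every intermediate result rounded to single precision. The rounding helper `to_float32` and the function `bernstein_sum` are assumptions made for this example, and the rounding only emulates float arithmetic; it is not a measurement of any of the implementations evaluated here.

```python
import struct
from math import comb


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def identity(x):
    """Keep full double precision."""
    return x


def bernstein_sum(n, t, rnd):
    """Sum the degree-n Bernstein basis at t, rounding each step with rnd.

    Mathematically the sum is exactly 1 for any t (partition of unity),
    so the deviation from 1 is pure rounding error.
    """
    t = rnd(t)
    total = rnd(0.0)
    for i in range(n + 1):
        a = rnd(t**i)
        b = rnd(rnd(1.0 - t) ** (n - i))
        term = rnd(rnd(comb(n, i) * a) * b)
        total = rnd(total + term)
    return total


err32 = abs(bernstein_sum(12, 0.3, to_float32) - 1.0)  # emulated float
err64 = abs(bernstein_sum(12, 0.3, identity) - 1.0)    # double
```

Printing `err32` and `err64` shows how far each variant drifts from the exact value; how much of that drift is acceptable depends on whether the application is closer to the rendering or the simulation end of the spectrum described here.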
6.2 Evaluation on GPUs
6.3 Evaluation on HCSs
OpenCL | Cooperation | Basis transfer | level2 | level1 | Surface transfer | Total |
---|---|---|---|---|---|---|
2.0* | CPU+GPU (DDC) | 0.000 | 0.052 | 4.026 | 0.000 | 4.078 |
1.2 | CPU+GPU (SDC) | 0.293 | 0.052 | 4.631 | 0.278 | 5.253 |
2.0* | CPU+GPU (SDC) | 0.000 | 0.054 | 4.603 | 0.000 | 4.657 |
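As a reading aid for the table, the following sketch computes the fraction of the total time that each configuration spends on basis and surface transfers. The rows are copied from the table; the units are not stated in this excerpt, so only ratios are computed. The helper name `transfer_share` is an assumption made for this example.

```python
# Rows copied from the table: (OpenCL version, cooperation model,
# basis transfer, level2, level1, surface transfer, total).
rows = [
    ("2.0*", "DDC", 0.000, 0.052, 4.026, 0.000, 4.078),
    ("1.2", "SDC", 0.293, 0.052, 4.631, 0.278, 5.253),
    ("2.0*", "SDC", 0.000, 0.054, 4.603, 0.000, 4.657),
]


def transfer_share(row):
    """Fraction of the total time spent on basis and surface transfers."""
    basis, surface, total = row[2], row[5], row[6]
    return (basis + surface) / total


# Map (OpenCL version, cooperation model) to its transfer share.
shares = {(r[0], r[1]): transfer_share(r) for r in rows}
```

With these figures, transfers account for roughly 11% of the total in the OpenCL 1.2 configuration, while both OpenCL 2.0 configurations report zero transfer time.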