1 Introduction
2 Related Work
3 Internal Design and Domain Partitioning
Microenvironment
and Microenvironment_Options
), (2) Physical Domain represented as 2-D/3-D Mesh (General_Mesh
, Cartesian_Mesh
and Voxel
), and (3) Cells (Basic_Agent
and Agent_Container
). Some classes have data members that are objects of, or pointers to, another class type (see dashed arrows in Fig. 1). The Microenvironment
class sets the micro-environment name and the diffusion/decay rates of substrates, defines constants for the Thomas algorithm, performs I/O, and contains an object of the Cartesian_Mesh class and a pointer to the Agent_Container class. A group of resizing functions that determine the global/local voxels are members of the Cartesian_Mesh
class. The Microenvironment_Options
class helps to set oxygen as the first default substrate and the default dimensions of the domain/voxel. The Cartesian_Mesh
class is publicly derived from General_Mesh
(thick arrow in Fig. 1). The Basic_Agent
class forms an abstraction of a cell. An object of the Basic_Agent
class can act either as a source or as a sink of substrates. Each agent has a unique ID, a type, and maintains the local/global index of its current voxel.

We initialize the MPI environment with the MPI_THREAD_FUNNELED thread support level and, after domain partitioning [27, 28], assign the sub-domains to individual MPI processes. Our implementation currently supports only a 1-D x-decomposition (see Appendix A). The randomly generated positions of basic agents are mapped to their respective processes (see Appendix B), after which the agents are created individually and in parallel on the MPI processes. Each MPI process initializes an object of the Microenvironment
class, maintains the local and global number of voxels, local (mesh_index
) and global voxel indices (global_mesh_index
) and the center of each local voxel’s global coordinates. A 1-D x-decomposition permits us to employ the optimal serial Thomas algorithm [30, 31] in the undivided y and z dimensions. This enables all threads within a node to act simultaneously on elements belonging to different linear systems.

4 Experiments

Our software environment comprised GCC 8.1
and OpenMPI 3.1.1
running atop the SUSE Linux Enterprise Server 12 SP2 OS. The parallel file system is the IBM General Parallel File System, and the compute nodes are interconnected with the Intel Omni-Path technology with a bandwidth of 100 Gbit/s. We pinned the threads to individual cores and bound each MPI process to a single processor (socket). We set the OpenMP environment variables OMP_PROC_BIND=spread and OMP_PLACES=threads [26], and used the --map-by ppr:1:socket:pe=24 notation to allocate resources (see https://gitlab.bsc.es/gsaxena/biofvm_x).

Our benchmark consists of five kernels: (1) builds the \(\mu \)-environment, (2) initializes a Gaussian profile of the diffusing substrate, (3) writes the initial and final output to a .mat file (I/O kernel), (4) creates Basic Agents (Sources and Sinks, BAG kernel), and (5) simulates Sources/Sinks and Diffusion (Solver kernel). The results were visualized using the cross_section_surface.m
Matlab script bundled with BioFVM.

The initialization of the Microenvironment and Basic_Agent class objects was carried out simultaneously on separate processes in BioFVM-X, as opposed to by a single thread in BioFVM. The (MPI) I/O kernel showed significant performance gains over serial I/O for the tests considered (Fig. 2). Nevertheless, the Solver kernel run-times did not show a significant gain in the Hybrid version; an extended analysis of these results can be found in Appendix C. Note that it is generally very difficult for an MPI+OpenMP implementation to outperform a pure OpenMP implementation on a single node, as in Fig. 2, due to the additional memory footprint of MPI and the cost of message-passing/synchronization. Our aim in the current work was to tackle very large problems that cannot fit into the memory of a single node and to reduce their time to solution in a multi-node scenario.
| \(7680 \times 7680 \times 7680\) | OpenMP | Hyb (n = 4) | Hyb (n = 8) |
|---|---|---|---|
| Build \(\mu \)-environment | - | 141.98 | 67.81 |
| Gaussian profile | - | 0.916 | 0.448 |
| Initial file write | - | 2.56 | 4.1 |
| Agent generation | - | 0.1060 | 0.0023 |
| Source/sink/diffusion | - | 1109.69 | 1210.41 |
| Final file write | - | 4.83 | 3.32 |
| Total time | - | 1260 | 1286.1 |
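The 1-D x-decomposition and the agent-to-process mapping described in Sect. 3 reduce to simple index arithmetic. The helper functions below are a minimal sketch under assumed conventions (contiguous blocks of x-voxels per rank, with remainder voxels assigned to the lowest-numbered ranks); the names are hypothetical and not part of BioFVM-X's actual API.

```cpp
// Hypothetical helpers illustrating a 1-D x-decomposition: the global
// x-voxel range [0, nx_global) is split into contiguous blocks, one per
// MPI process, with any remainder spread over the lowest-numbered ranks.

// Number of x-voxels owned by `rank` out of `nprocs` processes.
int local_nx(int nx_global, int nprocs, int rank)
{
    return nx_global / nprocs + (rank < nx_global % nprocs ? 1 : 0);
}

// First global x-voxel index owned by `rank`.
int x_start(int nx_global, int nprocs, int rank)
{
    int base = nx_global / nprocs, rem = nx_global % nprocs;
    return rank * base + (rank < rem ? rank : rem);
}

// Rank owning the voxel containing physical coordinate `x`, for a
// domain [x_min, x_max) with voxel width (x_max - x_min) / nx_global.
// A randomly generated agent position is mapped to its process this way.
int owner_rank(double x, double x_min, double x_max, int nx_global, int nprocs)
{
    double dx = (x_max - x_min) / nx_global;
    int gi = static_cast<int>((x - x_min) / dx);  // global x-voxel index
    for (int r = nprocs - 1; r >= 0; --r)         // invert x_start
        if (gi >= x_start(nx_global, nprocs, r))
            return r;
    return 0;
}
```

A real implementation would invert `x_start` analytically rather than scanning over ranks, but the scan keeps the correspondence with the block layout explicit.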