A hybrid particle swarm optimization algorithm for load balancing of MDS on heterogeneous computing systems
Introduction
A heterogeneous supercomputer system can be defined as a collection of system resources, locally or geographically distributed, that can be utilized to execute computationally intensive applications [1]. Task scheduling for load balancing generally relies on heuristic-based algorithms, which fall into three subcategories: list scheduling [2], clustering scheduling [3], and duplication scheduling [4]. An important type is the class of list scheduling algorithms, such as heterogeneous earliest finish time (HEFT) [5] and critical path on a processor (CPOP) [6]. Refs. [7], [8], [9] address scheduling and load balancing of computations on real parallel systems.
In contrast to heuristic-based algorithms, swarm intelligence algorithms incorporate an evolutionary process in the search for solutions. Swarm intelligence algorithms typically require sufficient sampling of candidate solutions in the search space and perform robustly for load balancing on heterogeneous supercomputer systems [10]. Chemical Reaction Optimization (CRO), Particle Swarm Optimization (PSO) [11], Ant Colony Optimization (ACO) [12], Simulated Annealing (SA) [13], Tabu Search (TS) [14], and Genetic Algorithms (GA) have all been successfully applied to scheduling. GAs have been widely used to evolve solutions; these GA approaches vary in their encoding schemes, in the implementation of their genetic operators, and in their methods for solution evaluation.
The genetic algorithm (GA) originates from the computer simulation of biological systems. The algorithm uses bit-string encoding to produce an initial population composed of a certain number of chromosomes and then evaluates their fitness. A selection strategy based on the fitness-value ratio is adopted to choose individuals from this population, and crossover and mutation are applied to produce the next generation. The population evolves in this way until optimal solutions are reached [15]. GA has accordingly been applied to smooth path planning for mobile robots [16], task scheduling in cloud environments [17], distributed flexible job shop scheduling problems [18], multi-objective optimization [19], identification of influencing financial ratios [20], ab initio protein structure prediction [21], parallel processors [22], and short-term traffic congestion forecasting [23].
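The GA loop described above (bit-string encoding, fitness-proportionate selection, crossover, mutation) can be sketched as follows. The fitness function here is the toy "OneMax" objective (count of one-bits), a stand-in for any problem-specific fitness; all parameter values are illustrative, not the paper's settings:

```python
import random

random.seed(42)

def onemax(bits):
    # Toy fitness: number of 1-bits in the chromosome.
    return sum(bits)

def roulette_select(pop, fits):
    # Fitness-proportionate ("fitness-value ratio") selection.
    total = sum(fits)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def evolve(n_bits=20, pop_size=30, generations=50, p_cross=0.8, p_mut=0.02):
    # Initial population of random bit strings.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [onemax(ind) for ind in pop]
        nxt = []
        while len(nxt) < pop_size:
            a, b = roulette_select(pop, fits), roulette_select(pop, fits)
            if random.random() < p_cross:
                # One-point crossover.
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Bit-flip mutation applied gene by gene.
            nxt += [[1 - g if random.random() < p_mut else g for g in ind]
                    for ind in (a, b)]
        pop = nxt[:pop_size]
    return max(pop, key=onemax)

best = evolve()
print(onemax(best))
```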
Particle swarm optimization (PSO) is an intelligent optimization algorithm that imitates the social interaction and communication that occur within, for example, bird flocks and fish schools [24]. PSO shares many similarities with other swarm intelligence optimization algorithms, and it has proven to be an effective approach for a variety of difficult optimization problems. PSO has been employed to solve global smooth path planning for mobile robots [25], the 1D heat conduction equation [26], stochastic optimal operation of micro-grids [27], nonlinear electrical impedance tomography reconstruction [28], stability analysis [29], the carpool service problem [30], nearest neighbor classification [31], dual-objective scheduling of rescue vehicles [32], and fast nonlinear model predictive control on FPGAs [33].
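A minimal sketch of the standard PSO velocity/position update follows, minimizing the sphere function as a toy objective; the inertia weight and acceleration coefficients are illustrative defaults, not values from this paper:

```python
import random

random.seed(1)

def sphere(x):
    # Toy objective: minimized at the origin.
    return sum(v * v for v in x)

def pso(dim=5, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal bests
    pbest_val = [sphere(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = sphere(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso()
print(best_val)
```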
In the field of materials science, molecular dynamics simulation based on Newtonian mechanics is a popular method for the experimental study and theoretical analysis of metal solidification [34]. In 1957, Alder and Wainwright [35] first put forward the molecular dynamics simulation method for perfect hard-sphere systems. However, their system involved only a small number of atoms, and, hampered by low computational speed, the simulation results differed considerably from actual scenarios. With the rapid development of computer science and technology, a variety of parallel algorithms have been proposed to improve the efficiency of parallel molecular dynamics simulation programs [36]. Meanwhile, with the development of GPU programmability, particularly after the introduction of the compute unified device architecture (CUDA) and open graphics library (OpenGL) programming environments, the complexity of general-purpose GPU programming has been greatly reduced. Various parallel algorithms for molecular dynamics simulation based on the CUDA or OpenGL framework have therefore been presented [37]. GPU technology acts as a powerful complement to the theoretical calculation and experimentation of molecular dynamics simulation and is widely applied in research areas such as materials, chemistry, engineering, and mathematics [38]. In particular, the parallelization of molecular dynamics simulations is highlighted in these application fields.
For the study of metal solidification, molecular dynamics simulation is adopted to observe transient changes in the metal microstructure at the electronic and atomic levels [39]. In parallel molecular dynamics research, the number of molecules and the parallel approach have been the chief concerns of most researchers. Early programs for accelerating molecular dynamics simulation were based on the parallel virtual machine (PVM) model on multi-core CPU platforms; later, multi-core CPU cluster architectures were used in improved parallel algorithms based on the message passing interface (MPI) and open multi-processing (OpenMP) models, increasing the number of atoms in the simulated system from 50,000 to 10,000,000 or even more [39], [40], [41]. The computing power of the GPU far exceeds that of the CPU for computationally intensive problems, and the calculation of interatomic forces consumes approximately 90% of the computational effort in the entire molecular dynamics simulation. When parallel programs are executed on both GPUs and the earlier platforms, the results show that GPU versions deliver better performance [40], [41], [42]. However, serious problems such as frequent access conflicts, inefficient memory bandwidth, and communication overload in memory also exist, causing performance to decline when parallel molecular dynamics programs run on GPUs [41].
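To see why the force calculation dominates the runtime, consider a naive pairwise force loop: every step touches all O(N^2) atom pairs, which is the usual target for GPU offload. The sketch below uses the Lennard-Jones potential purely as a common illustrative pair potential (this excerpt does not specify the paper's force field):

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Naive O(N^2) pairwise Lennard-Jones forces: the step that dominates
    MD runtime (~90% per the text) and motivates GPU acceleration."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(d * d for d in dx)
            sr6 = (sigma * sigma / r2) ** 3          # (sigma/r)^6
            # -dU/dr expressed through r^2 to avoid a sqrt.
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                # Newton's third law: equal and opposite contributions.
                forces[i][k] += f_over_r * dx[k]
                forces[j][k] -= f_over_r * dx[k]
    return forces

# Two atoms at the potential-minimum distance r = 2^(1/6) * sigma:
# the pair force should vanish (up to floating-point error).
r_min = 2.0 ** (1.0 / 6.0)
f = lj_forces([[0.0, 0.0, 0.0], [r_min, 0.0, 0.0]])
print(f[0])
```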
The main contributions of this paper are summarized as follows. First, a domain decomposition model for molecular dynamics simulation based on hybrid CPU-GPU architectures is proposed, which improves the performance of the HCPUGPU-MD parallel algorithm. Second, we introduce a hybrid algorithm (HGAPSO) that combines the GA and PSO algorithms and outperforms previous methods. Third, a parallel algorithm for molecular dynamics simulation based on multi-core CPU and GPU architectures (HGAPSO-MD) is designed and implemented. The obtained results show that the proposed HGAPSO algorithm has excellent performance for load balancing of molecular dynamics simulations on a heterogeneous supercomputer.
The remainder of this article is organized as follows. Section 2 introduces the standard GA and PSO algorithms. The heterogeneous CPU-GPU architecture for molecular dynamics simulations is then presented in Section 3. We give a brief discussion of the parallel programming of the molecular dynamics simulation in Section 4. Section 5 presents the steps and flowchart of the HGAPSO algorithm. In Section 6, experimental results and analysis are given in detail, and the last section concludes this article with prospects for future research.
Section snippets
GA and PSO algorithms
In practice, the optimal solution of an NP-complete problem is hard to obtain. Thus, researchers use heuristic algorithms to compute approximate optimal solutions to this kind of problem. Among them, the genetic algorithm and the particle swarm optimization algorithm are two classical examples.
Heterogeneous CPU-GPU architecture and programming
In this section, we give a simple introduction to the hybrid CPU-GPU architecture computing system that consists of several processors with a multi-core CPU cluster and a single GPU device as depicted in Fig. 1, which also shows our experimental platform.
To implement parallel molecular dynamics simulation on this type of hybrid computing system, we present herein a unified execution model for collaboration that is capable of exploiting the platform's full heterogeneous potential.
Molecular dynamics simulation with collaborative execution
This section introduces a parallel algorithm for molecular dynamics simulation based on the CPU-GPU architecture (HCPUGPU). In this parallel algorithm, a domain decomposition model is proposed to improve performance.
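The idea behind domain decomposition is to split the simulation box into spatial subdomains and hand each one to a worker (CPU core or GPU). The following is a hypothetical 1-D slab version of that idea, for illustration only; the paper's actual decomposition model may differ:

```python
def decompose_domain(box_length, n_subdomains):
    """Split a 1-D simulation box into equal slabs, one per worker.
    Returns (lo, hi) bounds for each slab."""
    width = box_length / n_subdomains
    return [(i * width, (i + 1) * width) for i in range(n_subdomains)]

def assign_atoms(xs, box_length, n_subdomains):
    """Bin atoms (given by x-coordinate) into their owning slab."""
    width = box_length / n_subdomains
    bins = [[] for _ in range(n_subdomains)]
    for idx, x in enumerate(xs):
        # Clamp atoms sitting exactly on the far boundary into the last slab.
        s = min(int(x / width), n_subdomains - 1)
        bins[s].append(idx)
    return bins

slabs = decompose_domain(10.0, 4)
bins = assign_atoms([0.5, 2.6, 9.99, 7.1, 2.4], 10.0, 4)
print(slabs)
print(bins)
```

In a real MD code each slab would also exchange "halo" atoms near its boundaries with neighboring slabs before the force step; that communication is what makes load balancing across heterogeneous devices nontrivial.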
Hybrid particle swarm optimization algorithm
This section presents a hybrid parallel algorithm (HGAPSO-MD) that combines the genetic algorithm and the particle swarm optimization algorithm. In the molecular dynamics simulation computing system, the performance of this parallel algorithm improves further over previous algorithms.
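One common way to hybridize GA and PSO is to run the standard PSO update and then apply GA-style operators (crossover with the global best, mutation) to the particles. The sketch below illustrates that hybridization pattern on a toy objective; it is not the paper's exact HGAPSO, and all parameters are assumptions:

```python
import random

random.seed(7)

def sphere(x):
    return sum(v * v for v in x)

def hgapso(dim=5, n=20, iters=100, w=0.6, c1=1.4, c2=1.4,
           p_cross=0.3, p_mut=0.05):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pval = [sphere(p) for p in pos]
    g = min(range(n), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]
    for _ in range(iters):
        for i in range(n):
            # Standard PSO velocity/position update.
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            # GA operator 1: arithmetic crossover with the global best.
            if random.random() < p_cross:
                a = random.random()
                pos[i] = [a * x + (1 - a) * gx for x, gx in zip(pos[i], gbest)]
            # GA operator 2: Gaussian mutation on one randomly chosen gene.
            if random.random() < p_mut:
                pos[i][random.randrange(dim)] += random.gauss(0.0, 0.5)
            v = sphere(pos[i])
            if v < pval[i]:
                pbest[i], pval[i] = pos[i][:], v
                if v < gval:
                    gbest, gval = pos[i][:], v
    return gval

best_val = hgapso()
print(best_val)
```

The intuition behind the combination: PSO supplies fast directed convergence, while the GA operators inject diversity that helps the swarm escape local optima.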
Experiment and simulation
The Tianhe-1A supercomputer system is employed as the experimental platform of this study. Tianhe-1A's peak computing speed is 4700 TFlops, and it contains 7168 computing nodes and 1024 service nodes. On each node, the GPU-M2050's device memory size is less than 3 GB, contributing 515 GFlops; the CPU-X5670s' shared memory size is 32 GB, contributing only 140 GFlops. Nodes are connected via a fat-tree interconnection network; the latency is 1.57 μs and the bidirectional bandwidth is 20
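As a back-of-the-envelope illustration of load balancing on such a node, one could split the atoms between GPU and CPU in proportion to the quoted peak throughputs (515 vs. 140 GFlops). This static split is our own illustrative baseline, not the paper's method, which tunes the assignment with HGAPSO:

```python
def split_load(n_atoms, gpu_gflops=515.0, cpu_gflops=140.0):
    """Static split of atoms between GPU and CPU, proportional to the
    peak throughputs quoted for one Tianhe-1A node (illustrative only)."""
    total = gpu_gflops + cpu_gflops
    n_gpu = round(n_atoms * gpu_gflops / total)
    return n_gpu, n_atoms - n_gpu

gpu_share, cpu_share = split_load(1_000_000)
print(gpu_share, cpu_share)  # GPU gets ~78.6% of the atoms
```

In practice peak GFlops overestimate achievable throughput (memory bandwidth and PCIe transfers intervene), which is exactly why a search-based balancer such as HGAPSO outperforms a static ratio.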
Acknowledgments
A preliminary version of this manuscript was presented at the 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2016).
This research was partially funded by the National Key R&D Program of China (Grant Nos. SQ2018YFB020061, 2015AA015305), the National Natural Science Foundation of China (Grant Nos. 61702170, 61602350, 61602170, 61370098, 61672219), the Key Program of National Natural Science Foundation of China (Grant No. 61432005), the
Dapu Li received the B.S. degree in computer science and technology from Nanjing army command college in 2006, and the M.S. degree in applied mathematics from Guilin university of electronic technology in 2010. He is currently working toward the Ph.D. degree at Hunan university in parallel algorithms, grid and cloud computing.
References (54)
- et al., Load and storage balanced posting file partitioning for parallel information retrieval, J. Syst. Softw. (2011)
- et al., A particle swarm optimization approach for the bi-objective load balancing problem, Electron. Notes Discret. Math. (2010)
- et al., Assigning real-time tasks to heterogeneous processors by applying ant colony optimization, J. Parallel Distrib. Comput. (2011)
- et al., Multi-criteria scheduling of bag-of-tasks applications on heterogeneous interlinked clouds with simulated annealing, J. Syst. Softw. (2015)
- et al., A meta-heuristic optimization approach to the scheduling of bag-of-tasks applications on heterogeneous clouds with multi-level arrivals and critical jobs, Simul. Model. Pract. Theory (2015)
- et al., An improved genetic algorithm for task scheduling in the cloud environments using the priority queues: formal verification, simulation, and statistical testing, J. Syst. Softw. (2017)
- et al., Effects of different chromosome representations in developing genetic algorithms to solve DFJS scheduling problems, Comput. Oper. Res. (2017)
- et al., The elitist non-dominated sorting genetic algorithm with inheritance (I-NSGA-II) and its jumping gene adaptations for multi-objective optimization, Inf. Sci. (2017)
- et al., Performance analysis of parallel algorithms in physics simulation for molecular dynamics simulation liquid metals solidification processes, Comput. Fluids (2015)
- et al., Fast analysis of molecular dynamics trajectories with graphics processing units: radial distribution function histogramming, J. Comput. Phys. (2011)
- A new era in scientific computing: domain decomposition methods in hybrid CPU-GPU architectures, Comput. Methods Appl. Mech. Eng.
- Nvidia Tesla: a unified graphics and computing architecture, IEEE Micro
- Research on GAPSO algorithm-based grid workflow scheduling, Comput. Appl. Softw.
- List scheduling algorithm for heterogeneous systems by an optimistic cost table, IEEE Trans. Parallel Distrib. Syst.
- Scheduling precedence constrained stochastic tasks on heterogeneous cluster systems, IEEE Trans. Comput.
- Clustering-based task scheduling in a large number of heterogeneous processors, IEEE Trans. Parallel Distrib. Syst.
- Energy-efficient stochastic task scheduling on heterogeneous computing systems, IEEE Trans. Parallel Distrib. Syst.
- Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans. Parallel Distrib. Syst.
- Task scheduling algorithms for heterogeneous processors, Proceedings of the Eighth Heterogeneous Computing Workshop (HCW '99), 1999
- Highly scalable SFC-based dynamic load balancing and its application to atmospheric modeling, Fut. Gener. Comput. Syst.
- Paracon: a parallel control plane for scaling up path computation in SDN, IEEE Trans. Netw. Serv. Manag.
- GPU-accelerated parallel hierarchical extreme learning machine on Flink for big data, IEEE Trans. Syst. Man Cybern.: Syst.
- GFlink: an in-memory computing architecture on heterogeneous CPU-GPU clusters for big data, IEEE Trans. Parallel Distrib. Syst.
- A new genetic algorithm approach to smooth path planning for mobile robots, Assemb. Autom.
- A genetic bankrupt ratio analysis tool using a genetic algorithm to identify influencing financial ratios, IEEE Trans. Evol. Comput.
- An enhanced genetic algorithm for ab initio protein structure prediction, IEEE Trans. Evol. Comput.
- High-speed general purpose genetic algorithm processor, IEEE Trans. Cybern.
Kenli Li received the Ph.D. degree in computer science from Huazhong University of Science and Technology, China, in 2003. He is currently a full professor of computer science and technology at Hunan University. His major research includes parallel computing, and cloud computing. He has published more than 150 papers in international conferences and journals, such as IEEE-TC, IEEE-TPDS. He is an outstanding member of CCF.
Jie Liang received the B.S. degree in communication engineering from Henan normal university in 2011, and the M.S. degree in information and communication engineering from Hunan university in 2014. She is currently working toward the Ph.D. degree at Hunan University, China. Her research interests are mainly in modeling and scheduling of distributed computing systems, optimization and parallel algorithms, game theory, grid and cloud computing.
Aijia Ouyang received the Ph.D. degree in computer science from Hunan University, Changsha, China, in 2015. He has published over 20 research papers in international conference and journals of intelligence algorithms and parallel computing. His current research interests include parallel computing, cloud computing, and big data.