Elsevier

Neurocomputing

Volume 330, 22 February 2019, Pages 380-393

A hybrid particle swarm optimization algorithm for load balancing of MDS on heterogeneous computing systems

https://doi.org/10.1016/j.neucom.2018.11.034

Abstract

An efficient hybrid genetic algorithm and particle swarm optimization algorithm (HGAPSO), which combines the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm, is studied in this work for load balancing of molecular dynamics simulations (MDS) on heterogeneous supercomputers. A hybrid CPU-GPU platform is used to enable large-scale MDS that emulates metal solidification. Applied to task scheduling of the parallel algorithm, the approach obtains excellent results. The experimental results show that the proposed algorithm improves both the efficiency of parallel computing and the precision of the physical simulation.

Introduction

A heterogeneous supercomputer system can be defined as a collection of system resources, locally or geographically distributed, that can be utilized to execute computationally intensive applications [1]. Task scheduling for load balancing generally relies on heuristic-based algorithms, which fall into three subcategories: list scheduling [2], clustering scheduling [3], and duplication scheduling [4]. An important type is the list scheduling algorithms, such as heterogeneous earliest finish time (HEFT) [5] and critical path on a processor (CPOP) [6]. Refs. [7], [8], [9] address scheduling and load balancing of computations on real parallel systems.

Contrary to heuristic-based algorithms, swarm intelligence algorithms incorporate an evolutionary process in the search for solutions. Swarm intelligence algorithms typically require sufficient sampling of candidate solutions in the search space and exhibit robust performance for load balancing on heterogeneous supercomputer systems [10]. Chemical reaction optimization (CRO), particle swarm optimization (PSO) [11], ant colony optimization (ACO) [12], simulated annealing (SA) [13], tabu search (TS) [14], and genetic algorithms (GA) have been successfully applied to scheduling. GAs in particular have been widely used to evolve solutions; these GA approaches vary in their encoding schemes, the implementation of their genetic operators, and their methods for solution evaluation.

The genetic algorithm (GA) originates from the computer simulation of biological systems. The algorithm uses bit-string coding to produce an initial population composed of a certain number of chromosomes and then evaluates their fitness. A selection strategy based on fitness-value proportions is adopted to choose individuals from this population, and crossover and mutation are used to produce the next generation. The population evolves in this way until optimal solutions are reached [15]. GA has accordingly been applied to smooth path planning for mobile robots [16], task scheduling in cloud environments [17], distributed flexible job shop scheduling problems [18], multi-objective optimization [19], identification of influencing financial ratios [20], ab initio protein structure prediction [21], parallel processors [22], and short-term traffic congestion forecasting [23].
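To make the loop above concrete, the following is a minimal, self-contained sketch of a standard GA with bit-string encoding, roulette-wheel selection, one-point crossover, and bit-flip mutation; the fitness function and parameter values are illustrative only and are not those used in this work.

```python
# Minimal sketch of a standard GA (illustrative parameters, not the paper's).
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=100,
                      crossover_rate=0.8, mutation_rate=0.02):
    # Initial population of bit-string chromosomes.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        total = sum(scores)

        def select():
            # Fitness-proportional (roulette-wheel) selection.
            r = random.uniform(0, total)
            acc = 0.0
            for ind, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return ind
            return pop[-1]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < crossover_rate:      # one-point crossover
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):                        # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        c[i] ^= 1
                children.append(c)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)         # keep the best seen so far
    return best

# Usage: maximize the number of ones in the chromosome ("one-max").
if __name__ == "__main__":
    solution = genetic_algorithm(fitness=sum)
    print(solution, sum(solution))
```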

Particle swarm optimization (PSO) is an intelligent optimization algorithm that imitates the social interaction and communication that occur within, for example, bird flocks and fish schools [24]. PSO shares many similarities with other swarm intelligence algorithms and has proven to be an effective approach for a variety of difficult optimization problems. PSO has been employed to solve global smooth path planning for mobile robots [25], the 1D heat conduction equation [26], stochastic optimal operation of micro-grids [27], nonlinear electrical impedance tomography reconstruction [28], stability analysis [29], the carpool service problem [30], nearest neighbor classification [31], dual-objective scheduling of rescue vehicles [32], and fast nonlinear model predictive control on FPGAs [33].
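As an illustration of the mechanism PSO relies on, the following is a minimal sketch of standard global-best PSO with the canonical inertia, cognitive, and social velocity terms; the objective, bounds, and coefficients are illustrative and not the settings used in this work.

```python
# Minimal sketch of standard (global-best) PSO; illustrative settings only.
import random

def pso(objective, dim=2, swarm_size=20, iterations=200,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Canonical update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Usage: minimize the sphere function.
if __name__ == "__main__":
    print(pso(lambda x: sum(v * v for v in x)))
```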

In the field of materials science, molecular dynamics simulation based on Newtonian mechanics is a popular method for the experimental research on and theoretical analysis of metal solidification [34]. In 1957, Alder and Wainwright [35] put forward the molecular dynamics simulation method for perfect hard-sphere systems for the first time. However, their system involved only a small number of atoms, and because of the low computational speed available, the simulation results differed considerably from actual scenarios. With the rapid development of computer science and technology, a variety of parallel algorithms have been proposed to improve the computational efficiency of molecular dynamics simulation programs [36]. On the other hand, with the development of GPU programmability, particularly after the introduction of the compute unified device architecture (CUDA) and open graphics library (OpenGL) programming environments, the complexity of general-purpose GPU programming has been greatly reduced. Therefore, various parallel algorithms for molecular dynamics simulation based on the CUDA or OpenGL framework have been presented [37]. GPU technology acts as a powerful complement to the theoretical calculation and experimentation of molecular dynamics simulation and is widely applied in research areas such as materials, chemistry, engineering, and mathematics [38]. In particular, the parallelization of molecular dynamics simulations is prominent among these applications.

For the study of metal solidification, molecular dynamics simulation is adopted to observe transient changes in the metal microstructure at the electronic and atomic levels [39]. In parallel molecular dynamics studies, the number of molecules and the parallel approach have been the main concerns of researchers. Earlier programs for accelerating molecular dynamics simulation were based on the parallel virtual machine (PVM) model on multi-core CPU platforms; later, multi-core CPU cluster architectures were used in improved parallel algorithms based on the message passing interface (MPI) and open multi-processing (OpenMP) models, increasing the number of atoms in the simulated system from 50,000 to 10,000,000 or even more [39], [40], [41]. For computationally intensive problems, the computing power of the GPU far exceeds that of the CPU. Additionally, calculations of interatomic forces consume approximately 90% of the computational effort of the entire molecular dynamics simulation. When parallel programs are executed on the GPU and on the earlier platforms, the results show that the GPU versions achieve better performance [40], [41], [42]. However, serious problems such as frequent access conflicts, insufficient memory bandwidth, and communication overload in memory also exist, which degrade the performance of molecular dynamics parallel programs running on the GPU [41].
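As a rough illustration of why force evaluation dominates the cost and is the natural kernel to offload to the GPU, the sketch below shows a doubly nested pairwise force loop for a Lennard-Jones potential; the metal-solidification simulations in this paper would use a different interatomic potential, so this is only a cost-structure sketch.

```python
# Illustrative O(N^2) pairwise force loop (Lennard-Jones), showing the hot
# spot that accounts for roughly 90% of MD runtime and is offloaded to the GPU.
def pairwise_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):                    # this doubly nested loop is the
        for j in range(i + 1, n):         # dominant cost mentioned above
            dx = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(d * d for d in dx)
            if r2 > cutoff * cutoff or r2 == 0.0:
                continue
            sr6 = (sigma * sigma / r2) ** 3
            # Force on atom i from j: f = coef * (r_i - r_j), componentwise.
            coef = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += coef * dx[k]
                forces[j][k] -= coef * dx[k]
    return forces
```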

The main contributions of this paper are summarized as follows. First, a domain decomposition model for molecular dynamics simulation based on hybrid CPU-GPU architectures is proposed, which improves the performance of the HCPUGPU-MD parallel algorithm. Second, we introduce a hybrid algorithm (HGAPSO) that combines the GA and PSO algorithms and performs better than previous methods. Third, a parallel algorithm for molecular dynamics simulation based on multi-core CPU and GPU architectures (HGAPSO-MD) is designed and implemented. The obtained results show that the proposed HGAPSO algorithm has excellent performance for load balancing of molecular dynamics simulations on a heterogeneous supercomputer.

The remainder of this article is organized as follows. Section 2 introduces the standard GA and PSO algorithms. Then, the heterogeneous CPU-GPU architecture for molecular dynamics simulations is presented in Section 3. We give a brief discussion of the parallel programming of the molecular dynamics simulation in Section 4. Section 5 presents the steps and flowchart of the HGAPSO algorithm. In Section 6, experimental results and analysis are given in detail, and the last section concludes this article with prospects for future research.

Section snippets

GA and PSO algorithms

In practice, the optimal solution of an NP-complete problem is hard to obtain. Thus, researchers use heuristic algorithms to compute approximate optimal solutions to this kind of problem. Among them, the genetic algorithm and the particle swarm optimization algorithm are two classical algorithms.

Heterogeneous CPU-GPU architecture and programming

In this section, we give a brief introduction to the hybrid CPU-GPU computing system, which consists of several processors forming a multi-core CPU cluster together with a single GPU device per node, as depicted in Fig. 1; the figure also shows our experimental platform.

To implement the parallel computation of molecular dynamics simulation on this type of hybrid computing system, we present herein a unified execution model for collaboration that is capable of exploiting the platform's full heterogeneous potential.
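One simple way to make such collaboration concrete is to split the per-step workload between the GPU and the CPU cores in proportion to their throughput. The sketch below uses the per-node GFlops figures reported in the experiment section; the function name is hypothetical, and the paper's actual execution model may be more dynamic than this static split.

```python
# Hypothetical static throughput-proportional split between GPU and CPU,
# using the per-node figures from the experiment section (515 vs. 140 GFlops).
def split_by_throughput(n_atoms, gpu_gflops=515.0, cpu_gflops=140.0):
    gpu_share = gpu_gflops / (gpu_gflops + cpu_gflops)
    n_gpu = int(round(n_atoms * gpu_share))
    return n_gpu, n_atoms - n_gpu

# Usage: roughly 79% of the atoms go to the GPU, the rest to the CPU cores.
print(split_by_throughput(1_000_000))   # -> (786260, 213740)
```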

Molecular dynamics simulation with collaborative execution

This section introduces a parallel algorithm for molecular dynamics simulation based on the CPU-GPU architecture (HCPUGPU). Within this parallel algorithm, a domain decomposition model is proposed to improve performance.
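For illustration, the sketch below shows the simplest form of spatial domain decomposition: the simulation box is cut into equal slabs along one axis and each slab's atoms are assigned to one worker. The decomposition model proposed in the paper is more involved; this only conveys the idea of mapping regions of space to processors.

```python
# Simple 1-D spatial domain decomposition: assign each atom to the slab
# (worker) that contains its x coordinate.
def decompose_1d(positions, box_len_x, n_domains):
    domains = [[] for _ in range(n_domains)]
    slab = box_len_x / n_domains
    for idx, (x, y, z) in enumerate(positions):
        d = min(int(x / slab), n_domains - 1)   # clamp atoms on the boundary
        domains[d].append(idx)
    return domains

# Usage: 4 slabs over a box of length 10.
atoms = [(0.5, 0.0, 0.0), (2.6, 1.0, 0.0), (7.4, 0.0, 1.0), (9.99, 0.0, 0.0)]
print(decompose_1d(atoms, 10.0, 4))   # -> [[0], [1], [2], [3]]
```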

Hybrid particle swarm optimization algorithm

This section presents a hybrid parallel algorithm (HGAPSO-MD) that combines the genetic algorithm and the particle swarm optimization algorithm. In the molecular dynamics simulation computing system, this parallel algorithm further improves performance over previous algorithms.
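One common way to hybridize the two algorithms is to follow each PSO velocity and position update with GA-style crossover and mutation applied to the particles, which helps maintain swarm diversity. The sketch below illustrates such a generic hybrid step; it is not the exact HGAPSO operator set or encoding used in the paper.

```python
# Generic sketch of one GA+PSO hybrid iteration (not the paper's exact scheme).
import random

def hgapso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5,
                crossover_rate=0.5, mutation_rate=0.05, mutation_sigma=0.1):
    n, dim = len(positions), len(positions[0])
    # 1) PSO velocity/position update.
    for i in range(n):
        for d in range(dim):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
    # 2) GA-style arithmetic crossover between random pairs of particles.
    for _ in range(n // 2):
        if random.random() < crossover_rate:
            a, b = random.sample(range(n), 2)
            alpha = random.random()
            for d in range(dim):
                xa, xb = positions[a][d], positions[b][d]
                positions[a][d] = alpha * xa + (1 - alpha) * xb
                positions[b][d] = alpha * xb + (1 - alpha) * xa
    # 3) GA-style Gaussian mutation for extra diversity.
    for i in range(n):
        for d in range(dim):
            if random.random() < mutation_rate:
                positions[i][d] += random.gauss(0.0, mutation_sigma)
    return positions, velocities
```

After each such step, the personal and global bests would be re-evaluated exactly as in standard PSO.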

Experiment and simulation

The Tianhe-1A supercomputer system is employed as the experimental platform in this study. Tianhe-1A has a peak computing speed of 4700 TFlops and contains 7168 compute nodes and 1024 service nodes. On each node, the GPU (Tesla M2050) has less than 3 GB of device memory and contributes 515 GFlops, while the CPUs (Xeon X5670) share 32 GB of memory and contribute only 140 GFlops. Nodes are connected via a fat-tree interconnection network; the latency is 1.57 μs and the bidirectional bandwidth is 20
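As a quick consistency check of the quoted figures (assuming one GPU and the listed CPU contribution per node), the per-node peak of 515 + 140 = 655 GFlops across the 7168 compute nodes gives roughly the stated 4700 TFlops:

```python
# Consistency check of the reported peak performance figures.
per_node_gflops = 515 + 140           # GPU + CPU contribution per node
total_tflops = 7168 * per_node_gflops / 1000.0
print(per_node_gflops, total_tflops)  # -> 655 4695.04
```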

Acknowledgments

A preliminary version of this manuscript was presented at the 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2016).

This research was partially funded by the National Key R&D Program of China (Grant Nos. SQ2018YFB020061, 2015AA015305), the National Natural Science Foundation of China (Grant Nos. 61702170, 61602350, 61602170, 61370098, 61672219), the Key Program of National Natural Science Foundation of China (Grant No. 61432005), the


References (54)

  • M. Papadrakakis et al.

    A new era in scientific computing: domain decomposition methods in hybrid CPU-GPU architectures

    Comput. Methods Appl. Mech. Eng.

    (2011)
  • E. Lindholm et al.

    Nvidia tesla: a unified graphics and computing architecture

    IEEE Micro

    (2008)
  • M. Zhang et al.

    Research on GAPSO algorithm-based grid workflow scheduling

    Comput. Appl. Softw.

    (2011)
  • H. Arabnejad et al.

    List scheduling algorithm for heterogeneous systems by an optimistic cost table

    IEEE Trans. Parallel Distrib. Syst.

    (2014)
  • K. Li et al.

    Scheduling precedence constrained stochastic tasks on heterogeneous cluster systems

    IEEE Trans. Comput.

    (2015)
  • H. Kanemitsu et al.

    Clustering-based task scheduling in a large number of heterogeneous processors

    IEEE Trans. Parallel Distrib. Syst.

    (2016)
  • K. Li et al.

    Energy-efficient stochastic task scheduling on heterogeneous computing systems

    IEEE Trans. Parallel Distrib. Syst.

    (2014)
  • H. Topcuoglu et al.

    Performance-effective and low-complexity task scheduling for heterogeneous computing

    IEEE Trans. Parallel Distrib. Syst.

    (2002)
  • H. Topcuoglu et al.

    Task scheduling algorithms for heterogeneous processors

    Proceedings of the Eighth Heterogeneous Computing Workshop (HCW ’99)

    (1999)
  • M. Lieber et al.

    Highly scalable SFC-based dynamic load balancing and its application to atmospheric modeling

    Fut. Gener. Comput. Syst.

    (2017)
  • K. Qiu et al.

    Paracon: a parallel control plane for scaling up path computation in SDN

    IEEE Trans. Netw. Serv. Manag.

    (2017)
  • C. Chen et al.

    GPU-accelerated parallel hierarchical extreme learning machine on flink for big data

    IEEE Trans. Syst. Man Cybern.: Syst.

    (2017)
  • C. Chen et al.

    Gflink: an in-memory computing architecture on heterogeneous CPU-GPU clusters for big data

    IEEE Trans. Parallel Distrib. Syst.

    (2018)
  • B. Song et al.

    A new genetic algorithm approach to smooth path planning for mobile robots

    Assemb. Autom.

    (2016)
  • T.M. Lakshmi et al.

    A genetic bankrupt ratio analysis tool using a genetic algorithm to identify influencing financial ratios

    IEEE Trans. Evol. Comput.

    (2016)
  • M.A. Rashid et al.

    An enhanced genetic algorithm for ab initio protein structure prediction

    IEEE Trans. Evol. Comput.

    (2016)
  • S.P.H. Alinodehi et al.

    High-speed general purpose genetic algorithm processor

    IEEE Trans. Cybern.

    (2016)

    Dapu Li received the B.S. degree in computer science and technology from Nanjing Army Command College in 2006, and the M.S. degree in applied mathematics from Guilin University of Electronic Technology in 2010. He is currently working toward the Ph.D. degree at Hunan University in parallel algorithms, grid and cloud computing.

    Kenli Li received the Ph.D. degree in computer science from Huazhong University of Science and Technology, China, in 2003. He is currently a full professor of computer science and technology at Hunan University. His major research interests include parallel computing and cloud computing. He has published more than 150 papers in international conferences and journals, such as IEEE-TC and IEEE-TPDS. He is an outstanding member of CCF.

    Jie Liang received the B.S. degree in communication engineering from Henan Normal University in 2011, and the M.S. degree in information and communication engineering from Hunan University in 2014. She is currently working toward the Ph.D. degree at Hunan University, China. Her research interests are mainly in modeling and scheduling of distributed computing systems, optimization and parallel algorithms, game theory, grid and cloud computing.

    Aijia Ouyang received the Ph.D. degree in computer science from Hunan University, Changsha, China, in 2015. He has published over 20 research papers in international conferences and journals on intelligent algorithms and parallel computing. His current research interests include parallel computing, cloud computing, and big data.
