
Electric Power Systems Research

Volume 116, November 2014, Pages 87-93

GPU-based power flow analysis with Chebyshev preconditioner and conjugate gradient method

https://doi.org/10.1016/j.epsr.2014.05.005

Highlights

  • In this paper, a GPU-based Chebyshev preconditioner is developed and integrated with an iterative conjugate gradient (CG) solver.

  • This work targets the power flow equations in power systems, as well as any sparse linear system that is symmetric positive definite.

  • Our work considers the Chebyshev preconditioner and the conjugate gradient method together to choose a proper degree for the Chebyshev preconditioner.

  • The maximum speedup reaches 46× for the Chebyshev preconditioner and 4× for the CG solver across all sample systems, compared with the corresponding Matlab baseline.

  • The results suggest that the iterative solver should be considered to further improve the overall performance of solving linear equations.

Abstract

Traditionally, linear equations in power system applications are solved by direct methods based on LU decomposition. With the development of advanced power system controls, the industrial and research communities are increasingly interested in simulating larger, interconnected power grids. Iterative methods such as the conjugate gradient method have been applied to power system applications in the literature for their parallelism potential on larger systems. The preconditioner, which conditions the linear system for a better convergence rate in iterative computations, is an indispensable part of the iterative solution process. This work implements a Chebyshev polynomial preconditioner on the graphics processing unit (GPU) and integrates it with a GPU-based conjugate gradient solver. Results show that the GPU-based Chebyshev preconditioner can reach around 46× speedup for the largest test system, and the conjugate gradient solver can gain more than 4× speedup. This demonstrates great potential for GPU applications in power system simulation.

Introduction

Power flow is the fundamental component in power system analysis and simulation. It is usually modeled as a nonlinear system. The Newton–Raphson method converts this nonlinear system to a group of linear equations with the introduction of the Jacobian matrix, as Eq. (1) shows:

$$\begin{bmatrix}\Delta P\\ \Delta Q\end{bmatrix} = \begin{bmatrix}\dfrac{\partial \Delta P}{\partial \delta} & \dfrac{\partial \Delta P}{\partial V}\\[4pt] \dfrac{\partial \Delta Q}{\partial \delta} & \dfrac{\partial \Delta Q}{\partial V}\end{bmatrix} \begin{bmatrix}\Delta \delta\\ \Delta V\end{bmatrix} \tag{1}$$

Each iteration of the Newton–Raphson method requires solving a set of sparse linear equations. We measured the linear equation solving time and the total run time for large systems in MATPOWER. The results show that about 40–50% of the total time is spent on solving linear equations. Therefore, improving the efficiency of solving the linear system is of great importance for accelerating power flow analysis.
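
For context, a single Newton–Raphson step can be sketched as follows; this is the standard textbook form, with generic symbols $x$, $f$, and $J$ that are not taken from the paper itself:

```latex
% One Newton-Raphson step for the nonlinear power flow mismatch equations f(x) = 0,
% where x stacks the bus voltage angles and magnitudes and J is the Jacobian of f.
J(x_k)\,\Delta x_k = -f(x_k), \qquad x_{k+1} = x_k + \Delta x_k
```

The linear system solved at each step is exactly the sparse Jacobian system of Eq. (1), which is why the linear solver dominates the run time.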

LU factorization, the most commonly used direct method, is widely deployed in solving power flows. However, LU factorization is intrinsically a serial algorithm and is difficult to parallelize due to the tight data dependency during factorization. LU factorization is effective for smaller systems, while its performance is limited for larger systems. Leon and Semlyen have shown that the iterative solver can provide about 25% performance improvement over the LU direct solver for systems larger than a 3183-bus system [1].

On one hand, iterative methods have been adapted to power system computation in various aspects. Pai et al. [2] implemented the Generalized Minimal Residual (GMRES) method on a Cray machine for dynamic power system simulation. Pai and Dag [3] further applied several iterative solvers, including conjugate gradient and GMRES, to dynamic power flow simulation and state estimation. On the other hand, the graphics processing unit (GPU) has recently been widely adopted in high performance computing as a parallel hardware architecture. The GPU was originally designed for graphics display and processing, and it has massive parallel computing units on board to perform graphics computations. The Compute Unified Device Architecture (CUDA) [4] provides a C-like programming interface for users to utilize these computational resources. As a co-processor, the GPU helps a commodity server deliver more computational throughput.

There has been research on GPU implementations of iterative methods in different fields. Helfenstein and Koko implemented an SSOR preconditioner and conjugate gradient solver on the GPU to solve generalized Poisson equations [5]. Zhang and Zhang presented a sliced block ELLPACK format to implement a least-squares polynomial preconditioned conjugate gradient method for finite element problems [6]. Research based on high performance computing, especially GPU-related methods, has begun to emerge in power system applications as well. Garcia implemented a preconditioned biconjugate gradient method with GPU and CUDA [7]. Li et al. [8] discussed the limitation of using a direct solver for large systems, and introduced a GPU-based conjugate gradient normal residual method with a Jacobi preconditioner. A multifrontal method using the CUDA library was also adapted to solve AC power flow [9]. Gopal et al. [10] implemented a DC power flow based contingency analysis on the GPU. Kamiabad also presented a prototype implementation of a Chebyshev polynomial preconditioner and conjugate gradient method with CUBLAS [11]. However, the speedup of their Chebyshev preconditioner is limited to 8×.

In this work, a Chebyshev polynomial preconditioner is implemented on the graphics processing unit (GPU) and integrated with a GPU-based conjugate gradient solver for linearized DC power flows. Results show that our GPU-based Chebyshev preconditioner can reach around 46× speedup and the conjugate gradient solver can gain more than 4× speedup compared with the corresponding CPU implementation. This demonstrates great potential for GPU applications in power systems.

The rest of this paper is organized as follows. Section 2 takes a closer look at iterative solution methods and the Chebyshev polynomial preconditioner. Section 3 introduces GPU and CUDA technology in detail. Section 4 presents the algorithm used in this work and the corresponding GPU-based implementation. Computational experiments are shown in Section 5. A further discussion is given in Section 6. Section 7 concludes the paper.

Section snippets

Conjugate gradient

The linear system Ax = b can be solved either directly by using LU decomposition, or indirectly by finding the minimum of the quadratic form in Eq. (2):

$$f(x) = \frac{1}{2}x^{T}Ax - b^{T}x + c \tag{2}$$

In Eq. (2), A is a symmetric positive definite matrix and b is a vector. The gradient of Eq. (2) is $f'(x) = Ax - b$. Therefore, the x that minimizes Eq. (2) satisfies $Ax - b = 0$; thus, it is the solution of Ax = b as well.

An intuitive way to find the minimum of Eq. (2) is to use the steepest descent
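
For reference, the standard steepest-descent and conjugate gradient updates for Eq. (2) can be sketched as follows; this is the textbook form, not necessarily the exact presentation used in the paper:

```latex
% Steepest descent: move along the negative gradient, i.e. the residual r_k = b - A x_k
\alpha_k = \frac{r_k^{T} r_k}{r_k^{T} A r_k}, \qquad x_{k+1} = x_k + \alpha_k r_k
% Conjugate gradient: replace raw residuals with A-conjugate search directions p_k
\alpha_k = \frac{r_k^{T} r_k}{p_k^{T} A p_k}, \qquad
x_{k+1} = x_k + \alpha_k p_k, \qquad
r_{k+1} = r_k - \alpha_k A p_k
\beta_k = \frac{r_{k+1}^{T} r_{k+1}}{r_k^{T} r_k}, \qquad
p_{k+1} = r_{k+1} + \beta_k p_k
```

The only matrix operation per iteration is the sparse matrix–vector product $A p_k$, which is the main source of GPU parallelism.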

GPU and CUDA

GPU for general purpose computation has been widely deployed nowadays. The GPU was originally designed for graphics processing, which requires intensive floating point computation. Before the release of CUDA, only a few general purpose computations could run on the GPU, and they had to be performed through the graphics application programming interface (API). The high learning cost of the graphics API limited the development of general purpose computation on GPUs. CUDA introduces a C-like
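
As a minimal illustration of this C-like programming model (a generic example, not code from the paper), a double-precision AXPY kernel and its launch look like this:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// y = a*x + y, one thread per vector element
__global__ void daxpy(int n, double a, const double *x, double *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    double *x, *y;
    // Managed memory keeps the example short; cudaMalloc/cudaMemcpy also work
    cudaMallocManaged(&x, n * sizeof(double));
    cudaMallocManaged(&y, n * sizeof(double));
    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    daxpy<<<blocks, threads>>>(n, 3.0, x, y);   // C-like kernel launch syntax
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                // expect 5.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```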

Chebyshev preconditioner algorithm

The Chebyshev preconditioner algorithm is presented in Fig. 1. β is the largest eigenvalue of matrix A, and α is the smallest eigenvalue of matrix A. ratio is used to estimate the value of α. Z transforms A's spectrum to [−1,1]. The decay rate c_k is related to α and β. r is the degree of the Chebyshev preconditioner. Dag and Semlyen [12] discussed how to choose ratio and r in detail. Matrix G is the approximation of A's inverse. The output of the Chebyshev preconditioner algorithm is matrix G. Bolded lines are
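
As a hedged sketch of the construction described above (the exact coefficient choices follow Dag and Semlyen [12]), the spectrum transform and the Chebyshev polynomial approximation of A's inverse take the form:

```latex
% Map the spectrum of A from [alpha, beta] onto [-1, 1]
Z = \frac{2A - (\beta + \alpha)I}{\beta - \alpha}
% Chebyshev polynomials of Z via the three-term recurrence
T_0(Z) = I, \qquad T_1(Z) = Z, \qquad T_{k+1}(Z) = 2Z\,T_k(Z) - T_{k-1}(Z)
% Degree-r polynomial preconditioner approximating the inverse of A;
% the coefficients c_k decay at a rate governed by alpha and beta (see [12])
G = \sum_{k=0}^{r} c_k\, T_k(Z) \approx A^{-1}
```

Because G is built entirely from matrix–matrix products with A, its construction maps naturally onto the GPU, in contrast to factorization-based preconditioners.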

Computational experiment

This section presents the computational experiment results, beginning with selecting the degree for the Chebyshev preconditioner, followed by the performance comparison between the Matlab implementation and our GPU implementation. Finally, further performance improvement is discussed. Since Matlab's default floating point precision is double precision, our GPU implementations are all based on double precision floating point numbers for a fair comparison.

Note, throughout the

Discussion

Our work discusses the GPU-based implementation of an iterative solver, the conjugate gradient solver, and a polynomial preconditioner, the Chebyshev preconditioner. Because of their potential in parallelism and scalability, iterative linear solvers have been adapted to power system applications [3], [12], [21]. The preconditioner plays an important role in the iterative solver. Previously, preconditioners like ILU were widely used to precondition the matrix. However, they suffer from a tight data dependency

Conclusion

Power system applications such as power system optimization, control and analysis require intensive computational capability [22]. Solving sparse linear systems is a critical computational element in these applications. Our work presents a GPU-based Chebyshev preconditioner and integrates it with the iterative conjugate gradient solver for a complete iterative solution chain. Our implementation uses native functions from the CUSPARSE and CUBLAS libraries, which are already optimized. Implementations based
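
To illustrate the kind of solver loop such library calls support, the following is a minimal sketch of an unpreconditioned conjugate gradient iteration in CUDA, using CUBLAS for the vector operations and a hand-written CSR sparse matrix–vector product. It is an illustrative example under our own assumptions (tiny hard-coded test matrix, no preconditioner), not the paper's actual implementation, which additionally applies the Chebyshev preconditioner and CUSPARSE routines.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cstdio>

// y = A*x for a CSR matrix, one thread per row
__global__ void spmv_csr(int n, const int *rowPtr, const int *colIdx,
                         const double *val, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double sum = 0.0;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += val[j] * x[colIdx[j]];
        y[row] = sum;
    }
}

int main() {
    // Tiny SPD test matrix in CSR form: [[4,1,0],[1,3,1],[0,1,2]], right-hand side b
    const int n = 3;
    int hRow[] = {0, 2, 5, 7}, hCol[] = {0, 1, 0, 1, 2, 1, 2};
    double hVal[] = {4, 1, 1, 3, 1, 1, 2}, hB[] = {1, 2, 3};

    int *dRow, *dCol;  double *dVal, *dX, *dR, *dP, *dAp;
    cudaMalloc(&dRow, sizeof(hRow));  cudaMalloc(&dCol, sizeof(hCol));
    cudaMalloc(&dVal, sizeof(hVal));
    cudaMalloc(&dX, n * sizeof(double));  cudaMalloc(&dR, n * sizeof(double));
    cudaMalloc(&dP, n * sizeof(double));  cudaMalloc(&dAp, n * sizeof(double));
    cudaMemcpy(dRow, hRow, sizeof(hRow), cudaMemcpyHostToDevice);
    cudaMemcpy(dCol, hCol, sizeof(hCol), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, hVal, sizeof(hVal), cudaMemcpyHostToDevice);
    cudaMemset(dX, 0, n * sizeof(double));                           // x0 = 0
    cudaMemcpy(dR, hB, n * sizeof(double), cudaMemcpyHostToDevice);  // r0 = b
    cudaMemcpy(dP, hB, n * sizeof(double), cudaMemcpyHostToDevice);  // p0 = r0

    cublasHandle_t h;  cublasCreate(&h);
    double rr, rrNew, pAp;
    cublasDdot(h, n, dR, 1, dR, 1, &rr);

    for (int it = 0; it < 100 && rr > 1e-12; ++it) {
        spmv_csr<<<1, 256>>>(n, dRow, dCol, dVal, dP, dAp);          // Ap = A*p
        cublasDdot(h, n, dP, 1, dAp, 1, &pAp);
        double alpha = rr / pAp, nalpha = -alpha;
        cublasDaxpy(h, n, &alpha, dP, 1, dX, 1);                     // x += alpha*p
        cublasDaxpy(h, n, &nalpha, dAp, 1, dR, 1);                   // r -= alpha*Ap
        cublasDdot(h, n, dR, 1, dR, 1, &rrNew);
        double beta = rrNew / rr, one = 1.0;
        cublasDscal(h, n, &beta, dP, 1);                             // p = r + beta*p
        cublasDaxpy(h, n, &one, dR, 1, dP, 1);
        rr = rrNew;
    }

    double hX[3];
    cudaMemcpy(hX, dX, n * sizeof(double), cudaMemcpyDeviceToHost);
    printf("x = [%f, %f, %f]\n", hX[0], hX[1], hX[2]);

    cublasDestroy(h);
    cudaFree(dRow); cudaFree(dCol); cudaFree(dVal);
    cudaFree(dX); cudaFree(dR); cudaFree(dP); cudaFree(dAp);
    return 0;
}
```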

Acknowledgement

This work was supported in part by US NSF grant ECCS-1128381. This work also made use of the Shared Facilities and the Industry Partnership Program supported by CURENT, an Engineering Research Center (ERC) Program of NSF and DOE under NSF grant EEC-1041877.


References (22)

  • A.A. Kamiabad, Implementing a Preconditioned Iterative Linear Solver Using Massively Parallel Graphics Processing Units, 2011.

Xue Li received her BS degree from Northwestern Polytechnical University China in 2007 and MS degree in 2011 from University of Tennessee. She is presently pursuing her Ph.D. degree in the Department of EECS, The University of Tennessee, Knoxville, TN, 37922, USA. Her research interest is computational methods for power system analysis.

Fangxing Li, also known as Fran Li, received his Ph.D. degree from Virginia Tech in 2001. Presently, he is an Associate Professor at The University of Tennessee, Knoxville, TN, USA. Dr. Li is a registered Professional Engineer (P.E.) in North Carolina, and a Fellow of IET (formerly IEE).
