Rotor resistance and excitation inductance estimation of an induction motor using deep-Q-learning algorithm
Introduction
Indirect field-oriented control (IFOC) is one of the most popular strategies for high-performance induction motor (IM) applications. Because IFOC exploits the inherent slip relation, it is essentially a feed-forward control method, in which accurate values of at least some motor parameters must be estimated to obtain robust control performance. For example, the rotor time constant may not be precisely known, or the correct slip may not be attained owing to motor heating, field weakening, and other variation-induced changes. This results in detuning of the controller and a loss of correct field orientation (Novotny and Lipo, 1996).
To avoid such situations, estimation methods must be introduced to equip the vector controller with accurate IM parameter values. Several methods have been proposed to estimate the motor parameters, including the model reference adaptive system (MRAS), extended Kalman filter (EKF), sliding mode observer, and recursive least squares (Zerdali and Barut, 2017; Yin et al., 2016a,b,c; Yang et al., 2017; Djadi et al., 2017). This class of methods can be summarized as model-based, since their performance depends strongly on the accuracy of the approximated IM model. Model-based methods generally suffer from two shortcomings. First, they are very sensitive to noise: the algorithms can become unstable under noise overlaying the IM model. For example, Shi et al. (2012) proposed an EKF algorithm to estimate the rotor speed and position of a permanent magnet synchronous motor; however, this algorithm is difficult to apply in practice owing to its sensitivity to noise. Second, in some situations, these methods may fail because of parameter perturbation. For example, MRAS cannot be used at low frequencies because of the voltage drop across the stator resistance (Maiti et al., 2008).
Recent advances in artificial intelligence have provided new possibilities for estimating motor parameters in a data-based manner. These strategies can employ a range of training architectures, including artificial neural networks (ANNs), support vector machines (SVMs), genetic algorithms, and particle swarm optimization (PSO) (Papa and Koroušić-Seljak, 2005; Liu et al., 2008; Woodley et al., 2005; Liu et al., 2017). Data-based methods are generally more robust and accurate because they are independent of the motor model. However, they face a significant challenge: most successful data-based applications require considerable hand-labelled training data, which are rather difficult to obtain in practical engineering applications. Karanayil et al. (2007) introduced an ANN to estimate the stator and rotor resistances in sensorless applications; however, no measured dataset is available for training the ANN, and the voltage model of the motor is used instead. In other words, the strategy is model-based rather than data-based. Bilski (2014) presented an SVM-based strategy, but how the dataset was obtained remained unclear. Sakthivel et al. (2010) investigated a multi-objective PSO (MOPSO) strategy to estimate the IM parameters, where the manufacturer data were used as a reference instead of a training dataset.
A feasible way to obtain the labelled data is reinforcement learning, in which the agent can learn the correct knowledge from the environment and generate the labelled data automatically. Recently, the industrial application of reinforcement learning has attracted considerable attention owing to theoretical breakthroughs. Khan et al. (2011) presented a humanoid robotic arm control strategy based on Q-learning and approximate dynamic programming. Yin et al. (2016a) proposed an approximate dynamic programming method to reduce the time delay of affected passengers in a congested metro line. Deisenroth and Rasmussen (2009) presented a Gaussian process reinforcement learning algorithm and applied it to inverted pendulum control in hardware. Yin et al. (2016b) developed an integrated train operation algorithm based on Q-learning to realize real-time train operations while adjusting the timetable online. Zhang et al. (2017) presented an energy-efficient scheduling method for real-time systems based on the deep Q-learning (DQL) model. Prashanth and Bhatnagar (2011) proposed a reinforcement learning algorithm with function approximation that can be applied to traffic signal control.
This paper proposes a data-based method in which the labelled data are obtained in a simple manner. With this objective, an estimation approach based on the DQL algorithm is proposed. In particular, the training dataset is generated automatically during the estimation procedure in DQL, and the network is trained on it simultaneously. With a careful design of the algorithm's architecture, DQL can obtain an unbiased estimate of the parameters after a few iterations of the training procedure.
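As a sketch of how such a self-labelled dataset can be accumulated, the following Python snippet stores the agent's interaction tuples in a replay memory that later serves as training data. This is a generic illustration, not the authors' implementation; the class and parameter names are hypothetical.

```python
from collections import deque
import random


class ReplayMemory:
    """Fixed-size buffer of (obs, action, reward, next_obs) tuples.

    The agent's own interactions serve as the training data, so no
    hand-labelled dataset is needed: each transition is a 'label'
    generated automatically while the estimator runs.
    """

    def __init__(self, capacity=10000):
        # Oldest transitions are discarded automatically once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def sample(self, batch_size, seed=None):
        # Uniform random minibatch for a training step.
        rng = random.Random(seed)
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a DQL-style loop, each estimation step would `push` the latest transition and periodically `sample` a minibatch to update the value network, which is how the dataset and the training proceed simultaneously.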
The main contributions of this paper can be summarized as follows.
(1) Although DQL has recently achieved breakthroughs in some domains, such as video games (Mnih et al., 2013), no study has applied DQL to motor control technology. In this study, DQL is successfully used for motor parameter estimation.
(2) To ensure the feasibility of DQL, the design of the DQL architecture is also discussed, including the design of the observation, action, and reward. Moreover, an indicator, namely Q-sensitivity, is introduced to guarantee the robustness and efficiency of the algorithm; it is demonstrated that an algorithm with high Q-sensitivity is more stable and converges at a much faster rate.
(3) Comparative experiments are designed to illustrate the performance of DQL. Through these experiments, DQL is verified to be more accurate and robust than some traditional methods. The experiments also demonstrate that DQL can generate a maximum-torque output in any state.
The remainder of this paper is organized as follows. In Section 2, the IM model with the parameters to be estimated is established. In Section 3, a new DQL architecture is designed, in which the observation, reward, and action are appropriately constructed. The experimental and comparative studies are illustrated in Section 4. Finally, the conclusion is drawn in Section 5.
Section snippets
IM model
The mathematical model of the IM in the d–q axis voltage form can be given as

$$u_{sd} = R_s i_{sd} + \frac{d\psi_{sd}}{dt} - \omega_e \psi_{sq}, \qquad u_{sq} = R_s i_{sq} + \frac{d\psi_{sq}}{dt} + \omega_e \psi_{sd}$$

$$0 = R_r i_{rd} + \frac{d\psi_{rd}}{dt} - (\omega_e - \omega_r) \psi_{rq}, \qquad 0 = R_r i_{rq} + \frac{d\psi_{rq}}{dt} + (\omega_e - \omega_r) \psi_{rd}$$

The motor torque is obtained as

$$T_e = \frac{3}{2} n_p \left( \psi_{sd} i_{sq} - \psi_{sq} i_{sd} \right)$$

where $\omega_e$ is the electrical angular velocity; $\omega_r$ is the rotor electrical angular velocity; $T_e$ is the electromagnetic torque; $n_p$ is the pole-pair number of the motor; $u_{sd}$, $u_{sq}$, $i_{sd}$, and $i_{sq}$ are the d–q axis stator voltages and currents; $R_s$ and $R_r$ indicate the stator and rotor resistances, respectively; and $L_m$ indicates the excitation inductance. Moreover, the stator and rotor flux linkages are given by $\psi_{sd} = L_s i_{sd} + L_m i_{rd}$, $\psi_{sq} = L_s i_{sq} + L_m i_{rq}$, $\psi_{rd} = L_r i_{rd} + L_m i_{sd}$, and $\psi_{rq} = L_r i_{rq} + L_m i_{sq}$, where $L_s$ and $L_r$ are the stator and rotor inductances.
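For illustration, the standard d–q IM relations can be evaluated numerically. The following Python sketch assumes the standard flux-linkage and torque expressions of the d–q model; the function and symbol names are our own choices, not taken from the paper.

```python
def flux_linkages(L_s, L_r, L_m, i_sd, i_sq, i_rd, i_rq):
    """Stator and rotor flux linkages of the standard d-q IM model.

    L_s, L_r: stator and rotor inductances; L_m: excitation inductance.
    """
    psi_sd = L_s * i_sd + L_m * i_rd
    psi_sq = L_s * i_sq + L_m * i_rq
    psi_rd = L_r * i_rd + L_m * i_sd
    psi_rq = L_r * i_rq + L_m * i_sq
    return psi_sd, psi_sq, psi_rd, psi_rq


def torque(n_p, psi_sd, psi_sq, i_sd, i_sq):
    """Electromagnetic torque T_e = (3/2) n_p (psi_sd*i_sq - psi_sq*i_sd)."""
    return 1.5 * n_p * (psi_sd * i_sq - psi_sq * i_sd)
```

For example, with two pole pairs, a d-axis stator flux of 0.9 Wb, and a q-axis current of 10 A, `torque(2, 0.9, 0.0, 0.0, 10.0)` evaluates to 27 N·m.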
Reinforcement learning and Markov decision process
The task of motor parameter estimation is shown in Fig. 1, where the algorithm interacts with the environment through a sequence of actions $a_t$, observations $o_t$, and rewards $r_t$. At each time step, the algorithm selects an action $a_t$ from the set of actions $\mathcal{A}$. The action is passed to the environment and modifies the observations, which are internal signals of the motor controller. Simultaneously, the algorithm receives a reward in the form of the motor electromagnetic torque. Note that all motor
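The interaction loop described here can be sketched generically in Python. This is an illustrative sketch only: the environment step function, Q-function, and action set below are placeholders, not the paper's actual DQL components.

```python
import random


def run_episode(env_step, q_values, actions, obs0, steps=50, eps=0.1, seed=0):
    """Generic agent-environment loop.

    At each step the agent picks an action (epsilon-greedy on the given
    Q-function), applies it via env_step(obs, action) -> (next_obs, reward),
    and records the transition (obs, action, reward, next_obs).
    """
    rng = random.Random(seed)
    obs = obs0
    transitions = []
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice(actions)          # explore
        else:
            a = max(actions, key=lambda act: q_values(obs, act))  # exploit
        next_obs, reward = env_step(obs, a)
        transitions.append((obs, a, reward, next_obs))
        obs = next_obs
    return transitions
```

In the paper's setting, `obs` would correspond to internal controller signals, `actions` to parameter adjustments, and `reward` to the measured electromagnetic torque; the recorded transitions then form the self-generated training set.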
Experiment environment
During the experiments, an IM benchmark is employed as the experimental equipment, as illustrated in Fig. 5. The benchmark consists of a test motor (design parameters are listed in Table 2), a dynamometer motor, a torque–speed transducer, a data collector, and two motor controllers. The test motor operates in the IFOC torque loop, while the dynamometer motor operates in the IFOC speed loop. The torque and current signals can be obtained from the torque–speed transducer
Conclusion
This paper proposes a parameter estimation method for the rotor resistance and excitation inductance in an IM's IFOC application. Owing to the demonstrated disadvantages of model-based methods, a model-free method based on DQL is proposed. In particular, the structure of the DQL parameter estimation is constructed, and the DQL elements, including the observation, reward, and action, are designed such that the feasibility of the algorithm is ensured. Moreover, a new concept of 'Q-sensitivity' is introduced to guarantee the robustness and efficiency of the algorithm.
References (28)

Bilski, 2014. Application of support vector machines to the induction motor parameters identification. Measurement.

Khan et al., 2011. A novel Q-learning based adaptive optimal controller implementation for a humanoid robotic arm. IFAC Proc.

Liu et al., 2008. Particle swarm optimization-based parameter identification applied to permanent magnet synchronous motors. Eng. Appl. Artif. Intell.

Papa and Koroušić-Seljak, 2005. An artificial intelligence approach to the efficiency improvement of a universal motor. Eng. Appl. Artif. Intell.

Sakthivel et al., 2010. Multi-objective parameter estimation of induction motor using particle swarm optimization. Eng. Appl. Artif. Intell.

Woodley et al., 2005. Neural network modeling of torque estimation and d-q transformation for induction machine. Eng. Appl. Artif. Intell.

Yin et al., 2016. Energy-efficient metro train rescheduling with uncertain time-variant passenger demands: An approximate dynamic programming approach. Transp. Res. B.

Deisenroth, M.P., Rasmussen, C.E., 2009. Efficient reinforcement learning for motor control. In: International PhD...

Djadi et al., 2017. Parameters identification of a brushless doubly fed induction machine using PRBS excitation signal for recursive least squares method. IET Electr. Power Appl.

Karanayil et al., 2007. Online stator and rotor resistance estimation scheme using artificial neural networks for vector controlled speed sensorless induction motor drive. IEEE Trans. Ind. Electron.

Parameter estimation of wound-rotor induction motors from transient measurements. IEEE Trans. Energy Convers.

Liu et al., 2017. GPU implementation of DPSO-RE algorithm for parameters identification of surface PMSM considering VSI nonlinearity. IEEE J. Emerg. Sel. Top. Power Electron.

Maiti et al., 2008. Model reference adaptive controller-based rotor resistance and speed estimation techniques for vector controlled induction motor drive utilizing reactive power. IEEE Trans. Ind. Electron.

Mnih et al., 2013. Playing Atari with deep reinforcement learning. Comput. Sci.