Guest editorial
Learning from delayed rewards

https://doi.org/10.1016/0921-8890(95)00026-C

Cited by (123)

  • A theoretical demonstration for reinforcement learning of PI control dynamics for optimal speed control of DC motors by using Twin Delay Deep Deterministic Policy Gradient Algorithm

    2023, Expert Systems with Applications
    Citation Excerpt :

    Q-Learning, a special type of RL approach, first came out in 1989. Watkins used the letter Q for the value function, which is based on the theory of Markov decision processes (Watkins, 1989; Watkins & Dayan, 1992). However, Q-Learning did not attract much interest in its domains until DQN algorithms were developed (Mnih et al., 2013).

  • A differential evolution with reinforcement learning for multi-objective assembly line feeding problem

    2022, Computers and Industrial Engineering
    Citation Excerpt :

    Finally, a tuple consisting of {s, a, s’, r} is stored for agent learning. Here, we employ Q-learning (Watkins, 1989), a value-based RL algorithm, as the agent. RL uses the state to describe the properties of the environment.

  • Deep understanding of big geospatial data for self-driving: Data, technologies, and systems

    2022, Future Generation Computer Systems
    Citation Excerpt :

    A widely applied method is to represent the reward function as a linear combination of functions of a number of manually selected features [71–74]. Q-Learning [75] is one of the most commonly used RL algorithms. It is a model-free algorithm that learns an estimation of the utility of a state–action pair.
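The excerpts above characterize Q-learning as a value-based, model-free RL algorithm that updates an estimate of the utility of each state–action pair from stored {s, a, s', r} experience. As a rough illustration only, and not the formulation of the editorial itself, the following minimal tabular Q-learning sketch in Python assumes a hypothetical 5-state chain environment and illustrative hyperparameter values:

```python
import random
from collections import defaultdict

# Hypothetical 5-state chain: action 0 moves left, action 1 moves right;
# reaching the rightmost state yields reward 1.0 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)

def step(state, action):
    """Return (next_state, reward, done) for the toy chain environment."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

# Tabular update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative values, not tuned
Q = defaultdict(float)                   # Q-table over (state, action) pairs

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection over the current Q estimates
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r, done = step(s, a)     # the experience tuple {s, a, s', r}
        target = r + (0.0 if done else gamma * max(Q[(s_next, act)] for act in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next

# Learned state values (max over actions) for the toy chain
print({s: max(Q[(s, a)] for a in ACTIONS) for s in range(N_STATES)})
```

The epsilon-greedy rule and the learning-rate/discount values here are placeholders; any environment exposing (next state, reward, done) transitions could be substituted for the toy chain.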


Tel.: +31 20 525-7463, Fax: +31 20 525-7490.
