Efficient fast learning automata
Introduction
Adaptive learning is one of the main fields of artificial intelligence. The learning automaton (LA) [3], [4], [5], [6], [7], [8], [9], [10] is one of the most powerful tools in this research area. A learning automaton is a finite state machine that interacts with a stochastic environment and tries to learn, via a learning process, the optimal action that the environment offers. The learning process is as follows (Fig. 1). The automaton chooses one of the offered actions according to a probability vector, which at any time instant contains the probability of choosing each action. The chosen action triggers the environment, which responds with an answer (reward or penalty) according to the reward probability of the chosen action. The automaton takes this answer into account and modifies the probability vector by means of a learning algorithm. An automaton is said to learn when it identifies the action with the maximum probability of being rewarded and ultimately chooses this action more frequently than the others.
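As a concrete illustration of this loop, the sketch below implements one iteration of the classical continuous L_RP (linear reward-penalty) scheme against a simulated stochastic environment. The step sizes a and b and the reward probabilities are illustrative values, not parameters taken from this paper.

```python
import random

def lrp_step(p, reward_prob, a=0.05, b=0.05):
    """One iteration of the classical continuous L_RP automaton.

    p           -- action probability vector (entries sum to 1)
    reward_prob -- reward_prob[i]: probability that the environment
                   rewards action i (the stochastic environment)
    a, b        -- reward and penalty step sizes (illustrative values)
    """
    r = len(p)
    # Action selection: sample an action according to the current vector.
    i = random.choices(range(r), weights=p)[0]
    # The environment answers with reward or penalty.
    if random.random() < reward_prob[i]:
        # Reward: move probability mass toward the chosen action.
        p = [p[j] + a * (1 - p[j]) if j == i else (1 - a) * p[j]
             for j in range(r)]
    else:
        # Penalty: move probability mass away from the chosen action.
        p = [(1 - b) * p[j] if j == i else b / (r - 1) + (1 - b) * p[j]
             for j in range(r)]
    return i, p
```

Iterating `lrp_step` gradually shifts the probability vector toward the action with the highest reward probability while the automaton remains ergodic.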
We refer the reader to the survey papers by Narendra and Thathachar [6] and by Narendra and Lakshmivarahan [7] for a review of the various families and the corresponding properties of learning automata.
With respect to their Markovian representations, learning automata are classified into two main categories: ergodic automata and automata possessing absorbing barriers. Ergodic automata converge to a distribution that is independent of the initial state. Automata with absorbing states are, after a finite number of steps, locked into a specific state. When the reward probabilities of the actions are time-variant (a non-stationary environment), ergodic automata are preferred, because they are capable of adapting to environmental changes. Absorbing automata, on the other hand, are used in environments with fixed reward probabilities (stationary environments).
With respect to the values that the action probabilities can take, learning automata are classified into continuous and discretized ones. In continuous LA, the action probabilities can take any value in the [0,1] interval. In discretized LA, the action probabilities can take values from a finite set only.
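For instance, in many discretized schemes the allowed probability values are integer multiples of a smallest step 1/(rN), where r is the number of actions and N is a resolution parameter; the values of r and N below are purely illustrative.

```python
# Discretized LA: with r = 4 actions and resolution N = 5, the action
# probabilities are restricted to integer multiples of 1/(r*N) = 1/20.
r, N = 4, 5
allowed = [k / (r * N) for k in range(r * N + 1)]
print(allowed[:5])  # [0.0, 0.05, 0.1, 0.15, 0.2]
```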
LA have a wide range of applications [3], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23] in real-time and non-real-time systems. Applications of LA include: adaptive choice of cyclic code in communication systems [3], adaptive control of communication networks [11], [12], [13], [14], [15], [16], string taxonomy [17], associative learning of Boolean functions [18], object partitioning [19], [20], capacity assignment [21] and optimization of multimodal functions [22], [23].
When LA are applied to real-time systems––such as communication networks––it is critical to keep the computational time of the LA low. In this paper, a new ergodic discretized LA which is capable of supporting high-speed real-time applications is introduced. The proposed LA has a unique characteristic: it is capable of performing both probability updating and action selection with a computational complexity that is independent of the number of actions. Apart from its low computational complexity, the proposed automaton is capable of achieving high performance when operating in non-stationary stochastic environments.
The paper is organized as follows. The proposed LA is presented in Section 2. Its implementation is discussed in Section 3, while the simulation results, which indicate the superiority of the proposed automaton over the well-known LRP scheme, are presented in Section 4. Finally, concluding remarks are given in Section 5.
Section snippets
The proposed fast learning automaton (FLA)
The operation of a LA during one iteration consists of two basic functions:
- (a) Probability updating. Based on the environmental response to the selected action, the automaton modifies the probability distribution over the set of actions.
- (b) Action selection. Based on the new probability distribution, the automaton selects a new action that triggers the environment.
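Implemented in the straightforward way, both functions cost O(r) time per iteration for r actions: the update rule touches every entry of the probability vector, and selection scans the cumulative distribution, as in this sketch (a generic roulette-wheel selector, not the FLA's method):

```python
import random

def select_action(p):
    """Naive action selection: walk the cumulative distribution, O(r)."""
    u = random.random()
    acc = 0.0
    for i, pi in enumerate(p):
        acc += pi
        if u < acc:
            return i
    return len(p) - 1  # guard against floating-point round-off
```

It is precisely this linear dependence on r that the FLA is designed to eliminate.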
When the LA is applied to a non-real-time application, then its computational time is less important. However, when the LA is applied to a
A fast implementation of the FLA
During one iteration, the FLA has to update the probability distribution and then select an action according to the new distribution. A fast algorithm that implements these operations is presented below. The proposed algorithm has two interesting characteristics: (a) its computational time is independent of the number of actions r, and (b) it uses integer arithmetic. Therefore, it executes very fast, regardless of the number of actions.
Our algorithm is based on keeping a
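The snippet breaks off before describing the data structure the FLA actually keeps. As a hedged illustration of how constant-time operation is achievable at all, the sketch below stores a discretized distribution as an urn of r*N integer tokens: selection is one uniform draw, and a reward moves a single token toward the chosen action, so both operations are O(1) in r and use integer arithmetic only. This is a generic urn construction, not necessarily the FLA's mechanism.

```python
import random

class UrnLA:
    """Urn-based discretized LA sketch: O(1) selection and updating.

    Action i implicitly has probability count(i) / (r * N), where
    count(i) is the number of urn slots holding token i.
    """

    def __init__(self, r, N):
        self.total = r * N
        # Start from the uniform distribution: N tokens per action.
        self.urn = [i for i in range(r) for _ in range(N)]
        self.slot = 0

    def select(self):
        # O(1) action selection: draw one uniformly random token.
        self.slot = random.randrange(self.total)
        return self.urn[self.slot]

    def reward(self):
        # O(1) update: overwrite a random "donor" slot with the selected
        # action's token, raising its probability by 1/total (a no-op if
        # the donor slot already holds the selected action).
        donor = random.randrange(self.total)
        self.urn[donor] = self.urn[self.slot]
```

Because only integer indices and counts are manipulated, the cost per iteration stays constant no matter how many actions the automaton offers.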
Simulation results
In the previous sections, it was shown that the FLA is characterized by a low computational complexity. It remains to examine whether it is capable of operating efficiently in non-stationary stochastic environments. In order to study the performance of the FLA, we compare it with the well-known LRP [5], [6] scheme. It should be noted that LRI [5] cannot be compared with the FLA because it has absorbing states and, consequently, its performance is very poor when operating in a non-stationary
Conclusion
This paper has presented a new LA which is capable of performing both probability updating and action selection in a few nanoseconds, regardless of the number of actions. This property is very attractive for high-speed real-time applications, where the computational time of the automaton is a bottleneck.
Apart from its low computational overhead, the proposed FLA scheme achieves a high performance when operating in non-stationary stochastic environments. It is clear that FLA is not ϵ-optimal [5]
References (25)
- et al., WDM passive star networks: a learning automata based architecture, Computer Communications (1996)
- Artificial neural networks to systems, man and cybernetics: characteristics, structures, and applications, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics (1998)
- M.S. Obaidat, G.I. Papadimitriou, A.S. Pomportsis (Eds.), Learning automata: theory, paradigms and applications, IEEE...
- et al., Learning Automata: Theory and Applications (1994)
- et al., Learning Automata and Stochastic Optimization (1997)
- et al., Learning Automata: An Introduction (1989)
- et al., Learning automata––a survey, IEEE Transactions on Systems, Man and Cybernetics (1974)
- et al., Learning automata––a critique, Journal of Cybernetics and Information Sciences (1977)
- Absorbing and ergodic discretized two-action learning automata, IEEE Transactions on Systems, Man and Cybernetics (1986)
- A new approach to the design of reinforcement schemes for learning automata: stochastic estimator learning algorithms, IEEE Transactions on Knowledge and Data Engineering (1994)
- Hierarchical discretized pursuit nonlinear learning automata with rapid convergence and high accuracy, IEEE Transactions on Knowledge and Data Engineering
- Learning automata-based receiver conflict avoidance algorithms for WDM broadcast-and-select star networks, IEEE/ACM Transactions on Networking