Efficient fast learning automata
Introduction
Adaptive learning is one of the main fields of artificial intelligence. The learning automaton (LA) [3], [4], [5], [6], [7], [8], [9], [10] is one of the most powerful tools in this research area. A learning automaton is a finite state machine that interacts with a stochastic environment and tries to learn, via a learning process, the optimal action that the environment offers. The learning process is as follows (Fig. 1). The automaton chooses one of the offered actions according to a probability vector, which at any time instant contains the probability of choosing each action. The chosen action triggers the environment, which responds with an answer (reward or penalty) according to the reward probability of the chosen action. The automaton takes this answer into account and modifies the probability vector by means of a learning algorithm. An automaton is said to learn when it identifies the action with the maximum probability of being rewarded and ultimately chooses this action more frequently than the others.
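As a concrete illustration of this loop, the sketch below implements one iteration of the classical continuous L_RP (linear reward-penalty) scheme against a simulated stochastic environment. The step sizes a and b and the reward probabilities are illustrative values, not parameters taken from this paper.

```python
import random

def lrp_step(p, reward_prob, a=0.05, b=0.05):
    """One iteration of the classical continuous L_RP automaton.

    p           -- action probability vector (entries sum to 1)
    reward_prob -- reward_prob[i]: probability that the environment
                   rewards action i (the stochastic environment)
    a, b        -- reward and penalty step sizes (illustrative values)
    """
    r = len(p)
    # Action selection: sample an action according to the current vector.
    i = random.choices(range(r), weights=p)[0]
    # The environment answers with reward or penalty.
    if random.random() < reward_prob[i]:
        # Reward: move probability mass toward the chosen action.
        p = [p[j] + a * (1 - p[j]) if j == i else (1 - a) * p[j]
             for j in range(r)]
    else:
        # Penalty: move probability mass away from the chosen action.
        p = [(1 - b) * p[j] if j == i else b / (r - 1) + (1 - b) * p[j]
             for j in range(r)]
    return i, p
```

Iterating `lrp_step` gradually shifts the probability vector toward the action with the highest reward probability while the automaton remains ergodic.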
We refer the reader to the survey papers by Narendra and Thathachar [6] and by Narendra and Lakshmivarahan [7] for a review of the various families and the corresponding properties of learning automata.
With respect to their Markovian representations, learning automata are classified into two main categories: ergodic automata and automata possessing absorbing barriers. Ergodic automata converge to a distribution that is independent of the initial state. Automata with absorbing states are, after a finite number of steps, locked into a specific state. When the reward probabilities of the actions are time-variant (a non-stationary environment), ergodic automata are preferred, because they are capable of adapting to environmental changes. Absorbing automata, on the other hand, are used in environments with fixed reward probabilities (stationary environments).
With respect to the values that the action probabilities can take, learning automata are classified into continuous and discretized ones. In continuous LA, the action probabilities can take any value in the [0,1] interval. In discretized LA, the action probabilities can take values from a finite set only.
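For instance, in many discretized schemes the allowed probability values are integer multiples of a smallest step 1/(rN), where r is the number of actions and N is a resolution parameter; the values of r and N below are purely illustrative.

```python
# Discretized LA: with r = 4 actions and resolution N = 5, the action
# probabilities are restricted to integer multiples of 1/(r*N) = 1/20.
r, N = 4, 5
allowed = [k / (r * N) for k in range(r * N + 1)]
print(allowed[:5])  # [0.0, 0.05, 0.1, 0.15, 0.2]
```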
LA have a wide range of applications [3], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23] in real-time and non-real-time systems. Applications of LA include: adaptive choice of cyclic code in communication systems [3], adaptive control of communication networks [11], [12], [13], [14], [15], [16], string taxonomy [17], associative learning of Boolean functions [18], object partitioning [19], [20], capacity assignment [21] and optimization of multimodal functions [22], [23].
When LA are applied to real-time systems––such as communication networks––it is critical to keep the computational time of the LA low. In this paper, a new ergodic discretized LA which is capable of supporting high-speed real-time applications is introduced. The proposed LA has a unique characteristic: it is capable of performing both probability updating and action selection with a computational complexity that is independent of the number of actions. Apart from its low computational complexity, the proposed automaton is capable of achieving high performance when operating in non-stationary stochastic environments.
The paper is organized as follows. The proposed LA is presented in Section 2. Its implementation is discussed in Section 3, while the simulation results, which indicate the superiority of the proposed automaton over the well-known LRP scheme, are presented in Section 4. Finally, concluding remarks are given in Section 5.
Section snippets
The proposed fast learning automaton (FLA)
The operation of a LA during one iteration consists of two basic functions:
- (a) Probability updating. Based on the environmental response to the selected action, the automaton modifies the probability distribution over the set of actions.
- (b) Action selection. Based on the new probability distribution, the automaton selects a new action that triggers the environment.
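Implemented in the straightforward way, both functions cost O(r) time per iteration for r actions: the update rule touches every entry of the probability vector, and selection scans the cumulative distribution, as in this sketch (a generic roulette-wheel selector, not the FLA's method):

```python
import random

def select_action(p):
    """Naive action selection: walk the cumulative distribution, O(r)."""
    u = random.random()
    acc = 0.0
    for i, pi in enumerate(p):
        acc += pi
        if u < acc:
            return i
    return len(p) - 1  # guard against floating-point round-off
```

It is precisely this linear dependence on r that the FLA is designed to eliminate.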
When the LA is applied to a non-real-time application, then its computational time is less important. However, when the LA is applied to a
A fast implementation of the FLA
During one iteration, the FLA has to update the probability distribution and then select an action according to the new distribution. A fast algorithm that implements these operations is presented below. The proposed algorithm has two interesting characteristics: (a) its computational time is independent of the number of actions r, and (b) it uses integer arithmetic. Therefore, it executes very fast, regardless of the number of actions.
Our algorithm is based on keeping a
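The snippet breaks off before describing the data structure the FLA actually keeps. As a hedged illustration of how constant-time operation is achievable at all, the sketch below stores a discretized distribution as an urn of r*N integer tokens: selection is one uniform draw, and a reward moves a single token toward the chosen action, so both operations are O(1) in r and use integer arithmetic only. This is a generic urn construction, not necessarily the FLA's mechanism.

```python
import random

class UrnLA:
    """Urn-based discretized LA sketch: O(1) selection and updating.

    Action i implicitly has probability count(i) / (r * N), where
    count(i) is the number of urn slots holding token i.
    """

    def __init__(self, r, N):
        self.total = r * N
        # Start from the uniform distribution: N tokens per action.
        self.urn = [i for i in range(r) for _ in range(N)]
        self.slot = 0

    def select(self):
        # O(1) action selection: draw one uniformly random token.
        self.slot = random.randrange(self.total)
        return self.urn[self.slot]

    def reward(self):
        # O(1) update: overwrite a random "donor" slot with the selected
        # action's token, raising its probability by 1/total (a no-op if
        # the donor slot already holds the selected action).
        donor = random.randrange(self.total)
        self.urn[donor] = self.urn[self.slot]
```

Because only integer indices and counts are manipulated, the cost per iteration stays constant no matter how many actions the automaton offers.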
Simulation results
In the previous sections, it was shown that the FLA is characterized by a low computational complexity. It remains to examine whether it is capable of operating efficiently in non-stationary stochastic environments. In order to study the performance of the FLA, we compare it with the well-known LRP [5], [6] scheme. It should be noted that LRI [5] cannot be compared with the FLA because it has absorbing states and, consequently, its performance is very poor when operating in a non-stationary
Conclusion
This paper has presented a new LA which is capable of performing both probability updating and action selection in a few nanoseconds, regardless of the number of actions. This property is very attractive for high-speed real-time applications, where the computational time of the automaton is a bottleneck.
Apart from its low computational overhead, the proposed FLA scheme achieves a high performance when operating in non-stationary stochastic environments. It is clear that FLA is not ϵ-optimal [5]
References (25)
- et al., WDM passive star networks: a learning automata based architecture, Computer Communications (1996)
- Artificial neural networks to systems, man and cybernetics: characteristics, structures, and applications, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics (1998)
- M.S. Obaidat, G.I. Papadimitriou, A.S. Pomportsis (Eds.), Learning automata: theory, paradigms and applications, IEEE...
- et al., Learning Automata: Theory and Applications (1994)
- et al., Learning Automata and Stochastic Optimization (1997)
- et al., Learning Automata: An Introduction (1989)
- et al., Learning automata––a survey, IEEE Transactions on Systems, Man and Cybernetics (1974)
- et al., Learning automata––a critique, Journal of Cybernetics and Information Sciences (1977)
- Absorbing and ergodic discretized two-action learning automata, IEEE Transactions on Systems, Man and Cybernetics (1986)
- A new approach to the design of reinforcement schemes for learning automata: stochastic estimator learning algorithms, IEEE Transactions on Knowledge and Data Engineering (1994)
- Hierarchical discretized pursuit nonlinear learning automata with rapid convergence and high accuracy, IEEE Transactions on Knowledge and Data Engineering
- Learning automata-based receiver conflict avoidance algorithms for WDM broadcast-and-select star networks, IEEE/ACM Transactions on Networking