Engineering Applications of Artificial Intelligence
Detection of locally relevant variables using SOM–NG algorithm
Introduction
Prototype based neural networks are trained to obtain a representative set of points with the same characteristics as the original data set, this is called vector quantization. A classic algorithm for this purpose is k-means (MacQueen, 1967) that reassigns them to the mean of the data points included in the Voronoi region they define. In this algorithm only the closest prototype is modified by each data point, so a “winner-takes-all” philosophy is used.
Neural Gas (NG) is another vector quantization algorithm, presented in Martinetz et al. (1993), which reaches an optimum distribution of prototypes. The main difference between NG and k-means is the use of a neighbourhood ranking between prototype vectors and each data point. It is a cooperative–competitive algorithm which avoids local minima. To improve the convergence speed and the robustness of the algorithm, a batch version is presented in Cottrell et al. (2006). NG reaches optimum distributions at the cost of having no order between prototypes, thus it does not offer output maps.
Supervised versions of NG have been also developed, specially for classification. A supervised version based on a combination of NG and Generalized Learning Vectors Quantization (Sato and Yamada, 1995) is defined in Hammer et al. (2005), which updates all the prototypes of the respective class depending on the NG ranking. A different Supervised Neural Gas was developed including the class labels as additional dimensions and weighting them with a mixing parameter in Hammer et al. (2006).
Another important quantization algorithm is the Self-Organizing Map (SOM), also called Kohonen Map (Kohonen, 2001). Instead of using free moving prototypes, its learning rule is based on a predefined lattice and the learning rule depends also on the position of each unit in that lattice and not only on the distance to the data points. This algorithm has suppose a great advance in machine learning because of its many features, mainly the dimensionality reduction and the topological preservation.
In data mining, SOM is widely used for visualization purposes. The main tools for visualization are the component planes and the U-matrix. Component planes are a representation of the predefined lattice with a colour scale for the values of each dimension in the input space. These component planes allow the user to find non-linear relationships between different variables. The U-matrix represents the distance between adjacent prototypes in the lattice, so different clusters can be defined using the boundaries defined by significantly higher distances. A detailed work of the use of U-matrix can be found in Ultsch (2003).
SOM has been used for multiple tasks, and specific modifications were created for different purposes. In order to create a continuous map based on the lattice, an interpolation method, called “Interpolating Self-Organizing Map” (I-SOM) was started in Göppert and Rosenstiel (1993) and developed in different phases until Göppert and Rosenstiel (1997), where a soft interpolating method was used to obtain the “Continuous Interpolating Self-Organizing Map” (CI-SOM) . Both I-SOM and CI-SOM are based on interpolation using the closest prototypes to the winning neuron, i.e. D+1 neurons were required for a D-dimensional data vector. These interpolating SOMs are proposed for function approximation specially if these can be reduced to a low dimension map.
To predict temporal sequences some special versions of SOM were developed, like the Temporal Kohonen Map (TKM), the Recurrent SOM and Recursive SOM. Temporal Kohonen Map, defined in Chappell and Taylor (1993), uses “leaky integrators” to select the best matching unit (bmu), so the previous state is taken in care to select the new winning neuron. Recurrent SOM, proposed in Varsta et al. (1997), applies an IIR filter to the sequence instead of filtering the output. Recursive SOM, described in Voegtlin (2002), uses both the actual map and a copy of the previous one. So a new data entry is a combination of the input vector and a context vector based on the copy of the map in the previous instant. They all have the advantage of using chunks of temporally ordered data instead of having to create a special data set including ordered fields. A more detailed review on temporal extensions can be found in Salhi et al. (2009).
In many fields of knowledge gradients are used for many tasks, e.g. surface reconstruction or optimization algorithms. There are many approaches for estimating the gradient; mathematical literature has different estimation methods based on local estimators designed for specific problems (Meyer et al., 2001, Hazen and Gupta, 2009).
This work is based on a combination of SOM and NG, with the main goal of having better quantization without losing the good properties of having a predefined lattice. This algorithm was presented in Machón-González et al. (2010) and an intermediate behaviour between the free moving prototypes in NG and the rigid order of SOM maps was obtained. As it was an interesting tool for quantization and clustering, a supervised version was developed in Crespo Ramos et al. (2011) using a linear model for each prototype in the codebook.
In this paper coefficients of local linear models get physical meaning as gradient components instead of being just numerical tools for estimation. Gradient norms provide variability and stability information of the represented system.
A description of the algorithm is included in Section 2, after that the new gradient analysis method is proposed in Section 3, experimental testing is performed in Section 4 and finally the results will be discussed in Section 5.
Section snippets
The SOM–NG algorithm
In order to improve the quantization capability of SOM, the hybrid algorithm SOM–NG was proposed in Machón-González et al. (2010). The main feature of the algorithm is the neighbourhood function, which is based on both SOM and NG neighbourhood functions. The expression of the hybrid neighbourhood function iswhere v is the data vector and wi is a prototype vector in the input space, hSOM is the SOM-based neighbourhood function and hNG is the NG-based
Proposed application
The SOM–NG algorithm has been proven to be a useful tool for quantization and dimensionality reduction, as well as a competitive estimator based on local linear models. Now it will be presented as a data mining tool analysing the gradients and obtaining useful information with them.
Having a set of locally calculated gradients the evolution of the estimated function can be studied from them. A first approach is to represent a gradient norm plane where the norm of the gradient is represented in
Real data application: luminous efficacy
In this section a real data set is used to show the different uses of the algorithm, including the old and the new ones. This data set was created following the steps described in Mayhoub and Carter (2011). In this study the direct luminous efficacy is modelled using geographical and meteorological data. Direct luminous efficacy in different locations is modelled calculating , where E is the direct illuminance measured in lux and I is the direct irradiance measured in . Both variables
Conclusions
In this work a new use for the previously proposed algorithm is presented. The algorithm was proven to be useful for quantization and estimation but it has more capabilities, such as data mining.
Previously presented features are also useful for data mining uses. Quantization and component planes are useful for state definition and obtaining representative points of data. Component planes can find relationships between different variables and distance maps are useful for clustering purposes.
In
References (21)
- et al.
The temporal Kohonen map
Neural Networks
(1993) - et al.
Batch and median neural gas
Neural Networks
(2006) - et al.
A model to estimate direct luminous efficacy based on satellite data
Solar Energy
(2011) Recursive self-organizing maps
Neural Networks
(2002)- Crespo Ramos, M.J., Machón González, I., López García, H., Calvo Rolle, J.L., 2011. Supervised hybrid SOM–NG algorithm....
- Göppert, J., Rosenstiel, W., 1993. Topology-preserving interpolation in self organizing maps. In: Proceedings of...
- et al.
The continuous interpolating self-organizing map
Neural Process. Lett.
(1997) - et al.
Supervised batch neural gas
Artif. Neural Networks Pattern Recognition
(2006) - et al.
Supervised neural gas with general similarity measure
Neural Process. Lett.
(2005) - Hazen, M., Gupta, M.R., 2009. Gradient estimation in global optimization algorithms. In: IEEE Congress on Evolutionary...
Cited by (24)
An intelligent fault detection system for a heat pump installation based on a geothermal heat exchanger
2016, Journal of Applied LogicCitation Excerpt :The field of solar energy has a lot of studies, for example, [19] focuses the study on the relation between solar radiation and atmospheric conditions; [25] shows new climatic indicators base on the urban sprawl. In other fields, intelligent algorithms are used to detect locally relevant variables [18], or to assist in the verification of some system [53]. On the last research line, this work describes a new intelligent method to make fault detection on buildings for heat pump systems using geothermal exchanger.
Hybrid Intelligent Model to Predict the Remifentanil Infusion Rate in Patients under General Anesthesia
2021, Logic Journal of the IGPLComparative of clustering techniques for academic advice and performance measurement
2021, Advances in Intelligent Systems and ComputingDynamics identification and control of nonlinear MIMO coupled plant using supervised neural gas and comparison with recurrent neural controller
2020, Neural Computing and ApplicationsHybrid model for the ANI index prediction using Remifentanil drug and EMG signal
2020, Neural Computing and ApplicationsLithium iron phosphate power cell fault detection system based on hybrid intelligent system
2020, Logic Journal of the IGPL