Self-Organizing Maps for imprecise data
Introduction
Self-Organizing Maps (SOMs) (Kohonen [1], [2]) consist of a set of neurons arranged in a linear (one-dimensional) or rectangular (two-dimensional) configuration, so that neighbourhood relations hold among the neurons. Each neuron is attached to a weight vector of the same dimension as the input space (the multi-dimensional space of the units, or input vectors). After training, by assigning each input vector to the neuron with the nearest weight vector (reference vector), SOMs divide the input space into regions sharing a common nearest weight vector (vector quantization), thereby clustering the input vectors. Moreover, under appropriate training, the neighbourhood relations induced by the interconnections among the neurons give SOMs the important property of topology preservation: if two input vectors are close in the input space, the corresponding reference vectors (closest neurons) are also close in the neural network. Therefore, at least for two-dimensional networks, visualization is also possible. The density of the weight vectors of an organized map reflects the density of the input: in clustered areas the weight vectors are close to each other, while in the empty space between clusters they are sparser.
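For standard numeric data, the training and vector-quantization behaviour described above can be sketched as follows (a minimal one-dimensional SOM in Python; the toy data, lattice size and decay schedules are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: two clusters in a 2-D input space (hypothetical data).
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(1.0, 0.1, (50, 2))])

# One-dimensional SOM with P neurons; each weight vector lives in the input space.
P = 10
W = rng.random((P, 2))
coords = np.arange(P, dtype=float)  # neuron locations on the lattice

T = 2000
for t in range(T):
    x = X[rng.integers(len(X))]
    # Best-matching unit: neuron whose weight vector is nearest to the input.
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))
    # Learning rate and neighbourhood radius both shrink over time.
    alpha = 0.1 * (1.0 - t / T)
    sigma = max(P / 2.0 * (1.0 - t / T), 0.5)
    # Gaussian neighbourhood: neurons close to the BMU on the lattice move more.
    h = np.exp(-((coords - coords[bmu]) ** 2) / (2.0 * sigma ** 2))
    W += alpha * h[:, None] * (x - W)

# Vector quantization: each input is assigned to its nearest weight vector.
labels = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
```

After training, neurons assigned to the same cluster tend to be contiguous on the lattice, which is the topology-preservation property in its simplest form.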
In the literature, a great deal of attention has been paid to SOMs for traditional (numeric) data (“precise” observations). However, in various real situations the data are not precisely observed (imprecise observations), and only a few works on SOMs address this case. Bock [3], [4] proposed visualizing symbolic data (i.e. interval-valued data) by means of SOMs. Chen et al. [5] suggested a batch version of SOMs for symbolic data. DʼUrso and De Giovanni [6] proposed Midpoint Radius-based Self-Organizing Maps (MR-SOMs) for interval-valued data, illustrating a telecommunications application. Hajjar and Hamdan [7] suggested an algorithm to train SOMs for interval data based on the city-block distance. Yang et al. [8] suggested SOMs for symbolic data. To the best of our knowledge, there are no works in the literature on SOMs for fuzzy data.
In this paper, by considering two distance measures for fuzzy data [9], [10], we propose SOMs that allow clustering and vector quantization of imprecise (i.e. fuzzy) data.
In particular, Section 2 shows the fuzzy management of imprecisely observed data. In Section 3, following a fuzzy formalization of the data, two distance measures for imprecise observations are illustrated [9], [10]; in particular, we prove that the measure introduced by Coppi et al. [9] is a metric. These metrics are utilized in the Self-Organizing Maps for imprecise data (SOMs-ID) proposed in Section 4. To assess the performance of the suggested SOMs-ID, a simulation study is illustrated in Section 5. Several illustrative applications are shown in Section 6. Final remarks are made in Section 7.
Fuzzy management of imprecise information
A statistical reasoning process may be looked at as a specific cognitive process characterized by the simultaneous management of information and uncertainty [11]. In this framework, various sources of uncertainty can be taken into account [12]: (1) sampling uncertainty connected to the data generation process; (2) uncertainty regarding the various theoretical ingredients considered in the data analysis process; (3) uncertainty concerning the observation or nature of empirical data (i.e., imprecise or vague observations).
Distance measures for imprecise data
By formalizing the imprecise data in a fuzzy manner, we can compare objects with imprecise information by using distance measures for fuzzy data.
In the literature, several proximity measures (dissimilarity, similarity and distance measures) have been suggested in a fuzzy framework [16].
Some of these measures are defined by suitably comparing the membership functions of the fuzzy data. These distances can be classified according to different approaches [19], [20]: the “functional approach”, in which the membership functions are compared directly by means of distance measures such as the Minkowski distance.
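To fix ideas, a distance between two vectors of triangular (LR) fuzzy numbers can be sketched by comparing centers and spreads separately and combining the components with weights (an illustrative construction with hypothetical weights, not the exact CDG or EYK formula):

```python
import numpy as np

def fuzzy_distance(c1, l1, r1, c2, l2, r2, w_c=1.0, w_s=0.5):
    """Illustrative squared distance between two vectors of triangular fuzzy
    numbers, each described by centers c, left spreads l and right spreads r.
    Centers and spreads are compared separately and combined with weights
    (the weighting scheme here is a hypothetical choice, not the CDG metric)."""
    d_center = np.sum((c1 - c2) ** 2)
    d_spread = np.sum((l1 - l2) ** 2) + np.sum((r1 - r2) ** 2)
    return w_c * d_center + w_s * d_spread

# Two fuzzy observations in a 2-dimensional feature space.
d = fuzzy_distance(np.array([1.0, 2.0]), np.array([0.2, 0.1]), np.array([0.3, 0.2]),
                   np.array([1.5, 2.5]), np.array([0.1, 0.1]), np.array([0.4, 0.2]))
```

By construction the distance between an observation and itself is zero, and larger spread weights make the measure more sensitive to differences in imprecision rather than in location.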
SOMs for imprecise data (SOMs-ID)
In this section, in order to classify imprecise data, the distance measures (3.1) and (3.2) are exploited in the SOMs framework.
The SOM is a network (topology, lattice) of P functional units, or neurons, arranged in a one-dimensional or multi-dimensional configuration. Each neuron p (p = 1, …, P) has a (scalar or vectorial) location (coordinate) depending on the configuration (one-dimensional or multi-dimensional), and an initial J-dimensional weight vector.
Then there is a set of I J-dimensional input vectors (the units to be clustered).
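The data structures involved can be sketched as follows, assuming (as in the LR fuzzy formalization) that each weight carries a center vector plus left and right spread vectors; the distance used in the best-matching-unit search is an illustrative center/spread combination, not the paper's exact metric:

```python
import numpy as np

rng = np.random.default_rng(1)

P, J = 6, 3  # P neurons on a 1-D lattice, J-dimensional fuzzy weights

# Each neuron carries a fuzzy weight: a center vector plus left/right spreads
# (this representation is an assumption consistent with LR fuzzy data).
weights = {"c": rng.random((P, J)),
           "l": 0.1 * rng.random((P, J)),
           "r": 0.1 * rng.random((P, J))}

def bmu(xc, xl, xr, weights, w_c=1.0, w_s=0.5):
    """Best-matching unit under an illustrative center/spread distance:
    the neuron whose fuzzy weight is nearest to the fuzzy input."""
    d = (w_c * np.sum((weights["c"] - xc) ** 2, axis=1)
         + w_s * (np.sum((weights["l"] - xl) ** 2, axis=1)
                  + np.sum((weights["r"] - xr) ** 2, axis=1)))
    return int(np.argmin(d))

# A random fuzzy input is mapped to one of the P neurons.
p = bmu(rng.random(J), 0.1 * rng.random(J), 0.1 * rng.random(J), weights)
```

The only change with respect to a standard SOM is that both inputs and weights carry spread components, and the BMU search uses a fuzzy distance instead of the Euclidean one.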
Simulation study
Three simulation studies have been developed to assess the capabilities of the proposed SOMs-ID algorithms based on the CDG and EYK distances.
It is worth noting that the number of iterations is mainly a heuristic issue. As remarked by Kohonen [34], although the number of iterations should be reasonably large and depends on the size of the dataset, 10 000 steps, and sometimes even fewer, may be enough.
The initial value of the learning rate function is set to 0.1.
The neighbourhood function between neuron p and the winning neuron depends on the distance between the two neurons on the lattice.
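A plausible implementation of these training ingredients, assuming a linear decay for both the learning rate and the neighbourhood radius (the paper's exact schedules are not reproduced here), is:

```python
import numpy as np

T = 10_000        # number of training steps (Section 5)
alpha0 = 0.1      # initial learning rate (Section 5)
sigma0 = 5.0      # initial radius; illustrative value standing in for the
                  # maximum semi-radius of the map

def learning_rate(t, T=T, alpha0=alpha0):
    """Linearly decaying learning rate (a common choice; the paper's exact
    decay schedule is not reproduced here)."""
    return alpha0 * (1.0 - t / T)

def neighbourhood(dist, t, T=T, sigma0=sigma0):
    """Gaussian neighbourhood function of the lattice distance between a
    neuron and the winning neuron, with a shrinking radius (illustrative)."""
    sigma = max(sigma0 * (1.0 - t / T), 1e-3)
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))
```

Early in training, distant neurons are dragged along with the winner (global ordering of the map); as the radius shrinks, updates become local and the map fine-tunes to the data.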
Applications
In this section different applications are presented to illustrate the effectiveness of the proposed SOMs-ID. As in Section 5, the number of iterations is set to 10 000, the initial value of the learning rate function is 0.1, and the initial neighbourhood radius is the maximum semi-radius of the SOM.
Final remarks
In this paper we proposed an extension of Self-Organizing Maps (SOMs) to imprecisely observed data (Self-Organizing Maps for imprecise data, SOMs-ID). To compute the distance between the multivariate imprecise data and the weights of the neurons of the map, we exploited two different distance measures: the Coppi–DʼUrso–Giordani distance (CDG distance) [9] and the Extended Yang–Ko distance (EYK distance) [10].
We illustrated the main features of the proposed method with a simulation study and several applications.
References (42)
- et al., Midpoint radius self-organizing maps for interval-valued data with telecommunications application, Appl. Soft Comput. (2011)
- et al., Self-organizing map for symbolic data, Fuzzy Sets Syst. (2012)
- et al., Fuzzy and possibilistic clustering for fuzzy data, Comput. Stat. Data Anal. (2012)
- et al., On a class of fuzzy c-numbers clustering procedures for fuzzy data, Fuzzy Sets Syst. (1996)
- Management of uncertainty in statistical reasoning: The case of regression analysis, Int. J. Approx. Reason. (2008)
- Multidimensional least-squares fitting of fuzzy models, Math. Model. (1987)
- On fuzzy distances and their use in image processing under imprecision, Pattern Recognit. (1999)
- et al., Measures of similarity among fuzzy concepts: A comparative analysis, Int. J. Approx. Reason. (1987)
- et al., Distances between fuzzy sets representing grey level images, Fuzzy Sets Syst. (1998)
- et al., A comparative assessment of measures of similarity of fuzzy values, Fuzzy Sets Syst. (1993)
- A definition of non-probabilistic entropy in the setting of fuzzy set theory, Inf. Control
- A weighted fuzzy c-means clustering model for fuzzy data, Comput. Stat. Data Anal.
- Fuzzy clustering algorithms for mixed feature variables, Fuzzy Sets Syst.
- Fuzzy clustering procedures for conical fuzzy vector data, Fuzzy Sets Syst.
- On the mathematical treatment of self organization: extension of some classical results
- Neural maps and topographic vector quantization, Neural Netw.
- Fuzzy clustering on LR-type fuzzy numbers with an application in Taiwanese tea evaluation, Fuzzy Sets Syst.
- A robust clustering procedure for fuzzy data, Comput. Math. Appl.
- Multivalued type proximity measure and concept of mutual similarity value useful for clustering symbolic patterns, Pattern Recognit. Lett.
- Self-Organization and Associative Memory
- Self-Organizing Maps