
Fuzzy Sets and Systems

Volume 237, 16 February 2014, Pages 63-89

Self-Organizing Maps for imprecise data

https://doi.org/10.1016/j.fss.2013.09.011

Abstract

Self-Organizing Maps (SOMs) consist of a set of neurons arranged in such a way that there are neighbourhood relationships among neurons. Following an unsupervised learning procedure, the input space is divided into regions with a common nearest neuron (vector quantization), allowing clustering of the input vectors. In this paper, we propose an extension of the SOMs for imprecisely observed data (Self-Organizing Maps for imprecise data, SOMs-ID). The learning algorithm is based on two distances for imprecise data. In order to illustrate the main features of the proposed method and to compare its performance, we provide a simulation study and several substantive applications.

Introduction

Self-Organizing Maps (SOMs) (Kohonen [1], [2]) consist of a set of neurons arranged in a linear (one-dimensional) or rectangular (two-dimensional) configuration, such that there are neighbourhood relations among the neurons. Each neuron is attached to a weight vector of the same dimension as the input space (the multi-dimensional space of the units or input vectors). After completion of training, by assigning each input vector to the neuron with the nearest weight vector (reference vector), the SOMs are able to divide the input space into regions with common nearest weight vector (vector quantization), allowing clustering of the input vectors. Moreover, under appropriate training, because of the neighbourhood relation contributed by the interconnection among neurons, the SOMs exhibit the important property of topology preservation. In other words, if two input vectors are close in the input space, the corresponding reference vectors (closest neurons) will also be close in the neural network. Therefore, at least in two-dimensional neural networks, visualization is also possible. The density of the weight vectors of an organized map reflects the density of the input: in clustered areas the weight vectors are close to each other, and in the empty space between the clusters they are more sparse.
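To make the vector-quantization step concrete, the following minimal Python sketch (not taken from the paper; the Euclidean distance, array shapes and random data are illustrative assumptions) shows how each input vector is assigned to the neuron with the nearest weight vector:

```python
import numpy as np

def best_matching_unit(x, weights):
    """Index of the neuron whose weight (reference) vector is closest to x.

    x       : (J,) input vector
    weights : (P, J) matrix of neuron weight vectors
    """
    distances = np.linalg.norm(weights - x, axis=1)  # distance of x to every neuron
    return int(np.argmin(distances))

# Vector quantization: every input vector is mapped to its nearest reference
# vector, so the input space is partitioned into P regions (one per neuron).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 three-dimensional input vectors (toy data)
W = rng.normal(size=(25, 3))    # weights of a 5x5 map, flattened to P = 25 neurons
assignments = np.array([best_matching_unit(x, W) for x in X])
```

In an organized map, inputs that are close in the input space are typically assigned to neurons that are close on the lattice, which is what makes the two-dimensional visualization meaningful.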

In the literature, a great deal of attention has been paid to SOMs for traditional (numeric) data (“precise” observations). However, there are various real situations in which the data are not precisely observed (imprecise observations). In the SOM literature, there are only a few works on such data. Bock [3], [4] proposed to visualize symbolic data (i.e. interval-valued data) by using SOMs. Chen et al. [5] suggested a batch version of the SOMs for symbolic data. DʼUrso and De Giovanni [6] proposed Midpoint Radius-Based Self-Organizing Maps (MR-SOMs) for interval-valued data, illustrated with a telecommunications application. Hajjar and Hamdan [7] suggested an algorithm to train the SOMs for interval data based on the city-block distance. Yang et al. [8] suggested SOMs for symbolic data. To the best of our knowledge, there are no works in the literature on SOMs for fuzzy data.

In this paper, by considering two distance measures for fuzzy data [9], [10], we propose SOMs that allow clustering and vector quantization of imprecise data (i.e. fuzzy data).

In particular, in Section 2 we describe the fuzzy management of imprecisely observed data. In Section 3, following a fuzzy formalization of the data, two distance measures for imprecise observations are illustrated [9], [10]; in particular, we prove that the measure introduced by Coppi et al. [9] is a metric. These metrics are utilized in the Self-Organizing Maps for imprecise data (SOMs-ID) proposed in Section 4. To assess the performance of the proposed SOMs-ID, a simulation study is illustrated in Section 5. Several applications are shown in Section 6. Final remarks are made in Section 7.

Section snippets

Fuzzy management of imprecise information

A statistical reasoning process may be looked at as a specific cognitive process characterized by the simultaneous management of information and uncertainty [11]. In this framework, various sources of uncertainty can be taken into account [12]: (1) sampling uncertainty connected to the data generation process; (2) uncertainty regarding the various theoretical ingredients considered in the data analysis process; (3) uncertainty concerning the observation or nature of empirical data (i.e.,

Distance measures for imprecise data

By formalizing the imprecise data in a fuzzy manner, we can compare objects with imprecise information by using distance measures for fuzzy data.

In the literature, several proximity measures (dissimilarity, similarity and distance measures) have been suggested in a fuzzy framework [16].

Some of these measures are defined by suitably comparing the membership functions of the fuzzy data. These distances can be classified according to different approaches [19], [20]: the “functional approach”, in
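As a purely illustrative example of this family of measures (not the exact formulas (3.1) and (3.2) of the paper), the sketch below computes a squared distance between two J-dimensional LR fuzzy data coded by centers and left/right spreads; the weights w_center and w_spread are hypothetical tuning constants:

```python
import numpy as np

def fuzzy_sq_distance(c1, l1, r1, c2, l2, r2, w_center=1.0, w_spread=1.0):
    """Illustrative squared distance between two J-dimensional LR fuzzy data.

    Each fuzzy datum is coded by its centers (c), left spreads (l) and right
    spreads (r). The weights are hypothetical; the exact CDG and EYK distances
    are defined in the paper.
    """
    d_centers = np.sum((np.asarray(c1) - np.asarray(c2)) ** 2)
    d_spreads = (np.sum((np.asarray(l1) - np.asarray(l2)) ** 2)
                 + np.sum((np.asarray(r1) - np.asarray(r2)) ** 2))
    return w_center * d_centers + w_spread * d_spreads

# Example: two triangular-type fuzzy observations in J = 2 dimensions
d = fuzzy_sq_distance(c1=[1.0, 2.0], l1=[0.2, 0.3], r1=[0.1, 0.4],
                      c2=[1.5, 1.8], l2=[0.3, 0.2], r2=[0.2, 0.5])
```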

SOMs for imprecise data (SOMs-ID)

In this section, in order to classify imprecise data, the distance measures (3.1) and (3.2) are exploited in the SOM framework.

The SOM is a network (topology, lattice) of $P$ functional units, or neurons, arranged in a one-dimensional or multi-dimensional configuration. Each neuron $p$ ($1 \le p \le P$) has a (scalar or vectorial) location (coordinate) $\mathbf{r}_p$ dependent on the configuration (one-dimensional or multi-dimensional), and an initial $J$-dimensional weight $\boldsymbol{\mu}_p = (\mu_{p1}, \ldots, \mu_{pj}, \ldots, \mu_{pJ})$.

Then there is a set of $I$ $J$-dimensional
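For concreteness, here is a minimal sketch of the data structures just described, under assumed shapes (a 5 × 5 rectangular lattice with J = 3; for fuzzy inputs the weights would also carry spread components, omitted here):

```python
import numpy as np

# Assumed configuration: a 5 x 5 rectangular lattice, J = 3 input dimensions.
P_rows, P_cols, J = 5, 5, 3
P = P_rows * P_cols

# r_p: lattice coordinate of neuron p, used only by the neighbourhood function.
coords = np.array([(i, j) for i in range(P_rows) for j in range(P_cols)], dtype=float)

# mu_p: initial J-dimensional weight vector of neuron p (random initialization;
# for fuzzy data each weight would also include center and spread components).
rng = np.random.default_rng(42)
weights = rng.uniform(size=(P, J))
```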

Simulation study

Three simulation studies have been developed to assess the capabilities of the SOM-ID$_{CDG}$ and SOM-ID$_{EYK}$ algorithms.

It is worth noting that the number of iterations is mainly a heuristic issue. As remarked by Kohonen [34], although the number of iterations should be reasonably large and depends on the size of the dataset, 10 000 steps, and even fewer, may be enough.

The learning rate function is $\alpha(s) = \alpha(0)\exp\left(-(s+1)/\mathit{maxiter}\right)$, where $\alpha(0)$ is set to 0.1.

The neighbourhood function between neuron p and
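A minimal sketch of one online training step under these settings (the Gaussian form of the neighbourhood function and the fixed sigma are assumptions made here for illustration; only the learning-rate schedule follows the text):

```python
import numpy as np

def alpha(s, alpha0=0.1, maxiter=10_000):
    # Learning-rate schedule from the text: alpha(s) = alpha(0) * exp(-(s + 1) / maxiter)
    return alpha0 * np.exp(-(s + 1) / maxiter)

def neighbourhood(r_p, r_winner, sigma):
    # Gaussian neighbourhood in lattice coordinates (a common SOM choice,
    # assumed here; the paper's exact neighbourhood function is truncated above).
    return np.exp(-np.sum((r_p - r_winner) ** 2) / (2.0 * sigma ** 2))

def online_update(weights, coords, x, winner, s, sigma):
    # One schematic step of the online SOM rule: every weight moves toward the
    # current input, scaled by the learning rate and its neighbourhood value.
    a = alpha(s)
    for p in range(weights.shape[0]):
        h = neighbourhood(coords[p], coords[winner], sigma)
        weights[p] = weights[p] + a * h * (x - weights[p])
    return weights
```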

Applications

In this section, different applications are presented to illustrate the effectiveness of the proposed SOMs-ID. As in Section 5, the number of iterations is set to 10 000, the initial value of the learning rate function is $\alpha(0)=0.1$, and $\sigma(0)$ is the maximum semi-radius of the SOM.

Final remarks

In this paper we proposed an extension of the Self-Organizing Maps (SOMs) for imprecisely observed data (Self-Organizing Maps for imprecise data, SOMs-ID). To compute the distance between the multivariate imprecise data and the weights of the neurons of the map, we suggested exploiting two different distance measures: the Coppi–DʼUrso–Giordani distance (CDG distance) [9] and the Extended Yang–Ko distance (EYK distance) [10].

We illustrated the main features of the proposed method with a

References (42)

Cited by (15)

  • Fuzzy clustering of fuzzy data based on robust loss functions and ordered weighted averaging

    2020, Fuzzy Sets and Systems
Citation Excerpt:

    Some of these proximity measures are defined by taking into account the membership functions of the fuzzy data. These distances can be classified according to different approaches [21,3,59]: functional approach: based on the comparison of the membership functions by means of Minkowski and Canberra distance measures [41,43];

  • A fuzzy inference system modeling approach for interval-valued symbolic data forecasting

    2019, Knowledge-Based Systems
Citation Excerpt:

    A possibilistic fuzzy c-means clustering algorithm is addressed in [42]. Symbolic fuzzy classification using a fuzzy radial basis function network and self-organizing maps are proposed in [43] and [44,45], respectively. More recently, [46] suggested a fuzzy clustering algorithm for interval-valued data based on the concept of participatory learning.

  • Clustering of fuzzy data and simultaneous feature selection: A model selection approach

    2018, Fuzzy Sets and Systems
Citation Excerpt:

    Under the assumption that the data are produced (following some probability distribution) from one of a number (unknown) of alternative sources of generation, finite mixture models are rich enough to be extended to an involved statistical model by which issues like selection of an optimal number of clusters, feature saliency and the validity of a given model can be addressed in a formal and structured way [6]. Clustering of fuzzy data is a topic of interest to the modern research on imprecise data analytics [7–14]. Though the mixture model has been extensively used in clustering of crisp data it was first introduced in the fuzzy setup in [5,15].

  • Distance-based linear discriminant analysis for interval-valued data

    2016, Information Sciences
Citation Excerpt:

The problem of interval data in regression analysis, time series, hypothesis testing and principal component analysis has been extensively elaborated (see, e.g., [6,7,9,10,16,17,43–45] for the most recent results). Different problems concerning clustering methods for interval-valued data have been previously tackled in the literature [18,20–23,29,30]. Some classification algorithms for interval-valued data have been developed based on the imprecise probability theory in [41,47,48].
