
Fuzzy Sets and Systems

Volume 237, 16 February 2014, Pages 63-89

Self-Organizing Maps for imprecise data

https://doi.org/10.1016/j.fss.2013.09.011

Abstract

Self-Organizing Maps (SOMs) consist of a set of neurons arranged in such a way that there are neighbourhood relationships among neurons. Following an unsupervised learning procedure, the input space is divided into regions with a common nearest neuron (vector quantization), allowing clustering of the input vectors. In this paper, we propose an extension of the SOMs for imprecisely observed data (Self-Organizing Maps for imprecise data, SOMs-ID). The learning algorithm is based on two distances for imprecise data. In order to illustrate the main features of the proposed method and to compare its performance, we provide a simulation study and several substantive applications.

Introduction

Self-Organizing Maps (SOMs) (Kohonen [1], [2]) consist of a set of neurons arranged in a linear (one-dimensional) or rectangular (two-dimensional) configuration, such that there are neighbourhood relations among the neurons. Each neuron is attached to a weight vector of the same dimension as the input space (the multi-dimensional space of the units or input vectors). After completion of training, by assigning each input vector to the neuron with the nearest weight vector (reference vector), the SOMs are able to divide the input space into regions with common nearest weight vector (vector quantization), allowing clustering of the input vectors. Moreover, under appropriate training, because of the neighbourhood relation contributed by the interconnection among neurons, the SOMs exhibit the important property of topology preservation. In other words, if two input vectors are close in the input space, the corresponding reference vectors (closest neurons) will also be close in the neural network. Therefore, at least in two-dimensional neural networks, visualization is also possible. The density of the weight vectors of an organized map reflects the density of the input: in clustered areas the weight vectors are close to each other, and in the empty space between the clusters they are more sparse.
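To make the vector-quantization step concrete, the following minimal Python sketch (not taken from the paper; the Euclidean distance, array shapes and random data are illustrative assumptions) shows how each input vector is assigned to the neuron with the nearest weight vector:

```python
import numpy as np

def best_matching_unit(x, weights):
    """Index of the neuron whose weight (reference) vector is closest to x.

    x       : (J,) input vector
    weights : (P, J) matrix of neuron weight vectors
    """
    distances = np.linalg.norm(weights - x, axis=1)  # distance of x to every neuron
    return int(np.argmin(distances))

# Vector quantization: every input vector is mapped to its nearest reference
# vector, so the input space is partitioned into P regions (one per neuron).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 three-dimensional input vectors (toy data)
W = rng.normal(size=(25, 3))    # weights of a 5x5 map, flattened to P = 25 neurons
assignments = np.array([best_matching_unit(x, W) for x in X])
```

In an organized map, inputs that are close in the input space are typically assigned to neurons that are close on the lattice, which is what makes the two-dimensional visualization meaningful.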

In the literature, a great deal of attention has been paid to SOMs for traditional (numeric) data (“precise” observations). However, there are various real situations in which the data are not precisely observed (imprecise observations). In the SOM literature, there are only a few works on such data. Bock [3], [4] proposed to visualize symbolic data (i.e. interval-valued data) by using SOMs. Chen et al. [5] suggested a batch version of the SOMs for symbolic data. DʼUrso and De Giovanni [6] proposed Midpoint Radius-Based Self-Organizing Maps (MR-SOMs) for interval-valued data, illustrated with a telecommunications application. Hajjar and Hamdan [7] suggested an algorithm to train the SOMs for interval data based on the city-block distance. Yang et al. [8] suggested SOMs for symbolic data. To the best of our knowledge, there are no works in the literature on SOMs for fuzzy data.

In this paper, by considering two distance measures for fuzzy data [9], [10], we propose SOMs that allow clustering and vector quantization of imprecise data (i.e. fuzzy data).

In particular, in Section 2 we describe the fuzzy management of imprecisely observed data. In Section 3, following a fuzzy formalization of the data, two distance measures for imprecise observations are illustrated [9], [10]; in particular, we prove that the measure introduced by Coppi et al. [9] is a metric. These metrics are utilized in the Self-Organizing Maps for imprecise data (SOMs-ID) proposed in Section 4. To assess the performance of the proposed SOMs-ID, a simulation study is illustrated in Section 5. Several applications are shown in Section 6. Final remarks are made in Section 7.

Section snippets

Fuzzy management of imprecise information

A statistical reasoning process may be looked at as a specific cognitive process characterized by the simultaneous management of information and uncertainty [11]. In this framework, various sources of uncertainty can be taken into account [12]: (1) sampling uncertainty connected to the data generation process; (2) uncertainty regarding the various theoretical ingredients considered in the data analysis process; (3) uncertainty concerning the observation or nature of empirical data (i.e.,

Distance measures for imprecise data

By formalizing the imprecise data in a fuzzy manner, we can compare objects with imprecise information by using distance measures for fuzzy data.

In the literature, several proximity measures (dissimilarity, similarity and distance measures) have been suggested in a fuzzy framework [16].

Some of these measures are defined by suitably comparing the membership functions of the fuzzy data. These distances can be classified according to different approaches [19], [20]: the “functional approach”, in
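As a purely illustrative example of this family of measures (not the exact formulas (3.1) and (3.2) of the paper), the sketch below computes a squared distance between two J-dimensional LR fuzzy data coded by centers and left/right spreads; the weights w_center and w_spread are hypothetical tuning constants:

```python
import numpy as np

def fuzzy_sq_distance(c1, l1, r1, c2, l2, r2, w_center=1.0, w_spread=1.0):
    """Illustrative squared distance between two J-dimensional LR fuzzy data.

    Each fuzzy datum is coded by its centers (c), left spreads (l) and right
    spreads (r). The weights are hypothetical; the exact CDG and EYK distances
    are defined in the paper.
    """
    d_centers = np.sum((np.asarray(c1) - np.asarray(c2)) ** 2)
    d_spreads = (np.sum((np.asarray(l1) - np.asarray(l2)) ** 2)
                 + np.sum((np.asarray(r1) - np.asarray(r2)) ** 2))
    return w_center * d_centers + w_spread * d_spreads

# Example: two triangular-type fuzzy observations in J = 2 dimensions
d = fuzzy_sq_distance(c1=[1.0, 2.0], l1=[0.2, 0.3], r1=[0.1, 0.4],
                      c2=[1.5, 1.8], l2=[0.3, 0.2], r2=[0.2, 0.5])
```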

SOMs for imprecise data (SOMs-ID)

In this section, in order to classify imprecise data, the distance measures (3.1) and (3.2) are exploited in the SOM framework.

The SOM is a network (topology, lattice) of $P$ functional units, or neurons, arranged in a one-dimensional or multi-dimensional configuration. Each neuron $p$ ($1 \le p \le P$) has a (scalar or vectorial) location (coordinate) $\mathbf{r}_p$ dependent on the configuration (one-dimensional or multi-dimensional), and an initial $J$-dimensional weight $\boldsymbol{\mu}_p = (\mu_{p1}, \ldots, \mu_{pj}, \ldots, \mu_{pJ})$.

Then there is a set of $I$ $J$-dimensional
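For concreteness, here is a minimal sketch of the data structures just described, under assumed shapes (a 5 × 5 rectangular lattice with J = 3; for fuzzy inputs the weights would also carry spread components, omitted here):

```python
import numpy as np

# Assumed configuration: a 5 x 5 rectangular lattice, J = 3 input dimensions.
P_rows, P_cols, J = 5, 5, 3
P = P_rows * P_cols

# r_p: lattice coordinate of neuron p, used only by the neighbourhood function.
coords = np.array([(i, j) for i in range(P_rows) for j in range(P_cols)], dtype=float)

# mu_p: initial J-dimensional weight vector of neuron p (random initialization;
# for fuzzy data each weight would also include center and spread components).
rng = np.random.default_rng(42)
weights = rng.uniform(size=(P, J))
```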

Simulation study

Three simulation studies have been developed to assess the capabilities of the SOM-ID$_{CDG}$ and SOM-ID$_{EYK}$ algorithms.

It is worth noting that the number of iterations is mainly a heuristic issue. As remarked by Kohonen [34], although the number of iterations should be reasonably large and depends on the size of the dataset, 10 000 steps, and even fewer, may be enough.

The learning rate function is $\alpha(s) = \alpha(0)\exp\left(-(s+1)/\mathit{maxiter}\right)$, where $\alpha(0)$ is set to 0.1.

The neighbourhood function between neuron p and
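A minimal sketch of one online training step under these settings (the Gaussian form of the neighbourhood function and the fixed sigma are assumptions made here for illustration; only the learning-rate schedule follows the text):

```python
import numpy as np

def alpha(s, alpha0=0.1, maxiter=10_000):
    # Learning-rate schedule from the text: alpha(s) = alpha(0) * exp(-(s + 1) / maxiter)
    return alpha0 * np.exp(-(s + 1) / maxiter)

def neighbourhood(r_p, r_winner, sigma):
    # Gaussian neighbourhood in lattice coordinates (a common SOM choice,
    # assumed here; the paper's exact neighbourhood function is truncated above).
    return np.exp(-np.sum((r_p - r_winner) ** 2) / (2.0 * sigma ** 2))

def online_update(weights, coords, x, winner, s, sigma):
    # One schematic step of the online SOM rule: every weight moves toward the
    # current input, scaled by the learning rate and its neighbourhood value.
    a = alpha(s)
    for p in range(weights.shape[0]):
        h = neighbourhood(coords[p], coords[winner], sigma)
        weights[p] = weights[p] + a * h * (x - weights[p])
    return weights
```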

Applications

In this section, different applications are presented to illustrate the effectiveness of the proposed SOMs-ID. As in Section 5, the number of iterations is set to 10 000, the initial value of the learning rate function is $\alpha(0)=0.1$, and $\sigma(0)$ is the maximum semi-radius of the SOM.

Final remarks

In this paper we proposed an extension of the Self-Organizing Maps (SOMs) for imprecisely observed data (Self-Organizing Maps for imprecise data, SOMs-ID). To compute the distance between the multivariate imprecise data and the weights of the neurons of the map, we suggested exploiting two different distance measures: the Coppi–DʼUrso–Giordani distance (CDG distance) [9] and the Extended Yang–Ko distance (EYK distance) [10].

We illustrated the main features of the proposed method with a

References (42)

Cited by (15)

  • Fuzzy clustering of fuzzy data based on robust loss functions and ordered weighted averaging

    2020, Fuzzy Sets and Systems
Citation Excerpt:

    Some of these proximity measures are defined by taking into account the membership functions of the fuzzy data. These distances can be classified according to different approaches [21,3,59]: functional approach: based on the comparison of the membership functions by means of Minkowski and Canberra distance measures [41,43];

  • A fuzzy inference system modeling approach for interval-valued symbolic data forecasting

    2019, Knowledge-Based Systems
Citation Excerpt:

    A possibilistic fuzzy c-means clustering algorithm is addressed in [42]. Symbolic fuzzy classification using a fuzzy radial basis function network and self-organizing maps are proposed in [43] and [44,45], respectively. More recently, [46] suggested a fuzzy clustering algorithm for interval-valued data based on the concept of participatory learning.

  • Clustering of fuzzy data and simultaneous feature selection: A model selection approach

    2018, Fuzzy Sets and Systems
Citation Excerpt:

    Under the assumption that the data are produced (following some probability distribution) from one of a number (unknown) of alternative sources of generation, finite mixture models are rich enough to be extended to an involved statistical model by which issues like selection of an optimal number of clusters, feature saliency and the validity of a given model can be addressed in a formal and structured way [6]. Clustering of fuzzy data is a topic of interest to the modern research on imprecise data analytics [7–14]. Though the mixture model has been extensively used in clustering of crisp data it was first introduced in the fuzzy setup in [5,15].

  • Distance-based linear discriminant analysis for interval-valued data

    2016, Information Sciences
Citation Excerpt:

The problem of interval data in regression analysis, time series, hypothesis testing and principal component analysis has been extensively elaborated (see, e.g., [6,7,9,10,16,17,43–45] for the most recent results). Different problems concerning clustering methods for interval-valued data have been previously tackled in the literature [18,20–23,29,30]. Some classification algorithms for interval-valued data have been developed based on the imprecise probability theory in [41,47,48].
