Elsevier

Applied Soft Computing

Volume 11, Issue 5, July 2011, Pages 3877-3886
Midpoint radius self-organizing maps for interval-valued data with telecommunications application

https://doi.org/10.1016/j.asoc.2011.01.006

Abstract

The aim of this paper is to cluster units (objects) described by interval-valued information by adopting an unsupervised neural network approach. Using a suitable distance measure for interval data, we propose self-organizing maps that handle interval-valued data. The technique, called midpoint radius self-organizing maps (MR-SOMs), recovers the underlying structure of interval-valued data by using both the midpoints (or centers) and the radii (a measure of the interval width). To show how MR-SOMs work, an illustrative application to telecommunications market segmentation is described.

Introduction

In classical data analysis, the observations are represented by single-valued (numerical) data. However, in many real applications, the use of single-valued variables may cause a heavy loss of information. For example, in chemometrics, we may think of the mineral concentrations of food products analyzed at different times or in different experimental situations; in meteorology, we may consider the daily temperature, humidity and wind speed registered in different places; in medicine, we may refer to the daily systolic and diastolic pressure, pulse rate and temperature of patients; in finance, we may examine the daily euro-dollar exchange rate; and so on. In all these cases, the average alone is less informative than either the minimum and maximum values registered in the period, or the mean value together with its deviations from the lower bound (minimum) and the upper bound (maximum) of the interval, because the latter offer more detailed and complete information about the phenomenon under examination.

Interval-valued data as input (and possibly output) for neural networks have been studied with respect to supervised learning, in particular the multilayer perceptron (MLP). Various authors have handled interval-valued inputs by relying on interval arithmetic [1], introducing an interval-based MLP that can be trained with a modified back-propagation algorithm [2], [3], [4]. These approaches cannot be easily integrated into existing neural network software.

Rossi et al. [5] have proposed two methods that allow interval-valued data to be used as inputs for a multilayer perceptron: the extremal values method, which is based on a complete description of intervals, and the simulation method, which is based on a probabilistic understanding of intervals. Both methods can be easily implemented on top of existing neural network software. In the extremal values method, each interval-valued input is transformed into a pair of real numbers, for instance the lower and upper bounds of the interval. In this way a set of J interval variables is translated into 2J real-valued variables, and the network can be used exactly as a classical MLP. In the probabilistic approach, interval-valued data are treated as simple probabilistic data. If a sample for the MLP is described by the interval [a, b], a first way to proceed is to assume that the sample can take any value between a and b with uniform probability. The interval can then be replaced by its middle (the mean) and the network trained on the resulting values (mean method). Another way to proceed is to replace each sample by a set of real-valued samples obtained by simulation, again assuming that the interval [a, b] corresponds to a uniform distribution on [a, b].
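The three input encodings described in this paragraph can be sketched in a few lines of Python. This is only an illustration of the transformations as summarized here, not the implementation of Rossi et al. [5]; the function names, the sample values and the fixed random seed are all assumptions made for the example.

```python
import random

def extremal_values(sample):
    """Flatten J intervals [(a, b), ...] into 2J real inputs (a1, b1, a2, b2, ...)."""
    return [bound for interval in sample for bound in interval]

def mean_method(sample):
    """Replace each interval by its middle, giving J real inputs."""
    return [(a + b) / 2.0 for a, b in sample]

def simulation_method(sample, n_draws, rng):
    """Replace one interval-valued sample by n_draws real-valued samples,
    drawing each coordinate uniformly from its own interval."""
    return [[rng.uniform(a, b) for a, b in sample] for _ in range(n_draws)]

# One hypothetical unit described by J = 2 interval variables.
sample = [(10.0, 14.0), (0.5, 1.5)]
rng = random.Random(0)  # fixed seed so the simulated draws are reproducible

flat = extremal_values(sample)             # [10.0, 14.0, 0.5, 1.5]
middles = mean_method(sample)              # [12.0, 1.0]
draws = simulation_method(sample, 5, rng)  # five real-valued samples

print(flat)
print(middles)
print(len(draws), "simulated samples")
```

In each case the resulting real-valued vectors can be passed to an ordinary MLP, which is why these encodings fit on top of existing software.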

In this paper, interval-valued data as input (and output) for a neural network are introduced and studied with respect to unsupervised learning, in particular self-organizing maps (SOMs). A dissimilarity measure for interval-valued data is illustrated and used in the SOMs, resulting in a simple topology-preserving model named midpoint radius self-organizing maps (MR-SOMs).
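To make the idea concrete, the following is a minimal sketch of a SOM whose prototypes carry both midpoints and radii. The exact dissimilarity and training schedule of the paper are not given in this excerpt, so several choices here are assumptions: an unweighted sum of squared midpoint and radius differences, a one-dimensional map, linearly decaying learning rate and neighborhood width, and invented data. All function names are hypothetical.

```python
import math
import random

def to_midpoint_radius(intervals):
    """Map J intervals [(lower, upper), ...] to (midpoints, radii)."""
    mids = [(lo + up) / 2.0 for lo, up in intervals]
    rads = [(up - lo) / 2.0 for lo, up in intervals]
    return mids, rads

def sq_distance(x, w):
    """Assumed dissimilarity: squared Euclidean distance between midpoints
    plus squared Euclidean distance between radii (unweighted)."""
    (xm, xr), (wm, wr) = x, w
    return (sum((a - b) ** 2 for a, b in zip(xm, wm))
            + sum((a - b) ** 2 for a, b in zip(xr, wr)))

def best_unit(x, units):
    """Index of the prototype closest to x under sq_distance."""
    return min(range(len(units)), key=lambda k: sq_distance(x, units[k]))

def train_som(data, n_units, n_epochs, rng, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM whose prototypes are (midpoints, radii) pairs."""
    # Initialize each prototype as a copy of a randomly chosen data unit.
    units = [tuple(list(v) for v in rng.choice(data)) for _ in range(n_units)]
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1.0 - frac)               # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01  # shrinking neighborhood width
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            bmu = best_unit(x, units)
            for k in range(n_units):
                # Gaussian neighborhood on the 1-D grid of unit indices.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for part in (0, 1):  # 0: midpoints, 1: radii
                    for j in range(len(x[part])):
                        units[k][part][j] += lr * h * (x[part][j] - units[k][part][j])
    return units

# Hypothetical units described by J = 2 interval variables each.
raw = [
    [(1.0, 2.0), (0.0, 1.0)],
    [(1.2, 2.1), (0.1, 0.9)],
    [(8.0, 12.0), (5.0, 9.0)],
    [(8.5, 12.5), (5.5, 9.5)],
]
data = [to_midpoint_radius(u) for u in raw]
rng = random.Random(0)
units = train_som(data, n_units=3, n_epochs=50, rng=rng)
labels = [best_unit(x, units) for x in data]
print(labels)  # map node assigned to each unit
```

Because both the midpoints and the radii enter the distance, two units with similar centers but very different interval widths can be mapped to different nodes, which a SOM trained on midpoints alone could not do.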

The paper is organized as follows. In Section 2 the mathematical formalization and a dissimilarity measure for interval-valued data are introduced; subsequently, SOMs fed with interval-valued data are proposed and their applicability to clustering is discussed. In Section 3 an application of MR-SOMs to segmentation in a telecommunications environment is described.

Self-organizing maps for interval-valued data

In the literature, different clustering approaches for interval-valued data have been suggested [6].

In this paper we consider the SOMs approach because, besides identifying clusters, it organizes them in a topology, as will be shown in the following.

In the following, we define interval-valued data and a suitable distance measure for interval-valued data. Subsequently, we show the suggested unsupervised neural network approach for interval-valued data, called midpoint radius

Experimental results

In this section, in order to demonstrate the effectiveness of the suggested clustering approach, we investigate the applicability of the midpoint radius self-organizing maps in telecommunications market segmentation.

SOMs are widely applicable to telecommunications segmentation.

For instance, an application of SOMs to the segmentation of residential consumers of communication services from the American Telephone and Telegraph Company (AT&T) is described in [21] and the references cited therein. Due to the

Conclusion

In this paper an extension of SOMs to deal with interval-valued data is presented. First, a squared Euclidean distance measure for interval-valued data is shown. Subsequently, this measure is exploited in unsupervised neural networks, i.e. SOMs. The SOMs fed with interval-valued data, called MR-SOMs (midpoint radius SOMs), are applied in a classification framework. The MR-SOMs inherit the usefulness of the considered (squared) distance measure for interval values based on the

Acknowledgments

The authors thank the three referees for their useful comments and suggestions which helped to improve the quality and presentation of this manuscript. This research was partially supported by the following grants of the Italian Ministry of Education, University and Research: PRIN 2008 Research (“Analysis of latent structures: new frontiers in statistical methods and models”) and PRIN 2007 Research (“Dependence analysis in problems with partial information structure”).

References (22)

  • S.J. Simoff, Handling uncertainty in neural networks: an interval approach, IEEE International Conference on Neural Networks (1996)

Pierpaolo D’Urso is a Professor of Statistics at Sapienza University of Rome. He received both his degree and his Ph.D. in Statistics from Sapienza University of Rome. He is an Associate Editor of Information Sciences and of Advances in Computational Research. He has been a member of the Program Committee of the IEEE International Conferences on Granular Computing (IEEE-GrC) since 2006. He is currently a Technical Committee Member of the IEEE Computational Intelligence Society - Granular Computing. He is a referee for several international journals and conferences in statistics and data mining. His main research interests include multivariate analysis and data mining. In particular, his recent research activity is focused on fuzzy clustering, classification of time series, fuzzy regression, and statistical methods in econophysics and neuromarketing.

Livia De Giovanni is a Professor of Statistics at LUISS University of Rome. She received her degree in Statistics from Sapienza University of Rome. From 1988 to 1997 she worked at Telecom Italia, Network Division, Research and Development Department. She is a referee for several international journals and conferences in statistics. Her main research interests include statistical inference in stochastic processes, long memory processes, nonparametric statistics, neural networks, and statistical models in telecommunications.
