1 Introduction
2 Neural network clustering methods
2.1 Self-organising neural network map
- \( K_{j,c(x_i)}(n) \) is the neighbourhood function between each unit \( j \) on the map and the winning unit \( c(x_i) \) at the nth training step.
- \( \delta_{j,c(x_i)} \) is the Euclidean distance from the position of unit \( j \) to the winning unit \( c(x_i) \) on the map.
- \( \sigma(n) \) is the effective width of the topological neighbourhood at the nth training step; it moderates the learning step during training iterations. The effective width shrinks over time to facilitate the convergence of the map.
- \( \alpha(n) \) is the learning rate, which depends on the iteration number \( n \); it is initialised to a value of around 0.1 and decreases from \( \alpha_{\max} \) to \( \alpha_{\min} \) during training.
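
To make the roles of these four quantities concrete, the following is a minimal sketch of one common SOM training loop. It is an illustration under stated assumptions, not the exact formulation used in this paper: the Gaussian form of the neighbourhood function, the exponential decay schedules, and the names `decay`, `neighbourhood`, `train_som`, `sigma_max`, `sigma_min` and the value of `alpha_min` are all assumptions introduced here.

```python
import numpy as np


def decay(start, end, frac):
    """Exponential decay from `start` (frac=0) to `end` (frac=1). Assumed schedule."""
    return start * (end / start) ** frac


def neighbourhood(grid, winner, sigma):
    """Assumed Gaussian neighbourhood K_{j,c(x_i)}(n) for every unit j."""
    # delta_{j,c(x_i)}: Euclidean distance on the map between unit j and the winner
    delta = np.linalg.norm(grid - grid[winner], axis=1)
    return np.exp(-delta ** 2 / (2.0 * sigma ** 2))


def train_som(data, rows=3, cols=3, epochs=100,
              alpha_max=0.1, alpha_min=0.01,
              sigma_max=None, sigma_min=0.5, seed=0):
    """Train a rectangular SOM; returns (weights, grid positions)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    # Map coordinates of each unit, one row per unit
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise weights uniformly inside the bounding box of the data
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows * cols, n_features))
    if sigma_max is None:
        sigma_max = max(rows, cols) / 2.0
    n_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            frac = step / max(n_steps - 1, 1)
            alpha = decay(alpha_max, alpha_min, frac)   # alpha(n)
            sigma = decay(sigma_max, sigma_min, frac)   # sigma(n)
            x = data[i]
            # Winning unit c(x_i): closest weight vector to the input
            winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            k = neighbourhood(grid, winner, sigma)
            # Move every unit towards x, scaled by alpha(n) and its neighbourhood value
            weights += alpha * k[:, None] * (x - weights)
            step += 1
    return weights, grid
```

Calling `train_som(data)` on an array of shape `(samples, features)` returns one weight vector per map unit. Because `sigma` shrinks over time, early updates move large regions of the map while late updates mostly refine the winning unit.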
3 Related work
4 Methodology
4.1 SOM weights analysis with quantization error method
4.2 Weighted self-organising neural network map
5 Experiment
5.1 Synthetic datasets
Dataset name | Samples | Input features | Classes | Description |
---|---|---|---|---|
Synthetic_Data01 (Normalised) | 100 | 4 | 5 | All classes defined by first 4 related features. A simple dataset with no irrelevant inputs or outliers, created mainly for exploring the cost functions of the two self-organising algorithms. |
Synthetic_Data02 (Normalised) | 1220 | 7 | 5 | All classes defined by first 4 related features. Irrelevant features: 5, 6, 7. Irrelevant inputs are clearly separated from the relevant inputs for easy identification by the algorithms. |
Synthetic_Data03 (Normalised) | 1220 | 10 | 5 | Classes defined by features independently with equal distribution. Class1 = 1, 2 & 3; Class2 = 4, 5 & 6; Class3 = 2, 3, 4 & 5; Class4 = 6, 7 & 8; Class5 = 1, 4 & 8. Noise features: 9 & 10. Extends Synthetic_Data02 by distributing the definition of classes among the variables, to test each self-organising method's ability to identify the degree of relevance of the input features for classification. |
Synthetic_Data04 (Normalised) | 1220 | 9 | 5 | Classes defined by features independently with unequal distribution. Class1 (550 samples) = 1, 2 & 3; Class2 (300 samples) = 1, 2 & 3; Class3 (200 samples) = 2, 4 & 5; Class4 (100 samples) = 1, 3, 5 & 6; Class5 (70 samples) = 1, 3, 4 & 7. Noise features: 8 & 9. |
Synthetic_Data05 (Unnormalised) | 1220 | 7 | 5 | All classes defined by first 4 related features. Irrelevant features: 5, 6, 7. Created to evaluate the self-organising system's performance in identifying irrelevant inputs from unnormalised datasets having features of unequal variance. |
WaveForm dataset (Normalised) | 5000 | 40 | 3 | As described by [36], the first 21 inputs of the waveform data describe the classes; the latter 19 are completely irrelevant noise features with mean 0 and variance 1. More details can be found in the UCI repository online. No information is provided on which of the first 21 inputs describe each class. |
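
As an illustration of the kind of data summarised above, the sketch below builds a dataset with the same shape as Synthetic_Data02: 1220 samples, 7 inputs, and 5 classes defined by the first 4 inputs, with inputs 5 to 7 irrelevant. The source does not state how the datasets were generated, so the Gaussian clusters, the centre range, the spreads, the noise distribution, the min-max normalisation and the function name `make_synthetic` are all assumptions made for this example.

```python
import numpy as np


def make_synthetic(n_samples=1220, n_relevant=4, n_irrelevant=3,
                   n_classes=5, seed=0):
    """Hypothetical generator for a Synthetic_Data02-shaped dataset."""
    rng = np.random.default_rng(seed)
    # Assign each sample to one of the classes at random (assumed balanced on average)
    labels = rng.integers(0, n_classes, size=n_samples)
    # Relevant inputs: one Gaussian cluster per class around a random centre
    centres = rng.uniform(-5.0, 5.0, size=(n_classes, n_relevant))
    relevant = centres[labels] + rng.normal(0.0, 0.5, size=(n_samples, n_relevant))
    # Irrelevant inputs: noise drawn independently of the class label
    irrelevant = rng.normal(0.0, 1.0, size=(n_samples, n_irrelevant))
    data = np.hstack([relevant, irrelevant])
    # Min-max normalise each input to [0, 1] (one possible meaning of "Normalised")
    lo, hi = data.min(axis=0), data.max(axis=0)
    data = (data - lo) / (hi - lo)
    return data, labels
```

With the defaults, `make_synthetic()` returns a `(1220, 7)` array and a label vector. Skipping the normalisation step and giving the inputs different spreads would give a rough analogue of the unnormalised Synthetic_Data05 case.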
5.2 Experiment design
Clustering Synthetic_Data01. Training parameters: 3 × 3 map with rectangular grid topology; 1000 training epochs; learning rate 0.1.

Run | Weighted SOM: identified important inputs | Weighted SOM: correct classes found (all inputs) | Weighted SOM: correct classes found (selected inputs) | Standard SOM: identified important attributes | Standard SOM: correct classes found (all inputs) | Standard SOM: correct classes found (selected inputs) |
---|---|---|---|---|---|---|
Run 1 | 1/4 | 1/5 | 0/5 | 4/4 | 5/5 | 5/5 |
Run 2 | 1/4 | 1/5 | 0/5 | 4/4 | 5/5 | 4/5 |
Run 3 | 1/4 | 1/5 | 1/5 | 4/4 | 5/5 | 5/5 |
Run 4 | 1/4 | 2/5 | 1/5 | 4/4 | 5/5 | 5/5 |
Run 5 | 2/4 | 1/5 | 2/5 | 4/4 | 4/5 | 5/5 |
Run 6 | 1/4 | 0/5 | 0/5 | 4/4 | 5/5 | 5/5 |
Run 7 | 1/4 | 1/5 | 0/5 | 4/4 | 5/5 | 5/5 |
Run 8 | 1/4 | 0/5 | 1/5 | 4/4 | 5/5 | 5/5 |
Run 9 | 1/4 | 2/5 | 1/5 | 4/4 | 5/5 | 5/5 |
Run 10 | 1/4 | 0/5 | 0/5 | 4/4 | 4/5 | 5/5 |

Clustering Synthetic_Data02. Training parameters: 3 × 3 map with rectangular grid topology; 1000 training epochs; learning rate 0.1.

Run | Weighted SOM: identified important inputs | Weighted SOM: correct classes found (all inputs) | Weighted SOM: correct classes found (selected inputs) | Standard SOM: identified important attributes | Standard SOM: correct classes found (all inputs) | Standard SOM: correct classes found (selected inputs) |
---|---|---|---|---|---|---|
Run 1 | 1/4 | 0/5 | 0/5 | 4/4 | 2/5 | 5/5 |
Run 2 | 0/4 | 0/5 | – | 4/4 | 1/5 | 5/5 |
Run 3 | 2/4 | 1/5 | 2/5 | 4/4 | 2/5 | 5/5 |
Run 4 | 1/4 | 0/5 | 1/5 | 3/4 | 1/5 | 4/5 |
Run 5 | 2/4 | 0/5 | 1/5 | 4/4 | 1/5 | 5/5 |
Run 6 | 3/4 | 1/5 | 1/5 | 4/4 | 2/5 | 5/5 |
Run 7 | 1/4 | 1/5 | 1/5 | 4/4 | 3/5 | 5/5 |
Run 8 | 2/4 | 0/5 | 2/5 | 4/4 | 2/5 | 5/5 |
Run 9 | 0/4 | 0/5 | – | 4/4 | 3/5 | 5/5 |
Run 10 | 1/4 | 0/5 | 1/5 | 4/4 | 2/5 | 5/5 |

Clustering Synthetic_Data03. Training parameters: 3 × 3 map with rectangular grid topology; 1000 training epochs; learning rate 0.1.

Run | Weighted SOM: identified important inputs | Weighted SOM: correct classes found (all inputs) | Weighted SOM: correct classes found (selected inputs) | Standard SOM: identified important attributes | Standard SOM: correct classes found (all inputs) | Standard SOM: correct classes found (selected inputs) |
---|---|---|---|---|---|---|
Run 1 | 0/8 | 0/5 | – | 2/8 | 3/5 | 2/5 |
Run 2 | 0/8 | 0/5 | – | 4/8 | 1/5 | 3/5 |
Run 3 | 1/8 | 1/5 | 0/5 | 2/8 | 1/5 | 2/5 |
Run 4 | 0/8 | 0/5 | – | 3/8 | 0/5 | 2/5 |
Run 5 | 1/8 | 0/5 | 0/5 | 1/8 | 1/5 | 1/5 |
Run 6 | 2/8 | 0/5 | 1/5 | 1/8 | 0/5 | 1/5 |
Run 7 | 1/8 | 1/5 | 1/5 | 5/8 | 1/5 | 4/5 |
Run 8 | 1/8 | 0/5 | 0/5 | 1/8 | 2/5 | 2/5 |
Run 9 | 0/8 | 0/5 | – | 2/8 | 1/5 | 2/5 |
Run 10 | 1/8 | 0/5 | 0/5 | 2/8 | 1/5 | 2/5 |

Clustering Synthetic_Data04. Training parameters: 3 × 3 map with rectangular grid topology; 1000 training epochs; learning rate 0.1.

Run | Weighted SOM: identified important inputs | Weighted SOM: correct classes found (all inputs) | Weighted SOM: correct classes found (selected inputs) | Standard SOM: identified important attributes | Standard SOM: correct classes found (all inputs) | Standard SOM: correct classes found (selected inputs) |
---|---|---|---|---|---|---|
Run 1 | 2/7 | 1/5 | 1/5 | 4/7 | 3/5 | 2/5 |
Run 2 | 1/7 | 1/5 | 0/5 | 4/7 | 2/5 | 3/5 |
Run 3 | 1/7 | 1/5 | 0/5 | 2/7 | 1/5 | 1/5 |
Run 4 | 1/7 | 1/5 | 0/5 | 4/7 | 4/5 | 2/5 |
Run 5 | 1/7 | 1/5 | 0/5 | 3/7 | 2/5 | 2/5 |
Run 6 | 0/7 | 0/5 | – | 4/7 | 2/5 | 3/5 |
Run 7 | 2/7 | 1/5 | 1/5 | 4/7 | 2/5 | 3/5 |
Run 8 | 1/7 | 1/5 | 0/5 | 2/7 | 1/5 | 1/5 |
Run 9 | 1/7 | 1/5 | 0/5 | 2/7 | 1/5 | 1/5 |
Run 10 | 1/7 | 1/5 | 0/5 | 4/7 | 2/5 | 3/5 |

Clustering Synthetic_Data05. Training parameters: 3 × 3 map with rectangular grid topology; 1000 training epochs; learning rate 0.1.

Run | Weighted SOM: identified important inputs | Weighted SOM: correct classes found (all inputs) | Weighted SOM: correct classes found (selected inputs) | Standard SOM: identified important attributes | Standard SOM: correct classes found (all inputs) | Standard SOM: correct classes found (selected inputs) |
---|---|---|---|---|---|---|
Run 1 | 2/4 | 0/5 | 1/5 | 3/4 | 3/5 | 5/5 |
Run 2 | 0/4 | 0/5 | – | 4/4 | 2/5 | 5/5 |
Run 3 | 1/4 | 1/5 | 2/5 | 4/4 | 2/5 | 5/5 |
Run 4 | 1/4 | 0/5 | 1/5 | 4/4 | 2/5 | 5/5 |
Run 5 | 1/4 | 0/5 | 2/5 | 4/4 | 2/5 | 5/5 |
Run 6 | 0/4 | 1/5 | – | 4/4 | 2/5 | 5/5 |
Run 7 | 2/4 | 1/5 | 0/5 | 4/4 | 2/5 | 5/5 |
Run 8 | 1/4 | 1/5 | 1/5 | 4/4 | 2/5 | 5/5 |
Run 9 | 0/4 | 1/5 | – | 4/4 | 3/5 | 5/5 |
Run 10 | 1/4 | 1/5 | 0/5 | 3/4 | 2/5 | 5/5 |

Clustering waveform data. Training parameters: 26 × 14 map with rectangular grid topology; 1000 training epochs; learning rate 0.1.

Run | Weighted SOM: identified important inputs | Weighted SOM: correct classes found (selected inputs) | Standard SOM: identified important attributes | Standard SOM: correct classes found (all inputs) | Standard SOM: correct classes found (selected inputs) |
---|---|---|---|---|---|
Run 1 | 2/20 | 1/3 | 18/20 | 1/3 | 2/3 |
Run 2 | 9/20 | 1/3 | 18/20 | 1/3 | 3/3 |
Run 3 | 19/20 | 2/3 | 15/20 | 2/3 | 2/3 |
Run 4 | 11/20 | 1/3 | 18/20 | 1/3 | 3/3 |
Run 5 | 5/20 | 1/3 | 18/20 | 1/3 | 3/3 |
Run 6 | 19/20 | 2/3 | 18/20 | 2/3 | 2/3 |
Run 7 | 15/20 | 2/3 | 19/20 | 1/3 | 3/3 |
Run 8 | 2/20 | 1/3 | 18/20 | 1/3 | 2/3 |
Run 9 | 16/20 | 2/3 | 17/20 | 1/3 | 2/3 |
Run 10 | 11/20 | 1/3 | 18/20 | 1/3 | 3/3 |