The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Data can reveal clusters of different geometrical shapes, sizes and densities as demonstrated in Figure 1.1. Clusters can be spherical (a), elongated or “linear” (b), and also hollow (c) and (d). Their prototypes can be points (a), lines (b), spheres (c) or ellipses (d) or their higher-dimensional analogs. Clusters (b) to (d) can be characterized as linear and nonlinear subspaces of the data space (ℝ
in this case). Algorithms that can detect subspaces of the data space are of particular interest for identification. The performance of most clustering algorithms is influenced not only by the geometrical shapes and densities of the individual clusters but also by the spatial relations and distances among the clusters. Clusters can be well separated, continuously connected to each other, or overlapping each other. The separation of clusters is influenced by the scaling and normalization of the data (see Example 1.1, Example 1.2 and Example 1.3).
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten