2009 | OriginalPaper | Chapter
Tools for Identification of Structure in Data
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
In recent years, with our increased ability to collect and store data, have come enormous datasets. These datasets may consist of billions of observations and millions of variables. Some of the classical methods of statistical inference, in which a parametric model is studied, are neither feasible nor relevant for analysis of these datasets. The objective is to identify interesting structures in the data, such as clusters of observations, or relationships among the variables. Sometimes, the structures allow a reduction in the dimensionality of the data. Many of the classical methods of multivariate analysis, such as principal components analysis, factor analysis, canonical correlations analysis, and multidimensional scaling, are useful in identifying interesting structures. These methods generally attempt to combine variables in such a way as to preserve information yet reduce the dimension of the dataset. Dimension reduction generally carries a loss of some information. Whether the lost information is important is the major concern in dimension reduction. Another set of methods for reducing the complexity of a dataset attempts to group observations together, combining observations, as it were.