Skip to main content

2000 | OriginalPaper | Buchkapitel

Context-Based Similarity Measures for Categorical Databases

verfasst von : Gautam Das, Heikki Mannila

Erschienen in: Principles of Data Mining and Knowledge Discovery

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Similarity between complex data objects is one of the central notions in data mining. We propose certain similarity (or distance) measures between various components of a 0/1 relation. We define measures between attributes, between rows, and between subrelations of the database. They find important applications in clustering, classification, and several other data mining processes. Our measures are based on the contexts of individual components. For example, two products (i.e., attributes) are deemed similar if their respective sets of customers (i.e., subrelations) are similar. This reveals more subtle relationships between components, something that is usually missing in simpler measures. Our problem of finding distance measures can be formulated as a system of nonlinear equations. We present an iterative algorithm which, when seeded with random initial values, converges quickly to stable distances in practice (typically requiring less than five iterations). The algorithm requires only one database scan. Results on artificial and real data show that our method is efficient, and produces results with intuitive appeal.

Metadaten
Titel
Context-Based Similarity Measures for Categorical Databases
verfasst von
Gautam Das
Heikki Mannila
Copyright-Jahr
2000
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/3-540-45372-5_20

Neuer Inhalt