2003 | OriginalPaper | Chapter
Comparing Clusterings by the Variation of Information
Author : Marina Meilă
Published in: Learning Theory and Kernel Machines
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering ${\cal C}$ to clustering ${\cal C}'$. The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings. The basic properties of VI are presented and discussed from the point of view of comparing clusterings. In particular, the VI is positive, symmetric and obeys the triangle inequality. Thus, surprisingly enough, it is a true metric on the space of clusterings.