ABSTRACT
The aim of data mining is to find novel and actionable insights in data. However, most algorithms find only a single interpretation of the data, which may be neither novel nor actionable, even though alternatives could exist. The problem of finding an alternative to a given original clustering has received little attention in the literature. Current techniques (including our previous work) are unfocused and unrefined: they broadly attempt to find an alternative clustering but do not specify which properties of the original clustering should or should not be retained. In this work, we explore a principled and flexible framework for finding alternative clusterings of the data. The approach is principled because it poses a constrained optimization problem, so its exact behavior is understood. It is flexible because the user can formally specify positive and negative feedback based on the existing clustering, ranging from which clusters to keep (or discard) to a trade-off between alternativeness and clustering quality.
Index Terms
- A principled and flexible framework for finding alternative clusterings