ABSTRACT
Traditional semi-supervised clustering uses only limited user supervision, in the form of labeled instances and pairwise instance constraints, to aid unsupervised clustering. For document clustering, however, user supervision can also be provided in alternative forms, such as labeling a feature by indicating whether it discriminates among clusters. This paper therefore enhances traditional semi-supervised clustering with feature supervision, which asks the user to label discriminating features while labeling instances or pairwise instance constraints. We explore several types of semi-supervised clustering algorithms with feature supervision. Our experimental results on several real-world datasets demonstrate that augmenting instance-level supervision with feature-level supervision can significantly improve document clustering performance.
Index Terms
- Enhancing semi-supervised document clustering with feature supervision