DOI: 10.1145/2245276.2245457
research-article

Enhancing semi-supervised document clustering with feature supervision

Published: 26 March 2012

ABSTRACT

Traditional semi-supervised clustering relies on limited user supervision, in the form of labeled instances and pairwise instance constraints, to aid unsupervised clustering. For document clustering, however, user supervision can also take other forms, such as labeling a feature by indicating whether it discriminates among clusters. This paper fills this gap by enhancing traditional semi-supervised clustering with feature supervision, which asks the user to label discriminating features while labeling instances or pairwise instance constraints. Various types of semi-supervised clustering algorithms were explored with feature supervision. Our experimental results on several real-world datasets demonstrate that augmenting instance-level supervision with feature-level supervision can significantly improve document clustering performance.
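The abstract does not specify how feature labels enter the clustering algorithm, so the following is only a minimal sketch of one plausible combination, not the paper's method. It assumes a seeded k-means in which labeled documents fix the initial centroids (instance supervision) and user-labeled discriminating features are up-weighted before clustering (feature supervision). The function names, the `boost` factor, and the toy term counts are all hypothetical.

```python
import math

def reweight(doc, labeled_features, boost):
    """Scale user-labeled feature dimensions by `boost`, then L2-normalize."""
    vec = [x * boost if j in labeled_features else x for j, x in enumerate(doc)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(vec, centroids):
    """Index of the centroid with the largest dot product with `vec`."""
    sims = [sum(a * b for a, b in zip(vec, c)) for c in centroids]
    return max(range(len(centroids)), key=sims.__getitem__)

def seeded_kmeans(docs, seeds, labeled_features, boost=3.0, iters=10):
    """Cluster `docs` (lists of term counts).

    seeds: dict mapping document index -> cluster id (instance supervision).
           Assumption: every cluster id 0..k-1 has at least one seed.
    labeled_features: set of feature indices the user marked as discriminating.
    """
    X = [reweight(d, labeled_features, boost) for d in docs]
    k = max(seeds.values()) + 1
    # Initialize each centroid from the seed documents of that cluster.
    cents = [centroid([X[i] for i, c in seeds.items() if c == cid])
             for cid in range(k)]
    assign = [0] * len(X)
    for _ in range(iters):
        # Seed documents keep their user-given label; others go to the nearest centroid.
        assign = [seeds[i] if i in seeds else nearest(x, cents)
                  for i, x in enumerate(X)]
        for cid in range(k):
            members = [X[i] for i, a in enumerate(assign) if a == cid]
            if members:
                cents[cid] = centroid(members)
    return assign

# Toy vocabulary (hypothetical): ["ball", "game", "vote", "the"]
docs = [
    [2, 1, 0, 5],  # seed for cluster 0
    [0, 1, 2, 5],  # seed for cluster 1
    [1, 0, 0, 6],  # unlabeled
    [0, 0, 1, 6],  # unlabeled
]
assignments = seeded_kmeans(docs, seeds={0: 0, 1: 1}, labeled_features={0, 2})
```

In this toy setup the frequent but uninformative term "the" dominates the raw counts; boosting the two labeled features makes the unlabeled documents separate along "ball" and "vote" instead.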

Published in

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SAC '12 paper acceptance rate: 270 of 1,056 submissions, 26%
Overall acceptance rate: 1,650 of 6,669 submissions, 25%
