ABSTRACT
The paper is concerned with two-class active learning. While the common approach to collecting data in active learning is to select samples close to the classification boundary, better performance can be achieved by also taking the prior data distribution into account. The main contribution of the paper is a formal framework that incorporates clustering into active learning. The algorithm first constructs a classifier on the set of cluster representatives, and then propagates the classification decision to the other samples via a local noise model. The proposed model makes it possible to select the most representative samples and to avoid repeatedly labeling samples in the same cluster. During the active learning process, the clustering is adjusted with a coarse-to-fine strategy in order to balance the advantage of large clusters against the accuracy of the data representation. Experiments on image databases show that our algorithm outperforms current methods.
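As a rough illustration of the pre-clustering idea described in the abstract — cluster the unlabeled pool, query labels only for cluster representatives, then propagate those labels to the remaining samples — the following pure-Python sketch combines k-means, a small logistic-regression uncertainty score weighted by cluster size, and hard label propagation. It is a minimal sketch under stated assumptions, not the authors' implementation: the `oracle` callback, the uncertainty-times-size selection score, and the hard propagation (in place of the paper's local noise model) are all illustrative simplifications.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns the centers and a cluster index per point."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist2(p, centers[c]))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centers, assign

def representative(points, assign, centers, c):
    """Cluster representative = the member closest to the centroid."""
    idx = [i for i in range(len(points)) if assign[i] == c]
    return min(idx, key=lambda i: dist2(points[i], centers[c]))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Tiny 2-D logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(w[0] * xi[0] + w[1] * xi[1] + b) - yi
            w[0] -= lr * g * xi[0]
            w[1] -= lr * g * xi[1]
            b -= lr * g
    return w, b

def active_learn(points, oracle, k, budget):
    """Pre-clustering active learning sketch; assumes budget <= k."""
    centers, assign = kmeans(points, k)
    reps = [representative(points, assign, centers, c) for c in range(k)]
    sizes = [sum(1 for a in assign if a == c) for c in range(k)]
    labeled = {}  # rep index -> label queried from the oracle
    for _ in range(budget):
        unqueried = [c for c in range(k) if reps[c] not in labeled]
        if len(labeled) < 2:
            # Bootstrap: query representatives of the largest clusters first.
            c = max(unqueried, key=lambda c: sizes[c])
        else:
            # Classifier on the labeled representatives; pick the rep with
            # the highest uncertainty p(1-p), weighted by cluster size.
            w, b = train_logreg([points[i] for i in labeled],
                                list(labeled.values()))
            def score(c):
                p = sigmoid(w[0] * points[reps[c]][0]
                            + w[1] * points[reps[c]][1] + b)
                return p * (1 - p) * sizes[c]
            c = max(unqueried, key=score)
        labeled[reps[c]] = oracle(reps[c])
    # Hard propagation: every sample inherits its representative's label
    # (the paper uses a local noise model instead of this hard assignment).
    rep_label = {c: labeled[reps[c]] for c in range(k) if reps[c] in labeled}
    return [rep_label.get(assign[i]) for i in range(len(points))]
```

For example, on two well-separated 2-D blobs with `k=2` and a budget of two queries, the sketch labels only the two representatives yet assigns a label to every point in the pool, which is the labeling economy the abstract's cluster-propagation argument is about.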
Active learning using pre-clustering