Abstract
Learning from cluster examples (LCE) is a hybrid task combining features of two common grouping tasks: learning from examples and clustering. In LCE, each training example is a partition of objects. The task is then to learn from a training set, a rule for partitioning unseen object sets. A general method for learning such partitioning rules is useful in any situation where explicit algorithms for deriving partitions are hard to formalize, while individual examples of correct partitions are easy to specify. In the past, clustering techniques have been applied to such problems, despite being essentially unsuited to the task. We present a technique that has qualitative advantages over standard clustering approaches. We demonstrate these advantages by applying our method to problems in two domains; one with dot patterns and one with more realistic vector-data images.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Bensaid, A. M., Hall, L. O., Bezdek, J. C., &; Clarke, L. P. (1996). Partially supervised clustering for image segmentation. Pattern Recognition, 29:5, 859–871.
Breiman, L., Friedman, J. H., Olshen, R. A., &; Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Inc.
Burset, M., &; Guigó, R. (1996). Evaluation of gene structure prediction programs. Genomics, 34, 353–367.
Cheeseman, P., &; Stutz, J. (1996). Bayesian classification (AutoClass): Theory and results. In U. M. Fayyad, G. Diatetsky-Shapiro, P. Smyth, &; R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 153–180). AAAI Press/The MIT Press, Chapt. 6.
Dempster, A. P., Laird, N. M., &; Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (B), 39:1, 1–38.
Emde, W. (1994). Inductive learning of characteristic concept descriptions from small sets of classified examples. In Proc. of European Conference of Macine Learning, (pp. 103–121). [LNAI 784].
Everitt, B. S. (1993). Cluster Analysis. Edward Arnold, 3rd ed.
Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139–172.
Itoh, S. (1992). Application of MDL principle to pattern classification problems. J. of Japanese Society for Artificial Intelligence, 7:4, 608–614 (in Japanese).
Jain, A. K., &; Dubes, R. C. (1988). Algorithms for Clustering Data. Prentice Hall.
Kamishima, T., Minoh, M., &; Ikeda, K. (1995). Rule formulation based on inductive learning for extraction and classification of diagram symbols. Transactions of The Information Processing Society of Japan, 36:3, 614–626 (in Japanese).
McCallum, A., Nigam, K., &; Ungar, L. H. (2000). Efficient clustering of high-dimensional data sets with application to reference matching. In Proc. of ACM SIGKDD (pp. 169–178).
Michalski, R. S. (1993). Inferential theory of learning as a conceptual basis for multistrategy learning. Machine Learning, 11, 111–151.
Pagallo, G., &; Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.
Pavlidis, T. (1992). Why progress in machine vision is so slow. Pattern Recognition Letters, 13, 221–225.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R., &; Rivest, R. L. (1989). Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. J. of the American Statistical Association, 66, 846–850.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 11:2, 416–431.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press.
Urquhart, R. (1982). Graph theoretical clustering based on limited neghbourhood sets. Pattern Recognition, 15:3, 173–187.
Wallace, C. S., &; Patrick, J. D. (1993). Coding decision trees. Machine Learning, 11, 7–22.
Wong, M. A., &; Lane, T. (1983). A kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society (B), 45:3, 362–368.
Yamanishi, K. (1992). A learning criterion for stochastic rules. Machine Learning, 9, 165–203.
Yamanishi, K., &; Han, T. (1992). Introduction to MDL from viewpoints of information theory. J. of Japanese Society for Artificial Intelligence, 7:3, 427–434 (in Japanese).
Zahn, C. T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. on Computers, 20:1, 68–86.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Kamishima, T., Motoyoshi, F. Learning from Cluster Examples. Machine Learning 53, 199–233 (2003). https://doi.org/10.1023/A:1026351106797
Issue Date:
DOI: https://doi.org/10.1023/A:1026351106797