Abstract
Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.
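The abstract names the DROP family but leaves the algorithms themselves to the body of the paper. As a rough, non-authoritative sketch of the decremental idea, the Python below implements a DROP1-style pass under assumed simplifications (plain Euclidean distance, integer class labels, a single sweep): each stored instance is tentatively removed, and the removal is kept only when at least as many of its associates (instances that count it among their k nearest neighbors) are still classified correctly without it. The function names (`k_nearest`, `knn_predict`, `drop1_style_reduce`) are illustrative, not the paper's code.

```python
import numpy as np

def k_nearest(X, idx, candidates, k):
    """Indices of the k nearest members of `candidates` to instance idx
    (idx itself excluded)."""
    pool = [c for c in candidates if c != idx]
    dists = np.linalg.norm(X[pool] - X[idx], axis=1)
    return {pool[i] for i in np.argsort(dists)[:k]}

def knn_predict(X, y, idx, candidates, k=3):
    """Majority-vote label for instance idx using its k nearest
    neighbors drawn from `candidates`."""
    nearest = list(k_nearest(X, idx, candidates, k))
    votes = np.bincount(y[nearest], minlength=int(y.max()) + 1)
    return int(np.argmax(votes))

def drop1_style_reduce(X, y, k=3):
    """One DROP1-flavored decremental pass: drop instance p when at
    least as many of its associates are classified correctly without
    p as with it."""
    S = list(range(len(X)))
    for p in list(S):
        # Associates of p: members of S that have p among their k neighbors.
        associates = [a for a in S if a != p and p in k_nearest(X, a, S, k)]
        with_p = sum(knn_predict(X, y, a, S, k) == y[a] for a in associates)
        S_minus = [s for s in S if s != p]
        without_p = sum(knn_predict(X, y, a, S_minus, k) == y[a]
                        for a in associates)
        if without_p >= with_p:
            S = S_minus  # p is redundant (or noisy); discard it
    return S

# Toy usage: reduce a small two-class dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
kept = drop1_style_reduce(X, y, k=3)
print(f"kept {len(kept)} of {len(X)} instances")
```

A naive sweep like this recomputes neighbor lists from scratch for every candidate and so scales poorly; a practical implementation would cache each instance's neighbor and associate lists and update them incrementally as instances are dropped.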
References
Aha, D. W. (1992). Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, 36, 267–287.
Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6, 37–66.
Batchelor, B. G. (1978). Pattern recognition: ideas in practice. New York: Plenum Press.
Biberman, Y. (1994). A context similarity measure. Proceedings of the European Conference on Machine Learning (ECML-94) (pp. 49–63). Catania, Italy: Springer-Verlag.
Brodley, C. E. (1993). Addressing the selective superiority problem: automatic algorithm/model class selection. Proceedings of the Tenth International Machine Learning Conference, Amherst, MA (pp. 17–24).
Broomhead, D. S. & Lowe, D. (1988). Multi-variable functional interpolation and adaptive networks. Complex Systems, 2, 321–355.
Cameron-Jones, R. M. (1995). Instance selection by encoding length heuristic with random mutation hill climbing. Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence (pp. 99–106).
Carpenter, G. A. & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54–115.
Chang, C.-L. (1974). Finding prototypes for nearest neighbor classifiers. IEEE Transactions on Computers, 23(11), 1179–1184.
Conover, W. J. (1971). Practical nonparametric statistics (pp. 206–209). New York: John Wiley.
Cover, T. M. & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
Dasarathy, B. V. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques. Los Alamitos, CA: IEEE Computer Society Press.
DeGroot, M. H. (1986). Probability and statistics, 2nd edn. Reading, MA: Addison-Wesley.
Diday, E. (1974). Recent progress in distance and similarity measures in pattern recognition. Second International Joint Conference on Pattern Recognition (pp. 534–539).
Dietterich, T. G. (1989). Limitations on inductive learning. Proceedings of the Sixth International Conference on Machine Learning (pp. 124–128). San Mateo, CA: Morgan Kaufmann.
Domingos, P. (1995). Rule induction and instance-based learning: a unified approach. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95) (pp. 1226–1232). Montreal, Canada: Morgan Kaufmann.
Domingos, P. (1996). Unifying instance-based and rule-based induction. Machine Learning, 24, 141–168.
Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
Gates, G. W. (1972). The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18(3), 431–433.
Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, 515–516.
Hecht-Nielsen, R. (1987). Counterpropagation networks. Applied Optics, 26(23), 4979–4984.
Kibler, D. & Aha, D. W. (1987). Learning representative exemplars of concepts: an initial case study. Proceedings of the Fourth International Workshop on Machine Learning (pp. 24–30). Irvine, CA: Morgan Kaufmann.
Kubat, M. & Matwin, S. (1997). Addressing the curse of imbalanced training sets: one-sided selection. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97) (pp. 179–186). San Francisco, CA: Morgan Kaufmann Publishers.
Lowe, D. G. (1995). Similarity metric learning for a variable-kernel classifier. Neural Computation, 7(1), 72–85.
Merz, C. J. & Murphy, P. M. (1996). UCI repository of machine learning databases. Irvine, CA: University of California Irvine, Department of Information and Computer Science. Internet: http://www.ics.uci.edu/~mlearn/MLRepository.html.
Michalski, R. S., Stepp, R. E., & Diday, E. (1981). A recent advance in data analysis: clustering objects into classes characterized by conjunctive concepts. In L. N. Kanal & A. Rosenfeld (Eds.), Progress in Pattern Recognition (Vol. 1, pp. 33–56). New York: North-Holland.
Mitchell, T. M. (1980). The need for biases in learning generalizations. In J. W. Shavlik & T. G. Dietterich (Eds.), Readings in Machine Learning (pp. 184–191). San Mateo, CA: Morgan Kaufmann.
Nadler, M. & Smith, E. P. (1993). Pattern recognition engineering. New York: Wiley.
Papadimitriou, C. H. & Steiglitz, K. (1982). Combinatorial optimization: algorithms and complexity. Englewood Cliffs, NJ: Prentice-Hall.
Papadimitriou, C. H. & Bentley, J. L. (1980). A worst-case analysis of nearest neighbor searching by projection. Lecture Notes in Computer Science (Vol. 85): Automata, Languages and Programming (pp. 470–482). New York: Springer-Verlag.
Renals, S. & Rohwer, R. (1989). Phoneme classification experiments using radial basis functions. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'89) (Vol. 1, pp. 461–467).
Ritter, G. L., Woodruff, H. B., Lowry, S. R., & Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21(6), 665–669.
Rumelhart, D. E. & McClelland, J. L. (1986). Parallel distributed processing (Ch. 8, pp. 318–362). Cambridge, MA: MIT Press.
Salzberg, S. (1991). A nearest hyperrectangle learning method. Machine Learning, 6, 277–309.
Schaffer, C. (1994). A conservation law for generalization performance. Proceedings of the Eleventh International Conference on Machine Learning (ML'94) (pp. 259–265). New Brunswick, NJ: Morgan Kaufmann.
Skalak, D. B. (1994). Prototype and feature selection by sampling and random mutation hill climbing algorithms. Proceedings of the Eleventh International Conference on Machine Learning (ML94) (pp. 293–301). Morgan Kaufmann.
Specht, D. F. (1992). Enhancements to probabilistic neural networks. Proceedings of the International Joint Conference on Neural Networks (IJCNN'92) (Vol. 1, pp. 761–768).
Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6, 579–589.
Stanfill, C. & Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM, 29, 1213–1228.
Tomek, I. (1976). An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, 6(6), 448–452.
Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352.
Wasserman, P. D. (1993). Advanced methods in neural computing (pp. 147–176). New York, NY: Van Nostrand Reinhold.
Watson, I. & Marir, F. (1994). Case-based reasoning: a review. The Knowledge Engineering Review, 9(4). Cambridge, UK: Cambridge University Press.
Wess, S., Althoff, K.-D., & Richter, M. M. (1993). Using k-d trees to improve the retrieval step in case-based reasoning. Topics in Case-Based Reasoning, First European Workshop (EWCBR-93) (pp. 167–181). Springer-Verlag.
Wettschereck, D. (1994). A hybrid nearest-neighbor and nearest-hyperrectangle algorithm. In F. Bergadano & L. De Raedt (Eds.), Proceedings of the 7th European Conference on Machine Learning (pp. 323–335). LNAI, Vol. 784.
Wettschereck, D. & Dietterich, T. G. (1995). An experimental comparison of nearest-neighbor and nearest-hyperrectangle algorithms. Machine Learning, 19(1), 5–28.
Wettschereck, D., Aha, D. W., & Mohri, T. (1997). A review and comparative evaluation of feature weighting methods for a class of lazy learning algorithms. Artificial Intelligence Review, 11 (Special issue on Lazy Learning), 273–314.
Wilson, D. R. & Martinez, T. R. (1996). Heterogeneous radial basis functions. Proceedings of the International Conference on Neural Networks (ICNN'96) (Vol. 2, pp. 1263–1267).
Wilson, D. R. & Martinez, T. R. (1997a). Improved heterogeneous distance functions. Journal of Artificial Intelligence Research (JAIR), 6(1), 1–34.
Wilson, D. R. & Martinez, T. R. (1997b). Improved center point selection for radial basis function networks. Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA'97) (pp. 514–517).
Wilson, D. R. & Martinez, T. R. (1997c). Instance pruning techniques. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97) (pp. 403–411). San Francisco, CA: Morgan Kaufmann Publishers.
Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2(3), 408–421.
Wolpert, D. H. (1993). On overfitting avoidance as bias. Technical Report SFI TR 92-03-5001. Santa Fe, NM: The Santa Fe Institute.
Zhang, J. (1992). Selecting typical instances in instance-based learning. Proceedings of the Ninth International Conference on Machine Learning (pp. 470–479). Aberdeen, Scotland: Morgan Kaufmann.
Cite this article
Wilson, D.R., Martinez, T.R. Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning 38, 257–286 (2000). https://doi.org/10.1023/A:1007626913721