
Extreme re-balancing for SVMs: a case study

Published: 01 June 2004

Abstract

There are many practical applications where learning from single-class examples is either the only possible solution or has a distinct performance advantage. The first case occurs when obtaining examples of a second class is difficult, e.g., classifying sites of "interest" based on web accesses. The second situation is exemplified by the gene knock-out experiments for understanding the Aryl Hydrocarbon Receptor signalling pathway that provided the data for the second task of the KDD 2002 Cup, where minority one-class SVMs significantly outperform models learnt using examples from both classes.

This paper explores the limits of supervised learning of a two-class discrimination from data with heavily unbalanced class proportions. We focus on the case of supervised learning with support vector machines. We consider the impact of both sampling and weighting imbalance-compensation techniques, and then extend the balancing to extreme situations where one of the classes is ignored completely and the learning is accomplished using examples from a single class.

Our investigation with the data for the KDD 2002 Cup, as well as text benchmarks such as the Reuters Newswire, shows that there is a consistent pattern of performance differences between one- and two-class learning for all SVMs investigated, and these patterns persist even with aggressive dimensionality reduction through automated feature selection. Using insight gained from the above analysis, we generate synthetic data showing a similar pattern of performance.
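The weighting compensation mentioned in the abstract can be sketched as follows: each class receives a weight inversely proportional to its frequency, and that weight scales the class's contribution to the SVM hinge loss, so errors on the minority class cost more. This is a minimal, hedged illustration of the general idea; the function names, the n/(2·count) weighting scheme, and the toy data are illustrative assumptions, not taken from the paper's experiments.

```python
# Illustrative sketch (not the paper's implementation): class-frequency
# weights plugged into a linear-SVM hinge loss.

def class_weights(labels):
    """n / (2 * class_count) per class: rarer class gets a larger weight."""
    n = len(labels)
    pos = sum(1 for y in labels if y == +1)
    neg = n - pos
    return {+1: n / (2.0 * pos), -1: n / (2.0 * neg)}

def weighted_hinge_loss(w, b, xs, ys, weights):
    """Mean of weights[y] * max(0, 1 - y * (w . x + b)) over the sample."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += weights[y] * max(0.0, 1.0 - margin)
    return total / len(xs)

# One minority positive among four examples: its errors are weighted 2.0,
# each majority error only 2/3, so both classes contribute equally in total.
ys = [+1, -1, -1, -1]
cw = class_weights(ys)
print(cw[+1], round(cw[-1], 4))  # → 2.0 0.6667
```

With the trivial zero classifier (w = 0, b = 0) every margin is zero and every hinge term is 1, so the weighted loss over the toy sample is (2.0 + 3 · 2/3) / 4 = 1.0: the single minority example now accounts for half of the total loss instead of a quarter.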



  • Published in

    ACM SIGKDD Explorations Newsletter, Volume 6, Issue 1
    Special issue on learning from imbalanced datasets
    June 2004, 117 pages
    ISSN: 1931-0145
    EISSN: 1931-0153
    DOI: 10.1145/1007730
    Copyright © 2004 Authors
    Publisher: Association for Computing Machinery, New York, NY, United States