DOI: 10.1145/2001576.2001746
research-article

Semi-supervised genetic programming for classification

Published: 12 July 2011

ABSTRACT

Learning from unlabeled data offers clear advantages in the many applications where large amounts of unlabeled data are freely available. Semi-supervised learning, which builds models from a small set of labeled examples and a potentially large set of unlabeled examples, is a paradigm that can make effective use of such data. Here we propose KGP, a semi-supervised transductive genetic programming algorithm for classification. Apart from being one of the first semi-supervised algorithms, it is transductive (instead of inductive), i.e., it requires only a training dataset with labeled and unlabeled examples, which should represent the complete data domain. The algorithm relies on the three main assumptions on which semi-supervised algorithms are built, and performs both a global search on labeled instances and a local search on unlabeled instances. Periodically, unlabeled examples are moved to the labeled set after a weighted voting process performed by a committee. Results on eight UCI datasets were compared with those of Self-Training and KNN, and show KGP to be a promising method for semi-supervised learning.
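The step in which a committee's weighted vote moves unlabeled examples into the labeled set can be illustrated with a minimal Python sketch. The abstract does not specify how the committee is formed, how votes are weighted, or when an example is accepted, so everything below is an assumption made for illustration: the function names `weighted_vote` and `promote_confident`, the fixed integer weights, the `threshold` parameter, and the use of plain threshold functions in place of evolved GP classifiers. It is not the KGP implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return (winning_label, share_of_total_weight) for one example."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(weights)


def promote_confident(committee, weights, labeled, unlabeled, threshold=0.75):
    """Move unlabeled examples to the labeled set when the committee agrees.

    `committee` is a list of callables mapping an example to a class label.
    `threshold` (hypothetical) is the minimum weighted vote share required.
    Returns a new labeled list and the examples that stay unlabeled.
    """
    new_labeled = list(labeled)
    still_unlabeled = []
    for x in unlabeled:
        votes = [classifier(x) for classifier in committee]
        label, share = weighted_vote(votes, weights)
        if share >= threshold:
            new_labeled.append((x, label))
        else:
            still_unlabeled.append(x)
    return new_labeled, still_unlabeled


# Toy committee: three 1-D threshold rules standing in for evolved programs.
committee = [lambda x: int(x > 0.5), lambda x: int(x > 0.4), lambda x: int(x > 0.7)]
weights = [3, 2, 1]  # assumed weights, e.g. derived from each member's quality
labeled = [(0.0, 0), (1.0, 1)]
unlabeled = [0.1, 0.45, 0.6, 0.9]

labeled, unlabeled = promote_confident(committee, weights, labeled, unlabeled)
print(labeled)
print(unlabeled)
```

In this toy run the example 0.45 stays unlabeled because the committee is split on it, while the other three examples are promoted with the majority label. A transductive method would repeat such a step periodically during the search, so that the labeled set grows as the committee becomes more reliable.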


        Published in

        GECCO '11: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation
        July 2011
        2140 pages
        ISBN: 9781450305570
        DOI: 10.1145/2001576

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall acceptance rate: 1,669 of 4,410 submissions (38%)
