ABSTRACT
Learning from unlabeled data benefits a wide range of applications in which large amounts of unlabeled data are freely available. Semi-supervised learning, which builds models from a small set of labeled examples and a potentially large set of unlabeled examples, is a paradigm that can use such data effectively. Here we propose KGP, a semi-supervised transductive genetic programming algorithm for classification. Apart from being one of the first semi-supervised algorithms, it is transductive (instead of inductive), i.e., it requires only a training dataset with labeled and unlabeled examples, which should represent the complete data domain. The algorithm relies on the three main assumptions on which semi-supervised algorithms are built, and performs both a global search on labeled instances and a local search on unlabeled instances. Periodically, unlabeled examples are moved to the labeled set after a weighted voting process performed by a committee. Results on eight UCI datasets, compared against Self-Training and KNN, show KGP to be a promising method for semi-supervised learning.
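The step in which a committee votes unlabeled examples into the labeled set can be illustrated with a minimal sketch. This is not the KGP implementation: the abstract does not say how the committee is formed, how the weights are set, or what acceptance rule is used. The names `weighted_vote`, `promote`, and `min_share`, the three toy classifiers, and the weights are all assumptions made for illustration.

```python
from collections import defaultdict

def weighted_vote(committee, weights, x):
    """Return (winning_label, share of the total weight it received)."""
    tally = defaultdict(float)
    for classify, weight in zip(committee, weights):
        tally[classify(x)] += weight
    label = max(tally, key=tally.get)
    return label, tally[label] / sum(weights)

def promote(labeled, unlabeled, committee, weights, min_share=0.75):
    """Move examples the committee agrees on from unlabeled to labeled.

    min_share is a hypothetical acceptance threshold, not taken from the paper.
    """
    remaining = []
    for x in unlabeled:
        label, share = weighted_vote(committee, weights, x)
        if share >= min_share:
            labeled.append((x, label))   # accepted: treated as labeled from now on
        else:
            remaining.append(x)          # too contested: stays unlabeled
    return labeled, remaining

# Hypothetical committee of three simple threshold classifiers.
committee = [
    lambda x: "pos" if x[0] > 0.5 else "neg",
    lambda x: "pos" if x[1] > 0.5 else "neg",
    lambda x: "pos" if x[0] + x[1] > 1.0 else "neg",
]
weights = [0.5, 0.3, 0.2]  # assumed; e.g. could reflect accuracy on labeled data

labeled = [((0.9, 0.9), "pos")]
unlabeled = [(0.8, 0.7), (0.6, 0.1), (0.1, 0.2)]
labeled, unlabeled = promote(labeled, unlabeled, committee, weights)
```

In this toy run the committee is unanimous on (0.8, 0.7) and (0.1, 0.2), so both move to the labeled set, while (0.6, 0.1) splits the vote evenly and stays unlabeled. In a periodic scheme such as the one the abstract describes, a step like this would be repeated as the committee is retrained.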
Index Terms
- Semi-supervised genetic programming for classification