Abstract
We propose a general framework for incorporating first-order logic (FOL) clauses, which are thought of as an abstract and partial representation of the environment, into kernel machines that learn within a semi-supervised scheme. We rely on a multi-task learning scheme in which each task is associated with a unary predicate defined on the feature space, while higher-level abstract representations consist of FOL clauses built from those predicates. We reuse the mathematical apparatus of kernel machines to solve the problem as the primal optimization of a function composed of the loss on the supervised examples, the regularization term, and a penalty term obtained by enforcing the real-valued constraints derived from the predicates. Unlike in classic kernel machines, however, depending on the logic clauses, the overall function to be optimized is no longer convex. An important contribution is to show that, while tackling the optimization with classic numerical schemes is likely to be hopeless, a stage-based learning scheme is a viable way to attack the problem: the supervised examples are learned first until convergence is reached, and the logic clauses are then enforced. Promising experimental results are reported on artificial learning tasks and on the automatic tagging of BibTeX entries, with emphasis on the comparison with plain kernel machines.
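The following is a minimal sketch of the kind of objective and stage-based schedule described above. It is not the authors' formulation. Several choices are assumptions made for illustration: the two predicates A and B, the single clause "for all x: A(x) => B(x)", the product t-norm translation of the implication, sigmoid squashing of the kernel outputs, cross-entropy loss, an RBF kernel, and plain gradient descent with backtracking. All function names are hypothetical. Stage 1 optimizes only the loss and the regularizer, which is convex in the coefficients. Stage 2 warm-starts from that solution and adds the non-convex logic penalty.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective_and_grad(alpha, K, Y, M, lam, mu):
    """Supervised loss + regularization + mu * logic penalty, and its gradient.

    alpha : (n, 2) expansion coefficients of predicates A and B, f = K @ alpha
    Y, M  : (n, 2) targets in {0, 1} and a 0/1 mask of the labeled entries
    Assumed clause "forall x: A(x) => B(x)", mapped to a real-valued
    constraint with the product t-norm: truth(A => B) = 1 - a * (1 - b).
    """
    n = K.shape[0]
    F = K @ alpha                              # f_j(x_i) for both predicates
    P = sigmoid(F)                             # truth degrees in (0, 1)
    eps = 1e-12
    ce = -(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps))
    loss = (M * ce).sum() / M.sum()            # only labeled entries count
    reg = lam * (alpha * F).sum()              # sum_j alpha_j^T K alpha_j
    a, b = P[:, 0], P[:, 1]
    pen = mu * np.mean(a * (1 - b))            # mean clause violation, all points
    # Gradient w.r.t. F, then chain rule through F = K @ alpha (K symmetric).
    dF = M * (P - Y) / M.sum()
    dF[:, 0] += mu * (1 - b) * a * (1 - a) / n
    dF[:, 1] -= mu * a * b * (1 - b) / n
    grad = K @ dF + 2 * lam * F                # 2 * lam * K @ alpha == 2 * lam * F
    return loss + reg + pen, grad


def gradient_descent(alpha, K, Y, M, lam, mu, tol=1e-6, max_iter=2000):
    """Gradient descent with a backtracking (Armijo) line search."""
    J, g = objective_and_grad(alpha, K, Y, M, lam, mu)
    for _ in range(max_iter):
        gg = (g * g).sum()
        if gg < tol ** 2:
            break
        step = 1.0
        while True:
            cand = alpha - step * g
            J_new, g_new = objective_and_grad(cand, K, Y, M, lam, mu)
            if J_new <= J - 0.5 * step * gg or step < 1e-10:
                break
            step *= 0.5
        alpha, J, g = cand, J_new, g_new
    return alpha


def stage_based_training(K, Y, M, lam=1e-2, mu=1.0):
    """Stage 1: supervised examples only (convex). Stage 2: add the clause."""
    alpha0 = np.zeros((K.shape[0], 2))
    alpha1 = gradient_descent(alpha0, K, Y, M, lam, mu=0.0)
    alpha2 = gradient_descent(alpha1, K, Y, M, lam, mu)  # warm start
    return alpha1, alpha2


def mean_violation(alpha, K):
    """Average product t-norm violation a * (1 - b) of A => B over all points."""
    P = sigmoid(K @ alpha)
    return float(np.mean(P[:, 0] * (1 - P[:, 1])))


# Hypothetical toy task: A(x) := x > 0, B(x) := x > -1, so A => B always holds.
x = np.linspace(-2.0, 2.0, 40)[:, None]
K = rbf_kernel(x, x, gamma=2.0)
Y = np.column_stack([x[:, 0] > 0, x[:, 0] > -1]).astype(float)
M = np.zeros_like(Y)
M[[0, 5, 34, 39], 0] = 1.0   # a few labels for A
M[[0, 2, 15], 1] = 1.0       # B is never labeled where x > 0
alpha1, alpha2 = stage_based_training(K, Y, M)
print(mean_violation(alpha1, K), mean_violation(alpha2, K))
```

In this toy setting, B receives no supervision for x > 0. After stage 1, B is therefore uncertain exactly where A is confidently true. In stage 2 the clause penalty propagates A's evidence to B on those unlabeled points. The warm start is the point of the stage-based scheme: the non-convex penalty is only introduced once the convex supervised problem has settled.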
Additional information
Editors: Paolo Frasconi and Francesca Lisi.
About this article
Cite this article
Diligenti, M., Gori, M., Maggini, M. et al. Bridging logic and kernel machines. Mach Learn 86, 57–88 (2012). https://doi.org/10.1007/s10994-011-5243-x
DOI: https://doi.org/10.1007/s10994-011-5243-x