Abstract
Similarity also plays a crucial role in support vector machines. Similarity assessment plays a key role in lazy learning methods such as k-nearest neighbor or case-based reasoning. In this paper we will show how refinement graphs, that were originally introduced for inductive learning, can be employed to assess and reason about similarity. We will define and analyze two similarity measures, S λ and S π , based on refinement graphs. The anti-unification-based similarity, S λ , assesses similarity by finding the anti-unification of two instances, which is a description capturing all the information common to these two instances. The property-based similarity, S π , is based on a process of disintegrating the instances into a set of properties, and then analyzing these property sets. Moreover these similarity measures are applicable to any representation language for which a refinement graph that satisfies the requirements we identify can be defined. Specifically, we present a refinement graph for feature terms, in which several languages of increasing expressiveness can be defined. The similarity measures are empirically evaluated on relational data sets belonging to languages of different expressiveness.
Article PDF
Similar content being viewed by others
References
Aamodt, A., & Plaza, E. (1994). Case-based reasoning: foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications, 7(1), 39–59.
Aït-Kaci, H. (2007). Description logic vs. order-sorted feature logic. In Description Logics.
Aït-Kaci, H., & Podelski, A. (1992). Towards a meaning of LIFE. Technical Report 11, Digital Research Laboratory.
Aït-Kaci, H., Podelski, A., & Goldstein, S. C. (1993). Order-sorted feature theory unification. In ILPS ’93: Proc. 1993 international symposium on logic programming (pp. 506–524). Cambridge: MIT Press.
Aït-Kaci, H., & Sasaki, Y. (2001). An axiomatic approach to feature term generalization. In EMCL ’01: Proceedings of the 12th European conference on machine learning (pp. 1–12). London: Springer.
Arcos, J. L. (1997). The NOOS representation language. PhD thesis, Universitat Politècnica de Catalunya.
Armengol, E., & Plaza, E. (2003). Relational case-based reasoning for carcinogenic activity prediction. Artificial Intelligence Review, 20(1–2), 121–141.
Armengol, E., & Plaza, E. (2005). Lazy learning for predictive toxicology based on a chemical ontology. In W. Dubitzky & F. Azuaje (Eds.), Artificial intelligence methods and tools for systems biology (Vol. 5, pp. 1–18). Berlin: Springer.
Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., & Patel-Schneider, P. F. (Eds.) (2003). The description logic handbook: theory, implementation, and applications. Cambridge: Cambridge University Press.
Bergmann, R., & Stahl, A. (1998). Similarity measures for object-oriented case representations. In Lecture notes in artificial intelligence: Vol. 1488. Advances in case-based reasoning, EWCBR-98 (pp. 8–13). Berlin: Springer.
Bisson, G. (1992). Learing in FOL with a similarity measure. In Proceedings of AAAI 1992 (pp. 82–87).
Börner, K. (1993). Structural similarity as a guidance in case-based design. In Lecture notes in computer science: Vol. 837. Topics in case-based reasoning: EWCBR’93 (pp. 197–208).
Carpenter, B. (1992). Cambridge tracts in theoretical computer science: Vol. 32. The logic of typed feature structures. Cambridge: Cambridge University Press.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
d’Amato, C., Staab, S., & Fanizzi, N. (2008). On the influence of description logics ontologies on conceptual similarity. In Lecture notes in computer science: Vol. 5268. EKAW ’08, Proc. 16th int. conf. on knowledge engineering: practice and patterns (pp. 48–63).
Dietterich, T., Domingos, P., Getoor, L., Muggleton, S., & Tadepalli, P. (2008). Structured machine learning: the next ten years. Machine Learning, 3–23.
Emde, W., & Wettschereck, D. (1996). Relational instance-based learning. In L. Saitta (Ed.), Proceedings 13th international conference on machine learning (pp. 122–130). Los Altos: Kaufmann.
Fanizzi, N., D’Amato, C., & Esposito, F. (2007). Induction of optimal semi-distances for individuals based on feature sets. In Proc. 2007 international workshop on description logics. CEUR-WS.
Fanizzi, N., D’Amato, C., & Esposito, F. (2008). DL-FOIL concept learning in description logics. In Lecture notes in computer science: Vol. 5194. ILP ’08: Proceedings of the 18th int. conf. inductive logic programming (pp. 107–121). Berlin: Springer.
González-Calero, P. A., Díaz-Agudo, B., & Gómez-Albarrán, M. (1999). Applying DLs for retrieval in case-based reasoning. In Proceedings of the 1999 description logics workshop (DL’99).
Helma, C., King, R., Kramer, S., & Srinivasan, A. (2001). The predictive toxicology challenge 2000–2001. Bioinformatics, 17, 107–108.
Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of 8th annual conf. cognitive science society (pp. 1–12).
Horváth, T., Wrobel, S., & Bohnebeck, U. (2001). Relational instance-based learning with lists and terms. Machine Learning, 43(1/2), 53–80.
Hutchinson, A. (1997). Metrics on terms and clauses. In Lecture notes in computer science: Vol. 1224. ECML ’97: Proceedings of the 9th European conference on machine learning (pp. 138–145). Berlin: Springer.
Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the twentieth international conference (ICML 2003) (pp. 321–328). Menlo Park: AAAI Press.
Larson, J., & Michalski, R. S. (1977). Inductive inference of VL decision rules. SIGART Bulletin, 63, 38–44.
Lavrač, N., & Džeroski, S. (1994). Inductive logic programming. Techniques and applications. Chichester: Ellis Horwood.
Lehmann, J., & Hitzler, P. (2008). Foundations of refinement operators for description logics. In Lecture notes in computer science: Vol. 4894. Proceedings of the 17th international conference on inductive logic programming (ILP) (pp. 161–174). Berlin: Springer.
Lehmann, J., & Hitzler, P. (2008). A refinement operator based learning algorithm for the ALC description logic. In Lecture notes in computer science: Vol. 4894. Proceedings of the 17th international conference on inductive logic programming (ILP) (pp. 147–160). Berlin: Springer.
Michie, D., Muggleton, S., Page, D., & Srinivasan, A. (1994). To the international computing community: a new East–West challenge. Technical report, Oxford University Computing Laboratory, Oxford, UK.
Mitchell, T. (1982). Generalization as search. Artificial Intelligence, 18(2), 203–226.
Ontañón, S., & Plaza, E. (2009). On similarity measures based on a refinement lattice. In D. Wilson & L. McGinty (Eds.), Lecture notes in computer science: Vol. 5650. Proc. 8th int. conf. on case-based reasoning (pp. 240–255). Berlin: Springer.
Plaza, E. (1995). Cases as terms: a feature term approach to the structured representation of cases. In M. Veloso & A. Aamodt (Eds.), Lecture notes in artificial intelligence: Vol. 1010. Case-based reasoning, ICCBR-95 (pp. 265–276). Berlin: Springer.
Plaza, E., Armengol, E., & Ontañón, S. (2005). The explanatory power of symbolic similarity in case-based reasoning. Artificial Intelligence Review, 24(2), 145–161.
Plotkin, G. D. (1970). A note on inductive generalization. In Machine intelligence (Vol. 5, pp. 153–163). Edinburgh: Edinburgh University Press.
Pollard, C. J., & Moshier, M. D. (1990). Unifying partial descriptions of sets. In P. P. Hanson (Ed.), Information, language and cognition (Vol. 1, pp. 167–184). Vancouver: University of British Columbia Press.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Raymond, J. W., Blankley, C. J., & Willett, P. (2003). Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. Journal of Molecular Graphics and Modelling, 21(5), 421–433.
Raymond, J. W., Gardiner, E. J., & Willett, P. (2002). Rascal: calculation of graph similarity using maximum common edge subgraphs. Computer Journal, 45(6), 631–644.
Sánchez-Ruiz, A. A., Ontañón, S., González-Calero, P. A., & Plaza, E. (2011). Measuring similarity in description logics using refinement operators. In Lecture notes in computers science: Vol. 6880. Proc. 19th int. conf. on case-based reasoning (pp. 289–303). Berlin: Springer.
Shapiro, E. Y. (1981). Inductive inference of theories from facts. Technical report 192, Dept. of Computer Science, Yale University.
Tversky, A. (1977). Features of similarity. In Psychological review (Vol. 84, pp. 327–352).
van der Laag, P. R. J., & Nienhuys-Cheng, S.-H. (1992). Subsumption and refinement in model inference. Discussion paper 7, Erasmus University Rotterdam, Faculty of Economics.
van der Laag, P. R. J., & Nienhuys-Cheng, S.-H. (1994). Existence and nonexistence of complete refinement operators. In ECML-94: Proceedings of the European conference on machine learning (pp. 307–322). New York: Springer.
Willett, P., Barnard, J. M., & Downs, G. M. (1998). Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 38(6), 983–996.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editor: David Page.
Rights and permissions
About this article
Cite this article
Ontañón, S., Plaza, E. Similarity measures over refinement graphs. Mach Learn 87, 57–92 (2012). https://doi.org/10.1007/s10994-011-5274-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-011-5274-3