How do we measure and improve the quality of a hierarchical ontology?

https://doi.org/10.1016/j.jss.2011.07.010

Abstract

Hierarchical ontologies organise information in a form that both humans and machines can understand, but constructing them for reuse and maintainability remains difficult. The available supporting tools often lack a formal methodological underpinning, and their developers are not supported by any accompanying metrics. This paper presents a formal underpinning that provides quality metrics for a taxonomy (a hierarchical ontology) and proposes a methodology for the semi-automatic building of maintainable taxonomies. Users provide the terms used to describe the different ontological elements, together with their attributes and the ranges of values of those attributes. The methodology uses the formalised metrics to assess the quality of the users' input and proposes changes according to given quality constraints. The paper illustrates the metrics and the methodology by constructing and repairing two medium-sized, well-known taxonomies.

Highlights

► We first ask “How Good is Your Hierarchical Ontology?”
► We present a qualitative evaluation framework to answer this question.
► We then ask “Can We Improve/Repair any of the Shortcomings of the Ontology Automatically?”
► We then present two algorithms that use the formal evaluation framework and evaluate the algorithms on two case studies.

Introduction

Domain-specific taxonomies represent information semantics in a shareable and reusable manner (Gruber, 1993). Suitably crafted, they facilitate the exchange of organization-dependent information within and across organizational boundaries and can play a central role in alleviating the political and technical obstacles involved. Taxonomies often come from different organisations and persons with varying agendas and varying quality criteria. Interest in their evaluation, in the context of their design within semantically enabled technologies, has therefore increased significantly in recent years. Examples of such recent work can be found in (Middleton et al., 2004), where a similarity system is presented as a prelude to evaluation, and in (Staab et al., 2001), where the authors propose a complex framework consisting of 160 characteristics spread across five dimensions: content, language, development methodology, building tools, and usage costs.

The significance of a well-formed taxonomic structure cannot be overstated. Taxonomies are a key technology in supporting user-preference search systems over the Semantic Web (Chamiel and Pagnucco, 2008). For example, a hierarchical musical genre system is a user preferencing system: an album can be identified as Progressive Rock, which may be a leaf concept in a taxonomy, but an album can equally be identified with the concept Rock, even though Rock also serves as an abstract concept. Resolving such problems in generalization and classification is not easy; it requires tuning and improving the quality of taxonomies so that the problems are avoided in the first place. Borst (1997) reduces evaluation to three scenarios: a first in which both users and machines need a taxonomy assessment guide to suit their needs; a second in which designers need practical guides to build and evaluate ontologies before publishing them; and a third in which automatic machine learning of taxonomies requires identifying a suitable option among several possibilities in order to adjust the parameters of the learning algorithms appropriately. This paper is well suited to support the imposition of quality requirements in the second scenario. We present a well-founded evaluation framework for the automatic structural assessment of taxonomic ontologies by means of a set of formalised metrics. The paper also formalizes a quality description that is well suited for use in the other two scenarios.

We present and illustrate an algorithmic methodology for building taxonomies with formally specified content. Although the literature contains some examples of such methodologies that accommodate the psychological, mental and real processes that take place, very few of them provide a formal structure (as was also pointed out in Brewster and O’Hara, 2004, Ganter and Wille, 1999). This is particularly problematic in the third scenario identified by Borst (1997) (see above). Our formally specified algorithm to fix malformed taxonomies is especially valuable in settings where taxonomies are developed on the fly by users who are not computer literate, e.g., when collecting user preferences (Balke and Wagner, 2004, Chamiel and Pagnucco, 2008). Such taxonomies can play a central role in knowledge structures generally (e.g., the Semantic Web), where every concept in the hierarchy (not only the leaves) can be referred to not only as an abstract concept (probably through the set of properties it contains) but also as a data concept, that is, a concept which can be associated with asserted instances (i.e., real web objects). Various examples of the use of such structures can be found in the recent literature, e.g., in Fortuna et al. (2007), Kiefer et al. (2007) and Schickel-Zuber and Faltings (2006, 2007).

Our focus in this paper is on generating precise taxonomic levels, with carefully chosen and evaluated inter-level links that connect the most appropriate concepts in the ontology. Various authors have noted the importance of analysing taxonomies and their properties in order to define the ‘goodness’ of taxonomies (Mizoguchi, 2004, Welty and Guarino, 2001). We apply similarity measures between concepts, with particular emphasis on their attributes. Although the evaluation and generation of taxonomies are important parts of most conceptual models, because they provide substantial structural information and are key elements in integration tasks, little research effort has been devoted to how they affect the development of taxonomies. Simperl et al. (2009) present the results of an empirical survey on ontology development in which 148 real-world ontology engineering projects were analysed over 6 months. After calculating the correlation between different aspects of ontology engineering and the associated effort, they conclude that the complexity of domain analysis and ontology evaluation makes these two of the most effort-demanding tasks. Any result that eases these two tasks would therefore yield major efficiency gains in the use of taxonomic ontologies. Ontologies are envisioned to be developed by domain experts with limited or no skills in ontology engineering. The authors conclude that it is paramount to provide appropriate techniques and tools that enable the effective and efficient development and evaluation of ontologies. There is a gap between this recognition and the availability of formal and technical tools, and the work presented here contributes to filling this gap. The rest of this paper is structured as follows: Section 2 presents a formal framework to represent a taxonomy. Section 3 presents a formal framework to represent the quality of the taxonomy.
Based on the formalisms of Sections 2 and 3, Section 4 presents automatic algorithms for the evaluation and construction of a taxonomy. These are illustrated in Section 5 with an example. Finally, Section 6 concludes with further discussion of related work and future plans for the research.

Section snippets

Modelling maintainable taxonomies

Taxonomies have been important for modelling database schemas, knowledge-based systems and semantic vocabularies (Welty and Guarino, 2001). Most ontology building tools only work with ontologies organised around the partial order relation IS-A, through which entities are grouped into, or subsumed by, a higher-level class. In this section, we present our formal ontology model, which is structured around concepts, IS-A relations, and axioms. Ganter and Wille, 1999 and Punera et al. (2006)
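As a rough illustration of a model built around concepts and IS-A relations, the following Python sketch stores each concept with a set of attributes and keeps IS-A acyclic so that it remains a partial order. The class name `Taxonomy`, its methods, and the attribute names in the example are assumptions made for illustration; they are not the formal definitions of the paper, and axioms are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Hypothetical container: concepts with attribute sets and direct IS-A edges."""
    attributes: dict = field(default_factory=dict)  # concept name -> set of attribute names
    parents: dict = field(default_factory=dict)     # concept name -> set of direct parents

    def add_concept(self, name, attrs=()):
        self.attributes.setdefault(name, set()).update(attrs)
        self.parents.setdefault(name, set())

    def ancestors(self, name):
        """All concepts that subsume `name` (transitive closure of IS-A)."""
        seen, stack = set(), list(self.parents.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_is_a(self, child, parent):
        for name in (child, parent):
            self.add_concept(name)
        # Reject edges that would make IS-A cyclic, so it stays a partial order.
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} IS-A {parent} would create a cycle")
        self.parents[child].add(parent)

    def subsumes(self, general, specific):
        return general == specific or general in self.ancestors(specific)


# Example with the musical genres mentioned in the Introduction;
# the attribute names are invented for the example.
genres = Taxonomy()
genres.add_concept("Rock", {"tempo"})
genres.add_concept("Progressive Rock", {"tempo", "song_length"})
genres.add_is_a("Progressive Rock", "Rock")
```

In this sketch, `genres.subsumes("Rock", "Progressive Rock")` holds while the converse does not, and an attempt to add `Rock IS-A Progressive Rock` is rejected because it would break the ordering.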

Modelling and measuring: the quality of a taxonomy

In our framework, an ontology is a set of well-defined, hierarchically presented concepts coupled with a set of axioms (i.e., laws that always hold between the attributes of the same or different concepts). Using the definitions of Section 2, we define the following notions about our taxonomies:

Definition 10:

Well-connectedness: Let C be a non-empty set of well-defined concepts. These are said to be well-connected, written well-connected(C), if the following conditions hold:

  • i) ∀x ∈ C, categorical_context(x, C) ≠ ∅; and

  • ii) y ∈ C

Taxonomy building and evaluation

This section presents algorithms for building and evaluating our taxonomy. Building the ontology is based on two concept refinement processes: a bottom-up process that generates an upper category from a given concept, and a top-down process that generates lower categories from a given concept. The automatic evaluation and refinement of the taxonomic structure is based on identifying a well-defined set of concepts by analysing the similarities of all the attributes involved. This structure
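The idea of a bottom-up step driven by attribute similarity can be sketched as follows. This is only an illustration under stated assumptions: the Jaccard index stands in for whatever similarity measure the paper formalises, the function names are hypothetical, and the pizza-style concepts and attributes are invented for the example.

```python
def jaccard(a, b):
    """Attribute-set similarity in [0, 1]; an assumed stand-in for the paper's measure."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def upper_category(concepts, names):
    """Bottom-up step: the attributes shared by all of the named concepts."""
    names = list(names)
    if not names:
        return set()
    return set.intersection(*(set(concepts[n]) for n in names))


def similar_concepts(concepts, seed, threshold=0.5):
    """Concepts whose attribute sets are at least `threshold`-similar to `seed`."""
    return sorted(
        name
        for name, attrs in concepts.items()
        if name != seed and jaccard(concepts[seed], attrs) >= threshold
    )


# Invented example data: concept name -> set of attribute names.
concepts = {
    "Margherita": {"base", "tomato", "mozzarella"},
    "Napoletana": {"base", "tomato", "mozzarella", "anchovy"},
    "IceCream": {"flavour"},
}

group = ["Margherita"] + similar_concepts(concepts, "Margherita")
shared = upper_category(concepts, group)
```

Here `group` collects the concepts close enough to the seed, and `shared` holds the attributes a candidate upper category would carry. A top-down step could be sketched in the opposite direction, by partitioning the children of a concept according to the attributes that distinguish them.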

Methodology applications

We illustrate our enhancement methodology with two examples. The first is the road safety domain, which continues the context of the examples shown in the 21 definitions in Sections 3 and 4. The second is based on an example well known to ontology practitioners, originally developed at the University of Manchester as a support resource for the Protégé tutorial (Horridge et al., 2007). This is the so-called Pizza

Discussion and future work

In this section we review the works of other authors, whose proposals are close to ours. We discuss and summarise our results and we conclude with our future work.

Acknowledgments

This work was supported in part by the Spanish Government (under projects TIN2006-14780 and PT-2006-055-24ICPP), the Region of Murcia (under project BIO-TEC 06/01-005) and the Australian Research Council (Grant DP1112378).

References (37)

  • E. Blomqvist. OntoCase-Automatic Ontology Enrichment Based on Ontology Design Patterns.
  • Borst, W.N., 1997. Construction of Engineering Ontologies for Knowledge Sharing and Reuse. PhD Thesis. University of...
  • C. Brewster et al. Knowledge representation with ontologies: the present and future. Intelligent Systems, IEEE (2004).
  • G. Chamiel et al. Exploiting ontological information for reasoning with preferences.
  • B. Fortuna et al. OntoG semi-automatic ontology. Lecture Notes in Computer Science Springer (2007).
  • B. Ganter et al. Formal Concept Analysis: Mathematical Foundations (1999).
  • N. Guarino et al. Evaluating ontological decisions with OntoClean. Communications of the ACM (2002).
  • N. Guarino et al. An overview of OntoClean.

    Dr. Ghassan Beydoun received a degree in computer science and a PhD degree in knowledge systems from the University of New South Wales. He is currently a senior lecturer at the School of Information Systems and Technology at the University of Wollongong and an adjunct senior research fellow at the School of Information Systems, Management and Technology at the University of New South Wales. He has authored more than 90 papers in international journals and conferences. He is currently working on a project sponsored by an Australian Research Council Discovery Grant to investigate the best uses of ontologies in developing methodologies for distributed intelligent systems. His other research interests include multi agent systems applications, ontologies and their applications, and knowledge acquisition.

    Antonio A. Lopez-Lorca holds a degree in computer science from the University of Murcia in Spain. He is currently a PhD candidate and lecturer at the School of Information Systems and Technology at the University of Wollongong in Australia. In his PhD, funded by the Australian Research Council, he studies the validation of multi agent systems models using ontologies. He is a co-author of several papers in international journals and conferences. His research interests include multi agent systems, artificial intelligence, knowledge management, ontology modelling and reasoning, and software engineering.

    Dr. Francisco García-Sánchez received his BA, MSc and PhD degrees in computer science from the University of Murcia. He is currently working as PhD Assistant Professor in the Computer Science Department at the University of Valencia. His research interests include agent technology, service-oriented architectures and the semantic web. He has conducted a number of research stays in world-leading research institutes in Ireland, Austria, the United States and Australia, and has published over 40 articles in international journals, international and national conferences and workshops. He has led several projects concerning the development of user interfaces to Semantic Web services execution environments and ontology-based intelligent systems to assist in accessing financial data sources.

    Dr. Rodrigo Martínez-Béjar received his BA and PhD degrees in Computer Science from the University of Murcia, Spain. He got his MSc degree in Computer Science at the University of Málaga, Spain. He is Professor at the Department of Information and Communication Engineering, University of Murcia. His research interests include the development and application of knowledge technologies to different fields such as Medicine, the Semantic Web, E-learning, Bioinformatics, and Rural Development. He has been the leader of a number of national and international research projects. He is co-author of more than 100 articles published in international journals and conferences.
