How do we measure and improve the quality of a hierarchical ontology?

doi:10.1016/j.jss.2011.07.010

Journal of Systems and Software

Volume 84, Issue 12, December 2011, Pages 2363-2373

https://doi.org/10.1016/j.jss.2011.07.010

Abstract

Hierarchical ontologies enable organising information in a human–machine understandable form, but constructing them for reuse and maintainability remains difficult. The available supporting tools often lack a formal methodological underpinning, and their developers are not supported by any concomitant metrics. The paper presents a formal underpinning that provides quality metrics for a taxonomic hierarchical ontology and proposes a methodology for semi-automatic building of maintainable taxonomies. Users provide the terms used to describe different ontological elements, together with their attributes and their ranges of values. The methodology uses the formalised metrics to assess the quality of the users' input and proposes changes according to given quality constraints. The paper illustrates the metrics and the methodology by constructing and repairing two well-known medium-sized taxonomies.

Highlights

► We first ask “How Good is Your Hierarchical Ontology?”
► We present a qualitative evaluation framework to answer this question.
► We then ask “Can We Improve/Repair any of the Shortcomings of the Ontology Automatically?”
► We then present two algorithms that use the formal evaluation framework and evaluate the algorithms on two case studies.

Introduction

Domain-specific taxonomies represent information semantics in a shareable and reusable manner (Gruber, 1993). Suitably crafted, they facilitate the exchange of organization-dependent information within and across organizational boundaries and can play a central role in alleviating the political and technical obstacles involved. Taxonomies often come from different organisations and persons with varying agendas and varying quality criteria. Interest in their evaluation, in the context of their design within semantically enabled technologies, has therefore increased significantly in recent years. Examples of such recent work include Middleton et al. (2004), where a similarity system is presented as a prelude to evaluation, and Staab et al. (2001), where the authors propose a complex framework consisting of 160 characteristics spread across five dimensions: content, language, development methodology, building tools, and usage costs.

The significance of a well-formed taxonomic structure cannot be overstated. Taxonomies are a key technology in supporting user-preference search systems over the Semantic Web (Chamiel and Pagnucco, 2008). For example, a hierarchical musical genre system is a user preferencing system: an album can be identified as Progressive Rock, which may be a leaf concept in a taxonomy, but at the same time the album can be identified with the concept Rock even though Rock serves as an abstract concept. Resolving such problems in generalizations/classifications is not easy; avoiding them in the first place requires tuning and improving the quality of taxonomies. Borst (1997) generalizes evaluation to one of three scenarios: a first where both users and machines need a taxonomy assessment guide to suit their needs; a second where designers need practical guides to build and evaluate ontologies before publishing them; and a third where automatic taxonomy machine-learning requires identifying a suitable option among different possibilities, in order to adjust the parameters of the learning algorithms appropriately. This paper is well suited to supporting the imposition of quality requirements in the second scenario. We present a well-founded evaluation framework for automatic structural assessment of taxonomic ontologies by means of a set of formalised metrics. The paper also formalizes a quality description that is well suited for use in the other two scenarios.

We present and illustrate an algorithmic methodology for building taxonomies with formally specified content. Although the literature contains some examples of such methodologies that accommodate the psychological/mental/real processes that take place, very few of them provide a formal structure (as was also pointed out in Brewster and O’Hara, 2004; Ganter and Wille, 1999). This is particularly problematic in the third scenario identified by Borst (1997) (see above). Our formally specified algorithm to fix malformed taxonomies is particularly valuable in settings where taxonomies are developed on the fly by non-computer-literate users, e.g., when collecting user preferences (Balke and Wagner, 2004; Chamiel and Pagnucco, 2008). Such taxonomies can play a central role generally in knowledge structures (e.g., the Semantic Web) where every concept in the hierarchy (not only leaves) can be referred to, not only as an abstract concept (probably through the set of properties it contains), but also as a data concept: a concept which can be associated with asserted instances (i.e., real web objects). Various examples of using such structures can be found in recent literature, e.g., in Fortuna et al. (2007), Kiefer et al. (2007) and Schickel-Zuber and Faltings (2006, 2007).

Our focus in this paper is on the generation of precise taxonomic levels with carefully chosen and evaluated inter-level links to connect the most appropriate concepts in the ontology. Various authors have noted the importance of analysing taxonomies and their properties in order to define the ‘goodness’ of taxonomies (Mizoguchi, 2004; Welty and Guarino, 2001). We apply similarity measures between concepts with particular emphasis on their attributes. Although the evaluation and generation of taxonomies are important parts of most conceptual models, as they provide substantial structural information and are key elements in integration tasks, little research effort has been devoted to how they affect the development of taxonomies. Simperl et al. (2009) present the results of an empirical survey on ontology development in which 148 real-world ontology engineering projects were analysed during 6 months. After calculating the correlation between different aspects of ontology engineering and the associated effort, they conclude that the complexity of domain analysis and ontology evaluation are two of the most effort-demanding tasks. Any result that eases the difficulties of these two tasks would therefore yield major efficiency gains in the use of taxonomic ontologies. Ontologies are envisioned to be developed by domain experts with limited or no skills in ontology engineering. Simperl et al. conclude that it is paramount to provide appropriate techniques and tools which enable the effective and efficient development and evaluation of ontologies. There is a gap between this recognition and the availability of formal and technical tools, and the work presented here contributes to filling this gap.

The rest of this paper is structured as follows: Section 2 presents a formal framework to represent a taxonomy. Section 3 presents a formal framework to represent the quality of the taxonomy. Based on the formalisms of Sections 2 and 3, Section 4 presents automatic algorithms for the evaluation and construction of a taxonomy. These are illustrated in Section 5 with an example. Finally, Section 6 concludes with further discussion of related work and future plans for the research.

Section snippets

Modelling maintainable taxonomies

Taxonomies have been important for modelling database schemas, knowledge-based systems and semantic vocabularies (Welty and Guarino, 2001). Most ontology building tools only work with ontologies organised around the partial order relation IS-A, through which entities are grouped into or subsumed by a higher-level class. In this section, we present our formal ontology model, which is structured around concepts, IS-A relations, and axioms. Ganter and Wille, 1999 and Punera et al. (2006)
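As a rough illustration of a model built around concepts and IS-A relations, the sketch below stores each concept with a set of attributes and a single IS-A parent. It is not the paper's formal model, which is not reproduced in this snippet: the `Concept` class, the single-parent restriction, the attribute names and the helper functions are all assumptions made for the example. The genre names follow the musical example in the Introduction.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Concept:
    """Hypothetical taxonomy node: a name, its attributes, one IS-A parent."""
    name: str
    attributes: FrozenSet[str] = frozenset()
    parent: Optional["Concept"] = None  # direct IS-A link; None for the root


def ancestors(concept: Concept) -> List[str]:
    """Names of all concepts that subsume `concept`, nearest first."""
    names = []
    current = concept.parent
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


def is_a(child: Concept, candidate: Concept) -> bool:
    """True if `child` IS-A `candidate`, treating IS-A as reflexive and transitive."""
    return child is candidate or candidate.name in ancestors(child)


def inherited_attributes(concept: Concept) -> FrozenSet[str]:
    """Own attributes plus those of every ancestor (assumed inheritance rule)."""
    collected = set(concept.attributes)
    current = concept.parent
    while current is not None:
        collected |= current.attributes
        current = current.parent
    return frozenset(collected)


# Illustrative attribute names; they do not come from the paper.
music = Concept("Music", frozenset({"title"}))
rock = Concept("Rock", frozenset({"guitar_driven"}), music)
prog = Concept("Progressive Rock", frozenset({"long_form"}), rock)
```

With these definitions, `is_a(prog, music)` holds through transitivity while `is_a(rock, prog)` does not, which is the asymmetry a partial order such as IS-A is expected to show between distinct concepts.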

Modelling and measuring: the quality of a taxonomy

In our framework, an ontology is a set of well-defined concepts, hierarchically presented and coupled with a set of axioms (i.e., laws that always hold between the attributes of the same or different concepts). Using the definitions of Section 2, we define the following notions about our taxonomies:

Definition 10:

Well-connectedness: Let C be a non-empty set of well-defined concepts. These are said to be well-connected, written well-connected(C), if the following conditions hold:

i) ∀x ∈ C, categorical_context(x,C) ≠ ϕ; and
ii) ∃y ∈ C
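Condition i) can be sketched in code, with two caveats. First, the definition of categorical_context belongs to Section 2 and is not reproduced in this snippet, so the version below is a stand-in: it assumes the categorical context of x in C is the set of concepts in C that are a direct IS-A parent or child of x. Second, condition ii) is truncated above, so the sketch checks condition i) only and should not be read as a full well-connectedness test. The function names and the representation of IS-A links as (child, parent) pairs are likewise assumptions.

```python
from typing import Set, Tuple

IsALinks = Set[Tuple[str, str]]  # assumed encoding: (child, parent) pairs


def categorical_context(x: str, concepts: Set[str], links: IsALinks) -> Set[str]:
    """ASSUMED stand-in: concepts in C directly linked to x by IS-A."""
    parents = {p for (c, p) in links if c == x and p in concepts}
    children = {c for (c, p) in links if p == x and c in concepts}
    return parents | children


def satisfies_condition_i(concepts: Set[str], links: IsALinks) -> bool:
    """Condition i) only: C is non-empty and no concept has an empty context."""
    if not concepts:
        return False
    return all(categorical_context(x, concepts, links) for x in concepts)


links = {("Progressive Rock", "Rock"), ("Rock", "Music")}
connected = {"Music", "Rock", "Progressive Rock"}
with_isolated = connected | {"Jazz"}  # "Jazz" has no IS-A link in `links`
```

Under this stand-in, `connected` passes the check and `with_isolated` fails it, because "Jazz" has an empty context.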

Taxonomy building and evaluation

This section presents how our taxonomy is built and evaluated algorithmically. Building the ontology is based on two concept refinement processes: a bottom-up process that generates an upper category from a given concept and a top-down process that generates lower categories from a given concept. The automatic evaluation and refinement of the taxonomic structure is based on identifying a well-defined set of concepts by analysing the similarities of all involved attributes. This structure
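To make the bottom-up idea concrete, the sketch below starts from one given concept, collects the concepts whose attribute sets are similar enough to it, and proposes an upper category that carries the attributes they all share. The paper's own similarity measure and algorithms are not reproduced in this snippet, so the Jaccard coefficient is used here purely as a stand-in. The function names, the threshold and the road-safety-flavoured concepts and attributes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two attribute sets (stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def propose_upper_category(
    seed: str, concepts: Dict[str, Set[str]], threshold: float
) -> Tuple[List[str], Set[str]]:
    """Bottom-up step (assumed form): group concepts similar to `seed`.

    Returns the sorted member names and the attributes shared by all members,
    which a new upper category could hold on their behalf.
    """
    seed_attrs = concepts[seed]
    members = sorted(
        name for name, attrs in concepts.items()
        if jaccard(seed_attrs, attrs) >= threshold
    )
    shared = set(seed_attrs)
    for name in members:
        shared &= concepts[name]
    return members, shared


# Hypothetical concepts and attributes, for illustration only.
concepts = {
    "Car": {"wheels", "engine", "doors"},
    "Motorcycle": {"wheels", "engine", "handlebar"},
    "Pedestrian": {"age"},
}
members, shared = propose_upper_category("Car", concepts, threshold=0.5)
```

Here "Car" and "Motorcycle" share two of four distinct attributes, so they are grouped at a threshold of 0.5, and the proposed upper category would hold `wheels` and `engine`. A top-down step could work in the opposite direction by splitting a concept's members on the attributes they do not share.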

Methodology applications

We illustrate our enhancement methodology with two examples. The first is the road safety domain, which continues the context of the examples shown in the 21 definitions in Sections 3 and 4. The second is based on a well-known example for ontology practitioners, originally developed at the University of Manchester as a support resource for the Protégé tutorial (Horridge et al., 2007). This is the so-called Pizza

Discussion and future work

In this section we review the works of other authors, whose proposals are close to ours. We discuss and summarise our results and we conclude with our future work.

Acknowledgments

This work was supported in part by the Spanish Government (under projects TIN2006-14780 and PT-2006-055-24ICPP), the Region of Murcia (under project BIO-TEC 06/01-005) and the Australian Research Council (Grant DP1112378).

References (37)

G. Beydoun et al. Theoretical framework of incremental hierarchical knowledge acquisition. International Journal of Human-Computer Studies (2001)
M.Y. Dahab et al. TextOntoEx: automatic ontology construction from natural English text. Expert Systems with Applications (2008)
T.R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition (1993)
B. Motik et al. Bridging the gap between OWL and relational databases. Journal of Web Semantics (2009)
R. Stevens et al. Using OWL to model biological knowledge. International Journal of Human-Computer Studies (2007)
C. Welty et al. Supporting ontological analysis of taxonomic relationships. Data & Knowledge Engineering (2001)
W.-T. Balke et al. Through different eyes: assessing multiple conceptual views for querying web services
T. Berners-Lee et al. Creating a science of the web. Science (2006)
G. Beydoun et al. Cooperative modeling evaluated. International Journal of Cooperative Information Systems (2005)
G. Beydoun et al. FAML: a generic metamodel for MAS development. IEEE Transactions on Software Engineering (2009)
E. Blomqvist. OntoCase-Automatic Ontology Enrichment Based on Ontology Design Patterns
Borst, W.N., 1997. Construction of Engineering Ontologies for Knowledge Sharing and Reuse. PhD Thesis. University of...
C. Brewster et al. Knowledge representation with ontologies: the present and future. Intelligent Systems, IEEE (2004)
G. Chamiel et al. Exploiting ontological information for reasoning with preferences
B. Fortuna et al. OntoG semi-automatic ontology. Lecture Notes in Computer Science, Springer (2007)
B. Ganter et al. Formal Concept Analysis: Mathematical Foundations (1999)
N. Guarino et al. Evaluating ontological decisions with OntoClean. Communications of the ACM (2002)
N. Guarino et al. An overview of OntoClean

Cited by (29)

Sigmoid similarity - a new feature-based similarity measure
2019, Information Sciences
Citation Excerpt :
This section provides a background on the main notions used in this work. A conceptual hierarchy provides a taxonomy (a tree or a lattice) of concepts organised using the partial order IS-A relation, which specialises more general classes into more specific classes [3,33]. The IS-A relation is asymmetric and transitive and defines a hierarchical structure of the ontology, enabling the inheritance of characteristics from parent classes to descendant classes.
Similarity is one of the most straightforward ways to relate objects and guide the human perception of the world. It has an important role in many areas, such as Information Retrieval, Natural Language Processing, Semantic Web and Recommender Systems. To help applications in these areas achieve satisfying results when finding similar concepts, it is important to simulate human perception of similarity and assess which similarity measure is the most adequate.
We propose Sigmoid similarity, a feature-based semantic similarity measure on instances in a specific ontology, as an improvement of Dice measure. We performed two separate evaluations with real evaluators. The first evaluation includes 137 subjects and 25 pairs of concepts in the recipes domain and the second one includes 147 subjects and 30 pairs of concepts in the drinks domain. To the best of our knowledge these are some of the most extensive evaluations in the field.
We also explored the performance of some hierarchy-based approaches and showed that feature-based approaches outperform them on two specific ontologies we tested. In addition, we tried to incorporate hierarchy-based information into our measures and concluded it is not worth complicating the measures only based on features with additional information since they perform comparably.
Identification of ontologies to support information systems development
2014, Information Systems
Ontologies can provide many benefits during information systems development. They can provide domain knowledge to requirement engineers, are reusable software components for web applications or intelligent agent developers, and can facilitate semi-automatic model verification and validation. They also assist in software extensibility, interoperability and reuse. All these benefits critically depend on the provision of a suitable ontology(ies). This paper introduces a semantically-based three stage-approach to assist developers in checking the consistency of the requirements models and choose the most suitable and relevant ontology(ies) for their development project from a given repository. The early requirements models, documented using the i* language, are converted to a retrieval ontology. The consistency of this retrieval ontology is then checked before being used to identify a set of reusable ontologies that are relevant for the development project. The paper also provides an initial validation of each of the stages.
Aligning ontology-based development with service oriented systems
2014, Future Generation Computer Systems
This paper argues for placing ontologies at the centre of the software development life cycle for distributed component-based systems and, in particular, for service-oriented systems. It presents an ontology-based development process which relies on three levels of abstraction using ontologies: architecture layer, application layer and domain layer. The paper discusses the key roles of ontologies with respect to the various abstraction layers and their corresponding impact on the concomitant workproducts. In addition, a peer-to-peer-based service selecting and composing tool is suggested as a way of supporting the process. The paper presents the architecture of the proposed tool and illustrates the whole process in the development of a mobile banking application based on dynamic Web services.
Development and validation of a Disaster Management Metamodel (DMM)
2014, Information Processing and Management
Citation Excerpt :
For the purpose of the validation in this paper, we refine 20 DM models in details using our metamodel and applying multiple validation techniques. The quality of the metamodel is measured based on how it can fulfill the purpose of its development (Beydoun, Lopez-Lorca, Sanchez, & Martinez-Bejar, 2011; Garcia, 2007): addressing the needs of domain practitioners, increasing the transparency to the knowledge encoded within the domain applications and how amenable to be validated by experts in the domain area. Our end users (domain practitioners) include emergency managers, DM coordinators or safety managers for various public and private organizations seeking to create a DM model to manage anticipated disasters.
Disaster Management (DM) is a diffused area of knowledge. It has many complex features interconnecting the physical and the social views of the world. Many international and national bodies create knowledge models to allow knowledge sharing and effective DM activities. But these are often narrow in focus and deal with specified disaster types. We analyze thirty such models to uncover that many DM activities are actually common even when the events vary. We then create a unified view of DM in the form of a metamodel. We apply a metamodelling process to ensure that this metamodel is complete and consistent. We validate it and present a representational layer to unify and share knowledge as well as combine and match different DM activities according to different disaster situations.
Anisotropic propagation of user interests in ontology-based user models
2013, Information Sciences
Citation Excerpt :
In our work, we focus on a particular type of ontology, namely conceptual hierarchy derived from the domain ontology, also known as hierarchical ontology. This kind of ontology is a taxonomy of concepts where concepts are organized based on the partial order relation IS-A, through which entities are grouped into or subsumed by a higher level classes [9,43]. A conceptual hierarchy can be seen as a simple ontology where the properties of concepts are not taken into account.
This work contributes to the development of ontology-based user models, devised as overlays over conceptual hierarchies derived from domain ontologies. We tackle the problem of propagation of user interests in such a conceptual hierarchy. In addition to accounting for the hierarchical structure of the domain and the type and amount of feedback provided by the user, the principal contributions introduced in this work are: (i) horizontal propagation which enables propagation among siblings, in addition to vertical propagation among ancestors and descendants; (ii) anisotropic vertical propagation which permits user interests to be propagated differently upward and downward; (iii) context-dependence which introduces the possibility to propagate differently according to various contexts for specific applications; (iv) support for dynamic ontology maintenance, i.e. preserving the user interest values when adding or removing a node from the conceptual hierarchy. Our approach supports finer recommendation modalities and contributes to the resolution of the cold start problem, since it allows for propagation from a small number of initial concepts to other related domain concepts by exploiting the conceptual hierarchy of the domain. A field evaluation confirmed the effectiveness of our approach w.r.t. the traditional vertical propagation.
Measuring ontology information by rules based transformation
2013, Knowledge-Based Systems
Citation Excerpt :
As a new successor of knowledge engineering, ontology engineering [5] aims at knowledge sharing and reuse by designing, implementing and deploying ontologies. However, ontology construction is rather tedious and costly [7,8]. It is attractive for ontology engineers to select and reuse the candidate ontologies that most satisfy their requirements by measuring and evaluating them [9].
Ontologies have currently attracted much attention of researchers and engineers in many fields such as knowledge management, etc. It is attractive for ontology engineers to select and reuse the existing ontologies by measuring and evaluating them because ontology construction is rather tedious and costly. In this paper, a general framework for stable semantic ontology measurement is proposed. We first clarify the concepts of syntactic, semantic and stable semantic ontology measurement. Then we present the semantic derived model (SDM) to represent the semantic model of an ontology. By rule based transformation, an ontology can be automatically transformed into its final semantic derived model (FSDM) which is unique. Furthermore, we can measure ontologies based on FSDM by analyzing the types of entities of the existing ontology metrics. The related experiments are made to illustrate that our framework can effectively excavate and stably measure the semantics of ontologies.


Dr. Ghassan Beydoun received a degree in computer science and a PhD degree in knowledge systems from the University of New South Wales. He is currently a senior lecturer at the School of Information Systems and Technology at the University of Wollongong and an adjunct senior research fellow at the School of Information Systems, Management and Technology at the University of New South Wales. He has authored more than 90 papers in international journals and conferences. He is currently working on a project sponsored by an Australian Research Council Discovery Grant to investigate the best uses of ontologies in developing methodologies for distributed intelligent systems. His other research interests include multi agent systems applications, ontologies and their applications, and knowledge acquisition.

Antonio A. Lopez-Lorca holds a degree in computer science from University of Murcia in Spain. Currently he is a PhD candidate and lecturer at the School of Information Systems and Technology at the University of Wollongong in Australia. In his PhD, funded by the Australian Research Council, he studies the validation of multi agent systems models using ontologies. He is a co-author of several papers in international journals and conferences. His research interests include multi agent systems, artificial intelligence, knowledge management, ontology modelling and reasoning and software engineering.

Dr. Francisco García-Sánchez received his BA, MSc and PhD degrees in computer science from the University of Murcia. He is currently working as PhD Assistant Professor in the Computer Science Department at the University of Valencia. His research interests include agent technology, service-oriented architectures and the semantic web. He has conducted a number of research stays in world-leading research institutes in Ireland, Austria, the United States and Australia, and has published over 40 articles in international journals, international and national conferences and workshops. He has led several projects concerning the development of user interfaces to Semantic Web services execution environments and ontology-based intelligent systems to assist in accessing financial data sources.

Dr. Rodrigo Martínez-Béjar received his BA and PhD degrees in Computer Science from the University of Murcia, Spain. He got his MSc degree in Computer Science at the University of Málaga, Spain. He is Professor at the Department of Information and Communication Engineering, University of Murcia. His research interests include the development and application of knowledge technologies to different fields such as Medicine, the Semantic Web, E-learning, Bioinformatics, and Rural Development. He has been the leader of a number of national and international research projects. He is co-author of more than 100 articles published in international journals and conferences.

