Elsevier

Journal of Web Semantics

Volume 7, Issue 4, December 2009, Pages 344-356
Journal of Web Semantics

An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size

https://doi.org/10.1016/j.websem.2009.09.001Get rights and content

Abstract

It has been a formidable task to achieve efficiency and scalability for the alignment between two massive, conceptually similar ontologies. Here we assume, an ontology is typically given in RDF (Resource Description Framework) or OWL (Web Ontology Language) and can be represented by a directed graph. A straightforward approach to the alignment of two ontologies entails an O(N2) computation by comparing every combination of pairs of nodes from given two ontologies, where N denotes the average number of nodes in each ontology. Our proposed algorithm called Anchor-Flood algorithm, boasting of O(Nlog(N)) computation on the average, starts off with an anchor, a pair of “look-alike” concepts from each ontology, gradually exploring concepts by collecting neighboring concepts, thereby taking advantage of locality of reference in the graph data structure. It outputs a set of alignments between concepts and properties within semantically connected subsets of two entire graphs, which we call segments. When similarity comparison between a pair of nodes in the directed graph has to be made to determine whether two given ontologies are aligned or not, we repeat the similarity comparison between a pair of nodes, within the neighborhood pairs of two ontologies surrounding the anchor iteratively until the algorithm meets that “either all the collected concepts are explored, or no new aligned pair is found”. In this way, we can significantly reduce the computational time for the alignment. Moreover, since we only focus on segment-to-segment comparison, regardless of the entire size of ontologies, our algorithm not only achieves high performance, but also resolves the scalability problem in aligning ontologies. Our proposed algorithm reduces the number of seemingly-aligned but actually misaligned pairs. Through several examples with large ontologies, we will demonstrate the features of our Anchor-Food algorithm.

Introduction

An ontology is an explicit specification of a conceptualization” is a prominent defintion by T.R. Gruber in 1995 [13]. The definition was then extended by Studer et al., in 1998 as “an ontology is an explicit, formal specification of a shared conceptualization of a domain of interest” [39]. An ontology generally consists of entities such as concepts, properties and relations. It is the backbone to fulfill the semantic web vision [2], [23] and is a knowledge base to enable machines to communicate each other effectively. The knowledge captured in ontologies can be used to annotate data, to distinguish between homonyms and polysemies, to drive intelligent user interfaces and even to retrieve new information. A number of ontologies are increasing day by day with new semantic web contents, because an ontology is being developed to formalize the conceptualization behind the idea of semantic web. Therefore, in order to achieve semantic interoperability and integrity, ontology alignment would be playing an important role [1].

Ontologies may be of different size, small to large, having hundreds to millions of RDF triples, where an RDF triple is an RDF statement between a subject, a predicate, and an object. A small-scale ontology is usually defined within a single segment, where the term “segment” is referred by Seidenberg et al. [34] as a fragment that stands alone as an ontology in its own right. A large-scale ontology is generally very complex in nature and may contain multiple segments [34]. Stuckenschimdt et al. [38] also reveals that a large-scale ontology contains a set of modules about a certain subtopic that can be used independently of the other modules. A large-scale ontology expressed in Web Ontology Language (OWL) [25] is appearing to capture distributed knowledge in a centralized knowledge base. For example, the biomedical domain has enormous large-scale ontologies, such as FMA [32] and OpenGALEN [31].

Although there are many diverse solutions to the ontology alignment problem [7], only a few are capable to handle large ontologies efficiently [17], [19]. Many previously developed ontology alignment systems are also facing severe performance problem when they attempt to deal with large ontologies. Resolving scalability problem is also an important current issue in the ontology alignment research [35]. Moreover, users (humans or machines) may want to find a specific part of a large ontology which will fit only to their interest.

The main contribution of our approach is of attaining performance enhancement by solving the scalability problem in aligning large ontologies. The key idea is to start from an anchor point of a taxonomy of an ontology and to proceed towards the neighboring nodes considering the locality of reference. Eventually our proposed algorithm aligns a part of gigantic ontologies and outputs a segmented alignment, (i.e. an alignment across two related segments or fragments across ontologies) which are relatively small, however sufficient to satisfy users interest in a particular domain. Therefore, performance, scalability, and segmented alignment (i.e. an alignment across two related segments or fragments of ontologies) are the key motivations of our proposed algorithm in this paper.

In our previous research work, we also focused on the ontology alignment [16]. Our system was capable of aligning small-scale ontologies effectively with the feature of eliminating misalignments. However, its major drawback was against the aligning of large-scale ontologies, as its complexity was O(N2). To overcome this limitation of scalability problem, we developed a new algorithm, we called as “Anchor-Flood” algorithm.

Our algorithm assumes a seed called an anchor, where the notion anchor is derived from the Anchor-PROMPT algorithm [30], although it is clearly different from the notion used in that algorithm. The Anchor-PROMPT algorithm augments existing methods by determining additional possible aligned pairs across ontologies, while our Anchor-Flood algorithm starts aligning from an anchor and produces segmented alignments.

Our algorithm also shares the term “flood” with similarity-flooding algorithm [26] without major similarities in their action blocks. Although our work is inspired by the idea of similarity flooding algorithm in that the elements of two distinct hierarchy models are likely to be similar when their adjacent elements are similar. However, our Anchor-Flood algorithm does not propagate similarity value to the adjacent nodes iteratively unlike similarity-flooding algorithm.

In the main process block of our algorithm, we collect aligned pairs from the neighboring concepts computing similarities among the collected concepts across ontologies starting from an anchor. It begins exploring to the neighboring concepts to collect more aligned pairs. As our algorithm starts off an anchor and explores to the neighboring concepts, it has a salient feature of scalability. Our algorithm achieves enhancement in terms of scalability and performance in aligning large ontologies. It can reduce the number of seemingly-aligned but actually misaligned pairs [16]. It also outputs “segmented alignment”, which is a unique characteristic in the field of ontology alignment research.

The rest of the paper is organized as follows. Section 2 introduces general terminologies to be used in the later sections. Section 3 contains the description of our proposed “Anchor-Flood” algorithm. Section 4 includes experiments and evaluation, while Section 5 describes the complexity factor of the algorithm. Some related work and the differences with our algorithm are described in Section 6. Section 7 includes concluded remarks along with some future directions of our work.

Section snippets

General terminologies

This section introduces some of the basic definitions in the field of ontology alignment research and familiarizes the reader with the notions and terminologies used throughout the paper. It includes the definitions of ontology, taxonomy, alignment across ontologies, the similarity measures and the idea of a segment in an ontology.

Anchor-Flood algorithm

Our proposed algorithm takes an anchor as one of the inputs along with two ontologies, O1 and O2. It outputs a set of aligned pairs within two segments across ontologies. Before proceeding to the detail of our algorithm, we define four frequently used terms in this paper.

Definition 1 Anchor

An anchor is defined as a pair of similar concepts across ontologies. If eO1, fO2 and ef, meaning that e and f are aligned, then an anchor, X is defined as a pair (e(O1),f(O2)), where the first element e(O1) comes from

Experiments and evaluation

We applied our Anchor-Flood algorithm to a variety of datasets, including the benchmark dataset,3 and larger ontologies, such as anatomical ontologies (FMA.owl, OpenGALEN.owl, human.owl, mouse.owl and so on), web directory ontologies in OWL format.

In this section, we discuss the results obtained from the OAEI-2008 and acquired by our extensive experiments in terms of scalability, memory consumption and segmented alignment. As we participated in

Complexity analysis

As the Anchor-Flood algorithm starts off with aligning an anchor, and explores to its neighboring concepts, the performance is not heavily affected by the size of ontologies. However, the performance depends on the size of neighbors defined in Eq. (5), and on the size of segments in a segmented alignment, which is illustrated in Section 2.4.

Consider two ontologies O1 with N1 entities and O2 with N2 entities, and for simplicity, if we assume N1=N2N, and the average number of children of each

Related work

There are some related works which target to deal with ontology alignment such as COMA++ [6], NLM Anatomical Ontology Alignment [44], PBM based Falcon-AO [17], [18], iMAP [5] and Chimaera [24]. Moreover, there are some other approaches [14], [27], [42], which address large-scale ontologies or large-scale class-hierarchies. These related work, which are relevant to our Anchor-Flood algorithm, are described in the subsequent sections.

Conclusions and future work

In this paper, we described the Anchor-Flood algorithm that can align ontologies of arbitrary size effectively, and that makes it possible to achieve high performance and scalability over previous alignment algorithms. To achieve these goals, the algorithm took advantage of the notion of segmentation and allowed segmented output of aligned ontologies. Specifically, owing to the segmentation, our algorithm concentrates on aligning only small sets of the entire ontology data iteratively, by

Acknowledgment

This work was partially supported by the Global COE Program “Frontiers of Intelligent Sensing”, from the ministry of Education, Culture, Sports, Science and Technology.

References (43)

  • J. David

    AROMA results for OAEI

  • R. Dhamankar et al.

    iMAP: discovering complex semantic matches between database schemas

  • M. Ehrig

    Ontology Alignment: Bridging the Semantic Gap

    (2007)
  • C. Fellbaum

    WordNet: An Electronic Lexical Database

    (1998)
  • F. Giunchiglia et al.

    Semantic matching

    The Knowledge Engineering Review

    (2004)
  • B. Grau et al.

    Automatic partitioning of OWL ontologies using ϵ-Connections

  • F. Hamdi et al.

    TaxoMap in the OAEI 2008 alignment contest

  • S. Hanif et al.

    Automatic alignment of ontology eliminating the probable misalignments

  • W. Hu et al.

    The results of Falcon-AO in the OAEI 2006 campaign

  • W. Hu et al.

    Partition-based block matching of large class hierarchies

  • Y. Jean-Mary et al.

    ASMOV: results for OAEI

  • Cited by (104)

    • Large-scale complex ontology matching through anchor-based semantic partitioning technique and confidence matrix based evolutionary algorithm

      2022, Applied Soft Computing
      Citation Excerpt :

      The similar segments are also determined by the similarity of their central concepts. Anchor-Flood [17] works on the basis of the anchors, which are highly similar concept pairs. It gradually matches the neighboring concepts of an anchor to exploit more correspondences.

    • Matching large-scale biomedical ontologies with central concept based partitioning algorithm and Adaptive Compact Evolutionary Algorithm

      2021, Applied Soft Computing
      Citation Excerpt :

      Falcon-AO [12] also uses the partitioning technique to reduce the search space, but it uses two concepts’ structural information to control the segment’s scale. Anchor-Flood [13] partitions an ontology by exploring the neighborhood of the anchor concepts, i.e. highly similar concepts neighborhood, to form different segments. Anchor-Flood can control the size and number of the segments.

    View all citing articles on Scopus
    View full text