An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size

doi:10.1016/j.websem.2009.09.001

Journal of Web Semantics

Volume 7, Issue 4, December 2009, Pages 344-356

https://doi.org/10.1016/j.websem.2009.09.001 Get rights and content

Abstract

It has been a formidable task to achieve efficiency and scalability for the alignment between two massive, conceptually similar ontologies. Here we assume, an ontology is typically given in RDF (Resource Description Framework) or OWL (Web Ontology Language) and can be represented by a directed graph. A straightforward approach to the alignment of two ontologies entails an $O (N^{2})$ computation by comparing every combination of pairs of nodes from given two ontologies, where $N$ denotes the average number of nodes in each ontology. Our proposed algorithm called Anchor-Flood algorithm, boasting of $O (N \log (N))$ computation on the average, starts off with an anchor, a pair of “look-alike” concepts from each ontology, gradually exploring concepts by collecting neighboring concepts, thereby taking advantage of locality of reference in the graph data structure. It outputs a set of alignments between concepts and properties within semantically connected subsets of two entire graphs, which we call segments. When similarity comparison between a pair of nodes in the directed graph has to be made to determine whether two given ontologies are aligned or not, we repeat the similarity comparison between a pair of nodes, within the neighborhood pairs of two ontologies surrounding the anchor iteratively until the algorithm meets that “either all the collected concepts are explored, or no new aligned pair is found”. In this way, we can significantly reduce the computational time for the alignment. Moreover, since we only focus on segment-to-segment comparison, regardless of the entire size of ontologies, our algorithm not only achieves high performance, but also resolves the scalability problem in aligning ontologies. Our proposed algorithm reduces the number of seemingly-aligned but actually misaligned pairs. Through several examples with large ontologies, we will demonstrate the features of our Anchor-Food algorithm.

Introduction

“An ontology is an explicit specification of a conceptualization” is a prominent defintion by T.R. Gruber in 1995 [13]. The definition was then extended by Studer et al., in 1998 as “an ontology is an explicit, formal specification of a shared conceptualization of a domain of interest” [39]. An ontology generally consists of entities such as concepts, properties and relations. It is the backbone to fulfill the semantic web vision [2], [23] and is a knowledge base to enable machines to communicate each other effectively. The knowledge captured in ontologies can be used to annotate data, to distinguish between homonyms and polysemies, to drive intelligent user interfaces and even to retrieve new information. A number of ontologies are increasing day by day with new semantic web contents, because an ontology is being developed to formalize the conceptualization behind the idea of semantic web. Therefore, in order to achieve semantic interoperability and integrity, ontology alignment would be playing an important role [1].

Ontologies may be of different size, small to large, having hundreds to millions of RDF triples, where an RDF triple is an RDF statement between a subject, a predicate, and an object. A small-scale ontology is usually defined within a single segment, where the term “segment” is referred by Seidenberg et al. [34] as a fragment that stands alone as an ontology in its own right. A large-scale ontology is generally very complex in nature and may contain multiple segments [34]. Stuckenschimdt et al. [38] also reveals that a large-scale ontology contains a set of modules about a certain subtopic that can be used independently of the other modules. A large-scale ontology expressed in Web Ontology Language (OWL) [25] is appearing to capture distributed knowledge in a centralized knowledge base. For example, the biomedical domain has enormous large-scale ontologies, such as FMA [32] and OpenGALEN [31].

Although there are many diverse solutions to the ontology alignment problem [7], only a few are capable to handle large ontologies efficiently [17], [19]. Many previously developed ontology alignment systems are also facing severe performance problem when they attempt to deal with large ontologies. Resolving scalability problem is also an important current issue in the ontology alignment research [35]. Moreover, users (humans or machines) may want to find a specific part of a large ontology which will fit only to their interest.

The main contribution of our approach is of attaining performance enhancement by solving the scalability problem in aligning large ontologies. The key idea is to start from an anchor point of a taxonomy of an ontology and to proceed towards the neighboring nodes considering the locality of reference. Eventually our proposed algorithm aligns a part of gigantic ontologies and outputs a segmented alignment, (i.e. an alignment across two related segments or fragments across ontologies) which are relatively small, however sufficient to satisfy users interest in a particular domain. Therefore, performance, scalability, and segmented alignment (i.e. an alignment across two related segments or fragments of ontologies) are the key motivations of our proposed algorithm in this paper.

In our previous research work, we also focused on the ontology alignment [16]. Our system was capable of aligning small-scale ontologies effectively with the feature of eliminating misalignments. However, its major drawback was against the aligning of large-scale ontologies, as its complexity was $O (N^{2})$ . To overcome this limitation of scalability problem, we developed a new algorithm, we called as “Anchor-Flood” algorithm.

Our algorithm assumes a seed called an anchor, where the notion anchor is derived from the Anchor-PROMPT algorithm [30], although it is clearly different from the notion used in that algorithm. The Anchor-PROMPT algorithm augments existing methods by determining additional possible aligned pairs across ontologies, while our Anchor-Flood algorithm starts aligning from an anchor and produces segmented alignments.

Our algorithm also shares the term “flood” with similarity-flooding algorithm [26] without major similarities in their action blocks. Although our work is inspired by the idea of similarity flooding algorithm in that the elements of two distinct hierarchy models are likely to be similar when their adjacent elements are similar. However, our Anchor-Flood algorithm does not propagate similarity value to the adjacent nodes iteratively unlike similarity-flooding algorithm.

In the main process block of our algorithm, we collect aligned pairs from the neighboring concepts computing similarities among the collected concepts across ontologies starting from an anchor. It begins exploring to the neighboring concepts to collect more aligned pairs. As our algorithm starts off an anchor and explores to the neighboring concepts, it has a salient feature of scalability. Our algorithm achieves enhancement in terms of scalability and performance in aligning large ontologies. It can reduce the number of seemingly-aligned but actually misaligned pairs [16]. It also outputs “segmented alignment”, which is a unique characteristic in the field of ontology alignment research.

The rest of the paper is organized as follows. Section 2 introduces general terminologies to be used in the later sections. Section 3 contains the description of our proposed “Anchor-Flood” algorithm. Section 4 includes experiments and evaluation, while Section 5 describes the complexity factor of the algorithm. Some related work and the differences with our algorithm are described in Section 6. Section 7 includes concluded remarks along with some future directions of our work.

Section snippets

General terminologies

This section introduces some of the basic definitions in the field of ontology alignment research and familiarizes the reader with the notions and terminologies used throughout the paper. It includes the definitions of ontology, taxonomy, alignment across ontologies, the similarity measures and the idea of a segment in an ontology.

Anchor-Flood algorithm

Our proposed algorithm takes an anchor as one of the inputs along with two ontologies, $O_{1}$ and $O_{2}$ . It outputs a set of aligned pairs within two segments across ontologies. Before proceeding to the detail of our algorithm, we define four frequently used terms in this paper.

Definition 1 Anchor

An anchor is defined as a pair of similar concepts across ontologies. If $e \in O_{1}$ , $f \in O_{2}$ and $e \equiv f$ , meaning that e and f are aligned, then an anchor, $X$ is defined as a pair $(e (O_{1}), f (O_{2}))$ , where the first element $e (O_{1})$ comes from

Experiments and evaluation

We applied our Anchor-Flood algorithm to a variety of datasets, including the benchmark dataset,³ and larger ontologies, such as anatomical ontologies (FMA.owl, OpenGALEN.owl, human.owl, mouse.owl and so on), web directory ontologies in OWL format.

In this section, we discuss the results obtained from the OAEI-2008 and acquired by our extensive experiments in terms of scalability, memory consumption and segmented alignment. As we participated in

Complexity analysis

As the Anchor-Flood algorithm starts off with aligning an anchor, and explores to its neighboring concepts, the performance is not heavily affected by the size of ontologies. However, the performance depends on the size of neighbors defined in Eq. (5), and on the size of segments in a segmented alignment, which is illustrated in Section 2.4.

Consider two ontologies $O_{1}$ with $N_{1}$ entities and $O_{2}$ with $N_{2}$ entities, and for simplicity, if we assume $N_{1} = N_{2} \approx N$ , and the average number of children of each

Related work

There are some related works which target to deal with ontology alignment such as COMA++ [6], NLM Anatomical Ontology Alignment [44], PBM based Falcon-AO [17], [18], iMAP [5] and Chimaera [24]. Moreover, there are some other approaches [14], [27], [42], which address large-scale ontologies or large-scale class-hierarchies. These related work, which are relevant to our Anchor-Flood algorithm, are described in the subsequent sections.

Conclusions and future work

In this paper, we described the Anchor-Flood algorithm that can align ontologies of arbitrary size effectively, and that makes it possible to achieve high performance and scalability over previous alignment algorithms. To achieve these goals, the algorithm took advantage of the notion of segmentation and allowed segmented output of aligned ontologies. Specifically, owing to the segmentation, our algorithm concentrates on aligning only small sets of the entire ontology data iteratively, by

Acknowledgment

This work was partially supported by the Global COE Program “Frontiers of Intelligent Sensing”, from the ministry of Education, Culture, Sports, Science and Technology.

References (43)

H. Do et al.
Matching large schemas: approaches and evaluation
Information Systems
(2007)
A. Farquhar et al.
Ontolingua server: a tool for collaborative ontology construction
International Journal of Human–Computers Studies
(1997)
T. Gruber
Toward principles for the design of ontologies used for knowledge sharing
International Journal of Human–Computer Studies
(1995)
S. Guha et al.
Rock: a robust clustering algorithm for categorical attributes
Journal of Information Systems
(2000)
W. Hu et al.
Matching large ontologies: a divide-and-conquer approach
Data and Knowledge Engineering
(2008)
C. Rosse et al.
A reference ontology for biomedical informatics: the foundation model of anatomy
Journal of Biomedical Informatics
(2003)
R. Studer et al.
Knowledge engineering: principles and methods
Journal of Data & Knowledge Engineering
(1998)
V. Benjamins et al.
Six challenges for the semantic web
AIS SIGSEMIS Bulletin
(2004)
T. Berners-Lee, M. Fischetti, M. Dertouzos, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide...
C. Caracciolo et al.
Results of the Ontology Alignment Evaluation Initiative

J. David

AROMA results for OAEI

R. Dhamankar et al.

iMAP: discovering complex semantic matches between database schemas

M. Ehrig

Ontology Alignment: Bridging the Semantic Gap

(2007)

C. Fellbaum

WordNet: An Electronic Lexical Database

(1998)

F. Giunchiglia et al.

Semantic matching

The Knowledge Engineering Review

(2004)

B. Grau et al.

Automatic partitioning of OWL ontologies using $ϵ$ -Connections

F. Hamdi et al.

TaxoMap in the OAEI 2008 alignment contest

S. Hanif et al.

Automatic alignment of ontology eliminating the probable misalignments

W. Hu et al.

The results of Falcon-AO in the OAEI 2006 campaign

W. Hu et al.

Partition-based block matching of large class hierarchies

Y. Jean-Mary et al.

ASMOV: results for OAEI

Cited by (104)

Large-scale complex ontology matching through anchor-based semantic partitioning technique and confidence matrix based evolutionary algorithm
2022, Applied Soft Computing
Citation Excerpt :
The similar segments are also determined by the similarity of their central concepts. Anchor-Flood [17] works on the basis of the anchors, which are highly similar concept pairs. It gradually matches the neighboring concepts of an anchor to exploit more correspondences.
Ontology presents a field knowledge by defining the meaning of domain concepts and relations among them. To conduct the inter-operability between two heterogeneous ontologies, we need to determine a number of entity correspondences, the so-called Ontology Matching (OM). Traditional OM techniques dedicate to find the simple correspondences which cannot sufficiently express the diverse kinds of heterogeneity, and therefore, complex correspondences are required, which maps one or multiple entities from one ontology to those from the other. Due to the semantic complexity, automatically discovering large-scale simple and complex correspondences has become a challenge. To face this challenge, in this work, an Anchor-based Semantic Partitioning Technique (ASPT) is first presented to divide the large-scale ontologies into several semantic segments, whose computational complexity is lower than the existing ontology partitioning algorithms. Then, a Pattern-based Similarity Measure (PSM) is used to distinguish both simple and complex correspondences. Finally, a Confidence Matrix based Evolutionary Algorithm (CM-EA) is proposed to efficiently match the ontology segments. We compare our proposal with EA-based matching technique and state-of-the-art ontology matching systems on OAEI’s Complex track, and the experimental results show that our approach is able to efficiently match large-scale complex ontologies.
Matching large-scale biomedical ontologies with central concept based partitioning algorithm and Adaptive Compact Evolutionary Algorithm
2021, Applied Soft Computing
Citation Excerpt :
Falcon-AO [12] also uses the partitioning technique to reduce the search space, but it uses two concepts’ structural information to control the segment’s scale. Anchor-Flood [13] partitions an ontology by exploring the neighborhood of the anchor concepts, i.e. highly similar concepts neighborhood, to form different segments. Anchor-Flood can control the size and number of the segments.
As a unified model for describing biomedical knowledge, a biomedical ontology is of help to solve the issues of data heterogeneity in different biomedical databases. However, these ontologies might model same biomedical knowledge differently, yielding the heterogeneity problem. To address the biomedical ontology heterogeneity problem, it is necessary to match the heterogeneous concept pairs between two ontologies. How to reduce the computational complexity is a challenging problem when matching large-scale biomedical ontologies, which directly affects the matching efficiency and the alignment’s quality. To face this challenge, this work proposes a large-scale biomedical ontology partitioning and matching framework. In our proposal, a central concepts based ontology partitioning algorithm is first used to divide the ontology into several disjoint segments, which borrows the idea from the social network and Firefly Algorithm (FA). The proposed algorithm is able to partition the ontologies with low computation complexity, and at the same time, ensure the semantic completeness and the decent scale of each segment. Then, an Adaptive Compact Evolutionary Algorithm (ACEA) based matching technique is utilized to determine the ontology segment alignments, which can efficiently match the similar ontology segments. The experiment utilizes the biomedical testing cases provided by Ontology Alignment Evaluation Initiative (OAEI) to test our approach’s effectiveness, and the experimental results show that the alignments obtained by our method significantly outperforms the state-of-the-art biomedical ontology matching techniques.
AHAB: Aligning heterogeneous knowledge bases via iterative blocking
2019, Information Processing and Management
With the development of information extraction, there have been an increasing number of large-scale knowledge bases available in different domains. In recent years, a great deal of approaches have been proposed for large-scale knowledge base alignment. Most of them are based on iterative matching. If a pair of entities has been aligned, their compatible neighbors are selected as candidate entity pairs. The limitation of these methods is that they discover candidate entity pairs depending on aligned relations, which cannot be used for aligning heterogeneous knowledge bases. Only few existing methods focus on aligning heterogeneous knowledge bases, which discover candidate entity pairs just for once by traditional blocking methods. However, the performance of these methods depends on blocking keys heavily, which are hard to select. In this paper, we present an approach for aligning heterogeneous knowledge bases via iterative blocking (AHAB) to improve the discovery and refinement of candidate entity pairs. AHAB iteratively utilizes different relations for blocking, and then matches block pairs based on matched entity pairs. The Cartesian product of unmatched entities in matched block pairs forms candidate entity pairs. By filtering out dissimilar candidate entity pairs, matched entity pairs will be found. The number of matched entity pairs proliferates with iterations, which in turn helps match block pairs in each iteration. Experiments on real-world heterogeneous knowledge bases demonstrate that AHAB is able to yield a competitive performance.
A Novel Accurate and Time Efficient Map Reduce Approach for Biomedical Ontology Alignment
2024, Journal of Electrical Engineering and Technology
Construction of a Type Knowledge Graph Based on the Value Cognitive Turn of Characteristic Villages: An Application in Jixi, Anhui Province, China
2024, Land
Matching Weak Informative Ontologies
2023, arXiv

View all citing articles on Scopus

View full text

An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size

Abstract

Introduction

Section snippets

General terminologies

Anchor-Flood algorithm

Experiments and evaluation

Complexity analysis

Related work

Conclusions and future work

Acknowledgment

Information Systems

International Journal of Human–Computers Studies

International Journal of Human–Computer Studies

Journal of Information Systems

Data and Knowledge Engineering

Journal of Biomedical Informatics

Journal of Data & Knowledge Engineering

Six challenges for the semantic web

AIS SIGSEMIS Bulletin

Results of the Ontology Alignment Evaluation Initiative