An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size
Introduction
“An ontology is an explicit specification of a conceptualization” is a prominent defintion by T.R. Gruber in 1995 [13]. The definition was then extended by Studer et al., in 1998 as “an ontology is an explicit, formal specification of a shared conceptualization of a domain of interest” [39]. An ontology generally consists of entities such as concepts, properties and relations. It is the backbone to fulfill the semantic web vision [2], [23] and is a knowledge base to enable machines to communicate each other effectively. The knowledge captured in ontologies can be used to annotate data, to distinguish between homonyms and polysemies, to drive intelligent user interfaces and even to retrieve new information. A number of ontologies are increasing day by day with new semantic web contents, because an ontology is being developed to formalize the conceptualization behind the idea of semantic web. Therefore, in order to achieve semantic interoperability and integrity, ontology alignment would be playing an important role [1].
Ontologies may be of different size, small to large, having hundreds to millions of RDF triples, where an RDF triple is an RDF statement between a subject, a predicate, and an object. A small-scale ontology is usually defined within a single segment, where the term “segment” is referred by Seidenberg et al. [34] as a fragment that stands alone as an ontology in its own right. A large-scale ontology is generally very complex in nature and may contain multiple segments [34]. Stuckenschimdt et al. [38] also reveals that a large-scale ontology contains a set of modules about a certain subtopic that can be used independently of the other modules. A large-scale ontology expressed in Web Ontology Language (OWL) [25] is appearing to capture distributed knowledge in a centralized knowledge base. For example, the biomedical domain has enormous large-scale ontologies, such as FMA [32] and OpenGALEN [31].
Although there are many diverse solutions to the ontology alignment problem [7], only a few are capable to handle large ontologies efficiently [17], [19]. Many previously developed ontology alignment systems are also facing severe performance problem when they attempt to deal with large ontologies. Resolving scalability problem is also an important current issue in the ontology alignment research [35]. Moreover, users (humans or machines) may want to find a specific part of a large ontology which will fit only to their interest.
The main contribution of our approach is of attaining performance enhancement by solving the scalability problem in aligning large ontologies. The key idea is to start from an anchor point of a taxonomy of an ontology and to proceed towards the neighboring nodes considering the locality of reference. Eventually our proposed algorithm aligns a part of gigantic ontologies and outputs a segmented alignment, (i.e. an alignment across two related segments or fragments across ontologies) which are relatively small, however sufficient to satisfy users interest in a particular domain. Therefore, performance, scalability, and segmented alignment (i.e. an alignment across two related segments or fragments of ontologies) are the key motivations of our proposed algorithm in this paper.
In our previous research work, we also focused on the ontology alignment [16]. Our system was capable of aligning small-scale ontologies effectively with the feature of eliminating misalignments. However, its major drawback was against the aligning of large-scale ontologies, as its complexity was . To overcome this limitation of scalability problem, we developed a new algorithm, we called as “Anchor-Flood” algorithm.
Our algorithm assumes a seed called an anchor, where the notion anchor is derived from the Anchor-PROMPT algorithm [30], although it is clearly different from the notion used in that algorithm. The Anchor-PROMPT algorithm augments existing methods by determining additional possible aligned pairs across ontologies, while our Anchor-Flood algorithm starts aligning from an anchor and produces segmented alignments.
Our algorithm also shares the term “flood” with similarity-flooding algorithm [26] without major similarities in their action blocks. Although our work is inspired by the idea of similarity flooding algorithm in that the elements of two distinct hierarchy models are likely to be similar when their adjacent elements are similar. However, our Anchor-Flood algorithm does not propagate similarity value to the adjacent nodes iteratively unlike similarity-flooding algorithm.
In the main process block of our algorithm, we collect aligned pairs from the neighboring concepts computing similarities among the collected concepts across ontologies starting from an anchor. It begins exploring to the neighboring concepts to collect more aligned pairs. As our algorithm starts off an anchor and explores to the neighboring concepts, it has a salient feature of scalability. Our algorithm achieves enhancement in terms of scalability and performance in aligning large ontologies. It can reduce the number of seemingly-aligned but actually misaligned pairs [16]. It also outputs “segmented alignment”, which is a unique characteristic in the field of ontology alignment research.
The rest of the paper is organized as follows. Section 2 introduces general terminologies to be used in the later sections. Section 3 contains the description of our proposed “Anchor-Flood” algorithm. Section 4 includes experiments and evaluation, while Section 5 describes the complexity factor of the algorithm. Some related work and the differences with our algorithm are described in Section 6. Section 7 includes concluded remarks along with some future directions of our work.
Section snippets
General terminologies
This section introduces some of the basic definitions in the field of ontology alignment research and familiarizes the reader with the notions and terminologies used throughout the paper. It includes the definitions of ontology, taxonomy, alignment across ontologies, the similarity measures and the idea of a segment in an ontology.
Anchor-Flood algorithm
Our proposed algorithm takes an anchor as one of the inputs along with two ontologies, and . It outputs a set of aligned pairs within two segments across ontologies. Before proceeding to the detail of our algorithm, we define four frequently used terms in this paper.
Definition 1 Anchor An anchor is defined as a pair of similar concepts across ontologies. If , and , meaning that e and f are aligned, then an anchor, is defined as a pair , where the first element comes from
Experiments and evaluation
We applied our Anchor-Flood algorithm to a variety of datasets, including the benchmark dataset,3 and larger ontologies, such as anatomical ontologies (FMA.owl, OpenGALEN.owl, human.owl, mouse.owl and so on), web directory ontologies in OWL format.
In this section, we discuss the results obtained from the OAEI-2008 and acquired by our extensive experiments in terms of scalability, memory consumption and segmented alignment. As we participated in
Complexity analysis
As the Anchor-Flood algorithm starts off with aligning an anchor, and explores to its neighboring concepts, the performance is not heavily affected by the size of ontologies. However, the performance depends on the size of neighbors defined in Eq. (5), and on the size of segments in a segmented alignment, which is illustrated in Section 2.4.
Consider two ontologies with entities and with entities, and for simplicity, if we assume , and the average number of children of each
Related work
There are some related works which target to deal with ontology alignment such as COMA++ [6], NLM Anatomical Ontology Alignment [44], PBM based Falcon-AO [17], [18], iMAP [5] and Chimaera [24]. Moreover, there are some other approaches [14], [27], [42], which address large-scale ontologies or large-scale class-hierarchies. These related work, which are relevant to our Anchor-Flood algorithm, are described in the subsequent sections.
Conclusions and future work
In this paper, we described the Anchor-Flood algorithm that can align ontologies of arbitrary size effectively, and that makes it possible to achieve high performance and scalability over previous alignment algorithms. To achieve these goals, the algorithm took advantage of the notion of segmentation and allowed segmented output of aligned ontologies. Specifically, owing to the segmentation, our algorithm concentrates on aligning only small sets of the entire ontology data iteratively, by
Acknowledgment
This work was partially supported by the Global COE Program “Frontiers of Intelligent Sensing”, from the ministry of Education, Culture, Sports, Science and Technology.
References (43)
- et al.
Matching large schemas: approaches and evaluation
Information Systems
(2007) - et al.
Ontolingua server: a tool for collaborative ontology construction
International Journal of Human–Computers Studies
(1997) Toward principles for the design of ontologies used for knowledge sharing
International Journal of Human–Computer Studies
(1995)- et al.
Rock: a robust clustering algorithm for categorical attributes
Journal of Information Systems
(2000) - et al.
Matching large ontologies: a divide-and-conquer approach
Data and Knowledge Engineering
(2008) - et al.
A reference ontology for biomedical informatics: the foundation model of anatomy
Journal of Biomedical Informatics
(2003) - et al.
Knowledge engineering: principles and methods
Journal of Data & Knowledge Engineering
(1998) - et al.
Six challenges for the semantic web
AIS SIGSEMIS Bulletin
(2004) - T. Berners-Lee, M. Fischetti, M. Dertouzos, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide...
- et al.
Results of the Ontology Alignment Evaluation Initiative
AROMA results for OAEI
iMAP: discovering complex semantic matches between database schemas
Ontology Alignment: Bridging the Semantic Gap
WordNet: An Electronic Lexical Database
Semantic matching
The Knowledge Engineering Review
Automatic partitioning of OWL ontologies using -Connections
TaxoMap in the OAEI 2008 alignment contest
Automatic alignment of ontology eliminating the probable misalignments
The results of Falcon-AO in the OAEI 2006 campaign
Partition-based block matching of large class hierarchies
ASMOV: results for OAEI
Cited by (104)
Large-scale complex ontology matching through anchor-based semantic partitioning technique and confidence matrix based evolutionary algorithm
2022, Applied Soft ComputingCitation Excerpt :The similar segments are also determined by the similarity of their central concepts. Anchor-Flood [17] works on the basis of the anchors, which are highly similar concept pairs. It gradually matches the neighboring concepts of an anchor to exploit more correspondences.
Matching large-scale biomedical ontologies with central concept based partitioning algorithm and Adaptive Compact Evolutionary Algorithm
2021, Applied Soft ComputingCitation Excerpt :Falcon-AO [12] also uses the partitioning technique to reduce the search space, but it uses two concepts’ structural information to control the segment’s scale. Anchor-Flood [13] partitions an ontology by exploring the neighborhood of the anchor concepts, i.e. highly similar concepts neighborhood, to form different segments. Anchor-Flood can control the size and number of the segments.
AHAB: Aligning heterogeneous knowledge bases via iterative blocking
2019, Information Processing and ManagementA Novel Accurate and Time Efficient Map Reduce Approach for Biomedical Ontology Alignment
2024, Journal of Electrical Engineering and TechnologyMatching Weak Informative Ontologies
2023, arXiv