Generating core domain ontologies from normalized dictionaries

doi:10.1016/j.engappai.2016.01.014

Engineering Applications of Artificial Intelligence

Volume 51, May 2016, Pages 230-241

https://doi.org/10.1016/j.engappai.2016.01.014

Abstract

This paper proposes a general framework for automatic core domain ontology generation from LMF (ISO 24613) standardized dictionaries. The originality of this work lies not only in the use of a unique and finely structured source containing multi-domain and lexical knowledge at the morphological, syntactic and semantic levels, which lends itself to ontological interpretation, but also in the proper building of the taxonomic backbone of the domain ontology. To this end, we have integrated a validation stage into the proposed process in order to maintain the consistency of the resulting formalized domain ontology core throughout the process and to support the checking of anomalies in the handled source. Furthermore, this generation process has been implemented in an iterative and incremental system based on domain- and language-independent rules. The reliability of the proposed process is demonstrated through many experiments conducted on various domains using normalized dictionaries; without loss of generality, we choose to illustrate an experiment carried out on the Arabic language. This choice is explained by both the scarcity of work on building Arabic ontologies and the availability within our research team of an LMF-standardized Arabic dictionary.

Introduction

In recent years, research on ontology development and on improving the construction process has become increasingly widespread in the computer science community. Indeed, domain ontologies are extremely powerful knowledge representation tools for describing a set of relevant domain-specific concepts and their relationships in a formal way (Guarino, 1998). Although the field of ontology learning, which aims to automate the ontology creation process, has been addressed in a large body of work, it is still a long way from being fully automatic and deployable on a large scale. In practice, it still requires significant human (expert) involvement to validate each step of the process (Lonsdale et al., 2010).

In order to reduce the costs, research on ontology learning has been conducted using a variety of resources, such as raw text (Poon and Domingos, 2010; Aussenac-Gilles et al., 2008; Li et al., 2005; Navigli et al., 2003), XML structured data (Aussenac-Gilles and Kamel, 2009; Bedini et al., 2011), Machine-Readable Dictionaries (MRDs) (Kurematsu et al., 2004; Kietz et al., 2000; Rigau et al., 1998), and thesauri (Li and Li, 2012; Chrisment et al., 2008; Soergel et al., 2004). Obviously, these resources have different features, and therefore, each proposed process is based on a different approach pertaining to rules, Natural Language Processing (NLP) techniques, etc.

As linguistic information is increasingly required in ontologies, mainly in NLP applications (Buitelaar et al., 2009; Pazienza et al., 2007), MRDs stand out among the considered terminological resources as one of the most suitable sources for knowledge extraction at both the conceptual and lexical levels. However, since much lexical information has not yet been encoded, access to the potential wealth of information in dictionaries remains limited for software applications.

From another standpoint, the growing awareness of the benefits of having finely structured knowledge in lexical resources has led to the definition of the Lexical Markup Framework (LMF) (ISO 24613, 2008). Its meta-model provides a common and shared representation of lexical objects that allows the encoding of rich linguistic information, including morphological, syntactic and semantic aspects (Francopoulo and George, 2008). In particular, an LMF-standardized dictionary incorporates widely accepted and commonly referenced linguistic knowledge of diverse kinds, which lends itself to ontological interpretation. Besides, the finely structured and multi-domain knowledge in an LMF-standardized dictionary paves the way for automatically generating the ontological entities that constitute the core of the targeted domain ontology (Baccar et al., 2010).
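To make the kind of structure meant here concrete, the following minimal Python sketch reads a hand-written LMF-style fragment. The element names (LexicalEntry, Lemma, Sense, Definition, SenseRelation, and feat elements carrying att/val pairs) follow the XML serialization commonly used for ISO 24613; the sample entry, the feature names subjectField and type, and the helper read_entries are illustrative assumptions and are not taken from the dictionary used in this work.

```python
import xml.etree.ElementTree as ET

# Hypothetical LMF-style fragment; the entry and its feature values are invented.
LMF_SAMPLE = """
<LexicalResource>
  <Lexicon>
    <LexicalEntry id="le_star">
      <feat att="partOfSpeech" val="noun"/>
      <Lemma><feat att="writtenForm" val="star"/></Lemma>
      <Sense id="s_star_1">
        <feat att="subjectField" val="astronomy"/>
        <Definition><feat att="text" val="a luminous celestial body"/></Definition>
        <SenseRelation targets="s_body_1"><feat att="type" val="hypernym"/></SenseRelation>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def feats(element):
    """Collect the direct <feat att=... val=...> children of an element."""
    return {f.get("att"): f.get("val") for f in element.findall("feat")}


def read_entries(xml_text):
    """Return one dict per LexicalEntry with its lemma, part of speech and senses."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        senses = []
        for sense in entry.findall("Sense"):
            definition = sense.find("Definition")
            senses.append({
                "id": sense.get("id"),
                "domain": feats(sense).get("subjectField"),
                "definition": feats(definition).get("text") if definition is not None else None,
                # (relation type, target sense id) pairs, e.g. a hypernym link
                "relations": [(feats(r).get("type"), r.get("targets"))
                              for r in sense.findall("SenseRelation")],
            })
        entries.append({
            "id": entry.get("id"),
            "lemma": feats(lemma).get("writtenForm") if lemma is not None else None,
            "pos": feats(entry).get("partOfSpeech"),
            "senses": senses,
        })
    return entries
```

Calling read_entries(LMF_SAMPLE) yields a single entry whose sense carries a domain label and a hypernym link, which is the kind of information that can be read as a candidate concept and a candidate taxonomic relation.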

The ultimate objective of this paper is to propose a framework for core domain ontology generation starting from LMF-standardized dictionaries. In fact, the systematic organization of such a resource allowed us to implement a fully automatic process that directly transforms particular dictionary information into ontological elements based on domain- and language-independent rules. In addition, since evaluating an ontology as a whole is a costly and challenging task, especially when the reduction of human intervention is sought (Almeida, 2009), a validation stage had to be integrated into our iterative process. Detected errors are kept out of the ontology to maintain its consistency, on the one hand, and are reported back to an expert to indicate anomalies in the handled source, on the other hand. The interest of this stage is thus twofold: first, it guarantees the quality of the produced ontology; second, it contributes to the checking of anomalies in the handled dictionary that are hard to detect manually (Wali et al., 2014).
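The two roles of the validation stage can be illustrated with a small Python sketch. The concrete checks performed by the system are not listed in this passage, so the example assumes a single, classic taxonomic anomaly, circularity: a subclass link that would close a cycle is refused (keeping the ontology consistent) and recorded (so that it can be reported to an expert). The class name Taxonomy and its methods are hypothetical.

```python
from collections import defaultdict


class Taxonomy:
    """Toy taxonomic backbone with one validation rule (no cycles)."""

    def __init__(self):
        self.parents = defaultdict(set)  # concept -> direct superconcepts
        self.rejected = []               # anomalies to report back to an expert

    def _reaches(self, start, goal):
        """True if goal can be reached from start by following superconcept links."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.parents.get(node, ()))
        return False

    def add_subclass(self, child, parent):
        """Add child -> parent unless it would introduce a circularity."""
        # The new link closes a cycle exactly when parent already reaches child.
        if child == parent or self._reaches(parent, child):
            self.rejected.append((child, parent, "circularity"))
            return False
        self.parents[child].add(parent)
        return True
```

With this design, a rejected link never enters the taxonomy, while the rejected list keeps enough context (the two concepts and the kind of anomaly) to trace the problem back to the dictionary entries involved.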

Apart from its generality and its fully automated nature, the proposed framework has the merit of supporting under-resourced languages, such as Arabic, in the sense that new resources can be generated from existing ones with minimal effort and cost.

Furthermore, the implemented system has proven to be trustworthy through a series of experiments conducted on various domains using normalized dictionaries; without loss of generality, the experiment reported in this paper was carried out on the Arabic language. This choice is explained not only by the scarcity of work on Arabic ontology building, but also by the availability within our research team of an LMF-standardized Arabic dictionary (Baccar et al., 2008).

For the ontology implementation, we chose the Web Ontology Language (OWL) in its version 2 (Motik et al., 2009), a formal and standard language for representing ontologies on the Semantic Web. This choice is motivated by the reasoning capabilities and the standard nature of the language.
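As an illustration of what a formalized taxonomic backbone can look like, the following Python sketch serializes subclass pairs in the OWL 2 functional-style syntax. This passage does not state which OWL 2 serialization the system emits, so the choice of syntax, the function name to_owl_functional, the example IRI and the class names are all assumptions; class names are assumed to be simple identifiers that are valid as prefixed local names.

```python
def to_owl_functional(ontology_iri, subclass_pairs):
    """Render (child, parent) pairs as an OWL 2 functional-style document."""
    # Every concept mentioned in a pair is declared once as a class.
    classes = sorted({concept for pair in subclass_pairs for concept in pair})
    lines = [f"Prefix(:=<{ontology_iri}#>)", f"Ontology(<{ontology_iri}>"]
    lines += [f"  Declaration(Class(:{c}))" for c in classes]
    lines += [f"  SubClassOf(:{child} :{parent})" for child, parent in subclass_pairs]
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_owl_functional("http://example.org/astro", [("Star", "CelestialBody")]))
```

A document of this shape can be loaded by OWL 2 tools that accept the functional-style syntax, which is where the reasoning capabilities mentioned above come into play.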

The remainder of this paper is structured as follows. Section 2 presents the state of the art and the motivations of our work. Section 3 describes the proposed framework for core domain ontology generation from normalized dictionaries. Then, Section 4 gives details of the system implementation and its experimentation. Next, in Section 5, we discuss the assessment of the proposed framework as well as the quality of the obtained experimental results. Finally, Section 6 concludes the paper and opens perspectives for future work.

Section snippets

State of the art and motivations

Ontology engineering has always been a tedious task requiring considerable human involvement and effort, especially in the activity of knowledge acquisition. During the last three decades, there have been some efforts to automate the ontology construction process by exploiting structured documents such as XML structured data sources (Aussenac-Gilles and Kamel, 2009). However, almost all the proposed approaches suffer from many drawbacks, especially their lack of generality (narrow scope of application) and

Basic idea

A domain ontology offers a formal representation of the concepts of a given domain and of the relationships between them (Guarino, 1998). A concept is a cognitive unit of meaning, which may be an abstract idea or a mental symbol. Conceptual relationships include hierarchical relations (taxonomies or subclass relations) and non-hierarchical relations (simple associations).

Researchers from computational lexical semantics have investigated the close relations between domain

The core generation tool implementation details

The proposed process for core domain ontology generation from LMF-standardized dictionaries is implemented in a Java-based tool that enables users to automatically build the core structures. It is an incremental system that examines the automatically created domain dictionary entry by entry. It also performs the core generation process in one or more iterations.
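The entry-by-entry, multi-iteration behaviour described above can be sketched in Python as follows. The sketch assumes one plausible reason for needing several iterations, namely that an entry can only be attached once the concept it depends on is already in the core, so unresolved entries are deferred to the next pass. This is an assumption made for illustration; the function generate_core, its inputs and the stopping rule are hypothetical and do not reproduce the Java tool.

```python
def generate_core(entries, roots):
    """Attach (concept, hypernym) entries to a core seeded with root concepts.

    Returns the core concepts, the accepted links, the entries that could
    not be attached, and the number of passes over the dictionary.
    """
    core = set(roots)
    links = []
    pending = list(entries)
    iterations = 0
    while pending:
        iterations += 1
        deferred = []
        for concept, hypernym in pending:       # examine the dictionary entry by entry
            if hypernym in core:
                core.add(concept)
                links.append((concept, hypernym))
            else:
                deferred.append((concept, hypernym))  # retry in a later iteration
        if len(deferred) == len(pending):
            break                               # no progress: stop iterating
        pending = deferred
    return core, links, pending, iterations
```

Entries left in the returned pending list are those that never found an anchor in the core; in a setting like the one described here, they would be natural candidates for inspection by an expert.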

Furthermore, generated core domain ontologies are formalized in OWL2, a new version of OWL (Motik et al., 2009). Indeed, an OWL

Discussion

It is generally accepted that ontology development is a challenging, time-consuming and error-prone task. That is why one of our primary goals is to alleviate this undertaking through the reuse of existing linguistic resources. According to our experiments carried out on LMF-standardized dictionaries (ISO-24613), which are finely structured sources containing multi-domain and lexical knowledge at the morphological, syntactic and semantic levels and lend themselves to ontological interpretation, we

Conclusion and future work

This paper proposes a general framework for generating core domain ontologies from the structured content of LMF (ISO-24613)-standardized dictionaries. In contrast with almost all ontology learning approaches, the one proposed here relies on a domain- and language-independent process for automatically generating domain ontology entities. Moreover, we have addressed another challenging task of major importance, namely ontology quality. For this reason, we have decided to focus on error

References (42)

D. Lonsdale et al.
Reusing ontologies and language components for ontology generation
J. Data Knowl. Eng.
(2010)
E. Sirin et al.
Pellet: a practical OWL DL reasoner
J. Web Semantics
(2007)
M.B. Almeida
A proposal to evaluate ontology content
Appl. Ontol.
(2009)
Amsler, R.A., 1981. A taxonomy for English nouns and verbs. In: Proceedings of the 19th Annual Meeting on Association...
N. Aussenac-Gilles et al.
The TERMINAE Method and Platform for Ontology Engineering from texts
Aussenac-Gilles, N., and Kamel, M., 2009. Ontology learning by analyzing XML document structure and content. In:...
Baccar Ben Amar, F., Khemakhem, A., Gargouri, B., Haddar, K., Ben Hamadou, A., 2008. LMF standardized model for the...
Baccar Ben Amar, F., Gargouri, B., Ben Hamadou, A., 2010. Towards generation of domain ontology from LMF standardized...
Bedini, I., Matheus, C., Patel-Schneider, P., Boran, A., Nguyen, B., 2011. Transforming XML schema to OWL using...
Buitelaar, P., Cimiano, P., Haase, P., Sintek, M., 2009. Towards linguistically grounded ontologies. In: Proceedings of...

Calzolari, N., 1985. Detecting patterns in a lexical data base. In: Proceedings of the 22nd Annual Meeting on...

Chodorow, M.S., Byrd, R.J., Heidorn, G.E., 1985. Extracting semantic hierarchies from a large on-line dictionary. In:...

P. Cimiano

Ontology Learning and Population from Text: Algorithms, Evaluation and Applications

(2006)

C. Chrisment et al.

Méthodologie de transformation d'un thesaurus en une ontologie de domaine

Revue d'Intelligence Artif.

(2008)

Dolan, W., Vanderwende, L., Richardson, S., 1993. Automatically deriving structured knowledge bases from online...

Francopoulo, G., George, M., 2008. Language Resource Management - Lexical Markup Framework (LMF). Technical report,...

Gómez-Pérez, A., 1999. Evaluation of taxonomic knowledge on ontologies and knowledge-based systems. In: Proceedings of...

A. Gómez-Pérez

Ontology evaluation

Guarino, N., 1998. Formal Ontology in Information Systems. In: Proceedings of the 1st International Conference, June...

G. Hirst

Ontology and the Lexicon

ISO 24613: Lexical Markup Framework (LMF) revision 16. ISO FDIS...
