Generating core domain ontologies from normalized dictionaries
Introduction
In recent years, research on ontology development and on improving the ontology construction process has become increasingly widespread in the computer science community. Indeed, domain ontologies are powerful knowledge representation tools for formally describing a set of relevant domain-specific concepts and their relationships (Guarino, 1998). Although a large body of work has addressed ontology learning, which aims to automate the ontology creation process, this process is still far from being fully automatic and deployable on a large scale. In practice, it requires significant involvement of human experts to validate each step (Lonsdale et al., 2010).
In order to reduce the costs, research on ontology learning has been conducted using a variety of resources, such as raw text (Poon and Domingos, 2010, Aussenac-Gilles et al., 2008, Li et al., 2005, Navigli et al., 2003), XML structured data (Aussenac-Gilles and Kamel, 2009, Bedini et al., 2011), Machine-Readable Dictionaries (MRDs) (Kurematsu et al., 2004, Kietz et al., 2000, Rigau et al., 1998), and thesauri (Li and Li, 2012, Chrisment et al., 2008, Soergel et al., 2004). Obviously, these resources have different features, and therefore, each proposed process is based on a different approach pertaining to rules, Natural Language Processing (NLP) techniques, etc.
As linguistic information is increasingly required in ontologies, mainly in NLP applications (Buitelaar et al., 2009, Pazienza et al., 2007), MRDs stand out among the terminological resources considered as one of the most suitable sources for extracting knowledge at both the conceptual and lexical levels. However, since much lexical information has not yet been encoded, software applications still have limited access to the potential wealth of information in dictionaries.
From another standpoint, the growing awareness of the benefits of having finely structured knowledge in lexical resources has led to the definition of the Lexical Markup Framework (LMF) (ISO 24613, 2008). Its meta-model provides a common and shared representation of lexical objects that allows the encoding of rich linguistic information, including morphological, syntactic and semantic aspects (Francopoulo and George, 2008). In particular, an LMF-standardized dictionary incorporates widely accepted and commonly referenced linguistic knowledge of diverse kinds that lends itself to ontological interpretation. Besides, the finely structured, multi-domain knowledge in an LMF-standardized dictionary paves the way for automatically generating the ontological entities that constitute the core of the targeted domain ontology (Baccar et al., 2010).
The ultimate objective of this paper is to propose a framework for generating core domain ontologies from LMF-standardized dictionaries. The systematic organization of such a resource allowed us to implement a fully automatic process that directly transforms selected dictionary information into ontological elements, based on domain- and language-independent rules. In addition, since evaluating an ontology as a whole is a costly and challenging task, especially when human intervention is to be reduced (Almeida, 2009), a validation stage had to be integrated into our iterative process. Detected errors are kept out of the ontology to maintain its consistency, on the one hand, and are reported back to an expert to indicate anomalies in the source dictionary, on the other. The interest of this stage is thus twofold: first, it guarantees the quality of the produced ontology; second, it helps to check for anomalies in the dictionary that are hard to detect manually (Wali et al., 2014).
Apart from its generality and full automation, the proposed framework has the merit of supporting under-resourced languages, such as Arabic, in the sense that new resources can be generated from existing ones with minimal effort and cost.
Furthermore, the implemented system has proven to be trustworthy through a series of experiments conducted on various domains using normalized dictionaries. Without loss of generality, the experiment reported in this paper was carried out on the Arabic language. This choice is explained not only by the scarcity of work on Arabic ontology building, but also by the availability within our research team of an LMF-standardized Arabic dictionary (Baccar et al., 2008).
Concerning the ontology implementation, we chose the Web Ontology Language (OWL) in its version 2 (Motik et al., 2009). It is a formal, standard language for representing ontologies on the Semantic Web. Our choice is motivated by its reasoning capabilities and its standard nature.
The remainder of this paper is structured as follows. Section 2 presents the state of the art and the motivations of our work. Section 3 describes the proposed framework for core domain ontology generation from normalized dictionaries. Section 4 gives details of the system implementation and its experimentation. In Section 5, we discuss the assessment of the proposed framework as well as the quality of the experimental results obtained. Finally, Section 6 concludes the paper and opens perspectives for future work.
Section snippets
State of the art and motivations
Ontology engineering has always been a tedious task requiring considerable human involvement and effort, especially in knowledge acquisition. Over the last three decades, there have been some efforts to automate the ontology construction process by exploiting structured documents such as XML structured data sources (Aussenac-Gilles and Kamel, 2009). However, almost all the proposed approaches suffer from many drawbacks, especially their lack of generality (narrow scope of application) and
Basic idea
A domain ontology offers a formal representation of the concepts of a given domain and of the relationships between them (Guarino, 1998). A concept is a cognitive unit of meaning, which may be an abstract idea or a mental symbol. Conceptual relationships include hierarchical relations (taxonomies or subclass relations) and non-hierarchical relations (simple associations).
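As a rough illustration of this definition, the following Python sketch models a domain ontology as a set of concepts plus the two kinds of relationships. The class and method names (Concept, DomainOntology, add_subclass, add_association, ancestors) and the astronomy terms are hypothetical choices made for this example; they are not taken from the framework described in this paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A unit of meaning in the domain, identified by its label (hypothetical model)."""
    label: str


@dataclass
class DomainOntology:
    """Concepts plus hierarchical and non-hierarchical relationships."""
    concepts: set = field(default_factory=set)
    # (child, parent) pairs: taxonomic / subclass relations
    subclass_of: set = field(default_factory=set)
    # (source, relation name, target) triples: simple associations
    associations: set = field(default_factory=set)

    def add_subclass(self, child, parent):
        self.concepts.update({child, parent})
        self.subclass_of.add((child, parent))

    def add_association(self, source, name, target):
        self.concepts.update({source, target})
        self.associations.add((source, name, target))

    def ancestors(self, concept):
        """Return every concept reachable by following subclass links upward."""
        found = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for child, parent in self.subclass_of:
                if child == current and parent not in found:
                    found.add(parent)
                    frontier.append(parent)
        return found


# Illustrative data only, not drawn from the paper's experiments
ontology = DomainOntology()
star = Concept("star")
body = Concept("celestial body")
planet = Concept("planet")
ontology.add_subclass(star, body)
ontology.add_subclass(planet, body)
ontology.add_association(planet, "orbits", star)
```

Keeping subclass links and associations in separate collections mirrors the distinction drawn above between hierarchical and non-hierarchical relations, and lets taxonomy queries such as ancestors ignore the associations entirely.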
Researchers from computational lexical semantics have investigated the close relations between domain
The core generation tool implementation details
The proposed process for core domain ontology generation from LMF-standardized dictionaries is implemented in a Java-based tool that enables users to automatically build the core structures. It is an incremental system that examines the automatically created domain dictionary entry by entry. It also performs the core generation process in one or more iterations.
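The entry-by-entry, multi-iteration behaviour described above can be pictured with the following Python sketch. It is not the authors' Java tool: the entry layout (a dict with lemma and hypernym keys), the single acceptance rule, and the stopping criterion (stop when a pass adds nothing new) are all assumptions made for illustration.

```python
def generate_core(entries, root):
    """Scan dictionary entries repeatedly, growing a concept set from `root`.

    Hypothetical rule: an entry becomes a concept once its hypernym is already
    a known concept. Passes are repeated until one adds nothing new.
    Returns (concepts, subclass pairs, number of passes performed).
    """
    concepts = {root}
    subclass_of = set()
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for entry in entries:  # examine the dictionary entry by entry
            lemma = entry["lemma"]
            hypernym = entry.get("hypernym")
            if hypernym in concepts and lemma not in concepts:
                concepts.add(lemma)
                subclass_of.add((lemma, hypernym))
                changed = True
    return concepts, subclass_of, passes


# Illustrative entries only; the last one lies outside the target domain
entries = [
    {"lemma": "red giant", "hypernym": "star"},
    {"lemma": "star", "hypernym": "celestial body"},
    {"lemma": "teapot", "hypernym": "vessel"},
]
concepts, subclass_of, passes = generate_core(entries, "celestial body")
```

In this toy run the entry for "red giant" is only picked up on the second pass, once "star" has become a concept, which is one reason a process of this kind might need more than one iteration.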
Furthermore, generated core domain ontologies are formalized in OWL2, a new version of OWL (Motik et al., 2009). Indeed, an OWL
Discussion
It is generally accepted that ontology development is a challenging, time-consuming and error-prone task. That is why one of our primary goals is to alleviate this undertaking through the reuse of existing linguistic resources. According to our experiments on LMF-standardized dictionaries (ISO-24613), which are finely structured sources containing multi-domain and lexical knowledge at the morphological, syntactic and semantic levels and which lend themselves to ontological interpretation, we
Conclusion and future work
This paper proposes a general framework for generating core domain ontologies from the structured content of LMF (ISO-24613)-standardized dictionaries. In contrast with almost all ontology learning approaches, the one proposed here offers a domain- and language-independent process for automatically generating domain ontology entities. Moreover, we have addressed another challenging task of major importance, namely ontology quality. For this reason, we have decided to focus on error
References (42)
- et al. Reusing ontologies and language components for ontology generation. J. Data Knowl. Eng. (2010)
- et al. Pellet: a practical OWL DL reasoner. J. Web Semantics (2007)
- A proposal to evaluate ontology content. Appl. Ontol. (2009)
- Amsler, R.A., 1981. A taxonomy for English nouns and verbs. In: Proceedings of the 19th Annual Meeting on Association...
- et al. The TERMINAE Method and Platform for Ontology Engineering from texts
- Aussenac-Gilles, N., and Kamel, M., 2009. Ontology learning by analyzing XML document structure and content. In:...
- Baccar Ben Amar, F., Khemakhem, A., Gargouri, B., Haddar, K., Ben Hamadou, A., 2008. LMF standardized model for the...
- Baccar Ben Amar, F., Gargouri, B., Ben Hamadou, A., 2010. Towards generation of domain ontology from LMF standardized...
- Bedini, I., Matheus, C., Patel-Schneider, P., Boran, A., Nguyen, B., 2011. Transforming XML schema to OWL using...
- Buitelaar, P., Cimiano, P., Haase, P., Sintek, M., 2009. Towards linguistically grounded ontologies. In: Proceedings of...
- Ontology Learning and Population from Text: Algorithms, Evaluation and Applications
- Méthodologie de transformation d'un thesaurus en une ontologie de domaine. Revue d'Intelligence Artif.
- Ontology evaluation
- Ontology and the Lexicon