
2014 | Book


Controlled Natural Language

4th International Workshop, CNL 2014, Galway, Ireland, August 20-22, 2014. Proceedings

Editors: Brian Davis, Kaarel Kaljurand, Tobias Kuhn

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 4th International Workshop on Controlled Natural Language, CNL 2014, held in Galway, Ireland, in August 2014. The 17 full papers and one invited paper presented were carefully reviewed and selected from 26 submissions. The topics include simplified language, plain language, formalized language, processable language, fragments of language, phraseologies, conceptual authoring, language generation, and guided natural language interfaces.

Frontmatter

Embedded Controlled Languages

Abstract

Inspired by embedded programming languages, an embedded CNL (controlled natural language) is a proper fragment of an entire natural language (its host language), but it has a parser that recognizes the entire host language. This makes it possible to process out-of-CNL input and give useful feedback to users, instead of just reporting syntax errors. This extended abstract explains the main concepts of embedded CNL implementation in GF (Grammatical Framework), with examples from machine translation and some other ongoing work.

Aarne Ranta

Controlled Natural Language Processing as Answer Set Programming: An Experiment

Abstract

Most controlled natural languages (CNLs) are processed with the help of a pipeline architecture that relies on different software components. In this paper, we investigate experimentally how well answer set programming (ASP) is suited as a unifying framework for parsing a CNL, deriving a formal representation for the resulting syntax trees, and reasoning with that representation. We start from a list of input tokens in ASP notation and show how this input can be transformed into a syntax tree using an ASP grammar and then into reified ASP rules in the form of a set of facts. These facts are then processed by an ASP meta-interpreter that allows us to infer new knowledge.

Rolf Schwitter

How Easy Is It to Learn a Controlled Natural Language for Building a Knowledge Base?

Abstract

Recent developments in controlled natural language editors for knowledge engineering (KE) have given rise to expectations that they will make KE tasks more accessible and perhaps even enable non-engineers to build knowledge bases. This exploratory research focussed on novices and experts in knowledge engineering during their attempts to learn a controlled natural language (CNL) known as OWL Simplified English and use it to build a small knowledge base. Participants’ behaviours during the task were observed through eye-tracking and screen recordings.

This was an attempt at a more ambitious user study than in previous research because we used a naturally occurring text as the source of domain knowledge, and left participants without guidance on which information to select or how to encode it. We have identified a number of skills (competencies) required for this difficult task and key problems that authors face.

Sandra Williams, Richard Power, Allan Third

Linguistic Analysis of Requirements of a Space Project and Their Conformity with the Recommendations Proposed by a Controlled Natural Language

Abstract

The long-term aim of the project carried out by the French National Space Agency (CNES) is to design a writing guide based on the real and regular writing of requirements. As a first step in the project, this paper proposes a linguistic analysis of requirements written in French by CNES engineers. The aim is to determine to what extent they conform to two rules laid down in INCOSE, a recent guide for writing requirements. Although CNES engineers are not obliged to follow any Controlled Natural Language in their writing of requirements, we believe that language regularities are likely to emerge from this task, mainly due to the writers’ experience. The issue is approached using natural language processing tools to identify sentences that do not comply with INCOSE rules. We further review these sentences to understand why the recommendations cannot (or should not) always be applied when specifying large-scale projects.

Anne Condamines, Maxime Warnier

Evaluating the Fully Automatic Multi-language Translation of the Swiss Avalanche Bulletin

Abstract

The Swiss avalanche bulletin is produced twice a day in four languages. Due to the lack of time available for manual translation, a fully automated translation system is employed, based on a catalogue of predefined phrases and predetermined rules of how these phrases can be combined to produce sentences. The system is able to automatically translate such sentences from German into the target languages French, Italian and English without subsequent proofreading or correction. Our catalogue of phrases is limited to a small sublanguage. The reduction of daily translation costs is expected to offset the initial development costs within a few years. After being operational for two winter seasons, we assess here the quality of the produced texts based on an evaluation where participants rate real danger descriptions from both origins, the catalogue of phrases versus the manually written and translated texts. With a mean recognition rate of 55%, users can hardly distinguish between the two types of texts, and give similar ratings with respect to their language quality. Overall, the output from the catalogue system can be considered virtually equivalent to a text written by avalanche forecasters and then manually translated by professional translators. Furthermore, forecasters declared that all relevant situations were captured by the system with sufficient accuracy and within the limited time available.

Kurt Winkler, Tobias Kuhn, Martin Volk

Towards an Error Correction Memory to Enhance Technical Texts Authoring in LELIE

Abstract

In this paper, we investigate and experiment with the notion of error correction memory applied to error correction in technical texts. The main purpose is to induce relatively generic correction patterns associated with more contextual correction recommendations, based on previously memorized and analyzed corrections. The notion of error correction memory is developed within the framework of the LELIE project and illustrated with the case of fuzzy lexical items, which are a major problem in technical texts.

Juyeon Kang, Patrick Saint-Dizier

RuleCNL: A Controlled Natural Language for Business Rule Specifications

Abstract

Business rules represent the primary means by which companies define their business and perform their actions in order to reach their objectives. Thus, they need to be expressed unambiguously to avoid inconsistencies between business stakeholders, and formally in order to be machine-processed. A promising solution is the use of a controlled natural language (CNL), which is a good mediator between natural and formal languages. This paper presents RuleCNL, which is a CNL for defining business rules. Its core feature is the alignment of the business rule definition with the business vocabulary, which ensures traceability and consistency with the business domain. The RuleCNL tool provides editors that assist end-users in the writing process and automatic mappings into the Semantics of Business Vocabulary and Business Rules (SBVR) standard. SBVR is grounded in first order logic and includes constructs called semantic formulations that structure the meaning of rules.

Paul Brillant Feuto Njonko, Sylviane Cardey, Peter Greenfield, Walid El Abed

Toward Verbalizing Ontologies in isiZulu

Abstract

IsiZulu is one of the eleven official languages of South Africa and roughly half the population can speak it. It is the first (home) language for over 10 million people in South Africa. Only a few computational resources exist for isiZulu and its related Nguni languages, yet the imperative for tool development exists. We focus on natural language generation, and the grammar options and preferences in particular, which will inform verbalization of knowledge representation languages and could contribute to machine translation. The verbalization pattern specification shows that the grammar rules are elaborate and there are several options of which one may have preference. We devised verbalization patterns for subsumption, basic disjointness, existential and universal quantification, and conjunction. This was evaluated in a survey among linguists and non-linguists. Some differences between linguists and non-linguists can be observed, with the former much more in agreement, and preferences depend on the overall structure of the sentence, such as singular for subsumption and plural in other cases.

C. Maria Keet, Langa Khumalo

FrameNet CNL: A Knowledge Representation and Information Extraction Language

Abstract

The paper presents a FrameNet-based information extraction and knowledge representation framework, called FrameNet-CNL. The framework is used on natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered as belonging to FrameNet-CNL if the information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that FrameNet-CNL eventually could shape the natural language subset used for writing the newswire articles.

Guntis Barzdins

INAUT, a Controlled Language for the French Coast Pilot Books Instructions nautiques

Abstract

We describe INAUT, a controlled natural language dedicated to the collaborative update of a knowledge base on maritime navigation and to the automatic generation of coast pilot books (Instructions nautiques) of the French National Hydrographic and Oceanographic Service SHOM. INAUT is based on the French language and makes abundant use of georeferenced entities. After describing the structure of the overall system, giving details on the language and on its generation, and discussing the three major applications of INAUT (document production, interaction with ENCs and collaborative updates of the knowledge base), we conclude with future extensions and open problems.

Yannis Haralambous, Julie Sauvage-Vincent, John Puentes

Are Style Guides Controlled Languages? The Case of Koenig & Bauer AG

Abstract

Controlled languages for industrial application are often regarded as a response to the challenges of translation and multilingual communication [3, pp. 52-53], [5, p. 212], [2, pp. i-iii]. This paper presents a quite different approach taken by Koenig & Bauer AG, where the main goal was the improvement of the authoring process for technical documentation. Most importantly, this paper explores the notion of a controlled language and demonstrates how style guides can emerge from non-linguistic considerations. Moreover, it shows the transition from loose language recommendations into precise and prescriptive rules and investigates whether such rules can be regarded as a full-fledged controlled language.

Karolina Suchowolec

Lexpresso: A Controlled Natural Language

Abstract

This paper presents an overview of ‘Lexpresso’, a Controlled Natural Language developed at the Defence Science & Technology Organisation as a bidirectional natural language interface to a high-level information fusion system. The paper describes Lexpresso’s main features including lexical coverage, expressiveness and range of linguistic syntactic and semantic structures. It also touches on its tight integration with a formal semantic formalism and tentatively classifies it against the PENS system.

Adam Saulwick

A CNL for Contract-Oriented Diagrams

Abstract

We present a first step towards a framework for defining and manipulating normative documents or contracts described as Contract-Oriented (C-O) Diagrams. These diagrams provide a visual representation for such texts, giving the possibility to express a signatory’s obligations, permissions and prohibitions, with or without timing constraints, as well as the penalties resulting from the non-fulfilment of a contract. This work presents a CNL for verbalising C-O Diagrams, a web-based tool allowing editing in this CNL, and another for visualising and manipulating the diagrams interactively. We then show how these proof-of-concept tools can be used by applying them to a small example.

John J. Camilleri, Gabriele Paganelli, Gerardo Schneider

Handling Non-compositionality in Multilingual CNLs

Abstract

In this paper, we describe methods for handling multilingual non-compositional constructions in the framework of GF. We specifically look at methods to detect and extract non-compositional phrases from parallel texts and propose methods to handle such constructions in GF grammars. We expect that the methods to handle non-compositional constructions will enrich CNLs by providing more flexibility in the design of controlled languages. We look at two specific use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions and a procedure to identify nominal compounds in German. We evaluate our procedure for multiword expressions by performing a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds in a full SMT pipeline and evaluate the impact of our method on the machine translation process.

Ramona Enache, Inari Listenmaa, Prasanth Kolachina

Controlled Natural Language Generation from a Multilingual FrameNet-Based Grammar

Abstract

This paper presents a currently bilingual but potentially multilingual FrameNet-based grammar library implemented in Grammatical Framework. The contribution of this paper is two-fold. First, it offers a methodological approach to automatically generate the grammar based on semantico-syntactic valence patterns extracted from FrameNet-annotated corpora. Second, it provides a proof of concept for two use cases illustrating how the acquired multilingual grammar can be exploited in different CNL applications in the domains of arts and tourism.

Dana Dannélls, Normunds Gruzitis

Architecture of a Web-Based Predictive Editor for Controlled Natural Language Processing

Abstract

In this paper, we describe the architecture of a web-based predictive text editor being developed for the controlled natural language PENG^ASP. This controlled language can be used to write non-monotonic specifications that have the same expressive power as Answer Set Programs. In order to support the writing process of these specifications, the predictive text editor communicates asynchronously with the controlled natural language processor that generates lookahead categories and additional auxiliary information for the author of a specification text. The text editor can display multiple sets of lookahead categories simultaneously for different possible sentence completions and anaphoric expressions, and it supports the addition of new content words to the lexicon.

Stephen Guy, Rolf Schwitter

Explaining Violation Traces with Finite State Natural Language Generation Models

Abstract

An essential element of any verification technique is identifying and communicating to the user the system behaviour that leads to a deviation from the expected behaviour. Such behaviours are typically made available as long traces of system actions, which would benefit from a natural language explanation of the trace, especially in the context of business logic level specifications. In this paper we present a natural language generation model which can be used to explain such traces. A key idea is that the explanation language is a CNL that is, formally speaking, a regular language susceptible to transformations that can be expressed with finite state machinery. At the same time it admits various forms of abstraction and simplification which contribute to the naturalness of explanations that are communicated to the user.

Gordon J. Pace, Michael Rosner

A Brief State of the Art of CNLs for Ontology Authoring

Abstract

One of the main challenges in building the Semantic Web is ontology authoring. Controlled Natural Languages (CNLs) offer a user-friendly means for non-experts to author ontologies. This paper provides a snapshot of the state of the art for the core CNLs for ontology authoring and reviews their respective evaluations.

Hazem Safwat, Brian Davis

Backmatter

Title: Controlled Natural Language
Editors: Brian Davis, Kaarel Kaljurand, Tobias Kuhn
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-10223-8
Print ISBN: 978-3-319-10222-1
DOI: https://doi.org/10.1007/978-3-319-10223-8
