2010 | Book

Semantic Processing of Legal Texts

Where the Language of Law Meets the Law of Language

Edited by: Enrico Francesconi, Simonetta Montemagni, Wim Peters, Daniela Tiscornia

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Legal Text Processing and Information Extraction

Legal Language and Legal Knowledge Management Applications
Abstract
This work investigates the peculiarities of legal language with respect to ordinary language. Based on the idea that a shallow parsing approach can provide sufficiently detailed linguistic information, it presents the results obtained by shallow parsing (i.e. chunking) corpora of Italian and English legal texts and comparing them with corpora of ordinary language. In particular, the paper emphasises how understanding the syntactic and lexical characteristics of this specialised language is of practical importance for the development of domain-specific Knowledge Management applications.
Giulia Venturi
Named Entity Recognition and Resolution in Legal Text
Abstract
Named entities in text are persons, places, companies, etc. that are explicitly mentioned in text using proper nouns. The process of finding named entities in a text and classifying them by semantic type is called named entity recognition. Resolution of named entities is the process of linking a mention of a name in text to a pre-existing database entry. This grounds the mention in something analogous to a real world entity. For example, a mention of a judge named Mary Smith might be resolved to a database entry for a specific judge of a specific district of a specific state. This recognition and resolution of named entities can be leveraged in a number of ways including providing hypertext links to information stored about a particular judge: their education, who appointed them, their other case opinions, etc.
This paper discusses named entity recognition and resolution in legal documents such as US case law, depositions, and pleadings and other trial documents. The types of entities include judges, attorneys, companies, jurisdictions, and courts.
We outline three methods for named entity recognition: lookup, context rules, and statistical models. We then describe an actual system for finding named entities in legal text and evaluate its accuracy. Similarly, for resolution, we discuss our blocking techniques, our resolution features, and the supervised and semi-supervised machine learning techniques we employ for the final matching.
Christopher Dozier, Ravikumar Kondadadi, Marc Light, Arun Vachher, Sriharsha Veeramachaneni, Ramdev Wudali
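The lookup and context-rule methods named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the gazetteer entries, the `JUDGE_CONTEXT` pattern and the `recognize` function are hypothetical, statistical models and resolution are not shown, and a real system would need far larger gazetteers and rule sets.

```python
import re

# Hypothetical gazetteer for the lookup method. A real system would load
# many entries from an authority file; these two are illustrative only.
GAZETTEER = {
    "Mary Smith": "JUDGE",
    "Supreme Court of Minnesota": "COURT",
}

# Hypothetical context rule: one to three capitalised words after the title
# "Judge" or "Justice" are tagged as a judge, even if absent from the gazetteer.
JUDGE_CONTEXT = re.compile(r"\b(?:Judge|Justice)\s+((?:[A-Z][a-z]+\s?){1,3})")


def recognize(text):
    """Return (start, end, surface, type) tuples sorted by position."""
    entities = []
    # 1. Lookup: exact string matches against the gazetteer.
    for name, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            entities.append((m.start(), m.end(), name, etype))
    # 2. Context rules: add only spans not already covered by a lookup hit.
    for m in JUDGE_CONTEXT.finditer(text):
        name = m.group(1).strip()
        start = m.start(1)
        end = start + len(name)
        overlaps = any(s < end and start < e for s, e, _, _ in entities)
        if not overlaps:
            entities.append((start, end, name, "JUDGE"))
    return sorted(entities)
```

For "Judge Mary Smith denied the motion. Justice Robert Jones dissented.", the lookup finds Mary Smith and the context rule adds Robert Jones, who is not in the gazetteer.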
Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
Abstract
Information extraction from legal documents is an important and open problem. A mixed approach, using linguistic information and machine learning techniques, is described in this paper. In this approach, top-level legal concepts are identified and used for document classification using Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems.
The proposed methodology was applied to, and evaluated on, a corpus of legal documents from the EUR-Lex site. The results obtained were quite good and indicate that this may be a promising approach to the legal information extraction problem.
Paulo Quaresma, Teresa Gonçalves
Approaches to Text Mining Arguments from Legal Cases
Abstract
This paper describes recent approaches using text-mining to automatically profile and extract arguments from legal cases. We outline some of the background context and motivations. We then turn to consider issues related to the construction and composition of corpora of legal cases. We show how a Context-Free Grammar can be used to extract arguments, and how ontologies and Natural Language Processing can identify complex information such as case factors and participant roles. Together the results bring us closer to automatic identification of legal arguments.
Adam Wyner, Raquel Mochales-Palau, Marie-Francine Moens, David Milward
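To make the grammar-based idea concrete, here is a minimal Python sketch assuming a toy context-free grammar in which a text is a sequence of arguments and each argument is one or more premises followed by a conclusion, parsed by recursive descent. The chapter's actual grammar is richer and is not reproduced here; the conclusion markers, the labelling heuristic and the names `label` and `parse_arguments` are hypothetical.

```python
# Hypothetical indicator phrases that mark a sentence as a conclusion.
CONCLUSION_MARKERS = ("therefore", "accordingly", "consequently", "it follows that")


def label(sentence):
    """Map a sentence to a terminal symbol: 'C' (conclusion) or 'P' (premise)."""
    return "C" if sentence.lower().startswith(CONCLUSION_MARKERS) else "P"


def parse_arguments(sentences):
    """Recursive-descent parser for the toy grammar
         Text     -> Argument Text | Argument
         Argument -> Premises C
         Premises -> P Premises | P
    Returns a list of (premises, conclusion) pairs, or raises ValueError."""
    symbols = [label(s) for s in sentences]
    pos = 0

    def argument():
        nonlocal pos
        start = pos
        while pos < len(symbols) and symbols[pos] == "P":  # Premises -> P+
            pos += 1
        if pos == start:
            raise ValueError(f"expected a premise at sentence {pos}")
        if pos == len(symbols):
            raise ValueError("premises without a conclusion")
        pos += 1  # consume the C terminal
        return sentences[start:pos - 1], sentences[pos - 1]

    arguments = [argument()]
    while pos < len(symbols):  # Text -> Argument Text
        arguments.append(argument())
    return arguments
```

A text that ends with premises but no conclusion is rejected, which is the kind of structural check a grammar gives over sentence-by-sentence classification.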

Legal Text Processing and Construction of Knowledge Resources

Automatic Identification of Legal Terms in Czech Law Texts
Abstract
Law texts, including the constitution, acts, public notices and court judgements, form a huge database of texts. As with many texts from narrow domains, their sublanguage is partially restricted and differs from the general language (Czech). As a starting collection of data, we chose the legal database Lexis, which contains approx. 50,000 Czech law documents. Our attention is concentrated mostly on noun groups, which are the main candidates for law terms. We recognized 3,992 distinct noun groups in the selected text samples. The paper also presents the results of morphological analysis, lemmatization, tagging, disambiguation and basic syntactic analysis of Czech law texts, as these tasks are crucial for any further sophisticated natural language processing. The verbs in legal texts have also been explored in a preliminary way. In this respect, we explore how linguistic analysis can help to identify the semantic nature of law terms.
Karel Pala, Pavel Rychlý, Pavel Šmerk
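A common way to collect noun-group term candidates is to match a part-of-speech pattern over tagged text. The sketch below assumes the sentence has already been tagged (here by hand) with single-letter tags, A for adjective, N for noun, V for verb and Z for punctuation, loosely following the first position of Czech positional tags; the pattern `(A* N)+` and the function `noun_groups` are illustrative assumptions rather than the chapter's grammar.

```python
import re


def noun_groups(tagged):
    """Extract maximal noun groups matching (A* N)+ from a tagged sentence.

    `tagged` is a list of (word, tag) pairs whose tags are single letters:
    A = adjective, N = noun; any other tag (V, Z, ...) breaks a group.
    """
    tag_string = "".join(tag for _, tag in tagged)
    groups = []
    # Because every tag is one character, regex offsets equal token indices.
    for m in re.finditer(r"(?:A*N)+", tag_string):
        groups.append(" ".join(word for word, _ in tagged[m.start():m.end()]))
    return groups


# "Ochrana osobních údajů je upravena zvláštním zákonem."
# (roughly: "The protection of personal data is regulated by a special act.")
sentence = [("Ochrana", "N"), ("osobních", "A"), ("údajů", "N"), ("je", "V"),
            ("upravena", "V"), ("zvláštním", "A"), ("zákonem", "N"), (".", "Z")]
print(noun_groups(sentence))
```

On this sentence the sketch returns the two candidates "Ochrana osobních údajů" and "zvláštním zákonem"; a real pipeline would then lemmatize and filter them.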
Integrating a Bottom–Up and Top–Down Methodology for Building Semantic Resources for the Multilingual Legal Domain
Abstract
This article presents a methodology for multilingual legal knowledge acquisition and modelling. It encompasses two complementary strategies. On the one hand, there is the top-down definition of the conceptual structure of the legal domain under consideration on the basis of expert judgment. This structure is language-independent, modelled as an ontology, and can be aligned with other ontologies that capture similar or complementary knowledge, in order to provide a wider conceptual embedding. Another top-down approach is the exploitation of the explicit structure of legal texts, which enables the targeted identification of text spans that play an ontological role and their subsequent inclusion in the knowledge model.
On the other hand, the linguistically motivated, text-based, bottom-up population and incremental refinement of this conceptual structure using (semi-)automatic NLP techniques maximizes the completeness and domain specificity of the resulting knowledge.
The proposed methodology is concerned with the relation between these two differently derived types of knowledge, and defines a framework for interfacing lexical and ontological knowledge, the result of which offers various perspectives on multilingual legal knowledge.
Two case studies combining bottom-up and top-down methodologies for knowledge modelling and learning are presented to illustrate the methodology.
Enrico Francesconi, Simonetta Montemagni, Wim Peters, Daniela Tiscornia
Ontology Based Law Discovery
Abstract
The vast amount of information freely available on the Web constitutes an unparalleled resource for automatic knowledge discovery and learning. In this article we propose a study on Ontology Induction for individual laws based on corpus comparison, exploiting a domain corpus automatically generated from the Web; in particular, we present a case study on the Italian “Legge Bassanini” (Laws 59/1997 and 127/1997, concerning the simplification and decentralization of administrative procedures).
We evaluate how the induced ontological characterizations vary according to factors such as genre (e.g. news vs. social media), the learning algorithm and the granularity of text analysis. The main contribution of the paper is to highlight the structural differences that emerge from the learned predicates and to show how the learning mechanism can provide valuable information on how laws are perceived in different layers of civil society.
Alessio Bosca, Luca Dini
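Corpus comparison can be illustrated with a standard keyness measure. The Python sketch below uses Dunning's log-likelihood (G²) score, which is not necessarily the measure used in the chapter, to rank words that are over-represented in a domain corpus relative to a reference corpus. The function names and the toy Italian token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a domain corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the domain corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def domain_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank words over-represented in the domain corpus by G2 (highest first)."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # keep only words relatively more frequent in the domain
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


domain = "semplificazione amministrativa semplificazione decentramento la la legge".split()
reference = "la la la il il legge calcio".split()
print(domain_terms(domain, reference))
```

Function words such as "la" are filtered out because they are not relatively more frequent in the domain corpus; the surviving terms would be candidates for the induced ontology.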
Multilevel Legal Ontologies
Abstract
In order to manage the conceptual representation of European law, we have proposed the Legal Taxonomy Syllabus (LTS) and the related methodology. In this paper we consider further issues that emerged during the testing and use of the LTS, and how we took them into account in the new release of the system. In particular, we address the problem of representing interpretations of terms in addition to the definitions occurring in the directives, the problem of normative change, and the process of planning legal reforms of European law. We show how to include in the Legal Taxonomy Syllabus the Acquis Principles, which scholars of European Private Law have drawn up on the basis of the so-called Acquis communautaire; how to take the temporal dimension into account in ontologies; and how to apply natural language processing techniques to the legal texts being annotated in the LTS.
Gianmaria Ajani, Guido Boella, Leonardo Lesmo, Marco Martin, Alessandro Mazzei, Daniele P. Radicioni, Piercarlo Rossi
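One simple way to take the temporal dimension into account is to attach a validity interval to each interpretation of a term, so that the same term can map to different concepts at different dates. The Python sketch below assumes half-open intervals; the `TermVersion` class, the `concept_at` function and the example dates and concept identifiers are hypothetical and do not describe the LTS data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TermVersion:
    """One interpretation of a term, valid over [valid_from, valid_to)."""
    term: str
    concept: str               # hypothetical concept identifier
    valid_from: date
    valid_to: Optional[date]   # None means still in force


def concept_at(versions, term, when):
    """Return the concept a term denoted on a given date, or None if none applies."""
    for v in versions:
        if v.term == term and v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v.concept
    return None


# Hypothetical history: a normative change replaces the first interpretation.
versions = [
    TermVersion("consumer", "consumer_v1", date(2000, 1, 1), date(2008, 1, 1)),
    TermVersion("consumer", "consumer_v2", date(2008, 1, 1), None),
]
```

Querying `concept_at(versions, "consumer", date(2005, 6, 1))` yields the older interpretation, while any date from 2008 onwards yields the newer one.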

Legal Text Processing and Semantic Indexing, Summarization and Translation

Semantic Indexing of Legal Documents
Abstract
Automated semantic indexing may be the answer to insufficient recall of legal information systems. The semantic web has created powerful tools for mark-up and ontological representation. Re-use in legal applications remains low due to inappropriate knowledge structuring and lack of automated knowledge acquisition. This paper describes the state of the art and proposes a dynamic electronic legal commentary.
Erich Schweighofer
Automated Classification of Norms in Sources of Law
Abstract
The research described here attempts to achieve automated support for modelling sources of law for legal knowledge based systems and services. Many existing systems use models that do not reflect the entire law, and simplify parts of the text. These models are difficult to validate, maintain and re-use. We propose to create an intermediate model that has an isomorphic representation of the structure of the original text. A first step towards automated modelling is the detection and classification of provisions in sources of law. A list of different categories of norms and provisions that are used in Dutch legal texts is presented. These categories can be identified by the use of typical text patterns. Next, the results of experiments in automated classification of provisions using these patterns are presented. 91% of 592 sentences in fifteen different Dutch laws were classified correctly. Some conclusions about the generality of the approach are drawn and future research is outlined.
Emile de Maat, Radboud Winkels
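Pattern-based classification of provisions can be sketched as an ordered list of regular expressions, where the first matching pattern determines the category. The Dutch phrases and category names below are hypothetical illustrations chosen for this sketch; they are not the chapter's list of categories or patterns.

```python
import re

# Hypothetical patterns for a few provision types in Dutch legislation.
# They are checked in order and the first match wins, so specific
# phrasings come before generic modal verbs such as "moet" or "kan".
PATTERNS = [
    ("citation_title", r"\bwordt aangehaald als\b"),
    ("enactment_date", r"\btreedt in werking\b"),
    ("definition", r"\bwordt verstaan onder\b|\bonder\b.*\bwordt verstaan\b"),
    ("penalisation", r"\bwordt gestraft\b"),
    ("delegation", r"\bbij (?:of krachtens )?algemene maatregel van bestuur\b"),
    ("obligation", r"\bis verplicht\b|\bmoet(?:en)?\b"),
    ("permission", r"\bis bevoegd\b|\bmag\b|\bkan\b"),
]


def classify(sentence):
    """Return the category of the first matching pattern, or 'unknown'."""
    for category, pattern in PATTERNS:
        if re.search(pattern, sentence, re.IGNORECASE):
            return category
    return "unknown"


print(classify("In deze wet wordt verstaan onder Onze Minister: Onze Minister van Justitie."))
```

The ordering is the main design decision: a penalty clause often also contains a modal verb, so more specific patterns must be tried first.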
Efficient Multilabel Classification Algorithms for Large-Scale Problems in the Legal Domain
Abstract
In this paper we apply multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. For this document collection, we studied three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4000 classes. We evaluated three algorithms: (i) the binary relevance approach, which independently trains one classifier per label; (ii) the multiclass multilabel perceptron algorithm, which respects dependencies between the base classifiers; and (iii) the multilabel pairwise perceptron algorithm, which trains one classifier for each pair of labels. All algorithms use the simple but very efficient perceptron algorithm as the underlying classifier, which makes them very suitable for large-scale multilabel classification problems. The main challenge we had to face was that the almost 8,000,000 perceptrons that had to be trained in the pairwise setting could no longer be stored in memory. We solve this problem by resorting to the dual representation of the perceptron, which makes the pairwise approach feasible for problems of this size. The results on the EUR-Lex database confirm the good predictive performance of the pairwise approach and demonstrate the feasibility of this approach for large-scale tasks.
Eneldo Loza Mencía, Johannes Fürnkranz
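The scale problem follows from simple arithmetic: the pairwise approach trains one classifier per unordered pair of labels, n(n-1)/2 in total, so with n = 4000 labels (a round figure for the almost 4000 EUROVOC classes) that is 7,998,000 perceptrons. The Python sketch below illustrates why the dual representation helps: each perceptron stores only the indices of the shared training documents it misclassified, instead of a full weight vector. It is a simplified, hypothetical rendering of pairwise training and voting, not the authors' implementation; in particular it produces a label ranking and omits any thresholding into relevant and irrelevant labels.

```python
from itertools import combinations


def dot(a, b):
    """Dot product of two sparse vectors stored as {feature: value} dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(feature, 0.0) for feature, value in a.items())


class DualPerceptron:
    """Perceptron in dual form: w = sum of y_i * x_i over past mistakes.

    The weight vector is never materialised; each model keeps only the
    indices (and signs) of the shared training documents it erred on.
    """

    def __init__(self, docs):
        self.docs = docs          # shared list, not copied per model
        self.mistakes = []        # (doc_index, +1 or -1)

    def score(self, x):
        return sum(y * dot(self.docs[i], x) for i, y in self.mistakes)

    def update(self, i, y):
        if y * self.score(self.docs[i]) <= 0:
            self.mistakes.append((i, y))


def num_pairwise_models(n_labels):
    """One model per unordered label pair: n(n-1)/2."""
    return n_labels * (n_labels - 1) // 2


def train_pairwise(docs, label_sets, labels, epochs=5):
    models = {pair: DualPerceptron(docs) for pair in combinations(sorted(labels), 2)}
    for _ in range(epochs):
        for i, relevant in enumerate(label_sets):
            for a, b in models:
                # Train only on pairs where exactly one label is relevant;
                # +1 means "a should rank above b".
                if (a in relevant) != (b in relevant):
                    models[(a, b)].update(i, 1 if a in relevant else -1)
    return models


def rank_labels(models, labels, x):
    """Rank labels by the number of pairwise votes they win."""
    votes = {label: 0 for label in labels}
    for (a, b), model in models.items():
        votes[a if model.score(x) > 0 else b] += 1
    return sorted(labels, key=lambda label: (-votes[label], label))


docs = [{"tax": 1.0}, {"tax": 1.0, "court": 1.0}, {"court": 1.0}]
label_sets = [{"TAX"}, {"TAX", "JUSTICE"}, {"JUSTICE"}]
labels = ["JUSTICE", "TAX", "TRADE"]
models = train_pairwise(docs, label_sets, labels)
print(rank_labels(models, labels, {"tax": 1.0}))
```

Memory per model now grows with the number of training mistakes rather than with the vocabulary size, which is what makes millions of pairwise models tractable.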
An Automatic System for Summarization and Information Extraction of Legal Information
Abstract
This paper presents an information system for legal professionals that integrates natural language processing technologies such as text classification and summarization. We describe our experience in using a combination of linguistics-aware transducers and XML technologies for bilingual information extraction from judgements in both French and English within a legal information and summarizing system. We present the context of the work, the main challenges, and how they were tackled by clearly separating language-dependent and domain-dependent terms and vocabularies. Having been developed for the immigration law domain, the system was easily ported to the intellectual property and tax law domains.
Emmanuel Chieze, Atefeh Farzindar, Guy Lapalme
Evaluation Metrics for Consistent Translation of Japanese Legal Sentences
Abstract
We propose new translation evaluation metrics for legal sentences. Most previous metrics proposed for evaluating machine translation systems rely on human reference translations and assume that several correct translations exist for one source sentence. However, readers usually take different translations to denote different meanings, so the existence of several translations of one legal expression may confuse them. Therefore, since translation variety is unacceptable and consistency is crucial in legal translation, we propose two metrics to evaluate the consistency of legal translations and illustrate their performance by comparing them with other metrics.
Yasuhiro Ogawa, Kazuhiro Imai, Katsuhiko Toyama
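The two metrics proposed in the chapter are not reproduced here. As a hypothetical illustration of what consistency means operationally, the Python sketch below takes aligned (source term, target rendering) occurrences and reports the share that use the most frequent rendering of their source term, so 1.0 means every term is always translated the same way. The function name and the example term pairs are assumptions.

```python
from collections import Counter, defaultdict


def translation_consistency(pairs):
    """Share of occurrences that use the majority rendering of their source term.

    `pairs` is an iterable of (source_term, target_rendering) occurrences.
    Returns 1.0 for perfectly consistent output (and for empty input).
    """
    by_source = defaultdict(Counter)
    for source, target in pairs:
        by_source[source][target] += 1
    total = sum(sum(counts.values()) for counts in by_source.values())
    if total == 0:
        return 1.0
    majority = sum(counts.most_common(1)[0][1] for counts in by_source.values())
    return majority / total


occurrences = [
    ("取消し", "rescission"), ("取消し", "rescission"), ("取消し", "cancellation"),
    ("善意", "without knowledge"), ("善意", "without knowledge"),
]
print(translation_consistency(occurrences))
```

Here one of three renderings of 取消し deviates, so four of five occurrences are consistent and the score is 0.8.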
Backmatter
Metadata
Title
Semantic Processing of Legal Texts
Edited by
Enrico Francesconi
Simonetta Montemagni
Wim Peters
Daniela Tiscornia
Copyright year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-12837-0
Print ISBN
978-3-642-12836-3
DOI
https://doi.org/10.1007/978-3-642-12837-0