
Open Access 09.09.2024 | Original Research

LAWSUIT: a LArge expert-Written SUmmarization dataset of ITalian constitutional court verdicts

Authors: Luca Ragazzi, Gianluca Moro, Stefano Guidi, Giacomo Frisoni

Published in: Artificial Intelligence and Law


Abstract

Large-scale public datasets are vital for driving the progress of abstractive summarization, especially in law, where documents have highly specialized jargon. However, the available resources are English-centered, limiting research advancements in other languages. This paper introduces LAWSUIT, a collection of 14K Italian legal verdicts with expert-authored abstractive maxims drawn from the Constitutional Court of the Italian Republic. LAWSUIT presents an arduous task with lengthy source texts and evenly distributed salient content. We offer extensive experiments with sequence-to-sequence and segmentation-based approaches, revealing that the latter achieve better results in full and few-shot settings. We openly release LAWSUIT to foster the development and automation of real-world legal applications.
Notes
All authors contributed equally to this paper.


1 Introduction

Text summarization (Sharma and Sharma 2023) is a persistent pursuit of natural language processing (NLP). Recently, there has been a growing interest in abstractive summarization (AS), which involves paraphrasing the essential details of textual documents in succinct and accessible language (Zhang et al. 2022). This surge in interest is primarily attributed to the availability of large pretrained language models (Lewis et al. 2020; Guo et al. 2022; Moro et al. 2022) and publicly accessible datasets spanning various domains (Cohan et al. 2018; Narayan et al. 2018). One particularly impactful domain in real-world applications is law, where documents often consist of thousands of words filled with jargon and intricate expressions. The complexity of these documents makes their comprehension a time-consuming and labor-intensive process, even for legal experts (Kanapala et al. 2019). Therefore, legal AS (Moro et al. 2023) is a practical, useful, and essential task to promote knowledge acquisition. Unfortunately, current legal summarization corpora are almost entirely devoted to English. There are as yet no Italian datasets for legal AS, which limits how Italian law practitioners can research, access, and work with legal texts and their implications.
To fill this gap, we present the first large-scale Italian legal AS dataset, LAWSUIT,1 consisting of 14,000 source documents with expert-authored summaries (Fig. 1). LAWSUIT allows the community to study the AS of legal verdicts in a critical application setting found in the Constitutional Court of the Italian Republic (CCIR). As the highest court in Italy for constitutional law matters, the CCIR maintains a comprehensive record of legal verdicts, accessible through an open-access data portal (https://dati.cortecostituzionale.it). In particular, highly qualified legal experts meticulously crafted and reviewed each ruling and the accompanying maxims, i.e., the synopses that clarify the events and core decisions. Beyond its potential to expand summarization capabilities, benefiting legal NLP benchmarks and tangible uses, LAWSUIT boasts several key features:
  • The average number of source and target words is significantly higher than in existing Italian summarization datasets (+269% and +589%, respectively) (Casola and Lavelli 2021), encouraging long document AS for Italian.
  • In contrast to existing English legal benchmarks (Kornilova and Eidelman 2019; Huang et al. 2021), the salient content in the input is more uniformly distributed, and summary-worthy words are not concentrated in specific sections of the text. This characteristic poses a unique challenge for summarization tasks, requiring comprehensive processing of the entire source document rather than relying on localized content.
  • Unlike many summarization datasets that undergo automatic construction processes (Cohan et al. 2018; Grusky et al. 2018; Sharma et al. 2019; Huang et al. 2021), our inputs and targets are authored by experts. Specifically, university law professors and magistrates are responsible for drafting the verdicts, and the corresponding maxims are compiled by the supervisory office. The supervisory office oversees the formal control of the texts in collaboration with the study assistants of the President. This meticulous procedure ensures a high level of quality control and supervision, mitigating the risk of model hallucination (Maynez et al. 2020), which refers to the generation of unfaithful outputs due to training on targets that contain facts that are not supported by the source text.
We benchmark LAWSUIT using various extractive and abstractive summarization solutions, including a segmentation-based pipeline that demonstrates superior performance in both full and few-shot summarization scenarios, namely training models with all or just a few dozen instances.
2 Related work

Natural Language Processing for Legal Texts Legal NLP has been the subject of extensive research in various legal tasks, including information retrieval (Chalkidis et al. 2018; Hendrycks et al. 2021; Sansone and Sperlí 2022), question answering (Ravichander et al. 2019; Huang et al. 2020; Kien et al. 2020; Zhong et al. 2020), text classification (Chalkidis et al. 2019; Tuggener et al. 2020; Chalkidis et al. 2021, 2022; Feng et al. 2022), and automatic text summarization (Duan et al. 2019; Zhong et al. 2019; Bhattacharya et al. 2021; Elaraby and Litman 2022; Moro and Ragazzi 2022; Moro et al. 2023). Moreover, recent endeavors have increasingly shifted towards non-English applications (Metsker et al. 2019; Wang et al. 2019; Malik et al. 2021; Xiao et al. 2021; Bakker et al. 2022; Qin et al. 2022; Niklaus et al. 2023), including Italian (Bellandi et al. 2022; Galli et al. 2022; Licari and Comandé 2022; Tagarelli and Simeri 2022), thus stimulating research in low-resource language contexts. These studies focus on retrieving past court decisions and predicting outcomes. To the best of our knowledge, we pioneer the exploration of Italian legal document summarization grounded in a non-common law system. This is achieved by releasing the first large-scale legal abstractive summarization dataset derived from the CCIR.
Legal Document Summarization Previous studies on automatic summarization of court proceedings have mainly relied on extractive approaches, where the predicted summary consists of exact sentences taken directly from the source material. These approaches range from unsupervised methods (Farzindar and Lapalme 2004; Saravanan et al. 2006; Polsley et al. 2016; Zhong et al. 2019) to supervised ones (Liu and Chen 2019). In contrast, our work is centered on AS, where the output is a rewording of the input. Abstraction is more closely aligned with the actual conditions of legal practice (Kornilova and Eidelman 2019; Sharma et al. 2019; Huang et al. 2021; Shen et al. 2022).
Legal Summarization Datasets Given the crucial social role of the legal domain and the growing demand for summarization tools (Jain et al. 2021), numerous datasets have been introduced, covering various types of documents. These include case reports (Greenleaf et al. 1995), judgments (Grover et al. 2004), legislative bills (Kornilova and Eidelman 2019), patents (Sharma et al. 2019), government reports (Huang et al. 2021), and federal civil rights lawsuits (Shen et al. 2022). This diversity has enabled the development of large language models pretrained on legal text (Chalkidis et al. 2020; Zheng et al. 2021). Our dataset presents unique challenges, as it consists of lengthy domain-specialized documents that are inherently difficult to summarize. Challenges arise due to (i) the scattered distribution of summary-worthy information throughout the input and (ii) the occasional presence of formulaic expressions in the targets. Previous works have introduced Italian summarization datasets featuring short documents, such as those in the news (Landro et al. 2022) and articles related to Wikipedia (Ladhak et al. 2020; Casola and Lavelli 2021). Instead, LAWSUIT comprises longer texts (refer to Table 1), establishing itself as the first dataset for the Italian long document summarization task. Notably, the dataset includes gold summaries, diverging from cases where summaries are automatically generated using the first sentence (Ladhak et al. 2020) or by concatenating the title with a description (Landro et al. 2022), a procedure that can compromise the factual consistency of models trained on such data (Maynez et al. 2020). In terms of legal contributions, LAWSUIT establishes the first large-scale Italian legal resource, distinguishing itself from smaller datasets (Aumiller et al. 2022) and those designed exclusively for extractive summarization (Licari and Comandé 2022).
Italian Legal Language Models Since 2017, legal text analysis has been revolutionized by transformer-based architectures. Despite these advancements, accurately training machines to understand legal language remains a significant challenge. Legal language models, often benefiting from specialized pretraining (Chalkidis et al. 2020), currently achieve state-of-the-art results on various benchmarks (Zheng et al. 2021; Chalkidis et al. 2022). However, public generative models pretrained on legal corpora are scarce, forcing reliance on general models instead (Hwang et al. 2022; Shen et al. 2022). An extensive literature review (Katz et al. 2023) shows that English dominates open-source Legal NLP (56%), followed by Chinese (\(\approx \)10%), with models usually requiring extensive training hardware (Song et al. 2023). The main challenge in applying current language models to Italian documents is their limited training on Italian, which hinders their ability to follow instructions in that language.
Some contributions have explored Italian encoder-only models. UmBERTo (110 M) (Parisi et al. 2020) is the result of continual pretraining on top of RoBERTa using whole-word masking with filtered resources from Wikipedia and CommonCrawl. In the legal domain, Licari and Comandé introduced Italian-Legal-BERT (111 M) (Licari and Comandé 2022), which continues the pretraining of a general-domain Italian BERT model on civil law corpora and also pretrains a model from scratch based on CamemBERT (111 M) (Martin et al. 2020), with distilled and long-document variants. However, these works fall outside of our scope, which is instead concerned with generative architectures.
In this sense, Mattei et al. proposed GePpeTto (117 M) (Mattei et al. 2020), a GPT-2 model fine-tuned on Italian Wikipedia and the ItWac corpus (Baroni et al. 2009), mainly aimed at text completion. Sarti and Nissim devised IT5 (60 M, 220 M, 738 M) (Sarti and Nissim 2024), a family of encoder-decoder transformer models pretrained on a cleaned version of the Italian mC4 corpus,2 a web-crawled text collection that includes more than 40 billion words. La Quatra and Cagliero presented BART-IT (Quatra and Cagliero 2023), an Italian version of BART trained on the same data mixture as IT5. Santilli and Rodolà released Camoscio (7B) (Santilli and Rodolà 2023), an instruction-tuned LLaMA model trained with low-rank adaptation (LoRA) on an Italian (ChatGPT-translated) version of Stanford Alpaca (Taori et al. 2023).
Regarding conversational objectives, Bacciu et al. (2023) presented Fauno (7B/13B), a LoRA fine-tuned version of Baize (Xu et al. 2023) on heterogeneous synthetic Italian datasets. LLaMantino (7B, 13B, 70B) (Basile et al. 2023) is a family of Italian-adapted LLaMA-2 models, trained using QLoRA on the IT5 data mixture. Maestrale (7B)3 is a Mistral model specialized in Italian through continual pretraining and instruction fine-tuning. Zefiro (7B)4 is a porting of the Mistral model to the Italian language, obtained through continual pretraining on a random subset of Oscar and Wikipedia data, supervised fine-tuning on UltraChat-ITA (silver translation), and DPO alignment (Rafailov et al. 2023) with the ultrafeedback preference dataset (silver translation). Minerva (350 M, 1B, 3B)5 is a family of large language models pretrained from scratch on 660B tokens (330B in Italian, 330B in English). DanteLLM (Bacciu et al. 2024) is a QLoRA fine-tuned version of Mistral-Instruct (7B), trained on the Italian SQuAD dataset (Croce et al. 2018), 25K sentences from the Europarl dataset (Koehn 2005), Fauno's Quora dataset, and the Camoscio dataset. Notably, as underscored by the current leaderboard dedicated to Italian language modeling available on HuggingFace,6 the results achievable by language models pretrained from scratch on the Italian language are significantly inferior compared to those achievable by foundational models that have undergone extensive pretraining on larger multilingual corpora.
Taking LAWSUIT as a testbed, we fairly compare the effectiveness and efficiency of available Italian-adapted or multi-lingual encoder–decoder models with million-scale parameters, which offer significant advantages in hardware-constrained scenarios. We examine their adaptability to different tasks, languages, and amounts of labeled training data.
Table 1
Comparison of LAWSUIT to other related datasets

| Dataset | Samples | Vocab | Source Words | Source Sents | Target Words | Target Sents | Cov | Den | Comp-W | Comp-S |
|---|---|---|---|---|---|---|---|---|---|---|
| English (Legal) | | | | | | | | | | |
| BillSum (Kornilova and Eidelman 2019) | 23,455 | 70,596 | 1673.25 | 46.49 | 212.96 | 4.99 | 0.90 | 6.86 | 12.21 | 15.61 |
| GovReport (Huang et al. 2021) | 19,463 | 316,428 | 8765.01 | 298.69 | 556.30 | 18.10 | 0.94 | 9.08 | 17.85 | 31.32 |
| Italian | | | | | | | | | | |
| IlPost (Landro et al. 2022) | 44,001 | 128,205 | 199.17 | 5.88 | 30.13 | 1.91 | 0.69 | 1.97 | 7.05 | 3.22 |
| Mlsum-IT (Landro et al. 2022) | 39,997 | 161,942 | 200.89 | 6.07 | 17.55 | 1.06 | 0.63 | 1.80 | 12.73 | 5.88 |
| Fanpage (Landro et al. 2022) | 84,365 | 260,915 | 349.67 | 11.67 | 49.85 | 1.96 | 0.72 | 2.72 | 7.84 | 6.84 |
| WikiLingua-IT (Ladhak et al. 2020) | 50,943 | 136,750 | 412.97 | 21.17 | 43.95 | 4.98 | 0.67 | 1.33 | 11.68 | 4.50 |
| Wits (Casola and Lavelli 2021) | 699,400 | 3,493,303 | 845.27 | 23.93 | 64.42 | 2.44 | 0.55 | 0.99 | 15.92 | 11.42 |
| Italian (Legal) | | | | | | | | | | |
| EUR-Lex-Sum (Aumiller et al. 2022) | 1403 | 206,791 | 17,590.71 | 421.10 | 1106.05 | 30.96 | 0.85 | 6.40 | 16.24 | 13.93 |
| ITA-CaseHold (Licari and Comandé 2022) | 1101 | 58,411 | 4687.86 | 152.10 | 799.78 | 22.77 | 0.96 | 113.43 | 7.18 | 8.38 |
| LAWSUIT (Ours) | 14,000 | 112,876 | 3120.49 | 98.26 | 444.56 | 17.64 | 0.92 | 14.07 | 8.19 | 6.92 |

Measurements include dataset and vocabulary size, number of words and sentences in the source and target texts, and source \(\rightarrow \) target coverage (Cov), density (Den), and compression ratio of words (Comp-W) and sentences (Comp-S). Except for the number of samples, all reported values are averaged across all instances

3 LAWSUIT

LAWSUIT is a large-scale Italian AS dataset that collects CCIR-sourced legal verdicts, serving as a new and demanding benchmark for the NLP community. The corpus comprises 14,000 long texts from 1956 to 2022, classified into orders and judgments (see Fig. 2 for statistics based on the year), each meticulously paired with a set of maxims (concatenated to form the target summary). The term order denotes a legal ruling declared during the judicial proceeding to settle questions and disputes verified during the trial, while judgment refers to a legal ruling declared by the judicial body at the end of the trial. The maxims summarize the judicial process by encapsulating key details about the ruling, general legal characteristics, references, and the final provision. Each source consists of three informative sections (Fig. 1):
  • Epigraph: the introduction detailing the main gist of the ruling and the context in which the request is addressed.
  • Text: the core content that highlights the legal extremes.
  • Decision: the concluding segment of the ruling that contains the final provisions of the Court.
Additional details on data access are provided in Appendix A.

3.1 LAWSUIT Processing

To construct the LAWSUIT dataset, we started with 21,331 instances obtained from the CCIR open data, discarding verdicts that were too recent and lacked an associated maxim.
Size Filtering We retained records with summary lengths between 100 and 2000 words and source texts between 1000 and 20,000 words, resulting in 14,072 instances. This step aimed to remove unbalanced texts (i.e., outliers) that do not reflect the typical characteristics of these legal documents. Specifically, excessively short texts often lack sufficient context, while very long texts can introduce redundancy. Therefore, retaining only texts within the specified length ranges ensures that the model is exposed to a more homogeneous and representative sample of the data, leading to better generalization and performance.
Duplicate Data Removal To identify and eliminate duplicate instances, we employed an approach similar to Kornilova and Eidelman (2019), resulting in 14,054 instances. Technically, the process involved (i) removing stop words and the 30 most common terms (e.g., article, law, court), (ii) vectorizing texts using scikit-learn’s CountVectorizer, (iii) computing average cosine similarity between the texts and the summaries for each pair of verdicts, and (iv) iteratively adding verdicts while discarding instances highly similar (>96%) to any verdicts already included. Duplicates were often orders on related subjects pronounced in close time frames or written as corrections to previous orders with drafting errors; we kept the most recent version of the document.
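A minimal sketch of this deduplication loop, assuming the stated 96% threshold; the stop-word list (augmented with the 30 most common terms) is an assumed input, and names are illustrative rather than the released code:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def deduplicate(verdicts, summaries, stop_words, threshold=0.96):
    """Return indices of verdicts to keep, dropping near-duplicates."""
    # (i)-(ii) stop words (plus the 30 most common terms) are removed at
    # vectorization time; simple term counts serve as the representation
    vectorizer = CountVectorizer(stop_words=stop_words)
    src_vecs = vectorizer.fit_transform(verdicts)
    tgt_vecs = vectorizer.transform(summaries)
    kept = []
    for i in range(len(verdicts)):
        duplicate = False
        for j in kept:
            # (iii) average cosine similarity of texts and summaries
            sim = 0.5 * (cosine_similarity(src_vecs[i], src_vecs[j])[0, 0]
                         + cosine_similarity(tgt_vecs[i], tgt_vecs[j])[0, 0])
            if sim > threshold:  # (iv) discard highly similar instances
                duplicate = True
                break
        if not duplicate:
            kept.append(i)
    return kept
```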
Compression Ratio Filtering The compression ratio quantifies how much a document is condensed to produce its summary. This metric is defined as the ratio between the number of words in the input and its corresponding target (Grusky et al. 2018). As with the size filtering step, we aimed to create a high-quality homogeneous dataset without outliers. Thus, due to the considerable variation in the sizes of both sources and summaries, we retained verdicts with a compression ratio between 2 and 70, ending up with 14,000 instances.
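Together, the size and compression-ratio filters reduce to a simple predicate; a sketch assuming whitespace word counts:

```python
def keep_instance(source: str, target: str) -> bool:
    """Apply the size and compression-ratio filters described above."""
    n_src, n_tgt = len(source.split()), len(target.split())
    if not 1000 <= n_src <= 20000:   # source length bounds
        return False
    if not 100 <= n_tgt <= 2000:     # summary length bounds
        return False
    # compression ratio (Grusky et al. 2018): source words / target words
    return 2 <= n_src / n_tgt <= 70
```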
Quality Control and Text Cleaning Besides traditional operations (e.g., extra spaces and newline chars disposal), we implemented a preprocessing pipeline aimed at ensuring the textual quality of the dataset. The steps involved were as follows:
  • Removal of epigraph and decision prefixes containing personal names, such as those of the president, editors, and directors;
  • Explicit delineation of the epigraph and the main text, since several instances lacked a clear structural separation between these sections;
  • Elimination of duplicated notes found at the end of maxims, which were deemed irrelevant due to versioning management;
  • Replacement of apostrophized vowels with properly accented characters by applying UTF-8 encoding;
  • Removal of publisher, judge, and reviewer information at the end of the decision;
  • Deletion of backslashes in ruling codes to address encoding errors present in the original JSON files.
On the other hand, certain elements, recognized for their high frequency and factuality role, were intentionally retained: (i) cf., bibliographic citations pointing to external references; (ii) artt., legal jargon signifying the citation of multiple articles; (iii) personal names, except for publisher, judge, and reviewer.
Train-test Split Following prior work (Cripwell et al. 2023), we employ a 90-5-5 dataset split to ensure sufficient training data while allowing for adequate validation and testing. Therefore, the dataset was divided into train (90%, 12,600 samples), validation (5%, 700), and test (5%, 700) sets. We carried out proportional stratified random sampling without replacement, considering the categorization and lengths of the sources. To be precise, we (i) evenly distributed the orders and judgments across the splits to have the same percentage of each type in each split and (ii) divided them equally based on their lengths (tertiles are calculated to assign \(\{\text {short}, \text {medium}, \text {long}\}\) classes); a sketch of this procedure is shown below. Table 2 shows the equal distribution of documents among the splits, specifying fine-grained statistics about the number of words within the three source sections (i.e., epigraph, text, and decision).
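A sketch of the stratified split with pandas and scikit-learn; the column names, the metadata frame, and the seed are illustrative assumptions, not the released pipeline:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# df is assumed to hold one row per verdict with its type (order/judgment)
# and source word count, e.g., df = pd.DataFrame({"type": ..., "n_words": ...})
df["len_class"] = pd.qcut(df["n_words"], q=3,
                          labels=["short", "medium", "long"])  # tertiles
strata = df["type"].astype(str) + "_" + df["len_class"].astype(str)

# 90-5-5: carve out 10%, then halve it into validation and test
train_df, rest_df = train_test_split(df, test_size=0.10,
                                     stratify=strata, random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.50,
                                   stratify=strata.loc[rest_df.index],
                                   random_state=42)
```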
Table 2
LAWSUIT's train-test splits

| Split | Statistic | Mean | Min | Max | 25th | 50th | 75th |
|---|---|---|---|---|---|---|---|
| Train (90%, 12,600) | Source words | 3116.03 | 932 | 19,531 | 1562.00 | 2349.50 | 3864.00 |
| | Target words | 449.54 | 94 | 1995 | 221.00 | 340.00 | 560.00 |
| | Source sents | 93.18 | 6 | 998 | 41.00 | 68.00 | 118.00 |
| | Target sents | 18.10 | 1 | 137 | 9.00 | 14.00 | 23.00 |
| | Epigraph words | 164 | 56 | 4953 | 116 | 140 | 179 |
| | Text words | 2827 | 212 | 18,876 | 1319 | 2089 | 3542 |
| | Decision words | 128 | 27 | 1640 | 80 | 105 | 151 |
| Validation (5%, 700) | Source words | 3180.53 | 938 | 18,671 | 1572.00 | 2390.50 | 3762.25 |
| | Target words | 479.30 | 103 | 1984 | 236.00 | 368.50 | 608.00 |
| | Source sents | 94.01 | 10 | 666 | 42.00 | 67.00 | 114.50 |
| | Target sents | 19.47 | 1 | 110 | 9.00 | 15.00 | 24.00 |
| | Epigraph words | 172 | 69 | 3803 | 115 | 144 | 185 |
| | Text words | 2873 | 174 | 17,549 | 1324 | 2129 | 3445 |
| | Decision words | 138 | 27 | 1201 | 84 | 108 | 158 |
| Test (5%, 700) | Source words | 3124.34 | 937 | 18,085 | 1589.75 | 2398.50 | 3869.50 |
| | Target words | 447.11 | 101 | 1865 | 214.00 | 335.50 | 586.25 |
| | Source sents | 94.25 | 10 | 810 | 41.00 | 72.00 | 117.00 |
| | Target sents | 18.49 | 1 | 116 | 8.00 | 14.00 | 24.00 |
| | Epigraph words | 164 | 70 | 2093 | 114 | 138 | 179 |
| | Text words | 2830 | 572 | 17,121 | 1358 | 2130 | 3577 |
| | Decision words | 132 | 28 | 713 | 82 | 109 | 155 |

The results indicate the equal distribution of samples across splits

3.2 Dataset characterization

Table 1 offers a comparative analysis of key statistics between LAWSUIT and other relevant text summarization datasets. Concretely, we present corpus sizes and the average number of words and sentences in both source documents and target summaries, calculated using the NLTK library (Bird 2006). Additionally, we furnish information on the average coverage, density, and compression ratio of extractive fragments in terms of words and sentences, as defined by Grusky et al. (2018). In particular, LAWSUIT exhibits longer source texts and target summaries than existing datasets, except for GovReport, where the targets contain more source-related tokens, indicating greater coverage. Moreover, we observe a slightly smaller vocabulary relative to corpora with a higher number of documents, suggesting that while our dataset is more concise, it still captures the essential linguistic diversity, maintaining a robust and representative vocabulary distribution. In terms of legal contributions, it is noteworthy that LAWSUIT represents the inaugural dataset composed exclusively of Italian documents, distinguishing it from multilingual datasets that include only limited subsets of Italian texts.
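The coverage and density columns follow the greedy extractive-fragment matching of Grusky et al. (2018); a compact re-implementation sketch over word lists (our paraphrase of the published algorithm, not the authors' code):

```python
def extractive_fragments(src, tgt):
    """Greedily match the longest shared fragment at each target position."""
    frags, i = [], 0
    while i < len(tgt):
        best = []
        for j in range(len(src)):
            if src[j] == tgt[i]:
                k = 0
                while (i + k < len(tgt) and j + k < len(src)
                       and src[j + k] == tgt[i + k]):
                    k += 1
                if k > len(best):
                    best = tgt[i:i + k]
        i += max(len(best), 1)  # skip the fragment, or move one token on
        if best:
            frags.append(best)
    return frags

def coverage_density(src, tgt):
    f = extractive_fragments(src, tgt)
    coverage = sum(len(x) for x in f) / len(tgt)      # share of copied words
    density = sum(len(x) ** 2 for x in f) / len(tgt)  # avg fragment length
    return coverage, density
```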
Summary Abstractiveness Compared to previous contributions, we observe that the summaries within LAWSUIT exhibit substantial coverage (0.92).7 This implies that the target generations contain fewer unsupported entities and facts, ensuring faithfulness while mitigating the risk of hallucinations, an imperative consideration in legal applications. Simultaneously, we note that the density, which represents the average length of the extractive fragments, is among the highest of the datasets, suggesting that the summaries in LAWSUIT might have an extractive nature. To test this assumption, following established methodologies (See et al. 2017; Chen and Bansal 2018; Sharma et al. 2019), we compute abstractiveness as the fraction of novel n-grams in the summary that do not appear in the input source. Figure 3 illustrates the percentage of novel sentences and n-grams with \(n \in [1,10]\), indicating that many summary details are not verbatim extractions from sources but rather abstractive, despite the high density.
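A minimal sketch of the abstractiveness measure over pre-tokenized word lists, following the novel n-gram definition above:

```python
def novel_ngram_ratio(src_tokens, tgt_tokens, n):
    """Fraction of target n-grams that never appear in the source."""
    grams = lambda toks: {tuple(toks[i:i + n])
                          for i in range(len(toks) - n + 1)}
    tgt_ngrams = grams(tgt_tokens)
    if not tgt_ngrams:
        return 0.0
    return len(tgt_ngrams - grams(src_tokens)) / len(tgt_ngrams)
```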
Coverage Increment and Section Informativeness Given the obstacles presented by lengthy sources in identifying salient content for inclusion in the summary, we examine the coverage increment of summary-worthy unigrams in the input. To achieve this, we divide each source into ten equal partitions, following Huang et al. (2021). Specifically, we count the number of unique unigrams that also appear in the target, accumulated from the document's start to the end of each partition. Figure 4 illustrates that relevant information is spread throughout documents, with novel salient unigrams being covered more uniformly as more content is consumed. This means that LAWSUIT exhibits less positional bias and requires a comprehensive reading of the entire input. To further elucidate this aspect, we break down the informativeness across the three sections (i.e., epigraph, text, decision) by computing the percentage of unique salient unigrams occurring in each text span. Figure 5 demonstrates that the core content of a summary is generally concentrated in the text section of the ruling to which it refers. However, through a deeper qualitative investigation (Appendix E), we discover that the epigraph and the decision are essential at both ends of the generation, where the maxim is likely to mention references and final court judgments, briefly rephrasing and aggregating them.
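A sketch of the coverage-increment computation, reflecting our reading of the ten-partition procedure adapted from Huang et al. (2021):

```python
def coverage_increment(src_tokens, tgt_tokens, parts=10):
    """Cumulative share of salient (target) unigrams seen per source tenth."""
    salient = set(tgt_tokens)
    step = max(1, len(src_tokens) // parts)
    seen, curve = set(), []
    for p in range(parts):
        end = len(src_tokens) if p == parts - 1 else (p + 1) * step
        seen |= set(src_tokens[p * step:end]) & salient
        curve.append(len(seen) / max(1, len(salient)))
    return curve  # non-decreasing values in [0, 1]
```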
Summary Formulaicness Legal summaries often incorporate common expressions and shared standard structures, enabling models to learn patterns during training without a deep understanding of the input. To quantify this phenomenon, we analyze the formulaicness of summaries in the training set by calculating the longest common subsequence (LCS) (Lin 2004). Technically, we take 5 non-overlapping subsets of 100 random samples and compute the LCS within each subset.8 Figure 6 highlights that summaries in LAWSUIT have a lower occurrence of structural patterns across targets than related English legal datasets, especially BillSum, despite the latter having shorter summaries. In fact, the longer the summaries, the higher the chance that words overlap.
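Since the exact pairing scheme within each subset is not spelled out above, the sketch below assumes averaged pairwise word-level LCS, one plausible reading; the dynamic program is quadratic, so it is meant for small subsets only:

```python
from itertools import combinations

def lcs_len(a, b):
    """Length of the longest common (word) subsequence of a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

def avg_pairwise_lcs(summaries):
    """Average LCS over all summary pairs in one subset."""
    pairs = list(combinations([s.split() for s in summaries], 2))
    return sum(lcs_len(a, b) for a, b in pairs) / len(pairs)
```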

4 Experiments

Our goal with LAWSUIT is to establish a novel and challenging benchmark to advance Legal NLP in real-world applications. Therefore, our experiments with LAWSUIT delve into two research questions.
  • RQ1: can current models effectively summarize Italian legal verdicts to support legal practitioners and automate downstream applications?
  • RQ2: given the high cost of human annotation for creating labeled examples, can models be configured to produce useful summaries in real-world scenarios with only a handful of training instances?
To answer them, we set up the following tasks.
  • Full summarization: this involves training models with the entire set of available instances in LAWSUIT, totaling 12,600 samples.
  • Few-shot summarization: this simulates a scenario marked by data scarcity for model supervision due to the high cost of labeling. To replicate this setting, models are provided with only the first 10 and 100 training samples, aligning with previous works (Zhang et al. 2020; Chen and Shuai 2021; Moro and Ragazzi 2022).9

4.1 Models

We investigate the performance of multiple extractive and abstractive solutions on LAWSUIT.
Extractive Baselines For upper-bound performance, we consider an oracle: Oracle-Opt selects, for each of the k gold summary sentences—extracted with the NLTK library—the input sentence that maximizes the average ROUGE-{1,2,L} F1 score. LexRank-PLM is a graph-based unsupervised extractive summarizer that leverages LexRank's eigenvector centrality (Erkan and Radev 2004) and a pretrained language model (paraphrase-multilingual-MiniLM-L12-v2) to enhance sentence representation during text encoding. Epi, Text, and Dec select the first n sentences from the epigraph, text, and decision, respectively. Cat concatenates the first \(\nicefrac {n}{3}\) sentences from the three sections, maintaining the occurrence order in the source document.10
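A sketch of the Oracle-Opt selection; the rouge-score package stands in for whichever ROUGE implementation was actually used:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])

def oracle_opt(src_sentences, gold_sentences):
    """Pick, per gold sentence, the source sentence with best mean ROUGE F1."""
    picked = []
    for gold in gold_sentences:
        def mean_f1(candidate):
            scores = scorer.score(gold, candidate)
            return sum(s.fmeasure for s in scores.values()) / len(scores)
        picked.append(max(src_sentences, key=mean_f1))
    return " ".join(picked)
```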
Table 3
Performance of small (s), base (b), and large (l) models on LAWSUIT with 10, 100, and full (12,600) training instances. Each cell reports R-1\(_{f1}\) / R-2\(_{f1}\) / R-L\(_{f1}\) / BS\(_{f1}\); BS is not computed for the extractive baselines

| Models | Input | LAWSUIT (10) | LAWSUIT (100) | LAWSUIT (12,600 - Full) |
|---|---|---|---|---|
| Extractive | | | | |
| Oracle-Opt | - | 63.03 / 46.06 / 60.07 | 63.03 / 46.06 / 60.07 | 63.03 / 46.06 / 60.07 |
| Epi | - | 26.48 / 10.21 / 24.02 | 26.48 / 10.21 / 24.02 | 26.48 / 10.21 / 24.02 |
| Text | - | 39.23 / 18.37 / 36.00 | 39.23 / 18.37 / 36.00 | 39.23 / 18.37 / 36.00 |
| Dec | - | 30.98 / 18.55 / 29.50 | 30.98 / 18.55 / 29.50 | 30.98 / 18.55 / 29.50 |
| Cat | - | 38.45 / 18.36 / 35.64 | 38.45 / 18.36 / 35.64 | 38.45 / 18.36 / 35.64 |
| LexRank-PLM | - | 38.90 / 21.31 / 36.37 | 38.90 / 21.31 / 36.37 | 38.90 / 21.31 / 36.37 |
| Abstractive | | | | |
| mBart-l | 512 | 34.54 / 13.21 / 31.17 / 19.44 | 37.36 / 17.20 / 34.43 / 22.90 | 41.16 / 21.06 / 38.20 / 29.41 |
| IT5-s | 512 | 31.71 / 13.13 / 29.43 / 14.41 | 34.10 / 16.03 / 31.67 / 19.52 | 39.07 / 19.81 / 36.38 / 31.16 |
| IT5-b | 512 | 23.51 / 11.35 / 21.74 / 8.12 | 24.32 / 11.39 / 22.67 / 8.53 | 33.81 / 15.20 / 31.75 / 19.94 |
| IT5-l | 512 | 20.96 / 7.53 / 19.39 / 0.61 | 24.34 / 8.28 / 22.94 / 1.48 | 33.71 / 14.96 / 31.63 / 21.69 |
| mBart-l | 1024 | 34.41 / 12.53 / 30.83 / 19.55 | 37.74 / 17.59 / 34.83 / 23.00 | 41.84 / 21.52 / 38.52 / 33.85 |
| IT5-s | 1024 | 33.98 / 13.67 / 31.00 / 17.88 | 36.16 / 16.89 / 33.15 / 21.18 | 42.14 / 21.74 / 38.78 / 32.68 |
| IT5-b | 1024 | 24.64 / 12.06 / 22.85 / 9.67 | 27.01 / 12.38 / 25.14 / 9.11 | 37.56 / 19.41 / 34.50 / 25.33 |
| IT5-l | 1024 | 22.69 / 8.76 / 21.27 / 2.20 | 26.62 / 8.86 / 24.93 / 3.62 | 37.04 / 18.54 / 34.54 / 23.98 |
| IT5-s | 2048 | 34.94 / 13.80 / 31.43 / 19.17 | 38.48 / 18.58 / 34.71 / 24.30 | 47.64 / 27.52 / 44.13 / 36.97 |
| IT5-b | 2048 | 25.90 / 12.57 / 24.10 / 10.94 | 28.86 / 12.25 / 26.62 / 9.26 | 41.46 / 22.41 / 38.24 / 27.91 |
| IT5-l | 2048 | 25.00 / 8.88 / 23.39 / 3.43 | 29.00 / 10.40 / 27.05 / 8.65 | 39.94 / 23.00 / 37.08 / 28.45 |
| IT5-s | 4096 | 35.66 / 14.91 / 32.34 / 20.68 | 40.87 / 21.29 / 37.10 / 27.99 | 50.94 / 31.87 / 47.59 / 38.69 |
| IT5-b | 4096 | 26.76 / 12.90 / 24.86 / 10.71 | 31.34 / 13.62 / 28.83 / 12.63 | 44.91 / 27.72 / 41.67 / 32.51 |
| IT5-l | 4096 | 24.56 / 8.66 / 23.21 / 1.29 | 31.40 / 12.11 / 29.02 / 11.66 | 43.97 / 27.20 / 40.97 / 31.63 |
| IT5-s | 8192 | 35.82 / 15.03 / 32.59 / 21.01 | 41.63 / 22.19 / 37.94 / 27.08 | 51.58 / 33.77 / 48.50 / 15.99 |
| mT5-b | 8192 | 14.70 / 1.22 / 14.26 / 10.45 | 33.78 / 16.53 / 30.47 / 21.67 | 46.44 / 30.19 / 43.42 / 13.62 |
| SegSumm | | | | |
| mBart-l | 512 | 46.86 / 24.19 / 43.00 / 30.13 | 46.95 / 24.45 / 43.08 / 29.85 | 48.01 / 26.89 / 45.50 / 32.00 |
| IT5-s | 512 | 46.86 / 24.48 / 43.04 / 28.14 | 47.97 / 26.28 / 44.24 / 30.86 | 50.91 / 30.73 / 47.50 / 34.46 |
| mBart-l | 1024 | 45.83 / 21.32 / 41.54 / 29.30 | 47.37 / 23.51 / 43.40 / 31.32 | 48.51 / 27.17 / 47.50 / 32.50 |
| IT5-s | 1024 | 46.59 / 24.17 / 42.47 / 28.72 | 48.41 / 26.42 / 44.31 / 31.79 | 52.56 / 32.84 / 49.27 / 14.72 |
| IT5-s | 2048 | 40.75 / 19.18 / 36.89 / 24.50 | 46.49 / 25.18 / 42.41 / 30.24 | 53.90 / 34.72 / 50.75 / 38.65 |
| IT5-s | 4096 | 37.87 / 16.75 / 34.43 / 22.40 | 44.34 / 24.07 / 40.52 / 29.47 | 53.97 / 34.97 / 50.80 / 38.79 |
| IT5-s | 8192 | 36.21 / 15.28 / 32.94 / 21.25 | 42.25 / 22.77 / 38.69 / 28.61 | 52.61 / 34.41 / 49.51 / 38.48 |

The best overall scores are in bold. Moreover, for each model (i.e., IT5 and mBart), the configuration achieving the best average scores is highlighted in bold
Abstractive Baselines mBART (Liu et al. 2020; Tang et al. 2020) is a sequence-to-sequence model pretrained at scale on multiple languages using Bart's denoising objective (Lewis et al. 2020); it can process inputs up to 1024 tokens.11 IT5 (Sarti and Nissim 2022) is a text-to-text model based on T5 (Raffel et al. 2020) and pretrained on Italian corpora; it is unbounded in the input dimension thanks to its relative positional embedding mechanism. mT5 (Xue et al. 2021) is a T5-based model pretrained on multiple languages. We employ the small (s), base (b), and large (l) model checkpoints (see Table 8 for technical details).
Segmentation-based Pipeline Inspired by the necessity of (i) comprehensively processing the entire input source without overlooking details, (ii) minimizing the risk of model hallucination through careful consideration of small, highly correlated source–target pairs, and (iii) generating precise summaries in scenarios with limited data availability, we introduce a straightforward yet powerful language-agnostic segmentation-based approach. Let \(\mathcal {D}=\{d_1, \dots , d_n\}\) be the long input document, where each \(d_i\) is a sentence; this solution divides \(\mathcal {D}\) into non-overlapping chunks (i.e., sets of consecutive sentences), each containing a maximum of \(\mathcal {M}\) tokens. Specifically, we start with an empty chunk c and iteratively add sentences until the \(\mathcal {M}\)-token limit is reached. To train our solution, we assign each summary sentence—selected with NLTK—to the chunk that maximizes the ROUGE-1 precision metric—creating small, highly correlated training pairs (\(c_i, t_i\))—as defined by Moro and Ragazzi (2022). At inference time, the chunks are summarized, and their predictions are concatenated in the order of occurrence in the source document to produce the final summary. We refer to this approach as SegSumm, depicted in Fig. 7.
This pipeline is related to but differs from Moro and Ragazzi (2022) because the segmentation is model-agnostic—and thus language-agnostic—making it applicable to multiple languages, including Italian (see Sect. 4.4.2 for experiments on English legal texts).
Note: when small values of \(\mathcal {M}\) are used, the document is divided into many chunks. The above target-matching algorithm does not guarantee \(t_i \ne \emptyset \): if the number of chunks exceeds the number of summary sentences, some chunks receive no target sentence and are discarded during training. In practice, however, the summaries in LAWSUIT contain, on average, more sentences (see Table 1) than the hypothetical number of source chunks.
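A minimal sketch of SegSumm's chunking and target-assignment steps. NLTK sentence splitting and the rouge-score package are stand-ins, and reading the ROUGE-1 precision as the fraction of the summary sentence's unigrams covered by the chunk is our interpretation:

```python
from nltk import sent_tokenize, word_tokenize
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"])

def chunk(document: str, max_tokens: int):
    """Greedily pack consecutive sentences into chunks of <= max_tokens."""
    chunks, current, size = [], [], 0
    for sent in sent_tokenize(document):
        n = len(word_tokenize(sent))
        if current and size + n > max_tokens:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(sent)
        size += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def make_training_pairs(document: str, summary: str, max_tokens: int):
    """Assign each summary sentence to its best-matching chunk."""
    chunks = chunk(document, max_tokens)
    targets = [[] for _ in chunks]
    for t in sent_tokenize(summary):
        # fraction of t's unigrams covered by each chunk (ROUGE-1 precision)
        best = max(range(len(chunks)),
                   key=lambda i: scorer.score(chunks[i], t)["rouge1"].precision)
        targets[best].append(t)
    # chunks without any assigned target sentence are dropped
    return [(c, " ".join(ts)) for c, ts in zip(chunks, targets) if ts]
```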

4.2 Implementation and hardware

For abstractive summarizers, we fine-tune the models using the PyTorch (Paszke et al. 2019) implementation from the HuggingFace library (Wolf et al. 2019), leveraging publicly available checkpoints. The models are trained for 3 epochs on a single NVIDIA GeForce RTX 3090 GPU (24GB VRAM) from an internal cluster, with a learning rate of 5e-5. In the decoding process, we apply beam search with 4 beams and n-gram repetition blocks for n > 5, using 1024 as the maximum summary length. The seed is fixed at 42 for reproducibility. Additional details are available in Appendix B.
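For illustration, this decoding setup maps onto HuggingFace generation roughly as follows; the public gsarti/it5-small checkpoint stands in for our fine-tuned weights, and rendering "n-gram repetition blocks for n > 5" as no_repeat_ngram_size=5 is our assumption:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gsarti/it5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")

source_text = "..."  # an Italian verdict from LAWSUIT
inputs = tok(source_text, truncation=True, max_length=8192,
             return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, num_beams=4,
                         no_repeat_ngram_size=5,  # block repeated n-grams
                         max_length=1024)         # maximum summary length
print(tok.decode(out[0], skip_special_tokens=True))
```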

4.3 Evaluation setup

To provide a comprehensive evaluation, we conduct both quantitative and qualitative analyses, with automatic metrics and human annotators, respectively.
Automatic ROUGE-{1,2,L} F1 (Lin 2004) and BERTScore F1 (BS) (Zhang et al. 2020) are used to calculate the lexical overlap and estimated semantic overlap between the generated and the gold summaries, respectively. For BS, we use the bert-base-multilingual-cased model and set rescale_with_baseline=True.
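A sketch of this computation with the evaluate and bert-score packages; preds and golds are assumed lists of generated and reference summaries, and we assume bert-score ships an Italian rescaling baseline for this model:

```python
import evaluate
from bert_score import score as bert_score

preds, golds = ["..."], ["..."]  # generated and gold summaries

rouge = evaluate.load("rouge")
r = rouge.compute(predictions=preds, references=golds)

P, R, F1 = bert_score(preds, golds, lang="it",
                      model_type="bert-base-multilingual-cased",
                      rescale_with_baseline=True)
print(r["rouge1"], r["rouge2"], r["rougeL"], F1.mean().item())
```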
Human Given the potential failure of automatic metrics to act as reliable proxies for summary quality dimensions, we perform an in-depth human evaluation. Following previous work (Narayan et al. 2018; Fabbri et al. 2019; Moro et al. 2023), we use Best-Worst Scaling (Louviere and Woodworth 1991; Louviere et al. 2015), which is more trustworthy and less expensive than rating scales (Kiritchenko and Mohammad 2017). Specifically, we provide 3 legal expert evaluators with the source document and the artificial summaries from the best-performing models. We ask them to rank predictions according to informativeness, fluency, and factuality. The assessment is done on 30 randomly selected documents from LAWSUIT's test set by comparing all the possible summary pair combinations, i.e., 90 binary preference annotations per participant. We randomize the order of pairs and per-example sources to prevent the ratings from being gamed. Elicited labels are used to establish a score in \([-1,1]\) for each summary source s: \(\%_{best}(s) - \%_{worst}(s)\). The annotation process takes \(\approx \)6 h per judge, 18 in total. Appendix D illustrates our setup.
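The per-system score can be derived from the binary preferences as sketched below (a hypothetical helper, not the annotation tooling we used):

```python
from collections import Counter

def bws_scores(preferences):
    """preferences: (winner, loser) system pairs from binary comparisons.

    Returns %best(s) - %worst(s) in [-1, 1] for each system s, where the
    percentages are relative to the comparisons that system appears in.
    """
    best, appearances = Counter(), Counter()
    for winner, loser in preferences:
        best[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {s: (2 * best[s] - appearances[s]) / appearances[s]
            for s in appearances}
```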

4.4 Results and discussion

4.4.1 Dataset

Italian Legal Ruling Summarization Table 3 presents the performance of each baseline on LAWSUIT, where the summarizers are tasked with extracting and synthesizing crucial information from lengthy sources with varying numbers of training samples. Table 5 presents the transfer learning performance of IT5-small when trained and tested across orders and judgments. In terms of abstractive summarizers, models that allow long inputs (IT5) perform better than input-constrained models (mBart) on all tasks, underscoring the utility of an extensive input context. Longer input also brings consistent performance gains for IT5 across tasks. Interestingly, SegSumm significantly exceeds the baselines (p-value \(< 0.05\) with Student's t-test) in full and few-shot summarization. Human evaluation results are reported in Table 6. SegSumm is rated the best in all dimensions. These findings demonstrate that existing language models can effectively support Italian legal summarization, particularly when equipped with segmentation capabilities (RQ1). Indeed, text segmentation allows the model to process the entire document without truncating information that exceeds the maximum input size permitted by its architecture. In plausible few-shot scenarios, SegSumm emerges as the sole model offering satisfactory effectiveness (RQ2). We provide examples of summaries generated with few-shot and full training in Appendix G.
Table 4
ROUGE F1 performance of IT5-small (8192) in full summarization when generating summaries from individual sections and from their combination

| Section | R-1 | R-2 | R-L |
|---|---|---|---|
| E | 34.90 | 15.15 | 32.75 |
| T | 50.15 | 31.73 | 46.96 |
| D | 37.30 | 18.44 | 35.27 |
| All (E + T + D) | 51.58 | 33.77 | 48.50 |

Compared with giving the model all sections concatenated (All), performance on individual sections is lower, suggesting that all sections are informative for the summary
Table 5
Transfer learning performance (R-1\(_{f1}\) / R-2\(_{f1}\) / R-L\(_{f1}\))

| Train data (size) | Orders (491) | Judgments (209) |
|---|---|---|
| Orders (3757) | 57.39 / 41.52 / 40.84 | 49.20 / 28.73 / 26.52 |
| Judgments (8843) | 51.79 / 34.18 / 33.54 | 49.62 / 29.12 / 27.02 |
| Full (12,600) | 58.10 / 42.71 / 43.17 | 52.20 / 31.71 / 29.23 |
Table 6
Human evaluation ranking

| Model (input) | Info (\(\uparrow \)) | Fluency (\(\uparrow \)) | Factuality (\(\uparrow \)) | Average (\(\uparrow \)) |
|---|---|---|---|---|
| SegSumm-IT5-s (4096) | 0.767 | 0.344 | 0.678 | 0.596 |
| IT5-s (8192) | \(-\)0.122 | \(-\)0.378 | \(-\)0.111 | \(-\)0.204 |
| mBart-l (1024) | \(-\)0.644 | 0.033 | \(-\)0.567 | \(-\)0.393 |
| Kendall's W inter-rater agreement | 0.69 | 0.51 | 0.80 | 0.67 |
Generating summaries from sections To further explore the importance of reading the entire input source in LAWSUIT, we train summarizers on individual sections (i.e., epigraph, text, decision) to generate the summary.
As shown in Table 4, the model trained on the three concatenated sections reveals significant improvements compared to processing only the epigraph or the decision. Since the text section is the longest, a model processing only that section is only marginally worse. Overall, this analysis indicates that all the source sections are sufficiently informative to produce a comprehensive summary. This further underscores the importance of avoiding the truncation of longer texts due to context limitations and instead leveraging segmentation-based approaches.
Table 7
Performance of models on BillSum in few-shot summarization (R-1\(_{f1}\) / R-2\(_{f1}\) / R-L\(_{f1}\))

| Models | BillSum (10) | BillSum (100) |
|---|---|---|
| MTL-ABS | 41.22 / 18.61 / 26.33 | 45.29 / 22.74 / 29.56 |
| Pegasus | 40.48 / 18.49 / 27.27 | 44.78 / 26.40 / 34.40 |
| Bart | 45.59 / 22.83 / 29.05 | 49.74 / 27.15 / 32.93 |
| Se3 | 46.58 / 22.03 / 28.23 | 49.88 / 26.84 / 33.33 |
| LW-ML | 46.64 / 25.07 / 30.90 | 48.18 / 27.18 / 33.28 |
| Athena | 47.57 / 24.14 / 30.35 | 51.59 / 29.36 / 35.04 |
| SegSumm | 49.51 / 25.88 / 31.45 | 51.79 / 29.59 / 35.01 |

We use SegSumm on top of IT5-small (4096). The best scores are in bold and the second best results are underlined

4.4.2 Method

Generality of SegSumm Due to the language independence of the SegSumm approach, we assess its generality to analyze whether it can improve legal applications in other languages. Specifically, we experiment with the BillSum dataset under low-resource conditions (Moro et al. 2023a, b, c), simulating a real-world legal scenario.12 We compare with existing solutions concentrating on few-shot summarization: Pegasus (Zhang et al. 2020), a transformer-based model pretrained with a summarization-specific objective that allows for fast adaptation with few labeled samples. MTL-ABS (Chen and Shuai 2021), a meta-transfer learning approach that augments training data with similar corpora. Se3 (Moro and Ragazzi 2022), a segmentation-based solution equipped with metric learning. Athena (Moro and Ragazzi 2023), a segmentation-based model with a dynamically learned chunk size. LW-ML (Huh and Ko 2022), a meta-learning algorithm that inserts a lightweight module into the attention mechanism of a pretrained language model. Regarding our solution, we test SegSumm on top of Bart-base (Lewis et al. 2020). Table 7 shows that SegSumm largely outperforms previous models, confirming the usefulness of text segmentation for legal texts.

5 Conclusion

In this paper, we introduced LAWSUIT, the first large-scale dataset for the abstractive summarization of long Italian legal verdicts. The challenges presented by LAWSUIT include lengthy sources, the uniform distribution of relevant information throughout the input, and the lower presence of formulaic patterns in the targets. Through an extensive series of experiments, we found that a text segmentation pipeline significantly outperforms other methods in both few-shot and full summarization. We anticipate that LAWSUIT will contribute to the development of real-world legal summarization systems and stimulate research towards effective long-range solutions for Italian legal documents. Future work will extend LAWSUIT to new tasks, such as cross-domain and in-domain ruling classification (Domeniconi et al. 2014a, b, 2015, 2016, 2017; Moro et al. 2018), legal reasoning (Guha et al. 2023; Moro et al. 2024), open-domain question answering (Frisoni et al. 2024), corpus-level knowledge extraction (Frisoni and Moro 2020), and lay summarization (Ragazzi et al. 2024). By representing the source document as a graph (Moro et al. 2023), researchers could explore efficient segmentation and summarization techniques based on graph sparsification (Domeniconi et al. 2014, 2016; Zaheer et al. 2020), eventually using distributed algorithms (Lodi et al. 2010; Cerroni et al. 2015) to handle a large number of nodes and edges.

6 Limitations

As there are no publicly available Italian datasets specifically designed for summarizing long legal documents, we compared LAWSUIT with existing English legal datasets. However, it is crucial to acknowledge that English and Italian differ not only in language but also in vocabulary and style, potentially introducing linguistic biases when comparing statistics. While SegSumm serves as a baseline, it requires the generation of at least one sentence for each chunk during inference. Although this is suitable for extensive summaries, such as those found in typical long document summarization datasets like LAWSUIT and GovReport, it might be less scalable for concise summaries. Regarding low-resource experiments, our method is guided by published top-tier work, but we recognize that the sample selection process could significantly impact the final results. Hence, future contributions should explore various subsets of the training set to gain a more comprehensive understanding.

Acknowledgements

This research is partially supported by (i) Artificial Intelligence for Public Administration Connected (AI-PACT): https://disi-unibo-nlp.github.io/projects/aipact/, (ii) the Complementary National Plan PNC-I.1, "Research initiatives for innovative technologies and pathways in the health and welfare sector" D.D. 931 of 06/06/2022, DARE—DigitAl lifelong pRevEntion initiative, code PNC0000002, CUP B53C22006450001, (iii) the PNRR, M4C2, FAIR—Future Artificial Intelligence Research, Spoke 8 "Pervasive AI," funded by the European Commission under the NextGeneration EU program. We thank the Maggioli Group (https://www.maggioli.com/who-we-are/company-profile) for granting the Ph.D. scholarship to Luca Ragazzi from November 2020 to January 2024.

Declarations

Ethical approval

The data used to create LAWSUIT are released under a license that grants us the right to transform and share the dataset publicly.13 Regarding the experimental methods, due to the high societal impact of legislation, experts should verify the quality of the inferred summaries to make the proposed solutions work in real-world applications.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices

Appendix A: LAWSUIT release

Accessing The dataset files are stored in JSON format and will be uploaded to Google Drive and GitHub in case of acceptance. We will also integrate our dataset into the HuggingFace Datasets library (Lhoest et al. 2021).
License LAWSUIT is distributed under the CC-BY-SA 3.0 IT license, while the sources and summaries are already in the public domain. The authors assume all responsibility in the event of a breach of rights and accept the dataset licenses.
Maintenance The authors intend to provide long-term support for LAWSUIT, monitor usage, and produce necessary updates.

Appendix B: Implementation details

We trained abstractive summarization models using the Adam optimizer with \(\beta _1=0.9\) and \(\beta _2=0.99\), setting the learning rate to 5e-5 with linear scheduling. We evaluated performance on the validation set at the end of each epoch, using only the first 100 samples to save time, and then ran on the test set the checkpoint that performed best on the validation set. Table 8 lists the checkpoints used for pretrained models. Table 9 reports the batch size used during training.
Table 9
The models' training batch size with different input sizes. OOM means "out of memory" exception

| Size | IT5-small | IT5-base | IT5-large | mBart-large |
|---|---|---|---|---|
| 512 | 32 | 16 | 8 | 2 |
| 1024 | 16 | 8 | 4 | 1 |
| 2048 | 8 | 4 | 2 | OOM |
| 4096 | 4 | 2 | 1 | OOM |
| 8192 | 2 | OOM | OOM | OOM |

Appendix C: Insights on extractive results

Table 10 reports results with extractive baselines by varying the number of extracted sentences.
Table 10
Extractive summarization ROUGE F1 scores with different numbers of extracted sentences

| n | Epi R-1 / R-2 / R-L | Text R-1 / R-2 / R-L | Dec R-1 / R-2 / R-L | Cat R-1 / R-2 / R-L | LexRank-PLM R-1 / R-2 / R-L |
|---|---|---|---|---|---|
| 3 | 23.43 / 10.17 / 21.29 | 23.86 / 11.42 / 21.41 | 25.55 / 16.09 / 24.25 | 18.74 / 8.32 / 16.98 | 31.18 / 14.67 / 27.68 |
| 6 | 26.09 / 10.29 / 23.67 | 34.67 / 16.71 / 31.33 | 29.69 / 17.99 / 28.24 | 32.60 / 15.26 / 29.92 | 39.08 / 18.66 / 35.16 |
| 9 | 26.37 / 10.24 / 23.92 | 38.53 / 17.90 / 34.87 | 30.54 / 18.34 / 29.05 | 36.61 / 17.34 / 33.78 | 41.06 / 20.30 / 37.42 |
| 12 | 26.45 / 10.24 / 24.00 | 39.54 / 18.14 / 35.92 | 30.82 / 18.48 / 29.33 | 37.72 / 17.95 / 34.94 | 41.01 / 21.03 / 37.79 |
| 15 | 26.47 / 10.22 / 24.02 | 39.50 / 18.16 / 36.04 | 30.92 / 18.52 / 29.44 | 38.21 / 18.18 / 35.40 | 40.06 / 21.18 / 37.17 |
| 18 | 26.48 / 10.21 / 24.02 | 39.23 / 18.37 / 36.00 | 30.98 / 18.55 / 29.50 | 38.45 / 18.36 / 35.64 | 38.90 / 21.31 / 36.37 |
| 21 | 26.48 / 10.21 / 24.02 | 38.69 / 18.52 / 35.64 | 31.03 / 18.57 / 29.54 | 38.64 / 18.48 / 35.80 | 37.61 / 21.24 / 35.37 |

Appendix D: Human evaluation

The interface with human evaluation instructions is sketched in Fig. 8.

Appendix E: The role of epigraph and decision

Table 11 shows how the gold summaries compress the ruling epigraph and the decision information.
Table 11
Epigraph (yellow) and Decision (green) overlap between a sample document and its gold summary. English translated. (Color figure online)
[Table 11 is rendered as an image in the online version of the article.]

Appendix F: Data example

Figure 9 reports the original Italian-language LAWSUIT version of the example depicted in Fig. 1.

Appendix G: Examples of generated summaries

Tables 12 and 13 present qualitative examples from two distinct instances within the LAWSUIT test set. In particular, we provide the summaries generated by the best-performing baselines, i.e., IT5-small-8192 and IT5-small-512-SegSumm, with varying numbers of training examples.
Table 12
Qualitative examples on instance #3 of LAWSUIT's test set
Gold Summary
A differenza dell’adozione dei minori, disciplinata dalla legge n. 184 del 1983, l’adozione di persone maggiori di età regolata dagli artt. 291 e seguenti cod. civ. non implica necessariamente l’instaurarsi o il permanere della convivenza familiare, non determina la soggezione alla potestà dei genitori adottivi, nè impone all’adottante l’obbligo di mantenere, istruire ed educare l’adottato. Inoltre, l’adozione di persone maggiori di età è essenzialmente determinata dal consenso dell’adottante e dell’adottando, giacchè il controllo del Tribunale verte sui requisiti che legittimano l’adozione, essendo rimesso al giudice il ristretto potere di valutare se l’adozione "conviene" all’adottando (art. 312 cod. civ.), senza alcun discrezionale apprezzamento dell’interesse della persona dell’adottando e senza gli incisivi controlli previsti per l’adozione di minori. Risulta quindi razionalmente giustificata una diversità di disciplina anche nel superamento - consentito solo per l’adozione di minori, in casi eccezionali che esigono una specifica indagine e la rigorosa valutazione del giudice - del limite posto dal divario di età ordinariamente richiesto tra adottante ed adottando. (Non fondatezza della questione di legittimità costituzionale, in riferimento agli artt. 2, 3 e 30, primo e secondo comma, Cost. dell’art. 291 cod. civ. nella parte in cui, limitatamente al caso del coniuge che chieda di adottare - come nella specie - il figlio dell’altro coniuge, non consente al giudice di ridurre, in presenza di determinate circostanze, il previsto intervallo di età tra adottante e adottando). - Sulla illegittimità della mancata previsione, in casi eccezionali, nella disciplina dell’adozione di minori, della possibilità di ridurre l’intervallo di età tra adottante e adottando, v. s. nn. 44/1990 e 148/1992
IT5-small-8192-10
La Corte d’appello di Roma ha sollevato, con ordinanza emessa dal 10 dicembre 1991, questione di legittimità costituzionale dell’art. 291 del codice civile, nella parte in cui, limitatamente al caso del coniuge che chiede di adottare il figlio, anche adottivo dell’altro coniuge, non consente al giudice di derogare, in caso di adozione ordinaria, l’obbligo dell’adottando di mantenere, istruire ed educare l’adottato, in ragione del raccordo tra l’unità familiare ed il momento ineliminabilmente formativo ed educativo, l’adozione di età di età di adozione di età di età dell’adozione di non consente al giudice, in caso di età di età, l’obbligo di età dell’obbligo dell’adozione di età dellart. 381 del minore, in caso dell’adozione di un erede, in caso di non consente al limite del minore, l’obbligo del minore, per la prescrizione relativa al giudice di un erede, l’adottato ed educare ed educare ed educare ed educare l’adottando, per preservare i valori di età dell’adottato ed educare l’adozione di adozione di età del minore, per preservare i principi di adottare, per tutelare i principi di età dell’adottante e l’adozione di adottare, l’adozione, l’adozione del figlio dell’obbligo di età di età del minore
IT5-small-8192-100
L’organica disciplina della adozione dei minori, dettata dalla legge n. 184 del 1983, ha come essenziale e dominante obiettivo - in conformità alle convenzioni internazionali volte a disciplinare e proteggere in modo specifico i minori (si veda in proposito la Convenzione di Strasburgo sulla loro adozione, ratificata in forza della legge 22 maggio 1974, n. 357) - l’interesse di quest’ultimo ad un ambiente familiare stabile ed armonioso, nel quale si possa sviluppare la loro personalità, godendo di un equilibrato contesto affettivo ed educativo che caratterizza lo sviluppo del minore in una famiglia ed adottante, non determina la soggezione alla potestà dei genitori adottivi, non consente al giudice, secondo comma, secondo comma, in caso eccezionali che esigono una specifica indagine e per ampiezza dei poteri attribuiti al giudice, per salvaguardare i valori di protezione del minore, assicurando allo stesso una particolare protezione che solo quella famiglia ed impone al giudice, per tutelare i valori di tutelare i valori di protezione che soltanto quella famiglia può assicurare allo stesso una famiglia ed educativo che solo quella famiglia può garantire allo stesso una famiglia, assicurando allo stesso un equilibrato contesto educativo ed educativo che caratterizzano lo sviluppo del momento ineliminabilmente formativo ed educativo ed educativo che l’adozione di età dell’adottato ed educativo che legittimano l’adozione di un equilibrato contesto familiare stabile ed educativo ed educativo, garantendo allo stesso una famiglia
IT5-small-8192-Full
Non è fondata, in riferimento agli artt. 2, 3, e 30, primo e secondo comma, Cost. la questione di legittimità costituzionale dell’art. 291 cod. civ. nella parte in cui, disciplinando le condizioni dell’adozione di persone maggiori di età, stabilisce che l’adottante deve superare di almeno diciotto anni l’età dell’adottando, senza prevedere che il giudice possa ridurre la differenza di età richiesta da questa disposizione, nel caso di adozione del figlio maggiorenne, anche adottivo, dell’altro coniuge. L’esigenza di evitare gravi danni allo sviluppo della personalità del minore, causati dal venir meno dell’unità di vita e di educazione tra fratelli minori, uno dei quali già adottato, ha consentito inoltre, in casi altrettanto rigorosamente circoscritti ed eccezionali, il superamento dello stesso limite del divario di età massimo tra adottante ed adottato, limite che può essere eccezionalmente superato quando sia indispensabile, secondo il rigoroso apprezzamento del giudice, per salvaguardare i valori di protezione del minore, assicurando allo stesso una famiglia. (Non fondatezza della questione di legittimità costituzionale, in riferimento agli articoli 2, 3, 3, 30 e secondo comma, della Costituzione, nella parte in cui prevede che l’adozione di minori in casi particolari), non consente al giudice di ridurre, quando l’adottando sia di fatto stabilmente inserito nel nucleo familiare e sussistano validi motivi per la realizzazione dell’unità familiare, l’obbligo dell’adottante l’obbligo di mantenere, istruire ed educare l’adottato, in conformità a quanto prescritto dall’art. 147 cod. per i figli nati nel matrimonio (art. 48 della legge n. 184 del 1983). - V. S. n. 148/1992
IT5-small-512-10-SegSumm
La Corte d’appello di Roma ha sollevato, con ordinanza emessa il 10 dicembre 1991, questione di legittimità costituzionale dell’art. 291 del codice civile, nella parte in cui, limitatamente al caso del coniuge che chiede di adottare il figlio, anche adottivo, dell’altro coniuge, non consente al giudice di ridurre, quando l’adottando sia di fatto stabilmente inserito nel contesto familiare e sussistano validi motivi per la realizzazione dell’unità familiare, l’intervallo di età di diciotto anni che la stessa disposizione prevede debba intercorrere tra adottante e adottante. L’adozione di persone maggiori di età ha la finalità di trasmettere il nome di chi non ha discendenti legittimi o legittimati e di dare all’adottando un erede, ma lo scopo dell’istituto, ad avviso del giudice rimettente, non è necessariamente limitato ai risvolti patrimoniali, ben potendo comprendere anche quello di inserire a pieno titolo l’adottando nella famiglia alla quale di fatto partecipa. La Corte d’appello sollecita, in definitiva, l’applicazione anche all’adozione ordinaria delle ragioni in base alle quali è stata ritenuta costituzionalmente illegittima, per l’adozione di minori in casi particolari, la mancata previsione del potere del giudice di accordare una ragionevole riduzione della differenza di età di diciotto anni tra il coniuge adottante ed il minore adottato, quando quest’ultimo sia figlio, anche adottivo, dell’altro coniuge. La Cassazione del 14.2.2016 n. 148 ha stabilito che l’adozione di un minore figlio del coniuge dell’adottante è necessaria per assicurare all’adottato, con l’inserimento a pieno titolo nella famiglia e con l’attribuzione del cognome dei fratelli uterini generati in costanza di matrimonio, il superamento del limite del divario di età tra adottante ed adottato, limite che può essere eccezionalmente superato quando sia indispensabile, secondo il rigoroso apprezzamento del giudice, per salvaguardare i valori di protezione del minore stesso. La Corte d’appello di Roma ha stabilito che nell’adozione di persone maggiori di età al giudice non è attribuito alcun discrezionale apprezzamento dell’interesse della persona dell’adottato, né impone all’adottando l’obbligo di mantenere, istruire ed educare l’adottato, poiché il controllo del Tribunale verte sui requisiti che legittimano l’adozione, essendo rimesso al giudice il ristretto potere di valutare se l’adozione “conviene” all’adottato
Table 13
Qualitative examples for instance #10 of the LAWSUIT test set
Gold Summary
Non sussiste ostacolo alcuno nel nostro sistema costituzionale in ordine al riconoscimento della legittimità di "attività di mero rilievo internazionale delle Regioni" consistenti nello scambio di informazioni utili ovvero nell’approfondimento di conoscenze in materie di comune interesse, oppure, ancora, nell’enunciazione di analoghi intenti ed aspirazioni - allo scopo di favorirne unilateralmente la realizzazione mediante atti propri o, al più, mediante sollecitazione dei competenti organi nazionali - e, come tali, insuscettibili di incidere nei rapporti internazionali o di impegnare la responsabilità internazionale dello Stato. Non spettava quindi allo Stato il potere di negare l’assenso alla partecipazione di un delegato della Regione Umbria al Forum di Amsterdam del 22-23 novembre 1980 e conseguentemente è annullato il telegramma della Presidenza del Consiglio dei Ministri, Ufficio Regioni n. 200/8761 del 22 novembre 1980 e la delibera della Commissione di controllo sugli atti della Regione Umbria n. 5719 del 28 novembre 1980. - cfr. S.n. 179/1987
IT5-small-8192-10
La Regione Umbria ha sollevato conflitto di attribuzione nei confronti dello Stato in relazione al telegramma della Presidenza del Consiglio dei Ministri, Ufficio Regioni, del 22 novembre 1980, depositato in cancelleria il 31 gennaio successivo ed iscritto al n. 4 del registro ricorsi 1981, in quanto dotate di autonomia costituzionalmente garantita ai sensi degli articoli 5, 114, 115, 117, 117, 118 e 122 Cost. La sentenza di questa Corte n. 179 del 1987 ha ritenuto di dover ravvisare il fondamento di possibili interventi inibitori dello Stato nei confronti delle attività internazionali delle Regioni, in quanto in tema della rappresentanza politica di unilaterre la responsabilità internazionale, in tema della Regione Umbria del Forum sul tema della Presidenza del Comitato olandese del Forum sul di unila delibera della Presidenza del 22 novembre 1980 n. 5719, non può farsi rientrare tra gli atti propri atti propri o di tale da impegnare la decisione della Regione Umbria, in tema della Giunta regionale, in tema della Presidenza del 29 novembre 1980, in materia di cui si limita a seguito all’e la legittimità delle rispettive condotte, in tema della responsabilità internazionale, in materia di tali attività regionali, in tema della rappresentanza internazionale, in tema di tali attività che non vincolare la responsabilità internazionale del Comitato olandese, in materia di interesse, in tema della partecipazione al Forum sul tema della responsabilità internazionale del Consiglio dei Ministri del Comitato olandese n
IT5-small-8192-100
La semplice partecipazione di un delegato della Regione Umbria al Forum internazionale di Amsterdam, indetto dal Comitato olandese per la pace ed il disarmo, può farsi rientrare tra quelle attività di mero rilievo internazionale consentite alle Regioni, in quanto insuscettibili di incidere nei rapporti internazionali o di impegnare la responsabilità internazionale dello Stato. Né tale partecipazione, date le finalità umanitarie connesse all’incontro o di informazione (in materie tecniche) oppure la previsione di partecipazione a manifestazioni dirette ad agevolare il processo culturale o economico, in ambito locale, ovvero l’enunciazione di propositi intesi ad armonizzare unilateralmente le rispettive condotte, in cui questa Corte, nella sentenza della Regione Umbria n. 179 del 1987, n. 179/1988/1984/8761/8761/8661/8761/1981, non può farsi rientrare tra le Regioni, in riferimento agli interessi del Paese. La semplice partecipazione di altri (di norma omologhi) organismi esteri, aventi per oggetto finalità umanitarie legate all’incontro o la previsione di informazioni utili ovvero l’approfondimento di conoscenze in materie tecniche) oppure, ad enunciare analoghi intenti ed aspirazioni delle tendenze ivi rappresentate - in materia tecnica, ovvero l’escettibili di interesse o economico, ovvero l’impegno la responsabilità internazionale
IT5-small-8192-Full
Non spettava allo Stato il potere di negare l’assenso alla partecipazione di un delegato della Regione Umbria al Forum di Amsterdam del 22-23 novembre 1980 e conseguentemente annulla il telegramma della Presidenza del Consiglio dei Ministri, Ufficio Regioni, del 22 novembre 1980, n. 200/8761 e della delibera della Commissione di controllo sugli atti della Regione Umbria n. 5719 del 28 novembre 1980. La semplice partecipazione di un delegato regionale al Forum internazionale di Amsterdam, indetto dal Comitato olandese per la pace ed il disarmo, può farsi rientrare tra quelle "attività di mero rilievo internazionale" consentite alle Regioni, in quanto insuscettibili di incidere nei rapporti internazionali o di impegnare la responsabilità internazionale dello Stato, ma si limitano a prevedere "lo scambio di informazioni utili ovvero l’approfondimento di conoscenze in materie di comune interesse, oppure, ancora, ad enunciare analoghi intenti ed aspirazioni, proponendosi di favorirne unilateralmente la realizzazione mediante atti propri o, al più, mediante sollecitazione dei competenti organi nazionali. - S. n. 179/1987/1988. - S. nn. 179/1988 e 179/1988. - V. la citata sentenza n. 179 del 1987, n. 179/87/1988
IT5-small-512-10-SegSumm
È manifestamente inammissibile il conflitto di attribuzione sollevato dalla Regione Umbria in relazione al telegramma della Presidenza del Consiglio dei Ministri, Ufficio Regioni, datato 22 novembre 1980, n. 200/8761 e alla conseguente delibera della Commissione di controllo sugli atti della Regione Umbria del 28 novembre 1980, con la quale è stata annullata la delibera della Giunta regionale del 18 novembre 1980 n. 6427 avente ad oggetto la partecipazione di un delegato della Regione al Forum sul disarmo indetto dal Comitato olandese per la pace. 5, 114, 115, 117, 118 e 122 Cost. 4 del d.P.R. 612/77, in quanto dotate di autonomia costituzionalmente garantita ai sensi degli artt. 5, 114, 114, 117, 118, 118, 122 Cost. - Sulla rilevanza della questione di legittimità costituzionale dell’art. Non sussiste ostacolo alcuno nel nostro sistema costituzionale a riconoscere la legittimità di tali attività, per le quali può essere accolta la denominazione, proposta dalla dottrina, di "attività di mero rilievo internazionale delle Regioni" (nella specie, in quanto le Regioni non pongono in essere veri accordi né assumono diritti ed obblighi tali da impegnare la responsabilità internazionale dello Stato) ma si limitano a prevedere "lo scambio di informazioni utili ovvero l’approfondimento di conoscenze in materie di comune interesse, oppure, ancora, ad enunciare analoghi intenti ed aspirazioni, proponendosi di favorirne unilateralmente la realizzazione mediante atti propri o, al più, mediante sollecitazione dei competenti organi nazionali. Non spettava allo Stato il potere di negare l’assenso alla partecipazione di un delegato della Regione Umbria al Forum internazionale di Amsterdam del 22-23 novembre 1980 e conseguentemente annulla il telegramma della Presidenza del Consiglio dei Ministri, Ufficio Regioni n. 200/8761 del 22 novembre 1980 e la delibera della Commissione di controllo sugli atti della Regione Umbria n. 5719 del 28 novembre 1980
Footnotes
7. Coverage is defined as the average fraction of token spans that can be jointly identified in both the source and the target. For example, a coverage of 0.92 indicates that 92% of the summary words appear in extractive fragments of the source.
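For concreteness, the following is a minimal Python sketch of this statistic, based on the greedy extractive-fragment matching popularized by Grusky et al. (2018); whitespace tokenization is an assumption here, and the paper's exact implementation may differ.

def extractive_fragments(source_tokens, summary_tokens):
    """Greedily collect maximal token spans shared by summary and source."""
    fragments, i = [], 0
    while i < len(summary_tokens):
        best, j = 0, 0
        while j < len(source_tokens):
            if summary_tokens[i] == source_tokens[j]:
                # Extend the match as far as both sequences agree.
                k = 0
                while (i + k < len(summary_tokens)
                       and j + k < len(source_tokens)
                       and summary_tokens[i + k] == source_tokens[j + k]):
                    k += 1
                best = max(best, k)
                j += k
            else:
                j += 1
        if best > 0:
            fragments.append(summary_tokens[i:i + best])
            i += best  # skip past the matched fragment
        else:
            i += 1
    return fragments

def coverage(source: str, summary: str) -> float:
    """Fraction of summary tokens covered by extractive fragments."""
    src, tgt = source.split(), summary.split()
    matched = sum(len(f) for f in extractive_fragments(src, tgt))
    return matched / len(tgt) if tgt else 0.0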
 
8. Random subsets avoid the need to compute all combinations of targets in the dataset, thereby alleviating space and time complexity.
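As an illustration, the sketch below estimates a generic pairwise statistic from k random target pairs instead of enumerating all \(\binom{n}{2}\) combinations; the scoring function and the sample size are placeholders, not values taken from the paper.

import random

def sample_pairs(n: int, k: int, seed: int = 13) -> set[tuple[int, int]]:
    """Draw k distinct unordered index pairs without materializing them all."""
    # Assumes k is well below n * (n - 1) / 2, which holds at LAWSUIT scale.
    rng, pairs = random.Random(seed), set()
    while len(pairs) < k:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            pairs.add((min(a, b), max(a, b)))
    return pairs

def estimate_mean_pairwise(targets, score, k=10_000):
    """Monte Carlo estimate of the mean pairwise score over targets."""
    pairs = sample_pairs(len(targets), k)
    return sum(score(targets[a], targets[b]) for a, b in pairs) / len(pairs)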
 
9. We use the same sample size for the validation set as well.
 
10. We choose \(n=18\) (also for LexRank-Plm) because it is the average number of target sentences in LAWSUIT (Table 1). If a section has fewer than 18 sentences, we take all of them. Results with different \(n\) are provided in Appendix 8.
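One possible realization of this selection step is sketched below with the LexRank implementation from the sumy library; the paper does not specify which implementation it relies on, so treat this purely as an illustration under that assumption.

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

def select_salient_sentences(section_text: str, n: int = 18) -> list[str]:
    """Rank a section's sentences with LexRank and keep the top n."""
    # The Italian sentence splitter relies on the NLTK "punkt" data.
    parser = PlaintextParser.from_string(section_text, Tokenizer("italian"))
    summarizer = LexRankSummarizer()
    # sumy returns at most n sentences; shorter sections yield all of their
    # sentences, mirroring the fallback described in the footnote.
    return [str(sentence) for sentence in summarizer(parser.document, n)]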
 
11. Preliminary experiments led us to prefer mBART over BART-IT, as the latter produced outputs of extremely low quality in the legal domain.
 
12. We used the same training settings but set the minimum and maximum summary size to 100 and 300 and the no-repeat n-gram size to 3.
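In Hugging Face Transformers terms, these settings map onto the generation arguments sketched below; the checkpoint name and the beam count are illustrative assumptions, while the length bounds and the 3-gram block come from the footnote.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint for illustration only; the experiments may use others.
tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")

document_text = "..."  # placeholder for an Italian verdict from LAWSUIT
inputs = tokenizer(document_text, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    **inputs,
    min_length=100,          # minimum summary size (footnote 12)
    max_length=300,          # maximum summary size (footnote 12)
    no_repeat_ngram_size=3,  # never repeat the same 3-gram in the output
    num_beams=4,             # assumption: the beam count is not stated here
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))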
 