2023 | Open Access | Book

European Language Equality

A Strategic Agenda for Digital Language Equality

About this book

This open access book presents a comprehensive collection of the European Language Equality (ELE) project’s results, its strategic agenda and roadmap with key recommendations to the European Union on how to achieve digital language equality in Europe by 2030. The fabric of the EU linguistic landscape comprises 24 official languages and over 60 regional and minority languages. However, language barriers still hamper communication and the free flow of information. Multilingualism is a key cultural cornerstone of Europe, signifying what it means to be and to feel European. Various studies and resolutions have found a striking imbalance in the support of Europe’s languages through technologies, issuing a call to action.

Following an introduction, the book is divided into two parts. The first part describes the state of the art of language technology and language-centric AI and the definition and metrics developed to measure digital language equality. It also presents the status quo in 2022/2023, i.e., the current level of technology support for over 30 European languages. The second part describes plans and recommendations on how to bring about digital language equality in Europe by 2030. It includes chapters on the setup and results of the community consultation process, four technical deep dives, an overview of existing strategic documents and an abridged version of the strategic agenda and roadmap.

The recommendations have been prepared jointly with the European community in the fields of language technology, natural language processing, and language-centric AI, as well as with representatives of relevant initiatives and associations, language communities and regional and minority language groups. Ensuring appropriate technology support for all European languages will not only create jobs, growth and opportunities in the digital single market. Overcoming language barriers in the digital environment is also essential for an inclusive society and for providing unity in diversity for many years to come.

Table of Contents

Frontmatter

Open Access

Chapter 1. European Language Equality: Introduction

This chapter provides an introduction to the EU-funded project European Language Equality (ELE). It motivates the project by taking a general look at multilingualism, especially with regard to the political equality of all languages in Europe. Since 2010, several projects and initiatives have developed the notion of utilising sophisticated language technologies to unlock and enable multilingualism technologically. However, despite a landmark resolution that was adopted by the European Parliament in 2018, no significant progress has been made. Together with the whole European LT community, and making use of a concerted community consultation process, the ELE project produced strategic recommendations that specify how to bring about full digital language equality in Europe and reach the scientific goal of Deep Natural Language Understanding by 2030, not only addressing but eventually solving the problem of digital inequality of Europe’s languages.

Georg Rehm, Andy Way

European Language Equality: Status Quo in 2022

Frontmatter

Open Access

Chapter 2. State-of-the-Art in Language Technology and Language-centric Artificial Intelligence

This chapter surveys the field of Language Technology (LT) and language-centric AI by assembling a comprehensive overview of the state of the art in basic and applied research in the area. It sketches recent advances in AI, including the most recent deep learning neural technologies. The chapter brings to light not only where language-centric AI as a whole stands, but also where the required resources should be allocated to place European LT at the forefront of the AI revolution. We identify key research areas and gaps that need to be addressed to ensure LT can overcome the current inequalities.

Rodrigo Agerri, Eneko Agirre, Itziar Aldabe, Nora Aranberri, Jose Maria Arriola, Aitziber Atutxa, Gorka Azkune, Jon Ander Campos, Arantza Casillas, Ainara Estarrona, Aritz Farwell, Iakes Goenaga, Josu Goikoetxea, Koldo Gojenola, Inma Hernáez, Mikel Iruskieta, Gorka Labaka, Oier Lopez de Lacalle, Eva Navas, Maite Oronoz, Arantxa Otegi, Alicia Pérez, Olatz Perez de Viñaspre, German Rigau, Ander Salaberria, Jon Sanchez, Ibon Saratxaga, Aitor Soroa

Open Access

Chapter 3. Digital Language Equality: Definition, Metric, Dashboard

This chapter presents the concept of Digital Language Equality (DLE) that was at the heart of the European Language Equality (ELE) initiative, and describes the DLE Metric, which includes technological factors (TFs) and contextual factors (CFs): the former concern the availability of Language Resources and Technologies (LRTs) for the languages of Europe, based on the data included in the European Language Grid (ELG) catalogue, while the latter reflect the broader socio-economic contexts and ecosystems of the languages, as these determine the potential for LRT development. The chapter discusses related work, presents the DLE definition and describes how it was implemented through the DLE Metric, explaining how the TFs and CFs were quantified. The resulting scores of the DLE Metric for Europe’s languages can be visualised and compared through the interactive DLE dashboard, to monitor the progress towards DLE in Europe.

Federico Gaspari, Annika Grützner-Zahn, Georg Rehm, Owen Gallagher, Maria Giagkou, Stelios Piperidis, Andy Way

Open Access

Chapter 4. European Language Technology in 2022/2023

This chapter presents the results of an extensive empirical investigation of the digital readiness of European languages, and provides a snapshot of the support they are offered through technology as of 2022. The degree of digital readiness was assessed on the basis of the availability of language resources and technologies for each language under investigation and a cross-language comparison was performed. As a complementary approach, the perspectives and opinions of LT users, developers and the regular citizen were acquired in order to fully understand the EU’s LT landscape. Both the objective empirical findings and the voice of the community clearly indicate that there is an extreme imbalance across languages when it comes to the individual levels of technological support. Although the LT field as a whole has demonstrated remarkable progress during the last decade, this progress is not equally evidenced across all languages, posing, more acutely than ever before, a threat of digital extinction for many of Europe’s lesser supported languages.

Maria Giagkou, Teresa Lynn, Jane Dunne, Stelios Piperidis, Georg Rehm

Open Access

Chapter 5. Language Report Basque

Since 1968 Basque has been immersed in a process of revitalisation that has faced formidable obstacles. Nonetheless, significant progress has been made in numerous areas. The Language Technology community widely accepts the standardised language and constructs efficacious LT tools. After thirty years of collaborative work, research has resulted in state-of-the-art technology and robust, broad-coverage NLP for Basque. However, a dramatic difference remains between Basque and other European languages in terms of both the maturity of research and the state of readiness with respect to language technology solutions.

Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernáez, Eva Navas

Open Access

Chapter 6. Language Report Bosnian

It is fair to state that there are no language technologies for the Bosnian language, nor any initiatives for its digitalisation. Therefore, it is necessary to take initial steps towards technological support for the Bosnian language, in order to prevent its digital extinction. In Bosnia and Herzegovina, no programmes aimed at the research and development of language technology products have been initiated. The Bosnian language is present in the digital sphere only to the extent that it is included in foreign, multilingual tools and resources, which are mostly related to Machine Translation (Google Translate and others).

Tarik Ćušić

Open Access

Chapter 7. Language Report Bulgarian

This chapter reports on the current status of technology support for Bulgarian and highlights certain gaps. The analysis is based on the services and resources available in the European Language Grid in early 2022. While the LT field as a whole has significantly progressed in the last ten years, we conclude that there is still a yawning technological gap between English and Bulgarian, and even between German, French, Italian, Spanish and Bulgarian. It is exactly this distance that ideally needs to be eliminated, or at least reduced, in order to move towards Digital Language Equality for Bulgarian.

Svetla Koeva

Open Access

Chapter 8. Language Report Catalan

Despite its vulnerable position as a minoritised language, the presence of Catalan in the digital sphere is relatively strong, thanks to an active online community with a high technological profile. Technological support for Catalan is slowly growing, following the recent advances in AI and increased awareness of the value of language data and technologies among public and private bodies. However, more effort is needed to promote the creation of open-source solutions and resources so as to lower the investment barrier for companies to build technology for Catalan.

Maite Melero, Blanca Calvo, Mar Rodríguez, Marta Villegas

Open Access

Chapter 9. Language Report Croatian

This chapter presents a summary of the Language Report on Croatian (Tadić 2022), covering general features of the language and the level of technological support it has received since the previous report (Tadić et al. 2012). The chapter includes information about the typological and structural features of Croatian, its status and usage in the digital sphere and its support through Language Technologies.

Marko Tadić

Open Access

Chapter 10. Language Report Czech

This chapter provides basic data about Language Technology for the Czech language. After a brief introduction with general facts about the language (history, linguistic features, writing system, dialects), we touch upon Czech in the digital sphere. The main achievements in the field of NLP are presented: important datasets (corpora, treebanks, lexicons etc.) and tools (morphological analyzers, taggers, automatic translators, voice recognisers and generators, keyword extractors etc.).

Jaroslava Hlaváčová

Open Access

Chapter 11. Language Report Danish

This chapter summarises the current level of language technologies (LT) and resources for Danish (Pedersen et al. 2022). Even if Danish LTs are now being used in many areas of society, their quality still needs to be improved in order to make them more useful and inclusive for the majority of the population. To this end, the development of large, high-quality language resources and data sets still proves to be a bottleneck. We report, however, on an increased awareness of sharing and reusing language resources and data sets across public institutions, academia and industry. New, large governmental initiatives within the area of AI and LT have been initiated which support this development.

Bolette Sandford Pedersen, Sussi Olsen, Lina Henriksen

Open Access

Chapter 12. Language Report Dutch

This chapter provides a new state of affairs (Steurs et al. 2022) with regard to language technology (LT) for Dutch (after Odijk 2012). LT for Dutch is highly developed, and the Netherlands and Flanders have a strong and cooperative LT community. A lot of digital data is freely available through CLARIN and the Dutch Language Institute (INT). However, data and software have to be updated continuously, and there is a need for a new overarching programme to support research initiatives.

Frieda Steurs, Vincent Vandeghinste, Walter Daelemans

Open Access

Chapter 13. Language Report English

This chapter focuses on the status of the English language, which primarily acts as a benchmark for the level of technological support that other European languages could receive (see Maynard et al. 2022; Ananiadou et al. 2012). It is rather unlikely that any other European language will ever reach this level, because support for English continues to develop and thus serves as a moving goalpost; nevertheless, it provides a good criterion for relative assessment. While the inequalities in the amount of technological support available for English compared with other European languages may act as a deterrent for working on the latter, the support available for English nevertheless serves as a useful basis for applying cross-lingual transfer methods in order to build language models and generate labelled data for lower-resource languages.

Diana Maynard, Joanna Wright, Mark A. Greenwood, Kalina Bontcheva

Open Access

Chapter 14. Language Report Estonian

This chapter gives a brief overview of Estonian LT tools and resources (Muischnek 2022; Liin et al. 2012). The Estonian language has only around one million speakers and so the market for Estonian LT products is also a small one. In general, the current situation of Estonian LT is acceptable for a small language, but far from perfect. The main force driving the development of Estonian LT has been the public sector and so the resources and tools developed by publicly funded projects are mainly open source. Nonetheless, during the last decade, the private sector has also engaged in creating tools and solutions for Estonian.

Kadri Muischnek

Open Access

Chapter 15. Language Report Finnish

During the last ten years, digitalisation has changed the way we interact with the world, creating an increasing demand for language-based AI services. In the field of language technology, the Finnish language is still only moderately equipped with products, technologies and resources. The situation has improved in recent years, but support for automated translation still leaves ample room for improvement, and general support for spoken language is modest in industry applications, although some recent research results are encouraging. We take stock of the existing resources for Finnish and try to identify some remaining gaps.

Krister Lindén, Wilhelmina Dyster

Open Access

Chapter 16. Language Report French

This chapter presents a survey of the current state of technologies for the automatic processing of the French language. It is based on a thorough analysis of existing tools and resources for French, and also provides an accurate presentation of the domain and its main stakeholders (Adda et al. 2022). The chapter documents the presence of French on the internet and describes in broad terms the existing technologies for the French language. It also spells out general conclusions and formulates recommendations for progress towards deep language understanding for French.

Gilles Adda, Ioana Vasilescu, François Yvon

Open Access

Chapter 17. Language Report Galician

This chapter reports on the current state of Language Technology (LT) for Galician. The main conclusion is that there are a limited number of resources, products, and technologies for the Galician language, with text-based technologies and services being more mature than those based on speech processing. We start with general facts about Galician, followed by a high-level qualitative description of the LT situation for Galician, and conclude with recommendations for bridging the gap between LT for Galician and LT for Spanish and the other co-official languages of Spain.

José Manuel Ramírez Sánchez, Laura Docío Fernández, Carmen García-Mateo

Open Access

Chapter 18. Language Report German

German is the second most widely spoken language in the EU. The last decade has seen strongly perceptible language change, trending towards the simplification of the grammatical system, a rapidly growing number of anglicisms, a decreasing prevalence of dialects, and an increase in socio-political debates on matters such as language policies and gender-neutral language. Many technologies and resources for German are available, which is also due to numerous well-established research institutions and a thriving Language Technology (LT) and Artificial Intelligence (AI) industry. For German to hold its ground in the digital sphere, it is important that incentives for research, digital education and also concrete opportunities for marketing and deploying LT applications are put at the forefront of future AI strategies.

Stefanie Hegele, Barbara Heinisch, Antonia Popp, Katrin Marheinecke, Annette Rios, Dagmar Gromann, Martin Volk, Georg Rehm

Open Access

Chapter 19. Language Report Greek

Technological support for Greek, one of Europe’s lesser spoken languages, has progressed in the past decade, while LRTs have both increased in volume and improved in quality and coverage. Despite this progress, when compared to the ‘big languages’, Greek is clearly disadvantaged. Prominent among the challenges is the fact that LT is not included in the language policies or AI strategies of Greece and Cyprus, i.e., the significance of language-centric AI is still not officially recognised. Lack of continuity in research and development funding is an additional factor hampering progress. A Europe-wide coordinated initiative focused on overcoming the differences in language technology readiness for European languages, coupled with targeted national actions, is considered necessary.

Maria Gavriilidou, Maria Giagkou, Dora Loizidou, Stelios Piperidis

Open Access

Chapter 20. Language Report Hungarian

The revolutionary expansion of language technologies (LT) in the last decade and the emergence of neural networks have heavily impacted the field. This is reflected in the development of Hungarian NLP as well, as numerous high-quality LMs, tools and datasets have been created. However, new, huge datasets are still needed to train LMs. As a lesser resourced Uralic language with a smaller number of speakers, Hungarian faces LT challenges that often differ from those of large Indo-European languages like English. Here we present a snapshot of this important period in the development of Hungarian LT, with special attention to language resources, and we outline some of the possible next steps.

Kinga Jelencsik-Mátyus, Enikő Héja, Zsófia Varga, Tamás Váradi

Open Access

Chapter 21. Language Report Icelandic

In 2019, the Icelandic Government launched a three-year Language Technology Programme for Icelandic (LTPI). Within this programme, a number of language resources and tools have been built from scratch and several pre-existing resources and tools have been enhanced and improved. This programme is now finished and the situation for Icelandic with respect to language technology has improved considerably. In spite of this, Icelandic still remains a low-resourced language compared to most official European languages.

Eiríkur Rögnvaldsson

Open Access

Chapter 22. Language Report Irish

Language technology (LT) underpins many applications that enable our digitally enhanced lives (virtual assistants, search engines, translation tools, spellcheckers, language learning tools etc.). However, these advances do not benefit all Irish citizens equally. Due to a lack of sufficient LTs for Irish, Irish speakers regularly need to revert to using English. Such a language shift plays a major role in the risk of digital extinction, i.e., an eventual decline in language use due to lack of technological support. This chapter highlights work carried out on Irish LT, and the gaps and challenges that still need to be addressed (Lynn 2022).

Teresa Lynn

Open Access

Chapter 23. Language Report Italian

In the last few years, three important factors have influenced the Italian Language Technology (LT) community: 1. in 2015, the foundation of the Associazione Italiana di Linguistica Computazionale (Italian Association for Computational Linguistics, AILC); 2. the organisation of CLiC-it, the annual Italian Conference on Computational Linguistics; 3. the organisation of the EVALITA (Evaluation of NLP and Speech Tools for Italian) evaluation campaigns. This situation is producing a widespread expansion of interest in LT for Italian in academia and industry.

Bernardo Magnini, Alberto Lavelli, Manuela Speranza

Open Access

Chapter 24. Language Report Latvian

Ten years ago, when META-NET conducted a study on Language Technology support for Europe’s languages, Latvian was assessed as a language with little or no support (Skadiņa et al. 2012). During the last decade, progress has been made in the development of language resources and tools for Latvian, particularly with respect to advanced datasets and language models, machine translation solutions, speech technologies, and technologies for natural language understanding and human-computer interaction. This chapter provides a summary of the current state of the Latvian language, the only official language of Latvia, in the digital environment and highlights the most important activities in the language technology field.

Inguna Skadiņa, Ilze Auziņa, Baiba Valkovska, Normunds Grūzītis

Open Access

Chapter 25. Language Report Lithuanian

Significant progress has been made in adapting the Lithuanian language to the digital environment. A number of digital language resources and basic language analysis tools, as well as complex online language services and the Lithuanian language ontology have been developed, while a number of computer programs and tools have been localised. Computer applications relevant to society are being Lithuanianised, and the standardisation of computer terms is being carried out. Lithuanian researchers actively participate in the cooperation and mobility activities of international associations, and a core of Lithuanian specialists working in the field of IT application, and developing innovative work in this field, has been formed. Lithuania also strives for all citizens to have full access to digital solutions, which adds importance to the policy of adapting them for those living with disabilities.

Anželika Gaidienė, Aurelija Tamulionienė

Open Access

Chapter 26. Language Report Luxembourgish

The Grand Duchy of Luxembourg is a small and multilingual country. The national language is Luxembourgish, and the legislative language is French. French, German and Luxembourgish are the three administrative and judicial languages. There are about 650,000 inhabitants and the majority of Luxembourgers speak four languages. As of March 2021, there were 59,000 Wikipedia articles written in Luxembourgish. Luxembourgish is very under-resourced when it comes to data resources and tools. This chapter provides a brief overview of the current level of support that Luxembourgish receives through technology (Anastasiou 2022).

Dimitra Anastasiou

Open Access

Chapter 27. Language Report Maltese

This chapter is a highly abbreviated version of an update (Rosner and C. Borg 2022) to the META-NET White Paper on Maltese (Rosner and Joachimsen 2012). Like its predecessor, the update forms part of a series for all European Languages. Section 1 provides a brief description of the language, its national status, its general typology as a language, and its current usage in the digital sphere. Section 2 gives an overview of technologies and resources that are currently available. Finally, Section 3 frames the main shortcomings of Maltese language technology in terms of fragmentation, and offers some recommendations on how that might be reduced.

Michael Rosner, Claudia Borg

Open Access

Chapter 28. Language Report Norwegian

The use of Language Technology (LT) has greatly increased in Norway in recent years, as have the linguistic resources needed to make them work. In the past 10 years, Norwegian has adopted new or improved versions of machine translation, speech technology, chatbots and digital assistants, and machine learning has improved. Nevertheless, LT for both written standards of the Norwegian language – the majority Bokmål and minority Nynorsk – is nowhere near the same level as that of major European languages such as English, German, French and Spanish.

Kristine Eide, Andre Kåsen, Ingerid Løyning Dale

Open Access

Chapter 29. Language Report Polish

The quality of language technology (LT) for Polish has greatly improved recently, influenced by three independent trends. The first one is Poland-specific and concerns the increase in national funding of both scientific and R&D projects, resulting in the construction of The National Corpus of Polish and the development of the CLARIN-PL and DARIAH-PL infrastructures. Two other trends are global: the development of language resources (LRs) and tools by private companies and of course, the deep learning revolution which has led to enormous improvements in the state-of-the-art in all fields of language processing.

Maciej Ogrodniczuk, Piotr Pęzik, Marek Łaziński, Marcin Miłkowski

Open Access

Chapter 30. Language Report Portuguese

This chapter provides an analysis of the level of technological preparation of the Portuguese language for the digital age, as well as the actions necessary for the consolidation of Portuguese as a language of international communication with global projection.

António Branco, Sara Grilo, João Silva

Open Access

Chapter 31. Language Report Romanian

Since the previous META-NET report, there have been significant improvements (e.g., creation of a large Romanian national corpus, steady progress in written language technologies, LT, construction of a national LT portal for the Romanian language etc.), but things are far from what they should be. Support for LT and AI through national programmes is still modest, although there are signs of a more active involvement of policy makers in the strategic planning and funding programmes in this domain. Continued research is required to produce large language models, able to capture the characteristics of the Romanian language. Large language resources need to be created so that AI systems are able to learn from them.

Vasile Păiş, Dan Tufiş

Open Access

Chapter 32. Language Report Serbian

Standard Serbian is the national language of Serbs and the official language in the Republic of Serbia. Although statistics show that the population of Serbia is well equipped to use IT, and although some important language resources and tools have been developed for Serbian, the language still lags significantly behind most European languages in terms of Language Technology (LT). This shows that a stable, dedicated and long-term investment in the development of LT for Serbian through national and international scientific and development projects is needed.

Cvetana Krstev, Ranka Stanković

Open Access

Chapter 33. Language Report Slovak

For Slovak, all the fundamental NLP building blocks for basic applications exist, but they are often of lesser quality and lower accuracy than those of other languages. The availability of free and open tools and data is rather low, with most of the resources proprietary. Compared to neighbouring languages of similar levels of NLP development (Czech, Polish, Hungarian), Slovak is positioned toward the lower end of this group. Slovak language support by “big players” in the LT industry is comparable to other European languages with similar size; speech recognition and synthesis work acceptably while machine translation between Slovak and English is almost good enough to be used by professionals as a source for post-editing. Spell checkers, LT-assisted mobile phone input, OCR and lemmatised fulltext search are taken for granted, although their quality is significantly lacking compared to bigger European languages.

Radovan Garabík

Open Access

Chapter 34. Language Report Slovenian

Around 2.5 million people around the world speak or understand Slovene, with the vast majority of them living in the Republic of Slovenia where it is the official language. The constitution grants the right to use their mother tongue to Italian and Hungarian minorities in certain municipalities. In terms of Language Technology, the Slovene CLARIN.SI consortium plays the key role in the community; all major Slovene institutions involved in the development of LT resources, tools and services are members of the consortium. In contrast, the number of private companies in Slovenia specialising in LT for Slovene remains low, and most of the LT products come either from the (Slovene) academic sphere via national or EU funding, or from the big international IT companies that cover a large number of languages.

Simon Krek

Open Access

Chapter 35. Language Report Spanish

Spanish, one of the most spoken languages in the world, is not threatened by globalisation in the way other languages are and is well-supported by big technological companies, albeit still a long way from English. The number of available language resources (text, and to a lesser extent speech) in Spanish is quite large, but there is still a lack of high-quality, well-curated, annotated resources, available under open-access conditions. Initiatives at the national level, such as the Plan de Impulso de las Tecnologías del Lenguaje, have already started to address this gap.

Maite Melero, Pablo Peñarrubia, David Cabestany, Blanca Calvo, Mar Rodríguez, Marta Villegas

Open Access

Chapter 36. Language Report Swedish

Swedish speech and language technology (LT) research goes back over 70 years. This has paid off: there is a national research infrastructure, as well as significant research projects, and Swedish is well-endowed with language resources (LRs) and tools. However, there are gaps that need to be filled, especially in the high-quality gold-standard LRs required by the most recent deep-learning methods. In the future, we would like to see closer collaborations and communication between the “traditional” LT research community and the burgeoning AI field, the establishment of dedicated academic LT training programmes, and national funding for LT research.

Lars Borin, Rickard Domeij, Jens Edlund, Markus Forsberg

Open Access

Chapter 37. Language Report Welsh

In this chapter, based on Prys et al. (2022), an update to the META-NET White Paper (Evas 2014), we present Language Technology (LT) for the Welsh language, providing an overview of the status of Welsh in Wales and a summary of the Welsh writing system and typology. We describe key tools and our recommendations for Welsh LT and associated resource development.

Delyth Prys, Gareth Watkins

European Language Equality: The Future Situation in 2030 and beyond

Frontmatter

Open Access

Chapter 38. Consulting the Community: How to Reach Digital Language Equality in Europe by 2030?

This chapter describes the community consultation process carried out in the European Language Equality (ELE) project concerning the future situation in 2030. Due to its central status for the future-looking activities within the project, this chapter introduces the second part of the present book. We gathered, analysed and structured the views, visions, demands, needs and gaps of European Language Technology (LT) developers, both industry and academia, and European LT users and consumers. Additionally, based on these collected findings and other evidence, we attempted to derive a thorough description of the steps to take to reach Digital Language Equality (DLE) in Europe by the year 2030 and, moreover, what the field of LT will look like in Europe in about ten years from now.

Jan Hajič, Maria Giagkou, Stelios Piperidis, Georg Rehm, Natalia Resende

Open Access

Chapter 39. Results of the Forward-looking Community-wide Consultation

Within the ELE project three complementary online surveys were designed and implemented to consult the Language Technology (LT) community with regard to the current state of play and the future situation in about 2030 in terms of Digital Language Equality (DLE). While Chapters 4 and 38 provide a general overview of the community consultation methodology and the results with regard to the current situation as of 2022, this chapter summarises the results concerning the future situation in 2030. All of these results have been taken into account for the specification of the project’s Strategic Research, Innovation and Implementation Agenda (SRIA) and Roadmap for Achieving Full DLE in Europe by 2030.

Emma Daly, Jane Dunne, Federico Gaspari, Teresa Lynn, Natalia Resende, Andy Way, Maria Giagkou, Stelios Piperidis, Tereza Vojtěchová, Jan Hajič, Annika Grützner-Zahn, Stefanie Hegele, Katrin Marheinecke, Georg Rehm

Open Access

Chapter 40. Deep Dive Machine Translation

Machine Translation (MT) is one of the oldest language technologies, having been researched for more than 70 years. However, it is only during the last decade that it has been widely accepted by the general public, to the point where in many cases it has become an indispensable tool for the global community, supporting communication between nations and lowering language barriers. Still, there remain major gaps in the technology that need addressing before it can be successfully applied in under-resourced settings, understand context and use world knowledge. This chapter provides an overview of the current state-of-the-art in the field of MT, offers technical and scientific forecasting for 2030, and provides recommendations for the advancement of MT as a critical technology if the goal of digital language equality in Europe is to be achieved.

Inguna Skadiņa, Andrejs Vasiḷjevs, Mārcis Pinnis, Aivars Bērziņš, Nora Aranberri, Joachim Van den Bogaert, Sally O’Connor, Mercedes García-Martínez, Iakes Goenaga, Jan Hajič, Manuel Herranz, Christian Lieske, Martin Popel, Maja Popović, Sheila Castilho, Federico Gaspari, Rudolf Rosa, Riccardo Superbo, Andy Way

Open Access

Chapter 41. Deep Dive Speech Technology

This chapter provides an in-depth account of current research activities and applications in the field of Speech Technology (ST). It discusses technical, scientific, commercial and societal aspects in various ST sub-fields and relates ST to the wider areas of Natural Language Processing and Artificial Intelligence. Furthermore, it outlines the breakthroughs needed and the main technology visions, and provides an outlook towards 2030 as well as a broad view of how ST may fit into and contribute to a wider vision of Deep Natural Language Understanding and Digital Language Equality in Europe. The chapter integrates the views of several companies and institutions involved in research and commercial application of ST.

Marcin Skowron, Gerhard Backfried, Eva Navas, Aivars Bērziņš, Joachim Van den Bogaert, Franciska de Jong, Andrea DeMarco, Inma Hernáez, Marek Kováč, Peter Polák, Johan Rohdin, Michael Rosner, Jon Sanchez, Ibon Saratxaga, Petr Schwarz

Open Access

Chapter 42. Deep Dive Text Analytics and Natural Language Understanding

In this chapter, we present a comprehensive overview of text analytics and Natural Language Understanding (NLU) from the perspective of digital language equality (DLE) in Europe. We focus on the research that is currently being undertaken in foundational methods and techniques related to these technologies as well as on the gaps that need to be addressed in order to offer improved text analytics and NLU support across languages. Our analysis includes eight recommendations that address central topics for text analytics and NLU, e.g., the role of language equality for social good, the balance between commercial interests and equal opportunities for society, and incentives for language equality, as well as key technologies like language models and the availability of cross-lingual, cross-modal, and cross-sector datasets and benchmarks.

Jose Manuel Gómez-Pérez, Andrés García-Silva, Cristian Berrio, German Rigau, Aitor Soroa, Christian Lieske, Johannes Hoffart, Felix Sasaki, Daniel Dahlmeier, Inguna Skadiņa, Aivars Bērziņš, Andrejs Vasiḷjevs, Teresa Lynn

Open Access

Chapter 43. Deep Dive Data and Knowledge

This deep dive on data, knowledge graphs (KGs) and language resources (LRs) is the last of the four technology deep dives, as data as well as related models are the basis for technologies and solutions in the area of Language Technology (LT) for European digital language equality (DLE). This chapter focuses on the data and LRs required to achieve full DLE in Europe by 2030. The main components identified – data, KGs, LRs – are explained, and used to analyse the state-of-the-art as well as identify gaps. All of these components need to be tackled in the future, for the widest range of languages possible, from official EU languages to dialects to non-EU languages used in Europe. For all these languages, efficient data collection and sustainable data provision need to be facilitated with fair conditions and costs. Specific technologies, methodologies and tools have been identified to enable the implementation of the vision of DLE by 2030. In addition, data-related business models and data-governance models are discussed, as they are considered a prerequisite for a working data economy that stimulates a vibrant LT landscape that can bring about European DLE.

Martin Kaltenböck, Artem Revenko, Khalid Choukri, Svetla Boytcheva, Christian Lieske, Teresa Lynn, German Rigau, Maria Heuschkel, Aritz Farwell, Gareth Jones, Itziar Aldabe, Ainara Estarrona, Katrin Marheinecke, Stelios Piperidis, Victoria Arranz, Vincent Vandeghinste, Claudia Borg

Open Access

Chapter 44. Strategic Plans and Projects in Language Technology and Artificial Intelligence

This chapter on existing strategic plans and projects in Language Technology and Artificial Intelligence is based on an analysis of around 200 documents and is divided into three sections. The first provides a synopsis of international and European reports on Language Technology. The second constitutes a review of existing European Strategic Research Agendas, initiatives, and national plans related to Language Technology. The third contains a SWOT analysis designed to identify the factors that will need to be addressed to help solve the challenge of digital language inequality in Europe. Among the principal conclusions presented is the contention that our continent requires sophisticated multilingual, cross-lingual and monolingual LT for all European languages: LT for Europe that is made in Europe.

Itziar Aldabe, Aritz Farwell, German Rigau, Georg Rehm, Andy Way

Open Access

Chapter 45. Strategic Research, Innovation and Implementation Agenda for Digital Language Equality in Europe by 2030

This chapter presents the ELE Programme (ELE Consortium 2022). Reacting to the landmark resolution (European Parliament 2018), its vision is to achieve digital language equality in Europe by 2030. The programme was prepared jointly with many stakeholders from the European Language Technology, Natural Language Processing, Computational Linguistics and language-centric AI communities, as well as with representatives of relevant initiatives and associations, and language communities. Europe still suffers from strong inequalities in terms of technology support of its languages. English is still by far the language with the best technological support, followed by a cluster of three languages (German, Spanish, French) that already have only half the technological support of English. More than half of the around 90 languages surveyed have either weak or no technological support at all. The ELE Programme is foreseen to be a shared, long-term funding programme tailored to Europe’s needs, demands and values. For the EU we foresee the role of providing resources for coordinating the programme, for providing shared infrastructures, for maintaining the scientific goals and programme principles, etc. The participating countries have the role of providing resources for the development of technologies and datasets for their own languages. Key goals are to reduce the technology gap between English and all other European languages and to address the lack of available language data. The ELE Programme tackles the following overarching themes: Language Modelling, Data and Knowledge, Machine Translation, Text Understanding and Speech. These interconnected themes focus upon the socio-political goal of establishing DLE in Europe and on the scientific goal of Deep Natural Language Understanding, both by 2030.

Georg Rehm, Andy Way
Metadata
Title
European Language Equality
Edited by
Georg Rehm
Andy Way
Copyright year
2023
Electronic ISBN
978-3-031-28819-7
Print ISBN
978-3-031-28818-0
DOI
https://doi.org/10.1007/978-3-031-28819-7
