2016 | Book

Linguistic Linked Open Data

12th EUROLAN 2015 Summer School and RUMOUR 2015 Workshop, Sibiu, Romania, July 13-25, 2015, Revised Selected Papers

About this book

This book constitutes the refereed proceedings of the 12th EUROLAN Summer School on Linguistic Linked Open Data and its Satellite Workshop on Social Media and the Web of Linked Data, RUMOUR 2015, held in Sibiu, Romania, in July 2015.

The 10 revised full papers presented together with 12 abstracts of tutorials were carefully reviewed and selected from 21 submissions.

Table of Contents

Frontmatter

Ontological Modeling of Social Media Data

Frontmatter
Ontological Modelling of Rumors
Abstract
In this paper, we present ongoing work pursued in the context of the Pheme project, where the detection of rumors in social media plays a central role in two use cases. In order to store and query information on specific types of rumors that can circulate in such media (but also in “classical” media), we started to build ontological models of rumors, disputed claims, misinformation and veracity. As rumors can be considered unverified statements, which after a certain time can be classified either as erroneous information or as facts, there is also a need to model the temporal information associated with any statement. As we are dealing primarily with social media, our modelling work should also cover information diffusion networks and online user behavior, which can also help in classifying a statement as a rumor or a fact. We focus in this paper on the core of our rumor ontology.
Thierry Declerck, Petya Osenova, Georgi Georgiev, Piroska Lendvai
Towards Creating an Ontology of Social Media Texts
Abstract
Texts live around us just as we live around them. At any instant, there are texts that people write, share, use to get informed, etc. (starting with an advertisement heard on the radio every morning and finishing with the contract of sale signed before a notary). Combining this with the concept of economy in language (the principle of least effort), a tendency shared by all humans to minimize the effort needed to achieve the maximum result, it is no wonder that social media, with its short, informal and context-dependent texts, has achieved such high popularity.
Even though texts are so constantly present in our lives (or precisely because of that), the linguistic classification of texts is still debated, and no clear visualization of text types is yet available. Going beyond the classification of texts into species and genres, this paper proposes an ontology which discusses the various text types, focusing on social media texts, and offers a set of properties to describe them.
Andreea Macovei, Oana Gagea, Diana Trandabăţ

Application of Social Media and Linked Data Methodologies in Real-Life Scenarios

Frontmatter
Towards Social Data Analytics for Smart Tourism: A Network Science Perspective
Abstract
In this paper we present our preliminary results on collecting, processing and visualizing relations between user comments posted on Smart Tourism Web sites. The focus is on investigating the user interactions generated by questions and answers containing the users’ impressions and opinions about the attractions offered by various tourism destinations. We propose a prototype system based on the design of a conceptual data model and the development of a data processing workflow that allows capturing, analyzing and querying the implicit social network determined by the relations between user comments, using specialized software tools for graph databases and complex network analytics.
Alex Becheru, Costin Bădică, Mihăiță Antonie
A Mixed Approach in Recognising Geographical Entities in Texts
Abstract
The paper describes an approach for the automatic identification, in Romanian texts, of named entities belonging to the geographical domain. The research is part of a project (MappingBooks) aimed at linking mentions of entities in an e-book with external information, as found in social media, Wikipedia, or web pages containing cultural or touristic information, in order to enhance the reader’s experience. The described named entity recognizer mixes ontological information, as found in public resources, with handwritten symbolic rules. The outputs of the two component modules are compared, and heuristics are used to decide in cases of conflict.
Dan Cristea, Daniela Gîfu, Ionuţ Pistol, Daniel Sfirnaciuc, Mihai Niculiţă

User Profiling and Assessing the Suitability of Content from Social Media

Frontmatter
Image and User Profile-Based Recommendation System
Abstract
A great variety of websites try to help users in finding items of interest by offering a list of recommendations. It has become a function of great importance, especially for online stores. This paper presents a recommendation system for images which works with ratings to compute similarities, and with social profiling to introduce diversity in the list of suggestions.
Cristina Şerban, Lenuţa Alboaie, Adrian Iftene

Extracting and Linking Content

Frontmatter
Discovering Semantic Relations Within Nominals
Abstract
We are interested in developing a technology able to discover entities and the relations connecting them, as expressed in fiction texts. Deciphering these links is a major step in understanding the content of books. In this study we consider the case of imbricated entities, that is, entities realized at the surface text level by imbricated spans. For this research we use the QuoVadis corpus, whose annotation conventions we describe briefly, along with some statistics on the types of relations, features of the relations’ arguments, and words or expressions functioning as triggers. The approach to recognizing the semantic relations is based on patterns extracted from the corpus. The evaluation shows very promising results.
Mihaela Colhon, Dan Cristea, Daniela Gîfu
Quality Improvement Based on Big Data Analysis
Abstract
Big data analysis has become an important trend in computer science. Quality improvement is a constant in current industry trends. In this paper, we present an idea for quality improvement based on big data analysis, with the aid of linked data and ontologies, to be implemented in the case of automotive parts production. We consider defective automotive products and try to find the best refurbishment solution for them given their characteristics. Moreover, we propose to develop a recommender system that is able to give recommendations to prevent or alleviate defects and to provide insights into the possible causes of these defective parts. This study intends to help direct beneficiaries (public consumers, quality engineers, quality control managers), as well as specialists and researchers in NLP, software engineers, etc.
Radu Adrian Ciora, Carmen Mihaela Simion, Marius Cioca
Romanian Dictionaries. Projects of Digitization and Linked Data
Abstract
In the context of globalization and of interest in linked data, Romanian lexicography tries to harmonize with these trends by aligning its resources and adapting to the needs of a diversity of users. The lexicographic tradition of the Romanian language passed through various periods, from glosses and small bilingual dictionaries written in the Slavonic alphabet (17th–19th century) to scholarly dictionaries from the 20th century written in the Latin alphabet. This tradition has been highlighted by different projects, some of them presented in this article, and these projects will continue to emphasize the features of the Romanian language in order to make it accessible to users and to offer the public research materials and resources of Romanian culture.
Mădălin Ionel Patrașcu, Gabriela Haja, Marius Radu Clim, Elena Tamba

Sentiment Analysis in Social Media and Linked Data

Frontmatter
Extracting Features from Social Media Networks Using Semantics
Abstract
This paper focuses on the analysis of social media content generated by social networks (e.g. Twitter) in order to extract semantic features. Using text categorization to sort text feeds into categories of similar feeds has been shown to reduce the overhead required to retrieve these feeds and, at the same time, to provide smaller pools in which further investigation is easier. The aim of this survey is to draw a user profile by analysing his or her tweets. In this early stage of research, being a pre-processing phase, a dictionary-based approach is considered. Moreover, the paper describes an algorithm used in analysing the text and its preliminary results. This paper aims to support research in Social Media exploration. Thus, it describes a tool useful for communication experts to analyse public speeches. So far, this tool has given promising results in inferring socio-political trends from the social media content of public speakers. We also evaluated our experiment using a Support Vector Machine (SVM) with 10-fold cross-validation.
Marius Cioca, Cosmin Cioranu, Radu Adrian Ciora

Social Data Mining to Create Structured Social Media Resources

Frontmatter
Including Social Media – A Very Dynamic Style – in the Corpora for Processing Romanian Language
Abstract
This paper describes the process of introducing a new sub-corpus, in a new style, social media, into our UAIC-Ro-Dependency-Treebank. Our purpose is to enhance the corpus and to include all the styles of the language. Unfortunately, the growth of the corpus is interrelated with the development of the syntactic parser. The inclusion of all the styles is a very difficult target; when parsing texts in a style for which the tools are not yet trained, the accuracy drops significantly. At least 1,000 sentences are needed for the first step of training the parser in a new style. We describe this first step, which involves the introduction of the social media style into the Treebank and the first series of orthographic, stylistic, pragmatic, lexical, semantic, syntactic, and discursive observations on this style of the language, and we report the first statistical evaluation of the automatic annotation.
Cenel-Augusto Perez, Cătălina Mărănduc, Radu Simionescu
Backmatter
Metadata
Title
Linguistic Linked Open Data
Edited by
Diana Trandabăţ
Daniela Gîfu
Copyright year
2016
Electronic ISBN
978-3-319-32942-0
Print ISBN
978-3-319-32941-3
DOI
https://doi.org/10.1007/978-3-319-32942-0
