
2014 | Book

Semantic Web Evaluation Challenge

SemWebEval 2014 at ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers

Edited by: Valentina Presutti, Milan Stankovic, Erik Cambria, Iván Cantador, Angelo Di Iorio, Tommaso Di Noia, Christoph Lange, Diego Reforgiato Recupero, Anna Tordai

Publisher: Springer International Publishing

Book series: Communications in Computer and Information Science


About this book

This book constitutes the thoroughly refereed post-conference proceedings of the first edition of the Semantic Web Evaluation Challenge, SemWebEval 2014, co-located with the 11th Extended Semantic Web Conference, held in Anissaras, Crete, Greece, in May 2014. This book includes the descriptions of all methods and tools that competed at SemWebEval 2014, together with a detailed description of the tasks, evaluation procedures and datasets. The contributions are grouped in three areas: semantic publishing (sempub), concept-level sentiment analysis (ssa), and linked-data enabled recommender systems (recsys).

Table of contents

Frontmatter

Concept Level Sentiment Analysis

Frontmatter
ESWC’14 Challenge on Concept-Level Sentiment Analysis
Abstract
With the introduction of social networks, blogs, wikis, etc., the users’ behavior and their interaction on the Web have changed. As a consequence, people express their opinions and sentiments in a totally different way than in the past. All this information hinders potential business opportunities, especially within the advertising world, and key stakeholders need to catch up with the latest technology if they want to be at the forefront in the market. In practical terms, the automatic analysis of online opinions involves a deep understanding of natural language text, and it has been shown that the use of semantics improves the accuracy of existing sentiment analysis systems based on classical machine learning or statistical approaches. To this end, the Concept-Level Sentiment Analysis challenge aims to provide a push in this direction by offering researchers an event where they can learn new approaches for employing Semantic Web features within their sentiment analysis systems, leading to better performance and higher accuracy. The challenge aims to go beyond a mere word-level analysis of text and provides novel methods to process opinion data from unstructured textual information to structured machine-processable data.
Diego Reforgiato Recupero, Erik Cambria
A Fuzzy System for Concept-Level Sentiment Analysis
Abstract
An emerging field within Sentiment Analysis concerns the investigation about how sentiment concepts have to be adapted with respect to the different domains in which they are used. In the context of the Concept-Level Sentiment Analysis Challenge, we presented a system whose aims are twofold: (i) the implementation of a learning approach able to model fuzzy functions used for building the relationships graph representing the appropriateness between sentiment concepts and different domains (Task 1); and (ii) the development of a semantic resource based on the connection between an extended version of WordNet, SenticNet, and ConceptNet, that has been used both for extracting concepts (Task 2) and for classifying sentences within specific domains (Task 3).
Mauro Dragoni, Andrea G. B. Tettamanzi, Célia da Costa Pereira
Unsupervised Fine-Grained Sentiment Analysis System Using Lexicons and Concepts
Abstract
Sentiment is mainly analyzed at a document, sentence or aspect level. Document or sentence levels could be too coarse since polar opinions can co-occur even within the same sentence. In aspect level sentiment analysis often opinion-bearing terms can convey polar sentiment in different contexts. Consider the following laptop review: “the big plus was a large screen but having a large battery made me change my mind,” where polar opinions co-occur in the same sentence, and the opinion term that describes the opinion targets (“large”) encodes polar sentiments: a positive for screen, and a negative for battery. To parse these differences, our approach is to identify opinions with respect to the specific opinion targets, while taking the context into account. Moreover, considering that there is a problem of obtaining an annotated training set in each context, our approach uses unlabeled data.
Nir Ofek, Lior Rokach
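The laptop-review example in the abstract above can be made concrete with a small sketch. It is not the authors' system: the lexicon below is hand-written and hypothetical (the abstract says their approach works from unlabeled data), and the names `CONTEXT_POLARITY`, `polarity` and `score_pairs` are illustrative assumptions. It only shows why polarity has to be looked up per (opinion term, opinion target) pair rather than per word.

```python
# Hypothetical context-aware lexicon: polarity is keyed by
# (opinion term, opinion target), not by the opinion term alone.
CONTEXT_POLARITY = {
    ("large", "screen"): 1,    # a large screen is a plus
    ("large", "battery"): -1,  # a large battery is a minus
}


def polarity(term, target, default=0):
    """Return the polarity of `term` when it describes `target`."""
    return CONTEXT_POLARITY.get((term, target), default)


def score_pairs(pairs):
    """Map each opinion target to the polarity of the term describing it."""
    return {target: polarity(term, target) for term, target in pairs}


# Pairs as they might be extracted from the review sentence in the abstract.
review_pairs = [("large", "screen"), ("large", "battery")]
scores = score_pairs(review_pairs)
```

A word-level lexicon would have to assign "large" a single polarity and would get one of the two targets wrong.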
Semantic Lexicon Expansion for Concept-Based Aspect-Aware Sentiment Analysis
Abstract
We have developed a prototype for sentiment analysis that is able to identify aspects of an entity being reviewed, along with the sentiment polarity associated with those aspects. Our approach relies on a core ontology of the task, augmented by a workbench for bootstrapping, expanding and maintaining semantic assets that are useful for a number of text analytics tasks. The workbench has the ability to start from classes and instances defined in an ontology and expand their corresponding lexical realizations according to target corpora. In this paper we present results from applying the resulting semantic asset to enhance information extraction techniques for concept-level sentiment analysis. Our prototype (demo at http://bit.ly/1svngDi) is able to perform SemSA’s Elementary Task (Polarity Detection), Advanced Task #1 (Aspect-Based Sentiment Analysis), and Advanced Task #3 (Topic Spotting).
Anni Coden, Dan Gruhl, Neal Lewis, Pablo N. Mendes, Meena Nagarajan, Cartic Ramakrishnan, Steve Welch
Dependency Tree-Based Rules for Concept-Level Aspect-Based Sentiment Analysis
Abstract
Over the last few years, the way people express their opinions has changed dramatically with the progress of social networks, web communities, blogs, wikis, and other online collaborative media. Now, people buy a product and express their opinion in social media so that other people can acquire knowledge about that product before they proceed to buy it. On the other hand, for companies it has become necessary to keep track of public opinion on their products to achieve customer satisfaction. Therefore, nowadays opinion mining is a routine task for every company for developing a widely acceptable product or providing satisfactory service. Concept-based opinion mining is a new area of research. The key parts of this research involve extraction of concepts from the text, determining product aspects, and identifying sentiment associated with these aspects. In this paper, we address each one of these tasks using a novel approach that takes text as input and uses dependency parse tree-based rules to extract concepts and aspects and identify the associated sentiment. On the benchmark datasets, our method outperforms all existing state-of-the-art systems.
Soujanya Poria, Nir Ofek, Alexander Gelbukh, Amir Hussain, Lior Rokach
Sinica Semantic Parser for ESWC’14 Concept-Level Semantic Analysis Challenge
Abstract
We present a semantic parsing system to decompose a sentence into semantic expressions/concepts for the ESWC’14 semantic analysis challenge. The proposed system has a pipeline architecture, and is based on syntactic parsing and semantic role labeling of the candidate sentence. For the former task, we use the Stanford English parser; and for the latter task, we use a semantic role labeling system developed in-house. From the syntactically and semantically annotated sentence, the concepts are formulated using a set of hand-built concept-formulation patterns. We compare the proposed system’s performance to SenticNet with the help of a few examples.
Shafqat Mumtaz Virk, Yann-Huei Lee, Lun-Wei Ku
Polarity Detection of Online Reviews Using Sentiment Concepts: NCU IISR Team at ESWC-14 Challenge on Concept-Level Sentiment Analysis
Abstract
In this paper, we present our system that participated in the Polarity Detection task, the elementary task in the ESWC-14 Challenge on Concept-Level Sentiment Analysis. In addition to traditional Bag-of-Words features, we also employ state-of-the-art Sentic API to extract concepts from documents to generate Bag-of-Sentiment-Concepts features. Our previous work SentiConceptNet serves as the reference concept-based sentiment knowledge base for concept-level sentiment analysis. Experimental results on our development set show that adding Bag-of-Sentiment-Concepts can improve the accuracy by 1.3%, indicating the benefit of concept-level sentiment analysis. Our demo website is located at http://140.115.51.136:5000.
Jay Kuan-Chieh Chung, Chi-En Wu, Richard Tzong-Han Tsai
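To illustrate how Bag-of-Words and Bag-of-Sentiment-Concepts features might sit side by side in one feature map, here is a minimal sketch. It does not call the Sentic API or SentiConceptNet: concept extraction is replaced by a plain substring match against a small hypothetical dictionary, and all function names and polarity values are assumptions made for the example.

```python
from collections import Counter


def bag_of_words(text):
    """Count lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def bag_of_concepts(text, concept_polarity):
    """Keep the (assumed lower-case) multi-word concepts found in the text.

    Each matched concept becomes a feature whose value is its polarity
    in the hypothetical knowledge base `concept_polarity`.
    """
    lowered = text.lower()
    return {
        f"concept:{concept}": value
        for concept, value in concept_polarity.items()
        if concept in lowered
    }


def features(text, concept_polarity):
    """Merge word counts and concept polarities into one feature dict."""
    feats = {f"word:{word}": count for word, count in bag_of_words(text).items()}
    feats.update(bag_of_concepts(text, concept_polarity))
    return feats


# Hypothetical polarity scores for two multi-word concepts.
toy_kb = {"great view": 0.8, "small room": -0.5}
example = features("Small room but great view", toy_kb)
```

The resulting dictionary could be fed to any classifier that accepts sparse features; the abstract does not say which classifier the authors used.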

Semantic Publishing

Frontmatter
Semantic Publishing Challenge – Assessing the Quality of Scientific Output
Abstract
Linked Open Datasets about scholarly publications enable the development and integration of sophisticated end-user services; however, richer datasets are still needed. The first goal of this Challenge was to investigate novel approaches to obtain such semantic data. In particular, we were seeking methods and tools to extract information from scholarly publications, to publish it as LOD, and to use queries over this LOD to assess quality. This year we focused on the quality of workshop proceedings, and of journal articles w.r.t. their citation network. A third, open task, asked to showcase how such semantic data could be exploited and how Semantic Web technologies could help in this emerging context.
Christoph Lange, Angelo Di Iorio
ROHub — A Digital Library of Research Objects Supporting Scientists Towards Reproducible Science
Abstract
Research Objects (ROs) are semantic aggregations of related scientific resources, their annotations and research context. They are meant to help scientists to refer to all the materials supporting their investigation. ROHub is a digital library system for ROs that supports their storage, lifecycle management and preservation. It provides a Web interface and a set of RESTful APIs enabling the sharing of scientific findings via ROs. Additionally, ROHub includes different features that help scientists throughout the research lifecycle to create and maintain high-quality ROs that can be interpreted and reproduced in the future. For instance, scientists can assess the conformance of an RO to a set of predefined requirements and create RO Snapshots, at any moment, to share, cite or submit to review the current state of research outcomes. ROHub can also generate nested ROs for workflow runs, exposing their content and annotations, and includes monitoring features that generate notifications when changes are detected.
Raúl Palma, Piotr Hołubowicz, Oscar Corcho, José Manuel Gómez-Pérez, Cezary Mazurek
Semantify CEUR-WS Proceedings: Towards the Automatic Generation of Highly Descriptive Scholarly Publishing Linked Datasets
Abstract
Rich and fine-grained semantic information describing varied aspects of scientific productions is essential to support their diffusion as well as to properly assess the quality of their output. To foster this trend, in the context of the ESWC2014 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings. Proceedings are analyzed through a sequence of processing phases. SVM classifiers complemented by heuristics are used to annotate missing CEUR-WS markups. Annotations are then linked to external datasets like DBpedia and Bibsonomy. Finally, the data is modeled and published as an RDF graph. Our system is provided as an on-line Web service to support on-the-fly RDF generation. In this paper we describe the system and present its evaluation following the procedure set by the organizers of the challenge.
Francesco Ronzano, Gerard Casamayor del Bosque, Horacio Saggion
A Template-Based Information Extraction from Web Sites with Unstable Markup
Abstract
This paper presents the results of work on crawling the CEUR Workshop Proceedings web site (http://ceur-ws.org) into a Linked Open Data (LOD) dataset in the framework of the ESWC 2014 Semantic Publishing Challenge (http://2014.eswc-conferences.org/semantic-publishing-challenge). Our approach is based on using an extensible template-dependent crawler and DBpedia for linking extracted entities, such as the names of universities and countries.
Maxim Kolchin, Fedor Kozlov
Linkitup: Semantic Publishing of Research Data
Abstract
Linkitup is a Web-based dashboard for enrichment of research output published via industry grade data repository services. It takes metadata entered through Figshare.com and tries to find equivalent terms, categories, persons or entities on the Linked Data cloud and several Web 2.0 services. It extracts references from publications, and tries to find the corresponding Digital Object Identifier (DOI). Linkitup feeds the enriched metadata back as links to the original article in the repository, but also builds a RDF representation of the metadata that can be downloaded separately, or published as research output in its own right. In this paper, we compare Linkitup to the standard workflow of publishing linked data, and show that it significantly lowers the threshold for publishing linked research data.
Rinke Hoekstra, Paul Groth, Marat Charlaganov
Understanding Research Dynamics
Abstract
Rexplore leverages novel solutions in data mining, semantic technologies and visual analytics, and provides an innovative environment for exploring and making sense of scholarly data. Rexplore allows users: (1) to detect and make sense of important trends in research; (2) to identify a variety of interesting relations between researchers, beyond the standard co-authorship relations provided by most other systems; (3) to perform fine-grained expert search with respect to detailed multi-dimensional parameters; (4) to detect and characterize the dynamics of interesting communities of researchers, identified on the basis of shared research interests and scientific trajectories; (5) to analyse research performance at different levels of abstraction, including individual researchers, organizations, countries, and research communities.
Francesco Osborne, Enrico Motta
Semantic Facets for Scientific Information Retrieval
Abstract
We present an Information Retrieval System for scientific publications that provides the possibility to filter results according to semantic facets. We use sentence-level semantic annotations that identify specific semantic relations in texts, such as methods, definitions, hypotheses, that correspond to common information needs related to scientific literature. The semantic annotations are obtained using a rule-based method that identifies linguistic clues organized into a linguistic ontology. The system is implemented using Solr Search Server and offers efficient search and navigation in scientific papers.
Iana Atanassova, Marc Bertin
Extraction and Semantic Annotation of Workshop Proceedings in HTML Using RML
Abstract
Despite the significant number of existing tools, incorporating data into the Linked Open Data cloud remains complicated, which discourages data owners from publishing their data as Linked Data. Unlocking the semantics of published data, even if they are not provided by the data owners, can help to overcome the barriers posed by the low availability of Linked Data and bring us closer to the realisation of the envisaged Semantic Web. RML, a generic mapping language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF, offers a uniform way of defining the mapping rules for data in heterogeneous formats. In this paper, we present how we adjusted our prototype RML Processor, taking advantage of RML’s scalability, to extract and map data of workshop proceedings published in HTML to the RDF data model for the Semantic Publishing Challenge needs.
Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Laurens De Vocht, Ruben Verborgh, Erik Mannens, Rik Van de Walle
Extraction and Characterization of Citations in Scientific Papers
Abstract
We propose a hybrid method for the extraction and characterization of citations in scientific papers using machine learning combined with rule-based approaches. Our protocol consists of the extraction of metadata, bibliography parsing, section title processing, and fine-grained semantic annotation at the sentence level of texts. This allows us to generate Linked Open Data from a set of research papers in XML.
Marc Bertin, Iana Atanassova

Linked-Data Enabled Recommender Systems

Frontmatter
Linked Open Data-Enabled Recommender Systems: ESWC 2014 Challenge on Book Recommendation
Abstract
In this chapter we present a report of the ESWC 2014 Challenge on Linked Open Data-enabled Recommender Systems, which consisted of three tasks in the context of book recommendation: rating prediction in cold-start situations, top N recommendations from binary user feedback, and diversity in content-based recommendations. Participants were requested to address the tasks by means of recommendation approaches that made use of Linked Open Data and semantic technologies. In the chapter we describe the challenge motivation, goals and tasks, summarize and compare the nine final participant recommendation approaches, and discuss their experimental results and lessons learned. Finally, we end with some conclusions and potential lines of future research.
Tommaso Di Noia, Iván Cantador, Vito Claudio Ostuni
Hybrid Recommending Exploiting Multiple DBPedia Language Editions
Abstract
In this paper we describe the approach of our SemWex group to the ESWC 2014 RecSys Challenge. Our method is based on an adaptation of Content Boosted Matrix Factorization [1], where objects are defined through their content-based features. Features comprised both direct DBPedia RDF triples and derived semantic information (with some WIE and NLP features). A total of seven DBPedia language editions were used to form the dataset. In the paper we further describe our methods for semantic information creation, data filtration, algorithm details and settings, as well as decisions made during the challenge and dead ends we explored.
Ladislav Peska, Peter Vojtas
A Hybrid Multi-strategy Recommender System Using Linked Open Data
Abstract
In this paper, we discuss the development of a hybrid multi-strategy book recommendation system using Linked Open Data. Our approach builds on training individual base recommenders and using global popularity scores as generic recommenders. The results of the individual recommenders are combined using stacking regression and rank aggregation. We show that this approach delivers very good results in different recommendation settings and also allows for incorporating diversity of recommendations.
Petar Ristoski, Eneldo Loza Mencía, Heiko Paulheim
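The abstract above mentions rank aggregation without naming a specific scheme. As one common choice, and purely as an assumption for illustration, the sketch below uses a Borda count to merge the ranked lists of several base recommenders; the function name and the toy rankings are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several ranked lists with a Borda count.

    In a list of length n, the item at position 0 earns n points,
    the next n - 1, and so on. Items are returned by descending total
    score, with ties broken alphabetically for determinism.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical base recommenders ranking the same three books.
base_rankings = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["b", "c", "a"],
]
merged = borda_aggregate(base_rankings)
```

Here "b" collects 8 points, "a" 6 and "c" 4, so the merged order is b, a, c.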
Exploring Semantic Features for Producing Top-N Recommendation Lists from Binary User Feedback
Abstract
In this paper, we report the experiments that we conducted for two of the tasks of the ESWC’14 Challenge on Linked Open Data (LOD)-enabled Recommender Systems. Task 2 and Task 3 dealt with the top-N recommendation problem from a binary user feedback dataset and results were evaluated on the accuracy and diversity respectively of the recommendations produced in a Top-N recommendation list for each user. The DBbook dataset was used in both tracks in which the books had been mapped to their corresponding DBpedia URIs. Since the mappings could be used to extract semantic features from DBpedia, in all our experiments, we avoided the use of any collaborative filtering methods (e.g. user/item K-nearest neighbors and matrix factorization approaches) and instead focused exclusively on the semantic features of the items. Even though the performance of our methods did not beat the best performing approaches of other teams, our results indicate that it is indeed feasible to create effective recommender systems which fully utilize the content of the items they deal with by utilizing information from the Semantic Web.
Nicholas Ampazis, Theodoros Emmanouilidis
Content-Based Recommender Systems + DBpedia Knowledge = Semantics-Aware Recommender Systems
Abstract
This paper provides an overview of the work done in the ESWC Linked Open Data-enabled Recommender Systems challenge, in which we proposed an ensemble of algorithms based on popularity, Vector Space Model, Random Forests, Logistic Regression, and PageRank, running on a diverse set of semantic features. We ranked 1st in the top-N recommendation task, and 3rd in the tasks of rating prediction and diversity.
Pierpaolo Basile, Cataldo Musto, Marco de Gemmis, Pasquale Lops, Fedelucio Narducci, Giovanni Semeraro
SemStim at the LOD-RecSys 2014 Challenge
Abstract
SemStim is a graph-based recommendation algorithm which is based on Spreading Activation and adds targeted activation and duration constraints. SemStim is not affected by data sparsity, the cold-start problem or data quality issues beyond the linking of items to DBpedia. The overall results show that the performance of SemStim for the diversity task of the challenge is comparable to the other participants, as it took 3rd place out of 12 participants with 0.0413 F1@20 and 0.476 ILD@20. In addition, as SemStim has been designed for the requirements of cross-domain recommendations with different target and source domains, this shows that SemStim can also provide competitive single-domain recommendations.
Benjamin Heitmann, Conor Hayes
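For readers unfamiliar with Spreading Activation, the following is a minimal sketch of the basic technique only. It does not implement SemStim's targeted activation or duration constraints, and the graph, the `decay` and `steps` parameters and the function name are assumptions chosen for the example.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Basic spreading activation over a directed adjacency-list graph.

    Seed nodes start with activation 1.0. At each step, every node in the
    current frontier passes `decay` times its energy, split evenly,
    to its neighbours. Returns the accumulated activation per node.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Hypothetical DBpedia-like neighbourhood of a book the user liked.
toy_graph = {
    "book_a": ["author_x", "genre_y"],
    "author_x": ["book_b"],
    "genre_y": ["book_b", "book_c"],
}
result = spread_activation(toy_graph, ["book_a"])
```

In this toy graph "book_b" is reachable through both the author and the genre, so it ends up with more activation than "book_c" and would be ranked higher.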
Popular Books and Linked Data: Some Results for the ESWC’14 RecSys Challenge
Abstract
Within this paper we present our contribution to Task 2 of the ESWC’14 Recommender Systems Challenge. First we describe an unpersonalized baseline approach that uses no linked-data but applies a naive way to compute the overall popularity of the items observed in the training data. Despite being very simple and unpersonalized, we achieve a competitive \(F_1\) measure of 0.5583. Then we describe an algorithm that makes use of several features acquired from DBpedia, like author and type, and self-generated features like abstract-based keywords, for item representation and comparison. Item recommendations are generated by a mixture-model of individual classifiers that have been learned per feature on a user neighborhood cluster in combination with a global classifier learned on all training data. While our Linked-Data-based approach achieves an \(F_1\) measure of 0.5649, the increase over the popularity baseline remains surprisingly low.
Michael Schuhmacher, Christian Meilicke
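A popularity baseline of the kind described in the abstract above can be sketched in a few lines. The data layout (triples of user, item and a binary "liked" flag) and the function name are assumptions, and filtering out items the user has already rated is an extra step added here for illustration; the abstract does not say whether the authors did this.

```python
from collections import Counter


def popularity_top_n(training, user, n=20):
    """Recommend the globally most-liked items the user has not rated yet.

    `training` is a list of (user, item, liked) triples with liked in {0, 1}.
    Ties in popularity are broken alphabetically for determinism.
    """
    popularity = Counter(item for _, item, liked in training if liked)
    seen = {item for rated_by, item, _ in training if rated_by == user}
    ranked = sorted(popularity, key=lambda item: (-popularity[item], item))
    return [item for item in ranked if item not in seen][:n]


# Hypothetical binary feedback data.
toy_training = [
    ("u1", "b1", 1),
    ("u2", "b1", 1),
    ("u2", "b2", 1),
    ("u3", "b3", 0),
    ("u3", "b2", 1),
    ("u1", "b4", 1),
]
recs_u1 = popularity_top_n(toy_training, "u1", n=2)
recs_u3 = popularity_top_n(toy_training, "u3", n=2)
```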
A Semantic Pattern-Based Recommender
Abstract
This paper presents a novel approach for Linked Data-based recommender systems through the use of semantic patterns: generalized paths in a graph described through the types of the nodes and links involved. We apply this novel approach to the book dataset from the ESWC2014 recommender systems challenge. User profiles are built by aggregating ratings on patterns with respect to each book in the provided user training set. Ratings are aggregated by estimating the expected value of a Beta distribution describing the rating given to each individual book. Our approach allows the determination of a rating for a book, even if the book is poorly connected with the user profile. It allows for a “prudent” estimation thanks to smoothing. However, if many patterns are available, it considers all the contributions. Additionally, it allows for a lightweight computation of ratings as it exploits the knowledge encoded in the patterns. Our approach achieved a precision of 0.60 and an overall F-measure of about 0.52 on the ESWC2014 challenge.
Valentina Maccatrozzo, Davide Ceolin, Lora Aroyo, Paul Groth
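The "prudent" estimate mentioned in the abstract above can be illustrated with the expected value of a Beta distribution, which for parameters α and β is α / (α + β). The sketch below assumes a uniform Beta(1, 1) prior and treats pattern evidence as simple positive and negative counts; both choices, and the function name, are assumptions and may differ from the authors' exact formulation.

```python
def beta_expected_rating(positive, negative, prior_alpha=1.0, prior_beta=1.0):
    """Expected value of a Beta posterior over the probability of a like.

    The prior acts as smoothing: with no evidence the estimate is the
    prior mean, and it moves towards the observed ratio as evidence grows.
    """
    alpha = prior_alpha + positive
    beta = prior_beta + negative
    return alpha / (alpha + beta)


no_evidence = beta_expected_rating(0, 0)       # prior mean
little_evidence = beta_expected_rating(3, 1)   # 4 / 6
much_evidence = beta_expected_rating(30, 10)   # 31 / 42
```

With the same 3:1 ratio of positive to negative evidence, the estimate stays closer to 0.5 when only four observations are available and approaches 0.75 as the counts grow.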
Increasing Top-20 Diversity Through Recommendation Post-processing
Abstract
This paper presents two different methods for diversifying recommendations that were developed as part of the ESWC2014 challenge. Both methods focus on post-processing recommendations provided by the baseline recommender system and have increased the ILD at the cost of final precision (measured with F@20). The authors feel that this method has potential yet requires further development and testing.
Matevž Kunaver, Tomaž Požrl, Štefan Dobravec, Uroš Droftina, Andrej Košir
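The ILD (intra-list diversity) measure referred to above is commonly computed as the average pairwise distance between the items of a recommendation list. The sketch below uses Jaccard distance over item feature sets; the distance function, the feature sets and the function names are assumptions, since the abstract does not specify how the challenge computed distances.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items, features):
    """Average pairwise distance between the items of one list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
    return total / len(pairs)


# Hypothetical genre features for three books.
toy_features = {
    "b1": {"fantasy"},
    "b2": {"fantasy"},
    "b3": {"history"},
}
same_genre = intra_list_diversity(["b1", "b2"], toy_features)
mixed = intra_list_diversity(["b1", "b2", "b3"], toy_features)
```

A post-processing step that swaps a near-duplicate item for a dissimilar one raises this score, typically at some cost in accuracy, which matches the trade-off the abstract reports.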
Hybrid Model Rating Prediction with Linked Open Data for Recommender Systems
Abstract
We detail the solution of team uniandes1 to the ESWC 2014 Linked Open Data-enabled Recommender Systems Challenge Task 1 (rating prediction on a cold start situation). In these situations, there are few ratings per item and user and thus collaborative filtering techniques may not be suitable. In order to be able to use a content-based solution, linked-open data from DBPedia was used to obtain a set of descriptive features for each item. We compare the performance (measured as RMSE) of three models on this cold-start situation: content-based (using min-count sketches), collaborative filtering (SVD++) and rule-based switched hybrid models. Experimental results show that the hybrid system outperforms each of the models that compose it. Since features taken from DBPedia were sparse, we clustered items in order to reduce the dimensionality of the item and user profiles.
Andrés Moreno, Christian Ariza-Porras, Paula Lago, Claudia Lucía Jiménez-Guarín, Harold Castro, Michel Riveill
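As a rough illustration of the evaluation measure and of the idea of a switched hybrid, the sketch below computes RMSE and picks between two predictions using a single rule. The rule (fall back to the content-based prediction when a user has fewer than `min_ratings` ratings) and its threshold are assumptions; the abstract does not describe the actual switching rules.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def switched_hybrid(cf_prediction, content_prediction, n_ratings, min_ratings=5):
    """Use collaborative filtering only when the user has enough ratings."""
    if n_ratings >= min_ratings:
        return cf_prediction
    return content_prediction


error = rmse([1, 5], [3, 3])               # both errors have magnitude 2
cold_user = switched_hybrid(4.2, 3.1, n_ratings=1)
warm_user = switched_hybrid(4.2, 3.1, n_ratings=12)
```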
Deep Learning of Semantic Word Representations to Implement a Content-Based Recommender for the RecSys Challenge’14
Abstract
In this paper, we discuss a recommender system that exploits the semantic regularities captured by a Recurrent Neural Network (RNN) in text documents. Many information retrieval systems treat words as binary vectors under the classic bag-of-words model; however, there is no notion of semantic similarity between words when describing a document in the resulting feature space. Recent advances in neural networks have shown that continuous word vectors can be learned as a probability distribution over the words of a document [3, 4]. Surprisingly, researchers have found that algebraic operations on this new representation capture semantic regularities in language [5]. For example, \(Intel + Pentium - Google\) results in word vectors associated to \(\{Search, Android, Phones\}\).
We used this deep learning approach to discover the continuous features describing the content of documents with vectors of semantic words, and fitted a linear regression model to approximate user preferences for documents. Our submission to the RecSys Challenge’14 obtained an RMSE of \(0.902\) and ranked 6th for Task 1. Interestingly enough, our approach provided better vector representations for modeling the content of book abstracts than LDA, LSA, and PCA, which are well-known techniques currently used to implement content-based recommender systems in the recommendation community.
Omar U. Florez
Backmatter
Metadata
Title
Semantic Web Evaluation Challenge
Edited by
Valentina Presutti
Milan Stankovic
Erik Cambria
Iván Cantador
Angelo Di Iorio
Tommaso Di Noia
Christoph Lange
Diego Reforgiato Recupero
Anna Tordai
Copyright year
2014
Electronic ISBN
978-3-319-12024-9
Print ISBN
978-3-319-12023-2
DOI
https://doi.org/10.1007/978-3-319-12024-9
