
About this Book

This book constitutes the refereed proceedings of the 17th International Conference on Web Engineering, ICWE 2017, held in Rome, Italy, in June 2017.
The 20 full research papers and 12 short papers, presented together with 6 application papers, 6 demonstration papers, and 6 contributions to the PhD Symposium, were carefully reviewed and selected from 139 submissions. The papers cover research areas such as Web application modeling and engineering, human computation and crowdsourcing applications, Web application composition and mashups, Social Web applications, Semantic Web applications, Web of Things applications, and big data.



Technical Research Papers


Evaluating Knowledge Anchors in Data Graphs Against Basic Level Objects

The growing number of available data graphs in the form of RDF Linked Data enables the development of semantic exploration applications in many domains. Often, the users are not domain experts and are therefore unaware of the complex knowledge structures represented in the data graphs they interact with. This hinders users’ experience and effectiveness. Our research concerns intelligent support to facilitate the exploration of data graphs by users who are not domain experts. We propose a new navigation support approach underpinned by the subsumption theory of meaningful learning, which postulates that new concepts are grasped by starting from familiar concepts which serve as knowledge anchors from where links to new knowledge are made. Our earlier work has developed several metrics and the corresponding algorithms for identifying knowledge anchors in data graphs. In this paper, we assess the performance of these algorithms by considering the user perspective and application context. The paper addresses the challenge of aligning basic level objects that represent familiar concepts in human cognitive structures with automatically derived knowledge anchors in data graphs. We present a systematic approach that adapts experimental methods from Cognitive Science to derive basic level objects underpinned by a data graph. This is used to evaluate knowledge anchors in data graphs in two application domains: semantic browsing (Music) and semantic search (Careers). The evaluation validates the algorithms, enabling their adoption across different domains and application contexts.

Marwan Al-Tawil, Vania Dimitrova, Dhavalkumar Thakker, Alexandra Poulovassilis

Decentralized Evolution and Consolidation of RDF Graphs

The World Wide Web and the Semantic Web are designed as a network of distributed services and datasets. In this network and its genesis, collaboration played and still plays a crucial role. But currently we only have central collaboration solutions for RDF data, such as SPARQL endpoints and wiki systems, while decentralized solutions can enable applications for many more use cases. Inspired by a successful distributed source code management methodology in software engineering, we propose a framework to support distributed evolution. The system is based on Git and provides distributed collaboration on RDF graphs. This paper covers the formal expression of the evolution and consolidation of distributed datasets, the synchronization, as well as other supporting operations.

Natanael Arndt, Michael Martin

The BigDataEurope Platform – Supporting the Variety Dimension of Big Data

The management and analysis of large-scale datasets – described with the term Big Data – involves the three classic dimensions volume, velocity and variety. While the former two are well supported by a plethora of software components, the variety dimension is still rather neglected. We present the BDE platform – an easy-to-deploy, easy-to-use and adaptable (cluster-based and standalone) platform for the execution of big data components and tools like Hadoop, Spark, Flink, Flume and Cassandra. The BDE platform was designed based upon the requirements gathered from seven of the societal challenges put forward by the European Commission in the Horizon 2020 programme and targeted by the BigDataEurope pilots. As a result, the BDE platform supports a variety of Big Data flow tasks such as message passing, storage, analysis and publishing. To facilitate the processing of heterogeneous data, a particular innovation of the platform is the Semantic Layer, which allows RDF data to be processed directly and arbitrary data to be mapped and transformed into RDF. The advantages of the BDE platform are demonstrated through seven pilots, each focusing on a major societal challenge.

Sören Auer, Simon Scerri, Aad Versteden, Erika Pauwels, Angelos Charalambidis, Stasinos Konstantopoulos, Jens Lehmann, Hajira Jabeen, Ivan Ermilov, Gezim Sejdiu, Andreas Ikonomopoulos, Spyros Andronopoulos, Mandy Vlachogiannis, Charalambos Pappas, Athanasios Davettas, Iraklis A. Klampanos, Efstathios Grigoropoulos, Vangelis Karkaletsis, Victor de Boer, Ronald Siebes, Mohamed Nadjib Mami, Sergio Albani, Michele Lazzarini, Paulo Nunes, Emanuele Angiuli, Nikiforos Pittaras, George Giannakopoulos, Giorgos Argyriou, George Stamoulis, George Papadakis, Manolis Koubarakis, Pythagoras Karampiperis, Axel-Cyrille Ngonga Ngomo, Maria-Esther Vidal

Spatially Cohesive Service Discovery and Dynamic Service Handover for Distributed IoT Environments

The proliferation of the Internet of Things (IoT) enables the provision of diverse services that utilize IoT resources distributed in ad-hoc network environments. This has resulted in a new challenge, the issue of how to efficiently and dynamically discover appropriate IoT services that are necessary to accomplish a user task in the vicinity of the user. In this paper, we propose a service discovery method that finds IoT services from a user’s surrounding environment in a spatially cohesive manner so that the interactions among the services can be efficiently carried out, and the outcome of service coordination can be effectively delivered to the user. In addition, to ensure a certain Quality of user Experience (QoE) level for the user task, we develop a service handover approach that dynamically switches from one IoT resource to an alternative one to provide services in a stable manner when the degradation of the spatial cohesiveness of the services is monitored. The spatio-cohesive service discovery and dynamic service handover algorithms are evaluated by simulating a mobile ad-hoc network (MANET) based IoT environment. Then, various service discovery strategies are implemented on this simulation environment, and several options for the service discovery and handover algorithms are tested. The simulation results show that compared to various baseline approaches, the proposed approach results in a significant improvement in the spatial cohesiveness of the services discovered for user tasks. The results also show that the approach efficiently adapts to dynamically changing distributed IoT environments.

Kyeong-Deok Baek, In-Young Ko
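The abstract above leaves the cohesiveness metric abstract. As an illustration only (the paper's exact definition is not given here), spatial cohesiveness of a discovered service set can be instantiated as the average pairwise distance between the services' locations, with lower values meaning a more cohesive set:

```python
import math

def spatial_cohesiveness(positions):
    """Average Euclidean distance over all pairs of service positions.

    Lower value = more spatially cohesive service set. This is one
    plausible instantiation of the metric, not the paper's formula.
    """
    pairs = [(p, q) for i, p in enumerate(positions) for q in positions[i + 1:]]
    if not pairs:
        return 0.0  # a single service is trivially cohesive
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)
```

A handover policy could then monitor this value for the active service set and switch to an alternative IoT resource when it degrades past a threshold.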

ALMOsT.js: An Agile Model to Model and Model to Text Transformation Framework

Model Driven Development (MDD) requires model-to-model and/or model-to-text transformations to produce application code from high level descriptions. Creating such transformations is in itself a complex task, which requires mastering meta-modeling, ad hoc transformation languages, and custom development tools. This paper presents ALMOsT.js, an agile, in-browser framework for the rapid prototyping of MDD transformations, which lowers the technical skills required for Web and mobile developers to become proficient with modeling and code generation. ALMOsT.js is shown at work in the creation of a browser-based, online/offline environment for the MDD specification and rapid prototyping of web and mobile applications.

Carlo Bernaschina

A Big Data Analysis Framework for Model-Based Web User Behavior Analytics

While basic Web analytics tools are widespread and provide statistics about website navigation, no approaches exist for merging such statistics with information about the Web application structure, content and semantics. Current analytics tools only analyze the user interaction at page level in terms of page views, entry and landing page, page views per visit, and so on. We show the advantages of combining Web application models with runtime navigation logs, with the aim of deepening the understanding of user behaviour. We propose a model-driven approach that combines user interaction modeling (based on the IFML standard), full code generation of the designed application, user tracking at runtime through logging of runtime component execution and user activities, integration with page content details, generation of integrated schema-less data streams, and application of large-scale analytics and visualization tools for big data, applying both traditional data visualization techniques and direct representation of statistics on visual models of the Web application.

Carlo Bernaschina, Marco Brambilla, Andrea Mauri, Eric Umuhoza

From Search Engines to Augmented Search Services: An End-User Development Approach

The World Wide Web is a vast and continuously changing source of information where searching is a frequent, and sometimes critical, user task. Searching is not always the user’s primary goal but an ancillary task performed to find complementary information needed to complete another task. In this paper, we explore primary and/or ancillary search tasks and propose an approach for simplifying the user interaction during search tasks. Rather than focusing on dedicated search engines, our approach allows the user to abstract search engines already provided by Web applications into pervasive search services that will be available for performing searches from any other Web site. We also propose allowing users to manage the way in which the search results are presented and some possible interactions. In order to illustrate the feasibility of this approach, we have built a support tool based on a plug-in architecture that allows users to integrate new search services (created by themselves by means of visual tools) and execute them in the context of both kinds of searches. A case study illustrates the use of such a tool. We also present the results of two evaluations that demonstrate the feasibility of the approach and the benefits of its use.

Gabriela Bosetti, Sergio Firmenich, Alejandro Fernandez, Marco Winckler, Gustavo Rossi

Temporal Analysis of Social Media Response to Live Events: The Milano Fashion Week

Social media response to catastrophic events, such as natural disasters or terrorist attacks, has received a lot of attention. However, social media are also extremely important in the context of planned events, such as fairs, exhibits, festivals, as they play an essential role in communicating them to fans, interest groups, and the general population. These kinds of events are geo-localized within a city or territory and are scheduled within a public calendar. We consider a specific scenario, the Milano Fashion Week (MFW), which is an important event in our city. We focus our attention on the spreading of social content in time, measuring the delay of the event propagation. We build different clusters of stakeholders (fashion brands), characterize several features of time propagation, and correlate them with the popularity of the involved actors. We show that the clusters by time and popularity are loosely correlated, and therefore the time response cannot be easily inferred. This motivates the development of a predictor through supervised learning in order to anticipate the space cluster of a new brand.

Marco Brambilla, Stefano Ceri, Florian Daniel, Gianmarco Donetti

Trading Off Popularity for Diversity in the Results Sets of Keyword Queries on Linked Data

Keyword search is the most popular technique for querying the ever-growing repositories of RDF graph data on the Web. However, keyword queries are ambiguous. As a consequence, they typically produce on linked data a huge number of candidate results corresponding to a plethora of alternative query interpretations. Current approaches ignore the diversity of the result interpretations and might fail to satisfy the users who are looking for less popular results. In this paper, we propose a novel approach for keyword search result diversification on RDF graphs. Our approach, instead of diversifying the query results per se, diversifies the interpretations of the query (i.e., pattern graphs). We model the problem as an optimization problem aiming at selecting k pattern graphs which maximize an objective function balancing relevance and diversity. We devise metrics to assess the relevance and diversity of a set of pattern graphs, and we design a greedy heuristic algorithm to generate a relevant and diverse list of k pattern graphs for a given keyword query. The experimental results show the effectiveness of our approach and proposed metrics and also the efficiency of our algorithm.

Ananya Dass, Dimitri Theodoratos
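The greedy relevance–diversity trade-off described above can be sketched as follows. This is a generic illustration, not the authors' exact objective: the linear combination with weight `lam` and the use of minimum distance to already-selected items as the diversity term are assumptions.

```python
def greedy_diverse_top_k(candidates, relevance, distance, k, lam=0.5):
    """Greedily pick k items maximizing a relevance/diversity trade-off.

    Illustrative objective: lam * relevance(c) + (1 - lam) * min distance
    from c to the items already selected (higher distance = more diverse).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            # Diversity bonus: distance to the closest already-picked item.
            div = min((distance(c, s) for s in selected), default=1.0)
            return lam * relevance[c] + (1 - lam) * div
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two highly relevant but near-duplicate interpretations, this scheme picks the best one and then prefers a less relevant but dissimilar alternative over the near-duplicate.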

The Qanary Ecosystem: Getting New Insights by Composing Question Answering Pipelines

The field of Question Answering (QA) is very multi-disciplinary as it requires expertise from a large number of areas such as natural language processing (NLP), artificial intelligence, machine learning, information retrieval, speech recognition and semantic technologies. In recent years, a large number of QA systems were proposed using approaches from different fields and focusing on particular tasks in the QA process. Unfortunately, most of these systems cannot be easily reused or extended, and their results cannot be easily reproduced, since the systems are mostly implemented in a monolithic fashion, lack standardized interfaces and are often not open source or available as Web services. To address these issues we developed the knowledge-based Qanary methodology for choreographing QA pipelines distributed over the Web. Qanary employs the qa vocabulary as an exchange format for typical QA components. As a result, QA systems can be built using the Qanary methodology in a simpler, more flexible and standardized way while becoming knowledge-driven instead of being process-oriented. This paper presents the components and services that are integrated using the qa vocabulary and the Qanary methodology within the Qanary ecosystem. Moreover, we show how the Qanary ecosystem can be used to analyse QA processes to detect weaknesses and research gaps. We illustrate this by focusing on the Entity Linking (EL) task w.r.t. textual natural language input, which is a fundamental step in most QA processes. Additionally, we contribute the first EL benchmark for QA, as open source. Our main goal is to show how the research community can use Qanary to gain new insights into QA processes.

Dennis Diefenbach, Kuldeep Singh, Andreas Both, Didier Cherix, Christoph Lange, Sören Auer

Improving Reliability of Crowdsourced Results by Detecting Crowd Workers with Multiple Identities

Quality control in crowdsourcing marketplaces plays a vital role in ensuring useful outcomes. In this paper, we focus on tackling the issue of crowd workers participating in tasks multiple times using different worker-ids to maximize their earnings. Workers attempting to complete the same task repeatedly may not be harmful in cases where the aim of a requester is to gather data or annotations, wherein more contributions from a single worker are fruitful. However, in several cases where the outcomes are subjective, requesters prefer the participation of distinct crowd workers. We show that traditional means to identify unique crowd workers such as worker-ids and IP addresses are not sufficient. To overcome this problem, we propose the use of browser fingerprinting in order to ascertain the unique identities of crowd workers in paid crowdsourcing microtasks. By using browser fingerprinting across 8 different crowdsourced tasks with varying task difficulty, we found that 6.18% of crowd workers participate in the same task more than once, using different worker-ids to avoid detection. Moreover, nearly 95% of such workers in our experiments pass gold-standard questions and are deemed to be trustworthy, significantly biasing the results thus produced.

Ujwal Gadiraju, Ricardo Kawase
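The core idea of browser fingerprinting can be sketched in a few lines: hash a stable set of browser and device attributes into a single identifier that is independent of the self-reported worker-id. The attribute names below (user agent, screen resolution, timezone) are typical fingerprinting signals chosen for illustration; the paper's actual feature set may differ.

```python
import hashlib

def browser_fingerprint(attributes):
    """Hash a set of browser/device attributes into one stable identifier.

    Sorting the keys makes the fingerprint independent of the order
    in which the attributes were collected.
    """
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two submissions with different worker-ids but identical fingerprints are then flagged as likely coming from the same worker.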

Maturity Model for Liquid Web Architectures

Liquid Web applications adapt to the set of connected devices and flow seamlessly between them following the user attention. As opposed to traditional centralised architectures, in which data and logic of the application resides entirely on a Web server, Liquid software needs decentralised or distributed architectures in order to achieve seamless application mobility between clients. By decomposing Web application architectures into layers, following the Model View Controller design pattern, we define a maturity model for Web application architectures evolving from classical solid applications deployed on single devices, to fully liquid applications deployed across multiple Web-enabled devices. The maturity model defines different levels based on where the application layers are deployed and how they migrate or synchronize their state across multiple devices. The goal of the maturity model described in this paper is to understand, control and describe how Web applications following the liquid user experience paradigm are designed and also provide Web developers with a gradual adoption path to evolve existing Web applications.

Andrea Gallidabino, Cesare Pautasso

Twisting Web Pages for Saving Energy

Battery capacity (energy density) is increasing at around 3% per year. However, the increasing requirements of the mobile platform are placing higher demands on this capacity. In this case, there are three options: decrease our expectations of the mobile platform, increase the capacity and therefore size and weight of our batteries, or create energy saving solutions to extend battery life with minimal effect on platform performance. Here we present a system called Twes+ which is in line with the last option and aims to transcode web pages to increase battery life when surfing the Web without changing the look and feel. Our evaluation results show that there is a statistically significant energy saving when using our Twes+ transcoder. Our redirect service brings a 4.6% cumulative processor energy reduction, while our image transcoding service brings a 7% cumulative processor energy reduction. These savings equate to between 40 and 60 minutes of additional battery life depending on the mobile device.

Eda Köksal, Yeliz Yeşilada, Simon Harper

MateTee: A Semantic Similarity Metric Based on Translation Embeddings for Knowledge Graphs

Large Knowledge Graphs (KGs), e.g., DBpedia or Wikidata, are created with the goal of providing structure to unstructured or semi-structured data. As these datasets constantly evolve, the challenge is to utilize them in a meaningful, accurate, and efficient way. Further, exploiting semantics encoded in KGs, e.g., class and property hierarchies, provides the basis for addressing this challenge and producing a more accurate analysis of KG data. Thus, we focus on the problem of determining relatedness among entities in KGs, which corresponds to a fundamental building block for any semantic data integration task. We devise MateTee, a semantic similarity measure that combines the gradient descent optimization method with semantics encoded in ontologies, to precisely compute values of similarity between entities in KGs. We empirically study the accuracy of MateTee with respect to state-of-the-art methods. The observed results show that MateTee is competitive in terms of accuracy with respect to existing methods, with the advantage that background domain knowledge is not required.

Camilo Morales, Diego Collarana, Maria-Esther Vidal, Sören Auer
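The "translation embeddings" in the title refer to the TransE family of models, where a relation is modeled as a vector translation between entity embeddings. The sketch below shows the standard TransE plausibility score and a simple distance-based similarity between entity vectors; MateTee's exact loss function and similarity formula may differ, so treat this as an illustration of the general technique.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility of triple (h, r, t): negative L2 norm of h + r - t.

    A perfect translation (h + r == t) scores 0; worse fits score lower.
    """
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def similarity(e1, e2):
    """Entity relatedness as closeness of the learned embedding vectors,
    mapped from [0, inf) distance into (0, 1] similarity."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
    return 1.0 / (1.0 + d)
```

Training adjusts the embeddings by gradient descent so that observed triples score higher than corrupted ones; similarity between two entities then falls out of the geometry of the learned space.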

Improved Developer Support for the Detection of Cross-Browser Incompatibilities

Various tools are available to help developers detect cross-browser incompatibilities (XBIs) by testing the documents generated by their code. We propose an approach that enables XBIs to be detected earlier in the development cycle by providing support in the IDE as the code is being written. This has the additional advantage of making it clear to the developers where the sources of the problems are and how to fix them. We present wIDE, an IDE extension designed specifically to support web developers. wIDE uses a compatibility knowledge base to scan the source code for XBIs. The knowledge base is extracted automatically from online resources and periodically updated to ensure that the compatibility information is always up-to-date. In addition, developers can query documentation from within the IDE to access descriptions and usage examples of code statements. We report on a qualitative user study where developers provided positive feedback about the approach, but raised some issues to address in future work.

Alfonso Murolo, Fabian Stutz, Maria Husmann, Moira C. Norrie

Proximity-Based Adaptation of Web Content on Public Displays

Viewers of public displays perceive the content of a display at different sizes according to their distance from the display. While responsive design adapts web content to different viewing contexts, so far only the characteristics of the device and browser are taken into account. We show how these techniques could be extended to consider viewer proximity as part of the viewing context in the case of public displays. We propose a general model for proximity-based adaptation and present a JavaScript framework that we developed to support experimentation with variants of the model in both single and multi-viewer contexts. We also report on an initial user study based on single viewers which yielded promising results in terms of improved user perception and engagement.

Amir E. Sarabadani Tafreshi, Kim Marbach, Moira C. Norrie
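A natural building block for proximity-based adaptation is keeping the visual angle of displayed content constant: for a fixed visual angle, the rendered size must grow linearly with viewing distance. The sketch below illustrates this geometric rule; the function name and the specific scaling policy are illustrative assumptions, not the paper's model.

```python
def font_size_for_distance(base_size_px, base_distance_m, viewer_distance_m):
    """Scale text so it subtends roughly the same visual angle at any distance.

    For small angles, angle ~ size / distance, so keeping the angle
    constant means size grows linearly with distance. base_size_px is
    the size calibrated for a viewer at base_distance_m.
    """
    return base_size_px * (viewer_distance_m / base_distance_m)
```

A framework like the one described could expose viewer distance as an extra media-query-like dimension and apply such a rule per content element, with multi-viewer contexts requiring a policy to reconcile conflicting distances.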

Ontology-Enhanced Aspect-Based Sentiment Analysis

With many people freely expressing their opinions and feelings on the Web, much research has gone into modeling and monetizing opinionated, and usually unstructured and textual, Web-based content. Aspect-based sentiment analysis aims to extract the fine-grained topics, or aspects, that people are talking about, together with the sentiment expressed on those aspects. This allows for a detailed analysis of the sentiment expressed in, for instance, product and service reviews. In this work we focus on knowledge-driven solutions that aim to complement standard machine learning methods. By encoding common domain knowledge into a knowledge repository, or ontology, we are able to exploit this information to improve classification performance for both aspect detection and aspect sentiment analysis. For aspect detection, the ontology-enhanced method needs only 20% of the training data to achieve results comparable with a standard bag-of-words approach that uses all training data.

Kim Schouten, Flavius Frasincar, Franciska de Jong

Vision Papers


Inter-parameter Constraints in Contemporary Web APIs

Today’s web applications often rely on a myriad of external web APIs, communicating with them through various HTTP requests spread throughout the application. These APIs are often textually described by constraints on the inputs and outputs of their entry points. In this paper we discuss constraints in web APIs that span multiple parameters. We show that these constraints are common in web APIs, but cannot be expressed in existing machine-readable API specification languages. We envision the emergence of constraint-centric specification languages which focus on expressing constraints and describe a prototypical language that supports constraints over multiple parameters.

Nathalie Oostvogels, Joeri De Koster, Wolfgang De Meuter

Collaborative Item Embedding Model for Implicit Feedback Data

Collaborative filtering is the most popular approach for recommender systems. One way to perform collaborative filtering is matrix factorization, which characterizes user preferences and item attributes using latent vectors. These latent vectors are good at capturing global features of users and items but are not strong in capturing local relationships between users or between items. In this work, we propose a method to extract the relationships between items and embed them into the latent vectors of the factorization model. This combines two worlds: matrix factorization for collaborative filtering and item embedding, a similar concept to word embedding in language processing. Our experiments on three real-world datasets show that our proposed method outperforms competing methods on top-n recommendation tasks.

ThaiBinh Nguyen, Kenro Aihara, Atsuhiro Takasu
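The combination described above can be sketched as a prediction rule: the score of an item for a user is the dot product of the user's latent vector with the item's latent vector enriched by an item-embedding vector that captures local item-item relationships. The additive combination with weight `alpha` is an illustrative assumption; the paper's exact model may combine the two differently.

```python
def predict(u_vec, i_vec, i_embed, alpha=0.5):
    """Illustrative score: user . (item latent + alpha * item embedding).

    i_vec comes from matrix factorization (global structure), i_embed
    from an item-embedding model trained on co-occurrence (local
    structure, analogous to word embeddings in NLP).
    """
    combined = [x + alpha * e for x, e in zip(i_vec, i_embed)]
    return sum(a * b for a, b in zip(u_vec, combined))
```

Ranking all items by this score for a given user yields the top-n recommendation list evaluated in the paper.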

Short Papers


Impact of Referral Incentives on Mobile App Reviews

Product owners occasionally provide referral incentives to the customers (e.g. coupons, bonus points, referral rewards). However, clever customers can write their referral codes in online review pages to maximize incentives. While these reviews are beneficial for both writers and product owners, the core motivation behind such reviews is monetary as opposed to helping potential customers. In this paper, we analyze referral reviews in the Google Play store and identify groups of users that have been consistently taking part in writing such abusive reviews. We further explore how such referral reviews indeed help the mobile apps in gaining popularity when compared to apps that do not provide incentives. We also find an increasing trend in the number of apps being targeted by abusers, which, if continued, will render review systems as crowd advertising platforms rather than an unbiased source of helpful information.

Noor Abu-El-Rub, Amanda Minnich, Abdullah Mueen

Towards Stochastic Performance Models for Web 2.0 Applications

System performance is one of the most critical quality characteristics of Web applications which is typically expressed in response time, throughput, and utilization. These performance indicators, as well as the workload of a system, may be evaluated and analyzed by (i) model-based or (ii) measurement-based techniques. Given the complementary benefits offered by both techniques, it seems beneficial to combine them. For this purpose we introduce a combined performance engineering approach by presenting a concise way of describing user behavior by Markov models and derive from them workloads on resources. By means of an empirical user test, we evaluate the Markov assumption for a given Web 2.0 application which is an important prerequisite for our approach.

Johannes Artner, Alexandra Mazak, Manuel Wimmer
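The "concise way of describing user behavior by Markov models" boils down to estimating a first-order transition matrix from observed navigation sessions: the probability of the next page given only the current one. A minimal sketch of that estimation step (the session format is an assumption):

```python
from collections import Counter, defaultdict

def estimate_transitions(sessions):
    """Estimate a first-order Markov transition matrix from page-visit sessions.

    Each session is an ordered list of page identifiers; the result maps
    page a -> {page b: P(next = b | current = a)}.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}
```

The Markov assumption the authors evaluate empirically is exactly that this matrix, which forgets everything before the current page, suffices to derive realistic workloads on resources.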

Web Intelligence Linked Open Data for Website Design Reuse

Code and design reuse are as old as the software engineering industry itself, yet they remain a current trend, as more and more software products and websites are being created. Domain-specific design reuse on the web has especially high potential, saving work effort for thousands of developers and encouraging better interaction quality for millions of Internet users. In our paper we perform pilot feature engineering for finding similar solutions (website designs) within Domain, Task, and User UI models supplemented by Quality aspects. To obtain the feature values, we propose extraction of website-relevant data from online global services (DMOZ, Alexa, SimilarWeb, etc.) considered as linked open data sources, using a specially developed web intelligence data miner. The preliminary investigation with 21 websites and 82 human annotators showed reasonable accuracy of the data sources and suggests potential feasibility of the approach.

Maxim Bakaev, Vladimir Khvorostov, Sebastian Heil, Martin Gaedke

Exploratory Search of Web Data Services Based on Collective Intelligence

Developers of data-intensive web applications benefit from the integration of data sourced from the web. Web data services are off-the-shelf solutions, provided by third parties, that enable access to web data sources. Web data services are usually discovered according to different features, related to lightweight descriptions. Recent approaches in the literature converge on new research challenges, considering also collective intelligence in developers’ networks, containing information about service co-usage in existing applications and ratings on services given by developers who used them in their own development experiences. Following this direction, in this paper we contribute a distinguishing viewpoint by proposing an explorative approach that enables web application developers to iteratively discover services of interest, relying also on collective intelligence, in a Web 2.0 context.

Devis Bianchini, Valeria De Antonellis, Michele Melchiori

Tweetchain: An Alternative to Blockchain for Crowd-Based Applications

The assurance of information in the crowdsourcing domain cannot be committed to a single party, but should be distributed over the crowd. Blockchain is an infrastructure allowing this, because transactions are broadcast to the entire community and verified by miners. A node (or a coalition of nodes) with high computational power can play the role of miner to verify and approve transactions by computing the proof of work. Miners follow a highest-fee-first-served policy, so that a provider of a Blockchain-based application has to pay a non-negligible fee per transaction to increase the likelihood that the application proceeds. This makes Blockchain unsuitable for the small-value transactions often occurring in the crowdsourcing paradigm. To overcome this drawback, in this paper we propose an alternative to Blockchain, leveraging an online social network (we choose Twitter to provide a proof of concept). Our protocol works by building a meshed chain of public posts to ensure transaction security instead of proof of work, and no trustworthiness assumption is required for the social network provider.

Francesco Buccafurri, Gianluca Lax, Serena Nicolazzo, Antonino Nocera
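The tamper-evidence of a meshed chain comes from hash linking, the same primitive blockchains use: each new post commits to several earlier posts by including their hashes, so altering any post changes the identifiers of every later post that references it. The sketch below illustrates this linking principle; the concrete payload format and how hashes are embedded in tweets are assumptions, not the paper's protocol.

```python
import hashlib

def post_id(payload, prev_ids):
    """Identifier of a post that commits to several predecessor posts.

    Linking each post to multiple predecessors (rather than one, as in a
    plain chain) yields a mesh: tampering with one post invalidates every
    later post that directly or transitively references it.
    """
    data = payload + "|" + ",".join(sorted(prev_ids))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
```

Because the posts are public, anyone in the crowd can recompute the hashes and detect tampering, which replaces the proof-of-work verification performed by miners.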

The Dimensions of Crowdsourcing Task Design

Crowdsourcing, i.e., the provision of micro-tasks to be executed by a large pool of possibly anonymous workers, is attracting an increasing research attention, because it promises to help solving many scientific and practical problems where the harmonic cooperation of humans and machines delivers superior results. This paper proposes a systematic view of the crowdsourcing task design space and categorizes the dimensions that qualify the design decisions in crowdsourcing applications. For each dimension, we discuss the main open research problems and the most significant contributions, thereby offering guidelines for a principled understanding of current crowdsourcing marketplaces.

Ilio Catallo, Davide Martinenghi

Public Transit Route Planning Through Lightweight Linked Data Interfaces

While some public transit data publishers only provide a data dump – which only few reusers can afford to integrate within their applications – others provide an origin-destination route planning API that limits use cases. The Linked Connections framework instead introduces a hypermedia API, over which the extendable base route planning algorithm, the Connection Scan Algorithm, can be implemented. We compare the CPU usage and query execution time of a traditional server-side route planner with those of a Linked Connections interface by evaluating query mixes with increasing load. We found that, at the expense of higher bandwidth consumption, more queries can be answered using the same hardware with the Linked Connections server interface than with an origin-destination API, thanks to an average cache hit rate of 78%. The findings from this research show a cost-efficient way of publishing transport data that can bring federated public transit route planning within anyone's reach.

Pieter Colpaert, Ruben Verborgh, Erik Mannens
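The Connection Scan Algorithm mentioned above is simple enough to sketch in full for the earliest-arrival case: scan all connections once, in departure-time order, and relax each reachable one. This minimal version omits footpaths and transfer times, which the full algorithm supports.

```python
def csa_earliest_arrival(connections, source, target, start_time):
    """Earliest-arrival Connection Scan Algorithm (no footpaths/transfers).

    `connections` is a list of (dep_stop, arr_stop, dep_time, arr_time)
    tuples, sorted by departure time. Returns the earliest arrival time
    at `target`, or None if it is unreachable.
    """
    arrival = {source: start_time}
    INF = float("inf")
    for dep_stop, arr_stop, dep_time, arr_time in connections:
        # The connection is reachable if we can be at its departure stop
        # in time, and useful if it improves the arrival at its target.
        if arrival.get(dep_stop, INF) <= dep_time and arr_time < arrival.get(arr_stop, INF):
            arrival[arr_stop] = arr_time
    return arrival.get(target)
```

Because the algorithm only needs an ordered stream of connections, a client can fetch paged, cacheable Linked Connections documents and run the scan itself, which is what shifts the query load from the server to the client in the evaluation above.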

A WebRTC Extension to Allow Identity Negotiation at Runtime

In this paper we describe our implementation of the WebRTC identity architecture. We adapt OpenID Connect servers to support WebRTC peer-to-peer authentication and detail the issues and solutions found in the process. We observe that although WebRTC allows for the exchange of identity assertions between peers, users lack feedback on and control over the other party's authentication. To allow identity negotiation during WebRTC communication setup, we propose an extension to the Session Description Protocol. Our implementation demonstrates limitations of the current WebRTC specification.

Kevin Corre, Simon Bécot, Olivier Barais, Gerson Sunyé

A UML Profile for OData Web APIs

More and more individuals and organizations are making their data publicly available online, resulting in a growing market of technologies and services to help consume data and extract its real value. One of the several ways to publish data on the Web is via Web APIs. Unlike other approaches like RDF, Web APIs provide a simple way to query structured data by relying only on the HTTP protocol. Standards and frameworks such as OpenAPI or API Blueprint offer a way to create Web APIs, but OData stands out from the rest as it is specifically tailored to deal with data sources. However, creating an OData Web API is a hard and time-consuming task for data providers, as they have to choose between relying on commercial solutions, which are heavyweight and require deep knowledge of their corresponding platforms, or creating a customized solution to share their data. We propose an approach that leverages model-driven techniques to facilitate the development of OData Web APIs. The approach relies on a UML profile for OData that allows annotating a UML class diagram with OData stereotypes. In this paper we describe the profile and show how class diagrams can be automatically annotated with it.

Hamza Ed-douibi, Javier Luis Cánovas Izquierdo, Jordi Cabot

A Query Log Analysis of Dataset Search

Data is one of the most important digital assets in the world and its availability on the web is increasing. To use it effectively, we need tools that can retrieve the most relevant datasets to match our information needs. Web search engines are not well suited for this task, as they are designed primarily for documents, not data. In this paper, we present the first query log analysis for dataset search, based on logs of four national open data portals. Our aim is to gain a better understanding of the typical users of these portals and the types of queries they issue, and frame the findings in the broader context of dataset search. The logs suggest that queries issued on data portals differ from those issued to web search engines in their length and structure. From the analysis we could also infer that the portals are used exploratively, rather than to answer focused questions. These insights can inform the design of more effective dataset retrieval technology, and improve the user experience of data portals.

Emilia Kacprzak, Laura M. Koesten, Luis-Daniel Ibáñez, Elena Simperl, Jeni Tennison

Recruiting from the Network: Discovering Twitter Users Who Can Help Combat Zika Epidemics

Tropical diseases like Chikungunya and Zika have come to prominence in recent years as the cause of serious health problems. We explore the hypothesis that monitoring and analysis of social media content streams may effectively complement institutional disease prevention efforts. Specifically, we aim to identify selected members of the public who are likely to be sensitive to virus combat initiatives. Focusing on Twitter and on the topic of Zika, our approach involves (i) training a classifier to select topic-relevant tweets from the Twitter feed, and (ii) discovering the top users who are actively posting relevant content about the topic. In this short paper we describe our analytical approach and prototype architecture, discuss the challenges of dealing with a noisy and sparse signal, and present encouraging preliminary results.

Paolo Missier, Callum McClean, Jonathan Carlton, Diego Cedrim, Leonardo Silva, Alessandro Garcia, Alexandre Plastino, Alexander Romanovsky

Towards Automatic Generation of Web-Based Modeling Editors

With the current trend of digitalization across a multitude of domains, the need arises for effective approaches to capture domain knowledge. Modeling languages, especially domain-specific modeling languages (DSMLs), are considered an important method to involve domain experts in system development. However, current approaches for developing DSMLs and generating modeling editors mostly focus on reusing the infrastructures provided by programming IDEs. On the other hand, several approaches exist for developing Web-based modeling editors using dedicated JavaScript frameworks. However, these frameworks do not exploit the high automation potential of DSML approaches to generate modeling editors from language specifications. Thus, the development of Web-based modeling editors still requires major programming effort and dealing with recurring tasks. In this paper, we combine the best of both worlds by reusing the language specification techniques of DSML engineering approaches for generating Web-based modeling editors. In particular, we show how to combine two concrete approaches, namely Eugenia from DSML engineering and JointJS as a protagonist among JavaScript frameworks, and demonstrate the automation potential of establishing Web-based modeling editors. We present first results concerning two reference DSML examples which have been realized as Web-based modeling editors by our approach.

Manuel Wimmer, Irene Garrigós, Sergio Firmenich

Application Papers


Harvesting Forum Pages from Seed Sites

Web forums are rich sources of conversational content. Many applications, such as opinion mining and question answering, can greatly benefit from mining and exploring such useful content. A key step towards making this content more easily available is to collect conversational pages on forum sites – so-called thread pages. In this paper, we propose a two-step crawling solution to the problem of collecting thread pages at large scale. First, since thread pages are located within forum sites, we propose an inter-site crawler that locates forum sites on the Web. To do that, the inter-site crawler focuses on the Web graph neighbourhood of forum sites, and explores the content patterns of the links in this region to guide its visitation policy. Next, to collect thread pages within the discovered forum sites, we propose an intra-site crawler that finds thread pages by learning the context of links that lead to those pages and, to detect them, relies on their content and structural features. Experimental results demonstrate that both the inter-site and the intra-site crawlers are effective and obtain superior performance in comparison to their baselines.
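Both crawlers share the same best-first visitation policy: score outgoing links by their context and follow the most promising first. A minimal sketch of such a policy, with `score_link` standing in for the learned link classifier and a toy link graph in place of the Web:

```python
import heapq

def crawl(seed, score_link, fetch_links, max_pages=10):
    # Best-first crawl: the frontier is a max-heap (negated scores)
    # ordered by the relevance score of each link's context features.
    frontier = [(-1.0, seed)]
    visited, collected = set(), []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        collected.append(url)
        for link, context in fetch_links(url):
            if link not in visited:
                heapq.heappush(frontier, (-score_link(context), link))
    return collected

# Toy link graph: url -> [(outgoing link, anchor/context text)].
graph = {
    "seed": [("forum", "board index thread"), ("news", "headline story")],
    "forum": [("thread1", "view topic reply")],
    "news": [], "thread1": [],
}
score = lambda ctx: sum(w in ctx for w in ("thread", "topic", "board"))
order = crawl("seed", score, lambda u: graph.get(u, []))
assert order[:2] == ["seed", "forum"]
```

In the paper's setting the scoring function is learned from content patterns of links around forum sites (inter-site) or from the context of links leading to thread pages (intra-site); the frontier logic stays the same.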

Luciano Barbosa

Open Access

Decentralised Authoring, Annotations and Notifications for a Read-Write Web with dokieli

While the Web was designed as a decentralised environment, individual authors still lack the ability to conveniently author and publish documents, and to engage in social interactions with documents of others in a truly decentralised fashion. We present dokieli, a fully decentralised, browser-based authoring and annotation platform with built-in support for social interactions, through which people retain ownership of and sovereignty over their data. The resulting “living” documents are interoperable and independent of dokieli since they follow standards and best practices, such as HTML+RDFa for a fine-grained semantic structure, Linked Data Platform for personal data storage, and Linked Data Notifications for updates. This article describes dokieli’s architecture and implementation, demonstrating advanced document authoring and interaction without a single point of control. Such an environment provides the right technological conditions for independent publication of scientific articles, news, and other works that benefit from diverse voices and open interactions. To experience the described features please open this document in your Web browser under its canonical URI:

Sarven Capadisli, Amy Guy, Ruben Verborgh, Christoph Lange, Sören Auer, Tim Berners-Lee

Evaluating Genomic Big Data Operations on SciDB and Spark

We are developing a new, holistic data management system for genomics, which provides high-level abstractions for querying large genomic datasets. We designed our system so that it leverages data management engines for low-level data access. This design can be adapted to two different kinds of data engines: the family of scientific databases (among them, SciDB) and the broader family of generic platforms (among them, Spark). Trade-offs are not obvious; scientific databases are expected to outperform generic platforms when they use features embedded within their specialized design, but generic platforms are expected to outperform scientific databases on general-purpose operations. In this paper, we compare our SciDB and Spark implementations at work on genomic abstractions. We use four typical genomic operations as benchmarks, stemming from the concrete requirements of our project, and encoded using SciDB and Spark; we discuss their common aspects and differences, specifically discussing how genomic regions and operations can be expressed using SciDB arrays. We comparatively evaluate the performance and scalability of the two implementations over datasets consisting of billions of genomic regions.
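The benchmarked operations largely reduce to joins over genomic regions. A naive in-memory version of the overlap predicate (illustrative data, half-open coordinates) makes the semantics explicit; SciDB expresses the same join over arrays, Spark over distributed collections, and both must avoid this quadratic scan at the billion-region scale.

```python
def region_join(track_a, track_b):
    # Naive overlap join between two region tracks; each region is
    # (chromosome, start, stop) with half-open coordinates [start, stop).
    pairs = []
    for ca, sa, ea in track_a:
        for cb, sb, eb in track_b:
            if ca == cb and sa < eb and sb < ea:
                pairs.append(((ca, sa, ea), (cb, sb, eb)))
    return pairs

peaks = [("chr1", 10, 20)]
genes = [("chr1", 15, 30), ("chr1", 25, 40), ("chr2", 10, 20)]
assert region_join(peaks, genes) == [(("chr1", 10, 20), ("chr1", 15, 30))]
```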

Simone Cattani, Stefano Ceri, Abdulrahman Kaitoua, Pietro Pinoli

Mining Worse and Better Opinions

Unsupervised and Agnostic Aggregation of Online Reviews

In this paper, we propose a novel approach for aggregating online reviews according to the opinions they express. Our methodology is unsupervised, since it does not rely on pre-labeled reviews, and agnostic, since it does not make any assumption about the domain or the language of the review content. We measure the adherence of a review's content to the domain terminology extracted from a review set. First, we demonstrate the informativeness of the adherence metric with respect to the score associated with a review. Then, we exploit the metric values to group reviews according to the opinions they express. Our experimental campaign has been carried out on two large datasets collected from Booking and Amazon, respectively.
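The abstract does not spell out the adherence metric, but its flavor can be illustrated with a deliberately simple stand-in: approximate the domain terminology by frequent words across the review set, then score each review by the fraction of its words that fall inside that terminology.

```python
from collections import Counter

def domain_terms(reviews, top_k=5):
    # Crude terminology extraction: the most frequent words in the
    # review set (a real pipeline would filter stop words, etc.).
    counts = Counter(w for r in reviews for w in r.lower().split())
    return {w for w, _ in counts.most_common(top_k)}

def adherence(review, terms):
    # Fraction of the review's words that belong to the terminology.
    words = review.lower().split()
    return sum(w in terms for w in words) / len(words) if words else 0.0

reviews = [
    "clean room great staff",
    "room was clean",
    "great location great staff",
]
terms = domain_terms(reviews)
assert adherence("clean room", terms) == 1.0
assert adherence("terrible noisy street", terms) == 0.0
```

Reviews can then be grouped by their adherence values, which is the step the paper correlates with review scores.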

Michela Fazzolari, Marinella Petrocchi, Alessandro Tommasi, Cesare Zavattari

XOOM: An End-User Development Tool for Web-Based Wearable Immersive Virtual Tours

XOOM is a novel interactive tool that allows non-ICT specialists to create web-based applications of Wearable Immersive Virtual Reality (WIVR) technology that use 360° realistic videos as interactive virtual tours. These applications are interesting for domains that range from gaming, entertainment, cultural heritage, and tourism to education, professional training, therapy and rehabilitation. 360° interactive videos are displayed on smartphones placed in head-mounted VR viewers. Users explore the virtual environment and interact with active elements through head direction and movements. The virtual scenarios can also be shown on external displays (e.g., TV monitors or projections) to enable other users to participate in the experience, and to control the VR space if needed, e.g., for education, training or therapy purposes. XOOM provides the functionality to create applications of this kind: import 360° videos, concatenate them, and superimpose active elements on the virtual scenes, so that the resulting environment is more interactive and is customized to the requirements of a specific domain and user target. XOOM also supports automatic data gathering and visualizations (e.g., through heat maps) of the users' experience, which can be inspected for analytics purposes, as well as for user evaluation (e.g., in education, training, or therapy contexts). The paper describes the design and implementation of XOOM, and reports a case study in the therapeutic context.

Franca Garzotto, Mirko Gelsomini, Vito Matarazzo, Nicolò Messina, Daniele Occhiuto

Public Debates on the Web

With the advent of social media, any piece of information may spread all over the world in no time. Furthermore, the vast number of available communication channels makes it difficult to cross-check in real time information that has been (re-)published on different media. In this context, where people may express their positions on many subjects as well as launch new open initiatives, the public needs a means to gather and compare ideas and opinions in a structured manner. The present paper presents a project which aims to develop a collaborative platform where opinions, namely arguments, are gathered, analyzed and linked to one another via explicit relations. The platform relies on various Natural Language Processing modules to semi-automatically extract information from the web and propose meaningful visualizations to the platform's contributors. Furthermore, public actors may be identified and attached to the ideas they publish to create a structured knowledge base where annotated texts, extracted positions and alliances may be identified.

Fabian Gilson, André Bittar, Pierre-Yves Schobbens

Demonstration Papers


A Web Tool for Type Checking and Testing of SPARQL Queries

In this paper a property-based testing tool for SPARQL is described. The tool randomly generates test cases in the form of instances of an ontology, and checks the well-typedness of the SPARQL query as well as the consistency of the test cases with the ontology axioms. The test cases are afterwards used to execute queries. The output of the queries is tested against a Boolean property defined in terms of the membership of ontology individuals to classes. The testing tool reports counterexamples when the Boolean property is not satisfied.
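The property-based loop can be sketched with toy stand-ins (plain Python structures rather than a real ontology or SPARQL engine): generate random instance data, run the query, and check the Boolean membership property on its output.

```python
import random

def generate_instances(classes, n, rng):
    # One random test case: n fresh individuals, each assigned
    # to an ontology class (a toy ABox).
    return [(f"ind{i}", rng.choice(classes)) for i in range(n)]

def run_query(abox, cls):
    # Stand-in for query execution: select the individuals of a class.
    return [ind for ind, c in abox if c == cls]

def property_holds(abox, results, cls):
    # Boolean property: every returned individual is a member of cls.
    members = {ind for ind, c in abox if c == cls}
    return all(r in members for r in results)

rng = random.Random(0)
for _ in range(100):
    abox = generate_instances(["Person", "Place"], 10, rng)
    results = run_query(abox, "Person")
    # A failing case here would be reported as a counterexample.
    assert property_holds(abox, results, "Person")
```

The tool described in the paper additionally type-checks the query and validates generated cases against the ontology axioms before this property check runs.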

Jesús M. Almendros-Jiménez, Antonio Becerra-Terón

Supporting Mobile Web Augmentation by End Users

This article presents MoWA Authoring, an End User Development platform supporting the improvement of existing – usually third-party – Web applications with mobile features. This enhancement is carried out by adding specific behaviours, mostly dependent on context values. The tool assists users in the construction of applications by letting them easily select the components that fit their needs. A series of forms allows selecting sensors, context values of interest and digital counterparts to define an augmentation layer. The latter is composed of augmentation units, called augmenters, which are configurable through widgets that can be placed over the presentation layer of any Web application.

Gabriela Bosetti, Sergio Firmenich, Gustavo Rossi, Marco Winckler

Rapid Engineering of QA Systems Using the Light-Weight Qanary Architecture

Establishing a Question Answering (QA) system is time consuming. One main reason is the number of fields involved: solving a Question Answering task, i.e., answering a user's question with the correct fact(s), might require functionality from fields as diverse as information retrieval, natural language processing, and linked data. The architecture used for Qanary supports the resulting need for easy collaboration at the level of QA processes. The focus of Qanary's design was to enable rapid engineering of QA systems as well as high flexibility of the component functionality. In this paper, we present the engineering approach leading to re-usable components, high flexibility, and easy-to-compose QA systems.

Andreas Both, Kuldeep Singh, Dennis Diefenbach, Ioanna Lytra

Improving GISBuilder with Runtime Product Preview

Software product lines allow users with little development experience to configure and generate applications. On the web this approach is becoming more and more popular due to the short time required to bring a new release to the final users. The architecture of web applications, though, requires complex development environments in order to allow users to test and evaluate a new configuration. In this work we present a novel approach, based on in-browser generation and emulation techniques, which can be applied to real-world, state-of-the-art software product lines, reducing test deployment complexity and enabling an agile development cycle.

Alejandro Cortiñas, Carlo Bernaschina, Miguel R. Luaces, Piero Fraternali

ALMOsT-Trace: A Web Based Embeddable Tracing Tool for ALMOsT.js

Model Driven Development (MDD) requires model-to-model and/or model-to-text transformations to produce application code from high-level descriptions. Debugging and evaluating such transformations is in itself a complex task; this complexity can be mitigated through the use of advanced developer tools. We demonstrate ALMOsT-Trace, a plug-in for ALMOsT.js, which allows developers to debug and analyze their model transformations from within their applications. In the demo, attendees will be able to experiment with ALMOsT-Trace by evaluating it in an online tool for the rapid prototyping of web and mobile applications, and by means of examples that can be customized by the attendees themselves.

Rocio Nahime Torres, Carlo Bernaschina

TweetCric: A Twitter-Based Accountability Mechanism for Cricket

This paper demonstrates a Web service called TweetCric to uncover cricket insights from Twitter with the aim of facilitating sports analysts and journalists. It essentially arranges crowdsourced Twitter data about a team in comprehensive visualizations by incorporating domain-specific approaches to sentiment analysis.

Arjumand Younus, M. Atif Qureshi, Naif R. Aljohani, Derek Greene, Michael P. O’Mahony

PhD Symposium Papers


Extending the Genomic Data Model and the Genometric Query Language with Domain Taxonomies

In bioinformatics and biology researchers annotate experimental data in many different ways. When other researchers need to query these data, they are typically unaware of the specificity of the annotations, and often encounter mismatches between the granularity of the query and the granularity of the annotations. In this work, we propose an extension of the Genomic Data Model and the GenoMetric Query Language (a well-established framework for biomedical data) that is able to search, integrate, and extend genomic data. The extension will be realized through domain taxonomies and by considering many external ontologies and databases. An ad-hoc software system and query language will be implemented for the storage, management, search, retrieval, and integration of biomedical data.

Eleonora Cappelli, Emanuel Weitschek

A Semantic Integration Approach for Building Knowledge Graphs On-Demand

Information about the same entity may be spread across several Web data sources, e.g., people on social networks (Social Web), product descriptions on e-commerce sites (Deep Web), or entries in public Knowledge Graphs (Web of Data). Integrating entities from heterogeneous Web data sources on demand is still a challenge. Existing approaches propose expensive Extraction-Transformation-Loading (ETL) processes and rely on syntactic comparison of entity properties, leaving aside the semantics encoded in the data. We devise FuhSen, an integration approach that exploits the search capabilities of Web data sources and the semantics encoded in the data. FuhSen generates Knowledge Graphs in response to keyword-based queries. The resulting Knowledge Graphs describe the semantics of the integrated entities, as well as the relationships among these entities. The FuhSen approach utilizes an ontology to describe the Web data sources in terms of content and search capabilities, and exploits this knowledge to select the sources relevant for answering a keyword-based query on demand. The results of various empirical studies of FuhSen's effectiveness suggest that the proposed integration technique accurately integrates data from heterogeneous Web data sources into a Knowledge Graph.

Diego Collarana

A People-Oriented Paradigm for Smart Cities

Most works in the literature agree on considering the Internet of Things (IoT) as the base technology to collect information related to smart cities. This information is usually offered as open data for analysis, to produce statistics, or to provide services that improve the management of the city, making it more efficient and more comfortable to live in. However, it is not possible to actually improve the quality of life of smart cities' inhabitants if there is no direct information about them and their experiences. To address this problem, we propose a social and mobile computation model, called the Internet of People (IoP), which empowers smartphones to collect information about their users, analyze it to obtain knowledge about their habits, and provide this knowledge as a service, creating a collaborative information network. By combining IoT and IoP, we allow the smart city to dynamically adapt its services to the needs of its citizens, promoting their welfare as the main objective of the city.

Alejandro Pérez-Vereda, Carlos Canal

CSQuaRE: Approach for Quality Control in Crowdsourcing

Quality control of responses enhances the sustainability and adoption of crowdsourcing. Expert and peer reviews, majority voting, machine learning, and game theory are some of the practices for quality control in crowdsourcing. However, the quality of crowdsourced responses is still a concern. We propose a quality control approach drawing inspiration from the Requirements Engineering quality attributes Completeness, Consistency and Correctness (3Cs). The 3Cs of a response are assessed and displayed as a CSQuaRE score based on coverage with reference to a knowledge base (ontology), cohesiveness, and contributor credibility in the domain. The knowledge base would evolve with the continued extraction of instances of information, and responses would thus be re-calibrated for relevance. The suggested approach will be demonstrated on Information Security questions and answers on a crowdsourcing platform. The evaluation of the approach will be based on comparison with existing quality control techniques and feedback from security experts.

Lalit Mohan Sanagavarapu

Design of a Small and Medium Enterprise Growth Prediction Model Based on Web Mining

Small and medium enterprises (SMEs) play an important role in the economy of many countries. Still, due to the highly turbulent business environment, SMEs experience more severe challenges in maintaining and expanding their business. To support SMEs in improving their competitiveness, researchers have recently turned their focus to applying web mining (WM) to build growth prediction models. WM enables automatic and large-scale collection and analysis of potentially valuable data from various online platforms, thus bearing great potential for extracting SME growth factors and enhancing existing SME growth prediction models. This study aims at developing an automated system to collect business-relevant data from the Web and predict future growth trends of SMEs by means of WM and machine learning (ML) techniques.

Yiea-Funk Te, Irena Pletikosa Cvijikj

Intelligent End User Development Platform Towards Enhanced Decision-Making

From a decision-making perspective, the web is an emerging information domain: it makes a large amount of data available to a large group of users across many domains. In recent years, the dramatic growth of data accessible through the web and the development of large-scale distributed web services have presented new challenges for users. Web services generate data in an ad hoc manner; hence, the systematic management of data has become an obstacle to efficient decision making. Moreover, the emergence of IoT devices further increases the data production rate. Due to the overwhelming amount of data online, it is therefore essential to support end users, who have limited knowledge of programming, in accessing the relevant information. During the last few decades, many approaches have been proposed to collect and process heterogeneous data from distributed sources in a more uniform way. However, existing solutions have failed to provide the flexibility required for data integration and management. The goal of this PhD project is to provide a systematic approach for end users to access and analyze the right data at the right time, supporting knowledge workers in decision making through an end user development approach.

Bahareh Zarei, Martin Gaedke

