
2014 | Book

Web Information Systems Engineering – WISE 2014

15th International Conference, Thessaloniki, Greece, October 12-14, 2014, Proceedings, Part II

Edited by: Boualem Benatallah, Azer Bestavros, Yannis Manolopoulos, Athena Vakali, Yanchun Zhang

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the proceedings of the 15th International Conference on Web Information Systems Engineering, WISE 2014, held in Thessaloniki, Greece, in October 2014.

The 52 full papers, 16 short papers, and 14 poster papers presented in the two-volume proceedings LNCS 8786 and 8787 were carefully reviewed and selected from 196 submissions. They are organized in topical sections named: Web mining, modeling and classification; Web querying and searching; Web recommendation and personalization; semantic Web; social online networks; software architectures and platforms; Web technologies and frameworks; Web innovation and applications; and challenge.

Table of Contents

Frontmatter

Social Online Networks

Predicting Elections from Social Networks Based on Sub-event Detection and Sentiment Analysis

Social networks are widely used by all kinds of people to express their opinions, and predicting election outcomes from them is becoming a compelling research issue. People express themselves spontaneously about social events in their social networks, so real-time prediction on ongoing election events can provide feedback and trend analysis that help politicians and news analysts make informed decisions. This paper proposes an approach to predicting election results by incorporating sub-event detection and sentiment analysis in social networks to analyse as well as visualise the political preferences revealed by social network users. Extensive experiments are conducted to evaluate the performance of our approach on a real-world Twitter dataset, and they show that the proposed approach effectively predicts election results and outperforms the given baselines.

Sayan Unankard, Xue Li, Mohamed Sharaf, Jiang Zhong, Xueming Li
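
The sketch below (not the authors' pipeline) illustrates the basic sentiment-aggregation step such an approach rests on: tweets are grouped by candidate, a naive lexicon-based polarity is computed per tweet, and positive mentions are turned into predicted vote shares. The lexicon, field names and weighting are hypothetical placeholders.

    from collections import defaultdict

    # Hypothetical toy sentiment lexicon; a real system would use a trained classifier.
    POSITIVE = {"great", "win", "support", "love"}
    NEGATIVE = {"bad", "lose", "corrupt", "hate"}

    def tweet_polarity(text):
        """Return +1, -1 or 0 from a naive lexicon lookup."""
        tokens = set(text.lower().split())
        score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
        return (score > 0) - (score < 0)

    def predict_vote_shares(tweets):
        """tweets: iterable of (candidate, text). Returns candidate -> predicted share."""
        support = defaultdict(float)
        for candidate, text in tweets:
            if tweet_polarity(text) > 0:
                support[candidate] += 1.0
        total = sum(support.values()) or 1.0
        return {c: v / total for c, v in support.items()}

    if __name__ == "__main__":
        sample = [("A", "great rally, love the plan"), ("B", "bad debate performance"),
                  ("A", "I support A"), ("B", "B will win")]
        print(predict_vote_shares(sample))
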
Sonora: A Prescriptive Model for Message Authoring on Twitter

Within social networks, certain messages propagate more easily or attract more attention than others. This effect can be a consequence of several factors, such as the topic of the message, the number of followers, real-time relevance, the person sending the message, etc. Only one of these factors is within a user's reach at authoring time: how to phrase the message. In this paper we examine how word choice contributes to the propagation of a message.

We present a prescriptive model that analyzes words based on their historic performance in retweets in order to propose enhancements to future tweet performance. Our model calculates a novel score (the Sonora score) that is built on three aspects of diffusion: volume, prevalence and sustain. We show that the Sonora score has powerful predictive ability, and that it complements social and tweet-level features to achieve an F1 score of 0.82 in retweet prediction. Moreover, it can prescribe changes to the tweet wording: when the Sonora score for a tweet is higher, the tweet is twice as likely to receive more retweets.

Lastly, we show how our prescriptive model can be used to assist users in content creation for optimized success on social media. Because the model works at the word level, it lends itself extremely well to user interfaces that help authors incrementally, word by word, refine their message until its potential is maximized and it is ready for publication. We present an easy-to-use iOS application that illustrates the potential of incremental refinement using the Sonora score coupled with the familiarity of a traditional spell checker.

Pablo N. Mendes, Daniel Gruhl, Clemens Drews, Chris Kau, Neal Lewis, Meena Nagarajan, Alfredo Alba, Steve Welch
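
The paper's exact Sonora score formula is not reproduced here, but a minimal sketch of the underlying idea, scoring a draft tweet from the historic retweet behaviour of its individual words, could look as follows. The per-word statistics (volume, prevalence, sustain) and the way they are combined are assumptions for illustration only.

    import math

    # Hypothetical per-word diffusion statistics gathered from a historic tweet corpus:
    # volume: total retweets of tweets containing the word
    # prevalence: fraction of those tweets retweeted at least once
    # sustain: average hours over which retweets kept arriving
    WORD_STATS = {
        "breaking": {"volume": 5400, "prevalence": 0.62, "sustain": 9.5},
        "meeting":  {"volume": 310,  "prevalence": 0.18, "sustain": 1.2},
        "video":    {"volume": 2100, "prevalence": 0.41, "sustain": 4.8},
    }

    def word_score(stats):
        """Combine the three diffusion aspects into one number (illustrative weighting)."""
        return math.log1p(stats["volume"]) * stats["prevalence"] * math.log1p(stats["sustain"])

    def tweet_score(text):
        """Average the scores of known words; unknown words contribute nothing."""
        words = [w.strip(".,!?").lower() for w in text.split()]
        scores = [word_score(WORD_STATS[w]) for w in words if w in WORD_STATS]
        return sum(scores) / len(scores) if scores else 0.0

    # Word-level scoring makes incremental authoring support easy: rescore after each edit.
    print(tweet_score("Breaking video from the meeting"))
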
A Fuzzy Model for Selecting Social Web Services

This paper discusses a fuzzy model for selecting social component Web services when developing composite Web services. The social aspect stems from qualities that Web services exhibit at run time, such as selfishness and trustworthiness. The fuzzy model considers these qualities during the selection, which allows users to express their needs and requirements with respect to them. The ranking in this fuzzy model is based on a hybrid weighting technique that mixes Web services' computing and social behaviors. Simulation results show the appropriateness of fuzzy logic for social Web services selection as well as its better performance compared to entropy-based ranking techniques.

Hamdi Yahyaoui, Mohammed Almulla, Zakaria Maamar
Insights into Entity Name Evolution on Wikipedia

Working with Web archives raises a number of issues caused by their temporal characteristics. Depending on the age of the content, additional knowledge might be needed to find and understand older texts. Facts about entities, in particular, are subject to change, and name changes are the most severe in terms of information retrieval. In order to find entities that have changed their name over time, search engines need to be aware of this evolution. We tackle this problem by analyzing Wikipedia in terms of entity evolutions mentioned in articles, regardless of the structural elements. We gathered statistics and automatically extracted minimum excerpts covering name changes by incorporating lists dedicated to that subject. In future work, these excerpts will be used to discover patterns and detect changes in other sources. In this work we investigate whether or not Wikipedia is a suitable source for extracting the required knowledge.

Helge Holzmann, Thomas Risse
Assessing the Credibility of Nodes on Multiple-Relational Social Networks

With the development of the Internet, social networks are changing people's daily lives. In many social networks, the relationships between nodes can be measured, and important applications include predicting trust links, finding the most reliable nodes and ranking nodes. To implement these applications, it is crucial to assess the credibility of a node. The credibility of a node is denoted as its expected value, which can be evaluated from the similarities between the node and its neighbors; that is, the credibility of a node is high when its behaviors are reasonable. As multiple-relational networks become prevalent, we observe that it is possible to use additional relations to improve the performance of assessing node credibility. We find that trust values among nodes of one type and similarity scores among nodes of different types reinforce each other towards better and more meaningful results. In this paper, we introduce a framework that computes the credibility of nodes on a multiple-relational network. Experimental results on real data show that our framework is effective.

Weishu Hu, Zhiguo Gong
Result Diversification for Tweet Search

Being one of the most popular microblogging platforms, Twitter handles more than two billion queries per day. Given the users’ desire for fresh and novel content but their reluctance to submit long and descriptive queries, there is an inevitable need for generating diversified search results to cover different aspects of a query topic. In this paper, we address diversification of results in tweet search by adopting several methods from the text summarization and web search domains. We provide an exhaustive evaluation of all the methods using a standard dataset specifically tailored for this purpose. Our findings reveal that implicit diversification methods are more promising in the current setup, whereas explicit methods need to be augmented with a better representation of query sub-topics.

Makbule Gulcin Ozsoy, Kezban Dilek Onal, Ismail Sengor Altingovde
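
As a concrete example of the implicit diversification methods borrowed from text summarization, a minimal Maximal Marginal Relevance (MMR) re-ranker is sketched below; it is a classic baseline of this family, not necessarily one of the exact methods evaluated in the paper.

    def mmr(candidates, relevance, similarity, k, lam=0.7):
        """Greedy Maximal Marginal Relevance re-ranking (a classic implicit
        diversification method). candidates: list of item ids; relevance: id -> float;
        similarity: (id, id) -> float in [0, 1]."""
        selected = []
        remaining = list(candidates)
        while remaining and len(selected) < k:
            def mmr_score(c):
                max_sim = max((similarity(c, s) for s in selected), default=0.0)
                return lam * relevance[c] - (1 - lam) * max_sim
            best = max(remaining, key=mmr_score)
            selected.append(best)
            remaining.remove(best)
        return selected

    # Toy usage with Jaccard similarity over token sets of tweets.
    tweets = {1: "election results tonight", 2: "election results live tonight",
              3: "new phone release", 4: "results of the election announced"}
    rel = {1: 0.9, 2: 0.85, 3: 0.4, 4: 0.8}
    def jaccard(a, b):
        sa, sb = set(tweets[a].split()), set(tweets[b].split())
        return len(sa & sb) / len(sa | sb)
    print(mmr(list(tweets), rel, jaccard, k=2))
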
WikipEvent: Leveraging Wikipedia Edit History for Event Detection

Much of the existing work in information extraction assumes the static nature of relationships in fixed knowledge bases. However, in collaborative environments such as Wikipedia, information and structures are highly dynamic over time. In this work, we introduce a new method to extract complex event structures from Wikipedia. We propose a new model that represents events by engaging multiple entities and is generalizable to an arbitrary language. The evolution of an event is captured effectively based on analyzing the user edit history in Wikipedia. Our work provides a foundation for a novel class of evolution-aware entity-based enrichment algorithms, and considerably increases the quality of entity accessibility and temporal retrieval for Wikipedia. We formalize this problem and introduce an efficient end-to-end platform as a solution. We conduct comprehensive experiments on a real dataset of 1.8 million Wikipedia articles to show the effectiveness of our proposed solution. Our results demonstrate that we achieve a precision of 70% when evaluated against manually annotated data. Finally, we compare our work with the well-established Current Event Portal of Wikipedia and find that our system, WikipEvent, using the Co-References method, can be used in a complementary way to deliver new and additional information about events.

Tuan Tran, Andrea Ceroni, Mihai Georgescu, Kaweh Djafari Naini, Marco Fisichella
Feature Based Sentiment Analysis of Tweets in Multiple Languages

Feature-based sentiment analysis is normally conducted using review Web sites, since it is difficult to extract accurate product features from tweets. However, Twitter users express sentiment towards a large variety of products in many different languages. Moreover, sentiment expressed on Twitter is more up to date and represents the sentiment of a larger population than review articles. Therefore, we propose a method that identifies product features using review articles and then conducts sentiment analysis on tweets containing those features. In this way, we can increase the precision of feature extraction by up to 40% compared to features extracted directly from tweets. Moreover, our method translates and matches the features extracted for multiple languages and ranks them based on how frequently the features are mentioned in the tweets of each language. By doing this, we can highlight the features that are most relevant for multilingual analysis.

Maike Erdmann, Kazushi Ikeda, Hiromi Ishizaki, Gen Hattori, Yasuhiro Takishima
Incorporating the Position of Sharing Action in Predicting Popular Videos in Online Social Networks

Predicting popular videos in online social networks (OSNs) is important for network traffic engineering and video recommendation. In order to avoid the difficulty of acquiring all OSN users’ activities, recent studies try to predict popular media contents in OSNs only based on a very small number of users, referred to as experts. However, these studies simply treat all users’ diffusion actions as the same. Based on large-scale video diffusion traces collected from a popular OSN, we analyze the positions of users’ video sharing actions in the propagation graph, and classify users’ video sharing actions into three different types, i.e., initiator actions, spreader actions and follower actions. Surprisingly, while existing studies mainly focus on the initiators, our empirical studies suggest that the spreaders actually play a more important role in the diffusion process of popular videos. Motivated by this finding, we account for the position information of sharing actions to select initiator experts, spreader experts and follower experts, based on corresponding sharing actions. We conduct experiments on the collected dataset to evaluate the performance of these three types of experts in predicting popular videos. The evaluation results demonstrate that the spreader experts can not only make more accurate predictions than initiator experts and follower experts, but also outperform the general experts selected by existing studies.

Yi Long, Victor O. K. Li, Guolin Niu
An Evolution-Based Robust Social Influence Evaluation Method in Online Social Networks

Online Social Networks (OSNs) are becoming popular and attracting lots of participants. In OSN-based e-commerce platforms, a buyer's review of a product is one of the most important factors in other buyers' decision making. A buyer who provides high-quality reviews thus has strong social influence and can impact the purchase behaviour of a large number of participants in OSNs. However, dishonest participants can cheat existing social influence evaluation models by using typical attacks, like Constant and Camouflage, to obtain fake strong social influence. Therefore, it is important to accurately evaluate such social influence in order to recommend participants who have strong social influence and provide high-quality product reviews. In this paper, we propose an Evolutionary-Based Robust Social Influence (EB-RSI) method based on trust evolutionary models. In our EB-RSI, we propose four influence impact factors for social influence evaluation, i.e., Total Trustworthiness (TT), Fluctuant Trend of Being Advisor (FTBA), Fluctuant Trend of Trustworthiness (FTT) and Trustworthiness Area (TA), all of which are significant in the influence evaluation. We conduct experiments on a real social network dataset, Epinions, and validate the effectiveness and robustness of our EB-RSI by comparing it with the state-of-the-art method SoCap. The experimental results demonstrate that EB-RSI evaluates participants' social influence more accurately than SoCap.

Feng Zhu, Guanfeng Liu, An Liu, Lei Zhao, Xiaofang Zhou
A Framework for Linking Educational Medical Objects: Connecting Web2.0 and Traditional Education

With the emergence of Linked Data principles for achieving web-scale interoperability and the increasing uptake of open educational content across institutions, Linked Data (LD) is playing an important role in exposing and sharing open educational content on the web. The growing use of the internet has quickly changed our learning habits. Learning in the medical field is unique in its nature: the educational objects are of various types and should be published by trustworthy organizations. Therefore, medical students and educators face difficulties locating educational objects across the web. To address this problem, we propose a data model for describing educational medical objects harvested from the World Wide Web (WWW) and published in Linked Data format. To reduce the burden of navigating through the overwhelming amount of information on the web, we provide a harvesting engine for collecting metadata objects from specified repositories. The harvested educational objects' metadata are then represented in our proposed data model, named Linked Educational Medical Objects (LEMO). Further enrichment is applied to the metadata records by annotating their textual elements using biomedical ontologies in order to build dynamic connections between the objects. In this paper, we present the framework built around the LEMO data model, along with its implementation and the experiments conducted.

Reem Qadan Al Fayez, Mike Joy
An Ensemble Model for Cross-Domain Polarity Classification on Twitter

Polarity analysis of social media content is of significant importance for various applications. Most current approaches treat this task as a classification problem, demanding a labeled corpus for training purposes. However, if the learned model is applied to a different domain, the performance drops significantly and, given that it is impractical to have labeled corpora for every domain, this becomes a challenging task. In the current work, we address this problem by proposing an ensemble classifier that is trained on a general domain and adapts, without the need for additional ground truth, to the desired (test) domain before classifying a document. Our experiments are performed on three different datasets and the obtained results are compared with various baselines and state-of-the-art methods; we demonstrate that our model outperforms all out-of-domain trained baseline algorithms and is even comparable with different in-domain classifiers.

Adam Tsakalidis, Symeon Papadopoulos, Ioannis Kompatsiaris
A Faceted Crawler for the Twitter Service

Researchers nowadays have at their disposal valuable data from social networking applications, of which Twitter and Facebook are the most prominent examples. To retrieve this content, the Twitter service provides two distinct Application Programming Interfaces (APIs): a probe-based one and a streaming one, each of which imposes different limitations on the data collection process. In this paper, we present a general architecture to facilitate faceted crawling of the service, which simplifies retrieval. We give implementation details of our system, while providing a simple way to express the crawling process, i.e., the crawl flow. We experimentally evaluate it on a variety of faceted crawls, demonstrating its efficacy for the online medium.

George Valkanas, Antonia Saravanou, Dimitrios Gunopulos
Diversifying Microblog Posts

Microblogs have become an important source of information and a medium for following and spreading trends, news and ideas all over the world. As a result, microblog search has emerged as a new option for covering user information needs, especially with respect to timely events, news or trends. However, users are frequently overloaded by the high rate of produced microblog posts, which often carry no new information with respect to other similar posts. In this paper we propose a method that helps users effectively harvest information from a microblogging stream by filtering out redundant data and maximizing diversity among the displayed information. We introduce diversification criteria specific to microblog posts and apply them in heuristic diversification algorithms. We implement these methods in a prototype system that works with data from Twitter. The experimental evaluation demonstrates the effectiveness of applying our problem-specific diversification criteria, as opposed to applying plain content diversity to microblog posts.

Marios Koniaris, Giorgos Giannopoulos, Timos Sellis, Yiannis Vasileiou

Software Architectures and Platforms

MultiMasher: Providing Architectural Support and Visual Tools for Multi-device Mashups

The vast majority of web applications still assume a single user on a single device and provide fairly limited means for interaction across multiple devices. In particular, developing applications for multi-device environments is a challenging task for which there is little tool support. We present the architecture and tools of MultiMasher, a system for the development of multi-device web applications based on the reuse of existing web sites created for single device usage. Web sites and devices can be mashed up and accessed by multiple users simultaneously, with our tools ensuring a consistent state across all devices. MultiMasher supports the composition of arbitrary elements from any web site, inter-widget communication across devices, and awareness of connected devices. We present both conceptual and technical evaluations of MultiMasher including a study on 50 popular web sites demonstrating high compatibility in terms of browsing, distribution and linking of web site components.

Maria Husmann, Michael Nebeling, Stefano Pongelli, Moira C. Norrie
MindXpres: An Extensible Content-Driven Cross-Media Presentation Platform

Existing presentation tools and document formats show a number of shortcomings in terms of the management, visualisation and navigation of rich cross-media content. While slideware was originally designed for the production of physical transparencies, there is an increasing need for richer and more interactive media types. We investigate innovative forms of organising, visualising and navigating presentations. This includes the introduction of a new document format supporting the integration or transclusion of content from different presentations and cross-media sources as well as the non-linear navigation of presentations. We present MindXpres, a web technology-based extensible platform for content-driven cross-media presentations. The modular architecture and plug-in mechanism of MindXpres enable the reuse or integration of new visualisation and interaction components. Our MindXpres prototype forms a platform for the exploration and rapid prototyping of innovative concepts for presentation tools. Its support for multi-device user interfaces further enables an active participation of the audience which should ultimately result in more dynamic, engaging presentations and improved knowledge transfer.

Reinout Roels, Beat Signer
Open Cross-Document Linking and Browsing Based on a Visual Plug-in Architecture

Digital documents often do not exist in isolation but are implicitly or explicitly linked to parts of other documents. Nevertheless, most existing document formats only support links to web resources but not to parts of third-party documents. An open cross-document link service should address the multitude of existing document formats and be extensible to support emerging document formats and models. We present an architecture and prototype of an open cross-document link service and browser that is based on the RSL hypermedia metamodel. A main contribution is the specification and development of a visual plug-in solution that enables the integration of new document formats without requiring changes to the cross-document browser’s main user interface component. The presented visual plug-in mechanism makes use of the Open Service Gateway initiative (OSGi) specification for modularisation and plug-in extensibility and has been validated by developing data as well as visual plug-ins for a number of existing document formats.

Ahmed A. O. Tayeh, Beat Signer
Cost-Based Join Algorithm Selection in Hadoop

In recent years, MapReduce has become a popular computing framework for big data analysis. Join is a major query type for data analysis, and various algorithms have been designed to process join queries on top of Hadoop. Since the efficiency of the algorithms varies with the join task at hand, to achieve good performance users need to select an appropriate algorithm and use it with a proper configuration, which is rather difficult for many end users. This paper proposes a cost model to estimate the cost of four popular join algorithms. Based on the cost model, the system can automatically choose the join algorithm with the least cost and then derive reasonable configuration values for the chosen algorithm. Experimental results with the TPC-H benchmark verify that the proposed method can correctly choose the best join algorithm, and that the chosen algorithm achieves a speedup of around 1.25 times over the default join algorithm.

Jun Gu, Shu Peng, X. Sean Wang, Weixiong Rao, Min Yang, Yu Cao
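
A minimal sketch of the selection step described above: each candidate join algorithm gets a cost estimate from simple input statistics and the cheapest one is chosen. The cost formulas below are placeholders, not the paper's cost model.

    # Placeholder cost formulas; the paper derives real ones for four Hadoop join algorithms.
    def repartition_join_cost(stats):
        # Both inputs are shuffled across the network.
        return stats["left_bytes"] + stats["right_bytes"]

    def broadcast_join_cost(stats):
        # The smaller input is replicated to every node.
        return min(stats["left_bytes"], stats["right_bytes"]) * stats["num_nodes"]

    def semi_join_cost(stats):
        # Extra pass to project join keys, then a reduced shuffle (illustrative only).
        return 0.1 * stats["left_bytes"] + stats["right_bytes"]

    COST_MODELS = {
        "repartition": repartition_join_cost,
        "broadcast": broadcast_join_cost,
        "semi": semi_join_cost,
    }

    def choose_join_algorithm(stats):
        """Return (algorithm_name, estimated_cost) with the smallest estimated cost."""
        return min(((name, f(stats)) for name, f in COST_MODELS.items()),
                   key=lambda pair: pair[1])

    stats = {"left_bytes": 50e9, "right_bytes": 200e6, "num_nodes": 40}
    print(choose_join_algorithm(stats))
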
Consistent Freshness-Aware Caching for Multi-Object Requests

Dynamic websites rely on caching and clustering to achieve high performance and scalability. While queries benefit from middle-tier caching, updates introduce a distributed cache consistency problem. One promising approach to solving this problem is Freshness-Aware Caching (FAC): FAC tracks the freshness of cached data and allows clients to explicitly trade freshness of data for response times. The original protocol was limited to single-object lookups and could only handle complex requests if all requested objects had been loaded into the cache at the same time. In this paper we describe the Multi-Object Freshness-Aware Caching (MOFAC) algorithm, an extension of FAC that provides a consistent snapshot of multiple cached objects even if they are loaded and updated at different points in time. This is done by keeping track of their group valid interval, as introduced and defined in this paper. We have implemented MOFAC in the JBoss Java EE container so that it can provide freshness and consistency guarantees for cached Java beans. Our evaluation shows that these consistency guarantees come with a reasonable overhead and that MOFAC can provide significantly better read performance than cache invalidation in the case of concurrent updates and reads for multi-object requests.

Meena Rajani, Uwe Röhm, Akon Dey
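
A minimal sketch of the group valid interval idea, under assumed data structures rather than the JBoss implementation: each cached object carries the interval over which it is known to be valid, and a multi-object request can be served from the cache only if the intersection of those intervals is non-empty and fresh enough.

    from dataclasses import dataclass

    @dataclass
    class CachedObject:
        key: str
        value: object
        valid_from: float   # time the cached copy became valid
        valid_until: float  # time it is known to have been superseded (inf if still current)

    def group_valid_interval(objects):
        """Intersection of the objects' validity intervals; None if they never overlap."""
        start = max(o.valid_from for o in objects)
        end = min(o.valid_until for o in objects)
        return (start, end) if start <= end else None

    def serve_multi_object_request(objects, now, max_staleness):
        """Serve from cache only if a consistent snapshot exists that is fresh enough."""
        interval = group_valid_interval(objects)
        if interval is None:
            return "reload"            # no consistent snapshot in the cache
        _, end = interval
        if now - end > max_staleness:
            return "reload"            # snapshot exists but is too stale
        return "serve_from_cache"

    objs = [CachedObject("a", 1, 10.0, 25.0), CachedObject("b", 2, 12.0, float("inf"))]
    print(group_valid_interval(objs),
          serve_multi_object_request(objs, now=30.0, max_staleness=10.0))
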
ε-Controlled-Replicate: An Improved Controlled-Replicate Algorithm for Multi-way Spatial Join Processing on Map-Reduce

Gupta et al. [11] studied the problem of handling multi-way spatial join queries on the map-reduce platform and proposed the Controlled-Replicate algorithm for this purpose. In this paper we present ε-Controlled-Replicate, an improved Controlled-Replicate procedure for processing multi-way spatial join queries on map-reduce. We show that the ε-Controlled-Replicate algorithm presented in this paper incurs a significantly smaller communication cost than Controlled-Replicate. We discuss the details of the ε-Controlled-Replicate algorithm and, through an experimental study over synthetic as well as real-life California road datasets, show its efficacy vis-a-vis Controlled-Replicate.

Himanshu Gupta, Bhupesh Chawda
REST as an Alternative to WSRF: A Comparison Based on the WS-Agreement Standard

WS-Agreement and WS-Agreement Negotiation are specifications that define a protocol and a language to dynamically negotiate, renegotiate, create and monitor bilateral service level agreements in distributed systems. While both specifications are based on the Web Services Resource Framework standard, which allows the use of stateful SOAP services, the WSAG4J reference implementation additionally provides a RESTful service implementation of the same operations. This paper evaluates the performance disparity between the standard-conformant and the RESTful implementations of WS-Agreement and WS-Agreement Negotiation.

Florian Feigenbutz, Alexander Stanik, Andreas Kliem

Web Technologies and Frameworks

Enabling Cross-Platform Mobile Application Development: A Context-Aware Middleware

The emergence of mobile computing has changed the rules of web application development. Since context-awareness has become almost a necessity in mobile applications, web applications need to adapt to this new reality. A universal development approach for context-aware applications is inherently complex due to the requirement to manage diverse context information from different sources and at different levels of granularity. A context middleware can be a key enabler for adaptive applications, since it can hide the complexity of context management functions, promote reusability and enable modularity and extensibility in developing context-aware applications. In this paper we present our work on a cross-platform framework that fulfils the above. We elaborate on the need for cross-platform support in context-aware web application development for mobile computing environments, identifying gaps in the current state of context support. The paper introduces the architecture of the middleware that fills these gaps and provides examples of its main components. An evaluation based on the development of a prototype web-based context-aware application is detailed. The application is compared against an analogous hybrid mobile application, showing the evolutionary potential introduced via the middleware in delivering context-aware mobile applications.

Achilleas P. Achilleos, Georgia M. Kapitsaki
GEAP: A Generic Approach to Predicting Workload Bursts for Web Hosted Events

A number of recent research contributions in workload forecasting aim to confront the challenge facing many web applications of maintaining QoS in the face of fluctuating workload. Many of these demonstrate good prediction accuracy for periodic and long-term workload trends, but they exhibit poor accuracy when faced with predicting the magnitude, profile, and time of non-periodic bursts. It is such workload bursts that have been known to bring down numerous e-commerce and other web-based systems during events like online sales, as well as product and result announcements. In this paper, we leverage the implicit link that often exists between such events and workload bursts, and we contribute: a generic approach that can use a given event's definition to forecast the time, magnitude and profile of the event's associated workload burst; a burst prediction accuracy metric for evaluating the efficacy of burst prediction methods; and an evaluation to showcase the generic applicability of event-aware prediction across multiple domains, using real workload traces from three different domains.

Matthew Sladescu, Alan Fekete, Kevin Lee, Anna Liu
High-Payload Image-Hiding Scheme Based on Best-Block Matching and Multi-layered Syndrome-Trellis Codes

Image steganography has been widely used in the domain of privacy protection, such as the storage and transmission of secret images. This paper presents a novel image-hiding scheme to embed a secret image into an innocent cover image. We use a block matching procedure to search for the best-matching block for each secret image block. Then the k-means clustering method is used to select k representative blocks for the not-well-matched secret image blocks. Moreover, the bases and indexes of the well-matched blocks, together with the representative blocks, are compressed by Huffman coding. Finally, we use multi-layered syndrome-trellis codes (STC) to embed all the relevant data into multiple least significant bit (LSB) planes of the cover image. Experimental results show that the proposed method outperforms several previous image-hiding methods.

Tao Han, Jinlong Fei, Shengli Liu, Xi Chen, Yuefei Zhu
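
A minimal sketch of the best-block matching step, assuming an 8x8 block size and an arbitrary matching threshold: secret-image blocks whose closest cover block falls below the threshold are stored as indexes, while the rest would be summarized by k-means representatives.

    import numpy as np

    BLOCK = 8          # block size (assumption)
    MATCH_THRESH = 40  # mean-squared-error threshold for a "well matched" block (assumption)

    def to_blocks(img):
        h, w = img.shape
        return [img[i:i+BLOCK, j:j+BLOCK]
                for i in range(0, h - BLOCK + 1, BLOCK)
                for j in range(0, w - BLOCK + 1, BLOCK)]

    def best_match(secret_block, cover_blocks):
        """Index and MSE of the cover block closest to the secret block."""
        errors = [np.mean((secret_block.astype(float) - c.astype(float)) ** 2)
                  for c in cover_blocks]
        idx = int(np.argmin(errors))
        return idx, errors[idx]

    def split_blocks(secret_img, cover_img):
        """Separate well-matched secret blocks (stored as cover indexes) from the rest,
        which would later be summarized by k-means representatives."""
        cover_blocks = to_blocks(cover_img)
        matched, unmatched = [], []
        for block in to_blocks(secret_img):
            idx, mse = best_match(block, cover_blocks)
            (matched if mse <= MATCH_THRESH else unmatched).append((idx, block))
        return matched, unmatched

    rng = np.random.default_rng(0)
    secret = rng.integers(0, 256, (32, 32), dtype=np.uint8)
    cover = rng.integers(0, 256, (64, 64), dtype=np.uint8)
    cover[:32, :32] = secret  # plant exact copies so some blocks match well
    m, u = split_blocks(secret, cover)
    print(len(m), "well matched,", len(u), "left for k-means representatives")
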
Educational Forums at a Glance: Topic Extraction and Selection

Web forums play a key role in the process of knowledge creation, providing means for users to exchange ideas and collaborate. However, educational forums, along with several other online educational environments, often suffer from topic disruption. Since the contents are mainly produced by participants (in our case learners), one or a few individuals might change the course of the discussions. Thus, realigning the discussed topics of a forum thread is a task often carried out by a tutor or moderator. In order to support learners and tutors in keeping forum discussions aligned with a given lecture or course, we present in this paper a method that combines semantic technologies and a statistical method to find and expose relevant topics to be discussed in online discussion forums. We surveyed the outcomes of our topic extraction and selection method with students, professors and university staff members. The results suggest the potential usability of the method and its potential applicability in real learning scenarios.

Bernardo Pereira Nunes, Ricardo Kawase, Besnik Fetahu, Marco A. Casanova, Gilda Helena B. de Campos
PDist-RIA Crawler: A Peer-to-Peer Distributed Crawler for Rich Internet Applications

Crawling Rich Internet Applications (RIAs) is important to ensure their security and accessibility and to index them for searching. To crawl a RIA, the crawler has to reach every application state and execute every application event; on a large RIA, this operation takes a long time. The previously published GDist-RIA Crawler proposes a distributed architecture to parallelize the task of crawling RIAs and run the crawl over multiple computers to reduce time. In the GDist-RIA Crawler, a centralized unit calculates the next task to execute, and tasks are dispatched to worker nodes for execution. This architecture is not scalable, as the centralized unit is bound to become a bottleneck as the number of nodes increases. This paper extends the GDist-RIA Crawler and proposes a fully peer-to-peer and scalable architecture to crawl RIAs, called the PDist-RIA Crawler. PDist-RIA does not have the same limitations in terms of scalability while matching the performance of GDist-RIA. We describe a prototype showing the scalability and performance of the proposed solution.

Seyed M. Mirtaheri, Gregor V. Bochmann, Guy-Vincent Jourdan, Iosif Viorel Onut
Understand the City Better: Multimodal Aspect-Opinion Summarization for Travel

Every city has a unique taste and attracts tourists from all over the world to experience it personally. People like to share their opinions on the scenic spots of a city via the Internet after a wonderful journey, and this content has become an important information source for people planning their travels. Confronted with the ever-increasing amount of multimedia content, it is desirable to provide visualized summarization to quickly grasp the essential aspects of the scenic spots. To better understand the city, we propose a novel framework termed multimodal aspect-opinion summarization (MAOS) to discover aspect-opinions about popular scenic spots. We develop a three-step solution to generate the multimodal summary. We first select important informative sentences from reviews and then identify the aspects from the selected sentences. Finally, relevant and representative images from travelogues are picked out to visualize the aspect opinions. We have conducted extensive experiments on a real-world travel and review dataset to demonstrate the effectiveness of our proposed method against state-of-the-art approaches.

Ting Wang, Changqing Bai
Event Processing over a Distributed JSON Store: Design and Performance

Web applications are increasingly built to target both desktop and mobile users. As a result, modern Web development infrastructure must be able to process large numbers of events (e.g., for location-based features) and support analytics over those events, with applications ranging from banking (e.g., fraud detection) to retail (e.g., just-in-time personalized promotions). We describe a system specifically designed for those applications, allowing high-throughput event processing along with analytics. Our main contribution is the design and implementation of an in-memory JSON store that can handle both events and analytics workloads. The store relies on the JSON model in order to serve data through a common Web API. Thanks to the flexibility of the JSON model, the store can integrate data from systems of record (e.g., customer profiles) with data transmitted between the server and a large number of clients (e.g., location-based events or transactions). The proposed store is built over a distributed, transactional, in-memory object cache for performance. Our experiments show that our implementation handles high throughput and low latency without sacrificing scalability.

Miki Enoki, Jérôme Siméon, Hiroshi Horii, Martin Hirzel
Cleaning Environmental Sensing Data Streams Based on Individual Sensor Reliability

Environmental sensing is becoming a significant way of understanding and transforming the environment, given recent technology advances in the Internet of Things (IoT). Current environmental sensing projects typically deploy commodity sensors, which are known to be unreliable and prone to producing noisy and erroneous data. Unfortunately, the accuracy of current cleaning techniques based on mean or median prediction is unsatisfactory. In this paper, we propose a cleaning method based on incrementally adjusted individual sensor reliabilities, called influence mean cleaning (IMC). By incrementally adjusting sensor reliabilities, our approach can discover latent sensor reliability values in a data stream and improve reliability-weighted prediction even in a sensor network with changing conditions. Experimental results based on both synthetic and real datasets show that our approach achieves higher accuracy than mean- and median-based approaches after some initial adjustment iterations.

Yihong Zhang, Claudia Szabo, Quan Z. Sheng
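
A toy sketch of reliability-weighted cleaning in the spirit of the approach above (the actual IMC update rule is not reproduced here): predictions are reliability-weighted means of neighboring readings, and each sensor's reliability is nudged according to how far it deviates from the prediction.

    class ReliabilityWeightedCleaner:
        """Toy version of reliability-weighted cleaning: the predicted value is a
        weighted mean of sensor readings, and each sensor's reliability is nudged
        up or down depending on how far it sits from that prediction.
        The update rule and learning rate are assumptions, not the paper's IMC."""

        def __init__(self, sensor_ids, lr=0.05):
            self.reliability = {s: 1.0 for s in sensor_ids}
            self.lr = lr

        def predict(self, readings):
            """readings: sensor_id -> value. Reliability-weighted mean."""
            total_w = sum(self.reliability[s] for s in readings)
            return sum(self.reliability[s] * v for s, v in readings.items()) / total_w

        def update(self, readings):
            """Incrementally adjust reliabilities after seeing one time step."""
            pred = self.predict(readings)
            spread = max(abs(v - pred) for v in readings.values()) or 1.0
            for s, v in readings.items():
                error = abs(v - pred) / spread   # 0 = agrees, 1 = worst outlier
                self.reliability[s] = max(0.01, self.reliability[s] + self.lr * (0.5 - error))
            return pred

    cleaner = ReliabilityWeightedCleaner(["s1", "s2", "s3"])
    for _ in range(20):  # s3 is a faulty sensor reporting values far off the others
        cleaner.update({"s1": 20.1, "s2": 19.8, "s3": 35.0})
    print(cleaner.reliability)
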
Managing Incentives in Social Computing Systems with PRINGL

Novel web-based socio-technical systems require incentives for efficient management and motivation of human workers taking part in complex collaborations. Incentive management techniques used in existing crowdsourcing platforms are not suitable for intellectually challenging tasks; platform-specific solutions prevent workers from comparing working conditions across different platforms and prevent platform owners from attracting skilled workers. In this paper we present PRINGL, a domain-specific language for programming complex incentive strategies. It promotes reuse of proven incentive logic and allows the composition of complex incentives suitable for novel types of socio-technical systems. We illustrate its applicability and expressiveness and discuss its properties and limitations.

Ognjen Scekic, Hong-Linh Truong, Schahram Dustdar
Consumer Monitoring of Infrastructure Performance in a Public Cloud

Many web information systems and applications are now run as cloud-hosted systems. The organization that owns the information system or application is thus a consumer of cloud services, and often relies on the cloud provider to monitor the virtual infrastructure and alert them of any disruption of the offered services. For example, Amazon Web Services' cloud disruptions are announced by the cloud provider on a dedicated RSS feed so that consumers can watch and act quickly. In this paper, we report on a long-running experiment for the monitoring and continuous benchmarking of a number of cloud resources on the Amazon Cloud from a consumer's perspective, aiming to check whether the service disruptions announced by the cloud provider are consistent with what we observe. We evaluate the performance of cloud resources over several months. We find that the performance of the cloud can vary significantly over time, which leads to unpredictable application performance. Our analysis also shows that continuous benchmarking data can help detect failures before any announcement is made by the provider, as well as significant degradations of performance that are not always connected with Amazon service disruption announcements.

Rabia Chaudry, Adnene Guabtni, Alan Fekete, Len Bass, Anna Liu
Business Export Orientation Detection through Web Content Analysis

Economic indicators are essential for economic studies, forecasts and economic policy design. To meet their objective, they should be available in a frequent and timely fashion. However, official data are usually released with a long delay. Web-based economic indicators can be made available on a real-time basis, thus contributing to alleviating this lag. Among all economic indicators, those related to the export orientation of companies are particularly interesting because of the growing importance of international trade to most developed countries.

This paper proposes a prediction system that analyzes corporate websites to produce web-based economic indicators for the export orientation of companies. To validate our approach, we compared the prediction accuracy of our model to a baseline model built by manually retrieving web indicators from 350 corporate websites. Our results show that the proposed prediction system captures most of the prediction accuracy of the model with manual web indicators, but at a minimal processing cost.

Desamparados Blazquez, Josep Domenech, Jose A. Gil, Ana Pont

Web Innovation and Applications

Towards Real Time Contextual Advertising

The practice of placing advertisements relevant to a target webpage's subject matter is called contextual advertising. Placement of such ads can lead to an improved user experience and increased revenue for the webpage owner, advertisement network and advertiser. The selection of these advertisements is done online by the advertisement network. Empirically, we have found that such advertisements are rendered later than the other content of the webpage, which lowers the quality of the user experience and lessens the impact of the ads. We propose an offline method of contextual advertising in which a website is classified into a particular category according to a given taxonomy. Upon a request from any web page under its domain, an advertisement is served from a pool of advertisements that are also classified according to the taxonomy. Experiments suggest that this approach is a viable alternative to the current form of contextual advertising.

Abhimanyu Panwar, Iosif-Viorel Onut, James Miller
On String Prioritization in Web-Based User Interface Localization

We have noticed that most of the current challenges affecting user interface localization could be easily approached if string prioritization were possible. In this paper, we tackle these challenges through Nimrod, a web-based internationalization tool that prioritizes user interface strings using a number of discriminative features. As a practical application, we investigate different prioritization strategies for different string categories from Wordpress, a popular open-source content management system with a large message catalog. Further, we contribute WPLoc, a carefully annotated dataset, so that others can reproduce our experiments and build upon this work. Strings in the WPLoc dataset are labeled as relevant and non-relevant, where relevant strings are in turn categorized as critical, informative, or navigational. Using state-of-the-art classifiers, we are able to retrieve strings in these categories with competitive accuracy. Nimrod and the WPLoc dataset are both publicly available for download.

Luis A. Leiva, Vicent Alabau
Affective, Linguistic and Topic Patterns in Online Autism Communities

Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper studies the characteristics of online autism communities (called Clinical) in comparison with other online communities (called Control), using data from 110 Live Journal weblog communities. Using machine learning techniques, we comprehensively analyze these online autism communities. We study three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the clinical group has lower valence, indicative of poorer moods than in the control group. Topics and language styles are shown to be good predictors of autism posts. The results show the potential of social media in medical studies for a broad range of purposes, such as screening, monitoring and subsequently providing support for online communities of individuals with special needs.

Thin Nguyen, Thi Duong, Dinh Phung, Svetha Venkatesh
A Product-Customer Matching Framework for Web 2.0 Applications

Finding matching customers for a product is critical in many applications, especially in the e-commerce field. In this paper, we propose a novel product-customer matching framework to handle this issue, which consists of two components: data preprocessing and query processing. During the data preprocessing phase, a generation rule is proposed to learn each user's preference. With the spread of Web 2.0 applications, users like to rate products they have experienced on social applications such as Dianping and Yelp, so it is possible to construct users' preferences based on their rating information. In the query processing phase, we first propose the Top-k-Ranks Query, which integrates the reverse top-k query and the reverse k-ranks query to find users who match the query product, and then devise an efficient method (BBPA) to handle this new query. Finally, we evaluate the efficiency and effectiveness of our matching framework on real and synthetic datasets, showing that it works well in finding matching users for a query product.

Qiangqiang Kang, Zhao Zhang, Cheqing Jin, Aoying Zhou
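
A brute-force sketch of the reverse top-k building block used by such queries, with hypothetical linear preference weights: for every user, the query product's rank under that user's weights is computed, and users for whom it lands in the top k are returned.

    import numpy as np

    def reverse_top_k(products, query_idx, user_weights, k):
        """Brute-force reverse top-k: return the users for whom the query product
        ranks within their personal top-k under a linear scoring function.
        products: (n_products, n_attrs); user_weights: (n_users, n_attrs)."""
        scores = user_weights @ products.T            # (n_users, n_products)
        query_scores = scores[:, query_idx]
        # Rank of the query product for each user (1 = that user's best product).
        ranks = (scores > query_scores[:, None]).sum(axis=1) + 1
        return np.flatnonzero(ranks <= k)

    rng = np.random.default_rng(1)
    products = rng.random((100, 3))      # e.g. normalized price, rating, distance
    user_weights = rng.random((500, 3))  # preferences learned from past ratings
    print(reverse_top_k(products, query_idx=7, user_weights=user_weights, k=10))
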
Rapid Development of Interactive Applications Based on Online Social Networks

Online social networks, like Twitter or Google+, are widely used for all kinds of purposes, and the proliferation of smartphones enables their use anywhere, anytime. The instant messaging capabilities of these services are used in an ad-hoc way for social activities, like organizing meetings or gathering preferences among a group of friends, or as a means to contact community managers of companies or services.

Provided with automation mechanisms, posts (messages in social networks) can be used as a dialogue mechanism between users and computer applications. In this paper we propose the concept of the post-based application, an application that uses short messages as a medium to obtain input commands from users and produce outputs, and we describe several scenarios where such applications are of interest. In addition, we provide an automated, Model-Driven Engineering approach (currently targeting Twitter) for their rapid construction, including dedicated Domain-Specific Languages to express the parts of interest to be detected in posts, and to query matched posts, aggregate information or synthesize posts.

Ángel Mora Segura, Juan de Lara, Jesús Sánchez Cuadrado
Introducing the Public Transport Domain to the Web of Data

The public transport domain generates large amounts of structured data. Making that information available on the Web of Data and linking it to other data sources can enable new services and applications for the benefit of passengers as well as public transport providers. Most standard data models in the public transport domain lack explicit semantics and interoperability because they are modeled in an informal or implementation-centric way. In this paper, we describe the development process of an OWL ontology based on existing data models and standards in the domain. We show that our ontology enables the development of advanced passenger information systems, and we briefly illustrate its application in a tourism-themed prototype.

Christine Keller, Sören Brunk, Thomas Schlegel
Measuring and Mitigating Product Data Inaccuracy in Online Retailing

Driven by the proliferation of smartphones and e-commerce, consumers rely more on online product information to make purchasing decisions. Beyond price comparisons, consumers want to know more about feature differences between similar products. However, these comparisons require rich and accurate product data. As one of the first studies to do so, we quantify how accurate online product data is today and evaluate existing approaches to mitigating inaccuracy. The results show that accuracy varies a lot across different Web sites and can be as low as 20%. However, when aggregating product information across different Web pages, accuracy can be improved on average by 11.3%. Based on this analysis, we propose an attribute-based authentication approach based on Semantic Web technologies to further mitigate online data inaccuracy.

Runhua Xu, Alexander Ilic
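
A minimal sketch of cross-site aggregation using per-attribute majority voting; this is a simple baseline for illustration, not the attribute-based authentication approach proposed in the paper.

    from collections import Counter

    def aggregate_product_record(records):
        """records: list of attribute dicts for the same product scraped from different
        sites. For each attribute, keep the majority value across sites (a simple
        aggregation baseline)."""
        merged = {}
        attributes = {a for r in records for a in r}
        for attr in attributes:
            values = [r[attr] for r in records if attr in r]
            merged[attr] = Counter(values).most_common(1)[0][0]
        return merged

    records = [
        {"weight_g": "180", "display_in": "5.1", "battery_mah": "2800"},
        {"weight_g": "180", "display_in": "5.2", "battery_mah": "2800"},
        {"weight_g": "175", "display_in": "5.1"},
    ]
    print(aggregate_product_record(records))
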

Challenge

WISE 2014 Challenge: Multi-label Classification of Print Media Articles to Topics

The WISE 2014 challenge was concerned with the task of multi-label classification of articles coming from Greek print media. The raw data come from the scanning of print media, article segmentation, and optical character recognition, and are therefore quite noisy. Each article is examined by a human annotator and categorized into one or more of the topics being monitored. Topics range from specific persons, products, and companies that can be easily categorized based on keywords, to more general semantic concepts, such as environment or economy. Building multi-label classifiers for the automated annotation of articles with topics can support the work of human annotators by suggesting a list of all topics in order of relevance, or even automate the annotation process for media and/or categories that are easier to predict. This saves valuable time and allows a media monitoring company to expand the portfolio of media being monitored. This paper summarizes the approaches of the top 4 among the 121 teams that participated in the competition.

Grigorios Tsoumakas, Apostolos Papadopoulos, Weining Qian, Stavros Vologiannidis, Alexander D’yakonov, Antti Puurula, Jesse Read, Jan Švec, Stanislav Semenov
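
A standard multi-label baseline for this kind of task, sketched with scikit-learn (TF-IDF features and a one-vs-rest logistic regression); it illustrates the setup only and is not one of the winning teams' approaches.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    # Toy articles and topic labels; the real challenge data are TF-IDF vectors of
    # noisy OCR text over a few hundred topics.
    articles = ["central bank lowers interest rates again",
                "new smartphone model announced by vendor",
                "rates and inflation worry the markets"]
    labels = [{"economy"}, {"technology", "products"}, {"economy"}]

    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(labels)

    model = make_pipeline(
        TfidfVectorizer(),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(articles, y)

    pred = model.predict(["bank announces new interest rates"])
    print(mlb.inverse_transform(pred))
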
Erratum: MultiMasher: Providing Architectural Support and Visual Tools for Multi-device Mashups

In the original version, the authors of reference [5] were listed incorrectly. The reference should read as follows:

Ghiani, G., Paternò, F., Spano, L.D.: Creating Mashups by Direct Manipulation of Existing Web Applications. In: Costabile, M.F., Dittrich, Y., Fischer, G., Piccinno, A. (eds.) IS-EUD 2011. LNCS, vol. 6654, pp. 42–52. Springer, Heidelberg (2011)

Maria Husmann, Michael Nebeling, Stefano Pongelli, Moira C. Norrie
Backmatter
Metadata
Title
Web Information Systems Engineering – WISE 2014
Edited by
Boualem Benatallah
Azer Bestavros
Yannis Manolopoulos
Athena Vakali
Yanchun Zhang
Copyright year
2014
Publisher
Springer International Publishing
Electronic ISBN
978-3-319-11746-1
Print ISBN
978-3-319-11745-4
DOI
https://doi.org/10.1007/978-3-319-11746-1