2017 | Book

Web Information Systems Engineering – WISE 2017

18th International Conference, Puschino, Russia, October 7-11, 2017, Proceedings, Part II

Edited by: Athman Bouguettaya, Prof. Yunjun Gao, Andrey Klimenko, Lu Chen, Xiangliang Zhang, Fedor Dzerzhinskiy, Weijia Jia, Stanislav V. Klimenko, Qing Li

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

The two-volume set LNCS 10569 and LNCS 10570 constitutes the proceedings of the 18th International Conference on Web Information Systems Engineering, WISE 2017, held in Puschino, Russia, in October 2017.

The 49 full papers and 24 short papers presented were carefully reviewed and selected from 195 submissions. The papers cover a wide range of topics such as microblog data analysis, social network data analysis, data mining, pattern mining, event detection, cloud computing, query processing, spatial and temporal data, graph theory, crowdsourcing and crowdsensing, web data model, language processing and web protocols, web-based applications, data storage and generator, security and privacy, sentiment analysis, and recommender systems.

Table of contents

Frontmatter

Crowdsourcing and Crowdsensing

Frontmatter
Real-Time Target Tracking Through Mobile Crowdsensing

To track a single target in real time across a large area, we propose a novel method that combines mobile crowdsensing with existing sparse camera networks. Tracking proceeds through reports, which come either from cameras or from smartphone users. Intra-camera tracking is performed on selected cameras to identify the target, and smartphone users can report with a live photo or text when they see the target. This scheme greatly helps track the target within blind areas and increases the accuracy of target identification, owing to the better identification ability of human eyes. Novel validation and correction mechanisms are designed to eliminate false reports, which ensures the robustness of our method. Compared with traditional cross-camera tracking methods, our design can be performed in real time with better performance even if the target’s appearance changes during tracking. Simulations using the road structures of our university validate the accuracy and robustness of our design.

Jinyu Shi, Weijia Jia
Crowdsourced Entity Alignment: A Decision Theory Based Approach

Crowdsourcing is a new computation paradigm that utilizes the wisdom of the crowd to solve problems that are difficult for computers (e.g., image annotation and entity alignment). In crowdsourced entity alignment tasks, there are usually large numbers of candidate pairs to be verified by the crowd workers, and each pair is assigned to multiple workers to achieve high quality. Thus, two fundamental problems arise: (1) question selection – what are the most beneficial questions that should be crowdsourced, and (2) question assignment – which workers should be assigned to answer a selected question? In this paper, we address these two problems using decision theory. First, we define the problems under two budget constraints. The first takes the marginal gain into account, and the second focuses on the limited budget. Then, we formulate the decision-making problems under the different budget constraints and build an influence diagram to perform result inference. We propose two efficient algorithms to address these two problems. Finally, we conduct extensive experiments to validate the efficiency and effectiveness of our proposed algorithms on both synthetic and real data.

Yan Zhuang, Guoliang Li, Jianhua Feng
A QoS-Aware Online Incentive Mechanism for Mobile Crowd Sensing

Mobile crowd sensing has emerged as a compelling paradigm for providing sensing data to web information systems. A number of incentive mechanisms have been proposed to stimulate smartphone users’ participation. The vast majority of this work fails to take QoS into consideration, although QoS is of paramount importance as a standard criterion for mobile crowd sensing applications. In this paper, we propose a QoS-aware online incentive mechanism for maximizing the social welfare. In consideration of the dynamics, we design an approximation algorithm with a $\frac{1}{2}$-competitive ratio to solve the online allocation problem. We conduct rigorous theoretical analysis and extensive experimental simulations, demonstrating that the proposed mechanism achieves truthfulness, individual rationality, high computational efficiency and a low overpayment ratio.

Hui Cai, Yanmin Zhu, Jiadi Yu
Iterative Reduction Worker Filtering for Crowdsourced Label Aggregation

Quality control has been an important issue in crowdsourcing. In label collection tasks, for a given question, requesters usually aggregate the redundant answers given by multiple workers to obtain a reliable answer. Researchers have proposed various statistical approaches for this crowd label aggregation problem. Intuitively, these approaches can generate aggregation results of higher quality if the ability of the set of workers is higher. To select a set of workers who are likely to have higher ability without additional effort from the requesters, and in contrast to existing solutions that need to design a proper qualification test or use auxiliary information, we propose an iterative reduction approach for worker filtering that leverages the similarity of two workers. The worker similarity we select is feasible for practical cases with incomplete labels. We conduct experiments on both synthetic and real datasets to verify the effectiveness of our approach and discuss its capability in different cases.

Jiyi Li, Hisashi Kashima

Web Data Model

Frontmatter
Semantic Web Datatype Inference: Towards Better RDF Matching

In the context of RDF document matching/integration, the datatype information, which is related to literal objects, is an important aspect to be analyzed in order to better determine similar RDF documents. In this paper, we propose a datatype inference process based on four steps: (i) predicate information analysis (i.e., deduce the datatype from an existing range property); (ii) analysis of the object value itself by a pattern-matching process (i.e., recognize the object’s lexical space); (iii) semantic analysis of the predicate name and its context; and (iv) generalization of numeric and binary datatypes to ensure the integration. We evaluated the performance and the accuracy of our approach with datasets from DBpedia. Results show that the execution time of the inference process is linear and that its accuracy can reach up to 97.10%.

Irvin Dongo, Yudith Cardinale, Firas Al-Khalil, Richard Chbeir
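
Step (ii) of the abstract above, recognizing a literal's lexical space by pattern matching, can be illustrated with a minimal Python sketch. The patterns, the datatype list, and the function name `infer_datatype` are assumptions chosen for illustration; the abstract does not give the paper's actual rules.

```python
import re

# Hypothetical, deliberately simplified lexical patterns (an assumption:
# the paper's real pattern set is not given in the abstract). Order
# matters: the first matching pattern wins.
PATTERNS = [
    ("xsd:boolean", re.compile(r"^(true|false)$")),
    ("xsd:integer", re.compile(r"^[+-]?\d+$")),
    ("xsd:decimal", re.compile(r"^[+-]?\d*\.\d+$")),
    ("xsd:date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def infer_datatype(literal: str) -> str:
    """Guess a datatype for an untyped literal from its lexical form."""
    value = literal.strip()
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    # Fall back to the most general datatype when nothing matches.
    return "xsd:string"


if __name__ == "__main__":
    for text in ["42", "3.14", "2017-10-07", "true", "Puschino"]:
        print(text, "->", infer_datatype(text))
```

A real implementation would also need the other three steps (range properties, predicate semantics, and generalization), which a purely lexical check cannot replace.
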
Cross-Cultural Web Usability Model

Research shows that different user interfaces are needed for successful communication with different cultural groups, yet studies on cross-cultural website usability are limited. This research works towards creating a culturally sensitive world wide web by addressing the gap with a novel cross-cultural website usability model. The authors’ prior work evaluated Australian, Chinese, and Saudi Arabian web pages and revealed significant differences in the use of web attributes including: layout, navigation, links, multimedia, visual representation, colour and text. This paper extends those findings by mapping the usage of web attributes with theories of culture to create website design guidelines and a usability measuring instrument. The development of this model includes: evaluation of element use, identification of prominent elements, organisation of cultural factors, organisation of HCI factors, development of design guidelines and development of the usability measuring instrument. This model simplifies the creation of cross-cultural websites, while enabling developers to evaluate page usability for different cultures.

Rukshan Alexander, David Murray, Nik Thompson
How Fair Is Your Network to New and Old Objects?: A Modeling of Object Selection in Web Based User-Object Networks

After the success of Web 2.0, many web services have become easily available and accessible. This has led to a rapid growth of both web users and objects. In this paper, we propose a growth model for user-object bipartite networks that describes the selection pattern of web objects. Here, both users and objects grow, but edges evolve only from the object set. The network evolves by the arrival of external edges brought by new objects and/or internal edges created by old objects. Attachment of these edges to the users is either purely preferential to the degree of the users or purely random. We evaluate our proposed model using six real-world user-object bipartite networks. The results show good agreement between real data, model and simulation. We propose a novel technique to compute the number of preferential and random external and internal edges at each time step during the evolution of the network. Interesting inferences are reported after analysing and comparing different parameters of the model.

Anita Chandra, Himanshu Garg, Abyayananda Maiti
Modeling Complementary Relationships of Cross-Category Products for Personal Ranking

The category of a product acts as its label. It also exemplifies users’ various needs and tastes. Existing recommender systems focus on recommending similar products, with little or no attempt to investigate cross-category and complementary relationships between categories and products. In this paper, a novel method based on Bayesian Personalized Ranking (BPR) is proposed to integrate the complementary information between categories and the latent features of both users and items for better recommendation. By considering category dimensions explicitly, the model can alleviate the cold start issue and give recommendations that consider not only traditional similarity measures but also complementary relationships between products. The method is evaluated comprehensively, and the experimental results illustrate that our work improves ranking significantly (with high recommendation performance).

Wenli Yu, Li Li, Fei Hu, Fan Li, Jinjing Zhang

Language Processing and Web Protocols

Frontmatter
Eliminating Incorrect Cross-Language Links in Wikipedia

Many Wikipedia articles that cover the same topic in different language editions are interconnected via cross-language links that enable the understanding of topics in multiple languages, as well as cross-language information retrieval applications. However, cross-language links are added manually by the users of Wikipedia and, as such, are often incorrect. In this paper, we propose an approach to automatically eliminate incorrect cross-language links based on the observation that groups of articles that are pairwise connected through cross-language links form independent connected components. For each incoherent component (i.e., one that contains two or more articles from the same language edition), our approach assigns a correctness score to its crosslinks and removes those with the lowest score to make the component coherent. The results of our evaluation on a snapshot of Wikipedia in 8 languages indicate that our approach shows quantitative promise.

Nacéra Bennacer, Francesca Bugiotti, Jorge Galicia, Mariana Patricio, Gianluca Quercini
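
The notion of an incoherent component in the abstract above can be made concrete with a short Python sketch: articles are grouped into connected components with a union-find structure, and a component is flagged when it contains two or more articles from the same language edition. The `(language, title)` representation and the function names are assumptions for illustration; the correctness scoring and link removal described in the abstract are not reproduced here.

```python
from collections import defaultdict


def find_components(links):
    """Group articles into connected components of the cross-language link graph.

    `links` is an iterable of pairs of articles; each article is a
    hypothetical (language, title) tuple.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def is_incoherent(component):
    """True if the component holds two or more articles from one language edition."""
    languages = [language for language, _title in component]
    return len(languages) != len(set(languages))


if __name__ == "__main__":
    links = [
        (("en", "Java"), ("fr", "Java")),
        (("fr", "Java"), ("en", "Java (island)")),  # suspicious link
        (("en", "Paris"), ("de", "Paris")),
    ]
    for component in find_components(links):
        print(sorted(component), "incoherent:", is_incoherent(component))
```
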
Combining Local and Global Features in Supervised Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the task of identifying the sense of a polysemous word in a given context. Recently, word embeddings have been applied to WSD as additional input features of a supervised classifier. However, previous approaches use word embeddings narrowly, only to represent the words surrounding the target word. They may not make sufficient use of word embeddings in representing different features such as dependency relations, word order and global contexts (the whole document). In this work, we combine local and global features to perform WSD. We explore utilizing word embeddings to leverage word order and dependency features. We also use word embeddings to represent global contexts as global features. We conduct experiments to evaluate our methods and find that they outperform the state-of-the-art methods on Lexical Sample WSD datasets.

Xue Lei, Yi Cai, Qing Li, Haoran Xie, Ho-fung Leung, Fu Lee Wang
A Concurrent Interdependent Service Level Agreement Negotiation Protocol in Dynamic Service-Oriented Computing Environments

Service Level Agreement (SLA) negotiations help define the quality of service in order to meet the customer’s service requirements. To date, a large number of negotiation protocols have been proposed to handle single SLA negotiations, but little work can be found on handling multiple interdependent SLA negotiations in dynamic negotiation environments. This paper proposes an adaptive protocol for concurrently handling multiple interdependent SLA negotiations in dynamic environments. First, interdependencies between SLA negotiations are represented by a graph-based model. Then, an updating mechanism is proposed to handle the dynamism of multiple SLA negotiations. By applying the proposed updating mechanism, a protocol for concurrently processing SLA negotiations in dynamic environments with unexpected changes of service requests is presented. Experimental results show that the proposed approach can effectively handle unexpected changes of service requests from customers in dynamic environments, and successfully lead multiple SLA negotiations to agreements aligning with customers.

Lei Niu, Fenghui Ren, Minjie Zhang
A New Static Web Caching Mechanism Based on Mutual Dependency Between Result Cache and Posting List Cache

Caching is an important optimization technique in search engine architectures. There exist various types of caches, such as the result cache, posting list cache, intersection cache, snippet cache, and document cache. However, these caching techniques have been studied separately. Although several multi-level caches that integrate different types of caches have been proposed, the relationships among the different caches are ignored. In this paper, we study the mutual dependency between the result cache and the posting list cache via empirical experiments and observe duplicate hits in the two types of caches. In order to better utilize the cache space and increase the hit ratio, three algorithms are proposed to implement a static cache mechanism based on the mutual dependency between the result cache and the posting list cache. A series of experiments conducted on a real data set demonstrates that our proposals improve the hit ratio.

Thanh Trinh, Dingming Wu, Joshua Zhexue Huang

Web-Based Applications

Frontmatter
A Large-Scale Visual Check-In System for TV Content-Aware Web with Client-Side Video Analysis Offloading

The intuitive linkage between TV and the web brings about new opportunities to motivate people to watch video content or visit websites. A check-in system that recognizes which specific programs are being watched by users is highly effective in promoting TV content. However, such a check-in system faces two technical problems: the temporal characteristics of broadcasting media, resulting in a massive number of simultaneous check-in requests, and the wide variation of audience environments, such as lighting, cameras, and TV devices. We propose a visual check-in system for linking websites and TV programs. The system identifies what program a user is watching by analyzing the visual features of a video captured with a smartphone. The key technology is a real-time video analysis framework that achieves both scalability to an enormous number of simultaneous requests and practical robustness in terms of content identification. We have constructed a special color scheme consisting of 120 (non-neutral) colors to absorb differences in the illumination levels of user environments. This color scheme plays an important role in offloading video analysis tasks onto the client-side in a tamper-proof way. Our system assigns a unique color scheme to each user and verifies a check-in request using the corresponding color scheme, thus preventing malicious users from sharing the analysis results with others. Experimental results using a real dataset demonstrate the accuracy and efficiency of the proposed method. We have applied the system to actual TV programs and clarified its scalability and precision in a production environment.

Shuichi Kurabayashi, Hiroki Hanaoka
A Robust and Fast Reputation System for Online Rating Systems

Recent studies have shown that reputation escalation is emerging as a new service, by which dealers pay to receive good feedback and escalate their ratings in online shopping markets. With the dramatic increase in the number of ratings provided by consumers, scalability has arisen as a significant issue in existing reputation systems. To tackle this issue, we propose a fast algorithm that calculates the reputation based on a random sample of the ratings. Since the randomly selected sample has a logarithmic size, it guarantees feasible scalability for large-scale online review systems. In addition, the random nature of the algorithm makes it robust against unfair ratings. We analyze the effectiveness of the proposed algorithm through extensive empirical evaluation using real-world and synthetically generated datasets. Our experimental results show that the proposed method provides high accuracy while running much faster than the existing iterative filtering approach.

Mohsen Rezvani, Mojtaba Rezvani
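
The idea of scoring from a logarithmic-size random sample, as described in the abstract above, can be sketched in Python as follows. The abstract states only that the sample is random and of logarithmic size; the constant `c`, the use of a plain mean as the score, and the function names are assumptions for illustration, not the paper's algorithm.

```python
import math
import random


def sample_size(n: int, c: float = 10.0) -> int:
    """Sample size growing logarithmically with the number of ratings n.

    The constant c is a hypothetical tuning parameter.
    """
    return min(n, max(1, math.ceil(c * math.log(n + 1))))


def sampled_reputation(ratings, c: float = 10.0, seed=None) -> float:
    """Estimate a reputation score from a random sample of the ratings.

    Assumption: the score is the plain mean of the sampled ratings.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    k = sample_size(len(ratings), c)
    rng = random.Random(seed)
    sample = rng.sample(list(ratings), k)  # sampling without replacement
    return sum(sample) / k


if __name__ == "__main__":
    ratings = [5, 4, 4, 3, 5] * 20000  # 100,000 ratings
    print("sample size:", sample_size(len(ratings)))
    print("estimate:", sampled_reputation(ratings, seed=1))
```

Because only a small random subset is read, a block of unfair ratings influences the estimate only in proportion to its share of the sample, which is one intuition for the robustness claim.
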
The Automatic Development of SEO-Friendly Single Page Applications Based on HIJAX Approach

In this study, we provide a method and develop a library for the automatic development of single-page web applications, or SPA-based websites. SPA-based websites run AJAX calls and client-side scripts, whereas search engines do not run scripts within pages. Thus, SPA-based websites are not completely indexed by search engine crawlers. It is necessary that all AJAX requests in web applications can also be requested by static links. We propose a method in which all fine-grained AJAX calls are also indicated in link URLs. This method allows us to design an SEO-friendly SPA-based website without any client-side programming. Moreover, a new feature is provided that helps to load several pages, as subpages, inside a page on both the server side and the client side. This nested loading can be repeated without any limitation. Each page’s (or subpage’s) URL, as a presentation of the client-side application state, has a specific query string parameter that specifies its subpages’ addresses. In addition, all links inside the page have a specific query string parameter that indicates the client-side application state.

Siamak Hatami
Towards Intelligent Web Crawling – A Theme Weight and Bayesian Page Rank Based Approach

With the rapid development of the Internet, the web crawler has become one of the key technologies for users to automatically obtain information from designated sites. Traditional web crawler technology has exposed several problems, such as low content accuracy due to simple filtering conditions with respect to crawling themes, and low efficiency due to content duplication and long webpage update times. To solve these problems, we propose the TBPR (Theme weight and Bayesian Page Rank based crawler) approach, which adopts a multi-queue model to achieve high efficiency and reduce content redundancy. Further, TBPR introduces a theme weights model to accurately classify web pages into the user’s crawl concept and a Bayesian Page Rank model containing two novel factors to increase content accuracy. Our experiment applies TBPR to real-world web content, demonstrating its accuracy and efficiency.

Yan Tang, Lei Wei, Wangsong Wang, Pengcheng Xuan

Data Storage and Generator

Frontmatter
Efficient Multi-version Storage Engine for Main Memory Data Store

The multi-version storage engine is a fundamental component of modern main memory data stores that use the popular multi-version concurrency control (MVCC). The straightforward implementation of such a storage engine uses a linked list to store multiple versions of an object. A read operation has to traverse the list to find the specified version, which incurs pointer chasing. An optimization implemented in HyPer is to store the current version in the object header, which is friendly to reading the latest snapshot of the data. However, a read operation still needs one extra pointer chase in memory when accessing an object that is being updated. In this paper, we propose efficient multi-version storage (EMS), a new storage engine for main memory data stores. EMS embeds the two latest versions in each object header, so that it can avoid the overhead of traversing the version list, especially in update-intensive scenarios. We present an implementation mechanism for the widely used snapshot isolation level over EMS. The experimental results demonstrate that EMS outperforms the existing multi-version storage engines of well-known main memory data stores in terms of throughput without excessive memory consumption.

Jinwei Guo, Bing Xiao, Peng Cai, Weining Qian, Aoying Zhou
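
The layout idea in the abstract above, keeping the two latest versions directly in the object header so that most reads avoid walking the version list, can be illustrated with a single-threaded Python sketch. The class and field names are hypothetical, timestamps are assumed to increase with each write, and none of the concurrency control or snapshot isolation machinery of the actual engine is modeled.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Version = Tuple[int, Any]  # (commit timestamp, value)


@dataclass
class ObjectHeader:
    """Hypothetical object header holding the two latest versions inline."""

    latest: Optional[Version] = None
    previous: Optional[Version] = None
    # Older versions, newest first; stands in for the linked version list.
    older: List[Version] = field(default_factory=list)

    def write(self, ts: int, value: Any) -> None:
        """Install a new version; assumes ts is larger than all earlier ones."""
        if self.previous is not None:
            self.older.insert(0, self.previous)
        self.previous = self.latest
        self.latest = (ts, value)

    def read(self, snapshot_ts: int) -> Optional[Any]:
        """Return the newest value committed at or before snapshot_ts."""
        # Fast path: check the two versions embedded in the header.
        for version in (self.latest, self.previous):
            if version is not None and version[0] <= snapshot_ts:
                return version[1]
        # Slow path: fall back to scanning the older versions.
        for ts, value in self.older:
            if ts <= snapshot_ts:
                return value
        return None


if __name__ == "__main__":
    obj = ObjectHeader()
    for ts, value in [(1, "a"), (2, "b"), (3, "c")]:
        obj.write(ts, value)
    print(obj.read(3), obj.read(2), obj.read(1), obj.read(0))
```

In this sketch a reader whose snapshot predates an in-progress update is still served from the header's second slot, which is the case the abstract says costs an extra pointer chase when only the current version is stored in the header.
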
WeDGeM: A Domain-Specific Evaluation Dataset Generator for Multilingual Entity Linking Systems

Entity Linking is the task of annotating ambiguous mentions in unstructured text with their referent entities in a given knowledge base. A vast number of general-purpose benchmark datasets exist for evaluating Entity Linking approaches. However, it is difficult to evaluate domain-specific Entity Linking approaches due to the lack of evaluation datasets for specific domains. This study presents a tool called WeDGeM, a multilingual evaluation set generator for specific domains that uses Wikipedia and DBpedia. Wikipedia category pages and the DBpedia taxonomy are used for adjusting domain-specific annotated text generation. Wikipedia disambiguation pages are applied to determine the ambiguity level of the generated texts. Based on these texts, a use case for well-known Entity Linking systems supporting English and Turkish texts is evaluated in the movie domain.

Emrah Inan, Oguz Dikenelli
Extracting Web Content by Exploiting Multi-Category Characteristics

Web content extraction aims at separating web content from web pages, since the content is organized and presented by different HTML templates and is surrounded by various other information. Little is known about template structures and noise information before extraction, and page templates vary widely, which makes it very challenging to guarantee extraction precision and adaptability. This study proposes an effective web content extraction method for various web environments. To ensure extraction performance, we exploit three kinds of characteristics: visual text information, content semantics (instead of HTML tag semantics) and web page structures. These characteristics are then integrated into an extraction framework that makes extraction decisions for different websites. Comparative experiments on multiple websites with two popular extraction methods, CETR and CETD, show that our proposed method outperforms CETR on precision while keeping the same advantage on recall, and also gains a 4% improvement over CETD on the average F1-score. In particular, our method provides better extraction performance than CETD on short content and shows better extraction adaptability.

Qian Wang, Qing Yang, Jingwei Zhang, Rui Zhou, Yanchun Zhang

Security and Privacy

Frontmatter
PrivacySafer: Privacy Adaptation for HTML5 Web Applications

Privacy protection is necessary in many applications in mobile and stationary environments. The advances in web applications with the introduction of HTML5 provide the possibility for cross-platform application support. Access to sensitive information is feasible via various means from such applications in order to provide a personalized user experience. Mechanisms to allow users to control this access are vital for a better web experience. In this work, we present our approach toward a mechanism for privacy protection in HTML5 web environments. User preferences for privacy policies can be specified via an indicated notation that considers contextual parameters. Preferences are taken into account during the execution adapting the application content. Our PrivacySafer approach is supported by implementations of extensions in two popular web browsers, Chrome and Firefox. An evaluation on the efficiency of the approach and the resulting web experience with a small group of users has been performed.

Georgia M. Kapitsaki, Theodoros Charalambous
Anonymity-Based Privacy-Preserving Task Assignment in Spatial Crowdsourcing

The ubiquity of mobile devices and wireless networks has made the market of Spatial Crowdsourcing (SC) flourish. In SC, location-constrained tasks are sent to workers and are expected to be performed at designated locations. To obtain a globally optimal task assignment scheme, the SC-server usually needs to collect the location information of all workers. This process raises a significant security concern: the SC-server may not be trustworthy, which threatens workers’ location privacy. In this paper, we focus on privacy-preserving task assignment in SC. By introducing a semi-honest third party, we present an approach for task assignment in which the location privacy of workers can be protected in a k-anonymity manner. We theoretically show that the proposed model is secure against semi-honest adversaries. Experimental results show that our approach is efficient and can scale to real SC applications.

Yue Sun, An Liu, Zhixu Li, Guanfeng Liu, Lei Zhao, Kai Zheng
Understanding Evasion Techniques that Abuse Differences Among JavaScript Implementations

There is a common approach to detecting drive-by downloads using a classifier based on the static and dynamic features of malicious websites collected using a honeyclient. However, attackers detect the honeyclient and evade analysis using sophisticated JavaScript code. The evasive code indirectly identifies clients by abusing the differences among JavaScript implementations. Attackers deliver malware only to targeted clients on the basis of the evasion results while avoiding honeyclient analysis. Therefore, we are faced with a problem in that honeyclients cannot extract features from malicious websites and the subsequent classifier does not work. Nevertheless, we can observe the evasion nature, i.e., the results in accessing malicious websites by using targeted clients are different from those by using honeyclients. In this paper, we propose a method of extracting evasive code by leveraging the above differences to investigate current evasion techniques and to use them for analyzing malicious websites. Our method analyzes HTTP transactions of the same website obtained using two types of clients, a real browser as a targeted client and a browser emulator as a honeyclient. As a result of evaluating our method with 8,467 JavaScript samples executed in 20,272 malicious websites, we discovered unknown evasion techniques that abuse the differences among JavaScript implementations. These findings will contribute to improving the analysis capabilities of conventional honeyclients.

Yuta Takata, Mitsuaki Akiyama, Takeshi Yagi, Takeo Hariu, Shigeki Goto
Mining Representative Patterns Under Differential Privacy

Representative frequent pattern mining from a transaction dataset has been well studied in both the database and the data mining communities for many years. One common scenario is that, if the input dataset contains private information, publishing representative patterns may pose great threats to individuals’ privacy. In this paper, we study the subject of mining representative patterns under the differential privacy model. We propose a method that combines RPlocal with differential privacy to mine representative patterns. We analyze the breach of privacy in RPlocal and utilize differential privacy to protect the private information of the transaction dataset. Through formal privacy analysis, we prove that our proposed algorithm satisfies $\epsilon$-differential privacy. Extensive experimental results on real datasets reveal that our algorithm produces a similar number of representative patterns compared to RPlocal.

Xiaofeng Ding, Long Chen, Hai Jin
A Survey on Security as a Service

Security as a Service (SECaaS) has been demonstrated to be one of the increasingly popular ways to address security problems in Cloud Computing, but it is still not widely investigated. As a new concept, SECaaS could be treated as an integrated security means and delivered as a service module in the Cloud. Based on a review of the related literature, this paper analyzes SECaaS and categorizes it into three major groups, Protective, Detective, and Reactive, based on security control perspectives. We discuss the three groups and their interplay in order to identify the key characteristics and problems that they aim to address, thereby revealing the potential for research and industrial application in the cloud security and service-oriented computing field.

Wenyuan Wang, Sira Yongchareon

Sentiment Analysis

Frontmatter
Exploring the Impact of Co-Experiencing Stressor Events for Teens Stress Forecasting

Increasingly severe psychological stress has become a major threat to adolescents’ healthy development. Accurate and timely stress forecasting is of great significance for understanding adolescents’ mental health status. State-of-the-art microblog-based stress prediction utilizes only explicit self-expression and behavior as cues, which may suffer from the problem of data sparsity: what if the user is not very active on microblogs? As teenagers with similar backgrounds exhibit similar coping mechanisms under co-experiencing stressor events, in this paper we try to leverage the intra-group impact of co-experiencing stressor events to supplement sparse individual stress series and thus help improve individual stress prediction. Jointly considering stress response details, posting habits and individual profiles, we quantify teenagers’ stress coping similarity under co-experiencing stressors using a K-medoids model and represent the impact of co-experiencing stressors. Afterward, a cluster-based NARX recurrent neural network is constructed to combine the intra-group impact of co-experiencing stressor events and individual stress series for stress prediction. Experiments on a real dataset of 124 high school students demonstrate the effectiveness of our forecasting model. They also show that leveraging the impact of co-experiencing stressors significantly improves individual stress prediction.

Qi Li, Liang Zhao, Yuanyuan Xue, Li Jin, Ling Feng
SGMR: Sentiment-Aligned Generative Model for Reviews

Customers rate their purchases and leave comments when buying products from e-commerce websites. However, this commentary information did not draw enough attention until recently. Users’ reviews contain much more information than other behaviors, and the review text shows the characteristics of both products and users. Users’ comments are more likely to express their attitudes towards the purchase. These sentiment tendencies affect users when they buy new products or rate products. In this paper, we propose the Sentiment-aligned Generative Model for Reviews (SGMR) to combine rating dimensions with sentiment dimensions in users’ reviews. We extract sentiment topics using opinion mining methods. A sentiment-based generative feature model of reviews is subsequently constructed. Finally, the rating dimensions and sentiment dimensions are aligned with each other using a Factorization Machines (FM) model. Our model generates interpretable sentiment topics for latent sentiment dimensions. Experiments on real-world datasets show that our proposed model leads to significant improvements compared with other baselines. Furthermore, the results confirm our view that comments affect other users’ purchasing and rating.

He Zou, Litian Yin, Dong Wang, Yue Ding
An Ontology-Enhanced Hybrid Approach to Aspect-Based Sentiment Analysis

Numerous reviews are available online regarding a wide range of products and services. Aspect-Based Sentiment Analysis aims at extracting sentiment polarity per aspect instead of only the whole product or service. In this work, we use restaurant data from Task 5 of SemEval 2016 to investigate the potential of ontologies to improve the aspect sentiment classification produced by a support vector machine. We achieve this by combining a standard bag-of-words model with external dictionaries and an ontology. Our ontology-enhanced methods yield significantly better performance compared to the methods without ontology features: we obtain a significantly higher $$F_{1}$$ score and require less than 60% of the training data for equal performance.

Daan de Heij, Artiom Troyanovsky, Cynthia Yang, Milena Zychlinsky Scharff, Kim Schouten, Flavius Frasincar
DARE to Care: A Context-Aware Framework to Track Suicidal Ideation on Social Media

The abundance and growing usage of social media has given unprecedented access to users’ social accounts for studying people’s thoughts and sentiments. In this work, we are interested in tracking individuals’ emotional states, and more specifically suicidal ideation, in microblogging services. We propose a probabilistic framework that models a user’s online activities as a sequence of psychological states over time and predicts the emotional states by incorporating the context history. Based on Conditional Random Fields, our model is able to provide comprehensive interpretations of the relationship between the risk factors and psychological states. We evaluated our approach on real case studies of Twitter users who have demonstrated a serious change in their emotional states and online behaviour. Our experiments show that the model is able to identify suicidal ideation with high precision and good recall, with substantial improvements over state-of-the-art methods.

Bilel Moulahi, Jérôme Azé, Sandra Bringay

Recommender Systems

Frontmatter
Local Top-N Recommendation via Refined Item-User Bi-Clustering

Top-$$N$$ recommendation has drawn much attention from many portal websites nowadays. The classic item-based methods based on sparse linear models (SLIM) have demonstrated very good performance, which estimate a single model for all users. Lately, local models have been considered necessary since a user only resembles a group of others but not all. Moreover, we find that two users with similar tastes on one item group may have totally different tastes on another. Thus, it is intuitive to make preference predictions for a user via item-user subgroups rather than the entire feedback matrix. For elegant local top-$$N$$ recommendation, this paper introduces a bi-clustering scheme to be integrated with SLIM, such that item-user subgroups are softly constructed to capture subtle preferences of users. A novel localized recommendation model is hence presented, and an alternative direction algorithm is devised to collectively learn item coefficient for each local model. To deal with the data sparsity issue during clustering, we conceive a refined feature-based distance measure to better model and reflect user-item interaction. The proposed method is experimentally compared with state-of-the-art methods, and the results demonstrate the superiority of our model.

Yuheng Wang, Xiang Zhao, Yifan Chen, Wenjie Zhang, Weidong Xiao
HOMMIT: A Sequential Recommendation for Modeling Interest-Transferring via High-Order Markov Model

Capturing user interest accurately is a key task for predicting personalized sequential actions in recommender systems. Through preliminary investigation, we find that user interest is stable in the short term but changeable in the long term. User interest changes significantly during interaction with the system, and the duration of a particular interest and the frequency of transition are also personalized. Based on this finding, a recommendation framework called HOMMIT is proposed, which can identify user interests and adapts an improved high-order Markov chain method to model the dynamic transition process of user interests. It can predict the transition trends of user interest and make personalized sequential recommendations. We evaluate and compare multiple implementations of our framework on two large, real-world datasets. The experiments demonstrate the high accuracy of our proposed sequential recommendation framework and verify the importance of considering interest-transferring in recommendations.

Yang Xu, Xiaoguang Hong, Zhaohui Peng, Yupeng Hu, Guang Yang
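
The HOMMIT abstract above builds on a high-order Markov chain over user interests. As a rough illustration only, the sketch below fits a plain order-k Markov chain by counting which label follows each window of k previous labels. It is not the improved method from the paper; the function names, the `order` parameter and the toy interest labels are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_high_order_markov(sequences, order=2):
    """Count, for each window of `order` consecutive labels, which label follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            state = tuple(seq[i - order:i])  # the previous `order` labels
            counts[state][seq[i]] += 1
    return counts


def predict_next(counts, recent, order=2):
    """Return the most frequent successor of the last `order` labels, or None if unseen."""
    state = tuple(recent[-order:])
    if state not in counts:
        return None
    return counts[state].most_common(1)[0][0]


# Hypothetical interest sequences, one list per user session.
sessions = [
    ["news", "sports", "music"],
    ["news", "sports", "music"],
    ["news", "sports", "travel"],
]
model = fit_high_order_markov(sessions, order=2)
next_interest = predict_next(model, ["news", "sports"], order=2)
```

A first-order chain would condition on "sports" alone; conditioning on the pair ("news", "sports") is what makes the chain high-order.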
Modeling Implicit Communities in Recommender Systems

In recommender systems, a group of users may have similar preferences on a set of items. As the groups of users and items are not explicitly given, these similar-preferences groups are called implicit communities (where users inside same communities may not necessarily know each other). Implicit communities can be detected with users’ rating behaviors. In this paper, we propose a unified model to discover the implicit communities with rating behaviors from recommender systems. Following the spirit of Latent Factor Model, we design a Bayesian probabilistic graphical model which generates the implicit communities, where the latent vectors of users/items inside the same community follow the same distribution. An implicit community model is proposed based on rating behaviors and a Gibbs Sampling based algorithm is proposed for corresponding parameter inferences. To the best of our knowledge, this is the first attempt to integrate the rating information into implicit communities for recommendation. We provide a linear model (matrix factorization based) and a non-linear model (deep neural network based) for community modeling in recsys. Extensive experiments on seven real-world datasets have been conducted in comparison with 14 state-of-art recommendation algorithms. Statistically significant improvements verify the effectiveness of the proposed implicit community based models. They also show superior performances in cold-start scenarios, which contributes to the application of real-life recommender systems.

Lin Xiao, Gu Zhaoquan
Coordinating Disagreement and Satisfaction in Group Formation for Recommendation

Group recommendation has attracted significant research effort for its importance in benefiting a group of users. Two steps are involved in this process: group formation and making recommendations. Making recommendations to a given group has been studied extensively; however, little investigation has been put into the essential problem of how the groups should be formed. As pointed out in existing studies on group recommendation, both satisfaction and disagreement are important factors in terms of recommendation quality. Satisfaction reflects the degree to which the item is preferred by the members, while disagreement reflects the level at which members disagree with each other. As the group formation problem is difficult to solve, no existing study has considered both factors in group formation. This paper investigates the satisfaction and disagreement aware group formation problem in group recommendation. In this work, we present a formulation of the satisfaction and disagreement aware group formation problem. We design an efficient optimization algorithm based on Projected Gradient Descent and further propose a swapping-like algorithm that scales to large datasets. We conduct extensive experiments on real-world datasets, and the results verify that the performance of our algorithm is close to optimal. More importantly, our work reveals that proper group formation can lead to better performance of group recommendation in different scenarios. To our knowledge, we are the first to study the group formation problem with satisfaction and disagreement awareness for group recommendation.

Lin Xiao, Gu Zhaoquan
Factorization Machines Leveraging Lightweight Linked Open Data-Enabled Features for Top-N Recommendations

With the popularity of Linked Open Data (LOD) and the associated rise in freely accessible knowledge available via LOD, exploiting LOD for recommender systems has been widely studied using various approaches, such as graph-based methods or different machine learning models with LOD-enabled features. Many of the previous approaches require construction of an additional graph to run graph-based algorithms or to extract path-based features by combining user-item interactions (e.g., likes, dislikes) and background knowledge from LOD. In this paper, we investigate Factorization Machines (FMs) based on particularly lightweight LOD-enabled features which can be directly obtained via a public SPARQL Endpoint without any additional effort to construct a graph. Firstly, we aim to study whether using FM with these lightweight LOD-enabled features can provide competitive performance compared to a learning-to-rank approach leveraging LOD as well as other well-established approaches such as kNN-item and BPRMF. Secondly, we are interested in finding out to what extent each set of LOD-enabled features contributes to the recommendation performance. Experimental evaluation on a standard dataset shows that our proposed approach using FM with lightweight LOD-enabled features provides the best performance compared to other approaches in terms of five evaluation metrics. In addition, the study of the recommendation performance based on different sets of LOD-enabled features indicates that property-object lists and PageRank scores of items are useful for improving the performance, and that using them together for FM provides the best performance. We observe that subject-property lists of items do not contribute to the recommendation performance but rather decrease it.

Guangyuan Piao, John G. Breslin
A Fine-Grained Latent Aspects Model for Recommendation: Combining Each Rating with Its Associated Review

Recently, several approaches simultaneously exploiting ratings and review texts have been proposed for personalized recommendations. These approaches apply topic modeling techniques to review texts to mine major latent aspects of the item (or the user) and align them with collaborative filtering algorithms to increase the accuracy and interpretability of rating prediction. However, they learn the topics for each item (or user) by harnessing all reviews related to it, which is not intuitive or in line with users’ rating and review behavior. In this paper, we propose a Fine-grained Latent Aspects Model (FLAM), which learns the topics for each review with the corresponding latent aspect ratings of the user and the item. FLAM is a united model of Latent Factor Model (LFM) and Latent Dirichlet Allocation (LDA). LFM, well-known for its high prediction accuracy, is employed to predict latent aspect ratings of the user and the item. LDA, a classical topic model, is used to extract latent aspects in the reviews. Our experimental results on 25 real-world datasets show that the proposed model outperforms state-of-the-art methods and can learn latent topics that are interpretable. Furthermore, our model can alleviate the cold-start problem.

Xuehui Mao, Shizhong Yuan, Weimin Xu, Daming Wei
Auxiliary Service Recommendation for Online Flight Booking

Booking flights through online travel companies (OTCs) is becoming increasingly popular. In order to improve profits, OTCs often suggest additional optional auxiliary services, such as security insurance, a VIP lounge or a pick-up service, to passengers. In order to promote the sale of auxiliary services, these can be selected as a default when passengers purchase a flight. However, if a passenger does not want to buy these services, he will have to cancel them himself, which can result in a negative user experience. Therefore, a personalized auxiliary service recommendation approach is proposed (IR-GBDT), which is built on the Gradient Boosting Decision Tree (GBDT) model. GBDT is also applied to mine the interrelationships between services so that a service package is finally recommended. Experiments on a real dataset comprising six months of flight order data show that our model has improved performance compared to the others.

Hongyu Lu, Jian Cao, Yudong Tan, Quanwu Xiao
How Does Fairness Matter in Group Recommendation

Group recommendation has attracted significant research effort for its importance in benefiting a group of users. In contrast to personalized recommendation, group recommendation tries to recommend the same set of items to a group of users. Therefore, a gap exists between group recommendation and individual recommendation in terms of individual satisfaction. We aim to explore the possibility of narrowing this gap by introducing the concept of fairness in group recommendation. In this work, we propose the concept of fairness in group recommendation and try to accommodate it in the recommendation algorithm so that the satisfaction of users in group recommendation can get close to that of individual recommendation. We utilize the concept of Ordered Weighted Average (OWA) from fuzzy logic to evaluate the individual satisfaction of users and use min-max fairness metrics to accommodate fairness in the group recommendation process. We formulate the problem of group recommendation with fairness as an integer programming problem and propose efficient algorithms for three different OWA scenarios. Extensive experiments have been conducted on real-world datasets, and the results corroborate our analyses.

Lin Xiao, Gu Zhaoquan
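
The fairness abstract above relies on the Ordered Weighted Average (OWA) to score individual satisfaction. A minimal sketch of the standard OWA operator follows, together with one possible reading of the fairness objective: choose the item set whose least satisfied member scores highest. The data layout, the function names and this interpretation of the fairness metric are assumptions for illustration, not the paper's integer programming formulation.

```python
def owa(scores, weights):
    """Ordered Weighted Average: sort scores in descending order, then take the weighted sum."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def fairest_candidate(candidates, weights):
    """Pick the candidate whose least satisfied user has the highest OWA score.

    `candidates` maps a label to a list of per-user score lists (hypothetical layout):
    one inner list per user, one score per recommended item.
    """
    def worst_user_satisfaction(per_user_scores):
        return min(owa(scores, weights) for scores in per_user_scores)

    return max(candidates, key=lambda label: worst_user_satisfaction(candidates[label]))


# Two hypothetical item sets scored by two users.
candidates = {
    "set_a": [[5, 5], [1, 1]],  # user 1 is delighted, user 2 is not
    "set_b": [[3, 3], [3, 3]],  # both users are moderately satisfied
}
choice = fairest_candidate(candidates, weights=[0.5, 0.5])
```

Because the weights attach to rank positions rather than to particular items, weights such as `[1, 0]` reduce OWA to the maximum, `[0, 1]` to the minimum, and uniform weights to the mean.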
Exploiting Users’ Rating Behaviour to Enhance the Robustness of Social Recommendation

In rating systems, it can often be observed that some users rate few items, whereas others rate a large number of items. Users’ rating scores also vary, i.e., some users’ scores are widely distributed while others fall within a small range. Existing social recommendation approaches largely ignore such differences. We propose a peer-based relay recommendation method that exploits the credibility of users’ ratings. The credibility of a user’s rating is calculated according to the user’s rating behaviour in terms of the number of ratings provided and the deviation from normal behaviour. The credibility value of a user’s rating is incorporated when aggregating ratings from different users. Experiments are conducted on a large-scale social rating network for movie recommendations. The results show that incorporating the credibility of users’ ratings can effectively reduce the impact of noisy, low-credibility ratings and enhance the robustness of the system.

Zizhu Zhang, Weiliang Zhao, Jian Yang, Surya Nepal, Cecile Paris, Bing Li

Special Sessions on Security and Privacy

Frontmatter
A Study on Securing Software Defined Networks

Most of the IT infrastructure across the globe is virtualized and is backed by Software Defined Networks (SDN). Hence, any threat to SDN’s core components could harm today’s Internet and the very fabric of utility computing. After thorough analysis, this study identifies the Crossfire link flooding technique as one of the lethal attacks that can potentially target the link connecting the control plane to the data plane in SDNs. In such a situation, the control plane may get disconnected, resulting in degraded performance of the whole network and service disruption. In this work we present a detailed comparative analysis of link flooding mitigation techniques and propose a framework for effective defense. It comprises a separate controller consisting of a flood detection module, a link listener module and a flood detection module, which work together to detect and mitigate attacks and facilitate the normal flow of traffic. This paper serves as a first effort towards identifying and mitigating the Crossfire LFA on the channel that connects the control plane to the data plane in SDNs. We expect that further optimizations of the proposed solution can bring remarkable results.

Raihan Ur Rasool, Hua Wang, Wajid Rafique, Jianming Yong, Jinli Cao
A Verifiable Ranked Choice Internet Voting System

This paper proposes a web-based voting system which allows voters to cast and submit their electronic ballots by ranking all candidates according to their personal preference. Each ballot is treated as a square matrix, with each element encrypted using the ElGamal cryptosystem before submission. Furthermore, proofs of partial knowledge and zero knowledge are used to verify the eligibility of ballots without accessing ballot contents. We also implement a prototype to test our proposed voting system. The security and performance analyses indicate the feasibility of the proposed protocols.

Xuechao Yang, Xun Yi, Caspar Ryan, Ron van Schyndel, Fengling Han, Surya Nepal, Andy Song
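
The voting abstract above describes a ranked ballot as a square matrix whose elements are each encrypted with ElGamal. The sketch below illustrates that idea with exponential ElGamal over a deliberately tiny group. The parameters `P = 23` and `G = 5`, the 0/1 matrix encoding and all function names are assumptions chosen for readability; the sketch omits the zero-knowledge proofs and is not secure or representative of the paper's actual protocol.

```python
import secrets

P = 23  # toy prime modulus (assumption; far too small for real use)
G = 5   # a primitive root modulo 23


def keygen():
    """Return (private key x, public key h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1  # x in [1, P-2]
    return x, pow(G, x, P)


def encrypt_bit(bit, h):
    """Exponential ElGamal: encrypt G^bit under public key h with fresh randomness."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (pow(G, bit, P) * pow(h, r, P)) % P


def decrypt_bit(cipher, x):
    """Recover G^bit as c2 * c1^(-x), then map it back to 0 or 1."""
    c1, c2 = cipher
    g_to_bit = (c2 * pow(c1, P - 1 - x, P)) % P  # c1^(P-1-x) equals c1^(-x) mod P
    return 0 if g_to_bit == 1 else 1


def ranking_to_matrix(ranking):
    """ranking[i] is the 0-based rank of candidate i; cell (i, j) is 1 iff ranking[i] == j."""
    n = len(ranking)
    return [[1 if ranking[i] == j else 0 for j in range(n)] for i in range(n)]


def encrypt_ballot(matrix, h):
    return [[encrypt_bit(bit, h) for bit in row] for row in matrix]


def decrypt_ballot(encrypted, x):
    return [[decrypt_bit(cell, x) for cell in row] for row in encrypted]


# A voter ranks three candidates: candidate 0 third, candidate 1 first, candidate 2 second.
private_key, public_key = keygen()
ballot = ranking_to_matrix([2, 0, 1])
recovered = decrypt_ballot(encrypt_ballot(ballot, public_key), private_key)
```

A well-formed ballot is a permutation matrix, so every row and every column sums to 1. That is the kind of property a verifier would want to check without decrypting individual cells.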
Privacy Preserving Location Recommendations

With the rapid development of location based social networks (LBSN) and location based services (LBS), location recommendation has gained much attention. A traditional location recommendation scheme may use any of the following information to generate a recommendation: users’ check-in frequencies at different locations, the distance of other locations from any point of interest (POI), the time of visiting different locations, social influence or interest in locations visited by friends, and so on. Depending on different contexts and tastes, the results of recommending new locations may vary. Users might also have specific context preferences for finding the locations most suitable for them. However, this contextual information and these preferences are personal, and a user usually does not want to reveal them to any third party, even though they are the main source of information for generating a recommendation. Revealing this information may lead to misuse or exposure of the data to third parties, which clearly breaches users’ privacy. In these circumstances, it is essential to hide users’ check-in histories at different locations from service providers while still taking advantage of the server’s processing power to generate personalized location recommendations. To address these challenges, we present a cryptographic framework that preserves users’ privacy while generating location recommendations for them. We also incorporate users’ friendship networks along with their location preferences and show that users are able to choose their friends’ preferences on different locations to influence the recommendation results without revealing any information. The security and performance analyses show that the protocol is secure as well as practical.

Shahriar Badsha, Xun Yi, Ibrahim Khalil, Dongxi Liu, Surya Nepal, Elisa Bertino
Botnet Command and Control Architectures Revisited: Tor Hidden Services and Fluxing

Botnet armies constitute a major and continuous threat to the Internet. Their number, diversity, and power grow with each passing day, and in recent years we have been witnessing their rapid expansion to mobile and even IoT devices. The work at hand focuses on botnets which comprise mobile devices (e.g. smartphones), and aims to raise the alarm on a couple of advanced Command and Control (C&C) architectures that capitalize on Tor’s hidden services (HS) and the DNS protocol. Through such architectures, the perpetrator pursues a dual goal: first, to further obfuscate their identity and minimize the botnet’s forensic signal, and second, to augment the resilience of their army. The novelty of the introduced architectures is that they do not rely on static C&C servers but on rotating ones, which can be reached by other botnet members through their (varied) onion address. Also, we propose a scheme called “Tor fluxing”, which, as opposed to legacy IP or DNS fluxing, does not rely on A-type DNS resource records but on TXT ones. We demonstrate the soundness and effectiveness of the introduced C&C constructions via a proof-of-concept implementation.

Marios Anagnostopoulos, Georgios Kambourakis, Panagiotis Drakatos, Michail Karavolos, Sarantis Kotsilitis, David K. Y. Yau
My Face is Mine: Fighting Unpermitted Tagging on Personal/Group Photos in Social Media

In social media such as Facebook, the sharing of photos among users is common and enjoyable but also very dangerous when the uploader posts a photo online without the consent of the other participants in the same photo. As a solution, recent research has developed fine-grained access control for social media photos. Every participant is tagged by the uploader and notified through internal messages to initialise their own access control strategies. The appearance of participants is blurred if they want to preserve their own privacy in a photo. However, these methods depend heavily on the reputation of the uploader’s tagging behaviour. Adversaries can easily manipulate unpermitted tagging processes and then publish photos on social media that should have been kept confidential from the public. In order to solve this critical problem, we propose developing a participant-free tagging system for social media photos. This system excludes potential adversaries through automatic tagging processes over two cascading stages: (1) participants are tagged through internal searching, which is based on the portrait samples collected in the initialisation stage for every new user; (2) the remaining untagged participants are identified cooperatively through tagged users. In the evaluation, we carried out a series of experiments to validate our system’s efficiency and effectiveness. All the results demonstrate the tagging efficiency and the effectiveness in protecting users’ privacy.

Lihong Tang, Wanlun Ma, Sheng Wen, Marthie Grobler, Yang Xiang, Wanlei Zhou
Cryptographic Access Control in Electronic Health Record Systems: A Security Implication

An electronic health record (EHR) system is designed to allow individuals and their health care providers to access their key health information online. These systems are considered more efficient, less error-prone and more available than traditional paper-based systems. However, privacy and security concerns are arguably the major barriers to the adoption of these systems globally, including in Australia. Individuals are unwilling to accept EHR systems unless they can be sure that their shared key health information is securely stored, a proper access control mechanism is used and any unauthorised disclosure is prevented. In this paper, we propose a cryptographic access control mechanism to protect the health information in EHR systems. We also develop a new encryption framework for the cryptographic access control to maintain a high level of protection. We systematically review traditional cryptographic methods to identify their weaknesses and overcome them in our new method.

Pasupathy Vimalachandran, Hua Wang, Yanchun Zhang, Guangping Zhuo, Hongbo Kuang
SDN-based Dynamic Policy Specification and Enforcement for Provisioning SECaaS in Cloud

In this paper we make use of SDN for provisioning of Security as a Service (SECaaS) to the tenant and simplify the security management in cloud. We have developed a Security Application (SA) for the SDN Controller which is used for capturing the tenant security requirements and enforcing the related security policies for securing their virtual machines (VMs). We have developed a security policy specification language for enforcing TPM, Access Control and Intrusion Detection related security policies with the SA. Finally we present the prototype implementation of our approach and some performance results.

Uday Tupakula, Vijay Varadharajan, Kallol Karmakar
Topic Detection with Locally Weighted Semi-supervised Collective Learning

Topic detection and tracking (TDT) under modern media circumstances has been dramatically innovated with the ever-changing social network and some of the inconspicuous connections among participants in the internet communities. Instead of only considering the varied word features of the materials being analysed, detecting and tracking topics in multi-relational data with incidental information, for example link structures and time series, has become a new trend for prevalent topic models. In this paper, we employ the users’ groups extracted from Twitter as the social context that accompanies the corresponding news articles and explore the interior links among data points to develop non-negative factorization methods with semi-supervised information. A locally weighted scheme is applied to the original data points to differentiate the proximity of approximate points for a better approximation. We evaluate our proposed method on a synthetic data set as well as a real news data set combined with social information extracted from Twitter. The experimental results show the performance improvement of our method compared to other baseline methods.

Ye Wang, Yong Quan, Bin Zhou, Yanchun Zhang, Min Peng
Backmatter
Metadata
Title
Web Information Systems Engineering – WISE 2017
Edited by
Athman Bouguettaya
Prof. Yunjun Gao
Andrey Klimenko
Lu Chen
Xiangliang Zhang
Fedor Dzerzhinskiy
Weijia Jia
Stanislav V. Klimenko
Qing Li
Copyright year
2017
Electronic ISBN
978-3-319-68786-5
Print ISBN
978-3-319-68785-8
DOI
https://doi.org/10.1007/978-3-319-68786-5