## About this book

This volume constitutes the refereed proceedings of the Confederated International Conferences: Cooperative Information Systems, CoopIS 2016, Ontologies, Databases, and Applications of Semantics, ODBASE 2016, and Cloud and Trusted Computing, C&TC, held as part of OTM 2016 in October 2016 in Rhodes, Greece.
The 45 full papers presented together with 16 short papers were carefully reviewed and selected from 133 submissions. Every year, the OTM program covers data and Web semantics, distributed objects, Web services, databases, information systems, enterprise workflow and collaboration, ubiquity, interoperability, mobility, grid and high-performance computing.

## Table of Contents

### Software-Defined Simulations for Continuous Development of Cloud and Data Center Networks

Cloud network systems and applications are tested in simulation and emulation environments prior to physical deployments, at different stages of development. Software-Defined Networking (SDN) enables separating the control logic from the data plane, which consists of switches and hosts, into a logically centralized control plane. The global view and control available to the controller enable incremental updates, management, and allocation of resources to the networks. However, unlike physical networks or networks emulated by emulators, current network simulators still lack integration with SDN controllers. Hence, it is currently impossible to efficiently orchestrate a simulated network through a centralized controller, or to realistically model the controller algorithms and SDN architectures without having the resources for a one-to-one emulation. To address this, this paper presents SDNSim, an SDN simulation middleware, which leverages the principles of SDN for continuous development of cloud and data center networks. SDNSim is an “SDN-aware” network simulator that integrates with the controller through plugins for southbound protocols such as OpenFlow, in order to incrementally execute the algorithms deployed in the control plane.

### Continuous Top-k Queries in Social Networks

Information streams are today a prevalent way of publishing and consuming content on the Web, especially due to the great success of social networks. Top-k queries over the streams of interest allow limiting results to the most relevant content, while continuous processing of such queries is the most effective approach in large-scale systems. However, current systems fail to combine continuous top-k processing with rich scoring models that include social network criteria. We present here the SANTA algorithm, which is able to handle scoring functions that include content similarity as well as social network criteria and events in the continuous processing of top-k queries. We propose a variant (SANTA+) that accelerates the processing of interaction events in social networks. We compare SANTA/SANTA+ with an extension of a state-of-the-art algorithm and report a rich experimental study of our approach.

Abdulhafiz Alkhouli, Dan Vodislav, Boris Borzic
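
To make the idea of continuous top-k processing with a combined score concrete, here is a minimal sketch. It is not the SANTA algorithm: it is a naive baseline that keeps the k best items seen so far in a min-heap. The class name `ContinuousTopK`, the weight `alpha`, and the linear mix of a content score and a social score are all assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


class ContinuousTopK:
    """Naive continuous top-k: keeps the k best-scoring items seen so far."""

    def __init__(self, k: int, alpha: float = 0.5) -> None:
        self.k = k
        # Hypothetical weight between content similarity and social score.
        self.alpha = alpha
        # Min-heap of (score, item_id); the root is the weakest current result.
        self._heap: List[Tuple[float, str]] = []

    def score(self, content_sim: float, social: float) -> float:
        # Assumed scoring model: a simple linear combination.
        return self.alpha * content_sim + (1.0 - self.alpha) * social

    def offer(self, item_id: str, content_sim: float, social: float) -> bool:
        """Process one incoming item; return True if it entered the top-k."""
        s = self.score(content_sim, social)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (s, item_id))
            return True
        if s > self._heap[0][0]:
            # The new item beats the weakest result, so it replaces it.
            heapq.heapreplace(self._heap, (s, item_id))
            return True
        return False

    def results(self) -> List[Tuple[str, float]]:
        """Current top-k as (item_id, score), best first."""
        return [(item, s) for s, item in sorted(self._heap, reverse=True)]
```

Each arriving item costs at most one heap operation, which is why continuous evaluation is attractive compared with re-running the query. This sketch does not handle the harder case the abstract points to, where social events change the score of items that are already in the stream.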

### Process Synthesis with Sequential and Parallel Constraints

Synthesis is the generation of a process model that fulfills a set of declarative constraints, a. k. a. properties. In this article, we study synthesis in the presence of both so-called sequential and parallel constraints. Sequential constraints state that certain tasks must occur in a specific ordering. Parallel constraints specify the maximal degree of parallelization at a certain position in a process model. Combining both sequential and parallel constraints in one approach is difficult, because their interference is complex and hard to foresee. Besides this, with large specifications, solutions which do not scale are not viable either. Our synthesis approach consists of two steps. First, we generate a model fulfilling only the sequential constraints. We then apply a novel algorithm that deparallelizes the process to fulfill the parallel constraints as well as any additional optimization criteria. We evaluate our approach using the real-world use case of commissioning in vehicle manufacturing. In particular, we compare our synthesized models to ones domain experts have generated by hand. It turns out that our synthesized models are significantly better than these reference points.

Richard Mrasek, Jutta Mülle, Klemens Böhm

### Transition Adjacency Relation Computation Based on Unfolding: Potentials and Challenges

Transition Adjacency Relation (TAR) has provided a useful perspective for process model similarity measurement. Motivated by recent developments of other similarity metrics, this article puts TAR computation in the context of Petri net unfolding. Apart from being significantly faster than existing TAR computation algorithms, unfolding-based TAR computation also offers potential for enhancement through combination with other metrics that can be obtained from unfolding, especially the popular Behavior Profiles. We show that TAR computation can generally be reduced to the coverability problem and solved using unfolding. However, there are also questions to be answered regarding how to further exploit unfolding information for optimal efficiency and how to handle silent transitions. In this article, we discuss what has been learned from our research, and also point out the open issues.

Jisheng Pei, Lijie Wen, Xiaojun Ye, Akhil Kumar, Zijing Lin
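
As a small illustration of what a TAR set is, the sketch below derives it from an explicit list of traces and compares two models with a Jaccard ratio over their TAR sets. This is not the unfolding-based computation the paper studies: it assumes the traces are already enumerated, and the function names `tar` and `tar_similarity` are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple

Pair = Tuple[str, str]


def tar(traces: Iterable[Sequence[str]]) -> Set[Pair]:
    """Collect every pair (a, b) where b directly follows a in some trace."""
    pairs: Set[Pair] = set()
    for trace in traces:
        # zip pairs each task with its immediate successor in the trace.
        pairs.update(zip(trace, trace[1:]))
    return pairs


def tar_similarity(t1: Set[Pair], t2: Set[Pair]) -> float:
    """Jaccard ratio of two TAR sets; two empty sets count as identical."""
    union = t1 | t2
    if not union:
        return 1.0
    return len(t1 & t2) / len(union)
```

For example, a model with traces `a b c` and `a c` has the TAR set {(a, b), (b, c), (a, c)}, while a model with only `a b c` has {(a, b), (b, c)}, so the two share two of three pairs.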

### Multi-perspective Anomaly Detection in Business Process Execution Events

Ensuring anomaly-free process model executions is crucial in order to prevent fraud and security breaches. Existing anomaly detection approaches focus on the control flow and on point anomalies, and struggle with false positives in the case of unexpected events. By contrast, this paper proposes an anomaly detection approach that incorporates perspectives that go beyond the control flow, such as time and resources (i.e., to detect contextual anomalies). In addition, it is capable of dealing with unexpected process model execution events: not every unexpected event is immediately detected as anomalous, but based on a certain likelihood of occurrence, hence reducing the number of false positives. Finally, multiple events are analyzed in a combined manner in order to detect collective anomalies. The performance and applicability of the overall approach are evaluated by means of a prototypical implementation and real-life process execution logs from multiple domains.

Kristof Böhmer, Stefanie Rinderle-Ma

### Scalable Detection of Overlapping Communities and Role Assignments in Networks via Bayesian Probabilistic Generative Affiliation Modeling

A new generative model of directed networks is developed to explain link formation from a Bayesian probabilistic perspective. Essentially, nodes can be affiliated to multiple (or even all) communities as well as roles. Affiliations are dichotomized to account for link direction. The unknown strength of node affiliations to communities and roles is captured through latent nonnegative random variables that are governed by Gamma priors for better model interpretability. Overall, such random variables are meant to generalize both mixed-membership and directed affiliation modeling, which allows for a differentiated connectivity structure inside communities. The probability of a link between two nodes is governed by a Poisson distribution, whose rate increases with the number of shared community affiliations as well as the strength of their affiliations to the common communities and respective roles. The properties of the Poisson distribution are especially beneficial on sparse networks for faster posterior inference. The latter is implemented by a coordinate-ascent variational algorithm enabling affiliation exploration and link prediction. The results of a comparative evaluation carried out on several real-world networks show the superior performance of the devised approach in community compactness, link prediction and scalability.

Gianni Costa, Riccardo Ortale
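
One way to read the link model described above is the following generic Poisson affiliation form. It is only an illustrative sketch and not the authors' exact parameterization: roles are left out, and the symbols $$\theta^{\mathrm{out}}$$, $$\theta^{\mathrm{in}}$$, $$a$$, $$b$$ and $$C$$ are assumed names for the outgoing and incoming affiliation strengths, the Gamma hyperparameters and the number of communities.

$$A_{uv} \sim \mathrm{Poisson}(\lambda_{uv}), \qquad \lambda_{uv} = \sum_{c=1}^{C} \theta_{uc}^{\mathrm{out}}\,\theta_{vc}^{\mathrm{in}}, \qquad \theta_{uc}^{\mathrm{out}},\ \theta_{vc}^{\mathrm{in}} \sim \mathrm{Gamma}(a, b)$$

$$\Pr(A_{uv} \ge 1) = 1 - e^{-\lambda_{uv}}$$

Under this form, every community that both nodes are affiliated with adds a nonnegative term to the rate, so the link probability grows with the number of shared communities and with the strength of the affiliations.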

### Cooperative Routing and Scheduling of an Electric Vehicle Fleet Managing Dynamic Customer Requests

Environmental issues and consumer concerns have paved the way for governments to legislate and help usher into operation alternative-fueled vehicles and pertinent infrastructures. In the last decade, battery-powered electric vehicles have been introduced and the service industry has followed suit and deployed such trucks in their distribution networks. However, electric vehicles do impose limitations when it comes to their traveling range. Replenishing the power to the vehicle batteries may entail lengthy charging visits at respective stations. In this paper, we examine the problem of routing and scheduling a fleet of electric vehicles that seek to satisfy dynamic pickup and delivery requests in an urban environment. We develop a web application to facilitate cooperation between organizations and individuals involved in urban freight transport. The application uses geolocation services and mobile devices to help manage the fleet and make timely decisions. Moreover, we propose three heuristic recharging strategies to ensure that electric vehicles can restore their energy levels in an effective manner. Through detailed experimentation, we show that the costs associated with the use of an electric vehicle fleet concern mainly the size of the fleet. The impact regarding the total route length traveled is less evident for all our strategies.

Panagiotis Liakos, Iosif Angelidis, Alex Delis

### Process Instance Similarity: Potentials, Metrics, Applications

The analysis of process instance similarity offers valuable input for certain application fields including the evaluation of instance clusters, the identification of compliance abuses, and process optimization. In this paper, we discuss the topic of instance similarity in general: We show that similarity might be determined from different process perspectives such as control flow, time, and instance attributes. Each of these perspectives imposes individual requirements on the similarity calculation concerning data and structure. Four metrics for process instance similarity are proposed covering different perspectives. The applicability and feasibility of the proposed metrics are evaluated based on a prototypical implementation and real-world process logs from the BPI challenges.

Johannes Pflug, Stefanie Rinderle-Ma

### Users Views on Others – Analysis of Confused Relation-Based Terms in Social Network

Nowadays, online social networks are used everywhere. Thus, areas such as social network analysis or recommender systems are currently very important. Roles related to relations between users, concerning influential, trusted or popular individuals, are known to be crucial in these areas. While the significance of such roles is undeniable, the terms connected to these roles are not precisely specified and are generally confused. In this article, we focus on the roles of users connected to the terms trust, reputation, influence and popularity in the scope of social network analysis and social recommendations. We analyze existing works using these roles in order to compare and contrast their interpretations. We emphasize the most important features that the definitions of these notions should include and present a comparative analysis of the most often confused terms: trust vs reputation, and influence vs popularity. We also present a global classification of these notions according to their level of abstraction, define the terms and distinguish them from one another.

Monika Rakoczy, Amel Bouzeghoub, Katarzyna Wegrzyn-Wolska, Alda Gancarski Lopes

### Optimization and Approximate Placement of Autonomic Resources for the Management of Service-Based Applications in the Cloud

Cloud Computing is a new distributed computing paradigm that consists in provisioning infrastructure, software and platform resources as services. This paradigm is being increasingly used for the deployment and execution of service-based applications. To manage them efficiently according to the autonomic computing approach, service-based applications can be associated with autonomic managers that monitor them, analyze monitoring data, and plan and execute configuration actions on them. Although autonomic management of cloud services has received increasing attention in recent years, the optimization of autonomic managers (AMs) assigned to cloud services and their placement in the cloud remain largely unexplored. In fact, almost all existing solutions in autonomic computing have focused on modeling and implementing autonomic environments without paying attention to optimization. To address this issue, we present in this paper a novel approach to optimize autonomic management of service-based applications that consists in minimizing both the cost of allocated AMs, while avoiding bottlenecks in management, and the cost of their placement in the cloud (the inter-virtual machine communication cost). We propose two algorithms: (i) an algorithm that determines the optimal number of AMs to be assigned to services of a managed service-based application, and (ii) an algorithm that approximates the optimal placement of AMs in the cloud. The experiments conducted show the efficiency of our approach.

### Patorc: Pattern Oriented Compression for Semantic Data Streams

Recently, semantic data streams were proposed as a solution to cope with the heterogeneity of the original streams. However, nowadays, huge volumes of data are produced on the web at very high velocity. This may cause a bottleneck effect and decrease the efficiency of RDF stream processing engines. One approach to address this issue is to compress the data in the stream to decrease the delays and costs of the RDF exchange on the network. In this paper, we propose Patorc: a PATtern ORiented Compression approach, a lossless method for compressing semantic data streams. Our approach takes advantage of the key features of RDF data streams, which are the regularity of their graph structure and the redundancy of part of the data. Experiments carried out on publicly available datasets have demonstrated the effectiveness of our approach.

Fethi Belghaouti, Amel Bouzeghoub, Zakia Kazi-Aoul, Raja Chiky

### Online Discovery of Cooperative Structures in Business Processes

Process mining is a data-driven technique aiming to provide novel insights and help organizations to improve their business processes. In this paper, we focus on the cooperative aspect of process mining, i.e., discovering networks of cooperating resources that together perform processes. We use online streams of events as an input rather than event logs, which are typically used in an off-line setting. We present the Online Cooperative Network (OCN) framework, which defines online cooperative resource network discovery in a generic way. A prototypical implementation of the framework is available in the open source process mining toolkit ProM. By means of an empirical evaluation we show the applicability of the framework in the streaming domain. The techniques presented operate in a real-time fashion and are able to handle unlimited amounts of data. Moreover, the implementation allows visualizing network dynamics, which helps in gaining insights into changes in the execution of the underlying business process.

S. J. van Zelst, B. F. van Dongen, W. M. P. van der Aalst

### Human-in-the-Loop Web Resource Classification

Engaging humans in the resolution of classification tasks has been shown to be effective, especially when the digital resources considered have complex features that are hard to abstract for an automated procedure, like images or multimedia web resources. In this paper, we propose the $$\mathsf {HC^2}$$ crowdclustering approach for unsupervised classification of web resources, by allowing the classification categories to dynamically emerge from the crowd. In $$\mathsf {HC^2}$$, crowd workers actively participate in clustering activities (i) by resolving tasks in which they are asked to visually recognize groups of similar resources and (ii) by labeling recognized clusters with prominent keywords. To increase flexibility, $$\mathsf {HC^2}$$ can be interactively configured to dynamically set the balance between human engagement and automated procedures in cluster formation, according to the kind and nature of resources to be classified. For experimentation and evaluation, the $$\mathsf {HC^2}$$ approach has been deployed on the Argo platform providing crowdsourcing techniques for consensus-based task execution.

Silvana Castano, Alfio Ferrara, Stefano Montanelli

### Top-k Queries Over Uncertain Scores

Modern recommendation systems leverage some form of collaborative, user-based or crowdsourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. By nature of such independent collection of information, the multiplicity, diversity and varying quality of the information collected result in uncertainty. Objects, such as the services offered by hotels, vacation rentals and restaurants, have uncertain scores for their various features. In this context, ranking of uncertain data becomes a crucial issue. Several data models for uncertain data and several semantics for probabilistic top-k queries have been proposed in the literature. We consider here a model of objects with uncertain scores given as probability distributions and the semantics proposed by the state-of-the-art reference work of Soliman, Ilyas and Ben-David. In this paper, we explore the design space of Metropolis-Hastings Markov chain Monte Carlo algorithms for answering probabilistic top-k queries over a database of objects with uncertain scores. We are able to devise several algorithms that yield better performance than the reference algorithm. We empirically and comparatively prove the effectiveness and efficiency of these new algorithms.

Qing Liu, Debabrota Basu, Talel Abdessalem, Stéphane Bressan
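
The notion of a probabilistic top-k answer over uncertain scores can be illustrated with a much simpler estimator than the Metropolis-Hastings algorithms studied in the paper. The sketch below uses plain independent Monte Carlo sampling, and it estimates only the probability that each object appears among the k highest scores, which is one of several possible semantics. The function name `topk_probabilities`, the sampler interface and the default parameters are assumptions for illustration.

```python
import random
from typing import Callable, Dict

# A sampler draws one possible score for an object from its distribution.
Sampler = Callable[[random.Random], float]


def topk_probabilities(
    samplers: Dict[str, Sampler],
    k: int,
    n_samples: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate, for each object, the probability of being in the top-k."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = {name: 0 for name in samplers}
    for _ in range(n_samples):
        # Draw one "possible world": a concrete score for every object.
        drawn = {name: draw(rng) for name, draw in samplers.items()}
        # Count the k objects with the highest drawn scores in this world.
        for name in sorted(drawn, key=drawn.get, reverse=True)[:k]:
            hits[name] += 1
    return {name: h / n_samples for name, h in hits.items()}
```

A call such as `topk_probabilities({"x": lambda r: r.uniform(8, 9), "y": lambda r: r.uniform(4, 6)}, k=1)` describes each object by a uniform score interval. Independent sampling like this becomes expensive when the interesting rankings are rare, which is the kind of situation where Markov chain methods are typically considered.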

### RoSE: Reoccurring Structures Detection in BPMN 2.0 Process Model Collections

The detection of structural similarities of process models is frequently discussed in the literature. The state-of-the-art approaches for structural similarities of process models presume a known subgraph that is searched in a larger graph, and utilize behavioral and textual semantics to achieve their goal. In this paper we propose an approach to detect reoccurring structures in a collection of BPMN 2.0 process models, without the knowledge of a subgraph to be searched, and by focusing solely on the structural characteristics of the process models. The proposed approach deals with the problems of subgraph isomorphism, frequent pattern discovery and maximum common subgraph isomorphism, which are mentioned as NP-hard in the literature. In this work we present a formal model and a novel algorithm for the detection of reoccurring structures in a collection of BPMN 2.0 process models. We then apply the algorithm to a collection of 1,806 real-world process models and provide a quantitative and qualitative analysis of the results.

Marigianna Skouradaki, Vasilios Andrikopoulos, Oliver Kopp, Frank Leymann

### A Multigraph Approach for Web Services Recommendation

In this paper, we describe a Web services recommendation approach where the services’ ecosystem is represented as a heterogeneous multigraph, and edges may have different semantics. The recommendation process relies on clustering techniques to suggest services “of interest” to a user. Our approach has been implemented as a tool called WesReG (Web services Recommendation with Graphs) on top of Neo4j and its Cypher query language. We describe the system implementation details and present the results of experiments on a collection of real Web services.

Fatma Slaimi, Sana Sellami, Omar Boucelma, Ahlem Ben Hassine

### A Generic Framework for Context-Aware Process Performance Analysis

Process mining combines model-based process analysis with data-driven analysis techniques. The role of process mining is to extract knowledge and gain insights from event logs. Most existing techniques focus on process discovery (the automated extraction of process models) and conformance checking (aligning observed and modeled behavior). Relatively little research has been performed on the analysis of business process performance. Cooperative business processes often exhibit a high degree of variability and depend on many factors. Finding root causes for inefficiencies such as delays and long waiting times in such flexible processes remains an interesting challenge. This paper introduces a novel approach to analyze key process performance indicators by considering the process context. A generic context-aware analysis framework is presented that analyzes performance characteristics from multiple perspectives. A statistical approach is then utilized to evaluate and find significant differences in the results. Insights obtained can be used for finding high-impact points for optimization, prediction, and monitoring. The practical relevance of the approach is shown in a case study using real-life data.

Bart F. A. Hompes, Joos C. A. M. Buijs, Wil M. P. van der Aalst

### ExRORU: A New Approach to Characterize the Behavioral Semantics of Process Models (Short Paper)

A recent paper has proposed new ordering relations with uncertainty between the executions of tasks in acyclic process models. However, this approach cannot work for cyclic process models and those with silent transitions and non-free-choice constructs. In practice most non-trivial process models contain cycles and about 10 % to 20 % have also non-free-choice constructs. In this paper, we show how to overcome these problems by a refinement of the relations (i.e., extended refined ordering relations with uncertainty, ExRORU for short). All these relations can uniquely detect the behavioral differences between any pair of process models and can also be computed efficiently based on the complete prefix unfolding of a process model. Experiments on real-life and synthesized process models show that ExRORU is both effective and scalable.

Shuhao Wang, Lijie Wen, Akhil Kumar, Jianmin Wang, Jianwen Su

### An Architecture and Common Data Model for Open Data-Based Cargo-Tracking in Synchromodal Logistics

In logistics, questions such as “Where is my container?” and “When does my container arrive?” often cannot be answered with sufficient precision, which restricts the ability of logistics service providers to be efficient. Since logistics is complex and often involves multiple transportation modes and carriers, improving efficiency and saving costs in the supply chain requires communication between the different parties, and the use of real-time data is critical. Currently, logistics service providers (LSPs) use real-time data to a very limited extent, mainly for tracking the progress of a specific part of a given shipment. This data is retrieved manually from a number of websites, and sharing it with other actors is not even considered. This leads to a lack of end-to-end visibility and to delays in planning. This research proposes an architecture and a common data model for an integration platform that allows the automated collection of real-time container tracking data, enabling LSPs to plan more efficiently. Currently, there is no common data model available that contains all the information required and enables LSPs to track their shipments in real time. The common data model is designed via a bottom-up approach using results of interviews, observations at different logistics service providers, and analyses of open data on websites, and serves the information needs of the business processes involving such data. The model is also validated against industry standards. Based on the proposed architecture, a prototype was built and tested in real operating conditions with a fourth-party logistics company.

Wouter Bol Raap, Maria-Eugenia Iacob, Marten van Sinderen, Sebastian Piest

### A Framework for Clustering and Classification of Big Data Using Spark

Nowadays, massive data sets are generated in many modern applications ranging from economics to bioinformatics, and from social networks to scientific databases. Typically, such data need to be processed by machine learning algorithms, which entails high processing cost and usually requires the execution of iterative algorithms. Spark has been recently proposed as a framework that supports iterative algorithms over massive data efficiently. In this paper, we design a framework for clustering and classification of big data suitable for Spark. Our framework supports different restrictions on the data exchange model that are applicable in different settings. We integrate k-means and ID3 algorithms in our framework, leading to interesting variants of our algorithms that apply to the different restrictions on the data exchange model. We implemented our algorithms over the open-source computing framework Spark and evaluated our approach in a cluster of 37 nodes, thus demonstrating the scalability of our techniques. Our experimental results show that we outperform the algorithm provided by Spark for k-means by up to 31 %, while the centralized k-means is at least one order of magnitude worse.

Xristos Mallios, Vasilis Vassalos, Tassos Venetis, Akrivi Vlachou
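
The reason k-means fits an iterative data-parallel framework can be shown with one iteration written in a map and reduce style. The sketch below is plain Python, does not use Spark, and is not the authors' framework: each partition only sends per-cluster sums and counts to the driver, never raw points. The function names `nearest`, `partial_sums` and `kmeans_step` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
    )


def partial_sums(
    partition: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[List[float]], List[int]]:
    """'Map' side: per-cluster coordinate sums and counts for one partition."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in partition:
        i = nearest(p, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]
    return sums, counts


def kmeans_step(
    partitions: Sequence[Sequence[Point]], centroids: Sequence[Point]
) -> List[Point]:
    """'Reduce' side: merge partial results and recompute the centroids."""
    dim = len(centroids[0])
    total = [[0.0] * dim for _ in centroids]
    total_n = [0] * len(centroids)
    for part in partitions:
        sums, counts = partial_sums(part, centroids)
        for i in range(len(centroids)):
            total_n[i] += counts[i]
            for d in range(dim):
                total[i][d] += sums[i][d]
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(total[i][d] / total_n[i] for d in range(dim))
        if total_n[i]
        else tuple(centroids[i])
        for i in range(len(centroids))
    ]
```

The amount of data exchanged per iteration depends on the number of clusters and dimensions rather than on the number of points, which is the property that restrictions on the data exchange model would constrain.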

### A Global SLA-Aware Approach for Aggregating Services in the Cloud

With the advent of Cloud computing, services are increasingly deployed worldwide and enterprises are massively migrating their applications to the cloud. By nature, enterprise requirements are complex, unlike the provided Cloud services, which are simpler and atomic. Therefore, a significant research problem is how to compose a set of independent services in order to satisfy complex requirements. In addition, performance considerations are vital for the overall success of the composition, including the optimum cost of composed services, reliability, scalability, etc. They have attracted the attention and efforts of the Cloud computing community in the pursuit of guaranteeing the service level agreements (SLAs) established with users. In this paper, we propose a new composition approach that takes into account not only service QoS, but also the service security level, the service network environment and the cloud storage QoS. Our exact approach, based on a multi-objective linear programming (MOLP) method, provides an efficient and optimal solution for the composition problem. For experiments, we have developed a Cloud simulator to simulate service composition in the Cloud context.

Aida Lahouij, Lazhar Hamel, Mohamed Graiet, Abir Elkhalfa, Walid Gaaloul

### Rule-Based Runtime Monitoring of Instance-Spanning Constraints in Process-Aware Information Systems

Instance-spanning constraints (ISC) constitute a crucial instrument to establish coordination between multiple instances in Process-Aware Information Systems. ISC need to be verified and monitored at design time as well as at runtime. In this work we propose a rule-based approach for runtime monitoring of ISC. We base our work on the well-known Rete algorithm and research ways to structure the network such that the matching speed for ISC improves. We show through a technical evaluation that (1) a rule-based approach is feasible for performing runtime monitoring of ISC and (2) the heuristics we extract for structuring the Rete network improve the rule matching speed.

Conrad Indiono, Juergen Mangler, Walid Fdhila, Stefanie Rinderle-Ma

### Formal Verification of Time-Aware Cloud Resource Allocation in Business Process

Cloud computing has become an essential ingredient of any modern enterprise information systems infrastructure to effectively facilitate business-process execution at a low operating cost. Most of the pricing strategies proposed by Cloud providers are based on the temporal dimension. That is why time is considered one of the most important properties of Cloud resources. In addition, activities in the business process are also constrained by hard timing requirements. Therefore, it is essential to ensure that the temporal constraints of Cloud resources and business process activities match. The aim of the present paper is to ensure a consistent time-aware Cloud resource allocation: we propose to formally specify temporal constraints on cloud resources and on process activities in business processes. This specification is translated to timed automata in order to formally validate the consistency of the time-aware Cloud resource allocation, and to analyze and check its correctness against business-process temporal constraints.

Rania Ben Halima, Slim Kallel, Kais Klai, Walid Gaaloul, Mohamed Jmaiel

### A 3D Visualization Approach for Process Training in Office Environments

Process participants need to learn how to perform in the context of their business processes. Process training is challenging due to cognitive difficulties in relating process model elements to real world concepts. In this paper we present a 3D Virtual World (VW) process training approach for office environments. In this VW, process participants can experience a process in an immersive environment. They interact with VW representations of process elements in changing locations, based on process activities. By means of embodied 3D representation, deep immersion and engagement in this environment, enhancements in long term memory learning and episodic memory usage for knowledge retrieval are expected. Our illustration of an example process model shows the applicability of the approach. We list a number of future directions to extend the use and the benefits.

Banu Aysolmaz, Ross Brown, Peter Bruza, Hajo A. Reijers

### Energy Efficient Configurable Resource Allocation in Cloud-Based Business Processes (Short Paper)

With the increasing adoption of cloud computing, organizations are migrating their business processes to the cloud in order to quickly adapt to changes of requirements at lower costs, in a multi-tenant manner. In such an environment, using configurable business processes (BPs) allows various tenants to share a reference process which can be customized depending on their needs. Nevertheless, there is a lack of support for cloud-specific resource configuration in an energy-efficient way. In this paper, we address this gap by proposing a genetic-based approach that aims at selecting an optimal configuration and allocation of cloud resources w.r.t. cloud resource properties (i.e., elasticity and shareability) and non-functional process properties associated with ecological characteristics, namely green properties.

Emna Hachicha, Karn Yongsiriwit, Walid Gaaloul

### Efficient Constraint Verification in Service Composition Design and Execution (Short Paper)

Most methods that have been proposed to solve the problem of web service composition only consider input and output parameters of services in order to solve the composition problem. However, there are other factors that affect composition and execution of composite services such as constraints. Constraints can be used to express customer requirements on services features. Additionally, most real-world web services have constraints that specify their limitations and use restrictions. Constraint verification has significant impact on composition and execution of composite services. In particular, run time verification of service constraints can result in the failure of the execution of composite services and eventually waste computational resources. Such failures can not always be predicted as the verification of some services depends on execution effects of other services inside a composite plan. In this paper, we focus on verification of constraints during the composition and execution of composite services.

Touraj Laleh, Joey Paquet, Serguei A. Mokhov, Yuhong Yan

### Exploratory Search of Web Data Services (Short Paper)

Building web applications that integrate information available over the web increasingly requires frameworks that support the discovery of data services, which enable access to web data sources. Web data services might be discovered according to different criteria, such as their descriptions, their co-usage in similar applications, or the other developers who used them in their own development experiences. Recent approaches in the literature converge on data service selection based on multiple criteria. In this paper, we take a distinct viewpoint by proposing an exploratory search method that enables web application developers to iteratively discover services of interest and progressively improve their knowledge of available web data services.

Devis Bianchini, Valeria De Antonellis, Michele Melchiori

### FSCEP: A New Model for Context Perception in Smart Homes

With the emergence of the Internet of Things and smart devices, smart homes are becoming more and more popular. The main goal of this study is to implement an event-driven system in a smart home and to extract meaningful information from the raw data collected by the deployed sensors using Complex Event Processing (CEP). These high-level events can then be used by multiple smart home applications, in particular for situation identification. However, in real-life scenarios, low-level events are generally uncertain. In fact, an event may be outdated, inaccurate, imprecise, or in contradiction with another one. This can lead to misinterpretation by CEP and the associated applications. To overcome these weaknesses, in this paper we propose a Fuzzy Semantic Complex Event Processing (FSCEP) model, which can represent and reason with events by including domain knowledge and integrating fuzzy logic. It handles multiple dimensions of uncertainty, namely freshness, accuracy, precision, and contradiction. FSCEP has been implemented and compared with a well-known CEP engine. The results show how some ambiguities are resolved.

Amina Jarraya, Nathan Ramoly, Amel Bouzeghoub, Khedija Arour, Amel Borgi, Béatrice Finance

### Detecting Communities of Commuters: Graph Based Techniques Versus Generative Models

The main stage for a new generation of cooperative information systems is smart communities such as smart cities and smart nations. In the smart city context in which we position our work, urban planning, development and management authorities and stakeholders need to understand and take into account the mobility patterns of urban dwellers in order to manage the sociological, economic, and environmental issues created by the continuing growth of cities and urban populations. In this paper, we address the detection of communities of commuters, which is one of the crucial aspects of smart community analysis. A community of commuters is a group of users of a public transportation network who share similar mobility patterns. Existing techniques for mobility pattern analysis, based on spatio-temporal data clustering, generally rely on geometric similarity metrics such as Euclidean distance, cosine similarity, or variations of edit distance. They fail to capture the intuition of mobility patterns, based on recurring visitation sequences, which are more complex than simple trajectories with start and end points. In this work, we look at visitations as observations for generative models, and we explain the mobility patterns in terms of mixtures of communities defined as latent topics, which are seen as independent distributions over locations and time. We devise generative models that match and extend the Latent Dirichlet Allocation (LDA) model to capture the mobility patterns. We show that our approach, using generative models, is more efficient and effective in detecting mobility patterns than traditional community detection techniques.

Ashish Dandekar, Stéphane Bressan, Talel Abdessalem, Huayu Wu, Wee Siong Ng

### ChorSystem: A Message-Based System for the Life Cycle Management of Choreographies

Service choreographies are commonly used as the means for enabling inter-organizational collaboration by providing a global view on the message exchange between involved participants. Choreographies are ideal for a number of application domains that are classified under the Collaborative, Dynamic & Complex (CDC) systems area. System users in these application domains require facilities to control the execution of a choreography instance such as suspending, resuming or terminating, and thus actively control its life cycle. We support this requirement by introducing the ChorSystem, a system capable of managing the complete life cycle of choreographies from choreography modeling, through deployment, to execution and monitoring. The performance evaluation of the life cycle operations shows that the ChorSystem introduces an acceptable performance overhead compared to purely script-based scenarios, while gaining the abilities to control the choreography life cycle.

Andreas Weiß, Vasilios Andrikopoulos, Santiago Gómez Sáez, Michael Hahn, Dimka Karastoyanova

### Ontology Driven Complex Event Pattern Definition (Short Paper)

Complex Event Processing (CEP) usually focuses on analyzing raw atomic events in order to detect composite events. Usually, a composite event is defined as the pattern actively searched for by a CEP system. However, handling uncertainty in some paradigms, such as the Internet of Things, is still an open issue. In current approaches, the confidence value related to the occurrence of an event is usually not communicated to the CEP system. As a consequence, a complex event pattern does not take this information into account. Nevertheless, even if static, they are useful for pattern definition and particularly for a more accurate constraint definition. We propose to manage this information through domain ontologies. In this paper we describe the architecture for the enrichment of CEP queries to enable evolutivity and flexibility in CEP systems according to event sources [9].

Francois-Élies Calvier, Abderrahmen Kammoun, Antoine Zimmermann, Kamal Singh, Jacques Fayolle

### The Semantics of Hybrid Process Models

In the area of business process modelling, declarative notations have been proposed as alternatives to notations that follow the dominant, imperative paradigm. Yet, the choice between an imperative or declarative style of modelling is not always easy to make. Instead, a mixture of these styles is sometimes preferable. This observation has underpinned recent calls for so-called hybrid process modelling notations. In this paper, we present a formal semantics for these. In our proposal, a hybrid process model is hierarchical, where each of its sub-processes may be specified in either an imperative or declarative fashion. The semantics we provide will allow modelling communities to build on the benefits of existing imperative and declarative modelling notations, instead of spending their energy on inventing new ones.

Tijs Slaats, Dennis M. M. Schunselaar, Fabrizio M. Maggi, Hajo A. Reijers

### A Lightweight Process Engine for Enabling Advanced Mobile Applications

The widespread dissemination of smart mobile devices offers new perspectives for timely data collection in large-scale scenarios. However, realizing sophisticated mobile data collection applications raises various technical issues like the support of different mobile operating systems and their platform-specific features. Often, specifically tailored mobile applications are implemented in order to meet particular requirements. In this context, changes of the data collection procedure become costly and profound programming skills are needed to adapt the respective mobile application accordingly. To remedy this drawback, we developed a model-driven approach, enabling end-users to create mobile data collection applications themselves. Basis to this approach are elements for flexibly defining sophisticated questionnaires, called instruments, which not only contain information about the data to be collected, but also on how the instrument shall be processed on different mobile operating systems. For the latter purpose, we provide an advanced mobile (kernel) service that is capable of processing the logic of sophisticated instruments on various platforms. The paper discusses fundamental requirements for such a kernel and introduces a generic architecture. The feasibility of this architecture is demonstrated through a prototypical implementation. Altogether, the mobile service allows for the effective use of smart mobile devices in a multitude of different data collection application scenarios (e.g., clinical and psychological trials).

Johannes Schobel, Rüdiger Pryss, Marc Schickler, Manfred Reichert

### Barycenter-Based Placement Strategy Towards Improving Replicas Distribution Quality

Replication of data across various sites is a well-known technique used to address many challenges in data grid systems. In this respect, selecting the appropriate site at which to place a new replica is crucial to improving data grid performance. Many replica placement strategies have been proposed in the literature, each targeting a specific goal. In this paper, we pursue a new goal: improving the quality of replica distribution. To this end, we propose a new placement strategy in which the barycenter method is applied. This strategy makes it possible to reduce the total cost of remote accesses, to increase the quality of service, and to ensure that the benefits of the new replica placement persist in the long term. We evaluate the strategy with the OptorSim simulator. Results show that the new strategy achieves significant improvement in terms of replica distribution quality and total execution time, in the short term as well as in the long term.

Chamseddine Hamdeni, Tarek Hamrouni, Faouzi Ben Charrada
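
The abstract above does not detail how the barycenter is computed, so the following is only a minimal sketch of the general idea of a weighted barycenter, not the authors' strategy. It assumes, hypothetically, that each site has 2D coordinates and a request count for the file, and that the replica goes to the site closest to the request-weighted barycenter. The names `weighted_barycenter` and `nearest_site` are invented for illustration.

```python
import math


def weighted_barycenter(points, weights):
    """Return the weighted mean position of the given points."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dims = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    )


def nearest_site(sites, target):
    """Return the name of the site whose coordinates are closest to target."""
    return min(sites, key=lambda name: math.dist(sites[name], target))


# Hypothetical site coordinates and per-site request counts (assumptions).
sites = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
requests = {"A": 1, "B": 8, "C": 1}

names = list(sites)
center = weighted_barycenter(
    [sites[n] for n in names], [requests[n] for n in names]
)
chosen = nearest_site(sites, center)
```

In this toy setting most requests originate at site B, so the barycenter is pulled toward B and B is selected as the placement site.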

### Semantic Integration of Open-Data Tables

With vast amounts of tabular data freely available under several Open-Data initiatives, semantic integration of such datasets is a pressing need. Multiple research efforts have addressed the problem of annotating tabular data. However, to the best of our knowledge, they do not adequately address the problem of semantic integration of tables. A given collection of tables can be semantically integrated along several perspectives or themes. This makes semantic integration a “divergent aggregation” problem. Most existing approaches have focused on interpreting a single table, or rewriting tables to describe an overarching theme that is already provided. In this work, we address semantic integration along two levels: Theme identification (identifying dominant topics or perspectives through which the data can be characterized) and Schematic characterization (classes, relationships and instances that best characterize the data within the theme). The theme need not be represented by a single column, and may span across multiple columns or tables. We use Linked Open data (LOD) cloud to map ontologies that best suit the datasets. Our work also identifies incoherent datasets where a given collection may not have common topics. In such cases we are able to provide guidance on the intersection of semantic footprints of the tables for a judicious selection of the datasets for semantic integration.

Asha Subramanian, Ved Kurien Mathai, Vikkurthi Manikanta, Janaki Vinesh Joshi, Srinath Srinivasa

### An Analysis of Real-World XML Queries

The aim of our research was to gather a representative set of real-world XQuery queries and perform its analysis in order to confirm or refute distinct hypotheses about query complexity. The data were gathered using a modified crawler, then cleaned, corrected, and validated. The main subject of the analysis was usage of XQuery grammar symbols. We also analyzed the XML documents referenced from the XQuery queries as well as their outputs. To the best of our knowledge this is the first analysis of this kind and extent.

Peter Hlísta, Irena Holubová

### Modeling Recipes for Online Search

In this paper we propose a formal model which allows us to effectively represent and search for recipes in online environments. The proposed model is an entity-relationship model that provides relevant entity types and properties, formalized as an ontology. The important aspects of the recipe model are identified by means of competency questions. Our model advances the state of the art in that it supports essential queries that are typically not supported by websites and current reference data models, such as Schema.org and the BBC Food Ontology. We illustrate the methodology followed, the developed model, and the evaluation we conducted.

Usashi Chatterjee, Fausto Giunchiglia, Devika P. Madalli, Vincenzo Maltese

### GeoEtypes: Harmonizing Diversity in Geospatial Data (Short Paper)

The open data community is becoming aware of the proliferation of data and its exponential growth in size related to various domains. Another important aspect of open data is diversity, where the focus lies on managing heterogeneous datasets. In this work, we propose a common data model for the geospatial domain to address the diversity of (open) geospatial data, which we name GeoEtypes. GeoEtypes has two components. The first is the GeoEtypes Schema which provides a unified schema, while the second, named GeoEtypes Voc, is an aggregation for the vocabulary of terms. The idea behind this vocabulary is to provide natural language description of all terms used in the schema to make the model self-sufficient. GeoEtypes is constituted as a formalisation of INSPIRE, the European directive on spatial data. The model has been evaluated on three global datasets as well as on a local dataset with the main purpose to validate its adaptability to diversity.

Subhashis Das, Fausto Giunchiglia

Alan Meehan, Dimitris Kontokostas, Markus Freudenberg, Rob Brennan, Declan O’Sullivan

### On Topic Aware Recommendation to Increase Popularity in Microblogging Services (Short Paper)

The flourishing of Web-based Online Social Networks (OSNs) has led to numerous applications that exploit social relationships to boost the influence of content in the network. However, existing approaches focus on social ties and ignore how the topic of a post and its structure relate to its popularity. Our work helps fill this gap. The contribution of this work is two-fold: (i) we develop a scheme that automatically identifies the topic of a post, specifically a tweet, in real time without human participation in the process, and then (ii) based on the topic of the tweet and prior related posts, we recommend appropriate structural properties to increase the popularity of the particular tweet. By exploiting Wikipedia, our model requires no training or expensive feature engineering for the classification of tweets into topics.

Iouliana Litou, Vana Kalogeraki, Dimitrios Gunopulos

### An Adaptive Semantic Model to Enhance Web Accessibility to Visually Impaired Users (Short Paper)

The Web has become an invaluable source of knowledge. However, visually impaired users face critical difficulties in accessing the services and data available on the Web. To tackle this problem, this paper proposes a semantic model to improve the accessibility of websites for users with visual disabilities. The model consists of components that identify and prioritize relevant information on web pages and convert page elements into more understandable ones, using a semantic enrichment strategy that results in an adapted page that meets the user’s needs. The model is under development and has been partially implemented and validated in two different scenarios, one dealing with a site that portrays the Brazilian semiarid region and the other with an adaptation of a social network page in which an image is replaced with an equivalent audio description. The experiment has shown the feasibility of using semantic web technologies to improve accessibility.

Tatiana Sorrentino, Alexandre Santos, Joaquim Macedo, Cláudia Ribeiro

### Factorization Techniques for Longitudinal Linked Data (Short Paper)

Longitudinal linked data are RDF descriptions of observations from related sampling frames or sensors at multiple points in time, e.g., patient medical records or climate sensor data. Observations are expressed as measurements whose values can be repeated several times in a sampling frame, resulting in a considerable increase in data volume. We devise a factorized compact representation of longitudinal linked data to reduce the repetition of identical measurements, and propose algorithms to generate collections of factorized longitudinal linked data that can be managed by existing RDF triple stores. We empirically study the effectiveness of the proposed factorized representation on linked observation data. We show that the total data volume can be reduced by more than 30% on average without loss of information, and that the compression ratio of state-of-the-art compression techniques can be improved.

Farah Karim, Maria-Esther Vidal, Sören Auer

### An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper)

Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of biomedical domain and lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called “Principles of Rigor and Reproducibility”. In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe). The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata as compared to existing NLP pipelines such as MetaMap.

Joshua Valdez, Michael Rueschman, Matthew Kim, Susan Redline, Satya S. Sahoo

### Class Annotation Using Linked Open Data

The meaningful usage of RDF datasets requires a description of their content. Part of this description is provided in the dataset itself through class definitions. However, the name of a class does not always reflect accurately its semantics. This meaning can be captured by providing some annotations for each class. In this paper, we present a set of algorithms exploiting the instances of a dataset in order to provide annotations which best capture the semantics of a class. These algorithms rely on an external knowledge source. We introduce three ways of extracting annotations: (i) using the names of instances, (ii) using their property sets and (iii) considering the vocabularies used by the dataset. As an external source, we have used Linked Open Data, which represents an unprecedented amount of knowledge provided on the Web. We also show how annotations can be used to discover a class hierarchy and we present some evaluation results showing the effectiveness of our approach.

With the steady growth of linked datasets available on the web, it becomes increasingly necessary to create efficient approaches for analyzing, searching, and discovering links between RDF datasets. In this paper, we describe LD-LEx, an architecture that makes it possible to index RDF datasets using GridFS documents and probabilistic data structures called Bloom filters. Our lightweight approach thus provides metadata about the quantity and quality of links between datasets. Moreover, we explored these concepts by indexing more than 2 billion triples from over a thousand datasets, providing insights into the behavior of Bloom filters w.r.t. performance and memory footprint.

Ciro Baron Neto, Dimitris Kontokostas, Gustavo Publio, Kay Müller, Sebastian Hellmann, Eduardo Moletta
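
To illustrate why Bloom filters suit this kind of link indexing, here is a minimal sketch of a generic Bloom filter used to estimate how many object URIs of one dataset probably occur as subjects in another. It is not the LD-LEx implementation: the class, the function `count_candidate_links`, the filter sizes, and the example URIs are all assumptions made for illustration, and the GridFS storage layer is not reproduced.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def count_candidate_links(object_uris, subject_filter):
    """Count object URIs that are probably subjects in the other dataset."""
    return sum(1 for uri in object_uris if uri in subject_filter)


# Hypothetical subjects of dataset A and objects of dataset B.
subjects_a = BloomFilter(num_bits=4096, num_hashes=3)
for uri in ["http://a.example/s1", "http://a.example/s2"]:
    subjects_a.add(uri)

objects_b = ["http://a.example/s1", "http://a.example/s2"]
links = count_candidate_links(objects_b, subjects_a)
```

Because a Bloom filter never produces false negatives, such a count is an upper bound on the true number of links; the filter size and number of hash functions control how tight that bound is.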

### Enriching Topic Models with DBpedia

Traditional Topic Modeling approaches only consider the words in the document. By using an entity-topic modeling approach and including background knowledge about the entities such as the occupation of persons, the location of organizations, the band of a musician etc., we can better cluster related documents together, and produce semantic topic models that can be represented in a knowledge base. In our approach we first reduce the text documents to a set of entities and then enrich this set with background knowledge from DBpedia. Topic modeling is performed on the enriched set of entities and various feature combinations are evaluated in order to determine the combination that achieves the best classification precision or perplexity compared to using word-based topic models alone.

Alexandru Todor, Wojciech Lukasiewicz, Tara Athan, Adrian Paschke

### FuhSen: A Federated Hybrid Search Engine for Building a Knowledge Graph On-Demand (Short Paper)

A vast amount of information about various types of entities is spread across the Web, e.g., people or organizations on the Social Web, and product offers on the Deep Web or the Dark Web. These data sources can comprise heterogeneous data and are equipped with different search capabilities, e.g., a search API. End users, such as investigators from law enforcement institutions searching for traces and connections of organized crime, have to deal with these interoperability problems not only at search time but also while merging data collected from different sources. We devise FuhSen, a keyword-based federated engine that exploits the search capabilities of heterogeneous sources during query processing and generates knowledge graphs on demand, applying an RDF-Molecule integration approach in response to keyword-based queries. The resulting knowledge graph describes the semantics of entities collected from the integrated sources, as well as relationships among these entities. Furthermore, FuhSen utilizes ontologies to describe the available sources in terms of content and search capabilities, and exploits this knowledge to select the sources relevant for answering a keyword-based query. We conducted a user evaluation in which FuhSen is compared to traditional search engines. FuhSen's semantic search capabilities allowed users to complete search tasks that could not be accomplished with traditional Web search engines during the evaluation study.

Diego Collarana, Mikhail Galkin, Christoph Lange, Irlán Grangel-González, Maria-Esther Vidal, Sören Auer

### Bindings-Restricted Triple Pattern Fragments

The Triple Pattern Fragment (TPF) interface is a recent proposal for reducing server load in Web-based approaches to execute SPARQL queries over public RDF datasets. The price for less overloaded servers is a higher client-side load and a substantial increase in network load (in terms of both the number of HTTP requests and data transfer). In this paper, we propose a slightly extended interface that allows clients to attach intermediate results to triple pattern requests. The response to such a request is expected to contain triples from the underlying dataset that do not only match the given triple pattern (as in the case of TPF), but that are guaranteed to contribute in a join with the given intermediate result. Our hypothesis is that a distributed query execution using this extended interface can reduce the network load (in comparison to a pure TPF-based query execution) without reducing the overall throughput of the client-server system significantly. Our main contribution in this paper is twofold: we empirically verify the hypothesis and provide an extensive experimental comparison of our proposal and TPF.

Olaf Hartig, Carlos Buil-Aranda
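
The server-side selection described in this abstract can be sketched as follows. This is a simplified in-memory illustration under stated assumptions, not the authors' interface: triples and patterns are plain tuples of strings, variables are strings starting with `?`, and the function names `match`, `compatible`, and `select` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]


def match(pattern: Pattern, triple: Triple) -> Optional[Binding]:
    """Return the bindings under which the triple matches, or None."""
    binding: Binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None  # same variable bound to two different values
        elif term != value:
            return None
    return binding


def compatible(a: Binding, b: Binding) -> bool:
    """Bindings are join-compatible if they agree on all shared variables."""
    return all(b[var] == val for var, val in a.items() if var in b)


def select(
    dataset: Iterable[Triple],
    pattern: Pattern,
    bindings: Optional[List[Binding]] = None,
) -> List[Triple]:
    """Plain pattern matching if bindings is None; otherwise keep only
    matching triples that join with at least one attached binding."""
    result = []
    for triple in dataset:
        mu = match(pattern, triple)
        if mu is None:
            continue
        if bindings is None or any(compatible(mu, b) for b in bindings):
            result.append(triple)
    return result


# Hypothetical toy dataset.
data = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "dave"),
    ("bob", "name", "Bob"),
]
plain = select(data, ("?x", "knows", "?y"))
restricted = select(data, ("?x", "knows", "?y"), [{"?y": "bob"}])
```

With no bindings attached, both `knows` triples are returned; attaching the intermediate result `?y = bob` leaves only the triple that can contribute to the join, which is where the reduction in transferred data comes from.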

### Scheduling Refresh Queries for Keeping Results from a SPARQL Endpoint Up-to-Date (Short Paper)

Many datasets change over time. As a consequence, long-running applications that cache and repeatedly use query results obtained from a SPARQL endpoint may resubmit the queries regularly to ensure up-to-dateness of the results. While this approach may be feasible if the number of such regular refresh queries is manageable, with an increasing number of applications adopting this approach, the SPARQL endpoint may become overloaded with such refresh queries. A more scalable approach would be to use a middle-ware component at which the applications register their queries and get notified with updated query results once the results have changed. Then, this middle-ware can schedule the repeated execution of the refresh queries without overloading the endpoint. In this paper, we study the problem of scheduling refresh queries for a large number of registered queries by assuming an overload-avoiding upper bound on the length of a regular time slot available for testing refresh queries. We investigate a variety of scheduling strategies and compare them experimentally in terms of time slots needed before they recognize changes and number of changes that they miss.

Magnus Knuth, Olaf Hartig, Harald Sack

### StatSpace: A Unified Platform for Statistical Data Exploration

In recent years, the amount of statistical data available on the web has been growing fast. Numerous organizations and governments publish data sets in a multitude of formats and encodings, using different scales, and providing access through a wide range of mechanisms. Due to such inconsistent publishing practices, integrated analysis of statistical data is challenging. StatSpace tackles this problem through semantic integration and provides uniform access to disparate statistical data. At present, it incorporates more than 1,800 data sets published by a variety of data providers including the World Bank, the European Union, and the European Environment Agency. StatSpace transparently lifts data from raw sources, maps geographical and temporal dimensions, aligns value ranges, and allows users to explore and integrate the previously isolated data sets. This paper introduces the constituent elements of the StatSpace architecture – i.e., a metadata repository, URI design patterns, and supporting services – and demonstrates the usefulness of the resulting Linked Data infrastructure by means of use case examples.

Ba-Lam Do, Peter Wetz, Elmar Kiesling, Peb Ruswono Aryan, Tuan-Dat Trinh, A Min Tjoa

### Ontological Reasoning About Situations from Calendar Events

Inferring situations is the key to developing situation-aware applications that exploit users’ situations to support the fulfillment of their tasks on the move. In this paper, we attempt to reason about users’ various situations from their calendar events, provided the calendar event data represent accurate scheduling of ‘real-world’ events. We show how an ontology can be used to infer situations from calendar events by considering both the semantic and temporal aspects of situations. We develop a situation ontology and propose a semantic rule-based approach to deducing and abstracting situations from data collected from a calendar system. The feasibility and applicability of our approach are demonstrated by developing a prototype mobile phone call interrupt management application that uses the user’s situation information for handling incoming phone calls. We further present an empirical evaluation of our approach on a real dataset. Our preliminary results show that our approach has great potential for inferring users’ various situations.

### Formalisation of ORM Derivation Rules and Their Mapping into OWL

Object-Role Modelling (ORM) is a framework for modelling and querying information at the conceptual level. It supports the design of large-scale industrial applications by allowing users to model the domain easily. Derivation rules are additional ORM constructs which capture some relevant information of the domain that cannot be expressed in the standard ORM2 language. In this paper, we identify the first-order fragment of subtype derivation rules (without arithmetic operators and aggregation functions) and we provide a provably correct mapping into OWL. This enables complete automated reasoning with ORM2 conceptual schemas enriched by derivation rules, such as detecting inconsistencies and redundancies and deriving implicit constructs. We illustrate the implementation of our formalisation in ORMiE, a plugin for the NORMA ORM2 extension of Microsoft Visual Studio, which automatically maps ORM2 conceptual schemas with derivation rules into OWL and uses a description logic prover as a background reasoning engine.

Francesco Sportelli, Enrico Franconi

### Processing Regular Path Queries on Arbitrarily Distributed Data

Regular Path Queries (RPQs) are a type of graph query where answers are pairs of nodes connected by a sequence of edges matching a regular expression. We study the techniques to process such queries on a distributed graph of data. While many techniques assume the location of each data element (node or edge) is known, when the components of the distributed system are autonomous, the data will be arbitrarily distributed, or non-localized. We compare query processing strategies for this setting analytically and empirically, using biomedical data and meaningful queries. We isolate query-dependent cost factors and present a method to choose between strategies, using new query cost estimation techniques.

Alan Davoust, Babak Esfandiari
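
As a point of reference for the RPQ definition given in this abstract, the sketch below evaluates an RPQ on a single, centralized graph by searching the product of the graph and a finite automaton. It does not model the distributed, non-localized setting the paper studies. The function name `evaluate_rpq`, the explicit automaton encoding, and the toy edge labels are assumptions for illustration; a full system would compile the regular expression into the automaton.

```python
from collections import deque


def evaluate_rpq(edges, transitions, start_state, accepting):
    """Return all (source, target) pairs linked by a path whose edge labels
    are accepted by the automaton given as {(state, label): next_state}."""
    adjacency = {}
    nodes = set()
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((source, node))
            for label, nxt in adjacency.get(node, []):
                nxt_state = transitions.get((state, label))
                if nxt_state is not None and (nxt, nxt_state) not in seen:
                    seen.add((nxt, nxt_state))
                    queue.append((nxt, nxt_state))
    return answers


# Hypothetical toy graph and an automaton for the expression: interacts+ encodes
edges = [
    ("a", "interacts", "b"),
    ("b", "interacts", "c"),
    ("c", "encodes", "d"),
]
transitions = {
    (0, "interacts"): 1,
    (1, "interacts"): 1,
    (1, "encodes"): 2,
}
pairs = evaluate_rpq(edges, transitions, start_state=0, accepting={2})
```

In a distributed setting, each expansion step of this search may require contacting a remote component, which is why the choice of processing strategy and its cost estimation matter.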

### A Discretionary Delegation Framework for Access Control Systems

Provision for delegation of access privileges lends access control systems flexibility and context-awareness. The topic of delegation did not exist in classical computing security, but – as IT systems got more distributed and complex – provision for delegation became a necessary access-control feature, and consequently much effort has been dedicated to extend conventional access control models with delegation capability. Many such efforts have pivoted around the well-known Role-based Access Control (RBAC) model, mainly for compatibility reasons, as RBAC had already been considered the de facto industry standard – even before the need for delegation arose in enterprise information systems. However, delegation is arguably more discretionary in nature rather than role-based; especially for healthcare informatics which is the application domain for our proposed delegation framework. In this paper, we present a discretionary framework for delegation of access rights from a delegator to a delegatee by implementing a delegation-token and various stages of its life cycle in tamper-resistant devices including smartcards. The proposed framework is designed and implemented using our eTRON cybersecurity architecture which advocates use of public key cryptographic protocols for secure entity authentication, data integrity and data confidentiality.

M. Fahim Ferdous Khan, Ken Sakamura

### Deploying Visual Analytics Through a Multi-cloud Service Store with Encrypted Big Data (Short Paper)

The benefits of Cloud Computing are now widely recognised, in terms of easy, flexible, scalable and cost effective deployment of services and storage. At the same time, the growth in Big Data solutions is offering a plethora of new service opportunities. However, significant barriers of trust and privacy concerns are slowing the adoption of Big Data cloud services. In this paper we describe how a Visual Analytics system can be flexibly deployed via a multi-cloud application store. The supporting infrastructure (IaaS) is protected via innovative security protection capabilities, while associated Big Data resources can be protected via encryption and access control. The novel Visual Analytics capability makes analysis of Big Data in the cloud easier and faster, thereby empowering data analysts with attractive new tools, while the security features help to tackle issues of privacy and trust for big data cloud deployments.

Mark Shackleton, Fadi El-Moussa, Robert Rowlingson, Alex Healing, John Crowther, Joshua Daniel, Theo Dimitrakos, Ali Sajjad

### Evaluating Two Methods for WS-(Security) Policy Negotiation and Decision Making

Any communication between a Web Service Provider (WSP) and a Web Service Consumer (WSC) in Web Service (WS) systems requires both parties to negotiate their security policies in order to reach an agreed-upon set of security rules. However, reaching this agreement faces several issues. First, there is currently no policy selection method for the case of multiple compatible alternatives, nor any mechanism for the case of no compatible alternatives. Second, the security policy assertions, written in XML, are complex. To overcome these issues, we evaluate the policy intersection method in its current state, as well as two policy selection methods: Lattice lub/glb and Fuzzy Multiple Criteria Decision Making (MCDM) using the Analytical Hierarchy Process (AHP). These two methods can be used as an extension of policy intersection to solve policy compatibility measurements for better interoperability. An implementation was built to evaluate the decision-making methods. It is found that in about 98.91 % of the total comparisons, both methods select the same set of security policies. Based on the evaluation findings, we propose a negotiation process that extends policy intersection with the two evaluated methods to reach a final policy agreement.

Abeer Elsafie, Jörg Schwenk
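To make the AHP step in the abstract above more concrete, the following is a minimal sketch of how priority weights could be derived from a pairwise comparison matrix and used to rank compatible policy alternatives. It uses plain (crisp) AHP with the row geometric-mean approximation, not the fuzzy variant the paper evaluates. The criteria, comparison values, alternative names, and scores are all hypothetical and do not come from the paper.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_policies(weights, scores):
    """Rank alternatives by the weighted sum of their per-criterion scores."""
    totals = {
        name: sum(w * s for w, s in zip(weights, values))
        for name, values in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical criteria: confidentiality, integrity, performance.
# pairwise[i][j] states how much more important criterion i is than j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical per-criterion scores (0..1) of two compatible alternatives.
scores = {
    "alternative_a": [0.9, 0.6, 0.3],
    "alternative_b": [0.5, 0.7, 0.9],
}
ranking = rank_policies(weights, scores)
print(weights, ranking)
```

Because the example matrix is perfectly consistent, the geometric-mean weights coincide with the exact AHP eigenvector weights; for inconsistent judgments they are only an approximation.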

### A Context-Aware Analytics for Processing Tweets and Analysing Sentiment in Realtime (Short Paper)

Sentiment analysis has become increasingly important for companies seeking to understand customer and supplier sentiment about their processes, products, and services more accurately, and to predict customer churn. In particular, existing sentiment analysis aims to better understand customers' or suppliers' emotions, which are essentially the affirmative, negative, and neutral views of users on tangible or intangible entities, e.g., products or services. One of the most prevalent sources for analysing these sentiments is Twitter. Unfortunately, existing sentiment analysis techniques suffer from serious shortcomings: (1) they struggle to deal effectively with streaming data, as they can merely exploit (Twitter) hashtags, and (2) they neglect the context of Tweets. In this paper, we present SANA: a context-aware solution for dealing with streaming (Twitter) data, analysing this data on the fly while taking into account the context and more comprehensive semantics of Tweets, and dynamically monitoring and visualising trends in sentiments through dashboarding and query facilities.

Yehia Taher, Rafiqul Haque, Mohammed AlShaer, Willem Jan van den Heuvel, Mohand-Saïd Hacid, Mohamed Dbouk

### Using Multiplex Networks to Model Cybersecurity Attack Profiles

Recent research in cybersecurity models the nature of attacks as graphs consisting of nodes that represent attacks and their properties, forming attack profiles. We examine the relationships between attack profiles based on established properties of the attacks, in order to reduce the amount of information present in the graph and make it more applicable to cloud and big data environments. This is done by using multiplex networks, which are constructed based on the individual properties of cyber-attacks and on reasoning rules enhanced with semantics and context, to generate a multiplex semantic link network (mSLN). This paper presents an approach to generating mSLNs and evaluates it on specific datasets.

Manesh Pillai, George Karabatis
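As a rough illustration of the multiplex idea described above, the sketch below builds one edge layer per attack property and links two attacks in a layer when they share the same value for that property. This linking rule, the property names, and the sample attacks are assumptions made for illustration only. The semantic and contextual reasoning rules that the paper uses to form an mSLN are not modelled here.

```python
from itertools import combinations


def build_multiplex(attacks, properties):
    """Return {property: set of frozenset({a, b})}, one edge layer per property."""
    layers = {}
    for prop in properties:
        edges = set()
        for (name_a, attrs_a), (name_b, attrs_b) in combinations(attacks.items(), 2):
            value_a = attrs_a.get(prop)
            # Assumed rule: link two attacks if they share this property value.
            if value_a is not None and value_a == attrs_b.get(prop):
                edges.add(frozenset((name_a, name_b)))
        layers[prop] = edges
    return layers


def link_strength(layers, a, b):
    """Count the layers in which attacks a and b are connected."""
    pair = frozenset((a, b))
    return sum(1 for edges in layers.values() if pair in edges)


# Hypothetical attack profiles.
attacks = {
    "attack_1": {"protocol": "tcp", "target_port": 22, "category": "brute_force"},
    "attack_2": {"protocol": "tcp", "target_port": 22, "category": "scan"},
    "attack_3": {"protocol": "udp", "target_port": 53, "category": "scan"},
}
layers = build_multiplex(attacks, ["protocol", "target_port", "category"])
print(link_strength(layers, "attack_1", "attack_2"))
```

Counting shared layers is only one simple way to aggregate a multiplex network; a weighted combination per property would be an equally plausible choice.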

### Network Trace Anonymization Using a Prefix-Preserving Condensation-Based Technique (Short Paper)

This paper proposes a method to anonymize network trace data using a novel perturbation technique that offers strong privacy guarantees while preserving data utility. The resulting dataset can be used for security analysis, retaining the utility of the original dataset without revealing sensitive information. Our method uses a condensation-based approach with strong privacy guarantees and is suited to cloud environments. Experiments show that the method performs better than existing anonymization techniques in terms of the privacy-utility trade-off, and that it surpasses existing techniques in attack prediction accuracy.

Ahmed Aleroud, Zhiyuan Chen, George Karabatis

### Balancing Utility and Security: Securing Cloud Federations of Public Entities

Following their practical needs and legal constraints, recent application of the cloud paradigm among public administrations has focused on the deployment of private clouds. Due to the increasing amount of data and processing requirements, many organizations are considering ways to further optimize their infrastructures and collaborative processes by employing private cloud federations. In this work, we present our contribution based on three real-world use cases implemented in the course of the SUNFISH project. We consider intra- and inter-organizational processes which demand secure and transparent infrastructure and data sharing. Based on derived requirements for data security and privacy in cloud federations, we propose a security governance architecture which enables multi-layered, context- and process-aware policy enforcement in heterogeneous environments. The proposed architecture relies on the micro-services paradigm to support scalability and provides additional security by integrating reactive and transformative security controls. To prove the feasibility of this work, we provide a performance evaluation of our implementation.

Bojan Suzic, Bernd Prünster, Dominik Ziegler, Alexander Marsalek, Andreas Reiter

### Differential Privacy Based Access Control

The huge availability of data is giving organizations the opportunity to develop and consume new data-intensive applications (e.g., predictive analytics). However, data often contain personal and confidential information, and their usage and sharing come with security and legal risks, so there is a need to devise appropriate, task-specific data release mechanisms that balance the advantages of big data against the potential risks. We propose a novel privacy-aware access control model based on differential privacy. The model allows for data access at different privacy levels, generating an anonymized data set according to the privacy clearance of each request. The architecture also supports re-negotiation of the privacy level in return for fulfilling a set of obligations. We also show how the model can address privacy and utility requirements in a human-resource-motivated use case with a classification task. The model provides flexible access control, improving data availability while guaranteeing a certain level of privacy.
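The link between privacy clearance and released data can be illustrated with a minimal sketch, assuming the standard Laplace mechanism on a counting query rather than the full anonymized data set release described in the abstract. The clearance names, the epsilon values, and the sample records are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical mapping from privacy clearance to the privacy budget epsilon.
# A smaller epsilon means more noise and therefore stronger privacy.
EPSILON_BY_CLEARANCE = {"low": 0.1, "medium": 1.0, "high": 10.0}


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records, predicate, clearance, rng):
    """Answer a counting query with noise calibrated to the caller's clearance."""
    epsilon = EPSILON_BY_CLEARANCE[clearance]
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1, so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical human-resources records.
records = [{"dept": "hr"}] * 30 + [{"dept": "it"}] * 70
rng = random.Random(42)
for clearance in ("low", "medium", "high"):
    answer = noisy_count(records, lambda r: r["dept"] == "hr", clearance, rng)
    print(clearance, round(answer, 2))
```

A request with higher clearance receives an answer closer to the true count, which mirrors the trade-off between data availability and privacy that the model is meant to manage. Re-negotiating the privacy level would correspond to moving a request to a different epsilon.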

### Backmatter
