
About this book

This book constitutes the five revised tutorial lectures of the 9th European Business Intelligence and Big Data Summer School, eBISS 2019, held in Berlin, Germany, during June 30 – July 5, 2019.
The tutorials were given by renowned experts and covered advanced aspects of business intelligence and big data. This summer school, presented by leading researchers in the field, represented an opportunity for postgraduate students to equip themselves with the theoretical and practical skills necessary for developing challenging business intelligence applications.

Table of Contents


Actionable Conformance Checking: From Intuitions to Code

Conformance checking has received increasing attention in recent years, for two main reasons: the explosion of digital information that describes processes, and the need to use this data to monitor and improve processes in organizations. Conformance checking addresses this need by providing techniques for relating modeled and recorded process information. This paper gives an accessible overview of the main techniques and feedback of the conformance checking field. Moreover, to make it actionable, code snippets are provided so that an organization can start a conformance checking project on its own data.
Josep Carmona, Matthias Weidlich, Boudewijn van Dongen
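The core idea of conformance checking, relating modeled and recorded behavior, can be sketched in a few lines. The example below is a minimal illustration, not the authors' technique: the process model is encoded simply as a set of allowed directly-follows pairs (the activity names and the fitness measure are hypothetical), and a trace's fitness is the fraction of its moves that the model permits.

```python
# Minimal conformance-checking sketch: a hypothetical process model encoded
# as allowed directly-follows relations between activities.
MODEL = {
    ("register", "check"), ("check", "approve"),
    ("check", "reject"), ("approve", "pay"),
}

def trace_fitness(trace):
    """Fraction of consecutive activity pairs in a trace that the model allows."""
    moves = list(zip(trace, trace[1:]))
    if not moves:
        return 1.0
    ok = sum(1 for move in moves if move in MODEL)
    return ok / len(moves)

log = [
    ["register", "check", "approve", "pay"],  # fully conforming trace
    ["register", "approve", "pay"],           # deviates: skips the "check" step
]
for trace in log:
    print(trace, trace_fitness(trace))
```

Real conformance-checking techniques, such as token replay or alignments, refine this idea by also diagnosing where and how a trace deviates from the model.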

Introduction to Text Analytics

Data processing concerns the analysis of various types of data, including numerical data, signals, texts, pictures, and videos. This paper focuses on defining and studying the various tasks of text analytics along the typical processing pipeline. Sources of textual data are introduced and the related challenges are discussed. Throughout the pipeline, examples demonstrate how text analytics should be carried out. Finally, potential applications of text analytics are given, including sentiment analysis and automatic generation of content.
Agata Filipowska, Dominik Filipiak
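A typical text-analytics pipeline of the kind discussed here can be sketched very compactly. The snippet below is only a toy illustration: the stop-word list and sentiment lexicon are tiny hypothetical stand-ins for the real linguistic resources such a pipeline would use.

```python
# Toy text-analytics pipeline: tokenization, normalization, stop-word
# removal, and a lexicon-based sentiment score (all resources hypothetical).
import re

STOPWORDS = {"the", "a", "an", "is", "was", "and", "of"}
SENTIMENT = {"great": 1, "good": 1, "bad": -1, "terrible": -1}

def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())      # tokenize + lowercase
    return [t for t in tokens if t not in STOPWORDS]   # drop stop words

def sentiment(tokens):
    """Sum of lexicon scores; unknown words count as neutral (0)."""
    return sum(SENTIMENT.get(t, 0) for t in tokens)

tokens = preprocess("The service was great and the food was good.")
print(tokens, sentiment(tokens))
```

Production pipelines replace each stage with richer components (language detection, lemmatization, trained classifiers), but the stage-by-stage structure stays the same.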

Automated Machine Learning: Techniques and Frameworks

Nowadays, machine learning techniques and algorithms are employed in almost every application domain (e.g., financial applications, advertising, recommendation systems, user behavior analytics). In practice, they play a crucial role in harnessing the power of the massive amounts of data we produce every day in our digital world. In general, building a high-quality machine learning model is an iterative, complex and time-consuming process that involves trying different algorithms and techniques, as well as effectively tuning their hyper-parameters. Conducting this process efficiently requires solid knowledge of and experience with the various techniques that can be employed. With the continuous and vast increase in the amount of data in our digital world, it has been acknowledged that the number of knowledgeable data scientists cannot scale to address these challenges. Thus, there is a crucial need for automating the process of building good machine learning models (AutoML). In the last few years, several techniques and frameworks have been introduced to tackle the challenge of automating the machine learning process. The main aim of these techniques is to reduce the role of humans in the loop and fill the gap for non-expert machine learning users by playing the role of the domain expert. In this chapter, we present an overview of the state-of-the-art efforts in tackling the challenges of machine learning automation. We provide comprehensive coverage of the various tools and frameworks that have been introduced in this domain. In addition, we discuss some of the research directions and open challenges that need to be addressed in order to achieve the vision and goals of the AutoML process.
Radwa Elshawi, Sherif Sakr
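At the heart of most AutoML frameworks is a search over candidate configurations. The sketch below illustrates that loop under simplifying assumptions: the search space and the `evaluate` function are hypothetical, with a toy loss standing in for an actual training-and-validation run, and exhaustive grid search stands in for the smarter search strategies (Bayesian optimization, bandits, meta-learning) that the surveyed frameworks use.

```python
# Minimal AutoML-style hyper-parameter search: enumerate a (hypothetical)
# configuration space, evaluate each candidate, and keep the best one.
import itertools

SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "depth": [2, 4, 8],
}

def evaluate(config):
    """Stand-in for training a model and measuring its validation loss."""
    return (config["learning_rate"] - 0.01) ** 2 + abs(config["depth"] - 4)

def grid_search():
    """Try every configuration and return the one with the lowest loss."""
    best_config, best_loss = None, float("inf")
    for lr, depth in itertools.product(*SEARCH_SPACE.values()):
        config = {"learning_rate": lr, "depth": depth}
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best, loss = grid_search()
print(best, loss)
```

Because evaluating a real configuration means training a model, practical AutoML systems spend most of their engineering effort on evaluating fewer, more promising candidates rather than exhausting the grid.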

Travel-Time Computation Based on GPS Data

The volume of GPS data collected from moving vehicles has increased significantly in recent years. We have gone from GPS data being collected every few minutes to data being collected every second. With large quantities of GPS data available, it is possible to analyze the traffic on most of the road network without installing road-side equipment.
A very important key performance indicator (KPI) in traffic planning is travel time. For this reason, this paper describes how travel time can be computed from GPS data. Of particular interest is how the travel time is affected by the weather.
The work presented here is an extension of previous work on computing accurate travel time from GPS data. In this paper, the logical data model is explained in more detail, and the results section showing the weather's impact on travel time has been significantly extended with previously unpublished material.
Kristian Torp, Ove Andersen, Christian Thomsen
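The basic computation of a travel-time KPI from GPS data can be sketched as follows. This is a simplified illustration, not the paper's data model: each record is a hypothetical (vehicle, timestamp, road-segment) fix, and a segment's travel time is taken as the span between a vehicle's first and last fix on the segment, averaged over traversals.

```python
# Minimal travel-time sketch over hypothetical GPS fixes
# (vehicle_id, unix_timestamp, road_segment_id).
from collections import defaultdict

records = [
    ("car1", 100, "seg_A"), ("car1", 130, "seg_A"),  # car1: 30 s on seg_A
    ("car2", 200, "seg_A"), ("car2", 250, "seg_A"),  # car2: 50 s on seg_A
]

def mean_travel_time(records, segment):
    """Average time span between each vehicle's first and last fix on a segment."""
    spans = defaultdict(list)
    for vehicle, ts, seg in records:
        if seg == segment:
            spans[vehicle].append(ts)
    times = [max(stamps) - min(stamps) for stamps in spans.values()]
    return sum(times) / len(times)

print(mean_travel_time(records, "seg_A"))
```

A production pipeline adds map matching (assigning raw fixes to road segments), trip segmentation, and grouping by time of day or weather condition before averaging.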

Laplacian Matrix for Dimensionality Reduction and Clustering

Many problems in machine learning can be expressed by means of a graph with nodes representing training samples and edges representing the relationship between samples in terms of similarity, temporal proximity, or label information. Graphs can in turn be represented by matrices. A special example is the Laplacian matrix, which allows us to assign each node a value that varies little between strongly connected nodes and more between distant nodes. Such an assignment can be used to extract a useful feature representation, find a good embedding of data in a low-dimensional space, or perform clustering on the original samples. In these lecture notes, we first introduce the Laplacian matrix and then present a small number of algorithms designed around it for data visualization and feature extraction.
Laurenz Wiskott, Fabian Schönfeld
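The Laplacian matrix mentioned above is standardly defined as L = D - A, where A is the adjacency matrix and D the diagonal degree matrix. The small sketch below (with a hypothetical path graph as the example) verifies two of its defining properties: every row sums to zero, and the quadratic form xᵀLx equals the sum of squared differences (xᵢ - xⱼ)² over the edges, which is exactly the quantity that is small when the node values vary little between connected nodes.

```python
# Build the graph Laplacian L = D - A for a small undirected example graph
# and check its defining properties, using plain Python lists as matrices.
edges = [(0, 1), (1, 2), (2, 3)]  # a path graph on 4 nodes
n = 4

L = [[0] * n for _ in range(n)]
for i, j in edges:
    L[i][i] += 1   # degree contributions on the diagonal (D)
    L[j][j] += 1
    L[i][j] -= 1   # adjacency contributions off the diagonal (-A)
    L[j][i] -= 1

def quad_form(L, x):
    """Compute x^T L x."""
    return sum(x[i] * sum(L[i][j] * x[j] for j in range(n)) for i in range(n))

x = [0.0, 1.0, 3.0, 6.0]
lhs = quad_form(L, x)                              # x^T L x
rhs = sum((x[i] - x[j]) ** 2 for i, j in edges)    # sum of squared edge differences
print(lhs, rhs)
```

Minimizing this quadratic form under suitable constraints is what yields the smooth node assignments used for embedding and clustering; in practice this is done via the eigenvectors of L with the smallest eigenvalues.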
