
About this book

This book explains the complete loop for effectively using self-tracking data for machine learning. While it focuses on self-tracking data, the techniques explained also apply to sensory data in general, making the book useful for a wider audience. Discussing concepts drawn from state-of-the-art scientific literature, it illustrates the approaches using a case study of a rich self-tracking data set. Self-tracking has become part of the modern lifestyle, and the amount of data generated by self-tracking devices is so overwhelming that it is difficult to obtain useful insights from it. Fortunately, artificial intelligence offers techniques that can help: machine-learning approaches allow this type of data to be analyzed. While there are ample books that explain machine-learning techniques, self-tracking data comes with its own difficulties that require dedicated techniques, such as learning over time and across users.

Table of Contents


Chapter 1. Introduction

This chapter provides a basic introduction to the two domains that are pivotal in this book: the quantified self and machine learning. It explains the need for and purpose of the book and introduces the basic definitions and notation. In addition, it gives an overview of the book and provides a reader’s guide.
Mark Hoogendoorn, Burkhardt Funk

Sensory Data and Features


Chapter 2. Basics of Sensory Data

A typical quantified self/sensory dataset is introduced in this chapter. Popular sensors are briefly highlighted and a procedure is provided to transform raw sensory datasets into a suitable format to enable the application of machine learning techniques. The dataset introduced in this chapter is used to illustrate the approaches explained in the remainder of the chapters.
Mark Hoogendoorn, Burkhardt Funk

Chapter 3. Handling Noise and Missing Values in Sensory Data

This chapter introduces approaches to remove the noise that is inherently present in sensory data. These include outlier detection algorithms and missing value imputation, as well as approaches to filter more subtle noise in the data, including the low-pass filter and principal component analysis. The Kalman filter is also explained as a way to both remove noise and impute missing values.
Mark Hoogendoorn, Burkhardt Funk
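
As a rough illustration of the low-pass filtering mentioned in the Chapter 3 summary, the sketch below applies a first-order low-pass filter (exponential smoothing) to a list of sensor readings. This is one simple filter of that family and not necessarily the one used in the book; the function name low_pass and the parameter alpha are hypothetical names chosen for this example.

```python
def low_pass(values, alpha=0.2):
    """First-order low-pass filter (exponential smoothing).

    alpha is an assumed smoothing factor in (0, 1]:
    smaller values suppress high-frequency noise more strongly.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # Initialise the filter state with the first sample.
            smoothed.append(float(x))
        else:
            # Blend the new sample with the previous filtered value.
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    noisy = [1.0, 1.2, 0.8, 5.0, 1.1, 0.9]  # made-up readings with one spike
    print(low_pass(noisy, alpha=0.3))
```

With a small alpha the spike in the made-up readings is damped rather than removed, which is why outlier detection is usually treated as a separate step.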

Chapter 4. Feature Engineering Based on Sensory Data

Approaches to automatically generate useful features from sensory data are introduced in this chapter. Most of the approaches introduced focus on datasets that have a temporal ordering. Features in the time domain are explained, thereby summarizing both numerical and categorical values in a certain historical window. The frequency domain is also discussed, including Fourier transformations and features one can derive from these transformations. In addition, the extraction of features from unstructured data is discussed, mainly focusing on text data.
Mark Hoogendoorn, Burkhardt Funk

Learning Based on Sensory Data


Chapter 5. Clustering

This chapter focuses on clustering of the data resulting from quantified selves. It introduces distance functions that can be used to compare individual data points, but also entire datasets of users. Among these are dynamic time warping and the cross-correlation coefficient. The chapter provides a brief discussion of popular clustering techniques. In addition, it explains more specialized clustering techniques that are better suited for the quantified self, including subspace clustering and data stream mining.
Mark Hoogendoorn, Burkhardt Funk
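
To give a feel for dynamic time warping, one of the distance functions named in the Chapter 5 summary, here is a minimal sketch of the textbook dynamic-programming formulation for two one-dimensional series. It assumes an absolute-difference local cost and no warping-window constraint; the function name dtw_distance is hypothetical and the book's own formulation may differ.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumptions for this sketch: absolute difference as the local cost,
    and no constraint on how far the alignment may warp.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # The second series is a time-stretched copy of the first.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may repeat points, a series and a time-stretched copy of it end up at distance zero, which is what makes the measure attractive for comparing users who perform the same activity at different speeds.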

Chapter 6. Mathematical Foundations for Supervised Learning

In this chapter, the theoretical underpinnings of supervised learning are discussed. The whole supervised machine learning process is explained from a more formal perspective, together with some of the underlying theory. The concepts covered include PAC learnability and the VC dimension, and the implications of these theories are discussed.
Mark Hoogendoorn, Burkhardt Funk

Chapter 7. Predictive Modeling without Notion of Time

Supervised learning approaches that do not explicitly take the time component into account are briefly discussed in this chapter. The approaches explained include feedforward neural networks, support vector machines, k-nearest neighbor, decision trees, naïve Bayes, and ensembles. Guidelines are provided on how to apply these algorithms to quantified self data, including the learning setup (e.g., learning for single users or across multiple users) and other practical considerations such as feature selection and regularization. Data stream mining approaches for predictive modeling are also briefly discussed.
Mark Hoogendoorn, Burkhardt Funk

Chapter 8. Predictive Modeling with Notion of Time

This chapter focuses on supervised learning approaches that do take time into account explicitly. Time series approaches are explained, as well as recurrent neural networks (including echo state networks). In addition, parameter optimization techniques that can be used to fine-tune more knowledge-driven predictive temporal models (dynamical systems models) are discussed.
Mark Hoogendoorn, Burkhardt Funk

Chapter 9. Reinforcement Learning to Provide Feedback and Support

This chapter discusses reinforcement learning, a technique that can be used to learn when to provide what kind of feedback or intervention to a user to better accomplish the set goals. The techniques discussed are SARSA and Q-learning. In addition, approaches that allow reinforcement learning to cope with detailed sensor information, such as discretization procedures, are discussed.
Mark Hoogendoorn, Burkhardt Funk
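
For readers unfamiliar with Q-learning, named in the Chapter 9 summary, the sketch below shows the standard tabular update rule on a single transition. The state labels, action names, reward, and the values of alpha (learning rate) and gamma (discount factor) are all hypothetical and chosen only for illustration; they are not taken from the book.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q maps (state, action) pairs to estimated values; unseen pairs
    default to 0.0 when q is a defaultdict(float).
    """
    # Off-policy target: assume the best known action is taken next.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    actions = ["send_reminder", "do_nothing"]  # hypothetical interventions
    q = defaultdict(float)
    # Hypothetical transition: a reminder while "inactive" led to "active".
    print(q_learning_update(q, "inactive", "send_reminder", 1.0, "active",
                            actions, alpha=0.5))
```

SARSA differs only in the target: it uses the value of the action actually chosen in the next state instead of the maximum over all actions.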



Chapter 10. Discussion

In this chapter, we list some of the important challenges we see in the field of machine learning for the quantified self.
Mark Hoogendoorn, Burkhardt Funk

