Vehicle-based performance technologies infer driver behavior by monitoring vehicle signals such as lane deviation, steering variability, or speed variability. Such systems are critical for detecting and preventing driver drowsiness, which is associated with around 20% of severe car injuries. The idea of fingerprinting drivers from timestamped sensor data, e.g., controller area network (CAN) protocol records, is not new; many recent studies have shown that driver identification using machine learning-based classification is a promising field of research. Another approach to driver identification, which has also attracted considerable research effort, is based on face recognition. In this paper, we focus on the former approach.
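As an illustration of the kind of signal such systems monitor, the following minimal Python sketch computes steering (or speed) variability as the standard deviation of a timestamped signal within fixed, non-overlapping time windows. The function name `window_variability`, the window length, and the use of the population standard deviation are illustrative assumptions, not a specific published method.

```python
from statistics import pstdev


def window_variability(samples, window_s):
    """Standard deviation of a signal in consecutive, non-overlapping windows.

    samples:  list of (timestamp_seconds, value) pairs sorted by time,
              e.g., steering wheel angle read from CAN records (assumed format)
    window_s: window length in seconds (hypothetical tuning parameter)
    """
    if not samples:
        return []
    start = samples[0][0]
    windows = {}
    for t, v in samples:
        # Assign each sample to the window its timestamp falls into.
        idx = int((t - start) // window_s)
        windows.setdefault(idx, []).append(v)
    # One population std-dev per non-empty window, in time order.
    return [pstdev(windows[i]) for i in sorted(windows)]


steering = [(0.0, 1.0), (0.5, -1.0), (1.0, 2.0), (1.5, 2.0)]
print(window_variability(steering, 1.0))
```

With a 1 s window, the first window contains the alternating readings 1.0 and -1.0 and yields a variability of 1.0, while the second window contains two identical readings and yields 0.0.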
Most methods in the literature on driving style modeling rely on a human-defined driving behavior feature set, which consists of handcrafted vehicle movement features derived from sensor data. These features are used by machine learning methods (supervised classification, unsupervised clustering, or reinforcement learning) to solve problems such as driver classification/identification, driver performance assessment, and individual driving style learning.
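A handcrafted feature set of this kind can be sketched as a function that reduces one trip's sensor samples to a fixed-length vector that a classifier or clustering method can consume. The `Sample` fields and the five features chosen below (mean, maximum, and standard deviation of speed, mean absolute steering angle, and the fraction of samples with the brake applied) are hypothetical examples, not a feature set taken from the cited works.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    """One decoded sensor reading (hypothetical field set)."""
    speed_kmh: float
    steering_deg: float
    brake_on: bool


def trip_features(samples):
    """Map a non-empty list of Samples to a fixed-length handcrafted feature vector."""
    speeds = [s.speed_kmh for s in samples]
    steer = [abs(s.steering_deg) for s in samples]
    return [
        mean(speeds),                                      # average speed
        max(speeds),                                       # peak speed
        pstdev(speeds),                                    # speed variability
        mean(steer),                                       # average steering effort
        sum(s.brake_on for s in samples) / len(samples),   # braking ratio
    ]
```

Because every trip maps to a vector of the same length regardless of its duration, trips of different lengths can be compared directly by the learning methods listed above.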
Both simulated and naturalistic driving patterns have been studied in the literature using different features extracted mainly from the in-vehicle CAN bus (e.g., steering wheel angle, vehicle speed, and engine speed). The number of features used ranges from one to twelve. Using these features, different machine learning methods (e.g., Bayesian algorithms, decision tree algorithms, instance-based algorithms, and deep learning algorithms) have been proposed to learn driving styles.
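Among the instance-based algorithms mentioned above, k-nearest neighbor (KNN) is the simplest to illustrate. The sketch below assigns a query feature vector to the driver whose labeled trips are most common among its k nearest neighbors in Euclidean distance; the function name `knn_predict`, the default k = 3, and the toy feature vectors are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a driver label by majority vote among the k nearest trips.

    train: list of (feature_vector, driver_label) pairs
    query: feature vector of the trip to identify
    """
    # Sort labeled trips by Euclidean distance to the query vector.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


train = [
    ([0.0, 0.0], "A"), ([0.0, 1.0], "A"), ([1.0, 0.0], "A"),
    ([10.0, 10.0], "B"), ([10.0, 11.0], "B"), ([11.0, 10.0], "B"),
]
print(knn_predict(train, [0.5, 0.5]))
```

In practice, features on different scales (e.g., speed in km/h and steering angle in degrees) would normally be standardized before computing distances; otherwise the feature with the largest numeric range dominates the Euclidean distance.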
Dong and Li [5] proposed using deep learning to identify a driver from raw GPS records alone. This was the first attempt to apply deep learning to learning driving style features directly from GPS data. First, they proposed a data transformation method that constructs an easily consumable input form for deep learning (the statistical feature matrix) from raw GPS time series. Second, they developed several deep neural network architectures, including Convolutional Neural Networks (CNNs) using 1-D convolution with pooling and Recurrent Neural Networks (RNNs), and evaluated how well each learns a representation of driving style from the transformed inputs.
For driver identification, the authors of [6–8] proposed several signal processing approaches using Gaussian Mixture Models (GMMs) combined with different feature selection strategies. To address car theft, Meng et al. [9] proposed a Hidden Markov Model (HMM) method, coupled with an HMM-based similarity measure, that relies mainly on three features: acceleration, brake, and steering wheel data. Choi et al. [10] used naturalistic data from the University of Texas Drive (UTDrive) corpus to build both GMM and HMM models of sequences of driving characteristics (wheel angle, brake pedal status, acceleration status, and vehicle speed), and showed that driver identification can be achieved at rates ranging from 30% to 70%. Wahab et al. [11] performed driver identification using statistical, artificial neural network, and fuzzy neural network techniques. They considered the accelerator and brake pedal pressure signals of 30 drivers and used techniques based on GMMs and the wavelet transform for feature extraction.
To optimize energy usage, Kedar-Dongarkar and Das [12] proposed a simple driving style classifier, based on the generalized bell function, using features extracted from the vehicle's powertrain signals. They defined three driving styles and achieved a classification accuracy of 77%. Van Ly et al. [13] pointed out the potential of inertial sensors for differentiating between drivers, and conducted experiments comparing the braking and turning signals of two drivers using k-means and support vector machine (SVM) algorithms. Zhang et al. [14] also addressed driver differentiation, using HMMs to analyze each driver's accelerator and steering wheel data, and achieved an accuracy of 85%.
For naturalistic data, one of the most accurate driver identification approaches was proposed by Enev et al. [15], who applied SVM, random forest, naive Bayes, and k-nearest neighbor (KNN) algorithms to twelve features from the CAN bus. They showed that drivers can be differentiated with 100% accuracy under some assumptions, and that high identification rates can be reached with less than 8 min of training time. More recently, Wallace et al. [16] studied a large dataset covering all trips made by 14 drivers over a 2-year period. They identified a two-phase relationship between the mean and maximum accelerations within each driver's acceleration events, which can serve as a measure of a driver's signature. Burton et al. [17] proposed a novel approach to driver authentication in which the driving mode is modeled using pedal control, steering, speed, and distance traveled. They used classical machine learning algorithms (SVM, KNN, and decision tree) together with boosting to increase classification accuracy, and reported a time-to-detection of 2 min 20 s at 95% precision.
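The per-event quantities that Wallace et al. [16] relate, namely the mean and maximum acceleration within each acceleration event, can be extracted with a short sketch like the one below. Here an "event" is assumed to be a maximal run of consecutive samples whose longitudinal acceleration exceeds a threshold; this definition, the 0.5 m/s² default threshold, and the names `AccelEvent` and `acceleration_events` are hypothetical and are not taken from [16]. Fitting the two-phase relationship itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class AccelEvent:
    """Summary of one acceleration event (hypothetical structure)."""
    mean_accel: float
    max_accel: float


def acceleration_events(accels, threshold=0.5):
    """Split a sampled acceleration trace (m/s^2) into events.

    An event is assumed to be a maximal run of consecutive samples
    above `threshold`; one (mean, max) summary is returned per event.
    """
    events, run = [], []
    for a in accels:
        if a > threshold:
            run.append(a)
        elif run:
            # The run just ended: summarize it and start looking for the next one.
            events.append(AccelEvent(sum(run) / len(run), max(run)))
            run = []
    if run:  # close an event that is still open at the end of the trace
        events.append(AccelEvent(sum(run) / len(run), max(run)))
    return events


trace = [0.1, 1.0, 2.0, 0.2, 0.0, 1.0, 3.0]
for ev in acceleration_events(trace):
    print(ev.mean_accel, ev.max_accel)
```

Collecting the (mean, max) pairs of all events of one driver yields the scatter of points on which a per-driver relationship of the kind described in [16] could then be fitted.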