nach oben

Erschienen in:

Open Access 2022 | OriginalPaper | Buchkapitel

Efficient Location-Based Tracking for IoT Devices Using Compressive Sensing and Machine Learning Techniques

verfasst von : Ramy Aboushelbaya, Taimir Aguacil, Qiuting Huang, Peter A. Norreys

Erschienen in: High-Dimensional Optimization and Probability

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Patentsuche

Aus

Abstract

In this chapter, a scheme based on compressive sensing (CS) for the sparse reconstruction of down-sampled location data is presented for the first time. The underlying sparsity properties of the location data are explored and two algorithms based on LASSO regression and neural networks are shown to be able to efficiently reconstruct paths with only ∼20% sampling of the GPS receiver. An implementation for iOS devices is discussed and results from it are shown as proof of concept of the applicability of CS in location-based tracking for Internet of Things (IoT) devices.

1 Introduction

Compressive sensing is a scheme that was developed to reconstruct data that was down-sampled below the Shannon frequency. It accomplishes this by exploiting certain statistical properties assumed to exist in the true data. It has been used, to great success, in many different applications ranging from signal processing, medical imaging to scientific diagnostics [1‐3]. The main advantage to it lies in the fact that it allows one to reduce the amount of times a sampling device needs to be used while still being able to reconstruct the needed data. This makes it an attractive prospect for use in applications where the complete sampling of the data is either impossible or impractical, such as the problem of Global Positioning System (GPS) location and trajectory tracking.

GPS tracking has a wide variety of applications across many different fields ranging from traffic management [4], to big data mobility analytics (e.g., origin-destination mapping) [5], and even to studying wild life [6]. Additionally, many Internet of Things (IoT) devices have functionalities that depend on location-based tracking, such as health and fitness devices and trail tracking applications [7, 8]. In the vast majority of these cases, the data is collected on an embedded device using a GPS receiver and stored for later processing. However, constant sampling of the receiver can be very detrimental for the battery-life of the device which poses a problem for portable devices as well as produce an immense amount of data requiring large storage capacities. Additionally, varying atmospheric and environmental conditions can mean that some data points will not be sampled due to the high noise level seen by the receiver.

To that end, there has been a considerable amount of research interest aimed at reconstructing trajectory data. There have been different approaches to this problem each with its own strength and limitations. One of them involved using map matching and has shown high accuracy in reconstruction [9]. However, by definition, map matching requires access to road maps and as such would only work in locations where movement can only occur on well-defined and mapped paths which limits its applicability to vehicle mobility. Another approach used dead-reckoning to correct sparse GPS signals [10] in the context of wildlife tracking, i.e., with no predefined paths. However, dead-reckoning requires accelerometers or similar devices to gauge the movement being tracked. Again, this limits the applicability of these types of algorithms to devices which contain accurate mobility sensors. In order to avoid path constraints and sensor requirements, it is possible to use signal processing techniques to compress GPS trajectories in a way to retain the most relevant information allowing for efficient reconstruction when the data is needed while reducing the required space for data storage [11]. Although this technique has its advantages compared to the others, it does not solve the issue of power efficiency as the GPS receiver is still being constantly used. While this may not be an issue for applications such as vehicle tracking which are not power-sensitive, it poses a problem for portable and embedded applications. It also does not solve the issue of missing points due to environmental or weather issues. What would be optimal is a way to reduce the sampling of the GPS receiver and allow for random missing points. As such, this makes location tracking an ideal candidate for compressive sensing (CS) reconstruction.

In this chapter, we show, for the first time, how the intrinsic sparsity properties of trajectory signals can be exploited to reduce the sampling acquisition, hence the GPS receiver usage, as much as possible while still allowing for accurate path reconstruction. Our compressive sensing-based approach allows us to treat the problem of path reconstruction as a constrained optimization problem that can be solved even on devices with low processing power. Furthermore, less data means less storage is required on the embedded device, or on a remote server, which could prove beneficial if the tracking is required over a long period of time or if a large amount of paths need to be saved in the context of big data mobility analytics.

The chapter is organized as follows: first, the signal model is discussed along with the underlying properties and how they can be exploited. Then, the general operating principles of the novel compressive sensing approach to trajectory reconstruction are presented. Following that, two different approaches to CS are explored and their relative strengths and weaknesses are elaborated. Finally a discussion of future work concludes the chapter.

2 Signal Representation

2.1 Principles

In most embedded devices, location data is output at a given time t as a pair of longitude and latitude coordinates which shall be denoted as (λ(t), ϕ(t)). They also provide data about the accuracy of a given coordinate pair in the form of a numerical estimate for the probable error evaluated in meters [12]. In other words, if one could imagine a circle centered at this coordinate pair with a radius equal to the estimated error, then the true location of the device will lie within this circle with a certain probability whose value depends on the manufacturer. Since GPS receivers rely on satellite communications, these errors can have multiple different sources. The physical environment in which the receiver operates, such as open-sky or urban conditions, significantly affects the quality of the received signal. A classic example would be the user going through a tunnel where there is little or no satellite signal. In addition to that, devices vary greatly in their processing techniques, complexity, and sophistication, which in turn vary the quality of the signals they produce. For these reasons, this chapter presents algorithms that have been designed and tested to be both efficient and robust against a wide range of error values.

Due to the above-mentioned complex nature of their sources, real location errors do not necessarily follow a simple statistical distribution. However, for simplification, we assume that they follow a Gaussian distribution. In that case, the previously defined circle would generally represent an integer multiple of standard deviations.

As with most numerical schemes, the user’s movement is sampled at discrete time intervals t_n. Their path can be thus represented as a 2 × N vector of (λ[n], ϕ[n]). With each pair representing their location at a specific sampling point. This vector shall be then referred to as the fully sampled path.

The goal of the protocol outlined in this chapter is to reduce the number of samples the device has to actually gather while still being able to reconstruct the full path to a given accuracy. The degree of downsampling is quantified by the sampling ratio (ν) which is just the ratio of the number of samples collected to the total number of samples in the full path, referred to hereafter as the path length (N).

2.2 Model

In order to test the different CS algorithms that were designed and optimize them, we needed a significant amount of fully sampled GPS trajectory data to serve as the ground truth to which the results of the reconstruction can be compared. To that end, we opted to simulate the trajectory data that was used to gauge the performance of algorithms in the design stage. The advantage of using simulated data in this step lies in the flexibility offered by the ability to easily control the path lengths as well as the amount of noise added to the data points. After the design stage, the optimal algorithms were implemented on a portable device and made to run on real-life data to evaluate its performance in real time. This section aims to elaborate on the model that was used to generate the simulated data.

A lot of research has been conducted around modelling human trajectories [13, 14]. Many different models have been proposed taking into account different parameters. Even though no conclusive results have been reached about the statistical nature of human movements [14, 15], we are not interested in their distribution but in the individual structure of the trajectory itself. For this reason, we chose the random walk model (RWM) for our simulations.

The random walk has played an integral role in the modelling and analysis of stochastic processes ever since Karl Pearson introduced the original random walk problem [16]. The random walk which we used to generate the trajectory data is the two-dimensional walk on the real plane $\mathbb {R}^2$ in discrete time, also known as the Pearson random walk [17]. There, at each time step, the walker takes a step of a fixed size in a completely random and isotropic direction. As such the walker’s (x, y) coordinates in the plane at time n + 1 can be written in terms of their position at time n as:

$$\displaystyle \begin{aligned} x[n+1] &= x[n] + \cos{}(\theta[n+1])\\ y[n+1] &= y[n] + \sin{}(\theta[n+1]) \end{aligned} $$

(1)

where the steps are taken to be of unit size and $\theta [n]\sim \mathcal {U}(0,2\pi )$ is the heading which is drawn from a uniform distribution. The statistical properties of the Pearson random walk have been studied extensively ever since its introduction and it was shown that at long times, the distance of the walker to the origin ($\rho =\sqrt {x^2+y^2}$) obeys the well-known Rayleigh distribution:

$$\displaystyle \begin{aligned} P(\rho)=\dfrac{2\rho}{n}e^{\rho^2/n} \end{aligned} $$

(2)

After the trajectory path is generated, noise is overlaid on it by sampling from a normal distribution $\sim \mathcal {N}(0,\sigma _e^2)$ with a standard deviation that defines the noise level (σ_e). This noise level may be constant for each data point on the path or it can vary between points in order to simulate the effect of transient noise sources.

Of course, due to its ellipsoidal shape, the surface of the earth is not planar. In fact, as mentioned previously, GPS data comes in the form of ellipsoidal latitude/longitude coordinates. For our algorithm to be useful in real conditions, it needs to be able to perform the reconstruction on trajectory data in these coordinates. To that end, the path generated using the RWM on a 2D plane is mapped to the Lat/Lon coordinate system using the Pseudo-Mercator projection [18], which is based on the WGS84 reference ellipsoid, before being analyzed and fed into the algorithm. This projection was chosen because of its widespread use in mobile mapping applications.

3 Compressive Sensing

3.1 Basics

Compressive sensing is the general name for a class of algorithms that allow for the subsampling of signals to levels below the Shannon frequency, while still being able to reconstruct the signal ex post facto with a high level of accuracy [19]. Its basic, most generic, structure can be seen in Fig. 1. To understand the principle behind the scheme, consider a one-dimensional (1D) vector $\mathbf {x} \in \mathbb {R}^n$ of real numbers, and let this be the signal that one wants to measure. If one considers a subsampling measurement matrix $\mathbf {A} \in \mathbb {R}^{m\times n}$ which maps the true signal to the measured signal y = Ax, then the goal of compressive sensing is to recover x from y with m << n. Now, of course, this problem is not generally solvable, as the mapping between x and y is underdetermined and this leads to an infinite number of exact solutions. However, the scheme gets around this by imposing certain properties on the underlying true signal. Specifically, compressive sensing assumes that x is sparse in some basis {f_i}_{i ∈ [1,n]}. This means that our signal can be written as

$$\displaystyle \begin{aligned} \mathbf{x} = \sum_{i=1}^n\alpha_if_i=\mathbf{F}\boldsymbol{\alpha} \end{aligned} $$

(3)

Where $\mathbf {F}\in \mathbb {R}^{n\times n}$ is called the sparsifying matrix and α_i are the sparse components of the vector in the new basis. The problem then can be modeled as an ℓ₀-minimization subject to a constraint.

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-00832-0_12/MediaObjects/493910_1_En_12_Equ4_HTML.png

(4)

Where $\hat {\mathbf {x}}=\mathbf {F}\hat {\boldsymbol {\alpha }}$ is the reconstructed signal and M = AF is conventionally called the reconstruction matrix. The ℓ₀ norm is commonly defined as the number of non-zero components of a vector which means that it quantifies the true sparsity of the vector. Using this paradigm, this means that we can treat path reconstruction as multidimensional constrained optimization problem.

3.2 Sparsity

Sparsity of α means that most of its components have a value of zero. For example, the Fourier domain representation of a perfect sine wave oscillating at a given frequency is a delta function at that frequency and zero elsewhere. Another more relevant example would be the discrete wavelet transform used in JPEG 2000 compression. The assumption behind this compression scheme is that most images are sparse in the wavelet domain [20]. This means that, in this domain, one needs to only store the non-trivial components of the image and can discard most of the others thus greatly reducing its size. This argument is one of the motivations for using CS in imaging techniques [21].

The first obstacle in implementing CS, as it was described, is that Eq. (4) is a NP-hard problem [22]. The ℓ₀, although it models perfectly the concept of sparsity, is actually poorly adapted for real-world use for many reasons. For example, realistic signals are rarely exactly sparse, in the sense that they rarely have mostly zero-valued components in some basis. In most applications of CS, one tends to consider pseudo-sparsity, where in some basis most of the components are much smaller than the few major components of the signal. This way, the former can be safely ignored while insuring that most of the important information of the signal is captured and thus it can be accurately reconstructed. Furthermore, the ℓ₀-norm is a pseudo-norm that is non-differentiable. Last but not least, such solutions are very sensitive to noise and, in real-world scenarios, signals are always noisy.

In order to circumvent the previously mentioned obstacles, one of the options is to use a convex relaxed optimization. This can be achieved by using the ℓ₁-norm instead. Hence, Eq. (4) can be rewritten as:

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-00832-0_12/MediaObjects/493910_1_En_12_Equ5_HTML.png

(5)

where

$$\displaystyle \begin{aligned} ||\boldsymbol{\alpha}||{}_1 = \sum_{i=1}^n|\alpha_i| \end{aligned} $$

(6)

In order for this relaxation to be valid for a perfect recovery, A and F must be mutually incoherent as much as possible [19]. In other words, the lower the maximum correlation between any two elements of the matrices, the fewer the number of measurement required for a perfect reconstruction.

3.3 Discrete Cosine Transform

Considering location data, even a randomly generated one, it can be safely assumed that they contain fairly low frequency oscillations. This intuition was behind the choice of the discrete cosine transform (DCT) as the sparsifying basis for our signal. To recall, the discrete cosine transform (DCT) of a 1D vector $\mathbf {x}\in \mathbb {R}^N$ is defined as:

$$\displaystyle \begin{aligned} \alpha_n = 2\sum_{k=0}^{N-1}x_k\cos{}(\pi n(2k+1)/2N) \end{aligned} $$

(7)

This equation can be conveniently rewritten as

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-00832-0_12/493910_1_En_12_IEq10_HTML.gif

, with

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-00832-0_12/493910_1_En_12_IEq11_HTML.gif

being the matrix representation of the discrete cosine transform. If the location data is treated as two separate 1D vectors representing latitudes and longitudes, the DCT of these vectors can be shown to be pseudo-sparse. To prove this, we have run simulations generating a large amount of random walks of different sizes. Figure 2 shows that by comparing the largest components of the transformed path to the rest, it can be clearly seen that (λ[n], ϕ[n]) data is indeed pseudo-sparse in this basis. Another interesting thing to note about the results in this figure is that as the added noise to the path increases, the apparent sparsity of the signal decreases. This means that by biasing the reconstruction toward sparser paths, it is possible to denoise the acquired signal and even, in some cases, recover a path that is more accurate than what could have been obtained even by sampling it completely. This shall be elaborated further in the following sections below.

3.4 Measurement Matrix

The final step before solving the CS problem is determining the appropriate measurement, or subsampling matrix. Random matrices drawn from i.i.d. Bernoulli or Gaussian distributions have been shown to be incoherent with any other basis [23]. As a starting point, let us consider a sampling scheme based on the Bernoulli distribution where at each time step we generate a pseudo-random number that is used to determine whether or not a data point is sampled based on the probability defined as:

$$\displaystyle \begin{aligned} p[n] = \nu \end{aligned} $$

(8)

where, again, ν is a predetermined sampling ratio. This gives the draw outcome, b[n] ∈{0, 1}, which decides whether or not to request a new sample. Furthermore, the draw outcomes can then be used to generate the subsampling matrix A, which together with the DCT sparsifying matrix form the reconstruction matrix M. It should be noted that most CS applications have problems implementing random matrices due to the difficulty of storing and reproducing them at the receiver. Luckily, location data signals are time based, hence the measurement matrix is directly mapped to the request from the GPS receiver for a new sample acquisition. Unfortunately, in its current form, the subsampling matrix is completely agnostic to σ_e, i.e., it treats all data points as being equal. This is, of course, not the case in a real-world application, where σ_e varies along the path. Also, the fact that the sampling is based on random number generation means that the effective sampling ratio is almost never the same as the predetermined ν. This is detrimental to some applications that require a fixed known sampling ratio. These shortcomings will be addressed in the following sections using the so-called adaptive sampling schemes. The complete process for generating the subsampling matrix is illustrated in Fig. 3.

3.5 Block-Wise Reconstruction

Subsampling L out of the entire path length N and then applying any reconstruction algorithm is practically unfeasible. This is due to the fact that N can become very large and therefore L as well, thereby increasing the complexity of the algorithm beyond the capabilities of any embedded device. In order to circumvent this problem, we adopt a block-wise reconstruction method as shown in Fig. 4. A time-wise sliding window is utilized to form each block and apply the CS scheme to it. The figure also demonstrates a sophisticated overlap-cut-save (OCS) method that efficiently manages the block boundaries by splitting it into a buffer zone and a sub-block in order to mitigate any discontinuities and interference effects caused by the basis transform. In other words, reconstructed samples in each sub-block do not suffer due to missing information which is obtained from past and future samples outside the sub-block. This method effectively means that the CS protocol will be only applied to the block length B. This parameter can be optimized independently of the path length N to have the best performance possible with the least complexity.

4 Lasso

4.1 Algorithm

Equation (5) is closely related to the least absolute shrinkage and selection operator (LASSO) regression problem in statistics [24], as both of their Lagrangian forms can be written as:

$$\displaystyle \begin{aligned} \hat{\boldsymbol{\alpha}} = \operatorname*{\mathrm{argmin}}_{\boldsymbol{\alpha}} \big\{\frac{1}{2B}||\mathbf{y}-\mathbf{M}\boldsymbol{\alpha}||{}^2_2 + \mu||\boldsymbol{\alpha}||{}_1\big\} \end{aligned} $$

(9)

Where y is the measured down-sampled block, M is the reconstruction matrix, B is the total number of samples in the full block, $\hat {\boldsymbol {\alpha }}$ is the reconstructed block in the DCT domain, and μ is conventionally called the learning rate or the regularization parameter. This formulation can be understood as trying to find the “closest” vector to y, in the sense of the ℓ₂-norm, which has the smallest ℓ₁-norm. As previously discussed, the ℓ₁-norm here ensures that we are biasing the regression toward more sparse vectors, with the degree of bias being controlled by μ. Expressing the problem as Eq. (9) turns it into a linear regression problem and allows the use of traditional LASSO regression solvers. For this case, we have used the conventional Coordinate Descent algorithm which is known to be quite effective for the LASSO problem and easily implementable across various platforms [25]. The mathematical details of this algorithm are beyond the scope of this chapter but can be found in the literature [24].

Reconstructing location data using this scheme has many advantages. The algorithm is implemented in a way that is completely agnostic to the size of the input which means that the block length (B) and the sampling ratio (ν) can be changed at whim. It also means that the variance of the length of the down-sampled path, inherent to the Bernoulli-based measurement matrix mentioned above, is not detrimental to its operation. Furthermore, the hyper-parameters such as the learning rate can be readjusted if necessary after each block. On a more practical note, due to the ubiquity of the Coordinate Descent algorithm, there are many libraries that facilitate its implementation.

4.2 Parameter Optimization

This reconstruction scheme incorporates many hyper-parameters; the most relevant ones are: the block length B, the sampling ratio ν, and the regularization parameter μ. All of which can greatly affect the performance and complexity of the algorithm. In order to find the optimal set of parameters, we have performed an extensive scan over a wide range of them using simulated data with a wide range of noise levels. The metric that was used to evaluate the performance of the algorithm was the conventional root-mean-square error (RMSE) defined as:

$$\displaystyle \begin{aligned} \mathrm{RMSE}(\hat{\boldsymbol{\alpha}},\mathbf{r}) = \sqrt{\dfrac{1}{N}\Big(\sum_{i=0}^{N-1}(\hat{x}_i-x_i)^2+(\hat{y}_i-y_i)^2\Big)} \end{aligned} $$

(10)

Where N is the length of the full unsegmented path, r is the full noise-free path expressed in the (x_i, y_i) Web Mercator coordinates and all variables with a circumflex represent reconstructed quantities. When it comes to finding the optimal block length, there is a trade-off to be made between complexity and performance. It should be noted that, in evaluating the performance of the algorithm we consider the deviation of the full reconstructed path and not just of the individual blocks. Intuitively, the higher the sampling ratio, the better the performance since more samples are included and the CS scheme becomes just a denoising operation. However, this is not desired as the main application is sub-sampled reconstruction, therefore the goal is to find the minimum possible sampling ratio that still allows a reasonable reconstruction. Clearly, the shorter the block length, the faster each block-wise iteration is. Additionally, less storage is required and any inverse transformation is less complex. However, the block length affects the ℓ₁ pseudo-sparsity and thus the CS reconstruction.

The scatter plot in Fig. 5 shows the RMSE across different sampling ratios and block lengths at a specific added noise level σ_e and regularization parameter μ. One can see that the best achievable performance for the lowest sampling ratio is B ∼ O(10²) with ν ∼ 0.2. Since the DCT is most efficient with a block length that is a power of 2, we choose B = 256.

Algorithm 1 Noise Adaptive Sampling Algorithm

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-00832-0_12/MediaObjects/493910_1_En_12_Figaaa_HTML.png

4.3 Adaptive Sampling

In order to improve the previously mentioned Bernoulli sampling scheme, while keeping its random nature, a heuristic algorithm, shown above in Algorithm 1, was developed to adjust recursively the drawing probability in order to reduce the noise injected in the reconstruction algorithm. We define a relative error 𝜖_r = min(σ_e∕σ_max, 1), where σ_max is a predefined error hyperparameter that can vary depending on the application. 𝜖_r is then compared to an error threshold β in order to decide on the incoming signal quality.

The main assumption behind this noise adaptive sampling is that if a certain data point was sampled with a high noise level, this means that upcoming data points are also likely to be the same. This is because sources of significant noise usually evolve on a slower timescale than human motion, e.g., long tunnels or weather disturbances. As such, the algorithm then reduces the probability of sampling the next point based on how high the noise was. To account for the fact that the source may be temporary, the probability then grows until it regains its original value ν, as long as no points have been sampled since. The algorithm does the opposite process in the cases where the noise is unusually low. This ensures that the sampling protocol favors points with a lower error while maintaining a stochastic nature.

5 Neural Networks

5.1 Introduction

As the previous section demonstrates an iterative algorithm, which can prove to be quite computationally taxing on some embedded systems, we turn to an algorithm based on neural networks. Neural networks are a class of computing paradigms that are loosely modeled after the operation of the animal brain in hopes of emulating its processing ability. They have found tremendous success in a multitude of machine learning problems including, but not limited to, computer vision, voice recognition, reinforcement learning, and artificial intelligence [26]. Generally, the network is composed of a collection of nodes, called neurons, grouped in layers where some, or all, of the neurons of one layer are connected to the neurons in the following layer via weights that determine how the output of the former affects the input of the latter. A much more detailed overview of neural networks, their operating principles, and their applications can be found here [27, 28].

To tackle this specific CS application, we have framed the problem of reconstructing location data as a supervised learning one, specifically supervised regression. This means that the goal of the network would be to find the optimal set of weights which would reconstruct the full block from the down-sampled one. Although neural networks are commonly used in classification applications [29], they have also proven to be effective in some regression problems [30, 31].

The principal challenge of this scheme was to find the most optimal network design to achieve the best reconstruction. The input and output layer are, of course, fixed by the length of the down-sampled and full block, respectively. Other than that, the width (size of layers) and depth (number of layers) of the network as well as the number of dropout layers (layers that randomly deactivate a certain proportion of neurons) and the type of activation layers (which add non-linearities to the output of layers) needed to be optimized.

There are two main advantages to the neural network scheme. For one, it does not require us to specify a sparsifying matrix or a measurement matrix. The only input to the protocol is the down-sampled block. Additionally, although neural networks can take a long time to train depending on the size of the network and the amount of training data, they are quick to run and they can be very efficiently parallelized since computing the output of a neural network basically amounts to matrix multiplication. This makes them quite ideal for embedded systems, as the training can be done remotely on dedicated machines and the weights can then be transferred to the device where they are used to achieve the reconstruction.

5.2 Network Design and Optimization

As mentioned above, neural networks contain a lot of variables that needed to be optimized. Other than the parameters controlling the main architecture of the network, one also needs to choose the appropriate training algorithm, activation layers (which add non-linearities to what would otherwise be a purely linear operation) and optimize any dropout regularization (where certain random neurons are deactivated during training to prevent overfitting).

The simulated location data was split into training data which was used to train a specific set of parameters, and validation data which was used to evaluate their performance based on the previously mentioned RMSE metric.

The optimal network was found to contain 10 hidden layers of mostly alternating sizes, shown in Fig. 6. The bottleneck layers serve the same purpose they do in autoencoder networks, where they reduce the dimensionality of the input data and as such are quite useful for CS purposes [32]. The size of the input and output layer are of course fixed by the chosen block length B and sampling ratio ν. Again, we have chosen B = 256 with ν ∼ 0.2, this reduces the size of the network and thus its runtime during training and inference. The chosen activation layer was the leaky ReLU function [33] which allows for both negative and positive outputs which is, of course, essential for this particular regression problem. The leak parameter α ∈ ]0, 1[ also needed to be optimized to find the best possible performance. As for the convergence scheme, tests have shown that the most optimal performance/efficiency is obtained by using the Adam optimization algorithm [34] with α = 0.8. It should be noted that only one neural net is used for both latitude and longitude reconstruction.

5.3 Adaptive Sampling

The advantages of neural networks come at a cost. Once a network design is chosen and trained, it cannot be altered without retraining it. This means that the size of the down-sampled block is fixed as well as full block length. In other words, unlike the LASSO scheme, ν and B need to be predetermined and fixed.

This rigidity means that, not only can one not use the noise adaptive sampling algorithm, but one also cannot even use the simple Bernoulli algorithm as the inherent variance in the random number generation process means that the effective sampling ratio is not deterministic.

Algorithm 2 Ratio Adaptive Sampling Algorithm

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-00832-0_12/MediaObjects/493910_1_En_12_Figaab_HTML.png

In order to circumvent this issue, we have developed a ratio adaptive sampling algorithm is shown in Algorithm 2 which ensures that the effective sampling ratio will be the same as the requested one. The number of points sampled in a block (Σ) is continuously monitored and used to weight the probability of sampling the next point so as to ensure that the final sampling ratio (ν) is completely deterministic no matter the random number generation algorithm.

6 Numerical Results

6.1 Simulation Results

In order to test the aforementioned CS schemes and gather statistics on their absolute and relative performance, we have generated a large amount of simulated location data using the random walk model (RWM) detailed in Section II.b. The noisy simulated paths are each split into blocks, then down-sampled using one of the above-mentioned sampling algorithms and fed through to the different reconstruction algorithms. The blocks are then recombined using the OCS protocol to reform the initial simulated paths. The performance of the algorithm is gauged using the RMSE metric which calculates “how far” the reconstruction is from the true noise-free data. The RMSE is then averaged over all the different data points to show the average performance of each algorithm.

To have a baseline performance to compare to, the RMSE is also computed for the full noisy path with respect to the noise-free data. This baseline estimates the level of accuracy attainable if one were to completely sample the path using a noisy detector. Figure 7 shows this baseline across different noise levels as well as the RMSE of the reconstruction achieved by both schemes (LASSO and neural networks (NN)) from a down-sampled path. Both algorithms are able to outperform the baseline across a wide range of noise levels except for the case of unrealistically low noise (≤3 m). This is as expected since, as was shown earlier, true location data is sparse in the DCT domain with the sparsity decreasing as the added noise level increases. This means that the CS will filter the noise when reconstructing a path from down-sampled noisy data as it tries to find the most sparse solution.

It should be noted that the results in Fig. 7 were obtained while using the regular Bernoulli sampling algorithm for the LASSO scheme and the ratio adaptive sampling algorithm for the NN. To show the advantages of using the noise adaptive sampling algorithm with the LASSO reconstruction, the noise profile added to the path needed to be modified to emulate the effect of temporary high-noise sources. To do so, instead of adding noise that is sampled from a normal distribution with a constant σ_e, each block is split randomly into random-sized “chunks” where each “chunk” has a different noise level that can range from very high to very low. Although the noise adaptive sampling has a set target sampling ratio ν (as can be seen in Algorithm 1), the dependence on the noise level means that the sampling ratio can vary significantly. Therefore, in order to accurately gauge the gain in performance attained by using the noise adaptive sampling, we have compared it to the regular Bernoulli sampling with multiple different sampling ratios. A simulation with adaptive sampling parameters δ = 0.1, β = 0.5, and σ_max = 50 m, and with noise chunks ranging from 0 m to 100 m was run. It showed an improvement of the performance of the reconstruction from the adaptive sampling by about 50% as compared to a purely random sampling that had the same target sampling ratio (ν = 0.2) and by about 25% as compared to a random sampling that had the same effective sampling ratio (ν ∼ 0.4).

6.2 Real Data Experiment

A mobile application has been developed for iOS devices as a proof of concept for the applicability of the algorithm in real-world scenarios. The Core ML and Accelerate frameworks from Apple Inc. were used to efficiently implement LASSO and NN. As before, the block length remains B = 256, the learning rate μ = 0.01 and ν = 0.2. The Bernoulli sampling and the ratio adaptive sampling routines were used for the LASSO and NN reconstruction, respectively. Figure 8 shows the live reconstruction achieved by the CS algorithms on a path made up of N = 3584 sample points. Both the full path and the down-sampled points were recorded to allow for the evaluation of the performance of the algorithms ex post facto. As can be clearly seen in the figure, both algorithms managed to reconstruct the full path to a great degree of accuracy. The estimated RMSE for the reconstruction were (10 ± 8) m and (28 ± 8) m for the LASSO and NN, respectively. It should be noted that the uncertainty in the RMSE comes from the fact that we do not have access to the true noise-free positions and can only compare the reconstruction to what is inherently a noisy fully sampled path.

7 Future Work

All the algorithms demonstrated thus far in this chapter treat the latitude and longitude coordinates as independent from each other. One possible improvement that could enhance the performance would be to consider a joint reconstruction scheme that uses the correlation between λ[n] and ϕ[n] to add additional constraints to the CS schemes. However, we expect such solution to be much more computationally expensive due to the higher dimensional space. Another improvement would be to set up a method to acquire the true noise-free path while performing the live reconstruction in order to be able to measure the true RMSE even in cases of high noise. This can be done by using multiple industrial-grade GPS receivers simultaneously, or just by manually recording the coordinates of each sampling point which is a more accurate (and tedious) method. Another point where this scheme can be improved is by considering the effects of elevation. In this chapter, we have only considered 2D trajectory data. However, there are many regions where due to the nature of their terrain, the elevation of the tracked object can change significantly over the course of its trajectory. The advantage of our approach is that since each coordinate was treated independently, extending the treatment to include elevation is a relatively simple task.

8 Conclusion

In this chapter, we tackled the problem of path reconstruction for IoT devices. For the first time, We have exploited the sparsity of location data in a particular domain to reformulate the problem as that of regularized regression. Using that, we have designed and implemented two novel solutions to trajectory reconstruction: LASSO reconstruction and a neural network model. Each has its set of advantages that make them suited for particular applications, for example, the neural network approach is very computationally light which makes it perfect for embedded systems. An OCS algorithm has been integrated in each scheme to limit the reconstruction to a parametric block length for a more efficient and feasible implementation. For each protocol, an adaptive sampling technique has been developed to efficiently down-sample the path and improve the performance of the algorithms. In both cases, they have been shown to successfully reconstruct the original path with ∼20% of the samples. Furthermore, they outperform the simple scheme of fully sampling the path due to their denoising capability. Finally, a mobile application has been developed as a proof of concept and to validate the theoretical assumptions that were made and the applicability of the reconstruction algorithm in real-world scenarios.

Acknowledgements

This work was supported by UKRI-STFC and UKRI-EPSRC under Grant Nos. ST/P002048/1 and EP/N509711/1. R.A. acknowledges support from the St Anne’s Graduate Development Scholarship scheme. We also gratefully acknowledge the support from the Oxford-ShanghaiTech collaboration project and UKRI-STFC grant ST/T000724/1.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Vorheriges Kapitel A Viscosity Iterative Method with Alternated Inertial Terms for Solving the Split Feasibility Problem

Nächstes Kapitel Nonsmooth Mathematical Programs with Vanishing Constraints in Banach Spaces

E.J. Candes, M.B. Wakin, An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21–30 (2008)CrossRef

C.R. Berger, Z. Wang, J. Huang, S. Zhou, Application of compressive sensing to sparse channel estimation. IEEE Commun. Mag. 48(11), 164–174 (2010)CrossRef

M. Lustig, D. Donoho, J. Pauly, Sparse MRI: the application of compressed sensing for rapid MR imaging, in Magnetic Resonance in Medicine: Official Journal of the Society of Magnetic Resonance in Medicine/Society of Magnetic Resonance in Medicine, vol. 58 (2007), pp. 1182–95

B. Pan, Y. Zheng, D. Wilkie, C. Shahabi, Crowd sensing of traffic anomalies based on human mobility and social media, in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ser. SIGSPATIAL’13 (Association for Computing Machinery, New York, 2013), pp. 344–353. https://doi.org/10.1145/2525314.2525343

L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, L. Damas, Time-evolving o-d matrix estimation using high-speed GPS data streams. Expert Syst. Appl. 44, 275–288 (2016). http://www.sciencedirect.com/science/article/pii/S0957417415006053 CrossRef

M.E. Nelson, L.D. Mech, P.F. Frame, Tracking of white-tailed deer migration by global positioning system. J. Mammal. 85(3), 505–510 (2004). https://doi.org/10.1644/BOS-120 CrossRef

S. Hiremath, G. Yang, K. Mankodiya, Wearable Internet of Things: concept, architectural components and promises for person-centered healthcare, in 2014 4th International Conference on Wireless Mobile Communication and Healthcare – Transforming Healthcare Through Innovations in Mobile and Wireless Technologies (MOBIHEALTH) (2014), pp. 304–307

R. Aughey, Applications of GPS technologies to field sports. Int. J. Sports Physiol. Perform. 6, 295–310 (2011)CrossRef

T. Hunter, P. Abbeel, A. Bayen, The path inference filter: model-based low-latency map matching of probe vehicle data. IEEE Trans. Intell. Transp. Syst. 15(2), 507–529 (2014)CrossRef

10.

O.P. Dewhirst, H.K. Evans, K. Roskilly, R.J. Harvey, T.Y. Hubel, A.M. Wilson, Improving the accuracy of estimates of animal path and travel distance using gps drift-corrected dead reckoning. Ecol. Evol. 6(17), 6210–6222 (2016). https://onlinelibrary.wiley.com/doi/abs/10.1002/ece3.2359 CrossRef

11.

Y. Zhang, Y. Liu, L.D. Han, Real-time piecewise regression: application to effective and economical collection of gps trajectory data. Transp. Res. Rec. 2643(1), 9–18 (2017). https://doi.org/10.3141/2643-02 CrossRef

12.

G. Inc. Android API documentation. https://developer.android.com/reference/android/location/Location

13.

D. Brockmann, L. Hufnagel, T. Geisel, The scaling laws of human travel. Nature 439(7075), 462–465 (2006). https://www.nature.com/articles/nature04292 CrossRef

14.

M.C. González, C.A. Hidalgo, A.-L. Barabási, Understanding individual human mobility patterns. Nature 453(7196), 779–782 (2008). https://www.nature.com/articles/nature06958 CrossRef

15.

I. Rhee, M. Shin, S. Hong, K. Lee, S.J. Kim, S. Chong, On the Levy-Walk Nature of Human Mobility. IEEE/ACM Trans. Netw. 19(3), 630–643 (2011)CrossRef

16.

K. Pearson, The problem of the random walk. Nature 72(1865), 294–294 (1905). Number 1865. Nature Publishing Group. https://www.nature.com/articles/072294b0

17.

J.E. Kiefer, G.H. Weiss, The Pearson random walk. AIP Conf. Proc. 109(1), 11–32 (1984). American Institute of Physics. https://aip.scitation.org/doi/abs/10.1063/1.34331

18.

S.E. Battersby, M.P. Finn, E.L. Usery, K.H. Yamamoto, Implications of web mercator and its use in online mapping. Cartographica Int. J. Geogr. Inf. Geovisualization 49(2), 85–101 (2014). https://www.utpjournals.press/doi/abs/10.3138/carto.49.2.2313 CrossRef

19.

M. Rani, S.B. Dhok, R.B. Deshmukh, A systematic review of compressive sensing: concepts, implementations and applications. IEEE Access 6, 4875–4894 (2018)CrossRef

20.

M.W. Marcellin, M.J. Gormish, A. Bilgin, M. P. Boliek, An overview of JPEG-2000, in Proceedings DCC 2000. Data Compression Conference (2000), pp. 523–541

21.

M.F. Duarte, M.A. Davenport, D. Takhar, J.N. Laska, T. Sun, K.F. Kelly, R.G. Baraniuk, Single-pixel imaging via compressive sampling. IEEE Signal Process. Mag. 25(2), 83–91 (2008)CrossRef

22.

B. Natarajan, Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995). https://epubs.siam.org/doi/10.1137/S0097539792240406 MathSciNetCrossRef

23.

D. Chafaï, O. Guédon, G. Lecué, A. Pajor, Interactions Between Compressed Sensing Random Matrices and High Dimensional Geometry. Société Mathámatique de France, 2012, google-Books-ID: PwPynQEACAAJ

24.

R. Tibshirani, Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996). http://www.jstor.org/stable/2346178 MathSciNetMATH

25.

T.T. Wu, K. Lange, Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008). https://doi.org/10.1214/07-AOAS147 MathSciNetCrossRef

26.

B. Widrow, D.E. Rumelhart, M. Lehr, Neural networks: applications in industry, business and science. Commun. ACM 37, 93–105 (1994)CrossRef

27.

S. Haykin, Neural Networks: A Comprehensive Foundation, 1st edn. (Prentice Hall PTR, Upper Saddle River, 1994)

28.

I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016)MATH

29.

A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Curran Associates, Inc., Red Hook, 2012), pp. 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

30.

D.F. Specht, A general regression neural network. IEEE Trans. Neural Netw. 2(6), 568–576 (1991)CrossRef

31.

H. Cigizoglu, M. Alp, Generalized regression neural network in modelling river sediment yield. Adv. Eng. Softw. 37, 63–68 (2006)CrossRef

32.

G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006). http://science.sciencemag.org/content/313/5786/504 MathSciNetCrossRef

33.

B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853 [cs, stat] (2015). http://arxiv.org/abs/1505.00853

34.

D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. CoRR, abs/1412.6980 (2014). http://arxiv.org/abs/1412.6980

Titel: Efficient Location-Based Tracking for IoT Devices Using Compressive Sensing and Machine Learning Techniques
verfasst von: Ramy Aboushelbaya
Taimir Aguacil
Qiuting Huang
Peter A. Norreys
Verlag: Springer International Publishing
Buch: High-Dimensional Optimization and Probability
Print ISBN: 978-3-031-00831-3

Electronic ISBN: 978-3-031-00832-0

Copyright-Jahr: 2022
DOI: https://doi.org/10.1007/978-3-031-00832-0_12

Springer Professional