Filling gaps in chaotic time series

doi:10.1016/j.physleta.2005.07.076

Physics Letters A

Volume 346, Issues 1–3, 10 October 2005, Pages 47-53

Abstract

We propose a method for filling arbitrarily wide gaps in deterministic time series. Crucial to the method is the ability to apply Takens' theorem in order to reconstruct the dynamics underlying the time series. We introduce a functional to evaluate the degree of compatibility of a filling sequence of data with the reconstructed dynamics. An algorithm for finding highly compatible filling sequences with a reasonable computational effort is then discussed.

Introduction

One problem faced by many practitioners in the applied sciences is the presence of gaps (i.e. sequences of missing data) in observed time series, which makes any analysis hard or impossible. The problem is routinely solved by interpolation if the gap is very short, but it becomes formidable if the gap width is larger than some time scale characterizing the predictability of the time series.

If the physical system under study is described by a small set of coupled ordinary differential equations, then a theorem by Takens [1], [2] suggests that from a single time series it is possible to build a mathematical model whose dynamics is diffeomorphic to that of the system under examination. In this Letter we leverage the dynamic reconstruction theorem of Takens to fill an arbitrarily wide gap in a time series.

It is important to stress that the goal of the method is not to recover a good approximation to the lost data. Sensitive dependence on initial conditions, and imperfections of the reconstructed dynamics, make this goal a practical impossibility, except for some special cases, such as small gap width or periodic dynamics. We aim instead at giving one or more surrogate data sequences which can be considered compatible with the observed dynamics, in a sense which will be made rigorous in the following.

We shall assume that an observable quantity $s$ is a function of the state of a continuous-time, low-dimensional dynamical system, whose time evolution is confined to a strange attractor (that is, we explicitly discard transient behavior). Both the explicit form of the equations governing the dynamical system and the function which links its state to the signal $s(t)$ may be unknown. We also assume that an instrument samples $s(t)$ at regular intervals of length $Δ t$, yielding an ordered set of $\bar{N}$ data $s_{i} = s((i - 1) Δ t)$, $i = 1, \dots, \bar{N}$. If, for any reason, the instrument is unable to record the value of $s$ at some of the sampling times, there will be invalid entries in the time series $\{s_{i}\}$ for some values of the index $i$.

From the time series $\{s_{i}\}$ we reconstruct the underlying dynamics with the technique of delay coordinates. That is, we invoke Takens' theorem [1], [2] and claim that the $m$-dimensional vectors $x_{i} = (s_{i}, s_{i + τ}, \dots, s_{i + (m - 1) τ})$ lie on a curve in $\mathbb{R}^{m}$ which is diffeomorphic to the curve followed in its (unknown) phase space by the state of the dynamical system which originated the signal $s(t)$. Here $τ$ is a positive integer, and $i$ now runs only up to $N = \bar{N} - (m - 1) τ$. Several pitfalls have to be taken into account in order to choose the most appropriate values for $m$ and $τ$. Strong constraints also come from the length of the time series, compared to the characteristic time scales of the dynamical system, and from the amount of instrumental noise which affects the data. We shall not review these issues here, but refer the reader to Refs. [3], [4], [5].
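The delay-coordinate construction can be illustrated with a minimal Python sketch. The function name `delay_embed` and the toy series are assumptions made for illustration; indices are 0-based here, whereas the text counts from 1.

```python
def delay_embed(s, m, tau):
    """Return the delay vectors x_i = (s_i, s_{i+tau}, ..., s_{i+(m-1)tau})."""
    # Number of vectors: N = N_bar - (m - 1) * tau, as in the text.
    n_vectors = len(s) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for this choice of m and tau")
    return [tuple(s[i + k * tau] for k in range(m)) for i in range(n_vectors)]


# Toy series, used only to show the indexing.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
vectors = delay_embed(series, m=3, tau=2)
print(len(vectors), vectors[0], vectors[-1])
```

With seven samples, $m = 3$ and $τ = 2$, only $7 - 4 = 3$ vectors can be formed, which is the shortening of the index range described above.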

We note that gaps (that is, invalid entries) in the time series $\{s_{i}\}$ do not prevent a successful reconstruction of a set $R = \{x_{i}\}$ of state vectors, unless the total width of the gaps is comparable with $\bar{N}$. We simply mark as “missing” any reconstructed vector $x_{i}$ whose components are not all valid entries. If the gap in the signal $s$ spans more than $(m - 1) τ$ data points, then it will be mapped into a contiguous gap in the sequence of reconstructed vectors.
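As a rough illustration of how invalid entries propagate to the reconstructed vectors, the sketch below encodes invalid samples as NaN and missing vectors as `None`. Both encodings, the function name, and the toy series are assumptions, not part of the original method.

```python
import math


def delay_embed_with_gaps(s, m, tau):
    """Delay vectors; a vector is None if any of its components is invalid (NaN)."""
    n_vectors = len(s) - (m - 1) * tau
    vectors = []
    for i in range(n_vectors):
        components = [s[i + k * tau] for k in range(m)]
        if any(math.isnan(c) for c in components):
            vectors.append(None)  # marked as "missing"
        else:
            vectors.append(tuple(components))
    return vectors


def missing_indices(vectors):
    return [i for i, v in enumerate(vectors) if v is None]


nan = math.nan
m, tau = 2, 2  # so that (m - 1) * tau = 2

# A gap of 3 samples, wider than (m - 1) * tau.
wide = [0.0, 1.0, 2.0, 3.0, nan, nan, nan, 7.0, 8.0, 9.0, 10.0, 11.0]
# A gap of 1 sample, narrower than (m - 1) * tau.
narrow = [0.0, 1.0, 2.0, 3.0, nan, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

wide_missing = missing_indices(delay_embed_with_gaps(wide, m, tau))
narrow_missing = missing_indices(delay_embed_with_gaps(narrow, m, tau))
print(wide_missing)    # one contiguous run of missing vectors
print(narrow_missing)  # two separate missing vectors
```

In this toy case the wide gap produces a single contiguous run of missing vectors, while the narrow gap produces two isolated missing vectors with a valid one between them.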

If the valid vectors of $R$ sample the underlying strange attractor embedded in $\mathbb{R}^{m}$ well enough, one may hope to find, by means of a suitable interpolation technique, a vector field $F : U \to \mathbb{R}^{m}$ such that, within an open set $U$ of $\mathbb{R}^{m}$ containing all the vectors $x_{i}$, the observed dynamics can be approximated by the differential equation (2), $\dot{x} = F(x)$. This very idea underlies several forecasting schemes, where one takes the last observed vector $x_{N}$ as the initial condition for Eq. (2), and integrates it forward in time (see, e.g., [7], [8]).
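The interpolation technique is not specified in this excerpt. The sketch below swaps in the crudest possible choice, a nearest-neighbour estimate of $F$ built from forward differences of consecutive valid vectors, followed by explicit Euler integration. All function names, the `None` convention for missing vectors, and the toy data on a circle are assumptions for illustration only.

```python
import math


def tangent_samples(vectors, dt):
    """Pairs (x_i, (x_{i+1} - x_i) / dt) for consecutive valid vectors."""
    samples = []
    for a, b in zip(vectors, vectors[1:]):
        if a is not None and b is not None:
            tangent = tuple((bj - aj) / dt for aj, bj in zip(a, b))
            samples.append((a, tangent))
    return samples


def make_field(samples):
    """Nearest-neighbour estimate of F: the tangent of the closest sample."""
    def F(x):
        _, tangent = min(samples, key=lambda pair: math.dist(pair[0], x))
        return tangent
    return F


def euler_forecast(F, x0, dt, n_steps):
    """Integrate dx/dt = F(x) forward from x0 with explicit Euler steps."""
    path = [tuple(x0)]
    for _ in range(n_steps):
        x = path[-1]
        path.append(tuple(xj + dt * fj for xj, fj in zip(x, F(x))))
    return path


# Toy data: points on the unit circle, whose true field is F(x) = (-x2, x1).
dt = 0.01
vectors = [(math.cos(i * dt), math.sin(i * dt)) for i in range(700)]
F = make_field(tangent_samples(vectors, dt))
estimate = F((1.0, 0.0))
forecast = euler_forecast(F, vectors[-1], dt, n_steps=5)
print(estimate)
```

A smoother local model (for example local linear fits) would normally be preferred; the nearest-neighbour rule is used here only because it keeps the sketch short.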

The gap-filling problem was framed in terms of forecasts by Serre et al. [9]. Their method, which amounts to a special form of the shooting algorithm for boundary value problems, is limited by the predictability properties of the dynamics, and cannot fill gaps of arbitrary width.

The rest of this Letter is organized as follows: in Section 2 we cast the problem as a variational one, where a functional measures how well a candidate filling trajectory agrees with the vector field defining the observed dynamics. Then an algorithm is proposed for finding a filling trajectory. In Section 3 we give an example of what can be obtained with this method. Finally, we discuss the algorithm and offer some speculations on future work in Section 4.

A variational approach

All the difficulties of gap-filling stem from the following constraint: the interpolating curve, which should be as close as possible to a solution of Eq. (2), must start at the last valid vector before the gap and reach the first valid vector after the gap in a prescribed time $T$.

To properly satisfy this constraint, we propose to frame the problem of filling gaps as a variational one. We are looking for a differentiable vector function $ξ : [0, T] \to U$ which minimizes the functional $J (ξ)$

An example

In this section we show how the algorithm described above performs on a time series generated by a chaotic attractor. We numerically integrate the Lorenz equations [10] with the usual parameters ($σ = 10$, $r = 28$, $b = 8/3$). We sample the $x$-variable of the equations with an interval $Δ t = 0.02$, collecting 5000 consecutive data points which form our time series. One thousand consecutive data points are then marked as “not-valid”, thus inserting in the time series a gap with a width of $1/5$ of the series
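A test series of this kind can be generated along the following lines. This is a sketch under stated assumptions: the integrator (fixed-step fourth-order Runge-Kutta), the number of substeps, the initial condition, the length of the discarded transient, and the position of the gap are all illustrative choices not given in the text; only the Lorenz parameters, the sampling interval, the series length, and the gap width come from the description above.

```python
import math


def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = state
    return (sigma * (y - x), x * (r - z) - y, x * y - b * z)


def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, factor):
        return tuple(s + factor * h * ki for s, ki in zip(state, k))
    k1 = f(state)
    k2 = f(shifted(k1, 0.5))
    k3 = f(shifted(k2, 0.5))
    k4 = f(shifted(k3, 1.0))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def sample_lorenz_x(n_samples=5000, dt=0.02, substeps=4,
                    n_transient=1000, state=(1.0, 1.0, 1.0)):
    """Sample the x-variable every dt after discarding an initial transient."""
    h = dt / substeps
    for _ in range(n_transient * substeps):  # assumed transient length
        state = rk4_step(lorenz, state, h)
    series = []
    for _ in range(n_samples):
        series.append(state[0])
        for _ in range(substeps):
            state = rk4_step(lorenz, state, h)
    return series


series = sample_lorenz_x()

# Mark 1000 consecutive points as invalid; the start index is an assumption.
gap_start, gap_width = 2000, 1000
gapped = list(series)
for i in range(gap_start, gap_start + gap_width):
    gapped[i] = math.nan
```

The list `gapped` then plays the role of the observed series with invalid entries, while `series` is kept for comparison with any filling sequence.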

Discussion and conclusions

In this Letter we have described an algorithm which fills an arbitrarily wide gap in a time series, provided that the dynamic reconstruction method of Takens is applicable. The goal is to provide a filling signal which is consistent with the observed dynamics, in the sense that, in the reconstructed phase space, the vector tangent to the filling curve should be close to the vector field modeling the observed dynamics. This requirement is cast as a variational problem, defined by the functional (3).
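The explicit form of the functional (3) is not reproduced in this excerpt. As a hedged illustration of the idea only, the sketch below assumes a squared mismatch between the tangent of a candidate curve and the vector field, integrated over the gap and discretized with a midpoint rule. The function name, the discretization, and the toy rotation field are all assumptions.

```python
import math


def mismatch_functional(path, F, dt):
    """Discrete sum of |(xi_{k+1} - xi_k)/dt - F(midpoint)|^2 * dt along a path."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        velocity = [(bj - aj) / dt for aj, bj in zip(a, b)]
        midpoint = [0.5 * (aj + bj) for aj, bj in zip(a, b)]
        field = F(midpoint)
        total += dt * sum((v - f) ** 2 for v, f in zip(velocity, field))
    return total


def rotation_field(x):
    """Toy vector field whose trajectories are circles around the origin."""
    return (-x[1], x[0])


# Two candidate curves joining (1, 0) to (0, 1) in the same time T = pi / 2.
n = 100
dt = (math.pi / 2) / n
arc = [(math.cos(k * dt), math.sin(k * dt)) for k in range(n + 1)]
chord = [(1.0 - k / n, k / n) for k in range(n + 1)]

j_arc = mismatch_functional(arc, rotation_field, dt)
j_chord = mismatch_functional(chord, rotation_field, dt)
print(j_arc, j_chord)
```

Both curves satisfy the boundary constraint, but only the arc follows the field, so it scores far lower. A minimization over candidate paths with fixed endpoints would favour it over the straight chord.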

Acknowledgements

This work has been supported by fondo convezione strana of the Department of Mathematics of the University of Lecce. We are grateful to Prof. Carlo Sempi and to Dr. Fabio Paronetto for valuable comments.

References (14)

M. Casdagli et al., Physica D (1991)
A. Provenzale et al., Physica D (1992)
M. Casdagli, Physica D (1989)
F. Paparella et al., Phys. Lett. A (1997)
F. Takens
T. Sauer et al., J. Stat. Phys. (1991)
J. Theiler, Phys. Rev. A (1986)

There are more references available in the full text version of this article.
