Environmental time series analysis and forecasting with the Captain toolbox
Introduction
Over the past 25 years, numerous publications by the present third author and his colleagues (e.g. Young, 1978, Young, 1983, Young, 1993, Young, 1998a, Young, 1998b, Young, 1999a, Young, 1999b, Young, 2000) have introduced the Data-Based Mechanistic (DBM) modelling philosophy and the associated statistical modelling techniques. Here, DBM models are obtained initially from an analysis of observational time-series, based on generic model structures, such as differential equations or their discrete-time equivalents. Unlike alternative ‘black-box’ modelling approaches, however, they are only considered credible in a DBM sense if they can also be interpreted in physically meaningful terms. It is a philosophy that emphasises the importance of parametrically efficient, low order, ‘dominant mode’ models (sometimes referred to as ‘top-down’ models), as well as the development of stochastic methods and the associated statistical analysis required for their identification and estimation.
The methodological tools and model structures utilised in this research can be categorised broadly into the following four closely related and overlapping themes (see the references above for details).
- (1)
The various model structures can be unified in terms of the Unobserved Components (UC) model. Here, the output time series is assumed to be composed of an additive or multiplicative combination of different components that have defined statistical characteristics but which cannot be observed directly. Such components may include a trend or low frequency component, a seasonal component (e.g. annual), additional sustained cyclical or quasi-cyclical components, stochastic perturbations, and a component that captures the influence of exogenous input signals.
- (2)
If the system being modelled can be approximated by a linearised, constant parameter Transfer Function (TF) model, then Instrumental Variable (IV) methods are employed. Here, an adaptive auxiliary model and optimal pre-filters are introduced into an iterative solution that ensures consistent, asymptotically unbiased and statistically efficient (minimum or low variance) parameter estimates.
- (3)
If the system is non-stationary, in the sense that the statistical properties of the signal, as defined by the parameters in an associated model, are changing over time at a rate that is ‘slow’ in relation to the rates of change of the stochastic state variables in the system under study, then the analysis typically utilises statistical estimation methods that are based on the identification and estimation of stochastic Time Varying Parameter (TVP) models. The algorithms used for such TVP identification and estimation are based on a stochastic state space formulation of the UC model and the use of recursive Kalman Filter (KF) and Fixed Interval Smoothing (FIS) algorithms. In most cases, the algorithm is formulated in terms of a TVP regression model structure. However, a recently developed instrumental variable algorithm allows for the estimation of time variable parameters in TF models, again exploiting recursive KF/FIS estimation.
- (4)
If the changes in the parameters are functions of the state or input variables, then the system is truly non-linear and it is likely to exhibit severe non-linear or even chaotic behaviour. Normally, this cannot be approximated in a simple TVP manner because the parameters can vary at a very rapid rate consistent with that of the state variables on which they depend. In this case, recourse is made to an alternative State Dependent Parameter (SDP) approach, which again exploits recursive KF/FIS estimation but this time within an iterative ‘backfitting’ algorithm that involves special re-ordering of the time series data.
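The key step in theme (4) is re-ordering the data. Once the observations are sorted by the state variable on which a parameter depends, a parameter that changes rapidly in time changes only slowly along the sorted sequence. It can then be tracked by the same random-walk KF/FIS machinery used for TVP models. The Python sketch below shows this for a single-term state dependent AR(1) model. Because there is only one SDP term, no backfitting iterations are needed. A standard Rauch-Tung-Striebel smoother stands in for the toolbox's FIS algorithm. The helper names `rw_smoother` and `sdp_ar1`, the threshold-type test system and the NVR value are all hypothetical illustrations, not Captain functions.

```python
import numpy as np

def rw_smoother(x, y, nvr, p0=1e6):
    """KF forward pass plus RTS-type fixed interval smoothing for
    y_k = x_k * a_k + e_k, where a_k follows a scalar random walk.
    nvr is the noise variance ratio var(eta) / var(e)."""
    n = len(y)
    a_pred = np.zeros(n); p_pred = np.zeros(n)
    a_filt = np.zeros(n); p_filt = np.zeros(n)
    a, p = 0.0, p0                                # diffuse initial condition
    for k in range(n):
        ap, pp = a, p + nvr                       # random-walk prediction
        g = pp * x[k] / (1.0 + x[k] ** 2 * pp)    # Kalman gain (unit observation variance)
        a = ap + g * (y[k] - x[k] * ap)           # correction with observation k
        p = pp - g * x[k] * pp
        a_pred[k], p_pred[k], a_filt[k], p_filt[k] = ap, pp, a, p
    a_smooth = a_filt.copy()
    for k in range(n - 2, -1, -1):                # backward smoothing pass
        c = p_filt[k] / p_pred[k + 1]
        a_smooth[k] = a_filt[k] + c * (a_smooth[k + 1] - a_pred[k + 1])
    return a_smooth

def sdp_ar1(y, nvr=1e-3):
    """Single-term SDP estimate of a(.) in y_t = a(y_{t-1}) * y_{t-1} + e_t.
    Sorting by y_{t-1} makes a(.) vary smoothly along the sorted data."""
    x, target = y[:-1], y[1:]
    order = np.argsort(x)
    return x[order], rw_smoother(x[order], target[order], nvr)

# Hypothetical test system: a = 0.9 for negative states, 0.2 otherwise
rng = np.random.default_rng(0)
y = np.zeros(3000)
for t in range(1, len(y)):
    a_true = 0.9 if y[t - 1] < 0 else 0.2
    y[t] = a_true * y[t - 1] + rng.standard_normal()

x_sorted, a_hat = sdp_ar1(y)
```

Plotting `a_hat` against `x_sorted` gives a non-parametric estimate of the state dependency. In the DBM approach, this is the kind of graph that is subsequently interpreted, and perhaps parameterised, in physically meaningful terms.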
The present paper focuses on a Matlab® compatible toolbox, Captain, that has evolved from the above research. It has recently been updated to include the latest methodological developments in all four of the above areas. Based around a powerful stochastic state space framework, Captain extends Matlab® in order to allow for the identification and estimation of the most general UC models, including popular forms such as the Basic Structural Model (BSM) of Harvey (1989) and the Dynamic Linear Model (DLM) of West and Harrison (1989). In Captain, these UC modelling tools are combined with a standard set of data pre-processing, system identification and model validation tools, so that the resulting toolbox constitutes a wide-ranging and widely applicable package for signal processing and general time series analysis.
Uniquely within a Matlab® context, however, Captain focuses on TVP and SDP models, where the stochastic evolution of each parameter is assumed to be described by a generalised random walk process (Jakeman and Young, 1981). Captain provides novel tools for non-stationary TVP analysis, allowing for the optimal estimation of dynamic regression models, including dynamic linear regression (Young, 1998a), dynamic auto-regression (Young, 1998b) and dynamic harmonic regression (Ng and Young, 1990, Young et al., 1999), in addition to the related state dependent parameter class of model (Young, 2000).
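A minimal sketch of the generalised random walk (GRW) family follows. It assumes the two-state parameterisation commonly quoted in the dynamic harmonic regression literature: state transition matrix F = [[α, β], [0, γ]] and noise input matrix G = [[δ, 0], [0, 1]]. The special-case settings shown for the integrated random walk (IRW), local linear trend (LLT) and smoothed random walk (SRW) are stated as assumptions. The function names are hypothetical and are not the toolbox's API.

```python
import numpy as np

def grw_matrices(alpha, beta, gamma, delta):
    """F and G of a two-state GRW model x_t = F x_{t-1} + G eta_t,
    with state x_t = [level, slope] (assumed parameterisation)."""
    F = np.array([[alpha, beta], [0.0, gamma]])
    G = np.array([[delta, 0.0], [0.0, 1.0]])
    return F, G

# Assumed special cases; the scalar random walk (RW) is the one-state case
# x_t = x_{t-1} + eta_t and needs no second state.
IRW = grw_matrices(1.0, 1.0, 1.0, 0.0)  # integrated random walk: noise enters the slope only
LLT = grw_matrices(1.0, 1.0, 1.0, 1.0)  # local linear trend: noise enters level and slope
SRW = grw_matrices(0.9, 1.0, 1.0, 0.0)  # smoothed random walk, illustrative alpha in (0, 1)

def simulate_grw(F, G, x0, noise):
    """Propagate the GRW state for len(noise) steps; noise has shape (n, 2)."""
    x = np.asarray(x0, dtype=float)
    states = np.empty((len(noise), 2))
    for t, eta in enumerate(noise):
        x = F @ x + G @ eta
        states[t] = x
    return states

# A smooth, trend-like parameter path: IRW driven by small slope disturbances
rng = np.random.default_rng(1)
path = simulate_grw(*IRW, x0=[0.0, 0.0], noise=0.01 * rng.standard_normal((200, 2)))
```

The choice of GRW member encodes prior beliefs about how smoothly a parameter may evolve. An IRW, for example, yields much smoother parameter paths than a simple RW for the same disturbance variance.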
In all these cases, the state space formulation is particularly well suited to estimation based on optimal recursive KF/FIS estimation, in which the time variable parameters are estimated sequentially whilst working through the data in temporal order. Here, the time-varying parameters are represented as state variables, as suggested originally by Mayne (1963) and Lee (1964). Furthermore, the estimates obtained from an initial, forward-pass KF algorithm are updated sequentially whilst working through the data in reverse temporal order using a backwards-recursive FIS algorithm (e.g. Bryson and Ho, 1969). The use of FIS for parameter estimation follows from the work of Norton, 1975, Norton, 1976, Norton, 1986 and Jakeman and Young (1981).
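The forward-pass/backward-pass scheme can be sketched for a dynamic linear regression in which each parameter follows an independent random walk. This is an illustration under stated assumptions. The function `dlr_kf_fis` is hypothetical. The backward pass is written in the standard Rauch-Tung-Striebel form, which may differ algebraically from the FIS implementation in Captain. The noise variance ratios (NVRs) are set by hand here rather than optimised.

```python
import numpy as np

def dlr_kf_fis(y, X, nvr, p0=1e6):
    """Dynamic linear regression y_t = X_t . b_t + e_t with random-walk parameters.

    nvr: one noise variance ratio var(eta_i) / var(e) per regressor.
    Returns the smoothed parameter estimates, shape (n, m)."""
    n, m = X.shape
    Q = np.diag(nvr)
    b, P = np.zeros(m), p0 * np.eye(m)
    b_pred = np.zeros((n, m)); P_pred = np.zeros((n, m, m))
    b_filt = np.zeros((n, m)); P_filt = np.zeros((n, m, m))
    for t in range(n):
        # Forward KF pass: prediction (identity transition for random walks)
        bp, Pp = b, P + Q
        h = X[t]
        # Correction, with the observation noise variance normalised to 1
        k = Pp @ h / (1.0 + h @ Pp @ h)
        b = bp + k * (y[t] - h @ bp)
        P = Pp - np.outer(k, h @ Pp)
        b_pred[t], P_pred[t], b_filt[t], P_filt[t] = bp, Pp, b, P
    # Backward pass in reverse temporal order (fixed interval smoothing)
    b_smooth = b_filt.copy()
    for t in range(n - 2, -1, -1):
        C = P_filt[t] @ np.linalg.inv(P_pred[t + 1])
        b_smooth[t] = b_filt[t] + C @ (b_smooth[t + 1] - b_pred[t + 1])
    return b_smooth

# Hypothetical data: constant intercept 1, slope jumps from 1 to 3 at t = 150
rng = np.random.default_rng(42)
n = 300
x = rng.standard_normal(n)
slope = np.where(np.arange(n) < 150, 1.0, 3.0)
y = 1.0 + slope * x + 0.2 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
b_hat = dlr_kf_fis(y, X, nvr=[0.0, 0.01])
```

A zero NVR makes the intercept a constant parameter, while the slope is allowed to drift. Because the smoothed estimate at each time uses the whole data set, it is noticeably less lagged around the step change than the forward-pass KF estimate alone.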
In many situations, of course, time-invariant parameter models are quite sufficient for environmental systems analysis. In this regard, one model form that has received special treatment in the toolbox, and is particularly useful in an environmental context (see Young, 2005), is the multiple-input, single-output TF model. Captain includes functions for the identification and estimation of both discrete-time (Young and Jakeman, 1979, Young, 1984, Young, 1985) and continuous-time (Young and Jakeman, 1980, Young, 2002a) TF models of this kind. In both cases, the main statistical tools are the Refined (RIV) and Simplified Refined Instrumental Variable (SRIV) algorithms. One advantage of the TF model is its simplicity and ability to characterise the dominant modal behaviour of a dynamic system. This makes such a model an ideal basis for the estimation of parsimonious models of environmental systems from experimental and monitored data. It can also be used as a means of estimating reduced order, dominant mode approximations of high order environmental simulation models (e.g. Young et al., 1996, Young, 1998b). In addition, Captain has been successfully utilised for the design of practical control systems for many years, some in connection with environmental applications (e.g. Taylor et al., 2004a, Taylor et al., 2004b).
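The benefit of instrumental variables over least squares can be illustrated with a first-order discrete-time TF model and additive measurement noise. In this setting, ordinary least squares gives biased parameter estimates. The sketch below is a basic iterative IV estimator with an auxiliary model. At each iteration, the instruments are generated by simulating the current model estimate. It deliberately omits the optimal prefilters that distinguish the SRIV and RIV algorithms. The functions `simulate_tf1` and `iv_tf1`, the model order and the noise level are illustrative assumptions.

```python
import numpy as np

def simulate_tf1(a1, b0, u):
    """Noise-free output of x_t = -a1 x_{t-1} + b0 u_{t-1}."""
    x = np.zeros(len(u))
    for t in range(1, len(u)):
        x[t] = -a1 * x[t - 1] + b0 * u[t - 1]
    return x

def iv_tf1(y, u, iterations=5):
    """Iterative auxiliary-model IV estimate of (a1, b0) in
    y_t = -a1 y_{t-1} + b0 u_{t-1} + noise (no prefiltering)."""
    phi = np.column_stack([-y[:-1], u[:-1]])          # regressors for t = 1..N-1
    target = y[1:]
    theta = np.linalg.lstsq(phi, target, rcond=None)[0]  # initial (biased) least squares
    for _ in range(iterations):
        x_hat = simulate_tf1(theta[0], theta[1], u)      # auxiliary model output
        zeta = np.column_stack([-x_hat[:-1], u[:-1]])    # instrumental variables
        theta = np.linalg.solve(zeta.T @ phi, zeta.T @ target)
    return theta

# Hypothetical experiment: true a1 = -0.8, b0 = 0.5 (steady-state gain 2.5)
rng = np.random.default_rng(3)
u = rng.standard_normal(2000)
x = simulate_tf1(-0.8, 0.5, u)
y = x + 0.5 * rng.standard_normal(len(u))            # additive measurement noise
a1_hat, b0_hat = iv_tf1(y, u)
```

The instruments are built from the input alone, so they are correlated with the noise-free part of the regressors but not with the noise. This independence is what removes the bias.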
In the rest of this paper, Section 2 introduces the stochastic state space modelling framework, whilst Section 3 reviews the various TVP and SDP model forms. By contrast, Section 4 of the paper considers the special case of TF model estimation, where the state space approach is avoided because other superior methods are available. A brief overview of the toolbox functionality and application areas is given in Section 5, followed in Section 6 by several recent examples utilising the toolbox for the analysis of environmental time series data. Finally, the conclusions are presented in Section 7.
Stochastic state space methodology
The stochastic state space approach to time series analysis and modelling has been developed by researchers in many different scientific disciplines and is, perhaps, one of the most natural and convenient approaches for use with computers (for recent examples, see Bueso et al., 2005, Zolghadri et al., 2004, Zolghadri and Cazaurang, in press). For this reason, many of the models in the toolbox are implemented in such a form. In fact, a number of models that are uniquely available in Captain
Model structures: unobserved components
UC models in Captain can be synthesised by the following general discrete-time equation:

$$y_t = T_t + C_t + S_t + f(\mathbf{u}_t) + N_t + e_t \qquad (6)$$

where $y_t$ is the observed time series; $T_t$ is a trend or low frequency component; $C_t$ is a sustained cyclical or quasi-cyclical component (e.g. a diurnal cycle caused by biological activity) with period different from that of any seasonality in the data; $S_t$ is a seasonal component (e.g. annual seasonality); $f(\mathbf{u}_t)$ captures the influence of a vector of exogenous variables
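In a state space setting, each component of the UC model (6) can be given its own small state space model. The components are then stacked into one block-diagonal system whose observation vector adds them together. The sketch below assembles an IRW trend and a single cycle, and checks that the noise-free output equals their sum. Two assumptions apply. First, the cycle is written in Harvey's trigonometric (rotation) form, which is a swapped-in choice; as we understand it, Captain's dynamic harmonic regression instead lets harmonic coefficients evolve as GRW processes. Second, the seasonal, exogenous and noise terms are omitted. The function names are hypothetical.

```python
import numpy as np

def block_diag(*blocks):
    """Minimal block-diagonal helper (avoids a SciPy dependency)."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def uc_state_space(period):
    """Transition matrix F and observation vector H for trend + one cycle."""
    F_trend = np.array([[1.0, 1.0], [0.0, 1.0]])      # IRW trend: [level, slope]
    w = 2.0 * np.pi / period                          # cycle frequency (rad/sample)
    F_cycle = np.array([[np.cos(w), np.sin(w)],
                        [-np.sin(w), np.cos(w)]])     # rotation: sustained cycle
    F = block_diag(F_trend, F_cycle)
    H = np.array([1.0, 0.0, 1.0, 0.0])                # y_t = T_t + C_t
    return F, H

def simulate(F, H, x0, n):
    """Noise-free propagation of the stacked state and its observation."""
    x = np.asarray(x0, dtype=float)
    y = np.empty(n)
    for t in range(n):
        x = F @ x
        y[t] = H @ x
    return y

F, H = uc_state_space(period=12.0)
y = simulate(F, H, x0=[10.0, 0.1, 1.0, 0.0], n=48)
```

Adding disturbance and irregular terms to this stacked system gives the stochastic UC model to which KF/FIS estimation can then be applied.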
Transfer Function (TF) models
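A special case of the UC model (6) is:

$$y_t = f(\mathbf{u}_t) + \xi_t$$

where the main effect on $y_t$ arises from the exogenous inputs $\mathbf{u}_t$. In the stationary time series case, a specific but practically useful alternative form of this model is the time invariant parameter, multi-input, single output TF model, the discrete-time version of which takes the form:

$$y_t = \frac{B_1(z^{-1})}{A(z^{-1})}\,u_{1,t-\delta_1} + \cdots + \frac{B_m(z^{-1})}{A(z^{-1})}\,u_{m,t-\delta_m} + \xi_t$$

where $y_t$ is the output; $u_{j,t}$, $j = 1, \ldots, m$, are $m$ input variables that are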
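A noise-free multi-input TF model can be simulated directly from its difference equation. Each input's steady-state gain is then its numerator polynomial divided by the denominator polynomial, with both evaluated at z = 1. The sketch below assumes a common denominator A(z^{-1}) shared by all inputs and a pure time delay δ_i per input. Separate denominators per input are an equally standard variant. The function names and coefficient values are hypothetical.

```python
import numpy as np

def simulate_miso_tf(a, bs, delays, inputs):
    """Noise-free response of A(z^-1) y_t = sum_i B_i(z^-1) u_{i,t-delta_i}.

    a: denominator coefficients [1, a_1, ..., a_n], shared by every input
    bs: one numerator coefficient list [b_i0, b_i1, ...] per input
    delays: pure time delay delta_i (in samples) per input
    inputs: array of shape (N, m), one column per input
    """
    N = inputs.shape[0]
    y = np.zeros(N)
    for t in range(N):
        acc = 0.0
        for i, (b, d) in enumerate(zip(bs, delays)):
            for j, bj in enumerate(b):                 # B_i(z^-1) u_{i,t-d}
                if t - d - j >= 0:
                    acc += bj * inputs[t - d - j, i]
        for j in range(1, len(a)):                      # move A(z^-1) lags to the right-hand side
            if t - j >= 0:
                acc -= a[j] * y[t - j]
        y[t] = acc / a[0]
    return y

def steady_state_gains(a, bs):
    """Steady-state gain B_i(1) / A(1) for each input."""
    return [sum(b) / sum(a) for b in bs]

a = [1.0, -0.7]
bs = [[0.3], [0.1, 0.05]]
delays = [1, 3]
steps = np.ones((200, 2))        # unit steps applied to both inputs at t = 0
y = simulate_miso_tf(a, bs, delays, steps)
gains = steady_state_gains(a, bs)
```

Here the gains are 1.0 and 0.5, so the response to the two unit steps settles at 1.5. Steady-state gains and time constants of this kind are the quantities usually examined when seeking a physically meaningful interpretation of an estimated TF model.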
Toolbox overview
Some of the estimation algorithms considered here were developed originally in the 1960s, 1970s and 1980s for the CAPTAIN and microCAPTAIN time series analysis and forecasting packages (MS-DOS based). The associated optimisation algorithms were developed later, in the 1980s and 1990s, and are used in the now obsolescent microCAPTAIN program (Young and Benner, 1991). However, the Matlab® implementation is far more flexible and wide-ranging than these previous implementations, including many algorithms not
Worked examples
The present section briefly considers four practical, environmentally-related applications, together with one simulation example, deliberately chosen to illustrate several areas of toolbox functionality. Note that the analysis is largely based on typical or default settings for the Captain functions utilised; numerous additional options are described in more detail by Pedregal et al. (2004).
Conclusions
This paper has described a Matlab® compatible toolbox, Captain, that has evolved from many years of research using Data-Based Mechanistic (DBM) models for the analysis of natural and man-made environmental systems (as well as systems in other areas of study from engineering to economics). This research has introduced a wide range of modelling tools, encompassing various model structures and identification algorithms, now fully implemented in the toolbox.
Essentially, the toolbox is a collection
Acknowledgement
The authors are grateful for the support of the Engineering and Physical Sciences Research Council (EPSRC).
References (54)
- A study on sensitivity of spatial sampling designs to a priori discretization schemes. Environmental Modelling and Software (2005).
- Efficient tests for normality, homoskedasticity and serial independence of regression residuals. Economics Letters (1980).
- Data-Based Mechanistic Modelling (DBM) and Control of Mass and Energy transfer in agricultural buildings. Annual Reviews in Control (1999).
- The instrumental variable method: a practical approach to identification and system parameter estimation.
- Data-based mechanistic modelling of environmental, ecological, economic and engineering systems. Environmental Modelling and Software (1998).
- Data-based mechanistic modelling, generalised sensitivity and dominant mode analysis. Computer Physics Communications (1999).
- Development of an operational model-based warning system for tropospheric ozone concentrations in Bordeaux, France. Environmental Modelling and Software (2004).
- A new look at the statistical model identification. IEEE Transactions on Automatic Control (1974).
- Detecting level shifts in time series. Journal of Business and Economic Statistics (1993).
- Time Series Analysis, Forecasting and Control (1994).
- Applied Optimal Control: Optimization, Estimation and Control.
- The problem of the Nile: conditional solution to a change-point problem. Biometrika.
- Time Series Analysis by State Space Methods.
- Forecasting, Structural Time Series Models and the Kalman Filter.
- Air patterns and turbulence in an experimental livestock building. Journal of Agricultural Engineering Research.
- Recursive filtering and the inversion of ill-posed causal problems. Utilitas Mathematica.
- A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering.
- Optimal Estimation, Identification and Control.
- On a measure of lack of fit in time series models. Biometrika.
- Optimal nonstationary estimation of the parameters of a linear system with Gaussian inputs. Journal of Electronics and Control.
- Proportional-Integral-Plus (PIP) control of non-linear systems. Systems Science (Warszawa, Poland).
- Recursive estimation and forecasting of nonstationary time series. Journal of Forecasting.
- Optimal smoothing in the identification of linear time-varying systems. Proceedings of the Institution of Electrical Engineers.
- Identification by optimal smoothing using integrated random walks. Proceedings of the Institution of Electrical Engineers.
- An Introduction to Identification.
- System Identification, Time Series Analysis and Forecasting.
- The Captain Toolbox Handbook.