Elsevier

Geoderma Regional

Volume 10, September 2017, Pages 39-47
Evaluating regression-kriging for mid-infrared spectroscopy prediction of soil properties in western Kenya

https://doi.org/10.1016/j.geodrs.2017.04.003

Highlights

  • Predictions of aluminium, copper, iron, zinc and boron were improved using the hybrid method.

  • Spherical and exponential variogram models gave the lowest MSPE in most cases.

  • Soil carbon content was well predicted using the PLS regression method only.

  • Approach accounts for residual spatial correlation.

Abstract

In this study, the utility of regression-kriging was investigated in building prediction models for soil properties using mid-infrared (7498 to 600 cm⁻¹) spectral data for soil samples collected from the Nyando, Nzoia and Yala catchment areas in Kenya, sampled at 0–20 cm and 20–50 cm depths. Using a systematic technique, 158 samples were selected for analysis of a number of soil properties of interest using wet chemistry methods. We randomly divided the dataset into two groups: 118 samples in the calibration set and 40 samples in the holdout validation set. The calibration set was first used to develop partial least squares (PLS) regression models for all the soil properties. Residuals from these models were used to generate semivariograms, which revealed a strong spatial dependence, as determined by the ratio of nugget to sill, for nitrogen (9%), Al (12%) and B (36%), but weak spatial dependence for exchangeable Ca (ExCa; 100%) and carbon (76%). The fitted theoretical semivariograms were used to fit regression-kriging models. Lastly, both the PLS and regression-kriging models were assessed with the validation set and their prediction performance evaluated by R² and root mean square error (RMSE). The results showed that the regression-kriging method gave lower RMSE values for all the evaluated soil properties except for ExCa, B and exchangeable acidity, with the best predictions, compared with the PLS model, obtained for ExMg (R², 0.93 vs 0.88; RMSE, 6.1 vs 8.4 cmolc kg⁻¹) and total nitrogen (R², 0.92 vs 0.74; RMSE, 0.11% vs 0.2%). In this study, regression-kriging, which takes into account spatial variation normally ignored by other methods, improved the use of infrared spectroscopy for predicting soil properties.
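The two validation statistics named in the abstract can be sketched as follows. This is a minimal illustration, assuming R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not say which definition was used, and the function names and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical holdout values for illustration only (not data from the study).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
print("RMSE:", rmse(observed, predicted))
print("R2:", r_squared(observed, predicted))
```

Applying both functions to the same holdout set for the PLS and the regression-kriging predictions would give the kind of paired comparison reported above.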

Introduction

Soil is a multifunctional and complex medium providing ecosystem services such as the production of food, fiber and fuel, provision of habitat, water cycling and climate regulation. Ecosystem services depend on soil function, provided by soil minerals, organic matter, different types of organisms, as well as varying amounts of air and water. Good soil management, required to maintain ecosystem services, requires accurate information on the status of soil properties, such as nutrient supply capacity. Since the 1940s, soil testing has been a routine practice for assessing soil quality and determining fertilizer requirements (Schoenholtz et al., 2000). However, traditional methods for quantifying soil chemical and physical properties are expensive and slow (Ludwig et al., 2008). In addition, the analytical methods generate toxic wastes that must be properly disposed of (Carter, 2006).

Alternative methods for soil analysis have been developed and in particular infrared diffuse reflectance spectroscopy has been proposed for rapid, low cost determination of soil properties (Nguyen et al., 1991, Nocita et al., 2015, Pirie et al., 2005, Viscarra Rossel et al., 2006). A necessary step in using spectroscopic methods is analysis of the complex data acquired. Quantitative analysis of infrared spectral data has relied heavily on the use of chemometric techniques to construct appropriate calibration models to establish relationships between concentrations of physical and chemical components measured using reference methods and the spectral data.

Both the near infrared (NIR; 25,000–4000 cm⁻¹) and mid infrared (MIR; 4000–400 cm⁻¹) regions of the electromagnetic spectrum were investigated for soil C quantification by Dalal and Henry (1986) using multiple linear regression with only three absorption wavelengths as predictor variables. Since then, different statistical analysis methods have been used, including PLS regression (Awiti et al., 2008, Brown et al., 2006, Höskuldsson, 1988, Janik et al., 2007, Wold et al., 2001), principal component regression (PCR) (Jahn et al., 2006, Knox et al., 2015, Linker et al., 2005) and boosted regression trees (BRT) (Brown et al., 2006). Partial least squares (PLS) regression is the most commonly used calibration method because of its simplicity and ease of use. However, PLS is essentially a linear technique and may lead to limited prediction accuracy where nonlinearity is present, for instance due to deviations from the Beer-Lambert law, nonlinear detector responses, drift in the light source, and interaction between analytes (Benoudjit et al., 2004). Although weak nonlinearity may be partially compensated for by using additional latent variables, there is a danger of overfitting (Viscarra Rossel et al., 2006).
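The trade-off between adding latent variables and overfitting can be illustrated with a small sketch. It assumes a single-response PLS (PLS1) fitted with a NIPALS-style loop written from scratch in NumPy; the paper does not state which algorithm or software was used, and the function names and the synthetic "spectra" below are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) with a NIPALS-style deflation loop."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X and y before the next component
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in original X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

# Synthetic data (an assumption for illustration): 20 "bands" driven by 3 latent factors.
rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 3))
X = scores @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(60, 20))
y = scores @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=60)
X_cal, y_cal, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

for k in range(1, 9):
    model = pls1_fit(X_cal, y_cal, k)
    rmse_cal = np.sqrt(np.mean((pls1_predict(model, X_cal) - y_cal) ** 2))
    rmse_val = np.sqrt(np.mean((pls1_predict(model, X_val) - y_val) ** 2))
    print(k, round(rmse_cal, 3), round(rmse_val, 3))
```

The calibration error cannot increase as components are added, so a holdout or cross-validated error is the quantity to watch when choosing the number of latent variables.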

Although satisfactory results for use of infrared spectroscopy to predict soil properties have been reported in many studies, spatial dependency among soil samples used in infrared spectroscopy studies has received less attention. PLS regression models include a residual term that is assumed to be independently and identically distributed. For this assumption to hold, however, the values of a soil property Z measured on different samples should be independent of each other, which is needed to guarantee optimality of the prediction model.
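The assumption can be written compactly. The notation here is introduced for illustration only and is not taken from the paper: z(s_i) is the soil property at location s_i, x_i the vector of spectral predictors, and C a covariance function of separation distance.

```latex
% Regression with independent, identically distributed residuals
% (the assumption behind a plain regression calibration)
z(\mathbf{s}_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
\qquad \operatorname{E}[\varepsilon_i] = 0,
\qquad \operatorname{Var}(\varepsilon_i) = \sigma^2,
\qquad \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad (i \neq j)

% Spatially dependent residuals violate the last condition:
% the covariance depends on the distance between sampling locations
\operatorname{Cov}(\varepsilon_i, \varepsilon_j)
  = C\!\left(\lVert \mathbf{s}_i - \mathbf{s}_j \rVert\right) \neq 0
```

When the second form holds, the residuals still carry information that a purely spectral model leaves unused.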

Past research has shown that many soil properties in agricultural fields exhibit different degrees of spatial dependence. For example, Olea (1994, 2006) collected soil samples on two regular grids in Iowa, USA: one in a conventional field (320 × 30 m) and the other in an organic field (290 × 46 m), each with a cell size of 15 × 15 m. They found strong spatial correlation for soil pH, exchangeable Ca, total organic C, and total N in the conventional field, and strong spatial correlation for P and Mg in the organic field. Iqbal et al. (2005) determined the degree of spatial variability of soil physical properties of alluvial floodplain soils in Mississippi. A total of 209 soil profiles, sampled at a mean distance of 79.4 m, were each subdivided into three depths: surface, subsurface and deep horizons. They found that soil physical properties (including bulk density, sand, clay, water content, etc.) showed medium to strong degrees of spatial dependence at all depths. However, few infrared spectroscopy studies have attempted to investigate the effect of spatial dependence among soil samples when developing prediction models. Odlare et al. (2005) suggested that combining infrared spectroscopy and geostatistics can reveal spatial soil variation and thereby replace the more conventional, laborious and expensive soil analyses.

Geostatistics has been widely used to account for spatial correlation among samples and has been incorporated into soil property prediction models using other auxiliary variables (Odlare et al., 2005). The hybrid method is known as regression-kriging, as distinct from standard regression methods (e.g., PCR, PLS and BRT) or kriging models alone. Regression-kriging combines a regression of a response variable against independent variable(s) with kriging of the regression residuals. For example, Odeh et al. (1995) compared several methods involving spatial prediction of soil properties from landform attributes using carefully designed validation procedures. Methods including multiple linear regression (MLR), isotopic co-kriging and heterotopic co-kriging were tested against ordinary kriging and universal kriging. The authors found that regression-kriging methods generally performed better than MLR and plain kriging. Hengl et al. (2004) proposed a generic framework for spatial prediction of soil variables based on regression-kriging, where the target variables were first fitted with stepwise regression and the residuals interpolated with kriging. The framework was tested using a total of 135 soil profiles observed in Croatia, which were divided into calibration (nc = 100) and validation (nv = 35) sets. In their work, three soil properties (organic matter, pH and topsoil thickness) were predicted from six relief parameters and nine soil mapping units. The authors found that their approach gave a lower relative root mean square error than ordinary kriging for organic matter (53.3% versus 66.5%) and for topsoil thickness (66.6% versus 83.3%).
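The two-step structure of regression-kriging can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the procedure of any cited study: ordinary least squares stands in for the regression step (the present study used PLS), the residuals are interpolated by ordinary kriging, and the exponential variogram parameters are assumed rather than fitted. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def exponential_variogram(h, nugget, partial_sill, range_param):
    """Exponential semivariogram with gamma(0) = 0 and a nugget jump for h > 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-h / range_param))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(coords, values, targets, variogram):
    """Ordinary kriging of `values` observed at `coords` onto `targets`."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system; the extra row and column hold the Lagrange multiplier
    # that forces the weights to sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    d0 = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=2)
    b = np.ones((n + 1, len(targets)))
    b[:n, :] = variogram(d0)
    weights = np.linalg.solve(A, b)[:n, :]
    return weights.T @ values

def regression_kriging(X_cal, y_cal, s_cal, X_new, s_new, variogram):
    # Step 1: regression on the predictors (OLS used here as a stand-in).
    A = np.column_stack([np.ones(len(X_cal)), X_cal])
    beta, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    residuals = y_cal - A @ beta
    # Step 2: interpolate the regression residuals in space.
    kriged_residuals = ordinary_kriging(s_cal, residuals, s_new, variogram)
    # Step 3: add the regression prediction and the kriged residual.
    trend_new = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    return trend_new + kriged_residuals

# Synthetic example (all values are assumptions for illustration).
rng = np.random.default_rng(1)
s_cal = rng.uniform(0, 1, size=(30, 2))
X_cal = rng.normal(size=(30, 3))
y_cal = X_cal @ np.array([0.5, -1.0, 2.0]) + np.sin(4 * s_cal[:, 0])
s_new = rng.uniform(0, 1, size=(5, 2))
X_new = rng.normal(size=(5, 3))
vg = lambda h: exponential_variogram(h, nugget=0.01, partial_sill=0.5, range_param=0.3)
print(regression_kriging(X_cal, y_cal, s_cal, X_new, s_new, vg))
```

In practice the variogram parameters would be estimated from the residuals rather than fixed in advance, and a dedicated geostatistics package would normally replace the hand-written kriging solver.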

Hengl et al. (2007) comprehensively discussed regression-kriging from first principles, including the equations used and three case studies. For instance, in a case study of mapping land surface temperature, the initial deterministic model accounted for 56.4% of the variation in the data used to model temperature, but calibrating the model by kriging the residuals accounted for up to 95% of the total variation. However, they also note that the performance of regression-kriging can be limited by the quality of the data being used and by predictors that have an uneven relation to the target variables. Furthermore, the analyst must carry out various steps in different statistical and geographical information system (GIS) software environments.

Among the different methods of spatial interpolation of soil properties, inverse distance weighting and ordinary kriging are the most common (Walvoort and Gruijter, 2001). From a theoretical standpoint, kriging is the optimal interpolation method (Oliver and Webster, 1990); however, its correct application requires an accurate determination of the spatial structure via semivariogram construction and model fitting. Generally, at least 50 to 100 samples might be required to obtain a reliable semivariogram that correctly describes spatial structure (Guo et al., 2001, Olea, 1994, Olea, 2006). Most studies have examined spatial variation of soil properties at plot or field scale (< 100 ha area), but there are fewer studies at landscape or watershed scale.
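Semivariogram construction and model fitting can be sketched as below. The sketch assumes the classical (Matheron) estimator, the standard spherical and exponential models, and a crude grid search in place of a proper fitting routine; none of these choices is confirmed by the text, and the function names, bin edges and synthetic data are hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Matheron estimator: half the mean squared difference per distance bin."""
    i, j = np.triu_indices(len(values), k=1)          # all unique sample pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    lags, gammas = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        if in_bin.any():                               # skip empty bins
            lags.append(dist[in_bin].mean())
            gammas.append(0.5 * sq_diff[in_bin].mean())
    return np.array(lags), np.array(gammas)

def spherical(h, nugget, partial_sill, range_param):
    """Spherical model for h > 0; reaches the sill exactly at the range."""
    ratio = np.minimum(np.asarray(h, dtype=float) / range_param, 1.0)
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

def exponential(h, nugget, partial_sill, range_param):
    """Exponential model for h > 0; approaches the sill asymptotically."""
    h = np.asarray(h, dtype=float)
    return nugget + partial_sill * (1.0 - np.exp(-h / range_param))

def fit_by_grid(model, lags, gammas, nuggets, partial_sills, ranges):
    """Crude least-squares fit over a parameter grid: (sse, nugget, psill, range)."""
    best = None
    for n0 in nuggets:
        for c in partial_sills:
            for a in ranges:
                sse = float(np.sum((model(lags, n0, c, a) - gammas) ** 2))
                if best is None or sse < best[0]:
                    best = (sse, n0, c, a)
    return best

def nugget_to_sill_percent(nugget, partial_sill):
    """Nugget as a percentage of the total sill (nugget + partial sill)."""
    return 100.0 * nugget / (nugget + partial_sill)

# Synthetic residuals with a smooth spatial pattern (illustration only).
rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(100, 2))
values = np.sin(coords[:, 0] / 2.0) + 0.2 * rng.normal(size=100)
lags, gammas = empirical_semivariogram(coords, values, np.linspace(0, 6, 13))
grid = (np.linspace(0, 0.2, 5), np.linspace(0.1, 1.0, 10), np.linspace(0.5, 6, 12))
for name, model in (("spherical", spherical), ("exponential", exponential)):
    sse, n0, c, a = fit_by_grid(model, lags, gammas, *grid)
    print(name, round(sse, 4), round(nugget_to_sill_percent(n0, c), 1))
```

With few sample pairs per bin the estimates become noisy, which is one way to see why a minimum number of samples is recommended before a semivariogram is trusted.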

This study aimed to assess the potential of coupling partial least squares (PLS) regression and residual kriging methods for improving prediction of soil properties from georeferenced Fourier-transform mid-infrared diffuse reflectance (MIR) data. The study was conducted in western Kenya, where spatial variability in soil properties is large due to soil type and management effects (Tittonell et al., 2013).

Study area

The 20,000 km² study area is located in three river basins of western Kenya, namely, Nyando (3550 km²), Yala (3363 km²) and Nzoia (12,984 km²), which together support a population of 7 million people. It is estimated that about 75% of the area within these basins is an agro-ecosystem. The study area had been subdivided into 9 focal areas (FAs) each measuring 100 km², but the data used in this study are for 5 FAs. Elevation zones stratified the FAs, categorizing the study areas as: lowlands (1334–1440

Results and discussion

The soils collected from the five sites showed a wide range in the measured soil properties (Table 1). Soil pH ranged from strongly acidic to strongly alkaline (pH 4.0–9.0). A soil pH below 4.5 indicates a high availability of exchangeable Al, which is toxic to plants (Delhaize and Ryan, 1995). Organic carbon ranged from very low to very high values (0.2–7.2) for agricultural soils. The high values indicate some of the sampling areas were from recently cleared forests or farms where

Conclusion

In this study, PLS and residual regression-kriging methods were evaluated together for improved soil property predictions using MIR spectral data from test sites in western Kenya. Although MIR is able to predict fundamental soil properties related to the soil matrix, like organic and total carbon, and other properties related to mineral components, like K, there are other macro- and micronutrients whose MIR predictions require other calibration mechanisms to be sought. In our study, Zn, ESP and Al

Acknowledgments

The authors would like to thank the team working on the Western Kenya Integrated Ecosystem Program (WKIEMP) from the Kenya Agricultural Research Institute (KARI) and the World Agroforestry Centre (ICRAF) (Grant Number TF054250-KE) for supporting this work. In particular, we acknowledge the valuable work of Luka Anjeho and his field crew in sample collection, and of Elvis Weullow, Dickens Ateku and Jane Mwangi for their help with spectroscopic measurements.

References (51)

  • M. Nocita et al. Soil spectroscopy: an alternative to wet chemistry for soil monitoring
  • I.O.A. Odeh et al. Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging. Geoderma (1995)
  • M. Odlare et al. Near infrared reflectance spectroscopy for assessment of spatial soil variation in an agricultural field. Geoderma (2005)
  • T. Robinson et al. Testing the performance of spatial interpolation techniques for mapping soil properties. Comput Electron Agric (2006)
  • S. Schoenholtz et al. A review of chemical and physical properties as indicators of forest soil quality: challenges and opportunities. For Ecol Manage (2000)
  • T.-G. Vågen et al. Mapping of soil properties and land degradation risk in Africa using MODIS reflectance. Geoderma (2016)
  • R. Viscarra Rossel et al. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma (2006)
  • S. Wold et al. PLS-regression: a basic tool of chemometrics
  • R.S. Bivand et al. Applied Spatial Data Analysis with R
  • C.A. Cambardella et al. Spatial analysis of soil fertility parameters. Precis. Agric. (1999)
  • Carter. Soil sampling and methods of analysis. Measurement (2006)
  • R.C. Dalal et al. Simultaneous determination of moisture, organic carbon, and total nitrogen by near infrared reflectance spectrophotometry. Soil Sci Soc Am J (1986)
  • E. Delhaize et al. Aluminum toxicity and tolerance in plants. Plant Physiol (1995)
  • P.J. Diggle. Model-based geostatistics by Diggle, P. J. and Ribeiro, P. J. Biometrics (2008)