
2017 | OriginalPaper | Book chapter

A Review of Heteroscedasticity Treatment with Gaussian Processes and Quantile Regression Meta-models

Written by: Francisco Antunes, Aidan O’Sullivan, Filipe Rodrigues, Francisco Pereira

Published in: Seeing Cities Through Big Data

Publisher: Springer International Publishing

Abstract

For regression problems, the general practice is to assume a constant variance of the error term across all data. This simplifies an often complicated model and relies on the assumption that the error is independent of the input variables, a property known as homoscedasticity. In practice, however, this assumption is often naive, as we are rarely able to include all true explanatory variables in a regression. While Big Data is bringing new opportunities for regression applications, ignoring this limitation may lead to biased estimators and inaccurate confidence and prediction intervals.
This paper studies the treatment of non-constant variance in regression models, also known as heteroscedasticity. We apply two methodologies: (1) integrating the conditional variance within the regression model itself, and (2) treating the regression model as a black box and using a meta-model that analyzes the error separately. We compare the performance of both approaches using two heteroscedastic data sets.
Although accounting for heteroscedasticity in data increases the complexity of the models used, we show that it can greatly improve the quality of the predictions, and more importantly, it can provide a proper notion of uncertainty or “confidence” associated with those predictions. We also discuss the feasibility of the solutions in a Big Data context.
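
The second methodology in the abstract (a black-box regression plus a meta-model on its error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the chapter: the synthetic data, the coefficient values, the function names `pinball_loss` and `fit_linear_quantile`, and the choice of a plain linear quantile regression fitted by iteratively reweighted least squares as a stand-in for the quantile regression meta-models the chapter reviews.

```python
import numpy as np


def pinball_loss(residual, tau):
    """Mean pinball (check) loss, the objective minimized by quantile regression."""
    return float(np.mean(np.maximum(tau * residual, (tau - 1.0) * residual)))


def fit_linear_quantile(x, target, tau, n_iter=100, eps=1e-6):
    """Approximate a linear tau-quantile fit by iteratively reweighted least squares."""
    design = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(design, target, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = target - design @ beta
        # These weights make the weighted squared loss mimic the pinball loss.
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        weighted = design * w[:, None]
        beta = np.linalg.solve(design.T @ weighted, design.T @ (w * target))
    return beta  # [intercept, slope]


# Synthetic heteroscedastic data (assumed): noise standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=2000)
y = 1.5 * x + rng.normal(0.0, 0.2 + 0.3 * x)

# Step 1: the "black box" regression model, here ordinary least squares.
black_box = np.polyfit(x, y, deg=1)
y_hat = np.polyval(black_box, x)

# Step 2: the meta-model fits lower and upper quantiles of the residuals against x.
residuals = y - y_hat
beta_lo = fit_linear_quantile(x, residuals, tau=0.05)
beta_hi = fit_linear_quantile(x, residuals, tau=0.95)

# Step 3: input-dependent nominal 90% prediction interval around the black-box output.
lower = y_hat + beta_lo[0] + beta_lo[1] * x
upper = y_hat + beta_hi[0] + beta_hi[1] * x
coverage = float(np.mean((y >= lower) & (y <= upper)))
print(f"empirical coverage of the nominal 90% interval: {coverage:.3f}")
```

Because the meta-model only sees the residuals, the black-box model in step 1 could be swapped for any other regressor without changing steps 2 and 3. The interval widens with `x` here, which a constant-variance model could not express.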


Metadata
Title
A Review of Heteroscedasticity Treatment with Gaussian Processes and Quantile Regression Meta-models
Written by
Francisco Antunes
Aidan O’Sullivan
Filipe Rodrigues
Francisco Pereira
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-40902-3_9
