2021 | OriginalPaper | Chapter

Sibling Regression for Generalized Linear Models

Abstract

Field observations form the basis of many scientific studies, especially in ecological and social sciences. Despite efforts to conduct such surveys in a standardized way, observations can be prone to systematic measurement errors. The removal of systematic variability introduced by the observation process, if possible, can greatly increase the value of this data. Existing non-parametric techniques for correcting such errors assume linear additive noise models. This leads to biased estimates when applied to generalized linear models (GLM). We present an approach based on residual functions to address this limitation. We then demonstrate its effectiveness on synthetic data and show it reduces systematic detection variability in moth surveys.
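
The linear additive noise setting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the half-sibling setup of Schölkopf et al. (2015): the target series is the signal of interest plus a systematic error, and "sibling" series share that error but are independent of the signal. The data, coefficients, and the function name `sibling_regression_additive` are hypothetical, and this is only the additive baseline, not the residual-function approach for GLMs that the chapter proposes.

```python
import numpy as np

def sibling_regression_additive(y, siblings):
    """Subtract from y the part that is linearly predictable from the siblings.

    Assumes an additive noise model: y = signal + g(noise), where the siblings
    depend on the same noise but are independent of the signal.
    """
    # Design matrix with an intercept column
    X = np.column_stack([np.ones(len(y)), siblings])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # The residual estimates the signal up to an additive constant
    return y - X @ coef

# Synthetic illustration (all values below are made up for the example)
rng = np.random.default_rng(0)
n = 2000
noise = rng.normal(size=n)    # shared systematic observation effect
signal = rng.normal(size=n)   # quantity of interest, independent of the noise
y = signal + 2.0 * noise      # target observed with additive systematic error
siblings = np.column_stack([
    1.5 * noise + 0.1 * rng.normal(size=n),
    -0.7 * noise + 0.1 * rng.normal(size=n),
])

denoised = sibling_regression_additive(y, siblings)
corr_before = np.corrcoef(y, signal)[0, 1]
corr_after = np.corrcoef(denoised, signal)[0, 1]
print(f"correlation with signal: before={corr_before:.2f}, after={corr_after:.2f}")
```

Subtracting a fitted conditional mean is only justified when the systematic effect enters additively. In a GLM with a non-identity link, such as a log-link count model where the effect acts multiplicatively on the mean, this subtraction no longer isolates the signal, which is the source of bias the abstract describes.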

Footnotes
1. These properties can be obtained by applying the aforementioned exponential family properties.
2. Details in the Appendix.
4. The global model is included as a rough guide for the best possible generalization performance, even though it does not solve the task of denoising data within each year.

Metadata
Title
Sibling Regression for Generalized Linear Models
Authors
Shiv Shankar
Daniel Sheldon
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_48
