What weather variables are important in predicting heat-related mortality? A new application of statistical learning methods
Introduction
Heat waves are projected to become more frequent, more intense and longer lasting as a consequence of climate change (Meehl and Tebaldi, 2004). Epidemiological studies have shown that heat waves are associated with elevated risk of mortality, hospital admissions, heat stroke, heat exhaustion, and cardiovascular and respiratory diseases (Kovats and Hajat, 2007). Previous heat-related epidemiological studies have characterized heat or heat waves by using a single temperature metric (e.g., daily mean/minimum/maximum temperature), a composite index combining temperature and relative humidity, or a more sophisticated index requiring substantial meteorological knowledge (e.g., spatial synoptic classification) (Hajat et al., 2010; Barnett et al., 2010). However, these weather metrics may not characterize human exposure to extreme heat very well, since biometeorological studies have shown that human body temperature is related to many weather variables, e.g., temperature, relative humidity, solar radiation, barometric pressure and wind speed (Steadman, 1979a, 1979b, 1984). Also, people usually spend the majority of their time indoors; e.g., Americans spend 86.9% of their time indoors on average (Klepeis et al., 2001). Some weather variables (e.g., absolute humidity) penetrate indoors better than others. Moreover, several metrics are typically used for each weather variable mentioned above, e.g., daily mean, minimum, and maximum temperature, and no consensus exists on which measure of temperature has the most influence on mortality. Two likely reasons are that there is no single best measure and that temperature alone is not sufficient to characterize heat exposure. This lack of consensus, together with differences in culture, housing and exposure across regions and populations, makes studies difficult to compare and contributes to inconsistencies in the reported heat-health associations.
Identifying which variables are most consistently predictive of health outcomes across multiple cities could aid epidemiologic research. Furthermore, identifying the local weather conditions most predictive of heat-related mortality could inform design of heat wave and heat health warning systems by reducing the number of triggering metrics considered. Such information may guide local public and weather service authorities to more effectively mobilize resources to prevent adverse health effects during hot weather.
A small number of studies have examined the performance of different weather-related exposure metrics in estimating heat–mortality relationships; we describe two here. A multi-city study examined the performance of mean, minimum and maximum temperature with and without humidity, and of apparent temperature and the Humidex (a function of temperature and relative humidity), in predicting mortality using mortality and weather data from 107 U.S. cities during 1987–2000 (Barnett et al., 2010). The measure of temperature most associated with mortality varied by city, season and age group, but on average the different temperature measures had the same predictive ability. Another multi-city study evaluated maximum temperature, dew point temperature and a few combinations of these two variables in 105 U.S. cities during 1987–2005 (Bobb et al., 2011). That study reported that the best temperature measure varied by city.
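As an illustration of what a composite temperature-humidity index looks like in practice, the following minimal sketch computes the Humidex from air temperature and relative humidity. It assumes the commonly cited definition (air temperature plus 5/9 of the vapor pressure in hPa minus 10) and a Magnus-type approximation for saturation vapor pressure; the function names are hypothetical, and the exact formulation used in the cited studies may differ.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over water in hPa.

    Assumption: Magnus-type approximation with Bolton (1980) coefficients.
    """
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def humidex(temp_c, rel_humidity_pct):
    """Humidex from air temperature (deg C) and relative humidity (%).

    Assumption: the commonly cited definition T + 5/9 * (e - 10),
    where e is the actual vapor pressure in hPa.
    """
    vapor_pressure = rel_humidity_pct / 100.0 * saturation_vapor_pressure_hpa(temp_c)
    return temp_c + (5.0 / 9.0) * (vapor_pressure - 10.0)


# Illustrative inputs only, not values from the study.
hot_dry = humidex(30.0, 50.0)
hot_humid = humidex(30.0, 80.0)
print(f"Humidex at 30 C, 50% RH: {hot_dry:.1f}")
print(f"Humidex at 30 C, 80% RH: {hot_humid:.1f}")
```

At a fixed air temperature the index rises with humidity, which is why such indices can rank days differently from temperature alone.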
These studies used either temperature predictors or temperature-humidity indices within a regression framework, and did not examine additional weather conditions simultaneously (e.g., absolute humidity and barometric pressure). Also, the generalized linear models (GLMs) and generalized additive models (GAMs) used in these prior studies do not account for high-order interactions among covariates. Our prior work proposed a hybrid clustering method to classify potentially ‘dangerous’ heat based on four daily weather conditions: maximum/minimum temperature and maximum/minimum dew point (Zhang et al., 2012). Yet even that approach did not take many weather variables into consideration simultaneously. As with the study of multi-pollutant mixtures, properly accounting for the multiple weather conditions to which humans are exposed is a challenge for assessing heat-related health effects.
This study aims to evaluate many weather conditions simultaneously and to identify the weather variables that are most important in predicting excess death counts associated with hot weather, judged by their prediction performance. The analysis takes advantage of a recent advance in statistical learning methods, the random forests approach, and accounts for exposures to multiple weather conditions in a data-driven way. This approach reduces the need for substantial meteorological judgment while taking many weather conditions into consideration. Note that this paper does not aim to demonstrate that random forests are an alternative to GAMs or GLMs in heat-related epidemiological studies.
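To make the idea of ranking variables by prediction performance concrete, the sketch below implements a heavily simplified random forest in plain Python and computes permutation-based variable importance on out-of-bag samples. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the variable names are hypothetical, and the trees are shallow with only a few candidate split points per variable.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean (node impurity)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, ys, n_try, depth, rng):
    """Grow a regression tree, trying n_try randomly chosen variables per split."""
    if depth == 0 or len(rows) < 10:
        return sum(ys) / len(ys)  # leaf: mean response
    best = None
    for j in rng.sample(range(len(rows[0])), n_try):
        values = sorted(r[j] for r in rows)
        q = len(values) // 4
        for t in values[q::q][:3]:  # approximate quartile cut points
            left = [i for i, r in enumerate(rows) if r[j] < t]
            right = [i for i, r in enumerate(rows) if r[j] >= t]
            if not left or not right:
                continue
            score = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return sum(ys) / len(ys)
    _, j, t, left, right = best
    return (j, t,
            build_tree([rows[i] for i in left], [ys[i] for i in left], n_try, depth - 1, rng),
            build_tree([rows[i] for i in right], [ys[i] for i in right], n_try, depth - 1, rng))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, t, low, high = tree
        tree = low if row[j] < t else high
    return tree


def permutation_importance(rows, ys, n_trees=50, n_try=2, depth=3, seed=0):
    """Average increase in out-of-bag MSE when each variable is shuffled."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    gain = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        tree = build_tree([rows[i] for i in boot], [ys[i] for i in boot], n_try, depth, rng)
        base = sum((predict(tree, rows[i]) - ys[i]) ** 2 for i in oob) / len(oob)
        for j in range(p):
            shuffled = [rows[i][j] for i in oob]
            rng.shuffle(shuffled)
            err = sum((predict(tree, rows[i][:j] + [v] + rows[i][j + 1:]) - ys[i]) ** 2
                      for i, v in zip(oob, shuffled)) / len(oob)
            gain[j] += (err - base) / n_trees
    return gain


# Synthetic data: by construction only the first variable drives the response.
data_rng = random.Random(42)
names = ["apparent_temp", "pressure", "wind_speed"]  # hypothetical names
rows = [[data_rng.gauss(0, 1) for _ in names] for _ in range(300)]
ys = [3.0 * r[0] + data_rng.gauss(0, 1) for r in rows]

importance = permutation_importance(rows, ys)
ranking = sorted(zip(names, importance), key=lambda kv: -kv[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Shuffling a variable that the trees rely on degrades out-of-bag predictions sharply, while shuffling an irrelevant variable changes the error very little; that difference is what the importance ranking measures.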
Data sources
This study uses daily mortality data and weather observations from four U.S. cities (Chicago, IL; Detroit, MI; Philadelphia, PA; and Phoenix, AZ) from 1998 to 2006. Death records were obtained from the National Center for Health Statistics. To prepare the data for analysis, we created daily counts of deaths, first for all-cause mortality and then for cause-specific mortality. International Classification of Diseases tenth revision (ICD-10) codes were in use for the period 1998–2006. Daily total
Results
Table 1 shows that, among the four cities, Phoenix had the highest temperature and apparent temperature (average values of daily mean temperature/apparent temperature: 32.3 and 31.5 °C, respectively), and the lowest dew point (average value of daily mean dew point: 8.2 °C) during the summertime in 1998–2006. Phoenix also had the lowest barometric pressure and absolute humidity (average values of daily mean metrics: 970.5×10² Pa and 8.9×10⁻³ kg m⁻³) compared to the other three cities. Chicago and Detroit
Discussion
The heat epidemiological literature usually uses a single temperature metric or a composite index as a proxy for the complex mixture of weather conditions to which the body is exposed. This study presents a novel multivariate analysis of a mixture of weather conditions and heat-related health effects by applying a robust statistical learning method: the random forests technique. In particular, this analysis ranked the relative importance of each weather condition in predicting the deviations
Conclusions
A multivariate analysis was conducted to investigate the synergistic effects of mixtures of multiple weather variables on heat-related mortality in four U.S. cities using a powerful statistical learning method, random forests. Our investigation showed that, although the importance ranking of weather variables differed by city and cause of mortality, apparent temperature appears to be the most robust predictor for all-cause mortality in the four cities, and absolute humidity is on average most
Acknowledgments
The research described in this paper was funded through support of the Graham Environmental Sustainability Institute at the University of Michigan; the U.S. Environmental Protection Agency Science to Achieve Results (STAR) Grant R832752010; the U.S. Centers for Disease Control and Prevention Grant R18 EH 000348 and National Institute for Environmental Health Sciences Grants R01 ES-016932 and R21 ES-020695.
This paper does not necessarily reflect the views of these organizations.
The authors
References (20)
- et al. What measure of temperature is the best predictor of mortality? Environ. Res. (2010)
- et al. Comparing exposure metrics for classifying ‘dangerous heat’ in heat wave and health warning systems. Environ. Int. (2012)
- et al. A Bayesian model averaging approach for estimating the relative risk of mortality associated with heat waves in 105 U.S. cities. Biometrics (2011)
- Random forests. Mach. Learn. (2001)
- et al. The toxicological evaluation of realistic emissions of source aerosols study: statistical methods. Inhal. Toxicol. (2011)
- Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. (2006)
- et al. Heat-health warning systems: a comparison of the predictive capacity of different approaches to identifying dangerously hot days. Am. J. Public Health (2010)
- et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. (2009)
- et al. The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants. J. Expo. Anal. Environ. Epidemiol. (2001)
- et al. Heat stress and public health: a critical review. Annu. Rev. Public Health (2007)