Elsevier

Environmental Research

Volume 132, July 2014, Pages 350-359
What weather variables are important in predicting heat-related mortality? A new application of statistical learning methods

https://doi.org/10.1016/j.envres.2014.04.004

Highlights

  • Apparent temperature is a robust parameter for activating heat alerts.

  • Absolute humidity should be included in future heat-health studies.

  • Random forests can be used to guide the choice of weather variables in health studies.

Abstract

Hot weather increases the risk of mortality. Previous studies used different sets of weather variables to characterize heat stress, resulting in variation in heat–mortality associations depending on the metric used. We employed a statistical learning method, random forests, to examine which of the various weather variables had the greatest impact on heat-related mortality. We compiled a dataset of summertime daily weather and mortality counts from four U.S. cities (Chicago, IL; Detroit, MI; Philadelphia, PA; and Phoenix, AZ) from 1998 to 2006. A variety of weather variables were ranked in predicting deviation from typical daily all-cause and cause-specific death counts. Ranks of weather variables varied with city and health outcome. Apparent temperature appeared to be the most important predictor of heat-related all-cause mortality. Absolute humidity was, on average, most frequently selected as one of the top variables for all-cause mortality and seven cause-specific mortality categories. Our analysis affirms that apparent temperature is a reasonable variable for activating heat alerts and warnings, which are commonly based on predictions of total mortality in the next few days. Additionally, absolute humidity should be included in future heat-health studies. Finally, random forests can be used to guide the choice of weather variables in heat epidemiology studies.

Introduction

Heat waves are projected to occur more frequently, more intensely and to last longer as a consequence of climate change (Meehl and Tebaldi, 2004). Epidemiological studies have shown that heat waves are associated with elevated risk of mortality, hospital admissions, heat stroke, heat exhaustion, and cardiovascular and respiratory diseases (Kovats and Hajat, 2007). Previous heat-related epidemiological studies have characterized heat or heat waves by using a single temperature metric (e.g., daily mean/minimum/maximum temperature), a composite index combining temperature and relative humidity, or a more sophisticated index requiring substantial meteorological knowledge (e.g., spatial synoptic classification) (Hajat et al., 2010, Barnett et al., 2010). However, these weather metrics may not characterize human exposures to extreme heat very well, since biometeorological studies have shown that human body temperature is related to many weather variables, e.g., temperature, relative humidity, solar radiation, barometric pressure and wind speed (Steadman, 1979a, Steadman, 1979b, Steadman, 1984). Also, people usually spend the majority of their time indoors, e.g., Americans spend 86.9% of their time indoors on average (Klepeis et al., 2001), and some variables (e.g., absolute humidity) penetrate indoors better than others. Moreover, several metrics are typically used for each weather variable mentioned above, e.g., daily mean, minimum, and maximum temperature, and no consensus exists on which measure of temperature has the most influence on mortality. Two likely reasons are that no single best measure exists and that temperature alone is not sufficient to characterize heat exposures. This lack of consensus, together with differences in culture, housing and exposure across regions and populations, makes studies difficult to compare and contributes to inconsistencies in the reported heat-health associations.

Identifying which variables are most consistently predictive of health outcomes across multiple cities could aid epidemiologic research. Furthermore, identifying the local weather conditions most predictive of heat-related mortality could inform the design of heat wave and heat health warning systems by reducing the number of triggering metrics considered. Such information may guide local public health and weather service authorities to mobilize resources more effectively to prevent adverse health effects during hot weather.

A small number of studies have examined the performance of different weather-related exposure metrics in estimating heat–mortality relationships; we describe two here. A multi-city study examined the performance of mean, minimum and maximum temperature with and without humidity, as well as apparent temperature and the Humidex (a function of temperature and relative humidity), in predicting mortality using mortality and weather data from 107 U.S. cities during 1987–2000 (Barnett et al., 2010). The measure of temperature most associated with mortality varied with city, season and age group, but the different temperature measures had the same predictive ability on average. Another multi-city study evaluated maximum temperature, dew point temperature and a few combinations of these two variables in 105 U.S. cities during 1987–2005 (Bobb et al., 2011). It reported that the best temperature measure varied by city.

All these studies used either temperature predictors or temperature-humidity indices within a regression framework, and did not examine additional weather conditions (e.g., absolute humidity and barometric pressure) simultaneously. Also, the generalized linear models (GLM) and generalized additive models (GAM) used in these prior studies do not readily account for high-order interactions among covariates. Our prior work proposed a hybrid clustering method to classify potentially ‘dangerous’ heat based on four daily weather conditions: maximum/minimum temperature and maximum/minimum dew point (Zhang et al., 2012). Yet even that approach did not take many weather variables into consideration simultaneously. As with the study of multi-pollutant mixtures, properly accounting for the multiple weather conditions to which humans are exposed is a challenge for assessing heat-related health effects.

This study aims to evaluate many weather conditions simultaneously and to identify the weather variables that are most important in predicting excess death counts associated with hot weather, as judged by their prediction performance. The analysis takes advantage of a recent advance in statistical learning methods, the random forests approach, and accounts for exposures to multiple weather conditions in a data-driven way. This approach reduces the need for substantial meteorological judgment while taking many weather conditions into consideration. It is important to note that this paper does not aim to demonstrate that random forests are an alternative to GAM or GLM in heat-related epidemiological studies.
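To illustrate the general idea of ranking predictors with random forests (Breiman, 2001), the following is a minimal sketch on synthetic data, not the implementation used in this study. It grows a small forest of regression trees on bootstrap samples and scores each variable by how much the out-of-bag prediction error rises when that variable is randomly permuted. The predictor names, the simulated response, and all tuning values (number of trees, tree depth, leaf size, number of features tried per split) are assumptions chosen only for illustration.

```python
import random

# Hypothetical predictor names; the data below are simulated, not observed.
NAMES = ["apparent_temp", "abs_humidity", "pressure", "wind_speed"]

def make_synthetic(n=300, seed=1):
    """Simulate standardized predictors and a response that depends only on
    the first two (including an interaction); the other two are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in NAMES]
        at, ah = row[0], row[1]
        X.append(row)
        y.append(2.0 * at + 1.0 * ah + 0.5 * at * ah + rng.gauss(0, 0.5))
    return X, y

def sse(vals):
    """Sum of squared deviations from the mean."""
    s = sum(vals)
    return sum(v * v for v in vals) - s * s / len(vals)

def grow(X, y, idx, depth, n_try, rng, max_depth=4, min_leaf=5):
    """Grow a regression tree on rows idx, trying n_try random features per split."""
    ys = [y[i] for i in idx]
    if depth == max_depth or len(idx) < 2 * min_leaf:
        return sum(ys) / len(ys)  # leaf: mean response
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        order = sorted(idx, key=lambda i: X[i][f])
        step = max(1, len(order) // 10)  # about ten candidate cut points
        for cut in range(min_leaf, len(order) - min_leaf + 1, step):
            left, right = order[:cut], order[cut:]
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, X[order[cut - 1]][f], left, right)
    _, f, thr, left, right = best
    return (f, thr,
            grow(X, y, left, depth + 1, n_try, rng, max_depth, min_leaf),
            grow(X, y, right, depth + 1, n_try, rng, max_depth, min_leaf))

def predict(tree, x):
    """Follow splits until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree

def mse(tree, rows, targets):
    return sum((predict(tree, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(X, y, n_trees=30, n_try=2, seed=0):
    """Mean increase in out-of-bag MSE after permuting each feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    gains = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        tree = grow(X, y, boot, 0, n_try, rng)
        rows, targets = [X[i] for i in oob], [y[i] for i in oob]
        base = mse(tree, rows, targets)
        for j in range(p):
            shuffled = [r[j] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shuffled)]
            gains[j] += (mse(tree, permuted, targets) - base) / n_trees
    return gains

X, y = make_synthetic()
importance = dict(zip(NAMES, permutation_importance(X, y)))
ranking = sorted(NAMES, key=importance.get, reverse=True)
for name in ranking:
    print(f"{name:14s} {importance[name]:.3f}")
```

Because the simulated response is driven by the first two predictors only, a sensible importance measure should place them above the two noise variables. In practice an established library implementation would normally be preferred over a hand-written forest like this one.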

Data sources

This study uses daily mortality data and weather observations from four U.S. cities (Chicago, IL; Detroit, MI; Philadelphia, PA; and Phoenix, AZ) from 1998 to 2006. Death records were obtained from the National Center for Health Statistics. To prepare the data for analysis, we created daily counts of deaths, first for all-cause mortality and then for cause-specific mortality. International Classification of Diseases tenth revision (ICD-10) codes were in use for the period 1998–2006. Daily total

Results

Table 1 shows that, among the four cities, Phoenix had the highest temperature and apparent temperature (average values of daily mean temperature/apparent temperature: 32.3 and 31.5 °C, respectively), and the lowest dew point (average value of daily mean dew point: 8.2 °C) during the summertime in 1998–2006. Phoenix also had the lowest barometric pressure and absolute humidity (average values of daily mean metrics: 970.5×10² Pa and 8.9×10⁻³ kg m⁻³) compared to the other three cities. Chicago and Detroit

Discussion

The heat epidemiological literature usually uses a single temperature metric or a composite index as a proxy for the complex mixture of weather conditions to which the body is exposed. This study presents a novel multivariate analysis of a mixture of weather conditions and heat-related health effects by applying a robust statistical learning method: the random forests technique. In particular, this analysis ranked the relative importance of each weather condition in predicting the deviations

Conclusions

A multivariate analysis was conducted to investigate the synergistic effects of mixtures of multiple weather variables on heat-related mortality in four U.S. cities using a powerful statistical learning method, random forests. Our investigation showed that, although the importance ranking of weather variables differed by city and cause of death, apparent temperature appears to be the most robust predictor for all-cause mortality in the four cities, and absolute humidity is on average most

Acknowledgments

The research described in this paper was funded through support of the Graham Environmental Sustainability Institute at the University of Michigan; the U.S. Environmental Protection Agency Science to Achieve Results (STAR) Grant R832752010; the U.S. Centers for Disease Control and Prevention Grant R18 EH 000348; and the National Institute of Environmental Health Sciences Grants R01 ES-016932 and R21 ES-020695.

This paper does not necessarily reflect the views of these organizations.

The authors

References (20)

  • A.G. Barnett et al. What measure of temperature is the best predictor of mortality? Environ. Res. (2010)

  • K. Zhang et al. Comparing exposure metrics for classifying ‘dangerous heat’ in heat wave and health warning systems. Environ. Int. (2012)

  • J.F. Bobb et al. A Bayesian model averaging approach for estimating the relative risk of mortality associated with heat waves in 105 U.S. cities. Biometrics (2011)

  • L. Breiman. Random forests. Mach. Learn. (2001)

  • B.A. Coull et al. The toxicological evaluation of realistic emissions of source aerosols study: statistical methods. Inhal. Toxicol. (2011)

  • J.J. Faraway. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models (2006)

  • S. Hajat et al. Heat-health warning systems: a comparison of the predictive capacity of different approaches to identifying dangerously hot days. Am. J. Public Health (2010)

  • T. Hastie et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2009)

  • N.E. Klepeis et al. The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants. J. Expo. Anal. Environ. Epidemiol. (2001)

  • R.S. Kovats et al. Heat stress and public health: a critical review. Annu. Rev. Public Health (2007)