2020 | Book

Statistical Methods for Global Health and Epidemiology

Principles, Methods and Applications

About this book

This book examines statistical methods and models used in the fields of global health and epidemiology. It includes methods such as innovative probability sampling, data harmonization and encryption, and advanced descriptive, analytical and monitoring methods. R program code and real data examples are included. Contemporary global health and epidemiology involve a myriad of medical and health challenges, including inequality of treatment, the HIV/AIDS epidemic and its subsequent control, the flu, cancer, tobacco control, drug use, and environmental pollution. In addition to its vast scale and telescopic perspective, addressing global health concerns often involves examining resource-limited populations with large geographic and socioeconomic diversity. Therefore, advancing global health requires new epidemiological designs, new data, and new methods for sampling, data processing, and statistical analysis. This book provides global health researchers with methods that enable access to and utilization of existing data. Featuring contributions from both epidemiological and biostatistical scholars, this book is a practical resource for researchers, practitioners, and students solving global health problems in research, education, training, and consultation.

Table of Contents

Frontmatter
17. Correction to: Statistical Methods for Global Health and Epidemiology
Xinguang Chen, (Din) Ding-Geng Chen

Data Acquisition and Management

Frontmatter
Chapter 1. Existent Sources of Data for Global Health and Epidemiology
Abstract
Many research questions in global health and epidemiology can be addressed using data from existing sources around the world. In the first chapter of this book, we provide a summary of the data sources most commonly accessed for global health and epidemiological research. We focus on sources that provide relevant data by country on geographic area, population size and age composition, population mobility, socioeconomic status, cultural and legal characteristics, and morbidity and mortality. Specific examples from the World Bank, the World Health Organization, other global organizations, and well-known large-scale cross-country survey studies are emphasized.
Xinguang Chen, Bin Yu
Chapter 2. Satellite Imagery Data for Global Health and Epidemiology
Abstract
In this chapter, we describe commonly accessible sources of satellite imagery data that are free of charge for research. Example data include lighting as an indicator of development level, PM2.5 for pollution, temperature, precipitation, and deforestation. We cover the sources of such data, methods to access them, and their use as measures of the macro-environment in research, both overall and zoomed in to specific country, district and community/neighborhood levels. Examples are used to illustrate the process, including R code, screenshots, and tables.
Hao Chen, Keerati Ponpetch
Chapter 3. GIS/GPS-Assisted Probability Sampling in Resource-Limited Settings
Abstract
It is rather challenging to draw probability samples for epidemiology and global health research that involves specific geographic areas and resource-limited countries and regions. Based on the authors' published work, in this chapter we introduce an innovative probability sampling method that uses GIS technology for probability spatial sampling, GIS and GPS technologies to connect the sampled geographic area with residential houses and residents, and the random digits method to select individual participants. With this method, data requirements and cost are minimized, and implementation can be achieved in a short period. Most parts of the method have been tested and used in a developing country to sample rural residents, rural-to-urban migrants and urban residents.
Xinguang Chen, Hui Hu
Chapter 4. Construal Level Theory Supported Method for Sensitive Topics: Applications in Three Different Populations
Abstract
Social desirability bias is a major threat to data quality for survey studies, particularly studies involving sensitive questions, such as age, income, sexual behaviors, and drug use. In this chapter, we introduce a construal level theory (CLT)-based method we devised to reduce social desirability bias. Construals are our mental constructions of the universe, organized in hierarchies along spatiotemporal and psychosocial distances, with self, here, and now as the reference. Answering sensitive questions about oneself is often executed at low construal levels and is subject to contextual factors. In this case, respondents tend to edit the answer to make it socially desirable, either to avoid penalty or to enhance reward. In contrast, answering sensitive questions for others is often executed at high construal levels, which is less subject to contextual factors and more dependent on one's own knowledge, attitudes and beliefs. The CLT-based method builds on this theory by asking participants to answer the same questions for 2–3 socially distant others. We report our work on building the method through three studies: one with data collected from college students in the US, and two with data collected in China, one from a sample of urban residents and another from a sample of rural residents. Four questions (reading newspapers, engaging in physical activity, frequency of sexual intercourse and attitudes toward homosexuality) were used in the college student study conducted in the US; the Brief Sexual Openness Scale (BSOS) was used in the two studies conducted in China. We also offer recommendations on the use of the method and on future research.
Yan Wang, Xinguang Chen
Chapter 5. Integrative Data Analysis and the Study of Global Health
Abstract
In this chapter, we introduce Integrative Data Analysis (IDA) for use in the field of Global Health. IDA is a novel framework for simultaneous analysis of individual-level data pooled from multiple studies. This framework has been applied to address questions about substance use, cancer, HIV, and rare diseases from studies around the world. Advantages of this approach include efficiency (i.e., reuse of extant data), statistical power (i.e., large combined sample sizes), the potential to address questions not answerable by a single contributing study (e.g., combining studies with overlapping ethnicities to examine cross-cultural differences or age periods to examine longer periods of development), and the opportunity to test replicability of effects across studies in the pooled analysis. We describe the IDA methodological framework, emphasizing unique issues in measurement harmonization and hypothesis testing. We illustrate the application of the method using examples. We also describe emerging tools to handle specific harmonization challenges. Finally, we consider the potential utility of IDA in Global Health and epidemiological research.
Andrea M. Hussong, Veronica T. Cole, Patrick J. Curran, Daniel J. Bauer, Nisha C. Gottfredson
Chapter 6. Introduction to Privacy-Preserving Data Collection and Sharing Methods for Global Health Research
Abstract
In global health and epidemiological research, collecting and sharing data for sensitive topics, such as income, age, sex partners, drug use, HIV infection, stigma, and religion, has been a long-standing challenge. In this chapter, we introduce a range of methods for privacy-preserving data collection and sharing. After a comprehensive review of the classic randomized response techniques and related extensions, we present a new privacy-preserving data collection method capitalizing on the matrix masking theory. In addition to an introduction to the theory and principles, examples are used to illustrate the procedures in applying the method in practice.
Guanhong Miao, Hanzhi Gao, Yan Wang, Samuel S. Wu
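The classic randomized response idea reviewed in Chapter 6 can be illustrated with a minimal Python sketch of Warner's design, under stated assumptions: each respondent answers the sensitive statement with probability `p` and its negation otherwise, and always answers truthfully. The function names, the seed and the example numbers are hypothetical, and the sketch does not cover the chapter's matrix masking method.

```python
import random


def warner_estimate(yes_share, p):
    """Estimate the prevalence of a sensitive trait from the share of 'yes' answers.

    Under Warner's design, P(yes) = p * pi + (1 - p) * (1 - pi),
    so pi = (P(yes) - (1 - p)) / (2p - 1), which requires p != 0.5.
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    return (yes_share - (1 - p)) / (2 * p - 1)


def simulate_survey(true_prevalence, p, n, seed=0):
    """Simulate n truthful respondents and return the observed share of 'yes'."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_prevalence
        gets_direct_statement = rng.random() < p
        # 'yes' to the direct statement means has_trait; 'yes' to its negation means not has_trait.
        answer_yes = has_trait if gets_direct_statement else not has_trait
        yes += int(answer_yes)
    return yes / n


if __name__ == "__main__":
    share = simulate_survey(true_prevalence=0.2, p=0.7, n=20000)
    print(round(warner_estimate(share, p=0.7), 3))
```

The interviewer never learns which statement a respondent answered, yet the prevalence remains estimable in aggregate. The price is a larger variance than a direct question would give, and it grows as `p` approaches 0.5.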

Essential Statistical Methods

Frontmatter
Chapter 7. Geographic Mapping for Global Health Research
Abstract
Geographic mapping represents one of the most efficient approaches for students and researchers to establish a global perspective on specific medical, health and behavioral issues. In this chapter, we introduce packages for geographic mapping available in the free R software. We demonstrate various mapping methods and R program code using country-specific data for population and population density as examples.
Bin Yu
Chapter 8. A 4D Indicator System of Count, P Rate, G Rate and PG Rate for Epidemiology and Global Health
Abstract
How to end the HIV/AIDS epidemic is a typical global health question, since the impact of HIV/AIDS is global and the epidemic cannot be ended without collaborative global effort. In this chapter, a new measurement system is introduced to inform HIV/AIDS control across the globe. All countries with data available on area size, total population and total number of persons living with HIV (PLWH) were included, yielding a sample of 148 countries. Four indicators, namely the total count, the population-based p rate, the geographic area-based g rate and the population and geographic area-based pg rate, were used as a 4D system to describe the global HIV epidemic. The total PLWH count provided data informing resource allocation for individual countries to improve HIV/AIDS care; the five countries with the highest PLWH counts were South Africa, Nigeria, India, Kenya, and Mozambique. Information from the remaining three indicators provided a global risk profile of the HIV epidemic, supporting HIV/AIDS prevention programming strategies. The five countries with the highest p rates were Swaziland, Botswana, Lesotho, South Africa, and Zimbabwe; the five countries with the highest g rates were Swaziland, Malawi, Lesotho, Rwanda, and Uganda; and the five countries with the highest pg rates were Barbados, Swaziland, Lesotho, Malta, and Mauritius. According to pg rates, two HIV hotspots (southern and middle Africa and the Caribbean region) and one HIV belt across Eurasia were identified. In addition to HIV/AIDS, the 4D measurement system can be used to describe morbidity and mortality for many diseases across the globe. We recommend the use of this measurement system in research to address significant global health and epidemiologic issues.
Xinguang Chen, Bin Yu, (Din) Ding-Geng Chen
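As a rough illustration of the four indicators named in Chapter 8, the sketch below computes them for a single country. The abstract does not give the formulas, so the definitions here are assumptions: the p rate is taken as cases per 1,000 persons, the g rate as cases per square kilometer, and the pg rate as the p rate divided by area. The function name and the input numbers are hypothetical, not data from the chapter.

```python
def four_d_indicators(count, population, area_km2):
    """Return count, p rate, g rate and pg rate for one country.

    All scaling choices (per 1,000 persons, per km2) and the form of the
    pg rate are illustrative assumptions, not the chapter's definitions.
    """
    p_rate = count / population * 1000   # cases per 1,000 persons
    g_rate = count / area_km2            # cases per km2
    pg_rate = p_rate / area_km2          # cases per 1,000 persons per km2 (assumed form)
    return {"count": count, "p_rate": p_rate, "g_rate": g_rate, "pg_rate": pg_rate}


if __name__ == "__main__":
    # Hypothetical country: 1,000 cases, 100,000 residents, 500 km2.
    print(four_d_indicators(1000, 100000, 500))
```

Under these assumed definitions, a small and densely settled country can rank high on the pg rate even when its count is modest. This is one way the four indicators can produce different country rankings.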
Chapter 9. Historical Trends in Mortality Risk over 100-Year Period in China with Recent Data: An Innovative Application of Age-Period-Cohort Modeling
Abstract
History is the best teacher. Yet it is challenging to learn from history to address contemporary problems in the field of global health and epidemiology. A first challenge is the lack of data to examine medical and health problems in the past. This challenge is more pronounced in developing countries, such as China, where no data were collected because of limited resources, wars, plagues, and natural disasters. Theoretical analysis and empirical data from age-period-cohort modeling indicate that recent data by age of a population contain information about the past. For example, the mortality rate for people aged 90 in 1990 contains information about mortality risk in 1900, when they were born. Information contained in mortality by age therefore functions like a digital fossil, and age-period-cohort modeling provides a tool to extract the information from the fossil. In this study we examined the mega-trends in mortality risk for China since 1901, when the 2000-year-long feudalism was overthrown, to 1949, when independence was established, and up to the 1980s, when rapid economic growth emerged. We achieved the goal by analyzing data collected in recent years, from 1990 to 2010, with the age-period-cohort modeling method and the intrinsic estimator. Findings of the study suggest the existence of four Sunny Periods and three Cloudy Periods during 1901–2010. These Sunny and Cloudy Periods coincided closely with significant social, cultural, political, and economic events in the history of China. Findings of the study revealed that the highest mortality risk was associated with foreign invasion, the second highest risk with civil wars, and the third highest risk with economic reform; lower mortality risk was associated with the post-war period and the establishment of new China, and the longest period with reduced risk of mortality was associated with the Cultural Revolution Period.
In conclusion, age-period-cohort modeling provides a powerful tool for researchers to examine medical and health issues in the past with more recent data to advance epidemiology and global health. We highly recommend the use of this method in research in both developed and developing countries.
Xinguang Chen
Chapter 10. Moore-Penrose Generalized-Inverse Solution to APC Modeling for Historical Epidemiology and Global Health
Abstract
Age-period-cohort (APC) modeling provides a powerful method for global health research in resource-limited countries and regions with limited data. This method enables researchers to investigate medical and health conditions and influential factors, potentially up to 100+ years in the past, with data collected in recent decades. Although widely used in research to examine mortality from various diseases, suicide and quality of life, an APC model is mathematically nonidentifiable. This is because of the collinearity among the three time-related predictors (age, period, and birth cohort). Various methods have been reported to deal with this identifiability issue, particularly the intrinsic estimator (IE), which has been the most widely accepted. The IE method has been developed through much effort, including mathematical proof, simulations and empirical testing. In this chapter, we introduce the application of the Moore-Penrose generalized inverse matrix method (MP) in handling the nonidentifiability issue. Relative to the IE method, the MP method is straightforward to understand and easy to implement. We also show that the MP method is mathematically equivalent to the IE method.
(Din) Ding-Geng Chen, Xinguang Chen, Huaizhen Qin
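The identifiability problem described in Chapters 9 and 10 can be stated compactly. What follows is a sketch for a simplified linear version of the APC model, in generic notation. The symbols $\mu$, $\alpha_i$, $\beta_j$, $\gamma_k$, $X$, $b$ and $B_0$ are assumptions for illustration and are not taken from the chapters.

```latex
% Age groups i = 1..a, periods j = 1..p, cohorts k = a - i + j
% (cohort = period - age, which is the source of the collinearity).
y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{a-i+j} + \varepsilon_{ij},
\qquad \sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0 .

% In matrix form y = X b + \varepsilon, the exact linear dependency among
% age, period and cohort yields a nonzero vector B_0 with X B_0 = 0.
% Hence X^\top X is singular, and for any least-squares solution \hat{b}
\hat{b}(t) = \hat{b} + t\, B_0 , \qquad t \in \mathbb{R},
% is also a least-squares solution: the model is not identified.

% The Moore-Penrose generalized inverse X^{+} picks the minimum-norm solution,
% which is orthogonal to the null vector B_0:
\hat{b}_{MP} = X^{+} y
  = \arg\min \bigl\{ \lVert b \rVert : b \text{ minimizes } \lVert y - X b \rVert^{2} \bigr\},
\qquad B_0^{\top} \hat{b}_{MP} = 0 .
```

The orthogonality condition in the last line is the property usually used to characterize the intrinsic estimator, which is consistent with the equivalence the abstract states.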
Chapter 11. Mixed Effects Modeling of Multi-site Data-Health Behaviors Among Adolescents in Hong Kong, Macao, Taipei, Wuhan and Zhuhai
Abstract
An important approach for global health and epidemiology research is to collect and use data from multiple study sites, within one culture or across various cultures, to address high-impact medical and health issues. When multisite data are used, it is challenging to deal with data heterogeneity, since such heterogeneity cannot be efficiently addressed using conventional multivariate regression methods. In this chapter, we describe the application of mixed effects modeling, a statistical method designed for analyzing longitudinal trials, to cross-sectional multisite data. We demonstrate the application using data collected among middle and high school students in five Chinese cities (n = 13,950): Hong Kong, Macau, Taipei, Wuhan, and Zhuhai. Data for lifestyle (sedentary, dietary, physical activity) and addictive behaviors (cigarette smoking, alcohol consumption and participation in gambling) were analyzed as outcomes. Factors at the individual and contextual levels, as well as interactions between the two, were associated with the outcome variables. Findings of this study indicate that, although they share a similar mainstream Chinese culture, these adolescent participants differed significantly from each other in their engagement in health-related behaviors, and the differences were associated with both individual- and contextual-level factors.
Xinguang Chen
Chapter 12. Geographically Weighted Regression
Abstract
The family of geographically weighted regression (GWR) methods has seen wide application in a variety of fields, including ecology, agriculture, social science, and public health. The popularity of these methods stems from their ability to depict spatial heterogeneity, the easy interpretation of their outputs, and the availability of user-friendly software tools. These methods have evolved extensively in the past decade to address the challenges of multicollinearity in predictors and variable selection in the era of big data, and a comprehensive review is needed to raise awareness and encourage practical validation of these advances. Equally needed is an up-to-date introduction to the associated software packages, especially those developed on the popular statistical software platform R. This chapter provides a systematic overview of the foundation and recent development of GWR methodology, with a balance between rigor and practicality. Via a case study, this chapter also offers a step-by-step guide to the use of three major GWR-dedicated R packages, including their facilities for multicollinearity diagnosis and variable selection. We hope a broadened user group of these methods will in turn motivate more methodological advances and improve the contribution of GWR methods to global health.
Yang Yang
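The core idea behind GWR, as described in Chapter 12, is to fit a separate weighted regression at each location, with nearby observations weighted more heavily. A minimal pure-Python sketch of one such local fit follows. It assumes a single predictor, a Gaussian kernel and planar Euclidean distances. The function names, bandwidth and toy data are hypothetical, and the chapter itself works with dedicated R packages rather than code like this.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(points, target, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x around `target`.

    points: list of (coord_x, coord_y, x, y) tuples; target: (coord_x, coord_y).
    Returns (b0, b1), the local intercept and slope at the target location.
    """
    w = [gaussian_weight(math.dist((cx, cy), target), bandwidth)
         for cx, cy, _, _ in points]
    sw = sum(w)
    xbar = sum(wi * p[2] for wi, p in zip(w, points)) / sw
    ybar = sum(wi * p[3] for wi, p in zip(w, points)) / sw
    sxx = sum(wi * (p[2] - xbar) ** 2 for wi, p in zip(w, points))
    sxy = sum(wi * (p[2] - xbar) * (p[3] - ybar) for wi, p in zip(w, points))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


if __name__ == "__main__":
    # Toy data: the x-y relationship differs between a western and an eastern cluster.
    west = [(0.0, 0.0, 1.0, 3.0), (0.1, 0.0, 2.0, 5.0), (0.0, 0.1, 3.0, 7.0)]
    east = [(10.0, 0.0, 1.0, 0.0), (10.1, 0.0, 2.0, -1.0), (10.0, 0.1, 3.0, -2.0)]
    data = west + east
    print(local_fit(data, (0.0, 0.0), bandwidth=1.0))
    print(local_fit(data, (10.0, 0.0), bandwidth=1.0))
```

A single global regression on these toy data would average the two opposite slopes, while the local fits recover each cluster's own relationship. The bandwidth controls how local each fit is, and in practice it is usually chosen by cross-validation or an information criterion.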

Advanced Statistical Methods

Frontmatter
Chapter 13. Bayesian Spatial-Temporal Disease Modeling with Application to Malaria
Abstract
Background: Malaria remains a major public health challenge in Nigeria. Considerable effort has been made to reduce the prevalence and impact of the disease. The National Malaria Control Programme conducted a nationally representative Malaria Indicator Survey (MIS) during the peak malaria transmission season in 2008, 2010, 2013 and 2015, covering all six regions of Nigeria. In this study, the spatial and temporal patterns of malaria risk within each region of Nigeria were modeled using the MIS survey data. Methods: This study used data obtained from the Nigeria demographic health survey (NDHS) database to assess models; data were collected in 37 states in 2008, 2010, 2013 and 2015. We examined associations between malaria risk and socio-demographic factors using 16 Bayesian Poisson spatial-temporal models that incorporate spatial and temporal autocorrelations. The optimum model was selected according to the deviance information criterion and the effective number of parameters in the Bayesian paradigm. The models were implemented in the R-INLA package. Results: The model that included spatially uncorrelated heterogeneity, temporally correlated random-walk autocorrelation, and a spatial-temporal interaction had a small deviance information criterion. This model was the best in examining the association between malaria risk and socio-demographic factors using NDHS. The relationship between malaria risk and socio-demographic factors was statistically significant. Conclusion: The spatial-temporal interaction was statistically meaningful, and the prevalence of malaria was influenced by the time and space interaction effect. Wealth index and place of residence influence malaria. To further reduce the malaria burden, current tools should be supplemented by socio-demographic development.
Ropo Ebenezer Ogunsakin, (Din) Ding-Geng Chen
Chapter 14. BCEWMA: A New and Effective Biosurveillance System for Disease Outbreak Detection
Abstract
Disease outbreaks need to be detected in a timely manner for effective disease control. For disease surveillance, conventional statistical process control charts are often included in public health surveillance systems, without taking into account the complicated structure of the disease incidence data and/or additional covariate information. This chapter presents a novel prospective disease surveillance system, named BCEWMA (Biosurveillance via Covariate-Assisted Exponentially Weighted Moving Average Control Chart), which can accommodate seasonality and arbitrary distribution of disease incidence data. Methodologically, BCEWMA is based on the widely used exponentially weighted moving average control chart, incorporating useful information in covariates. This new surveillance system is applied to two real disease incidence datasets: one regarding the hand, foot and mouth disease in Sichuan province of China and the other about the influenza-like-illness in Florida. These real-data examples show the reliability and effectiveness of BCEWMA in disease outbreak detection.
Kai Yang, Peihua Qiu
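Chapter 14 builds BCEWMA on the exponentially weighted moving average (EWMA) control chart. The sketch below shows only the classic one-sided EWMA chart for a sequence of incidence values with a known in-control mean and standard deviation. It is not BCEWMA, which additionally handles seasonality, arbitrary distributions and covariates. The function name, parameter values and data are hypothetical.

```python
import math


def ewma_chart(values, target, sigma, lam=0.2, L=3.0):
    """Classic one-sided EWMA chart.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = target.
    A signal is raised when z_t exceeds target + L * sd_t, where sd_t is the
    exact standard deviation of z_t for independent in-control observations.
    Returns (list of z_t, list of 0-based indices that signal).
    """
    z = target
    stats, signals = [], []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        sd = sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if z > target + L * sd:
            signals.append(t - 1)
        stats.append(z)
    return stats, signals


if __name__ == "__main__":
    # Hypothetical weekly counts: stable at 10, then a sustained upward shift to 16.
    counts = [10] * 5 + [16] * 5
    stats, signals = ewma_chart(counts, target=10, sigma=2)
    print([round(z, 2) for z in stats])
    print(signals)
```

Smaller values of `lam` give more weight to history, which helps detect small sustained shifts at the cost of slower reaction to large sudden ones.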
Chapter 15. Cusp Catastrophe Regression Analysis of Testosterone in Bifurcating the Age-Related Changes in PSA, a Biomarker for Prostate Cancer
Abstract
Advancing cancer research requires adopting a nonlinear dynamic systems (NDS) approach in addition to the linear dynamic systems (LDS) approach. Dynamic changes in prostate-specific antigen (PSA), a biomarker of prostate cancer, show NDS character, but this character has not been examined in the literature. In this study, we examine PSA guided by an NDS paradigm. Participants were urology patients diagnosed with either prostate cancer (n = 27) or benign prostate disorder (n = 352) from a tertiary hospital in northcentral Florida. Data were derived from the 2001 to 2015 electronic medical records (EMR). PSA levels (ng/mL) were analyzed with a cusp catastrophe model in which participants' age at the time of the PSA measurement was used as the asymmetry variable and testosterone level (ng/dL) as the bifurcation variable. Modeling analyses were executed in the open source R software. LDS-based linear correlation and regression analyses were also conducted for comparison. The mean age of the participants was 66.1 (SD = 9.8) years; the PSA range was 0.05–13.8 with mean = 1.7 (SD = 1.2) ng/mL; and the total-testosterone range was 27.00–1297.00 with mean = 318.0 (SD = 191.6) ng/dL. Results from Chen-Chen cusp regression indicate better data-model fit for cusp (R² = 0.47) than for linear regression (R² = 0.027). Serum PSA was significantly associated with age (a1 = 0.2691, p < .001) and bifurcated by blood testosterone (b1 = 1.0265, p < .00) with the estimated cusp point = (age = 63, testosterone = 630 ng/mL). The estimated cusp point was close to the epidemiological finding that the risk of prostate cancer starts to accelerate at about ages 60–65 years, and the testosterone level of 630 ng/mL was close to the upper limit of 800 ng/dL of the normal range (280–800) set by the American Association of Clinical Endocrinologists (AACE). In conclusion, this is the first study to examine the dynamics of PSA in men and to demonstrate that serum PSA levels follow the NDS paradigm.
In addition to confirming the relationship between age, testosterone and PSA, findings of this analysis provide a reasonable explanation of the large PSA range in healthy men and the small difference in mean PSA between healthy men and men with prostate cancer (1.2 vs. 2.6). There is a need to re-evaluate the role of PSA in prostate cancer screening guided by the NDS paradigm.
Xinguang Chen, Kai Wang, (Din) Ding-Geng Chen
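For readers unfamiliar with the cusp catastrophe, its standard deterministic form is sketched below, with age entering the asymmetry control and testosterone the bifurcation control as in Chapter 15. The coefficients $a_1$ and $b_1$ appear in the abstract. The intercepts $a_0$ and $b_0$, the potential $V$ and the outcome symbol $y$ (PSA, possibly rescaled) are notation assumed here for illustration.

```latex
% Potential function and equilibrium surface of the cusp catastrophe
V(y;\alpha,\beta) = \tfrac{1}{4}y^{4} - \tfrac{1}{2}\beta y^{2} - \alpha y ,
\qquad
-\frac{\partial V}{\partial y} = \alpha + \beta y - y^{3} = 0 .

% Control variables as linear functions of the predictors
\alpha = a_0 + a_1 \,\mathrm{age}, \qquad
\beta  = b_0 + b_1 \,\mathrm{testosterone} .

% Cardan's discriminant determines the number of equilibria
\delta = 27\alpha^{2} - 4\beta^{3} :
\quad \delta < 0 \;\Rightarrow\; \text{three equilibria (two stable, one unstable)},
\quad \delta > 0 \;\Rightarrow\; \text{one equilibrium}.
```

Where $\delta < 0$, the same pair of predictor values is compatible with two stable outcome levels. A linear regression cannot represent this bimodality, which the cusp model is built to capture.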
Chapter 16. Logistic Cusp Catastrophe Regression for Binary Outcome: Method Development and Empirical Testing
Abstract
Cusp catastrophe models are uniquely suited to advancing life sciences, psychology and behavioral studies. Extensive progress has been made in applying this modeling technique to continuous outcomes, but no method has been developed for binary data. To fill this gap, this chapter aims to develop a cusp catastrophe modeling method for binary outcomes. Building upon our previous research on the nonlinear regression cusp (RegCusp) catastrophe model for continuous outcomes, we propose a logistic cusp catastrophe regression (LogisticCusp). LogisticCusp is based on the principles of logistic regression, with the binary outcome variable y (yes/no) expressed as a latent binary variable Y through a logit link. This latent regression provides a mathematical connection between an observed outcome variable, as a binomially distributed random variable, and the deterministic cusp catastrophe at its equilibrium. By connecting the two, Y in LogisticCusp is considered as one of the true roots of the deterministic cusp catastrophe model, determined using the Maxwell or Delay conventions. We validated the method using a 5-step Monte-Carlo simulation with two predictors and three parameters for both bifurcation and asymmetry control variables. We further tested the method on binge drinking behavior in youth with data from the Monitoring the Future Study. Results from 5000 Monte-Carlo simulations indicate that the parameter estimates obtained through LogisticCusp are unbiased and efficient using maximum likelihood estimation with a quasi-Newton numerical search algorithm. Results from empirical testing with real data are consistent with those estimated using other methods. LogisticCusp adds a new tool for researchers to examine many issues in psychology, life sciences, and behavioral studies, particularly issues in medicine and public health, with the powerful cusp catastrophe modeling for binary outcomes.
(Din) Ding-Geng Chen, Xinguang Chen
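The abstract of Chapter 16 refers to the roots of the deterministic cusp model and to the Maxwell convention for choosing among them. As a hedged illustration, the sketch below solves the standard cusp equilibrium equation `alpha + beta*y - y**3 = 0` and selects the root with the lowest potential, which is the usual reading of the Maxwell convention. It covers only this deterministic step and is not the LogisticCusp estimator. All function names and the example values are hypothetical.

```python
import math


def _cbrt(x):
    """Real cube root that also handles negative numbers."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def cusp_equilibria(alpha, beta):
    """Sorted real roots of alpha + beta*y - y**3 = 0."""
    # Rewrite as the depressed cubic y**3 + p*y + q = 0.
    p, q = -beta, -alpha
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0:
        # One real root (Cardano's formula).
        s = math.sqrt(disc)
        return [_cbrt(-q / 2.0 + s) + _cbrt(-q / 2.0 - s)]
    if p == 0:
        # disc <= 0 with p == 0 forces q == 0: triple root at zero.
        return [0.0]
    # Three real roots (trigonometric form); roots repeat when disc == 0.
    r = 2.0 * math.sqrt(-p / 3.0)
    arg = max(-1.0, min(1.0, (3.0 * q) / (p * r)))
    theta = math.acos(arg) / 3.0
    return sorted(r * math.cos(theta - 2.0 * math.pi * k / 3.0) for k in range(3))


def potential(y, alpha, beta):
    """Cusp potential V(y) = y^4/4 - beta*y^2/2 - alpha*y."""
    return 0.25 * y ** 4 - 0.5 * beta * y ** 2 - alpha * y


def maxwell_state(alpha, beta):
    """Equilibrium with the lowest potential (Maxwell convention)."""
    return min(cusp_equilibria(alpha, beta), key=lambda y: potential(y, alpha, beta))


if __name__ == "__main__":
    print(cusp_equilibria(0.1, 1.0))   # bifurcation region: three equilibria
    print(maxwell_state(0.1, 1.0))     # upper state is preferred when alpha > 0
    print(cusp_equilibria(1.0, -1.0))  # outside the bifurcation region: one equilibrium
```

Under the Delay convention, by contrast, the system would remain at its current stable equilibrium until that equilibrium disappears. Implementing that would require tracking the previous state, which this sketch does not do.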
Backmatter
Metadata
Title
Statistical Methods for Global Health and Epidemiology
Edited by
Dr. Xinguang Chen
(Din) Ding-Geng Chen
Copyright year
2020
Electronic ISBN
978-3-030-35260-8
Print ISBN
978-3-030-35259-2
DOI
https://doi.org/10.1007/978-3-030-35260-8