Using geocoded survey data to improve the accuracy of multilevel small area synthetic estimates

doi:10.1016/j.ssresearch.2015.12.006

Social Science Research

Volume 56, March 2016, Pages 108-116

https://doi.org/10.1016/j.ssresearch.2015.12.006

Under a Creative Commons license

open access

Highlights

• The data requirements for multilevel synthetic estimation are manifold and restrictive.
• Having greater choice of area level covariates can improve the accuracy of synthetic estimates.
• Attaching area data from an external source rather than aggregating survey responses can improve small area estimates.
• Geocoded surveys are preferable when selecting a dataset to generate small area synthetic estimates.

Abstract

This paper examines the secondary data requirements for multilevel small area synthetic estimation (ML-SASE). This research method uses secondary survey datasets as source data for statistical models, and the parameters of these models are then used to generate estimates for small areas. The paper assesses how knowing the geographical location of survey respondents affects the accuracy of those estimates. It moves beyond debating the generic merits of geocoded social survey datasets to test quantitatively the hypothesis that knowing the approximate location of respondents can improve the accuracy of the resultant estimates. Four sets of synthetic estimates were generated to predict expected levels of limiting long term illness (LLTI), each using a different level of knowledge about respondent location. The estimates were compared with comprehensive census data on LLTI. Estimates based on fully geocoded data were more accurate than estimates based on data that did not include geocodes.
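The estimation step summarised in the abstract (fit a model to survey data, then apply its parameters to small areas) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a logistic model with one individual-level factor (age group) and one area-level covariate, and it sets any area random effect to zero, as one might for areas with no survey respondents. The function name `synthetic_estimate`, the age bands, and every coefficient and population count below are hypothetical values chosen only to make the example run.

```python
import math


def inv_logit(x):
    """Map a linear predictor onto the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


def synthetic_estimate(intercept, cell_effects, area_effect,
                       area_covariate, cell_counts):
    """Expected prevalence of an outcome in one small area.

    cell_effects   : fitted individual-level effects, keyed by demographic cell
    area_effect    : fitted coefficient of the area-level covariate
    area_covariate : covariate value for this area (e.g. from an external source)
    cell_counts    : known population counts per cell in this area
    """
    total = sum(cell_counts.values())
    expected_cases = 0.0
    for cell, count in cell_counts.items():
        linear_predictor = (intercept + cell_effects[cell]
                            + area_effect * area_covariate)
        expected_cases += count * inv_logit(linear_predictor)
    return expected_cases / total


# Hypothetical model parameters (illustrative only, not taken from the paper).
INTERCEPT = -2.0
CELL_EFFECTS = {"16-44": 0.0, "45-64": 1.0, "65+": 2.0}
AREA_EFFECT = 0.5

# Hypothetical population counts for one small area.
COUNTS = {"16-44": 500, "45-64": 300, "65+": 200}

area_a = synthetic_estimate(INTERCEPT, CELL_EFFECTS, AREA_EFFECT, 0.0, COUNTS)
area_b = synthetic_estimate(INTERCEPT, CELL_EFFECTS, AREA_EFFECT, 1.0, COUNTS)
print(f"area A: {area_a:.3f}, area B: {area_b:.3f}")
```

In this sketch the two areas have the same population structure, so the difference between their estimates comes entirely from the area-level covariate. That is the part of the calculation that depends on knowing where respondents live when the model is fitted.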

Keywords

Multilevel

Synthetic estimation

UK census

Geocodes

Spatial identifiers

Limiting long term illness