Simulation of close-to-reality population data for household surveys with application to EU-SILC

Alfons, Andreas; Kraft, Stefan; Templ, Matthias; Filzmoser, Peter

doi:10.1007/s10260-011-0163-2

Published: 02 April 2011

Volume 20, pages 383–407 (2011)

Statistical Methods & Applications


Abstract

Statistical simulation in survey statistics is usually based on repeatedly drawing samples from population data. Furthermore, population data may be used in courses on survey statistics to explain issues regarding, e.g., sampling designs. Since the availability of real population data is in general very limited, it is necessary to generate synthetic data for such applications. The simulated data need to be as realistic as possible, while at the same time ensuring data confidentiality. This paper proposes a method for generating close-to-reality population data for complex household surveys. The procedure consists of four steps for setting up the household structure, simulating categorical variables, simulating continuous variables and splitting continuous variables into different components. It is not required to perform all four steps so that the framework is applicable to a broad class of surveys. In addition, the proposed method is evaluated in an application to the European Union Statistics on Income and Living Conditions (EU-SILC).
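The four steps named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy sample, the variable names, the helper `age_group`, and above all the models (plain weighted resampling and conditional draws) are simple stand-ins, not the models used in the paper.

```python
import random
from collections import defaultdict

# Hypothetical toy sample: households with a sampling weight; each person
# has an age, a categorical status, an income and two income components.
SAMPLE = [
    {"weight": 3.0, "persons": [
        {"age": 45, "status": "employed", "income": 30000.0, "parts": (27000.0, 3000.0)},
        {"age": 43, "status": "employed", "income": 22000.0, "parts": (22000.0, 0.0)},
    ]},
    {"weight": 2.0, "persons": [
        {"age": 70, "status": "retired", "income": 15000.0, "parts": (0.0, 15000.0)},
    ]},
    {"weight": 5.0, "persons": [
        {"age": 30, "status": "employed", "income": 25000.0, "parts": (25000.0, 0.0)},
        {"age": 5, "status": "child", "income": 0.0, "parts": (0.0, 0.0)},
    ]},
]


def age_group(age):
    """Coarse age classes used as the conditioning variable (an assumption)."""
    if age < 16:
        return "child"
    return "working" if age < 65 else "senior"


def simulate_population(sample, seed=0):
    rng = random.Random(seed)
    weights = [h["weight"] for h in sample]
    # The sum of the sampling weights estimates the number of households.
    n_households = round(sum(weights))

    # Weighted lookup tables built from the sample.
    status_by_group = defaultdict(lambda: ([], []))
    donors_by_status = defaultdict(lambda: ([], []))
    for h in sample:
        for p in h["persons"]:
            values, wts = status_by_group[age_group(p["age"])]
            values.append(p["status"])
            wts.append(h["weight"])
            donors, dwts = donors_by_status[p["status"]]
            donors.append(p)
            dwts.append(h["weight"])

    population = []
    for hid in range(n_households):
        # Step 1: household structure, resampled proportionally to the weights.
        template = rng.choices(sample, weights=weights, k=1)[0]
        for tp in template["persons"]:
            person = {"household": hid, "age": tp["age"]}
            # Step 2: categorical variable, drawn conditionally on the age group.
            values, wts = status_by_group[age_group(person["age"])]
            person["status"] = rng.choices(values, weights=wts, k=1)[0]
            # Step 3: continuous variable, a jittered donor value within the status.
            donors, dwts = donors_by_status[person["status"]]
            donor = rng.choices(donors, weights=dwts, k=1)[0]
            person["income"] = donor["income"] * rng.uniform(0.9, 1.1)
            # Step 4: split the income using the component shares of a donor.
            if person["income"] > 0:
                positive = [(d, w) for d, w in zip(donors, dwts) if d["income"] > 0]
                share_donor = rng.choices(
                    [d for d, _ in positive], weights=[w for _, w in positive], k=1
                )[0]
                person["parts"] = tuple(
                    person["income"] * c / share_donor["income"]
                    for c in share_donor["parts"]
                )
            else:
                person["parts"] = (0.0, 0.0)
            population.append(person)
    return population


if __name__ == "__main__":
    for person in simulate_population(SAMPLE, seed=1)[:5]:
        print(person)
```

Because the steps run in sequence on the same person record, a survey without component variables could simply skip step 4, which mirrors the remark in the abstract that not all four steps are required.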


Author information

Stefan Kraft
Present address: Institute for Quantitative Asset Management (IQAM), Wollzeile 36–38, 1010, Vienna, Austria

Authors and Affiliations

Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstraße 7, 1040, Vienna, Austria
Andreas Alfons, Stefan Kraft, Matthias Templ & Peter Filzmoser
Methods Unit, Statistics Austria, Guglgasse 13, 1110, Vienna, Austria
Matthias Templ

Corresponding author

Correspondence to Andreas Alfons.

Additional information

This work was partly funded by the European Union (represented by the European Commission) within the 7th framework programme for research (Theme 8, Socio-Economic Sciences and Humanities, Project AMELI (Advanced Methodology for European Laeken Indicators), Grant Agreement No. 217322). Visit http://ameli.surveystatistics.net for more information on the project.

About this article

Cite this article

Alfons, A., Kraft, S., Templ, M. et al. Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat Methods Appl 20, 383–407 (2011). https://doi.org/10.1007/s10260-011-0163-2

Accepted: 07 February 2011
Published: 02 April 2011
Issue Date: August 2011
DOI: https://doi.org/10.1007/s10260-011-0163-2
