The policy relevance of social indicators has risen with the latest financial and economic crisis. They were awarded a prominent status in European politics with the European Commission’s Europe 2020 target for social inclusion (2010) and, before that, the Laeken indicator set on social inclusion (2001). The European Community Statistics on Income and Living Conditions (EU-SILC) are one of the pillars of social statistics in the European Statistical System and the most relevant household survey at the European level in the field of household income, living standards and poverty. Several indicators of social inclusion, amongst them the Europe 2020 target for the “risk of poverty or social exclusion”, are calculated annually on the basis of this source.
Being so highly recognized, those indicators are expected to fulfil high statistical standards concerning reliability, validity and comparability (both over time and between countries). The evaluation of measurement error in this context is therefore crucial. In our paper, we focus on the measurement of household income in EU-SILC and investigate differences between data collected using surveys and data collected from registers. For this purpose, we take advantage of the fact that for the Austrian EU-SILC of 2008–2011, both register- and survey-based income data are available for the same observational units.
Using the differences in these measurements for households at the micro level, we aim to provide explanations for changes in different income-based poverty indicators by investigating the underlying changes in the distribution of household income as a consequence of using register data. First, by estimating multinomial logit and linear models with covariates referring to the income and employment structure, the interview situation (e.g. CATI vs. CAPI) and other household characteristics, we try to explain whether certain types of households tend to under- or over-report their household income when asked via the survey method. Second, we ask which component (income type, weighting) contributes most to the change in the poverty measurement if register data are used instead of survey data.
1.1 Differences Between Register and Survey Data, Measurement Error and Its Impact on Poverty
The identification of data errors requires by definition a point of reference against which to judge the accuracy of the information. In most cases, administrative data are proclaimed to be the benchmark. Bound et al. (2001) distinguish between micro-level and macro-level validation studies for assessing measurement error. Micro-level validation studies usually define measurement error as the difference between the value recorded in administrative records and the value observed in the survey. Macro-level validation studies, in contrast, compare population parameters, such as income inequality or the sum of earnings, derived from the survey to official reports based on administrative records or to estimates obtained from a comparable survey. Existing studies on measurement error mostly cover the US population and focus on personal or market income.
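The distinction between the two validation approaches can be sketched with synthetic data. Everything below (the income distributions, the error process, the sample size) is invented purely to illustrate the contrast between unit-level differences and comparisons of population parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical register ("true") incomes, and survey reports distorted by
# mean-reverting noise: low incomes inflated, high incomes shrunk.
# All numbers here are invented for illustration.
register = rng.lognormal(mean=10, sigma=0.5, size=1_000)
survey = (register
          * np.exp(rng.normal(0, 0.2, size=1_000))
          * (register / np.median(register)) ** -0.1)

# Micro-level validation: measurement error per observational unit.
micro_error = survey - register

def gini(x):
    """Gini coefficient via cumulative shares of the sorted values."""
    x = np.sort(x)
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

# Macro-level validation: compare population parameters instead.
print(f"mean error (survey - register): {micro_error.mean():.1f}")
print(f"Gini, survey:   {gini(survey):.4f}")
print(f"Gini, register: {gini(register):.4f}")
```

Micro-level validation keeps the unit-by-unit differences (and can model them against covariates), while macro-level validation would only look at aggregate statistics such as the two Gini coefficients.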
Mean-reverting errors, with low earnings inflated and high earnings underreported, are a common finding in such studies (Bound et al. 2001; Gottschalk and Huynh 2010; Kim and Tamborini 2012, 2014). Income volatility and income structure also matter: based on a survey sample for a developing country, Akee (2011) found that prior earnings volatility strongly affects measurement error in the current period. Moreover, there is evidence for a positive correlation between measurement error and the number of different income sources in the household (Moore et al. 2000).
Besides income-related variables, studies have also shown the importance of survey duration and survey mode. In longitudinal studies, panel participants’ responses may increasingly diverge from their initial responses to the same survey questions due to learning effects in answering a complex questionnaire and/or an improved personal relationship between respondent and (the same) interviewer (Sikkel and Hoogendoorn 2008; Chadi 2013). Such effects have been found for questions on life satisfaction (Frick et al. 2006; Landua 1992) and for subjective mental health (Wooden and Li 2014). For income, however, longer participation in a panel does not necessarily result in higher accuracy. Measurement errors for income are usually found to be positively serially correlated in such studies for the US population (Pischke 1995; Bound and Krueger 1991). Whether this also applies to representative survey data for a population sample in a European country is investigated in our paper.
Mode effects refer to the type of interaction between interviewer and interviewee. Existing studies focus on differences between CATI and CAPI and on the relevance of proxy interviews for income measurement error. The literature has shown that respondents to CATI are more likely to give socially desirable responses (Beland and St-Pierre 2007; Groves et al. 2009; Holbrook et al. 2003). A study for Austria found that telephone interviews lead to a larger downward bias concerning income inequality (Fessler et al. 2013). For proxy interviews, however, a more ambivalent picture emerges (Brown et al. 2001; Tourangeau et al. 2000). On the one hand, proxy interviews may enhance data quality because there is less social desirability pressure and thus a lower likelihood of mean-reverting errors. On the other hand, the income of other household members can easily be overlooked due to recall error or interview fatigue. Some (mostly older) studies have found only little proxy bias in earnings (Bound and Krueger 1991), whereas more recent studies show that proxy interviews bias earnings downwards (Reynolds and Wenger 2012) and that their effect also interacts with demographic variables (Tamborini and Kim 2013).
Furthermore, differences in income inequality are observed when survey and register data are compared: Gottschalk and Huynh (2010) discuss the implications of measurement error in surveys for earnings income inequality. By matching the US Survey of Income and Program Participation to tax data, they find that income inequality is 20% higher in the register data. Based on a random sample from the Danish population, Kreiner et al. (2013) compare a one-shot recall question on total personal income (employment income, pension income, social transfers) with the corresponding tax records of the respondents. The authors find a lower mean and a larger spread for the survey measure.
A smaller number of studies are concerned with total household income and the consequences of using register data instead of survey data for the calculation of household income and poverty indicators. The studies available also differ in their validation methodology. In sum, the measurement error of income has been shown to affect cross-sectional poverty rates (Nordberg et al. 2001; Figari et al. 2012), poverty dynamics (Rendtel et al. 1998; McGarry 1995; Breen and Moisio 2004; Worts et al. 2010) and statistical relationships of poverty indicators with other variables (Lohmann 2011). Nordberg et al. (2001) found that income estimates derived from administrative records are quite reliable and generally higher than surveyed income, except for very low register incomes. They interpreted the differences observed as being mainly due to measurement errors in the interview data. Their results showed that survey data produced higher inequality and poverty estimates than register data. Lohmann (2011) makes use of between-country differences concerning the data sources for income variables (register or survey data). Results show that the degree of consistency between earnings and employment status (i.e. no earnings reported if the status is non-working) is on average lower in register countries; this also affects the poverty rate conditional on activity status in some countries. The author concludes that the relationship between employment status and poverty status also depends on the data collection approach used. Figari et al. (2012) compare estimates of the income distribution and poverty rates based on microsimulation methods with observed survey-data-based estimates. The authors simulate incomes in accordance with prevailing rules on liability and eligibility in four European countries. On the one hand, their results show that poverty rates, defined as the share of people with equivalised incomes below 60% of the national median, are slightly higher when based on reported data than when based on simulated incomes. On the other hand, there was an overlap of 75% between the two approaches.
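The at-risk-of-poverty definition used throughout this literature (equivalised income below 60% of the national median) can be computed in a few lines. The sketch below applies the OECD-modified equivalence scale used in EU-SILC to a handful of invented households; the income figures and household compositions are purely illustrative.

```python
import pandas as pd

# Invented households; the weights follow the OECD-modified equivalence
# scale used for EU-SILC: 1.0 for the first adult, 0.5 for each further
# household member aged 14+, 0.3 for each child under 14.
hh = pd.DataFrame({
    "income":   [28_000, 9_000, 55_000, 17_500, 41_000],
    "adults":   [2, 1, 2, 1, 2],
    "children": [1, 0, 2, 1, 0],
})
hh["eq_size"] = 1.0 + 0.5 * (hh["adults"] - 1) + 0.3 * hh["children"]
hh["eq_income"] = hh["income"] / hh["eq_size"]

# Every person counts with their household's equivalised income, so each
# household's value is repeated once per member.
persons = hh["adults"] + hh["children"]
eq = hh["eq_income"].repeat(persons)

threshold = 0.6 * eq.median()   # 60 % of the national median
arop = (eq < threshold).mean()  # share of persons at risk of poverty
print(f"threshold: {threshold:.0f}, AROP rate: {arop:.3f}")
```

Because the threshold is a relative one, any systematic shift in measured incomes moves both the median and the cut-off, which is why register use can change income levels substantially while leaving the poverty rate almost unchanged.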
1.2 Characteristics of Register Data and Implementation in EU-SILC
Assuming register data to be a less error-prone source for validating survey questions on income, however, may not always be justified (Abowd and Stinson 2013; Kapteyn and Ypma 2007) and depends on the context of data production. Since administrative registers are not initially built to answer specific research questions, they should not be expected to provide perfect statistical data (United Nations 2007; Wallgren and Wallgren 2007; Zhang 2011). Abowd and Stinson (2013) identify three potential causes for deviations from survey data which must not be confounded with different levels of measurement error: (a) definitional differences between survey and register data, such as taxable income relevant for a wage-tax register versus actual disposable income from a standard-of-living perspective; (b) errors in the administrative data itself (e.g. coverage issues and updating intervals); and (c) mistakes in the matching process of multiple data sources.
In the European Statistical System, common definitions and methods have been agreed upon in order to facilitate the comparability of poverty indicators and income between countries. EU-SILC comprises several variables on personal and household income components and is conducted in all 28 member countries plus several more. In cooperation with the National Statistical Institutes (NSI), Eurostat aims to maximize the comparability of indicators across the participating countries through the output harmonization of variables (i.e. providing/developing explicit conceptual definitions of what to measure, namely so-called “target variables”, as opposed to specifications of how to measure them) and agreements on various methodological aspects such as sampling, weighting and precision requirements. However, whereas detailed rules for the content of variables and the construction of those indicators exist, the source of income data, amongst other parameters, is up to the Member States. As a consequence, some countries mainly use official registers, whereas others mostly (have to) rely on survey data to fill specific income components. Thus, the heterogeneity of the data sources is something of an obstacle to their comparability, though it may lead to a good overall level of data quality in the outcome indicators.
When EU-SILC started in 2004, only a few countries were using registers; but nowadays ever more Member States are taking the step towards integrating register data into their SILC data collections. Studies investigating the impact of register use on measurement error are therefore vital. Törmälehto (2013) draws four main conclusions for the context of EU-SILC:
1. Integrating register data in a data collection may affect multiple phases of a survey process: sampling and weighting (as new information from registers can be used, e.g., to design the sample), non-response analysis, calibration of weights, survey design (as the potential for dropping questions from the survey may alter the whole “flow”), processing and quality control, imputation, dissemination and documentation.
2. It is challenging to generalise about the quality of registers in a cross-national context.
3. There is a lot of variation concerning the particular data sources for specific variables. Register data may originate from survey-like data collections (e.g. self-administered questionnaires) but also from entirely electronic exchanges of administrative data.
4. The combined use of survey and register data affects the total survey error (Groves et al. 2009) and expands the traditional survey error sources to those related to registers. To explain this, Törmälehto (2013) also cites Zhang (2011), who proposed an extension to Groves’ Total Survey Error model. While Groves’ ideas were designed for the context of (sampled) survey data, Zhang (2011) further develops and applies them to error sources associated with register data (e.g. problems of conceptualization, measurement and accuracy). Zhang proposes a “two-phase life cycle of integrated statistical micro data”, where the first phase concerns the data from each single source and the second the integration of data from different sources. Register data could be used as a benchmark against which survey data are compared to estimate the magnitude and predictors of measurement error in a given country. However, it should not be assumed that register data themselves are free from (other) sources of error, or that the combination of register and survey data leads to perfect statistical data; on the contrary: “At the present stage, there is still clearly a lack of statistical theories for assessing the quality of such register-based statistics.” (Zhang 2011: 446). In sum, there may be more sources of error when using register data, but the expectation is usually a lower total error due to fewer measurement errors.
1.3 Effects of Register Data Use in EU-SILC: A Comparative Perspective
Some countries have a longer history of register data use than others, mostly for legal and administrative reasons: Denmark, Finland, Iceland, the Netherlands, Norway, Sweden and Slovenia are those that started with administrative data in SILC right away (i.e. from 2004/05). Then there are those that joined more recently: Italy gradually since 2004, France since 2008, Austria since 2012, and Spain since 2013. Although the “old” register countries encountered the same challenges, we focus on those countries that have made the transition from survey income in more recent years and therefore have SILC waves with different income sources to compare.
In Spain, the new methodology, in which register and survey income information is combined, is considered a more comprehensive method of collecting income in both the lower and the higher parts of the distribution (Méndez 2015). Income levels are significantly higher than under the survey approach, but inequality indicators, such as the risk of poverty, remain similar. Similarly, the French experience (Burricand 2013) showed that the change in methodology did not have a significant impact on the poverty rate, while other inequality indicators increased. Differences between the two income sources, registers versus surveys, were more pronounced at the extremes of the distribution than in the mid-range, and for some income components (pensions) more than for others (wages). In Italy (Consolini and Donatiello 2013), the inclusion of register data produced a substantial increase in the estimated average income of the self-employed, while the increase for employees was less pronounced. At the same time, the use of a mixed data-collection strategy instead of survey data alone resulted in a substantial decrease in the risk of poverty and the Gini coefficient. Only about half of all persons were at risk of poverty according to both methodologies; the others had a different status under each methodology.
1.4 Conclusions Drawn from the Literature Review
To sum up, the prevailing literature in the field highlights the following problems related to measurement error in income: (a) errors explained by data collection methods (e.g. type of question—yearly vs. current, simple vs. complex; source for income variables—register or survey or any combination of both); (b) problems caused by the panel design that are relevant for measuring poverty dynamics correctly; and (c) challenges concerning cross-country comparisons. Furthermore, two main conclusions can be derived from existing SILC studies and similar surveys: (1) the effect of register data is generally more visible in the lower and upper extremes of the household income distribution and varies for different income components; (2) the effect of register data use on income inequality and poverty indicators varies between countries.
We add to the literature in several respects. All of the studies on household income and poverty indicators discussed above use either microsimulation or some variant of Markov modelling to capture measurement error. In this paper, survey data are directly validated against register data at the micro level. Moreover, the consequences of the observed deviations for estimating poverty indicators are investigated. The focus of the analysis lies on equivalised household income. Additionally, the concerns raised in studies where consent from sampled individuals is necessary to link a survey with register data, as in the US, do not apply in our case: giving or withholding consent introduces a further burden and potential bias, but seeking respondents’ consent is not legally required in Austria for a voluntary survey like SILC. This allows for a more complete comparison of survey and register data across the income distribution and socio-demographic groups. This article thus aims to pave the way to a better understanding of the specifics for Austria and to contribute to a more comprehensive picture in EU-SILC and other large European surveys.
The remainder of the article is structured as follows: Sect. 2 describes the development and specifics of register use for SILC in Austria as well as the context of the re-calculation of household incomes for 2008–2011. It then goes on to illustrate the household income concept and its components. Section 3 describes how the analysis addresses the main research questions. The fourth section is divided into two parts. The first part illustrates the effects of the data switch on the aggregate poverty rate and the underlying statistics of the distribution of household income. In the second part, the results of both cross-sectional (for 2010) and longitudinal (2008–2011) regression models for the observed income differences are discussed. The robustness of the main results is further evaluated against different model specifications and statistical tests. The outcomes of these tests are summarized in the fifth section. Section 6 concludes and describes the limitations of the current study along with suggestions for further research.