3.1 Data set and design
The empirical analysis is based on longitudinal data from the DAB panel study (Becker et al. 2020a). The aim of the panel study is the mechanism-based investigation of the educational and occupational trajectories of youth born around 1997 and living in German-speaking cantons of Switzerland. The sample of this target population is random. The target population of the DAB study consists of eighth-graders of the 2011/12 school year who were enrolled in regular classes in public schools. The panel data are based on a random and 10% stratified gross sample of 296 school classes, out of a total universe of 3045 classes. A disproportional sampling of school classes from different school types, as well as a proportional sampling of school classes regarding the share of migrants within schools, was applied. At school level, a simple random sample of school classes was chosen. The initial probability sampling is based on data obtained from the Swiss Federal Statistical Office (Glauser 2015).
The panel study started in 2012. In the first three surveys, the target persons were interviewed in the context of their school class via online questionnaire. After that, they left compulsory school, and since summer 2013 they have had to be followed up individually. Therefore, a sequential mixed-mode design was established, including the tailored design method (TDM) suggested by Dillman et al. (2014). Since the fourth wave, the eligible panellists have been pushed towards the web-based online mode by a personalised advance invitation letter, including an incentive, sent by regular postal mail (Becker et al. 2019). Using the fast option offered by Swiss Post, the A-post, it is guaranteed that eligible target persons will receive this letter the next day. They are informed that the panel study is financed by the State Secretariat for Education, Research and Innovation (SERI), a governmental agency, and that it is conducted by a team of researchers at a cantonal university. One day later, they receive the clickable URL and password to log on to the website by email. If they do not start to complete the questionnaire after some days, they get personalised reminders. About two weeks after survey launch, nonrespondents are invited to take part in a computer-assisted telephone interview (CATI). If they do not react to call attempts and reminders, a traditional paper-and-pencil survey is offered as a final mode.
Eight surveys have been realised. For the current research issue, analyses are focused on the field period of the two most recent waves, conducted in May/June 2018 and 2020. In both waves, the eligible panellists received an unconditionally prepaid monetary incentive enclosed in the invitation letter (Becker et al. 2020a). To test the hypotheses, paradata from the first mode (the web-based online survey) are used, providing accurate time references of field periods and individuals’ survey participation. In the online mode, the personalised reminders were sent about 4, 7, and 10 days after the survey launch (i.e. at three-day intervals). The first reminder was a text sent by SMS; the second was an SMS or email; and the third was an email. After about 12 days, the nonrespondents were informed about a contact for the CATI. For each contact, the exact time and status references were documented. After three call attempts, the nonrespondents got a reminder via SMS. Three weeks after survey launch, they got an email reminding them to take part in the CATI. The total field period lasted 40 days for the seventh wave and 52 days for the eighth wave.
3.2 Dependent and independent variables
There are two dependent variables. The first is the respondents’ likelihood of taking part in the survey at any point during the field period. This distinguishes between participation in the online and CATI modes. The second variable is the respondents’ likelihood of receiving an (electronic) reminder during the field period across both survey modes.
For the independent variables, different analytical levels are considered. At the macro level, the weather situation and the regional opportunity structure are taken into account. The weather situation is measured using daily time series delivered by the Federal Office of Meteorology and Climatology (2020). These series cover weather characteristics during the field periods, such as average air temperature by day (in degrees centigrade), relative humidity (daily average percentage), rainfall (daily average in millimetres), duration of sunshine (in hours per day), and barometric pressure (in hectopascals); the weather measure is extracted from these characteristics by confirmatory factor analysis (Becker 2021). The opportunity structure of the region in which the panellists live is measured by macro data from the Swiss Federal Statistical Office and reflects the principle of small, partially cross-cantonal labour market areas with functional orientation towards centred and peripheral opportunities and living standards, in addition to urbanicity, population density, and lack of social cohesion (Glauser and Becker 2016: 20).
At the meso level of survey characteristics, the number of panel waves is considered as a dummy variable. This is also true for the number and different types of reminders delivered to the target persons.
At the micro level of target units, first of all, their social origin is indicated by their parents’ social class. This is measured by the class scheme suggested by Erikson and Goldthorpe (1992). Second, their enrolment in secondary school until the end of compulsory schooling is taken into account. This distinguishes between different educational levels, considering requirements such as basic, extended, and advanced levels relevant for the ability to read and complete a questionnaire. Furthermore, the panellists’ gender, as well as their language proficiency, measured by the standardised grade point average in German, which correlates with their reading literacy, is taken into account. Finally, panellists’ personality traits, such as persistence, control beliefs, and decisiveness, are taken into account, since these characteristics are seen as significant for willingness to take part in a social-scientific survey (Saßenroth 2013), as well as for the choice of the survey mode offered to them.
3.3 Statistical procedures
Since participation in push-to-web surveys or CATI is modelled as a time-dependent stochastic process of an individual’s decision on participation and selection of survey mode, which could occur at each of the points in time across the field period, event history analysis is applied to reveal causal endogenous and exogenous factors influencing the likelihood and timing of survey participation, as well as the effect of reminders on survey participation (Tourangeau et al. 2013: 38). By considering time-varying covariates in an event-oriented design, it is possible to reveal the causalities of this stochastic process (Rohwer and Blossfeld 1997; Pötter and Blossfeld 2001). In regard to statistical analysis, event history analysis provides techniques and procedures to take these theoretical and methodological premises into account (Blossfeld et al. 2019: 1–40).
For the longitudinal analysis, different procedures of event history analysis are utilised to analyse the time until events of interest (such as survey participation, selection of one of the survey modes, or receiving a reminder) occur within the field period. However, due to the sequential mixed-mode design, particularities of the timing of events have to be considered. In the sequential mixed-mode design, access to the online mode is possible for each of the invitees during the complete field period. As mentioned above, the nonrespondents among them are asked, about two weeks after survey launch, to take part in the CATI mode. This means there is then a competing risk of taking part in one of the two offered modes, which are mutually exclusive during an overlapping risk period. A competing risk is an event, such as participation in one of the two survey modes, that either hinders the occurrence of the primary event of interest (e.g. participation in the online survey instead of CATI) or that modifies the chance that this event (e.g. participation in CATI) occurs (Noordzij et al. 2013: 2670). For example, when analysing participation in the online survey (initial mode) towards which potential respondents are pushed at survey launch, inviting nonrespondents to take part in the CATI (subsequent mode) about two weeks after the survey launch is an event that competes with the acceptance of the initial mode as the primary event of research interest. When an eligible panellist chooses one mode or another, the unchosen mode cannot be realised at another point in time, due to censoring. However, panellists who have not started completing the online questionnaire have the “chance” to take part in the CATI or online mode at a point in time that is convenient for them. According to Schwartz (2009), due to the burden of an additional option, they could be spoilt for choice. In this case, it is likely that they will not take part in the survey. Another possible outcome is that the individual’s preference for CATI hinders them in starting the online questionnaire. Unintended by the researchers, it could occur that offering the CATI mode pushes nonrespondents towards the online mode. Finally, survey participation could be postponed due to indifference or choice overload.
According to Schuster et al. (2020), estimations could be biased systematically when competing events, i.e. two or more cause-specific hazards (Kalbfleisch and Prentice 2002), are ignored in the analysis of survival data. Against the background of competing risk (the potentially simultaneous occurrence of mutually exclusive events, such as participation in the online mode versus the CATI mode, in overlapping stages of the field period), the traditional survival analysis (i.e. Kaplan–Meier product-limit estimations) is inadequate for describing the timing and likelihood of panellists’ survey participation. The assumption of standard survival analysis, namely that the censoring of events is independent, is not valid in this case. Therefore, the Kaplan–Meier estimator is biased since the probability of the event of primary interest is overestimated (Noordzij et al. 2013: 2672). The overestimation of probabilities increases with risk time. Therefore, alternative nonparametric procedures of competing risk analysis (the cumulative incidence competing risk method) are used to describe the panellists’ participation patterns across the field period. Since Kaplan–Meier plots are biased in the presence of competing risks, the cause-specific cumulative incidence function (CIF), which is the probability of survey participation before the end of field period \(t\), is estimated to reveal the risk of choosing one of the competing survey modes (Lambert 2017). The CIF describes the incidence of the occurrence of an event while taking competing risks into account (Austin and Fine 2017: 4293).
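The difference between the naive Kaplan–Meier estimate and the cumulative incidence can be illustrated with a minimal Python sketch. It is not the estimation routine used in the study: the function name, the coding of causes (0 = censored, 1 = online, 2 = CATI), and the eight toy observations are assumptions made purely for illustration. The CIF is accumulated as the probability of having experienced no event of any type just before day \(t\), multiplied by the share of panellists at risk who take part in the mode of interest on day \(t\).

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Compare a naive Kaplan-Meier estimate with the cumulative incidence.

    times  -- day of participation or censoring for each panellist
    causes -- 0 = censored, 1 = online, 2 = CATI (hypothetical coding)
    Returns (naive 1 - KM, CIF, overall survival) at the end of follow-up.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    overall_surv = 1.0  # probability of no event of any type so far
    km_surv = 1.0       # naive KM: competing events treated as censored
    cif = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for (u, c) in data if u == t]  # all observations on day t
        d_interest = sum(1 for c in tied if c == cause_of_interest)
        d_any = sum(1 for c in tied if c != 0)
        # CIF increment: event-free until just before t, then event of interest
        cif += overall_surv * d_interest / n_at_risk
        overall_surv *= 1 - d_any / n_at_risk
        km_surv *= 1 - d_interest / n_at_risk
        n_at_risk -= len(tied)
        i += len(tied)
    return 1 - km_surv, cif, overall_surv


# Toy data (invented): eight panellists, days until participation or censoring
times = [1, 2, 3, 4, 5, 6, 7, 8]
causes = [1, 2, 1, 2, 0, 1, 2, 1]

naive_online, cif_online, surv = cumulative_incidence(times, causes, 1)
_, cif_cati, _ = cumulative_incidence(times, causes, 2)
print(naive_online, cif_online, cif_cati, surv)
```

In this toy example the naive estimate for the online mode exceeds the cumulative incidence, and the two cumulative incidences plus the overall survival probability sum to one, which the mode-specific Kaplan–Meier complements do not guarantee.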
In addition, piecewise constant analysis is used to describe the hazard rates for receiving reminders, as well as panellists’ survey participation, to reveal the effect of reminders on panellists’ reactions at each point in time during the field periods. According to Blossfeld et al. (2019: 124), the basic idea of this procedure is to split the time axis into time periods (e.g. on a daily basis) and to assume that transition rates are constant in each of these intervals but can change between them. Using this procedure, it is possible to describe the occurrence of (competing) events in different phases of the field period. Given theoretically defined time periods, the transition rate for survey participation is defined as follows: \(r_{k}\left( t \right) = \exp\left\{ \overline{\alpha}_{l}^{\left( k \right)} + A^{\left( k \right)} \alpha^{\left( k \right)} \right\}\) if \(t \in I_{l}\), whereby \(k\) is the destination, \(I_{l}\) is the \(l\)th time interval, \(\overline{\alpha}_{l}^{\left( k \right)}\) is a constant coefficient associated with the \(l\)th time period, \(A^{\left( k \right)}\) is a vector of covariates, and \(\alpha^{\left( k \right)}\) is an associated vector of coefficients assumed not to vary across time (Blossfeld et al. 2019: 125). This model is estimated without any covariates, since crude hazard rates should be estimated across the time interval of the field periods.
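Without covariates, the maximum likelihood estimate of a piecewise constant rate in each interval reduces to the number of events in that interval divided by the time at risk spent in it. The following Python sketch illustrates this under stated assumptions: the function name, the toy durations, and the cut points (chosen here to mirror the reminder days 4, 7, and 10) are hypothetical and do not reproduce the study's estimation.

```python
def piecewise_constant_rates(durations, events, cut_points):
    """Crude hazard rate per interval: events / exposure time.

    durations  -- days until participation or censoring
    events     -- 1 = participation observed, 0 = censored
    cut_points -- ascending interval starts; the last interval is open-ended
    """
    n = len(cut_points)
    exposure = [0.0] * n
    counts = [0] * n
    for dur, ev in zip(durations, events):
        for l, start in enumerate(cut_points):
            end = cut_points[l + 1] if l + 1 < n else float("inf")
            if dur <= start:
                break  # spell ended before this interval began
            exposure[l] += min(dur, end) - start
            if ev and dur <= end:
                counts[l] += 1  # event falls into interval (start, end]
    return [c / e if e > 0 else float("nan") for c, e in zip(counts, exposure)]


# Toy data (invented): eight panellists
durations = [2, 3, 5, 6, 8, 9, 12, 12]
events = [1, 1, 1, 0, 1, 1, 1, 0]
cut_points = [0, 4, 7, 10]  # hypothetical periods between reminders

rates = piecewise_constant_rates(durations, events, cut_points)
print(rates)
```

Each returned value is a daily participation rate for one period, so a jump between adjacent periods would indicate a change in participation behaviour around the corresponding cut point.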
Furthermore, parametric regression procedures are used to estimate the impact of independent variables on the likelihood of events of interest. The hazard rate \(r\left( t \right)\) is defined as the limit, as \(\Delta t\) approaches zero, of the conditional probability of such an event occurring (namely the instantaneous rate for survey participation or receiving a reminder) in the time interval \(\left( {t, t + \Delta t} \right)\), given that this event has not occurred before time \(t\), relative to the length of this interval (Blossfeld et al. 2019: 29). First of all, for single events, such as survey participation, or repeated events, such as receiving reminders across the field period, the hazard rate is estimated on the basis of an exponential model: \(r\left( {t|x\left( t \right)} \right) = \exp \left( {\beta^{\prime } x\left( t \right)} \right)\), whereby \(x\left( t \right)\) is the time-dependent vector of exogenous variables whose unknown coefficients \(\beta\) have to be estimated. To account for time-varying covariates, the technique of episode splitting is used: i.e. the initial waiting time is split into sub-episodes on a daily basis. For each of these sub-episodes, a constant hazard rate is assumed. By applying this procedure, it is possible to model step functions displaying the empirically observed hazard function for the entire process until participation or getting a reminder.
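Episode splitting can be sketched in a few lines of Python. The example is a simplified illustration, not the study's procedure: the function names, the single binary covariate `reminded`, the reminder day, and the five toy spells are all assumptions. Each spell is cut into daily sub-episodes; only the last sub-episode can carry the event, and the covariate is re-evaluated for every day. For an exponential model with one binary covariate, the maximum likelihood estimates have a closed form (events divided by exposure within each covariate state), which the sketch uses instead of a numerical optimiser.

```python
import math


def split_episode(person_id, duration, event, covariates_on_day):
    """Split one spell (0, duration] into daily sub-episodes.

    covariates_on_day -- function mapping a day index to a dict of
                         covariate values valid for (day, day + 1]
    """
    rows = []
    for day in range(duration):
        row = {
            "id": person_id,
            "start": day,
            "stop": day + 1,
            # only the final sub-episode can end with the event
            "event": int(bool(event) and day + 1 == duration),
        }
        row.update(covariates_on_day(day))
        rows.append(row)
    return rows


def rate(rows):
    """Exponential-model rate: number of events divided by exposure time."""
    exposure = sum(r["stop"] - r["start"] for r in rows)
    return sum(r["event"] for r in rows) / exposure


REMINDER_DAY = 4  # hypothetical day on which a reminder is sent


def reminded(day):
    return {"reminded": int(day >= REMINDER_DAY)}


# Toy spells (invented): (id, days until participation or censoring, event)
spells = [(1, 5, 1), (2, 3, 1), (3, 8, 0), (4, 6, 1), (5, 2, 1)]
rows = [r for pid, dur, ev in spells
        for r in split_episode(pid, dur, ev, reminded)]

rate_before = rate([r for r in rows if r["reminded"] == 0])
rate_after = rate([r for r in rows if r["reminded"] == 1])
beta = math.log(rate_after / rate_before)  # coefficient of 'reminded'
print(len(rows), rate_before, rate_after, beta)
```

The resulting person-day file is the kind of data structure on which a time-varying covariate, such as having received a reminder, can enter the exponential model.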
In the case of competing risks in terms of participation in the online survey versus the CATI, the exponential model (including the episode splitting) is equivalent to the proportional cause-specific hazards model suggested by Kalbfleisch and Prentice (2002). According to Schuster et al. (2020: 44), the “cause-specific hazard denotes the instantaneous rate of occurrence of the event of interest in a setting in which subjects can also experience the competing event”. Since this hazard is estimated by removing individuals from the risk set the moment they experience the competing event, meaning that competing events are treated as censored observations, it is possible to estimate the cause-specific hazard using an exponential model in which all events other than the event of interest are treated as censoring. Schuster et al. (2020: 44) suggest interpreting these hazard ratios “among subjects who did not (yet) experience the event of interest or a competing event. As the cause-specific hazard is directly quantified among subjects that are actually at risk of developing the event of interest, the cause-specific hazard model is considered more appropriate for etiological research.” This is realised by calculating estimations of survey participation separately for the different modes. However, according to Lunn and McNeil (1995: 524), these methods have the drawback “that [they do] not treat the different types of failures jointly, complicating the comparison of parameter estimates corresponding to different failure types”.
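Treating competing events as censored can be made concrete with a short Python sketch of a covariate-free exponential model. The function name, the cause coding (0 = no participation, 1 = online, 2 = CATI), and the toy data are assumptions for illustration only; as a further simplification, all panellists are treated as being at risk of both modes from day 0, whereas in the design described above the CATI is offered only about two weeks after launch.

```python
def cause_specific_rate(durations, causes, cause):
    """Exponential cause-specific hazard for one mode.

    Events of any other cause are counted as censored: they contribute
    their time at risk to the denominator but no event to the numerator.
    """
    events = sum(1 for c in causes if c == cause)
    exposure = sum(durations)
    return events / exposure


# Toy data (invented): days until participation or end of field period
durations = [3, 5, 16, 20, 40, 18, 7, 40]
causes = [1, 1, 2, 2, 0, 1, 1, 0]

rate_online = cause_specific_rate(durations, causes, 1)
rate_cati = cause_specific_rate(durations, causes, 2)
rate_any = sum(1 for c in causes if c != 0) / sum(durations)
print(rate_online, rate_cati, rate_any)
```

The two cause-specific rates add up to the overall participation rate, which reflects that each mode-specific model is estimated separately on the same risk set.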
Another approach, the subdistribution hazards approach by Fine and Gray (1999), is often seen as the most appropriate method to use for analysing competing risks. In contrast to the cause-specific hazards model, “subjects who experience a competing event remain in the risk set (instead of being censored), although they are in fact no longer at risk of the event of interest” (Noordzij et al. 2013: 2673). This precondition is necessary to establish the direct link between the covariates and the CIF to predict the hazard ratios. However, it makes the hazard ratios difficult to interpret in a straightforward way, and the approach is therefore not appropriate for etiological research (Schuster et al. 2020: 44). By taking competing risks into account, the coefficients estimated by the stcrreg module implemented in the statistical package Stata can be used to compute the cumulative incidence of participation in one of the survey modes, and to depict it in a CIF plot. In sum, the “cause-specific hazard model estimates the effect of covariates on the cause-specific hazard functions, while the Fine-Gray subdistribution hazard model estimates the effect of covariates on the subdistribution hazard function” (Austin and Fine 2017: 4393).
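The different risk-set definitions of the two approaches can be compared in a small Python sketch. It is an illustration under strong assumptions, not a reimplementation of stcrreg: the function name, the cause coding (0 = censored, 1 = online, 2 = CATI), and the toy data are hypothetical, and censoring is assumed to occur only at the end of the field period, so the censoring weights that the Fine and Gray approach applies in the general case are ignored.

```python
def risk_set_sizes(times, causes, cause, t):
    """Number of panellists in the risk set on day t under both definitions.

    Cause-specific: everyone without any event or censoring before t.
    Subdistribution: additionally keeps those who already experienced a
    competing event before t (simplification: no censoring weights).
    """
    cause_specific = sum(1 for u in times if u >= t)
    subdistribution = sum(
        1 for u, c in zip(times, causes)
        if u >= t or c not in (0, cause)
    )
    return cause_specific, subdistribution


# Toy data (invented): days until participation or end of field period
times = [3, 5, 16, 20, 40, 18, 7, 40]
causes = [1, 1, 2, 2, 0, 1, 1, 0]

cs_size, sd_size = risk_set_sizes(times, causes, cause=1, t=19)
print(cs_size, sd_size)
```

On day 19 of this toy example, the panellist who took part in the CATI on day 16 is still counted in the subdistribution risk set for online participation but not in the cause-specific one.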