Published in: Transportation 6/2023

Open Access 13-11-2022

Response willingness in consecutive travel surveys: an investigation based on the National Household Travel Survey using a sample selection model

Authors: Xinyi Wang, F. Atiyya Shaw, Patricia L. Mokhtarian, Kari E. Watkins



Abstract

Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results verify the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarize six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.
Notes

Supplementary Information

The online version contains supplementary material available at https://​doi.​org/​10.​1007/​s11116-022-10312-w.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

High-quality survey data provide the foundation for research and policymaking across many fields. While novel data sources are actively being examined for use in transport applications, both currently and for the foreseeable future traditional travel surveys will continue to play an irreplaceable role in providing critical data for use in travel demand modeling, regional planning, and policymaking. However, survey response rates are in continuous and significant decline, thus requiring increased efforts toward respondent recruitment. Further necessitating these increased efforts is the fact that low response rates and their accompanying nonresponse biases can threaten the validity of survey data, and thus contingent research findings (National Research Council 2013).
Survey teams have employed a range of efforts aimed at increasing response rates and improving survey data quality. Among the most common tools are the use of passive datasets such as GPS records (Bohte and Maat 2009) and targeted marketing data (Shaw et al. 2021), novel survey formats (e.g., interactive surveys; Collins et al. 2012), and targeted sampling frames (e.g., online panels; Circella et al. 2016), to name a few. Another approach, which is the focus of this paper, is to recruit survey respondents who had expressed willingness to be contacted again in a previous survey; this approach has been shown to produce a significantly higher response rate and lower cost per valid response relative to random sampling (Amarov and Rendtel 2013; Kim et al. 2019; Circella et al. 2020).
This recruitment method has some similarities to the approach used in panel studies in that both nominally draw respondents from preceding surveys. Accordingly, both approaches are subject to attrition biases. There are some important differences, however. For one thing, respondents to a panel study are normally informed at the outset that participation in the study involves completing multiple surveys (and therefore that agreement to participate signifies agreement to complete multiple surveys), whereas in the present case, the willingness to complete a later survey is an entirely separate decision, not even presented to the respondent at the entrance to the initial study. Other differences reside in the survey purpose, contents, or outcome. Specifically, panel surveys focus on repeated observations on a set of variables for the same sample unit over time (Lavrakas 2008), which allows the tracking of specific variables or study interests. In contrast, recruiting respondents from a previous survey is not a periodical behavior, and the follow-up survey may have relatively little in common with the initial one. The use of this recruitment method: (1) increases the survey response rates obtained on follow-up surveys; (2) reduces the financial burden for local transportation agencies and researchers; and (3) facilitates the expansion of the variable set of the preceding survey and enables data fusion across datasets (Shaw et al. 2022). In view of the plethora of single cross-section surveys and the challenges of conducting panel studies (notably time and money, among others), using a prior cross-sectional survey to help recruit for the next one is certainly an attractive prospect.
However, in the transportation domain, this recruitment method has not been widely adopted nor carefully examined. A major potential drawback of recruiting respondents based on their willingness expressed in a preceding survey is the non-representativeness that may be inherent in that sample (Couper et al. 2007). Accordingly, the present study is interested in the following questions: (1) Who is more likely to respond to a follow-up survey? (2) How does recruiting respondents based on their willingness expressed in a preceding travel survey bias the follow-up survey sample? (3) In view of the importance (in sample size, geographic scope, and information value) of the National Household Travel Survey (NHTS), how helpful is it to use the NHTS in particular as the springboard for follow-on survey recruitment? Specifically, what survey sample could we expect if we recruited respondents from the 2017 NHTS respondents in different geographic regions in the U.S.?
To address the questions raised above and bridge the gap in the literature regarding recruiting survey respondents from a preceding travel survey, we do the following:
(1) We analyze the first-stage self-selection (willingness to participate in a follow-up survey) and second-stage non-response (actual response behavior) biases simultaneously for respondents recruited from a previous travel survey (the NHTS), using a probit with sample selection (PSS) model, which could remedy the model coefficient biases. We also propose several standardized PSS model performance measures to enable model comparisons.
(2) We apply the PSS model to a holdout sample to decompose biases (e.g., dataset bias, self-selection bias, non-response bias) accumulated along the way and further analyze the representativeness of the recruited survey respondents by comparing sample and population marginal distributions for various variables.
(3) We predict follow-up survey samples from different geographic regions in the U.S. as another PSS model application example, and check the model’s generalizability.
By understanding the dataset biases that can result when respondents are recruited from a preceding survey, researchers/practitioners can better assess the tradeoff between data quality and resource constraints associated with respondent recruitment. Moreover, understanding these biases would allow survey developers to adjust their invited sample, for example by oversampling underrepresented groups in the follow-up surveys. This work would, therefore, be especially useful for transportation professionals if the NHTS in particular retained the willingness question as a recurring item in future surveys, thereby allowing local agencies and researchers to recruit follow-up respondents from the NHTS sample efficiently. Even outside of the NHTS, the findings of this paper have general implications for researchers who recruit respondents from prior cross-sectional surveys.
The rest of the paper is organized as follows. We begin with the “Literature review” section. We describe the data source used in this study in the “Data description” section. The “Methodology” section introduces details of the probit with sample selection (PSS) model and summarizes six modified model performance measures. In the “Results” section, we present and analyze model results, including both model interpretations and performance measures. In the “PSS model validation and application” section, we apply the calibrated model to a holdout sample to decompose sample biases and predict follow-up survey participation in diverse geographic regions in the U.S. We close with a summary of findings in the “Conclusion” section.

Literature review

As mentioned, continuously declining survey response rates make it increasingly difficult for survey developers to obtain high-quality survey data with the same survey budgets as in the past. To enhance response rates, researchers and practitioners have developed and applied many approaches to aid the survey recruitment process.
We first summarize a few commonly used recruitment approaches and the accompanying sample biases. The use of survey incentives is an effective approach to increase survey response rates; examples of these include lotteries, tokens, and philanthropic donations (Edwards et al. 2002; Smith et al. 2020; Young et al. 2020). Coryn et al. (2020) found a lottery to be the most cost-effective incentive format, while Parsons and Manierre (2014) showed that unconditional incentives might exacerbate the overrepresentation of females among survey respondents. Using different survey modes (e.g., mail, phone, and web) is another way to increase response rates of specific population groups. For example, web surveys have (at least in the past) been found to generate a much lower response rate than mail surveys in general (Manfreda et al. 2008; Hardigan et al. 2012), but younger generations such as college students are more responsive to web surveys (Shih and Xitao 2008; Börkan 2010). However, the sample may retain biases associated with the sampling mode, i.e., a mode effect. In a survey aimed at college students, Carini et al. (2003) found that web survey respondents gave more favorable responses regarding computing and information technology than the paper survey respondents. Survey developers could also obtain higher response rates by carefully selecting the sampling frame (Wolf et al. 2005). In recent years, scholars have used commercially-operated online opinion panels, consisting of people who pre-register for survey participation in return for rewards (e.g., cash, vouchers), to reach out to survey respondents and enhance response rates (Neufeld and Mokhtarian 2012; Miller et al. 2020; Chauhan et al. 2021). Some companies that operate these online opinion panels allow quota sampling within the panelists to ensure a (more) representative sample regarding the selected control variables (usually sociodemographic variables). 
Still, this does not guarantee the representativeness of other variables. For example, a recent study by this team found that online opinion panel respondents have significantly lower life satisfaction than respondents recruited from other sources, even when controlling for socio-demographics (Wang et al. 2022).
Another approach, as previously detailed in the “Introduction”, entails the recruitment of survey respondents who indicated willingness to respond in prior surveys (e.g., Lin et al. 2011). As with the other recruitment approaches discussed, this method also results in unrepresentative samples. Couper et al. (2007) modeled internet users’ willingness to do an online survey and their subsequent follow-up response. They concluded that self-selected samples of internet users are not representative of the population with respect to demographic, financial, and health-related variables. In another example, Germany’s Federal Statistical Office developed an access panel (a pool of persons willing to take part in voluntary surveys) from a large-scale household survey. The access panel was then used as the sampling frame for multiple surveys, and was found to be unrepresentative by multiple teams. Specifically, Amarov and Rendtel (2013) explored the survey participation propensity of the access panel and identified self-selection biases existing in multiple variables, including age, household size, and item-nonresponse. An accompanying simulation experiment (Tobias et al. 2013) on the selection process of the access panel emphasizes the importance of constructing proper statistical models for the access panel recruitment to ensure the appropriate usage of this high-response-rate and low-cost recruitment method. Similarly, Adriaan and Jacco (2009) applied bivariate logistic regressions to analyze the selectivity of the nonresponse of an online panel, which was recruited using a three-stage process: participation in a first telephone interview, willingness to be recontacted, and final agreement to participate in the online panel. The authors found selection biases with regard to age, income, and personal computer ownership.
Although transportation studies on this topic are limited, some studies have examined the nonresponse bias in travel surveys, which could inform the analysis of self-selection biases in recruiting survey respondents from a preceding travel survey. Wittwer and Hubrich (2015) reached out to survey nonrespondents with an abbreviated survey, and found that age and household size differ significantly between main survey respondents and nonrespondents. de Haas et al. (2018) used information obtained from a screening survey and found that age, gender, and education influence people’s willingness to participate in a household travel survey panel. They also found that willingness to participate in a travel survey could modify model coefficients and slightly improve the fits of mode choice models.
This study aims to address the literature gap by examining the practice of recruiting respondents from the NHTS for a statewide travel survey, and constructing a proper statistical model for the recruitment process in the transportation context. We apply the probit with sample selection (PSS) model for analysis, which remedies the selection biases by allowing correlations between the unobservables in the selection and outcome equations (Heckman et al. 2001). The PSS model, proposed by van de Ven and van Praag (1981), adapts the Heckman model (Heckman 1976; originally designed for correcting sample selection biases in linear regressions) to binary outcome variables. In the transportation domain, sample selection models have been applied for various purposes, one of the most common of which is to correct for residential self-selection effects (Cao 2009; Chen et al. 2017; van Herick & Mokhtarian 2020). In that context, outcomes are observed for both “selected” and “unselected” groups. In other contexts, including ours, outcomes are only observed for “selected” cases (Alemi et al. 2019; Sun et al. 2019); in our case, a response outcome is observed only for those who self-select into being willing to respond to a follow-up survey. In this study, we select the PSS model structure since it both fits our data structure (see the “Data description” section) and matches the conceptual reasoning (see the “Model structure and application” section).

Data description

The National Household Travel Survey (NHTS) is a repeated cross-sectional travel survey conducted by the Federal Highway Administration, and is widely used by regional planning agencies across the United States. The Georgia subsample of the 2017 NHTS constitutes the survey dataset used for this study. The NHTS typically obtains household, individual, vehicle, and trip information using several survey instruments; these include a recruitment survey, a retrieval survey, travel logs, and a vehicle odometer mileage form. In 2017, for the first time, NHTS allowed states to opt into including a question regarding respondents’ willingness to participate in follow-up travel surveys, and Georgia was one of the six states/regions that chose to do so. We segmented NHTS Georgia respondents based on their willingness to participate in a follow-up survey as well as their actual response behavior to the follow-up survey (see Decisions 1 and 2 in Fig. 1). The follow-up survey, denoted the GDOT survey in Fig. 1, is further discussed later in this section.
As shown in Fig. 1, the first decision was made through the willingness question in the NHTS (i.e., “Would you be willing to participate in a follow-up survey?”). This question is only asked of the main household respondent (i.e., the respondent who answered household-related questions in the retrieval survey), and solely of those living in the regions (i.e., states or Metropolitan Planning Organization areas) that specifically requested the inclusion of this question, with Georgia being one of those regions as mentioned before. As such, we used only the main household respondents for analysis purposes, as we did not have additional information regarding other household members’ willingness to participate in a follow-up survey. The final working dataset comprised 8,418 respondents, 4,965 of whom indicated a willingness to participate in a follow-up survey (W1), whereas the remaining 3,453 respondents did not want to be contacted again for future surveys (W0).
For the 4,965 NHTS respondents who indicated a willingness to participate in a follow-up survey, their second decision (Fig. 1) was made through their actual response to a follow-up survey, the Georgia Department of Transportation-funded Emerging Technologies Survey (GDOT survey, Kim et al. 2019). The GDOT survey is a 15-page attitudinally-rich travel survey with an emphasis on the impacts of emerging technologies on travel behavior. Our research team mailed the GDOT survey to the 4,965 NHTS respondents in September 2017. The respondents could either mail the completed paper survey back using the postage-paid reply envelope we provided, or use the URL we also provided to complete the survey online. Ultimately, 1,432 of the 4,965 NHTS respondents replied to the GDOT survey (W1R1), while the remaining 3,533 did not reply (W1R0). Thus, at this point, we have segmented all 8,418 NHTS Georgia respondents based on the two decisions. We note that for the purpose of this paper, the GDOT survey was used only to segment/classify respondents; all respondent data was obtained from the NHTS.
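The two-decision segmentation above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the segment counts reported in the text; the variable names are illustrative and not taken from the study.

```python
# Segment counts reported in the text (NHTS Georgia main household respondents).
n_total = 8418      # all main household respondents in the working dataset
n_willing = 4965    # W1: willing to be contacted for a follow-up survey
n_replied = 1432    # W1R1: willing and replied to the GDOT survey

n_unwilling = n_total - n_willing     # W0: unwilling to be contacted
n_no_reply = n_willing - n_replied    # W1R0: willing but did not reply

willing_rate = n_willing / n_total    # share passing Decision 1
reply_rate = n_replied / n_willing    # share passing Decision 2, given willingness
overall_rate = n_replied / n_total    # share passing both decisions

print(f"W0 = {n_unwilling}, W1R0 = {n_no_reply}")
print(f"willing: {willing_rate:.1%}, replied given willing: {reply_rate:.1%}, "
      f"overall: {overall_rate:.1%}")
```

The overall rate is the product of the two stage-specific rates, which is the same factorization (joint = marginal × conditional) that the PSS model uses at the individual level.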
In Table 1, we present descriptive statistics for each segment and the overall sample. In the full sample, the average household size is 2.13, the average age is 55.6 years, and 53% of the sample is female. Overall, participants are highly educated, with 48% of the participants reporting they have a bachelor’s degree or higher. Compared to respondents who are unwilling to be contacted (W0), respondents who are willing to be contacted for a follow-up survey (W1) tend to be younger (means of 54.35 versus 57.30 years). On average, the W1 segment conducts more trips on the selected travel day (4.16 versus 3.52 trips) and lives in denser areas (859.07 versus 769.92 housing units per sq. mi.). Among the respondents willing to be contacted, those who replied to the GDOT survey (W1R1) tend to be older than those who did not reply (W1R0, 59.00 versus 52.46 years). The W1R1 segment conducts more trips (4.47 versus 4.03) on the selected travel days, and they come from less dense areas than other groups.
Table 1 Descriptive statistics of the working dataset (sample means/shares)

|  | Full sample | W0: Unwilling to be contacted | W1: Willing to be contacted | W1R0: Willing but did not reply | W1R1: Willing and did reply |
| --- | --- | --- | --- | --- | --- |
| Sample size | 8,418 | 3,453 | 4,965 | 3,533 | 1,432 |
| Household sociodemographic |  |  |  |  |  |
| Household size (persons)* | 2.13 | 2.17 | 2.10 | 2.13 | 2.01 |
| Home ownership (yes) | 0.75 | 0.80 | 0.71 | 0.66 | 0.84 |
| Individual sociodemographic |  |  |  |  |  |
| Age* | 55.56 | 57.30 | 54.35 | 52.46 | 59.00 |
| Has a medical condition (yes) | 0.13 | 0.13 | 0.13 | 0.14 | 0.12 |
| Gender (female) | 0.58 | 0.57 | 0.59 | 0.60 | 0.55 |
| Born in US (yes) | 0.93 | 0.91 | 0.94 | 0.93 | 0.95 |
| Race: white (yes) | 0.73 | 0.74 | 0.72 | 0.69 | 0.79 |
| Education |  |  |  |  |  |
| Less than a high school graduate | 0.038 | 0.043 | 0.035 | 0.041 | 0.022 |
| High school graduate or equivalent | 0.19 | 0.20 | 0.17 | 0.18 | 0.15 |
| Some college or associate's degree | 0.30 | 0.29 | 0.30 | 0.30 | 0.30 |
| Bachelor's degree | 0.24 | 0.23 | 0.24 | 0.24 | 0.26 |
| Graduate degree or professional degree | 0.24 | 0.23 | 0.25 | 0.24 | 0.28 |
| Worker (yes) | 0.54 | 0.52 | 0.56 | 0.59 | 0.48 |
| Travel-related characteristics |  |  |  |  |  |
| No. of trips recorded in one-day travel diary* | 3.90 | 3.52 | 4.16 | 4.03 | 4.47 |
| Transit usage frequency*¹ | 0.64 | 0.40 | 0.81 | 0.95 | 0.46 |
| Survey-related characteristics |  |  |  |  |  |
| Household income: missing value | 0.035 | 0.064 | 0.015 | 0.016 | 0.011 |
| VMD: "I don't know" | 0.25 | 0.32 | 0.20 | 0.21 | 0.17 |
| VMD: "I prefer not to answer" | 0.015 | 0.025 | 0.009 | 0.009 | 0.008 |
| Land use characteristics |  |  |  |  |  |
| Housing units per sq. mi.* | 822.51 | 769.92 | 859.07 | 920.44 | 707.68 |

¹ 0 = Never; 1 = Less than once a month; 2 = 1–3 times a month; 3 = 1–2 times a week; 4 = 3–4 times a week; 5 = 5 or more times a week
* Treated as continuous variables for modeling; descriptive statistics are sample means
Treated as continuous variables for modeling; descriptive statistics are sample shares
The remaining variables are binary variables. For simplicity, we only show sample shares of one category as indicated in the table
All descriptive statistics are unweighted. We provide weighted distributions in Table 7, including population distributions based on the 2018 American Community Survey five-year estimates and the full NHTS Georgia sample
In the following sections, we separate the final working dataset (N = 8,418) into a training set (60%, N = 5,051) and a test set (40%, N = 3,367) to enable appropriate model evaluation.
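As an illustration of the 60/40 partition, the sketch below shuffles respondent indices and cuts them at the 60% mark. It assumes a simple random split with an arbitrary seed; the text does not state how the split was drawn (e.g., whether it was stratified), so the function name, the seed, and the splitting rule are hypothetical.

```python
import random

def train_test_split_indices(n, train_share=0.6, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train and test lists.

    Assumption: simple random split; the seed is arbitrary.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(n * train_share)   # 60% of 8,418 rounds to 5,051
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_split_indices(8418)
print(len(train_idx), len(test_idx))
```

With N = 8,418 this reproduces the reported set sizes of 5,051 and 3,367.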

Methodology

Model structure and application

As described in the last section, for this paper we model and analyze two consecutive decisions made by the 2017 NHTS Georgia respondents: (1) their willingness to participate in a follow-up survey and (2) their actual response behavior to the follow-up survey. The perspective we take is that the target behavior of interest is the participation in the second survey by anyone, and the goal is to obtain consistent estimates of the coefficients of the explanatory variables in the model predicting that behavior. But since we are only able to observe the second decision for NHTS respondents who are willing to participate in a follow-up survey (i.e., respondents who are self-selected, and so received a follow-up survey), modeling the observed response behavior only of this subsample could produce biased (econometrically inconsistent) estimates of those coefficients, relative to their true values in the population at large.
To address the self-selection bias, Heckman (1976) proposed the sample selection model as a corrective method for linear regression models. Given the binary nature of the two decisions in our case (i.e., willing/unwilling to participate, respond/do not respond to the follow-up survey), we apply the analogous corrective method for discrete choice models, the probit with sample selection (PSS) model (van de Ven and van Praag 1981), to deal with the self-selection bias.
In the PSS model, we have a selection model and an outcome model, which correspond to the willingness and response decisions, respectively. The selection and outcome models are defined as
$$y_{i}^{S^{*}} = \boldsymbol{z}_{i} \boldsymbol{\gamma} + \varepsilon_{i}^{S}, \tag{1}$$
$$y_{i}^{O^{*}} = \boldsymbol{x}_{i} \boldsymbol{\beta} + \varepsilon_{i}^{O}, \tag{2}$$
$$y_{i}^{S} = \begin{cases} 0, & \text{if } y_{i}^{S^{*}} < 0 \\ 1, & \text{otherwise} \end{cases} \tag{3}$$
$$y_{i}^{O} = \begin{cases} \text{unobserved}, & \text{if } y_{i}^{S} = 0 \\ 0, & \text{if } y_{i}^{S} = 1 \text{ and } y_{i}^{O^{*}} < 0 \\ 1, & \text{if } y_{i}^{S} = 1 \text{ and } y_{i}^{O^{*}} \ge 0, \end{cases} \tag{4}$$
where \(y_{i}^{S^{*}}\) is the continuous latent variable indicating the tendency for individual \(i\) to be willing to participate in a follow-up survey; \(y_{i}^{O^{*}}\) is the tendency for individual \(i\) to respond to the follow-up survey (the GDOT survey); \(\boldsymbol{z}_{i}\) and \(\boldsymbol{x}_{i}\) are vectors of explanatory variables for the selection and outcome models, respectively; \(\boldsymbol{\gamma}\) and \(\boldsymbol{\beta}\) are the corresponding coefficient vectors; and \(\varepsilon_{i}^{S}\) and \(\varepsilon_{i}^{O}\) are error terms that capture the unobserved effects in the two models. As is standard, we assume that the error terms follow a bivariate normal distribution:
$$\begin{pmatrix} \varepsilon^{S} \\ \varepsilon^{O} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right). \tag{5}$$
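To make the structure of Eqs. 1–5 concrete, the following sketch simulates data from a PSS process. It is an illustration only: the single shared covariate, the coefficient values, and the function name `simulate_pss` are assumptions and do not correspond to the model estimated in this study.

```python
import math
import random

def simulate_pss(n, gamma, beta, rho, seed=0):
    """Draw (z, y_S, y_O) triples following the structure of Eqs. 1-5.

    gamma and beta are hypothetical (intercept, slope) pairs; one covariate z
    is used in both equations purely to keep the illustration short.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        e_s = rng.gauss(0.0, 1.0)  # selection error with unit variance
        # Outcome error: unit variance, correlation rho with e_s (Eq. 5).
        e_o = rho * e_s + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y_s_star = gamma[0] + gamma[1] * z + e_s   # Eq. 1
        y_o_star = beta[0] + beta[1] * z + e_o     # Eq. 2
        y_s = 1 if y_s_star >= 0 else 0            # Eq. 3
        # Eq. 4: the outcome is unobserved (None) when y_S = 0.
        y_o = (1 if y_o_star >= 0 else 0) if y_s == 1 else None
        rows.append((z, y_s, y_o))
    return rows

rows = simulate_pss(1000, gamma=(0.2, 0.5), beta=(-0.5, 0.3), rho=0.4, seed=1)
n_willing = sum(1 for _, y_s, _ in rows if y_s == 1)
n_replied = sum(1 for _, _, y_o in rows if y_o == 1)
print(n_willing, n_replied)
```

Because the outcome is recorded only when \(y_{i}^{S} = 1\), a probit fitted to the observed outcomes alone would ignore the correlation \(\rho\) between the two error terms, which is the source of the inconsistency discussed above.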
In the observed choice formulations (Eqs. 3–4), \(y_{i}^{S}\) is the observed binary selection choice (willing to participate in a follow-up survey = 1, unwilling = 0), and \(y_{i}^{O}\) is the observed binary outcome choice (responds to the follow-up survey = 1, does not respond = 0). We observe the outcome if and only if the latent selection variable \(y_{i}^{S^{*}}\) is non-negative (i.e., \(y_{i}^{S} = 1\)). Finally, we estimate the parameters \(\hat{\boldsymbol{\gamma}}, \hat{\boldsymbol{\beta}}, \hat{\rho}\) using maximum likelihood estimation. The log-likelihood can be written as
$$\ell\left( \hat{\boldsymbol{\gamma}}, \hat{\boldsymbol{\beta}}, \hat{\rho} \right) = \sum_{i:\, y_{i}^{S} = 0} \ln \Phi\left( -\boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}} \right) + \sum_{i:\, y_{i}^{S} = 1,\, y_{i}^{O} = 1} \ln \Phi_{2}\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}},\ \boldsymbol{x}_{i} \hat{\boldsymbol{\beta}};\ \hat{\rho} \right) + \sum_{i:\, y_{i}^{S} = 1,\, y_{i}^{O} = 0} \ln \Phi_{2}\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}},\ -\boldsymbol{x}_{i} \hat{\boldsymbol{\beta}};\ -\hat{\rho} \right), \tag{6}$$
where \(\Phi(\cdot)\) represents the cumulative univariate standard normal distribution function and \(\Phi_{2}(\cdot, \cdot; \rho)\) represents the cumulative bivariate standard normal distribution function with correlation \(\rho\). The correlation enters with a negative sign in the non-response terms because reversing the inequality on the outcome error reverses the sign of its correlation with the selection error. With this model formulation, we can calculate three sets of probabilities: the marginal probabilities of being willing or not (Eqs. 7–8), joint probabilities of being willing and responding or not responding (Eqs. 9–10), and conditional probabilities of responding or not, given willingness (Eqs. 11–12).
Marginal probabilities:
$$P\left( y_{i}^{S} = 0 \right) = \Phi\left( -\boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}} \right) \tag{7}$$
$$P\left( y_{i}^{S} = 1 \right) = \Phi\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}} \right) \tag{8}$$
Joint probabilities:
$$P\left( y_{i}^{S} = 1,\ y_{i}^{O} = 0 \right) = \Phi_{2}\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}},\ -\boldsymbol{x}_{i} \hat{\boldsymbol{\beta}};\ -\hat{\rho} \right) \tag{9}$$
$$P\left( y_{i}^{S} = 1,\ y_{i}^{O} = 1 \right) = \Phi_{2}\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}},\ \boldsymbol{x}_{i} \hat{\boldsymbol{\beta}};\ \hat{\rho} \right) \tag{10}$$
Conditional probabilities:
$$P\left( y_{i}^{O} = 0 \mid y_{i}^{S} = 1 \right) = \Phi_{2}\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}},\ -\boldsymbol{x}_{i} \hat{\boldsymbol{\beta}};\ -\hat{\rho} \right) / \Phi\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}} \right) \tag{11}$$
$$P\left( y_{i}^{O} = 1 \mid y_{i}^{S} = 1 \right) = \Phi_{2}\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}},\ \boldsymbol{x}_{i} \hat{\boldsymbol{\beta}};\ \hat{\rho} \right) / \Phi\left( \boldsymbol{z}_{i} \hat{\boldsymbol{\gamma}} \right) \tag{12}$$
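The marginal, joint, and conditional probabilities, together with the per-respondent log-likelihood contribution, can be sketched as follows. This is a standard-library illustration, not the estimation code used in the study: the bivariate normal CDF is approximated by one-dimensional Simpson integration, and all function names are hypothetical. The joint probability of being willing but not responding is computed as \(P(y^{S}=1) - P(y^{S}=1, y^{O}=1)\), which holds by the law of total probability and coincides with the bivariate normal CDF evaluated at the sign-reversed outcome index and sign-reversed correlation.

```python
import math

def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """P(U <= a, V <= b) for standard bivariate normal (U, V), |rho| < 1.

    Simpson's rule applied to the integral over u in [lower, a] of
    pdf(u) * cdf((b - rho * u) / sqrt(1 - rho^2)); n must be even.
    """
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n

    def integrand(u):
        return norm_pdf(u) * norm_cdf((b - rho * u) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0

def pss_probabilities(zg, xb, rho):
    """Probabilities for one respondent, given the indices z*gamma and x*beta."""
    p_s1 = norm_cdf(zg)                    # marginal: willing
    p_11 = bvn_cdf(zg, xb, rho)            # joint: willing and responds
    p_10 = p_s1 - p_11                     # joint: willing, does not respond
    return {
        "marg_0": norm_cdf(-zg), "marg_1": p_s1,
        "joint_10": p_10, "joint_11": p_11,
        "cond_0": p_10 / p_s1, "cond_1": p_11 / p_s1,
    }

def loglik_contribution(y_s, y_o, zg, xb, rho):
    """Log-likelihood term of one respondent (y_o is ignored when y_s == 0)."""
    p = pss_probabilities(zg, xb, rho)
    if y_s == 0:
        return math.log(p["marg_0"])
    return math.log(p["joint_11"] if y_o == 1 else p["joint_10"])

print(pss_probabilities(zg=0.3, xb=-0.4, rho=0.5))
```

Summing `loglik_contribution` over all respondents gives the sample log-likelihood for a candidate parameter set; in practice a dedicated routine for the bivariate normal CDF and a numerical optimizer would replace this sketch.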
The three sets of probabilities reflect distinct statistical explanations, which should be appropriately used under different model applications. In Table 2, we summarize a few application scenarios and the corresponding probabilities, in the context of a two-stage survey sample recruitment. This study will mainly focus on the first application scenario while lightly touching on the third one in the section “Outside Georgia: what does the follow-up survey sample look like?”. It is worth mentioning here that, similar to any other model, prediction errors exist in the PSS model applications. We summarize several model performance measures in the next section to help evaluate the quality of the model.
Table 2 Applications of the PSS model in different scenarios

| Scenario | Model and probability used in the prediction |
| --- | --- |
| 1. Decomposition of the deviation (i.e., bias) of the follow-up survey sample from the population into its various components (e.g., dataset bias, self-selection bias, prediction errors). This is enabled by comparisons of the predicted sample and population distributions at various stages of the model | • Use the selection model and the marginal probability of selection \(P(y_{i}^{S} = 1)\) for the prediction of people who are willing to participate in a follow-up survey • Use the joint model and joint probability of selection and outcome \(P(y_{i}^{S} = 1,\ y_{i}^{O} = 1)\) for the final prediction of follow-up survey respondents |
| 2. Prediction of the response to a second-stage survey following a large-scale first-stage survey (e.g., NHTS) that contains the willingness question. Survey developers conduct a small-scale field test of the second-stage survey to enable the estimation of the PSS model, and then apply the outcome model to the remainder of the willing first-stage sample to predict the size and characteristics of the full-scale second-stage sample | • Use the conditional probability \(P(y_{i}^{O} = 1 \mid y_{i}^{S} = 1)\) to predict the second-stage response of the willing first-stage sample |
| 3. Prediction of the response to a second-stage survey following a large-scale first-stage survey (e.g., NHTS) that does not contain the willingness question. Survey developers do not know the response willingness of the first-stage sample, and adopt a PSS model estimated from other datasets / regions to predict the size and characteristics of the second-stage sample | • Using a joint model estimated from other datasets, compute the joint probability \(P(y_{i}^{S} = 1,\ y_{i}^{O} = 1)\) to predict the second-stage response from the full first-stage sample |
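One simple way to turn individual probabilities into a predicted sample size and predicted sample characteristics is probability-weighted aggregation over the first-stage sample. The sketch below assumes this aggregation rule, which may differ from the procedure used later in this paper; the probabilities and ages are made-up values for five hypothetical respondents.

```python
def predicted_sample(probabilities, values):
    """Expected sample size and probability-weighted mean of one characteristic.

    probabilities: per-respondent probability of ending up in the follow-up
        sample (joint probability in Scenarios 1 and 3, conditional
        probability in Scenario 2).
    values: the characteristic of interest for the same respondents.
    """
    expected_size = sum(probabilities)
    if expected_size == 0:
        raise ValueError("all probabilities are zero")
    weighted_mean = sum(p * v for p, v in zip(probabilities, values)) / expected_size
    return expected_size, weighted_mean

# Hypothetical joint probabilities P(y_S = 1, y_O = 1) and ages.
p_joint = [0.10, 0.25, 0.15, 0.30, 0.20]
age = [30, 62, 45, 70, 55]

size, mean_age = predicted_sample(p_joint, age)
print(size, mean_age, sum(age) / len(age))
```

In this toy example the predicted follow-up sample is older on average than the first-stage sample because the older hypothetical respondents were assigned higher participation probabilities.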

Model performance measures

Due to the two-level model structure of the PSS model, the usual discrete choice model performance measures cannot be directly applied, which might explain why PSS models have diverse performance measures in the literature. Accordingly, we aim to address the lack of clarity in the literature surrounding PSS measures by providing a resource for six frequently used categories of model measures, adjusted based on the PSS model structure: the log-likelihood, McFadden’s pseudo R-squared, information criteria, correlation, root mean squared error, and success table. Table 3 provides definitions of the six measures, and gives examples of them being applied within the literature. We also demonstrate their use by calculating all of them for the PSS model developed in this paper in section “Model performance results”.
Table 3
Model performance measures for probit with sample selection models
https://static-content.springer.com/image/art%3A10.1007%2Fs11116-022-10312-w/MediaObjects/11116_2022_10312_Tab3_HTML.png
Since both selection and outcome models are binary probit models, we first introduce the log-likelihoods for three models associated with the PSS model: the equally-likely (EL) model, market-share (MS) model, and full model (Eqs. 13–15). Log-likelihoods provide direct measures of the model performance, but they do not allow model comparisons across studies since the values are related to the sample size. McFadden’s pseudo R-squared (\({\rho }^{2}\)) provides a measure that is derived from the log-likelihoods but is bounded between 0 and 1. A higher \({\rho }^{2}\) means greater information explained by the model (Mokhtarian 2016). Equations 16 and 17 give \({\rho }^{2}\) with the EL and MS models as the base, respectively. Information criteria such as the Akaike information criterion (AIC, Eq. 18) and Bayesian information criterion (BIC, Eq. 19) are also based on log-likelihoods. These criteria penalize the number of model coefficients to promote parsimony, and can therefore be used for model selection. However, like log-likelihoods, information criteria depend on the sample size, so there is no benchmark against which to judge their values across studies. The three log-likelihood-associated categories of measures are suitable when the overall PSS model performance is required, such as for Scenarios 1 and 3 in Table 2.
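The log-likelihood-based measures can be checked with a few lines of arithmetic. The sketch below assumes the standard definitions \(\rho^{2} = 1 - \ell_{full}/\ell_{base}\), \(AIC = -2\ell_{full} + 2K\), and \(BIC = -2\ell_{full} + K\ln N\); Eqs. 16–19 themselves sit in Table 3, which is available here only as an image. The log-likelihoods and sample size are taken from Table 5. The parameter count \(K = 21\) is a tally from Table 4 (12 selection coefficients and 8 outcome coefficients, each including the constant, plus the error correlation) and should be treated as an assumption.

```python
import math


def mcfadden_rho2(ll_full, ll_base):
    """McFadden's pseudo R-squared relative to a base model."""
    return 1.0 - ll_full / ll_base


def aic(ll_full, n_params):
    """Akaike information criterion."""
    return -2.0 * ll_full + 2.0 * n_params


def bic(ll_full, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * ll_full + n_params * math.log(n_obs)


# Values reported in Table 5
ll_el = -5571.517    # equally-likely model
ll_ms = -5231.426    # market-share model
ll_full = -4921.783  # full PSS model
n_obs = 5051

# Assumption: parameter count tallied from Table 4 (12 + 8 + 1)
n_params = 21

rho2_el = mcfadden_rho2(ll_full, ll_el)
rho2_ms = mcfadden_rho2(ll_full, ll_ms)
aic_value = aic(ll_full, n_params)
bic_value = bic(ll_full, n_params, n_obs)

print(f"rho2_EL={rho2_el:.3f}  rho2_MS={rho2_ms:.3f}")
print(f"AIC={aic_value:.3f}  BIC={bic_value:.3f}")
```

With these inputs the sketch agrees with the values in Table 5 up to rounding of the reported log-likelihoods.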
Another model performance measure is the correlation coefficient between predicted probabilities and observed choices. Since the observed choice is a binary variable and the predicted probability is a continuous variable, we apply point-biserial correlation coefficients (Eq. 20), which range between -1 (the wrong outcome is predicted with certainty) and 1 (the correct outcome is predicted with certainty). The closer \({r}_{pb}\) is to 1, the better the model. Root mean squared error (RMSE) measures the (square root of the) average squared discrepancy between the observed choice (0 or 1) and the predicted probability (Eq. 21). For our model, RMSE ranges between 0 and 1, with smaller RMSE indicating better prediction results. Although the correlation and RMSE measures do not provide an overall measure of the PSS model but only measure separate model performances of the selection and outcome models, they are instrumental under specific application scenarios. For example, in the bias decomposition application (Scenario 1 in Table 2), separate performance measures provide comparable prediction error indicators between selection and outcome models as we decompose biases step by step (see the section “Inside Georgia: Breakdown of sample biases” for more details). Separate model performance measures are also useful when we only need the performance of a single model (e.g., the outcome model performance with known selection results, Scenario 2 in Table 2).
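Both measures are simple to compute once observed choices and predicted probabilities are available. The sketch below is a minimal illustration: it uses the fact that the point-biserial coefficient is numerically the Pearson correlation between a binary and a continuous variable, and it assumes the usual RMSE form, since Eqs. 20 and 21 are shown only in the Table 3 image. The function names and the toy data are hypothetical.

```python
import numpy as np


def point_biserial(y_obs, p_pred):
    """Point-biserial correlation between binary choices and probabilities.

    Computed as the Pearson correlation, which is numerically identical
    when one of the two variables is binary.
    """
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.corrcoef(y, p)[0, 1])


def rmse(y_obs, p_pred):
    """Root mean squared error between binary choices and probabilities."""
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))


# Toy data (made up): observed 0/1 choices and predicted probabilities
y_toy = [1, 0, 1, 1, 0]
p_toy = [0.8, 0.3, 0.6, 0.7, 0.4]

print(f"r_pb={point_biserial(y_toy, p_toy):.3f}  RMSE={rmse(y_toy, p_toy):.3f}")
```

For a PSS model the same two functions would be applied once to the selection model and once to the outcome model, each with its own observed choices and predicted probabilities.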
The last model performance measure category is the probability-based success table, which was originally proposed by McFadden (2000). Given the two-level model structure of the PSS model, we could generate a \(3\times 3\) matrix based on the observation and model prediction results (\({y}_{i}^{S}=0\); \({y}_{i}^{S}=1\,\text{and}\,{y}_{i}^{O}=0\); \({y}_{i}^{S}=1\,\text{and}\,{y}_{i}^{O}=1\)). Equation 22 calculates the number of cases in the \(m{n}^{th}\) cell in a success table. Success tables allow both overall model performance measures (i.e., overall prediction accuracy) and alternative-specific measures (i.e., success proportion, success index). Success tables are usually computed for both training and test sets to examine the generalizability of the model.

Results

In this section, we first present the PSS model results (Table 4) and then measure the model performance with the six metrics presented in the previous section (Tables 5 and 6).
Table 4
Probit with sample selection model results (N = 5,051)
| Variables | Coefficient | Std. error |
| --- | --- | --- |
| Selection model: willingness to participate in a follow-up survey | | |
| Household sociodemographic | | |
| Household size (log transformed) | −0.185*** | 0.0377 |
| Homeowner | −0.178*** | 0.0469 |
| Individual sociodemographic | | |
| Age | −0.00726*** | 0.00139 |
| Has a medical condition | 0.150* | 0.0581 |
| Female | 0.111** | 0.0369 |
| Born in US | 0.194** | 0.0694 |
| Travel-related characteristics | | |
| Number of trips | 0.0478*** | 0.00629 |
| Transit usage frequency | 0.0579* | 0.0230 |
| Survey-related characteristics | | |
| Household income: missing | −0.857*** | 0.106 |
| VMD: "I don't know" | −0.464*** | 0.0424 |
| VMD: "I prefer not to answer" | −0.796*** | 0.140 |
| Constant | 0.188* | 0.0852 |
| Outcome model: response to the follow-up survey | | |
| Household sociodemographic | | |
| Homeowner | 0.417*** | 0.0606 |
| Individual sociodemographic | | |
| Age | 0.0120*** | 0.00178 |
| Has a medical condition | −0.331*** | 0.0733 |
| Race: white | 0.106* | 0.0534 |
| Education | 0.0746*** | 0.0215 |
| Worker | −0.181*** | 0.0540 |
| Land use characteristics | | |
| Housing units per sq. mi | −0.0528* | 0.0246 |
| Constant | −0.619*** | 0.129 |
| Error terms correlation | | |
| \(\rho\) | −0.574*** | 0.0964 |
***Coefficient is statistically significant at the 0.001 level
**Coefficient is statistically significant at the 0.01 level
*Coefficient is statistically significant at the 0.05 level
Insignificant variables removed from the model include no. of vehicles per driver in the household, no. of children in the household, frequency of walk trips, and usage of delivery services, among others
Table 5
Probit with sample selection model measures (N = 5,051)
| Measure | Formula | Value |
| --- | --- | --- |
| Log-likelihood | \(\ell\left(\varvec{0}\right)\) | −5571.517 |
| | \(\ell\left(\varvec{c}\right)\) | −5231.426 |
| | \(\ell\left(\widehat{\varvec{\gamma }},\widehat{\varvec{\beta }}, \widehat{\rho }\right)\) | −4921.783 |
| McFadden’s pseudo R-squared | \({\rho }_{EL}^{2}\) | 0.117 |
| | \({\rho }_{MS}^{2}\) | 0.059 |
| Information criteria | \(AIC\) | 9885.567 |
| | \(BIC\) | 10022.640 |
| Point-biserial correlation coefficient | \({r}_{pb}\) | \({r}_{pb}\) (selection model) = 0.274 |
| | | \({r}_{pb}\) (outcome model) = 0.271 |
| Root mean squared error (RMSE) | \(RMSE\) | \(RMSE\) (selection model) = 0.473 |
| | | \(RMSE\) (outcome model) = 0.439 |

Model results

Selection model

The selection model explains respondents’ willingness to participate in a follow-up survey. We organized the explanatory variables into three categories: household- and individual-level sociodemographic characteristics, travel-related characteristics, and survey-related characteristics (Table 4).
Among the household-level sociodemographic characteristics tested, we see that respondents from larger households are less willing to participate in a follow-up survey compared to respondents from smaller households; we propose that one reason for this finding may reside in the format of the NHTS. Specifically, NHTS requires all household members five years of age or older to complete the personal section in the retrieval survey and record their travel on the designated travel day. As such, it is more time-consuming and burdensome for larger households to complete the NHTS requirements, which may weaken the motivation of the main household respondent to volunteer for another survey. Furthermore, the log transformation of household size indicates that the impact on survey willingness of a one-person increase in household size becomes weaker (but still negative) as the household size grows. The model also shows that homeowners are less willing to participate in a follow-up survey. On the one hand, moderate correlations between homeownership and vehicle ownership (0.37), and between homeownership and household income (0.36), suggest that the homeownership variable may be considered a proxy indicator of middle-to-high-income households. On the other hand, individuals who own homes tend to be at different life stages relative to those who rent (e.g., a later career stage with more demands on their time).2 In either case, respondents from such households would have higher values of time and thus be less willing to take follow-up surveys.
Among individual-level sociodemographic characteristics, we find that younger people, women, and people who were born in the U.S. are more willing to participate in a follow-up survey. We also find that individuals who have a medical condition restricting them from traveling outside the home are more willing to participate than people who do not have such restrictions. On the one hand, the travel-limited group comprises primarily older individuals who may be retired and thus have more time for doing surveys. The results may also reflect the altruism of the travel-limited group, possibly suggesting that they seek to contribute to society in ways that are accessible to them. On the other hand, their interest and participation in travel-related surveys may also highlight the unmet travel demands of these individuals.
Among travel-related characteristics tested, the model shows that people who report more trips on the designated travel day are more willing to participate in a follow-up survey, which runs counter to our expectations. Based on the findings regarding household size, we conjectured that having to record more trips would reduce the willingness to participate in a follow-up survey. A resolution of the paradox might reside in the individual’s liking for travel. Specifically, travel-liking people might record their travel logs more comprehensively (e.g., walk one block to buy coffee in the middle of the workday, pick up dry cleaning on the way back home), and also be eager to complete a future travel survey.3 In contrast, those reporting fewer trips might tend to ignore trivial, non-mandatory, short trips or stops because they are not sensitive enough to catch these trips and/or they want to alleviate the burden of completing the travel logs. Alternatively, even without especially liking traveling, heavy travelers may still be interested in the subject precisely because it is such a big part of their lives. Accordingly, they may be more likely than others to express willingness to be surveyed again, whether or not they are too busy traveling to actually respond when the invitation comes. Moreover, frequent transit users are also more willing to participate in a follow-up survey, which might be due to their desire to improve the quality of their travel experience by providing feedback through travel surveys.
Survey-related characteristics constitute a group of variables unique to the selection model: item non-responses. In the NHTS, many questions offer the choices “I don’t know” and “I prefer not to answer”, which allow respondents to protect their privacy for sensitive information (e.g., income) and avoid imprecise estimations (e.g., vehicle-miles driven, VMD). In our model, we combine “I don’t know” and “I prefer not to answer” for the household income question and treat both of these responses as indicative of respondents who choose to protect their privacy. The resultant variable is called the household income missing value indicator, and the negative sign of the coefficient implies that respondents who are more protective of their privacy are less willing to participate in a follow-up survey.4 Regarding VMD, since the variable is self-estimated by NHTS respondents, we believe some respondents who do not care much about their travel might be unclear about their annual VMD. As such, “I don’t know” may represent an apathetic attitude toward travel, whereas “I prefer not to answer” reflects a privacy-protective attitude, and accordingly we keep those responses separate for VMD. The model shows that both respondents who are less interested in their travel behavior and respondents who are protective of their privacy regarding travel behavior are less willing to respond to a follow-up survey.

Outcome model

The outcome model explains the actual, observed response to the GDOT survey for NHTS respondents who reported being willing to participate in a follow-up survey. The outcome model contains two groups of explanatory variables: household- and individual-level sociodemographic characteristics, and land use characteristics.
Homeownership is the only household-level sociodemographic characteristic in the outcome model, and it is significant in both the selection and outcome models. Interestingly, however, the variable has opposing signs in the two models. Specifically, homeowners are less willing to participate in a follow-up survey than renters, but among respondents who are willing to participate, homeowners are more likely to respond than renters. One reason for the latter outcome may be that homeowners are more likely to receive the follow-up survey because they move less often, whereas the follow-up survey might not reach renters due to address changes. However, we do not have reliable records of everyone who had moved and thus did not receive the GDOT survey invitation, so we cannot verify this explanation. Another reason might be that homeowners were initially less willing to commit their time to a follow-up survey due to having more household responsibilities, but once they opt in, the same commitment to their responsibilities makes them more likely to follow through.
Age and medical conditions are individual-level sociodemographic characteristics that are significant in both selection and outcome models, albeit also with opposing signs. In general, younger people report being more willing to participate in a follow-up survey compared to older people, while among respondents expressing willingness to participate in a follow-up survey, older people are more likely to actually respond than younger people. Potentially, younger people are less reachable (i.e., more transient) or less able to participate when the time actually comes, even though they may aspire to be helpful. As previously discussed, respondents with travel-restricting medical conditions are more willing to participate in a follow-up survey compared to respondents who do not have such restrictions. However, among people willing to participate in a follow-up survey, medically-restricted respondents are less likely to respond than people who do not have any travel restrictions. It is possible that the medical conditions that restrict travel might also limit these respondents from completing the follow-up survey (e.g., poor eyesight); it is also possible that the medical conditions worsened during the approximately one-year interval between surveys.5 The outcome model also shows that white, higher-educated people are more likely to respond to the follow-up survey, while workers are less likely to respond to the follow-up survey than non-workers, probably due to time constraints on the part of the worker group.
The land use characteristics are the variable group unique to the outcome model, as they were only found to be significant in this model. We find that people from less dense areas are more likely to respond to the follow-up survey, which could be related to the types of individuals who typically live in lower-density areas in Georgia (e.g., older, more likely to be retired).6

Error terms

The correlation of the error terms in the selection and outcome models is highly significant and sizable (-0.574), which indicates that the self-selection bias (expressed willingness to participate in a follow-up survey) significantly influences whether or not an individual responds to a follow-up survey. Specifically, its negative value signifies that on net, unobserved characteristics that increase the reported willingness to participate in a follow-up survey will tend to decrease the tendency to actually do so. Or conversely, unobserved factors that decrease the reported willingness (e.g., a sense of responsibility leading one to count the cost before agreeing to do something) might be the same factors that influence respondents to keep the commitment once they opt in to the follow-up survey. Having already seen this pattern from the three observed explanatory variables with opposing signs in the selection and outcome models (i.e., homeownership, age, and medical condition), it is not hard to imagine that it could prevail among unobserved variables as well.
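The sign interpretation can be made concrete under the model's bivariate-normal error assumption. For standard normal errors with correlation \(\rho\), the expected outcome-model error among respondents who opt in is \(E[\varepsilon^{O} \mid \varepsilon^{S} > -a] = \rho\,\phi(a)/\Phi(a)\), where \(a\) is the selection index. This is a textbook property of the bivariate normal distribution rather than a formula given in the paper, and the function and argument names below are hypothetical.

```python
from scipy.stats import norm


def mean_outcome_error_if_selected(sel_index, rho):
    """E[eps_O | eps_S > -sel_index] for standard bivariate normal errors.

    sel_index : linear predictor of the selection model (hypothetical name)
    rho       : correlation between selection and outcome error terms
    """
    # phi(a) / Phi(a) is the inverse Mills ratio evaluated at the index
    return rho * norm.pdf(sel_index) / norm.cdf(sel_index)


# Error correlation as estimated in Table 4; index values are illustrative
for a in (-1.0, 0.0, 1.0):
    shift = mean_outcome_error_if_selected(a, rho=-0.574)
    print(f"selection index {a:+.1f}: mean outcome error {shift:+.3f}")
```

With a negative \(\rho\) the shift is negative at every index value, and it is largest in magnitude for respondents whose observed characteristics made opting in unlikely.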

Model performance results

In this section, we apply model performance measures from the six categories proposed in section “Model performance measures” to our PSS model. Table 5 presents measures from the first five categories including log-likelihood, McFadden’s pseudo R-squared, information criteria, correlation, and root mean squared error. The success table is presented in Table 6.
Table 6
Success table
https://static-content.springer.com/image/art%3A10.1007%2Fs11116-022-10312-w/MediaObjects/11116_2022_10312_Tab6_HTML.png
As discussed previously, we cannot compare log-likelihoods and information criteria with models in other studies due to the varying sample sizes, whereas McFadden’s pseudo R-squareds are comparable given their 0 to 1 range. In this study, McFadden’s pseudo R-squareds are relatively low, which could result from the nature of predicting survey participation. The willingness to participate in a follow-up survey and the actual response also depend on people’s mood and time pressure at the moment, which are unobserved in our dataset but may explain a large share of the variability in the dependent variables. In the literature, the model fits regarding survey willingness and actual response are similar to ours. For example, Wittwer and Hubrich (2015) developed a binary logistic regression model of survey response behaviors and McFadden’s pseudo R-squared was 0.052 (relative to the constant-only model benchmark). Regarding an internet survey, Couper et al. (2007) obtained Cox and Snell pseudo R-squareds of 0.044 and 0.067 for the willingness and response models, respectively.7
The last model performance measure is the probability-based success table. As shown in Table 6, the bolded numbers on the diagonal represent the number of correct predictions, while the off-diagonal elements are the number of misclassifications. Based on the success table, we calculate overall prediction accuracy (sum of the diagonal elements divided by the total, which is 0.41 for the training set) and the alternative-specific accuracy (i.e., success proportion). Specifically, a success proportion is the number of correct predictions of a specific choice divided by the total number of predictions of that choice. For example, 45% of the people who are predicted to be unwilling to participate in a follow-up survey (\({y}_{i}^{S}\)=0) actually do not want to participate in a follow-up survey. We could further normalize success proportions by the corresponding observed shares to obtain success indices, which directly compare the performance of the calibrated model with the market-share prediction for each alternative. In general, we expect the success index to be greater than 1, signifying superiority of the final model over the market-share model. Larger success indices indicate more accurate predictions. For example, our model is respectively 1.11, 1.10, and 1.21 times better than the market-share model in predicting the three outcomes. Table 6b is the success table based on the test set. Recall that we separated the final working dataset (N = 8,418) into a training set (60%, N = 5,051) and a test set (40%, N = 3,367) to enable appropriate model evaluation. In general, the PSS model has quite similar performances in the training and test sets, which indicates good generalizability of the model to “new” data drawn from the same context.
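The measures derived from a success table can be sketched as follows. Because Eq. 22 appears only in the Table 3 image, the cell formula used here is an assumption: each cell sums the predicted probability of alternative \(n\) over the respondents observed to choose alternative \(m\), with rows as observed and columns as predicted alternatives. The alternatives are coded 0 (\({y}_{i}^{S}=0\)), 1 (\({y}_{i}^{S}=1\) and \({y}_{i}^{O}=0\)), and 2 (\({y}_{i}^{S}=1\) and \({y}_{i}^{O}=1\)). Function names and the toy data are hypothetical; the success proportion and success index follow the verbal definitions given above.

```python
import numpy as np


def success_table(y_obs, p_pred):
    """Probability-based success table (rows: observed, columns: predicted).

    y_obs  : length-N integer codes of the observed alternative (0, 1, 2)
    p_pred : N x 3 array of predicted probabilities, rows summing to 1
    """
    y = np.asarray(y_obs, dtype=int)
    p = np.asarray(p_pred, dtype=float)
    one_hot = np.eye(p.shape[1])[y]   # N x 3 indicator of observed choice
    return one_hot.T @ p              # cell (m, n) = sum of P_in over y_i = m


def success_measures(table):
    """Overall accuracy, success proportions, and success indices."""
    total = table.sum()
    accuracy = np.trace(table) / total
    predicted_totals = table.sum(axis=0)       # column sums
    observed_shares = table.sum(axis=1) / total  # row sums over N
    success_proportion = np.diag(table) / predicted_totals
    success_index = success_proportion / observed_shares
    return accuracy, success_proportion, success_index


# Toy data (made up): four respondents, three alternatives
y_toy = [0, 1, 2, 2]
p_toy = [[0.6, 0.3, 0.1],
         [0.3, 0.4, 0.3],
         [0.2, 0.3, 0.5],
         [0.1, 0.2, 0.7]]

acc, prop, idx = success_measures(success_table(y_toy, p_toy))
print("accuracy:", round(float(acc), 3))
print("success proportions:", np.round(prop, 3))
print("success indices:", np.round(idx, 3))
```

Under this construction, predictions that carry no individual-level information yield success indices of 1, which is the benchmark the text compares against.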

PSS model validation and application

In this section, we will first apply the PSS model to the hold-out NHTS Georgia sample (the test set) to further validate our model results (Parady et al. 2021) and retrieve sample biases in the follow-up survey from multiple sources (Scenario 1, Table 2). We will then apply the PSS model to selected states in diverse geographic regions of the US (California, Massachusetts, Minnesota, North Carolina, and New York) and the full 2017 NHTS national sample, to predict follow-up survey participation and test the transferability of the PSS model (Scenario 3, Table 2).

Inside Georgia: breakdown of sample biases

In this section, we apply the PSS model to the test set to predict respondent participation in the follow-up survey, and compare the marginal distributions of several selected variables with the corresponding population8 distributions derived from the 2018 American Community Survey five-year estimates (https://www.census.gov/programs-surveys/acs). By analyzing the distribution divergence between the follow-up survey respondents and the population, we summarize the potential biases residing in the sampling method, i.e., recruiting respondents from a preceding travel survey. Figure 2 visualizes the five bias sources: dataset bias, household representative bias, self-selection bias, non-response bias, and prediction error. Please see Table 7 for detailed distributions.
Table 7
Marginal distributions of selected variables
(a) Individual-level
| Column number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset | ACS GA¹ | NHTS GA² | NHTS HH reps² | Follow-up survey opt-in observed² | Follow-up survey opt-in predicted²† | Follow-up survey final observed² | Follow-up survey final predicted²‡ | Percent change³ | Effect size³ |
| Age | | | | | | | | | |
| 18–24 | 0.13 | 0.13 | 0.043 | 0.046 | 0.053 | 0.018 | 0.025 | −0.81 | 0.43** |
| 25–34 | 0.18 | 0.17 | 0.16 | 0.18 | 0.18 | 0.09 | 0.11 | −0.37 | |
| 35–44 | 0.18 | 0.19 | 0.20 | 0.22 | 0.20 | 0.16 | 0.17 | −0.04 | |
| 45–54 | 0.18 | 0.17 | 0.20 | 0.19 | 0.20 | 0.22 | 0.20 | 0.12 | |
| 55–64 | 0.16 | 0.17 | 0.20 | 0.21 | 0.19 | 0.26 | 0.23 | 0.42 | |
| 65+ | 0.17 | 0.17 | 0.20 | 0.17 | 0.17 | 0.25 | 0.26 | 0.52 | |
| Gender | | | | | | | | | |
| Male | 0.48 | 0.48 | 0.41 | 0.41 | 0.42 | 0.45 | 0.44 | −0.08 | 0.08 |
| Female | 0.52 | 0.52 | 0.59 | 0.59 | 0.58 | 0.55 | 0.56 | 0.07 | |
| Education | | | | | | | | | |
| Less than a high school graduate | 0.062 | 0.070 | 0.051 | 0.042 | 0.052 | 0.019 | 0.038 | −0.38 | 0.61*** |
| High school graduate or equivalent | 0.36 | 0.25 | 0.20 | 0.18 | 0.20 | 0.19 | 0.17 | −0.53 | |
| Some college or associate's degree | 0.30 | 0.30 | 0.31 | 0.32 | 0.31 | 0.27 | 0.29 | −0.02 | |
| Bachelor's degree | 0.17 | 0.21 | 0.24 | 0.24 | 0.23 | 0.26 | 0.24 | 0.36 | |
| Graduate degree or professional degree | 0.10 | 0.17 | 0.21 | 0.21 | 0.21 | 0.26 | 0.26 | 1.53 | |
| Worker | 0.59 | 0.62 | 0.63 | 0.62 | 0.62 | 0.59 | 0.56 | −0.05 | 0.06 |
| Hispanic | 0.078 | 0.083 | 0.075 | 0.066 | 0.073 | 0.058 | 0.058 | −0.26 | 0.08 |
| Asian/Pacific Islander | 0.046 | 0.039 | 0.034 | 0.025 | 0.026 | 0.0090 | 0.017 | −0.63 | 0.14* |
| Black | 0.31 | 0.31 | 0.32 | 0.35 | 0.33 | 0.26 | 0.26 | −0.18 | 0.12* |
| Native American | 0.0090 | 0.0036 | 0.0037 | 0.0017 | 0.0033 | 0.0034 | 0.0028 | −0.69 | 0.07 |
| White | 0.62 | 0.63 | 0.62 | 0.60 | 0.62 | 0.70 | 0.70 | 0.13 | 0.17* |
| Commute mode | | | | | | | | | |
| Private vehicle | 0.94 | 0.93 | 0.92 | 0.90 | 0.91 | 0.96 | 0.93 | −0.02 | 0.16* |
| Taxi | 0.0030 | 0.0077 | 0.0050 | 0.0079 | 0.0091 | 0.0017 | 0.0059 | 0.97 | |
| Public transit | 0.022 | 0.032 | 0.041 | 0.055 | 0.042 | 0.017 | 0.032 | 0.44 | |
| Walk | 0.016 | 0.018 | 0.022 | 0.022 | 0.021 | 0.007 | 0.018 | 0.13 | |
| Bike | 0.0025 | 0.0065 | 0.0090 | 0.013 | 0.012 | 0.015 | 0.0085 | 2.40 | |
| Other | 0.013 | 0.0074 | 0.0050 | 0.005 | 0.007 | 0.00 | 0.0077 | −0.41 | |
| Commute time | | | | | | | | | |
| 0–10 min | 0.21 | 0.21 | 0.21 | 0.22 | 0.21 | 0.18 | 0.21 | −0.01 | 0.17* |
| 10–20 min | 0.30 | 0.26 | 0.27 | 0.27 | 0.27 | 0.25 | 0.27 | −0.11 | |
| 20–30 min | 0.21 | 0.20 | 0.19 | 0.18 | 0.18 | 0.18 | 0.17 | −0.16 | |
| 30–60 min | 0.23 | 0.27 | 0.27 | 0.27 | 0.27 | 0.34 | 0.28 | 0.21 | |
| 60–90 min | 0.033 | 0.048 | 0.047 | 0.052 | 0.044 | 0.036 | 0.050 | 0.53 | |
| 90+ min | 0.015 | 0.026 | 0.017 | 0.019 | 0.020 | 0.015 | 0.020 | 0.28 | |
(b) Household-level
| Column number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset | ACS GA⁴ | NHTS GA⁵ | NHTS HH reps⁵ | Follow-up survey opt-in observed⁵ | Follow-up survey opt-in predicted⁵† | Follow-up survey final observed⁵ | Follow-up survey final predicted⁵‡ | Percent change³ | Effect size³ |
| Household size | | | | | | | | | |
| 1 | 0.27 | 0.28 | 0.28 | 0.31 | 0.32 | 0.31 | 0.31 | 0.14 | 0.12* |
| 2 | 0.33 | 0.33 | 0.33 | 0.29 | 0.31 | 0.33 | 0.35 | 0.05 | |
| 3+ | 0.40 | 0.39 | 0.39 | 0.40 | 0.37 | 0.36 | 0.34 | −0.14 | |
| Household income | | | | | | | | | |
| Less than $24,999 | 0.22 | 0.27 | 0.27 | 0.29 | 0.29 | 0.22 | 0.23 | 0.04 | 0.08 |
| $25,000 to $49,999 | 0.23 | 0.23 | 0.23 | 0.23 | 0.23 | 0.25 | 0.22 | −0.08 | |
| $50,000 to $74,999 | 0.18 | 0.16 | 0.16 | 0.14 | 0.15 | 0.17 | 0.17 | −0.06 | |
| $75,000 to $99,999 | 0.12 | 0.11 | 0.11 | 0.11 | 0.10 | 0.09 | 0.12 | −0.05 | |
| $100,000 to $149,999 | 0.13 | 0.14 | 0.14 | 0.13 | 0.13 | 0.16 | 0.15 | 0.14 | |
| More than $150,000 | 0.11 | 0.086 | 0.086 | 0.10 | 0.10 | 0.12 | 0.12 | 0.05 | |
| Vehicle ownership | | | | | | | | | |
| 0 | 0.067 | 0.078 | 0.078 | 0.091 | 0.092 | 0.040 | 0.062 | −0.07 | 0.10* |
| 1 | 0.33 | 0.35 | 0.35 | 0.37 | 0.37 | 0.36 | 0.34 | 0.03 | |
| 2 | 0.38 | 0.34 | 0.34 | 0.33 | 0.32 | 0.36 | 0.34 | −0.10 | |
| 3+ | 0.22 | 0.23 | 0.23 | 0.20 | 0.22 | 0.24 | 0.26 | 0.14 | |
| Homeowner | 0.63 | 0.62 | 0.62 | 0.57 | 0.58 | 0.75 | 0.75 | 0.20 | 0.25* |
| Number of children | | | | | | | | | |
| 0 | 0.70 | 0.68 | 0.68 | 0.67 | 0.69 | 0.75 | 0.73 | 0.04 | 0.10* |
| 1 | 0.13 | 0.13 | 0.13 | 0.14 | 0.13 | 0.10 | 0.13 | −0.02 | |
| 2 | 0.11 | 0.12 | 0.12 | 0.12 | 0.12 | 0.11 | 0.11 | −0.03 | |
| 3+ | 0.060 | 0.063 | 0.063 | 0.06 | 0.05 | 0.037 | 0.038 | −0.37 | |
For each variable, the sum of category shares might not equal 1 due to rounding errors
¹ 2018 ACS individual weights are applied
² NHTS individual weights, based on the 2015 ACS individual weights, are applied
³ Comparison between the population distribution and follow-up survey predicted distribution (columns 1 and 7)
⁴ 2018 ACS household weights are applied
⁵ NHTS household weights are applied
*Small effect size (\(w\) = 0.10). ** Medium effect size (\(w\) = 0.30). *** Large effect size (\(w\) = 0.50)
† Calculated with \({\text{P}}\left( {{\text{y}}_{{\text{i}}}^{{\text{S}}} = 1} \right)\). ‡ Calculated with \({\text{P}}\left( {{\text{y}}_{{\text{i}}}^{{\text{S}}} = 1,{\text{ y}}_{{\text{i}}}^{{\text{O}}} = 1} \right)\)
The PSS model has demonstrated the existence of self-selection biases through the highly significant and sizable correlation between the error terms in the selection and outcome models. Self-selection bias, however, is not the only source that contributes to the marginal distribution divergence between the follow-up survey respondents and the population (i.e., the bias in the follow-up survey respondents). As shown in Fig. 2, the first contribution arises from any coverage, sampling, and non-response biases associated with the dataset of the preceding survey, which is the 2017 NHTS in our case. Since the 2017 NHTS created individual and household weights using the 2015 ACS data as control variables, and since we used the 2018 ACS data to determine the “true” population distribution,9 the dataset bias associated with those control variables is trivial (columns 1 and 2 in Table 7).
The second contribution to bias comes from the fact that only people who answer the household-related questions in the retrieval survey–i.e., “household representatives (reps)”–are asked the willingness question in the NHTS. The follow-up survey (i.e., the GDOT survey) was therefore delivered only to household representatives and not to any other household members. The household representative filter results in individual-level biases (e.g., age, gender). The household-level variables are not influenced since household weights are the same across household members. Consequently, the marginal distributions of individual-level variables have sizable differences between the 2017 NHTS Georgia sample and the household representative sample (columns 2 and 3 in Table 7). If the household representative filter could be removed (i.e., if the willingness question were asked of all NHTS respondents), we would expect a more representative follow-up survey sample (see Appendix A for details of a scenario that simulates this hypothetical situation, with results that support the conjecture).
The distribution divergence between NHTS household representatives and individuals who are willing to participate in a follow-up survey (opt-in) reflects the self-selection bias (columns 3 and 4 in Table 7). The distribution divergence between the opt-in individuals and individuals who actually complete the follow-up survey reflects a non-response bias (columns 4 and 6), which might result from multiple reasons, such as the opt-in individual being no longer willing or able to do the follow-up survey at the time when it was received, or the follow-up survey not reaching the opt-in individual due to an address change.
The distribution divergence between the observed follow-up survey final respondents and the corresponding PSS predicted results indicates the prediction error (columns 4 versus 5 and columns 6 versus 7 in Table 7).
Beyond the bias breakdown, the sum of all biases and errors shown in Fig. 2, which indicates the distribution divergence between the population and the predicted follow-up survey respondents, is of the most concern.10 A small distribution divergence indicates that the follow-up survey sample is expected to be representative of the population, which is a positive sign that recruiting respondents from a preceding survey is efficient and reasonable. Otherwise, a large divergence indicates that a biased follow-up survey sample is expected, which may call for some sampling remedies to improve its representativeness. Accordingly, in Table 7, we present the percentage change (column 8) and effect size (ES, column 9) between the population (column 1) and the predicted follow-up survey respondents (column 7). The definition of ES (\(w\)) is as follows (Cohen 1977):
$$w = \sqrt{\sum_{i = 1}^{m} \frac{\left( P_{prd(i)} - P_{pop(i)} \right)^{2}}{P_{pop(i)}}},$$
(23)
where \(m\) is the number of variable categories; \({P}_{prd(i)}\) is the predicted proportion of category \(i\) in the follow-up survey (Table 7, column 7); \({P}_{pop(i)}\) is the actual proportion of category \(i\) in the population (Table 7, column 1). In general, a smaller ES indicates similar distributions. Cohen (1977) provides references for ES magnitudes: effect sizes of 0.10, 0.30, and 0.50 are considered as small, medium, and large, respectively.
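Equation 23 can be illustrated with the gender rows of Table 7a. The sketch below is a minimal example with hypothetical function names; it also computes the percent change, which is assumed here to be the predicted share minus the population share, divided by the population share. Because the inputs are the rounded shares printed in the table, the results can differ slightly from the published columns.

```python
import math


def effect_size_w(p_pred, p_pop):
    """Cohen's w between predicted and population category shares (Eq. 23)."""
    return math.sqrt(
        sum((prd - pop) ** 2 / pop for prd, pop in zip(p_pred, p_pop))
    )


def percent_change(p_pred, p_pop):
    """Relative change of each predicted share from its population share.

    Assumption: defined as (predicted - population) / population.
    """
    return [(prd - pop) / pop for prd, pop in zip(p_pred, p_pop)]


# Gender shares (male, female) from Table 7a
gender_pop = [0.48, 0.52]   # column 1, ACS GA
gender_pred = [0.44, 0.56]  # column 7, follow-up survey final predicted

w_gender = effect_size_w(gender_pred, gender_pop)
change_gender = percent_change(gender_pred, gender_pop)

print(f"w = {w_gender:.2f}")
print("percent change:", [round(c, 2) for c in change_gender])
```

For a single-row variable such as homeownership, the complementary category (here, non-owners) has to be included so that both distributions sum to one before Eq. 23 is applied.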
Among the individual-level variables (Table 7a), the distributions of education and age in the follow-up survey samples diverge most widely from the corresponding population distribution. Specifically, the follow-up survey respondents overrepresent highly educated and older groups. In the case of education, we see that the bias begins with the original set of NHTS respondents, and is amplified at the second stage of predicted response to the GDOT survey. The two commute-related variables show that we have a larger share of follow-up survey respondents who use non-private vehicles for commuting compared to the population, which might further contribute to the larger share of long commute times. The effect sizes of the household-level variables have overall smaller magnitudes than those of the individual-level variables (Table 7b). Homeownership has the largest effect size of 0.25. Specifically, the follow-up survey recruits a larger share of homeowners, which might relate to the survey mode (mailing) used for the follow-up survey: homeowners are more likely to receive the survey since they have permanent mailing addresses, while renters might not receive the follow-up survey due to address changes.
In Appendix B, we provide a visualization of selected variables shown in Table 7. The visualization presents the changing trajectories of the marginal distributions from the population to the predicted follow-up survey respondents.

Outside Georgia: what does the follow-up survey sample look like?

In this section, we test the transferability of the PSS model to different populations, by checking the representativeness of follow-up survey respondents for selected states in diverse geographic regions of the US (west to east: California, Minnesota, North Carolina, New York and Massachusetts) and the full 2017 NHTS national sample. Table 8 presents the effect size by state.
Table 8
Effect size by different geographic regions

| Variable | GA | US | CA | MN | NC | NY | MA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individual-level | | | | | | | |
| Age | 0.43 | 0.45 | 0.50 | 0.41 | 0.44 | 0.49 | 0.48 |
| Gender | 0.08 | 0.10 | 0.08 | 0.13 | 0.14 | 0.06 | 0.10 |
| Education | 0.61 | 0.60 | 0.67 | 0.46 | 0.54 | 0.58 | 0.65 |
| Worker | 0.06 | 0.07 | 0.09 | 0.08 | 0.11 | 0.05 | 0.08 |
| Hispanic | 0.08 | 0.11 | 0.14 | 0.15 | 0.05 | 0.16 | 0.15 |
| Asian/Pacific Islander | 0.14 | 0.15 | 0.20 | 0.10 | 0.09 | 0.19 | 0.12 |
| Black | 0.12 | 0.09 | 0.07 | 0.10 | 0.09 | 0.18 | 0.10 |
| Native American | 0.07 | 0.08 | 0.07 | 0.13 | 0.08 | 0.07 | 0.08 |
| White | 0.17 | 0.18 | 0.32 | 0.13 | 0.12 | 0.37 | 0.18 |
| Commute mode | 0.16 | 0.10 | 0.17 | 0.22 | 0.15 | 0.33 | 0.16 |
| Commute time | 0.17 | 0.12 | 0.21 | 0.15 | 0.11 | 0.20 | 0.33 |
| Household-level | | | | | | | |
| Household size | 0.12 | 0.13 | 0.16 | 0.16 | 0.13 | 0.03 | 0.16 |
| Household income | 0.08 | 0.06 | 0.05 | 0.20 | 0.05 | 0.13 | 0.11 |
| Household vehicles | 0.10 | 0.12 | 0.05 | 0.09 | 0.06 | 0.43 | 0.13 |
| Homeowner | 0.25 | 0.29 | 0.33 | 0.21 | 0.23 | 0.46 | 0.30 |
| No. of children | 0.10 | 0.08 | 0.09 | 0.12 | 0.09 | 0.09 | 0.12 |

Bolded numbers are the maximum effect size by row
[ES by region: per-row visualization of the effect size for each state, in the same order as presented in the table]
In general, different regions have similar effect sizes for a given variable, which indicates a similar divergence level of the marginal distributions between the follow-up survey respondents and the populations in different regions. In that respect, the results show respectable generalizability of the PSS model across different areas. Nevertheless, the effect sizes do vary by state, which might point to regional differences that are not captured by the current PSS model. Moreover, the variations in effect size are not consistent across variables. For example, New York has the most representative follow-up survey sample regarding gender among the seven regions, but is the least representative on commute mode, household vehicles, and homeownership. Some of the large effect sizes for New York doubtless result from its diverse population composition and different lifestyles (e.g., large share of public transit use) compared to other states. Clearly, a model for Georgia is not seamlessly transferable to New York, but then it appears that a model for many other states would not be transferable to New York, either. Aside from New York, the model for Georgia seems to transfer relatively well to states that are dissimilar to it in many ways, including California and Massachusetts, as well as to the United States as a whole.
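The regional comparison above can be checked directly against Table 8. The short sketch below, whose helper names are illustrative and not part of the study, finds the most and least representative regions and the spread of effect sizes for a subset of rows copied from the table.

```python
REGIONS = ["GA", "US", "CA", "MN", "NC", "NY", "MA"]

# Effect sizes copied from Table 8 (subset of rows).
EFFECT_SIZES = {
    "Age":                [0.43, 0.45, 0.50, 0.41, 0.44, 0.49, 0.48],
    "Gender":             [0.08, 0.10, 0.08, 0.13, 0.14, 0.06, 0.10],
    "Education":          [0.61, 0.60, 0.67, 0.46, 0.54, 0.58, 0.65],
    "Commute mode":       [0.16, 0.10, 0.17, 0.22, 0.15, 0.33, 0.16],
    "Household vehicles": [0.10, 0.12, 0.05, 0.09, 0.06, 0.43, 0.13],
    "Homeowner":          [0.25, 0.29, 0.33, 0.21, 0.23, 0.46, 0.30],
}

def regions_at(values, pick):
    """Return every region whose effect size equals pick(values), e.g. min or max."""
    target = pick(values)
    return [region for region, value in zip(REGIONS, values) if value == target]

def summarize(table):
    """Per variable: regions with the smallest and largest ES, and the ES range."""
    summary = {}
    for variable, values in table.items():
        summary[variable] = {
            "most_representative": regions_at(values, min),
            "least_representative": regions_at(values, max),
            "spread": round(max(values) - min(values), 2),
        }
    return summary

for variable, row in summarize(EFFECT_SIZES).items():
    print(variable, row)
```

For these rows, New York has the smallest gender effect size and the largest effect sizes for commute mode, household vehicles, and homeownership, in line with the discussion above.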
Overall, similar to findings in the previous section, the follow-up survey respondents are less representative in terms of age and education among the individual-level variables. Homeownership is the household-level variable that is hardest to represent in the follow-up survey. Appendix C provides marginal distributions of the variables in the selected geographic regions.

Conclusion

In this study, we identified and analyzed the self-selection bias existing in follow-up survey respondents who were recruited from a preceding travel survey (the 2017 NHTS). We applied a probit with sample selection (PSS) model to examine the willingness of NHTS respondents to participate in a follow-up survey, together with their actual response behavior. Overall, as expected, we identified self-selection biases among survey respondents recruited from a preceding household travel survey. Findings suggest that the requirements of the preceding survey influenced respondents’ willingness to participate in follow-up surveys. In the particular context of NHTS, respondents from survey-burdensome households (e.g., large households) were less likely to report being willing to respond to a follow-up survey, although individuals reporting more trips were unexpectedly more likely to be willing. Respondents’ attitudes towards privacy, and some other travel-related characteristics, also influenced their willingness to be contacted for a follow-up survey. For example, respondents from specific groups (e.g., travel-restricted people, frequent transit users) were more likely to report being willing to participate in a follow-up survey. By participating in travel surveys, these groups may be seeking to improve the quality of their travel. We also found three explanatory variables with opposing signs between the selection and outcome models, a finding that indicated inconsistencies between people’s reported willingness (to participate in a survey) and their actual (response) behaviors. Similarly, the negative error term correlations signified that, on net, unobserved characteristics had impacts on selection that were opposite to their impacts on the outcome.
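The role of the error term correlation can be illustrated numerically. In the standard probit with sample selection, the probability of being both willing and responsive, \(P(y_i^S = 1, y_i^O = 1)\), is a bivariate standard normal probability evaluated at the two linear indices. The sketch below is only an illustration under that textbook assumption: the index values and the correlations are hypothetical, not estimates from this study, and the integral is approximated with Simpson's rule.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def joint_probability(index_selection, index_outcome, rho, steps=2000):
    """P(eps_S <= index_selection, eps_O <= index_outcome) for a standard
    bivariate normal with correlation rho, via Simpson's rule on
    the integral of pdf(x) * cdf((b - rho * x) / sqrt(1 - rho^2))."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    lower = -8.0  # mass below -8 is negligible for a standard normal
    if index_selection <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return norm_pdf(x) * norm_cdf((index_outcome - rho * x) / scale)

    h = (index_selection - lower) / steps
    total = integrand(lower) + integrand(index_selection)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0

index_selection = 0.5  # hypothetical index for "willing to be recontacted"
index_outcome = 0.2    # hypothetical index for "responds once contacted"
for rho in (-0.4, 0.0, 0.4):
    print(rho, round(joint_probability(index_selection, index_outcome, rho), 3))
```

For fixed indices, a negative correlation yields a lower joint probability than independence would, which is one way to read the finding that unobserved characteristics act in opposite directions on willingness and actual response.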
PSS models do not have model performance measures that are consistently reported in the literature. To address this gap, this paper summarizes six well-known model performance measure categories, adjusted based on the PSS model structure: the log-likelihood, McFadden’s pseudo R-squared, information criteria, point-biserial correlation coefficient, root mean squared error, and success table. McFadden’s pseudo R-squared bounds the model fit between 0 and 1, which is straightforward for understanding and could be used to compare across different PSS models. The success table provides overall model performance measures as well as performance measures for each alternative, which supplies information important to evaluating the model.
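As a rough illustration of some of these measures, the sketch below computes McFadden’s pseudo R-squared, the Cox and Snell pseudo R-squared (the formula given in the footnote comparing our model with Couper et al. 2007), and two common information criteria from log-likelihood values. These are the generic textbook forms; the PSS-specific adjustments summarized in this paper are not reproduced, and all numbers are hypothetical.

```python
import math

def mcfadden_r2(ll_model, ll_base):
    """1 - LL(model) / LL(base), with the base model e.g. constants only."""
    return 1.0 - ll_model / ll_base

def cox_snell_r2(ll_model, ll_base, n_obs):
    """1 - (L(base) / L(model))^(2/N), written in terms of log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_base - ll_model) / n_obs)

def aic(ll_model, n_params):
    """Akaike information criterion: 2k - 2 LL."""
    return 2.0 * n_params - 2.0 * ll_model

def bic(ll_model, n_params, n_obs):
    """Bayesian information criterion: k ln(N) - 2 LL."""
    return n_params * math.log(n_obs) - 2.0 * ll_model

# Hypothetical values, for illustration only.
ll_model, ll_base, n_params, n_obs = -800.0, -1000.0, 10, 1000
print(round(mcfadden_r2(ll_model, ll_base), 3))
print(round(cox_snell_r2(ll_model, ll_base, n_obs), 3))
print(aic(ll_model, n_params), round(bic(ll_model, n_params, n_obs), 1))
```

Because McFadden’s measure is a ratio of log-likelihoods, it does not depend on the sample size in the way the Cox and Snell measure does, which is part of what makes it convenient for comparison across models.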
We analyzed the representativeness of the follow-up survey respondents regarding 17 selected variables, including sociodemographic and travel-related variables. We decomposed the divergence of the marginal distributions between the population and the predicted follow-up survey respondents into five components, namely dataset bias, household representative bias, self-selection bias, non-response bias, and prediction error. Results showed that the selection of the household representative contributed a large proportion of the distribution divergence of individual-level variables. The effect size for marginal distributions showed that education and age were the two least representative individual-level variables in the follow-up survey, whereas homeownership had the largest effect size among the household-level variables.
We also applied the PSS model to different geographic regions of the U.S., namely California, Massachusetts, Minnesota, North Carolina, and New York. Similar effect sizes across states indicated good generalizability of the PSS model; however, education, age, and homeownership were still poorly represented among predicted respondents to the follow-up survey for these other states. New York had less representative predicted follow-up survey respondents compared to other states, presumably a consequence of its diverse population composition and different transportation-related lifestyles.
These results can help survey developers assess the representativeness and cost-effectiveness of the proposed sampling frame (i.e., a pool of previous survey respondents), which in turn will suggest adjustments to the sampling frame that can improve the representativeness of the new sample. Specifically, by using this approach to identify likely biases in the follow-up survey sample, study designers may choose to proactively oversample the predicted-to-be-underrepresented groups when recruiting from other data sources (e.g., online opinion panels). We recommend that large-scale travel surveys like the NHTS retain the willingness question as a recurring item, thereby allowing local agencies and researchers to efficiently recruit follow-up respondents from their sample. In fact, we recommend that the question be asked of all survey respondents, not only the main household respondent as was the case here. Recruiting future survey respondents from among all willing preceding survey respondents could substantially reduce sampling biases at the outset.
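One simple way to turn the predicted biases into a recruiting plan is to compare population shares with predicted shares category by category. The sketch below is an assumption made for illustration, not a procedure prescribed in this study: it computes the ratio of population share to predicted share, so that values above 1 flag groups a designer might oversample. The age shares are the Georgia values from columns 1 and 7 of Table 9.

```python
# Age shares for Georgia: population (column 1) and predicted follow-up
# respondents recruited through household representatives (column 7).
AGE_POPULATION = {"18-24": 0.13, "25-34": 0.18, "35-44": 0.18,
                  "45-54": 0.18, "55-64": 0.16, "65+": 0.17}
AGE_PREDICTED = {"18-24": 0.025, "25-34": 0.11, "35-44": 0.17,
                 "45-54": 0.20, "55-64": 0.23, "65+": 0.26}

def oversampling_factors(population, predicted):
    """Ratio of population share to predicted share for each category."""
    return {category: population[category] / predicted[category]
            for category in population}

def underrepresented(population, predicted):
    """Categories whose predicted share falls short of the population share."""
    factors = oversampling_factors(population, predicted)
    return [category for category, factor in factors.items() if factor > 1.0]

for category, factor in oversampling_factors(AGE_POPULATION, AGE_PREDICTED).items():
    print(category, round(factor, 2))
print(underrepresented(AGE_POPULATION, AGE_PREDICTED))
```

With these shares, the three youngest age groups are flagged, the 18–24 group most strongly, which matches the overrepresentation of older respondents discussed earlier.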
In a companion study (Wang 2021), we analyze the consequence of self-selection biases by assessing their influence on travel behavior models developed on the second-stage sample. We examine and compare two techniques (sample weights and sample selection models) that could remedy the influence of unrepresentative samples recruited from a preceding survey on travel behavior models.
The study also has several caveats. First, the follow-up survey is a personal travel survey instead of a household travel survey. Our results do not speak to a situation in which the follow-up survey aims to obtain answers from all household members. If “household willingness-to-respond” is defined to be “willingness of every household member to respond”, we would first of all expect a much lower willingness rate, and if follow-through response is required from every household member in order to count, we would secondly expect a much lower follow-through rate among the reported-to-be-willing households. We would further expect more severe biases on the part of the willing and responsive households. For example, our results suggest that, in view of the heavier burden, larger households will probably be less likely to express willingness to respond and to actually respond to follow-up surveys. Given these concerns, we imagine that it would be prudent, if at all possible, to allow something less than full household participation to “count”, at both stages of the process. Nevertheless, it is not presently clear how best to balance the disadvantages of a smaller and more biased sample when requiring full participation, against the disadvantages of incomplete household information when relaxing that requirement.
Another caveat is that the follow-up survey lags the preceding one by an interval ranging from four to 18 months, during which the address and demographic information of the initial survey respondents may have changed without our knowledge. We encourage future studies to explore the impact of time interval on the actual response to follow-up surveys. Moreover, it would be interesting to study the impact of completion modes (e.g., paper, online) for both preceding and follow-up surveys on the willingness to participate.

Acknowledgements

This work was funded under the Teaching Old Models New Tricks (TOMNET) Center, a University Transportation Center sponsored by the U.S. Department of Transportation through Grant No. 69A3551747116. Any opinions, findings, and conclusions or recommendations expressed in this study are those of the authors and do not necessarily reflect the views of the sponsor organizations. The paper was presented at the Transportation Research Board 100th Annual Meeting (poster presentation), the UC Davis Sustainable Transportation Energy Pathways Program (STEPS) seminar, and the Institute of Transportation Engineers (ITE) Student Chapter Joint Seminar (UMD, TAMU, UTK). We are grateful for the comments of the three anonymous reviewers, which have improved the paper.

Declarations

Conflicts of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix

Appendix A: Marginal distribution of selected variables (random selection)

As discussed in section “Inside Georgia: Breakdown of sample biases”, the household representative filter results in biases for individual-level variables. We would expect a more representative follow-up survey sample if the NHTS were to ask for every household member's willingness to participate in a follow-up survey. We simulate such a scenario by randomly selecting one adult from each household as the household representative and predicting their response to the follow-up survey. Table 9 presents the marginal distributions for randomly selected NHTS respondents (column 3a), the corresponding follow-up survey prediction (column 7a), and the effect size between the prediction and the population distribution (column 9a). Compared to the household representatives prediction (column 9), the new effect sizes calculated from the randomly selected NHTS respondents are generally reduced, especially for the largest effect sizes (e.g., age, education).

Table 9
Marginal distribution of selected individual-level variables (HH reps and random selection)

| Column number | 1 | 2 | 3 | 3a | 7 | 7a | 9 | 9a |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset | ACS GA¹ | NHTS GA² | NHTS HH reps² | NHTS random² | Follow-up survey final predicted (HH reps)²‡ | Follow-up survey final predicted (random)²‡ | Effect size (HH reps)³ | Effect size (random)⁴ |
| Age | | | | | | | 0.43** | 0.26* |
| 18–24 | 0.13 | 0.13 | 0.043 | 0.097 | 0.025 | 0.087 | | |
| 25–34 | 0.18 | 0.17 | 0.16 | 0.18 | 0.11 | 0.13 | | |
| 35–44 | 0.18 | 0.19 | 0.20 | 0.20 | 0.17 | 0.17 | | |
| 45–54 | 0.18 | 0.17 | 0.20 | 0.16 | 0.20 | 0.16 | | |
| 55–64 | 0.16 | 0.17 | 0.20 | 0.18 | 0.23 | 0.19 | | |
| 65+ | 0.17 | 0.17 | 0.20 | 0.19 | 0.26 | 0.25 | | |
| Gender | | | | | | | 0.08 | 0.06 |
| Male | 0.48 | 0.48 | 0.41 | 0.45 | 0.44 | 0.45 | | |
| Female | 0.52 | 0.52 | 0.59 | 0.55 | 0.56 | 0.55 | | |
| Education | | | | | | | 0.61*** | 0.44** |
| Less than a high school graduate | 0.062 | 0.070 | 0.051 | 0.072 | 0.038 | 0.058 | | |
| High school graduate or equivalent | 0.36 | 0.25 | 0.20 | 0.23 | 0.17 | 0.22 | | |
| Some college or associate's degree | 0.30 | 0.30 | 0.31 | 0.30 | 0.29 | 0.29 | | |
| Bachelor's degree | 0.17 | 0.21 | 0.24 | 0.22 | 0.24 | 0.23 | | |
| Graduate degree or professional degree | 0.10 | 0.17 | 0.21 | 0.18 | 0.26 | 0.21 | | |
| Worker | 0.59 | 0.62 | 0.63 | 0.62 | 0.56 | 0.55 | 0.06 | 0.08 |
| Hispanic | 0.078 | 0.083 | 0.075 | 0.078 | 0.058 | 0.062 | 0.08 | 0.06 |
| Asian/Pacific Islander | 0.046 | 0.039 | 0.034 | 0.034 | 0.017 | 0.028 | 0.14* | 0.09 |
| Black | 0.31 | 0.31 | 0.32 | 0.33 | 0.26 | 0.23 | 0.12* | 0.17* |
| Native American | 0.0090 | 0.0036 | 0.0037 | 0.0029 | 0.0028 | 0.0031 | 0.07 | 0.06 |
| White | 0.62 | 0.63 | 0.62 | 0.61 | 0.70 | 0.72 | 0.17* | 0.19* |
| Commute mode | | | | | | | 0.16* | 0.17* |
| Private vehicle | 0.94 | 0.93 | 0.92 | 0.92 | 0.93 | 0.93 | | |
| Taxi | 0.0030 | 0.0077 | 0.0050 | 0.011 | 0.0059 | 0.0068 | | |
| Public transit | 0.022 | 0.032 | 0.041 | 0.035 | 0.032 | 0.028 | | |
| Walk | 0.016 | 0.018 | 0.022 | 0.019 | 0.018 | 0.016 | | |
| Bike | 0.0025 | 0.0065 | 0.0090 | 0.0083 | 0.0085 | 0.0058 | | |
| Other | 0.013 | 0.0074 | 0.0050 | 0.0073 | 0.0077 | 0.010 | | |
| Commute time | | | | | | | 0.17* | 0.17* |
| 0–10 min | 0.21 | 0.21 | 0.21 | 0.21 | 0.21 | 0.21 | | |
| 10–20 min | 0.30 | 0.26 | 0.27 | 0.26 | 0.27 | 0.25 | | |
| 20–30 min | 0.21 | 0.20 | 0.19 | 0.19 | 0.17 | 0.20 | | |
| 30–60 min | 0.23 | 0.27 | 0.27 | 0.27 | 0.28 | 0.27 | | |
| 60–90 min | 0.033 | 0.048 | 0.047 | 0.042 | 0.050 | 0.041 | | |
| 90+ min | 0.015 | 0.026 | 0.017 | 0.024 | 0.020 | 0.027 | | |

For each variable, the sum of category shares might not equal 1 due to rounding errors. Column numbers in Table 9 match the counterparts in Table 7
¹ 2018 ACS individual weights are applied
² NHTS individual weights, based on 2015 ACS individual weights, are applied
³ Comparison between the population distribution and follow-up survey predicted distribution (HH reps, columns 1 and 7)
⁴ Comparison between the population distribution and follow-up survey predicted distribution (random, columns 1 and 7a)
* Small effect size (\(w\) = 0.10). ** Medium effect size (\(w\) = 0.30). *** Large effect size (\(w\) = 0.50)
‡ Calculated with \(\mathrm{P}\left(y_i^S = 1,\ y_i^O = 1\right)\)
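The random-representative scenario described in this appendix can be sketched as follows. This is a minimal illustration, assuming a person-level table with a household identifier and an age field; the field names, the toy records, and the fixed seed are hypothetical and not taken from the NHTS data dictionary.

```python
import random
from collections import defaultdict

def pick_random_adult(persons, seed=0, adult_age=18):
    """Pick one adult uniformly at random from each household.

    persons: iterable of dicts with 'household_id', 'person_id', and 'age'.
    Households without any adult are left out of the result.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    adults_by_household = defaultdict(list)
    for person in persons:
        if person["age"] >= adult_age:
            adults_by_household[person["household_id"]].append(person)
    return {household: rng.choice(members)
            for household, members in sorted(adults_by_household.items())}

# Toy person-level records (hypothetical).
persons = [
    {"household_id": 1, "person_id": 1, "age": 52},
    {"household_id": 1, "person_id": 2, "age": 49},
    {"household_id": 1, "person_id": 3, "age": 20},
    {"household_id": 2, "person_id": 1, "age": 34},
    {"household_id": 2, "person_id": 2, "age": 6},
    {"household_id": 3, "person_id": 1, "age": 71},
]

representatives = pick_random_adult(persons)
for household, person in representatives.items():
    print(household, person["person_id"], person["age"])
```

The selected persons would then be passed to the fitted model in place of the actual household representatives to obtain the predicted distributions for the random-selection scenario.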

Appendix B: Changing trajectories of marginal distributions

To further illustrate the changing trajectories of the marginal distributions from the population to the predicted follow-up survey respondents, we select two individual-level variables (i.e., age, gender) and two household-level variables (i.e., household size, household income) and visualize them in Fig. 3 (for each figure, read lines from left to right).
Regarding the two individual-level variables, we see large differences between the NHTS Georgia population and NHTS household representatives. Specifically, household representatives underrepresent younger groups (i.e., 18–24 and 25–34) and males, meaning that middle-aged/older people (45+) and females are more likely to answer the household-related questions in the retrieval survey. In the observed opt-in follow-up survey sample, we see slightly increased shares of young and middle-aged people, which indicates that the self-selection bias partially offsets the HH representative bias. However, the non-response bias results in an even worse underrepresentation of younger people and overrepresentation of older people in the observed final follow-up survey. The marginal distribution of gender is relatively stable after the household representative filter (except for the small increase of males in the sample), which indicates small self-selection biases, non-response biases, and prediction errors.
The two household-level sociodemographic variables, namely, household size and household income, have fluctuating trajectories. Regarding household size, we see similar marginal distributions of the population (ACS) and the NHTS Georgia sample/household rep sample. The main distribution divergence occurs between the NHTS Georgia/household rep sample and the observed opt-in follow-up survey respondents. As we have discussed in section “Model results”, larger households are less willing to participate in a follow-up survey due to the heavy burden of survey completion that accompanies more family members. After the opt-in process, the proportion of households with three or more members keeps shrinking, while two-member households take the largest share in the final follow-up survey sample due to non-response biases and prediction errors.
Regarding household income, we see that the NHTS Georgia/household rep sample overrepresents the lower income group (less than $24,999) and underrepresents some middle/high-income groups ($50,000 to $99,999, $150,000 or more). The household income distributions of the observed opt-in follow-up sample diverge from the household income distribution of the NHTS Georgia/household rep sample, which indicates self-selection biases. Interestingly, the traits of observed final follow-up survey respondents partially correct some of the divergences, i.e., the marginal distribution of the final follow-up survey respondents is close to the population marginal distribution. In other words, the non-response biases partially offset the self-selection bias.

Electronic supplementary material

Below is the link to the electronic supplementary material.
Footnotes
1
The NHTS public dataset is available at https://nhts.ornl.gov. Access to the Decision variables will be given upon request.
 
2
We also investigated whether the presence of children might be a source of time poverty, but the correlation between homeownership and household size was only 0.06.
 
3
Since the NHTS did not measure travel-liking attitudes, we could not test our hypothesis with the presented PSS model. However, to investigate this conjecture we constructed a binary probit model for respondents’ willingness to participate in a follow-up survey using the GDOT survey data, which measured respondents’ willingness to participate in yet another follow-up survey as well as the travel-liking attitude. Results indicated that the travel-liking attitude was positively associated with the willingness to participate at a significance level (p-value) of 0.001.
 
4
When we treated the two responses (“I don’t know” and “I prefer not to answer”) as separate variables, their coefficients were very similar.
 
5
The 2017 NHTS was conducted between April 2016 and May 2017. The GDOT survey was distributed in September 2017. Accordingly, the interval between the two surveys varies from 4 months to 1.5 years, but we do not know the specific gap for a given individual, since the date of completion of the NHTS survey was not provided with the data.
 
6
We checked the correlations of housing density with the home ownership (-0.18), household size (-0.11), age (-0.13), and worker (0.077) variables, but none of them were large enough to cause collinearity concerns.
 
7
To enable the comparison between our PSS model and the two single models in Couper et al. (2007), we calculate the Cox and Snell pseudo R-squared with the formula \(1 - \left( \frac{\ell\left( \boldsymbol{c} \right)}{\ell\left( \hat{\boldsymbol{\gamma}}, \hat{\boldsymbol{\beta}}, \hat{\rho} \right)} \right)^{2/N}\) and the value is 0.115.
 
8
Although we refer to these as “population” distributions for convenience and because they presumably closely approximate the true distributions, they are in fact based on samples, and accordingly the ACS data has been weighted by the U.S. Census Bureau to correct for sampling and other biases.
 
9
The 2015 (5-year) ACS estimates were the most recent available when the NHTS was administered in 2016–2017. However, the 2018 ACS provided the most recent 5-year estimates when we conducted the analysis. Since the latter involve data from 2014 to 2018, we expect them to provide a good estimate for the middle two years (2016–2017) during which the data for both surveys was collected.
 
10
The distribution divergence between the population and the observed follow-up survey respondents is of interest in an ex post analysis, but here we focus on ex ante applications of the PSS such as those in Scenarios 2 and 3 of Table 2. The distribution divergence metrics between the population and the predicted follow-up survey respondents could serve as benchmarks in the section “Outside Georgia: what does the follow-up survey sample look like?”.
 
Literature
Adriaan, H., Jacco, D.: Nonresponse in the recruitment of an internet panel based on probability sampling. Surv. Res. Methods 3(2), 59–72 (2009)
Alemi, F., Circella, G., Mokhtarian, P., Handy, S.: What drives the use of ridehailing in California? Ordered probit models of the usage frequency of Uber and Lyft. Transp. Res. Part C 102, 233–248 (2019)
Amarov, B., Rendtel, U.: The recruitment of the access panel of German official statistics from a large survey in 2006: Empirical results and methodological aspects. Surv. Res. Methods 7, 103–114 (2013)
Bohte, W., Maat, K.: Deriving and validating trip purposes and travel modes for multi-day GPS-based travel surveys: a large-scale application in the Netherlands. Transp. Res. Part C 17, 285–297 (2009)
Börkan, B.: The mode effect in mixed-mode surveys: mail and web surveys. Soc. Sci. Comput. Rev. 28, 371–380 (2010)
Cao, X.: Disentangling the influence of neighborhood type and self-selection on driving behavior: an application of sample selection model. Transportation 36(2), 207–222 (2009)
Carini, R.M., Hayek, J.C., Kuh, G.D., Kennedy, J.M., Ouimet, J.A.: College student responses to web and paper surveys: does mode matter? Res. High. Educ. 44, 1–19 (2003)
Chauhan, R.S., Bhagat-Conway, M.W., Capasso da Silva, D., et al.: A database of travel-related behaviors and attitudes before, during, and after COVID-19 in the United States. Sci. Data 8, 245 (2021)
Chen, F., Wu, J., Chen, X., Zegras, P.C., Wang, J.: Vehicle kilometers traveled reduction impacts of transit-oriented development: evidence from Shanghai City. Transp. Res. Part D 55, 227–245 (2017)
Circella, G., Tiedeman, K., Handy, S., Alemi, F., Mokhtarian, P. (2016): What Affects Millennials’ Mobility? PART I: Investigating the Environmental Concerns, Lifestyles, Mobility-Related Attitudes and Adoption of Technology of Young Adults in California. UC Davis: National Center for Sustainable Transportation. Available from the authors and at https://escholarship.org/uc/item/6wm51523.
Cohen, J.: Statistical power analysis for the behavioral sciences. Academic Press (1977)
Collins, A.T., Rose, J.M., Hess, S.: Interactive stated choice surveys: a study of air travel behaviour. Transportation 39, 55–79 (2012)
Coryn, C.L.S., Becho, L.W., Westine, C.D., Mateu, P.F., Abu-Obaid, R.N., Hobson, K.A., Schröter, D.C., Dodds, E.L., Vo, A.T., Ramlow, M.: Material incentives and other potential factors associated with response rates to internet surveys of American evaluation association members: findings from a randomized experiment. Am. J. Eval. 41, 277–296 (2020)
Couper, M.P., Kapteyn, A., Schonlau, M., Winter, J.: Noncoverage and nonresponse in an Internet survey. Soc. Sci. Res. 36, 131–148 (2007)
de Haas, M.C., Hoogendoorn, R.G., Scheepers, C.E., Hoogendoorn-Lanser, S.: Travel mode choice modeling from cross-sectional survey and panel data: the inclusion of initial nonresponse. Transp. Res. Procedia 32, 268–278 (2018)
Drucker, J., Khattak, A.J.: Propensity to work from home: modeling results from the 1995 Nationwide Personal Transportation Survey. Transp. Res. Rec. 1706(1), 108–117 (2000)
Edwards, P., Roberts, I., Clarke, M., Diguiseppi, C., Pratap, S., Wentz, R., Kwan, I.: Increasing response rates to postal questionnaires: Systematic review. BMJ 324, 1183–1185 (2002)
Hardigan, P.C., Succar, C.T., Fleisher, J.M.: An analysis of response rate and economic costs between mail and web-based surveys among practicing dentists: a randomized trial. J. Commun. Health 37, 383–394 (2012)
Heckman, J., Tobias, J.L., Vytlacil, E.: Four parameters of interest in the evaluation of social programs. South. Econ. J. 68(2), 211–223 (2001)
Heckman, J.: The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Chap. Ann. Econ. Soc. Measure. 5(4), 475–492 (1976)
Lavrakas, P.J.: Panel Survey. Chapters in Encyclopedia of Survey Research Methods. Sage Publications Inc, California (2008)
Lin, Y.H., Yang, C.M., Hurng, B.S., Liu, I.W., Wu, S.I., Chiou, S.T.: Practical strategies to improve the response rate for a household interview survey. Taiwan J. Public Health 30, 290–299 (2011)
Manfreda, K.L., Bosnjak, M., Berzelak, J., Haas, I., Vehovar, V.: Web surveys versus other survey modes: a meta-analysis comparing response rates. Int. J. Mark. Res. 50, 79–104 (2008)
McFadden, D. (2000): Disaggregate behavioral travel demand’s RUM side: A 30-year retrospective. International Association for Travel Behaviour Research (IATBR) Conference, Gold Coast, Queensland, Australia, July 2–7, 2000.
Miller, C.A., Guidry, J.P.D., Dahman, B., Thomson, M.D.: A tale of two diverse Qualtrics samples: information for online survey researchers. Cancer Epidem. Biomark. Prev. 29(4), 731–735 (2020)
Mokhtarian, P.: Discrete choice models’ ρ2: a reintroduction to an old friend. J. Choice Model. 21, 60–65 (2016)
go back to reference National Research Council: Nonresponse in social science surveys : a research agenda. The National Academies Press, Washington (2013) National Research Council: Nonresponse in social science surveys : a research agenda. The National Academies Press, Washington (2013)
go back to reference Parady, G., Ory, D., Walker, J.: The overreliance on statistical goodness-of-fit and under-reliance on model validation in discrete choice models: a review of validation practices in the transportation academic literature. J. Choice Model. 38, 100257 (2021)CrossRef Parady, G., Ory, D., Walker, J.: The overreliance on statistical goodness-of-fit and under-reliance on model validation in discrete choice models: a review of validation practices in the transportation academic literature. J. Choice Model. 38, 100257 (2021)CrossRef
go back to reference Parsons, N.L., Manierre, M.J.: Investigating the relationship among prepaid token incentives, response rates, and nonresponse bias in a web survey. Field Methods 26, 191–204 (2014)CrossRef Parsons, N.L., Manierre, M.J.: Investigating the relationship among prepaid token incentives, response rates, and nonresponse bias in a web survey. Field Methods 26, 191–204 (2014)CrossRef
go back to reference Ruiz, T., Habib, K.N.: Scheduling decision styles on leisure and social activities. Transp. Res. Part A 88, 304–317 (2016) Ruiz, T., Habib, K.N.: Scheduling decision styles on leisure and social activities. Transp. Res. Part A 88, 304–317 (2016)
go back to reference Shaw, F.A., Wang, X., Mokhtarian, P., Watkins, K.: Supplementing transportation data sources with targeted marketing data: applications, integration, and validation. Transp. Res. Part A 149, 150–169 (2021) Shaw, F.A., Wang, X., Mokhtarian, P., Watkins, K.: Supplementing transportation data sources with targeted marketing data: applications, integration, and validation. Transp. Res. Part A 149, 150–169 (2021)
go back to reference Shaw, F. A., Wang, X., Mokhtarian, P. & Watkins, K. (2022). Using machine learning to enrich transportation surveys through variable transfer: with a sample application for psychometric variables. Paper in preparation. Available from the authors. Shaw, F. A., Wang, X., Mokhtarian, P. & Watkins, K. (2022). Using machine learning to enrich transportation surveys through variable transfer: with a sample application for psychometric variables. Paper in preparation. Available from the authors.
go back to reference Shih, T.H., Xitao, F.: Comparing response rates from web and mail surveys: a meta-analysis. Field Methods 20, 249–271 (2008)CrossRef Shih, T.H., Xitao, F.: Comparing response rates from web and mail surveys: a meta-analysis. Field Methods 20, 249–271 (2008)CrossRef
go back to reference Smith, V.K., Larson, K.L., York, A.: Using quality signaling to enhance survey response rates. Appl. Econ. Lett. 27, 951–954 (2020)CrossRef Smith, V.K., Larson, K.L., York, A.: Using quality signaling to enhance survey response rates. Appl. Econ. Lett. 27, 951–954 (2020)CrossRef
go back to reference Stavropoulou, C.: Non-adherence to medication and doctor-patient relationship: evidence from a European survey. Patient Educ. Couns. 83, 7–13 (2011)CrossRef Stavropoulou, C.: Non-adherence to medication and doctor-patient relationship: evidence from a European survey. Patient Educ. Couns. 83, 7–13 (2011)CrossRef
go back to reference Sun, H., Wang, H., Wan, Z.: Model and analysis of labor supply for ride-sharing platforms in the presence of sample self-selection and endogeneity. Transp. Res. Part B 125, 76–93 (2019)CrossRef Sun, H., Wang, H., Wan, Z.: Model and analysis of labor supply for ride-sharing platforms in the presence of sample self-selection and endogeneity. Transp. Res. Part B 125, 76–93 (2019)CrossRef
go back to reference Tobias, E., Ralf, M., Christian, B.: On the impact of response patterns on survey estimates from access panels. Surv. Res. Methods 7, 91–101 (2013) Tobias, E., Ralf, M., Christian, B.: On the impact of response patterns on survey estimates from access panels. Surv. Res. Methods 7, 91–101 (2013)
go back to reference van de Ven, W.P.M.M., van Praag, B.M.S.: The demand for deductibles in private health insurance: a probit model with sample selection. J. Econ. 17, 229–252 (1981)CrossRef van de Ven, W.P.M.M., van Praag, B.M.S.: The demand for deductibles in private health insurance: a probit model with sample selection. J. Econ. 17, 229–252 (1981)CrossRef
go back to reference van Herick, D., Mokhtarian, P.L.: How much does the method matter? An empirical comparison of ways to quantify the influence of residential self-selection. Travel Behav. Soc. 18, 68–82 (2020)CrossRef van Herick, D., Mokhtarian, P.L.: How much does the method matter? An empirical comparison of ways to quantify the influence of residential self-selection. Travel Behav. Soc. 18, 68–82 (2020)CrossRef
go back to reference Wang, X (2021). Respondent Recruitment to Consecutive Travel Surveys: Exploring Sample Representativeness and Travel Behavior Model Quality Using Sample Selection Models. Master’s thesis, Georgia Institute of Technology. Wang, X (2021). Respondent Recruitment to Consecutive Travel Surveys: Exploring Sample Representativeness and Travel Behavior Model Quality Using Sample Selection Models. Master’s thesis, Georgia Institute of Technology.
go back to reference Wittwer, R., Hubrich, S.: Nonresponse in household surveys: a survey of nonrespondents from the repeated cross-sectional study “Mobility in Cities – SrV” in Germany. Transp. Res. Procedia 11, 66–84 (2015)CrossRef Wittwer, R., Hubrich, S.: Nonresponse in household surveys: a survey of nonrespondents from the repeated cross-sectional study “Mobility in Cities – SrV” in Germany. Transp. Res. Procedia 11, 66–84 (2015)CrossRef
go back to reference Wolf, H.K., Kuulasmaa, K., Tolonen, H., Sans, S., Molarius, A., Eastwood, B.J.: Effect of sampling frames on response rates in the WHO MONICA risk factor surveys. Eur. J. Epidemiol. 20, 293–299 (2005)CrossRef Wolf, H.K., Kuulasmaa, K., Tolonen, H., Sans, S., Molarius, A., Eastwood, B.J.: Effect of sampling frames on response rates in the WHO MONICA risk factor surveys. Eur. J. Epidemiol. 20, 293–299 (2005)CrossRef
go back to reference Young, B., Bedford, L., Das Nair, R., Gallant, S., Littleford, R., Robertson, J.F.R., Schembri, S., Sullivan, F.M., Vedhara, K., Kendrick, D., ECLS study team: Unconditional and conditional monetary incentives to increase response to mailed questionnaires: a randomized controlled study within a trial (SWAT). J. Eval. Clin. Pract. 26, 893–902 (2020)CrossRef Young, B., Bedford, L., Das Nair, R., Gallant, S., Littleford, R., Robertson, J.F.R., Schembri, S., Sullivan, F.M., Vedhara, K., Kendrick, D., ECLS study team: Unconditional and conditional monetary incentives to increase response to mailed questionnaires: a randomized controlled study within a trial (SWAT). J. Eval. Clin. Pract. 26, 893–902 (2020)CrossRef
Metadata
Title
Response willingness in consecutive travel surveys: an investigation based on the National Household Travel Survey using a sample selection model
Authors
Xinyi Wang
F. Atiyya Shaw
Patricia L. Mokhtarian
Kari E. Watkins
Publication date
13-11-2022
Publisher
Springer US
Published in
Transportation / Issue 6/2023
Print ISSN: 0049-4488
Electronic ISSN: 1572-9435
DOI
https://doi.org/10.1007/s11116-022-10312-w
