Introduction
The neighborhood effects literature concerns itself with identifying causal effects of living in (deprived) neighborhoods on a range of individual-level outcomes, such as income, education, employment, and health. The literature on neighborhood effects is far from conclusive, and a major debate persists as to the size and significance of neighborhood effects as well as whether the effects are causal. Several studies have suggested that selection, not causality, is behind most of the current neighborhood-effects evidence (e.g., Bolster et al. 2007; Oreopoulos 2003; van Ham and Manley 2010; van Ham et al. 2012a). According to this perspective, many existing studies have failed to convincingly show real causal neighborhood effects because they either ignored or failed to adequately address neighborhood selection (Durlauf 2004; van Ham and Manley 2010). Thus, despite the impression that neighborhood effects are important, these studies might in reality simply show correlations between individual and neighborhood characteristics (Cheshire 2007). From this vantage point, studies claiming to have found that poor neighborhoods make people poor(er) likely show only that poor people live in poor neighborhoods because they cannot afford to live elsewhere (Cheshire 2007).
The problem with estimating neighborhood effects on, for example, individual income is that people are nonrandomly allocated to neighborhoods: people select into neighborhoods based on their preferences and resources, in combination with housing availability. That is, people tend to move to neighborhoods that have affordable dwellings, match their tenure preferences, and are associated with a low likelihood of discrimination against them by landlords. As a result of this selection process, parameter estimates of neighborhood effects are likely inflated because the characteristics that drive households into certain areas are highly correlated with the outcomes of interest to most researchers. Several econometric techniques have been proposed to correct for selection effects, such as instrumental variables or fixed-effects models that hold constant time-invariant factors that presumably vary across households. Although these techniques can reduce selection bias, no perfect fix exists with which to completely rule out threats posed by endogeneity (Boschman 2015; Harding 2003; Vogel et al. 2017). Perhaps more importantly, controlling for neighborhood selection using such approaches is suboptimal because the processes that funnel certain households into particular neighborhoods are theoretically meaningful and should be modeled explicitly (Hedman and van Ham 2012). Instead of treating neighborhood selection as a nuisance that needs to be controlled away, we present an empirical framework that directly incorporates neighborhood selection in models of neighborhood effects (see also van Ham and Manley 2012).
Few studies have attempted to model neighborhood choice to correct for selection bias in models of neighborhood effects (see Hedman and Galster 2011; Ioannides and Zabel 2008; Sari 2012). Following Ioannides and Zabel (2008), we model neighborhood choice using a conditional logit model and subsequently incorporate correction components into a neighborhood-effects model of individual income from work. This approach allows us to adjust our neighborhood-effects model for selection processes driven by various household and neighborhood characteristics that are assessed simultaneously and in combination.
Our approach diverges in two crucial ways from prior work using a similar two-step framework (Ioannides and Zabel 2008). First, we estimate the conditional logit model on the full choice set of available neighborhoods in a regional housing market rather than on a smaller random choice set. As we argue later, only the full choice set allows us to properly correct for nonrandom selection in the neighborhood-effects model. From the conditional logit model, we derive a series of linear probabilities that reflect the likelihood that a household will choose to move to a specific neighborhood over all alternative neighborhoods in the region. These probabilities form the basis for the correction terms used in the subsequent neighborhood-effects model. Second, given the high degree of collinearity among these correction terms, we employ principal components analysis (PCA) to reduce their number. The specification of the conditional logit model from which these terms are derived allows the reduced components to be interpreted as the probability that a certain type of household will select into a certain type of neighborhood. We incorporate these components into the second-stage neighborhood-effects model to account for the differential sorting of households into particular types of neighborhoods, the characteristics of which are likely conflated with subsequent earnings. This approach is conceptually appealing given that most households select on neighborhood type rather than on a specific neighborhood, and preferences usually vary by households' sociodemographic characteristics. We estimate our models on longitudinal population data from the Netherlands Social Statistical Database (SSD), a population registry comprising geocoded individual-level data covering the entire population of the Netherlands from 1999 to 2013.
Background
The body of literature on so-called neighborhood effects, defined here as the independent influence of the residential environment on individual outcomes, has grown considerably over the last two decades (see Durlauf 2004; Ellen and Turner 1997; Galster 2002; Nieuwenhuis 2016; Nieuwenhuis and Hooimeijer 2016; Sampson et al. 2002; Sharkey and Faber 2014; van Ham and Manley 2012; van Ham et al. 2012a, b). Many studies have reported neighborhood effects on outcomes such as school dropout, childhood achievement, transition rates from welfare to work, deviant behavior, social exclusion, social mobility, and income.
Since the seminal work by Wilson (1987), theoretical explanations of neighborhood effects have been expanded to include role model effects and peer group influences, social and physical disconnection from job-finding networks, a culture of poverty leading to dysfunctional values, discrimination by employers and other gatekeepers, access to public services, and exposure to criminal behavior. For an excellent overview of potential causal mechanisms, see Galster (2012). The neighborhood-effects literature suggests that living in a low-income neighborhood, or a neighborhood with a high concentration of poverty, can have a negative effect on the incomes of individuals. Various causal mechanisms could lead to such negative contextual effects on individual incomes (Galster 2012). For example, those living in high-poverty neighborhoods could have difficulty accessing good employment opportunities due to the spatial distribution of jobs and a lack of transportation. Also, people living in high-poverty neighborhoods might lack job-finding networks that could help them to find (better-)paid positions. In addition, the lack of positive role models in the residential neighborhood might lead to negative attitudes toward paid employment. People living in high-poverty neighborhoods can also face discrimination from employers, reducing their probability of finding a job or increasing their earnings.
The concept of neighborhood effects is academically intriguing, and policymakers have embraced the concept to justify area-based policies (van Ham and Manley 2012). Despite the popularity of the concept and the ever-growing body of literature, however, considerable debate remains as to the importance of neighborhood effects above and beyond the shared characteristics of neighborhood residents. Further, although increasing evidence suggests that neighborhoods are relevant for the social and economic well-being of their residents, many studies have struggled with the identification of causal neighborhood effects because they have either ignored or failed to adequately address the forces that differentially funnel certain people into particular areas (Durlauf 2004; van Ham and Manley 2010). The main problem is that people do not choose where they live at random. The neighborhood sorting process is highly structured, and often the outcome of interest (e.g., income) may also be responsible for people selecting into deprived neighborhoods in the first place (van Ham and Manley 2012). In other words, impoverished neighborhoods may not make residents poor(er). Rather, low-income households tend to live in particular types of places, for instance, where rent is low, landlords are less discriminating, and (most importantly) housing is available (Desmond 2016). Disentangling the shared characteristics of neighborhood residents from true neighborhood effects is paramount for understanding whether and how characteristics of residential places influence individual outcomes, such as economic well-being.
A growing body of literature underscores the importance of neighborhood choice in determining the spatial distribution of households across metropolitan areas. Most households have preferences about the type of neighborhoods in which they want to live and thus concentrate their search efforts in these areas. The availability of a dwelling with the right characteristics in the housing vacancy chain then determines the exact neighborhood in which a household locates (see White 1971). Most studies have modeled the probability that a household moves to a certain type of neighborhood based on one or two neighborhood characteristics, typically the level of deprivation and/or the level of concentration of ethnic minorities (Bråmå 2006; Clark and Ledwith 2007; Logan and Alba 1993). Capturing neighborhood selection with only one or two characteristics does little justice to the variety of neighborhoods in the urban housing market. Hedman et al. (2011; see also Boschman and van Ham 2015; Sermons 2000) took a different approach. Following Ioannides and Zabel (2008) and Quillian and Bruch (2010), they applied a conditional logit model (McFadden 1974) that allowed for multiple characteristics of destination neighborhoods to be assessed simultaneously and in combination. The conditional logit model estimates the probability that a household chooses a certain neighborhood from a set of alternative neighborhoods, based on interaction effects between household characteristics and a range of neighborhood characteristics. Using administrative data from Sweden, Hedman et al. (2011) reported that neighborhood sorting is a highly structured process. Households were more likely to choose neighborhoods where the population composition matched their own social and demographic backgrounds. Income was the most important driver of the sorting process: higher-income households were most likely to sort into high-income neighborhoods, and low-income households were most likely to sort into low-income neighborhoods. However, other socioeconomic and demographic characteristics were also important. Ethnic minorities moved to neighborhoods with higher shares of ethnic minorities, and families with children to neighborhoods with many families with children. As a result of the neighborhood choices made by moving households, neighborhood characteristics were reproduced over time. Hedman et al. (2011:1395) were careful to note that
[T]he concept of choice needs to be used with caution. Households make choices within a restricted choice set. Choices are restricted by household preferences, resources, and restrictions, but also by constraints imposed by the structure of the housing market. It is very likely that poor households do not “choose” to move to poverty neighbourhoods, but move there because they cannot afford to live anywhere else.
Consistent with this observation, van Ham and Manley (2012) argued that one of the most pressing challenges for research on neighborhood effects is to explicitly incorporate neighborhood selection in models of neighborhood effects. Controlling for selection effects through econometric modeling alone may not be sufficient because selection is at the very heart of understanding neighborhood effects. They further advocated a theory of selection bias to help explicate the “unmeasured characteristics which cause people to move to certain neighborhoods, and also cause people to have a certain income, health or other outcome” (van Ham and Manley 2012:2791). Only a few studies have tried to explicitly model neighborhood choice itself and to use the outcomes to correct for bias in models of neighborhood effects (see, e.g., Hedman and Galster 2011; Ioannides and Zabel 2008; Sari 2012). We briefly discuss these three approaches to the sorting problem.
Hedman and Galster (2011) specified a structural equation model in which both neighborhood income mix (neighborhood sorting) and individual income (neighborhood effects) were modeled as mutually reinforcing. This approach was designed to avoid both selection on unobservable characteristics and endogeneity resulting from nonrandom neighborhood selection. Their results suggested that models failing to control for endogeneity underestimate the true neighborhood effect. In other words, the parameter estimates for neighborhood effects were smaller in the models that did not correct for selection bias. This seems somewhat counterintuitive: one would expect that controlling for selection would reduce the effect of neighborhood characteristics on individual outcomes.
Sari (2012) used a different approach to address the endogeneity problem that arises because residential location may be jointly determined with employment status as a result of nonrandom sorting. Two models were estimated. First, a bivariate probit model jointly estimated the probability of living in a deprived neighborhood and the probability of being employed. Second, a probit model was estimated on a subsample of households living in public housing, on the assumption that location choice was exogenous in this sample. The results showed that individual unemployment is related not only to experience and skills but also to residential location (Sari 2012).
Finally, Ioannides and Zabel (2008) developed a two-step model of housing structure demand that controlled for nonrandom sorting into neighborhoods. The first step used a conditional logit model of the choice of a specific neighborhood from a set of alternative neighborhoods. The choice set consisted of the neighborhood in which the household actually lived plus 10 alternative census tracts, randomly selected from all census tracts in the metropolitan area. This resulted in a choice set of 11 tracts, 10 of which were random. The conditional logit model included interactions between individual characteristics and tract-level characteristics and, similar to Hedman et al. (2011), confirmed that individuals select into tracts with neighbors like themselves. Ioannides and Zabel (2008) subsequently modeled housing structure demand and included 11 bias correction terms, one for the probability of choosing each neighborhood in the choice set. Like the findings of Hedman and Galster (2011), the results from this two-stage model demonstrated that neighborhood effects were strengthened when neighborhood choice was controlled for (Ioannides and Zabel 2008).
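To make the contrast between a sampled and a full choice set concrete, the sketch below builds an Ioannides and Zabel (2008)-style choice set (the chosen neighborhood plus 10 randomly drawn alternatives) next to a full choice set. It is a minimal illustration under stated assumptions: the neighborhood identifiers and the function names `random_choice_set` and `full_choice_set` are hypothetical, and drawing the alternatives without replacement from the non-chosen neighborhoods is an assumption rather than a documented detail of their procedure.

```python
import random


def random_choice_set(chosen, all_neighborhoods, n_alternatives=10, rng=None):
    """Hypothetical helper: chosen neighborhood plus n_alternatives distinct random others.

    Sampling without replacement from the non-chosen neighborhoods is an
    assumption made for this illustration.
    """
    rng = rng if rng is not None else random.Random()
    others = [n for n in all_neighborhoods if n != chosen]
    return [chosen] + rng.sample(others, n_alternatives)


def full_choice_set(all_neighborhoods):
    """Hypothetical helper: every neighborhood in the housing market is an alternative."""
    return list(all_neighborhoods)


# 203 hypothetical neighborhood identifiers
neighborhoods = [f"N{k}" for k in range(1, 204)]
sampled_set = random_choice_set("N17", neighborhoods, rng=random.Random(42))
print(len(sampled_set), sampled_set[0])       # 11 N17
print(len(full_choice_set(neighborhoods)))    # 203
```

With 203 neighborhoods, a sampled set of 11 leaves 192 neighborhoods out of each household's estimation problem.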
Adjusting for Selection by Neighborhood Type
The current study builds on and moves beyond prior research incorporating neighborhood selection in models of neighborhood effects. Following Ioannides and Zabel (2008), we employ a two-step approach: we first model neighborhood selection and then model neighborhood effects on individual income. The linear probabilities from the first-stage model are used to adjust for the nonrandom selection of households into neighborhoods in the neighborhood-effects model.
We depart from the approach presented by Ioannides and Zabel (2008) in two important ways. First, whereas Ioannides and Zabel (2008) constructed correction terms based on the probability that a household selects a certain neighborhood, we construct correction components based on the probability that a household selects a certain type of neighborhood. There are both conceptual and methodological reasons to use neighborhood types to construct correction components. Conceptually, most households are likely to search for a dwelling in a particular type of neighborhood instead of a dwelling in a specific neighborhood. This is especially the case when a regional housing market is divided into a large number of smaller neighborhoods. Many of these neighborhoods will share similar characteristics, yielding only a limited number of neighborhood types. This leads to a methodological reason to use neighborhood types to construct correction components for nonrandom selection: correction terms based on individual neighborhoods are likely to be highly intercorrelated because many neighborhoods have very similar characteristics and hence a very similar probability that a given household will move there. In the Data and Methods section, we explain in greater detail how we employ a PCA to construct correction components based on neighborhood types to overcome this problem.
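The methodological argument can be illustrated with simulated data. The sketch below assumes a single hypothetical household characteristic, a single hypothetical neighborhood characteristic, and an arbitrary interaction coefficient, and computes conditional logit choice probabilities for 1,000 households and 50 neighborhoods. Two neighborhoods with nearly identical characteristics end up with nearly perfectly correlated choice probabilities across households, which is why per-neighborhood correction terms tend to be close to collinear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_households, n_neighborhoods = 1000, 50

x = rng.normal(size=n_households)      # hypothetical household characteristic (standardized)
z = rng.normal(size=n_neighborhoods)   # hypothetical neighborhood characteristic (standardized)
z[0], z[1], z[2] = 1.0, 1.01, -1.0     # neighborhoods 0 and 1 are near twins; 2 is very different
beta = 1.5                             # hypothetical household-by-neighborhood interaction

# Utility of neighborhood j for household i is beta * x_i * z_j (a single interaction term).
utility = beta * np.outer(x, z)
utility -= utility.max(axis=1, keepdims=True)   # stabilizes exp(); probabilities are unchanged
probs = np.exp(utility)
probs /= probs.sum(axis=1, keepdims=True)       # each row sums to 1 over the 50 neighborhoods

r_similar = np.corrcoef(probs[:, 0], probs[:, 1])[0, 1]
r_different = np.corrcoef(probs[:, 0], probs[:, 2])[0, 1]
print(f"similar pair: r = {r_similar:.3f}; dissimilar pair: r = {r_different:.3f}")
```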
Second, our approach departs from Ioannides and Zabel (2008) in that our neighborhood-choice model (the first step) uses the full closed choice set of all alternative neighborhoods within a regional housing market rather than a random subset of alternative neighborhoods. Of course, households do not actively consider all possible neighborhoods when choosing where to live: most households focus on a limited number of criteria (e.g., proximity to schools, building age, lot size) as they begin their housing search, depending on their (lack of) knowledge of the regional housing market. But because we do not know the types of neighborhoods in which households search, we cannot justify assumptions about a more limited choice set. Moreover, the Dutch housing market is extremely transparent because households have online access to information on almost all dwellings that are for sale or for rent. Most real estate agents advertise dwellings for sale on the website www.funda.nl, which allows households to search for both rented and owner-occupied dwellings on a map and see detailed neighborhood characteristics. Most socially rented dwellings are offered on the woningnet.nl website, which operates at the regional housing market level.
Data and Methods
Data and Research Population
Our empirical analyses draw on longitudinal population data from the Netherlands SSD, a population registry comprising geocoded individual-level data covering the entire population of the Netherlands from 1999 to 2013. We link these data to neighborhood-level information, including ethnic, household, dwelling, and income composition, compiled by Statistics Netherlands (Kerncijfers Wijken en Buurten). We focus on heads of household who moved within the Utrecht urban region during 2009. We first estimate a selection model in which household heads select their destination neighborhood based on neighborhood characteristics prior to the move (measured in 2008). We then model the effects of neighborhood characteristics after the move (measured on January 1, 2010) on subsequent income from work in 2013.
Our focus on the Utrecht urban region has two motivations. First, the neighborhood selection model necessitates a study area that functions as a single regional housing market to ensure that (at least in theory) all neighborhoods within this area are part of the choice set of moving households. Second, we want an area with a large variation in neighborhood types. The Utrecht urban region, which consists of the city of Utrecht and the surrounding suburban municipalities, meets these criteria. In the Netherlands, more than 70 % of moves are within urban regions (Vliegen 2005). Within the Utrecht urban region, the social housing sector employs a choice-based letting system that allows applicants to bid on dwellings throughout the region (via the website www.woningnet.nl). The region is characterized by large variation between neighborhoods in terms of ethnic composition, dwelling prices, housing tenure, and accessibility of facilities. Consistent with prior research in the Netherlands, we use administrative neighborhoods (buurten) to represent residential neighborhood boundaries. These neighborhoods are relatively small-scale, administratively determined geographic areas. In urban areas, they are analogous to the more familiar census tracts in U.S.-based research, often consisting of relatively homogeneous populations and covering, on average, one-half square kilometer in land area. Our initial sample contains 256 neighborhoods in the Utrecht urban region.
Based on the administrative data, we identify 25,643 household heads who lived in the Utrecht urban region on January 1, 2010, and who had moved there from within the urban region after January 1, 2009, thus meeting our selection criteria. Households who moved to the Utrecht urban region from elsewhere are excluded from the analytic sample because we cannot assume that their choice set included only neighborhoods within the Utrecht urban region. Of the 256 neighborhoods in the Utrecht urban region, we exclude 53 because of missing data on neighborhood average income and average dwelling values. Income data are provided only for neighborhoods with at least 200 inhabitants, and average dwelling values only for neighborhoods with at least five dwellings. Excluding these 53 neighborhoods results in the exclusion of 848 heads of household who moved to these neighborhoods. Because our modeling strategy requires information on household income, we exclude another 601 household heads for whom no income data are available. We thus have an analytic sample of 24,014 individuals who lived in 203 neighborhoods.
Modeling Strategy
Our modeling strategy unfolds in two steps. We first estimate a conditional logit model in which each of the 24,014 household heads selects one neighborhood from a choice set of 203 neighborhoods within the Utrecht urban region (the selection model). The model is based on interactions between personal characteristics and the characteristics of the neighborhoods in the choice set. The conditional logit model has clear advantages over alternative strategies because it allows us to address selection effects associated with multiple individual and neighborhood characteristics simultaneously. From this model, we derive the linear probabilities reflecting the likelihood that a household moves to Neighborhood 1, Neighborhood 2, . . . , Neighborhood 203. Estimating the conditional logit model on all potential neighborhood options allows us to retain a high degree of precision in the estimation of the conditional probabilities. Following Ioannides and Zabel (2008), we transform these linear probabilities to generate correction terms akin to the inverse Mills ratios popularized by Heckman's (1979) two-stage regression framework. Because the conditional logit model incorporates interactions between household and neighborhood characteristics, these terms can be loosely interpreted as the probability that a certain type of household selects a particular neighborhood. Note that the neighborhood selection process is highly structured such that, for instance, young families demonstrate strong preferences for living with other young families, and ethnic minorities prefer neighborhoods with a large concentration of families with similar backgrounds. As a result, these terms tend to “hang together” for certain types of households, displaying high levels of collinearity. We therefore employ PCA to reduce these 203 terms to a narrower set of correction components. Given the specification of the conditional logit model from which these terms are derived, we can interpret the reduced components as reflecting the probability that a certain type of household will select into a certain type of neighborhood: for instance, minority heads of household will demonstrate a preference for living in neighborhoods with those from a similar ethnic background.
In the second step, we estimate a neighborhood-effects model in which we predict individual income from work in 2013 as a function of the characteristics of the residential neighborhood on January 1, 2010. In other words, we examine the effect of neighborhood characteristics on subsequent earnings among heads of household who moved within the Utrecht region in 2009. Our neighborhood-effects model includes the neighborhood type correction components derived from the neighborhood-selection model and the PCA. We restrict this model to heads of household who were employed in 2013 (thus excluding students, entrepreneurs, or people on welfare benefits) because the causal mechanisms that produce neighborhood effects on income will be different for employees than for other groups. Of the 24,014 household heads in the selection model, 13,430 were employed in 2013 and are therefore included in the neighborhood-effects model.
The Selection Model
We use a conditional logit model to model neighborhood selection. In this model, household $i$ selects the neighborhood $j$ with the highest utility from a choice set of $J$ neighborhoods. The utility of a neighborhood depends on the neighborhood's characteristics and the value of these characteristics to households, and it is therefore specified as the neighborhood characteristics weighted by parameters plus an error term (Hoffman and Duncan 1988; McFadden 1974). If we assume that the error terms are independently and identically extreme value distributed across neighborhoods, the probability that household $i$ chooses neighborhood $j$ (that is, that the utility of neighborhood $j$ to household $i$ is higher than the utility of all other neighborhoods) can be estimated. Thus, let $P_{ij}$ denote the probability that household $i$ will choose neighborhood $j$, based on the characteristics of the $j$th neighborhood ($N_j$) and the characteristics of the other neighborhoods in the choice set ($N_k$). Following Hoffman and Duncan (1988), the conditional logit model is written as follows:
$$ P_{ij}=\frac{\exp\left(\beta N_j\right)}{\sum_{k=1}^{J}\exp\left(\beta N_k\right)}. \tag{1} $$
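Eq. (1) is a softmax over the neighborhoods in the choice set. A minimal NumPy sketch, assuming a matrix of neighborhood characteristics and a coefficient vector (both hypothetical here, not estimates from this study), is:

```python
import numpy as np


def conditional_logit_probs(N, beta):
    """Eq. (1): P_j = exp(beta . N_j) / sum_k exp(beta . N_k).

    N    : (J, m) array, one row of characteristics per neighborhood in the choice set
    beta : (m,) array of taste parameters
    """
    v = N @ beta          # systematic utility beta . N_j for every neighborhood
    v = v - v.max()       # subtracting a constant leaves Eq. (1) unchanged but avoids overflow
    expv = np.exp(v)
    return expv / expv.sum()


# Three hypothetical neighborhoods described by two standardized characteristics
N = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
beta = np.array([np.log(2.0), 0.0])   # hypothetical: one unit of the first characteristic doubles the odds
p = conditional_logit_probs(N, beta)
print(p)   # approximately [0.25, 0.5, 0.25]
```

Because only utility differences matter, adding the same constant to every neighborhood's utility leaves the probabilities unchanged; this is also why household characteristics can enter the model only through interactions with neighborhood characteristics.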
The utility of a neighborhood to a specific household depends on the match between individual and neighborhood characteristics and, thus, on the value of the neighborhood's characteristics to that household. Because the selection of a neighborhood is modeled within a household, household characteristics do not vary between neighborhood options. To include household characteristics in the model, they must therefore be interacted with neighborhood characteristics. This can be incorporated into Eq. (1) by letting $\mathbf{X}_i$ denote the characteristics of the $i$th household:
$$ P_{ij}=\frac{\exp\left(\beta N_j\mathbf{X}_i\right)}{\sum_{k=1}^{J}\exp\left(\beta N_k\mathbf{X}_i\right)}. \tag{2} $$
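The sketch below evaluates Eq. (2) for many households at once, writing the interaction between household and neighborhood characteristics as X_i' B N_j, with one coefficient in the hypothetical matrix B per household-by-neighborhood characteristic pair. The dimensions and random inputs are illustrative assumptions; in practice, the coefficients would come from fitting the conditional logit model. The final check shows why household characteristics cannot enter on their own: a term that is identical for every neighborhood cancels from the numerator and denominator.

```python
import numpy as np


def softmax_rows(V):
    """Row-wise softmax: P[i, j] = exp(V[i, j]) / sum_k exp(V[i, k])."""
    V = V - V.max(axis=1, keepdims=True)   # per-household stabilization; P is unchanged
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)


def interacted_logit_probs(X, N, B):
    """Eq. (2) for all households (hypothetical helper).

    X : (n, p) household characteristics
    N : (J, m) neighborhood characteristics
    B : (p, m) coefficients, one per household-by-neighborhood interaction
    Returns an (n, J) array with P[i, j] = Pr(household i chooses neighborhood j).
    """
    V = X @ B @ N.T    # V[i, j] = sum_a sum_b X[i, a] * B[a, b] * N[j, b]
    return softmax_rows(V)


rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))      # 5 hypothetical households, 2 characteristics
N = rng.normal(size=(203, 3))    # 203 neighborhoods (full choice set), 3 characteristics
B = rng.normal(size=(2, 3))      # hypothetical interaction coefficients

P = interacted_logit_probs(X, N, B)
print(P.shape)                   # (5, 203)

# A household-only term (identical across neighborhoods) has no effect on Eq. (2).
household_term = (X @ np.array([3.0, -2.0]))[:, None]
P_shifted = softmax_rows(X @ B @ N.T + household_term)
print(np.allclose(P, P_shifted))  # True
```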
All households in our model moved during the 2009 calendar year and thus selected a new neighborhood; the selected neighborhood is the neighborhood where the household lived on January 1, 2010. When possible, we use neighborhood characteristics from 2008 in the selection model because households presumably select their neighborhood based on its characteristics before they move. We model neighborhood selection based on the following neighborhood characteristics: household composition, housing characteristics (tenure composition, share of dwellings built after 2000), accessibility, dwelling values, and the share of non-Western minorities (see Table 1). Measuring neighborhood characteristics before the move is important to avoid endogeneity problems (Manski 1993), that is, conflating the characteristics of in-migrants with the later composition of the neighborhood. The data on neighborhood housing characteristics are, however, available only for 2009. We therefore use this information as a proxy for the housing characteristics in 2008, before the move. This is defensible because, although the characteristics of moving households might affect the neighborhood's ethnic and household composition, they cannot affect its building period or tenure composition.
Table 1
Descriptive statistics: Neighborhood characteristics (selection model), N = 203
Characteristic | Mean | SD | Min. | Max. |
Average Dwelling Values (× 1,000) (2008) | 291.3 | 138.5 | 138.0 | 1,098.0 |
Restaurants Within 3 km (2008) | 76.2 | 92.3 | 0 | 268.3 |
Distance to Train Station (2008) | 3.9 | 3.4 | 0.3 | 12.2 |
Distance to Highway Access Lane (2008) | 1.9 | 0.9 | 0.1 | 6.4 |
Share of Dwellings Built After 2000 (2009) | 14.0 | 26.2 | 0 | 100 |
Share of Social Housing (2009) | 30.5 | 24.2 | 0 | 100 |
Share of Private Rental (2009) | 14.0 | 11.8 | 0 | 92 |
Share of Singles (2008) | 41.3 | 18.5 | 10.0 | 97 |
Share of Couples (2008) | 26.8 | 6.7 | 3.0 | 46 |
Share of Non-Western Minorities (2008) | 12.5 | 12.2 | 0 | 79 |
We interact these neighborhood characteristics with personal characteristics to estimate differences between households in neighborhood selection. We use the characteristics of the new household after the move (measured on January 1, 2010). If households change during a move (for instance, when two people start living together, or when an individual leaves the parental home), the characteristics of the new household, rather than those of the old household, determine residential preferences and therefore neighborhood selection. We assume that households do not experience any unexpected changes between the move (at some point in 2009) and January 1, 2010. This assumption will not always hold: a couple that selected a new neighborhood based on their shared residential preferences and opportunities may, for example, have separated by January 1, 2010.
Constructing Correction Components for Neighborhood Types
We use the conditional logit model to estimate the conditional probability that a household selects a specific neighborhood over all other alternative neighborhoods. Departing from prior research (e.g., Ioannides and Zabel 2008), we use the full choice set of available neighborhoods instead of a random sample of neighborhoods. In the appendix, we detail our method and justify the necessity of the full choice set to construct meaningful correction terms. As a robustness check, we also compare the results of the neighborhood-effects model using correction terms based on the full choice set with those of a model including correction terms based on a random choice set. Because households can select a neighborhood from a choice set of 203 neighborhoods, the selection model yields 203 linear probabilities per individual. These probabilities reflect the likelihood that a household head will decide to live in a given neighborhood based on his or her own sociodemographic background and the characteristics of the neighborhood in question. Similar to Ioannides and Zabel (2008), we use these predicted probabilities to generate correction terms analogous to the more familiar inverse Mills ratios popularized by Heckman's (1979) two-stage regression framework.
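The transformation itself is documented in the appendix and is not reproduced in this section. Purely to illustrate what correction terms akin to inverse Mills ratios can look like after a multinomial or conditional logit, the sketch below computes Dubin and McFadden (1984)-style terms in the form commonly cited for selection correction with multinomial logit: for a household that chose alternative s, each other alternative k contributes P_k ln P_k / (1 − P_k) + ln P_s. This formula is an assumption for illustration and is not necessarily the transformation used by Ioannides and Zabel (2008) or in this study; the function name is hypothetical.

```python
import numpy as np


def dubin_mcfadden_terms(P, chosen):
    """Hypothetical helper: Dubin-McFadden-style selection-correction terms.

    P      : (J,) predicted choice probabilities for one household (e.g., from Eq. 2)
    chosen : index s of the neighborhood the household actually chose
    Returns a (J,) array with m_k = P_k * ln(P_k) / (1 - P_k) + ln(P_s) for k != s.
    The entry for the chosen neighborhood is set to 0.0 as a placeholder, because
    the chosen alternative enters every other term through ln(P_s).
    """
    P = np.asarray(P, dtype=float)
    if np.any(P <= 0) or np.any(P >= 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    terms = P * np.log(P) / (1.0 - P) + np.log(P[chosen])
    terms[chosen] = 0.0
    return terms


m = dubin_mcfadden_terms([0.5, 0.25, 0.25], chosen=0)
print(np.round(m, 4))
```

Stacking such vectors for all households gives an (n × J) matrix with one column per neighborhood; the selection-model scale factor is absorbed into the second-stage coefficients on these terms.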
The 203 correction terms gleaned from the selection model are highly intercorrelated, which makes sense because the correction terms reflect the probability that certain types of people will select certain types of neighborhoods. For instance, ethnic minorities demonstrate a preference to live with other ethnic minorities, and young families prefer to live among other young families. Some households may strongly prefer a handful of neighborhoods and demonstrate an aversion to living in other types of areas. These preferences are allocated along sociodemographic lines. Thus, when all 203 correction terms are included in the second-stage model, they display high levels of collinearity because many neighborhoods share similar characteristics. This precludes estimating the neighborhood-effects regression models with all correction terms entered simultaneously.
To overcome these collinearity issues, we perform a principal component analysis (PCA) to reduce the number of variables needed to capture the variance in the correction terms (and to remedy their high intercorrelation). The PCA yields eight principal components with eigenvalues greater than 1.0, which collectively capture 98.7% of the total variance. These components are then orthogonally rotated to generate eight correction terms, which are included in the second-stage neighborhood-effects model. As noted, the specification of the conditional logit model allows these correction components to be interpreted as the likelihood that certain types of households select a certain type of neighborhood rather than the likelihood of selecting a specific neighborhood.
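The dimension-reduction step can be sketched with NumPy as follows. We assume that the PCA is run on the correlation matrix of the correction terms (the usual setting for the eigenvalue-greater-than-1.0 rule) and that the orthogonal rotation is varimax; neither choice is specified in the text. The simulated matrix `C` and the function names are hypothetical stand-ins for the 203 correction terms, and the toy data will not reproduce the eight components or the 98.7% reported above.

```python
import numpy as np

def kaiser_pca(C):
    """PCA on the correlation matrix of correction terms C (n_obs x n_terms).

    Keeps components with eigenvalue > 1 and returns unit-variance component
    scores, the loadings, and the share of total variance retained.
    """
    Z = (C - C.mean(axis=0)) / C.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))  # ascending
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0
    V, lam = eigvec[:, keep], eigval[keep]
    scores = Z @ V / np.sqrt(lam)      # mutually uncorrelated, unit variance
    share = lam.sum() / eigval.sum()   # eigenvalues of a correlation matrix sum to p
    return scores, V * np.sqrt(lam), share

def varimax(L, max_iter=100, tol=1e-8):
    """Varimax rotation of loadings L (p x k); returns rotated loadings and R."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p))
        R = u @ vt                     # orthogonal k x k rotation
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R, R

# Hypothetical correction terms driven by three latent neighborhood "types".
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
C = latent @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(500, 30))
scores, L, share = kaiser_pca(C)
L_rot, R = varimax(L)
rotated_scores = scores @ R            # candidate correction components for step 2
print(scores.shape[1], "components retained;", round(100 * share, 1), "% of variance")
```

Because the rotation is orthogonal, the rotated component scores remain uncorrelated, so they can enter the second-stage model together without reintroducing the collinearity problem.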
Neighborhood Effects Models Incorporating Neighborhood Selection
The neighborhood-effects models estimate the effect of neighborhood income, the share of non-Western minorities, and the share of social housing on individual income from work in 2013. We model the 2013 income of all employed persons as a function of neighborhood characteristics in 2010. We compare three neighborhood-effects models: (1) a model without controls, (2) a model controlling for personal characteristics, and (3) a model with correction components for neighborhood types derived from the selection model in Step 1. Both the personal characteristics and the correction components are measured at the individual level, whereas the neighborhood characteristics are shared by all residents of a neighborhood; we therefore use clustered standard errors to account for the nonrandom distribution of individuals across neighborhoods.
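The second step amounts to three OLS regressions that differ only in their covariates, each with standard errors clustered by neighborhood. The NumPy sketch below uses the common CR1 cluster-robust sandwich estimator. The simulated data, the variable names, the small-sample factor, and the assumption that model (3) adds the correction components to the personal controls of model (2) are illustrative choices, not the authors' specification.

```python
import numpy as np

def ols_cluster(y, X, groups):
    """OLS with cluster-robust (CR1) standard errors.

    X must already include a column of ones. The factor G/(G-1) * (N-1)/(N-K)
    is a common small-sample adjustment (assumed here).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((k, k))
    labels = np.unique(groups)
    for g in labels:
        idx = groups == g
        s = X[idx].T @ resid[idx]      # score summed within one neighborhood
        meat += np.outer(s, s)
    G = len(labels)
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Hypothetical data: 2,000 workers in 50 neighborhoods.
rng = np.random.default_rng(1)
n, J = 2000, 50
nbhd = rng.integers(0, J, size=n)      # neighborhood of each person
nb_chars = rng.normal(size=(J, 3))     # income, % non-Western, % social housing
personal = rng.normal(size=(n, 2))     # stand-in personal controls
corr_terms = rng.normal(size=(n, 8))   # stand-in for 8 rotated components
y = 0.3 * nb_chars[nbhd, 0] + personal @ np.array([0.5, -0.2]) + rng.normal(size=n)

ones = np.ones((n, 1))
designs = {
    "(1) no controls": np.hstack([ones, nb_chars[nbhd]]),
    "(2) + personal": np.hstack([ones, nb_chars[nbhd], personal]),
    "(3) + correction components": np.hstack([ones, nb_chars[nbhd], personal, corr_terms]),
}
for name, X in designs.items():
    b, se = ols_cluster(y, X, nbhd)
    print(name, np.round(b[1:4], 3), np.round(se[1:4], 3))
```

Comparing the neighborhood-income coefficient across the three rows mirrors the comparison described above. Because the correction components in this toy example are pure noise, no attenuation of that coefficient should be expected here.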
Conclusions
One of the most significant challenges confronting neighborhood effects scholars concerns neighborhood selection. Households are not randomly distributed across urban areas; rather, they choose a type of neighborhood based on their preferences and their income. Such nonrandom allocation of households across neighborhoods makes it difficult to establish causal relationships between neighborhood characteristics and individual outcomes. Whereas most of the literature sets out to control for selection effects, either through covariate controls or counterfactual models, we argue that the processes through which certain households decide to move to certain types of neighborhoods should be examined and explicitly incorporated in models of neighborhood effects (see also Hedman and Galster 2011; Hedman and van Ham 2012; Ioannides and Zabel 2008; Sari 2012).
In this article, we present an empirical framework that helps disentangle selection processes in models of neighborhood effects. We build on prior research by modeling neighborhood choice using a conditional logit model and subsequently incorporating correction components into a neighborhood-effects model of individual income from work. In the first step, we model neighborhood selection for all movers and generate the conditional probability that each head of household would select a given neighborhood from a choice set of 203 neighborhoods in the Utrecht urban region. In line with previous research, we find that the neighborhood selection process is highly structured and that households tend to prefer neighborhoods where the population composition matches their own social and demographic background.
In the second step, we model the effect of three neighborhood characteristics on individual income from work, including correction components for selection into types of neighborhoods. This approach diverges from prior research in a crucial respect: we construct correction components based on neighborhood types instead of individual neighborhoods, which we argue is both conceptually and methodologically more appealing. To construct correction components based on neighborhood types, we use the full choice set of available neighborhoods in the regional housing market instead of a random choice set. We find that the effect of average neighborhood income on individual income is reduced when the neighborhood selection mechanism is controlled for. In addition, the model with correction terms explains considerably more of the variation in the data than the standard models.
Our models indicate that controlling for neighborhood selection leads to less biased estimates of neighborhood effects. Most importantly, even after neighborhood selection is controlled for, we still find a small but statistically significant negative relationship between living in a deprived neighborhood and individual income. This finding is important because it suggests that neighborhood effects reflect more than the shared characteristics of neighborhood residents: place of residence partially determines economic well-being.