Nonparametric identification of daily activity durations using kernel density estimators
Introduction
Activity-based approaches to travel demand forecasting have been proposed as a new paradigm in travel behavior analysis because of their ability to model the derived nature of travel demand (i.e., people travel in order to participate in various activities). The majority of these new frameworks advocate the use of behavioral simulation to create virtual, daily individual schedules (Barret et al., 1995; Ettema et al., 1995a, 1995b; Kwan, 1995; Bowman and Ben-Akiva, 1996; Kitamura et al., 1996; Vaughn et al., 1997). To this end, significant advances have also been made in data collection specifically designed for these frameworks (Arentze et al., 1997). Methodological frameworks and models following this new paradigm in travel demand analysis and forecasting can be found in the literature (Jones, 1990; Hamed and Mannering, 1993; Ben-Akiva and Bowman, 1995; Hofman et al., 1995; Morrison and Loose, 1995; Pendyala et al., 1995; Bhat, 1996a; Recker, 1995; Golob, 1996; Golob et al., 1996; Golob and McNally, 1996; Miller, 1996; Stopher and Lee-Gosselin, 1996; Ettema and Timmermans, 1997; Ma, 1997).
The success of an activity-based forecasting system depends heavily on the behavioral models that depict and attempt to predict a person’s daily schedule. Such a system requires a variety of models (estimated from observed data and/or coded into a microsimulation model) that allocate time to activities and travel. Examples of variables of interest in such models are the time expenditure for each activity type, the frequency of activity types, and the amount of time a person spends outside the home. If models that explain each of these quantities are available and can be embedded into a “simulator”, then synthetic (or simulated-virtual) schedules can be produced. Hence, analyzing the activity participation behavior of individuals or households can provide meaningful insight into the prediction of travel demand.
Fundamental to activity-based forecasting systems are parametric or nonparametric regression models that reflect the relationships between covariates and the aforementioned dependent variables. Depicting these relationships correctly depends on the mathematical form that links the covariates to the probability distribution of the dependent variable. Typically, the relationship between the dependent variable and its covariates has been assumed to be either linear or of a form that may be linearized. Moreover, assumptions regarding the distribution of the dependent variable have spanned a wide spectrum of parametric distributions and a few ad hoc nonparametric approaches. However, these approaches may have difficulty accounting for special features, such as bimodality, in the distributions. This paper illustrates a cross-classification alternative to regression models that can accommodate anomalies in the distribution of the dependent variable while making no assumptions about the functional relationship between the dependent variable and its explanatory variables (or covariates). The dependent variables in this study are the amounts of time allocated to six activity types in a day by each segment of the population. The segments are defined by a person’s lifecycle stage using a few explanatory variables such as age, presence of children in the household, and gender. It should be noted that the method can be extended to other dependent and explanatory variables used in activity-based approaches, including social and economic circumstances and the level of service offered by the urban setting and transportation system.
The next section reviews proportional hazard-based approaches and formally defines the problem. Section 3 gives a review of kernel density estimation, and in Section 4, we apply this nonparametric technique to duration modeling to investigate a few covariates and visually inspect heterogeneity. Section 5 presents the data used for evaluation of the technique and our empirical results. Section 6 summarizes our conclusions and future research in this area.
The problem
The standard approach for analyzing activity duration is to use parametric hazard-based models. This approach accounts for the fact that, in many situations, the time at which an activity will end is conditioned upon the amount of time already spent participating in the activity (i.e., duration dependence). The hazard function describes the rate at which we expect an activity to end given that an individual has been participating in the activity up to a certain time, t. Suppose a continuous
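For reference, the hazard function described in words above has a standard textbook form, sketched here. The notation is an assumption of this sketch and may differ from the paper's own symbols: T is the random activity duration, f and F are its density and distribution functions, and S = 1 - F is the survivor function.

```latex
h(t) \;=\; \lim_{\Delta t \to 0^{+}}
      \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
     \;=\; \frac{f(t)}{1 - F(t)}
     \;=\; \frac{f(t)}{S(t)}
```

Under this definition, a hazard that increases with t means an activity becomes more likely to end the longer it has already lasted (positive duration dependence), while a constant hazard corresponds to no duration dependence.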
Overview of density estimation
Let X denote a continuous random variable and let X_1, ..., X_n denote n independent, identically distributed observations of X. Denote by f the pdf of X. Density estimation is the attempt to approximate the pdf of X, either parametrically or nonparametrically. In parametric density estimation, the focus is on fitting a theoretical probability distribution to the collected data. This approach requires the analyst to determine the appropriate theoretical distribution and its corresponding
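As a rough illustration of the nonparametric alternative, the sketch below evaluates the usual kernel estimate, the average over observations X_1, ..., X_n of K((x - X_i)/h) divided by h, with a Gaussian kernel K. Several things here are assumptions made for illustration and are not taken from the paper: the function name `gaussian_kde_estimate`, the Gaussian kernel, the rule-of-thumb bandwidth h, and the synthetic bimodal durations.

```python
import numpy as np


def gaussian_kde_estimate(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption of this sketch, not the
        # paper's choice): 0.9 * min(std, IQR / 1.34) * n^(-1/5).
        q75, q25 = np.percentile(samples, [75, 25])
        spread = min(samples.std(ddof=1), (q75 - q25) / 1.34)
        bandwidth = 0.9 * spread * n ** (-0.2)
    # Standardized distance between every grid point and every observation.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth.
    return kernel.sum(axis=1) / (n * bandwidth)


# Hypothetical bimodal durations in minutes (synthetic, not survey data).
rng = np.random.default_rng(0)
durations = np.concatenate(
    [rng.normal(30.0, 8.0, 300), rng.normal(120.0, 15.0, 200)]
)
grid = np.linspace(-50.0, 250.0, 1201)
density = gaussian_kde_estimate(durations, grid)

# A Riemann sum over the grid should be close to one for a valid density.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"approximate area under the estimate: {area:.4f}")
```

A single parametric family such as the exponential or Weibull cannot reproduce two separated modes, whereas the kernel estimate follows the shape of the data; the price is that the result depends on the bandwidth, which controls how much the estimate is smoothed.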
Model of activity duration
Consider a general setting in which we assume that duration data (i.e., time expenditure in a particular activity) are available for various daily activities of individuals. Furthermore, we assume that the data are segmented into a finite number of disjoint sets representing various lifecycle stages. It is well known that lifecycle stages influence travel behavior. In addition, when studying the dynamics of travel behavior, lifecycle stage and transitions from one stage to another may capture a
Data and empirical results
The data used in this study to evaluate the nonparametric approach originated from the Puget Sound Transportation Panel (Goulias and Ma, 1996). This database comprises the results of a survey conducted in the Seattle metropolitan area in the fall of 1989. The data consist of five waves with each wave containing a two-day travel survey. Each survey contains information regarding household and person demographics and socio-economic characteristics, reported travel behavior, and
Conclusions and future research
This paper presents experiments using a nonparametric pattern recognition tool for the purpose of investigating covariate effects and heterogeneity. The technique utilizes a kernel estimate of the pdf of various activity durations and allows for statistical comparison of distribution functions to evaluate differences between groups of individuals. Within this framework, it is not necessary to make restrictive assumptions about the distribution of activity duration; nor is it necessary to assume
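One common way to compare the duration distributions of two groups is a two-sample Kolmogorov-Smirnov test, sketched below. This excerpt does not say which test the paper applies, so the choice of test, the function names, the large-sample critical value, and the synthetic gamma-distributed durations are all assumptions made for illustration.

```python
import numpy as np


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical distribution functions."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDF of each sample evaluated at every pooled observation.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_critical_value(n_a, n_b, alpha=0.05):
    """Large-sample approximation to the two-sided critical value."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    return float(c_alpha * np.sqrt((n_a + n_b) / (n_a * n_b)))


# Hypothetical activity durations (minutes) for two lifecycle segments.
rng = np.random.default_rng(1)
group_a = rng.gamma(shape=2.0, scale=30.0, size=400)
group_b = rng.gamma(shape=2.0, scale=45.0, size=350)

statistic = ks_two_sample_statistic(group_a, group_b)
threshold = ks_critical_value(group_a.size, group_b.size)
print(f"D = {statistic:.3f}, 5% critical value = {threshold:.3f}")
```

If the statistic exceeds the critical value, the hypothesis that both groups share one duration distribution is rejected at the chosen level. Because the test works on empirical distribution functions, it requires no assumption about the parametric form of either distribution.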
Acknowledgements
The authors wish to acknowledge the suggestions made by anonymous referees that substantially improved the paper. This research was sponsored by the Mid-Atlantic Universities Transportation Center (MAUTC), Region III, US Department of Transportation and the Center for Intelligent Transportation Systems (CITranS) at the College of Engineering, The Pennsylvania State University. This research was also made possible by funds from the Weiss Graduate Fellowship for the first author and the
References (55)
A hazard-based duration model of shopping activity with nonparametric baseline specification and nonparametric control for unobserved heterogeneity. Transport. Res. B (1996)
A generalized multiple durations proportional hazard model with an application to activity behavior during the evening work-to-home commute. Transport. Res. B (1996)
et al. Maturing motorization and household travel: the case of nuclear-family households. Transport. Res. (1986)
The household activity pattern problem: general formulation and solution. Transport. Res. (1995)
Arentze, T., Hofman, F., Kalfs, N., Timmermans, H., 1997. Data needs, data collection and data quality requirements of...
Barret, C., Berkbigler, K., Smith, L., Loose, V., Beckman, B., Davis, J., Roberts, D., Williams, M., 1995. An...
et al. Trade union decline and the distribution of wages in the UK: evidence from kernel density estimation. Oxford Bull. Econom. Stat. (1998)
Ben-Akiva, M.E., Bowman, J.L., 1995. Activity-based disaggregate travel demand model system with daily activity...
Bowman, J.L., Ben-Akiva, M., 1996. Activity based travel forecasting. Tutorial transcript from the TMIP Conference on...
et al. Lifecycle concept: a practical application to transportation planning. Transport. Res. Rec. (1984)
Practical Nonparametric Statistics
Regression models and life tables. J. R. Statist. Soc. B
Nonparametric estimation of a multidimensional probability density. Theor. Probab. Appl.
A competing risk-hazard model of activity choice, timing, sequencing, and duration. Transport. Res. Rec.
Bootstrap selection of the smoothing parameter in nonparametric hazard rate estimation. J. Am. Stat. Assoc.
Have panel surveys told us anything new?
Modelling travelers’ postwork activity involvement: toward a new methodology. Transport. Sci.
Flexible parametric estimation of duration and competing risk models. J. Appl. Econometrics
1 Tel.: +1-814-865-9860; fax: +1-814-863-4745.