Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: Automatic trip end identification issues

https://doi.org/10.1016/j.tra.2006.05.001

Abstract

With the availability of Global Positioning System (GPS) receivers to capture vehicle location, it is now feasible to collect multiple days of travel data automatically. However, GPS-collected data are not ready for direct use in trip rate or route choice research until trip ends are identified within large GPS data streams. One common parameter used to divide trips is dwell time, the time a vehicle is stationary. Identifying trips is particularly challenging when there is trip chaining with brief stops, such as picking up and dropping off passengers, because it is hard to distinguish these stops from those caused by traffic controls or congestion. Although the dwell time method is effective in many cases, it is not foolproof, and recent research indicates that additional logic improves trip dividing. While some studies incorporating more than dwell time to identify trip ends have been conducted, research that uses actual trip ends to evaluate the success of trip dividing methods has been limited. In this research, 12 ten-day real-world GPS travel datasets were used to develop, calibrate and compare three methods to identify trip start points in the data stream. The true start and end points of each trip were identified in advance in the GPS data stream using a supplemental trip log completed by the participants so that the accuracy of each automated trip division method could be measured and compared. A heuristic model, which combines heading change, dwell time and distance between the GPS points and the road network, performs best, correctly identifying 94% of trip ends.

Introduction

Trip rate, the number of trips made by a household or an individual in a period of time (usually an hour or a day), is an essential measure in travel behavior research and transportation demand modeling. It is widely used in planning for trip generation, emissions modeling, and other transportation-related evaluations. Conventional survey methods, such as mail and phone surveys that collect travel information based on participant recall, are limited, especially in capturing short trips or trip chains. Short trips are frequently omitted by survey participants, especially when data collection lasts for multiple days and participants are required to record trips at the end of a day or at the end of the entire data collection period.

It is widely agreed that GPS offers significant advantages over traditional survey methods in collecting travel data. Murakami and Wagner (1999) concluded, based on a field experiment, that even though GPS was not perfect, its ability to capture multiple-day data and record routes and speed is better than retrospective surveys. Yalamanchili et al. (1999) determined that the GPS-based data furnished more than twice as many multi-stop chains as recall data. Wolf et al. (2001) compared the number of trips collected by GPS to those by traditional computer-assisted telephone interview (CATI) in a California study and concluded that GPS captured 29.2% more trips than the equivalent CATI method.

There are generally two types of in-vehicle GPS configuration. One is a GPS device used together with a personal digital assistant (PDA), where participants are required to input information for each trip, for example trip purpose and the names of the driver and passengers. With compliant or dedicated participants, the GPS and PDA combination may have the advantage of collecting more complete data to identify individual trips in the GPS data stream, but it is more expensive and places a high burden on participants. The other option is a passive GPS that requires no intervention from participants and collects data automatically. This minimizes both participant burden and trip reporting fatigue while collecting as complete a dataset as possible. It is easier for participants, but requires post-processing because all trips are stored in a single continuous data stream. Although the data processing is challenging, the completeness of the data and the lower cost and burden of passive GPS merit its use, and several researchers in recent years have pursued the development and evaluation of methodologies to post-process the data. Accurate post-processing methods are necessary to enable more routine collection of large passive GPS travel datasets that are useful for trip rate information.
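To make the post-processing step concrete, the following is a minimal sketch of dividing a continuous stream by dwell time alone. It is not the method evaluated in this paper. The `GpsPoint` fields, the 120-second minimum dwell time and the 0.5 m/s moving-speed cutoff are all illustrative assumptions, and dwell time is approximated here as the time gap between consecutive moving points, so both stationary records and recording gaps count as dwell.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpsPoint:
    t: float      # seconds since start of recording (hypothetical field)
    speed: float  # speed in m/s (hypothetical field)


def split_by_dwell(points: List[GpsPoint],
                   min_dwell_s: float = 120.0,   # placeholder threshold
                   moving_speed: float = 0.5     # placeholder cutoff
                   ) -> List[List[GpsPoint]]:
    """Split a time-ordered stream into trips wherever the vehicle was
    stationary (or not recording) for at least min_dwell_s seconds."""
    # Keep only points where the vehicle is considered to be moving.
    moving = [p for p in points if p.speed > moving_speed]
    trips: List[List[GpsPoint]] = []
    for p in moving:
        # Dwell = gap between this moving point and the previous moving point.
        if trips and p.t - trips[-1][-1].t < min_dwell_s:
            trips[-1].append(p)   # short stop: same trip continues
        else:
            trips.append([p])     # long stop (or first point): new trip
    return trips
```

A single threshold like this cannot tell a brief passenger drop-off from a long wait at a traffic signal, which is the limitation that motivates adding further logic to dwell time.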

Published research on developing and calibrating methods to process and identify trip ends in large passive GPS datasets is understandably limited because of the relative difficulty of calibrating the methods and measuring their accuracy. Ideally, trip dividing methods should be calibrated using field-based data where the true trip start and end points are known. Unfortunately, many travel data collection efforts that used GPS devices either did not include a corresponding manual log of trips or had both manually logged travel data and GPS data for only part of the survey period. Furthermore, the completeness of manual trip logs in previous research was not always guaranteed. Therefore, the comparison of GPS-derived trip data with actual trip ends was either impossible or conducted with qualitatively and quantitatively limited manually logged data.

The focus of this study was to use in-vehicle logs validated for completeness to quantitatively evaluate post-processing techniques for passive in-vehicle GPS data. Travel data were collected for 10 days using both GPS devices and in-vehicle booklets in which participants recorded information about each of their trips. The multiple-day GPS dataset was divided into individual trips using three methods, and the automatically identified trip start points were compared to the actual trip start points that participants recorded. While the full study consisted of 256 individual households, only the 12 most complete datasets with well-recorded in-vehicle logs were used here because of the time-intensive process of manually checking the in-vehicle logs against the GPS datasets for completeness and manually augmenting the in-vehicle booklets. The overall objective of this study is to measure the relative performance of these three trip dividing methods and compare them to existing methods. In addition, the settings for parameters within these methods are evaluated. The data collection period was longer than in many previous studies, making the size of the dataset an advantage.
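The comparison of automatically identified trip start points with logged ones can be illustrated with a short sketch. The matching rule below (greedy, one-to-one, within a time tolerance) and the 60-second tolerance are assumptions made for illustration; this section does not state how the study decided that a detected trip start was correct.

```python
from typing import Dict, List


def match_trip_starts(detected: List[float], logged: List[float],
                      tolerance_s: float = 60.0  # placeholder tolerance
                      ) -> Dict[str, float]:
    """Greedily match detected trip start times to logged trip start times.

    Each logged start may claim at most one detected start, and only if
    the two lie within tolerance_s seconds of each other.
    """
    unused = sorted(detected)
    matched = 0
    for true_t in sorted(logged):
        # Closest detection that has not yet been matched, if any remain.
        best = min(unused, key=lambda d: abs(d - true_t), default=None)
        if best is not None and abs(best - true_t) <= tolerance_s:
            matched += 1
            unused.remove(best)
    return {
        "correct": matched,                 # logged starts that were found
        "missed": len(logged) - matched,    # logged starts with no detection
        "false": len(unused),               # detections with no logged start
        "pct_correct": 100.0 * matched / len(logged) if logged else 0.0,
    }
```

Reporting missed and false trip starts separately, rather than only a percentage, makes it possible to see whether a method tends to merge real trips or to split them at traffic stops.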

Background

GPS devices can record latitude, longitude, speed, heading, and altitude of a vehicle. Depending on the unit’s settings, the data are recorded every second or several seconds when the GPS antenna is receiving acceptable signals from a sufficient number of satellites (absolute minimum is four). When signals are blocked (by buildings or tunnels for example) data cannot be recorded. There is a short reacquisition time before a location can be re-calculated and recorded after signals are

Data collection and preparation

The data used for this analysis are a subset of a large GPS travel dataset collected in Lexington, KY between March 2002 and July 2003. The overall objective of that study was to evaluate motivations and variations in route choice behavior. Lexington (population 250,000 and 293 square miles) has a relatively self-contained 1350-mile road network comprising 13,000 links. Within the city, there are divided highways (including a circle freeway), boulevards, and one-way streets. There is a

Identifying trip ends

It is often straightforward to identify the features described above from direct visual interpretation of the data in a GIS map environment, especially if one is familiar with the particular region where the data were collected. However, locating a trip start or end point automatically in a tabular GPS data stream is more complicated. For example, the selection of threshold values that work in all cases is not straightforward. Since satellite signals are not always available, in urban canyons

Analysis of results

The five parameters tested in this research are: (1) the minimum dwell time, (2) the maximum dwell time, (3) the threshold value for distance from the road network that qualifies points to be eliminated and dwell times recalculated, (4) the number of points before and after a flagged point that should be incorporated into the test of a heading change, and (5) the minimum number of points a trip end should be away from another trip end to be deemed a separate trip. In total, the number of levels
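As a rough illustration of how the five parameters listed above could be organized for testing at several levels, the sketch below groups them in one record and enumerates combinations. The field names, the units (seconds, metres, point counts) and every level value shown in the usage note are placeholders chosen for illustration, not the settings tested in this research.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, Sequence


@dataclass(frozen=True)
class TripEndParams:
    min_dwell_s: float       # (1) minimum dwell time (assumed seconds)
    max_dwell_s: float       # (2) maximum dwell time (assumed seconds)
    max_road_dist_m: float   # (3) distance-from-road threshold (assumed metres)
    heading_window: int      # (4) points before/after a flagged point
    min_separation_pts: int  # (5) minimum points between two trip ends


def parameter_grid(levels: Dict[str, Sequence]) -> Iterator[TripEndParams]:
    """Yield every combination of the supplied levels, skipping
    combinations where the minimum dwell is not below the maximum."""
    names = list(levels)
    for combo in product(*(levels[n] for n in names)):
        params = TripEndParams(**dict(zip(names, combo)))
        if params.min_dwell_s < params.max_dwell_s:
            yield params
```

For example, calling `parameter_grid` with placeholder levels such as two minimum dwell times, three maximum dwell times, one distance threshold, two heading windows and one separation value yields each valid combination once, and each can then be scored against the logged trip ends.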

Conclusions

A combination of a maximum and minimum dwell time, a heading change and a check for distance between the GPS points and the road network provides an improvement over dwell time alone in automatically identifying trip ends in a passive GPS data stream. On average 94% of trips are correctly identified. This good performance makes it realistic to confidently batch-process GPS data obtained from passive GPS devices and use the data for trip rate or route choice research. In the test dataset used in

References (12)

  • Murakami, E., et al., 1999. Can using Global Positioning System (GPS) improve trip reporting? Transportation Research C.
  • Axhausen, K.W., Schonfelder, S., Wolf, J., Oliveira, M., Samaga, U., 2003. Eighty weeks of GPS traces: Approaches to...
  • Casas, J., Arce, C.H., 1999. Trip reporting in household travel diaries: A comparison to GPS-collected data. In:...
  • Clarke, M., et al., 1981. Error and uncertainty in travel surveys. Transportation.
  • Doherty, S.T., Noel, N., Gosselin, M., Sirois, C., Ueno, M., Theberge, F., (Eds.), 2000. Moving beyond observed...
  • Golob, T., et al., 1986. Biases in response over time in a seven-day travel diary. Transportation.
There are more references available in the full text version of this article.
