
About this book

This book presents papers based on the presentations and discussions at the international workshop on Big Data Smart Transportation Analytics, held on July 16 and 17, 2016, at Tongji University in Shanghai and chaired by Professors Ukkusuri and Yang. The book explores a multidisciplinary perspective on big data science in urban transportation, motivated by three critical observations:

Rapid advances in the observability of assets and in platforms for matching supply and demand, which have enabled sharing networks that were previously unimaginable.

The nearly universal agreement that data from multiple sources, such as cell phones, social media, taxis, and transit systems, can provide an understanding of infrastructure systems that is critically important to both quality of life and successful economic competition at the global, national, regional, and local levels.

The present lack of unifying principles and methodologies for approaching big data urban systems.

The workshop brought together varied perspectives from engineers, computational scientists, representatives of state and central government, social scientists, physicists, and network science experts to develop a unifying set of research challenges and methodologies that are likely to impact infrastructure systems, with a particular focus on transportation issues.

The book deals with the emerging topic of data science for cities, which has become central over the last five years and is expected to become critical in academia, industry, and government. Because the literature available to researchers on the opportunities and state of the art in this emerging area is currently limited, this book fills a gap by synthesizing the state of the art from various scholars and helping to identify new directions for further research.

Table of Contents


Chapter 1. Beyond Geotagged Tweets: Exploring the Geolocalisation of Tweets for Transportation Applications

Researchers in multiple disciplines have used Twitter to study various mobility patterns and “live” aspects of cities. In transportation planning, one major area of interest has been using Twitter data to infer the movement patterns and the origins and destinations of trip-makers. In transportation operations, researchers have been interested in automated incident detection and event detection. Because geotagged tweets, which pinpoint the location of the user at the time of tweeting, tend to be sparse for transportation applications, there is a need to expand the sample by associating non-geotagged tweets with locations. We call this process “geolocalisation”. Although geolocalisation is an active area of research in the geospatial semantic Web and Geographic Information Retrieval, much of that work has focused on geolocalising users, or on geolocalising tweeting activity at fairly coarse geographical levels, whereas our work concerns street-level or even building-level geolocalisation. We consider two approaches to geolocalisation: one that makes use of Points of Interest databases, and a second, information retrieval-based approach that trains on geotagged tweets. Our objective is to make a comprehensive assessment of the differences in spatial and content coverage between non-geotagged tweets geolocalised using these approaches and geotagged tweets alone. We find that using geolocalised tweets allows the discovery of a larger number of incidents and of socioeconomic patterns that are not evident from geotagged data alone, such as activity throughout the metropolitan area, including in deprived “Environmental Justice” (EJ) areas where the degree of detected social media activity is usually low. Conclusions are drawn on the relative usefulness of the alternative approaches.

Jorge David Gonzalez Paule, Yeran Sun, Piyushimita (Vonu) Thakuriah
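As a rough illustration of the Points of Interest approach described in Chapter 1, the Python sketch below assigns a non-geotagged tweet the coordinates of the longest POI name it mentions. The gazetteer `POIS`, its placeholder coordinates, the function `geolocalise`, and the whole-phrase matching rule are all hypothetical assumptions; the chapter's actual POI-based and information retrieval-based methods are not reproduced here.

```python
import re
from typing import Dict, Optional, Tuple

Coords = Tuple[float, float]

# Hypothetical POI gazetteer (lower-cased name -> (lat, lon)).
# Names and coordinates are placeholders, not real data.
POIS: Dict[str, Coords] = {
    "central station": (1.000, 1.000),
    "riverside mall": (1.010, 1.020),
    "north bridge": (1.030, 0.990),
}


def geolocalise(text: str, pois: Dict[str, Coords] = POIS) -> Optional[Coords]:
    """Return the coordinates of the longest POI name mentioned in `text`,
    or None if no POI name appears as a whole phrase."""
    lowered = text.lower()
    best_name: Optional[str] = None
    for name in pois:
        # Word-boundary anchors prevent partial-word matches (e.g. "central stationery")
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            if best_name is None or len(name) > len(best_name):
                best_name = name
    return pois[best_name] if best_name is not None else None
```

Preferring the longest match is one simple way to favour the most specific name when several overlap; a realistic system would also have to handle ambiguous, abbreviated, and misspelled place names.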

Chapter 2. Social Media in Transportation Research and Promising Applications

Newly emerging social media data contain large quantities of location and time information, as well as full text messages, which can contribute to existing transportation studies. With the widespread use of mobile devices, information acquired from social media appears easier to obtain and larger in volume than data from traditional collection methods, and it covers a wide range of transportation-related events. This chapter uses one social media platform, Twitter, to demonstrate the promise of social media in complementing traditional transportation studies. Three major applications in transportation research are examined: traffic event detection, human mobility exploration, and trip purpose and travel demand forecasting. For these applications, we detail how to leverage GPS information to extract displacement, how to automatically extract topics from text messages, and how to forecast travel demand for a social event. State-of-the-art methods are employed to process the big data of social media, and the results show both the advantages and the deficiencies of social media in transportation research and applications.

Zhenhua Zhang, Qing He
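One common way to extract displacement from the GPS information in geotagged tweets, as mentioned in Chapter 2, is to sort a user's tweets by time and compute the great-circle (haversine) distance between consecutive locations. The sketch below does this under our own assumptions: the `(timestamp, lat, lon)` record layout and the function names are hypothetical, and the chapter may define displacement differently.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding error pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def displacements(tweets: List[Tuple[float, float, float]]) -> List[Tuple[float, float, float]]:
    """Given one user's (timestamp, lat, lon) records, return
    (t_start, t_end, metres) for each pair of consecutive tweets."""
    ordered = sorted(tweets)  # chronological order (ties broken by coordinates)
    return [
        (t0, t1, haversine_m(lat0, lon0, lat1, lon1))
        for (t0, lat0, lon0), (t1, lat1, lon1) in zip(ordered, ordered[1:])
    ]
```

Timestamps can be any sortable numeric value, such as Unix seconds; one degree of latitude comes out at roughly 111.2 km with this Earth radius.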

Chapter 3. Ground Transportation Big Data Analytics and Third Party Validation: Solutions for a New Era of Regulation and Private Sector Innovation

One of the first systematic collections of ground transportation big data was instituted by the New York City Taxi & Limousine Commission (“TLC”) in 2004 through the Taxicab Passenger Enhancement Program (“T-PEP”). The T-PEP includes GPS location-based information, a rear-seat passenger information monitor, a driver information monitor, and full integration with the taximeter to convey fare information to the passenger and to transmit massive amounts of data to the selected vendors and to the TLC. Since its inception, T-PEP datasets on ridership, fares, and traffic patterns have proven vital for academics, urban planners, and transportation regulators conducting market analysis and policy making. In recent years, Transportation Network Companies (“TNCs”) such as Uber and Lyft have significantly changed the dynamics of transportation data. Not only do these companies capture a voluminous amount of data for post-trip analysis, they also rely on big data for their daily operations and strategic planning; in other words, for their survival. TNCs have fought hard to prevent their data from being released to transportation regulators, frustrating the regulators' mission to implement policy and to make and enforce regulations. This chapter examines how big data has been used by both the TLC and TNCs and, more importantly, highlights pertinent issues in ground transportation such as data accuracy, security, privacy, transparency, and compliance. Ultimately, this chapter advocates for third-party independent institutions to audit and maintain this data.

Matthew W. Daus

Chapter 4. A Privacy-Preserving Urban Traffic Estimation System

This chapter describes a novel traffic monitoring system based on data generated by Inertial Measurement Units (IMUs) in conjunction with short-range Bluetooth or WiFi readers. The IMUs are used to estimate the vehicle path along the transportation network, detect traffic stops and stop-and-go waves, classify traffic-related events, and possibly monitor the condition of the roadway. We introduce a method for estimating vehicle trajectories from IMU data and Bluetooth reader position data only. Using this method, we show that the state of traffic on an urban network can be estimated locally by solving a set of independent traffic estimation problems with unknown boundary conditions. These independent solutions are then regularized using a consensus-type algorithm, which estimates the unknown boundary conditions in the process. Unlike current systems, this system allows one to estimate the state of traffic over an urban network while maintaining the privacy of the users.

Tian Lei, Alexander Minbaev, Christian G. Claudel
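The abstract of Chapter 4 does not specify its consensus-type algorithm, so the sketch below shows only a generic linear average-consensus iteration, a standard technique of the same family: each local estimator repeatedly nudges its estimate of a shared unknown quantity (for instance, a flow across a shared boundary) toward those of its neighbours. The step size `eps`, the neighbour structure, and the function name are assumptions. On a connected, undirected graph, all estimates converge to their initial average when `eps` is positive and below 1 divided by the maximum node degree.

```python
from typing import Dict, List, Sequence


def average_consensus(
    values: Sequence[float],
    neighbours: Dict[int, List[int]],
    eps: float = 0.3,
    iters: int = 200,
) -> List[float]:
    """Synchronous linear average consensus:
    x_i <- x_i + eps * sum over j in N(i) of (x_j - x_i)."""
    x = list(values)
    for _ in range(iters):
        # The new list is built entirely from the old one, so all nodes update simultaneously
        x = [xi + eps * sum(x[j] - xi for j in neighbours[i]) for i, xi in enumerate(x)]
    return x
```

For example, three estimators on a path (0 to 1 to 2) starting from 10, 20, and 60 all approach the mean, 30; with a maximum degree of 2, the default `eps = 0.3` satisfies the step-size condition.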

Chapter 5. Data, Methods, and Applications of Traffic Source Prediction

Traffic source prediction provides a new way of mitigating traffic congestion. Since the initial discovery that most of the usage of a road segment can be traced to surprisingly few driver sources, studies of traffic source prediction have developed rapidly. As more high-resolution traffic data have become available, dynamic driver sources and passenger sources have been proposed, and the approach of targeting the traffic sources most closely related to congestion has triggered a number of applications ranging from travel demand control to vehicle routing guidance and infrastructure upgrades. Here, we present a comprehensive review of the data, methods, and applications of traffic source prediction.

Chengcheng Wang, Pu Wang
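To make the idea from Chapter 5 that a road segment's usage traces back to few driver sources concrete, the sketch below counts the trips using one segment by origin and returns the smallest set of origins, largest first, that accounts for a chosen share of its traffic. The 80% default threshold, the input format, and the function name are assumptions for illustration, not definitions taken from the chapter.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Iterable, List


def major_sources(trip_origins: Iterable[Hashable], share: float = 0.8) -> List[Hashable]:
    """Smallest set of origins (largest contributors first) whose trips make up
    at least `share` of the traffic on one road segment.
    The 0.8 default is an assumed threshold."""
    counts = Counter(trip_origins)
    total = sum(counts.values())
    selected: List[Hashable] = []
    covered = 0
    for origin, n in counts.most_common():
        if covered >= share * total:
            break  # the chosen share of the segment's trips is already covered
        selected.append(origin)
        covered += n
    return selected
```

A segment whose traffic is dominated by a handful of origins yields a short list, which is what makes targeting those sources attractive for demand control.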

Chapter 6. Analyzing the Spatial and Temporal Characteristics of Subway Passenger Flow Based on Smart Card Data

Passenger flow is a core feature of rail transportation stations, and its station-level fluctuation is strongly influenced by the surrounding land-use types. This study develops a sequential K-means clustering algorithm that uses smart card data to categorize Beijing subway stations, taking into account the temporal characteristics of daily inbound and outbound subway passenger flows. The stations are divided into 10 groups, which are classified under three categories: employment-oriented, dual-peak, and residence-oriented stations. We analyze how these categories differ in terms of station-level passenger flow. In addition, a station-level buffer area calculation method is used to estimate the land-use density around each subway station. To account for the spatial nonstationarity of passenger flow, we employ a geographically weighted regression (GWR) model to determine the relationship between peak-hour passenger flow and land-use density, and then analyze the spatial distribution of the correlation coefficients. The results demonstrate that most residents commute via rail transportation, and that the passenger flows at the different categories of stations exhibit characteristics distinctly associated with residences and workplaces. The findings of this study provide useful information and a theoretical foundation for rail transportation network design and operation management.

Xiaolei Ma, Jiyu Zhang, Chuan Ding
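For readers unfamiliar with K-means, the sketch below implements standard batch (Lloyd's) K-means on toy station profiles. It is plain K-means, not the sequential variant developed in Chapter 6, and the two-feature profile in the example (share of a station's daily inbound flow in the morning peak and in the evening peak) is a hypothetical simplification of the inbound and outbound temporal characteristics the study clusters on.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(
    points: List[Vector], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Standard batch (Lloyd's) K-means, with centroids initialised from k random points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids, and hence assignments, no longer change
        centroids = new_centroids
    return labels, centroids
```

With toy profiles such as (0.80, 0.10) and (0.75, 0.15) for stations whose passengers mostly enter in the morning, and (0.10, 0.80) and (0.15, 0.70) for stations entered mostly in the evening, `kmeans(profiles, k=2)` separates the two groups, loosely mirroring the residence-oriented versus employment-oriented distinction.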

Chapter 7. An Initial Evaluation of the Impact of Location Obfuscation Mechanisms on Geospatial Analysis

Since the emergence of GPS-capable mobile devices and the increasing demand for contextual services such as location-based services (LBSs), there has been rising concern about location privacy. This has led the scientific community to create numerous location privacy protection mechanisms (LPPMs). The authors propose using geospatial analyses as a tool for evaluating the impact of noise-based algorithms on location data. To this end, the Pinwheel mechanism is tested with different noise settings to identify a threshold at which privacy is provided and geostatistical inferences are not greatly affected. Results show that 500 m of random noise introduces a small amount of change that does not alter the general trend in either the heatmap or the hotspot analysis, while still providing a minimum level of protection to individuals.

Pedro Wightman, Mayra Zurbarán
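The Pinwheel mechanism evaluated in Chapter 7 is not reproduced here. As a simpler point of comparison, the sketch below applies plain uniform random noise: each location is displaced to a point drawn uniformly from a disc of radius `radius_m` metres, such as the 500 m setting discussed in that chapter. The flat-Earth degree conversion, the constant `METRES_PER_DEG`, and the function name are assumptions that hold only for small displacements away from the poles.

```python
import math
import random
from typing import Tuple

# Approximate metres per degree of latitude; adequate only for small offsets.
METRES_PER_DEG = 111_320.0


def perturb(lat: float, lon: float, radius_m: float, rng: random.Random) -> Tuple[float, float]:
    """Displace (lat, lon) to a point drawn uniformly from a disc of radius_m metres.
    This is plain uniform noise, not the Pinwheel mechanism."""
    # sqrt makes the density uniform over the disc's area rather than along its radius
    r = radius_m * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = r * math.cos(theta) / METRES_PER_DEG
    # A degree of longitude shrinks with cos(latitude)
    dlon = r * math.sin(theta) / (METRES_PER_DEG * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Passing a seeded `random.Random` instance makes an experiment reproducible, which helps when comparing heatmaps or hotspot maps across noise settings.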

Chapter 8. PETRA: The PErsonal TRansport Advisor Platform and Services

Smart Cities applications are fostering research in many fields, including Computer Science and Engineering. Data Mining is used to support applications such as the optimization of public urban transit networks and event detection. The aim of the PErsonal TRansport Advisor (PETRA) EU FP7 project is to develop an integrated platform that supplies urban travelers with smart journey and activity advice on a multi-modal network, while taking into account uncertainty such as delays in arrival times and variations in walking speed. In this chapter, we describe the architecture of the PETRA platform and present results obtained by applying PETRA to two different use cases, namely journey planning under uncertainty and a smart tourism advisor with crowd balancing. We present the main modules of the platform, namely data acquisition and integration, mobility mining, journey planning under uncertainty, and activity planning, with details on design and interfacing choices. We then present the results of applying PETRA in two cities, Rome and Venice, corresponding to the two use cases above. In Rome, we demonstrate how our multi-modal journey planner under uncertainty can cope with the intrinsic uncertainty in the transport network and can integrate private transport into the public one. In Venice, we demonstrate how our crowd-balancing tourism activity planner can schedule visit activities to reduce the pedestrian congestion occurring in the historical city centre. Our results have also been presented to the EU as demonstrators, as part of the results of the PETRA FP7 EU project.

Michele Berlingerio, Veli Bicer, Adi Botea, Stefano Braghin, Francesco Calabrese, Nuno Lopes, Riccardo Guidotti, Francesca Pratesi, Andrea Sassi

Chapter 9. Mobility Pattern Identification Based on Mobile Phone Data

Travel behavior plays a crucial role in urban planning and epidemic control. With the increasingly widespread use of mobile phones, extensive research on mobile phone data has been performed. However, data sparsity and localization noise pose a great challenge to further studies. In this research, 60 days of mobile phone call record data (CRD) from Shenzhen, China, are used. By identifying the home and work locations of users, the location information for every timeslot was labeled. First, mobility topics were extracted using a latent Dirichlet allocation (LDA) model. Then, affinity propagation (AP) was used to analyze users’ mobility patterns from the mobility topic distribution. The results revealed that clustering at the level of mobility topics outperformed directly clustering location sequence information, and that this method could effectively mitigate the adverse impact of missing information. Finally, 25 and 17 patterns were found on weekdays and weekends, respectively, and representative features were captured from each pattern. Measuring the accuracy of these representative features shows that the mobility features obtained after clustering are capable of describing the main mobility patterns.

Chao Yang, Yuliang Zhang, Satish V. Ukkusuri, Rongrong Zhu
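The abstract of Chapter 9 does not state how home and work locations are identified. A common heuristic, assumed here purely for illustration, is to take a user's most frequent cell during night hours as home and the most frequent cell during weekday working hours as work. The hour windows, the `(datetime, cell_id)` record format, and the function name `home_work` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import List, Optional, Tuple


def home_work(records: List[Tuple[datetime, str]]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical heuristic: home = most frequent cell between 21:00 and 07:00,
    work = most frequent cell on weekdays (Mon-Fri) between 09:00 and 17:00."""
    night = Counter(cell for t, cell in records if t.hour >= 21 or t.hour < 7)
    office = Counter(cell for t, cell in records if t.weekday() < 5 and 9 <= t.hour < 17)
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work
```

Users with no records in one of the windows get `None` for that location, which is one place where the data sparsity noted in the chapter shows up directly.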

