Introduction
Exemplar problem
Exemplar description
Data sources
Data Sources | Source Type | Frequency | Description |
---|---|---|---|
Twitter [11] | Live text | Live. Query every 5 min | Decahose – Geo-tagged tweets within Chicago city limits |
Travel Mid-west [12] | Various | Traffic camera images: every 15 min; Vehicle Detection System (VDS): every 10 min; Dynamic Message Sign (DMS): every 10 min; thousands of camera locations | Traffic Cameras; VDS: Vehicle Speeds, Vehicle Occupancy; DMS: Traffic times, Lane Closures, Accidents |
City of Chicago [13] | Various | Traffic Segments: every 10–15 min; Traffic Region: every 10–20 min; Construction Moratorium: infrequent | Traffic Segments: Vehicle Speeds, Vehicle Occupancy; Traffic Region: Vehicle Speeds, Vehicle Occupancy; Construction Moratorium: Road closures |
GDELT [14] | Various | Every 15 min | Global Knowledge Graph (provides context and feeling between people, organizations, and locations); Event Mentions; Events |
MapQuest [15] | Various | Every 5 min | Reported Incidents |
Digital Globe [16] | Satellite Imagery | 1–3 images a day | Satellite Imagery (limited number of images) |
Dash Camera | 3-h Video | Field experiment | Dash Camera Video (Live Experiment and Validation) |
System requirement
Requirements | Descriptions | Goal | Threshold |
---|---|---|---|
Scalability | Number of streaming locations supported | 150 streaming locations | 100 streaming locations |
Data Variety | Structured, Unstructured, Semi-structured | Structured, Unstructured, Semi-structured | Structured, Semi-structured |
Average Throughput Per Location | Average data transfer rate per location | 1 Mbps per source location | 0.50 Mbps per source location |
Average Data Latency | Time measured from data creation to the time the data has arrived and been indexed in our system | Less than or equal to the polling frequency | max(polling frequency, data update frequency) + 2 min |
Data Management Guarantees | Level of guarantee that each message is processed | Fully process each message | Drop message on failure |
Traffic Classification Accuracy | Traffic Classification Accuracy | 95% accuracy on trained locations | 90% accuracy on trained locations |
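
The latency requirement in the table above can be read as a simple two-level check. The following is a minimal sketch of that reading, not the authors' implementation: the function name `latency_status` and the intermediate label "Met Threshold" are assumptions introduced here for illustration (the source only shows the labels "Met Goal" and "Failed"), and all times are taken to be in minutes.

```python
def latency_status(latency_min, polling_min, update_min):
    """Classify a measured average latency (minutes) against the requirement table.

    Goal:      latency <= polling frequency
    Threshold: latency <= max(polling frequency, data update frequency) + 2 min
    """
    goal = polling_min
    threshold = max(polling_min, update_min) + 2
    if latency_min <= goal:
        return "Met Goal"
    if latency_min <= threshold:
        return "Met Threshold"  # hypothetical label, not used in the source
    return "Failed"


# Illustrative inputs only: a feed polled and updated every 15 min.
print(latency_status(3, 15, 15))   # within the polling interval
print(latency_status(16, 15, 15))  # above the goal, within threshold (17 min)
print(latency_status(40, 15, 15))  # above the threshold
```

Under this reading, a source only fails when its measured latency exceeds the slower of the two update cycles by more than two minutes.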
Assumption about data
Related works
Methods
System setup
BDAI architecture contributions
BDAI Architecture–algorithm workflow
Multi-source data fusion
BDAI architecture analytical fusion algorithm
Results and discussion
Chicago traffic analytic–multi-source analytical fusion demonstration
Traffic classifier performance
BDAI dashboard
System performance
Event type | Avg num of locations | Avg record size (bytes) | Avg daily record total | Total num of records | Avg latency (min) | Avg throughput (bytes/sec) |
---|---|---|---|---|---|---|
Tweet_posted | 1 | 3835 | 132,305 | 16,875,132 | 5.5 | 5873 |
Traffic_segment_updated | 818 | | 117,957 | 4,362,857 | 29.9 | |
Vds_report_updated | 818 | 639 | 117,250 | 2,305,651 | 4.7 | |
Gdelt_gkg_posted | | 109,388 | 4254 | 1,104,740 | 3.0 | 5386 |
Gdelt_mention_posted | | 7509 | 4504 | 1,034,357 | 1.0 | 391 |
Camera_picture_taken | 150 | 350,000 | 14,041 | 516,782 | 2.2 | 56,879 |
Dms_report_updated | 150 | 2200 | 20,906 | 508,288 | 34.9 | 532 |
Gdelt_event_posted | | 1204 | 1793 | 217,691 | 1.7 | 25 |
Traffic_region_updated | | | 4156 | 153,555 | 37.3 | |
Tweet_traffic_posted | 1 | 3835 | 561 | 68,124 | 5.8 | 25 |
Construction_moratorium | | | 1000 | 37,000 | | |
Congestion_report_updated | 74 | 28,092 | 28.6 | |||
Incident_report_updated | 238 | 9346 | ||||
Construction_report_updated | 99 | 800 | ||||
Social_event_report_updated | 225 |
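
The average throughput column in the table above appears to be consistent with average record size multiplied by average daily record total, divided by the number of seconds in a day. This relationship is inferred from the reported numbers rather than stated in the source, so the sketch below should be read as a plausibility check; the helper name `avg_throughput` is an assumption introduced here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_throughput(record_size_bytes, records_per_day):
    """Average bytes per second, assuming records are spread over a full day."""
    return record_size_bytes * records_per_day / SECONDS_PER_DAY


# (avg record size in bytes, avg daily record total) taken from the table
rows = {
    "Tweet_posted": (3835, 132_305),
    "Camera_picture_taken": (350_000, 14_041),
    "Dms_report_updated": (2200, 20_906),
}

for name, (size, daily) in rows.items():
    print(name, round(avg_throughput(size, daily)))
# Tweet_posted 5873
# Camera_picture_taken 56879
# Dms_report_updated 532
```

The rounded results match the reported throughput values for these three event types, which suggests the column is a daily average rather than a peak rate.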
Topology | Polling Rate |
---|---|
ChicagoTrafficTrackerTopology | no more frequently than 10 min (~ between 10 and 12 min) |
XmlTopology | no more frequently than 10 min (~ between 10 and 12 min) |
MoratoriumTopology | 24 h |
CamerasTopology | 15 min |
GDELT | 15 min |
MapQuestTopology | 5 min |
TweetTopology | every 5 or 15 min, subject to Twitter rate limits |
Performance vs requirement discussion
Event type | Avg Records Per Day | Num of Source Stations | Query Freq (min) | Polling Freq (min) | Average throughput (bytes/sec) | Average latency (measured) | Status |
---|---|---|---|---|---|---|---|
Tweets | 132 K | 1 | 5 or 15 (subject to rate limit) | 5 or 15 (subject to rate limit) | 5873 | 5.5 min | Met Goal |
Tweet Traffic Posts | 0.5 K | 1 | 5 or 15 (subject to rate limit) | 5 or 15 (subject to rate limit) | 25 | 5.8 min | Met Goal |
Camera Images | 14 K | 150 | 15 | 15 | 56,879 | 3 min | Met Goal |
Gdelt Global Knowledge Graphs | 4.2 K | 1 | 15 | 15 | 5386 | 3 min | Met Goal |
Gdelt Mention Posts | 4.5 K | 1 | 15 | 15 | 391 | 1 min | Met Goal |
Gdelt Event Posts | 1.7 K | 1 | 15 | 15 | 25 | 1.7 min | Met Goal |
Dynamic Message Sign Report | 20.9 K | 150 | 15 | 10–12 | 532 | 34.9 min | Failed |