CALIFORNIA CENTER FOR INNOVATIVE TRANSPORTATION
INSTITUTE OF TRANSPORTATION STUDIES
UNIVERSITY OF CALIFORNIA, BERKELEY
Mobile Century Final Report
for TO 1021 and TO 1029:
A Traffic Sensing Field Experiment
Using GPS Mobile Phones
Alexandre M. Bayen, Ph.D., Principal Investigator
CCIT Research Report
UCB-ITS-CWP-2010-4
This work was performed by the California Center for Innovative Transportation, a research group at the University of California, Berkeley, in cooperation with the State of California Business, Transportation, and Housing Agency’s Department of Transportation, and the United States Department of Transportation’s Federal Highway Administration.
The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the State of California. This report does not constitute a standard, specification, or regulation.
December 2010
Project Fact Sheet
Task Order # 1021
Title: Deployment of Value-Added Mobile Traffic Probes
Project Sponsor: Caltrans
Project Stakeholders: Caltrans
Executing Organization: California Center for Innovative Transportation, 2105 Bancroft Way, Berkeley, CA 94720; Phone: (510) 642-4522; Fax: (510) 642-0910
Contract No.: 65A0212
Execution Period: 03/01/2007 – 09/30/2009
Contract Amount: $612,376
Principal Investigator: Alexandre Bayen, Ph.D.
Center Director: Thomas West
Project Manager: Ali Mortazavi
Project Fact Sheet
Task Order # 1029
Title: Post Processing Mobile Century (aka Probe Data Analysis)
Project Sponsor: Caltrans
Project Stakeholders: Caltrans
Executing Organization: California Center for Innovative Transportation, 2105 Bancroft Way, Berkeley, CA 94720; Phone: (510) 642-4522; Fax: (510) 642-0910
Contract No.: 65A0212
Execution Period: 07/01/2008 – 06/30/2009
Contract Amount: $313,000
Principal Investigator: Alexandre Bayen, Ph.D.
Center Director: Thomas West
Project Manager: Ali Mortazavi
Acknowledgements
The authors would first like to acknowledge the support of our sponsors at Caltrans, especially Larry Orcutt, who took a risk to support a truly innovative effort. We are also thankful for the guidance and consideration of Greg Larson, Hassan Aboukhadijeh, Asfand Siddiqui, and Gurprit Hansra.
The authors wish to thank everyone who participated in Mobile Century, including those who contributed to the initial brainstorming, those who mobilized during the course of its evolution, and those who forged ahead to propel this project to heights unimaginable just three years ago. The success of Mobile Century would have been impossible without the leadership, dedication, and foresight of J. D. Margulici, Thomas West, and Joe Butler.
We are grateful to the entire staff of the California Center for Innovative Transportation for the logistical planning and successful implementation of Mobile Century. In particular, we thank Coralie Claudel, Marika Benko, Osama Elhamshary, Tia Dodson, Chris Flens-Batina, Manju Kumar, Jed Arnold, Benson Chiou, Lori Luddington, Xiaohong Pan, Erica Sherlock-Thomas, and Arthur Wiedmer.
We appreciate the hard work of all the execution officers, Emma Strong, Jason Wexler, Jean Parks, Jennifer Chang, Kristen Ray, Negin Aryaee, Timmy Siauw, Anurag Sridharan, Madeline Ziser, Matthew Vaggione, Qingfang Wu, Tarek Rabbani, Josh Pilachowski, Kristen Parrish, Megan Smirti, Carl Misra, Christina Sedighi, Jessica Ariani, Elizabeth Kincaid, Sandy Do, Nick Semon, Trucy Phan, Timothy Racine, Alan Wang, Charlotte Wong, Irene Kwan, Karl David Cruz, Swe Shin Maung, Tyler Moser, Alexis Clinet, and Julie Percelay, who assisted in planning, preparation, and operations on the day of the 100-vehicle deployment.
We are also indebted to members of the Nokia team: John Paul Shen, Bob Ianucci, Dave Sutter, and John Loughney; members of the CITRIS team: Gary Baldwin, Lorie Mariano, Paul Wright, Aaron Walburg, and Khossrov Taherian; members of the ITS team: Margaret Chang, Norine Shima, Jillene Bohr, John Li, and Ann Guy; members of the CoE team: Shankar Sastry, Lisa Alvarez Cohen, and Barbara Blackford; the student team: Annalisa Schiaccoli, Saurabh Amin, and Dengfeng Sun; the viability team: Patrick Saint-Pierre and Jean-Pierre Aubin; and Sarah Yang at the UC Berkeley Media Office.
The authors would like to apologize in advance to anyone who helped to develop, build, and deploy the traffic monitoring system implemented as part of the Mobile Century experiment but whose name we neglected to mention through our own oversight. You have our gratitude.
We also thank the many people who, during the compilation of this report, provided crucial subject matter through personal interviews, Skype interviews, original project documents, briefing materials, personal notes, emails, and other sources, which made it possible to provide the rich narrative herein. Finally, we thank Sara Bagwell for her work to compile and format this document.
Executive Summary
Traffic monitoring is most commonly accomplished with government-deployed, dedicated equipment. Adopting new technology in this paradigm can be costly and slow. However, recent advances in the mobile internet, cell phone technology, and location-based services may be leveraged to transcend the old paradigm. Doing so will reduce costs, increase coverage, and yield a wealth of new data that will empower the traveling public with real-time access to current traffic conditions. Furthermore, transportation operators will gain access to an unprecedented wealth of information to help them better manage road networks.
Nonetheless, significant technical barriers and privacy concerns may impede widespread acceptance of a new paradigm. To understand and overcome these barriers, the Mobile Century experiment was conceived as a proof-of-concept demonstration of a traffic monitoring system based on probe vehicles equipped with GPS-enabled mobile phones.
The sheer scale of the experiment required significant logistical effort. A base station was erected at Union Landing to house a temporary control center that was linked to a secondary control center in Palo Alto. Over one hundred graduate students from UC Berkeley were employed to circulate in loops along Interstate 880 between Hayward and Fremont, California, for an entire day. During the experimental deployment, an average probe-vehicle penetration rate near 3% was sustained (a significant logistical feat), a level that is viewed as realistic in the near future considering the increasing penetration of GPS-enabled cellular devices.
Classical methods of traffic modeling operate in the vehicular density domain, and use data such as occupancies and flows from inductive loop detectors. Understanding how to use velocity measurements instead was a significant technical contribution. In this work, the classical model was converted to the velocity domain, and GPS-based measurements were directly fed into the model.
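As a rough illustration of what converting the classical model to the velocity domain can look like, the sketch below assumes the Lighthill-Whitham-Richards (LWR) conservation law with a Greenshields (linear) speed-density relation, and assumes smooth solutions. The symbols $\rho_{\max}$ (jam density) and $v_{\max}$ (free-flow speed) are notation chosen here; the model actually used in the report may differ in detail.

```latex
\begin{align*}
&\text{LWR (density form):} && \partial_t \rho + \partial_x\big(\rho\, v(\rho)\big) = 0 \\
&\text{Greenshields:} && v(\rho) = v_{\max}\Big(1 - \frac{\rho}{\rho_{\max}}\Big)
   \iff \rho = \rho_{\max}\Big(1 - \frac{v}{v_{\max}}\Big) \\
&\text{Substituting:} && -\frac{\rho_{\max}}{v_{\max}}\,\partial_t v
   + \rho_{\max}\Big(1 - \frac{2v}{v_{\max}}\Big)\partial_x v = 0 \\
&\text{Velocity form:} && \partial_t v + \partial_x\big(v^2 - v_{\max}\, v\big) = 0
\end{align*}
```

Under these assumptions the velocity flux $v^2 - v_{\max} v$ is a parabola that is non-positive for $0 \le v \le v_{\max}$, so velocity measurements can enter the model directly without first being converted to densities.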
Mobile Century proved that data from GPS-enabled mobile phones alone were sufficient to infer traffic features, i.e., to construct an accurate velocity map over time and space. The methods employed functioned properly during both congested and free-flow traffic conditions, and correctly detected a traffic incident that occurred during the deployment.
Another important contribution from this work was that ground-truth travel times were recovered by re-identifying vehicles captured on videotape. Therefore all results in this report can be asserted with high confidence. We conclude that the quality of data obtainable from present-day smartphones is adequate for useful, real-time traffic applications, such as calculating travel times.
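The re-identification idea can be sketched in a few lines: a vehicle seen at an upstream camera and again at a downstream camera yields one travel-time sample. The code below is a minimal illustration only. The function names, the `(plate, time_seconds)` record format, the sample plates, and the assumption that each plate appears once per site are hypothetical and are not taken from the report's video post-processing procedure.

```python
from collections import defaultdict
from statistics import mean


def match_travel_times(upstream, downstream):
    """Pair sightings of the same plate at two sites.

    upstream, downstream: lists of (plate, time_seconds) tuples.
    Returns a sorted list of (upstream_time, travel_time) tuples.
    Assumes (hypothetically) that each plate appears once per site.
    """
    first_seen = {}
    for plate, t in upstream:
        first_seen.setdefault(plate, t)  # keep the earliest upstream sighting

    samples = []
    for plate, t in downstream:
        t0 = first_seen.get(plate)
        if t0 is not None and t > t0:  # ignore unmatched or out-of-order sightings
            samples.append((t0, t - t0))
    return sorted(samples)


def windowed_mean(samples, window_s=300):
    """Average travel time per window, keyed by window start time (seconds)."""
    buckets = defaultdict(list)
    for t0, travel_time in samples:
        buckets[int(t0 // window_s) * window_s].append(travel_time)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


# Made-up sightings: two vehicles matched, one seen only downstream.
upstream = [("ABC123", 0), ("XYZ789", 60)]
downstream = [("XYZ789", 660), ("ABC123", 540), ("QQQ000", 700)]
samples = match_travel_times(upstream, downstream)
print(samples)
print(windowed_mean(samples))
```

Vehicles that drive repeated loops would need an extra rule for pairing each downstream sighting with the most recent upstream one; that refinement is omitted here.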
The architecture of the traffic monitoring system was designed such that identity information is encrypted and handled separately from traffic information, with no single entity having access to both. The spatial sampling strategy is based on the use of virtual trip lines that can be reconfigured on the fly. This feature builds in flexibility for future monitoring needs.
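A virtual trip line can be pictured as a line segment drawn across the roadway: a phone produces a measurement only when two consecutive GPS fixes fall on opposite sides of it. The sketch below is a minimal illustration of that geometric test using planar coordinates in meters. The names `Point`, `crosses`, and `crossing_speed` are hypothetical and do not describe the system's actual client software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float  # meters, in a local planar projection (an assumption)
    y: float


def _orient(a, b, c):
    """Signed area test: > 0 if a, b, c turn counter-clockwise, < 0 if clockwise."""
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)


def crosses(prev, curr, line_start, line_end):
    """True if the move prev -> curr strictly crosses the trip line segment."""
    d1 = _orient(line_start, line_end, prev)
    d2 = _orient(line_start, line_end, curr)
    d3 = _orient(prev, curr, line_start)
    d4 = _orient(prev, curr, line_end)
    return d1 * d2 < 0 and d3 * d4 < 0


def crossing_speed(prev, curr, dt_seconds):
    """Average speed (m/s) between two consecutive fixes."""
    dist = ((curr.x - prev.x) ** 2 + (curr.y - prev.y) ** 2) ** 0.5
    return dist / dt_seconds


# A trip line across the road at x = 0, and two fixes taken 0.5 s apart.
line_start, line_end = Point(0.0, -10.0), Point(0.0, 10.0)
prev_fix, curr_fix = Point(-5.0, 0.0), Point(5.0, 0.0)
if crosses(prev_fix, curr_fix, line_start, line_end):
    print("report speed:", crossing_speed(prev_fix, curr_fix, 0.5), "m/s")
```

Because the trip lines are just data (pairs of endpoints), a new set can be distributed to phones without changing the client logic, which is one way to read the reconfigurability described above.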
The new paradigm demonstrated in Mobile Century still requires substantial effort to bring to fruition. Any industrial-grade, real-time system will require partnerships between government, academia, and industry. Business cases for future deployment must address incentives for public participation.
In conclusion, Mobile Century was the first to demonstrate the near-term potential for using velocity data from GPS cell phones to reconstruct traffic state with precision. This opens the door for further research in this area to scale up the solution and to deliver considerable value to Caltrans and the traveling public.
Table of Contents
Acknowledgements
Executive Summary
Table of Contents
List of Figures
1 Introduction
1.1 Motivation
1.2 Historical Narrative
1.3 Scope
1.4 Summary of Findings
1.5 Organization of Report
2 Background
2.1 Traffic Monitoring with Dedicated Equipment (Road Infrastructure)
2.2 Growth of the Mobile Internet
2.3 Barriers to a New Paradigm
3 Design Challenges
3.1 Preliminary Investigation
3.2 Scale of Experiment
3.3 Practical Considerations
3.4 Alternate Site
4 VTL Traffic Monitoring
4.1 Privacy Risks and Threat Model
4.2 Preserving Privacy with Virtual Trip Lines
4.3 Implementation
4.1 Experimental Deployment
4.2 Trip Line Placement
4.3 Estimating traffic
4.4 Discussion
5 Assimilating Lagrangian Data
5.1 Background
5.2 Description of Preliminary Concepts
5.3 Explanation of Proposed Methods
5.4 Assessment of the Methods
5.5 Numerical example of the NR nudging factor
6 Velocity-based Modeling
6.1 Related Work
6.2 Highway Traffic Flow Model
6.3 Speed Estimation
6.4 Implementation and Validation
6.5 Conclusion and Future Work
7 Video validation
7.1 Resolution Requirements and Camera Capabilities
7.2 Final Selection of Video Cameras
7.3 Practical Considerations for Camera Deployment
7.4 Trial Tests
7.5 Protocol for Camera Deployment
7.6 Narrative of Camera Deployment
7.7 Post-processing of Video Data
8 Probe Vehicle Deployment
8.1 Resources
8.2 Procedures
8.3 Collection of Probe Data
8.4 Narrative of Experiment
8.5 Data Pre-processing
9 Experimental results
9.1 Real-Time Traffic Monitoring Using Only Cell Phone Data
9.2 Trajectory data
9.3 Velocity Field
9.4 Ground-truth Travel Times
9.5 Reasons For Disparity Between Loop and VTL Data
9.6 Achieved Penetration Rate During Experiment
9.7 Inferring parameters from shockwave speed
10 Revisiting the Density Based Methods
10.1 Different Scenarios
10.1 Results
10.2 Conclusions
11 Discussion
11.1 Conclusions
11.2 Challenges
11.3 Future Goals
11.4 Steps Toward Deployment
References
Appendices
1 CPHS Protocol Narrative Form
2 Deployment Prototyping
2.1 Strategy and Objectives
2.2 Protocol for 20-Vehicle Deployment
2.3 Instructions for Odd Drivers
2.4 Instructions for Even Drivers
3 Contingency Plans
3.1 Risk Management
3.2 Emergency Response
3.3 Directions for Phone Operators
4 Driver Briefings
4.1 Red Team
4.2 Yellow Team
4.3 Orange Team
5 Driver Instructions
5.1 Red AM route
5.2 Red PM route
5.3 Yellow AM route
5.4 Yellow PM route
5.5 Orange AM route
5.6 Orange PM route
6 Field Experiment Protocol
7 Press Release Materials
7.1 Fact Sheet
7.2 Guest Program
8 Supplemental Tasks
8.1 AASHTO presentation
8.2 Mobile Millennium Planning, Design, and Server Development – Initial Stages
8.3 Mobile Millennium Arterial Modeling – Initial Stages
List of Figures
Figure 3.1: Vehicle trajectories from NGSIM. Shown in red, a flow fraction, , of trajectories are randomly designated as probes. ................................................................................................... 18
Figure 3.2: One vehicle trajectory. Parameters are shown for real- time probe data reports. .... 19
Figure 3.3: Spatio- temporal dispersion of probe measurements for different combinations of penetration and sampling rates. ................................................................................................... 20
Figure 3.4: Vehicle accumulations. Comparison of estimates using the Kalman filter method ( top), and Nudging method ( bottom). In both cases, the incorporation of Lagrangian data results in improved estimates over those using Eulerian data only. ............................................ 21
Figure 3.5: Vehicle accumulations. The higher the sampling rate, the higher the fidelity of the accumulation estimate. ................................................................................................................ 22
Figure 3.6: Vehicle density estimated at the middle of the modeled expressway section. ......... 22
Figure 3.7: Travel time estimation for the modeled expressway section. ................................... 23
Figure 3.8: Alternate I- 680 site. .................................................................................................... 27
Figure 4.1: Driving Patterns and Speed Variations in Highway Traffic. ........................................ 31
Figure 4.2: Virtual Trip Line: Privacy- Preserving Traffic monitoring System Architecture. This system was implemented and ran for the entire duration of the 100- vehicle deployment of the Mobile Century experiment .......................................................................................................... 33
Figure 4.3: Road networks extracted from Bay Area DLG files ( Left) and Trip Lines per road segment in Palo Alto CA ( Right). ................................................................................................... 36
Figure 4.4: Comparison of the speed measurements recorded from the N95 ( dots), the VTLs ( boxes) and the vehicle speedometer ( circles) as a function of time. ......................................... 37
Figure 4.5: Satellite image of the first experiment site I- 80 near Berkeley, CA. The red lines represent the locations of the VTLs, the blue squares show the speed recorded by the VTL, and the green squares represent the position and speed stored in the phone log. The brown circles represent the readings from the vehicle speedometer. .............................................................. 38
Figure 4.6: I880 Highway Segment for Twenty Car Experiment. .................................................. 40
Figure 4.7: Experimental Setup in a Car for Twenty Car Experiment. .......................................... 40 List of Figures
Mobile Century xii
Figure 4.8: Speed Measurements over Distance. ......................................................................... 41
Figure 4.9: Speed Measurements over Time. ............................................................................... 42
Figure 4.10: Linking prediction on a straight highway section. .................................................... 43
Figure 4.11: Minimum Spacing Constraints for Straight Highway Section. .................................. 44
Figure 4.12: Linking attack near an on- ramp. ............................................................................... 45
Figure 4.13: Actual travel times compared with an estimate given by the instantaneous method ( 30 second aggregation interval). ................................................................................................. 47
Figure 4.14: Travel time estimate errors by different sampling intervals using 15 VTLs. ............ 49
Figure 4.15: Comparison between VTL- based spatial sampling and temporal periodic sampling against the same number of total anonymous samples. ............................................................. 49
Figure 4.16: Exclusion Area on Test Road Segment. Tracking starts from the point marked by star. ............................................................................................................................... ............... 50
Figure 4.17: Spatial sampling and the benefit of an exclusion zone. ........................................... 51
Figure 4.18: Travel time accuracy plotted vs. VTL spacing. .......................................................... 52
Figure 5.1 Highway US101 S used for the NGSIM dataset. .......................................................... 72
Figure 5.2 Schematic of the sampling strategy on equipped vehicle n. ....................................... 73
Figure 5.3. Vehicle accumulation per cell for a) ground truth, and b) EDO case. ........................ 75
Figure 5.4 Vehicle accumulation ( vehicles per cell) estimated using Newtonian relaxation method ( left) and Kalman Filtering techniques ( right) for scenarios 1 ( top), 3 ( middle), and 9 ( bottom). ............................................................................................................................... ....... 76
Figure 5.5 Total vehicle accumulation on the entire section. ..................................................... 77
Figure 5.6 Percentage of Improvement ( PoI) in the RMSE as the number of Lagrangian measurements increases: ( a) computing observed density using the fundamental diagram ( Section 5.2.3), ( b) using the actual density computed from vehicle trajectories as the observed density. Note that the two graphs are at different scales. ........................................................... 79 List of Figures
Mobile Century xiii
Figure 5.7 True density ( computed from vehicle trajectories) versus Observed density ( computed using the fundamental diagram as described in Section 5.2.3) for scenario 3 ( left) and 9 ( right). ............................................................................................................................... . 80
Figure 6.1: Greenshields model. Left: Classical fundamental diagram ( parabolic). Center: Linear relation between speed and density. Right: Flux function for the LWR- v PDE ( 6- 4). The flux is parabolic with negative values. .................................................................................................... 86
Figure 6.2: Paramics velocity contours. Top: Ground truth velocity contour average across all vehicles. Bottom: Estimated velocity contour from the EnKF CTM- v algorithm ( 6- 19) through ( 6- 24) at 5% penetration rate. X- axis: position along highway in milepost; Y- axis: time of day. . 91
Figure 6.3: Error comparison of the EnKF CTM- v scheme, equations ( 6- 19) through ( 6- 24), ( solid) and the averaging scheme ( 6- 25), ( dashed) using Paramics. Top: Relative error computed from ( 6- 26) as a function of penetration rate. Bottom: Absolute error computed from ( 6- 27) as a function of penetration rate. ........................................................................................................ 93
Figure 7.1: Source image for resolution analysis. ......................................................................... 95
Figure 7.2: Simulated resolution for three camera types, assuming one camera per lane. ........ 96
Figure 7.3: Simulated resolution for four camera types, assuming one camera for five lanes. .. 96
Figure 7.4: Image of license plate with blurring to simulate 2.0 cm resolution ........................... 97
Figure 7.5: Comparison of standard vs. HD camcorder image quality ......................................... 98
Figure 7.6: Video frame with fence mesh between lanes. ......................................................... 100
Figure 7.7: HD Camcorder recording mode. ............................................................................... 100
Figure 7.8: Shutter priority switch. ............................................................................................. 101
Figure 7.9: Shutter control vs aperture control. ......................................................................... 101
Figure 7.10: Focus control........................................................................................................... 102
Figure 7.11: Custom image effects. ............................................................................................ 102
Figure 7.12: Controls for contrast and sharpness....................................................................... 103
Figure 7.13: Camera deployment during MC experiment. Stevenson and I880 ( South bridge), Decoto and I880 ( Central bridge), Winton and I880 ( North bridge). ......................................... 104 List of Figures
Mobile Century xiv
Figure 8.1: Traffic monitoring infrastructure built for field experiment. ................................... 109
Figure 8.2: Layout of Base Camp on the Union Landing Parking Lot. ......................................... 110
Figure 8.3: Tent Layout. .............................................................................................................. 111
Figure 8.4: Study site and driver routes. ..................................................................................... 112
Figure 8.5: Assigning drivers to routes and cars. ........................................................................ 113
Figure 8.6: Drivers logistics ......................................................................................................... 117
Figure 9.1: Snapshot of the live traffic feed provided by the system in the present work (and from 511.org in the inset) at 10:52 am on February 8, 2008. Traffic conditions after an incident on the northbound direction of I-880 are displayed. Numbers in circles correspond to speed in mph.
Figure 9.2: Vehicle trajectories in the northbound direction extracted from the data stored by 50% of the cell phones. The propagation of the shockwave from the accident can clearly be identified from this plot. The red lines in the close-up were drawn by hand by fitting a line through the points where trajectories change slope.
Figure 9.3: Loop detector locations along the northbound direction. Numbers indicating mileposts increase in the direction of traffic flow from left to right.
Figure 9.4: Velocity fields in mph using: (a) 17 loop detector stations; (b) vehicle trajectories and Edie’s generalized definition; (c) 17 VTLs at the loop detector locations; and (d) 30 equally spaced VTLs.
Figure 9.5: Travel time (in minutes) between Decoto Rd. and Winton Ave. The x-axis indicates arrival time at Decoto Rd. Dots correspond to individual vehicle travel times (4268 in total), collected manually using video. Black dash-dotted lines correspond to the standard deviation of travel times obtained from video cameras in 5-minute windows.
Figure 9.6: Loop detector vs. VTL velocity measurements (all locations). Dotted lines are the 5 mph thresholds.
Figure 9.7: Loop detector and VTL velocity data collected at: (a) milepost 21.3, downstream of Decoto Rd.; (b) milepost 22.5, half-way between Alvarado Blvd. and Alvarado Niles Rd.; (c) milepost 27.3, the most downstream detector near the Winton Ave. exit; and (d) milepost 24.0, downstream of Whipple Rd. Subfigure (e) shows the penetration rate at each of these four locations during the day.
Figure 9.8: Penetration rate map.
Figure 9.9: (a) and (c): Average penetration rate over time at existing detector station locations during the morning and the afternoon. The range is one standard deviation below and over the mean. Traffic flows from left to right. (b) and (d): Histogram of the penetration rate including all the 17 locations during the morning routes and the afternoon routes, respectively.
Figure 10.1: Density field (in vpm) using 17 loop detector stations deployed in the section of interest (obtained through PeMS).
Figure 10.2: Density field (in vpm) using the Newtonian relaxation method (left) and the Kalman Filtering techniques (right) for scenario 1 (top), 2 (middle), and 3 (bottom). For each scenario, the boundary data is provided by loop detectors.
Figure 10.3: Flow comparison at mileposts (a) 21.3 (detector 1), (b) 24 (detector 7), and (c) 25.2 (detector 10).
Figure 10.4: Results of the quantitative analysis for the Newtonian relaxation method and the Kalman Filtering technique for scenario 1 (left), 2 (center), and 3 (right). Results obtained using loop detector data are also included for comparison.
A2.2 Figure 1: The 4 mile section to be used in the 20 cars experiment.
A2.2 Figure 2: Detail of the Alvarado-Niles interchange and the mall.
A2.2 Figure 3: Detail of the Tennyson Rd. interchange.
A2.2 Figure 4: Detail of the SR92 interchange.
A2.2 Figure 5: From I880S and I880N to the parking lot.
A2.2 Figure 6: Lane numbering.
A2.3 Figure 1: From I880S and I880M to the parking lot.
A2.3 Figure 2: From the parking lot onto the freeway.
A2.3 Figure 3: Lane numbering.
A2.3 Figure 4: Turn back at CA-92 (north end of the long loop).
A2.3 Figure 5: Turn back at Alvarado-Niles Rd. (south end of both the long and short loops).
A2.3 Figure 6: Turn back at Alvarado Blvd./Fremont Blvd. in case driver misses exit 23.
A2.3 Figure 7: Turn back at Winton Ave. in case driver misses off-ramp to CA-92W.
A2.4 Figure 1: From I-880S and I-880M to the parking lot.
A2.4 Figure 2: From the parking lot onto the freeway.
A2.4 Figure 3: Lane numbering.
A2.4 Figure 4: Turn back at CA-92 (north end of the long loop).
A2.4 Figure 5: Turn back at W Tennyson Rd. (north end of the short loop).
A2.4 Figure 6: Turn back at Alvarado-Niles Rd. (south end of both the long and short loops).
A2.4 Figure 7: Turn back at Alvarado Blvd./Fremont Blvd. in case driver misses exit 23.
A2.4 Figure 8: Turn back at Winton Ave. in case driver misses off-ramp to CA-92W.
A3 Figure 1: Architecture of mobile probe data collection / travel time calculation system
Mobile Century
Final Report for TO 1021 & TO 1029: A Traffic Sensing Field Experiment Using GPS Mobile Phones
Prepared by:
Anthony D. Patire
Alexandre M. Bayen
Daniel B. Work
Juan C. Herrera
Ryan Herring
Xuegang (Jeff) Ban
Quinn Jacobson
Olli-Pekka Tossavainen
Sebastien Blandin
Christian Claudel
Ali Mortazavi
Steve Andrews
Baik Hoh
Marco Gruteser
Murali Annavaram
Toch Iwuchukwu
Kenneth Tracton
For:
California Department of Transportation
Division of Research and Innovation
California Center for Innovative Transportation
University of California, Berkeley
2105 Bancroft Way, Suite 300, Berkeley, CA 94720-3830
Phone: (510) 642-4522 Fax: (510) 642-0910 http://www.calccit.org
Chapter 1
1 Introduction
This final report for TO 1021 and TO 1029 documents the Mobile Century Project. At its heart, it is a narrative of painstaking preparation that culminated in an unprecedented deployment of 100 probe vehicles, and of the subsequent analysis of the collected data. In a larger context, this is a story about how a single experiment shaped the future paradigm of traffic monitoring.
This chapter begins with a discussion motivating the line of inquiry pursued in this work. A brief historical narrative follows, describing the initial actors, task orders, and subsequent amendments that shaped the outcome of Mobile Century. Within this context, the scope of this report is defined.
Key findings are summarized, and their significance is explained. In particular, the most promising directions for future research are identified, and near-term possibilities are explored. The chapter concludes with a description of the organization of this report.
1.1 Motivation
Expanding the scope and coverage of roadway Advanced Traveler Information Systems (ATIS) is a top priority for Caltrans. Statements supporting more and better traveler information across the state of California have come all the way from the Governor’s office.
ATIS benefits the transportation system for at least two reasons. First, the availability of information enhances the service provided to travelers. Numerous studies reveal that commuters appreciate and value timely information, which reduces their uncertainty and their stress. Second, reliable information can arguably enable travelers to make educated choices about their itinerary, departure time or even transportation mode, with the result of bringing about system self-management. It remains to be established that system self-management can take place on a large scale and significantly impact network-level operations. However, at a more anecdotal level, information about an accident ahead or a scheduled ramp closure certainly influences driver decisions. An additional side benefit of ATIS is that it builds the awareness of the traveling public toward Intelligent Transportation Systems (ITS). Such awareness can translate into political support for ITS projects and enable more improvements in the long term.
Travel time estimates are undoubtedly one of the main pieces of ATIS content. Travel times on selected itineraries represent information that is easy for the traveling public to understand and process. Travel times can be posted on freeway or arterial Changeable Message Signs (CMS) and reach a very large audience, as is currently done at dozens of locations in the San Francisco Bay Area and in Southern California. Estimating travel times, either at the present time or into the future, requires large amounts of good quality traffic data. Traditionally, traffic data is collected by sensors such as inductive loops installed at fixed locations. While this method works well for estimating volume and occupancy, it does not provide accurate travel time information unless the sensor coverage is very dense. Traffic sensors are also expensive to install and maintain. Therefore, except for some of the busiest corridors in the state, the data collected from fixed sensors is mostly inadequate for travel time estimation.
Appraising various methods of collecting traffic data and specifically travel times is a definite need for Caltrans. Besides providing the bulk of the content required for ATIS, travel times also represent precious data to Caltrans as a network operator. While travel times alone may not cover the full extent of the department’s traffic data needs, accurate and reliable travel times can be used for both planning and operations purposes.
Over the past few years, a number of private industry vendors have approached Caltrans with solutions to collect travel time data on highways and city arterials. Solutions revolve around two basic concepts and trends. The first trend suggests leveraging new technologies that significantly lower the cost of fixed detection. Both in-pavement technologies such as wireless magnetometers from Sensys Networks, Inc. and off-pavement technologies such as radar-based sensors by Speedinfo, Inc. offer much more attractive price points than inductive loops and make it conceivable to augment detection to a level that would yield accurate traffic maps and travel time estimates. An alternative concept is to use so-called mobile traffic probes to measure travel times from actual trips. Mobile traffic probes are essentially vehicles that are tagged and tracked along a corridor. This concept can be implemented by toll collection tags and readers, or by automated license plate readers. In either of those two cases, travel times are collected for preset segments of roadways in between readers. For instance, the San Francisco Bay Area 511 system relies in large part on data collected from FasTrak readers.
In the past several years, cell-phone based technology has gained momentum as a promising avenue, although previous research and field tests have not been conclusive. This technology previously relied on positioning provided by cellular networks, which still has to overcome significant challenges. However, the introduction of GPS receiver chips into more and more handsets represents a new opportunity. The prospect of large numbers of GPS-equipped cell phones reporting position and speed with 10 meter / 3 mph accuracy at regular intervals represents a huge leap forward. Yet its implementation requires addressing key questions regarding individual privacy, data ownership, network load, and proper traffic flow estimation techniques. The emergence of new media for disseminating traffic information, such as in-vehicle telematics displays and GPS-equipped cell phones, could bring about a shift in how travelers perceive and consume traffic information in years to come. As the State Department of Transportation, Caltrans needs to monitor and leverage this paradigm shift.
1.2 Historical Narrative
The present work originated from several distinct sources. One group consisted of scientists at the University of California, Berkeley (UCB) who were investigating applications of mobile sensors. Another group from Nokia Research Center (NRC) in Palo Alto was interested in social networking and location-based services that protect the privacy of participants. These two groups identified traffic monitoring as a potential area of overlap. Supported by a seed grant from the Center for Information Technology Research in the Interest of Society (CITRIS), they began to explore the rich milieu of research questions involving conflicting needs for both data collection and privacy preservation. Soon afterward, Nokia awarded the UCB group another seed grant that was matched by the University of California under the MICRO program.
Independently of the aforementioned groups, another team at the California Center for Innovative Transportation (CCIT) wrote a proposal in December of 2006, entitled “Deployment of value-added mobile traffic probes.” This proposal was funded by Caltrans in January of 2007 under Task Order 1021. Although details of TO 1021 evolved over the years, the spirit remained unchanged.
The proposal identifies four crucial technologies for ATIS: (1) GPS, (2) GIS and digital maps, (3) internet networking, and (4) wireless data communications. In addition, the proposal notes that these technologies have achieved levels of maturity and affordability that place the ATIS industry “on the verge of an unprecedented boom.”
Almost presciently, the proposal speculates that “research under this project may serve in creating a paradigm shift in traffic data collection, from fixed sensors paid for by the state to mobile probes deployed as part of a self-sustaining private business model.”
The proposal further notes that “in the past several years, cell-phone based technology has gained momentum as a promising avenue, although previous research and field tests are still not conclusive. Yet the salient point is that a 2-way communication device like a cell phone can provide data about the entire trip of a vehicle, rather than be limited to observations at specific locations. This potentially represents a considerable improvement over fixed readers in terms of both the resolution and the timeliness of the data being collected.”
Although the original proposal recognized the potential for cell phones as traffic probes, its technology choice was the Dash Navigation Unit. These units each included an accurate GPS receiver, a locally-stored comprehensive digital map, a complete historical traffic model for prediction, significant processing capability, and wireless communications built into a special purpose navigation appliance. Subsequent to the award, however, the provider of the navigation units withdrew from the project.
Under sponsorship from Caltrans, the team from CCIT joined forces with the UCB and NRC groups, who had already formed a strong partnership. The first readjustment of the research plan was to replace the Dash Navigation Unit with the Nokia N95 smartphone. These smartphones had lesser GPS capabilities and lacked the navigational aids made possible with locally-stored historical traffic and digital mapping data. In contrast with the Dash Navigation Unit, the GPS-equipped N95 was more of a general purpose communication device. Note that the N95 was one of the first smartphones ever developed (before the iPhone) and is regarded as a precursor to today’s participatory sensing based traffic monitoring systems. At the time, the limitations of the N95 platform forced the researchers to adopt a more focused and simple deployment plan.
The UCB and NRC groups contributed a substantially improved experimental protocol. For example, the underlying sampling scheme was made privacy-aware, as is further explained in Chapter 4. The privacy studies were funded by Nokia and performed in collaboration with a team from Rutgers University with expertise in the field of privacy research. For now, note that this early attention to privacy steered technology development in a direction more appropriate for widespread adoption. In addition, the details of the mobile probe deployment were improved to enable a scientific analysis of results. Rather than using a remote site, a well-studied section of I-880 was chosen to enable rigorous evaluation of the data quality from the GPS-enabled smartphones.
The outstanding success of the 100-vehicle deployment prompted a further expansion of the research direction. Caltrans allocated a supplemental award (TO 1029), marshaling resources to process the enormous amount of data that were collected during the 100-vehicle deployment. In addition, efforts were refocused toward a second major deployment of probe vehicles at a scale at least one order of magnitude larger than before. New partnerships (including subcontracts with Covaluate, an Information Technology Service Provider, and Rensselaer Technology Institute) were forged in accordance with the new directives. These directives will be the focus of the report on Mobile Millennium, a follow-up to Mobile Century launched soon afterwards.
1.3 Scope
This is the final report for Task Order 1021 and Task Order 1029. The bulk of this work is embodied in what has become known as Mobile Century. In terms of task orders, this refers to the entirety of the work plan for TO 1029 (except for the small section entitled Mobile Millennium Traffic Server Development), and Task 2 as written in Amendment A (replacing the original task order, TO 1021).
All other tasks in TO 1021, TO 1029 and subsequent amendments fall into one of three categories:
(1) AASHTO presentation
(2) Mobile Millennium Planning, Design, and Server Development
(3) Mobile Millennium Arterial Modeling
We note here that the above tasks (2) and (3) related to Mobile Millennium were enormous in scope. The monies allocated from TO 1021 and TO 1029 toward these tasks amounted to a small fraction of the total required to bring these tasks to fruition. Herein we only document initial stages of work toward these tasks funded by TO 1021 and TO 1029. The ultimate success of tasks (2) and (3) lies outside the scope of Mobile Century, and will be addressed in the forthcoming Final Report for Mobile Millennium (Agreement 65A0301). Tasks (1), (2), and (3) as they are relevant to TO 1021 and TO 1029 are reported in Appendix 8.
1.4 Summary of Findings
Exploring a new paradigm. Advances in the mobile internet, alongside current trends in cell phone technology and location-based services, place the field of traffic monitoring at the cusp of a new era in data collection. A new paradigm in which GPS-enabled smartphones supply the bulk of raw traffic monitoring data is very promising. Compared to the status quo, the costs of collecting traffic information would be drastically reduced, and coverage could be extended far beyond what is currently feasible with fixed detectors alone. The opportunity for government agencies is significant: the availability of data will empower the traveling public with real-time access to current traffic conditions, while transportation operators will gain access to an unprecedented wealth of information to help them better manage road networks.
A successful proof-of-concept. The successful 100-vehicle deployment presented in this report was conceived as a proof-of-concept for a traffic monitoring system based on GPS-enabled mobile phones. During the experimental deployment, an average penetration rate of equipped vehicles was sustained near 3%, which at the time of the experiment was representative of the 18-month growth forecast for the GPS fleet in the smartphone market.
Traffic reconstructed from smartphone data. Raw data from the GPS-enabled smartphones alone were sufficient to infer traffic features, i.e., to construct an accurate velocity map over time and space. Therefore, probe vehicles deployed during the Mobile Century experiment were evaluated as providing substantial added value. Since ground-truth travel times were recovered by re-identifying vehicles captured on videotape, these results can be asserted with high confidence. We conclude that the quality of data obtainable from present-day smartphones is adequate for useful, real-time traffic applications, such as calculating travel times.
VTL-based monitoring. As will be explained in Chapter 4, a data sampling approach using Virtual Trip Lines (VTLs) was designed and implemented. The VTL approach combined with a sustained 3% penetration rate of probes provided better data for travel time prediction than that of the PeMS loop detectors spaced at an average distance of 0.35 mi. Furthermore, the use of VTLs provides enough data for traffic monitoring purposes while protecting the privacy of participants. In addition to the privacy benefits, another key advantage of virtual trip lines over physical traffic sensors is the flexibility with which they can be deployed.
Challenges. As a business model, significant challenges yet remain. For example, participation of the traveling public is crucial for success. In order to create and maintain the desired service quality, a large number of participants must be recruited and sustained. To achieve this, the right incentives for participation are needed. Premature deployment would be counterproductive. One can imagine a worst case scenario in which a deployment fails for lack of public interest and participation.
Future work. The next iteration of this program includes efforts to extend Mobile Century in a number of ways. First, better methods are required for incorporating data from both static (loop detectors) and mobile sensors (GPS-enabled mobile phones). Inverse modeling and data assimilation algorithms aimed at identifying and circumventing potential deficiencies in available data are also necessary. Finally, the monitoring of arterials brings additional challenges that also require much future work. These issues will be explored in a follow-up report for Mobile Millennium.
Deployment. Stated simply, any future deployment will require substantial research and development. We recommend an evolutionary progression of field operational tests, so that lessons learned during any particular iteration may be incorporated into subsequent efforts as the scope of the system is continually scaled up. To make this possible, technology infrastructure must be implemented to support the computational modeling that will be required. In addition, any future deployment effort must include a strong industry component, and proceed in a way that complements existing trends in mobile computing.
Future applications. Future work as a part of this program has direct application toward the strategic goals of Caltrans in the area of operational data collection. Data sharing modalities should be explored between industrial companies, Caltrans, and local public agencies. Although traffic data collected as part of the next iteration will be served back to the mobile phone users who originally generated the data, future iterations may disseminate information much more broadly (broadcast media, internet websites, personal navigation devices, and roadside changeable message signs).
1.5 Organization of Report
The limitations of the status quo, the possibilities enabled by the mobile internet, and the grand-scheme challenges to the new paradigm are introduced in Chapter 2. A substantial literature review is furnished in the broader context of these issues; this discussion applies beyond Mobile Century.
In contrast, Chapter 3 focuses specifically on how to build a medium-scale, one-day deployment of a proof-of-concept system. Efforts to determine the parameters of a workable experiment, and initial back-of-the-envelope calculations are described, thus setting the stage for everything that follows in this report.
Chapter 4 explores the challenge of building a traffic monitoring system that addresses the goals of (1) acquiring quality real-time probe data from GPS-enabled cell phones, and (2) protecting participants from privacy threats by design. Initial prototyping was performed to assess the software and hardware systems that were implemented and the quality of data obtained from the smartphones.
Assuming that real-time probe data is available, Chapters 5 and 6 describe algorithms to reconstruct the traffic state from that data. Assimilating Lagrangian data in the density domain is the subject of Chapter 5. An alternate scheme is presented in Chapter 6, in which Lagrangian data is fed directly into a velocity domain model. This velocity domain model was ultimately the one adopted for the real-time, 100-vehicle deployment of February 8, 2008.
Chapter 7 stands alone as a narrative of the video validation effort. The selection of video camcorders, trial tests, deployment protocol, and post-processing procedures are described. This crucial aspect of Mobile Century is what made possible an objective comparison of the state-of-the-art, status quo monitoring system (based on inductive loop detectors) with the proof-of-concept implementation (based entirely on mobile probes).
Chapter 8 gives an overview of the experimental protocol for the 100-vehicle deployment, covering the resources employed, the procedures established, the data gathered, and the post-processing efforts. Supplementary material, including more detailed logistics, schedule of execution, and emergency procedures, is furnished in the Appendices.
Chapter 9 presents the experimental results of the 100-vehicle deployment, the cornerstone of Mobile Century. The trajectory data and reconstructed velocity fields are compared with the ground truth supplied by the video cameras. Travel time estimated from loop detectors is compared with that estimated from probe data.
Chapter 10 revisits the density domain data assimilation methods that were introduced in Chapter 5. Additional evaluation of these methods is performed using the data from the 100-vehicle deployment.
This report concludes in Chapter 11 with an evaluation of the project and recommendations for future deployment of the new paradigm for traffic monitoring.
Chapter 2
2 Background
This chapter begins with a discussion of the status quo in Section 2.1: traffic monitoring with dedicated equipment and sensing infrastructure. Experience shows that deploying, operating, and maintaining new technology in this paradigm is costly and slow. As explained in Section 2.2, advances in the mobile internet bring forth potential to leverage current trends in cell phone technology and location-based services to transcend the old paradigm. Discussed in Section 2.3 are the barriers that impede adoption of this new paradigm. In particular, technical barriers, and social acceptance issues related to privacy concerns are addressed.
2.1 Traffic Monitoring with Dedicated Equipment (Road Infrastructure)
Traffic monitoring with inductive loop detector (ILD) systems. ILD systems are the most common highway traffic monitoring tool, and have been in use for decades. The current highway monitoring system consists of wire inductive loops placed directly in the top layer of the pavement. When a vehicle passes over the sensor, it is recorded by a roadside controller. In the case of travel time (the most important performance metric to the driving public), these sensors suffer from some fundamental drawbacks.
ILD velocity estimation is inaccurate. ILDs are accurate sensors for flows (vehicle counts), but they often generate inaccurate velocity measurements. California's freeways are equipped with about 23,000 ILDs embedded in the pavement, accounting for roughly 8000 detector stations. Several of these stations feature a single inductive loop per lane, which cannot measure vehicle speed directly. Practitioners have attempted to create aggregate velocity estimates using the average length of a vehicle on the highway and the percentage of time the sensor is occupied. Even when the sensor is working properly, these estimates are particularly noisy (with estimates ranging from 20 mph to 120 mph) for traffic flowing at greater than 50 mph [24]. This has led researchers to develop algorithms to improve these single loop estimates [24, 49, 64, 83, 93]. In contrast, dual loops (composed of two successive inductive loops) compute velocity by matching the respective occupancy patterns. In practice, they also have been found to produce significant errors [24].
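The single-loop estimate described above can be illustrated with a minimal sketch. It assumes the common textbook relation speed = flow x effective vehicle length / occupancy; the function name, the 20 ft default length, and the sample numbers are illustrative assumptions, not values taken from this report or from PeMS.

```python
FEET_PER_MILE = 5280.0


def single_loop_speed_mph(flow_vph, occupancy, effective_length_ft=20.0):
    """Estimate mean speed (mph) from single-loop flow and occupancy.

    flow_vph            -- vehicle count scaled to vehicles per hour
    occupancy           -- fraction of time the loop is occupied, in (0, 1]
    effective_length_ft -- ASSUMED average vehicle length plus loop length;
                           20 ft is a hypothetical default, not a calibrated value
    """
    if not 0.0 < occupancy <= 1.0:
        raise ValueError("occupancy must be a fraction in (0, 1]")
    return flow_vph * (effective_length_ft / FEET_PER_MILE) / occupancy


# Same measurements, two different assumed vehicle lengths.
assumed_20ft = single_loop_speed_mph(1800, 0.10, effective_length_ft=20.0)
assumed_16ft = single_loop_speed_mph(1800, 0.10, effective_length_ft=16.0)
print(round(assumed_20ft, 1), round(assumed_16ft, 1))
```

Because the estimate scales linearly with the assumed vehicle length, any mismatch between that assumption and the actual traffic mix carries straight through to the speed, which is one plausible reason such estimates are noisy.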
Loop detector stations are expensive to deploy and maintain. The cost of an ILD is roughly $900-$2000 depending on the type of the loop. More importantly, the direct and indirect costs of deployment are significant (staff to install the sensors, and corresponding impact on traffic). According to the PeMS system [91], only 65% of the detectors in California are working properly; the main causes of malfunction are problems with the controller. In [90] malfunction rates of loop detectors and their causes were studied using data obtained from loops on the same stretch of I-880 examined in the present work. The average malfunction rate was 21%, despite significant efforts to maintain system operations during the study.
RFID transponders for travel time measurements. Radio-frequency identification (RFID) transponders are often deployed to collect automatic toll payment, such as FasTrak in California or E-ZPass in some states on the East coast. These transponders can also be used to obtain individual travel times based on vehicle re-identification [10, 116]. Readers located on the side of the road keep a record of the time the transponder (i.e., the vehicle) crosses that location. Measurements from the same vehicle are matched between consecutive readers to obtain travel time. This technology is successful only when drivers have an incentive to carry the transponder (such as shorter toll booth queues), and can only provide travel times for segments between locations where readers have been deployed.
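The matching of measurements between consecutive readers can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by any toll or 511 system: the record layout (tag, timestamp in seconds), the function name, and the one-hour cutoff for discarding implausible matches are all hypothetical.

```python
def match_travel_times(upstream, downstream, max_travel_time_s=3600.0):
    """Pair reader sightings of the same tag to obtain travel times.

    upstream, downstream -- iterables of (tag_id, timestamp_s) records
    max_travel_time_s    -- ASSUMED cutoff; longer gaps are treated as
                            detours or stops and discarded
    Returns a list of (tag_id, upstream_time_s, travel_time_s).
    """
    # Merge both readers into one time-ordered event stream
    # (kind 0 = upstream sighting, kind 1 = downstream sighting).
    events = [(t, 0, tag) for tag, t in upstream]
    events += [(t, 1, tag) for tag, t in downstream]
    events.sort()

    last_upstream = {}  # tag_id -> most recent unmatched upstream time
    matches = []
    for t, kind, tag in events:
        if kind == 0:
            last_upstream[tag] = t
        else:
            t_up = last_upstream.pop(tag, None)
            if t_up is not None and t - t_up <= max_travel_time_s:
                matches.append((tag, t_up, t - t_up))
    return matches


upstream = [("A", 0.0), ("B", 10.0), ("C", 20.0)]
downstream = [("D", 100.0), ("A", 300.0), ("C", 5000.0)]
print(match_travel_times(upstream, downstream))
```

In this example only tag A yields a travel time: B is never seen downstream, D was never seen upstream, and C exceeds the assumed cutoff. Processing events in time order also lets a vehicle that passes the same pair of readers repeatedly contribute one travel time per pass.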
Travel time measurement through LPR technology. License plate readers consist of high speed cameras that record the license plates of vehicles on the highway. As a vehicle passes multiple cameras, the travel time between the readers is computed. Although LPRs avoid the need for in- vehicle equipment, these systems are complicated to install, and require an additional camera for each lane of traffic to be monitored. The relatively high cost of the readers ( in the $ 10,000 range plus installation costs), have limited their widespread implementation. Example deployments include Traffic Master’s passive target flow management ( PTFM) on trunk roads in the United Kingdom [ 112], and Oregon DOT’s Frontier Travel Time project [ 16].
Traffic monitoring with dedicated probe vehicles. Dedicated probe vehicles equipped with a Global Positioning System (GPS) device are capable of collecting information such as position, speed, and travel time. The work in [100] addressed some of the key issues of a traffic monitoring system based on probe vehicle reports, and concluded that they constitute a feasible source of traffic data. The work in [123] also investigated the use of GPS devices as a source of data for traffic monitoring. Two tests were performed to evaluate the accuracy of GPS as a source of velocity and acceleration data. The accuracy level was found to be good, despite limitations of the selective availability¹ feature that was imposed at the time of the study [92].
Deployment of probe vehicles. HICOMP [52] is an example of a small-scale deployment of dedicated probe vehicles using GPS devices to monitor traffic on some freeways and major highways in California. Unfortunately, dedicated probe vehicles equipped with a GPS device represent an added cost that cannot be sustained at a global scale. As pointed out by [74], the penetration of HICOMP is low and the collected travel times are not as reliable as those of other systems such as PeMS. Other approaches have investigated the possibility of using dedicated fleets of vehicles equipped with GPS or automatic vehicle location (AVL) technology, such as FedEx and UPS trucks, taxis, buses, or other dedicated vehicles, to monitor traffic [17, 85, 102]. While industry models, for example Inrix, have been successful at gathering substantial amounts of historical data using this strategy, the use of dedicated fleets always poses issues of coverage, penetration, and bias due to operational constraints and specific travel patterns. Nevertheless, it appears to be a viable source of data, particularly in large cities.
1 Selective availability is the intentional inclusion of positioning error in civilian GPS receivers. It was introduced by the U.S. Department of Defense to prevent these devices from being used in a military attack on the U.S. This feature was turned off on May 1, 2000.
Deployment of dedicated communication systems is slow. One policy intended to enable dedicated communication systems for the transportation network has achieved only limited deployment. On October 21, 1999, the Federal Communications Commission allocated 75 MHz of spectrum as part of the US Department of Transportation’s (DOT) Intelligent Transportation Systems (ITS) US-wide program, with mostly traveler safety, fuel efficiency, and pollution in mind. The first industry-government supported standard followed on August 24, 2001, when ASTM’s E17.51 Standards Committee voted 20-2 to base Dedicated Short Range Communication (DSRC) on a modification of the IEEE 802.11a specification, now named IEEE 802.11p. At the same time, the US DOT launched a plan that included the deployment of around 250,000 roadside DSRC radios, but the plan had led to only around 100 radios deployed across the entire US as of 2008 (mostly in Michigan and California). This example highlights the difficulty of creating a dedicated communication system for the transportation network.
2.2 Growth of the Mobile Internet
Smartphones as sensors of the built environment. The convergence of communication and sensing on multimedia platforms such as smartphones provides the engineering community with unprecedented monitoring capabilities. Smartphones include a video camera, numerous sensors ( accelerometers, magnetometers, light sensors, GPS, microphones), wireless communication outlets ( GSM, GPRS, WiFi, Bluetooth, infrared), computational power and memory. With the rise of the Android and the iPhone, this trend has now greatly accelerated. These phones can be used to listen to the radio, to watch digital TV, to browse the internet, to do video conferencing, to scan barcodes, to read PDFs, and the list is endless. The rapid penetration of GPS in smartphones is enabling device geopositioning and context awareness, which in turn is causing an explosion of Location Based Services ( heavily relying on mapping) on the devices. For example, Nokia Maps displays theaters and museums near the phone, Google Mobile provides driving directions from the phone location, and the iPhone Travelocity shows hotels near the phone. Due to their portability, computation, and communication capabilities, smartphones are becoming useful for numerous applications in which they act as sensors moving with humans embedded in the built infrastructure. Large scale applications include everything from population migration tracking and traffic flow estimation to physical activity monitoring for assisted living.
The competition for probe traffic data collection as a proxy for the larger war to conquer the mobile internet. Competition has been intensifying among cell phone manufacturers, network providers, internet service providers, computer and software manufacturers, and mapping companies. Following the transition from desktops to laptops to smaller and more portable devices, top companies in these industries are redefining themselves to remain relevant as the internet goes mobile. In the context of traffic monitoring, the examples below show the importance of information technology for transportation systems. In late 2007, Google made a move toward the phone industry with the launch of the Open Handset Alliance and the Linux-based Android platform (leading to the T-Mobile G1 Google phone). In part because of the pressure to use open platforms enhanced by the Google OS, Nokia, which manufactures 40% of the cell phones in the world, purchased Symbian, which licenses the operating system running on more than half of the smartphones in the world. Nokia then established the Symbian Foundation, with the intention of unifying the platform and making it open-source (Apple also partially opened its iPhone OS to software developers with the release of a software development kit). To strengthen its own mapping capabilities, Nokia also bought Navteq, which is the largest mapping company in the world, following personal navigation device manufacturer TomTom’s purchase of Tele Atlas, Navteq’s chief competitor. Navteq in turn owns Traffic.com, one of the leading traffic data collection and broadcast companies. Its competitors include Inrix, which provides traffic data to Microsoft’s web, desktop, and mobile applications.
Smartphones: a transformation from dedicated infrastructure to market-driven technology. The scale at which cell phones are produced, and the rate at which they integrate new technology, is dramatic. The total number of cell phones worldwide exceeded three billion at the time of this project. Some European countries have a penetration rate of more than 150% (150 cell phones per 100 people), and forecasts pointed to 1 billion smartphones by 2012. Nokia alone produces more than 13 phones a second; with the increasing penetration of GPS in the cellular phone fleet, cell phones will soon constitute one of the major traffic information sources available to the public. In North America and Europe, the overwhelming majority of commuters have a cell phone, potentially populating the entire arterial network with probe traffic sensors. The use of cellular devices as traffic sensors has numerous benefits: (1) it is possible to leverage the market-driven communication infrastructure already in place; (2) the spatio-temporal penetration of cell phones in the transportation network is increasing at an extremely fast pace; (3) the use of cell phones as traffic probes is device and carrier agnostic, leading to faster penetration; and (4) major car manufacturing companies already have cradles and interfaces with cell phones (for example BMW and the iPhone) in their new cars, so the sensing information gathered by modern cars can also be sent to such monitoring systems.
2.3 Barriers to a New Paradigm
New paradigm. The concept of using cell phones as sensors has the potential to usher in a new paradigm for traffic monitoring. Such systems promise to significantly improve the coverage and timeliness of traffic information [5, 56, 58]. Near-term solutions will require GPS measurements to be fused with traditional sources of traffic information such as loop detectors, cameras, and human reports. However, with sufficient penetration, this approach could potentially enable the collection of real-time traffic information over the complete road network, including arterials, at minimal cost for transportation agencies.
Barriers. Several studies have demonstrated the feasibility of probe based traffic estimation through analysis, simulations, and experiments [29, 37, 111, 120]. Yet many challenges, both technical and societal, must be addressed.
2.3.1 Technical Barriers
Non- GPS based localization of cellphones is problematic. Multiple technological solutions exist to overcome the localization problem using cell phones. Historically, the seminal approach chosen for monitoring vehicle motion using cell phones ( prior to the rapid penetration of GPS in cellular devices) uses cell tower signal information to identify a handset’s location. This technique usually relies on triangulation, trilateration, tower hand- offs, or a combination of these. Several studies have investigated the use of mobile phones for traffic monitoring using this approach [ 13, 38, 81, 115, 117]. The fundamental challenge in using cell tower information for estimating position and motion of vehicles is the inherent inaccuracy of the method, which poses significant difficulties to the computation of speed. Several solutions have been implemented to circumvent this difficulty, in particular by the company Airsage, which historically developed its traffic monitoring infrastructure based on cell tower information [ 80, 104]. Based on the time difference between two positions, average link travel time and speed can be estimated. [ 119] conducted a field experiment to compare the performance of cell phones and GPS devices for traffic monitoring. The study concluded that GPS technology is more accurate than cell tower signals for tracking purposes. In addition, the low positioning accuracy of non- GPS based methods prevents its massive use for monitoring purposes, especially in places with complex road geometries. Also, while travel times for large spatio- temporal scales can be obtained from such methods, other traffic variables of interest, such as instantaneous velocity are more challenging to obtain accurately.
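The speed computation described above, from two positions and their time difference, can be sketched as follows. This is a minimal illustration and not the method of any cited system: the function names, the (time, latitude, longitude) tuple layout, and the spherical-Earth haversine distance are assumptions made for the example. It also makes the accuracy issue concrete: if each position can be off by e meters, the speed estimated over an interval of Δt seconds can be off by as much as 2e/Δt.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_mps(fix_a, fix_b):
    """Average speed in m/s between two (time_s, lat_deg, lon_deg) fixes."""
    (t1, lat1, lon1), (t2, lat2, lon2) = fix_a, fix_b
    if t2 <= t1:
        raise ValueError("fixes must be in strictly increasing time order")
    return haversine_m(lat1, lon1, lat2, lon2) / (t2 - t1)


# Hypothetical fixes 10 s apart on the equator, 0.001 degrees of longitude apart.
speed = average_speed_mps((0.0, 0.0, 0.0), (10.0, 0.0, 0.001))
```

With cell tower positioning errors of hundreds of meters, the 2e/Δt bound is large for short intervals, which is consistent with the observation that only travel times over large spatio-temporal scales are recoverable from such methods.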
GPS based localization provides high quality data. Increasing numbers of smartphones or PDAs come with GPS as a standard feature. This technology can provide more accurate location information and thus more accurate traffic data such as speeds and/ or travel times. Additional quantities can potentially be obtained from these devices, such as instantaneous velocity, acceleration, and direction of travel. In [ 38], cell phones are used for traffic monitoring purposes, and the need for GPS- level accuracy position information to compute reasonable estimates of travel time and speed is discussed. Furthermore, [ 118] and [ 119] concluded that if GPS- equipped cell phones are widely used, they will become a more attractive and realistic alternative for traffic monitoring. GPS- enabled mobile phones can potentially provide exhaustive spatial and temporal coverage of the transportation network when there is traffic, with the high positioning accuracy achieved by a GPS receiver.
Lagrangian vs. Eulerian information. While cellular phones provide an ideal bridge between the physical world (vehicle flows and dynamics on the road) and the information world (software systems monitoring the network), there is one major difference between the data collected by cell phones and the traditional data commonly used to estimate traffic in real time: the data collected by phones in cars is Lagrangian, i.e., gathered along vehicle trajectories, and not Eulerian, i.e., control volume based. This poses major challenges in building an information system for a cyberphysical infrastructure such as the transportation network. While a static loop detector or a camera (both Eulerian) can easily capture all vehicles going through the space monitored by the sensor, and therefore infer aggregate quantities (flows, counts, local speed), a Lagrangian sensor can only monitor quantities following the vehicle, without direct access to flows, counts, etc.
Distributed models for the transportation network. Because GPS-enabled phones measure velocity, or travel time between two consecutive GPS readings, constitutive models used to describe the evolution of the system need to incorporate these readings and bypass quantities that cannot be measured (density, flows, counts). The development of such flow models for highways and arterials is still in its infancy. Techniques used for this include partial differential equations, queuing systems, and hybrid system models of flow equations.
Machine learning models to circumvent lack of geographical infrastructure information. Knowledge of signage, traffic light presence, and cycle information is difficult to procure. The presence of stop signs, lights, and their effect on traffic is not available from databases on a US- wide scale. Furthermore, they change too often to be incorporated into flow models. This difficulty has to be circumvented by machine learning algorithms capable of learning the flow features without knowledge of the detailed infrastructure, using techniques such as clustering analysis.
Inverse modeling and data assimilation. In the age of massive data collection, one of the most fundamental theoretical challenges associated with the reconstruction of traffic using mobile data will be the proper use of techniques to incorporate data into flow models or statistical models. The development of these techniques in fields such as oceanography or meteorology is relatively mature. For large-scale infrastructure systems, the state of modeling, model inversion, and computation is still in its infancy, but promises significant breakthroughs in the near future.
Considerations for initially low penetration of equipped vehicles. As suggested in the literature [ 72, 94, 117, 118] field tests are needed to assess the potential of new technologies such as GPS- enabled mobile phones. Test deployments to assess the potential of traffic monitoring using cell phones go back to the advent of GPS on phones. In particular, the study of [ 30] investigates the deployment of 200 vehicles for an extended period of three months and the potential data that can be gathered from it. In light of that study, one of the main issues in experiments or pilot tests is the problem of penetration, i. e. percentage of vehicles equipped vs. total number of vehicles on the road.
Real-time, online and robust availability. Unlike the more permanent Eulerian detectors, to which data quality, reliability and performance indices can be easily attributed, the penetration of cell phones at a given location and time is highly variable. Before this type of monitoring becomes the standard, the participation of the public will be spatially and temporally unpredictable. This means that the algorithms used for estimating traffic must be robust to variability in penetration.
2.3.2 Privacy Issues and Societal Acceptance
Privacy concerns with a new paradigm. Traffic monitoring through GPS- equipped vehicles raises significant privacy concerns, because the external traffic monitoring entity acquires fine- grained movement traces of the probe vehicle drivers. These location traces might reveal sensitive places that drivers have visited, from which, for example, medical conditions, political affiliations, speeding, or potential involvement in traffic accidents could be inferred. Furthermore, the correlation of this data with existing records poses specific threats to the preservation of privacy.
Example of data granularity. A variety of sampling techniques can be used to collect data from GPS enabled mobile devices. In the case of the Nokia N95, the embedded GPS chip- set is capable of producing a time- stamped geo- position ( latitude, longitude, altitude) once every three seconds. From this time and position data, the instantaneous velocity is produced by the phone at the same frequency. Over time, this vehicle trajectory and velocity information produces a rich history of the dynamics of the vehicle and the velocity field through which it evolves.
Risk of unintended re-identification. While this level of detail is particularly useful for traffic estimation, it can be privacy invasive, since the device is ultimately carried by a single user. Even if personally identifiable information in the data is replaced with a randomly chosen ID through a process known as pseudo-anonymization, it is still possible to re-identify individuals from trajectory data. For example, pseudo-anonymous trajectories have been combined with free, publicly available data sets to determine the addresses of participants’ homes [54].
Data value: sensitivity vs. utility. The transmission of high frequency data without regard to location also wastes resources throughout the system, which can pose scalability problems. In addition to disclosing sensitive information, trajectory information on small roadways near users’ homes is of lower value to the general commuting public than information on major thoroughfares such as interstates. Thus, collection of low utility and highly sensitive data should be avoided when sampling using mobile devices.
Spatially aware sampling and privacy. At the heart of such a system, privacy- by- design sampling techniques must be used to prevent privacy invasion. In addition to proper anonymous data collection and encryption, sampling the vehicles at locations which are privacy safe is key to ensuring the ongoing participation of the public that is needed for such a system.
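One way to picture spatially aware sampling is a filter that discards any report taken outside a set of pre-approved road segments. The sketch below is only an illustration of that idea under simplifying assumptions: positions are one-dimensional mileposts along a route, the approved zones are hypothetical (start, end) intervals, and the function names are invented for the example. It is not the sampling scheme developed in this report.

```python
def in_sampling_zone(position_mi, zones):
    """True if a milepost falls inside any approved (start_mi, end_mi) zone."""
    return any(start <= position_mi <= end for start, end in zones)


def privacy_filtered_reports(reports, zones):
    """Keep only (time_s, position_mi, speed_mph) reports taken inside a zone."""
    return [r for r in reports if in_sampling_zone(r[1], zones)]


# Hypothetical zones on a major highway; local streets near the trip ends
# are simply left out of the zone list, so no data is ever sent from them.
approved_zones = [(2.0, 4.0), (6.0, 9.0)]
trip = [(0, 0.5, 25.0), (60, 2.5, 62.0), (120, 5.0, 58.0), (180, 7.0, 40.0)]
kept = privacy_filtered_reports(trip, approved_zones)
```

Filtering on the device, before transmission, also addresses the resource concern raised above, since low utility reports are never sent.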
Disincentives for participation. Realistically, future users will have the option to choose the terms under which they share location information. Without tangible benefits, or safeguards to ensure that an acceptable level of privacy can be guaranteed, adoption will not be widespread among the traveling public.
Current studies regarding privacy concerns are inadequate. Traffic monitoring applications based on a large number of probe vehicles have recently received much attention [21, 57, 120]. None of these works have addressed location privacy concerns in such systems. Since most traffic monitoring applications do not depend on specific identification information about probe vehicles, the anonymization of sensing information has been a solution in practical deployments [58, 59, 110]. Not surprisingly, recent analyses of GPS traces [47, 71, 73] have shown that naive anonymization by simply omitting identifiers from a location dataset does not guarantee anonymity. Unique parts of GPS traces may be exploited to re-identify individuals using multi-target tracking, k-means clustering, or fingerprinting approaches to identify computer systems.
Centralized architectures for privacy protection. Therefore, several stronger protection mechanisms have been investigated. The k- anonymity concept [ 99, 108] provides a guaranteed level of anonymity for a database, although some recent studies [ 69, 82] have identified weaknesses. For location services, the k- anonymity concept has led to the development of centralized architectures that temporally and spatially cloak location- based queries [ 41, 46, 84]. This present work, in comparison, concentrates on providing privacy without requiring a single trustworthy entity.
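For readers unfamiliar with the concept, a table is k-anonymous with respect to a set of quasi-identifiers when every combination of quasi-identifier values that appears in it is shared by at least k records. The sketch below computes that k for a small table of already cloaked location reports. The field names (`cell`, `time_bucket`) and the sample records are hypothetical, and the code illustrates only the definition, not any of the cited cloaking architectures.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same quasi-identifier values (0 for an empty table)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical reports, cloaked to a spatial cell and a time bucket.
reports = [
    {"cell": "A", "time_bucket": 0, "speed_mph": 61},
    {"cell": "A", "time_bucket": 0, "speed_mph": 58},
    {"cell": "A", "time_bucket": 0, "speed_mph": 64},
    {"cell": "B", "time_bucket": 0, "speed_mph": 35},
    {"cell": "B", "time_bucket": 0, "speed_mph": 31},
]
k = k_anonymity(reports, ["cell", "time_bucket"])
```

A cloaking server would coarsen cells or time buckets until the returned value reaches the target k, which is why such schemes depend on user density.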
Other best effort approaches require a trustworthy server. There are many best effort approaches [ 15, 65] that degrade information in a controlled way before releasing it. These approaches can be implemented in a centralized architecture or a decentralized approach. Many best effort approaches successfully preserve the privacy of users in high density areas, but they do not guarantee the privacy regardless of user density and user behavior pattern. [ 55] proposes the uncertainty- aware path cloaking algorithm to provide guaranteed privacy regardless of user density, but this again requires the existence of a trustworthy privacy server.
Perturbation and access control. Anonymous communication systems (e.g., onion routing [31, 43]) use a similar approach of distributing knowledge over several mixes. Random perturbation approaches for privacy-aware data mining [6, 7], which perturb the inputs collected from users to preserve the privacy of data subjects while maintaining the quality of the data, are not applicable to time-series location data: noise with large variance does not preserve sufficient data accuracy, while noise with small variance may be filtered out by tracking algorithms due to the spatio-temporal nature of the data [70]. Access control methods [39, 121] restrict access to data to permitted users. However, these techniques do not fully address the dishonest insider challenge. Further, they are not applicable to business models where the aggregated data is transferred to a third party.
3 Design Challenges
This chapter furnishes an overview of how practical requirements shaped the one- day, experimental deployment of probe vehicles. Logical steps are retraced, and initial back- of- the- envelope calculations are described. The goal is to design an experiment to capture the essence of what a new traffic monitoring paradigm might look like. How might one address the key barriers outlined in Chapter 2? What constitutes a proof- of- concept? How might one build a miniature version of a near- term deployment? Projections suggested that in five to ten years, a substantial fraction of cell phones will be equipped with GPS receivers. In such a world, how accurately can traffic conditions be reconstructed in real- time with only GPS- enabled cell phone data? The experiment was designed to answer these types of questions.
Achieve desired penetration rate by design. A survey of previous work revealed that one key issue of smartphone based systems is the penetration rate. The penetration rate is defined as the fraction of vehicles ( by flow) that act as probes to provide traffic data. The traffic state cannot be estimated to any useful precision when data is too sparse; i. e., the penetration rate is too low. From a traffic monitoring perspective, any successful experiment must maintain a penetration rate above some threshold. Previous studies reported that data coming from about 3% to 5% of the total flow are sufficient to obtain accurate estimates of the travel time [ 100, 115, 119]. This penetration rate is also realistic as a near- term future possibility. The present work distinguishes itself from previous studies in that a sufficiently high penetration rate was achieved by the design approach described in this chapter.
Addressing barriers to the new paradigm. Questions motivated by the discussion of Chapter 2 are reiterated here. The quality of data acquired from a privacy- aware sampling scheme is to be investigated. Is the information content of this data appropriate for reconstructing traffic states in practice? Are the algorithms employed able to reconstruct traffic states with adequate precision? How does the experimental system compare with a state- of- the- art system using ILDs? Specifically, how do both systems compare with ground- truth travel times, and how might one acquire that ground truth?
3.1 Preliminary Investigation
Confirm necessary penetration rate with NGSIM data. Trajectories from the NGSIM² data set provide accurate ground truth for all vehicles traveling along a 2000 ft stretch of expressway for a duration of 45 minutes. Measures such as vehicle accumulation and exact travel times can be calculated directly from the ground truth. The problem to be solved is to sample only a limited amount of information from the original data set, reconstruct the traffic flow, and estimate vehicle accumulations and travel times based on the reconstruction. The estimated accumulations and travel times are compared with the ground truth, and errors are quantified.
2 http://ngsim.camsys.com/
Consider Kalman Filtering and Newtonian relaxation algorithms. The accuracy of the estimates is limited by two factors, the quantity of information, and the algorithms for traffic reconstruction. At this early planning stage, sampling was assumed to occur at a fixed rate in time ( temporal based sampling), and modeling was assumed to be performed in the density domain. Two algorithms were considered, Kalman Filtering and Newtonian relaxation ( the latter is also called the “ nudging method,” borrowed from oceanography). These two methods are described in detail in Chapter 5, and further evaluated in Chapter 10. This chapter furnishes a description of strictly preliminary findings used during the initial design stages of the experiment.
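To give a feel for the difference between the two approaches, the sketch below shows textbook scalar forms of each update applied to the density estimate of a single cell. These are generic illustrations, not the formulations used in this study (those are given in Chapter 5); the function names, the scalar state, and the numerical values are assumptions made for the example. The Kalman update weights the measurement by a gain computed from the current error variances, whereas nudging relaxes the estimate toward the measurement with a fixed, user-chosen coefficient.

```python
def kalman_update(x_prior, p_prior, z, r):
    """Scalar Kalman measurement update.

    x_prior, p_prior: prior state estimate and its error variance
    z, r: measurement and its error variance
    """
    gain = p_prior / (p_prior + r)          # trust in the measurement
    x_post = x_prior + gain * (z - x_prior)
    p_post = (1.0 - gain) * p_prior         # uncertainty shrinks after update
    return x_post, p_post


def nudging_update(x_prior, z, relaxation):
    """Newtonian relaxation: pull the estimate toward the measurement
    by a fixed fraction (0 <= relaxation <= 1)."""
    return x_prior + relaxation * (z - x_prior)


# Hypothetical numbers: model says 10 veh/cell, a probe suggests 14 veh/cell.
x_kf, p_kf = kalman_update(x_prior=10.0, p_prior=4.0, z=14.0, r=4.0)
x_nudge = nudging_update(x_prior=10.0, z=14.0, relaxation=0.25)
```

The fixed coefficient makes nudging cheap and simple to tune by hand, while the Kalman gain adapts as the estimated uncertainty changes.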
Figure 3.1: Vehicle trajectories from NGSIM. Shown in red, a fraction of the trajectories (by flow) is randomly designated as probes.
Problem formulation. The problem is posed in the following way. Assume that a fraction of vehicles on the expressway are equipped with GPS-enabled phones. In Figure 3.1, these equipped vehicles, shown in red, flow along with general expressway traffic, shown in blue. An example trajectory from one probe vehicle is displayed in Figure 3.2. It is assumed that the equipped vehicles can calculate average velocity over a time period Δt. In addition, the probes report their velocity and position once every T seconds. The penetration rate and the sampling interval T determine the quantity of data available to the reconstruction algorithms. For each of the four test scenarios defined in Table 3.1, the number of observation samples arising from each parameter set is listed.
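The sampling setup described above can be sketched in a few lines. This is a simplified illustration, not the procedure used to produce Table 3.1, and it does not attempt to reproduce the observation counts listed there: the function names, the 1 s trajectory resolution, and the synthetic constant-speed trajectory are assumptions made for the example.

```python
import random


def designate_probes(vehicle_ids, penetration, seed=0):
    """Randomly pick a fraction `penetration` of the vehicles as probes."""
    rng = random.Random(seed)               # seeded for repeatability
    ids = sorted(vehicle_ids)
    n_probes = round(penetration * len(ids))
    return set(rng.sample(ids, n_probes))


def probe_reports(trajectory, period_s):
    """Keep one (time_s, position_ft) sample every `period_s` seconds,
    starting from the first sample of the trajectory."""
    t0 = trajectory[0][0]
    return [(t, x) for (t, x) in trajectory if (t - t0) % period_s == 0]


# Synthetic example: 50 vehicles, 20% penetration, reports every 10 s.
probes = designate_probes(range(50), penetration=0.2)
one_trajectory = [(t, 30.0 * t) for t in range(61)]  # 30 ft/s for 60 s
reports = probe_reports(one_trajectory, period_s=10)
```

The total number of observations grows with both the number of designated probes and the reporting frequency, which is the trade-off the four test cases explore.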
Figure 3.2: One vehicle trajectory. Parameters are shown for real- time probe data reports.
Table 3.1: Test cases for traffic reconstruction.
Case   Penetration (%)   T (sec)   Δt (sec)   # of Lagrangian observations
1      5                 600       30         117
2      5                 10        10         773
3      20                600       30         417
4      20                10        10         2577
Reconstruct traffic using measurements distributed in space and time. For the sake of this discussion, it is assumed that a communication network has been implemented to collect data from multiple cell phones and to deliver it to some central server where the algorithms will be run. The observations are dispersed in space and time as shown in Figure 3.3. The immediate goal is to reconstruct traffic flow based on these data. For this purpose, the expressway is discretized and the cell transmission model is employed to solve³ the conservation partial differential equation (PDE), the underlying model for this entire study.⁴
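As background, one time step of a generic cell transmission model can be sketched as follows. This is the standard textbook scheme with a triangular flow-density relation, offered only as an illustration; the discretization and parameters actually used in this study are described in Chapter 5. The function name, argument names, units, and all numerical values are assumptions made for the example.

```python
def ctm_step(density, dt_h, dx_mi, v_free, w_back, k_jam, q_max,
             upstream_demand, downstream_supply):
    """Advance cell densities (veh/mi) by one time step of dt_h hours.

    v_free: free-flow speed (mph); w_back: backward wave speed (mph)
    k_jam: jam density (veh/mi); q_max: capacity (veh/h)
    upstream_demand, downstream_supply: boundary flows (veh/h)
    """
    if v_free * dt_h > dx_mi:
        raise ValueError("time step violates v_free * dt <= dx")
    # Flow each cell can send downstream and accept from upstream.
    sending = [min(v_free * k, q_max) for k in density]
    receiving = [min(q_max, w_back * (k_jam - k)) for k in density]
    # Flow across each of the len(density) + 1 cell boundaries.
    flows = [min(upstream_demand, receiving[0])]
    flows += [min(sending[i], receiving[i + 1])
              for i in range(len(density) - 1)]
    flows.append(min(sending[-1], downstream_supply))
    # Conservation of vehicles: inflow minus outflow changes the density.
    ratio = dt_h / dx_mi
    return [k + ratio * (flows[i] - flows[i + 1])
            for i, k in enumerate(density)]


# Hypothetical section: four 0.2 mi cells, 10 s time step, closed boundaries.
params = dict(dt_h=1 / 360, dx_mi=0.2, v_free=60.0, w_back=15.0,
              k_jam=200.0, q_max=2000.0)
before = [20.0, 40.0, 10.0, 0.0]
after = ctm_step(before, upstream_demand=0.0, downstream_supply=0.0, **params)
```

Because each interior boundary flow is subtracted from one cell and added to the next, vehicles are conserved exactly, which is the discrete counterpart of the conservation PDE.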
Figure 3.3: Spatio- temporal dispersion of probe measurements for different combinations of penetration and sampling rates.
Solving the PDE. Assuming only initial and boundary conditions, the PDE can be solved. This solution is referred to as the EDO solution, for Eulerian data only. Typically, the EDO solution has poor accuracy at spatio-temporal regions far from the initial and boundary conditions. As the PDE solver steps through time, probe data becomes available. This data needs to be incorporated into the solution so as to improve accuracy. Physically, this corresponds to adding or deleting vehicles in the cells so that the solution agrees with the additional data.
3 Methods are described in detail in Chapter 5.
4 Based on traffic flow physics, [79] and [97] independently proposed a first order partial differential equation (referred to as the LWR PDE) to describe traffic evolution over time and space.
Incorporating Lagrangian data in the solution. Two algorithms were used to incorporate the Lagrangian data into the traffic reconstruction. The results were used to calculate measures comparable to the ground truth. Figure 3.4 shows how estimates of vehicular accumulations are improved as Lagrangian data is incorporated. Figure 3.5 shows that as more Lagrangian data is incorporated, the performance of the reconstruction algorithm improves. Figure 3.6 displays estimates of vehicular density at the middle of the modeled expressway section. Figure 3.7 displays estimates of travel time. Details of the mathematical tools employed to generate the estimates shown in these figures are presented in Chapter 5.
Figure 3.4: Vehicle accumulations. Comparison of estimates using the Kalman filter method (top), and Nudging method (bottom). In both cases, the incorporation of Lagrangian data results in improved estimates over those using Eulerian data only.
Figure 3.5: Vehicle accumulations. The higher the sampling rate, the higher the fidelity of the accumulation estimate.
Figure 3.6: Vehicle density estimated at the middle of the modeled expressway section.
Figure 3.7: Travel time estimation for the modeled expressway section.
Discussion of results. The algorithms used here, and described later in the report, incorporate Lagrangian data in the modeling, and the accuracy of the estimates improves as more Lagrangian data are made available. Based on preliminary findings, Kalman Filtering slightly outperforms the nudging method. For this reason, Kalman Filtering was eventually chosen for the actual probe deployment. As indicated above, more details on these methods are furnished in Chapter 5 and Chapter 10.
Sampling rate issues. There is a clear design trade- off between penetration rate and sampling rate. The lower the penetration rate, the greater the necessary sampling rate for a given necessary quantity of data. Of course, a greater sampling rate can be more privacy intrusive. This issue leads to yet another trade- off between quality of data for traffic monitoring and privacy preservation; these issues are discussed in detail in Chapter 4.
3.2 Scale of Experiment
Scale of experiment. An experiment with ten vehicles would be too small, and an experiment with 1000 vehicles would not be feasible. To a first order, the scale of the experiment was a function of available funding. How many vehicles can be rented, and how many drivers can be employed within the funding constraints? For the clarity of exposition, we assume an experiment involving 100 vehicles, and investigate the consequences of this choice in terms of logistics and experiment design.
Formulation of constraints. Assuming that probe vehicles circulate in loops along an expressway, the cycle time, C, is given by:
$$C = L\left(\frac{1}{v_1} + \frac{1}{v_2}\right) + T_1 + T_2 \qquad (3\text{-}1)$$
where L is the length of the section (one way), $v_i$ is the average speed for an equipped vehicle in direction i, and $T_i$ is the lost time incurred when exiting the expressway and entering again in the opposite direction. The specification of a minimum penetration rate, p, places a constraint on the number of equipped vehicles, N, the maximum flow in either direction, F, and the cycle time, C, as
$$\frac{N}{C F} \ge p \qquad (3\text{-}2)$$
Intuitively, the constraint is met when N stays high and F stays low. Constraints may also be formulated in terms of occupancy. For example,
$$\frac{N G}{2 L r} \ge p \qquad (3\text{-}3)$$
where G is the average effective vehicle length, and r is the occupancy. Assuming G = 20 ft, p = 0.2, and N = 100, the maximum feasible expressway length can be calculated. In free-flow conditions near capacity, one might expect r = 0.1, and L = 9.5 miles. With congestion in both directions, one might expect r = 0.3, and L = 3.2 miles.
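The arithmetic behind these figures can be checked with a short script. The sketch assumes the constraints read N / (C F) ≥ p and N G / (2 L r) ≥ p, where p stands for the minimum penetration rate; this reading is inferred from the variable definitions in the text and reproduces the quoted lengths of 9.5 and 3.2 miles. The function names, and the speeds, lost times, and flow used in the cycle time example, are hypothetical.

```python
FEET_PER_MILE = 5280.0


def cycle_time_h(length_mi, v1_mph, v2_mph, lost1_h, lost2_h):
    """Time for one loop: both directions plus turnaround lost times."""
    return length_mi * (1.0 / v1_mph + 1.0 / v2_mph) + lost1_h + lost2_h


def min_probe_vehicles(penetration, cycle_h, max_flow_vph):
    """Smallest N satisfying N / (C * F) >= p (not rounded up)."""
    return penetration * cycle_h * max_flow_vph


def max_section_length_mi(n_vehicles, veh_length_ft, occupancy, penetration):
    """Largest L satisfying N * G / (2 * L * r) >= p, converted to miles."""
    length_ft = n_vehicles * veh_length_ft / (2.0 * occupancy * penetration)
    return length_ft / FEET_PER_MILE


# Values from the text: G = 20 ft, penetration 0.2, N = 100.
free_flow_mi = max_section_length_mi(100, 20.0, occupancy=0.1, penetration=0.2)
congested_mi = max_section_length_mi(100, 20.0, occupancy=0.3, penetration=0.2)

# Hypothetical loop: 5 mi section, 60 mph one way, 30 mph back, 3 min turnarounds.
loop_h = cycle_time_h(5.0, 60.0, 30.0, 0.05, 0.05)
needed = min_probe_vehicles(penetration=0.05, cycle_h=loop_h, max_flow_vph=6000.0)
```

The factor of 2 in the occupancy form reflects that the N probe vehicles are split between the two directions of travel.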
Consequences of 100 vehicles. A moderately scaled experiment with 100 participating probe vehicles has consequences. Chief among these is that a base station becomes necessary for staging purposes. The base camp requires significant space; i. e., a large parking lot that one might find at a shopping center. In addition, an experiment of this size requires non- trivial logistical support. The participation of probe vehicles, together with the penetration requirement, determines the feasible area that can be monitored.
Capture free flow and congestion conditions by design. To be a feasible paradigm for traffic monitoring, it is crucial to show that the methods and algorithms will work in both congestion and in free flow. For this purpose, a location must be chosen that can serve as a dependable source of recurrent congestion.
Variable spatial zone vs. variable number of probes. As shown above, the monitoring capability with a fixed number of vehicles depends greatly on the traffic conditions to be monitored. If the spatial zone of the experiment were constant, then one would need a variable number of probe vehicles as traffic conditions vary. Instead, it was proposed to define the spatial zone of the experiment to correspond appropriately with the expected traffic conditions during the day.
Compare status quo monitoring with capabilities of new paradigm. Another requirement is to make a fair comparison with state-of-the-art traffic monitoring systems, such as 511.org, that make use of ILDs. For this purpose, an expressway segment with good detector coverage is necessary. In addition, both systems must be benchmarked against ground-truth travel time data. Therefore, it was necessary to instrument the endpoints of the expressway section with video cameras capable of re-identifying vehicles.
Effective measurement, without disrupting traffic, by design. Previous work with Caltrans on a PARAMICS model for I-880 was leveraged to refine the experimental protocol. With this model, the incremental effect of adding 100 extra cars onto I-880 was investigated. The question was whether the additional 100 vehicles would unduly influence the data to be gathered. The simulation made clear that, with 100 extra vehicles traveling in loops, on-ramps or off-ramps could become oversaturated. One outcome of the simulation was to break up the 100 vehicles into three separate groups with three separate routes. In this way, the probe vehicles would sense traffic, but not unduly affect the traffic being measured.
Privacy preservation. Having three routes in which vehicles may pass each other improves the data set for the purposes of investigating the privacy threats inherent in the data gathering mechanisms. This issue was of particular concern for one of the corporate partners, Nokia. With only one circuit, and with few opportunities for vehicles to pass each other, re-identification becomes trivial. Three circuits provide for overtaking and mixing, as would be the case with less contrived driving situations.
3.3 Practical Considerations
Legal requirements affecting experimental protocol. As the experiment began to take shape, the formal role of the drivers to be hired needed further clarification. In particular, it was necessary to understand how federal regulations and policies for the protection of human subjects might impact the experimental protocol.
Office for the Protection of Human Subjects. It is essentially for this purpose that the Office for the Protection of Human Subjects (OPHS) exists at the University of California, Berkeley. This office coordinates with the Committee for Protection of Human Subjects (CPHS), of which there are two groups, CPHS-1 and CPHS-2, which serve as Institutional Review Boards (IRBs). These IRBs ensure the protection of the rights and welfare of all human participants in research conducted by university faculty, staff, and students.
Initial protocol to comply with federal and university requirements. In particular, a CPHS Narrative Form detailing the proposed protocol was submitted for approval. This document is included as Appendix 1 in its entirety. Although the final protocol for the February 2008 experiment differs from what was proposed in March of 2007, Appendix 1 is useful as a historical document that captures a snapshot of the evolving experiment that came to be known as Mobile Century.
Practical considerations. The driver schedule was defined based on the most restrictive truck driver regulations worldwide, stipulating at least 45 minutes of rest for every 4 hours of driving. In addition, to limit the total number of hours worked, it was deemed infeasible to begin the experiment early enough in the morning to capture both the morning rush and the evening rush. Instead, only the evening rush was intended to be captured.
Institutional collaboration. The collaborative model for the present work was that industry provides implementation expertise. The NRC team was tasked to build the back-end infrastructure, the cell phone client, the real-time sampling scheme, the servers to get data off the phones, and the software to visualize the results. The role for academia was to provide new processing techniques, scientific expertise, and algorithm development. The UCB team was to perform the traffic experiment design, propose a suitable site, and build the algorithms to estimate traffic conditions. The UCB team consisted of about twenty to thirty people during the evolution of the experiment design. Meetings were regularly held between members of both teams to coordinate the details of data handling. In particular, data from the cell phones needed to be supplied in a usable form for the UCB team’s algorithms and then passed back to the NRC team’s software for visualization.
Choosing a date. When choosing a date for the experiment, the key concern was to find a part of the semester when graduate students would be available to participate. Weather was also a concern, but avoiding finals week and vacation weekends was crucial. Taking these factors into account, the second week of February was chosen as the target date.
3.4 Alternate Site
A site in Danville, pictured in Figure 3.8, was initially envisioned as a possible venue for the 100-car deployment due to its proximity to Berkeley and ease of access. However, detailed inspection of the traffic properties along this site revealed that recurrent congestion was not reliably present. In addition, this site lacked reliable PeMS coverage at that time.
The I-880 site eventually chosen for the 100-car deployment had excellent PeMS coverage, which proved to be crucial for analysis purposes.
Figure 3.8: Alternate I-680 site.
4 VTL Traffic Monitoring
This chapter begins with an overview of privacy risks in Section 4.1. To address these risks, the VTL concept is introduced in Section 4.2 as a spatial sampling scheme. In Section 4.3, a traffic monitoring architecture is designed and implemented according to the VTL scheme.
Next, two prototyping efforts are described. The data collected in these small-scale deployments are examined briefly in Section 4.1. The privacy risk and the quality of travel time estimation are both functions of the number and placement of VTLs. This trade-off is explored in Section 4.2. Privacy is addressed by defining exclusion zones and a minimum spacing. Travel time quality is addressed by optimal placement of VTLs. The VTL scheme is compared against a temporal sampling strategy. In Section 4.3, the trade-off between privacy and data quality is further explored. Key results are summarized in Section 4.4.
4.1 Privacy Risks and Threat Model
Privacy concerns for participating drivers. Traffic monitoring through GPS-equipped cell phones raises significant privacy concerns for participating users. Social acceptance of such monitoring is less likely if location traces are detailed enough to infer medical conditions, political affiliations, speeding, or involvement in traffic accidents.
Threat Model and Assumptions. The present work assumes that adversaries can compromise any single infrastructure component to extract information and can eavesdrop on network communications. We assume that different infrastructure parties do not collude and that a driver’s own handset is trustworthy. We believe this model is useful in light of the many data breaches that occur due to dishonest insiders, hacked servers, stolen computers, or lost storage media (see [4] for an extensive list, including a dishonest insider case that released 4500 records from California’s FasTrak automated road toll collection system). These cases usually involve the compromise of log files or databases in a single system component and motivate our approach of ensuring that no single infrastructure component can accumulate sensitive information.
Naive anonymization is insufficient to protect privacy. We consider sensitive information to be any information from which the precise location of an individual at a given time can be inferred. Traffic monitoring requires at least aggregated statistics from a large number of probe vehicles, but does not require individual node identities. Therefore, one obvious privacy measure would be to anonymize the location data by removing identifiers such as network addresses. This approach is insufficient, however, because drivers can often be re-identified by correlating anonymous location traces with identified data from other sources. For example, home locations can be identified from anonymous GPS traces [54, 73], which may be correlated with address databases to infer the likely driver. Similarly, records on work locations or automatic toll booth records could help identify drivers. Even if anonymous point location samples from several drivers are mixed, it is possible to reconstruct individual traces because successive samples from the same vehicle inherently share a high spatio-temporal correlation. If overall vehicle density is low, samples that are close in time and space likely originate from the same vehicle. This approach is formalized in target tracking models [96].
Example formulation of a tracking model. As an example of tracking anonymous samples, consider the following problem: given a time series of anonymous location and speed samples mixed from multiple users, extract a subset of samples generated by the same vehicle. Toward this end, an adversary can predict the location of the next sample as $x_{t+\Delta t} = x_t + v_t\,\Delta t$, based on the reported speed of the previous sample, where $x_t$ and $x_{t+\Delta t}$ are locations at times $t$ and $t+\Delta t$, respectively, and $v_t$ is the reported speed at $t$. The adversary then associates the prior location sample with the next sample closest to the prediction, or more formally with the most likely sample, where likelihood can be described through a conditional probability $P(x_{t+\Delta t} \mid x_t)$ that primarily depends on spatial and temporal proximity to the prediction. The probability can be modeled through a probability density function (pdf) of distance (or time) differences between the predicted sample and an actual sample (under the assumption that the distance difference is independent of the given location sample).
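The association step just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: positions are reduced to a one-dimensional distance along the roadway, the pdf of the prediction error is taken to be Gaussian with a hypothetical spread `sigma`, and all function and field names are ours. The pdf an actual adversary would use would be fitted to data.

```python
import math

def predict_position(x_t, v_t, dt):
    """Constant-speed prediction of the next position: x_t + v_t * dt."""
    return x_t + v_t * dt

def likelihood(predicted, observed, sigma):
    """Assumed Gaussian-shaped score of the gap between prediction and sample."""
    return math.exp(-0.5 * ((observed - predicted) / sigma) ** 2)

def link_next_sample(prev, candidates, dt, sigma=50.0):
    """Pick the anonymous sample most likely to follow `prev`.

    Returns the chosen candidate and the likelihood of every candidate.
    """
    x_hat = predict_position(prev["x"], prev["v"], dt)
    scores = [likelihood(x_hat, c["x"], sigma) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores

# Metres along the road and metres per second; sample values are made up.
prev = {"x": 1000.0, "v": 30.0}
candidates = [{"x": 1290.0, "v": 29.0}, {"x": 1150.0, "v": 14.0}]
match, scores = link_next_sample(prev, candidates, dt=10.0)
print(match["x"])  # 1290.0, the sample nearest the predicted position of 1300 m
```

When traffic is dense, several candidates score similarly and the association becomes ambiguous, which is the effect the privacy metrics below quantify.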
Speed patterns correlate with route choice, and provide clues to an adversary. Knowing speed patterns further helps tracking anonymous location samples if it is combined with map information. For example, consider the traffic scenarios depicted in Figure 4.1. On straight sections (a), vehicles on high-occupancy vehicle (HOV) or overtaking lanes often experience lower variance in speed. Vehicles entering at an on-ramp (b) or exiting after an off-ramp (c) usually drive slower than main road traffic. These general observations can be formally introduced into the tracking model by assigning an a priori probability derived from the speed deviations. For example, to identify the next location sample after an on-ramp for a vehicle that generated $x_t$ on the main route before the ramp, an adversary could assign a lower probability to location samples with low speed. These low-speed samples are likely generated by vehicles that just entered after the ramp.
Privacy Metrics. As observed in [55], the degree of privacy risk depends on how long an adversary successfully tracks a vehicle. Longer tracking increases the likelihood that an adversary can identify a vehicle and observe it visiting sensitive places. We thus adopt the time-to-confusion metric [55] and its variant, distance-to-confusion, which measure the time or distance over which tracking may be possible. Distance-to-confusion is defined as the travel distance until tracking uncertainty rises above a defined threshold. Tracking uncertainty is calculated separately for each location sample in a trace as the entropy $H = -\sum_i p_i \log p_i$, where the $p_i$ are the normalized probabilities derived from the likelihood values described later. These likelihood values are calculated for every location sample generated within a temporal and spatial window after the location sample under consideration.
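To make the metric concrete, the following sketch computes the entropy-based tracking uncertainty for one location sample and then scans a trace for the first point at which the uncertainty exceeds a threshold. The base-2 logarithm, the threshold value, and the trace layout (pairs of distance travelled and candidate likelihoods) are assumptions chosen for illustration; the text above does not fix them.

```python
import math

def tracking_uncertainty(likelihoods):
    """Entropy H = -sum(p_i * log2(p_i)) of the normalized likelihoods, in bits."""
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("at least one candidate must have positive likelihood")
    probs = [value / total for value in likelihoods if value > 0]
    return -sum(p * math.log2(p) for p in probs)

def distance_to_confusion(trace, threshold):
    """Distance travelled until uncertainty first rises above `threshold`.

    `trace` is a list of (distance_travelled, candidate_likelihoods) pairs in
    travel order. If the threshold is never exceeded, the whole trace was
    trackable and its final distance is returned.
    """
    for distance, likelihoods in trace:
        if tracking_uncertainty(likelihoods) > threshold:
            return distance
    return trace[-1][0] if trace else 0.0

trace = [(0.0, [1.0]), (500.0, [0.9, 0.1]), (1000.0, [0.5, 0.5])]
print(distance_to_confusion(trace, threshold=0.9))  # 1000.0
```

A single candidate gives zero uncertainty, while two equally likely candidates give one bit, so in this example the adversary loses the vehicle only at the third sample.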
Figure 4.1: Driving Patterns and Speed Variations in Highway Traffic.
4.2 Preserving Privacy with Virtual Trip Lines
We introduce the concept of virtual trip lines (VTLs) for privacy-preserving monitoring and describe an architecture that embodies it.
4.2.1 Design Goals
Tradeoff between quality information and privacy protection. The big-picture challenge is to balance the tradeoff between two conflicting requirements. On one hand, quality traffic information needs to be acquired from each smartphone client. On the other hand, all gathered information must be limited and structured in such a way that it is difficult to exploit for unintended purposes. In particular, we address issues of privacy invasion that have the potential to hinder widespread social acceptance of such a system.
Privacy. We aim to achieve privacy protection by design so that the compromise of a single entity, even by an insider at the service provider, does not allow the identifying or tracking of users.
Data Integrity. The system should not allow adversaries to insert spoofed data, which would reduce the data quality of traffic information. This is especially challenging because it conflicts with the desire for anonymity.
Smartphone Client. The client software must cope with the resource constraints of current smartphone platforms. For energy consumption, we mainly focus on designing a lightweight component that filters noisy GPS samples and computes trip-line measurements.
4.2.2 Virtual Trip Line Concept
Definition of VTL. The proposed traffic monitoring system builds on the concept of virtual trip lines and the notion of separating the communication and traffic monitoring responsibilities (as introduced in [54]). A VTL is a line in geographic space that, when crossed, triggers a client to send a VTL sample to the traffic monitoring server. More specifically, it is defined by:
$$[\,\mathit{vtlid},\; x_1,\; y_1,\; x_2,\; y_2,\; d\,] \qquad (4\text{-}1)$$
where $\mathit{vtlid}$ is the virtual trip line ID, $x_1$, $y_1$, $x_2$, and $y_2$ are the $(x, y)$ coordinates of the two line endpoints, and $d$ is a default direction vector (e.g., N-S or E-W). When a vehicle traverses the trip line, its VTL sample comprises the time, trip line ID, speed, and direction of crossing. The trip lines are pre-generated, then downloaded to and stored on clients.
Spatial sampling preferred over temporal sampling. Virtual trip lines control disclosure of location data by sampling in space rather than sampling in time, since clients generate VTL samples at predefined geographic locations, compared to sending samples at periodic time intervals. The rationale for this approach is that in certain locations traffic information is more valuable and certain locations are more privacy- sensitive than others. Through careful placement of trip lines the system can thus better manage data quality and privacy than through a uniform temporal sampling interval. In addition, the ability to store trip lines on the clients can reduce the dependency on trustworthy infrastructure for coordination. These concepts are revisited in Section 4.2.
4.2.3 Architecture for Probabilistic Privacy
Strict separation of identity information (for communication) and location information (for traffic monitoring). To achieve the anonymization of VTL samples from clients while authenticating the sender of VTL samples, we split the actions of authentication and data processing onto two different entities, an ID proxy server and a traffic monitoring server. By separately encrypting the identification information and the sensing measurements (i.e., trip line ID, speed, and direction) with different keys, we prevent each entity from observing both the identification and the sensing measurements.
Overview of system architecture. Figure 4.2 shows the resulting system architecture eventually implemented for the field experiment. It comprises four key entities: probe vehicles with the cell phone handsets, an ID proxy server, a traffic monitoring service provider, and a VTL generator. Each probe vehicle carries a GPS-enabled mobile handset that executes the client application. This application is responsible for the following functions: downloading and caching trip lines from the VTL server, detecting trip line traversal, and sending measurements to the service provider. To determine trip line traversals, probe vehicles check if the line between the current GPS position and the previous GPS position intersects with any of the trip lines in their cache. Upon traversal, handsets create a VTL sample comprising trip line ID, speed readings, timestamps, and the direction of traversal, and encrypt it with the VTL server’s public key. Handsets then transmit this sample to the ID proxy server over an encrypted and authenticated communication link set up for each handset separately. Each handset and the ID proxy share an authentication key in advance.
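The traversal check described above is a segment-intersection test between the vehicle's last movement and each cached trip line. The sketch below shows one standard way to do this with orientation (cross-product) tests on planar coordinates. The `TripLine` class and the function names are hypothetical and are not taken from the handset software; for simplicity the test only reports proper crossings, so a fix that lands exactly on a trip line is not counted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TripLine:
    vtl_id: int
    x1: float
    y1: float
    x2: float
    y2: float

def _orient(ax, ay, bx, by, cx, cy):
    """Sign of the cross product (b - a) x (c - a): +1, 0, or -1."""
    cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (cross > 0) - (cross < 0)

def crosses(line, prev_fix, curr_fix):
    """True if the movement prev_fix -> curr_fix properly crosses the trip line."""
    (px, py), (qx, qy) = prev_fix, curr_fix
    d1 = _orient(line.x1, line.y1, line.x2, line.y2, px, py)
    d2 = _orient(line.x1, line.y1, line.x2, line.y2, qx, qy)
    d3 = _orient(px, py, qx, qy, line.x1, line.y1)
    d4 = _orient(px, py, qx, qy, line.x2, line.y2)
    # The fixes must lie on opposite sides of the trip line, and the trip line
    # endpoints must lie on opposite sides of the movement segment.
    return d1 * d2 < 0 and d3 * d4 < 0

def crossed_lines(cache, prev_fix, curr_fix):
    """IDs of all cached trip lines traversed between two consecutive fixes."""
    return [line.vtl_id for line in cache if crosses(line, prev_fix, curr_fix)]

cache = [TripLine(7, 0.0, -10.0, 0.0, 10.0), TripLine(8, 50.0, -10.0, 50.0, 10.0)]
print(crossed_lines(cache, (-5.0, 0.0), (5.0, 0.0)))  # [7]
```

Because the check uses only the two most recent fixes and the local cache, the handset needs no network contact to decide whether a sample should be generated.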
Figure 4.2: Virtual Trip Line: Privacy-Preserving Traffic Monitoring System Architecture. This system was implemented and ran for the entire duration of the 100-vehicle deployment of the Mobile Century experiment.
ID proxy server handles identity information. The ID proxy’s responsibility is to first authenticate each client to prevent unauthorized VTL samples and then forward anonymized samples to the VTL server. Since the VTL sample is encrypted with the VTL server’s key, the ID proxy server cannot access the VTL sample content. It has knowledge of which phone transmitted a VTL sample, but no knowledge of the phone’s position. The ID proxy server strips off the identifying information and forwards the anonymous VTL sample to the VTL server over another secure communication link.
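As a rough illustration of the ID proxy's two duties (authenticate, then strip identity), the sketch below uses an HMAC-SHA256 tag computed with a pre-shared key. The choice of HMAC, the key table `HANDSET_KEYS`, and the function names are assumptions made for illustration; the authentication primitive actually deployed is not specified here. The encrypted VTL sample is treated as opaque bytes, since the proxy cannot decrypt it.

```python
import hashlib
import hmac

# Hypothetical pre-shared keys, one per handset (assumed for this sketch).
HANDSET_KEYS = {"handset-001": b"demo-shared-secret"}

def sign(handset_id, ciphertext, key):
    """Handset side: tag covering the identity and the already-encrypted sample."""
    message = handset_id.encode("utf-8") + b"|" + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()

def proxy_forward(handset_id, ciphertext, tag):
    """ID proxy side: authenticate the sender, then drop the identity.

    Returns only the opaque ciphertext for the VTL server, or None if the
    sender is unknown or the tag does not verify.
    """
    key = HANDSET_KEYS.get(handset_id)
    if key is None:
        return None
    expected = sign(handset_id, ciphertext, key)
    if not hmac.compare_digest(expected, tag):
        return None
    return ciphertext

sample = b"\x8f\x02\x41"  # stands in for a sample encrypted for the VTL server
tag = sign("handset-001", sample, HANDSET_KEYS["handset-001"])
print(proxy_forward("handset-001", sample, tag) == sample)  # True
```

The value returned to the VTL server carries no handset identifier, which mirrors the split of knowledge between the two servers.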
VTL server handles location information. The VTL server aggregates samples from a large number of probe vehicles and uses them for estimating the real-time traffic status. The VTL generator determines the position of trip lines, stores them in a database, and distributes trip lines to probe vehicles whenever a download request is received. As with the ID proxy, each handset and the VTL generator should share an authentication key in advance. The VTL generator first authenticates each download requester to prevent unauthorized requests and can encrypt trip lines with a key agreed upon between the requester and the VTL generator. Both the download request message and the response message are integrity-protected by a message authentication code.
Advantages of this architecture. The above architecture improves location privacy of probe vehicle drivers through several mechanisms. First, the VTL server must follow specific restrictions on trip line placements that we will describe in Section 4.2. This means that a handset will only generate samples in areas that are deemed less sensitive and will not send any information in other areas. Second, by splitting identity-related and location-related processing, a breach at any single entity would not reveal the precise position of an identified individual. A breach at the ID proxy would only reveal which phones are generating samples (or are moving) but not their precise positions. Similarly, a breach at the VTL server would provide precise position samples but not the individuals’ identities. Third, separating the VTL server from the VTL generator prevents active attacks that modify trip line placement to obtain more sensitive data. This is, however, only a probabilistic guarantee because tracking and eventual identification of outlier trips may still be possible. For example, tracking would be straightforward for a single probe vehicle driving alone on an empty roadway at night [55]. The outlier problem in sparse traffic situations can be alleviated by changing trip lines based on traffic density heuristics. Trip lines could be locally deactivated by the client based on time of day or the client’s speed. They could also be deactivated by the VTL generator based on traffic observations from other sources such as loop detectors.
4.3 Implementation
The architecture described above was implemented using Nokia N95 smartphone handsets, which include a full Global Positioning System receiver that can be accessed by application software.
4.3.1 Map Tiles and Trip Lines
Quadrant representation. In our system, we recursively divide the geographic region of interest into four smaller rectangles (or quadrants), and the minimum quadrant size is 1 m by 1 m. We convert the GPS location of a user into a Mercator projection using the WGS84 world model. Mercator projects the world onto a square planar surface. A zoom of 25 is assumed to be the maximum precision with which a location can be specified. By default, every GPS location is converted into 25-bit x and y values with zoom set to 25. By using the quadrant representation, the mobile device can efficiently control the granularity by simply changing the zoom level. In this encoding, the world is treated as a square grid of four quadrants with zoom level 2, where x and y are the offsets from the top left corner of the world.
VTLs contained in map tiles. This representation makes it easy to specify a particular map tile. We define a map tile as a container that groups all trip lines within it. When a client wants to download all virtual trip lines within the San Francisco Bay Area, it sends the VTL server the triplet (zoom, x, y) for the corresponding region. In our implementation, we choose 12 as the default zoom level, which corresponds to an 8 km by 8 km square.
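A minimal sketch of the quadrant encoding follows. It assumes the spherical Mercator convention commonly used for web map tiles (longitude and latitude in WGS84 degrees, origin at the top left corner); the exact projection formulas used on the handset are not given in this report, so the formulas and function names here are an assumption. Under that convention a zoom-25 cell is roughly 1.2 m wide at the equator, and a zoom-12 tile is roughly 8 km wide at Bay Area latitudes, which agrees with the sizes quoted above.

```python
import math

MAX_ZOOM = 25   # finest grid: 25-bit x and y
TILE_ZOOM = 12  # default zoom for map tiles

def to_grid(lat_deg, lon_deg, zoom=MAX_ZOOM):
    """Project WGS84 degrees onto a 2**zoom by 2**zoom grid, origin top-left."""
    n = 1 << zoom
    lat = math.radians(lat_deg)
    x = (lon_deg + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n
    # Clamp so that points on the eastern or southern edge stay inside the grid.
    return min(n - 1, max(0, int(x))), min(n - 1, max(0, int(y)))

def tile_of(x, y, zoom=TILE_ZOOM, max_zoom=MAX_ZOOM):
    """(zoom, x, y) triplet of the map tile containing a fine grid cell."""
    shift = max_zoom - zoom
    return zoom, x >> shift, y >> shift

def offset_in_tile(x, y, zoom=TILE_ZOOM, max_zoom=MAX_ZOOM):
    """Low-order bits of x and y, i.e. the position inside the tile."""
    mask = (1 << (max_zoom - zoom)) - 1
    return x & mask, y & mask

x, y = to_grid(37.8716, -122.2727)  # a point in Berkeley, CA
print(tile_of(x, y))
print(offset_in_tile(x, y))
```

Changing granularity is just a bit shift: dropping the 13 low-order bits of a zoom-25 coordinate yields the zoom-12 tile index.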
Memory requirements. This representation also helps in reducing storage size and bandwidth consumption. Since the general area is identified by the quadrant, we only store the 13 least significant bits of the trip line end point coordinates instead of the full 25 bits used for typical UTM coordinates. This decreases storage consumption to 68 bits (15-bit ID, 1-bit direction, 4 × 13 bits of coordinates) per trip line. As an example of required storage and bandwidth consumption, consider the San Francisco Bay Area, the total road network of which contains about 20,000 road segments, according to the Digital Line Graph 1:24K scale maps of the San Francisco Bay Area Regional Database (BARD [1], managed by USGS). Assuming that the system on average places one trip line per segment, this results in 166 KB of storage.
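The 68-bit record and the storage estimate can be reproduced with a small packing routine. The field order inside the record (ID, then direction, then the four coordinates) is an assumption made for this sketch; only the field widths come from the text.

```python
ID_BITS, DIR_BITS, COORD_BITS = 15, 1, 13
RECORD_BITS = ID_BITS + DIR_BITS + 4 * COORD_BITS  # 68 bits per trip line

def pack_trip_line(vtl_id, direction, x1, y1, x2, y2):
    """Pack one trip line into a 68-bit integer; coordinates are tile offsets."""
    if not 0 <= vtl_id < (1 << ID_BITS) or direction not in (0, 1):
        raise ValueError("ID or direction out of range")
    word = (vtl_id << DIR_BITS) | direction
    for coord in (x1, y1, x2, y2):
        if not 0 <= coord < (1 << COORD_BITS):
            raise ValueError("coordinate offset does not fit in 13 bits")
        word = (word << COORD_BITS) | coord
    return word

def unpack_trip_line(word):
    """Inverse of pack_trip_line."""
    mask = (1 << COORD_BITS) - 1
    coords = []
    for _ in range(4):
        coords.append(word & mask)
        word >>= COORD_BITS
    x1, y1, x2, y2 = reversed(coords)
    return word >> DIR_BITS, word & 1, x1, y1, x2, y2

def storage_kib(n_trip_lines):
    """Storage for n packed trip lines in units of 1024 bytes."""
    return n_trip_lines * RECORD_BITS / 8 / 1024

print(RECORD_BITS)                 # 68
print(round(storage_kib(20_000)))  # 166
```

At one trip line per road segment, the whole Bay Area network fits in well under a megabyte, which is what makes caching trip lines on the handset practical.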
4.3.2 Client Device and Software
Client hardware and software. We implemented the client software using J2ME (Java Platform, Micro Edition) on a Nokia N95 handset. This Symbian OS handset uses an ARM11-based Texas Instruments OMAP2420 processor running at 330 MHz, and it contains 64 MB RAM and 160 MB internal memory. Its storage can be expanded up to 8 GB with flash memory. We use the JSR 179 library (Location API for J2ME) [2] for communicating with the internal TI GPS5300 NaviLink 4.0 single-chip GPS/A-GPS module to set the sampling period and retrieve the position readings. This setup did not provide speed information. Instead, we calculate the mean speed using two successive location readings (in our implementation, every 3 seconds). The client software registers the task for checking the traversal of trip lines as an event handler for GPS module location samples, which is automatically invoked whenever a new position reading becomes available.
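The mean-speed computation from two successive position readings can be sketched as follows. The handset client was written in J2ME; this Python version is only an illustration, and the haversine distance on a spherical Earth, the fix layout (timestamp, latitude, longitude), and the function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def mean_speed_mps(fix_a, fix_b):
    """Mean speed between two fixes, each a (timestamp_s, lat_deg, lon_deg) tuple."""
    dt = fix_b[0] - fix_a[0]
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    return haversine_m(fix_a[1], fix_a[2], fix_b[1], fix_b[2]) / dt

# Two fixes 3 s apart, 0.001 degrees of latitude (about 111 m) apart.
speed = mean_speed_mps((0.0, 37.870, -122.300), (3.0, 37.871, -122.300))
print(round(speed, 1))  # 37.1 m/s, roughly 83 mph
```

Because the estimate divides a short distance by a short interval, GPS position noise of a few metres translates directly into speed noise, which is one reason the client filters noisy samples.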
Communication protocol. The communication between the handset and the ID proxy server, to send updated lists of VTLs or to request VTL downloads, is implemented via HTTPS GET/POST messages. The client software encrypts the message content but not the handset identification information using the public key of the VTL server so that only the VTL server with the corresponding private key can decrypt the message. To save network bandwidth and to reduce delay, we cache the downloaded trip lines for the nine map tiles closest to the current position in local memory. When a vehicle crosses a tile boundary, it initiates VTL download background threads for the missing tiles.
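The tile-caching rule can be illustrated with simple set arithmetic. The sketch assumes that "the nine map tiles closest to the current position" means the 3 by 3 block centred on the vehicle's tile, and that tile indices wrap around in x but not in y; both are our assumptions, as are the function names.

```python
def neighborhood(tx, ty, zoom=12):
    """The 3 x 3 block of tile indices centred on (tx, ty)."""
    n = 1 << zoom
    return {
        ((tx + dx) % n, ty + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if 0 <= ty + dy < n
    }

def tiles_to_download(cached_tiles, tx, ty, zoom=12):
    """Tiles needed around the new position that are not yet in the cache."""
    return neighborhood(tx, ty, zoom) - set(cached_tiles)

cache = neighborhood(656, 1582)             # vehicle starts in tile (656, 1582)
missing = tiles_to_download(cache, 657, 1582)  # it then moves one tile east
print(sorted(missing))  # [(658, 1581), (658, 1582), (658, 1583)]
```

Crossing a single tile boundary therefore triggers at most three downloads (five when crossing diagonally), which keeps background traffic small.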
4.3.3 Servers and Databases
VTL database server. At the bottom of the hierarchy of our server implementation is a backend database server. The database server contains two databases. The first is a VTL database, which holds the GPS coordinates of all trip lines. In the future, we plan to enhance our trip line database to hold metadata associated with each trip line. For instance, the metadata for a trip line can contain the posted speed limit at that trip line, which the client application can use to decide whether it is exceeding the speed limit, in which case it can disable the transmission of VTL samples. Write access to this database is restricted to traffic administrators, who can add, delete, or update a VTL.
Figure 4.3: Road networks extracted from Bay Area DLG files (left) and trip lines per road segment in Palo Alto, CA (right).
Traffic database server. The second database is the VTL sample measurement database. This database stores the VTL samples sent by the mobile device whenever the mobile device chooses to send a sample after crossing a VTL. The sample database simply appends every VTL sample along with a time stamp indicating when the sample was received. To sanitize bogus VTL samples from the clients, the VTL sample database also keeps both the encrypted and decrypted versions of the VTL sample for further investigation in collaboration with the ID proxy server. When bogus VTL samples are detected in the VTL sample database, their encrypted versions are compared to the encrypted version stored in the ID proxy server to blacklist the originator of bogus VTL samples.
Database implementation. We use Microsoft SQL to implement the databases, and we develop the VTL server using J2EE ( Java Platform, Enterprise Edition) and JDBC ( Java Database Connectivity) to control the SQL databases that are connected to the VTL server. While we have used only a single DB server in this prototype, the two databases should ideally be implemented by different entities to prevent active trip line modification attacks by a compromised traffic monitoring entity.
ID Proxy Server. On top of the database server is the ID proxy server. The identification proxy server is envisioned to be operated by an entity that is independent of the traffic service provider. We implement the ID proxy server as a servlet-based web server that takes in HTTPS GET/POST messages from clients and forwards messages to the VTL server. The HTTP message received by the proxy server from the client has two components. The first component contains the mobile device identification information, namely the phone number of the message origin. This component of the message is required for all cell phone communications, as the operator needs to appropriately charge for data communication costs. The second component of the message contains information that is intended for the database server. The proxy server strips all the identification information from the message, namely the first component of the message, and passes on the second component of the message to the application server. We implemented the secure channel between the ID proxy server and the VTL server using WSDL (Web Services Description Language)-RPC (Remote Procedure Call) over a J2EE server.
Figure 4.4: Comparison of the speed measurements recorded from the N95 (dots), the VTLs (boxes), and the vehicle speedometer (circles) as a function of time.
4.1 Experimental Deployment
The implementation described above was used for several experimental deployments. The correct operation of the traffic monitoring system was first demonstrated with an initial test along I-80. A second test involving twenty cars was performed to measure data quality and to inform the design of the 100-vehicle deployment.
Figure 4.5: Satellite image of the first experiment site on I-80 near Berkeley, CA. The red lines represent the locations of the VTLs, the blue squares show the speed recorded by the VTL, and the green squares represent the position and speed stored in the phone log. The brown circles represent the readings from the vehicle speedometer.
4.1.1 Velocity Measurement Accuracy
GPS speed and position accuracy. A first experiment was performed to estimate the position and speed accuracy of a single cell phone carried onboard a vehicle. The experiment route consisted of a single 7-mile loop on I-80 near Berkeley, CA. VTLs were placed evenly on the highway every 0.2 miles. Speed and position measurements were stored locally on the phone every 3 seconds, and speed measurements were sent over the wireless access provider’s data network every time a VTL was crossed. The speed measurements were computed using two consecutive position measurements. In order to substantiate the correctness of the data, vehicle speed was also recorded directly from the speedometer on a laptop with a clock synchronized with the N95. In Figure 4.4, the speed measured directly from the vehicle sp
| Title | Mobile Century final report for TO 1021 and TO 1029. A traffic sensing field experiment using GPS mobile phones |
| Subject | TA1001.C797 no. 2010-4; Traffic monitoring--California--San Francisco Bay Area.; GPS receivers--California--San Francisco Bay Area.; Cell phones--California--San Francisco Bay Area. |
| Description | Performed in cooperation with California Dept. of Transportation and U.S. Federal Highway Administration.; "December 2010."; Includes bibliographical references (leaves 151-160). |
| Creator | Bayen, Alexander M. |
| Publisher | California Center for Innovative Transportation, Institute of Transportation Studies, University of California at Berkeley |
| Contributors | California. Dept. of Transportation.; California Center for Innovative Transportation.; University of California, Berkeley. Institute of Transportation Studies. |
| Type | Text |
| Language | eng |
| Relation | Also available online.; http://www.its.berkeley.edu/publications/UCB/2010/CWP/UCB-ITS-CWP-2010-4.pdf; http://worldcat.org/oclc/700289363/viewonline |
| Title-Alternative | Traffic sensing field experiment using GPS mobile phones; Deployment of value-added mobile traffic; Post processing Mobile Century (aka Probe data analysis) |
| Date-Issued | [2010] |
| Format-Extent | xvi, 423 leaves : col. ill., col. charts, col. maps ; 29 cm. |
| Relation-Is Part Of | CCIT research report, UCB-ITS-CWP-2010-4; Research report (University of California, Berkeley. Institute of Transportation Studies. California Center for Innovative Transportation) ; UCB-ITS-CWP-2010-4. |
| Transcript | CALIFORNIA CENTER FOR INNOVATIVE TRANSPORTATION INSTITUTE OF TRANSPORTATION STUDIES UNIVERSITY OF CALIFORNIA, BERKELEY Mobile Century Final Report for TO 1021 and TO 1029: A Traffic Sensing Field Experiment Using GPS Mobile Phones Alexandre M. Bayen, Ph. D., Principal Investigator CCIT Research Report UCB- ITS- CWP- 2010- 4 This work was performed by the California Center for Innovative Transportation, a research group at the University of California, Berkeley, in cooperation with the State of California Business, Transportation, and Housing Agency’s Department of Transportation, and the United States Department of Transportation’s Federal Highway Administration. The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the State of California. This report does not constitute a standard, specification, or regulation. December 2010 Project Fact Sheet Task Order # 1021 Title: Deployment of Value- Added Mobile Traffic Probes Project Sponsor: Caltrans Project Stakeholders: Caltrans Executing Organization: California Center for Innovative Transportation 2105 Bancroft Way, Berkeley, CA 94720 Phone: ( 510) 642- 4522 Fax: ( 510) 642- 0910 Contract No. 65A0212 Execution Period: 03/ 01/ 2007 – 09/ 30/ 2009 Contract Amount: $ 612,376 Principal Investigator: Alexandre Bayen, Ph. D. Center Director: Thomas West Project Manager: Ali Mortazavi Project Fact Sheet Task Order # 1029 Title: Post Processing Mobile Century ( aka Probe Data Analysis) Project Sponsor: Caltrans Project Stakeholders: Caltrans Executing Organization: California Center for Innovative Transportation 2105 Bancroft Way, Berkeley, CA 94720 Phone: ( 510) 642- 4522 Fax: ( 510) 642- 0910 Contract No. 65A0212 Execution Period: 07/ 01/ 2008 – 06/ 30/ 2009 Contract Amount: $ 313,000 Principal Investigator: Alexandre Bayen, Ph. D. 
Center Director: Thomas West Project Manager: Ali Mortazavi
Acknowledgements
The authors would first like to acknowledge the support of our sponsors at Caltrans, especially Larry Orcutt, who took a risk to support a truly innovative effort. We are also thankful for the guidance and consideration of Greg Larson, Hassan Aboukhadijeh, Asfand Siddiqui, and Gurprit Hansra.
The authors wish to thank everyone who participated in Mobile Century, including those who contributed to the initial brainstorming, those who mobilized during the course of its evolution, and those who forged ahead to propel this project to heights unimaginable just three years ago.
The success of Mobile Century would have been impossible without the leadership, dedication, and foresight of J. D. Margulici, Thomas West, and Joe Butler. We are grateful to the entire staff of the California Center for Innovative Transportation for the logistical planning and successful implementation of Mobile Century. In particular, we thank Coralie Claudel, Marika Benko, Osama Elhamshary, Tia Dodson, Chris Flens-Batina, Manju Kumar, Jed Arnold, Benson Chiou, Lori Luddington, Xiaohong Pan, Erica Sherlock-Thomas, and Arthur Wiedmer.
We appreciate the hard work of all the execution officers, Emma Strong, Jason Wexler, Jean Parks, Jennifer Chang, Kristen Ray, Negin Aryaee, Timmy Siauw, Anurag Sridharan, Madeline Ziser, Matthew Vaggione, Qingfang Wu, Tarek Rabbani, Josh Pilachowski, Kristen Parrish, Megan Smirti, Carl Misra, Christina Sedighi, Jessica Ariani, Elizabeth Kincaid, Sandy Do, Nick Semon, Trucy Phan, Timothy Racine, Alan Wang, Charlotte Wong, Irene Kwan, Karl David Cruz, Swe Shin Maung, Tyler Moser, Alexis Clinet, and Julie Percelay, who assisted in planning, preparation, and operations on the day of the 100-vehicle deployment.
We are also indebted to members of the Nokia team: John Paul Shen, Bob Ianucci, Dave Sutter, and John Loughney; members of the CITRIS team: Gary Baldwin, Lorie Mariano, Paul Wright, Aaron Walburg, and Khossrov Taherian; members of the ITS team: Margaret Chang, Norine Shima, Jillene Bohr, John Li, and Ann Guy; members of the CoE team: Shankar Sastry, Lisa Alvarez Cohen, and Barbara Blackford; the student team: Annalisa Schiaccoli, Saurabh Amin, and Dengfeng Sun; the viability team: Patrick Saint-Pierre and Jean-Pierre Aubin; and Sarah Yang at the UC Berkeley Media Office.
The authors would like to apologize in advance to anyone who helped develop, build, and deploy the traffic monitoring system implemented as part of the Mobile Century experiment but whose name we neglected to mention through our own oversight. You have our gratitude.
We also thank the many people who, during the compilation of this report, provided crucial subject matter through personal interviews, Skype interviews, original project documents, briefing materials, personal notes, emails, and other sources that made it possible to provide the rich narrative herein. Finally, we thank Sara Bagwell for her work to compile and format this document.
Executive Summary
Traffic monitoring is most commonly accomplished with government-deployed, dedicated equipment. Adopting new technology in this paradigm can be costly and slow. However, recent advances in the mobile internet, cell phone technology, and location-based services may be leveraged to transcend the old paradigm. Doing so will reduce costs, increase coverage, and yield a wealth of new data that will empower the traveling public with real-time access to current traffic conditions. Furthermore, transportation operators will gain access to an unprecedented wealth of information to help them better manage road networks.
Nonetheless, significant technical barriers and privacy concerns may impede widespread acceptance of a new paradigm. To understand and overcome these barriers, the Mobile Century experiment was conceived as a proof-of-concept demonstration of a traffic monitoring system based on probe vehicles equipped with GPS-enabled mobile phones.
The sheer scale of the experiment required significant logistical effort. A base station was erected at Union Landing to house a temporary control center that was linked to a secondary control center in Palo Alto. Over one hundred graduate students from UC Berkeley were employed to circulate in loops along Interstate 880 between Hayward and Fremont, California, for an entire day. During the experimental deployment, the penetration rate of probe vehicles was sustained at an average near 3% (a significant logistical feat), a level viewed as realistic in the near future given the increasing penetration of GPS-enabled cellular devices.
Classical methods of traffic modeling operate in the vehicular density domain, and use data such as occupancies and flows from inductive loop detectors. Understanding how to use velocity measurements instead was a significant technical contribution. In this work, the classical model was converted to the velocity domain, and GPS-based measurements were fed directly into the model. Mobile Century proved that data from GPS-enabled mobile phones alone were sufficient to infer traffic features, i.e., to construct an accurate velocity map over time and space. The methods employed functioned properly during both congested and free-flow traffic conditions, and correctly detected a traffic incident that occurred during the deployment.
Another important contribution from this work was that ground-truth travel times were recovered by re-identifying vehicles captured on videotape. Therefore all results in this report can be asserted with high confidence.
We conclude that the quality of data obtainable from present-day smartphones is adequate for useful, real-time traffic applications, such as calculating travel times.
The architecture of the traffic monitoring system was designed such that identity information is encrypted and handled separately from traffic information, with no single entity having access to both. The spatial sampling strategy is based on the use of virtual trip lines that can be reconfigured on the fly. This feature builds in guaranteed flexibility for future monitoring needs.
The new paradigm demonstrated in Mobile Century still requires substantial effort to bring to fruition. Any industrial-grade, real-time system will require partnerships between government, academia, and industry. Business cases for future deployment must address incentives for public participation.
In conclusion, Mobile Century was the first to demonstrate the near-term potential for using velocity data from GPS cell phones to reconstruct traffic state with precision. This opens the door for further research in this area to scale up the solution and to deliver considerable value to Caltrans and the traveling public.
Table of Contents
Acknowledgements
Executive Summary
Table of Contents
List of Figures
1 Introduction
1.1 Motivation
1.2 Historical Narrative
1.3 Scope
1.4 Summary of Findings
1.5 Organization of Report
2 Background
2.1 Traffic Monitoring with Dedicated Equipment (Road Infrastructure)
2.2 Growth of the Mobile Internet
2.3 Barriers to a New Paradigm
3 Design Challenges
3.1 Preliminary Investigation
3.2 Scale of Experiment
3.3 Practical Considerations
3.4 Alternate Site
4 VTL Traffic Monitoring
4.1 Privacy Risks and Threat Model
4.2 Preserving Privacy with Virtual Trip Lines
4.3 Implementation
4.1 Experimental Deployment
4.2 Trip Line Placement
4.3 Estimating traffic
4.4 Discussion
5 Assimilating Lagrangian Data
5.1 Background
5.2 Description of Preliminary Concepts
5.3 Explanation of Proposed Methods
5.4 Assessment of the Methods
5.5 Numerical example of the NR nudging factor
6 Velocity-based Modeling
6.1 Related Work
6.2 Highway Traffic Flow Model
6.3 Speed Estimation
6.4 Implementation and Validation
6.5 Conclusion and Future Work
7 Video validation
7.1 Resolution Requirements and Camera Capabilities
7.2 Final Selection of Video Cameras
7.3 Practical Considerations for Camera Deployment
7.4 Trial Tests
7.5 Protocol for Camera Deployment
7.6 Narrative of Camera Deployment
7.7 Post-processing of Video Data
8 Probe Vehicle Deployment
8.1 Resources
8.2 Procedures
8.3 Collection of Probe Data
8.4 Narrative of Experiment
8.5 Data Pre-processing
9 Experimental results
9.1 Real-Time Traffic Monitoring Using Only Cell Phone Data
9.2 Trajectory data
9.3 Velocity Field
9.4 Ground-truth Travel Times
9.5 Reasons For Disparity Between Loop and VTL Data
9.6 Achieved Penetration Rate During Experiment
9.7 Inferring parameters from shockwave speed
10 Revisiting the Density Based Methods
10.1 Different Scenarios
10.1 Results
10.2 Conclusions
11 Discussion
11.1 Conclusions
11.2 Challenges
11.3 Future Goals
11.4 Steps Toward Deployment
References
Appendices
1 CPHS Protocol Narrative Form
2 Deployment Prototyping
2.1 Strategy and Objectives
2.2 Protocol for 20-Vehicle Deployment
2.3 Instructions for Odd Drivers
2.4 Instructions for Even Drivers
3 Contingency Plans
3.1 Risk Management
3.2 Emergency Response
3.3 Directions for Phone Operators
4 Driver Briefings
4.1 Red Team
4.2 Yellow Team
4.3 Orange Team
5 Driver Instructions
5.1 Red AM route
5.2 Red PM route
5.3 Yellow AM route
5.4 Yellow PM route
5.5 Orange AM route
5.6 Orange PM route
6 Field Experiment Protocol
7 Press Release Materials
7.1 Fact Sheet
7.2 Guest Program
8 Supplemental Tasks
8.1 AASHTO presentation
8.2 Mobile Millennium Planning, Design, and Server Development – Initial Stages
8.3 Mobile Millennium Arterial Modeling – Initial Stages
List of Figures
Figure 3.1: Vehicle trajectories from NGSIM. Shown in red, a flow fraction, , of trajectories are randomly designated as probes.
Figure 3.2: One vehicle trajectory. Parameters are shown for real-time probe data reports.
Figure 3.3: Spatio-temporal dispersion of probe measurements for different combinations of penetration and sampling rates.
Figure 3.4: Vehicle accumulations. Comparison of estimates using the Kalman filter method (top), and Nudging method (bottom). In both cases, the incorporation of Lagrangian data results in improved estimates over those using Eulerian data only.
Figure 3.5: Vehicle accumulations. The higher the sampling rate, the higher the fidelity of the accumulation estimate.
Figure 3.6: Vehicle density estimated at the middle of the modeled expressway section.
Figure 3.7: Travel time estimation for the modeled expressway section.
Figure 3.8: Alternate I-680 site.
Figure 4.1: Driving Patterns and Speed Variations in Highway Traffic.
Figure 4.2: Virtual Trip Line: Privacy-Preserving Traffic monitoring System Architecture. This system was implemented and ran for the entire duration of the 100-vehicle deployment of the Mobile Century experiment.
Figure 4.3: Road networks extracted from Bay Area DLG files (Left) and Trip Lines per road segment in Palo Alto CA (Right).
Figure 4.4: Comparison of the speed measurements recorded from the N95 (dots), the VTLs (boxes) and the vehicle speedometer (circles) as a function of time.
Figure 4.5: Satellite image of the first experiment site I-80 near Berkeley, CA. The red lines represent the locations of the VTLs, the blue squares show the speed recorded by the VTL, and the green squares represent the position and speed stored in the phone log. The brown circles represent the readings from the vehicle speedometer.
Figure 4.6: I880 Highway Segment for Twenty Car Experiment.
Figure 4.7: Experimental Setup in a Car for Twenty Car Experiment.
Figure 4.8: Speed Measurements over Distance.
Figure 4.9: Speed Measurements over Time.
Figure 4.10: Linking prediction on a straight highway section.
Figure 4.11: Minimum Spacing Constraints for Straight Highway Section.
Figure 4.12: Linking attack near an on-ramp.
Figure 4.13: Actual travel times compared with an estimate given by the instantaneous method (30 second aggregation interval).
Figure 4.14: Travel time estimate errors by different sampling intervals using 15 VTLs.
Figure 4.15: Comparison between VTL-based spatial sampling and temporal periodic sampling against the same number of total anonymous samples.
Figure 4.16: Exclusion Area on Test Road Segment. Tracking starts from the point marked by star.
Figure 4.17: Spatial sampling and the benefit of an exclusion zone.
Figure 4.18: Travel time accuracy plotted vs. VTL spacing.
Figure 5.1: Highway US101 S used for the NGSIM dataset.
Figure 5.2: Schematic of the sampling strategy on equipped vehicle n.
Figure 5.3: Vehicle accumulation per cell for a) ground truth, and b) EDO case.
Figure 5.4: Vehicle accumulation (vehicles per cell) estimated using Newtonian relaxation method (left) and Kalman Filtering techniques (right) for scenarios 1 (top), 3 (middle), and 9 (bottom).
Figure 5.5: Total vehicle accumulation on the entire section.
Figure 5.6: Percentage of Improvement (PoI) in the RMSE as the number of Lagrangian measurements increases: (a) computing observed density using the fundamental diagram (Section 5.2.3), (b) using the actual density computed from vehicle trajectories as the observed density. Note that the two graphs are at different scales.
Figure 5.7: True density (computed from vehicle trajectories) versus Observed density (computed using the fundamental diagram as described in Section 5.2.3) for scenario 3 (left) and 9 (right).
Figure 6.1: Greenshields model. Left: Classical fundamental diagram (parabolic). Center: Linear relation between speed and density. Right: Flux function for the LWR-v PDE (6-4). The flux is parabolic with negative values.
Figure 6.2: Paramics velocity contours. Top: Ground truth velocity contour average across all vehicles. Bottom: Estimated velocity contour from the EnKF CTM-v algorithm (6-19) through (6-24) at 5% penetration rate. X-axis: position along highway in milepost; Y-axis: time of day.
Figure 6.3: Error comparison of the EnKF CTM-v scheme, equations (6-19) through (6-24), (solid) and the averaging scheme (6-25), (dashed) using Paramics. Top: Relative error computed from (6-26) as a function of penetration rate. Bottom: Absolute error computed from (6-27) as a function of penetration rate.
Figure 7.1: Source image for resolution analysis.
Figure 7.2: Simulated resolution for three camera types, assuming one camera per lane.
Figure 7.3: Simulated resolution for four camera types, assuming one camera for five lanes.
Figure 7.4: Image of license plate with blurring to simulate 2.0 cm resolution.
Figure 7.5: Comparison of standard vs. HD camcorder image quality.
Figure 7.6: Video frame with fence mesh between lanes.
Figure 7.7: HD Camcorder recording mode.
Figure 7.8: Shutter priority switch.
Figure 7.9: Shutter control vs aperture control.
Figure 7.10: Focus control.
Figure 7.11: Custom image effects.
Figure 7.12: Controls for contrast and sharpness.
Figure 7.13: Camera deployment during MC experiment. Stevenson and I880 (South bridge), Decoto and I880 (Central bridge), Winton and I880 (North bridge).
Figure 8.1: Traffic monitoring infrastructure built for field experiment.
Figure 8.2: Layout of Base Camp on the Union Landing Parking Lot.
Figure 8.3: Tent Layout.
Figure 8.4: Study site and driver routes.
Figure 8.5: Assigning drivers to routes and cars.
Figure 8.6: Drivers logistics.
Figure 9.1: Snapshot of the live traffic feed provided by the system in the present work (and from 511.org in the inset) at 10:52am on February 8, 2008. Traffic conditions after an incident on the northbound direction of I-880 are displayed. Numbers in circles correspond to speed in mph.
Figure 9.2: Vehicle trajectories in the northbound direction extracted from the data stored by 50% of the cell phones. The propagation of the shockwave from the accident can clearly be identified from this plot. The red lines in the close-up were drawn by hand by fitting a line through the points where trajectories change slope.
Figure 9.3: Loop detector locations along the northbound direction. Numbers indicating mileposts increase in the direction of traffic flow from left to right.
Figure 9.4: Velocity fields in mph using: (a) 17 loop detector stations; (b) vehicle trajectories and Edie’s generalized definition; (c) 17 VTLs at the loop detector locations; and (d) 30 equally spaced VTLs.
Figure 9.5: Travel time (in minutes) between Decoto Rd. and Winton Ave. The x-axis indicates arrival time at Decoto Rd. Dots correspond to individual vehicle travel times (4268 in total), collected manually using video. Black dash-dotted lines correspond to the standard deviation of travel times obtained from video cameras in 5-minute windows.
Figure 9.6: Loop detector vs. VTL velocity measurements (all locations). Dotted lines are the 5 mph thresholds.
Figure 9.7: Loop detector and VTL velocity data collected at: (a) milepost 21.3, downstream of Decoto Rd.; (b) milepost 22.5, half-way between Alvarado Blvd. and Alvarado Niles Rd.; (c) milepost 27.3, the most downstream detector near the Winton Ave. exit; and (d) milepost 24.0, downstream of Whipple Rd. Subfigure (e) shows the penetration rate at each of these four locations during the day.
Figure 9.8: Penetration rate map.
Figure 9.9: (a) and (c): Average penetration rate over time at existing detector station locations during the morning and the afternoon. The range is one standard deviation below and over the mean. Traffic flows from left to right. (b) and (d): Histogram of the penetration rate including all the 17 locations during the morning routes and the afternoon routes, respectively.
Figure 10.1: Density field (in vpm) using 17 loop detector stations deployed in the section of interest (obtained through PeMS).
Figure 10.2: Density field (in vpm) using the Newtonian relaxation method (left) and the Kalman Filtering techniques (right) for scenario 1 (top), 2 (middle), and 3 (bottom). For each scenario, the boundary data is provided by loop detectors.
Figure 10.3: Flow comparison at mileposts (a) 21.3 (detector 1), (b) 24 (detector 7), and (c) 25.2 (detector 10).
Figure 10.4: Results of the quantitative analysis for the Newtonian relaxation method and the Kalman Filtering technique for scenario 1 (left), 2 (center), and 3 (right). Results obtained using loop detector data are also included for comparison.
A2.2 Figure 1: The 4 mile section to be used in the 20 cars experiment.
A2.2 Figure 2: Detail of the Alvarado-Niles interchange and the mall.
A2.2 Figure 3: Detail of the Tennyson Rd. interchange.
A2.2 Figure 4: Detail of the SR92 interchange.
A2.2 Figure 5: From I880S and I880N to the parking lot.
A2.2 Figure 6: Lane numbering.
A2.3 Figure 1: From I880S and I880M to the parking lot.
A2.3 Figure 2: From the parking lot onto the freeway.
A2.3 Figure 3: Lane numbering.
A2.3 Figure 4: Turn back at CA-92 (north end of the long loop).
A2.3 Figure 5: Turn back at Alvarado-Niles Rd. (south end of both the long and short loops).
A2.3 Figure 6: Turn back at Alvarado Blvd./Fremont Blvd. in case driver misses exit 23.
A2.3 Figure 7: Turn back at Winton Ave. in case driver misses off-ramp to CA-92W.
A2.4 Figure 1: From I-880S and I-880M to the parking lot.
A2.4 Figure 2: From the parking lot onto the freeway.
A2.4 Figure 3: Lane numbering.
A2.4 Figure 4: Turn back at CA-92 (north end of the long loop).
A2.4 Figure 5: Turn back at W Tennyson Rd. (north end of the short loop).
A2.4 Figure 6: Turn back at Alvarado-Niles Rd. (south end of both the long and short loops).
A2.4 Figure 7: Turn back at Alvarado Blvd./Fremont Blvd. in case driver misses exit 23.
A2.4 Figure 8: Turn back at Winton Ave. in case driver misses off-ramp to CA-92W.
A3 Figure 1: Architecture of mobile probe data collection / travel time calculation system
Mobile Century Final Report for TO 1021 & TO 1029:
A Traffic Sensing Field Experiment Using GPS Mobile Phones

Prepared by:
Anthony D. Patire
Alexandre M. Bayen
Daniel B. Work
Juan C. Herrera
Ryan Herring
Xuegang (Jeff) Ban
Quinn Jacobson
Olli-Pekka Tossavainen
Sebastien Blandin
Christian Claudel
Ali Mortazavi
Steve Andrews
Baik Hoh
Marco Gruteser
Murali Annavaram
Toch Iwuchukwu
Kenneth Tracton

For:
California Department of Transportation
Division of Research and Innovation

California Center for Innovative Transportation
University of California, Berkeley
2105 Bancroft Way, Suite 300, Berkeley, CA 94720-3830
Phone: (510) 642-4522 Fax: (510) 642-0910
http://www.calccit.org

1 Introduction

This final report for TO 1021 and TO 1029 documents the Mobile Century Project. At its heart, it is a narrative of painstaking preparation that culminated in an unprecedented deployment of 100 probe vehicles, and of the subsequent analysis of the collected data. In a larger context, this is a story about how a singular experiment influenced the future paradigm of traffic monitoring.

This chapter begins with a discussion motivating the line of inquiry pursued in this work. A brief historical narrative follows, describing the initial actors, task orders, and subsequent amendments that shaped the outcome of Mobile Century. Within this context, the scope of this report is defined. Key findings are summarized, and their significance is explained. In particular, the most promising directions for future research are identified, and near-term possibilities are explored. The chapter concludes with a description of the organization of this report.

1.1 Motivation

Expanding the scope and coverage of roadway Advanced Traveler Information Systems (ATIS) is a top priority of Caltrans. Supporting statements for more and better traveler information across the state of California have come all the way from the Governor’s office.
ATIS benefits the transportation system for at least two reasons. First, the availability of information enhances the service provided to travelers. Numerous studies reveal that commuters appreciate and value timely information, which reduces their uncertainty and their stress. Second, reliable information can arguably enable travelers to make educated choices about their itinerary, departure time, or even transportation mode, with the result of bringing about system self-management. It remains to be established that system self-management can take place on a large scale and significantly impact network-level operations. However, at a more anecdotal level, information about an accident ahead or a scheduled ramp closure certainly influences driver decisions.

An additional side benefit of ATIS is that it builds the awareness of the traveling public toward Intelligent Transportation Systems (ITS). Such awareness can translate into political support for ITS projects and enable more improvements in the long term.

One of the main pieces of ATIS content is undoubtedly travel time estimates. Travel times on selected itineraries represent information that is easy for the traveling public to understand and process. Travel times can be posted on freeway or arterial Changeable Message Signs (CMS) and reach a very large audience, as is currently done at dozens of locations in the San Francisco Bay Area and in Southern California.

Estimating travel times, either at the present time or into the future, requires large amounts of good-quality traffic data. Traditionally, traffic data is collected by sensors such as inductive loops installed at fixed locations. While this method works well for estimating volume and occupancy, it does not provide accurate travel time information unless the sensor coverage is very dense. Traffic sensors also happen to be expensive to install and maintain.
Therefore, except for some of the busiest corridors in the state, the data collected from fixed sensors is mostly inadequate for travel time estimation. Appraising various methods of collecting traffic data, and specifically travel times, is a definite need for Caltrans. Besides providing the bulk of the content required for ATIS, travel times also represent precious data to Caltrans as a network operator. While travel times alone may not cover the full extent of the department’s traffic data needs, accurate and reliable travel times can be used for both planning and operations purposes.

Over the past few years, a number of private industry vendors have approached Caltrans with solutions to collect travel time data on highways and city arterials. Solutions revolve around two basic concepts and trends. The first trend suggests leveraging new technologies that significantly lower the cost of fixed detection. Both in-pavement technologies such as wireless magnetometers from Sensys Networks, Inc. and off-pavement technologies such as radar-based sensors by Speedinfo, Inc. offer much more attractive price points than inductive loops and make it conceivable to augment detection to a level that would yield accurate traffic maps and travel time estimates.

An alternative concept is to use so-called mobile traffic probes to measure travel times from actual trips. Mobile traffic probes are essentially vehicles that are tagged and tracked along a corridor. This concept can be implemented by toll collection tags and readers, or by automated license plate readers. In either of those two cases, travel times are collected for preset segments of roadway between readers. For instance, the San Francisco Bay Area 511 system relies in large part on data collected from FasTrak readers.

In the past several years, cell-phone based technology has gained momentum as a promising avenue, although previous research and field tests have not been conclusive.
This technology previously relied on positioning provided by cellular networks, which still has to overcome significant challenges. However, the introduction of GPS receiver chips into more and more handsets represents a new opportunity. The prospect of large numbers of GPS-equipped cell phones reporting position and speed with 10 meter / 3 mph accuracy at regular intervals represents a huge leap forward. Yet its implementation requires addressing key questions regarding individual privacy, data ownership, network load, and proper traffic flow estimation techniques.

The emergence of new media for disseminating traffic information, such as in-vehicle telematics displays and GPS-equipped cell phones, could bring about a shift in how travelers perceive and consume traffic information in years to come. As the State Department of Transportation, Caltrans needs to monitor and leverage this paradigm shift.

1.2 Historical Narrative

The present work originated from several distinct sources. One group consisted of scientists at the University of California, Berkeley (UCB) who were investigating applications of mobile sensors. Another group from Nokia Research Center (NRC) in Palo Alto was interested in social networking and location-based services that protect the privacy of participants. These two groups identified traffic monitoring as a potential area of overlap. Supported by a seed grant from the Center for Information Technology Research in the Interest of Society (CITRIS), they began to explore the rich milieu of research questions involving conflicting needs for both data collection and privacy preservation. Soon afterward, Nokia awarded the UCB group another seed grant that was matched by the University of California under the MICRO program.
Independently of the aforementioned groups, another team at the California Center for Innovative Transportation (CCIT) wrote a proposal in December of 2006, entitled “Deployment of value-added mobile traffic probes.” This proposal was funded by Caltrans in January of 2007 under Task Order 1021. Although details of TO 1021 evolved over the years, the spirit remained unchanged. The proposal identifies four crucial technologies for ATIS: (1) GPS, (2) GIS and digital maps, (3) internet networking, and (4) wireless data communications. In addition, the proposal notes that these technologies have achieved levels of maturity and affordability that place the ATIS industry “on the verge of an unprecedented boom.”

Almost prescient in foresight, the proposal speculates that “research under this project may serve in creating a paradigm shift in traffic data collection, from fixed sensors paid for by the state to mobile probes deployed as part of a self-sustaining private business model.” The proposal further notes that “in the past several years, cell-phone based technology has gained momentum as a promising avenue, although previous research and field tests are still not conclusive. Yet the salient point is that a 2-way communication device like a cell phone can provide data about the entire trip of a vehicle, rather than be limited to observations at specific locations. This potentially represents a considerable improvement over fixed readers in terms of both the resolution and the timeliness of the data being collected.”

Although it recognized the potential of cell phones as traffic probes, the original proposal chose the Dash Navigation Unit as its technology. These units each included an accurate GPS receiver, a locally stored comprehensive digital map, a complete historical traffic model for prediction, significant processing capability, and wireless communications built into a special-purpose navigation appliance.
Subsequent to the award, however, the provider of the navigation units withdrew from the project. Under sponsorship from Caltrans, the team from CCIT joined forces with the UCB and NRC groups, who had already formed a strong partnership.

The first readjustment of the research plan was to replace the Dash Navigation Unit with the Nokia N95 smartphone. These smartphones had lesser GPS capabilities and lacked the navigational aids made possible with locally stored historical traffic and digital mapping data. In contrast with the Dash Navigation Unit, the GPS-equipped N95 was more of a general-purpose communication device. Note that the N95 was one of the first smartphones ever developed (before the iPhone) and is regarded as a precursor to today’s participatory-sensing-based traffic monitoring systems. At the time, the limitations of the N95 platform forced the researchers to adopt a more focused and simple deployment plan.

The UCB and NRC groups contributed a substantially improved experimental protocol. For example, the underlying sampling scheme was made privacy-aware, as is further explained in Chapter 4. The privacy studies were performed in collaboration with a team from Rutgers University with expertise in privacy research, and were funded by Nokia. For now, note that this early attention to privacy steered technology development in a direction more appropriate for widespread adoption. In addition, the details of the mobile probe deployment were improved to enable a scientific analysis of results. Rather than using a remote site, a well-studied section of I-880 was chosen to enable rigorous evaluation of the data quality from the GPS-enabled smartphones.

The outstanding success of the 100-vehicle deployment prompted further expansion of the research direction.
Caltrans allocated a supplemental award (TO 1029), marshaling resources to process the enormous amount of data that was collected during the 100-vehicle deployment. In addition, efforts were refocused toward a second major deployment of probe vehicles at a scale at least one order of magnitude larger than before. New partnerships (including subcontracts with Covaluate, an Information Technology Service Provider, and Rensselaer Polytechnic Institute) were forged in accordance with the new directives, which will be the focus of the report on Mobile Millennium, a follow-up to Mobile Century launched soon afterwards.

1.3 Scope

This is the final report for Task Order 1021 and Task Order 1029. The bulk of this work is embodied in what has become known as Mobile Century. In terms of task orders, this refers to the entirety of the work plan for TO 1029 (except for the small section entitled Mobile Millennium Traffic Server Development), and Task 2 as written in Amendment A (replacing the original task order, TO 1021). All other tasks in TO 1021, TO 1029, and subsequent amendments fall into one of three categories:

(1) AASHTO presentation
(2) Mobile Millennium Planning, Design, and Server Development
(3) Mobile Millennium Arterial Modeling

We note here that tasks (2) and (3) above, related to Mobile Millennium, were enormous in scope. The monies allocated from TO 1021 and TO 1029 toward these tasks amounted to a small fraction of the total required to bring them to fruition. Herein we only document the initial stages of work toward these tasks funded by TO 1021 and TO 1029. The ultimate success of tasks (2) and (3) lies outside the scope of Mobile Century, and will be addressed in the forthcoming Final Report for Mobile Millennium (Agreement 65A0301). Tasks (1), (2), and (3), as they are relevant to TO 1021 and TO 1029, are reported in Appendix 8.

1.4 Summary of Findings

Exploring a new paradigm.
Advances in the mobile internet, alongside current trends in cell phone technology and location-based services, place the field of traffic monitoring at the cusp of a new era in data collection. A new paradigm in which GPS-enabled smartphones supply the bulk of raw traffic monitoring data is very promising. Compared to the status quo, the costs of collecting traffic information would be drastically reduced, and coverage could be extended far beyond what is currently feasible with fixed detectors alone. The opportunity for government agencies is significant: the availability of data will empower the traveling public with real-time access to current traffic conditions, while transportation operators will gain access to an unprecedented wealth of information to help them better manage road networks.

A successful proof-of-concept. The successful 100-vehicle deployment presented in this report was conceived as a proof-of-concept for a traffic monitoring system based on GPS-enabled mobile phones. During the experimental deployment, an average penetration rate of equipped vehicles was sustained near 3%, which at the time of the experiment was representative of the 18-month growth forecast for the GPS fleet in the smartphone market.

Traffic reconstructed from smartphone data. Raw data from the GPS-enabled smartphones alone were sufficient to infer traffic features, i.e., to construct an accurate velocity map over time and space. Therefore, probe vehicles deployed during the Mobile Century experiment were evaluated as providing substantial added value. Since ground-truth travel times were recovered by re-identifying vehicles captured on videotape, these results can be asserted with high confidence. We conclude that the quality of data obtainable from present-day smartphones is adequate for useful, real-time traffic applications, such as calculating travel times.

VTL-based monitoring.
As will be explained in Chapter 4, a data sampling approach using Virtual Trip Lines (VTLs) was designed and implemented. The VTL approach combined with a sustained 3% penetration rate of probes provided better data for travel time prediction than that of the PeMS loop detectors spaced at an average distance of 0.35 mi. Furthermore, the use of VTLs provides enough data for traffic monitoring purposes while protecting the privacy of participants. In addition to the privacy benefits, another key advantage of virtual trip lines over physical traffic sensors is the flexibility with which they can be deployed.

Challenges. As a business model, significant challenges remain. For example, participation of the traveling public is crucial for success. In order to create and maintain the desired service quality, a large number of participants must be recruited and sustained. To achieve this, the right incentives for participation are needed. Premature deployment would be counterproductive. One can imagine a worst-case scenario in which a deployment fails for lack of public interest and participation.

Future work. The next iteration of this program includes efforts to extend Mobile Century in a number of ways. First, better methods are required for incorporating data from both static sensors (loop detectors) and mobile sensors (GPS-enabled mobile phones). Second, inverse modeling and data assimilation algorithms aimed at identifying and circumventing potential deficiencies in available data are necessary. Finally, the monitoring of arterials brings additional challenges that also require much future work. These issues will be explored in a follow-up report for Mobile Millennium.

Deployment. Stated simply, any future deployment will require substantial research and development.
We recommend an evolutionary progression of field operational tests, so that lessons learned during any particular iteration may be incorporated into subsequent efforts as the scope of the system is continually scaled up. To make this possible, technology infrastructure must be implemented to support the computational modeling that will be required. In addition, any future deployment effort must include a strong industry component, and proceed in a way that complements existing trends in mobile computing.

Future applications. Future work as a part of this program has direct application toward the strategic goals of Caltrans in the area of operational data collection. Data sharing modalities should be explored between industrial companies, Caltrans, and local public agencies. Although traffic data collected as part of the next iteration will be served back to the mobile phone users who originally generated the data, future iterations may disseminate information much more broadly (broadcast media, internet websites, personal navigation devices, and roadside changeable message signs).

1.5 Organization of Report

The limitations of the status quo, the possibilities enabled by the mobile internet, and the grand-scheme challenges to the new paradigm are introduced in Chapter 2. A substantial literature review is furnished in the broader context of these issues; this discussion applies beyond Mobile Century.

In contrast, Chapter 3 focuses specifically on how to build a medium-scale, one-day deployment of a proof-of-concept system. Efforts to determine the parameters of a workable experiment, and initial back-of-the-envelope calculations, are described, thus setting the stage for everything that follows in this report.

Chapter 4 explores the challenge of building a traffic monitoring system that addresses the goals of (1) acquiring quality real-time probe data from GPS-enabled cell phones, and (2) protecting participants from privacy threats by design.
Initial prototyping was performed to assess the software and hardware systems that were implemented and the quality of data obtained from the smartphones.

Assuming that real-time probe data is available, Chapters 5 and 6 describe algorithms to reconstruct the traffic state from that data. Assimilating Lagrangian data in the density domain is the subject of Chapter 5. An alternate scheme is presented in Chapter 6, in which Lagrangian data is fed directly into a velocity domain model. This velocity domain model was ultimately the one adopted for the real-time, 100-vehicle deployment of February 8, 2008.

Chapter 7 stands alone as a narrative of the video validation effort. The selection of video camcorders, trial tests, deployment protocol, and post-processing procedures are described. This crucial aspect of Mobile Century is what made possible an objective comparison of the state-of-the-art, status quo monitoring system (based on inductive loop detectors) with the proof-of-concept implementation (based entirely on mobile probes).

Chapter 8 gives an overview of the experimental protocol for the 100-vehicle deployment. It explains the resources employed, the procedures established, the data gathered, and the post-processing efforts. Supplementary material, including more detailed logistics, the schedule of execution, and emergency procedures, is furnished in the Appendices.

Chapter 9 presents the experimental results of the 100-vehicle deployment, the cornerstone of Mobile Century. The trajectory data and reconstructed velocity fields are compared with the ground truth supplied by the video cameras. Travel time estimated from loop detectors is compared with that estimated from probe data.

Chapter 10 revisits the density domain data assimilation methods that were introduced in Chapter 5. Additional evaluation of these methods is performed using the data from the 100-vehicle deployment.
This report concludes in Chapter 11 with an evaluation of the project and recommendations for future deployment of the new paradigm for traffic monitoring.

2 Background

This chapter begins with a discussion of the status quo in Section 2.1: traffic monitoring with dedicated equipment and sensing infrastructure. Experience shows that deploying, operating, and maintaining new technology in this paradigm is costly and slow. As explained in Section 2.2, advances in the mobile internet bring forth the potential to leverage current trends in cell phone technology and location-based services to transcend the old paradigm. Section 2.3 discusses the barriers that impede adoption of this new paradigm, in particular technical barriers and social acceptance issues related to privacy concerns.

2.1 Traffic Monitoring with Dedicated Equipment (Road Infrastructure)

Traffic monitoring with inductive loop detector (ILD) systems. ILD systems are the most common highway traffic monitoring tool, and have been in use for decades. The current highway monitoring system consists of wire inductive loops placed directly in the top layer of the pavement. When a vehicle passes over the sensor, it is recorded by a roadside controller. For measuring travel time (the most important performance metric to the driving public), these sensors suffer from some fundamental drawbacks.

ILD velocity estimation is inaccurate. ILDs are accurate sensors for flows (vehicle counts), but they often generate inaccurate velocity measurements. California's freeways are equipped with about 23,000 ILDs embedded in the pavement, accounting for roughly 8,000 detector stations. Several of these stations feature a single inductive loop per lane, which cannot measure vehicle speed directly. Practitioners have attempted to create aggregate velocity estimates using the average length of a vehicle on the highway and the percentage of time the sensor is occupied.
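As a rough illustration of the occupancy-based estimate described above, the sketch below computes an aggregate speed from a single loop's vehicle count and occupancy. It is a minimal sketch and not the method used by PeMS or by the cited studies: the function name, the 30-second aggregation period, and the 20 ft effective vehicle length are all assumptions chosen for illustration.

```python
def single_loop_speed_mph(vehicle_count, occupancy,
                          period_s=30.0, effective_length_ft=20.0):
    """Estimate aggregate speed from one loop's count and occupancy.

    vehicle_count: vehicles counted during the aggregation period
    occupancy: fraction of the period the loop was occupied (0 to 1)
    period_s: aggregation period in seconds (assumed value)
    effective_length_ft: assumed mean vehicle length plus loop length
    Returns the speed in mph, or None when no estimate is possible.
    """
    if vehicle_count <= 0 or not (0.0 < occupancy <= 1.0):
        return None
    # Total time the loop was covered by vehicles during the period.
    occupied_time_s = occupancy * period_s
    # Total effective length that passed over the loop, divided by that time.
    speed_ft_per_s = vehicle_count * effective_length_ft / occupied_time_s
    return speed_ft_per_s * 3600.0 / 5280.0


if __name__ == "__main__":
    # Same count and occupancy, two different assumed vehicle lengths.
    print(single_loop_speed_mph(10, 0.10))
    print(single_loop_speed_mph(10, 0.10, effective_length_ft=24.0))
```

Because the assumed effective length enters the estimate linearly, a 20% error in that length (for example, under a truck-heavy traffic mix) produces a 20% error in the estimated speed.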
Even when the sensor is working properly, these single-loop velocity estimates are particularly noisy (with estimates ranging from 20 mph to 120 mph) for traffic flowing at greater than 50 mph [24]. This has led researchers to develop algorithms to improve these single-loop estimates [24, 49, 64, 83, 93]. In contrast, dual loops (composed of two successive inductive loops) compute velocity by matching the respective occupancy patterns. In practice, they also have been found to produce significant errors [24].

Loop detector stations are expensive to deploy and maintain. The cost of an ILD is roughly $900 to $2,000, depending on the type of the loop. More importantly, the direct and indirect costs of deployment are significant (staff to install the sensors, and the corresponding impact on traffic). According to the PeMS system [91], only 65% of the detectors in California are working properly; the main causes of malfunction are problems with the controller. In [90], malfunction rates of loop detectors and their causes were studied using data obtained from loops on the same stretch of I-880 examined in the present work. The average malfunction rate was 21%, despite significant efforts to maintain system operations during the study.

RFID transponders for travel time measurements. Radio-frequency identification (RFID) transponders are often deployed to collect automatic toll payment, such as FasTrak in California or E-ZPass in some states on the East Coast. These transponders can also be used to obtain individual travel times based on vehicle re-identification [10, 116]. Readers located on the side of the road keep a record of the time the transponder (i.e., the vehicle) crosses that location. Measurements from the same vehicle are matched between consecutive readers to obtain travel time.
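The re-identification step described above can be sketched as follows. This is an illustrative sketch under simple assumptions, not the logic of any deployed toll-tag system: each reader is assumed to log (tag ID, timestamp in seconds) pairs, and the function and variable names are hypothetical.

```python
def travel_times_between_readers(upstream_reads, downstream_reads):
    """Match tag reads at two consecutive readers.

    Each argument is an iterable of (tag_id, timestamp_s) pairs.
    Returns a dict mapping tag_id to travel time in seconds for every
    tag seen at the upstream reader and later at the downstream reader.
    """
    # Keep the earliest upstream sighting of each tag.
    earliest_upstream = {}
    for tag_id, timestamp_s in upstream_reads:
        previous = earliest_upstream.get(tag_id)
        if previous is None or timestamp_s < previous:
            earliest_upstream[tag_id] = timestamp_s

    # Pair it with the first downstream sighting that comes after it.
    travel_times = {}
    for tag_id, timestamp_s in sorted(downstream_reads, key=lambda read: read[1]):
        start = earliest_upstream.get(tag_id)
        if start is None or tag_id in travel_times:
            continue
        if timestamp_s > start:
            travel_times[tag_id] = timestamp_s - start
    return travel_times


if __name__ == "__main__":
    upstream = [("A", 0), ("B", 10), ("C", 20)]
    downstream = [("A", 300), ("C", 350), ("D", 400)]
    print(travel_times_between_readers(upstream, downstream))
```

Tags seen at only one reader (B and D in the usage example) yield no travel time, which reflects why such systems only report on segments bounded by readers and only for vehicles that carry a transponder.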
This RFID-based approach is successful only when drivers have an incentive to carry the transponder (such as shorter toll booth queues), and can only provide travel times between segments where the readers have been deployed.

Travel time measurement through LPR technology. License plate readers (LPRs) consist of high-speed cameras that record the license plates of vehicles on the highway. As a vehicle passes multiple cameras, the travel time between the readers is computed. Although LPRs avoid the need for in-vehicle equipment, these systems are complicated to install, and require an additional camera for each lane of traffic to be monitored. The relatively high cost of the readers (in the $10,000 range plus installation costs) has limited their widespread implementation. Example deployments include Traffic Master’s passive target flow management (PTFM) on trunk roads in the United Kingdom [112], and Oregon DOT’s Frontier Travel Time project [16].

Traffic monitoring with dedicated probe vehicles. Dedicated probe vehicles equipped with a Global Positioning System (GPS) device are capable of collecting information such as position, speed, and travel time. The work in [100] addressed some of the key issues of a traffic monitoring system based on probe vehicle reports, and concluded that they constitute a feasible source of traffic data. The work in [123] also investigated the use of GPS devices as a source of data for traffic monitoring. Two tests were performed to evaluate the accuracy of GPS as a source of velocity and acceleration data. The accuracy level was found to be good, despite limitations of the selective availability¹ feature that was imposed at the time of the study [92].

Deployment of probe vehicles. HICOMP [52] is an example of one small-scale deployment of dedicated probe vehicles using GPS devices to monitor traffic for some freeways and major highways in California.
Unfortunately, dedicated probe vehicles equipped with a GPS device represent an added cost that cannot be applied at a global scale. As pointed out by [74], the penetration of HICOMP is low and the collected travel times are not as reliable as those of other systems such as PeMS. Other approaches have investigated the possibility of using dedicated fleets of vehicles (such as FedEx and UPS trucks, taxis, buses, or other dedicated vehicles) equipped with GPS or automatic vehicle location (AVL) technology to monitor traffic [17, 85, 102]. While industry models such as Inrix have been successful at gathering substantial amounts of historical data using this strategy, the use of dedicated fleets always poses issues of coverage, penetration, and bias due to operational constraints and specific travel patterns. Nevertheless, it appears to be a viable source of data, particularly in large cities.

Footnote 1: Selective availability is the intentional inclusion of positioning error in civilian GPS receivers. It was introduced by the U.S. Department of Defense to prevent these devices from being used in a military attack on the U.S. This feature was turned off on May 1, 2000.

Deployment of dedicated communication systems is slow. One policy intended to enable dedicated communication systems for the transportation network has achieved only limited deployment. On October 21, 1999, the Federal Communications Commission allocated 75 MHz of spectrum as part of the US Department of Transportation’s (DOT) Intelligent Transportation Systems (ITS) US-wide program, with mostly traveler safety, fuel efficiency, and pollution in mind. The first industry-government supported standard followed on August 24, 2001, when ASTM’s E17.51 Standards Committee voted 20-2 to base Dedicated Short Range Communication (DSRC) on a modification of the IEEE 802.11a specification, now named IEEE 802.11p.
At the same time, the US DOT launched a plan that included the deployment of around 250,000 roadside DSRC radios, but only around 100 radios had been deployed across the entire US as of 2008 (mostly in Michigan and California). This example highlights the difficulty of creating a dedicated communication system for the transportation network.

2.2 Growth of the Mobile Internet

Smartphones as sensors of the built environment. The convergence of communication and sensing on multimedia platforms such as smartphones provides the engineering community with unprecedented monitoring capabilities. Smartphones include a video camera, numerous sensors (accelerometers, magnetometers, light sensors, GPS, microphones), wireless communication outlets (GSM, GPRS, WiFi, Bluetooth, infrared), computational power, and memory. With the rise of Android and the iPhone, this trend has greatly accelerated. These phones can be used to listen to the radio, watch digital TV, browse the internet, hold video conferences, scan barcodes, read PDFs, and more. The rapid penetration of GPS in smartphones is enabling device geopositioning and context awareness, which in turn is causing an explosion of Location Based Services (heavily relying on mapping) on the devices. For example, Nokia Maps displays theaters and museums near the phone, Google Mobile provides driving directions from the phone location, and the iPhone Travelocity shows hotels near the phone. Due to their portability, computation, and communication capabilities, smartphones are becoming useful for numerous applications in which they act as sensors moving with humans embedded in the built infrastructure. Large-scale applications range from population migration tracking and traffic flow estimation to physical activity monitoring for assisted living.

The competition for probe traffic data collection as a proxy for the larger war to conquer the mobile internet.
There has been a trend of increased competition between cell phone manufacturers, network providers, internet service providers, computer and software manufacturers, and mapping companies. Following the transition from desktops to laptops to smaller and more portable devices, top companies in these industries are redefining themselves to remain relevant as the internet goes mobile. In the context of traffic monitoring, the examples below show the importance of information technology for transportation systems. In late 2007, Google made a move toward the phone industry with the launch of the Open Handset Alliance and the Linux-based Android platform (leading to the T-Mobile G1 Google phone). In part because of the pressure to use open platforms enhanced by the Google OS, Nokia, which manufactures 40% of the cell phones in the world, purchased Symbian, which licenses the operating system running on more than half of the smartphones in the world. Nokia then established the Symbian Foundation, with the intention of unifying the platform and making it open-source (Apple also partially opened its iPhone OS to software developers with the release of a software development kit). To strengthen its own mapping capabilities, Nokia also bought Navteq, which is the largest mapping company in the world, following personal navigation device manufacturer TomTom’s purchase of Tele Atlas, Navteq’s chief competitor. Navteq in turn owns Traffic.com, one of the leading traffic data collection and broadcast companies. Its competitors include Inrix, which provides traffic data to Microsoft’s web, desktop, and mobile applications.

Smartphones: a transformation from dedicated infrastructure to market-driven technology. The scale at which cell phones are produced, and the rate at which they integrate new technology, is dramatic. The total number of cell phones worldwide exceeded three billion at the time of this project.
Some European countries have a penetration rate of more than 150% (150 cell phones for 100 people), and forecasts project 1 billion smartphones by 2012. Nokia alone produces more than 13 phones a second; with the increasing penetration of GPS in the cellular phone fleet, cell phones will soon constitute one of the major traffic information sources available to the public. In North America and Europe, the overwhelming majority of commuters have a cell phone, potentially populating the entire arterial network with probe traffic sensors. The use of cellular devices as traffic sensors has numerous benefits: (1) it is possible to leverage the market-driven communication infrastructure already in place; (2) the spatio-temporal penetration of cell phones in the transportation network is increasing at an extremely fast pace; (3) the use of cell phones as traffic probes is device and carrier agnostic, leading to faster penetration; and (4) major car manufacturing companies already have cradles and interfaces with cell phones (for example BMW and the iPhone) in their new cars, so the sensing information gathered by modern cars can also be sent to such monitoring systems.

2.3 Barriers to a New Paradigm

New paradigm. The concept of using cell phones as sensors has the potential to usher in a new paradigm for traffic monitoring. Such systems promise to significantly improve the coverage and timeliness of traffic information [5, 56, 58]. Near-term solutions will require GPS measurements to be fused with traditional sources of traffic information such as loop detectors, cameras, and human reports. However, with sufficient penetration, this approach could potentially enable the collection of real-time traffic information over the complete road network, including arterials, at minimal cost for transportation agencies.

Barriers.
Several studies have demonstrated the feasibility of probe-based traffic estimation through analysis, simulations, and experiments [29, 37, 111, 120]. Yet many challenges, both technical and societal, must be addressed.

2.3.1 Technical Barriers

Non-GPS based localization of cell phones is problematic. Multiple technological solutions exist to overcome the localization problem using cell phones. Historically, the seminal approach chosen for monitoring vehicle motion using cell phones (prior to the rapid penetration of GPS in cellular devices) uses cell tower signal information to identify a handset’s location. This technique usually relies on triangulation, trilateration, tower hand-offs, or a combination of these. Several studies have investigated the use of mobile phones for traffic monitoring using this approach [13, 38, 81, 115, 117]. The fundamental challenge in using cell tower information for estimating the position and motion of vehicles is the inherent inaccuracy of the method, which poses significant difficulties for the computation of speed. Several solutions have been implemented to circumvent this difficulty, in particular by the company Airsage, which historically developed its traffic monitoring infrastructure based on cell tower information [80, 104]. Based on the time difference between two positions, average link travel time and speed can be estimated. The study in [119] conducted a field experiment to compare the performance of cell phones and GPS devices for traffic monitoring, and concluded that GPS technology is more accurate than cell tower signals for tracking purposes. In addition, the low positioning accuracy of non-GPS based methods prevents their large-scale use for monitoring purposes, especially in places with complex road geometries.
Also, while travel times for large spatio-temporal scales can be obtained from such methods, other traffic variables of interest, such as instantaneous velocity, are more challenging to obtain accurately.

GPS based localization provides high quality data. Increasing numbers of smartphones and PDAs come with GPS as a standard feature. This technology can provide more accurate location information and thus more accurate traffic data such as speeds and/or travel times. Additional quantities can potentially be obtained from these devices, such as instantaneous velocity, acceleration, and direction of travel. In [38], cell phones are used for traffic monitoring purposes, and the need for GPS-level position accuracy to compute reasonable estimates of travel time and speed is discussed. Furthermore, [118] and [119] concluded that if GPS-equipped cell phones are widely used, they will become a more attractive and realistic alternative for traffic monitoring. GPS-enabled mobile phones can potentially provide exhaustive spatial and temporal coverage of the transportation network when there is traffic, with the high positioning accuracy achieved by a GPS receiver.

Lagrangian vs. Eulerian information. While cellular phones provide an ideal bridge between the physical world (vehicle flows and dynamics on the road) and the information world (software systems monitoring the network), there is one major difference between the data collected by cell phones and the traditional data commonly used to estimate traffic in real time: the data collected by phones in cars is Lagrangian, i.e., gathered along car trajectories, and not Eulerian, i.e., control volume based. This poses major challenges in building an information system for a cyberphysical infrastructure such as the transportation network.
While a static loop detector or a camera (both Eulerian) can easily capture all vehicles going through the space monitored by the sensor, and therefore infer aggregate quantities (flows, counts, local speed), a Lagrangian sensor can only monitor quantities following the vehicle, without direct access to flows, counts, etc.

Distributed models for the transportation network. Because GPS-enabled phones measure velocity, or travel time between two consecutive GPS readings, constitutive models used to describe the evolution of the system need to incorporate these readings and bypass quantities which cannot be measured (density, flows, counts). The development of such flow models for highways and arterials is still in its infancy. Techniques used for this include partial differential equations, queuing systems, and hybrid system models of flow equations.

Machine learning models to circumvent lack of geographical infrastructure information. Knowledge of signage, traffic light presence, and cycle information is difficult to procure. The presence of stop signs and lights, and their effect on traffic, is not available from databases on a US-wide scale. Furthermore, they change too often to be incorporated into flow models. This difficulty has to be circumvented by machine learning algorithms capable of learning the flow features without knowledge of the detailed infrastructure, using techniques such as clustering analysis.

Inverse modeling and data assimilation. In the age of massive data collection, one of the most fundamental theoretical challenges associated with the reconstruction of traffic using mobile data will be the proper use of techniques to incorporate data into flow models or statistical models. The development of these techniques in fields such as oceanography or meteorology is relatively mature.
For large-scale infrastructure systems, the state of modeling, model inversion, and computation is still in its infancy, but promises significant breakthroughs in the near future.

Considerations for initially low penetration of equipped vehicles. As suggested in the literature [72, 94, 117, 118], field tests are needed to assess the potential of new technologies such as GPS-enabled mobile phones. Test deployments to assess the potential of traffic monitoring using cell phones go back to the advent of GPS on phones. In particular, the study of [30] investigates the deployment of 200 vehicles for an extended period of three months and the potential data that can be gathered from it. In light of that study, one of the main issues in experiments or pilot tests is the problem of penetration, i.e., the percentage of vehicles equipped vs. the total number of vehicles on the road.

Real-time, online and robust availability. Unlike the more permanent Eulerian detectors, to which data quality, reliability, and performance indices can be easily attributed, the penetration of cell phones at a given location and time is highly variable. Before this type of monitoring becomes the standard, the participation of the public will be spatially and temporally unpredictable. This means that the algorithms used for estimating traffic must be robust to variability in penetration.

2.3.2 Privacy Issues and Societal Acceptance

Privacy concerns with a new paradigm. Traffic monitoring through GPS-equipped vehicles raises significant privacy concerns, because the external traffic monitoring entity acquires fine-grained movement traces of the probe vehicle drivers. These location traces might reveal sensitive places that drivers have visited, from which, for example, medical conditions, political affiliations, speeding, or potential involvement in traffic accidents could be inferred.
Furthermore, the correlation of this data with existing records poses specific threats to the preservation of privacy.

Example of data granularity. A variety of sampling techniques can be used to collect data from GPS-enabled mobile devices. In the case of the Nokia N95, the embedded GPS chip-set is capable of producing a time-stamped geo-position (latitude, longitude, altitude) once every three seconds. From this time and position data, the instantaneous velocity is produced by the phone at the same frequency. Over time, this vehicle trajectory and velocity information produces a rich history of the dynamics of the vehicle and the velocity field through which it evolves.

Risk of unintended re-identification. While this level of detail is particularly useful for traffic estimation, it can be privacy invasive, since the device is ultimately carried by a single user. Even if personally identifiable information in the data is replaced with a randomly chosen ID through a process known as pseudo-anonymization, it is still possible to re-identify individuals from trajectory data. For example, pseudo-anonymous trajectories have been combined with free, publicly available data sets to determine the addresses of participants’ homes [54].

Data value: sensitivity vs. utility. The transmission of high frequency data without regard to location also wastes resources throughout the system, which can pose scalability problems. In addition to disclosing sensitive information, the trajectory information on small roadways near users’ homes is of lower value to the general commuting public than that on major thoroughfares such as interstates. Thus, collection of low utility and highly sensitive data should be avoided when sampling using mobile devices.

Spatially aware sampling and privacy. At the heart of such a system, privacy-by-design sampling techniques must be used to prevent privacy invasion.
In addition to proper anonymous data collection and encryption, sampling the vehicles at locations which are privacy safe is key to ensuring the ongoing participation of the public that is needed for such a system.

Disincentives for participation. Realistically, future users will have the option to choose the terms under which they share location information. Without tangible benefits, or safeguards to ensure that an acceptable level of privacy can be guaranteed, adoption will not be widespread among the traveling public.

Current studies regarding privacy concerns are inadequate. Traffic monitoring applications based on a large number of probe vehicles have recently received much attention [21, 57, 120]. None of these works have addressed location privacy concerns in such systems. Since most traffic monitoring applications do not depend on specific identification information about probe vehicles, the anonymization of sensing information has been a solution in practical deployments [58, 59, 110]. Not surprisingly, recent analyses of GPS traces [47, 71, 73] have shown that naive anonymization by simply omitting identifiers from a location dataset does not guarantee anonymity. Unique parts of GPS traces may be exploited to re-identify individuals using multi-target tracking, k-means clustering, or fingerprinting approaches to identify computer systems.

Centralized architectures for privacy protection. Therefore, several stronger protection mechanisms have been investigated. The k-anonymity concept [99, 108] provides a guaranteed level of anonymity for a database, although some recent studies [69, 82] have identified weaknesses. For location services, the k-anonymity concept has led to the development of centralized architectures that temporally and spatially cloak location-based queries [41, 46, 84]. The present work, in comparison, concentrates on providing privacy without requiring a single trustworthy entity.
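Spatial cloaking under the k-anonymity concept can be illustrated with a short sketch. The version below is a simplification under stated assumptions, not the algorithm of [41, 46, 84]: a trusted server is assumed to know all user positions (in meters, on a hypothetical planar grid), and it reports the smallest grid-aligned square, taken from a doubling sequence of cell sizes, that contains the querying user and at least k users in total. The function name `cloak_region` and all parameter values are illustrative.

```python
import math


def cloak_region(query, users, k, side=100.0, max_side=102400.0):
    """Return a grid-aligned square (xmin, ymin, xmax, ymax) holding >= k users.

    query    -- (x, y) position of the requesting user, in meters
    users    -- list of (x, y) positions known to the (assumed trusted) server,
                including the requesting user
    k        -- required anonymity level
    side     -- initial grid cell size in meters (hypothetical default)
    max_side -- largest cell size tried before giving up
    """
    while side <= max_side:
        # Grid cell that contains the query at the current resolution.
        cx = math.floor(query[0] / side)
        cy = math.floor(query[1] / side)
        # Count users that fall in the same grid cell.
        count = sum(
            1
            for (x, y) in users
            if math.floor(x / side) == cx and math.floor(y / side) == cy
        )
        if count >= k:
            return (cx * side, cy * side, (cx + 1) * side, (cy + 1) * side)
        side *= 2  # coarsen the grid and try again
    return None  # no region up to max_side satisfies k-anonymity


users = [(10.0, 10.0), (150.0, 20.0), (30.0, 170.0), (900.0, 900.0)]
region = cloak_region((10.0, 10.0), users, k=3)
```

Because the region is aligned to a fixed grid rather than centered on the user, its geometry alone does not reveal the exact query position. The sketch still depends on a single trusted server, which is precisely the requirement the present work seeks to avoid.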
Other best effort approaches require a trustworthy server. There are many best effort approaches [15, 65] that degrade information in a controlled way before releasing it. These approaches can be implemented in a centralized or a decentralized architecture. Many best effort approaches successfully preserve the privacy of users in high density areas, but they do not guarantee privacy regardless of user density and user behavior pattern. [55] proposes the uncertainty-aware path cloaking algorithm to provide guaranteed privacy regardless of user density, but this again requires the existence of a trustworthy privacy server.

Perturbation and access control. Anonymous communication systems (e.g., onion routing [31, 43]) use a similar approach of distributing knowledge over several mixes. Random perturbation approaches for privacy-aware data mining [6, 7], which perturb the inputs collected from users to preserve the privacy of data subjects while maintaining the quality of the data, are not applicable to time-series location data: noise with large variance does not preserve sufficient data accuracy, while noise with small variance may be filtered out by tracking algorithms due to the spatio-temporal nature of the data [70]. Access control methods [39, 121] restrict access to data to permitted users. However, these techniques do not fully address the dishonest insider challenge. Further, they are not applicable to business models where the aggregated data is transferred to a third party.

3 Design Challenges

This chapter furnishes an overview of how practical requirements shaped the one-day experimental deployment of probe vehicles. Logical steps are retraced, and initial back-of-the-envelope calculations are described. The goal is to design an experiment to capture the essence of what a new traffic monitoring paradigm might look like. How might one address the key barriers outlined in Chapter 2?
What constitutes a proof-of-concept? How might one build a miniature version of a near-term deployment? Projections suggested that in five to ten years, a substantial fraction of cell phones will be equipped with GPS receivers. In such a world, how accurately can traffic conditions be reconstructed in real-time with only GPS-enabled cell phone data? The experiment was designed to answer these types of questions.

Achieve desired penetration rate by design. A survey of previous work revealed that one key issue of smartphone-based systems is the penetration rate. The penetration rate is defined as the fraction of vehicles (by flow) that act as probes to provide traffic data. The traffic state cannot be estimated to any useful precision when data is too sparse, i.e., when the penetration rate is too low. From a traffic monitoring perspective, any successful experiment must maintain a penetration rate above some threshold. Previous studies reported that data coming from about 3% to 5% of the total flow are sufficient to obtain accurate estimates of the travel time [100, 115, 119]. This penetration rate is also realistic as a near-term future possibility. The present work distinguishes itself from previous studies in that a sufficiently high penetration rate was achieved by the design approach described in this chapter.

Addressing barriers to the new paradigm. Questions motivated by the discussion of Chapter 2 are reiterated here. The quality of data acquired from a privacy-aware sampling scheme is to be investigated. Is the information content of this data appropriate for reconstructing traffic states in practice? Are the algorithms employed able to reconstruct traffic states with adequate precision? How does the experimental system compare with a state-of-the-art system using ILDs? Specifically, how do both systems compare with ground-truth travel times, and how might one acquire that ground truth?
3.1 Preliminary Investigation

Confirm necessary penetration rate with NGSIM data. Trajectories from the NGSIM² data set provide accurate ground truth for all vehicles traveling along a 2000 ft stretch of expressway for a duration of 45 minutes. Measures such as vehicle accumulation and exact travel times can be calculated directly from the ground truth. The problem to be solved is to sample only a limited amount of information from the original data set, reconstruct the traffic flow, and estimate vehicle accumulations and travel times based on the reconstruction. The estimated accumulations and travel times are compared with the ground truth, and errors are quantified.

² http://ngsim.camsys.com/

Consider Kalman Filtering and Newtonian relaxation algorithms. The accuracy of the estimates is limited by two factors: the quantity of information, and the algorithms for traffic reconstruction. At this early planning stage, sampling was assumed to occur at a fixed rate in time (temporal based sampling), and modeling was assumed to be performed in the density domain. Two algorithms were considered, Kalman Filtering and Newtonian relaxation (the latter is also called the “nudging method,” borrowed from oceanography). These two methods are described in detail in Chapter 5, and further evaluated in Chapter 10. This chapter furnishes a description of strictly preliminary findings used during the initial design stages of the experiment.

Figure 3.1: Vehicle trajectories from NGSIM. Shown in red, a flow fraction, θ, of trajectories are randomly designated as probes.

Problem formulation. The problem is posed in the following way. Assume that a fraction θ of the vehicles on the expressway (the penetration rate) are equipped with GPS-enabled phones. In Figure 3.1, these equipped vehicles, shown in red, flow along with general expressway traffic, shown in blue. An example trajectory from one probe vehicle is displayed in Figure 3.2.
It is assumed that the equipped vehicles can calculate average velocity over a time period t. In addition, the probes report their velocity and position once every T seconds. The penetration rate θ and the sampling interval T determine the quantity of data available to the reconstruction algorithms. For each of the four test scenarios defined in Table 3.1, the number of observation samples arising from each parameter set is listed.

Figure 3.2: One vehicle trajectory. Parameters are shown for real-time probe data reports.

Table 3.1: Test cases for traffic reconstruction.

Case   θ (%)   T (sec)   t (sec)   # of Lag. observations
1      5       600       30        117
2      5       10        10        773
3      20      600       30        417
4      20      10        10        2577

Reconstruct traffic using measurements distributed in space and time. For the sake of this discussion, it is assumed that a communication network has been implemented to collect data from multiple cell phones and to deliver it to some central server where the algorithms will be run. The observations are dispersed in space and time as shown in Figure 3.3. The immediate goal is to reconstruct traffic flow based on these data. For this purpose, the expressway is discretized and the cell transmission model is employed to solve³ the conservation partial differential equation (PDE), the underlying model for this entire study.⁴

Figure 3.3: Spatio-temporal dispersion of probe measurements for different combinations of penetration and sampling rates.

Solving the PDE. Assuming only initial and boundary conditions, the PDE can be solved. This solution is referred to as the EDO solution, for Eulerian data only. Typically, the EDO solution has poor accuracy at spatio-temporal regions far from the initial and boundary conditions. As the PDE solver steps through time, probe data becomes available. This data needs to be incorporated into the solution so as to improve accuracy. Physically, this corresponds to adding or deleting vehicles in the cells so that the solution agrees with the additional data.

³ Methods are described in detail in Chapter 5.

⁴ Based on traffic flow physics, [79] and [97] independently proposed a first order partial differential equation (referred to as the LWR PDE) to describe traffic evolution over time and space.

Incorporating Lagrangian data in the solution. Two algorithms were used to incorporate the Lagrangian data into the traffic reconstruction. The results were used to calculate measures comparable to the ground truth. Figure 3.4 shows how estimates of vehicular accumulations are improved as Lagrangian data is incorporated. Figure 3.5 shows that as more Lagrangian data is incorporated, the performance of the reconstruction algorithm improves. Figure 3.6 displays estimates of vehicular density at the middle of the modeled expressway section. Figure 3.7 displays estimates of travel time. Details of the mathematical tools employed to generate the estimates shown in these figures are presented in Chapter 5.

Figure 3.4: Vehicle accumulations. Comparison of estimates using the Kalman filter method (top), and Nudging method (bottom). In both cases, the incorporation of Lagrangian data results in improved estimates over those using Eulerian data only.

Figure 3.5: Vehicle accumulations. The higher the sampling rate, the higher the fidelity of the accumulation estimate.

Figure 3.6: Vehicle density estimated at the middle of the modeled expressway section.

Figure 3.7: Travel time estimation for the modeled expressway section.

Discussion of results. The algorithms used here, and described later in the report, incorporate Lagrangian data in the modeling, and the accuracy of the estimates improves as more Lagrangian data are made available.
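The idea of stepping a discretized conservation model through time and correcting it with probe data can be sketched as follows. This is a minimal illustration and not the report's implementation (the methods actually used are described in Chapter 5). It assumes a triangular fundamental diagram, hypothetical parameter values, and that each probe report has already been converted into a density observation for one cell. The function names `ctm_step` and `nudge`, and the relaxation gain, are assumptions made for the example.

```python
def ctm_step(k, dt, dx, vf, w, kj, inflow):
    """Advance cell densities k (veh/m) by one cell transmission model step.

    Assumes a triangular fundamental diagram with free-flow speed vf (m/s),
    backward wave speed w (m/s) and jam density kj (veh/m).
    Stability requires vf * dt <= dx.
    """
    qmax = vf * w * kj / (vf + w)                    # capacity (veh/s)
    send = [min(vf * ki, qmax) for ki in k]          # flow each cell can send
    recv = [min(qmax, w * (kj - ki)) for ki in k]    # flow each cell can receive
    n = len(k)
    flux = [min(inflow, recv[0])]                    # upstream boundary
    flux += [min(send[i - 1], recv[i]) for i in range(1, n)]
    flux.append(send[n - 1])                         # free outflow downstream
    # Conservation of vehicles: density change = (inflow - outflow) * dt / dx.
    return [k[i] + dt / dx * (flux[i] - flux[i + 1]) for i in range(n)]


def nudge(k, observations, gain, dt):
    """Newtonian relaxation: pull observed cells toward the observed density.

    observations maps a cell index to an observed density (veh/m).
    gain (1/s) is a hypothetical tuning parameter with gain * dt <= 1.
    """
    corrected = list(k)
    for cell, k_obs in observations.items():
        corrected[cell] += gain * dt * (k_obs - corrected[cell])
    return corrected


# Hypothetical setup: five 100 m cells, 2 s time step, no upstream inflow.
k = [0.02] * 5
k = ctm_step(k, dt=2.0, dx=100.0, vf=30.0, w=6.0, kj=0.15, inflow=0.0)
# One probe-derived observation in cell 2 suggests denser traffic there.
k = nudge(k, {2: 0.08}, gain=0.05, dt=2.0)
```

The relaxation step moves the model state only part of the way toward each observation, which corresponds to adding or removing a fraction of the vehicles needed for the cell to agree with the probe data.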
Based on preliminary findings, Kalman Filtering slightly outperforms the nudging method. For this reason, Kalman Filtering was eventually chosen for the actual probe deployment. As indicated above, more details on these methods are furnished in Chapter 5 and Chapter 10.

Sampling rate issues. There is a clear design trade-off between penetration rate and sampling rate. The lower the penetration rate, the greater the necessary sampling rate for a given necessary quantity of data. Of course, a greater sampling rate can be more privacy intrusive. This issue leads to yet another trade-off between quality of data for traffic monitoring and privacy preservation; these issues are discussed in detail in Chapter 4.

3.2 Scale of Experiment

Scale of experiment. An experiment with ten vehicles would be too small, and an experiment with 1000 vehicles would not be feasible. To a first order, the scale of the experiment was a function of available funding. How many vehicles can be rented, and how many drivers can be employed within the funding constraints? For clarity of exposition, we assume an experiment involving 100 vehicles, and investigate the consequences of this choice in terms of logistics and experiment design.

Formulation of constraints. Assuming that probe vehicles circulate in loops along an expressway, the cycle time, C, is given by:

$$C = L\left(\frac{1}{v_1} + \frac{1}{v_2}\right) + T_1 + T_2 \qquad (3\text{-}1)$$

where L is the length of the section (one way), $v_i$ is the average speed for an equipped vehicle in direction i, and $T_i$ is the lost time incurred when exiting the expressway and entering again in the opposite direction. The specification of a minimum penetration rate, θ, places a constraint on the number of equipped vehicles, N, the maximum flow in either direction, F, and the cycle time, C, as

$$\frac{N}{C F} \ge \theta \qquad (3\text{-}2)$$

Intuitively, the constraint is met when N stays high and F stays low. Constraints may also be formulated in terms of occupancy.
For example,

$$\frac{N G}{2 L r} \ge \theta \qquad (3\text{-}3)$$

where G is the average effective vehicle length, r is the occupancy, and θ is the minimum penetration rate. Assuming G = 20 ft, θ = 0.2 and N = 100, the maximum feasible expressway length can be calculated. In free-flow conditions near capacity, one might expect r = 0.1, and L = 9.5 miles. With congestion in both directions, one might expect r = 0.3, and L = 3.2 miles.

Consequences of 100 vehicles. A moderately scaled experiment with 100 participating probe vehicles has consequences. Chief among these is that a base station becomes necessary for staging purposes. The base camp requires significant space, i.e., a large parking lot such as one might find at a shopping center. In addition, an experiment of this size requires non-trivial logistical support. The number of participating probe vehicles, together with the penetration requirement, determines the feasible area that can be monitored.

Capture free flow and congestion conditions by design. To be a feasible paradigm for traffic monitoring, it is crucial to show that the methods and algorithms will work both in congestion and in free flow. For this purpose, a location must be chosen that can serve as a dependable source of recurrent congestion.

Variable spatial zone vs. variable number of probes. As shown above, the monitoring capability with a fixed number of vehicles depends greatly on the traffic conditions to be monitored. If the spatial zone of the experiment were constant, then one would need a variable number of probe vehicles as traffic conditions vary. Instead, it was proposed to define the spatial zone of the experiment to correspond appropriately with the expected traffic conditions during the day.

Compare status quo monitoring with capabilities of new paradigm. Another requirement is to make a fair comparison with state-of-the-art traffic monitoring systems such as 511.org that make use of ILDs. For this purpose, an expressway segment with good detector coverage is necessary.
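As a numerical check on the sizing constraints (3-1) to (3-3) of Section 3.2, the short sketch below evaluates them directly. The function names and the symbol `theta` for the minimum penetration rate are choices made for this illustration. The occupancy example uses the values quoted in the text (G = 20 ft, penetration 0.2, N = 100), while the cycle-time example uses hypothetical speeds, lost times, and flow chosen only to show the calculation.

```python
FT_PER_MILE = 5280.0


def cycle_time(L, v1, v2, T1, T2):
    """Cycle time (3-1): travel both directions plus turnaround lost times.

    Units must be consistent, e.g. L in miles, v in mph, T in hours.
    """
    return L * (1.0 / v1 + 1.0 / v2) + T1 + T2


def min_probes(theta, C, F):
    """Smallest N satisfying (3-2), N / (C * F) >= theta."""
    return theta * C * F


def max_length_ft(N, G, theta, r):
    """Largest L (ft) satisfying (3-3), N * G / (2 * L * r) >= theta."""
    return N * G / (2.0 * theta * r)


# Values from the text: G = 20 ft, theta = 0.2, N = 100.
L_free = max_length_ft(100, 20.0, 0.2, 0.1) / FT_PER_MILE       # r = 0.1
L_congested = max_length_ft(100, 20.0, 0.2, 0.3) / FT_PER_MILE  # r = 0.3

# Hypothetical loop: 6 miles one way, 60 mph and 30 mph, 3 min lost per turn.
C = cycle_time(6.0, 60.0, 30.0, 0.05, 0.05)   # hours
N_needed = min_probes(0.05, C, 2000.0)        # 5% of 2000 veh/h

print(round(L_free, 1), round(L_congested, 1))
```

Rounded to one decimal place, the two lengths come out at 9.5 and 3.2 miles, in agreement with the figures quoted in the text.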
In addition, both systems must be benchmarked with ground-truth travel time data. Therefore, it was necessary to instrument the endpoints of the expressway section with video cameras capable of re-identifying vehicles.

Effective measurement, without disrupting traffic, by design. Previous work with Caltrans on a PARAMICS model for I-880 was leveraged to refine the experimental protocol. With this model, the incremental effect of adding 100 extra cars onto I-880 was investigated. The question was whether the additional 100 vehicles would cause an undue influence on the data to be gathered. The simulation made clear that, with 100 extra vehicles traveling in loops, on-ramps or off-ramps could become oversaturated. One outcome of the simulation was to break up the 100 vehicles into three separate groups with three separate routes. In this way, the probe vehicles would sense traffic, but not unduly affect the traffic being measured.

Privacy preservation. Having three routes in which vehicles may pass each other improves the data set for the purposes of investigating the privacy threats inherent in the data gathering mechanisms. This issue was of particular concern for one of the corporate partners, Nokia. With only one circuit, and with few opportunities for vehicles to pass each other, re-identification becomes trivial. Three circuits provide for overtaking and mixing, as would be the case in less contrived driving situations.

3.3 Practical Considerations

Legal requirements affecting experimental protocol. As the experiment began to take shape, the formal role of the drivers to be hired needed further clarification. In particular, it was necessary to understand how federal regulations and policies for the protection of human subjects might impact the experimental protocol.

Office for the Protection of Human Subjects.
It is essentially for this purpose that the Office for the Protection of Human Subjects (OPHS) exists at the University of California, Berkeley. This office coordinates with the Committee for Protection of Human Subjects (CPHS), which comprises two groups, CPHS-1 and CPHS-2, that serve as Institutional Review Boards (IRBs). These IRBs ensure the protection of the rights and welfare of all human participants in research conducted by university faculty, staff, and students.

Initial protocol to comply with federal and university requirements. In particular, a CPHS Narrative Form detailing the proposed protocol was submitted for approval. This document is included as Appendix 1 in its entirety. Although the final protocol for the February 2008 experiment differs from what was proposed in March of 2007, Appendix 1 is useful as a historical document that captures a snapshot of the evolving experiment that came to be known as Mobile Century.

Practical considerations. The driver schedule was defined based on the most restrictive truck driver regulations worldwide, stipulating at least 45 minutes of rest for every 4 hours of driving. In addition, to limit the total number of hours worked, it was deemed infeasible to begin the experiment early enough in the morning to capture both the morning rush and the evening rush. Instead, only the evening rush was intended to be captured.

Institutional collaboration. In the collaborative model for the present work, industry provided implementation expertise. The NRC team was tasked with building the back-end infrastructure, the cell phone client, the real-time sampling scheme, the servers to get data off the phones, and the software to visualize the results. The role for academia was to provide new processing techniques, scientific expertise, and algorithm development.
The UCB team was to perform the traffic experiment design, propose a suitable site, and build the algorithms to estimate traffic conditions. The UCB team consisted of about twenty to thirty people during the evolution of the experiment design. Meetings were regularly held between members of both teams to coordinate the details of data handling. In particular, data from the cell phones needed to be supplied in a usable form for the UCB team's algorithms and then passed back to the NRC team's software for visualization.

Choosing a date. When choosing a date for the experiment, the key concern was to find a part of the semester when graduate students would be available to participate. Weather was also a concern, but avoiding finals week and vacation weekends was crucial. Taking these factors into account, the second week of February was chosen as the target date.

3.4 Alternate Site

A site in Danville, pictured in Figure 3.8, was initially envisioned as a possible venue for the 100-car deployment due to its proximity to Berkeley and ease of access. However, detailed inspection of the traffic properties along this site revealed that the presence of recurrent congestion was not reliable. In addition, this site suffered from a lack of reliable PeMS coverage available at that time. The I-880 site eventually chosen for the 100-car deployment had excellent PeMS coverage, which proved to be crucial for analysis purposes.

Figure 3.8: Alternate I-680 site.

4 VTL Traffic Monitoring

This chapter begins with an overview of privacy risks in Section 4.1. To address these risks, the VTL concept is introduced in Section 4.2 as a spatial sampling scheme. In Section 4.3, a traffic monitoring architecture is designed and implemented according to the VTL scheme. Next, two prototyping efforts are described. The data collected in these small-scale deployments are examined briefly in Section 4.1.
The privacy risk and the quality of travel time estimation are functions of the number and placement of VTLs. This trade-off is explored in Section 4.2. Privacy is addressed by defining exclusion zones and a minimum spacing. Travel time quality is addressed by optimal placement of VTLs. The VTL scheme is compared against a temporal sampling strategy. In Section 4.3, the tradeoff between privacy and data quality is further explored. Key results are summarized in Section 4.4.

4.1 Privacy Risks and Threat Model

Privacy concerns for participating drivers. Traffic monitoring through GPS-equipped cell phones raises significant privacy concerns for participating users. Social acceptance of such monitoring is less likely if location traces are detailed enough to infer medical conditions, political affiliations, speeding, or involvement in traffic accidents.

Threat Model and Assumptions. The present work assumes that adversaries can compromise any single infrastructure component to extract information and can eavesdrop on network communications. We assume that different infrastructure parties do not collude and that a driver's own handset is trustworthy. We believe this model is useful in light of the many data breaches that occur due to dishonest insiders, hacked servers, stolen computers, or lost storage media (see [4] for an extensive list, including a dishonest insider case that released 4500 records from California's FasTrak automated road toll collection system). These cases usually involve the compromise of log files or databases in a single system component and motivate our approach of ensuring that no single infrastructure component can accumulate sensitive information.

Naive anonymization is insufficient to protect privacy. We consider sensitive information to be any information from which the precise location of an individual at a given time can be inferred.
Traffic monitoring requires at least aggregated statistics from a large number of probe vehicles, but does not require individual node identities. Therefore, one obvious privacy measure would be to anonymize the location data by removing identifiers such as network addresses. This approach is insufficient, however, because drivers can often be re-identified by correlating anonymous location traces with identified data from other sources. For example, home locations can be identified from anonymous GPS traces [54, 73], which may be correlated with address databases to infer the likely driver. Similarly, records on work locations or automatic toll booth records could help identify drivers. Even if anonymous point location samples from several drivers are mixed, it is possible to reconstruct individual traces because successive samples from the same vehicle inherently share a high spatio-temporal correlation. If overall vehicle density is low, samples that are close in time and space likely originate from the same vehicle. This approach is formalized in target tracking models [96].

Example formulation of a tracking model. As an example of tracking anonymous samples, consider the following problem: given a time series of anonymous location and speed samples mixed from multiple users, extract a subset of samples generated by the same vehicle. Toward this end, an adversary can predict the location of the next sample, $x_{t+\Delta t} = x_t + v_t \, \Delta t$, based on the reported speed of the previous sample, where $x_t$ and $x_{t+\Delta t}$ are locations at time $t$ and $t + \Delta t$, respectively, and $v_t$ is the reported speed at $t$. The adversary then associates the prior location sample with the next sample closest to the prediction, or more formally with the most likely sample, where likelihood can be described through a conditional probability $P(x_{t+\Delta t} \mid x_t)$ that primarily depends on spatial and temporal proximity to the prediction.
The probability can be modeled through a probability density function (pdf) of distance (or time) differences between the predicted sample and an actual sample (under the assumption that the distance difference is independent of the given location sample).

Speed patterns correlate with route choice, and provide clues to an adversary. Knowing speed patterns further helps tracking anonymous location samples if it is combined with map information. For example, consider the traffic scenarios depicted in Figure 4.1. On straight sections (a), vehicles on high-occupancy vehicle (HOV) or overtaking lanes often experience lower variance in speed. Vehicles entering at an on-ramp (b) or exiting after an off-ramp (c) usually drive slower than main road traffic. These general observations can be formally introduced into the tracking model by assigning an a priori probability derived from the speed deviations. For example, to identify the next location sample after an on-ramp for a vehicle that generated $x_t$ on the main route before the ramp, an adversary could assign a lower probability to location samples with low speed. These low speed samples are likely generated by vehicles that just entered after the ramp.

Privacy Metrics. As observed in [55], the degree of privacy risk depends on how long an adversary successfully tracks a vehicle. Longer tracking increases the likelihood that an adversary can identify a vehicle and observe it visiting sensitive places. We thus adopt the time-to-confusion [55] metric and its variant distance-to-confusion, which measure the time or distance over which tracking may be possible. Distance-to-confusion is defined as the travel distance until tracking uncertainty rises above a defined threshold. Tracking uncertainty is calculated separately for each location sample in a trace as the entropy $H = -\sum_i p_i \log p_i$, where the $p_i$ are the normalized probabilities derived from the likelihood values described later.
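
As an illustration of the tracking formulation and the uncertainty measure above, the following minimal Python sketch predicts the next position from a reported speed, scores candidate samples, and computes the entropy of the normalized scores. It is not the authors' implementation: the one-dimensional position along the roadway, the exponential likelihood with its `scale` parameter, the base-2 logarithm, and all function names are assumptions made for this example.

```python
import math


def predict_next(x_t, v_t, dt):
    """Predicted position after dt seconds: x_t + v_t * dt (1-D, assumed)."""
    return x_t + v_t * dt


def likelihoods(predicted, candidates, scale):
    """Score each candidate position by its distance from the prediction.

    The exponential decay is an assumed stand-in for the pdf of distance
    differences mentioned in the text.
    """
    return [math.exp(-abs(c - predicted) / scale) for c in candidates]


def tracking_entropy(likes):
    """Entropy H = -sum(p_i * log2(p_i)) of the normalized likelihoods."""
    total = sum(likes)
    probs = [value / total for value in likes]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    pred = predict_next(100.0, 30.0, 3.0)  # 100 m, 30 m/s, 3 s
    scores = likelihoods(pred, [180.0, 200.0], scale=50.0)
    print(pred, tracking_entropy(scores))
```

With two equally plausible candidates the entropy is 1 bit; with a single candidate it is 0, which corresponds to a vehicle that can be followed without ambiguity.
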
These likelihood values are calculated for every location sample generated within a temporal and spatial window after the location sample under consideration.

Figure 4.1: Driving Patterns and Speed Variations in Highway Traffic.

4.2 Preserving Privacy with Virtual Trip Lines

We introduce the concept of virtual trip lines (VTLs) for privacy-preserving monitoring and describe an architecture that embodies it.

4.2.1 Design Goals

Tradeoff between quality information and privacy protection. The big-picture challenge is to balance the tradeoff between two conflicting requirements. On one hand, quality traffic information needs to be acquired from each smartphone client. On the other hand, all gathered information must be limited and structured in such a way that it is unwieldy, or difficult, to exploit for unintended purposes. In particular, we address issues of privacy invasion that have the potential to hinder widespread social acceptance of such a system.

Privacy. We aim to achieve privacy protection by design so that the compromise of a single entity, even by an insider at the service provider, does not allow the identifying or tracking of users.

Data Integrity. The system should not allow adversaries to insert spoofed data, which would reduce the data quality of traffic information. This is especially challenging because it conflicts with the desire for anonymity.

Smartphone Client. The client software must cope with the resource constraints of current smartphone platforms. For energy consumption, we mainly focus on designing a light-weight component that filters noisy GPS samples and computes trip-line measurements.

4.2.2 Virtual Trip Line Concept

Definition of VTL. The proposed traffic monitoring system builds on the concept of virtual trip lines and the notion of separating the communication and traffic monitoring responsibilities (as introduced in [54]).
A VTL is a line in geographic space that, when crossed, triggers a client to send a VTL sample to the traffic monitoring server. More specifically, it is defined by:

$$[\mathit{vtlid},\, x_1,\, y_1,\, x_2,\, y_2,\, d] \qquad (4\text{-}1)$$

where $\mathit{vtlid}$ is the virtual trip line ID, $x_1$, $y_1$, $x_2$, and $y_2$ are the $(x, y)$ coordinates of two line endpoints, and $d$ is a default direction vector (e.g., N-S or E-W). When a vehicle traverses the trip line, its VTL sample comprises time, trip line ID, speed, and the direction of crossing. The trip lines are pre-generated, downloaded, and stored in clients.

Spatial sampling preferred over temporal sampling. Virtual trip lines control disclosure of location data by sampling in space rather than sampling in time, since clients generate VTL samples at predefined geographic locations, compared to sending samples at periodic time intervals. The rationale for this approach is that in certain locations traffic information is more valuable and certain locations are more privacy-sensitive than others. Through careful placement of trip lines the system can thus better manage data quality and privacy than through a uniform temporal sampling interval. In addition, the ability to store trip lines on the clients can reduce the dependency on trustworthy infrastructure for coordination. These concepts are revisited in Section 4.2.

4.2.3 Architecture for Probabilistic Privacy

Strict separation of identity information (for communication) and location information (for traffic monitoring). To achieve the anonymization of VTL samples from clients while authenticating the sender of VTL samples, we split the actions of authentication and data processing onto two different entities, an ID proxy server and a traffic monitoring server. By separately encrypting the identification information and the sensing measurements (i.e., trip line ID, speed, and direction) with different keys, we prevent each entity from observing both the identification and the sensing measurements.
Overview of system architecture. Figure 4.2 shows the resulting system architecture eventually implemented for the field experiment. It comprises four key entities: probe vehicles with the cell phone handsets, an ID proxy server, a traffic monitoring service provider, and a VTL generator. Each probe vehicle carries a GPS-enabled mobile handset that executes the client application. This application is responsible for the following functions: downloading and caching trip lines from the VTL server, detecting trip line traversal, and sending measurements to the service provider. To determine trip line traversals, probe vehicles check if the line between the current GPS position and the previous GPS position intersects with any of the trip lines in its cache. Upon traversal, handsets create a VTL sample comprising trip line ID, speed readings, timestamps, and the direction of traversal, and encrypt it with the VTL server's public key. Handsets then transmit this sample to the ID proxy server over an encrypted and authenticated communication link set up for each handset separately. Each handset and the ID proxy share an authentication key in advance.

Figure 4.2: Virtual Trip Line: Privacy-Preserving Traffic Monitoring System Architecture. This system was implemented and ran for the entire duration of the 100-vehicle deployment of the Mobile Century experiment.

ID proxy server handles identity information. The ID proxy's responsibility is to first authenticate each client to prevent unauthorized VTL samples and then forward anonymized samples to the VTL server. Since the VTL sample is encrypted with the VTL server's key, the ID proxy server cannot access the VTL sample content. It has knowledge of which phone transmitted a VTL sample, but no knowledge of the phone's position. The ID proxy server strips off the identifying information and forwards the anonymous VTL sample to the VTL server over another secure communication link.
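
The traversal test described in the architecture overview, which checks whether the segment between the previous and current GPS positions intersects a cached trip line, can be sketched with a standard orientation (cross-product) test. The report does not state which intersection algorithm the client used, so this is an illustrative choice; the `TripLine` class, the planar coordinates, and the sign convention for the crossing direction are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripLine:
    """Hypothetical in-memory form of a trip line (ID plus two endpoints)."""
    vtl_id: int
    x1: float
    y1: float
    x2: float
    y2: float


def _cross(ox, oy, ax, ay, bx, by):
    """Z-component of (a - o) x (b - o); its sign tells which side b is on."""
    return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox)


def crossing_direction(line, prev, curr):
    """Return +1 or -1 if the move prev -> curr crosses the line, else 0.

    Only proper crossings count: positions exactly on the line, or segments
    that merely touch an endpoint, are reported as 0 in this sketch.
    """
    d1 = _cross(line.x1, line.y1, line.x2, line.y2, *prev)
    d2 = _cross(line.x1, line.y1, line.x2, line.y2, *curr)
    d3 = _cross(*prev, *curr, line.x1, line.y1)
    d4 = _cross(*prev, *curr, line.x2, line.y2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return 1 if d2 > 0 else -1
    return 0


def crossed_lines(cache, prev, curr):
    """IDs and directions of all cached trip lines crossed by this move."""
    hits = []
    for line in cache:
        direction = crossing_direction(line, prev, curr)
        if direction != 0:
            hits.append((line.vtl_id, direction))
    return hits
```

A client would call `crossed_lines` once per new GPS fix, passing the previous fix and the trip lines cached for the surrounding tiles, and would build one VTL sample per returned entry.
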
VTL server handles location information. The VTL server aggregates samples from a large number of probe vehicles and uses them for estimating the real-time traffic status. The VTL generator determines the position of trip lines, stores them in a database, and distributes trip lines to probe vehicles whenever a download request is received from a probe vehicle. Similar to the ID proxy, each handset and the VTL generator should share an authentication key in advance. The VTL generator first authenticates each download requester to prevent unauthorized requests and can encrypt trip lines with a key agreed upon between the requester and the VTL generator. Both the download request message and the response message are integrity protected by a message authentication code.

Advantages of this architecture. The above architecture improves location privacy of probe vehicle drivers through several mechanisms. First, the VTL server must follow specific restrictions on trip line placements that we will describe in Section 4.2. This means that a handset will only generate samples in areas that are deemed less sensitive and not send any information in other areas. By splitting identity-related and location-related processing, a breach at any single entity would not reveal the precise position of an identified individual. A breach at the ID proxy would only reveal which phones are generating samples (or are moving) but not their precise positions. Similarly, a breach at the VTL server would provide precise position samples but not the individuals' identities. Separating the VTL server from the VTL generator prevents active attacks that modify trip line placement to obtain more sensitive data. This is, however, only a probabilistic guarantee because tracking and eventual identification of outlier trips may still be possible. For example, tracking would be straightforward for a single probe vehicle driving alone on an empty roadway at night [55].
The outlier problem in sparse traffic situations can be alleviated by changing trip lines based on traffic density heuristics. Trip lines could be locally deactivated by the client based on time of day or the client's speed. They could also be deactivated by the VTL generator based on traffic observations from other sources such as loop detectors.

4.3 Implementation

The architecture described above was implemented using Nokia N95 smartphone handsets, which include a full Global Positioning System receiver that can be accessed by application software.

4.3.1 Map Tiles and Trip Lines

Quadrant representation. In our system, we recursively divide the geographic region of interest into four smaller rectangles (or quadrants), and the minimum quadrant size is 1 m by 1 m. We convert the GPS location of a user into a Mercator projection using the WGS84 world model. Mercator projects the world into a square planar surface. A zoom of 25 is assumed to be the maximum precision in which a location can be specified. By default, every GPS location is converted into 25-bit x and y values with zoom set to 25. By using the quadrant representation, the mobile device can efficiently control the granularity by simply changing the zoom level. In this encoding, the world is treated as a square grid of four quadrants with zoom level 2, where x and y are the offsets from the top left corner of the world.

VTLs contained in map tiles. This representation makes it easy to identify a specific map tile. We define a map tile as a container that groups all trip lines within it. When a client wants to download all virtual trip lines within the San Francisco Bay Area, it sends the VTL server the triplet (zoom, x, y) for the corresponding region. In our implementation, we choose 12 as the default zoom level, which corresponds to an 8 km by 8 km square.

Memory requirements. This representation also helps in reducing storage size and bandwidth consumption.
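
The quadrant addressing described above can be sketched as follows. This is a hedged illustration rather than the deployed client code: it assumes the spherical (Web) Mercator formulas, which the report does not spell out, and the function names `to_grid`, `tile_of`, and `offset_in_tile` are invented for the example.

```python
import math

MAX_ZOOM = 25  # finest zoom level named in the text (25-bit x and y)


def to_grid(lat_deg, lon_deg, zoom=MAX_ZOOM):
    """Map latitude/longitude to integer (x, y) offsets from the top-left
    corner of the Mercator square, using 2**zoom cells per side.

    Assumes the spherical Mercator projection.
    """
    n = 1 << zoom
    x = (lon_deg + 180.0) / 360.0
    lat = math.radians(lat_deg)
    y = (1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0
    # Clamp so that the poles and the +180 degree meridian stay in range.
    xi = min(n - 1, max(0, int(x * n)))
    yi = min(n - 1, max(0, int(y * n)))
    return xi, yi


def tile_of(x25, y25, zoom=12):
    """(zoom, x, y) triplet of the map tile containing a 25-bit position."""
    shift = MAX_ZOOM - zoom
    return zoom, x25 >> shift, y25 >> shift


def offset_in_tile(x25, y25, zoom=12):
    """Low-order bits locating the position inside its tile."""
    mask = (1 << (MAX_ZOOM - zoom)) - 1
    return x25 & mask, y25 & mask


if __name__ == "__main__":
    x, y = to_grid(37.87, -122.27)  # approximate Berkeley coordinates
    print(tile_of(x, y), offset_in_tile(x, y))
```

Changing the zoom level only changes how many high-order bits are kept, so coarsening a position is a bit shift. At zoom 12 the tile index uses the top 12 bits of each coordinate and the remaining 13 bits locate a point within the tile.
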
Since the general area is identified by the quadrant, we only store the 13 least significant bits of the trip line end point coordinates instead of the full 25 bits used for typical UTM coordinates. This decreases storage consumption to 68 bits (15-bit ID, 1-bit direction, 4 × 13 bits of coordinates) per trip line. As an example of required storage and bandwidth consumption, consider the San Francisco Bay Area, the total road network of which contains about 20,000 road segments, according to the Digital Line Graph 1:24K-scale maps of the San Francisco Bay Area Regional Database (BARD [1], managed by USGS). Assuming that the system on average places one trip line per segment, this results in 166 KB of storage (20,000 × 68 bits = 170,000 bytes).

4.3.2 Client Device and Software

Client hardware and software. We implemented the client software using J2ME (Java Platform, Micro Edition) on a Nokia N95 handset. This Symbian OS handset uses an ARM11-based Texas Instruments OMAP2420 processor running at 330 MHz, and it contains 64 MB RAM and 160 MB internal memory. Its storage can be expanded up to 8 GB with flash memory. We use the JSR 179 library (Location API for J2ME) [2] for communicating with the internal TI GPS5300 NaviLink 4.0 single-chip GPS/A-GPS module to set the sampling period and retrieve the position readings. This setup did not provide speed information. Instead, we calculate the mean speed using two successive location readings (in our implementation, every 3 seconds). The client software registers the task for checking the traversal of trip lines as an event handler for GPS module location samples, which is automatically invoked whenever a new position reading becomes available.

Communication protocol. The communication between the handset and the ID proxy server, to send updated lists of VTLs or to request VTL downloads, is implemented via HTTPS GET/POST messages.
The client software encrypts the message content, but not the handset identification information, using the public key of the VTL server so that only the VTL server with the corresponding private key can decrypt the message. To save network bandwidth and to reduce delay, we cache the downloaded trip lines for the nine map tiles closest to the current position in local memory. When a vehicle crosses a tile boundary, it initiates VTL download background threads for the missing tiles.

4.3.3 Servers and Databases

VTL database server. At the bottom of the hierarchy of our server implementation is a backend database server. The database server contains two databases. The first is a VTL database, which holds the GPS coordinates of all trip lines. In the future, we plan to enhance our trip line database to hold metadata associated with each trip line. For instance, the metadata for a trip line can contain the posted speed limit at that trip line. The client application can use this value to decide whether it is exceeding the speed limit, in which case it can disable the transmission of VTL samples. Write access to this database is restricted to traffic administrators, who can add, delete, or update a VTL.

Figure 4.3: Road networks extracted from Bay Area DLG files (Left) and Trip Lines per road segment in Palo Alto, CA (Right).

Traffic database server. The second database is the VTL sample measurement database. This database stores the VTL samples sent by the mobile device whenever the mobile device chooses to send a sample after crossing a VTL. The sample database simply appends every VTL sample along with a time stamp recording when the sample was received. To sanitize bogus VTL samples from the clients, the VTL sample database also keeps both the encrypted and decrypted versions of the VTL sample for further investigation in collaboration with the ID proxy server.
When bogus VTL samples are detected in the VTL sample database, their encrypted versions are compared to the encrypted version stored in the ID proxy server to blacklist the originator of bogus VTL samples.

Database implementation. We use Microsoft SQL to implement the databases, and we develop the VTL server using J2EE (Java Platform, Enterprise Edition) and JDBC (Java Database Connectivity) to control the SQL databases that are connected to the VTL server. While we have used only a single DB server in this prototype, the two databases should ideally be implemented by different entities to prevent active trip line modification attacks by a compromised traffic monitoring entity.

ID Proxy Server. On top of the database server is the ID proxy server. The identification proxy server is envisioned to be operated by an entity that is independent of the traffic service provider. We implement the ID proxy server as a servlet-based web server that takes in HTTPS GET/POST messages from clients and forwards messages to the VTL server. The HTTP message received by the proxy server from the client has two components. The first component contains the mobile device identification information, namely the phone number of the message origin. This component of the message is required for all cell phone communications because the operator needs to charge appropriately for data communication costs. The second component of the message contains information that is intended for the database server. The proxy server strips all the identification information from the message, namely the first component, and passes on the second component to the application server. We implemented the secure channel between the ID proxy server and the VTL server using WSDL (Web Services Description Language) and RPC (Remote Procedure Call) over a J2EE server.
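
The two-component message handling described above can be illustrated with a small Python sketch. It is only a model of the idea, not the servlet used in the deployment: the `ClientMessage` and `IdProxy` names are hypothetical, HMAC-SHA256 stands in for whatever authentication the pre-shared key was actually used with, and the encrypted VTL sample is represented as opaque bytes that the proxy never decrypts.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientMessage:
    phone_id: str   # first component: identifies the sender
    payload: bytes  # second component: sample encrypted for the VTL server
    tag: bytes      # MAC over the payload under the handset/proxy key (assumed)


class IdProxy:
    """Authenticates the sender, then forwards only the opaque payload."""

    def __init__(self, keys):
        # keys maps phone_id -> authentication key shared in advance
        self._keys = dict(keys)

    def forward(self, msg):
        """Return the anonymized payload, or None if authentication fails."""
        key = self._keys.get(msg.phone_id)
        if key is None:
            return None
        expected = hmac.new(key, msg.payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, msg.tag):
            return None
        # The identifying component is dropped here; only the payload,
        # which the proxy cannot read, continues to the VTL server.
        return msg.payload
```

In this model the proxy learns who sent a message but not what it contains, while whatever receives the return value of `forward` sees the content but not the sender, mirroring the separation of identity and location information.
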
Figure 4.4: Comparison of the speed measurements recorded from the N95 (dots), the VTLs (boxes), and the vehicle speedometer (circles) as a function of time.

4.1 Experimental Deployment

The implementation described above was used for several experimental deployments. The correct operation of the traffic monitoring system was first demonstrated with an initial test along I-80. A second test involving twenty cars was performed to measure data quality and to inform the design of the 100-vehicle deployment.

Figure 4.5: Satellite image of the first experiment site on I-80 near Berkeley, CA. The red lines represent the locations of the VTLs, the blue squares show the speed recorded by the VTL, and the green squares represent the position and speed stored in the phone log. The brown circles represent the readings from the vehicle speedometer.

4.1.1 Velocity Measurement Accuracy

GPS speed and position accuracy. A first experiment was performed to estimate the position and speed accuracy of a single cell phone carried onboard a vehicle. The experiment route consisted of a single 7-mile loop on I-80 near Berkeley, CA. VTLs were placed evenly on the highway every 0.2 miles. Speed and position measurements were stored locally on the phone every 3 seconds, and speed measurements were sent over the wireless access provider's data network every time a VTL was crossed. The speed measurements were computed using two consecutive position measurements. In order to substantiate the correctness of the data, vehicle speed was also recorded directly from the speedometer on a laptop with a clock synchronized with the N95. In Figure 4.4, the speed measured directly from the vehicle sp
|
|