HSR Ridership Forecasting Models
Review
State Senate Testimony (11/4/10)
Samer Madanat (UC Berkeley)
David Brownstone (UC Irvine)
The importance of ridership forecasts
• Evaluating the feasibility of new infrastructure projects
requires forecasting costs and benefits
• Construction costs are typically easier to estimate than
operations costs, which depend on ridership
• Benefits (reduced congestion on freeways and at
airports, reduced GHG emissions and criteria
pollutants, etc.) depend on ridership forecasts
• Revenues, and thus the financial soundness of new
systems, are primarily a function of ridership forecasts
Development of demand models
Forecasting demand for existing transportation
systems is performed by developing demand
models and then applying them:
1. Collect observations of current users’ mode
choices and socio-economic characteristics, along
with the attributes of the available transport modes
(the explanatory variables)
2. Use the data and statistical methods to estimate the
parameters (any unknowns in the model) of the
demand models
3. Apply the demand models with estimated parameters
to forecast future usage, using forecasts of the
explanatory variables as inputs
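These three steps can be sketched end-to-end with a toy binary (rail vs. auto) logit model on synthetic data; the time-sensitivity coefficient, sample size, and scenario numbers below are all hypothetical, not the HSR model's:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Step 1: "collect" survey data -- here synthetic. dt is the travel-time
# difference (rail minus auto, minutes) faced by each of 5,000 travelers,
# and chose_rail records their observed mode choice.
n = 5000
beta_true = -0.05                      # hypothetical time-sensitivity parameter
dt = rng.normal(0.0, 30.0, n)
p_rail = 1.0 / (1.0 + np.exp(-beta_true * dt))
chose_rail = rng.random(n) < p_rail

# Step 2: estimate the unknown parameter by maximum likelihood.
def neg_log_lik(beta):
    z = beta[0] * dt
    # negative log-likelihood of a binary logit, via logaddexp for stability
    return np.sum(np.where(chose_rail, np.logaddexp(0.0, -z), np.logaddexp(0.0, z)))

beta_hat = minimize(neg_log_lik, x0=[0.0]).x[0]

# Step 3: apply the estimated model to a forecast scenario, e.g. rail
# 10 minutes faster than auto for a future traveler.
u = beta_hat * (-10.0)                 # systematic utility of rail
forecast_share = 1.0 / (1.0 + np.exp(-u))
```

With enough (clean, representative) data, the estimated parameter lands close to the true one, and the forecast inherits that quality; the problems described in the slides that follow break exactly this chain.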
Forecast bias and variance
• Bias: the difference between the realized
ridership and the mean model forecast
– The mean forecast: if we repeat the model
development exercise N times, we obtain N models
producing N forecasts; the mean forecast is their average
– If correct statistical methods are used, parameter
estimates and forecasts are unbiased
• Variance: the variation of the individual forecasts
around the mean model forecast
– Variance can never be eliminated, but it can be lowered
by using better models and data
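The repeat-the-exercise thought experiment can be simulated directly; all numbers here are hypothetical, and the "model" is just a sample mean over a 25-observation survey:

```python
import numpy as np

rng = np.random.default_rng(1)

true_ridership = 90.0      # hypothetical "realized" ridership, millions/year
N = 10_000                 # repeat the model-development exercise N times

# Each repetition: collect a noisy 25-observation survey and "forecast"
# ridership as the sample mean (a stand-in for a full demand model).
surveys = rng.normal(true_ridership, 20.0, size=(N, 25))
forecasts = surveys.mean(axis=1)

bias = forecasts.mean() - true_ridership   # near 0: the procedure is unbiased
variance = forecasts.var()                 # near 20**2 / 25 = 16
```

Note that any single forecast still misses by about ±4M (one standard deviation) even though the procedure is unbiased: unbiasedness is a property of the average over many repetitions, not of the one forecast you actually get.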
Forecast bias and variance example
• Suppose we have two lotteries that both cost
$10 to play. Lottery A pays $55 or $45 with equal
probability, and Lottery B pays $0 or $100 with
equal probability.
• For both A and B the forecast net winnings are $40,
and this forecast is unbiased. If you play each
lottery many times, your average net winnings
will be $40.
• However, the forecast variance for Lottery A is 25
(a standard deviation of $5), while the forecast
variance for Lottery B is 2,500 (a standard
deviation of $50).
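The two lotteries' means and variances can be checked directly (net winnings are the payout minus the $10 ticket):

```python
# Net winnings after the $10 ticket, each outcome with probability 1/2.
payoffs_a = [55 - 10, 45 - 10]    # Lottery A: $45 or $35
payoffs_b = [100 - 10, 0 - 10]    # Lottery B: $90 or -$10

def mean_var(payoffs):
    m = sum(payoffs) / len(payoffs)
    v = sum((x - m) ** 2 for x in payoffs) / len(payoffs)
    return m, v

mean_a, var_a = mean_var(payoffs_a)   # mean 40, variance 25
mean_b, var_b = mean_var(payoffs_b)   # mean 40, variance 2500
```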
Forecast bias and variance example (continued)
• Lottery A and Lottery B are analogous to different
demand models.
• Being unbiased just means that if you repeat the
demand modeling process independently
(including data collection) many times, your
average forecast will get close to being correct.
• Clearly we want to choose the demand model
(or the lottery) with the lowest forecast variance.
• At the very least, we need to estimate the forecast
variance!
Review of the HSR travel demand models
Four major problems were identified:
1. Changing parameter estimation results to
match prior expectations → bias
2. Assuming passengers always select the nearest
airport or train station → bias
3. Choice-based sampling of respondents →
bias and high variance
4. Excessive adjustment of estimated model
parameters to better match historical data →
understated forecast variance
1. Changing the estimated “headway
parameter” to match expectations
• In urban travel, average waiting time is half the headway
(the time between consecutive bus arrivals), because riders
arrive at the bus stop without knowledge of bus arrival times
• Past experience has shown that waiting time is twice as
onerous as in-vehicle travel time (IVTT)
• Thus headway and IVTT are equally onerous in urban
travel: half the headway, weighted twice, carries the same
weight as IVTT
• The HSR demand modelers were surprised that their results
showed headway to be less onerous than IVTT (the
headway parameter was smaller than the IVTT parameter)…
this result did not match their prior expectations
Changing the “headway parameter” to
match expectations (continued)
• But this result is not surprising: in intercity travel (air and
rail), people arrive at the airport according to their flight
schedule… they don’t arrive randomly at the airport and take
the next available flight or train!
• Therefore, waiting time is NOT half the
headway… and headway and in-vehicle travel
time are NOT equally onerous in intercity travel.
• The HSR modelers “adjusted” the headway parameter in
their final results (setting it equal to the IVTT
coefficient) to match their expectations!
• Effect: exaggerates the importance of frequent service
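A small logit sketch shows why forcing the headway coefficient up to the IVTT coefficient exaggerates the value of frequency; all coefficients, times, and utilities below are invented for illustration, not taken from the actual model:

```python
import math

def hsr_share(headway_min, b_headway, ivtt_min=160.0, b_ivtt=-0.01, u_other=-2.0):
    """Binary logit share of HSR vs. one fixed alternative (illustrative)."""
    u_hsr = b_ivtt * ivtt_min + b_headway * headway_min
    return math.exp(u_hsr) / (math.exp(u_hsr) + math.exp(u_other))

# Predicted ridership loss when headway doubles from 30 to 60 minutes:
drop_estimated = hsr_share(30, -0.003) - hsr_share(60, -0.003)  # estimated coeff.
drop_adjusted  = hsr_share(30, -0.010) - hsr_share(60, -0.010)  # forced = b_ivtt
# The forced coefficient makes the same frequency cut look several times
# more damaging -- and frequent service correspondingly more valuable.
```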
2. Assuming passengers always select
the nearest airport or station
• Modelers assumed travelers always select the nearest
airport or train station
• In reality, people consider schedule convenience, access
time and other factors in selecting an airport or station
• In the Bay Area, assuming that South Bay passengers
choose the nearest station meant these travelers faced an
HSR headway in the Altamont alignment that is twice as
long as the headway in the Pacheco alignment
• Together with the exaggerated “headway” parameter, this
made the Altamont alignment less attractive… this must
be the reason for the bizarre behavior of the model forecasts.
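A two-station example shows what forced assignment to the nearest station does; the access times, headways, and weights below are invented for illustration:

```python
# Generalized cost (minutes) for one hypothetical South Bay traveler.
stations = {
    "nearest": {"access": 20, "headway": 60},   # branch with halved frequency
    "farther": {"access": 25, "headway": 30},
}

def gen_cost(s, headway_weight=0.5):
    # headway enters at half weight, a common proxy for expected wait
    return s["access"] + headway_weight * s["headway"]

forced_nearest = gen_cost(stations["nearest"])                 # 50 minutes
best_available = min(gen_cost(s) for s in stations.values())   # 40 minutes
# Forcing the nearest station overstates this traveler's cost of the
# long-headway alignment, and thus understates its ridership.
```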
[Figure: map of the Pacheco and Altamont alignments through Sacramento, Stockton, Modesto, Merced, Gilroy, San Jose, SFO, San Francisco and the Mid-Peninsula. Forecast ridership: Pacheco 80M + 13M = 93M; Altamont 94M − 7M = 87M, with frequency halved on the branches to SJ and to SF.]
3. Sampling of respondents
• Ideally, obtain a random sample, which is representative
of the population
• But a random sample may not contain a sufficient
number of users of low-share modes (e.g., rail)
• A choice-based sample was used instead: users of low-share
modes (rail and air) were oversampled
• A choice-based sample is not representative of the
population, which leads to biased estimates of the
parameters and biased model forecasts
• This bias must be accounted for after model parameter
estimation by applying a correction
Correcting for choice-based sampling bias
• For some simple models (MNL), the bias is limited to a
subset of parameters (the “constants”)
• The constants can be adjusted by calibrating the model
against observed demand (from a previous year), thus
eliminating the bias
• For complex models (including the NL used in the HSR
forecasting model), all parameters (not only the
constants) are biased
• Here, the parameter bias is not eliminated by adjusting
the model parameters through “calibration”
• Effect: biased forecasts
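For the simple MNL case, the constant correction can be demonstrated with an intercept-only example; the 5% rail share and the sample sizes are hypothetical. The standard correction subtracts log(sample share / population share) from each alternative's constant:

```python
import math
import random

random.seed(2)
true_rail_share = 0.05                  # rail is a low-share mode (hypothetical)
population = [random.random() < true_rail_share for _ in range(200_000)]

# Choice-based sample: oversample rail users to get 1,000 of each mode.
rail    = [x for x in population if x][:1000]
nonrail = [x for x in population if not x][:1000]
sample = rail + nonrail

naive_share = sum(sample) / len(sample)         # 0.5 -- badly biased upward

# For an intercept-only logit, the bias is a known constant shift:
# subtract log(sample share / population share) for each alternative.
naive_constant = math.log(naive_share / (1 - naive_share))
corrected = (naive_constant
             - math.log(0.5 / true_rail_share)
             + math.log(0.5 / (1 - true_rail_share)))
corrected_share = 1 / (1 + math.exp(-corrected))   # back to ~0.05
```

This clean fix works because, for MNL, oversampling shifts only the constants; in a nested logit the sampling scheme contaminates the other parameters too, and no constant shift undoes it.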
4. Adjustment of model parameters
• Adjusting the constants by “calibrating” the
model forecasts against observed demand is
legitimate for some models (MNL).
• Not legitimate: adjusting other parameters away from
the values obtained by statistical estimation to:
– Match the a priori expectations of the modelers
– Provide a better match between the model “back-cast”
and observed demand (from a previous year)
• Effect: understates the variance of the forecast
and possibly adds bias
So what?
• The HSR forecasting modelers act as if their
model is like Lottery A – unbiased with low
forecast variance.
• In fact, the current HSR forecasting model is
worse than Lottery B – it has very high forecast
variance and it is biased.
• Remember that the actual HSR forecasts are
like just one play of the lottery – so the high
forecast variance implies the forecasts can be
very far from what will actually happen if
the HSR system is built.
Conclusions
• The parameters of the CA HSR demand models are
biased, which leads to biased ridership forecasts
• The comparison of Altamont vs. Pacheco is tainted by
the incorrect adjustment of the headway parameter
and the pre-assignment of travelers to stations
• The variance of the ridership forecasts is certainly
understated and likely very large
• The models should be revised to minimize bias and
reduce the variance of the forecasts