


Sacramento’s Fix I-5 Project: Impact on Bus Transit Ridership

By RACHEL A. CARPENTER
B.S. (California Polytechnic State University, San Luis Obispo) 2008

THESIS
Submitted in partial satisfaction of the requirements for the degree of MASTER OF SCIENCE in Civil and Environmental Engineering in the OFFICE OF GRADUATE STUDIES of the UNIVERSITY OF CALIFORNIA DAVIS

Approved:
H. Michael Zhang, Chair
Patricia L. Mokhtarian
Alexander Aue
Committee in Charge
2010

ACKNOWLEDGEMENTS

I would like to thank my committee chair and advisor, Professor Michael Zhang, for his guidance, suggestions and financial support. I would like to thank my committee members, Professor Patricia Mokhtarian and Professor Alexander Aue, for their advice and constructive comments on my thesis. I would like to show gratitude to my student colleagues Zhen (Sean) Qian, Yi Ru Chen, and Wei Shen for their ever-present willingness to help, and also for their friendships. Finally, I would like to thank my parents, Linda and Dave, and sister, Molly, for their continued support and patience during my studies at UC Davis. This research was funded by the Cal EPA.

CONTENTS

Chapter 1 INTRODUCTION
1.1 Purpose
1.2 Analysis Scope
1.3 Gap in Knowledge
1.4 Response to the Event
1.4.1 City of Sacramento Traffic Operations Center
1.4.2 Government Media Outreach
1.4.3 Private Media Outreach
1.4.4 Transit Agency Outreach and Preparation
1.5 Organization of Analysis
Chapter 2 LITERATURE REVIEW
2.1 Time Series
2.1.1 Background
2.1.2 Trend Components
2.1.3 Seasonal Components
2.2 Goodness-of-fit Tests
2.3 Multicollinearity
2.4 Lagged Variables
2.5 Box-Jenkins (ARMA) Models
2.6 Intervention Analysis
2.7 Review of Relevant Past Studies
2.7.1 Predicting Transit Ridership Using Multiple Regression
2.7.2 Transit Ridership and Intervention Analysis
2.8 Summary of Literature Review
Chapter 3 DATA DESCRIPTION
3.1 Methods of Data Collection
3.2 Data Sample
3.2.1 Regional Transit
3.2.2 Yolobus
3.2.3 Roseville Transit
3.2.4 North Natomas TMA
3.2.5 Yuba-Sutter Transit
3.3 Data Filtering
3.3.1 General Procedure
3.3.2 Special Modifications to General Procedure for Regional Transit
3.4 Independent Variables
3.5 Data Quality
3.5.1 Automatic Passenger Counting Devices
3.5.2 Electronic Registering Fareboxes
3.5.3 Manual Counts by Route Checkers
3.5.4 Manual Counts by Bus Drivers
3.5.5 Additional Data Quality Considerations
3.6 Data Cleaning
3.7 Descriptive Statistics for Transit Ridership
3.7.1 Measures of Central Tendency
3.7.2 Measures of Dispersion
3.7.3 Discussion
Chapter 4 MODEL BUILDING
4.1 Multiple Regression
4.1.1 Bus Transit Ridership and Gas Prices
4.1.2 Bus Transit Ridership and Unemployment Rates
4.1.3 Bus Transit Ridership and Gross Domestic Product
4.1.4 Bus Transit Ridership and Transit Fares
4.2 Sinusoidal Decomposition
4.3 Intervention
Chapter 5 RESULTS
5.1 Eliminating Trends: Details of Multiple Regression
5.2 Eliminating Seasonal Components in the Data: Details of Sinusoidal Decomposition
5.3 Intervention Analysis: Details of the Fix I-5 Impact
5.4 Significance of Results
5.5 Discussion
5.6 Implications for Transit Agencies for Future Road Closure Work
5.7 Threats to Validity
Chapter 6 CONCLUSIONS
6.1 Summary
6.2 Future Work
REFERENCES
APPENDICES
A. City of Sacramento Traffic Operations Center Visit, August 21, 2008
B. Original and Cleaned Transit Agency Ridership Data Sets
1. North Natomas TMA AM Ridership
2. North Natomas TMA PM Ridership
3. Roseville Transit AM Ridership
4. Roseville Transit PM Ridership
5. Yolobus Ridership
6. Yuba-Sutter AM Ridership
7. Yuba-Sutter PM Ridership
8. Regional Transit AM Ridership
9. Regional Transit PM Ridership
C. Independent Variable Data Sets
1. Gas Price Independent Variable
2. Unemployment Rate Independent Variable
3. Gross Domestic Product Independent Variable
D. Holiday and Limited Service Imputation Dates
1. Year: 2006
2. Year: 2007
3. Year: 2008
E. Ad Hoc Data Imputation Method Details
F. Histograms for Each Transit Agency
G. Multiple Regression Model Selection
H. Transit Agency Periodograms
1. Yuba-Sutter AM Periodogram
2. Yuba-Sutter PM Periodogram
3. Yolobus Daily Periodogram
4. Roseville Transit AM Periodogram
5. Roseville Transit PM Periodogram
6. North Natomas AM Periodogram
7. North Natomas PM Periodogram
8. Regional Transit AM Periodogram
9. Regional Transit PM Periodogram
I. Intervention Analysis Model Results
J. Goodness-of-fit Tests

TABLE OF FIGURES AND TABLES

Figures
Figure 1.1: The Fix I-5 Construction Area
Figure 1.2: City of Sacramento T.O.C.
Figure 1.3: The Fix I-5 Website Encouraged Transit
Figure 1.4: Informational Documents Regarding Fix I-5 Closures
Figure 3.1: Plots of Original Roseville Peak Period Ridership Data
Figure 3.2: Roseville and Yuba-Sutter Transit Histograms
Figure 5.1: Yuba-Sutter Transit AM Peak Periodogram

Tables
Table 3.1: Data Collection Details
Table 3.2: Transit Lines Servicing the Downtown Core
Table 3.3: AM Peak Period Definitions of Each Data Set
Table 3.4: PM Peak Period Definitions of Each Data Set
Table 3.5: Sample Size of Each Data Set
Table 3.6: Independent Variable Details
Table 3.7: Fare Pricing Details
Table 3.8: Percent Imputed Data
Table 3.9: Measures of Central Tendency: Mean and Median
Table 3.10: Yearly Means and Medians for Transit Agencies with Data Spanning 2006-2008
Table 3.11: Means by Season for 2006, 2007 and 2008
Table 3.12: Means by Construction Period for 2008
Table 3.13: Variance and Standard Deviation for Each Transit Agency
Table 4.1: Ridership-Gas Price Correlation Coefficients
Table 4.2: Ridership-Unemployment Correlation Coefficients
Table 4.3: Ridership-Fare Correlation Coefficients
Table 5.1: Adjusted R² With and Without the GDP Independent Variable
Table 5.2: Statistically Significant Predictors of Bus Transit Ridership
Table 5.3: Means by Construction Period for 2008 After Detrending Data
Table 5.4: Statistically Significant Periodic Components for Each Agency
Table 5.5: Intervention Analysis Final Model Results
Table 5.6: Final Model Significance
Table 5.7: Interpretation of Model Significance
Table 6.1: Significant Periodic Components of Each Transit Agency
Table 6.2: Intervention Model Summary

ABSTRACT

The Fix I-5 project was an engineering project that rehabilitated drainage and pavement on Interstate 5 in downtown Sacramento, from May 30, 2008 to July 28, 2008. In order to alleviate congestion, media outreach alerted commuters about projected traffic conditions and advised alternative modes or routes of travel. The construction schedule included complete closures of north or southbound portions of Interstate 5. This study analyzed the impact of the Fix I-5 project closures on peak period bus transit ridership of five transit agencies serving the downtown Sacramento core.
The results indicated that gasoline prices and unemployment rates were statistically significant predictors of transit ridership, with increased gasoline prices and unemployment related to increased bus transit ridership. All agencies had overall increases in mean ridership during the study period, but there were also seasonal variations in mean ridership. Trend and seasonal components were removed from the bus transit ridership data sets using multiple regression and sinusoidal decomposition. Time series intervention analysis then estimated that the Fix I-5 project had little impact on the mean number of bus riders for all five transit agencies. Bus transit agencies with main service areas closest to the Fix I-5 project were most affected, with ridership increases of about three percent or less attributable to Fix I-5. This study did not analyze the impact of Fix I-5 on other modes of transportation, which may have been more affected than bus transit ridership.

CHAPTER 1 INTRODUCTION

Interstate 5 (I-5) is a major interstate that runs north-south, connecting Mexico to Canada through California, and was started in 1947 by the Federal Highway Administration. The downtown Sacramento portion of I-5 was completed in the 1960s and is nicknamed the “Boat Section” because it was constructed below the water level of the Sacramento River, which runs adjacent to the freeway (Caltrans, 2008). In order to construct the boat section of the freeway, Caltrans had to initially drain this section and engineer a drainage system of pipes and pumps. The boat section was manually monitored during each winter season to ensure pumps were working properly. After more than 40 years without major renovation, pavement cracking and sediment accumulation required the boat section to undergo repair, which also provided an opportunity for drainage system upgrades.
The California Department of Transportation (Caltrans) Engineers’ Estimate projected that the rehabilitation of drainage and pavement of Interstate 5 in downtown Sacramento, dubbed “Fix I-5,” would take 305 working days at a cost of more than $44 million (C.C. Myers, Inc., 2009). On February 2, 2008, a Rancho Cordova-based engineering firm, C.C. Myers, Inc., won the Fix I-5 project bid with a proposed 85-working-day, 29-night-and-weekend schedule at a substantially lower cost of $36.5 million, with financial incentives for earlier completion (Caltrans, 2009). Aggressive and compressed construction schedules are not novel for C.C. Myers. Their resume includes more than 17 emergency projects for the State of California, including emergency work on the San Francisco Bay Area’s 2007 MacArthur Maze meltdown (C.C. Myers, Inc., 2009). Although not emergency work, the Fix I-5 project specifically included a reconstructed six-inch pavement slab, an upgraded drainage system, new de-watering wells, and installation of electronic monitoring equipment (Solak, 2008).

The project was completed in a shorter period than predicted, from May 30, 2008 to July 28, 2008, in 35 days and 3 weekends using full unidirectional closures. The Fix I-5 construction schedule periodically closed entire northbound or southbound portions of Interstate 5 through Sacramento, a relatively new technique for non-emergency construction. Approximately 200,000 vehicles travel on Interstate 5 in Sacramento each day (Schwarzenegger, 2008). Reports projected that during closure periods, traffic congestion could increase nineteen times (Schwarzenegger, 2008). During closure periods, traffic was detoured to arterial streets and other freeways. In order to alleviate congestion, media outreach alerted commuters about projected traffic conditions and advised alternative modes of travel. Employers, including California state government, which is one of the largest employers in the area with 75,000 commuters, encouraged employees to use alternative modes of travel (Schwarzenegger, 2008).

Figure 1.1: The Fix I-5 Construction Area

1.1 Purpose

The main objective of this analysis is to examine the effect that the Fix I-5 project had on commuters' mode choices, more specifically bus transit ridership (supplementary studies are examining the impact of Fix I-5 on other modes of travel). This objective includes determining whether the Fix I-5 project caused changes in mean bus transit ridership levels, whether this effect on ridership was permanent or temporary, and the magnitude of the effect. This research provides not only those statistics, but also information on service changes for bus transit agencies that need to prepare for future planned construction work, which includes freeway closures such as Fix I-5, and for unplanned events which force closures.

1.2 Analysis Scope

The primary focus of media outreach was to suggest alternate transportation for those who commute on I-5. State governments and other employers with a large number of employees in the downtown Sacramento core urged employees to use alternate transportation during the Fix I-5 period. Consequently, this study analyzed bus transit agencies’ data from the morning (AM) and evening (PM) peak periods. The boundaries of the downtown core were defined as follows: the south boundary is the 50/80 freeway, the north boundary is Richards Blvd, the west boundary is the Sacramento River and the east boundary is the Business 80/99 freeway. Bus stops directly below freeway boundaries were considered part of the downtown core. This corresponds to other transit agencies’ definitions of downtown Sacramento.
Since this analysis focused on commute behavior, only inbound ridership was considered for the AM peak period, while only outbound ridership was considered for the PM peak period. Inbound trips are defined as those trips with a final destination within the downtown core, while outbound trips originate within the downtown core but have a final destination outside it. The AM peak period is the primary morning commute period, but specific hours varied by transit agency. The PM peak period is the primary afternoon commute period and also varied by transit agency. In general, the peak periods fell between 5:00 AM and 9:00 AM, and between 3:00 PM and 7:30 PM.

In order to accurately assess bus transit ridership in the downtown Sacramento area, this analysis employed bus transit ridership counts for five transit agencies which provide commute service to the Sacramento downtown core: Yuba-Sutter Transit, Yolobus, Roseville Transit, North Natomas Transportation Management Association (TMA) and Sacramento Regional Transit. As state workers comprise 75,000 commuters in Sacramento, and many state agencies have headquarters in the downtown core, the commute choices made by that group likely had a sizable impact on this study’s data sets.

1.3 Gap in Knowledge

Many studies have examined transportation-related data using time series methods, although not many have examined bus transit ridership. Few time series studies have used intervention analysis to examine bus transit ridership affected by an outside event (an intervention). To date, there are no known studies that examine the intervention of construction work on bus transit ridership.

1.4 Response to the Event

Many public and private agencies united to publicize, prepare and provide for public safety for the Fix I-5 project.
These measures included public outreach and intercity and interagency partnerships among the City of Sacramento, City of West Sacramento, Sacramento Area Council of Governments, Downtown Sacramento Partnership, and the Old Sacramento’s Merchant’s Association. Other efforts included announcements via changeable message signs and highway advisory radio, and California Highway Patrol enforcement in the construction area. Extensive media outreach warned commuters about traffic conditions and suggested alternative modes of travel. Additionally, various media sources made up-to-date information regarding the Fix I-5 project’s progress easily available to the general public. The Governor’s Executive Order (S-04-08) cited Assembly Bill 32, the California Global Warming Solutions Act of 2006, and advised alternatives to widely used single-occupant vehicle commuting, including telecommuting and public transit. Some of the private entities that provided information included News 10, the Sacramento Bee, Sacramento Region 511, and Capital Public Radio, as well as some private business websites. Transit agencies responded to the Fix I-5 project with media outreach that advertised the convenience and availability of transit.

1.4.1 City of Sacramento Traffic Operations Center

An operational tactic for traffic management is the use of traffic operations centers (TOC). The City of Sacramento’s single-jurisdiction, single-agency TOC is operated by the City of Sacramento Traffic Engineering Services Department and funded by Measure A, the gas tax. The goal of their TOC is twofold: first, they must make Sacramento City’s transportation network efficient for all transportation modes, and second, they must make the system reliable. The TOC took many steps to ensure its responsibilities were fulfilled during the Fix I-5 project.
Planning steps included (City of Sacramento, 2008):
• Identification of potential problem corridors
• Signal maintenance
• Construction of Synchro (transportation modeling software) Model
• Modified signal timing plan
• Coning & striping plan

The TOC makes use of many tools for network monitoring and operation, especially useful during the Fix I-5 project, including (City of Sacramento, 2008):
• Closed circuit television (CCTV)
• Advance signal control systems
• Sacramento Police Department Helicopter
• Sacramento Police Officers
• Signal and signage crews
• Traffic cameras (8 Cameras in 2 streams)
• Multi-agency Construction Advisory Team (CAT)
• Traffic Alerts
• Media Contacts

Figure 1.2: City of Sacramento T.O.C.

A more detailed summary of the City of Sacramento TOC, based on a field visit on August 21, 2008, is provided in Appendix A.

1.4.2 Government Media Outreach

Although all of the sources provided useful information, the official Fix I-5 website, supported by Caltrans, was the most comprehensive and accessible (although no longer active circa August 2008). This website included daily updates ranging from construction updates to detours. It included sections on current work and a history of the portion of I-5 to be repaired (the Boat Section). It also included useful links such as 511 Travel Info, Live Traffic Cameras, and Commute Alternatives, as well as links to many downtown area businesses, some offering specials to entice people to stay downtown and avoid peak period travel.

Caltrans also hosted three public meetings regarding Fix I-5 in Downtown Sacramento, Natomas and South Sacramento. They gave numerous presentations to audiences including state and local government agencies, residential organizations, private businesses and public officials, and reached an estimated 10,000 people.

In addition to the Fix I-5 website and public meetings, Caltrans provided public informational documents.
They sent out an email to all Sacramento Personnel Departments which included recommended alternatives to normal work days, including revised work schedules, telecommuting and public transportation. Caltrans provided paycheck stuffers to Sacramento Area employers through Public Outreach Contractors. This document advised departments to reschedule or postpone meetings and events that draw people to downtown. It also informed recipients about a Cal EPA hotline set up for state workers who needed commute assistance during the Fix I-5 project. Caltrans outreach contractors made information cards available to Sacramento businesses located in the downtown area. These cards provided basic facts about the Fix I-5 project, as well as the Fix I-5 website address.

Figure 1.3: The Fix I-5 Website Encouraged Transit

Figure 1.4: Informational Documents Regarding Fix I-5 Closures

Additionally, Assembly member Dave Jones' office sent out a letter to his constituents warning them about the Fix I-5 project and the traffic delays they might encounter. He also encouraged alternate forms of transportation during construction, as well as shopping or dining with downtown merchants during peak hours.

Although not as comprehensive as the official Fix I-5 website, the City of Sacramento website provided information about the Fix I-5 project. The City of Sacramento also provided parking promotions for six of their parking garages for most of the duration of the Fix I-5 project.

1.4.3 Private Media Outreach

Many private agencies also provided information regarding Fix I-5. In general, these postings included general and up-to-date information about the Fix I-5 project, but some businesses provided unique information. The News 10 website allowed people to “comment, blog and share photos,” an option not available on the Fix I-5 website. This feature allowed users to share alternate routes through blogs.
It also provided Sacramento travel times, as well as easy-to-read color-coded maps that showed lane closure information.

The Sacramento Bee covered the Fix I-5 project through their newspaper publication and website, which provided mobile alerts, a blog jam, and a complete listing of the Fix I-5 stories published in the Sacramento Bee newspaper.

The Sacramento Region 511 website permanently provides information about traffic, transit, ridesharing and bicycling. They provided minimal coverage regarding the Fix I-5 project, but links to information on transit providers, finding carpools and vanpools, and a guide to bicycle commuting may have been particularly useful to downtown commuters.

Capital Public Radio’s website provided information about Fix I-5, including a clever ‘Jam Factor’ scale showing congestion levels on Sacramento area freeways, including both northbound and southbound I-5.

Additional Sacramento area businesses and organizations posted information about the Fix I-5 project, including the NBA Sacramento Monarch’s Basketball team, Natomas Racquet Club, California State University Sacramento, Talk Radio 1530 KFBK, and YouTube.

1.4.4 Transit Agency Outreach and Preparation

To prepare riders for the Fix I-5 construction, Regional Transit (RT) posted a press release on their website encouraging people to take transit during the construction period. With additional funding from Caltrans, RT was able to provide supplemental bus and light rail services that increased both capacity and reliability during their peak commuting hours. RT kept ten buses on standby during the construction period and advised passengers to take earlier buses when possible. RT also reminded the public of the 18 park-and-ride lots available throughout Sacramento.

To prepare for the I-5 construction, Yolobus provided an I-5 Construction Options guide in their newsletter.
The guide warned passengers of delays and advised them to take earlier morning buses to avoid these delays. Yolobus also took several measures to alleviate overcrowding and delays during the construction period. It had up to two supplemental buses on standby in case other buses were running behind. Yolobus added two morning and two afternoon express trips to both route 45 (service between Sacramento and Woodland) and route 43 (service between Sacramento and Davis). In addition, Yolobus sold discounted Capitol Corridor train tickets in order to encourage drivers to take transit during the construction period.

To accommodate the Fix I-5 construction, Roseville Transit posted information on its website regarding the Governor’s Executive Order urging government employees to take transit during the construction. Roseville Transit encouraged new commuter passengers and listed on its website the AM and PM commuter routes with available seating.

In preparation for the Fix I-5 construction, North Natomas T.M.A. posted information in a dedicated Fix I-5 email newsletter about service changes for the construction period, including loop and route changes that went into effect on June 2, 2008. Additionally, supplemental shuttles and drivers were provided to ease the impact of the anticipated higher ridership during the construction period. The T.M.A. was able to provide additional shuttles with extra funds provided by Caltrans for the construction period, but was required to provide daily counts of AM and PM shuttle riders for each loop. North Natomas T.M.A. also created a special shuttle hotline for passengers to call for up-to-date information about route changes and delays during this period.

In addition to the supplemental schedules, Yuba-Sutter took several other measures to accommodate the I-5 construction. First, route or schedule changes were not made, with the exception of minor detours during northbound I-5 closures.
Second, Yuba-Sutter had additional buses on call in the event that any early morning buses became overcrowded. Third, Yuba-Sutter used all buses during the construction period, whereas it normally keeps three buses non-operational. And finally, Yuba-Sutter closely monitored traffic conditions, which was made possible by improved connections with Caltrans, the City of Sacramento, and Regional Transit.

1.5 Organization of Analysis

The organization of the analysis is as follows. Chapter 2 provides an overview of the important time series concepts used in this analysis. It also describes past studies analyzing bus transit ridership, and more specifically those few that used intervention analysis to analyze the impact of an intervention on a time series data set. Chapter 3 describes the transit agency data, including details of each agency’s samples and collection methods, as well as data quality considerations. It also includes information about data cleaning, which was needed to adjust for holidays and limited service days. Finally, data exploration is presented in two sections: measures of centrality and measures of spread for the transit agencies’ data sets. Both sections begin by briefly defining the statistics included in that section. Chapter 4 describes the methodology for eliminating trends and cyclic components, and the intervention analysis which examined the impact of the Fix I-5 construction on bus transit ridership. Chapter 5 presents the results of the intervention analysis for each agency, in addition to implications for bus transit agencies for future freeway closures. To conclude, Chapter 6 summarizes the analysis methods and results, and gives recommendations for future work.

CHAPTER 2 LITERATURE REVIEW

Many studies have analyzed variables that impact transit ridership, primarily using two statistical methods of analysis: time series and multiple regression.
Some, categorized as econometric studies, use those two statistical methods with a focus on economic theory.

Time series analysis is used to analyze a series of data points in order to understand the underlying order or context of the data. A review of the literature (Cryer, 1986; Shumway and Stoffer, 2006; Brockwell and Davis, 2002; Anderson, 1976; Kendall, 1973; Kyte et al., 1988) identified a host of different methods used to model time series data, including but not limited to univariate and multiple time series models and transfer function models.

Simple regression is used to analyze the change in a dependent variable as an independent variable changes or is manipulated, while multiple regression uses multiple independent variables (Mann, 2004). However, standard regression models assume that the error terms, and therefore the response variable observations, are uncorrelated (Kutner et al., 2005). In contrast, time series data often contain observations which are serially dependent (Box and Tiao, 1975). Additional regression methods have been developed for autocorrelated time series data. They employ typical regression techniques, but model the error term using time series models (Tsay, 1984).

Econometrics uses statistical methods to study economic principles (Tinbergen, 1951). The primary focus is the evaluation of economic theory using statistical methods. Discussions of strict and weak stationarity, autoregressive models, and lag structures are found in both the econometric and the statistical time series literature. However, a standard tool in econometrics is the structural econometric time series approach (SEMTSA), which uses Box-Jenkins methods but imposes a priori restrictions on the equations based on economic theories (Christ, 1983). Further, this approach is commonly simplified to vector autoregression (VAR) models, which omit the moving average polynomial of the ARIMA model (Zellner and Franz, 2004).
Time series analysis was the primary method used in this study, as autocorrelation was likely to be present in the transit ridership data. Time series analysis encompasses a wide range of models which can handle multiple scenarios within data sets. Time series intervention analysis was used, which provided a methodology to determine the effects of one event on a series. This study used the ARIMA class of time series models, which specify only causality and invertibility as restrictions on the parameters, a feature that was an advantage over models which place additional assumptions on the parameters. Regression was also used to analyze the relationship between multiple independent variables and transit ridership, and to eliminate trends related to independent variables in the transit ridership data sets.

2.1 Time Series

Because time series analysis is less commonly used in the field of transportation engineering, a brief overview is given in the following sections.

2.1.1 Background

A time series ($x_t$) is a sequence of observations collected over time for one variable. Time series can be either continuous or discrete depending on how the observations have been collected. A time series is continuous if observations are taken continuously over time, whereas the series is said to be discrete if observations are taken at specific times (Chatfield, 1975). Time series analysis is concerned with chronologically ordered observations. Data observed over time, both discrete and continuous, are common across many disciplines. In the field of engineering, examples include traffic counts and water quality measures. There are many examples in economics, including profits and interest rates, as well as overall economic indicators such as gross domestic product and unemployment rates. In meteorology, a common observation that constitutes a time series is temperature.
Because future observations could be hard to predict, a time series ($x_t$) is more technically a realization (sample function) of a stochastic process ($X_t$), which is a family of random variables (Brockwell and Davis, 1987). Time series analysis focuses on studying a time series realization ($x_t$ of $X_t$) in order to gain insight into the stochastic process ($X_t$) (Aue, 2009).

In practical time series analyses, much of the work is devoted to transforming a nonstationary time series into a stationary process (Fuller, 1976). Conceptually, stationarity is similar to equilibrium within a system. A time series is strictly stationary if its probability structure is not affected by time (Anderson, 1971). In other words, the joint probability distribution of $x_t, \dots, x_{t+n}$ is equivalent to the joint probability distribution of $x_{t+h}, \dots, x_{t+h+n}$ for all $t, \dots, t+n \in T$ and $h$ such that $t+h, \dots, t+h+n \in T$ (Montgomery et al., 2008). A typically less strict definition of stationarity (for cases where the variance is finite) is called weak stationarity, and is often used because distribution functions are commonly unknown. In order for a time series to be weakly stationary there are two conditions (Shumway and Stoffer, 2006; Brockwell and Davis, 2002):
1. The first moment of $x_t$ is independent of time, $t$, and is constant.
2. The autocovariance function, defined as $\gamma(h) = \mathrm{Cov}(x_{t+h}, x_t)$, depends only on the lag $h$, and is independent of $t$.

One important example of a stationary process is called white noise. White noise is commonly denoted $Z_t \sim \mathrm{WN}(0, \sigma^2)$, where $Z_t$ is a sequence of uncorrelated random variables with zero mean and finite variance, $\sigma^2$ (Shumway and Stoffer, 2006). White noise is an important building block in time series analysis, as it is the foundation for many more complex processes (Cryer, 1986).
It is interesting to note that the term white noise is derived from white light, which is composed of a continuous distribution of wavelengths, with the implication that white noise is composed equally of oscillations at all frequencies (Shumway and Stoffer, 2006). Furthermore, if the series of shocks generated are not just uncorrelated (a white noise process), but are independent and identically distributed, the sequence is called i.i.d., denoted $Z_t \sim \mathrm{IID}(0, \sigma^2)$ (Anderson, 1976). Further, if a white noise series is normally distributed, it is also i.i.d., because uncorrelated jointly normal random variables are independent.

Often, a time series ($X_t$) can be well explained by a trend component ($m_t$), a seasonal component ($s_t$), and a zero mean, random error component ($Y_t$) (Chatfield, 1975). The process can be represented in the form $X_t = m_t + s_t + Y_t$. The following provides a short description of each component, although it should be noted that a time series model may exhibit any combination of these components:
The trend component ($m_t$): Encompasses long run changes in mean. Trends can have many underlying causes including, but not limited to, changes in economic conditions, technological changes and changes in social custom (Farnum and LaVerne, 1989).
The seasonal component ($s_t$): Encompasses cycles at any recurrent period. This component can include obvious seasonal or annual cycles, or less apparent cycles occurring at any fixed period such as a daily, weekly, or quarterly basis.
The noise component ($Y_t$): A zero mean, random error component.

There are multiple methodological approaches to the analysis of time series data, and more specifically to the removal of trend and seasonal components, including the use of both the time and frequency domains. Analysis in the time domain bases inference on the autocorrelation function, while analysis in the frequency domain pertains to inference based on the spectral density function. Both domains can be used to eliminate seasonal components, while trend components can only be eliminated in the time domain.
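The additive form described above (a trend, plus a seasonal cycle, plus zero mean noise) can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration: the function name `simulate_series`, the linear trend, the 12 observation cycle, and all parameter values are hypothetical and are not taken from the ridership data analyzed in this study.

```python
import numpy as np

def simulate_series(n=120, slope=0.5, period=12, amplitude=10.0, sigma=2.0, seed=0):
    """Simulate x_t = m_t + s_t + y_t with hypothetical parameter values."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    m = 100.0 + slope * t                           # trend component m_t
    s = amplitude * np.sin(2 * np.pi * t / period)  # seasonal component s_t
    y = rng.normal(0.0, sigma, size=n)              # zero mean noise component y_t
    return m + s + y, m, s, y

x, m, s, y = simulate_series()

# A straight-line least squares fit approximately recovers the trend slope,
# because the seasonal cycle and the noise roughly average out over full periods.
t = np.arange(len(x))
slope_hat, intercept_hat = np.polyfit(t, x, 1)
print("estimated slope:", round(float(slope_hat), 3))
```

The straight-line fit stands in for trend estimation only in this toy setting; a series whose trend depends on external predictors needs a richer regression.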
In this study a decomposition method was used which identified and separately removed the trend and seasonal components from the series. The removal of trend components used methods associated with the time domain, and the removal of seasonal components used methods associated with the frequency domain.

2.1.2 Trend Components

Analysis in the time domain includes methods for removal of both the trend and seasonal components, including least squares estimation, smoothing with moving averages, differencing, small trend methods, and moving average estimation (Aue, 2009). Additionally, trend components can be removed using regression techniques (Yaffee, 2000). Aue (2009) provides a detailed description of each method. This study used multiple regression to remove trend components. The multiple regression method is discussed below, in addition to differencing, which is referred to in later sections:

Multiple Regression: When there are four predictor variables, $x_{i1}, x_{i2}, x_{i3}, x_{i4}$, as in this analysis, the model is formulated as $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \varepsilon_i$. In this study, the combination of the predictor variables ($\beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}$) constitutes the trend component, while the regression error term ($\varepsilon_i$) constitutes both the seasonal and error terms ($s_t + Y_t$). As discussed previously, standard linear regression models assume that the error terms, $\varepsilon_i$, and therefore the response variable observations, $y_i$, are uncorrelated (Kutner et al., 2005). Time series data, on the other hand, often contain observations which are serially dependent (Box and Tiao, 1975). Therefore, modifications to standard linear regression would be necessary, including modeling the error terms as time series autoregressive moving average models (Tsay, 1984; Ostrom, 1978).

Differencing: Applies the difference operator to the original series in order to create a new, stationary series. The lag-$d$ difference operator ($\nabla_d$) is defined as (Shumway and Stoffer, 2006): $\nabla_d x_t = x_t - x_{t-d}$.
In practice, it is common to denote the use of the difference operator by using the backshift operator, $B$. In this case, $\nabla_d x_t = x_t - x_{t-d} = (1 - B^d) x_t$.

2.1.3 Seasonal Components

This study’s decomposition method removed the trend components using multiple regression, and removed seasonal components from the series using a frequency domain approach. The frequency domain, also referred to as the spectral domain, pertains to inference based on the spectral density function. A time series can be decomposed into periodic components, each of which contains variation at that period’s frequency, and whose variations combine to cause the overall variation in the time series. Therefore, a time series can be well represented as the sum of significant periodic components (Chatfield, 1980):

$X_t = \sum_{j=1}^{k} \left[ A_j \cos(2\pi\omega_j t) + B_j \sin(2\pi\omega_j t) \right]$

where $A_j$ and $B_j$ are uncorrelated random variables with mean zero and variances both equal to $\sigma^2$, and $\omega_j = 1/d_j$, where $d_j$ is the period of the cycle. For example, if there is an annual cycle and the data set contains monthly data points, one period, $d_j$, could be 12.

Exploratory analysis using the periodogram can help to determine genuine periodic (seasonal) components within the time series, $X_t$. The definition of the periodogram for $\{X_1, \dots, X_n\}$ is given below (Brockwell and Davis, 2002):

$I(\omega) = \frac{1}{2\pi n} \left| \sum_{t=1}^{n} X_t e^{-it\omega} \right|^2$

where $\omega$ is the frequency. The periodogram is the graph of $I(\omega)$ against $\omega$ and is an estimate of the power spectral density function. Although the periodogram is not a consistent estimator of the spectral density, because the variance of $I(\omega)$ does not decrease as the sample size, $n$, increases, it will be used in this analysis to determine periodicities, which is a common practice (Chatfield, 1975). If the periodogram is constructed for $-\pi \le \omega \le \pi$, the area under the periodogram represents the variance of the time series (Brocklebank, 2003).
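As a rough illustration of how a periodogram exposes a cycle, the sketch below computes periodogram ordinates for a synthetic monthly series with a 12 observation cycle. The helper name `periodogram`, the synthetic data, and the scaling are assumptions: the ordinates are divided by n and frequencies are expressed in cycles per observation, whereas other conventions rescale the ordinates and the frequency axis without moving the peaks.

```python
import numpy as np

def periodogram(x):
    """Return Fourier frequencies (cycles per observation) and ordinates |d(w_j)|^2,
    where d(w_j) = n^(-1/2) * sum_t x_t * exp(-2*pi*i*w_j*t)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                   # remove the mean so frequency zero is empty
    n = len(x)
    d = np.fft.rfft(x) / np.sqrt(n)    # discrete Fourier transform, scaled by sqrt(n)
    freqs = np.fft.rfftfreq(n, d=1.0)  # frequencies j/n for j = 0, ..., n/2
    return freqs, np.abs(d) ** 2

# Hypothetical monthly series: an annual cycle plus noise.
rng = np.random.default_rng(1)
t = np.arange(120)
x = 10.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 2.0, size=120)

freqs, ordinates = periodogram(x)
peak_freq = freqs[np.argmax(ordinates[1:]) + 1]  # skip frequency zero
period = 1.0 / peak_freq
print("peak frequency:", peak_freq, "implied period:", period)
```

A dominant ordinate at one frequency suggests a candidate cycle, but whether the peak is genuine still has to be tested, since random fluctuations can also produce peaks.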
Therefore, peaks in the periodogram generally indicate frequencies that can explain a significant part of the total variance. For example, a periodogram that displays a large peak at frequency $\omega = 0.25$ indicates a period $d = 4$, which for quarterly data indicates an annual cycle. If a periodogram does not display any obvious peaks, all frequencies are contributing to the series’ variance, and the series may even be a white noise process. The variance by cycles can be decomposed as follows (Aue, 2009),

$I(\omega_j) = d_c^2(\omega_j) + d_s^2(\omega_j)$

where $d_c(\omega_j) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} X_t \cos(2\pi\omega_j t)$ and $d_s(\omega_j) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} X_t \sin(2\pi\omega_j t)$.

As discussed, the periodogram can help to determine seasonalities, and peaks in the periodogram can signify a genuine periodic component which explains a large portion of the variance in the time series. However, it is possible that peaks may occur because of random fluctuations in the sample (Priestley, 1981). This study used spectral analysis of variance to determine whether peaks in the periodogram explain a larger portion of the variance than is expected with sequences such as white noise and ARMA processes.

2.2 Goodness of fit Tests

Ideally, after trend and seasonality are removed, the remaining series will be a white noise process. There are many goodness of fit tests to determine whether the residuals are white. For an extensive review of diagnostic checks, refer to Li (2004). For the purposes of this study, four goodness of fit tests will be utilized, including the sample autocorrelation function (ACF), the portmanteau test (Ljung-Box modification), the rank test, and a test of normality using the squared correlation ($R^2$) based on a qq plot. The four goodness of fit tests are described below:
1.
The sample autocorrelation function (ACF): The autocorrelation function and sample autocorrelation function at lag $h$ are defined as (Anderson, 1976):

$\rho(h) = \frac{\gamma(h)}{\gamma(0)}$ and $\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}$

For an i.i.d. series $Y_1, \dots, Y_n$ with a large sample size, $n$, the sample autocorrelations are approximately i.i.d. normal with zero mean and variance $1/n$ (Brockwell and Davis, 2002). Therefore, in order to test for randomness, a plot of the sample autocorrelation function for any number of lags $h$ should show that about 95% of the sample autocorrelations fall within the bounds $\pm 1.96/\sqrt{n}$ if the process is i.i.d. (Aue, 2009).
2. The portmanteau test (Ljung-Box modification): In order to test for randomness, Box and Pierce (1970) originally suggested the portmanteau test, and developed the statistic, $Q$, as

$Q(\hat\rho) = n \sum_{j=1}^{h} \hat\rho^2(j)$

where $\hat\rho$ is the sample autocorrelation function. Ljung and Box (1978, p. 298) suggest that the Box-Pierce methodology produces “suspiciously low values of $Q(\hat\rho)$ …” and propose a modified version as

$Q(\hat\rho) = n(n+2) \sum_{j=1}^{h} \frac{\hat\rho^2(j)}{n-j}$

where $Q$ can be approximated by a chi-squared distribution with $h$ degrees of freedom. The hypothesis that the residuals are i.i.d. can be rejected at the level $\alpha$ if $Q > \chi^2_{1-\alpha}(h)$ (Brockwell and Davis, 2002).
3. The rank correlation test: The rank test is a test of randomness, used to establish whether any systematic structure remains in the residuals. For a time series, a trend can be detected from the correlations between the rank order of the time series observations and their time values (Kendall, 1955). In total, there are $\frac{1}{2}n(n-1)$ pairs, where $P$ designates the number of positive correlations, and $Q$ designates the number of negative correlations. $P$ is related to Kendall’s $\tau$, called the coefficient of rank correlation:

$\tau = \frac{4P}{n(n-1)} - 1$

The coefficient of rank correlation ranges between $1$ (perfect positive correlation) and $-1$ (perfect negative correlation), with $\tau = 0$ representing an independent, white noise process. Refer to Kendall (1955) for further explanation.
4.
$R^2$ based on a qq plot: In order to assess the normality of the residuals, the squared correlation ($R^2$) value can be calculated based on a quantile-quantile plot (qq plot). A qq plot is a graph that compares the quantiles of two distributions. For this study, the first data set is the ordered residuals from the fitted model, assuming a mean zero, variance one process, denoted as $Y_j$. The second data set is the ordered statistics from a random normal sample with mean $\mu$ and variance $\sigma^2$, denoted as $n_j$. If the model residuals are normally distributed, the pairs ($n_j$, $Y_j$) should have a linear relationship (Shumway and Stoffer, 2000). Hence, perfectly normally distributed residuals would display an $R^2$ value equal to one. If the $R^2$ value is too small (based on the level $\alpha$), then the assumption of normality must be rejected. More specifically, the $R^2$ value can be computed as follows, noting that $\Phi_j$ represents the corresponding quantile of the normal distribution:

$R^2 = \frac{\left( \sum_{j=1}^{n} (Y_j - \bar{Y}) \Phi_j \right)^2}{\sum_{j=1}^{n} (Y_j - \bar{Y})^2 \sum_{j=1}^{n} \Phi_j^2}$

Refer to Shapiro and Francia (1972) for the critical values of $R^2$.

For residual testing in this study, lag $h = 20$ was used, which is commonly used in time series residual testing (Shumway and Stoffer, 2000).

2.3 Multicollinearity

Multicollinearity occurs when independent variables are highly correlated in a multiple regression model (Kutner et al., 2005). This means that the correlated variables are not providing independent information to help predict the dependent variable. Severe cases of multicollinearity must be corrected, because the result can be unstable regression coefficient estimates. Further, multicollinearity is often indicated by very large standard errors, even though the coefficients are still the best linear unbiased estimators (BLUE) (Washington et al., 2003).
If two independent variables are highly correlated, it is difficult to determine which variable is explaining more variation in the dependent variable (both variables’ standard errors will become large). Another test for the presence of multicollinearity is the comparison of correlation coefficients to regression coefficients. If their signs are different (+/−) then multicollinearity should be further investigated (Kutner et al., 2005). Two methods for detecting multicollinearity are the variance inflation factor and the condition index.
1. The variance inflation factor (VIF) is defined as $(VIF)_k = \frac{\sigma^2\{b_k^*\}}{(\sigma^*)^2}$ where $b_k^*$ are the estimated standardized regression coefficients and $(\sigma^*)^2$ is the variance of the error term for the correlation transformed model (also called the standardized regression model). In this study, the error term (when testing for multicollinearity) includes a seasonal and noise term ($s_t + Y_t$). The multiple regression model discussed previously was $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \varepsilon_i$, while the standardized regression model is $y_i^* = \beta_1^* x_{i1}^* + \beta_2^* x_{i2}^* + \beta_3^* x_{i3}^* + \beta_4^* x_{i4}^* + \varepsilon_i^*$. If the mean of the VIF values is considerably greater than 1, serious multicollinearity may exist (Kutner et al., 2005).
2. The condition number ($\kappa$) is defined as the largest condition index (CI). It is defined as $\kappa = \sqrt{\lambda_{max}/\lambda_{min}}$ where $\lambda_{max}$ is the largest eigenvalue, and $\lambda_{min}$ is the smallest eigenvalue of the $X'X$ matrix. Condition numbers between 5 and 10 indicate some dependence, while CI values of 30 and above signify strong dependencies (Belsley et al., 1980).

2.4 Lagged Variables

In time series regression models, it is often the case that time lags need to be included (Ostrom, 1978). For example, there is a time lag between exposure to carcinogenic substances and the development of cancer. If there are time lags between a change in the independent variable and the effect on the dependent variable, a lag term should be included in the regression model.
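Including a lag term amounts to adding shifted copies of a predictor as extra regression columns. The sketch below is a minimal illustration under stated assumptions: the helper `lagged_design`, the variable names `fare` and `ridership`, and the synthetic data (in which ridership responds to the previous period's fare with a coefficient of -3) are all hypothetical and are not drawn from the agencies' data.

```python
import numpy as np

def lagged_design(x, max_lag):
    """Build columns [1, x_t, x_{t-1}, ..., x_{t-max_lag}] for t = max_lag, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cols = [np.ones(n - max_lag)]            # intercept column
    for k in range(max_lag + 1):
        cols.append(x[max_lag - k : n - k])  # predictor shifted back by k periods
    return np.column_stack(cols)

# Hypothetical data: ridership reacts to the fare of the previous period only.
rng = np.random.default_rng(2)
n = 200
fare = rng.normal(0.0, 1.0, size=n)
ridership = np.full(n, 50.0)
ridership[1:] = 50.0 - 3.0 * fare[:-1] + rng.normal(0.0, 0.5, size=n - 1)

X = lagged_design(fare, max_lag=1)  # columns: intercept, fare_t, fare_{t-1}
y = ridership[1:]                   # drop the first observation, which has no lag
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, lag 0, lag 1:", np.round(coef, 2))
```

In this toy setting the coefficient on the current fare is close to zero and the coefficient on the lagged fare is close to -3, which is how a delayed response shows up in the fitted model. Ordinary least squares is used only for brevity; autocorrelated errors would call for the time series error models discussed earlier.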
In this study, the explanatory variables were unleaded regular gas prices, unemployment rates, gross domestic product and transit fare prices, each of which could have a time lag with transit ridership. A time series study of Portland, Oregon transit ridership between 1971 and 1982, focusing on factors that affect ridership, showed that neither gas price (aggregation level unspecified) nor county employment rates exhibited a time lag with bus transit ridership (Kyte et al., 1988). However, Kyte et al. found a time lag between transit fare prices and ridership. The authors stated that the largest response in ridership to the fare increase occurred almost immediately, and then decayed at a measurable rate for three months. Prior studies have not determined a set of independent variables that consistently predict bus transit ridership. The effects of GDP on ridership have not been studied.

2.5 Box-Jenkins (ARMA) Models

In the time domain, linear filters are often used to transform one time series into another, under the assumption of linearity, and can be defined as:

$Y_t = \sum_{j=-\infty}^{\infty} \psi_j X_{t-j}$

where the $\psi_j$'s are weights for each $X_{t-j}$, and $X_t$ and $Y_t$ are the input and output time series, respectively (Chatfield, 1975; Montgomery et al., 2008). There are many types of linear filters which can be applied to white noise to obtain a more complex linear time series. In general, there are three major classes of linear filters: autoregressive, moving average and autoregressive moving average filters. They are described below:
1. Autoregressive Process, AR($p$): An autoregressive process can be represented as: $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t$. The equation’s conceptual interpretation is that the current time series observation is a linear weighted combination of the $p$ most recent past values of the same time series, plus an error term (Montgomery et al., 2008). The autoregressive polynomial is defined as $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p$.
The roots of the polynomial $\phi(z) = 0$ must lie outside of the unit circle to ensure that an AR($p$) process is stationary, a condition commonly referred to in time series literature as causality (Box et al., 2008).
2. Moving Average, MA($q$): A moving average process can be represented as: $X_t = Z_t - \theta_1 Z_{t-1} - \dots - \theta_q Z_{t-q}$. A moving average model assumes the current value is a linear weighted combination of the current and $q$ lagged white noise terms. Further, a condition called invertibility is imposed on the weights, $\theta_j$, to ensure a unique MA process for an autocorrelation function (Chatfield, 1980). The moving average polynomial is defined as $\theta(z) = 1 - \theta_1 z - \dots - \theta_q z^q$. An MA($q$) process is invertible if the roots of $\theta(z) = 0$ are outside the unit circle (Box et al., 2008). Invertibility and stationarity are two separate conditions; an MA($q$) process will always be stationary.
3. Autoregressive Moving Average, ARMA($p$, $q$): An autoregressive moving average (ARMA) model assumes that the current observation is a linear weighted combination of the $p$ most recent past observations from the same time series (the AR($p$) portion), as well as $q$ lagged white noise terms (the MA($q$) portion). An autoregressive moving average process ARMA($p$, $q$) can be represented as $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t - \theta_1 Z_{t-1} - \dots - \theta_q Z_{t-q}$. An ARMA($p$, $q$) process is causal if the roots of the polynomial $\phi(z) = 0$ lie outside of the unit circle, and is invertible only if the roots of $\theta(z) = 0$ are outside the unit circle (Box et al., 2008). The coefficients for causality are computed from the expression $\psi(z) = \theta(z)/\phi(z)$, while the coefficients for invertibility are computed from the expression $\pi(z) = \phi(z)/\theta(z)$.

2.6 Intervention Analysis

Gene Glass (1972, p. 463) coined the term intervention and described it as follows: “Observation of a variable Z at several equally spaced points in time yields the observations $z_1, z_2, \dots, z_N$. Suppose that an intervention (T) is made at some point in time before time N into the process presumed to be controlling Z.
The time series is said to be interrupted at a point in time, say $n_1$ less than $N$: $z_1, \dots, z_{n_1}, T, z_{n_1+1}, \dots, z_N$.”

Box and Tiao (1975) used the term intervention and constructed an analysis method to determine the effect of an intervention occurring at a known time in a time series. Their intervention model is based on the basic transfer function model,

$X_{t,2} = \sum_{j=0}^{\infty} \tau_j X_{t-j,1} + N_t$,

where $X_{t,1}$ represents the input series, while $X_{t,2}$ represents the output series, which constitutes common notation in transfer function modeling. In intervention analysis, it is more common to replace $X_{t,1}$ with $X_t$, and also to replace $X_{t,2}$ with $Y_t$. The basic intervention model can be described:

$Y_t = \sum_{j=0}^{\infty} \tau_j X_{t-j} + N_t$

where $X_t$ and $Y_t$ are the input (pulse/step) and output (ridership, after removal of trend and seasonal components) series of the model respectively, $\tau_j$ is a linear filter and $N_t$ represents a noise sequence. $\tau_j$ is defined as $\tau_j = \rho_{XY}(j)\,\sigma_Y/\sigma_X$, where $\rho_{XY}(j)$ is the cross correlation between $X_t$ and $Y_t$, and $\sigma_X^2$ and $\sigma_Y^2$ are the variances of the two series. In the case of intervention, $T(B) = \sum_{j=0}^{\infty} \tau_j B^j$ is simplified with a rational operator of the form $T(B) = B^b W(B)/V(B)$, where $b$ is the delay parameter, and $W$ and $V$ help to provide coefficients to represent more complicated indicator series built upon a step or pulse function (Brockwell and Davis, 2002, pp. 340-341). The intervention term is then $T(B) X_t$.

For a series that might be best represented as an intervention causing a temporary change in the response variable, a pulse indicator variable would be most appropriate:

$X_t = \begin{cases} 1, & t = T \\ 0, & t \neq T \end{cases}$

where $t$ is time, and $T$ is the period of the intervention. For a series that might be best represented as an intervention causing a permanent change in the response variable, a step indicator variable would be most appropriate:

$X_t = \begin{cases} 1, & t \geq T \\ 0, & t < T \end{cases}$

In general, the Box and Tiao intervention analysis methodology follows a five step process (Box and Tiao, 1975):
1. Eliminate trend and seasonal components from the original time series ($X_t$).
This study eliminated trend components from the series through a multiple regression, and eliminated seasonal components using sinusoidal decomposition with cycles determined by the periodogram and cycle significance based on spectral ANOVA.
2. Use ordinary least squares (OLS) regression to obtain an initial estimate of $\tau_j$, which represents the transfer model.
3. Model the residuals from the OLS regression as an ARMA($p$, $q$) process, which will represent the noise model. For model diagnostics, analyze the residuals using goodness of fit tests.
4. Minimize the sum of squares, $\sum_{t=m^*+1}^{n} \hat{W}_t^2(W, V, \phi^{(N)}, \theta^{(N)})$, where $m^* = \max(p_2 + p,\ b + p_2 + q)$, in order to obtain final parameter estimates of both the noise and transfer models.
5. Analyze the final model residuals using goodness of fit tests. This study used the sample ACF, qq plot, Ljung-Box test and rank test.

2.7 Review of Relevant Past Studies

Previous studies involving multiple regression and time series analysis are discussed in this section. Many studies have examined transportation related time series data and used time series methods to analyze the data. Kyte et al. (1988) review previous work in transportation related time series, other than bus transit ridership. For example, Atkins (1979) analyzes the effect of speed limit changes on traffic accidents in British Columbia in the 1970s using intervention analysis. Additionally, studies that use vehicle miles travelled (VMT) forecasting models show that VMT can be predicted by independent variables that are similar to those used to predict transit ridership. Common predictors include, but are not limited to, gasoline price (Schimek, 1996 (1521 and 1558); Goodwin et al., 2004; Gately, 1990) and income (Schimek, 1996 (1521 and 1558); Goodwin et al., 2004; Gately, 1990). Dahl (1986) summarizes previous research on gasoline consumption demand, VMT and miles per gallon (not just VMT), finding negative elasticities for price, and positive elasticities for income.
Elasticities measure the responsiveness of one variable to change in another variable. Mokhtarian et al. (2002) analyzed induced demand with respect to highway capacity expansion, and listed predictors of induced vehicle travel as changes in population, demographics, the economy, mode and land use, but not highway capacity expansion.

Rose (1982, 1986) examines rail transit ridership using time series and multiple regression techniques. Rose (1986) studies Chicago Transit Authority rail ridership, more specifically, 11 years of monthly average weekday data. He used fares, weekday service miles, cost of car trips (including gas prices), and weather changes and found that gas prices and service levels were significant predictors of rail ridership. However, there are few studies that analyze bus transit ridership with time series models, a fact that was confirmed by librarians at the Physical Science and Engineering Library at UC Davis, and the Institute of Transportation Studies Library at UC Berkeley. Those pertaining to transit ridership (defined as both bus and rail, or just bus) are discussed below.

2.7.1 Predicting Transit Ridership Using Multiple Regression

A number of studies use multiple regression techniques to determine factors that affect bus transit ridership. Those studies take into account autocorrelation in the residuals to ensure valid model results. The following presents studies which use multiple regression as the primary analysis method.

Agrawal (1981) analyzed Southeastern Pennsylvania Transportation Authority’s City Transit Division’s annual full fare adult ridership between 1964 and 1974. Using multiple regression, he found that three factors were statistically significant in affecting ridership and produced a multiple correlation coefficient of 0.9985. The three significant predictors included average fare (adult riders), jobs in Philadelphia, and bus miles of service, while number of vehicles owned was not a significant predictor.
Lane (2009) applied regression techniques to monthly bus and rail transit ridership data from nine US cities between January 2002 and April 2008, and found that gasoline prices were a statistically significant predictor of changes in transit ridership, while service characteristics and seasonality were not significant predictors. Wang and Skinner (1984) analyzed fares, gas prices and monthly ridership data from seven transit authorities across the United States and, using regression techniques, found that as real gasoline prices increased, transit ridership increased, although by a small amount. They also found that as real fares increased, ridership decreased. Taylor et al. (2009) analyzed transit ridership from 265 urban areas using 22 independent variables and showed that the majority of transit ridership variation can be explained by variables within the categories of regional geography, metropolitan economy, population characteristics and auto/highway system characteristics. They found a positive correlation between ridership and gas prices, and a negative correlation between ridership and unemployment levels. Gomez-Ibanez (1996) reported an increase in Massachusetts Bay Transportation Authority (MBTA) bus transit ridership in Boston, in part due to service improvements such as phased station modernization and bus replacement, and to transit fares which increased less than the inflation rate. The study also included income, Boston employment, fares, and vehicle miles. Kitamura (1989) showed a causal relationship between car ownership and transit use, more specifically that an increase in car ownership leads to a decrease in transit use, using Dutch National Mobility Panel weekly travel diary data.
Cervero (1990) provides a broad overview and summarizes multiple empirical studies which show that there are many factors affecting transit trips, including characteristics of the traveler such as age, income, auto access, trip purpose and trip length, and also characteristics of the operating environment, such as land use and location settings. Although each study used a different set of independent variables to predict transit ridership, most of the studies that used multiple regression included gas prices, fares, and economic indicators such as unemployment rates.

2.7.2 Transit Ridership and Intervention Analysis

In terms of transit ridership and intervention analysis, there is a scarcity of previous studies. Kyte et al. (1988) use Tri-County Metropolitan Transportation District of Oregon bus transit ridership on various aggregation levels (system, sector and route levels) between 1971 and 1982 to show that service level, transit fares, gasoline price, and employment are statistically significant predictors of bus transit ridership. They also note that to fully explain ridership demand, many more independent variables should be considered. Kyte et al. (1988) used intervention analysis to model changes in bus transit ridership resulting from eleven separate cases of changes in their predictor variables, including increased fares, system-wide service changes, and route-level changes, and they observed that the occurrence of multiple events at one time makes it difficult to isolate the impact of any single event on ridership. Their results showed that for the four cases of fare increases, the result in terms of ridership is varied. The separate interventions of system-wide service changes and gasoline supply shortages combine to produce an intervention output of an additional 8,400 bus transit riders. Kyte et al. use elasticities greater than one to determine significance of intervention results.
Narayan and Considine (1989) use intervention analysis to analyze two cases of fare increases, in April 1980 and April 1984, and their effects on monthly upstate New York transit ridership, assuming that ridership could be decomposed into trend, seasonal, intervention and noise terms. They assume that the intervention term is best represented as a step function: an “abrupt and permanent change” in ridership (Narayan and Considine, 1989, p. 248). Their methodology differs from the original Box and Tiao intervention analysis methodology, as their model is not based on the transfer function model, but on regression with correlated error terms, with eleven indicator variables for seasonality, indicator variables for the two fare price increases, and an error term which they claim “correct[s] for autocorrelated errors” (Narayan and Considine, 1989, p. 249). However, they did not use ARMA models for the noise, and it is unclear how they corrected for correlation, because they used t-tests, which require the removal of serial dependence, nonstationarity and seasonality. Both fare interventions produced significant ridership decreases. Considine and Narayan (1988) use data from Chattanooga, Tennessee and intervention analysis to examine the effect of market changes on total ridership, total operating revenues, the ratio of total operating revenues to total revenue miles, and the ratio of total passenger trips to total revenue miles. They slightly modify the Box-Tiao methodology by first using the entire sample to model the noise term, then separately estimating the intervention term, and then minimizing over all parameters. They use t-statistics to test for significance. They show that marketing does significantly affect transit ridership.

2.8 Summary of Literature Review

An extensive literature review identified a number of past studies using transportation-related data.
Fewer studies used both multiple regression (taking into account autocorrelation) and time series methods for the analysis of predictors for transit ridership. There were still fewer time series studies that analyzed transit ridership affected by an intervention, using intervention analysis. To date, there are no known studies that examine the impact of the intervention of construction work on bus transit ridership.

CHAPTER 3 DATA DESCRIPTION

This chapter describes the data used in this analysis. A brief description of methods of ridership data collection is given. Each bus transit agency is described, along with its ridership data and the methods it uses to collect ridership data. Data filtering that was required to construct a data set for this analysis is described, with information regarding data imputation for missing data. An analysis was performed on each of the ridership data sets to determine if any independent factors played a significant role in ridership changes during the period of analysis. Data quality with relation to methods of data collection is discussed.

3.1 Methods of Data Collection

Four methods of ridership data collection were used among the five transit agencies that provided service to the downtown core. Those methods included automatic passenger counters (APC), electronic fareboxes, manual counts by route checkers, and manual counts by bus drivers. A description of each method is provided below:
1. Automatic Passenger Counters (APC): APC devices are often door-mounted and use infrared beam technology to automatically count boarding and alighting riders. Many use GPS technology to associate collected data with a time and location.
2. Electronic Fareboxes: Electronic fareboxes are devices that collect ridership information. Typically, a bus driver enters a number corresponding to rider type into a key pad on the electronic farebox, which stores the data until it is uploaded to a network.
Usually, electronic fareboxes do not provide location information.
3. Manual Counts by Route Checkers: Route checkers take manual counts of passengers boarding and alighting at each stop, and record the arrival and departure times at these stops.
4. Manual Counts by Bus Drivers: Bus drivers take manual counts of passengers boarding and alighting at each stop, and record the arrival and departure times at these stops.

3.2 Data Sample

This section describes the data samples provided by each of the five bus transit agencies. The section is divided into five subsections, one for each agency. Each subsection includes a brief background of the bus transit agency, the ridership data collection methods it employs, and the data it provided. Unless otherwise stated, all information regarding each transit agency was obtained through personal correspondence as listed in Table 3.1:

Table 3.1: Data Collection Details
Transit Agency | Contact | Contact's Official Position Title | Type of Personal Correspondence
Regional Transit | James Drake | Assistant Planner | e-mail, phone, mail, in person
Yolobus | Erik Reitz | Transit Planner | e-mail, phone, mail, in person
Roseville Transit | Teri Sheets | Alternative Transportation Analyst | e-mail
Roseville Transit | Elizabeth Haydu | Administrative Technician | e-mail, phone, in person
North Natomas TMA | Sarah Janus | Program Coordinator | e-mail
Yuba-Sutter Transit | Dawna Dutra | Analyst | e-mail, phone

3.2.1 Regional Transit

The Sacramento Regional Transit District (RT) operates bus and light rail transit serving 418 square miles of the greater Sacramento metropolitan area (Regional Transit, 2009). It is the largest provider of public transportation within the City of Sacramento, operating 256 buses on 97 bus routes with more than 3,600 bus stops; service runs from 5:00 AM to 11:30 PM, 365 days per year (Regional Transit, 2009). RT uses all four data collection methods described in Section 3.1.
Regional Transit is the only transit agency within our study that collects ridership data using APC devices, which have been installed in half of the RT bus fleet. Electronic farebox devices are installed on all RT buses, and are the method that RT uses for annual reporting. However, because electronic fareboxes do not keep track of location or alighting passengers, farebox data was not suitable for this study. The ridership data for RT therefore consisted of APC data, even though APC data is not used for official reporting. The APC data records boarding and alighting riders, as well as time and location stamps for each record, which were necessary for filtering purposes. APC devices are still in testing stages. The FTA's National Transit Database requires that two random bus trips be sampled per day by route checkers, which is why RT also employs this ridership collection method (Drake, 2007). Finally, RT makes use of manual counts by driver for its specialized Community Bus Service (now called Neighborhood Ride), which offers intra-neighborhood service within certain communities while also servicing seniors and the disabled.

Because of the expansiveness of RT, total ridership counts are nearly impossible to obtain. On a normal weekday, the RT bus system makes 3,000 trips. As mentioned, approximately half of its bus fleet is equipped with APC devices, which results in the collection of data for about 1,500 trips per day. The data is wirelessly uploaded to the RT network, where it undergoes filtering which uses a relaxed set of rules to remove faulty data.
The core set of rules that determine the filtering include:
• The difference between total riders on and total riders off for a bus must be 10% or less,
• The difference between total riders on and total riders off for a block must be 10% or less (a block is a schedule for one physical bus each day),
• The difference between total riders on and total riders off for a trip must be 10% or less,
• The number of stops counted must be “pretty close” to the actual number of stops on the route,
• Records showing obvious technology malfunctions are removed.

As filtering occurs, records are deleted from the database. After the filtering process is complete, about 400 records per day remain. The original raw data remains only as the output from the APC device in the form of a text file. It is difficult to accurately assess RT's total bus ridership because the data is not a random sample, which is a result of the filtering process and the fact that bus lines are not randomly chosen. Because total ridership data by route is not easily obtained with filtered APC data, the raw APC data was also provided, but it was not used due to data quality concerns. In addition, data from General Farebox Inc. (counts from the fareboxes on the buses), Parking Lot, Cash, and fare vending machine count data were provided. That additional data was not useful because it provided system-wide information and was not specific to the bus lines serving the downtown area.

The APC daily data, including weekdays and weekends, spans from January 7, 2008 to December 30, 2008. Initially, the APC data was a count of boarding and alighting riders at a randomly chosen number of stops within the entire RT service area, allowing for the filtering of bus lines serving the downtown area. The data also contained the time and date of the observation as well as the bus stop identifier and route schedule identifier. RT defined the AM peak period to be 6:30–9:00 AM, and the PM peak period to be 3:00–6:00 PM.
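The 10% balance rules in RT's filtering list can be illustrated with a minimal sketch. This is not RT's actual filter: the record layout, the field names, and the choice to measure the difference relative to the larger of the two counts are all assumptions made for illustration, and the “pretty close” stop-count rule is omitted because no threshold is stated for it.

```python
def balanced(ons, offs, tolerance=0.10):
    """True when boardings and alightings differ by at most `tolerance`.

    Assumption: the difference is measured relative to the larger count.
    """
    larger = max(ons, offs)
    if larger == 0:
        return True  # nothing counted in either direction
    return abs(ons - offs) / larger <= tolerance


def passes_filter(record, tolerance=0.10):
    """Apply the bus-, block- and trip-level balance rules to one record.

    `record` is a hypothetical dict holding on/off totals at each level
    plus a flag for an obvious device malfunction.
    """
    if record["malfunction"]:
        return False
    return all(
        balanced(record[level + "_ons"], record[level + "_offs"], tolerance)
        for level in ("bus", "block", "trip")
    )


# Hypothetical records: `good` is balanced at every level, `bad` is not.
good = {
    "bus_ons": 410, "bus_offs": 398,
    "block_ons": 200, "block_offs": 190,
    "trip_ons": 42, "trip_offs": 40,
    "malfunction": False,
}
bad = dict(good, trip_offs=30)  # trip-level imbalance well above 10%

print(passes_filter(good), passes_filter(bad))
```

Because a record must pass every level, a single unbalanced trip is enough to discard it, which is consistent with the heavy reduction in records that the filtering produces.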
Although the RT sample only contains 49 weekly ridership counts, Cherwony and Polin (1977) used daily bus transit ridership data from Albany, New York to show that only 30 days of transit ridership data is needed to develop a valid travel forecasting model.

3.2.2 Yolobus

Yolobus is operated by the Yolo County Transportation District and serves Yolo County and surrounding areas including Davis, Sacramento, Winters, and Woodland, among others. Unlike RT, Yolobus also provides service to the Sacramento International Airport. Yolobus uses electronic farebox devices as well as manual counts by bus driver to collect ridership information. Yolobus separates its services into three types: regular, commute, and express. Regular services run every day of the week, whereas commute and express services only run Monday through Friday. The agency operates 365 days of the year.

Since Yolobus offers different types of services to the downtown core, the operating times of those services vary. Regular bus routes (40, 41, 42A/B, 240) collectively run all day from 4:37 AM to 11:48 PM during the week. The commute and express services, which only run Monday through Friday, only run during peak commuting periods. The commuter routes (39, 241) collectively run from 5:35 AM to 8:30 AM and from 3:35 PM to 6:34 PM. Similarly, the express routes (43, 44, 45, 230, 231, 232) collectively run from 5:55 AM to 8:32 AM and from 4:03 PM to 7:17 PM. Yolobus restricts passenger travel in downtown Sacramento by not allowing passengers to both board and alight in downtown Sacramento. Instead, passengers are requested to use Sacramento RT for local services within downtown Sacramento.

The Yolobus ridership data set contains total daily ridership counts that span the three-year period from January 2006 to December 2008. The data set is missing two days, July 30, 2006 and July 31, 2006.
3.2.3 Roseville Transit

Roseville Transit is operated by the City of Roseville and mainly serves the City of Roseville, but additionally serves Sacramento commuters. Roseville Transit runs specific commuter routes that serve the Sacramento downtown core, including AM Routes 1–8 and PM Routes 1–8. The commuter routes only run Monday through Friday between the morning peak commute hours of 5:00–9:00 AM and the afternoon peak commute hours of 3:30–7:30 PM. Roseville Transit uses manual counts by bus driver to collect ridership information. Its daily peak period ridership data was provided for the entirety of 2006, 2007, and 2008, including separation by commuter route.

3.2.4 North Natomas TMA

The North Natomas “Flyer” is operated by the North Natomas TMA, serving Natomas as well as downtown Sacramento commuters. The Flyer runs between 20 and 28 passenger buses through several North Natomas neighborhoods, and although there are no timed stops, time points are listed on the schedule (the bus will stop wherever there are passengers waiting). In downtown Sacramento, there are timed stops at set locations. The Flyer includes three routes that serve the downtown core: the Eastside Route, the Westside Route, and the Central Route. In September of 2008, North Natomas TMA began running a Square Route; however, those ridership counts were excluded from the daily totals because that route was added after the construction period had ended and there was no pre-construction or construction data to use for comparison. The Flyer operates Monday through Friday except on certain holidays.

North Natomas TMA runs peak period scheduled routes between Natomas and downtown Sacramento. The Eastside Route has three morning and three afternoon loops that run from 5:54 AM to 9:04 AM and from 3:35 PM to 6:54 PM, respectively. The Westside Route has two morning and two afternoon loops that run from 6:00 AM to 7:44 AM and from 4:30 PM to 6:30 PM, respectively.
The Central Route also has three morning and three afternoon loops that run from 6:03 AM to 9:04 AM and from 4:07 PM to 7:06 PM, respectively. North Natomas TMA uses manual counts by driver as well as farebox counts to collect ridership information. Manual counts were also provided by volunteer riders for the duration of the Fix I-5 project. Its daily peak period ridership data, collected using both manual and automated collection methods, spans the complete 2008 year and is separated by route.

3.2.5 Yuba-Sutter Transit

Yuba-Sutter Transit is operated by Sutter and Yuba Counties and the Cities of Marysville and Yuba City, and provides service to Yuba City, Marysville, Linda, Olivehurst, East Nicolaus and Sacramento. Only the Sacramento Commuter Express provides Sacramento downtown commuter service (via Highways 70 and 99). The commuter service runs on weekdays, but not on certain holidays. Yuba-Sutter currently provides nine commuter schedules for each of the peak periods, which operate from 5:20 AM to 8:00 AM and from 3:45 PM to 6:50 PM. Yuba-Sutter Transit uses manual counts by the drivers to collect all ridership information. Its daily peak period ridership data covers the Sacramento Commuter Service for the years 2005, 2006, 2007, and 2008. This data was broken down by day and further separated by route. The 2008 data was provided in the same format but in an electronic version.

3.3 Data Filtering

A four-step procedure was followed to filter the ridership data provided by each transit agency.

3.3.1 General Procedure

Step 1: Filter data to include ridership only for lines which provide service to the Sacramento downtown core, as previously defined by cordon.
Table 3.2 provides a list of each transit agency and its bus transit lines that provide service to the downtown:

Table 3.2: Transit Lines Servicing the Downtown Core
Transit Agency | Downtown Servicing Lines
Regional Transit | 2, 3, 6, 7, 11, 15, 29, 30, 31, 33, 34, 36, 38, 50E, 51, 62, 63, 67, 68, 86, 88, 89, 109
Yolobus | 39, 40, 41, 42A, 42B, 43, 44, 45, 230, 231, 232, 240, 241
North Natomas | Eastside Route, Westside Route, and Central Route
Roseville Transit | AM Routes 1–8, PM Routes 1–8
Yuba-Sutter Transit | Sacramento Commuter Express

Step 2: Filter data according to Table 3.3 to include inbound ridership for the AM peak period, as defined by agency.

Table 3.3: AM Peak Period Definitions of Each Data Set
Transit Agency | AM Peak Period Definition
Regional Transit | 6:30–9:00
Yolobus | Daily data
North Natomas | By route: Eastside 5:54–9:04, Westside 6:00–7:44, Central 6:03–9:04
Roseville Transit | 5:00–9:00
Yuba-Sutter Transit | 5:20–8:00

Step 3: Filter data according to Table 3.4 to include outbound ridership for the PM peak period, as defined by agency.

Table 3.4: PM Peak Period Definitions of Each Data Set
Transit Agency | PM Peak Period Definition
Regional Transit | 3:00–6:00
Yolobus | Daily data
North Natomas | By route: Eastside 3:35–6:54, Westside 4:30–6:30, Central 4:07–7:06
Roseville Transit | 3:30–7:30
Yuba-Sutter Transit | 3:45–6:50

Step 4: Filter data to include only Tuesday, Wednesday and Thursday ridership. Because modified work schedules are widely used, Monday and Friday are not representative of typical ridership.

The final model ridership is a combination of ridership across bus lines for a given agency, so that a single data point represents total ridership on all lines serving the downtown core.
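The four filtering steps and the final aggregation across lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the line labels, and the sample values are hypothetical, and the peak windows are those listed for Roseville Transit in Tables 3.3 and 3.4.

```python
from datetime import date, time

# Peak windows as listed for Roseville Transit in Tables 3.3 and 3.4.
AM_PEAK = (time(5, 0), time(9, 0))
PM_PEAK = (time(15, 30), time(19, 30))
MIDWEEK = {1, 2, 3}  # date.weekday(): Tuesday=1, Wednesday=2, Thursday=3


def peak_period(record, downtown_lines):
    """Return "AM", "PM", or None for one hypothetical record.

    A record is (line, service_date, clock_time, direction, riders).
    """
    line, day, clock, direction, _ = record
    if line not in downtown_lines:            # Step 1: downtown lines only
        return None
    if day.weekday() not in MIDWEEK:          # Step 4: Tue/Wed/Thu only
        return None
    if direction == "inbound" and AM_PEAK[0] <= clock <= AM_PEAK[1]:
        return "AM"                           # Step 2: inbound, AM peak
    if direction == "outbound" and PM_PEAK[0] <= clock <= PM_PEAK[1]:
        return "PM"                           # Step 3: outbound, PM peak
    return None


def peak_totals(records, downtown_lines):
    """Sum riders over all downtown lines for each (date, period)."""
    totals = {}
    for record in records:
        period = peak_period(record, downtown_lines)
        if period is not None:
            key = (record[1], period)
            totals[key] = totals.get(key, 0) + record[4]
    return totals


lines = {"AM 1", "AM 2", "PM 1"}  # hypothetical labels for commuter routes
sample = [
    ("AM 1", date(2008, 5, 6), time(6, 15), "inbound", 30),
    ("AM 2", date(2008, 5, 6), time(7, 0), "inbound", 25),
    ("PM 1", date(2008, 5, 6), time(16, 0), "outbound", 40),
    ("AM 1", date(2008, 5, 5), time(6, 15), "inbound", 28),    # Monday
    ("Local A", date(2008, 5, 6), time(6, 30), "inbound", 12), # not downtown
    ("PM 1", date(2008, 5, 6), time(20, 0), "outbound", 9),    # after PM peak
]
totals = peak_totals(sample, lines)
print(totals)
```

Keying the totals by date and peak period keeps the AM and PM series separate, so each can be analyzed as its own time series.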
The sample size for the final data sets for this analysis is described in Table 3.5:

Table 3.5: Sample Size of Each Data Set
Transit Agency | Time Period | Aggregation | Sample Size
Regional Transit | 2008 | Weekly, Peak Period | 49
Yolobus | 2006–2008 | Daily | 441
North Natomas | 2008 | Daily, Peak Period | 147
Roseville Transit | 2006–2008 | Daily, Peak Period | 441
Yuba-Sutter Transit | 2006–2008 | Daily, Peak Period | 441

3.3.2 Special Modifications to General Procedure for Regional Transit

RT data required more manipulation in order to perform the necessary filtering. The details are described below. The main objective was to use the APC data to obtain the total weekly demand within the downtown core for all 52 weeks in 2008. More specifically, the goal was to obtain this weekly demand data for buses entering the downtown during the AM peak (6:30 AM – 9:00 AM) and for buses leaving the downtown during the PM peak (3:00 PM – 6:00 PM). Because RT ridership data is not collected for every passenger or trip, more complex methods were needed for RT.

All of the bus stops within the downtown core were identified using RT-generated identifying numbers. RT uses 325 bus stops in the downtown. Then the number of boarding and alighting riders associated with those bus stops during each peak period was obtained. Although the daily APC data is incomplete, it covers almost half of the stops within the downtown area every day. We can assume that the total data collection for one week (Monday through Friday) covers all of the stops within the downtown area and that the sample size for each bus stop is sufficient (Drake, 2007). Next, using the APC data, the daily average of alighting riders during the AM peak period and the daily average of boarding riders during the PM peak period were calculated. Using the RT bus schedule, the frequency of stops at each bus stop during the peak hours was determined. This frequency is a fixed number every day for a certain schedule.
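One plausible reading of how the averages and frequencies described above combine is a sum over downtown stops of average riders multiplied by stop frequency. The sketch below follows that reading only. The function name, the stop identifiers, and the numbers are hypothetical, and treating the average as riders per bus visit is an assumption rather than something stated in the text.

```python
def expanded_ridership(avg_riders_per_visit, visits_per_peak):
    """Expand sampled averages to a peak-period total.

    avg_riders_per_visit: stop id -> average riders per bus visit
        (alighting riders for the AM peak, boarding riders for the PM peak)
    visits_per_peak: stop id -> scheduled bus visits to that stop in the peak
    Both mappings are hypothetical inputs keyed by the same stop ids.
    """
    return sum(
        avg_riders_per_visit[stop] * visits_per_peak[stop]
        for stop in avg_riders_per_visit
    )


# Hypothetical values for two downtown stops during the AM peak.
am_average_alighting = {"stop_a": 2.5, "stop_b": 1.0}
am_visits = {"stop_a": 12, "stop_b": 8}

am_total = expanded_ridership(am_average_alighting, am_visits)
print(am_total)
```

Because the schedule frequency is fixed within a schedule period, only the sampled averages change from week to week under this reading.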
RT had four different schedules throughout 2008; however, comparison between schedules shows that the frequency of the downtown bus stops did not change for the downtown core for 2008. Therefore, this study used the frequencies from the first schedule, Schedule 20, valid from January 6, 2008 through April 5, 2008, for both the AM and PM peak periods. The following equations were used to calculate total ridership:

$$\text{Total Riders}_{AM} = \sum_{i=1}^{n} \left( \text{Average Alighting Riders}_i \times \text{Frequency of Stops}_i \right)$$

$$\text{Total Riders}_{PM} = \sum_{i=1}^{n} \left( \text{Average Boarding Riders}_i \times \text{Frequency of Stops}_i \right)$$

where $i$ indexes the $n$ bus stops within the downtown core.

3.4 Independent Variables

A multiple regression analysis was performed on each of the nine ridership data sets to determine if any independent factors played a significant role in ridership changes during the period of analysis. The regression was performed for all agencies (and all peak periods) using four independent variables: GDP, unemployment rates, gas prices, and fare prices. The smallest period of data aggregation available was used for each independent variable. Table 3.6 describes the final independent variable data:

Table 3.6: Independent Variable Details
Independent Variable | Source | Aggregation | Location | Contact | Contact's Official Position Title
Gross Domestic Product | Bureau of Economic Analysis | Quarterly | National | Lisa Mataloni | Economist
Gasoline Prices | AAA | Monthly | Sacramento City | Michael Geeser | Media and Government Relations Representative
Unemployment Rates | Bureau of Labor Statistics | Monthly | Sacramento/Arden Arcade/Roseville | Website |
Fares | Yuba-Sutter Transit | Daily | Agency | Dawna Dutra | Analyst
Fares | Roseville Transit | Daily | Agency | Elizabeth Haydu | Administrative Technician

In terms of GDP data, seasonally adjusted data was used because the adjustment of GDP data is outsourced, and the Bureau of Economic Analysis does not provide or have access to unadjusted GDP data.
Additionally, state and metropolitan area GDP is only available on an annual basis, and the lowest level of aggregation is national GDP provided on a quarterly basis. GDP was included as a measure of overall economic health. Hoel (1971), in his discussion of linear regression, gave the example of a 0.98 correlation coefficient between teachers' salaries and liquor consumption, noting that in general the economy was doing well and upward trends were common. He warned about spurious correlations, which must be considered in correlational studies. Gas price data was the unleaded gasoline price per gallon averaged for the city of Sacramento between 2006 and 2008. Finally, fares were used for the two transit agencies that were affected by changes in basic fare rates, namely Yuba-Sutter Transit and Roseville Transit, with one increase for the three-year period (2006–2008) for both agencies. Table 3.7 describes the fare pricing for each agency:

Table 3.7: Fare Pricing Details
Transit Agency | Single Ride, Adult Fare
Regional Transit | $2.00
Yolobus | $1.50
North Natomas | $1.00
Roseville Transit | 11/1/2003 – 6/30/2007: $2.75; 7/1/2007 – 12/31/2008: $3.25
Yuba-Sutter Transit | 8/1/2002 – 6/30/2007: $3.00; 7/1/2007 – 12/31/2008: $3.50

For data aggregated on levels other than daily, the monthly or quarterly average value of the independent variable was repeated for all Tuesdays through Thursdays that existed for that month or quarter (based on the information from the data manipulation section). Consequently, there is a single value for every day of ridership data that represents that month's average of the independent variable. For weeks that straddled two months, the two monthly averages were combined in a weighted average, with weights equal to the number of study days falling in each month. For example, Week 17 of 2008 includes April 29, April 30 and May 1, and the unemployment rate for April 2008 is 5.9% while May's unemployment rate is 6.3%. The unemployment rate for this week is calculated as [(2 × 5.9) + 6.3] / 3 = 6.03333%.
3.5 Data Quality [2]

This section describes data quality considerations, including subsections describing quality concerns related to the four ridership data collection methods employed by the five transit agencies, as well as additional data quality concerns. The four collection methods consist of automatic passenger counting (APC) devices, electronic registering fareboxes (ERFs), manual counts by route checkers, and manual counts by bus drivers. Each section will briefly discuss the collection method, concerns related to data quality, and which agencies use that method. All agencies' ridership data sets were shortened to Tuesday, Wednesday, and Thursday data sets, which did not contain any missing data. The two missing values for Yolobus discussed previously, July 30, 2006 and July 31, 2006, fell on a Sunday and a Monday.

[2] Sections 3.5.1, 3.5.2, 3.5.3, and 3.5.4 use information from a report prepared by Jessica Seifert, under the author's direction.

3.5.1 Automatic Passenger Counting Devices

APC devices automate ridership data collection by tracking boarding and alighting riders, in addition to including a time and location stamp for each count. RT is the only transit agency within the study that utilizes APC devices, supplied by Clever Devices, Inc. (Drake, 2009). This technology uses an infrared beam to count boarding and alighting riders, and is mounted above the bus doors (Poggioli, 2009). The Clever Devices APC correlates the ridership data to GPS coordinates and scheduled routes so that the data may be viewed on a per-bus, per-door level (Clever Devices, 2009). Clever Devices, Inc. claims that its APC system demonstrates over 95% accuracy, though it does not provide information on its website that would account for the 5% error (Clever Devices, 2009).
Boyle (1998), referring to all APC systems, stated that the most common problems are related to software, as transit agencies often have to upgrade their analytical programs, followed by hardware problems (device failure and durability). For the Clever Devices APC system, however, the 5% error can in large part be attributed to mechanical malfunctions as well as passengers bunching at the doors, passengers carrying children or large bags, drivers getting on and off the bus, non-riders making inquiries to bus drivers, and misalignment of sensors (Poggioli, 2009). In addition to technical problems, Boyle (2008, p. 18) defines a “debugging” period in which employees must familiarize themselves with the new technology. From the survey Boyle conducted in 1998, the average debugging period for APC devices was 17 months (Boyle, 2008). The accuracy of an APC system can be evaluated by comparing its ridership data to manual counts, although manual counts may also have data quality problems (see Sections 3.5.3 and 3.5.4) (Boyle, 2008).

As mentioned, RT is the only agency that uses APC devices to collect ridership information. RT was unable to provide APC ridership data for all of the buses that serve the downtown Sacramento region, because APC devices were not installed on the entire bus fleet and because the data was heavily filtered to remove data with obvious errors (see Section 3.2.1 for filtering rules). Also, RT's APC system is still in testing phases, which could indicate that the devices are also within the debugging period (Drake, 2009).

3.5.2 Electronic Registering Fareboxes

Electronic registering fareboxes (ERFs) are devices in which bus drivers enter a number corresponding to rider type into a key pad that connects to an electronic farebox (Boyle, 1998). The drivers are also required to enter a value to indicate the route and run number at the beginning of each trip (Boyle, 1998).
ERFs do not collect location information, so ridership data is only available at the trip level (Drake, 2009). As is done with APC devices, the data collected from electronic registering fareboxes can be “validated” by a comparison with manual counts or by comparison with the revenue collected from fares (Boyle, 1998).

There are four problems that may be encountered when using electronic registering fareboxes: mechanical problems, operator compliance, software problems, and accuracy of data (Boyle, 1998). The bus operators must enter the correct codes at the beginning of each route and trip, and the correct code for the type of passenger (Boyle, 1998). Boyle's survey (1998) indicates that some transit agencies experienced difficulties when adding these additional responsibilities to the bus drivers' duties, although the most successful agencies were the ones that provided continuous ERF training to their drivers. Ultimately, the quality of the data collected from ERFs is affected by both human and software errors.

The transit agencies within this study that used ERFs are RT, Yolobus, and North Natomas TMA. RT has electronic fareboxes installed on all of its buses except for the community buses.

3.5.3 Manual Counts by Route Checkers

Most transit agencies utilize manual counts either as their primary method of data collection, or for comparison against electronic methods (Boyle, 1998). Route checkers ride the transit vehicle and take manual counts of passengers boarding and alighting at each stop (Boyle, 1998). They typically have preprinted forms or handheld units that contain all of the stops on that route, and their sole responsibility is to count passengers and record bus stop arrival and departure times (Boyle, 1998). Manual counts are the most well-established method of ridership data collection (Boyle, 1998).
The following problems are associated with manual counting by route checkers: accuracy of data, consistency of data, labor intensiveness, reliability of route checkers, and cost of manual counting (Boyle, 1998). Problems with accuracy and consistency of the data are a result of the training and reliability of the route checker, as well as transcription of the handwritten record to an electronic version (Boyle, 1998).

RT and North Natomas TMA were the only transit agencies within the study that used manual counts by route checkers to collect ridership data. It should be noted that North Natomas TMA used untrained volunteer riders to provide manual counts.

3.5.4 Manual Counts by Bus Drivers

Manual counts by bus drivers are another method of ridership collection. Manual counting by bus drivers shares many of the same problems as manual counting by route checkers, including the labor intensiveness and the reliability of the counter. However, because bus drivers also have many other responsibilities, such as driving the bus, monitoring passengers and collecting fares, they may be less focused on counting passengers than route checkers.

All five of the transit agencies within the study use manual counts by bus drivers as either their primary method of data collection or in combination with another technique. Roseville Transit and Yuba-Sutter Transit exclusively use manual counts by bus drivers to collect ridership information, while North Natomas TMA and Yolobus use manual counts by bus driver in addition to electronic registering fareboxes. RT uses manual counts by bus driver in addition to the other three techniques.

3.5.5 Additional Data Quality Considerations

As discussed above, the RT data was heavily filtered and manipulated prior to being obtained by this study. Although quantification of data quality is not possible, RT data is probably the least reliable of the five agencies analyzed.
RT ridership data were only a sample of total ridership (unlike those of all other agencies), and the collected data were not a random sample. Additionally, the system was still in a testing phase. Furthermore, this analysis did not separate out riders who board and alight within the downtown core. According to RT, ridership that fell within these categories was less than 5% of total ridership for commute periods, but other ridership data were not available to verify this.

Although the Yolobus data were probably more reliable than the RT data, the Yolobus daily ridership totals included regular bus routes (40, 41, 42A/B, 240), which operated all day on weekdays, in addition to commute and express services, which run only Monday through Friday during peak commuting periods. The Yolobus ridership sample therefore included some non-commute data.

Since the data from RT and Yolobus were received in an electronic format, there is a possibility of transcription errors on the part of the transit agency. The data from Yuba-Sutter Transit, Roseville Transit, and North Natomas TMA were received in hardcopy format. There was also a possibility of transcription errors in entering the hardcopy data into the data sets used by this study, although all data entry was verified for accuracy by a second person.

3.6 Data Cleaning

The data sets provided by each of the five agencies contained only two cases of actual missing data, both for Yolobus (July 30, 2006 and July 31, 2006). Although cases of missing data were rare, plots of the data provided by each agency indicated that some data manipulation would be necessary to account for holidays and limited service days. As an example, Figure 3.1 displays the original data for Roseville Transit.

Figure 3.1: Plots of Original Roseville Peak Period Ridership Data

Plots of each agency’s original data set and imputed data set, including Tuesday, Wednesday and Thursday ridership, can be found in Appendix B.
The drops in the plots represent transit holidays and limited service days as well as state holidays. Although these were not technically missing data, because the agencies had provided data for all observations, the buses and the riders (assumed to be workers in downtown Sacramento) were “missing,” and therefore ridership was zero (or very low) for transit and state holidays, and unusually low for limited service days. Those occurrences were treated as missing data.

For some missing observations, the missing data could be considered missing completely at random (MCAR). MCAR occurs if missing observations are distributed randomly over all observations, including that variable and any others, and can therefore be considered a simple random subsample (Allison, 2002). The missing data in this analysis are MCAR, although not ignorable. As discussed in the Literature Review, discrete time series analysis assumes that the time series is observed at equal intervals. More complex methods are necessary if the observations are not equally spaced, and therefore the missing data in this analysis had to be imputed.

There are multiple methods for dealing with missing data. Conventional methods, apart from listwise and pairwise deletion, include dummy variable adjustment and imputation. A basic dummy variable regression was tried first, but an ad hoc imputation method was ultimately used because detailed information was available about the missing value cases and their likely “true” values.

Prior to any data imputation, it was necessary to identify days with no transit service, limited transit service days, and full transit service days that coincide with state holidays for each of the five transit agencies for 2006, 2007 and 2008. Those dates are considered missing observations and are identified in Appendix D.

Exploration of the Yolobus data showed that, in general, there was low variation from the mean for Tuesday, Wednesday and Thursday ridership within any given week.
However, it also appeared that there was low variation from the mean for the same weekday ridership over three consecutive weeks. For example, the second Tuesday in a month showed similar ridership to the first and third Tuesdays in that month. Therefore, two methods for imputing data for “missing” observations were compared. The methods were tested using Tuesday, Wednesday, and Thursday Yolobus ridership data. The same set of holidays was used to test both methods. The two methods are described below, and the detailed calculations are given in Appendix E:

• Method 1 used the same week that the holiday falls in but different days. T1 is defined as the ridership of the first non-holiday day in the holiday week, and T2 is defined as the ridership of the second non-holiday day in the holiday week. For example, if the holiday fell on a Wednesday, T1 was the ridership on Tuesday and T2 was the ridership on Thursday, whereas if it fell on Tuesday, then T1 applied to Wednesday and T2 to Thursday of the same week. Then the absolute value of the difference, |T1 - T2|, was calculated. The differences for all of the holidays were summed and divided by the total number of holidays, giving the average difference. This difference was found to be equal to 152.33.

• Method 2 used the weeks prior to and after the holiday week but the same day. T1’ was defined as the ridership on the same day of the week before the holiday, and T2’ was defined as the ridership on the same day of the week after the holiday. For example, if the holiday fell on a Tuesday, T1’ was the ridership of the previous Tuesday and T2’ was the ridership of the following Tuesday. Then the absolute value of the difference, |T1’ - T2’|, was calculated. The differences for all of the holidays were summed and divided by the total number of holidays, giving the average difference. This difference was found to be equal to 154.17.
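The comparison criterion used for the two methods can be sketched in a few lines of Python. This is only an illustration: the function name and the ridership pairs are hypothetical and are not the thesis data (the actual calculations are in Appendix E).

```python
from statistics import mean


def average_abs_difference(pairs):
    """Mean of |t1 - t2| over one (t1, t2) ridership pair per holiday."""
    return mean(abs(t1 - t2) for t1, t2 in pairs)


# Hypothetical ridership pairs for three holidays (NOT the thesis data).
# Method 1: the two non-holiday days within the holiday week.
method1_pairs = [(3400, 3250), (3510, 3390), (3300, 3460)]
# Method 2: the same weekday one week before and one week after the holiday.
method2_pairs = [(3380, 3200), (3550, 3400), (3290, 3470)]

scores = {
    "Method 1": average_abs_difference(method1_pairs),
    "Method 2": average_abs_difference(method2_pairs),
}

# The method whose reference days agree most closely is preferred.
best = min(scores, key=scores.get)
print(scores, best)
```

A smaller average difference means the two reference days tend to agree with each other, which is the property that makes them plausible stand-ins for the missing day.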
Since the average difference of Method 1 was smaller than the average difference of Method 2, Method 1 was used for this study. More specifically, the average of T1 and T2, which lie in the same week as the holiday, was used to impute the missing day’s ridership. In addition, there were no problems with Method 1 when holidays occurred in consecutive weeks, for example, Christmas Day and New Year’s Eve. Data imputation was done using Method 1 for the days on which each transit agency ran limited or no service, as well as for state holidays on which they ran full service. Finally, the Thanksgiving, Christmas, and New Year’s Eve weeks were eliminated from the data, as those entire weeks showed extremely low ridership.

The formula used for percent data imputed is:

% Imputed = (Number of Days Imputed / Total Number of Days) x 100

Holidays and limited service days did not differ between peak periods, so the percent of data imputed was the same for the AM and PM data sets of any agency with separate peak period data. Table 3.8 shows that the amount of imputed data for any given agency is at most about 2%, and usually much less. This is considered an acceptable level of imputation.

Table 3.8: Percent Imputed Data

Transit Agency          Percent
Roseville Transit       0.91%
Yuba-Sutter Transit     0.91%
Yolobus                 0.68%
North Natomas TMA       1.36%
Regional Transit        2.04%

3.7 Descriptive Statistics for Transit Ridership (see footnote 3)

The statistical methods discussed in the Literature Review are considered parametric statistical methods. Parametric methods make assumptions about the population parameters; more specifically, probability distributions are usually assumed to be normal (Mann, 2004). The statistical tests used in this study, including the regression and time series analyses presented later, use parametric methods. The following discussion provides a general statistical overview of each transit agency’s ridership data, based on the cleaned data sets, prior to in-depth time series analysis.
Both measures of central tendency and measures of dispersion help to describe the data and their distribution. This section presents the statistics but leaves their interpretation to Chapter 5.

3.7.1 Measures of Central Tendency

Measures of central tendency describe the center of the distribution of a variable. The mean is an arithmetic average that is commonly used to describe distributions. However, the mean is sensitive to extreme values, also known as outliers (Ross, 2005). The median is also a measure of central tendency, and describes the middle value of the data without being as affected by outliers (Ross, 2005). In order to describe the central values of each data set, Table 3.9 lists the mean and median ridership for each transit agency’s data sets.

Footnote 3: Section 3.7 makes use of calculations and tables created by Jessica Seifert.

Table 3.9: Measures of Central Tendency: Mean and Median

Transit Agency          Data Aggregation   Mean Ridership   Median Ridership
Roseville Transit       AM                 213.9            208.0
                        PM                 200.4            198.0
Yuba-Sutter Transit     AM                 239.2            226.0
                        PM                 237.5            222.0
Yolobus                 Daily              3499.5           3383.0
North Natomas TMA       AM                 116.8            127.0
                        PM                 96.36            96.0
RT                      AM                 15498.4          15314.0
                        PM                 13639.6          13641.0

A comparison of the median and the mean provides insight into the shape of the data sets’ distributions. If the median and mean have similar values, then the distribution is probably symmetric; otherwise, the data may be skewed to some degree (Ross, 2005). The data for this analysis show that the medians for each agency are similar to their means. This indicates that the distributions of the ridership data are fairly symmetric. More specifically, the medians for Roseville Transit, Yuba-Sutter Transit, Yolobus, and Regional Transit are slightly less than their means, indicating that the distributions may be skewed to the right.
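The mean-versus-median comparison can be illustrated with a short Python sketch. The helper name `skew_hint` is hypothetical, and the rule it encodes is only the informal heuristic described above, not a formal test of skewness; the numbers are copied from Table 3.9.

```python
def skew_hint(mean, median, tolerance=0.0):
    """Informal heuristic: guess skew direction by comparing mean and median."""
    if mean - median > tolerance:
        return "right"   # mean pulled above the median by a long upper tail
    if median - mean > tolerance:
        return "left"    # mean pulled below the median by a long lower tail
    return "symmetric"


# (mean, median) pairs copied from Table 3.9.
table_3_9 = {
    "Roseville Transit AM": (213.9, 208.0),
    "Yuba-Sutter Transit AM": (239.2, 226.0),
    "Yolobus Daily": (3499.5, 3383.0),
    "North Natomas TMA AM": (116.8, 127.0),
}

hints = {name: skew_hint(m, md) for name, (m, md) in table_3_9.items()}
for name, hint in hints.items():
    print(f"{name}: {hint}")
```

Setting a nonzero `tolerance` (an assumed parameter) would let small differences be treated as symmetric rather than as evidence of skew.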
Part of the skew in the histograms of Roseville Transit (AM peak period), Yuba-Sutter Transit, and Yolobus can be attributed to slightly higher ridership on Tuesdays compared to Wednesdays and Thursdays. The Roseville and Yuba-Sutter histograms are shown in Figure 3.2, confirming that expectation. The histograms of all data sets are given in Appendix F.

Figure 3.2: Roseville and Yuba-Sutter Transit Histograms

Roseville Transit, Yuba-Sutter Transit, and Yolobus have data sets spanning the period of 2006 to 2008. All three agencies experienced increased ridership in both 2007 and 2008. In particular, Yuba-Sutter Transit experienced large ridership increases: from 2006 to 2007, average AM ridership increased by 15.2%, and from 2007 to 2008, it increased by 31.5%. Similar changes were seen in Yuba-Sutter Transit’s PM ridership during those years. The medians in Table 3.10 are much closer to their means. In fact, all agencies display this tendency, indicating that the yearly distributions are much more symmetric than the distributions of the entire data sets.
[Figure 3.2 panels: (a) Roseville Transit AM Ridership; (b) Roseville Transit PM Ridership; (c) Yuba-Sutter Transit AM Ridership; (d) Yuba-Sutter Transit PM Ridership. Each panel is a histogram of Frequency against Ridership.]

Table 3.10: Yearly Means and Medians for Transit Agencies with Data Spanning 2006-2008

                                           Mean                        Median
Transit Agency          Data Aggregation   2006     2007     2008     2006     2007     2008
Roseville Transit       AM                 194.9    205.6    241.2    196.0    203.0    240.0
                        PM                 189.1    192.5    219.6    190.0    194.0    217.0
Yuba-Sutter Transit     AM                 195.6    225.4    296.5    195.0    226.0    300.0
                        PM                 189.9    223.0    299.5    189.0    222.0    302.0
Yolobus                 Daily              3175.1   3346.1   3977.2   3188.0   3360.0   3932.0

3.7.1.1 Ridership Means by Period

Since not all data sets span multiple years, the means of the data were also calculated seasonally for the year 2008. The calculations were made based on the following seasons:

• 1st quarter: January – March
• 2nd quarter: April – June
• 3rd quarter: July – September
• 4th quarter: October – December

RT was excluded because its data are observed weekly. All of the agencies experienced increased ridership between the first and second quarters of 2008. North Natomas TMA ridership increased the most during this period, with a 41.6% increase in AM ridership and a 24.9% increase in PM ridership. Similarly, all agencies saw an increase in ridership between the second and third quarters of 2008, with North Natomas TMA again showing the largest increase. However, opposite changes occurred between the third and fourth quarters of 2008. Almost all of the agencies experienced a decrease in ridership during this period; Yolobus was the only agency that saw an increase in ridership (1.7%). The means for each quarter of 2006, 2007 and 2008 are displayed in Table 3.11. In general, it appears that transit ridership decreased in the first and fourth quarters.
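The quarter-to-quarter percentages quoted above can be reproduced from the quarterly means in Table 3.11. The following is a minimal sketch; the function name `percent_change` is an assumption made for illustration.

```python
def percent_change(old, new):
    """Percent change from `old` to `new`, relative to `old`."""
    return (new - old) / old * 100.0


# 2008 quarterly mean ridership values taken from Table 3.11.
north_natomas_am = {"Q1": 77.9, "Q2": 110.3}
north_natomas_pm = {"Q1": 74.8, "Q2": 93.4}
yolobus_daily = {"Q3": 4326.6, "Q4": 4399.9}

nn_am = percent_change(north_natomas_am["Q1"], north_natomas_am["Q2"])
nn_pm = percent_change(north_natomas_pm["Q1"], north_natomas_pm["Q2"])
yolo = percent_change(yolobus_daily["Q3"], yolobus_daily["Q4"])

print(f"North Natomas TMA AM, Q1 to Q2: {nn_am:.1f}%")
print(f"North Natomas TMA PM, Q1 to Q2: {nn_pm:.1f}%")
print(f"Yolobus Daily, Q3 to Q4: {yolo:.1f}%")
```

Rounded to one decimal place, these match the 41.6%, 24.9% and 1.7% figures reported in the text.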
Table 3.11: Means by Season for 2006, 2007 and 2008 (Mean Ridership)

                Roseville Transit   Yuba-Sutter Transit   Yolobus   North Natomas TMA
Year  Quarter   AM       PM         AM       PM           Daily     AM       PM
2006  1st       190.3    182.4      189.7    182.9        3270.0
      2nd       203.2    190.7      190.7    184.9        3185.0
      3rd       197.2    196.2      201.9    196.2        3090.6
      4th       187.3    186.0      200.6    196.0        3161.8
2007  1st       195.1    191.4      213.4    206.8        3333.8
      2nd       200.3    192.6      219.9    214.1        3269.8
      3rd       201.0    188.6      229.0    225.0        3385.3
      4th       228.6    198.1      240.8    248.8        3403.5
2008  1st       226.3    199.6      253.9    257.2        3423.9    77.9     74.8
      2nd       235.1    213.3      288.8    292.3        3782.8    110.3    93.4
      3rd       266.0    234.6      333.8    335.7        4326.6    146.8    114.7
      4th       234.5    231.3      307.2    310.8        4399.9    130.8    101.3

Means and medians were also calculated based on the Fix I-5 construction period. The three periods in Table 3.12 represent the time before the construction (January 1, 2008 – May 30, 2008), the time during the construction (May 31, 2008 – July 27, 2008), and the time after the construction (July 28, 2008 – December 31, 2008). All of the agencies experienced increases in mean ridership between the pre-construction and construction periods, but the changes in ridership from the construction to post-construction periods varied by agency and by peak period within agencies. However, these differences are confounded with seasonal differences, as Table 3.11 shows.

Table 3.12: Means by Construction Period for 2008
* RT AM and PM peak period ridership represents weekly ridership counts.

3.7.2 Measures of Dispersion

Measures of central tendency do not give a complete picture of the data’s distribution, so measures of dispersion, including the standard deviation, are also reported as descriptive statistics. The sample variance, $s^2$, is the average of the squared deviations from the sample mean, $\bar{x}$, while the sample standard deviation, $s$, is the square root of the variance (Ross, 2004).
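For reference, these definitions can be written out as follows. This is a sketch that assumes the conventional sample divisor of $n - 1$, which the phrase “average of the squared deviations” only approximates; $x_1, \dots, x_n$ denote the observed ridership values and are notation introduced here for illustration.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 ,
\qquad
s = \sqrt{s^2}
\]
```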
The relative size of the standard deviation can provide information about how tightly clustered the data are about the mean. Smaller standard deviations indicate that the data are tightly clustered, whereas larger standard deviations indicate that the data are relatively more dispersed (Mann, 2004). The standard deviation, together with the mean, can be used to calculate a range in which a certain percentage of the data can be expected to lie: the confidence interval (Ross, 2005). This range provides values, in terms of the original data’s units, that indicate how much of the data are “normally” contained in that range. Standard deviations ($s$) by construction period are given in Table 3.13.

Data for Table 3.12 (mean and median ridership by construction period):

                                           Mean Ridership                   Median Ridership
Transit Agency          Data Aggregation   Pre       During    Post        Pre       During    Post
Roseville Transit       AM                 228.2     253.8     249.8       229.0     250.5     246.0
                        PM                 204.3     226.0     233.2       204.0     230.5     230.0
Yuba-Sutter Transit     AM                 265.8     314.0     321.8       262.0     311.5     316.5
                        PM                 268.8     317.2     324.8       268.0     318.5     320.5
Yolobus                 Daily              3563.1    4023.5    4393.5      3589.0    3973.0    4447
North Natomas TMA       AM                 85.2      146.2     138.1       82.0      148.5     138.5
                        PM                 76.6      124.3     106.0       75.0      127.0     104.5
RT                      AM*                14785.9   15525.8   16235.7     14990.4   15132.5   16423.0
                        PM*                13361.6   13907.5   13824.4     13221.9   14188.6   13966.6

Table 3.13: Variance and Standard Deviation for Each Transit Agency

                                           Standard Deviation (s)
Transit Agency          Data Aggregation   Pre       During    Post        $\bar{x} \pm s$     $\bar{x} \pm 2s$    $\bar{x} \pm 3s$
Roseville Transit       AM                 20.54     24.91     24.46       (186.0, 241.8)      (158.1, 269.7)      (130.2, 297.6)
                        PM                 15.49     20.17     23.49       (177.7, 223.1)      (155.0, 245.8)      (132.3, 268.5)
Yuba-Sutter Transit     AM                 29.14     16.33     21.38       (191.4, 287.0)      (143.6, 334.8)      (95.8, 382.6)
                        PM                 33        19.7      23.02       (185.7, 289.3)      (133.9, 341.1)      (82.1, 392.9)
Yolobus                 Daily              240.38    209.77    229.1       (3043.5, 3955.5)    (2587.5, 4411.5)    (2131.5, 4867.5)
North Natomas TMA       AM                 12.14     11.65     12.48       (86.7, 146.9)       (56.6, 177.0)       (26.5, 207.1)
                        PM                 8.75      12.37     12.42       (75.2, 117.6)       (54.0, 138.8)       (32.8, 160.0)
RT                      AM                 992.62    1265.84   1119.38     (14238, 16759)      (12977, 18019)      (11717, 19280)
                        PM                 687.33    1043.77   793.33      (12824, 14455)      (12009, 15270)      (11193, 16086)

According to the empirical rule, the following percentages of approximately normal data lie in these respective ranges: 68% in the range $\bar{x} \pm s$, 95% in the range $\bar{x} \pm 2s$, and 99.7% in the range $\bar{x} \pm 3s$ (Ross, 2005). These ranges were calculated for the data sets and are displayed in Table 3.13. The $\bar{x} \pm 3s$ range covers 100% of the data in all but two cases (the Roseville Transit AM and PM data sets contain points that lie outside of the $\bar{x} \pm 3s$, 99.7% range), indicating that the data sets are approximately normal.

3.7.3 Discussion

All agencies had overall increases in mean ridership during the study period, but there were also seasonal variations in mean ridership. An informal analysis of data dispersion indicated that the data sets were approximately normal, with minor skews. Although this study’s data failed the usual tests of normality, slight departures from normality do not cause serious issues (Kutner et al., 2005). With the possible exception of RT, this study’s data sets are random samples with a sufficiently large number of observations. Their populations were considered approximately normally distributed, and parametric methods were justified.

The next chapter, which uses multiple regression and time series analyses, studies the transit agency data sets to identify independent variables that correlate with increased ridership and that can be used in predictive models to explain the change in ridership means during the Fix I-5 project.

CHAPTER 4 MODEL BUILDING

This chapter describes the methodology that was used to create the time series intervention models for each agency’s transit ridership data. The first two sections present the steps taken to transform each of the nine data sets into stationary processes, including detrending using multiple regression analysis and eliminating seasonal components using sinusoidal decomposition.
The last section explains the intervention analysis methodology.

4.1 Multiple Regression

The nine time series plots shown in Appendix B show an overall increasing trend in ridership. As discussed in the Literature Review, there are multiple methods of removing trend components in the time domain, including least squares estimation, smoothing with moving averages, and differencing, as well as regression techniques (Aue, 2009; Yaffee, 2000). Regression techniques allow the modeler to eliminate trends using independent variables. As discussed earlier, each previous study used a different set of independent variables to predict transit ridership, but most of the studies that used multiple regression included gas prices, fares, and economic indicators such as unemployment rates. The following sections describe the relationships between bus transit ridership and each independent variable used in this multiple regression. Plots of each independent variable can be found in Appendix C.

4.1.1 Bus Transit Ridership and Gas Prices

Many studies have shown that gas prices significantly affect ridership, and that the ridership-gas price correlation is positive. Lane (2009) showed that gasoline prices are a statistically significant predictor of positive changes in transit ridership, with positive ridership-gas correlations. Wang and Skinner (1984) used data from seven transit authorities in the U.S. and showed that as real gasoline prices increase, transit ridership increases significantly. In a cross-sectional study, Taylor et al. (2009) analyzed transit ridership from 265 urban areas, using regional fuel prices provided by the Bureau of Labor Statistics as an explanatory variable, and hypothesized a positive correlation. They found that fuel prices were a significant external factor positively influencing aggregate transit ridership. Kyte et al.
(1988) show that gasoline price is a statistically significant predictor of bus transit ridership, explaining that increasing the cost of automobile travel (i.e., gas prices) would motivate a mode change to transit. They find that gasoline prices show a negligible lag in their influence on ridership. Among the previous studies presented above, those that used gas prices in their analysis found them to be significant independent variables.

This study found strongly significant and positive Pearson’s correlations between bus transit ridership and gas prices, ranging between 0.18 and 0.6, for eight of the data sets. The Pearson’s correlation coefficient, $r$, is defined for pairs $(x_i, y_i)$ as (Ross, 2005):

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}
\]

where $s_x$ and $s_y$ are the sample standard deviations of the $x$ and $y$ values.

The data sets with the highest correlations (Roseville Transit, Yuba-Sutter Transit and Yolobus) are those having the longest time series, suggesting that perhaps the impact of higher gas prices was beginning to level off by 2008 (i.e., those who were susceptible to the effect of higher prices had already changed earlier than 2008), or possibly that the intervention of the Fix I-5 project and other anomalies of 2008 (economic conditions, serious regional fires during the summer) disrupted the previously regular relationship between gas prices and ridership. Also, the highest correlations between ridership and gas price were for the bus transit agencies farthest from the Sacramento downtown core (Yuba-Sutter Transit, Roseville Transit and Yolobus), indicating that commuters with longer commute distances may have been more sensitive to rising gas prices and, therefore, more inclined to use bus transit. The RT AM peak has a counterintuitive negative ridership-gas correlation of −0.2. The ridership-gas Pearson’s correlations are shown in Table 4.1.
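The Pearson correlation described in Section 4.1.1 can be computed as in the following sketch. The function name `pearson_r` and the sample gas price and ridership values are hypothetical, chosen only for illustration; they are not the data used in this study. The code uses the algebraically equivalent form in which the $(n - 1)$ factors cancel.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # Same value as sxy / ((n - 1) * s_x * s_y); the (n - 1) terms cancel.
    return sxy / sqrt(sxx * syy)


# Hypothetical gas prices (dollars per gallon) and ridership counts.
gas_price = [2.5, 2.8, 3.1, 3.4, 3.9]
ridership = [200, 210, 205, 230, 240]

r = pearson_r(gas_price, ridership)
print(f"r = {r:.2f}")
```

A value near +1 indicates that ridership tends to rise with gas price, a value near −1 indicates the opposite, and a value near 0 indicates little linear association.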
Table 4.1: Ridership-Gas Price Correlation Coefficients

Transit Agency          Peak Period   Pearson's Ridership-Gas Price Correlation Coefficient
Regional Transit        AM            −0.20***
Regional Transit        PM            0.18***
Yolobus                 Daily         0.39***
North Natomas           AM            0.25***
North Natomas           PM            0.28***
Roseville Transit       AM            0.60***
Roseville Transit       PM            0.43***
Yuba-Sutter Transit     AM            0.59***
Yuba-Sutter Transit     PM            0.59***

***: p < 0.001

4.1.2 Bus Transit Ridership and Unemployment Rates

A small number of previous studies have examined the effects of labor statistics, specifically employment, on transit ridership. Agrawal (1981) showed that jobs in Philadelphia were highly significant positive predictors of transit ridership. Surprisingly, he showed th
Title  Sacramento's Fix I-5 project : impact on bus transit ridership 
Subject  University of California, Davis. Dept. of Civil and Environmental Engineering--Dissertations.; Bus lines--Ridership--California--Sacramento.; Express highways--Maintenance and repair--Social aspects--California--Sacramento.; Interstate 5. 
Description  Text document in PDF format.; Title from PDF title page (viewed on September 29, 2010).; Thesis (M.S.)--University of California, Davis, 2010.; Includes bibliographical references (p. 104-110). 
Creator  Carpenter, Rachel A. 
Publisher  Institute of Transportation Studies, University of California, Davis 
Contributors  University of California, Davis. Institute of Transportation Studies.; University of California, Davis. Dept. of Civil and Environmental Engineering. 
Type  Dissertations, Academic.; Text 
Language  eng 
Relation  http://worldcat.org/oclc/666946028/viewonline; http://pubs.its.ucdavis.edu/download_pdf.php?id=1414 
DateIssued  [2010] 
FormatExtent  viii, 154 p. : digital, PDF file (4.5 MB) with charts. 
RelationRequires  Mode of access: World Wide Web. 
RelationIs Part Of  Research report ; UCD-ITS-RR-10-18; Research report (University of California, Davis. Institute of Transportation Studies) ; UCD-ITS-RR-10-18. 
Histograms for Each Transit Agency ...................................................................... 136 G. Multiple Regression Model Selection .................................................................... 138 vi H. Transit Agency Periodograms ................................................................................ 143 1. Yuba Sutter AM Periodogram ............................................................................. 143 2. Yuba Sutter PM Periodogram ............................................................................. 143 3. Yolobus Daily Periodogram ................................................................................ 144 4. Roseville Transit AM Periodogram ..................................................................... 144 5. Roseville Transit PM Periodogram ..................................................................... 145 6. North Natomas AM Periodogram........................................................................ 145 7. North Natomas PM Periodogram ........................................................................ 146 8. Regional Transit AM Periodogram ..................................................................... 146 9. Regional Transit PM Periodogram ...................................................................... 147 I. Intervention Analysis Model Results ....................................................................... 148 J. Goodness of fit Tests............................................................................................... 152 vii TABLE OF FIGURES AND TABLES Figures Figure 1.1: The Fix I 5 Construction Area ......................................................................... 2 Figure 1.2: City of Sacramento T. O. C. ............................................................................... 6 Figure 1.3: The Fix I 5 Website Encouraged Transit ......................................................... 
7 Figure 1.4: Informational Documents Regarding Fix I 5 Closures .................................... 8 Figure 3.1: Plots of Original Roseville Peak Period Ridership Data................................ 53 Figure 3.2: Roseville and Yuba Sutter Transit Histograms .............................................. 59 Figure 5.1: Yuba Sutter Transit AM Peak Periodogram .................................................. 82 Tables Table 3.1: Data Collection Details .................................................................................... 36 Table 3.2: Transit Lines Servicing the Downtown Core .................................................. 43 Table 3.3 AM Peak Period Definitions of Each Data Set ................................................. 43 Table 3.4: PM Peak Period Definitions of Each Data Set ................................................ 43 Table 3.5: Sample Size of Each Data Set ......................................................................... 44 Table 3.6: Independent Variable Details .......................................................................... 46 Table 3.7: Fare Pricing Details ......................................................................................... 47 Table 3.8: Percent Imputed Data ...................................................................................... 56 Table 3.9: Measures of Central Tendency: Mean and Median ......................................... 58 Table 3.10: Yearly Means and Medians for Transit Agencies with Data Spanning 2006 2008........................................................................................................................... ....... 59 Table 3.11: Means by Season for 2006, 2007 and 2008 ................................................... 61 Table 3.12: Means by Construction Period for 2008 ........................................................ 62 Table 3.13: Variance and Standard Deviation for Each Transit Agency .......................... 
63 Table 4.1: Ridership Gas Price Correlation Coefficients ................................................. 67 Table 4.2: Ridership Unemployment Correlation Coefficients ........................................ 69 Table 4.3: Ridership Fare Correlation Coefficients.......................................................... 71 Table 5.1: Adjusted R2 With and Without the GDP Independent Variable...................... 78 Table 5.2: Statistically Significant Predictors of Bus Transit Ridership .......................... 80 Table 5.3: Means by Construction Period for 2008 After Detrending Data ..................... 81 Table 5.4: Statistically Significant Periodic Components for Each Agency .................... 83 Table 5.5: Intervention Analysis Final Model Results ..................................................... 87 Table 5.6: Final Model Significance ................................................................................. 88 Table 5.7: Interpretation of Model Significance ............................................................... 90 Table 6.1: Significant Periodic Components of Each Transit Agency ............................. 99 Table 6.2: Intervention Model Summary ................................................................... 101 viii ABSTRACT The Fix I 5 project was an engineering project that rehabilitated drainage and pavement on Interstate 5 in downtown Sacramento, from May 30, 2008 to July 28, 2008. In order to alleviate congestion, media outreach alerted commuters about projected traffic conditions as well as advised alternative modes or routes of travel. The construction schedule included complete closures of north or southbound portions of Interstate 5. This study analyzed the impact of the Fix I 5 project closures on peak period bus transit ridership of five transit agencies serving the downtown Sacramento core. 
The results indicated that gasoline prices and unemployment rates were statistically significant predictors of transit ridership, with increased gasoline prices and unemployment related to increased bus transit ridership. All agencies had overall increases in mean ridership during the study period, but there were also seasonal variations in mean ridership. Removal of trend and seasonal components in the bus transit ridership data sets was accomplished using multiple regression and sinusoidal decomposition. Time series intervention analysis then estimated that the Fix I-5 project had little impact on mean number of bus riders for all five transit agencies. Bus transit agencies with main service areas closest to the Fix I-5 project were most affected, with ridership increases of about three percent or less attributable to Fix I-5. This study did not analyze the impact of Fix I-5 on other modes of transportation, which may have been more affected than bus transit ridership.

CHAPTER 1 INTRODUCTION

Interstate 5 (I-5) is a major interstate that runs north-south, connecting Mexico to Canada through California, and was started in 1947 by the Federal Highway Administration. The downtown Sacramento portion of I-5 was completed in the 1960s and is nicknamed the “Boat Section” because it was constructed below the water level of the Sacramento River, which runs adjacent to the freeway (Caltrans, 2008). In order to construct the boat section of the freeway, Caltrans had to initially drain this section, and engineer a drainage system of pipes and pumps. The boat section was manually monitored during each winter season to ensure pumps were working properly. After more than 40 years without major renovation, pavement cracking and sediment accumulation required the boat section to undergo repair, and an opportunity was provided for drainage system upgrades.
The California Department of Transportation (Caltrans) Engineers’ Estimate projected that the rehabilitation of drainage and pavement of Interstate 5 in downtown Sacramento, dubbed “Fix I-5,” would take 305 working days at a cost of more than $44 million (C. C. Myers, Inc., 2009). On February 2, 2008, a Rancho Cordova-based engineering firm, C. C. Myers, Inc., won the Fix I-5 project bid with a proposed schedule of 85 working days and 29 nights and weekends at a substantially lower cost of $36.5 million, with financial incentives for earlier completion (Caltrans, 2009). Aggressive and compressed construction schedules are not novel for C. C. Myers. Their resume includes more than 17 emergency projects for the State of California, including emergency work on the San Francisco Bay Area’s 2007 MacArthur Maze meltdown (C. C. Myers, Inc., 2009). Although not emergency work, the Fix I-5 project specifically included a reconstructed six-inch pavement slab, an upgraded drainage system, new dewatering wells, and installation of electronic monitoring equipment (Solak, 2008). The project was completed in a shorter period than predicted, from May 30, 2008 to July 28, 2008, in 35 days and 3 weekends.

The Fix I-5 construction schedule periodically closed entire northbound or southbound portions of Interstate 5 through Sacramento using full unidirectional closures, a relatively new technique for non-emergency construction. Approximately 200,000 vehicles travel on Interstate 5 in Sacramento each day (Schwarzenegger, 2008). Reports projected that during closure periods, traffic congestion could increase nineteen times (Schwarzenegger, 2008). During closure periods, traffic was detoured to arterial streets and other freeways. In order to alleviate congestion, media outreach alerted commuters about projected traffic conditions as well as advised alternative modes of travel. Employers, including California state government, which is one of the largest employers in the area with 75,000 commuters, encouraged employees to use alternative modes of travel (Schwarzenegger, 2008).

Figure 1.1: The Fix I-5 Construction Area

1.1 Purpose

The main objective of this analysis is to examine the effect that the Fix I-5 project had on commuters' mode choices, more specifically bus transit ridership (supplementary studies are examining the impact of Fix I-5 on other modes of travel). This objective includes the determination of whether the Fix I-5 project caused changes in mean bus transit ridership levels, whether this effect on ridership was permanent or temporary, and the magnitude of the effect. This research not only reports those statistics, but also provides information on service changes for bus transit agencies that need to prepare for future planned construction work, which includes freeway closures such as Fix I-5, and also for unplanned events which force closures.

1.2 Analysis Scope

The primary focus of media outreach was to suggest alternate transportation for those who commute on I-5. State governments and other employers with a large number of employees in the downtown Sacramento core urged employees to use alternate transportation during the Fix I-5 period. Consequently, this study analyzed bus transit agencies’ data from the morning (AM) and evening (PM) peak periods. The boundaries of the downtown core were defined as follows: the south boundary defined by the 50/80 freeway, the north boundary defined by Richards Blvd, the west boundary defined by the Sacramento River and the east boundary defined by the Business 80/99 freeway. Bus stops directly below freeway boundaries were considered part of the downtown core. This corresponds to other transit agencies’ definitions of downtown Sacramento.
Since this analysis focused on commute behavior, only inbound ridership was considered for the AM peak period, while outbound ridership was considered for the PM peak period. Inbound trips are defined as those trips with a final destination within the downtown core, while outbound trips originate within the downtown core but have a final destination outside it. The AM peak period is the primary morning commute period, but specific hours varied by transit agency. The PM peak period is the primary afternoon commute period and also varied by transit agency. In general, the peak periods occurred between the hours of 5:00 AM to 9:00 AM, and 3:00 PM to 7:30 PM.

In order to accurately assess bus transit ridership in the downtown Sacramento area, this analysis employed bus transit ridership counts for five transit agencies which provide commute service to the Sacramento downtown core, including: Yuba-Sutter Transit, Yolobus, Roseville Transit, North Natomas Transportation Management Association (TMA) and Sacramento Regional Transit. As state workers comprise 75,000 commuters in Sacramento, and many state agencies have headquarters in the downtown core, the commute choices made by that group likely had a sizable impact on this study’s data sets.

1.3 Gap in Knowledge

In general, many studies have examined transportation-related data using time series methods, although not many have examined bus transit ridership. Few time series studies have analyzed bus transit ridership affected by an outside event (an intervention) using intervention analysis. To date, there are no known studies that examine the intervention of construction work on bus transit ridership.

1.4 Response to the Event

Many public and private agencies united to publicize, prepare and provide for public safety for the Fix I-5 project.
These measures included public outreach and intercity and interagency partnerships including the City of Sacramento, City of West Sacramento, Sacramento Area Council of Governments, Downtown Sacramento Partnership, and the Old Sacramento’s Merchant’s Association. Other efforts included announcements via changeable message signs and highway advisory radio, and California Highway Patrol enforcement in the construction area. Much media outreach was done to warn commuters about traffic conditions and suggest alternative modes of travel. Additionally, various media sources made up-to-date information regarding the Fix I-5 project’s progress easily available to the general public. The Governor’s Executive Order (S-04-08) cited Assembly Bill 32, the California Global Warming Solutions Act of 2006, and advised alternatives to widely used single-occupant vehicle commuting, including telecommuting and public transit. Some of the private entities that provided information included News 10, the Sacramento Bee, Sacramento Region 511, and Capital Public Radio, as well as some private business websites. Transit agencies responded to the Fix I-5 project with media outreach that advertised the convenience and availability of transit.

1.4.1 City of Sacramento Traffic Operations Center

An operational tactic for traffic management is the use of traffic operations centers (TOC). The City of Sacramento’s single-jurisdiction, single-agency TOC is operated by the City of Sacramento Traffic Engineering Services Department and funded by Measure A, the gas tax. The goal of their TOC is twofold: first, they must make Sacramento City’s transportation network efficient for all transportation modes, and second, they must make the system reliable. Many steps were taken by the TOC in order to ensure their responsibilities were fulfilled during the Fix I-5 project.
Planning steps included (City of Sacramento, 2008):
• Identification of potential problem corridors
• Signal maintenance
• Construction of Synchro (transportation modeling software) Model
• Modified signal timing plan
• Coning & striping plan

The TOC makes use of many tools for network monitoring and operation, especially useful during the Fix I-5 project, including (City of Sacramento, 2008):
• Closed-circuit television (CCTV)
• Advance signal control systems
• Sacramento Police Department Helicopter
• Sacramento Police Officers
• Signal and signage crews
• Traffic cameras (8 Cameras in 2 streams)
• Multi-agency Construction Advisory Team (CAT)
• Traffic Alerts
• Media Contacts

Figure 1.2: City of Sacramento T.O.C.

A more detailed summary of the City of Sacramento TOC, based on a field visit on August 21, 2008, is provided in Appendix A.

1.4.2 Government Media Outreach

Although all of the sources provided useful information, the official Fix I-5 website, supported by Caltrans, was the most comprehensive and accessible (although no longer active circa August 2008). This website included daily updates ranging from construction updates to detours. It included sections on current work and a history of the portion of I-5 to be repaired (the Boat Section). It also included useful links such as 511 Travel Info, Live Traffic Cameras, and Commute Alternatives. It also provided links to many downtown area businesses, some offering specials to entice people to stay downtown and avoid peak period travel. Caltrans also hosted three public meetings regarding Fix I-5 in Downtown Sacramento, Natomas and South Sacramento. They gave numerous presentations to audiences including state and local government agencies, residential organizations, private businesses and public officials, and reached an estimated 10,000 people. In addition to the Fix I-5 website and public meetings, Caltrans provided public informational documents.
They sent out an email to all Sacramento Personnel Departments which included recommended alternatives to normal work days, including revised work schedules, telecommuting and public transportation.

Figure 1.3: The Fix I-5 Website Encouraged Transit

Caltrans provided paycheck stuffers to Sacramento Area employers through Public Outreach Contractors. This document advised departments to reschedule or postpone meetings and events that draw people to downtown. It also informed recipients about a Cal EPA hotline set up for state workers who needed commute assistance during the Fix I-5 project. Caltrans outreach contractors made information cards available to Sacramento businesses located in the downtown area. These cards provided basic facts about the Fix I-5 project, as well as the Fix I-5 website address.

Figure 1.4: Informational Documents Regarding Fix I-5 Closures

Additionally, Assembly member Dave Jones' office sent out a letter to his constituents warning them about the Fix I-5 project and traffic delays they might encounter. He also encouraged alternate forms of transportation during construction, as well as shopping or dining with downtown merchants during peak hours. Although not as comprehensive as the official Fix I-5 website, the City of Sacramento website provided information about the Fix I-5 project. The City of Sacramento also provided parking promotions for six of their parking garages for most of the duration of the Fix I-5 project.

1.4.3 Private Media Outreach

Many private agencies also provided information regarding Fix I-5. In general, these postings included general and up-to-date information about the Fix I-5 Project, but some businesses provided unique information. The News 10 website allowed people to “comment, blog and share photos,” an option not available on the Fix I-5 website. This feature allowed users to share alternate routes through blogs.
It also provided Sacramento travel times, as well as easy-to-read color-coded maps that showed lane closure information. The Sacramento Bee provided coverage regarding the Fix I-5 project through their newspaper publication and website, which provided mobile alerts, a blog jam, and a complete listing of the Fix I-5 stories which were published in the Sacramento Bee newspaper. The Sacramento Region 511 website permanently provides information about traffic, transit, ridesharing and bicycling. They provided minimal coverage regarding the Fix I-5 project, but links to information on transit providers, finding carpools and vanpools, and a guide to bicycle commuting may have been particularly useful to downtown commuters. Capital Public Radio’s website provided information about Fix I-5, including a clever ‘Jam Factor’ scale showing congestion levels on Sacramento area freeways, including both north and southbound I-5. Additional Sacramento area businesses and organizations posted information about the Fix I-5 project, including the WNBA's Sacramento Monarchs basketball team, Natomas Racquet Club, California State University Sacramento, Talk Radio 1530 KFBK, and YouTube.

1.4.4 Transit Agency Outreach and Preparation

To prepare riders for the Fix I-5 construction, Regional Transit (RT) posted a press release on their website encouraging people to take transit during the construction period. With additional funding from Caltrans, RT was able to provide supplemental bus and light rail services that increased both capacity and reliability during their peak commuting hours. RT kept ten buses on standby during the construction period and advised passengers to take earlier buses when possible. RT also reminded the public of the 18 park-and-ride lots available throughout Sacramento.

To prepare for the I-5 construction, Yolobus provided an I-5 Construction Options guide in their newsletter.
The guide warned passengers of delays and advised them to take earlier morning buses to avoid these delays. Yolobus also took several measures to alleviate overcrowding and delays during the construction period. They had up to two supplemental buses on standby in case other buses were running behind. Yolobus added two morning and two afternoon express trips to both route 45 (service between Sacramento and Woodland) and to route 43 (service between Sacramento and Davis). In addition, Yolobus sold discounted Capitol Corridor train tickets in order to encourage drivers to take transit during the construction period.

To accommodate the Fix I-5 construction, Roseville Transit posted information on their website regarding the Governor’s Executive Order urging government employees to take transit during the construction. Roseville Transit encouraged new commuter passengers and listed on their website the AM and PM commuter routes with available seating.

In preparation for the Fix I-5 construction, North Natomas TMA posted information in a specific Fix I-5 email newsletter about service changes for the construction period, including loop and route changes that went into effect on June 2, 2008. Additionally, supplemental shuttles and drivers were provided to ease the impact of the anticipated higher ridership during the construction period. The TMA was able to provide additional shuttles with extra funds provided by Caltrans for the construction period, but was required to provide daily counts of AM and PM shuttle riders for each loop. North Natomas TMA also created a special shuttle hotline for passengers to call for up-to-date information about route changes and delays during this period.

In addition to the supplemental schedules, Yuba-Sutter took several other measures to accommodate the I-5 construction. First, no route or schedule changes were made, with the exception of minor detours during northbound I-5 closures.
Second, Yuba-Sutter had additional buses on call in the event that any early morning buses became overcrowded. Third, Yuba-Sutter used all buses during the construction period, whereas they normally keep three buses non-operational. And finally, Yuba-Sutter closely monitored traffic conditions, which was made possible by improved connections with Caltrans, the City of Sacramento, and Regional Transit.

1.5 Organization of Analysis

The organization of the analysis is as follows. Chapter 2 provides an overview of important time series concepts that are used in this analysis. It also describes past studies analyzing bus transit ridership, and more specifically those few that used intervention analysis to analyze the impact of an intervention on a time series data set. Chapter 3 describes the transit agency data, including details of each agency’s samples and collection methods, as well as data quality considerations. It also includes information about data cleaning, which was needed to adjust for holidays and limited service days. Finally, data exploration is presented in two sections: measures of centrality and measures of spread for the transit agencies’ data sets. Both sections begin by briefly defining the statistics included in that section. Chapter 4 describes the methodology for eliminating trends and cyclic components, and the intervention analysis which examined the impact of the Fix I-5 construction on bus transit ridership. Chapter 5 presents the results of the intervention analysis for each agency, in addition to implications for bus transit agencies for future freeway closures. To conclude, Chapter 6 summarizes the analysis methods and results, and gives recommendations for future work.

CHAPTER 2 LITERATURE REVIEW

Many studies have been conducted analyzing variables that impact transit ridership, primarily using two statistical methods of analysis: time series and multiple regression.
Some, categorized as econometric studies, use those two statistical methods with a focus on economic theory. Time series analysis is used to analyze a series of data points, to understand the underlying order or context of the data. A review of the literature (Cryer, 1986; Shumway and Stoffer, 2006; Brockwell and Davis, 2002; Anderson, 1976; Kendall, 1973; Kyte et al., 1988) identified a host of different methods used to model time series data, including but not limited to univariate and multiple time series models and transfer function models. Simple regression is used to analyze the change in a dependent variable as an independent variable changes or is manipulated, while multiple regression uses multiple independent variables (Mann, 2004). However, standard regression models assume that the error terms, and therefore response variable observations, are uncorrelated (Kutner et al., 2005). In contrast, time series data often contain observations which are serially dependent (Box and Tiao, 1975). Additional regression methods have been developed for autocorrelated time series data. They employ typical regression techniques, but model the error term using time series models (Tsay, 1984).

Econometrics uses statistical methods to study economic principles (Tinbergen, 1951). The primary focus is the evaluation of economic theory using statistical methods. Discussions of strict and weak stationarity, autoregressive models, and lag structures are found in both econometric time series literature and statistics time series literature. However, a standard tool in econometrics is the structural econometric time series approach (SEMTSA), which uses Box-Jenkins methods but imposes a priori restrictions on the equations based on economic theories (Christ, 1983). Further, this approach is commonly simplified to vector autoregression models (VAR), which omit the moving average polynomial of the ARIMA model (Zellner and Franz, 2004).
Time series analysis was the primary method used in this study, as autocorrelation was likely to be present in the transit ridership data. Time series analysis encompasses a wide range of models which can handle multiple scenarios within data sets. Time series intervention analysis was used, which provided a methodology to determine the effects of one event on a series. This study used the ARIMA class of time series models, which specify only causality and invertibility as restrictions on the parameters, a feature that was an advantage over models which place additional assumptions on the parameters. Regression was also used to analyze the relationship between multiple independent variables and transit ridership, and for eliminating trends related to independent variables in the transit ridership data sets.

2.1 Time Series

Because time series analysis is a method less commonly used in the field of transportation engineering, a brief overview is given in the following sections.

2.1.1 Background

A time series $(x_t)$ is a sequence of observations collected over time for one variable. Time series can be either continuous or discrete depending on how the observations have been collected. A time series is continuous if observations are taken continuously over time, whereas the series is said to be discrete if observations are taken at specific times (Chatfield, 1975). Time series analysis is concerned with chronologically ordered observations. Data that is observed over time, both discrete and continuous, is common across many disciplines. In the field of engineering, some examples include series observed over time such as traffic counts and water quality measures. There are many examples in economics, including profits, interest rates, as well as overall economic indicators such as gross domestic product and unemployment rates. In meteorology, a common observation that constitutes a time series is temperature.
Because future observations cannot be predicted exactly, a time series ($x_t$) is more technically a realization (sample function) of a stochastic process ($X_t$), which is a family of random variables (Brockwell and Davis, 1987). Time series analysis focuses on studying a time series realization ($x_t$ of $X_t$) in order to gain insight into the stochastic process ($X_t$) (Aue, 2009).

In practical time series analyses, much of the work is devoted to transforming a nonstationary time series into a stationary process (Fuller, 1976). Conceptually, stationarity is similar to equilibrium within a system. A time series is strictly stationary if its probability structure is not affected by time (Anderson, 1971). In other words, the joint probability distribution of $x_t, \ldots, x_{t+n}$ is equivalent to the joint probability distribution of $x_{t+h}, \ldots, x_{t+h+n}$ for all $t, \ldots, t+n \in T$ and $h$ such that $t+h, \ldots, t+h+n \in T$ (Montgomery et al., 2008).

A less strict definition of stationarity (for cases where the variance is finite) is called weak stationarity, and is often used because distribution functions are commonly unknown. For a time series to be weakly stationary, two conditions must hold (Shumway and Stoffer, 2006; Brockwell and Davis, 2002):

1. The first moment of $x_t$ is constant and independent of time, $t$.
2. The autocovariance function, defined as $\gamma(h) = \mathrm{Cov}(x_{t+h}, x_t)$, depends only on the lag $h$ and is independent of $t$.

One important example of a stationary process is white noise. White noise is commonly denoted $Z_t \sim WN(0, \sigma^2)$, where $Z_t$ is a sequence of uncorrelated random variables with zero mean and finite variance $\sigma^2$ (Shumway and Stoffer, 2006). White noise is an important building block in time series analysis, as it is the foundation for many more complex processes (Cryer, 1986).
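The two weak stationarity conditions can be checked informally on simulated data. The following is a minimal sketch, assuming Gaussian white noise with an illustrative standard deviation of 2; the function name `sample_autocovariance`, the seed, and the sample size are hypothetical choices made for this illustration and are not taken from the study.

```python
import random


def sample_autocovariance(x, h):
    """Sample autocovariance at lag h, using the conventional divisor n."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n


random.seed(0)  # fixed seed so the sketch is reproducible
sigma = 2.0     # assumed standard deviation of the white noise
z = [random.gauss(0.0, sigma) for _ in range(5000)]

sample_mean = sum(z) / len(z)          # expected to be close to 0
gamma0 = sample_autocovariance(z, 0)   # expected to be close to sigma**2 = 4
gamma1 = sample_autocovariance(z, 1)   # expected to be close to 0
print(round(sample_mean, 3), round(gamma0, 3), round(gamma1, 3))
```

For white noise the autocovariance should be near $\sigma^2$ at lag zero and near zero at every other lag, which is what the sample estimates approximate here.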
The term white noise is derived from white light, which is composed of a continuous distribution of wavelengths; the implication is that white noise is composed equally of oscillations at all frequencies (Shumway and Stoffer, 2006). Furthermore, if the shocks are not just uncorrelated (a white noise process), but are independent and identically distributed, the sequence is called i.i.d. noise, denoted $Z_t \sim IID(0, \sigma^2)$ (Anderson, 1976). Further, if the series is normally distributed, it is both white noise and i.i.d.

Often, a time series ($X_t$) can be well explained by a trend component ($m_t$), a seasonal component ($s_t$), and a zero-mean random error component ($Y_t$) (Chatfield, 1975). The process can be represented in the form

$$X_t = m_t + s_t + Y_t.$$

The following provides a short description of each component, although it should be noted that a time series may exhibit any combination of these components:

The trend component ($m_t$): Encompasses long-run changes in the mean. Trends can have many underlying causes including, but not limited to, changes in economic conditions, technological changes, and changes in social custom (Farnum and LaVerne, 1989).

The seasonal component ($s_t$): Encompasses cycles at any recurrent period. This component can include obvious seasonal or annual cycles, or less apparent cycles occurring at any fixed period, such as a daily, weekly, or quarterly basis.

The noise component ($Y_t$): A zero-mean random error component.

There are multiple methodological approaches to the analysis of time series data, and more specifically to the removal of trend and seasonal components, including the use of both the time and frequency domains. Analysis in the time domain bases inference on the autocorrelation function, while analysis in the frequency domain bases inference on the spectral density function. Both domains can be used to eliminate seasonal components, while trend components can only be eliminated in the time domain.
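To make the trend, seasonal, and noise decomposition concrete, the following sketch builds a synthetic monthly series and then removes an estimated trend and seasonal component. It is an illustration only: a straight-line least squares trend and simple averages by position in the cycle are used as simplified stand-ins, and all values (series length, slope, amplitude, seed) are hypothetical.

```python
import math
import random

random.seed(1)
n, d = 120, 12  # 120 monthly observations, assumed annual cycle of period 12
t_vals = list(range(n))

# Synthetic components: linear trend, sinusoidal seasonal term, Gaussian noise.
trend = [50.0 + 0.3 * t for t in t_vals]
seasonal = [5.0 * math.sin(2 * math.pi * t / d) for t in t_vals]
noise = [random.gauss(0.0, 1.0) for _ in t_vals]
x = [m + s + y for m, s, y in zip(trend, seasonal, noise)]

# Step 1: estimate the trend with a least squares straight line.
t_bar = sum(t_vals) / n
x_bar = sum(x) / n
slope = (sum((t - t_bar) * (xi - x_bar) for t, xi in zip(t_vals, x))
         / sum((t - t_bar) ** 2 for t in t_vals))
intercept = x_bar - slope * t_bar
detrended = [xi - (intercept + slope * t) for t, xi in zip(t_vals, x)]

# Step 2: estimate the seasonal component by averaging each position in the cycle.
season_hat = [sum(detrended[k::d]) / len(detrended[k::d]) for k in range(d)]

# Step 3: what remains is the estimated noise component.
residual = [detrended[t] - season_hat[t % d] for t in t_vals]
residual_mean = sum(residual) / n
residual_var = sum((r - residual_mean) ** 2 for r in residual) / n
print(round(slope, 3), round(residual_var, 3))
```

If the decomposition is adequate, the residual series should have a mean near zero and a variance close to that of the simulated noise, and it can then be passed to goodness-of-fit tests for whiteness.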
In this study a decomposition method was used which identified and separately removed the trend and seasonal components from the series. The removal of trend components used methods associated with the time domain, and the removal of seasonal components used methods associated with the frequency domain.

2.1.2 Trend Components

Analysis in the time domain includes methods for removal of both the trend and seasonal components, including least squares estimation, smoothing with moving averages, differencing, small trend methods, and moving average estimation (Aue, 2009). Additionally, trend components can be removed using regression techniques (Yaffee, 2000). Aue (2009) provides a detailed description of each method. This study used multiple regression to remove trend components. The multiple regression method is discussed below, in addition to differencing, which is referred to in later sections:

Multiple Regression: When there are four predictor variables, $X_1, X_2, X_3, X_4$, as in this analysis, the model is formulated as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon.$$

In this study, the combination of the predictor variables ($\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$) constitutes the trend component, while the regression error term ($\varepsilon$) constitutes both the seasonal and error terms ($s_t + Y_t$). As discussed previously, standard linear regression models assume that the error terms, $\varepsilon$, and therefore the response variable observations, $Y$, are uncorrelated (Kutner et al., 2005). Time series data, on the other hand, often contain observations which are serially dependent (Box and Tiao, 1975). Therefore, modifications to standard linear regression would be necessary, including modeling the error terms as time series autoregressive moving average models (Tsay, 1984; Ostrom, 1978).

Differencing: Applies the difference operator to the original series in order to create a new, stationary series. The lag-$d$ difference operator ($\nabla_d$) is defined as (Shumway and Stoffer, 2006):

$$\nabla_d x_t = x_t - x_{t-d}.$$
In practice, it is common to denote the use of the difference operator by using the backshift operator, $B$. In this case,

$$\nabla_d x_t = x_t - x_{t-d} = (1 - B^d)\, x_t.$$

2.1.3 Seasonal Components

This study's decomposition method removed the trend components using multiple regression, and removed seasonal components from the series using a frequency domain approach. The frequency domain, also referred to as the spectral domain, pertains to inference based on the spectral density function. A time series can be decomposed into periodic components, each of which contains variation at that period's frequency, whose variations combine together to cause the overall variation in the time series. Therefore, a time series can be well represented as the sum of significant periodic components (Chatfield, 1980):

$$X_t = \sum_{j=1}^{k} \left[ A_j \cos(2\pi\omega_j t) + B_j \sin(2\pi\omega_j t) \right]$$

where $A_j$ and $B_j$ are uncorrelated random variables with mean zero and variances both equal to $\sigma^2$, and $\omega = 1/d$, where $d$ is the period of the cycle. For example, if there is an annual cycle and the data set contains monthly data points, one period, $d$, could be 12.

Exploratory analysis using the periodogram can help to determine genuine periodic (seasonal) components within the time series, $X_t$. The definition of the periodogram for $\{X_1, \ldots, X_n\}$ is given below (Brockwell and Davis, 2002):

$$I(\omega) = \frac{1}{n} \left| \sum_{t=1}^{n} X_t\, e^{-it\omega} \right|^2$$

where $\omega$ is the frequency. The periodogram is the graph of $I(\omega)$ against $\omega$ and is an estimate of the power spectral density function. Although the periodogram is not a consistent estimator of the spectral density because the variance of $I(\omega)$ does not decrease as the sample size, $n$, increases, it will be used in this analysis to determine periodicities, which is a common practice (Chatfield, 1975). If the periodogram is constructed for $-\pi \le \omega \le \pi$, the area under the periodogram represents the variance of the time series (Brocklebank, 2003).
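The periodogram can be computed directly from its definition. The sketch below is a plain, unoptimized illustration on a synthetic monthly series with an assumed period of 12; the function name `periodogram`, the seed, and the amplitude are hypothetical. Frequencies are expressed in cycles per observation ($k/n$), so the corresponding angular frequency is $2\pi k/n$.

```python
import cmath
import math
import random


def periodogram(x):
    """Return (frequency, ordinate) pairs at the Fourier frequencies k/n, k = 1..n//2.

    Frequencies are in cycles per observation; the ordinate is
    (1/n) * |sum_t x_t * exp(-2*pi*i*freq*t)|**2 with t running from 1 to n.
    """
    n = len(x)
    result = []
    for k in range(1, n // 2 + 1):
        freq = k / n
        dft = sum(x[t - 1] * cmath.exp(-2j * math.pi * freq * t)
                  for t in range(1, n + 1))
        result.append((freq, abs(dft) ** 2 / n))
    return result


random.seed(2)
n = 120
# Synthetic series: one cosine with period 12 plus Gaussian noise.
x = [10.0 * math.cos(2 * math.pi * t / 12) + random.gauss(0.0, 1.0)
     for t in range(1, n + 1)]

peak_freq, peak_value = max(periodogram(x), key=lambda pair: pair[1])
peak_period = 1 / peak_freq
print(peak_freq, peak_period)
```

The largest ordinate falls at the frequency whose reciprocal is the period of the simulated cycle, which is how a periodogram peak is read as a candidate seasonal period.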
Therefore, peaks in the periodogram generally indicate frequencies that can explain a significant part of the total variance. For example, a periodogram that displays a large peak at frequency $\omega = 0.25$ indicates a period $d = 4$, which for quarterly data indicates an annual cycle. If a periodogram does not display any obvious peaks, all frequencies are contributing to the series' variance, and the series may even be a white noise process. The variance by cycles can be decomposed as follows (Aue, 2009),

$$I(\omega_j) = d_c^2(\omega_j) + d_s^2(\omega_j)$$

where

$$d_c(\omega_j) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} x_t \cos(2\pi\omega_j t) \quad \text{and} \quad d_s(\omega_j) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} x_t \sin(2\pi\omega_j t).$$

As discussed, the periodogram can help to determine seasonalities, and peaks in the periodogram can signify a genuine periodic component which explains a large portion of the variance in the time series. However, it is possible that peaks may occur because of random fluctuations in the sample (Priestley, 1981). This study used spectral analysis of variance to determine whether peaks in the periodogram explain a larger portion of the variance than is expected with sequences such as white noise and ARMA processes.

2.2 Goodness-of-fit Tests

Ideally, after trend and seasonality are removed, the remaining series will be a white noise process. There are many goodness-of-fit tests to determine whether the residuals are white. For an extensive review of diagnostic checks, refer to Li (2004). For the purposes of this study, four goodness-of-fit tests will be utilized: the sample autocorrelation function (ACF), the portmanteau test (Ljung-Box modification), the rank test, and a test of normality based on the squared correlation ($R^2$) of a qq plot. The four goodness-of-fit tests are explained below:

1.
The sample autocorrelation function (ACF): The autocorrelation function and sample autocorrelation function at lag $h$ are defined as (Anderson, 1976):

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)}, \qquad \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}.$$

For an i.i.d. series, $Y_1, \ldots, Y_n$, with a large sample size, $n$, the sample autocorrelations are approximately i.i.d. normal with zero mean and variance $1/n$ (Brockwell and Davis, 2002). Therefore, in order to test for randomness, a plot of the sample autocorrelation function for any number of lags $h$ should show that about 95% of the sample autocorrelations fall within the bounds $\pm 1.96/\sqrt{n}$ if the process is i.i.d. (Aue, 2009).

2. The portmanteau test (Ljung-Box modification): In order to test for randomness, Box and Pierce (1970) originally suggested the portmanteau test, and developed the statistic, $Q$, as

$$Q(\hat{\rho}) = n \sum_{j=1}^{h} \hat{\rho}^2(j)$$

where $\hat{\rho}$ is the sample autocorrelation function. Ljung and Box (1978, p. 298) suggest that the Box-Pierce methodology produces “suspiciously low values of $Q(\hat{\rho})$ …” and propose a modified version,

$$Q(\hat{\rho}) = n(n+2) \sum_{j=1}^{h} \frac{\hat{\rho}^2(j)}{n-j}$$

where the distribution of $Q$ can be approximated by a chi-squared distribution with $h$ degrees of freedom. The hypothesis that the residuals are i.i.d. can be rejected at the level $\alpha$ if $Q > \chi^2_{1-\alpha}(h)$ (Brockwell and Davis, 2002).

3. The rank correlation test: The rank test is a test of randomness, used to establish whether any systematic pattern remains in the residuals. For a time series, a trend can be detected from the correlation between the rank order of the time series observations and their time values (Kendall, 1955). In total, there are $\tfrac{1}{2} n(n-1)$ pairs, where $P$ designates the number of positive correlations, and $Q$ designates the number of negative correlations. $P$ is related to Kendall's $\tau$, called the coefficient of rank correlation:

$$\tau = \frac{4P}{n(n-1)} - 1.$$

The coefficient of rank correlation ranges between $1$ (perfect positive correlation) and $-1$ (perfect negative correlation), with $\tau = 0$ representing an independent, white noise process. Refer to Kendall (1955) for further explanation.

4.
$R^2$ based on a qq plot: In order to assess the normality of the residuals, the squared correlation ($R^2$) value can be calculated based on a quantile-quantile plot (qq plot). A qq plot is a graph that compares the quantiles of two distributions. For this study, the first data set is the ordered residuals from the fitted model, assuming a mean-zero, variance-one process, denoted $Y_j$. The second data set is the order statistics from a random normal sample with mean $\mu$ and variance $\sigma^2$, denoted $n_j$. If the model residuals are normally distributed, the pairs ($n_j$, $Y_j$) should have a linear relationship (Shumway and Stoffer, 2000). Hence, perfectly normally distributed residuals would display an $R^2$ value equal to one. If the $R^2$ value is too small (based on the level $\alpha$), then the assumption of normality must be rejected. More specifically, the $R^2$ value can be computed as follows, noting that $\Phi_j$ represents the normal distribution:

$$R^2 = \frac{\left[ \sum_{j=1}^{n} (Y_j - \bar{Y})\, \Phi_j \right]^2}{\sum_{j=1}^{n} (Y_j - \bar{Y})^2 \, \sum_{j=1}^{n} \Phi_j^2}$$

Refer to Shapiro and Francia (1972) for the critical values of $R^2$.

For residual testing in this study, lag $h = 20$ was used, which is commonly used in time series residual testing (Shumway and Stoffer, 2000).

2.3 Multicollinearity

Multicollinearity occurs when independent variables are highly correlated in a multiple regression model (Kutner et al., 2005). This means that the correlated variables are not providing independent information to help predict the dependent variable. Severe cases of multicollinearity must be corrected, because the result can be unstable regression coefficient estimates. Further, multicollinearity is often indicated by very large standard errors, even though the coefficients are still the best linear unbiased estimators (BLUE) (Washington et al., 2003).
If two independent variables are highly correlated, it is difficult to determine which variable is explaining more variation in the dependent variable (both variables' standard errors will become large). Another test for the presence of multicollinearity is the comparison of correlation coefficients to regression coefficients. If their signs are different (+/−), then multicollinearity should be further investigated (Kutner et al., 2005). Two methods for detecting multicollinearity are the variance inflation factor and the condition index.

1. The variance inflation factor (VIF) is defined as

$$\mathrm{VIF}_k = \frac{\sigma^2\{\hat{\beta}_k^*\}}{(\sigma^*)^2}$$

where $\hat{\beta}_k^*$ are the estimated standardized regression coefficients and $(\sigma^*)^2$ is the variance of the error term for the correlation-transformed model (also called the standardized regression model). In this study, the error term (when testing for multicollinearity) includes a seasonal and noise term ($s_t + Y_t$). The multiple regression model discussed previously was $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$, while the standardized regression model is $Y^* = \beta_1^* X_1^* + \beta_2^* X_2^* + \beta_3^* X_3^* + \beta_4^* X_4^* + \varepsilon^*$. If the mean of the VIF values is considerably greater than 1, serious multicollinearity may exist (Kutner et al., 2005).

2. The condition number ($\kappa$) is defined as the largest condition index (CI). It is defined as

$$\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$

where $\lambda_{\max}$ is the largest eigenvalue and $\lambda_{\min}$ is the smallest eigenvalue of the $X'X$ matrix. Condition numbers between 5 and 10 indicate some dependence, while CI values of 30 and above signify strong dependencies (Belsley et al., 1980).

2.4 Lagged Variables

In time series regression models, it is often the case that time lags need to be included (Ostrom, 1978). For example, there is a time lag between exposure to carcinogenic substances and the development of cancer. If there are time lags between a change in the independent variable and the effect on the dependent variable, a lag term should be included in the regression model.
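One informal way to look for a time lag is to compare the correlation between the dependent variable and the independent variable shifted by different numbers of periods. The sketch below uses simulated data in which the response is built to react two periods after the input; the helper names `lagged_pairs` and `correlation`, the lag of two, and all numeric values are hypothetical and serve only to illustrate the idea.

```python
import random


def lagged_pairs(x, y, lag):
    """Pair y[t] with x[t - lag], dropping the first `lag` observations."""
    return [(x[t - lag], y[t]) for t in range(lag, len(y))]


def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / (s_aa * s_bb) ** 0.5


random.seed(3)
n = 300
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# Simulated response: reacts to x two periods later, plus noise.
y = []
for t in range(n):
    effect = 0.8 * x[t - 2] if t >= 2 else 0.0
    y.append(effect + random.gauss(0.0, 0.5))

corr_by_lag = {lag: correlation(lagged_pairs(x, y, lag)) for lag in range(5)}
best_lag = max(corr_by_lag, key=lambda lag: abs(corr_by_lag[lag]))
print(best_lag, round(corr_by_lag[best_lag], 3))
```

In a regression setting, the shifted series `x[t - lag]` would then be entered as the lagged predictor in place of, or alongside, the contemporaneous value.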
In this study, the explanatory variables were unleaded regular gas prices, unemployment rates, gross domestic product, and transit fare prices, each of which could have a time lag with transit ridership. A time series study of Portland, Oregon transit ridership between 1971 and 1982, focusing on factors that affect ridership, shows that neither gas price (aggregation level unspecified) nor county employment rates show a time lag for bus transit ridership (Kyte et al., 1988). However, Kyte et al. found a time lag between transit fare prices and ridership. The authors stated that the largest response in ridership to the fare increase occurred almost immediately, and then decayed at a measurable rate for three months. Prior studies have not determined a set of independent variables that consistently predict bus transit ridership. The effects of GDP on ridership have not been studied.

2.5 Box-Jenkins (ARMA) Models

In the time domain, linear filters are often used to transform one time series into another, under the assumption of linearity, and can be defined as:

$$Y_t = \sum_{j=-\infty}^{\infty} \psi_j X_{t-j}$$

where the $\psi_j$'s are weights for each $X_t$, and $X_t$ and $Y_t$ are the input and output time series, respectively (Chatfield, 1975; Montgomery et al., 2008). There are many types of linear filters which can be applied to white noise to obtain a more complex linear time series. In general, there are three major classes of linear filters: autoregressive, moving average, and autoregressive moving average filters. They are described below:

1. Autoregressive Process, AR($p$): An autoregressive process can be represented as:

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t.$$

The equation's conceptual interpretation is that the current time series observation is a linear weighted combination of the $p$ most recent past values of the same time series, plus an error term (Montgomery et al., 2008). The autoregressive polynomial is defined as $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$.
The roots of the polynomial $\phi(z) = 0$ must lie outside of the unit circle to ensure that an AR($p$) process is stationary, a condition commonly referred to in the time series literature as causality (Box et al., 2008).

2. Moving Average, MA($q$): A moving average process can be represented as:

$$X_t = Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q}.$$

As the equation shows, a moving average model assumes the current value is a linear weighted combination of $q$ lagged white noise terms. Further, a condition called invertibility is imposed on the weights, $\theta_j$, to ensure a unique MA process for an autocorrelation function (Chatfield, 1980). The moving average polynomial is defined as $\theta(z) = 1 - \theta_1 z - \cdots - \theta_q z^q$. An MA($q$) process is invertible if the roots of $\theta(z) = 0$ are outside the unit circle (Box et al., 2008). Invertibility and stationarity are two separate conditions; an MA($q$) process will always be stationary.

3. Autoregressive Moving Average, ARMA($p$, $q$): An autoregressive moving average (ARMA) model assumes that the current observation is a linear weighted combination of the $p$ most recent past observations from the same time series (the AR($p$) portion), as well as $q$ lagged white noise terms (the MA($q$) portion). An autoregressive moving average process ARMA($p$, $q$) can be represented as

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q}.$$

An ARMA($p$, $q$) process is causal if the roots of the polynomial $\phi(z) = 0$ lie outside of the unit circle, and is invertible only if the roots of $\theta(z) = 0$ are outside the unit circle (Box et al., 2008). The coefficients for causality are computed from the expression $\psi(z) = \theta(z)/\phi(z)$, while the coefficients for invertibility are computed from the expression $\pi(z) = \phi(z)/\theta(z)$.

2.6 Intervention Analysis

Gene Glass (1972, p. 463) coined the term intervention and described it as follows: “Observation of a variable Z at several equally spaced points in time yields the observations $z_1, z_2, \ldots, z_N$. Suppose that an intervention (T) is made at some point in time before time N into the process presumed to be controlling Z.
The time series is said to be interrupted at a point in time, say $n_1$ less than $N$: $z_1, \ldots, z_{n_1}, T, z_{n_1+1}, \ldots, z_N$.”

Box and Tiao (1975) used the term intervention and constructed an analysis method to determine the effect of an intervention occurring at a known time in a time series. Their intervention model is based on the basic transfer function model,

$$X_{t,2} = \sum_{j=0}^{\infty} \tau_j X_{t-j,1} + N_t,$$

where $X_{t,1}$ represents the input series, while $X_{t,2}$ represents the output series, which constitutes common notation in transfer function modeling. In intervention analysis, it is more common to replace $X_{t,1}$ with $X_t$, and also to replace $X_{t,2}$ with $Y_t$. The basic intervention model can be described:

$$Y_t = \sum_{j=0}^{\infty} \tau_j X_{t-j} + N_t$$

where $X_t$ and $Y_t$ are the input (pulse/step) and output (ridership, after removal of trend and seasonal components) series of the model, respectively, $\tau_j$ is a linear filter, and $N_t$ represents a noise sequence. $\tau_j$ is defined as $\tau_j = \rho_{YX}(j)\, \sigma_Y / \sigma_X$, where $\rho_{YX}(j)$ is the cross-correlation between $X_t$ and $Y_t$, and $\sigma_X^2$ and $\sigma_Y^2$ are the variances of each series. In the case of intervention, $T(B) = \sum_{j=0}^{\infty} \tau_j B^j$ is simplified with a rational operator of the form

$$T(B) = \frac{B^b\, W(B)}{V(B)}$$

where $b$ is the delay parameter, and $W$ and $V$ help to provide coefficients to represent more complicated indicator series built upon a step or pulse function (Brockwell and Davis, 2002, pp. 340-341). The intervention term is then $T(B) X_t$.

For a series that might be best represented as an intervention causing a temporary change in the response variable, a pulse indicator variable would be most appropriate:

$$X_t = \begin{cases} 0, & t \neq T \\ 1, & t = T \end{cases}$$

where $t$ is time, and $T$ is the period of the intervention. For a series that might be best represented as an intervention causing a permanent change in the response variable, a step indicator variable would be most appropriate:

$$X_t = \begin{cases} 0, & t < T \\ 1, & t \geq T. \end{cases}$$

In general, the Box and Tiao intervention analysis methodology follows a five-step process (Box and Tiao, 1975):

1. Eliminate trend and seasonal components from the original time series ($X_t$).
This study eliminated trend components from the series through a multiple regression, and eliminated seasonal components using sinusoidal decomposition with cycles determined by the periodogram and cycle significance based on spectral ANOVA.

2. Use ordinary least squares (OLS) regression to obtain an initial estimate of $\tau_j$, which represents the transfer model.

3. Model the residuals from the OLS regression as an ARMA($p$, $q$) process, which will represent the noise model. For model diagnostics, analyze the residuals using goodness-of-fit tests.

4. Minimize the sum of squares, $\sum_{t=m^*+1}^{n} \hat{W}_t^2(W, V, \phi^{(N)}, \theta^{(N)})$, where $m^* = \max(p_2 + p,\; b + p_2 + q)$, in order to obtain final parameter estimates of both the noise and transfer models.

5. Analyze the final model residuals using goodness-of-fit tests. This study used the sample ACF, qq plot, Ljung-Box test, and rank test.

2.7 Review of Relevant Past Studies

Previous studies involving multiple regression and time series analysis are discussed. Many studies have examined transportation-related time series data and used time series methods to analyze the data. Kyte et al. (1988) review previous work in transportation-related time series, other than bus transit ridership. For example, Atkins (1979) analyzes the effect of speed limit changes on traffic accidents in British Columbia in the 1970s using intervention analysis. Additionally, studies that use vehicle miles travelled (VMT) forecasting models show that VMT can be predicted by independent variables that are similar to those used to predict transit ridership. Common predictors include, but are not limited to, gasoline price (Schimek, 1996 (1521 and 1558); Goodwin et al., 2004; Gately, 1990) and income (Schimek, 1996 (1521 and 1558); Goodwin et al., 2004; Gately, 1990). Dahl (1986) summarizes previous research on gasoline consumption demand, VMT, and miles per gallon (not just VMT), finding negative elasticities for price and positive elasticities for income.
Elasticities measure the responsiveness of one variable to change in another variable. Mokhtarian et al. (2002) analyzed induced demand with respect to highway capacity expansion, and listed predictors of induced vehicle travel as changes in population, demographics, the economy, mode, and land use, but not highway capacity expansion.

Rose (1982, 1986) examines rail transit ridership using time series and multiple regression techniques. Rose (1986) studies Chicago Transit Authority rail ridership, more specifically, 11 years of monthly average weekday data. He used fares, weekday service miles, cost of car trips (including gas prices), and weather changes, and found that gas prices and service levels were significant predictors of rail ridership. However, few studies analyze bus transit ridership with time series models, a fact that was confirmed by librarians at the Physical Science and Engineering Library at UC Davis and the Institute of Transportation Studies Library at UC Berkeley. Those pertaining to transit ridership (defined as both bus and rail, or just bus) are discussed below.

2.7.1 Predicting Transit Ridership Using Multiple Regression

A number of studies use multiple regression techniques to determine factors that affect bus transit ridership. Those studies take into account autocorrelation in the residuals to ensure valid model results. The following presents studies which use multiple regression as the primary analysis method.

Agrawal (1981) analyzed the Southeastern Pennsylvania Transportation Authority City Transit Division's annual full-fare adult ridership between 1964 and 1974. Using multiple regression, he found that three factors were statistically significant in affecting ridership and produced a multiple correlation coefficient of 0.9985. The three significant predictors were average fare (adult riders), jobs in Philadelphia, and bus miles of service, while number of vehicles owned was not a significant predictor.
Lane (2009) applied regression techniques to monthly bus and rail transit ridership data from nine US cities between January 2002 and April 2008, and found that gasoline prices were a statistically significant predictor of changes in transit ridership, while service characteristics and seasonality were not significant predictors. Wang and Skinner (1984) analyzed fares, gas prices, and monthly ridership data from seven transit authorities across the United States and, using regression techniques, found that as real gasoline prices increased, transit ridership increased, although by a small amount. They also found that as real fares increased, ridership decreased. Taylor et al. (2009) analyzed transit ridership from 265 urban areas using 22 independent variables and showed that the majority of transit ridership variation can be explained by variables within the categories of regional geography, metropolitan economy, population characteristics, and auto/highway system characteristics. They found a positive correlation between ridership and gas prices, and a negative correlation between ridership and unemployment levels. Gomez-Ibanez (1996) reported an increase in Massachusetts Bay Transportation Authority (MBTA) bus transit ridership in Boston, in part due to service improvements such as phased station modernization and bus replacement, and to transit fares which increased less than the inflation rate. The study also included income, Boston employment, fares, and vehicle miles. Kitamura (1989) showed a causal relationship between car ownership and transit use, more specifically that an increase in car ownership leads to a decrease in transit use, using Dutch National Mobility Panel weekly travel diary data.
Cervero (1990) provides a broad overview and summarizes multiple empirical studies which show that there are many factors affecting transit trips, including characteristics of the traveler such as age, income, auto access, trip purpose, and trip length, and also characteristics of the operating environments, such as land use and location settings.

Although each study used a different set of independent variables to predict transit ridership, most of the studies that used multiple regression included gas prices, fares, and economic indicators such as unemployment rates.

2.7.2 Transit Ridership and Intervention Analysis

Few previous studies apply intervention analysis to transit ridership. Kyte et al. (1988) use Tri-County Metropolitan Transportation District of Oregon bus transit ridership at various aggregation levels (system, sector, and route levels) between 1971 and 1982 to show that service level, transit fares, gasoline price, and employment are statistically significant predictors of bus transit ridership. They also note that to fully explain ridership demand, many more independent variables should be considered. Kyte et al. (1988) used intervention analysis to model changes in bus transit ridership resulting from eleven separate cases of changes in their predictor variables, including increased fares, system-wide service changes, and route-level changes, and they observed that the occurrence of multiple events at one time makes it difficult to isolate the impact of any single event on ridership. Their results showed that for the four cases of fare increases, the effect on ridership varied. The separate interventions of system-wide service changes and gasoline supply shortages combine to produce an intervention output of an additional 8,400 bus transit riders. Kyte et al. use elasticities greater than one to determine significance of intervention results.
Narayan and Considine (1989) use intervention analysis to analyze two cases of fare increases, in April 1980 and April 1984, and their effects on monthly upstate New York transit ridership, assuming that ridership could be decomposed into trend, seasonal, intervention, and noise terms. They assume that the intervention term is best represented as a step function, an “abrupt and permanent change” in ridership (Narayan and Considine, 1989, p. 248). Their methodology differs from the original Box and Tiao intervention analysis methodology, as their model is not based on the transfer function model, but on regression with correlated error terms, with eleven indicator variables for seasonality, indicator variables for the two fare price increases, and an error term which they claim “correct[s] for autocorrelated errors” (Narayan and Considine, 1989, p. 249). However, they did not use ARMA models for the noise, and it is unclear how they corrected for correlation, because they used t-tests, which require the removal of serial dependence, nonstationarity, and seasonality. Both fare interventions produced significant ridership decreases.

Considine and Narayan (1988) use data from Chattanooga, Tennessee and intervention analysis to examine the effect of market changes on total ridership, total operating revenues, the ratio of total operating revenues to total revenue miles, and the ratio of total passenger trips to total revenue miles. They slightly modify the Box-Tiao methodology by first using the entire sample to model the noise term, then separately estimating the intervention term, and then minimizing over all parameters. They use t-statistics to test for significance. They show that marketing does significantly affect transit ridership.

2.8 Summary of Literature Review

An extensive literature review identified a number of past studies using transportation-related data.
Fewer studies used both multiple regression (taking into account autocorrelation) and time series methods for the analysis of predictors of transit ridership. Fewer still were time series studies that used intervention analysis to examine transit ridership affected by an intervention. To date, there are no known studies that examine the impact of the intervention of construction work on bus transit ridership.

CHAPTER 3 DATA DESCRIPTION

This chapter describes the data used in this analysis. A brief description of methods of ridership data collection is given. Each bus transit agency is described, along with its ridership data and the methods it uses to collect ridership data. The data filtering that was required to construct a data set for this analysis is described, with information regarding data imputation for missing data. An analysis was performed on each of the ridership data sets to determine whether any independent factors played a significant role in ridership changes during the period of analysis. Data quality in relation to methods of data collection is discussed.

3.1 Methods of Data Collection

Four methods of ridership data collection were used among the five transit agencies that provided service to the downtown core. Those methods were automatic passenger counters (APC), electronic fareboxes, manual counts by route checkers, and manual counts by bus drivers. A description of each method is provided below:

1. Automatic Passenger Counters (APC): APC devices are often door-mounted and use infrared beam technology to automatically count boarding and alighting riders. Many use GPS technology to associate collected data with a time and location.

2. Electronic Fareboxes: Electronic fareboxes are devices that collect ridership information. Typically, a bus driver enters a number corresponding to rider type into a keypad on the electronic farebox, which stores the data until it is uploaded to a network.
Usually, electronic fareboxes do not provide location information.
3. Manual Counts by Route Checkers: Route checkers take manual counts of passengers boarding and alighting at each stop, and record the arrival and departure times at these stops.
4. Manual Counts by Bus Drivers: Bus drivers take manual counts of passengers boarding and alighting at each stop, and record the arrival and departure times at these stops.

3.2 Data Sample

This section describes the data samples provided by each of the five bus transit agencies. The section is divided into five subsections, one for each agency. Each subsection includes a brief background of the bus transit agency, the ridership data collection methods it employs, and the data it provided. Unless otherwise stated, all information regarding each transit agency was obtained through personal correspondence as listed in Table 3.1:

Table 3.1: Data Collection Details
Transit Agency | Contact | Contact's Official Position Title | Type of Personal Correspondence
Regional Transit | James Drake | Assistant Planner | e-mail, phone, mail, in person
Yolobus | Erik Reitz | Transit Planner | e-mail, phone, mail, in person
Roseville Transit | Teri Sheets | Alternative Transportation Analyst | e-mail
Roseville Transit | Elizabeth Haydu | Administrative Technician | e-mail, phone, in person
North Natomas TMA | Sarah Janus | Program Coordinator | e-mail
Yuba-Sutter Transit | Dawna Dutra | Analyst | e-mail, phone

3.2.1 Regional Transit

The Sacramento Regional Transit District (RT) operates bus and light rail transit serving 418 square miles of the greater Sacramento metropolitan area (Regional Transit, 2009). It is the largest provider of public transportation within the City of Sacramento, operating 256 buses on 97 bus routes with more than 3,600 bus stops, running from 5:00 AM to 11:30 PM, 365 days per year (Regional Transit, 2009). RT uses all four data collection methods described in Section 3.1.
Regional Transit is the only transit agency within this study that collects ridership data using APC devices, which have been installed in half of the RT bus fleet. Electronic farebox devices are installed on all RT buses and are the method that RT uses for annual reporting. However, because electronic fareboxes do not keep track of location or alighting passengers, this data was not suitable for this study. The ridership data for RT therefore consisted of APC data, even though APC data is not used for official reporting. The APC data records boarding and alighting riders, as well as time and location stamps for each record, which were necessary for filtering purposes. APC devices are still in testing stages. The FTA’s National Transit Database requires that two random bus trips be sampled per day by route checkers, which is why RT also employs this ridership collection method (Drake, 2007). Finally, RT makes use of manual counts by driver for its specialized Community Bus Service (now called Neighborhood Ride), which offers intra-neighborhood service within certain communities while also serving seniors and the disabled.

Because of the expansiveness of RT, total ridership counts are nearly impossible to obtain. On a normal weekday, the RT bus system makes 3,000 trips. As mentioned, approximately half of its bus fleet is equipped with APC devices, which results in the collection of data for 1,500 trips per day. The data is wirelessly uploaded to the RT network, where it undergoes filtering that uses a relaxed set of rules to remove faulty data.
The core set of rules that determine the filtering include:
• The difference between total riders on and total riders off for a bus must be 10% or less,
• The difference between total riders on and total riders off for a block must be 10% or less (a block is a schedule for one physical bus each day),
• The difference between total riders on and total riders off for a trip must be 10% or less,
• The number of stops counted must be “pretty close” to the actual number of stops on the route,
• Records showing obvious technology malfunctions are removed.

As filtering occurs, records are deleted from the database. After the filtering process is complete, about 400 records per day remain. The original raw data remains only as the output from the APC device, in the form of a text file. It is difficult to accurately assess RT’s total bus ridership because the data is not a random sample, which is a result of the filtering process and the fact that bus lines are not randomly chosen. Because total ridership data by route is not easily obtained from filtered APC data, the raw APC data was also provided, but it was not used due to data quality concerns. In addition, data from General Farebox Inc. (counts from the fareboxes on the buses), as well as parking lot, cash, and fare vending machine count data, was provided. That additional data was not useful, as it provided system-wide information and was not specific to the bus lines serving the downtown area.

The APC daily data, including weekdays and weekends, spans from January 7, 2008 to December 30, 2008. Initially, the APC data was a count of boarding and alighting riders at a randomly chosen number of stops within the entire RT service area, allowing for the filtering of bus lines serving the downtown area. The data also contained the time and date of the observation as well as the bus stop identifier and route schedule identifier. RT defined the AM peak period to be 6:30–9:00 AM, and the PM peak period to be 3:00–6:00 PM.
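The on/off consistency checks among RT's filtering rules could be sketched roughly as follows. This is a minimal illustration in Python, not RT's actual filtering software: the record layout, the function names, and the use of the larger of the two counts as the denominator for the 10% difference are all assumptions, since the source does not state how the percentage is computed.

```python
def within_tolerance(boardings, alightings, tolerance=0.10):
    """Return True if boardings and alightings differ by at most `tolerance`.

    Assumption: the relative difference is measured against the larger count.
    """
    larger = max(boardings, alightings)
    if larger == 0:
        return True  # nothing was counted, so the two totals cannot disagree
    return abs(boardings - alightings) / larger <= tolerance


def filter_records(records, tolerance=0.10):
    """Keep only records whose on/off totals pass the tolerance check."""
    return [r for r in records if within_tolerance(r["on"], r["off"], tolerance)]


# Hypothetical trip-level records (not thesis data).
records = [
    {"trip": "A", "on": 100, "off": 95},  # 5% difference: kept
    {"trip": "B", "on": 100, "off": 70},  # 30% difference: removed
    {"trip": "C", "on": 0, "off": 0},     # empty trip: kept
]
kept = filter_records(records)
print([r["trip"] for r in kept])
```

The same check would be applied at the bus, block, and trip levels by aggregating the counts to each level first.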
Although the RT sample only contains 49 weekly ridership counts, Cherwony and Polin (1977) used daily bus transit ridership data from Albany, New York to show that only 30 days of transit ridership data are needed to develop a valid travel forecasting model.

3.2.2 Yolobus

Yolobus is operated by the Yolo County Transportation District and serves Yolo County and surrounding areas including Davis, Sacramento, Winters, and Woodland, among others. Unlike RT, Yolobus also provides service to the Sacramento International Airport. Yolobus uses electronic farebox devices as well as manual counts by bus driver to collect ridership information. Yolobus separates its services into three types: regular, commute, and express. Regular services run every day of the week, whereas commute and express services only run Monday through Friday. The agency operates 365 days of the year.

Since Yolobus offers different types of services to the downtown core, the operating times of those services vary. Regular bus routes (40, 41, 42A/B, 240) collectively run all day from 4:37 AM to 11:48 PM during the week. The commute and express services, which only run Monday through Friday, operate only during peak commuting periods. The commuter routes (39, 241) collectively run from 5:35 AM to 8:30 AM and from 3:35 PM to 6:34 PM. Similarly, the express routes (43, 44, 45, 230, 231, 232) collectively run from 5:55 AM to 8:32 AM and from 4:03 PM to 7:17 PM. Yolobus restricts passenger travel in downtown Sacramento by not allowing passengers to both board and alight there. Instead, passengers are requested to use Sacramento RT for local services within downtown Sacramento.

The Yolobus ridership data set contains total daily ridership counts spanning the three-year period from January 2006 to December 2008. The data set is missing two days, July 30, 2006 and July 31, 2006.
3.2.3 Roseville Transit

Roseville Transit is operated by the City of Roseville and mainly serves the City of Roseville, but additionally serves Sacramento commuters. Roseville Transit runs specific commuter routes that serve the Sacramento downtown core, including AM Routes 1–8 and PM Routes 1–8. The commuter routes only run Monday through Friday, between the morning peak commute hours of 5:00–9:00 AM and the afternoon peak commute hours of 3:30–7:30 PM. Roseville Transit uses manual counts by bus driver to collect ridership information. Its daily peak period ridership data was provided for the entirety of 2006, 2007, and 2008, separated by commuter route.

3.2.4 North Natomas TMA

The North Natomas “Flyer” is operated by the North Natomas TMA, serving Natomas as well as downtown Sacramento commuters. The Flyer runs between 20 and 28 passenger buses through several North Natomas neighborhoods, and although there are no timed stops, time points are listed on the schedule (the bus will stop wherever there are passengers waiting). In downtown Sacramento, there are timed stops at set locations. The Flyer includes three routes that serve the downtown core: the Eastside Route, the Westside Route, and the Central Route. In September of 2008, North Natomas TMA began running a Square Route; however, those ridership counts were excluded from the daily totals because that route was added after the construction period had ended and there was no pre-construction or construction data to use for comparison. The Flyer operates Monday through Friday except on certain holidays.

North Natomas TMA runs peak period scheduled routes between Natomas and downtown Sacramento. The Eastside Route has three morning and three afternoon loops that run from 5:54 AM to 9:04 AM and from 3:35 PM to 6:54 PM, respectively. The Westside Route has two morning and two afternoon loops that run from 6:00 AM to 7:44 AM and from 4:30 PM to 6:30 PM, respectively.
The Central Route also has three morning and three afternoon loops that run from 6:03 AM to 9:04 AM and from 4:07 PM to 7:06 PM, respectively. North Natomas TMA uses manual counts by driver as well as farebox counts to collect ridership information. Manual counts were also provided by volunteer riders for the duration of the Fix I-5 project. Its daily peak period ridership data, collected using both manual and automated collection methods, spans the complete 2008 year and is separated by route.

3.2.5 Yuba-Sutter Transit

Yuba-Sutter Transit is operated by Sutter and Yuba Counties and the Cities of Marysville and Yuba City, and provides service to Yuba City, Marysville, Linda, Olivehurst, East Nicolaus, and Sacramento. Only the Sacramento Commuter Express provides Sacramento downtown commuter service (via Highways 70 and 99). The commuter service runs on weekdays, but not on certain holidays. Yuba-Sutter currently provides nine commuter schedules for each of the peak periods, which operate from 5:20 AM to 8:00 AM and from 3:45 PM to 6:50 PM. Yuba-Sutter Transit uses manual counts by the drivers to collect all ridership information. Its daily peak period ridership data is for the Sacramento Commuter Service for the years 2005, 2006, 2007, and 2008. This data was broken down by day and further separated by route. The 2008 data was provided in the same format but in an electronic version.

3.3 Data Filtering

To prepare the ridership data provided by each transit agency for analysis, a four-step procedure was followed for each agency.

3.3.1 General Procedure

Step 1: Filter data to include ridership only for lines which provide service to the Sacramento downtown core, as previously defined by the cordon.
Table 3.2 provides a list of each transit agency and its bus transit lines that provide service to the downtown:

Table 3.2: Transit Lines Servicing the Downtown Core
Transit Agency | Downtown Servicing Lines
Regional Transit | 2, 3, 6, 7, 11, 15, 29, 30, 31, 33, 34, 36, 38, 50E, 51, 62, 63, 67, 68, 86, 88, 89, 109
Yolobus | 39, 40, 41, 42A, 42B, 43, 44, 45, 230, 231, 232, 240, 241
North Natomas | Eastside Route, Westside Route, and Central Route
Roseville Transit | AM Routes 1–8, PM Routes 1–8
Yuba-Sutter Transit | Sacramento Commuter Express

Step 2: Filter data according to Table 3.3 to include inbound ridership for the AM peak period, as defined by agency.

Table 3.3: AM Peak Period Definitions of Each Data Set
Transit Agency | AM Peak Period Definition
Regional Transit | 6:30–9:00
Yolobus | Daily Data
North Natomas | By route: Eastside 5:54–9:04, Westside 6:00–7:44, Central 6:03–9:04
Roseville Transit | 5:00–9:00
Yuba-Sutter Transit | 5:20–8:00

Step 3: Filter data according to Table 3.4 to include outbound ridership for the PM peak period, as defined by agency.

Table 3.4: PM Peak Period Definitions of Each Data Set
Transit Agency | PM Peak Period Definition
Regional Transit | 3:00–6:00
Yolobus | Daily Data
North Natomas | By route: Eastside 3:35–6:54, Westside 4:30–6:30, Central 4:07–7:06
Roseville Transit | 3:30–7:30
Yuba-Sutter Transit | 3:45–6:50

Step 4: Filter data to include only Tuesday, Wednesday, and Thursday ridership. Because modified work schedules are widely used, Monday and Friday are not representative of typical ridership.

The final model ridership is a combination of ridership across bus lines for a given agency, so that a single data point represents total ridership on all lines serving the downtown core.
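Step 4 and the aggregation across lines can be illustrated with a short Python sketch. It is only a schematic of the procedure described above, not the processing actually used for the thesis: the tuple layout of the records and the function name are hypothetical, and the ridership numbers are invented for illustration.

```python
from datetime import date

# date.weekday() numbers Monday as 0, so Tuesday=1, Wednesday=2, Thursday=3.
KEEP_WEEKDAYS = {1, 2, 3}


def midweek_totals(records):
    """Sum ridership over all lines for each Tuesday, Wednesday, or Thursday.

    `records` is an iterable of (day, line, riders) tuples (assumed layout).
    """
    totals = {}
    for day, line, riders in records:
        if day.weekday() in KEEP_WEEKDAYS:
            totals[day] = totals.get(day, 0) + riders
    return totals


# Invented example records for two lines.
records = [
    (date(2008, 4, 28), "40", 120),  # Monday: dropped
    (date(2008, 4, 29), "40", 130),  # Tuesday: kept
    (date(2008, 4, 29), "41", 70),   # Tuesday: kept and added to the same day
    (date(2008, 5, 2), "40", 90),    # Friday: dropped
]
totals = midweek_totals(records)
print(totals)
```

Each remaining key corresponds to one data point in the final data set: total ridership on all downtown-serving lines for that day.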
The sample size for the final data sets for this analysis is described in Table 3.5:

Table 3.5: Sample Size of Each Data Set
Transit Agency | Time Period | Aggregation | Sample Size
Regional Transit | 2008 | Weekly, Peak Period | 49
Yolobus | 2006–2008 | Daily | 441
North Natomas | 2008 | Daily, Peak Period | 147
Roseville Transit | 2006–2008 | Daily, Peak Period | 441
Yuba-Sutter Transit | 2006–2008 | Daily, Peak Period | 441

3.3.2 Special Modifications to General Procedure for Regional Transit

RT data required more manipulation in order to perform the necessary filtering. The details are described below. The main objective was to use the APC data to obtain the total weekly demand within the downtown core for all 52 weeks in 2008. More specifically, the goal was to obtain this weekly demand data for buses entering the downtown during the AM peak (6:30 AM to 9:00 AM) and for buses leaving the downtown during the PM peak (3:00 PM to 6:00 PM). Because RT ridership data is not collected for every passenger or trip, more complex methods were needed for RT.

All of the bus stops within the downtown core were identified using RT-generated identifying numbers. RT uses 325 bus stops in the downtown. Then the number of boarding and alighting riders associated with those bus stops during each peak period was obtained. Although the daily APC data is incomplete, it covers almost half of the stops within the downtown area every day. We can assume that the total data collection for one week (Monday through Friday) covers all of the stops within the downtown area and that the sample size for each bus stop is sufficient (Drake, 2007). Next, using the APC data, the daily average of alighting riders during the AM peak period and the daily average of boarding riders during the PM peak period were calculated. Using the RT bus schedule, the frequency of stops at each bus stop during the peak hours was determined. This frequency is a fixed number every day for a given schedule.
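One plausible reading of this calculation is that the sampled average at each stop is scaled up by the scheduled number of bus visits and then summed over the downtown stops. The Python sketch below illustrates that reading only; it assumes the averages are expressed per bus visit, and the stop identifiers, function name, and numbers are hypothetical rather than taken from the RT data.

```python
def estimate_peak_ridership(avg_riders_per_visit, visits_per_peak):
    """Estimate total peak-period ridership across a set of bus stops.

    avg_riders_per_visit: {stop_id: average riders counted per bus visit}
    visits_per_peak:      {stop_id: scheduled bus visits during the peak}
    Assumption: every stop with an average also has a scheduled frequency.
    """
    return sum(
        avg_riders_per_visit[stop] * visits_per_peak[stop]
        for stop in avg_riders_per_visit
    )


# Invented values for two hypothetical downtown stops.
avg_alighting_am = {"stop_1": 2.5, "stop_2": 1.0}
scheduled_visits_am = {"stop_1": 10, "stop_2": 4}

total_am = estimate_peak_ridership(avg_alighting_am, scheduled_visits_am)
print(total_am)  # 2.5 * 10 + 1.0 * 4
```

The PM estimate would follow the same pattern, using average boarding riders in place of average alighting riders.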
RT had four different schedules throughout 2008; however, comparison between schedules shows that the frequency of service at the downtown bus stops did not change during 2008. Therefore, this study used the frequencies from the first schedule, Schedule 20, valid from January 6, 2008 through April 5, 2008, for both the AM and PM peak periods. The following equations were used to calculate total ridership, summing over the n downtown bus stops:

\[ \text{Total AM Peak Ridership} = \sum_{i=1}^{n} \left( \text{Average Alighting Riders at Bus Stop } i \right) \times \left( \text{Frequency at Bus Stop } i \right) \]

\[ \text{Total PM Peak Ridership} = \sum_{i=1}^{n} \left( \text{Average Boarding Riders at Bus Stop } i \right) \times \left( \text{Frequency at Bus Stop } i \right) \]

3.4 Independent Variables

A multiple regression analysis was performed on each of the nine ridership data sets to determine whether any independent factors played a significant role in ridership changes during the period of analysis. The regression was performed for all agencies (and all peak periods) using four independent variables: GDP, unemployment rates, gas prices, and fare prices. The smallest period of data aggregation available was used for each independent variable. Table 3.6 describes the final independent variable data:

Table 3.6: Independent Variable Details
Independent Variable | Source | Aggregation | Location | Contact | Contact's Official Position Title
Gross Domestic Product | Bureau of Economic Analysis | Quarterly | National | Lisa Mataloni | Economist
Gasoline Prices | AAA | Monthly | Sacramento City | Michael Geeser | Media and Government Relations Representative
Unemployment Rates | Bureau of Labor Statistics | Monthly | Sacramento/Arden-Arcade/Roseville | Website |
Fares | Yuba-Sutter Transit | Daily | Agency | Dawna Dutra | Analyst
Fares | Roseville Transit | Daily | Agency | Elizabeth Haydu | Administrative Technician

In terms of GDP data, seasonally adjusted data was used because the adjustment of GDP data is outsourced, and the Bureau of Economic Analysis does not provide or have access to unadjusted GDP data.
Additionally, state and metropolitan area GDP is only available on an annual basis, and the lowest level of aggregation is national GDP provided on a quarterly basis. GDP was included as a measure of overall economic health. Hoel (1971), in his discussion of linear regression, gave the example of a 0.98 correlation coefficient between teachers' salaries and liquor consumption, noting that in general the economy was doing well and upward trends were common. He warned about spurious correlations, which must be considered in correlational studies. Gas price data was the unleaded gasoline price per gallon averaged for the city of Sacramento between 2006 and 2008. Finally, fares were used for the two transit agencies that were affected by changes in basic fare rates, namely Yuba-Sutter Transit and Roseville Transit, each with one increase during the three-year period (2006–2008). Table 3.7 describes the fare pricing for each agency:

Table 3.7: Fare Pricing Details
Transit Agency | Single Ride, Adult Fare
Regional Transit | $2.00
Yolobus | $1.50
North Natomas | $1.00
Roseville Transit | 11/1/2003 – 6/30/2007: $2.75; 7/1/2007 – 12/31/2008: $3.25
Yuba-Sutter Transit | 8/1/2002 – 6/30/2007: $3.00; 7/1/2007 – 12/31/2008: $3.50

For data aggregated on levels other than daily, the monthly or quarterly average value of the independent variable was repeated for all Tuesdays through Thursdays that existed in that month or quarter (based on the information from the data manipulation section). Consequently, there is a single value for every day of ridership data that represents that month’s average of the independent variable. For weeks that straddled two months, the two monthly averages were combined, weighted by the number of analysis days falling in each month. For example, Week 17 of 2008 includes April 29, April 30, and May 1, and the unemployment rate for April 2008 is 5.9% while May’s unemployment rate is 6.3%. The unemployment rate for this week is calculated as [(2 × 5.9) + 6.3] / 3 = 6.0333%.
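The weekly value for a week that straddles two months is a day-weighted average of the monthly values. A minimal Python sketch of the worked example above follows; the function name and the dictionary keys are illustrative choices, not part of the original analysis.

```python
def weekly_value(monthly_values, analysis_days):
    """Day-weighted average of a monthly variable over one analysis week.

    monthly_values: {month: value of the independent variable for that month}
    analysis_days:  {month: number of Tue/Wed/Thu days of the week in that month}
    """
    total_days = sum(analysis_days.values())
    weighted_sum = sum(monthly_values[m] * n for m, n in analysis_days.items())
    return weighted_sum / total_days


# Week 17 of 2008: April 29 and April 30 fall in April, May 1 falls in May.
rate = weekly_value(
    {"2008-04": 5.9, "2008-05": 6.3},
    {"2008-04": 2, "2008-05": 1},
)
print(round(rate, 4))
```

For a week that lies entirely within one month, the function simply returns that month's value.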
3.5 Data Quality [2]

This section describes data quality considerations, including subsections describing quality concerns related to the four ridership data collection methods employed by the five transit agencies, as well as additional data quality concerns. The four collection methods consist of automatic passenger counting (APC) devices, electronic registering fareboxes (ERFs), manual counts by route checkers, and manual counts by bus drivers. Each section will briefly discuss the collection method, concerns related to data quality, and which agencies use that method. All agencies' ridership data sets were shortened to Tuesday, Wednesday, and Thursday data sets, which did not contain any missing data. The two missing values for Yolobus discussed previously, July 30, 2006 and July 31, 2006, fell on a Sunday and a Monday.

[2] Sections 3.5.1, 3.5.2, 3.5.3, and 3.5.4 use information from a report prepared by Jessica Seifert, under the author’s direction.

3.5.1 Automatic Passenger Counting Devices

APC devices automate ridership data collection by tracking boarding and alighting riders, in addition to including a time and location stamp for each count. RT is the only transit agency within the study that uses APC devices, supplied by Clever Devices, Inc. (Drake, 2009). This technology uses an infrared beam to count boarding and alighting riders, and is mounted above the bus doors (Poggioli, 2009). The Clever Devices APC correlates the ridership data to GPS coordinates and scheduled routes so that the data may be viewed on a per-bus, per-door level (Clever Devices, 2009). Clever Devices, Inc. claims that its APC system demonstrates over 95% accuracy, though it does not provide information on its website that would account for the 5% error (Clever Devices, 2009).
Boyle (1998), referring to all APC systems, stated that the most common problems are typically related to software, as transit agencies often have to upgrade their analytical programs, followed by hardware problems (device failure and durability). For the Clever Devices APC system, however, the 5% error can in large part be attributed to mechanical malfunctions as well as door bunching, riders carrying a child or large bags, drivers getting on and off the bus, non-riders making inquiries to bus drivers, and misalignment of sensors (Poggioli, 2009). In addition to technical problems, Boyle (2008, p. 18) defines a “debugging” period in which employees must familiarize themselves with the new technology. From the survey Boyle conducted in 1998, the average debugging period for APC devices was 17 months (Boyle, 2008). The accuracy of an APC system can be evaluated by comparing its ridership data to manual counts, although manual counts may also have data quality problems (see Sections 3.5.3 and 3.5.4) (Boyle, 2008).

As mentioned, RT is the only agency that uses APC devices to collect ridership information. RT was unable to provide APC ridership data for all of the buses that serve the downtown Sacramento region, because APC devices were not installed on the entire bus fleet and because the data was heavily filtered to remove data with obvious errors (see Section 3.2.1 for filtering rules). Also, RT’s APC system is still in testing phases, which could indicate that the devices are also within the debugging period (Drake, 2009).

3.5.2 Electronic Registering Fareboxes

Electronic registering fareboxes (ERFs) are devices in which bus drivers enter a number corresponding to rider type into a keypad that connects to an electronic farebox (Boyle, 1998). The drivers are also required to enter a value to indicate the route and run number at the beginning of each trip (Boyle, 1998).
ERFs do not collect location information, so ridership data is only available at the trip level (Drake, 2009). As with APC devices, the data collected from electronic registering fareboxes can be “validated” by comparison with manual counts or with the revenue collected from fares (Boyle, 1998).

There are four problems that may be encountered when using electronic registering fareboxes: mechanical problems, operator compliance, software problems, and accuracy of data (Boyle, 1998). The bus operators must enter the correct codes at the beginning of each route and trip, and the correct code for the type of passenger (Boyle, 1998). Boyle’s survey (1998) indicates that some transit agencies experienced difficulties when adding these responsibilities to the bus drivers’ duties, although the most successful agencies were the ones that provided continuous ERF training to their drivers. Ultimately, the quality of the data collected from ERFs is affected by both human and software errors.

The transit agencies within this study that used ERFs are RT, Yolobus, and North Natomas TMA. RT has electronic fareboxes installed on all of its buses except for the community buses.

3.5.3 Manual Counts by Route Checkers

Most transit agencies use manual counts either as their primary method of data collection or for comparison against electronic methods (Boyle, 1998). Route checkers ride the transit vehicle and take manual counts of passengers boarding and alighting at each stop (Boyle, 1998). They typically have preprinted forms or handheld units that contain all of the stops on that route, and their sole responsibility is to count passengers and record bus stop arrival and departure times (Boyle, 1998). Manual counts are the most well-established method of ridership data collection (Boyle, 1998).
The following problems are associated with manual counting by route checkers: accuracy of data, consistency of data, labor intensiveness, reliability of route checkers, and cost of manual counting (Boyle, 1998). Problems with accuracy and consistency of the data are a result of the training and reliability of the route checker, as well as the transcription of the handwritten record to an electronic version (Boyle, 1998).

RT and North Natomas TMA were the only transit agencies within the study that used manual counts by route checkers to collect ridership data. It should be noted that North Natomas TMA used untrained volunteer riders to provide manual counts.

3.5.4 Manual Counts by Bus Drivers

Manual counts by bus drivers are another method of ridership collection. Manual counting by bus drivers shares many of the same problems as manual counting by route checkers, including labor intensiveness and the reliability of the counter. However, because bus drivers also have many other responsibilities, such as driving the bus, monitoring passengers, and collecting fares, they may be less focused on counting passengers than route checkers are.

All five of the transit agencies within the study use manual counts by bus drivers, either as their primary method of data collection or in combination with another technique. Roseville Transit and Yuba-Sutter Transit exclusively use manual counts by bus drivers to collect ridership information, while North Natomas TMA and Yolobus use manual counts by bus driver in addition to electronic registering fareboxes. RT uses manual counts by bus driver in addition to the other three techniques.

3.5.5 Additional Data Quality Considerations

As discussed above, the RT data was heavily filtered and manipulated prior to being obtained by this study. Although quantification of data quality is not possible, RT data is probably the least reliable of the five agencies analyzed.
RT ridership data was only a sample of total ridership (unlike the data from all other agencies), and furthermore the collected data was not a random sample. Additionally, the system was still in a testing phase. Furthermore, this analysis did not separate out riders who both board and alight within the downtown core. According to RT, ridership that fell within this category was less than 5% of total ridership for commute periods, but other ridership data was not available to verify this.

Although the Yolobus data was probably more reliable than the RT data, its daily ridership totals included regular bus routes (40, 41, 42A/B, 240), which operated all day during weekdays, in addition to commute and express services, which only run Monday through Friday during peak commuting periods. The Yolobus ridership sample therefore included some non-commute data.

Since the data from RT and Yolobus was received in an electronic format, there is a possibility of transcription errors on the part of the transit agency. The data from Yuba-Sutter Transit, Roseville Transit, and North Natomas TMA was received in a hardcopy format. There was also a possibility of transcription errors in entering that hardcopy data into the data sets used by this study, although all data entry was verified for accuracy by a second person.

3.6 Data Cleaning

The data sets provided by the five agencies contained only two cases of actual missing data, both for Yolobus (July 30, 2006 and July 31, 2006). Although cases of missing data were rare, plots of the data provided by each agency indicated that some data manipulation would be necessary to account for holidays and limited service days. As an example, Figure 3.1 below displays the original data for Roseville Transit:

Figure 3.1: Plots of Original Roseville Peak Period Ridership Data

Plots of each agency’s original data set and imputed data set, including Tuesday, Wednesday, and Thursday ridership, can be found in Appendix B.
The drops in the plots represent transit holidays and limited service days as well as state holidays. Although not technically missing data, because agencies had provided data for all observations, the buses and the riders (assumed to be workers in downtown Sacramento) were “missing,” and therefore ridership was zero (or very low) for transit and state holidays, and unusually low for limited service days. Those occurrences were treated as missing data.

For some missing observations, the missing data could be considered missing completely at random (MCAR). MCAR occurs if missing observations are distributed randomly over all observations, including that variable and any others, and can therefore be considered a simple random subsample (Allison, 2002). The missing data in this analysis are MCAR, although not ignorable. As discussed in the Literature Review, discrete time series analysis assumes that the time series is observed at equal intervals. More complex methods are necessary if the observations are not equally spaced, and therefore the missing data in this analysis had to be imputed.

There are multiple methods to deal with missing data. Some conventional methods, excluding listwise and pairwise deletion, include dummy variable adjustment and imputation. A basic dummy variable regression was first used, but an ad hoc imputation method was ultimately chosen because of the detailed information available about the missing value cases and their likely “true” values.

Prior to any data imputation, it was necessary to identify days with no transit service, limited transit service days, and full transit service days that coincide with state holidays for each of the five transit agencies for 2006, 2007, and 2008. Those dates are considered missing observations, and are identified in Appendix D.

Exploration of the Yolobus data showed that, in general, there was low variation from the mean for Tuesday, Wednesday, and Thursday ridership within any given week.
However, it also appeared that there was low variation from the mean for the same weekday ridership across three consecutive weeks. For example, the second Tuesday in a month showed similar ridership to the first and third Tuesdays in that month. Therefore, two methods for imputing data for “missing” observations were compared. The methods were tested using Tuesday, Wednesday, and Thursday Yolobus ridership data. The same set of holidays was used to test both methods. The two methods are described below, and the detailed calculations are given in Appendix E:

o Method 1 uses the same week that the holiday falls in but different days. T1 is defined as the ridership of the first non-holiday day in the holiday week, and T2 is defined as the ridership of the second non-holiday day in the holiday week. For example, if the holiday fell on a Wednesday, T1 was the ridership on Tuesday and T2 was the ridership on Thursday, whereas if it fell on a Tuesday, then T1 applied to Wednesday and T2 to Thursday of the same week. Then the absolute value of the difference, |T1 − T2|, was calculated. The differences for all of the holidays were summed and divided by the total number of holidays, giving the average difference. This difference was found to be equal to 152.33.

o Method 2 uses the weeks prior to and after the holiday week but the same day. T1’ is defined as the ridership on the same day of the week before the holiday, and T2’ is defined as the ridership on the same day of the week after the holiday. For example, if the holiday fell on a Tuesday, T1’ was the ridership of the previous Tuesday and T2’ was the ridership of the following Tuesday. Then the absolute value of the difference, |T1’ − T2’|, was calculated. The differences for all of the holidays were summed and divided by the total number of holidays, giving the average difference. This difference was found to be equal to 154.17.
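The comparison statistic used for both methods is the mean absolute difference between the two reference days, computed over all test holidays. A short Python sketch of that statistic follows. The ridership pairs are invented for illustration and are not the Yolobus values (those calculations are in Appendix E), and the function and variable names are hypothetical.

```python
def mean_abs_difference(pairs):
    """Average of |T1 - T2| over a list of (T1, T2) ridership pairs."""
    return sum(abs(t1 - t2) for t1, t2 in pairs) / len(pairs)


# Invented reference-day pairs for two test holidays.
# Method 1: the two non-holiday days in the holiday week.
method_1_pairs = [(3400, 3300), (3500, 3650)]
# Method 2: the same weekday in the weeks before and after the holiday.
method_2_pairs = [(3400, 3600), (3500, 3420)]

d1 = mean_abs_difference(method_1_pairs)
d2 = mean_abs_difference(method_2_pairs)

# The method whose reference days agree more closely is preferred.
chosen = "Method 1" if d1 < d2 else "Method 2"
print(d1, d2, chosen)
```

A smaller average difference suggests that the two reference days bracket the missing day more tightly, which is the rationale for preferring one method over the other.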
Since the average difference of Method 1 was smaller than the average difference of Method 2, Method 1 was used for this study. More specifically, the average of T1 and T2, which lie in the same week as the holiday, was used to impute the missing day’s ridership. In addition, there were no problems with Method 1 when holidays occurred in consecutive weeks, for example, Christmas Day and New Year’s Eve.

Data imputation was done using Method 1 for the days on which each transit agency ran limited or no services, as well as state holidays on which they ran full services. Finally, the Thanksgiving, Christmas, and New Year’s Eve weeks were eliminated from the data, as those entire weeks showed extremely low ridership.

The formula used for percent data imputed is % Imputed = (Number of Days Imputed / Total Number of Days) × 100. The holidays and limited service days were the same for the AM and PM peak periods, so the percent of data imputed for agencies with separate AM and PM peak data sets was the same for both. Table 3.8 shows that the amount of imputed data for any given agency is at most about 2%, and usually much less. This is considered an acceptable level of imputation.

Table 3.8: Percent Imputed Data
Transit Agency | Percent
Roseville Transit | 0.91%
Yuba-Sutter Transit | 0.91%
Yolobus | 0.68%
North Natomas TMA | 1.36%
Regional Transit | 2.04%

3.7 Descriptive Statistics for Transit Ridership [3]

The statistical methods discussed in the literature review are considered parametric statistical methods. Parametric methods make assumptions about the population parameters; more specifically, probability distributions are usually assumed to be normal (Mann, 2004). The statistical tests used in this study, including the regression and time series analyses presented later, use parametric methods. The following discussion provides a general statistical overview of each transit agency’s ridership data, based on the cleaned data sets, prior to in-depth time series analysis.
Both measures of central tendency and measures of dispersion help to describe the data and their distribution. This section presents the statistics but leaves their interpretation to Chapter 5.

³ Section 3.7 makes use of calculations and tables created by Jessica Seifert.

3.7.1 Measures of Central Tendency

Measures of central tendency describe the center of the distribution of a variable. The mean is an arithmetic average that is commonly used to describe distributions. However, the mean is sensitive to extreme values, also known as outliers (Ross, 2005). The median is also a measure of central tendency, and describes the middle value of the data without being as affected by outliers (Ross, 2005). To describe the center of each data set, Table 3.9 lists the mean and median ridership for each transit agency’s data sets.

Table 3.9: Measures of Central Tendency: Mean and Median

Transit Agency        Data Aggregation   Mean Ridership   Median Ridership
Roseville Transit     AM                 213.9            208.0
                      PM                 200.4            198.0
Yuba-Sutter Transit   AM                 239.2            226.0
                      PM                 237.5            222.0
Yolobus               Daily              3499.5           3383.0
North Natomas TMA     AM                 116.8            127.0
                      PM                 96.36            96.0
RT                    AM                 15498.4          15314.0
                      PM                 13639.6          13641.0

A comparison of the median and the mean provides insight into the shape of the data sets’ distributions. If the median and mean have similar values, then the distribution is probably symmetric; otherwise, the data may be skewed to some degree (Ross, 2005). The data for this analysis show that the medians for each agency are similar to their means. This indicates that the distributions of the ridership data are fairly symmetric. More specifically, the medians for Roseville Transit, Yuba-Sutter Transit, Yolobus, and Regional Transit are slightly less than their means, indicating that the distributions may be skewed to the right.
Part of the skew in the histograms of Roseville Transit (AM peak period), Yuba-Sutter Transit, and Yolobus can be attributed to slightly higher ridership on Tuesdays compared to Wednesdays and Thursdays. The Roseville and Yuba-Sutter histograms are shown in Figure 3.2, confirming that expectation. The histograms of all data sets are given in Appendix F.

Figure 3.2: Roseville and Yuba-Sutter Transit Histograms

Roseville Transit, Yuba-Sutter Transit, and Yolobus have data sets spanning the period of 2006 to 2008. All three agencies experienced increased ridership in both 2007 and 2008. In particular, Yuba-Sutter Transit experienced high ridership increases: from 2006 to 2007, average AM ridership increased by 15.2%, and from 2007 to 2008 it increased by 31.5%. Similar changes were seen in Yuba-Sutter Transit’s PM ridership during those years. The yearly medians in Table 3.10 are much closer to their means than those of the full data sets. In fact, all three agencies display this tendency, indicating that the yearly distributions are much more symmetric than the distributions of the entire data sets.
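As a rough illustration of the mean-median comparison, the sketch below expresses the gap between mean and median as a percentage of the median, using the Yuba-Sutter Transit AM values reported in Tables 3.9 and 3.10. The helper name `relative_gap` is hypothetical, and this measure is only an informal indicator of skew, not a test used in the thesis.

```python
def relative_gap(mean_value, median_value):
    """Signed gap between mean and median, as a percentage of the median.

    A positive value means the mean lies above the median, which is
    consistent with (but does not prove) a right-skewed distribution.
    """
    return (mean_value - median_value) / median_value * 100.0


# Yuba-Sutter Transit AM: (mean, median) for the full data set (Table 3.9)
overall = (239.2, 226.0)
# ... and for each year separately (Table 3.10)
yearly = {2006: (195.6, 195.0), 2007: (225.4, 226.0), 2008: (296.5, 300.0)}

print(f"all years: {relative_gap(*overall):+.1f}%")
for year, (mean_value, median_value) in yearly.items():
    print(f"{year}: {relative_gap(mean_value, median_value):+.1f}%")
```

For this data set the pooled gap is close to 6% of the median, while each yearly gap stays within roughly 1%, matching the observation that the yearly distributions are more symmetric.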
[Figure 3.2 panels: (a) Roseville Transit AM Ridership, (b) Roseville Transit PM Ridership, (c) Yuba-Sutter Transit AM Ridership, (d) Yuba-Sutter Transit PM Ridership; horizontal axis: Ridership, vertical axis: Frequency]

Table 3.10: Yearly Means and Medians for Transit Agencies with Data Spanning 2006-2008

Transit Agency        Data          Mean                          Median
                      Aggregation   2006     2007     2008       2006     2007     2008
Roseville Transit     AM            194.9    205.6    241.2      196.0    203.0    240.0
                      PM            189.1    192.5    219.6      190.0    194.0    217.0
Yuba-Sutter Transit   AM            195.6    225.4    296.5      195.0    226.0    300.0
                      PM            189.9    223.0    299.5      189.0    222.0    302.0
Yolobus               Daily         3175.1   3346.1   3977.2     3188.0   3360.0   3932.0

3.7.1.1 Ridership Means by Period

Since not all data sets span multiple years, the means of the data were also calculated seasonally for the year 2008. The calculations were made based on the following seasons:

• 1st quarter: January – March
• 2nd quarter: April – June
• 3rd quarter: July – September
• 4th quarter: October – December

RT was excluded as its data are observed weekly. All of the agencies experienced increased ridership between the first and second quarters of 2008. North Natomas TMA ridership increased the most during this period, with a 41.6% increase in AM ridership and a 24.9% increase in PM ridership. Similarly, all agencies saw an increase in ridership between the second and third quarters of 2008, with North Natomas TMA again showing the largest increase. However, opposite changes occurred between the third and fourth quarters of 2008. Almost all of the agencies experienced a decrease in ridership during this period; Yolobus was the only agency that saw an increase in ridership (1.7%). The means for each quarter of 2006, 2007 and 2008 are displayed in Table 3.11. In general, it appears that transit ridership decreased in the first and fourth quarters.
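The quarter-to-quarter percentages quoted above follow directly from the 2008 quarterly means reported in Table 3.11. A minimal sketch of the arithmetic, with a hypothetical helper name `percent_change`, is:

```python
def percent_change(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100.0


# 2008 quarterly mean ridership (1st to 4th quarter), from Table 3.11.
quarterly_means_2008 = {
    "North Natomas TMA AM": [77.9, 110.3, 146.8, 130.8],
    "North Natomas TMA PM": [74.8, 93.4, 114.7, 101.3],
    "Yolobus Daily": [3423.9, 3782.8, 4326.6, 4399.9],
}

for name, means in quarterly_means_2008.items():
    # Pair each quarter with the next one: (Q1, Q2), (Q2, Q3), (Q3, Q4).
    changes = [percent_change(a, b) for a, b in zip(means, means[1:])]
    print(name, [f"{change:+.1f}%" for change in changes])
```

The first entries for the two North Natomas TMA series reproduce the 41.6% and 24.9% increases, and the last Yolobus entry reproduces the 1.7% increase between the third and fourth quarters.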
Table 3.11: Means by Season for 2006, 2007 and 2008

Mean Ridership    Roseville Transit    Yuba-Sutter Transit    Yolobus    North Natomas TMA
Year   Quarter    AM       PM          AM       PM            Daily      AM       PM
2006   1st        190.3    182.4       189.7    182.9         3270.0
       2nd        203.2    190.7       190.7    184.9         3185.0
       3rd        197.2    196.2       201.9    196.2         3090.6
       4th        187.3    186.0       200.6    196.0         3161.8
2007   1st        195.1    191.4       213.4    206.8         3333.8
       2nd        200.3    192.6       219.9    214.1         3269.8
       3rd        201.0    188.6       229.0    225.0         3385.3
       4th        228.6    198.1       240.8    248.8         3403.5
2008   1st        226.3    199.6       253.9    257.2         3423.9     77.9     74.8
       2nd        235.1    213.3       288.8    292.3         3782.8     110.3    93.4
       3rd        266.0    234.6       333.8    335.7         4326.6     146.8    114.7
       4th        234.5    231.3       307.2    310.8         4399.9     130.8    101.3

Means and medians were also calculated based on the Fix I-5 construction period. The three periods in Table 3.12 represent the time before the construction (January 1, 2008 – May 30, 2008), the time during the construction (May 31, 2008 – July 27, 2008), and the time after the construction (July 28, 2008 – December 31, 2008). All of the agencies experienced increases in mean ridership between the pre-construction and construction periods, but the changes in ridership from the construction to post-construction periods varied by agency and by peak period within agencies. However, these differences are confounded with seasonal differences, as Table 3.11 shows.

Table 3.12: Means by Construction Period for 2008

Transit Agency        Data          Mean Ridership                   Median Ridership
                      Aggregation   Pre       During    Post         Pre       During    Post
Roseville Transit     AM            228.2     253.8     249.8        229.0     250.5     246.0
                      PM            204.3     226.0     233.2        204.0     230.5     230.0
Yuba-Sutter Transit   AM            265.8     314.0     321.8        262.0     311.5     316.5
                      PM            268.8     317.2     324.8        268.0     318.5     320.5
Yolobus               Daily         3563.1    4023.5    4393.5       3589.0    3973.0    4447
North Natomas TMA     AM            85.2      146.2     138.1        82.0      148.5     138.5
                      PM            76.6      124.3     106.0        75.0      127.0     104.5
RT                    AM*           14785.9   15525.8   16235.7      14990.4   15132.5   16423.0
                      PM*           13361.6   13907.5   13824.4      13221.9   14188.6   13966.6

* RT AM and PM peak period ridership represents weekly ridership counts.

3.7.2 Measures of Dispersion

Measures of central tendency do not give a complete picture of the data’s distribution; measures of dispersion, including the standard deviation, are therefore also included as descriptive statistics. The sample variance, s², is the average of the squared deviations from the sample mean, x̄, while the sample standard deviation, s, is the square root of the variance (Ross, 2004). The relative size of the standard deviation can provide information about how tightly clustered the data are about the mean. Smaller standard deviations indicate that the data are tightly clustered, whereas larger standard deviations indicate that the data are relatively more dispersed (Mann, 2004). The standard deviation, together with the mean, can be used to calculate a range in which a certain percentage of the data can be expected to lie: the confidence interval (Ross, 2005). This range provides values in terms of the original data’s units that indicate how much of the data are “normally” contained in that range. Standard deviations (s) by construction period are given in Table 3.13.

Table 3.13: Variance and Standard Deviation for Each Transit Agency

Transit Agency        Data          Standard Deviation (s)
                      Aggregation   Pre       During    Post       x̄ ± s                x̄ ± 2s               x̄ ± 3s
Roseville Transit     AM            20.54     24.91     24.46      (186.0, 241.8)       (158.1, 269.7)       (130.2, 297.6)
                      PM            15.49     20.17     23.49      (177.7, 223.1)       (155.0, 245.8)       (132.3, 268.5)
Yuba-Sutter Transit   AM            29.14     16.33     21.38      (191.4, 287.0)       (143.6, 334.8)       (95.8, 382.6)
                      PM            33        19.7      23.02      (185.7, 289.3)       (133.9, 341.1)       (82.1, 392.9)
Yolobus               Daily         240.38    209.77    229.1      (3043.5, 3955.5)     (2587.5, 4411.5)     (2131.5, 4867.5)
North Natomas TMA     AM            12.14     11.65     12.48      (86.7, 146.9)        (56.6, 177.0)        (26.5, 207.1)
                      PM            8.75      12.37     12.42      (75.2, 117.6)        (54.0, 138.8)        (32.8, 160.0)
RT                    AM            992.62    1265.84   1119.38    (14238, 16759)       (12977, 18019)       (11717, 19280)
                      PM            687.33    1043.77   793.33     (12824, 14455)       (12009, 15270)       (11193, 16086)

According to the empirical rule, the following percentages of approximately normal data lie in these respective ranges: 68% in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s (Ross, 2005). These ranges were calculated for the data sets and are displayed in Table 3.13. The x̄ ± 3s range covers 100% of the data in all but two cases (the Roseville Transit AM and PM data sets contain points that lie outside of the x̄ ± 3s, 99.7% range), indicating that the data sets are approximately normal.

3.7.3 Discussion

All agencies had overall increases in mean ridership during the study period, but there were also seasonal variations in mean ridership. An informal analysis of data dispersion indicated that the data sets were approximately normal, with minor skews. Although this study’s data failed usual tests of normality, slight departures from normality do not cause serious issues (Kutner et al., 2005). With the possible exception of RT, this study’s data sets are random samples with a sufficiently large number of observations. Their populations were considered approximately normally distributed, and parametric methods were justified. The next chapter, which uses multiple regression and time series analyses, studies the transit agency data sets to identify independent variables that correlate with increased ridership and which can be used in predictive models to explain the change in ridership means during the Fix I-5 project.

CHAPTER 4 MODEL BUILDING

This chapter describes the methodology that was used to create the time series intervention models for each agency’s transit ridership data. The first two sections present the steps taken to transform each of the nine data sets into stationary processes, including detrending using multiple regression analysis, and eliminating seasonal components using sinusoidal decomposition.
The last section explains the intervention analysis methodology.

4.1 Multiple Regression

The nine time series plots in Appendix B show an overall increasing trend in ridership. As discussed in the literature review, there are multiple methods of removing trend components in the time domain, including least squares estimation, smoothing with moving averages and differencing, as well as regression techniques (Aue, 2009; Yaffee, 2000). Regression techniques allow the modeler to eliminate trends using independent variables. As discussed earlier, each previous study used a different set of independent variables to predict transit ridership, but most of the studies that used multiple regression included gas prices, fares, and economic indicators such as unemployment rates. The following sections describe the relationships between bus transit ridership and each independent variable used in this multiple regression. Plots of each independent variable can be found in Appendix C.

4.1.1 Bus Transit Ridership and Gas Prices

Many studies have shown that gas prices significantly affect ridership, and that the ridership-gas price correlation is positive. Lane (2009) showed that gasoline prices are a statistically significant predictor of positive changes in transit ridership, with positive ridership-gas correlations. Wang and Skinner (1984) used data from seven transit authorities in the U.S. and showed that as real gasoline prices increase, transit ridership increases significantly. In a cross-sectional study, Taylor et al. (2009) analyzed transit ridership from 265 urban areas using regional fuel prices provided by the Bureau of Labor Statistics as an explanatory variable and hypothesized a positive correlation. They found that fuel prices were a significant external factor positively influencing aggregate transit ridership. Kyte et al.
(1988) showed that gasoline price is a statistically significant predictor of bus transit ridership, explaining that increasing the cost of automobile travel (i.e., gas prices) would motivate a mode change to transit. They found that gasoline prices show a negligible lag in their influence on ridership. Among the previous work presented above, the studies that used gas prices in their analysis found them to be significant independent variables.

This study found highly significant positive Pearson correlations between bus transit ridership and gas prices, ranging from 0.18 to 0.60, for eight of the nine data sets. The Pearson correlation coefficient, r, is defined for pairs (x_i, y_i) as (Ross, 2005):

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} $$

where n is the number of pairs, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations of the two variables.

The data sets with the highest correlations (Roseville Transit, Yuba-Sutter Transit and Yolobus) are those having the longest time series, suggesting that perhaps the impact of higher gas prices was beginning to level off by 2008 (i.e., those who were susceptible to the effect of higher prices had already changed earlier than 2008), or possibly that the intervention of the Fix I-5 project and other anomalies of 2008 (economic conditions, serious regional fires during the summer) disrupted the previously regular relationship between gas prices and ridership. Also, the highest correlations between ridership and gas price were for the bus transit agencies farthest from the Sacramento downtown core (Yuba-Sutter Transit, Roseville Transit and Yolobus), indicating that commuters with longer commute distances may have been more sensitive to rising gas prices and, therefore, more inclined to use bus transit. The RT AM peak has a counterintuitive negative ridership-gas correlation of −0.2. The ridership-gas Pearson correlations are shown in Table 4.1.
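The Pearson correlation defined above can be sketched with Python's standard library as follows. This is a minimal illustration under stated assumptions: the function name `pearson_r` is hypothetical, the gas price and ridership values are made up for the example, and the code is not the procedure actually used to produce Table 4.1.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over (n - 1) * s_x * s_y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return cross / ((n - 1) * stdev(xs) * stdev(ys))


# Hypothetical values: gas price (dollars per gallon) and ridership on the same days.
gas_price = [2.5, 2.8, 3.1, 3.6, 4.2]
ridership = [190, 200, 205, 230, 250]

print(f"r = {pearson_r(gas_price, ridership):.2f}")
```

A series with no variation has a zero standard deviation, so this sketch raises `ZeroDivisionError` in that case rather than returning a value.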
Table 4.1: Ridership-Gas Price Correlation Coefficients

Transit Agency         Peak Period   Pearson's Ridership-Gas Price Correlation Coefficient
Regional Transit       AM            −0.20***
Regional Transit       PM            0.18***
Yolobus                Daily         0.39***
North Natomas          AM            0.25***
North Natomas          PM            0.28***
Roseville Transit      AM            0.60***
Roseville Transit      PM            0.43***
Yuba-Sutter Transit    AM            0.59***
Yuba-Sutter Transit    PM            0.59***

***: p < 0.001

4.1.2 Bus Transit Ridership and Unemployment Rates

A small number of previous studies have examined the effects of labor statistics, specifically employment, on transit ridership. Agrawal (1981) showed that jobs in Philadelphia were highly significant positive predictors of transit ridership. Surprisingly, he showed th





