EXAMINING METHODS FOR IDENTIFYING THE OCCURRENCE OF SECONDARY CRASHES

By

Hadis Nouri

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Civil Engineering - Doctor of Philosophy

2022

ABSTRACT

Traffic crashes are a particular concern in urban areas, where the occurrence of a collision heightens the risk of subsequent secondary crashes upstream, particularly under high levels of traffic congestion. There is considerable difficulty in estimating the number of such crashes and in identifying the roadway locations and circumstances where the risks of such crashes are most pronounced. In light of these concerns, there is significant value in advancing our understanding of these issues, including our ability to predict and mitigate the potential for secondary crashes on freeways. A significant challenge in this regard is the ability to effectively identify a secondary crash with respect to both the spatial and temporal thresholds within which secondary crashes occur. Contemporary approaches are often based on static spatiotemporal impact windows, or on dynamic approaches that consider traffic flow conditions. Both methods are subject to important limitations that are investigated as a part of this research.

As a part of this study, crash data from the Michigan interstate system were used to identify secondary crashes. A detailed review of police crash reports was conducted to verify which crashes are secondary in nature by examining standard fields on the report form, as well as information from the narrative section completed by the investigating officer. The influence of spatiotemporal window sizing (relative to the time and location of the primary crash) was explored with respect to the sensitivity and specificity of secondary crash detection in order to determine thresholds that yield minimal error.
A static approach based on a large number of predefined window sizes was used to compare the rate of secondary crash identification. The static method was shown to consistently overestimate secondary crash occurrence, and these results varied across threshold sizes. Subsequent efforts used a dynamic approach, where the window size was varied based upon changes in speed profiles on the associated road segments. Real-time traffic and speed data were used to identify secondary crashes, and the results varied considerably based upon the method employed. The research also identified contextual environments where the risks of secondary crashes are most pronounced through the estimation of a series of regression models, culminating in guidance to assist road agencies in effectively monitoring and clearing crashes and other incidents to minimize the potential for secondary crashes.

This thesis is dedicated to Mom and Dad. Thank you for always believing in me.

This thesis work is dedicated to my husband, Roozbeh, who has been a constant source of support and encouragement during my journey.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION AND LITERATURE REVIEW
1.1 Existing Methods for Identification of Secondary Crashes
1.1.1 Manual Method
1.1.2 Static Method
1.1.3 Dynamic Method
1.2 Summary and Research Objectives
CHAPTER 2. MANUAL METHOD AND STATIC WINDOW SIZING
2.1 Keyword-Searching Approach/Checking Narratives
2.2 Static Sizing: Spatiotemporal Window
2.3 Analysis and Results
2.4 Discussion and Conclusion
CHAPTER 3. SECONDARY CRASH IDENTIFICATION BASED ON SPEED DATA
3.1 Data Acquisition
3.2 Determination of Spatiotemporal Speed Matrix
3.3 Determination of Impact Area
3.4 Secondary Crash Identification Approach
3.4.1 Speed trend plotted based on average speed data on each day of the week and each segment
3.4.2 Average speed trend at each section, with respect to day of the week
3.4.3 Estimating crash impact duration and secondary crash identification
3.5 Results and Discussion
3.5.1 Secondary Crashes Identified by Manual Method Within Detroit Area
3.5.2 Secondary Crashes Identified Using the Dynamic Method in Detroit Area
3.5.3 Static Sizing: Spatiotemporal Window in Detroit Area
3.6 Discussion and Conclusions
CHAPTER 4. MODELING AND PREDICTING SECONDARY CRASH RISK
4.1 Logistic Regression Analysis
4.1.1 Data Description and Summary
4.1.2 Analysis and Result of Logistic Regression Model
4.2 Negative Binomial Model
4.2.1 Data Summary
4.2.2 Analysis and Result of Negative Binomial
CHAPTER 5. CONCLUSION
BIBLIOGRAPHY

LIST OF TABLES

Table 1-1: Summary of a spatiotemporal window in the static method
Table 1-2: Modeling approaches and contributing factors that affect secondary crashes
Table 2-1: Crashes used in the analysis
Table 2-2: Contributing circumstance codes on Michigan UD-10 crash report
Table 2-3: Example of crashes with secondary crash code that do not meet the secondary crash identification conditions
Table 2-4: Secondary crash results in manual method
Table 2-5: Summary of manual approach result
Table 2-6: Secondary crash distribution for interstate roads in Michigan based on static and manual approach
Table 2-7: Secondary crashes for interstate roads in Michigan based on static and manual approach
Table 2-8: Summary of secondary crash rates in the literature
Table 3-1: Segments and mile points within PR-number 639107
Table 3-2: Crashes on Friday, October 19th, 2018, along I-96 WB
Table 3-3: Result for reviewing the crash reports with secondary related crash code
Table 3-4: Secondary crash results from the dynamic approach for various cut-off scenarios
Table 3-5: Comparison of secondary crashes identified by dynamic and manual method
Table 3-6: Secondary crash distribution for interstate roads in Detroit area based on static and dynamic approach
Table 4-1: Descriptive statistics for analysis dataset
Table 4-2: Logistic regression model results for secondary crash likelihood
Table 4-3: Descriptive statistics of pertinent variables
Table 4-4: Model results for total secondary crashes

LIST OF FIGURES

Figure 2-1: Example crash report and narrative indicating a secondary crash
Figure 2-2: Example crash report and narrative indicating a secondary crash
Figure 2-3: Trade-off between sensitivity and specificity
Figure 2-4: (a) Density function for fixed time grid = 15 minutes and various distance grid, (b) Higher resolution inset of distribution in shorter distance (1 mile)
Figure 2-5: (a) Density function for fixed distance grid = 0.05 and various time grid, (b) Higher resolution inset of distribution in shorter time gap (30 minutes)
Figure 2-6: (a) Normalized plot of accumulated events registered by manual and static methods in windows with a time size of 15 minutes and various distance gaps
Figure 2-7: (a) Normalized plot of accumulated events registered by manual approach and static methods in windows with a gap size of 1 mile and various time gaps, (b) Accuracy of the static method shown by the number of confirmed secondary crashes captured vs. those captured by a static method
Figure 2-8: (a) The ratio of confirmed secondary crash events in the manual approach to the total predicted events in static approach within a gap size of 1 mile and various time intervals, (b) The ratio of confirmed secondary crashes in the manual approach to the total predicted events in the static approach within a gap size of 15 minutes and various distance gaps
Figure 2-9: Comparison of the spatiotemporal distribution of crashes in static window versus distribution of confirmed secondary crashes in relation to previous crash (a) Temporal distribution (b) Spatial distribution
Figure 2-10: Spatiotemporal distribution of secondary crashes in relation to previous crash (a) Temporal distribution (b) Spatial distribution
Figure 2-11: Comparison of secondary crash rates in previous studies which applied static approach with the current study
Figure 3-1: Interstate roadways in the Detroit area
Figure 3-2: Segments for which speed data is missing
Figure 3-3: Speed contour matrix. St,i is the speed on segment i during time interval t
Figure 3-4: Average speed contour matrix. St,i is the speed on segment i during time interval t, with standard deviation σt,i
Figure 3-5: Example of speed contour plot at day 107 within PR-number 639107
Figure 3-6: Demonstration of PR-number 639107 (I-96 WB)
Figure 3-7: (a) Speed trend in section 1 within PR-number 639107 (Sunday=1, Monday=2, Tuesday=3, Wednesday=4, Thursday=5, Friday=6, Saturday=7), (b) PR-number 639107 (I-96 WB) with 8 XD-segments
Figure 3-8: Yearly speed average for all 8 segments within PR-number 639107
Figure 3-9: Yearly speed average within each time slot (PR-number 639107)
Figure 3-10: Average speed profile for each day of the week within PR-number 639107
Figure 3-11: Difference between daily and yearly average speed (October 19th, 2018)
Figure 3-12: Different average speed profiles for Monday, February 5th (02/05/2018), within PR-number 639107
Figure 3-13: Contour plot of the density of crashes in 2018 within PR-number 639107
Figure 3-14: Detection of secondary crashes using speed data
Figure 3-15: (a) The ratio of actual confirmed events in the dynamic method to the total predicted events in the static approach within a gap size of 1 mile and various time intervals, (b) The ratio of actual confirmed events in the dynamic method to the total predicted events in the static approach within a gap size of 15 minutes and various distance gaps
Figure 3-16: Spatiotemporal distribution of secondary crashes in relation to previous crash (a) Temporal distribution (b) Spatial distribution

CHAPTER 1. INTRODUCTION AND LITERATURE REVIEW

Traffic incidents, such as crashes and vehicle breakdowns, are a major source of congestion in urban areas, accounting for 30 to 40 percent of all congestion (Skabardonis et al., 1995; Ozbay and Kachroo, 1999). The congestion caused by an incident can also increase the potential for upstream traffic crashes. Such events, generally referred to as secondary crashes, usually increase the time needed for traffic flow to return to normal (i.e., pre-incident) levels.
Between 2 and 15 percent of initial incidents can cause secondary crashes, which further complicate traffic operations (Moore, Giuliano and Cho, 2004; Hirunyanitiwattana, 2006). Secondary crashes are one of the many undesirable consequences of crashes and other types of incidents. Such crashes are typically defined based upon the congested spatiotemporal boundaries impacted by primary crashes (Yang et al. 2018). Secondary crashes have increasingly been recognized as a significant problem on freeways, frequently affecting both traffic operations and safety (Imprialou et al., 2014). As reported by Owens et al. (2010), as many as 20 percent of all crashes and 18 percent of all fatalities on freeways result from secondary crashes. It has also been shown that the occurrence of an earlier crash could increase the risk of secondary crashes by more than six times (Tedesco et al., 1994; Owens et al., 2010). Karlaftis et al. (1999) found that each additional minute of clearance time for an initial incident may raise the likelihood of secondary crashes by about 2.8 percent.

There has been considerable variability in estimates of the proportion of all crashes that are secondary in nature. This is due to several factors, including differences in the contextual environment of these studies and challenges that are inherent in determining which crashes are directly due to the occurrence of a prior crash (Sarker et al., 2015). For example, Raub (1997) found that more than 15 percent of the crashes reported by police may be secondary in nature. Karlaftis et al. (1999) examined primary crash characteristics and showed that more than 15 percent of all crashes might have resulted from an earlier incident. Moore et al. (2004) estimated secondary crash rates between 1.5 and 3.0 percent, significantly lower than previous studies suggested (Moore, Giuliano and Cho, 2004). Zhan et al.
(2008) investigated incidents that resulted in lane blockages as potential causes of secondary crashes on Los Angeles freeways using crash records and traffic data from inductive loop detectors. The results showed that only 7.9 percent of all lane blockage incidents resulted in secondary crashes (Zhan et al. 2008).

Due to the substantial economic and safety risks associated with secondary crashes, transportation agencies have taken various measures to minimize the potential for, and mitigate the impacts of, such crashes (Yang et al. 2018). One main challenge in investigating this issue is the inherent difficulty in effectively identifying which crashes are actually due to a prior crash or other incident (Sarker et al., 2015). Existing studies have made great efforts to explore the underlying mechanisms of secondary crashes, and the relevant methodologies for the identification, modeling, and prevention of these crashes have evolved. To date, there is significant variability in both the results and the underlying methods used to identify secondary crashes (Yang et al. 2018).

1.1 Existing Methods for Identification of Secondary Crashes

Research has generally defined secondary crashes based on congested spatiotemporal boundaries impacted by primary crashes (Yang et al. 2018). The reliability of the spatial and temporal information of the prior incident is critical to the accuracy of secondary crash detection. Defining the impact area of an initial incident or crash is generally the first step in identifying these spatiotemporal boundaries. Various research studies have investigated different approaches to identify and analyze secondary crashes. These studies can be classified into three main types: manual identification of crashes using real-time data (e.g., cameras from traffic management centers) or historical records (i.e., police crash reports), automatic identification using static spatiotemporal windows, and automatic identification using dynamic windows.
In the latter two approaches, after identifying the impact area of the primary crash, the second step is to identify the secondary crashes that occur within the resultant spatiotemporal boundaries (Kitali, Alluri, Sando and Lentz, 2019; Kitali, Alluri, Sando and Wu, 2019). The following sections provide further descriptions of these three approaches to secondary crash identification.

1.1.1 Manual Method

Manual identification of secondary crashes can be done either in real time or using historical data from police crash reports. Real-time identification requires visual verification of crashes through active monitoring. This is typically done by transportation agency personnel, such as staff from transportation management centers, incident responders, or law enforcement. Agencies have traditionally used this approach to identify and respond to events in near real time. The process is simple and straightforward; however, manual identification is inefficient and can be unreliable and inconsistent for the purposes of large-scale identification (Kitali, Alluri, Sando and Lentz, 2019). This approach is also only viable in areas where there is continuous coverage of the roadway network through closed-circuit cameras, courtesy patrol vehicles, or other resource-intensive approaches.

Large-scale manual identification of secondary crashes has been done in a limited number of studies using information from police crash reports, which are a very useful source of information for such purposes (Zhang et al. 2020). In a study by Zheng et al. (2015), five years of crash data from Wisconsin were analyzed. A procedure was developed to automatically evaluate the narrative sections of police crash reports and detect potential secondary crashes if the narrative explicitly mentioned that the crash was secondary in nature. The results showed that the average distance from the primary crash to the upstream secondary crash was 0.29 miles.
In addition, the observed average time lapse between the primary and secondary crashes was 17 minutes (Zheng et al., 2015).

1.1.2 Static Method

The second approach, referred to as the static method, was first proposed in a study by Raub (1997). In the static approach, a fixed spatiotemporal threshold is used to identify potential secondary crashes. Raub (1997) used a spatial threshold of 1600 meters upstream of the primary crash and a temporal threshold of 15 minutes after the clearance of the crash for identification purposes (Raub, 1997a). Several studies have investigated the spatiotemporal distribution of secondary crashes using various thresholds (Tedesco et al., 1994; Raub, 1997a; Karlaftis et al., 1999a; Chang and Steven, 2002; Moore, Giuliano and Cho, 2004; Kopitch and Saphores, 2011; Jalayer, Baratian-Ghorghi and Zhou, 2015; Tian, Chen and Truong, 2016). Table 1-1 summarizes the spatiotemporal windows that have been used in prior research that utilized a static approach.

Chung (2013) found an average time gap of 65.81 minutes and an average distance of 1.34 miles between primary and secondary crashes (Chung, 2013). Junhua et al. (2016) investigated the spatiotemporal gaps between crashes and found an average time gap of 74 minutes and a mean distance of 4.52 miles. In addition, gaps of less than one mile and less than 10 minutes were observed in 19.4 and 26.5 percent of the cases, respectively (Wang, Liu, et al., 2016). Kitali et al. (2019) concluded that 90 percent of secondary crashes were detected within a spatial threshold of 5 miles and a temporal threshold of 150 minutes. Based on this study, the distance gap was shown to vary greatly under different traffic conditions (Kitali, Alluri, Sando and Lentz, 2019). Defining representative spatial and temporal thresholds plays a critical role in the success of this method.
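
The fixed-window logic described above can be illustrated with a minimal sketch. It is not the procedure used in any of the cited studies. The `Crash` record, the function name `static_window_candidates`, and the default thresholds (1 mile, 15 minutes, following the values in Table 1-1) are assumptions made for illustration. The sketch also assumes that mile points increase in the direction of travel, and it measures the time gap from the primary crash time rather than from its clearance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Crash:
    """Hypothetical crash record; field names are illustrative only."""
    crash_id: str
    route: str          # route and direction, e.g. "I-96 WB"
    milepoint: float    # assumed to increase in the direction of travel
    time: datetime


def static_window_candidates(crashes, max_miles=1.0, max_minutes=15.0):
    """Return (primary_id, candidate_id) pairs inside a fixed window.

    A later crash is a candidate secondary crash if it is on the same
    route, at or upstream of the earlier crash by at most `max_miles`,
    and occurs no more than `max_minutes` after it.
    """
    ordered = sorted(crashes, key=lambda c: c.time)
    window = timedelta(minutes=max_minutes)
    pairs = []
    for i, primary in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later.time - primary.time > window:
                break  # sorted by time, so no later crash can qualify
            if later.route != primary.route:
                continue
            # Positive value: the later crash is upstream of the primary.
            upstream_miles = primary.milepoint - later.milepoint
            if 0.0 <= upstream_miles <= max_miles:
                pairs.append((primary.crash_id, later.crash_id))
    return pairs
```

Every pair returned is only a candidate. As the discussion that follows makes clear, a fixed window cannot tell whether the later crash was actually caused by the earlier one, so widening `max_miles` or `max_minutes` adds both true and false identifications.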
There are also inherent trade-offs involved, as larger spatiotemporal windows lead to better sensitivity (i.e., more of the crashes that are actually secondary in nature are identified), but at the expense of worse specificity (i.e., more crashes that are not actually secondary are falsely identified as such) (Zheng et al., 2015). Moreover, a fixed spatiotemporal threshold may underestimate secondary crash frequencies when the window is small and overestimate them when the window is large. The static method is somewhat subjective and arbitrary and does not account for the dynamic nature of traffic, in which the appropriate spatiotemporal thresholds vary with the level of traffic congestion and various other factors (Zhang, Green, and Chen 2019).

Table 1-1: Summary of a spatiotemporal window in the static method

Author | Spatial Boundaries | Temporal Boundaries
Raub (1997) | 1 mile | 15 minutes
Karlaftis et al. (1999) | 1 mile | 15 minutes
Moore et al. (2004) | 2 miles | 120 minutes
Hirunyanitiwattana and Mattingly (2006) | 2 miles | 60 minutes
Pigman et al. (2011) | 3.62 miles | 42 minutes
Chung (2013) | 1.34 miles | 65.81 minutes
Wang et al. (2016) | 4.518 miles | 74 minutes
Kitali (2019) | 5 miles | 150 minutes
Chang et al. (2003) | 2 miles | 120 minutes
Zhan et al. (2008) | 2 miles | 15 minutes

1.1.3 Dynamic Method

Finally, the third approach is a dynamic method that establishes the spatiotemporal thresholds based on the primary incident's characteristics and concurrent traffic flow conditions. To overcome the limitations of the static approach, recent studies have investigated various dynamic approaches, such as queuing models, speed contours, shockwave theory, and vehicle probe data, to identify secondary crashes (Junhua et al. 2016; Park and Haghani 2016b; Xu et al.
2016; Zhang, Cetin, and Khattak 2015).

1.1.3.1 Queuing Model

Dynamic approaches mainly use prevailing traffic flow conditions to identify secondary crashes and may better capture traffic flow and the queue formation process (Yang, Guo, and Xu 2019). Several studies developed queuing models to capture the progression of the region in which secondary crashes occur (Sun and Chilukuri 2010; Sun and Chilukuri 2007; Vlahogianni, Karlaftis, and Orfanou 2012; Chengjun Zhan, Gan, and Hadi 2009). Traffic arrival rate, departure rate, crash duration, lane capacity, and travel speed are some of the factors used to estimate the vehicle queue length (Yang et al. 2017).

1.1.3.2 Shockwave Theory

Shockwave theory is used to evaluate the dynamic traffic impact of a primary crash. In a study by Zheng et al. (2014), shockwave theory was used to model the dynamic impact area of primary crashes and identify secondary crashes occurring within these areas on large-scale transportation systems. The study utilized 2010 data from nearly 1,500 miles of freeways in Wisconsin. The results showed that over 85 percent of secondary crashes were of three major crash types: two-vehicle rear-end collisions, multiple-vehicle rear-end collisions, and sideswipes (Zheng et al., 2014). A total of 49,753 crashes from 2010 to 2012 on California interstate freeways, along with their corresponding upstream loop data, were analyzed using a shockwave boundary filtering method to identify secondary crashes. Based on the results, secondary crashes accounted for 1.08 percent of crashes, much lower than previous research estimates (Wang et al. 2016). In another study, traffic shockwave speed and volume at the time of a primary crash were considered in order to identify secondary crashes. A logistic regression model was developed to investigate the factors contributing to secondary crash occurrence.
The study analyzed three years of crash records from California interstate freeways. The results showed that primary crashes with long durations may substantially increase the likelihood of secondary crashes. In addition, unsafe speed and weather were found to be factors contributing to secondary crash occurrence (Wang, Xie, et al. 2016).

1.1.3.3 Vehicle Probe Data

Vehicle probe technology is used for real-time traffic estimation, and it is common practice for data providers to report these data on real-time traffic message signs. Studies have used vehicle probe technology to explore the dynamics of traffic evolution following a primary crash. This method was shown to perform better than the static method in identifying secondary crashes (Park and Haghani 2016a; Park, Haghani, and Hamedi 2013; Yang et al. 2017). In another study, vehicle probe technology was used to develop a new data-driven analysis framework, consisting of three major components, to support the identification of secondary crashes. First, the impact area of a primary crash was detected. Then, the boundary of the impact area was estimated, and secondary crashes within the boundary were identified. The test results show that the proposed approach can best describe the impact area and identify up to 95 percent of the simulated crashes (Yang et al. 2017). However, this approach is limited to freeway segments for which probe vehicle data are available.

1.1.3.4 Speed Contour

Wang and Jiang (2020) proposed an approach that leverages the spatiotemporal evolution of shockwaves in speed contour plots to identify secondary crashes on freeways. It has been demonstrated that the defined region corresponding to a single primary crash is generally consistent with the spatiotemporal evolution of shockwaves (Wang and Jiang 2020). Speed contour plots were used in a study by Yang et al. (2014) to identify secondary crashes.
Based on the results, 75 and 50 percent of all secondary crashes occur within two hours and two miles upstream of the primary crash, respectively. In addition, rear-end crashes were found to be the dominant secondary crash type, and improper lane changing, distracted driving, and unsafe speed were considered significant contributing factors (Yang et al. 2014).

Kitali et al. (2019) sought to identify the impact area of primary crashes using speed data. According to the study, the process of identifying secondary crashes varies depending on the spatial and temporal influence area of the primary crash. Prevailing speed data in each section of the freeway were used to identify the impact range of the primary crash. All crashes that subsequently occurred within that impact area were considered secondary crashes. The study's main objective was to determine the effect of traffic flow characteristics that change over space and time, such as speed, which has a significant impact on queue formation following the primary crash. Results from the study showed that almost 8 percent of crashes were secondary crashes, and that more than 75 percent of secondary crashes were due to congested traffic conditions (Kitali et al. 2019).

Following the identification of secondary crashes, some previous studies have focused on investigating the major factors contributing to their occurrence. Raub (1997) found that clearance time, peak hours, and weekdays are associated with more secondary crashes (Raub, 1997a). Hirunyanitiwattana (2006) identified secondary and primary crash characteristics on the California Highway System. The study revealed that secondary crash rates increase in regions with high traffic volumes during morning and evening peak hours (Hirunyanitiwattana, 2006). Karlaftis et al. (1999) applied a logistic regression model to examine which primary crash characteristics are associated with the likelihood of a secondary crash.
They suggested that the type of vehicle involved, the clearance time, the season, and the lateral location of the primary crash are significant factors (Karlaftis et al. 1999). Additional studies have investigated factors that affect secondary crash occurrence, as shown in Table 1-2. The majority of these studies used logistic regression models, and some used probit models, to evaluate whether there are significant differences between primary and secondary crashes (Khattak, Wang, and Zhang 2010; Khattak, Wang, and Zhang 2009; Vlahogianni et al. 2010; Vlahogianni, Karlaftis, and Orfanou 2012; Yang et al. 2014; Yang, Bartin, and Ozbay 2013; Zhan et al. 2008; Chengjun Zhan, Gan, and Hadi 2009).

Table 1-2: Modeling approaches and contributing factors that affect secondary crashes

Author | Method | Test variables
Karlaftis et al. (1999) | Logistic regression | Clearance time, vehicle type, vehicle location, season, day of week
Hirunyanitiwattana and Mattingly (2006) | Proportional test | Time of day, roadway classification, primary crash, severity level, crash type
Zhan et al. (2008) | Logistic regression | Incident duration, time, environmental condition, incident type, location and traffic condition, lane closure, injuries, vehicle type
Zhan et al. (2009) | Logistic regression | Incident duration, time, environmental condition, incident type, location, traffic condition, lane closure, injury condition, vehicle type
Khattak et al. (2009) | Binary probit regression models | Detection source, crash type, response vehicles, AADT, whether left shoulder affected, whether during peak hours, vehicle involved
Zhang and Khattak (2010) | Ordinal regression | Incident duration, whether truck involved, number of vehicles, lane blockage, segment length, number of lanes, curve, AADT
Vlahogianni et al. (2010) | Bayesian network | Time, number of vehicles, distance, duration, type of vehicle, location, maximum queue length, duration of queue observed upstream
Zhang and Khattak (2011) | Ordinary least squares (OLS) regression | The characteristics of primary crashes, road geometry, traffic
Vlahogianni et al. (2012) | Probit models | Duration, crash type, number of lanes, number of vehicles, heavy vehicle, travel speed, hourly volume, rainfall, downstream geometry, upstream geometry
Yang et al. (2013a) | Logistic regression | Time period, rear end, severity, duration, work zone, weekend, winter, lane closure, truck involved
Yang et al. (2013b, 2014a,b) | Probit model | The frequency of secondary crashes, spatiotemporal distributions, clearance time, crash type, severity

1.2 Summary and Research Objectives

Secondary crashes affect traffic operations and safety, and they serve as a performance measure in evaluating traffic incident management programs. Several approaches have been introduced to identify secondary crashes, with static and dynamic methods being the most widely used. Several thresholds have been suggested for defining the primary crash impact area and secondary crashes. However, these existing methods have some important limitations. For example, the static threshold method does not consider the dynamic nature of traffic conditions, introducing an implicit assumption that crashes occur at uniform rates irrespective of traffic flow conditions. Further, many studies have focused on understanding the reliability of a single window size and have not included extensive validation through a detailed review of police-reported crash data. As such, static approaches generally result in an overestimation of actual secondary crashes. Dynamic approaches address this limitation by determining the spatiotemporal thresholds of primary crashes based on real-time traffic flow characteristics such as speed and density.
However, dynamic models rely heavily on real-time traffic data, which are costly and available only at limited locations. For instance, approaches based on queue length estimation require detailed queuing information, which may not be available at every location.

The goal of this research is to advance our understanding of the nature of secondary crashes, including the circumstances under which such crashes are most likely to occur. To address this goal, this study aims to:
1. Conduct a detailed investigation of police crash reports in order to identify the actual number and rate of secondary crashes on the Michigan interstate network;
2. Evaluate various spatial and temporal thresholds in terms of their precision and accuracy in identifying potential secondary crashes;
3. Compare scenarios under which various static and dynamic methods present advantages or disadvantages in identifying secondary crashes;
4. Assess the frequency of secondary crashes as a function of roadway characteristics.

As a part of this investigation, the research provides important insights into key areas, such as the trade-off between the sensitivity and specificity of static and dynamic models, particularly as it relates to the effect of window sizing or spatiotemporal thresholds on data reliability. This includes understanding the effect of the size of the static window in a large dataset and the correlation between static window predictions of secondary crashes and the actual number of secondary crashes. This research also advances our understanding of dynamic secondary crash identification by estimating the impact range of primary crashes on upstream traffic using speed data and identifying secondary crashes that occur within this range. This method helps to better capture the effects of changes in traffic flow characteristics that occur over space and time and affect issues such as queue formation due to primary crashes.
Compared to previous spatiotemporal thresholds, the proposed approach provides an accurate, feasible impact area for secondary crash identification. The research also presents a sensitivity analysis of the effect of different spatial and temporal thresholds of primary crashes on the detection of secondary crashes. Lastly, following the identification of secondary crashes through both the static and dynamic methods, this research involves the development of a series of regression models in order to identify the interrelationships between secondary crash occurrence and various roadway and traffic characteristics of interest.

The remainder of this dissertation is organized as follows:
• Chapter 2 presents the results of the application of static methods for secondary crash identification. This includes a crash-pairing algorithm developed to select spatially and temporally nearby crash pairs. Further, enhancements to the static methods are introduced by optimizing the trade-off between sensitivity and specificity to determine the effect of window sizing or spatiotemporal thresholds on the reliability of the data. In addition, the manual approach is used to define the control set, which is used to validate the accuracy of the static methods used to identify secondary crashes. Furthermore, following the identification of secondary crashes, logistic regression and negative binomial models were developed in order to investigate the major factors contributing to the occurrence of secondary crashes.
• Chapter 3 presents a dynamic method for identifying secondary crashes. Crash data and speed data from the Detroit freeway area were used to identify the impact area of the primary crash and, in turn, the secondary crashes. In addition, the manual approach is used to define the control set, which is used to validate the accuracy of the dynamic methods used to identify secondary crashes.

CHAPTER 2.
MANUAL METHOD AND STATIC WINDOW SIZING

Static and dynamic methods were used to identify secondary crashes. In the static method, a fixed spatial and temporal threshold is used for secondary crash identification. In the dynamic method, the impact area of a primary crash varies depending on queue length and traffic flow characteristics. The static method therefore does not reflect actual traffic flow conditions.

One of the most important aspects of this research is determining whether a crash is actually secondary in nature. This determination is ultimately based upon information from the police crash report forms. To this end, a manual approach is used to identify secondary crashes from police crash reports, and the result is used to validate the accuracy of the static method. In the manual approach, narratives from the police crash reports were checked manually, whereas in the static approach, fixed spatiotemporal thresholds were used to identify secondary crashes.

Data used in this study are drawn from police-reported crash data from the Michigan Traffic Crash Facts (MTCF) data query tool, which is maintained by the Michigan State Police (MSP) Office of Highway Safety Planning (OHSP). This tool gives users free access to query all crash reports from Michigan law enforcement agencies dating back to 2004. Detailed information is available for each crash, including PDF copies of the police crash reports. With respect to this study, these reports include essential details, such as the date, time, and location of the crash, and a crash narrative section, which provides details of the circumstances of the crash as determined by the investigating officer. The study area includes the entire Michigan interstate mainline system.
In 2018, a total of 312,798 crashes occurred throughout Michigan and, based on the Highway Class filter on MTCF, 35,123 crashes were indicated to have occurred on the interstate system. Next, crashes occurring on either an interstate exit or entrance ramp were removed using a roadway inventory file provided by the Michigan Department of Transportation (MDOT). Based on this filter, 7,359 crashes that occurred on ramps were excluded. Subsequently, 363 crashes were removed where the crash report was either missing or incomplete. The final sample included 26,679 crashes. The individual crash report forms were all subsequently downloaded from MTCF, along with pertinent summary information (e.g., crash-ID, date, time, location, crash narrative) in spreadsheet format. Table 2-1 provides information about the crashes included in the analysis.

Table 2-1: Crashes used in the analysis
Criteria | Number
Total crashes in interstate mainline Michigan | 34,437
Missing or incomplete crash reports | 363
Crashes on ramps | 7,395
Total crashes included in the analysis | 26,679

2.1 Keyword-Searching Approach/Checking Narratives
One of the most important aspects of this research is determining whether a crash is actually secondary in nature. This determination is ultimately based upon information from the police crash report forms. To this end, information from two primary sources in the crash report form was utilized: a series of standard fields that are used to designate various subsets of crashes, and a keyword search or manual approach that was used to review the narrative section of the police crash reports. After those crashes that were secondary in nature had been identified, the static window method was evaluated to assess the efficacy of various fixed time and distance thresholds in identifying secondary crashes.
At the onset of the study, reports for all crashes occurring on the Michigan interstate system in 2018 were obtained from the MTCF database. Police crash reports are critical to identifying secondary crashes, as the investigating officers generally have either first- or second-hand information regarding the cause of a crash and various precipitating factors. However, the reporting accuracy depends on officers' training, their understanding of how such crashes are defined, and their knowledge that a primary crash has occurred (Zhang et al. 2020). On the Michigan UD-10 crash report form, the contributing circumstances field indicates those factors that precipitated the occurrence of a crash (see Table 2-2). This field is also useful for explicitly identifying the occurrence of a secondary crash.

Table 2-2: Contributing circumstance codes on Michigan UD-10 crash report
None
Other
Backup - Other Incident
Glare
Backup - Reg. Congestion
Shoulders
Prior Crash
Traffic Control Device
Unknown

For each crash report, the unique crash identification number (crash-ID) was determined, along with data from the contributing circumstances field and the officer's crash narrative. The contributing circumstances field provides a list of common factors that are found to precipitate the occurrence of a crash. This field includes two primary codes that may indicate that a secondary crash has occurred: (1) prior crash; and (2) backup due to other incident. However, prior experience has shown that there is often some variability in how different officers complete this and other related fields on the crash report form.

Consequently, as a first step, all narratives for crashes where one of the secondary-crash-related contributing circumstances was indicated were manually reviewed in order to assess whether the crash was truly secondary in nature.
Under this method, a crash is determined to be secondary if either of two conditions is met: 1) the prior crash contributing circumstance was selected, and there was no conflicting information in the narrative section (see the examples in Figures 2-1 and 2-2); or 2) the narrative section explicitly indicated the occurrence of a prior crash, though one of the other (i.e., not "prior crash") contributing circumstances was selected. Based on the crash code, 1,896 crashes were coded as being due, at least in part, to a prior crash under the contributing circumstance field. The crash narratives for all of those crashes were reviewed manually, and 277 crashes (14.6 percent) were found not to meet the conditions and were therefore considered unrelated to prior crashes. For such crashes, a reason other than a prior crash was mentioned in the narrative as the cause of the crash; see Table 2-3 for examples of miscoded crashes. In cases where the crash narrative section was blank, the crash was considered a secondary crash.

Table 2-3: Examples of crashes with a secondary crash code that do not meet the secondary crash identification conditions
Crash-ID | Crash Code | Narrative
1253395 | Backup Due to Other Incident | Unit 1 was traveling E/B on I-96 when she lost control, ran off the roadway to the left, and struck the cable barrier.
1253356 | Backup Due to Other Incident | Vehicle 1 spun out after losing control and was struck by Vehicle 2.
1256253 | Prior Crash | Driver 1 lost control after hitting a patch of ice. She was adamant that she was not going to fast and that the crash was caused by ice. She left the road and struck the cable barrier.

In the second step, the keyword-searching approach was used to identify additional target crashes based on the crash narratives.
The keywords used in this method were "previous crash," "another crash," "prior crash," "previous accident," "another accident," and "prior accident," which are phrases that officers use in the narratives to describe a secondary crash. These keywords were chosen after a manual review of the secondary crash narratives identified in the previous step. Based on this method, an additional 249 secondary crashes were identified. The findings from this method also show that law enforcement typically coded the contributing circumstance as backup due to regular congestion or other incidents rather than as a prior crash. In total, 1,872 secondary crashes were identified based on the crash code and keyword-searching approach. There were 882 cases where the contributing circumstance was noted as a prior crash. Among these, 155 were found to have been due to some other (i.e., non-crash) event, such as a vehicle breakdown. Similarly, 892 of the 1,014 crashes where the contributing circumstance was backup caused by another incident appeared to have been due to a prior crash. Table 2-4 summarizes the secondary crash results from the manual approach.

Table 2-4: Secondary crash results in manual method
Contributing circumstances (crash code) | Total Nr. of crashes | Confirmed Nr. of secondary crashes | Other (non-secondary) crashes
Prior Crash | 882 | 731 | 155
Backup - Other Incident | 1,014 | 892 | 122
Backup - Reg. Congestion (Identified from Narrative) | 5,221 | 62 | 5,159
Other (Identified from Narrative) | 19,562 | 187 | 19,373
Total | 26,679 | 1,872 | 24,807

Table 2-5 shows the final result from the manual approach. Based on this method, approximately 7.02 percent of the interstate mainline crashes are considered secondary crashes.
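The two screening steps described above (review triggered by the contributing circumstance code, then a keyword search of the narrative) can be sketched as follows. This is a minimal illustration, not the procedure actually coded for the study: the record keys `circumstance` and `narrative` and the function name `flag_for_review` are hypothetical, and a flagged record would still require the manual narrative review described in the text before being confirmed as secondary.

```python
import re

# Keywords listed in the text for the narrative search.
KEYWORDS = (
    "previous crash", "another crash", "prior crash",
    "previous accident", "another accident", "prior accident",
)
# One case-insensitive pattern; \s+ tolerates extra spaces or line breaks
# inside a phrase.
KEYWORD_RE = re.compile(
    "|".join(r"\b" + kw.replace(" ", r"\s+") + r"\b" for kw in KEYWORDS),
    re.IGNORECASE,
)

# Contributing-circumstance codes that trigger a manual review (step 1).
REVIEW_CODES = {"Prior Crash", "Backup - Other Incident"}


def flag_for_review(record):
    """Return why a crash record should be reviewed manually, or None.

    `record` is a dict with hypothetical keys "circumstance" and "narrative".
    """
    if record["circumstance"] in REVIEW_CODES:
        return "code"      # step 1: secondary-crash-related code selected
    if KEYWORD_RE.search(record.get("narrative", "")):
        return "keyword"   # step 2: narrative mentions a prior crash
    return None
```

A record flagged as "code" corresponds to the first step, and one flagged as "keyword" to the second; neither flag replaces the reviewer's judgment about conflicting information in the narrative.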
Table 2-5: Summary of manual approach result
Type of crash | Number of crashes | Percentage
Secondary crash | 1,872 | 7.02
Crashes not due to congestion or another crash | 24,804 | 92.98
Total | 26,679 | 100

Figure 2-1: Example crash report and narrative indicating a secondary crash

Figure 2-2: Example crash report and narrative indicating a secondary crash

2.2 Static Sizing: Spatiotemporal Window
Under the static approach, a crash is classified as secondary in nature if it falls within a predefined time-space window originating from another (prior) crash. In order to identify potential secondary crashes, each crash is associated with its corresponding interstate road number, its geospatial location on the road, and the associated date and time. Using linear referencing in ArcGIS, the exact location of each crash along a particular highway was determined based upon a route-specific identification number and a mile marker. Consecutive crashes on each road segment were identified based on date and time. The distance between two consecutive crashes was calculated as the difference between the corresponding mile points. Each direction was considered separately, and only crashes occurring in the same direction as, and upstream of, the primary crash were considered. For each crash, a spatiotemporal window was assigned, and the events in the window were recorded as secondary crashes. In view of the large size of the database, nearest-neighbor methods were coded in Maple, a mathematical software package, to enable global identification of the nearest event in the crash database (see footnote 1). The problem with the spatiotemporal window is summarized in Figure 2-3, which shows that increasing the size of the window will increase true positives but will also increase false positives. Accordingly, there is an inherent trade-off between the sensitivity and specificity of the method, which can be tuned to achieve a comprehensible result.
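The static window logic described above can be illustrated with a short Python sketch. It is not the Maple nearest-neighbor code used in the study. The dictionary keys (`id`, `route`, `direction`, `milepoint`, `time`) are hypothetical, and the sketch assumes, purely for simplicity, that mile points increase in the direction of travel, so that an upstream crash has a smaller mile point than the primary crash.

```python
from datetime import datetime, timedelta


def static_window_candidates(crashes, max_miles, max_minutes):
    """Pair each crash with later crashes inside a fixed space-time window.

    Each crash is a dict with hypothetical keys: id, route, direction,
    milepoint (assumed to increase in the direction of travel) and
    time (a datetime). Returns (primary_id, candidate_id) pairs.
    """
    window = timedelta(minutes=max_minutes)
    ordered = sorted(crashes, key=lambda c: c["time"])
    pairs = []
    for i, primary in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later["time"] - primary["time"] > window:
                break  # sorted by time, so no later crash can qualify
            same_road = (later["route"], later["direction"]) == (
                primary["route"], primary["direction"])
            # Upstream distance is positive under the stated assumption.
            upstream = primary["milepoint"] - later["milepoint"]
            if same_road and 0 <= upstream <= max_miles:
                pairs.append((primary["id"], later["id"]))
    return pairs
```

Calling the function with, for example, `max_miles=1` and `max_minutes=15` returns the candidate pairs for that window; repeating the call over a grid of window sizes gives the counts that are compared against the manually confirmed crashes.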
Here, sensitivity is defined as the probability of correctly identifying a secondary crash, whereas specificity is the probability of correctly identifying a non-secondary crash. In the following section, this trade-off will be explored.

Footnote 1: Maplesoft, a division of Waterloo Maple Inc. (2019). Maple. Waterloo, Ontario. Retrieved from https://hadoop.apache.org

Figure 2-3: Trade-off between sensitivity and specificity

2.3 Analysis and Results
Sizing of ST window: After the spatiotemporal gaps between consecutive crash events within the 2018 interstate crash dataset were determined, different time and distance intervals were used to define spatiotemporal windows of different sizes. Figure 2-4 (a) shows the probability density function for all crashes happening within 15 minutes of another crash and within different distance gaps from 1/2 to 5 miles. The inset shows further details within a one-mile radius. As can be seen, most of the crashes that are potentially secondary in nature occur within the first 0.2 miles and 15 minutes from the primary crash. Figure 2-5 shows the probability density function for all crashes happening within the first mile gap and different time gaps from 0 to 120 minutes. As shown in Figure 2-5, within the first-mile gap, most crashes occur in the first 7-minute period after the primary crash, with gradual and persistent decreases in subsequent thresholds.

Assuming that the average frequency of incidents is constant throughout the search space, the elevated density near the peak reflects the high specificity (the probability of correctly identifying a non-secondary crash) of the static window for crashes that occurred in that region. As expected, specificity fades as the window size increases, while sensitivity (the probability of correctly identifying a secondary crash) increases.
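For reference, the verbal definitions of sensitivity and specificity given above correspond to the standard confusion-matrix formulas shown below. The symbols TP, FN, TN, and FP are notation introduced here for illustration and are not used elsewhere in the text: TP is the number of confirmed secondary crashes captured by the window, FN the confirmed secondary crashes missed by it, TN the non-secondary crashes correctly left outside it, and FP the non-secondary crashes falling inside it.

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

Under these definitions, enlarging the window can only move crashes from FN to TP and from TN to FP, which is the trade-off described in the text.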
Figure 2-4: (a) Density function for fixed time grid = 15 minutes and various distance grids, (b) Higher-resolution inset of the distribution over shorter distances (1 mile). Axes: probability versus distance (mile).

Figure 2-5: (a) Density function for fixed distance grid = 0.05 and various time grids, (b) Higher-resolution inset of the distribution over shorter time gaps (30 minutes). Axes: probability versus time (minute).

In order to understand the effect of window size on the accuracy of the predictions, one can plot the predictions obtained with a spatiotemporal approach against the actual confirmed events as determined by the crash code and manual approach described previously. To this end, Figure 2-6 (a) was plotted with respect to the following four parameters:
• $N_M[L,T]$: Number of confirmed secondary crashes identified by the manual approach that fall within a specific spatiotemporal window from the first crash
• $N_{M\text{-Total}}$: Total number of confirmed secondary crashes identified by the manual approach in the largest window
• $N_S[L,T]$: Number of crashes that exist within a specific spatiotemporal window from the first crash
• $N_{S\text{-Total}}$: Number of crashes in the largest window

Figure 2-6 (a) shows the normalized plot of the secondary crashes occurring within spatiotemporal windows of different distances, in increments of one mile (a fixed time gap of 15 minutes is assumed), against the total number of secondary crashes identified by manual word searching within the largest window. Here the largest window is 15 minutes and 6 miles.
The total number of crashes within this window is 977, of which 171 are confirmed secondary crashes. The red line shows the normalized plot of the manually identified secondary crashes with respect to different window sizes. As expected, as the window size increases, the spatiotemporal window eventually covers all secondary crashes identified by the manual approach. To describe what percentage of the crashes that fall within the spatiotemporal window are secondary crashes, Figure 2-6 (b) was developed, in which the ratio of secondary crashes to the total number of crashes is plotted for different window sizes. Similarly, as the spatiotemporal window grows, the sensitivity of the static method fades due to the large number of non-secondary crashes that are included (false positives).

Figure 2-6: (a) Normalized plot of accumulated events registered by the manual and static methods in windows with a time size of 15 minutes and various distance gaps. Axes: normalized number of identified events, $N_M[L,T]/N_{M\text{-Total}}$ and $N_S[L,T]/N_{S\text{-Total}}$, versus radius (mile). (b) Accuracy of the static method shown by the number of confirmed secondary crashes captured vs. those captured by the static method, for a time size of 15 minutes and various distance gaps. Axes: ratio of events identified by each method, $N_M[L,T]/N_S[L,T]$, versus radius (mile).

Figure 2-7 shows that a similar statement can be made when the windows grow in the time dimension as well. While the offset and the slope of the normalized static method and manual approach curves may be different (compare Figure 2-6 (a) and Figure 2-7 (a)), the blue line shows the ratio of crashes within the designated spatiotemporal window to the total number of crashes in the static method. This fact is better shown in Figures 2-6 (b) and 2-7
(b), where the ratio of confirmed secondary crashes identified in the manual approach (red curve) to crashes within the spatiotemporal window (blue curve) is plotted. It should be noted that the total number of crashes in the largest window differs between the 6-mile distance gap (Figure 2-6) and the 300-minute time gap (Figure 2-7); therefore, the percentages in Figure 2-6 (a) and Figure 2-7 (a) vary accordingly.

Figure 2-7: (a) Normalized plot of accumulated events registered by the manual approach and static methods in windows with a gap size of 1 mile and various time gaps. Axes: normalized number of identified events, $N_M[L,T]/N_{M\text{-Total}}$ and $N_S[L,T]/N_{S\text{-Total}}$, versus time (minute). (b) Accuracy of the static method shown by the number of confirmed secondary crashes captured vs. those captured by the static method. Axes: ratio of events identified by each method, $N_M[L,T]/N_S[L,T]$, versus time (minute).

In order to illustrate the loss of accuracy with decreasing sensitivity, the ratio of all verified secondary crashes to the estimated number of secondary crashes under the static approach is evaluated by plotting $N_M[L,T] / N_S[L,T]$ for different sizes of spatiotemporal windows. This plot shows that the sensitivity (i.e., the probability of correctly identifying a secondary crash) of the static method is highest at the smallest time and distance windows. The proportion of secondary crashes that are correctly identified, $\alpha$, is illustrated for windows with different spatial and temporal sizes (see Figure 2-8 (a) and Figure 2-8 (b)). In general, Figure 2-8 suggests that the static method performs poorly at larger time and distance thresholds. The general trend implies that the rate of crashes identified by the static method stabilizes at distances of approximately 3 miles and time periods of approximately 60 minutes in these scenarios.
The same general pattern is observed in the analysis of different geographic regions, within the same regions during different seasons, and across different highway segments. In other words, the relationship can be expressed as Equation 1:

$$N_M[L,T] = \alpha \, N_S[L,T], \quad \text{where } \alpha \cong [0.27 - 0.09] \tag{1}$$

• $N_M[L,T]$: Confirmed secondary crash events in the manual approach
• $N_S[L,T]$: Number of crashes that exist within a specific spatiotemporal window from the first crash
• $\alpha$: Convergence limit (Sensitivity)

Therefore, $\alpha$ is the sensitivity of secondary crash identification by the static window approach. Within the aforementioned spatiotemporal windows, the sensitivity drops as the window grows until it reaches a line with a nearly constant rate of decline, which is correlated with the linear expansion of the window size. The rate of decline of this line can be considered almost constant because, based on the literature, secondary crashes rarely occur beyond certain time and distance thresholds, here windows larger than 6 miles and 300 minutes. The maximum drop in sensitivity occurs right before $\alpha$ merges with this line. Therefore, a spatiotemporal window can be used to estimate the number of confirmed secondary crashes identified by the manual approach.

Figure 2-8: (a) The ratio of confirmed secondary crash events in the manual approach to the total predicted events in the static approach within a gap size of 1 mile and various time intervals. (b) The ratio of confirmed secondary crashes in the manual approach to the total predicted events in the static approach within a gap size of 15 minutes and various distance gaps. Panel (a) axes as labeled: ratio of events identified by each method, $N_M[L,T]/N_S[L,T]$, versus radius (mile).
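As a small worked example of Equation 1, the sketch below computes the ratio for the 15-minute, 6-mile window using the counts quoted earlier in this section (171 confirmed secondary crashes among 977 crashes in the window). The function names `alpha` and `estimate_confirmed` are illustrative only and do not come from the study.

```python
def alpha(confirmed, in_window):
    """Ratio N_M[L,T] / N_S[L,T] from Equation 1."""
    if in_window <= 0:
        raise ValueError("window must contain at least one crash")
    return confirmed / in_window


def estimate_confirmed(in_window, alpha_value):
    """Equation 1 as written: N_M[L,T] = alpha * N_S[L,T]."""
    return alpha_value * in_window


# Counts reported in the text for the 15-minute, 6-mile window.
a = alpha(171, 977)
print(f"alpha = {a:.3f}")  # prints "alpha = 0.175"
```

The value of about 0.175 falls inside the range quoted for Equation 1. Used in the other direction, `estimate_confirmed` shows how a window count could be scaled by a previously calibrated ratio to approximate the number of confirmed secondary crashes, on the assumption that the ratio transfers to the new data.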
Figure 2-8 (Cont'd), panel (b) axes as labeled: ratio of events identified by each method, $N_M[L,T]/N_S[L,T]$, versus time (minute).

Table 2-6 shows the number of crashes within each spatiotemporal window (projected positives), the number of confirmed secondary crashes within each spatiotemporal window (true positives), and the ratio between them. The table also shows the specificity and sensitivity.

Table 2-6: Secondary crash distribution for interstate roads in Michigan based on static and manual approach
Distance grid (Mile) | Time gap (Min) | Number of crashes in spatiotemporal window, N_S[L,T] | Number of verified secondary crashes within spatiotemporal window, N_M[L,T] | Specificity (within 300 min, 6 mile) | Sensitivity
1 | 15 | 509 | 142 | 93% | 27.70%
1 | 30 | 773 | 185 | 93% | 24.00%
1 | 60 | 1155 | 254 | 94% | 21.80%
1 | 300 | 2605 | 362 | 94% | 13.80%
3 | 15 | 740 | 166 | 93% | 22.40%
3 | 30 | 1207 | 220 | 93% | 18.30%
3 | 60 | 1929 | 318 | 94% | 16.50%
3 | 300 | 5151 | 526 | 95% | 10.20%
6 | 15 | 977 | 171 | 93% | 17.50%
6 | 30 | 1611 | 235 | 93% | 14.60%
6 | 60 | 2764 | 354 | 94% | 13.40%
6 | 300 | 7431 | 638 | unknown | 8.90%

A similar analysis was carried out for each interstate roadway in Michigan in order to identify secondary crashes on each freeway. The goal was to determine which roads are of greatest concern with respect to the occurrence of secondary crashes. As previously mentioned, each direction was considered separately. For each primary crash, crashes that occurred in the same direction and upstream of the primary crash were considered, and their spatiotemporal gaps were recorded. Table 2-7 shows the percentages of secondary crashes on each of the thirteen interstate roadways in Michigan. The results are based on the 6-mile and 300-minute space-time window.
Table 2-7: Secondary crashes for interstate roads in Michigan based on static and manual approach
Freeway | Number of crashes | Number of confirmed secondary crashes in manual approach | Percentage of confirmed secondary crashes in manual approach | Number of crashes in spatiotemporal window | Number of confirmed secondary crashes in spatiotemporal window | Percentage of secondary crashes in spatiotemporal window
I-69 | 1835 | 125 | 6.8 | 199 | 33 | 16.6
I-75 | 7041 | 423 | 6.0 | 3016 | 182 | 6.03
I-94 | 7564 | 605 | 8.0 | 1627 | 180 | 11.1
I-96 | 5021 | 336 | 6.7 | 1072 | 116 | 10.8
I-194 | 52 | 2 | 3.8 | 12 | 0 | 0.0
I-196 | 1439 | 112 | 7.8 | 309 | 42 | 13.6
I-296 | 239 | 23 | 9.2 | 110 | 10 | 9.1
I-375 | 95 | 3 | 3.1 | 21 | 3 | 14.3
I-475 | 286 | 15 | 5.2 | 166 | 6 | 3.6
I-496 | 400 | 42 | 10.5 | 102 | 11 | 10.8
I-675 | 90 | 4 | 4.4 | 47 | 4 | 8.5
I-696 | 1984 | 152 | 7.7 | 533 | 50 | 9.4
I-275 | 633 | 32 | 5.1 | 366 | 14 | 3.8
Total | 26,679 | 1,872 | 7.0 | 7586 | 651 | 8.6

A similar correlation factor has been observed in this set of results. The number of secondary crashes identified by static methods on each road is higher than the number of secondary crashes identified by the manual approach. Based on the manual approach, interstate roads I-496 and I-375 have the highest and the lowest rates of secondary crashes, at 10.5 percent and 3.1 percent, respectively. Figure 2-9 compares the distribution of crashes in the static window with the distribution of confirmed secondary crashes in relation to the previous crash, both temporally (Figure 2-9 (a)) and spatially (Figure 2-9 (b)). Both figures show that the frequency of crashes is higher at shorter time and distance intervals and that the crash frequency drops as the time and distance gaps increase.
Figure 2-9: Comparison of the spatiotemporal distribution of crashes in the static window versus the distribution of confirmed secondary crashes in relation to the previous crash (a) Temporal distribution (b) Spatial distribution. Axes: crash frequency versus (a) time gap to the previous crash and (b) distance gap to the previous crash (mile). Series: crash frequency in static window; confirmed secondary crash frequency.

Figure 2-10 shows the temporal and spatial distribution and characteristics of the actual confirmed secondary crashes within each static window. Temporally, approximately 65 percent of the secondary crashes were found to occur within a 90-minute time gap from the previous crash. Spatially, about 80 percent of the secondary crashes occurred within a 2.5-mile distance gap from the previous crash. Generally, about 60 percent of secondary crashes occurred within 75 minutes of the previous crash and within one mile upstream of it. In other words, about 40 percent of secondary crashes occurred beyond the most commonly used one-mile and 75-minute spatiotemporal thresholds.
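Cumulative percentages of the kind quoted above can be computed from the recorded gaps as in the sketch below. The gap values in the usage example are invented purely for illustration and are not taken from the study data; the function name `cumulative_share` is likewise hypothetical.

```python
def cumulative_share(gaps, thresholds):
    """Share of gaps that are less than or equal to each threshold.

    `gaps` holds time gaps (minutes) or distance gaps (miles) between each
    confirmed secondary crash and its previous crash.
    """
    if not gaps:
        raise ValueError("gaps must not be empty")
    return {
        limit: sum(1 for g in gaps if g <= limit) / len(gaps)
        for limit in thresholds
    }


# Invented time gaps in minutes, for illustration only.
example_gaps = [5, 12, 40, 75, 90, 130, 200, 280]
shares = cumulative_share(example_gaps, [15, 90, 300])
```

With these invented values, `shares` maps 15 minutes to 0.25, 90 minutes to 0.625, and 300 minutes to 1.0.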
Figure 2-10: Spatiotemporal distribution of secondary crashes in relation to the previous crash (a) Temporal distribution (b) Spatial distribution. Axes: frequency of secondary crashes and cumulative percentage versus (a) time gap to the previous crash (minute) and (b) distance gap to the previous crash (mile). Annotations: cumulative percentage ≅ 70% in panel (a) and ≅ 80% in panel (b).

2.4 Discussion and Conclusion
Crashes constitute a significant source of delays, system unreliability, and inefficiency on freeways. The congestion caused by primary crashes often exposes subsequent vehicles to the risk of secondary crashes. While secondary crashes are relatively infrequent, they pose a significant safety risk on freeways and strongly affect traffic operations and flow. Despite substantive research efforts, there is still considerable uncertainty as to the magnitude and nature of secondary crashes. The spatial and temporal influence of primary crashes on road users is closely related to the occurrence of secondary crashes. Some studies, mostly based on static methods, have defined secondary incidents based on fixed spatial and temporal thresholds. In this approach, a fixed spatiotemporal window is assumed around the primary crash. In addition, the static approach applies the same window to all types of primary crashes regardless of the upstream traffic flow, density, and speed. In this work, by leveraging a large database of all crashes on Michigan interstate roads in 2018, a keyword-searching/manual approach was used to define the control set of secondary crashes based on police reports.
Results from the manual approach were then used to validate the accuracy of the static method in identifying secondary crashes. Based on the manual results, about 7 percent of interstate crashes were recorded by police officers as secondary crashes. In addition, a large set of static window sizes was explored. It was found that, while predicting secondary crashes with fixed-size windows yields a significant overestimate, the window counts can be used to derive values that are linearly correlated with the confirmed number of secondary crashes regardless of the window size, traffic flow, density, and speed. Benchmarking the secondary crash densities identified using different static thresholds against the confirmed secondary crash density obtained by the manual approach shows that the static method consistently overestimates secondary crash rates, as can be seen in Figure 2-11. Table 2-8 summarizes results from previous studies that applied a static approach to identify secondary crash rates.

Table 2-8: Summary of secondary crash rates in the literature

Study                          Secondary crash rate   Spatiotemporal threshold
Raub (1997)                    15%                    15 min and 1 mile
Karlaftis et al. (1999)        35%                    15 min and 1 mile
Moore et al. (2004)            1.5% to 3%             2 hours and 2 miles upstream in both directions
Kopitch and Saphores (2011)    5.53%                  2 hours and 2 miles upstream in both directions
Green et al. (2012)            0.10% to 0.15%         80 min and 1,000 ft
Zhan et al. (2008)             7.90%                  Clearance time + 15 min and 2 miles

Figure 2-11 compares the secondary crash rates from these previous studies with the secondary crash rate found in the current study. The blue dots in Figure 2-11 show the secondary crash rates reported in the different studies using the static method and the designated spatiotemporal thresholds.
The orange dots show the confirmed secondary crash rate within the current research, which does not depend on the spatiotemporal thresholds.

Figure 2-11: Comparison of secondary crash rates in previous studies that applied a static approach with the current study. [Vertical axis: secondary crash rate, 0 to 35. Horizontal axis: Raub (1997), Karlaftis et al. (1999), Moore et al. (2004), Kopitch and Saphores (2011), Green et al. (2012), Zhan et al. (2008). Series: secondary crash rates in previous studies; total confirmed secondary crash rate in 2018.]

It should be noted that secondary crashes occur within the spatiotemporal impact area of the primary crash; therefore, shorter spatiotemporal windows have been considered. It was found that as the spatiotemporal window grows, the specificity fades while the sensitivity increases.

Identifying the factors that lead to secondary crashes is the first step toward preventing them. Existing studies have used several statistical models to analyze the risk of secondary crash occurrence. The current research adopted logistic regression and negative binomial models to identify characteristics that distinguish secondary crashes from primary crashes. The proposed methodological approach and research findings provide insights into the effects of traffic conditions, geometric characteristics, weather conditions, and primary crash characteristics on the probability of multiple secondary crashes on freeways. The logistic regression model suggests that the number of lanes, weather conditions, posted speed limit, crash severity involving fatal injury, the number of units involved in the crash, and the involvement of emergency medical services are among the key variables that affect secondary crash occurrence.
The negative binomial model suggests that annual average daily traffic (AADT), large urbanized areas (with a population of more than 200,000), and medians with concrete barriers are among the key variables that affect secondary crash occurrence. These results are expected to provide useful information for developing policies and strategies to prevent secondary crashes. Moreover, the developed model could be incorporated into advanced traffic control systems on freeways to help avoid secondary crashes. Secondary crashes caused by non-crash incidents, and the effect of crashes in the opposite traffic direction, deserve more investigation.

In summary, the static method may fail to capture the impact area of primary crashes and often overestimates secondary crashes by treating all nearby events as secondary. Dynamic approaches address this limitation by determining the spatiotemporal thresholds of primary crashes based on real-time traffic flow characteristics such as speed and density. Further investigation using the dynamic method is recommended for future study.

CHAPTER 3. SECONDARY CRASH IDENTIFICATION BASED ON SPEED DATA

Secondary crashes occur within the impact area of a prior incident and can lead to increased traffic flow fluctuation and a higher risk of subsequent crashes. In order to mitigate the safety impact and congestion associated with secondary crashes, strategies should be developed to reduce the potential for such crashes. As described in the previous chapter, the static method identifies secondary crashes based on pre-specified spatiotemporal parameters. It has serious limitations, as it fails to capture the actual impact range of primary crashes. Dynamic methods address these limitations. Despite their widespread application, static studies generally raise concerns as to their reliability due to their one-size-fits-all approach to the problem.
Many prior studies using static methods have also assessed the sensitivity of their results without explicitly validating the secondary crash estimates against ground truth data on the actual number of secondary crashes in a large pool of data. Such approaches generally result in an overestimation of actual secondary crashes. The static threshold method also generally does not reflect actual traffic conditions. The influence area, from both a temporal and a spatial perspective, is expected to vary based upon real-time traffic flow characteristics (e.g., speed, density) and other factors. Compared to the static approach, the dynamic method is more advanced and reliable because it limits the search space based on traffic flow characteristics rather than assigning a static spatiotemporal window. However, the implementation of the dynamic approach depends on the availability of real-time traffic data. Because traffic sensors for real-time traffic flow measurements are only available on limited access facilities, the use of the dynamic method is limited to locations with available sensor data. Moreover, this method is resource-hungry and data-intensive.

In this thesis, a dynamic secondary crash identification method is proposed, which focuses on estimating the impact range of the primary crash using speed data. The proposed approach uses traffic flow characteristics, such as speed, which change over space and time, to describe the queue formation that results from a primary crash. The contributions of this research are summarized as follows:
• Secondary crashes are identified by integrating the speed contour plot with the spatiotemporal evolution of the primary crash impact area.
• The method can determine impact areas associated with multiple incidents and confirm that each impact area is consistent with the spatiotemporal evolution of shockwaves.
• The proposed approach should reduce the misidentification of secondary crashes compared to the static approach, which considers fixed spatiotemporal thresholds.
• Lastly, this research aims to identify those contextual environments where the risks of secondary crashes are most pronounced, culminating in guidance to assist road agencies in effectively monitoring and clearing crashes and other incidents to minimize the potential for secondary crashes.

3.1 Data Acquisition

Data used in this study are drawn from police-reported crash data obtained through the Michigan Traffic Crash Facts (MTCF) data query tool, which is maintained by the Michigan State Police (MSP) Office of Highway Safety Planning (OHSP). This tool gives users free access to query all crash reports from Michigan law enforcement agencies dating back to 2004. Detailed information is available for each crash, including PDF copies of the police crash reports. With respect to this study, these reports include important details, such as the date, time, and location of the crash, as well as a crash narrative section, which describes the circumstances of the crash as determined by the investigating officer.

In 2018, a total of 312,798 crashes occurred throughout Michigan, and based on the Highway Class filter in MTCF, 34,437 crashes were indicated to have occurred on the interstate system. Next, crashes occurring on either an interstate exit or entrance ramp were removed using a roadway inventory file provided by the Michigan Department of Transportation (MDOT). Based on this filter, 7,359 crashes that occurred on ramps were excluded. Subsequently, 363 crashes were removed where the crash report was either missing or incomplete. Given the resources required for the dynamic analysis, the study area was constrained to the Detroit metro area interstate mainline system. The Detroit area interstate system includes all interstate roads located in Macomb, Oakland, and Wayne counties.
The final dataset included a total of 13,392 crashes in the Detroit area. The individual crash report forms were all subsequently downloaded from MTCF, along with pertinent summary information (e.g., crash ID, date, time, location, crash narrative) in spreadsheet format.

In addition, real-time traffic and speed data from the Regional Integrated Transportation Information System (RITIS) website (https://ritis.org/) were used in this study. "RITIS is an automated data sharing, dissemination, and archiving system that includes many performance measures, dashboard, and visual analytics tools that help agencies to gain situational awareness, measure performance, and communicate information between agencies and to the public." Real-time speed data at 15-minute intervals were downloaded from RITIS for every interstate segment. In order to acquire stable traffic flow rates, the literature recommends measurement intervals of at least 15 minutes (Smith and Ulmer, 2003). It should be noted that raw traffic flow data at shorter time intervals may contain a large amount of noise (Guo et al., 2017).

Michigan roadways consist of different PR-Numbers, and each PR-Number consists of different XD-segments with different mile points. The PR-Number is the physical road number of the segment, as imported from the Michigan Geographic Framework, and XD-segment stands for extreme definition segment. By definition, "XD-segments are segments that cover more miles of road than TMC segments, generally with greater granularity, and with the ability to adapt more quickly to changes in the road network and the addition of new roads and new markets" (Glossary - INRIX, no date). In total, there are 967 segments and 32 PR-Numbers within the Detroit area interstate roadways, see Figure 3-1.

Figure 3-1: Interstate roadways in the Detroit area

In the speed data downloaded from RITIS, speed data were missing for 83 segments (Figure 3-2).
For those segments, the speed was interpolated from the speed data of adjacent segments: each missing value was replaced by the average of the speeds on the segments immediately upstream and downstream of the missing segment.

Figure 3-2: Segments for which speed data are missing

ArcGIS was used to create a new linear reference system based on the Detroit crash data and to prepare linear referencing files for XD-segments, PR-Numbers, and crashes in the Detroit area.

3.2 Determination of Spatiotemporal Speed Matrix

The literature suggests that the evolution of travel speed on a link can be visualized by a speed contour plot (Park, Gao and Haghani, 2017; Wang, Qi and Jiang, 2018; Wang and Jiang, 2020). To construct a speed contour plot, a road section is divided into $I$ segments, labeled 1 to $I$ from upstream to downstream. The time period is discretized into $T$ intervals labeled 1 to $T$. Here $T = 96$, as the time interval is 15 minutes and the time period covers 24 hours. The combination of a specific time interval and a particular road segment defines a cell in the speed contour matrix, see Figure 3-3. $S_{t,i}$ is defined as the travel speed on segment $i$ within time interval $t$. Figure 3-4 shows the average speed contour matrix built from the yearly speed observations for each day of the week, where $\bar{S}_{t,i}$ is the average speed on segment $i$ during time interval $t$, with standard deviation $\sigma_{t,i}$. It should be noted that a separate yearly average speed profile was calculated for each of the seven days of the week.

Figure 3-3: Speed contour matrix. $S_{t,i}$ is the speed on segment $i$ during time interval $t$. [Axes: time; segments, in the direction of traffic flow.]

Figure 3-4: Average speed contour matrix. $\bar{S}_{t,i}$ is the average speed on segment $i$ during time interval $t$, with standard deviation $\sigma_{t,i}$. [Axes: time; segments, in the direction of traffic flow.]

3.3 Determination of Impact Area

The main goal is to compare the daily speed contour matrix, $S_{t,i}$, with the yearly average speed matrix, $\bar{S}_{t,i}$, for the corresponding day of the week, and to apply a threshold that determines the spatiotemporal range over which the daily speed is noticeably lower than the yearly average. In the current study, $\bar{S}_{t,i}$ was calculated separately for each day of the week, as the yearly average speed varies by day of the week.

The value of the cut-off deviation has a significant influence on the extent of the affected zone after a crash. Increasing the cut-off deviation shrinks the affected zone upstream, since fewer cells qualify as congested. Different scenarios were considered to determine the crash impact area: fixed cut-off speeds of 5 mph and 10 mph, and cut-offs based on the standard deviation (STD), namely 1 STD, 1.65 STD, 2 STD, and 3 STD. Secondary crashes were identified under each scenario. For the one-standard-deviation scenario, the discriminant indicator is

$$Q_{t,i} = \begin{cases} 1 & \text{if } S_{t,i} \le \bar{S}_{t,i} - \sigma_{t,i} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where
$Q_{t,i}$: discriminant binary indicator
$\sigma_{t,i}$: standard deviation
$i$: segment number
$t$: time step
$S_{t,i}$: speed on segment $i$ in time step $t$
$\bar{S}_{t,i}$: average yearly speed for the day of the week on segment $i$ during time step $t$

Specifically, if $Q_{t,i} = 1$, the matrix cell is considered congested. The discriminant binary indicator $Q_{t,i}$ therefore indicates whether the vehicle speed on segment $i$ during time interval $t$ is substantially lower than the yearly average speed for that day of the week. If a crash exists in cell $(t, i)$, the speed reduction is assumed to be due to the crash.
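The construction of the day-of-week baseline and of the indicator $Q_{t,i}$ can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the array layout, the function names, and the reading of the rule as "daily speed at or below the average minus the cut-off" are assumptions made for the example.

```python
import numpy as np

def weekday_baseline(speeds, weekdays):
    """Yearly mean and standard deviation of speed per day of the week.

    speeds   : array of shape (n_days, T, I), one T x I speed matrix per day
    weekdays : array of shape (n_days,), day-of-week label 0..6 for each day
    Returns (mean, std), each of shape (7, T, I).
    Assumes every day of the week appears at least once in `weekdays`.
    """
    mean = np.zeros((7,) + speeds.shape[1:])
    std = np.zeros_like(mean)
    for d in range(7):
        day_speeds = speeds[weekdays == d]
        mean[d] = day_speeds.mean(axis=0)
        std[d] = day_speeds.std(axis=0)
    return mean, std

def congestion_indicator(daily, mean, std, k_std=None, cutoff_mph=None):
    """Binary T x I matrix Q: 1 where the daily speed is at or below the
    baseline minus the cut-off, 0 otherwise.

    Exactly one cut-off must be given (hypothetical parameter names):
      k_std      : multiple of the cell's standard deviation (1, 1.65, 2, 3)
      cutoff_mph : fixed speed cut-off in mph (5 or 10)
    """
    if (k_std is None) == (cutoff_mph is None):
        raise ValueError("specify exactly one of k_std or cutoff_mph")
    threshold = k_std * std if k_std is not None else cutoff_mph
    return (daily <= mean - threshold).astype(int)
```

For a given date, `daily` would be that day's 96 x I speed matrix and `mean`, `std` the slices of the baseline for the matching day of the week.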
Figure 3-5 shows an example of a speed contour plot for day 107 (04/17/2018) within PR-number 639107 (a section of I-96 WB) in the Detroit area. Here $T = 96$ and $I = 8$. The time interval is 15 minutes, so the 24-hour period is discretized into time steps 1 to 96. Based on the direction of traffic flow, segment 1 is upstream of segment 8. For cells where the speed is below the yearly average speed, the color changes from white to red.

Figure 3-5: Example of a speed contour plot for day 107 within PR-number 639107. [Axes: time; segments, in the direction of traffic flow. The crash location is marked.]

3.4 Secondary Crash Identification Approach

As mentioned in the previous section, each crash is matched to a specific location along the roadway segment based on geographic coordinates using ArcGIS. In addition, roadways consist of different PR-Numbers, and each PR-Number consists of different XD-segments with different mile points. The following steps were performed in order to identify secondary crashes:
• Plot the speed trend based on yearly average speed data in 2018 for each day of the week and each segment.
• Determine the average speed trend at each section, with respect to the day of the week.
• Estimate the crash impact duration and identify secondary crashes.

3.4.1 Speed trend plotted based on average speed data on each day of the week and each segment

Recurrent speed trends for each XD-segment were plotted based on average speed data for the year 2018 for each day of the week and each segment. The process is demonstrated for PR-number 639107 (I-96 WB), see Figure 3-6. This PR-number consists of 8 XD-segments located on I-96 westbound, see Table 3-1.
Figure 3-6: Demonstration of PR-number 639107 (I-96 WB). [Map showing segments 1 to 8 along I-96 WB.]

Table 3-1: Segments and mile points within PR-number 639107

PR-Number   XD-segment    Mile point   Segment number
639107      1346346161    0.513        1
639107      1346346122    0.5127       2
639107      1346346133    0.5115       3
639107      1346452489    0.5245       4
639107      1346452504    0.6611       5
639107      1346453321    0.3862       6
639107      1346453331    0.5254       7
639107      1346453345    0.2391       8

Figure 3-7 shows the average 15-minute speed over 24 hours on the first segment within PR-number 639107 (I-96 WB). The average speed on segment 1 varies between 65 and 70 miles per hour. In addition, the speed drops during the morning peak, from 7:30 to 10:30 am, and the evening peak, from 3:30 to 6:30 pm. As expected, such peak-hour effects are generally observed on weekdays.

Figure 3-7: a) Speed trend on segment 1 within PR-number 639107 (Sunday=1, Monday=2, Tuesday=3, Wednesday=4, Thursday=5, Friday=6, Saturday=7); b) PR-number 639107 (I-96 WB) with 8 XD-segments. [Axes of panel a: time; speed (mph).]

Figure 3-8 shows the average speed for all 8 segments within PR-number 639107, aggregated over the year. Colors indicate speed ranges, with orange the highest and blue the lowest speed within the segment. The same trend can be observed: the speed drops significantly during the morning and evening peak hours. Moreover, speed is considerably lower on the last four segments (segments 5 to 8), possibly because those segments are located at the system interchange.

Figure 3-8: Yearly average speed for all 8 segments within PR-number 639107. [Vertical axis: speed (mph).]

Figure 3-9 shows the yearly average speed for each time slot of the day (96 time slots) on the various segments. Average speed varies across segments, from approximately 65 to 75 mph. Furthermore, the figure also illustrates that the speed drops on the last four segments.
As mentioned, the lower speed on the last four segments may be induced by their location, as they are located on a curve. Each line shows the yearly average speed per time slot across all 8 segments. No significant difference in yearly average speed among the time slots of the day was observed.

Figure 3-9: Yearly average speed within each time slot (PR-number 639107). [Axes: segment number; speed (mph).]

3.4.2 Average speed trend at each section, with respect to day of the week

The speed data for the same time and location were collected from all days in 2018, and the yearly average speed on each XD-segment was calculated for each day of the week. Subsequently, the daily speed was compared with this yearly average speed. The results are shown as heat maps, see Figure 3-10.

Figure 3-10: Average speed profile for each day of the week within PR-number 639107. [Seven panels, Day 1 (Sunday) through Day 7 (Saturday). Axes: time step (15 minutes); segment number.]

Figure 3-11 shows a heat map of the traffic speed relative to the yearly average speed over a 24-hour period on October 19th, 2018, along I-96 WB, in 15-minute intervals, for all 8 segments within PR-number 639107.

Figure 3-11: Difference between daily and yearly average speed (October 19th, 2018). [Axes: time step (15 minutes); segment number, in the direction of traffic flow.]

Accordingly, one heat map can be generated for each day and each section. If the speed is lower than the yearly average speed, the color changes from white to red, and the color intensifies as the difference increases. Note that speed increases are not considered. Red zones describe the time and location of significant speed drops from the yearly average speed.
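The quantity behind such a heat map can be sketched in a few lines. This is an illustrative example only; the function name `speed_drop` is hypothetical, and the inputs are assumed to be time-by-segment arrays of the daily speed and the matching day-of-week average.

```python
import numpy as np

def speed_drop(daily, baseline):
    """Speed drop relative to the baseline, in mph, per (time step, segment) cell.

    Cells where the daily speed exceeds the baseline are set to 0, because
    speed increases are not considered.
    """
    daily = np.asarray(daily, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    return np.clip(baseline - daily, 0.0, None)
```

The resulting matrix can be drawn with any sequential white-to-red colormap (for example matplotlib's `Reds`), so that cells with no drop stay white and larger drops appear darker.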
In this corridor, significant congestion occurred, and the speed drop started on segment eight and propagated to segment one, which is upstream in the direction of traffic, see Figure 3-12.

Figure 3-12: Difference between the speed profile for Monday, February 5th, 2018 (day 36) and the average Monday speed profile within PR-number 639107. [Panels: Day 2 (Monday); Day 36 minus Day 2 (02/05/2018). Axes: time step (15 minutes); segment number, in the direction of traffic flow.]

3.4.3 Estimating crash impact duration and secondary crash identification

Next, the crashes within each PR-number were extracted from the interstate crash database and overlaid on the heat map. It should be noted that most segments do not experience even two crashes on the same day, so those days can be automatically eliminated from the search space. Figure 3-13 plots the distribution of crashes over the year and describes the density of daily crashes in 2018 within PR-number 639107. The total number of crashes in 2018 on that PR-number was 138. Using a white-to-red color gradient, Figure 3-13 quickly reveals the days with no crashes or one crash. Excluding those days, the search space of the dynamic method is constrained to the 28 days with more than one crash.

Figure 3-13: Contour plot of the density of crashes in 2018 within PR-number 639107. [Axes: day of the month; month.]

Further, the speed at the time of each crash, $S_{t,i}$, was compared to the average yearly speed trend on that segment, $\bar{S}_{t,i}$. The yearly average speed data were used to establish a recurrent speed profile of the section under normal traffic conditions, and the speed trends around each crash were plotted to identify the incident impact duration. The incident impact duration is defined as the time between the detection of the incident and the return of the speed to the normal trend, which is the yearly average speed for that day of the week.
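A minimal sketch of this definition is given below. It assumes that congestion has already been encoded as a 0/1 sequence for the crash segment (1 where the speed is below the normal trend) and that time slots are counted in 15-minute steps from index 0; both the function name and this encoding are illustrative assumptions, not the study's implementation.

```python
SLOT_MINUTES = 15  # length of one time slot

def impact_duration_slots(congested, crash_slot):
    """Number of consecutive congested time slots starting at the crash slot.

    congested  : sequence of 0/1 flags for one segment, one flag per time slot
    crash_slot : index of the slot in which the crash was reported
    Returns 0 if the segment is not congested in the crash slot itself.
    """
    slot = crash_slot
    while slot < len(congested) and congested[slot] == 1:
        slot += 1
    return slot - crash_slot

# Hypothetical day: the segment is congested from slot 62 through slot 80.
flags = [0] * 96
for t in range(62, 81):
    flags[t] = 1
duration_min = impact_duration_slots(flags, 62) * SLOT_MINUTES
```

In this hypothetical case the impact lasts 19 slots, that is, 285 minutes. Repeating the calculation for each upstream segment gives the spatial extent of the impact area.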
It was hypothesized that when the speed at the incident reporting time is substantially lower than the defined boundary of the average speed, the speed profile of the XD-segment has been affected by the occurrence of the incident. The speed drop on each road segment was compared spatially and temporally with the average annual speed on that segment to identify secondary crashes. When the speed drops near the incident location, the time and the distance in the upstream direction of traffic are recorded for every crash until the speed returns to the annual average speed.

Once the incident impact area has been identified for all crashes, the model searches for other incidents occurring within the affected spatiotemporal window. Any crash within the impact area and upstream of a primary crash is categorized as a secondary crash:

$$C_i = SC(C_j) \quad \text{if} \quad \begin{cases} Sg_i < Sg_j & \text{(upstream condition)} \\ t_i > t_j & \text{(time condition)} \\ S_{t,i} < \bar{S}^{\,day}_{t,i} \\ t_i \in [\text{temporal impact area of } C_j] \\ Sg_i \in [\text{spatial impact area of } C_j] \end{cases} \qquad (2)$$

where
$C_i$: crash $i$
$C_j$: crash $j$
$SC$: secondary crash
$Sg$: segment of the crash occurrence
$t$: time of the crash occurrence
$S_{t,i}$: speed at the time of each crash
$\bar{S}^{\,day}_{t,i}$: average yearly speed for the day of the week on segment $i$ in time step $t$

If multiple crashes are detected within the affected spatiotemporal window, all of them are categorized as secondary crashes. The example depicted in Figure 3-14 showcases crashes that occurred on Friday, October 19th, 2018, along I-96 WB.
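The rule in Equation 2 can be sketched as a small predicate and applied to the three crashes of this example (Table 3-2), using the time slots, segments, and impact ranges reported for them in this section. Representing the impact area as a set of congested (time slot, segment) cells is an assumption made for illustration, as are the class and function names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Crash:
    crash_id: int
    slot: int      # 15-minute time slot of the crash
    segment: int   # segment number, 1 = most upstream

def is_secondary(candidate, primary, impact_cells):
    """True if `candidate` satisfies the conditions of Equation 2 for `primary`.

    impact_cells : set of (slot, segment) cells in which the speed is below
                   the day-of-week average because of `primary`
    """
    return (
        candidate.segment < primary.segment                   # upstream condition
        and candidate.slot > primary.slot                     # time condition
        and (candidate.slot, candidate.segment) in impact_cells  # inside impact area
    )

crash_1 = Crash(1514269, slot=62, segment=8)
crash_2 = Crash(1514280, slot=66, segment=3)
crash_3 = Crash(1509026, slot=67, segment=7)

# Impact area of the first crash: slots 62 to 80 on segments 1 to 8.
area_1 = {(t, i) for t in range(62, 81) for i in range(1, 9)}

secondary_ids = [c.crash_id for c in (crash_2, crash_3)
                 if is_secondary(c, crash_1, area_1)]
```

Because the upstream condition is written as a strict inequality, a later crash on the same segment as the primary crash would not be flagged by this sketch.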
On this particular day, three crashes occurred along the study corridor, resulting in significant congestion: the observed speed, $S_{t,i}$, dropped below the recurring average speed along this corridor, $\bar{S}_{t,i}$. Two of these three crashes were considered secondary crashes.

Table 3-2: Crashes on Friday, October 19th, 2018, along I-96 WB

Crash ID   Date         Day of the week   Time    PR-Nr.
1514269    10/19/2018   6                 15:35   639107
1514280    10/19/2018   6                 16:30   639107
1509026    10/19/2018   6                 16:50   639107

Figure 3-14: Detection of secondary crashes using speed data. [Axes: time step; segment number. Annotations: Primary Crash: Segment=8, Timeslot=62; Secondary Crash 1: Segment=3, Timeslot=66; Secondary Crash 2: Segment=7, Timeslot=67.]

Crash 1 occurred at 15:35 (time slot 62) on segment 8 and affected eight segments in the upstream direction (from 8 to 1):

$$Crash_1: [t_1 = 62,\ i_1 = 8], \qquad S_{t,i} < \bar{S}^{\,day}_{t,i} \quad \forall\, i \in [1, \dots, 8],\ \forall\, t \in [62, \dots, 80] \qquad (3)$$

where
$Crash_1$: first crash, with crash ID 1514269
$i$: segment
$t$: time step
$\bar{S}^{\,day}_{t,i}$: average yearly speed on segment $i$ in time step $t$
$S_{t,i}$: speed on segment $i$ in time step $t$

The speed drop continues from time slot 62 to 80. It is worth noting that the crash is considered to be the source of the congestion and speed drop; however, the speed reduction may not be due to the crash alone. Figure 3-14 clearly shows that congestion and queue formation occur after the primary crash. However, Figure 3-14 alone does not show whether the queue formation resulted from recurrent congestion or from another crash on the previous road segment. In order to eliminate the effects of recurrent congestion, the spatial and temporal influence range of the prior crash should be determined. As a result of the congestion and significant speed reduction caused by the primary crash, another crash occurred at 16:30 (time slot 66) on segment three.
This crash occurred 55 minutes after, and upstream of, the primary crash. It resulted in a drop in speed from time slot 66 to 78 and from segment 3 to 1:

$$Crash_2: [t_2 = 66,\ i_2 = 3], \qquad S_{t,i} < \bar{S}^{\,day}_{t,i} \quad \forall\, i \in [1, \dots, 3],\ \forall\, t \in [66, \dots, 78] \qquad (4)$$

where
$Crash_2$: second crash, with crash ID 1514280
$i$: segment
$t$: time step
$\bar{S}^{\,day}_{t,i}$: average yearly speed on segment $i$ in time step $t$
$S_{t,i}$: speed on segment $i$ in time step $t$

Following those crashes, another crash occurred at 16:50 (time slot 67) on segment 7. The speed drop continues from time slot 67 to 79 and from segment 7 to 1:

$$Crash_3: [t_3 = 67,\ i_3 = 7], \qquad S_{t,i} < \bar{S}^{\,day}_{t,i} \quad \forall\, i \in [1, \dots, 7],\ \forall\, t \in [67, \dots, 79] \qquad (5)$$

where
$Crash_3$: third crash, with crash ID 1509026
$i$: segment
$t$: time step
$\bar{S}^{\,day}_{t,i}$: average yearly speed on segment $i$ in time step $t$
$S_{t,i}$: speed on segment $i$ in time step $t$

In this example, the first crash is considered the primary crash, and the other crashes are considered secondary crashes, as they are located in the primary crash impact area. The same analysis was done for all days with multiple crashes. In some cases, a secondary crash can itself act as a primary crash and lead to additional crashes.

3.5 Results and Discussion

3.5.1 Secondary Crashes Identified by Manual Method Within Detroit Area

Secondary crashes were identified in the Detroit area using the manual method, in which police crash reports are used to identify secondary crashes. This method was used to evaluate the sensitivity of the spatiotemporal thresholds and to determine the extent of under- or overestimation of secondary crashes by the dynamic method. Each crash report includes detailed information about the crash, such as date, time, location, a crash narrative, and a crash code. In the manual approach, the narratives from the police crash reports were checked manually.
In total, there were 13,392 crash reports in the Detroit region, and the information from these reports was converted to a spreadsheet format for review. Based on the crash code, 859 crashes were identified as being due, at least in part, to a prior crash under the contributing circumstance field. The crash narratives of all those crashes were reviewed manually, and the results showed that about 82 percent, or 707 of them, were associated with a previous crash and were secondary in nature. The remaining 152 reports were assumed not to be related to prior crashes. For such crashes, the narrative mentioned causes other than a prior crash, which means the crash code and the narrative did not agree. As mentioned in the previous chapter, the manual approach was also applied to the rest of the crash reports. Through this review, an additional 122 secondary crashes were identified. The results are shown in Table 3-3. Almost 6.2 percent of the crashes on the Detroit interstate mainline system were considered secondary crashes.

Table 3-3: Results of reviewing the crash reports with secondary-related crash codes

Contributing circumstances (crash code)   Total number of crashes   Confirmed secondary crashes   Other (non-secondary) crashes
Backup - Other Incident                   455                       382                           73
Prior Crash                               404                       325                           79
Other (identified from narrative)         12,533                    122                           12,411
Total                                     13,392                    829                           12,563

3.5.2 Secondary Crashes Identified Using the Dynamic Method in Detroit Area

The proposed approach used Detroit crash data (13,392 crashes) from the MTCF database and real-time speed data from RITIS. Various scenarios were considered as cut-off deviations: 5 mph and 10 mph cut-off speeds, and 1 STD, 1.65 STD, 2 STD, and 3 STD. Secondary crashes were identified under each scenario.
The results show that the identified secondary crashes accounted for 3 to 10 percent of the Detroit crashes, depending on the scenario, see Table 3-4. The scenario with the 5-mph cut-off deviation has the highest number of identified secondary crashes, and the 3STD scenario has the lowest.

Table 3-4: Secondary crash results from the dynamic approach for various cut-off scenarios

Dynamic method scenario      Nr. of secondary crashes   Percentage of secondary crashes
5 mph cut-off                1301                       9.72
10 mph cut-off               828                        6.18
STD (standard deviation)     1102                       8.23
1.65STD                      762                        5.69
2STD                         623                        4.65
3STD                         414                        3.09
Total number of crashes: 13,392

Further, the results from the different scenarios of the dynamic method were compared with the results from the manual approach, see Table 3-5. Some of the crashes classified as secondary by the dynamic method were also identified as secondary in the manual approach. Here, crashes identified in the manual approach are treated as confirmed (actual) secondary crashes. The percentage of actual secondary crashes is the ratio of the number of secondary crashes confirmed by the manual method to the number identified by the dynamic method under each scenario. This percentage is highest in the 3STD scenario, at about 37 percent, and lowest in the 5-mph scenario, at about 20 percent.

Table 3-5: Comparison of secondary crashes identified by the dynamic and manual methods

Dynamic method scenario      Nr. of secondary crashes identified in dynamic method   Nr. of confirmed secondary crashes (manual method)   Percentage of confirmed secondary crashes
5 mph cut-off                1301                                                    259                                                  19.9
10 mph cut-off               828                                                     207                                                  25.0
STD (standard deviation)     1102                                                    249                                                  22.6
1.65STD                      762                                                     209                                                  27.4
2STD                         623                                                     195                                                  31.3
3STD                         414                                                     155                                                  37.4
Note: 829 (≅ 6.2%) is the total number of actual secondary crashes in the Detroit area (based on the manual method).

3.5.3 Static Sizing: Spatiotemporal Window in Detroit area

In order to compare the results of the dynamic approach with the static approach, a process similar to that of the previous chapter (section 2.2) was employed. In this section, the number of secondary crashes identified by the dynamic approach within each spatiotemporal window is determined. Each crash is associated with an interstate road number, a location on the road, a date, and a time. Using linear referencing in ArcGIS, the exact location (mile point) of each crash along the interstate road was determined. In the first step, consecutive crashes on each road segment were identified based on date and time. The distance between two consecutive crashes was calculated from the difference between the corresponding mile points. After determining the spatiotemporal gaps between consecutive crash events within the 2018 interstate crash dataset in the Detroit area, different distance and time intervals (the distance interval varies from 1 to 6 miles and the time interval from 0 to 300 minutes) were used to define spatiotemporal windows of different sizes, which were then evaluated against the results from the different scenarios of the dynamic method. This approach is explained in detail in the previous chapter (section 2.2). The same analysis was carried out for the crashes identified by the dynamic process after determining the spatiotemporal gaps between consecutive crash events.
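The gap calculation and window count described above can be sketched as follows. The record layout (road identifier, timestamp, mile point), the function names, and the sample records are hypothetical and serve only to illustrate the steps; the sketch also uses the absolute mile-point difference and does not distinguish upstream from downstream.

```python
from datetime import datetime

def consecutive_gaps(crashes):
    """Time gap (minutes) and distance gap (miles) between consecutive
    crashes on the same road.

    crashes : list of (road_id, timestamp, mile_point) tuples
    """
    ordered = sorted(crashes, key=lambda c: (c[0], c[1]))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[0] != curr[0]:
            continue  # never pair crashes from different roads
        minutes = (curr[1] - prev[1]).total_seconds() / 60.0
        miles = abs(curr[2] - prev[2])
        gaps.append((minutes, miles))
    return gaps

def count_in_window(gaps, max_minutes, max_miles):
    """Number of crashes that fall within a static window of the previous crash."""
    return sum(1 for minutes, miles in gaps
               if minutes <= max_minutes and miles <= max_miles)

# Made-up records for illustration only.
sample = [
    ("A", datetime(2018, 10, 19, 8, 0), 10.0),
    ("A", datetime(2018, 10, 19, 8, 20), 9.5),
    ("A", datetime(2018, 10, 19, 12, 0), 3.0),
    ("B", datetime(2018, 10, 19, 8, 5), 1.0),
]
sample_gaps = consecutive_gaps(sample)
```

Evaluating `count_in_window` over a grid of distance and time limits gives the window counts for the static approach; applying it only to crashes flagged by the dynamic method gives the corresponding dynamic counts.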
To illustrate the loss of accuracy with increasing sensitivity, the ratio of verified secondary crashes from the dynamic approach to the total events predicted by the static approach (spatiotemporal window), $N_D[L,T] / N_S[L,T]$, is plotted for different window sizes; see Equation 6:

$$N_D[L,T] = \alpha \, N_S[L,T], \qquad \alpha \cong [0.16,\ 0.22] \tag{6}$$

where
• $N_D[L,T]$ = number of secondary crash events identified by the dynamic method
• $N_S[L,T]$ = number of crashes that exist within a specific spatiotemporal window from the first crash
• $\alpha$ = convergence limit (sensitivity)

The convergence limit $\alpha$ was observed for windows with different spatial and temporal sizes; see Figure 3-15.

[Figure 3-15: (a) The ratio of actual confirmed events in the dynamic method to the total predicted events in the static approach within a gap size of 1 mile and various time intervals. (b) The ratio of actual confirmed events in the dynamic method to the total predicted events in the static approach within a gap size of 15 minutes and various distance gaps. Panel x-axes as printed: (a) Radius (mile), (b) Time (minute); y-axis: ratio of events identified by each method, $N_D[L,T] / N_S[L,T]$.]

In other words, as the window grows, the accuracy decreases, and the limit $\alpha$ can be interpreted as sensitivity, that is, the probability of correctly identifying a secondary crash. The specificity of the dynamic approach, that is, the probability of correctly identifying a non-secondary crash, was also calculated (Table 3-6). Here, the crash data within each window are compared with the crash data within the largest window (6 miles and 300 minutes).
Table 3-6: Secondary crash distribution for interstate roads in the Detroit area based on the static and dynamic approaches

| Distance grid (mile) | Time gap (min) | Number of crashes in spatiotemporal window, N_S[L,T] | Number of verified secondary crashes in dynamic approach within spatiotemporal window, N_D[L,T] | Specificity (300 min, 6 mile) | Sensitivity |
|---|---|---|---|---|---|
| 1 | 15 | 204 | 77 | 86% | 38% |
| 1 | 30 | 315 | 109 | 87% | 35% |
| 1 | 60 | 482 | 151 | 88% | 31% |
| 1 | 300 | 1076 | 250 | 89% | 23% |
| 3 | 15 | 299 | 95 | 87% | 32% |
| 3 | 30 | 496 | 139 | 87% | 28% |
| 3 | 60 | 814 | 199 | 88% | 24% |
| 3 | 300 | 2171 | 377 | 90% | 17% |
| 6 | 15 | 394 | 108 | 87% | 27% |
| 6 | 30 | 669 | 165 | 87% | 25% |
| 6 | 60 | 1119 | 235 | 88% | 21% |
| 6 | 300 | 3170 | 480 | unknown | 15% |

The temporal and spatial characteristics of secondary crashes within each static window are shown in Figure 3-16. Temporally, approximately 75 percent of the secondary crashes occurred within a 100-minute time gap from the previous crash. Spatially, about 80 percent of the secondary crashes occurred within a 2.5-mile distance gap from the previous crash. Overall, about 68 percent of secondary crashes occurred within 75 minutes of the previous crash and within 1.5 miles upstream of it. In other words, about 32 percent of secondary crashes occurred beyond the most commonly used spatiotemporal thresholds of 1.75 miles and 75 minutes. These statistics confirm that the proposed dynamic approach identified more secondary crashes than the traditional manual method and fewer than the static method, which means that the static method overestimates the number of secondary crashes.
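The sensitivity column of Table 3-6 is the ratio of the two count columns for each window. The short sketch below recomputes that ratio from the tabulated counts; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# (distance in miles, time gap in minutes) -> (N_S, N_D), copied from Table 3-6
TABLE_3_6 = {
    (1, 15): (204, 77), (1, 30): (315, 109), (1, 60): (482, 151), (1, 300): (1076, 250),
    (3, 15): (299, 95), (3, 30): (496, 139), (3, 60): (814, 199), (3, 300): (2171, 377),
    (6, 15): (394, 108), (6, 30): (669, 165), (6, 60): (1119, 235), (6, 300): (3170, 480),
}


def ratio_percent(n_static, n_dynamic):
    """Share of crashes in a static window that the dynamic method verified, in percent."""
    return 100.0 * n_dynamic / n_static


ratios = {window: ratio_percent(n_s, n_d) for window, (n_s, n_d) in TABLE_3_6.items()}
for (miles, minutes), value in ratios.items():
    print(f"{miles} mile, {minutes} min: {value:.1f}%")
```

Within each distance grid, the ratio falls as the time gap grows, which matches the downward trend reported in the table.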
[Figure 3-16: Spatiotemporal distribution of secondary crashes in relation to the previous crash. (a) Temporal distribution: frequency and cumulative percentage of secondary crashes by time gap to the previous crash (minutes), with the cumulative percentage of approximately 75% marked. (b) Spatial distribution: frequency and cumulative percentage of secondary crashes by distance gap to the previous crash (miles), with the cumulative percentage of approximately 80% marked.]

3.6 Discussion and Conclusions

Crashes are a major source of delay, system unreliability, and inefficiency on freeways. Congestion caused by a crash may expose subsequent vehicles to a heightened risk of secondary crashes. Such crashes have been identified as a major problem on freeways, where they frequently affect both traffic operations and safety. Therefore, transportation agencies have taken various measures to minimize the potential for such crashes and to mitigate their impacts. Identifying secondary crashes is not a straightforward procedure, as the definition is subjective. Past studies have proposed manual, static, and dynamic approaches to identify secondary crashes. Static methods define secondary crashes based on a fixed spatial and temporal threshold. In this approach, a fixed spatiotemporal window is assumed around the primary crash, which often overestimates the number of secondary crashes by treating all nearby events as secondary. Furthermore, the static approach applies the same window to all types of primary crashes regardless of the upstream traffic flow, density, and speed.
The dynamic approach identifies a distinct spatiotemporal impact area for each primary crash, in contrast to the static method, which applies a predefined threshold to every primary crash. This research proposes a method for identifying secondary crashes on freeways by tracking the spatiotemporal evolution of traffic flow. Using a large database of all crash events on interstate roads in the Detroit, Michigan area in 2018, a secondary crash identification approach was proposed that integrates the speed contour plot with the spatiotemporal evolution of the primary crash impact area. Real-time travel speed data at 15-minute intervals were downloaded from RITIS and used in the method. To identify the crash impact area, the daily speed was compared with the yearly average speed for the corresponding day of the week. For each primary crash, a spatiotemporal speed matrix and the corresponding speed contour plot are constructed for every segment. An area is considered congested when the daily speed is lower than the average speed. If there is an existing crash in the section, the speed reduction is assumed to be due to that crash. Further, if another crash occurs within the primary crash impact area, it is considered a secondary crash. It has been demonstrated that the static method consistently overestimates secondary crashes and that, as the spatiotemporal window size increases, the specificity fades as the sensitivity increases. In addition, the number of secondary crashes identified by the dynamic method is highly dependent on the cut-off speed. Based on the dynamic method, the share of crashes identified as secondary in the Detroit area varies from 3 to 10 percent, depending on the scenario. The scenarios considered as cut-off deviations were 5 mph and 10 mph below the average speed, as well as STD, 1.65STD, 2STD, and 3STD.
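The cut-off logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the speed matrices are invented, the function names are hypothetical, and the sketch flags any congested cell at or after the primary crash interval without checking that the congestion is contiguous with, or upstream of, the primary crash.

```python
def congested_cells(daily, average, std=None, cutoff_mph=None, n_std=None):
    """Return the set of (segment, interval) cells whose speed is below the threshold.

    daily, average, std: matrices indexed [segment][15-minute interval].
    Use either a fixed cut-off in mph or a multiple of the standard deviation.
    """
    cells = set()
    for s, row in enumerate(daily):
        for t, speed in enumerate(row):
            if cutoff_mph is not None:
                threshold = average[s][t] - cutoff_mph
            else:
                threshold = average[s][t] - n_std * std[s][t]
            if speed < threshold:
                cells.add((s, t))
    return cells


def is_secondary(candidate, primary, cells):
    """Flag a later crash whose (segment, interval) cell lies in the congested set."""
    return candidate != primary and candidate[1] >= primary[1] and candidate in cells


# Hypothetical 3 segments x 4 intervals; all values are made up.
average = [[65.0] * 4 for _ in range(3)]
std = [[6.0] * 4 for _ in range(3)]
daily = [
    [65.0, 64.0, 58.0, 65.0],
    [65.0, 50.0, 50.0, 63.0],
    [65.0, 65.0, 66.0, 65.0],
]

five_mph = congested_cells(daily, average, cutoff_mph=5)
ten_mph = congested_cells(daily, average, cutoff_mph=10)
three_std = congested_cells(daily, average, std=std, n_std=3)
print(len(five_mph), len(ten_mph), len(three_std))

primary = (1, 1)    # primary crash on segment 1, interval 1
candidate = (0, 2)  # later crash on segment 0, interval 2
print(is_secondary(candidate, primary, five_mph), is_secondary(candidate, primary, ten_mph))
```

In this toy example the looser 5-mph cut-off marks more cells as congested than the 10-mph or 3STD cut-offs, so the same later crash is flagged under one scenario and not under another.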
Among these scenarios, the 5-mph cut-off was considered to have the lowest sensitivity and 3STD the highest. Identifying the factors that affect secondary crashes is the first step toward preventing them; to this end, logistic regression and negative binomial models were applied. The results of the logistic regression model suggest that weather conditions, posted speed limit, and crash severity (involving minor injury) are among the key variables that affect secondary crash occurrence. The results of the negative binomial model suggest that annual average daily traffic (AADT), a median with a concrete barrier, the number of lanes, and right shoulder width are among the key variables that affect secondary crash occurrence. These results are expected to provide useful information for developing policies and strategies to prevent secondary crashes. Moreover, the developed model can also be incorporated into advanced traffic control systems on freeways for the same purpose. Based on the comparison of the proposed approach with static and dynamic methods, the proposed approach is expected to reduce the misidentification of secondary crashes. In addition, the results may support strategies to mitigate secondary crashes, including improved traffic management policies and the implementation of advanced intelligent transportation warning systems. Because this study examined only 2018 data on interstate roads in the Detroit area, the findings may not be representative of the whole state. Furthermore, secondary crashes caused by non-crash incidents, and the effect of crashes in the opposite traffic direction, deserve more investigation.

CHAPTER 4. MODELING AND PREDICTING SECONDARY CRASH RISK

4.1 Logistic Regression Analysis

Existing studies have used several statistical models to analyze the risk of secondary crash occurrence.
Among these studies, a number (e.g., Karlaftis et al., 1999; Zhan et al., 2008) have adopted logistic regression models to identify the characteristics that distinguish secondary crashes from primary crashes. The results of such analyses can help to discern the scenarios in which secondary crashes are most likely to occur, providing agencies with important insights for incident response and management activities. In the logistic regression framework, each crash is assigned one of two outcomes: either the crash was secondary in nature (i.e., due to the occurrence of a previous, downstream crash) or it was not. The general form of this relationship is as follows:

$$Y_i = \operatorname{logit}(P_i) = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} \tag{7}$$

where the response variable $Y_i$ is the logistic transformation of the probability of a crash being secondary in nature ($P_i$). The variables $X_{i1}$ to $X_{ik}$ are factors assumed to be related to the occurrence of a secondary crash, $\beta_0$ is an intercept, and $\beta_1$ to $\beta_k$ are estimated regression parameters for each independent variable. These regression parameters are positive for variables that are positively correlated with secondary crashes (i.e., secondary crashes become more likely as these variables increase). Negative parameters indicate variables that are underrepresented (i.e., less likely) among secondary crashes.

4.1.1 Data Description and Summary

The initial dataset included a total of 26,679 crashes that occurred on mainline interstates in Michigan in calendar year 2018. These data were filtered to retain only crashes that occurred on roads with between two and five lanes and with speed limits from 55 mph to 75 mph. This reduced the final dataset to 25,366 crashes. Table 4-1 shows the descriptive statistics for these data.
Table 4-1: Descriptive statistics for analysis dataset

All variables are binary indicators (1 if yes; 0 if no).

| Variable | Mean | Standard deviation |
|---|---|---|
| Interstate highway where the crash occurred | | |
| I-69 | 0.066 | 0.249 |
| I-75 | 0.259 | 0.438 |
| I-94 | 0.283 | 0.450 |
| I-96 | 0.195 | 0.396 |
| I-196 | 0.056 | 0.229 |
| I-275 | 0.024 | 0.153 |
| I-296 | 0.009 | 0.096 |
| I-475 | 0.011 | 0.105 |
| I-496 | 0.015 | 0.124 |
| I-194, I-375, I-675 | 0.007 | 0.085 |
| I-696 | 0.074 | 0.262 |
| Emergency medical services involved | 0.007 | 0.083 |
| Total number of lanes at the site of the crash | | |
| Two | 0.322 | 0.467 |
| Three | 0.409 | 0.492 |
| Four | 0.234 | 0.423 |
| Five | 0.035 | 0.184 |
| Urban area type | | |
| Rural | 0.182 | 0.386 |
| Small Urban and Small Urbanized | 0.117 | 0.322 |
| Large Urbanized | 0.701 | 0.458 |
| Time at which crash occurred | | |
| Morning peak hour (6:00 - 9:00) | 0.186 | 0.389 |
| Evening peak hour (15:00 - 19:00) | 0.275 | 0.446 |
| Off-peak hour | 0.526 | 0.499 |
| Day of week on which crash occurred | | |
| Weekdays | 0.776 | 0.417 |
| Weekend | 0.220 | 0.420 |
| Number of units involved in the crash | | |
| One | 0.421 | 0.494 |
| Two | 0.490 | 0.500 |
| More than two | 0.089 | 0.285 |
| Relationship of crash to the roadway | | |
| On the Road | 0.828 | 0.377 |
| Median | 0.045 | 0.207 |
| Shoulder | 0.064 | 0.245 |
| Outside of Shoulder/Curb | 0.057 | 0.232 |
| Gore/On-Street Parking/Off Roadway/Sidewalk/Bicycle Lane | 0.006 | 0.077 |
| Weather conditions | | |
| Clear and cloudy | 0.699 | 0.459 |
| Rain | 0.124 | 0.330 |
| Snow | 0.157 | 0.364 |
| Other (Fog, Severe Crosswinds, etc.) | 0.020 | 0.140 |
| Crash severity | | |
| Fatal injury | 0.003 | 0.056 |
| Suspected Serious Injury | 0.014 | 0.117 |
| Suspected Minor Injury | 0.045 | 0.207 |
| Possible injury | 0.125 | 0.331 |
| No injury | 0.813 | 0.390 |
| Posted speed limit | | |
| 55 mph | 0.122 | 0.327 |
| 60-65 mph | 0.034 | 0.181 |
| 70 mph | 0.777 | 0.416 |
| 75 mph | 0.067 | 0.250 |

Crashes occurring on a total of thirteen interstate highways were included in the sample. Of these, the majority of crashes (54.2%) occurred on I-75 and I-94. Crashes were least frequent on bypass routes, such as I-194 and I-375. The Michigan UD-10 police crash report form classifies roads into four area type categories: 1) Rural (population is less than 5,000); 2) Small Urban (urban cluster population is 5,000 - 49,999); 3) Small Urbanized (population is 50,000 - 199,999); and 4) Large Urbanized (population is 200,000 or more). Approximately 70 percent of crashes occurred in a large urbanized area. Approximately 78 percent of the crashes occurred on weekdays, and almost 70 percent happened during clear or cloudy weather conditions. Of all crashes, about nine percent involved more than two vehicles. In this study, crashes were categorized by time of occurrence into two groups: those that occurred during peak hours (06:00 to 09:00 and 15:00 to 19:00) and those that occurred during off-peak hours (09:01 to 14:59 and 19:01 to 05:59). The peak hours were determined based on the 2019 MDOT Freeway Congestion & Reliability Report. Based on the data, almost 53 percent of crashes happened during off-peak hours. Overall, the summary statistics show that approximately 78 percent of crashes occurred on roads with a 70-mph posted speed limit.
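To make the link between the indicator variables summarized above and Equation 7 concrete, the sketch below evaluates the logit for a single crash and converts it to a probability. The intercept and coefficients are hypothetical values chosen for illustration, not the estimates from this study, and the function names are assumptions.

```python
import math


def secondary_probability(intercept, coefficients, indicators):
    """Evaluate Equation 7 for one crash and return P_i.

    coefficients: {variable name: beta}; indicators: {variable name: 0 or 1}.
    Variables missing from `indicators` are treated as 0 (baseline category).
    """
    logit = intercept + sum(beta * indicators.get(name, 0)
                            for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical parameters, for illustration only.
INTERCEPT = -3.0
COEFFICIENTS = {"snow": 0.7, "two_units": 1.6}

p_base = secondary_probability(INTERCEPT, COEFFICIENTS, {})
p_snow = secondary_probability(INTERCEPT, COEFFICIENTS, {"snow": 1})

print(f"baseline probability: {p_base:.4f}")
print(f"probability in snow:  {p_snow:.4f}")
print(f"odds ratio for snow:  {odds(p_snow) / odds(p_base):.3f}")
```

Because the model is linear in the log-odds, switching one indicator from 0 to 1 multiplies the odds by the exponential of its coefficient, whatever the values of the other variables.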
4.1.2 Analysis and Result of Logistic Regression Model

Estimation results for the logistic regression model for secondary crashes are shown in Table 4-2. All of the factors listed in the previous section were included in the initial model. The model was then tested to determine the significant variables. Variables were considered significant at the 0.05 level.

Table 4-2: Logistic regression model results for secondary crash likelihood

| Variable | Estimate | SE | P-value |
|---|---|---|---|
| Intercept | -3.850 | 0.141 | < 0.001 |
| I-94 (baseline) | | | |
| I-69 | -0.047 | 0.120 | 0.697 |
| I-75 | -0.151 | 0.075 | 0.045 |
| I-96 | -0.183 | 0.076 | 0.017 |
| I-196 | -0.191 | 0.120 | 0.111 |
| I-275 | -0.491 | 0.193 | 0.011 |
| I-296 | 0.057 | 0.230 | 0.806 |
| I-475 | -0.161 | 0.271 | 0.551 |
| I-496 | 0.137 | 0.177 | 0.438 |
| I-194, I-375, I-675 | -0.682 | 0.402 | 0.090 |
| I-696 | -0.026 | 0.111 | 0.813 |
| Urban areas - Rural (baseline) | | | |
| Urban areas - Small Urban and Small Urbanized | 0.023 | 0.102 | 0.823 |
| Urban areas - Large Urbanized | -0.102 | 0.088 | 0.246 |
| Emergency medical services involved | 1.122 | 0.201 | < 0.001 |
| Off-Peak hour (baseline) | | | |
| Morning Peak hour | -0.098 | 0.067 | 0.142 |
| Evening Peak hour | -0.459 | 0.063 | < 0.001 |
| Weekend (baseline) | | | |
| Weekdays | 0.026 | 0.065 | 0.690 |
| Number of units - 1 (baseline) | | | |
| Number of units - 2 | 1.677 | 0.081 | < 0.001 |
| Number of units - more than 2 | 2.206 | 0.100 | < 0.001 |
| Crash Severity - No injury (baseline) | | | |
| Crash Severity - Fatal injury | 0.739 | 0.306 | 0.016 |
| Crash Severity - Suspected Serious Injury | 0.171 | 0.192 | 0.373 |
| Crash Severity - Suspected Minor Injury | 0.042 | 0.121 | 0.726 |
| Crash Severity - Possible injury | 0.008 | 0.074 | 0.911 |
| Weather Condition - Clear and cloudy (baseline) | | | |
| Weather Condition - Rain | 0.227 | 0.079 | 0.004 |
| Weather Condition - Snow | 0.738 | 0.066 | < 0.001 |
| Weather Condition - Other | 0.103 | 0.203 | 0.612 |
| Number of lanes - 2 (baseline) | | | |
| Number of lanes - 3 | -0.365 | 0.072 | < 0.001 |
| Number of lanes - 4 | -0.465 | 0.082 | < 0.001 |
| Number of lanes - 5 | -0.792 | 0.082 | < 0.001 |
| Relationship of the crash to the roadway - On the Road (baseline) | | | |
| Relationship of the crash to the roadway - Median | 0.123 | 0.166 | 0.459 |
| Relationship of the crash to the roadway - Shoulder | 0.314 | 0.118 | 0.008 |
| Relationship of the crash to the roadway - Outside of Shoulder/Curb | -0.131 | 0.169 | 0.440 |
| Relationship of the crash to the roadway - Other | -0.527 | 0.464 | 0.256 |
| Speed Limit - 55 mph (baseline) | | | |
| Speed Limit - 60-65 mph | 0.679 | 0.149 | < 0.001 |

The results show that the probability of secondary crash occurrence is lower in peak hours than in off-peak hours. In addition, the likelihood of secondary crash occurrence is higher in the morning peak (06:00 to 09:00) than in the evening peak (15:00 to 19:00). This result is consistent with the findings of Vlahogianni et al. (2010), who found that during peak periods the influence of a crash is most likely to grow both temporally and, especially, in the upstream traffic direction. Moreover, as crash duration grows, an extended response and clearance time may substantially raise the likelihood of a secondary crash (Vlahogianni et al., 2010). However, a few other studies found peak hours to be an insignificant factor in the possibility of secondary crash occurrence (Khattak, Wang and Zhang, 2009; Xu et al., 2016; Sarker et al., 2017). One reason could be the drop in speed during peak hours. Based on the results in Table 4-2, if all other factors are held fixed, secondary crashes are more likely to occur when two or more vehicle units are involved in the crash. Previous studies show mixed findings on this point. The result is consistent with Zhan et al. (2008) and Kopitch and Saphores (2011), in which the number of vehicles is a significant factor in the likelihood of secondary crashes (Zhan et al., 2008; Kopitch and Saphores, 2011). Khattak et al. (2009) proposed three binary probit models to examine the interdependence between primary crash duration and secondary crash occurrence.
Their findings showed that primary crash duration, AADT, and the number of involved vehicles positively affect the likelihood of secondary crashes (Khattak, Wang and Zhang, 2009). However, a few other studies do not support this finding (Vlahogianni, Karlaftis and Orfanou, 2012; Park and Haghani, 2016a; Park, Gao and Haghani, 2017). The results show that secondary crashes are more strongly associated with injury crashes. Also, the likelihood of secondary crash occurrence is higher when the primary crash results in a fatality. One possible reason is that a fatal crash is likely to have a greater effect on freeway traffic flow, leading to a higher likelihood of multiple secondary crashes. Based on the results, secondary crash likelihood is higher on weekdays and lower on weekends. This result is inconsistent with the finding of a previous study (Xu et al., 2016). Also, the likelihood of secondary crashes increases in rainy and snowy weather conditions, which is consistent with previous studies (Khattak, Wang and Zhang, 2011; Mishra et al., 2016; Wang, Liu, et al., 2016). In particular, the possibility of secondary crash occurrence is higher in snowy weather. One reason could be that bad weather reduces visibility and the friction between pavement and tires. Therefore, drivers have less time and space to take crash avoidance maneuvers. The chance of secondary crash occurrence is highest on roads with two lanes. The results show that the probability of secondary crash occurrence decreases as the number of lanes increases. One possible reason is that, with more lanes, vehicles can avoid secondary crashes by changing lanes. This result is consistent with the findings of Sarker et al. (2017) and Zhan et al. (2008), where the number of lanes was found to be one of the key variables affecting secondary crash likelihood, whereas in the studies by Park and Haghani (2016) and Park et al. (2017) the number of lanes was found to be negatively related to secondary crash occurrences (Zhan et al., 2008; Park and Haghani, 2016a; Park, Gao and Haghani, 2017; Sarker et al., 2017). The results in Table 4-2 show that secondary crashes are more likely to occur in the median and on the shoulder of the road. The likelihood of secondary crash occurrence is higher on roads with 60 mph and 65 mph speed limits. This could be because, at higher speed limits, approaching vehicles do not have enough time to brake and avoid secondary crashes. This finding is consistent with the results of a previous study in which speed was a significant factor affecting secondary crash likelihood. That study found that segments with a higher posted speed limit (>55 mph) incur more secondary crashes than roads with lower speed limits (Sarker et al., 2017).

4.2 Negative Binomial Model

In addition to distinguishing the factors associated with secondary (as compared to primary) crashes, further insights can be obtained by examining how frequently secondary crashes occur on individual road segments. As crash frequencies on a given road segment are non-negative integers, count data models such as the negative binomial represent an appropriate analysis framework. Within the context of this study, the probability of $y_i$ secondary crashes occurring on interstate segment $i$ during a specific year of the analysis period is given by Equation 8:

$$P(y_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} \tag{8}$$

where $\lambda_i$ is the average number of secondary crashes for segment $i$.
$\lambda_i$ is a function of various site-specific characteristics, as shown in Equation 9:

$$\lambda_i = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon_i) \tag{9}$$

where $X_1$ to $X_k$ are a series of independent variables (e.g., traffic volumes, geometric characteristics, number of lanes), $\beta_1$ to $\beta_k$ are a series of parameters estimated from the regression model, and $\exp(\varepsilon_i)$ is a gamma-distributed error term with mean equal to one and variance $\alpha$.

4.2.1 Data Summary

The analysis used interstate road segments in Michigan. The data were extracted from the sufficiency file provided by the Michigan Department of Transportation (MDOT). The National Functional Classification (NFC) code was used to filter the interstate road segments. The NFC code classifies each street and highway based on its primary function. The sample contains 1,557 rows, each with a unique segment number and mile point information. Table 4-3 provides descriptive statistics for the segments included in the final database. The curve length percentage describes the geometric characteristics of the segment and is calculated as the length of the curve within the segment divided by the total segment length. AADT values ranged from 1,830 to approximately 103,000 vehicles per day (vpd), with an average of 30,768 vpd. Of the 1,557 segments, 460 (about 30 percent) contain curves. The right shoulder width is the predominant width, to the nearest foot, of the improved shoulder on the right side of the roadway for divided segments, or on both sides of the roadway for undivided segments. The pavement edge or painted edge line is used as the reference point for determining the shoulder width. The left shoulder width is the predominant width, to the nearest foot, of the improved shoulder on the left side of the roadway for divided segments.
More than half of the segments (almost 52 percent of the sample) are located in a large urbanized area with a population of 200,000 or more. Table 4-3 also includes details about the frequency of crashes within each road segment. Almost half of the crashes happen within interstate road segments on I-75 and I-94. Table 4-3 also provides details of the speed limit on segments where crashes are observed. The data show that approximately 79 percent of crashes occurred on segments with a 70 mph speed limit.

Table 4-3: Descriptive statistics of pertinent variables

Categorical variables are coded as indicators (1 if yes; 0 if no).

| Parameter | Min. | Max. | Mean | Std. Dev. |
|---|---|---|---|---|
| Curve Length Percentage | 0 | 100 | 10.492 | 21.125 |
| Road number on which the crash occurred | | | | |
| I-69 | 0 | 1 | 0.120 | 0.325 |
| I-75 | 0 | 1 | 0.252 | 0.434 |
| I-96 | 0 | 1 | 0.173 | 0.379 |
| I-94 | 0 | 1 | 0.268 | 0.379 |
| I-196 | 0 | 1 | 0.061 | 0.239 |
| I-275 | 0 | 1 | 0.025 | 0.156 |
| I-296 | 0 | 1 | 0.005 | 0.072 |
| I-475 | 0 | 1 | 0.021 | 0.142 |
| I-496 | 0 | 1 | 0.019 | 0.135 |
| I-194, I-375, I-675 | 0 | 1 | 0.016 | 0.126 |
| I-696 | 0 | 1 | 0.040 | 0.196 |
| Speed limit | | | | |
| 55 - 65 mph | 0 | 1 | 0.087 | 0.281 |
| 70 mph | 0 | 1 | 0.789 | 0.408 |
| 75 mph | 0 | 1 | 0.125 | 0.330 |
| Width of the shoulder on the right side of the roadway | | | | |
| Right shoulder width - 0 to 10 ft | 0 | 1 | 0.690 | 0.462 |
| Right shoulder width - 11 to 14 ft | 0 | 1 | 0.310 | 0.462 |
| Width of the shoulder on the left side of the roadway | | | | |
| Left shoulder width - 0 to 8 ft | 0 | 1 | 0.640 | 0.360 |
| Left shoulder width - 9 to 17 ft | 0 | 1 | 0.360 | 0.480 |
| Predominant type of median for divided segments | | | | |
| Concrete barrier | 0 | 1 | 0.347 | 0.476 |
| Guardrail, graded with ditch | 0 | 1 | 0.653 | 0.476 |
| Urban areas designated through FHWA | | | | |
| Rural (population is less than 5,000) | 0 | 1 | 0.274 | 0.446 |
| Small Urban (urban cluster population is 5,000 - 49,999) | 0 | 1 | 0.072 | 0.258 |
| Small Urbanized (population is 50,000 - 199,999) | 0 | 1 | 0.138 | 0.345 |
| Large Urbanized (population is 200,000 or more) | 0 | 1 | 0.516 | 0.500 |
| Total number of lanes at the site of the crash | | | | |
| Two | 0 | 1 | 0.550 | 0.498 |
| Three | 0 | 1 | 0.336 | 0.472 |
| Four | 0 | 1 | 0.114 | 0.318 |
| Annual Average Daily Traffic (AADT) | 1,830 | 103,100 | 30,768.214 | 21,924.061 |

4.2.2 Analysis and Result of Negative Binomial

This section presents the results of the negative binomial model that was estimated to investigate secondary crash frequency within each interstate road segment. Parameter estimates are presented for the model, along with the standard errors, z-statistics, and p-values. The model includes the percentage of the segment length that is curved, AADT, median type, speed limit, number of lanes, and shoulder widths. When interpreting the results, a positive parameter estimate indicates that secondary crashes increase as the independent variable increases, and the converse is true for negative parameter estimates. Table 4-4 presents the results for total secondary crashes with respect to interstate road segments.
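As a numerical illustration of Equations 8 and 9, the sketch below evaluates a log-linear mean function and the negative binomial probability mass function that results when the Poisson form in Equation 8 is mixed with a gamma-distributed error of mean one and variance α. In that standard parameterization the variance equals λ + αλ². The intercept, the log(AADT) coefficient, and the dispersion value are hypothetical numbers chosen for illustration, not estimates from this study, and the function names are assumptions.

```python
import math


def expected_crashes(intercept, beta_log_aadt, aadt, other_terms=0.0):
    """Mean function in the style of Equation 9 with log(AADT) as a covariate."""
    return math.exp(intercept + beta_log_aadt * math.log(aadt) + other_terms)


def neg_binomial_pmf(y, lam, alpha):
    """P(Y = y) for a negative binomial with mean lam and variance lam + alpha * lam**2."""
    r = 1.0 / alpha  # shape of the gamma mixing distribution
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + lam))
             + y * math.log(lam / (r + lam)))
    return math.exp(log_p)


# Hypothetical parameters, for illustration only.
INTERCEPT = -15.0
BETA_LOG_AADT = 1.5
ALPHA = 0.5

lam = expected_crashes(INTERCEPT, BETA_LOG_AADT, aadt=30_000)
print(f"expected secondary crashes at 30,000 vpd: {lam:.3f}")
for y in range(4):
    print(f"P(y = {y}) = {neg_binomial_pmf(y, lam, ALPHA):.4f}")

# With log(AADT) as the covariate, doubling AADT scales the mean by 2 ** beta.
ratio = expected_crashes(INTERCEPT, BETA_LOG_AADT, 60_000) / lam
print(f"effect of doubling AADT on the mean: x{ratio:.3f}")
```

The last line shows why a coefficient on log(AADT) is often read as an elasticity: a fixed proportional change in traffic volume produces a fixed proportional change in the expected crash count.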
Table 4-4: Model results for total secondary crashes

| Variable | Estimate | SE | z-value | P-value |
|---|---|---|---|---|
| Intercept | -1.543 | 1.085 | -14.228 | < 0.001 |
| I-94 (baseline) | | | | |
| I-69 | -0.097 | 0.149 | -0.651 | 0.515 |
| I-75 | -0.174 | 0.109 | -1.594 | 0.111 |
| I-96 | -0.102 | 0.103 | -0.987 | 0.324 |
| I-196 | 0.233 | 0.159 | 1.461 | 0.144 |
| I-275 | -0.835 | 0.261 | -3.201 | 0.001 |
| I-296 | 0.478 | 0.364 | 1.314 | 0.189 |
| I-475 | -0.235 | 0.317 | -0.740 | 0.459 |
| I-496 | 0.494 | 0.246 | 2.005 | 0.045 |
| I-194, I-375, I-675 | -0.178 | 0.447 | -0.399 | 0.690 |
| I-696 | -0.194 | 0.177 | -1.095 | 0.273 |
| Urban areas - Rural (baseline) | | | | |
| Urban areas - Small Urban | 0.310 | 0.175 | 1.773 | 0.076 |
| Urban areas - Small Urbanized | 0.151 | 0.139 | 1.091 | 0.275 |
| Urban areas - Large Urbanized | 0.324 | 0.124 | 2.625 | 0.009 |
| Speed Limit - 55 - 65 mph (baseline) | | | | |
| Speed Limit - 70 mph | 0.002 | 0.137 | 0.011 | 0.991 |
| Speed Limit - 75 mph | 0.015 | 0.239 | 0.061 | 0.951 |
| Number of lanes - 2 (baseline) | | | | |
| Number of lanes - 3 | -0.378 | 0.117 | -3.230 | 0.001 |
| Number of lanes - 4 | -0.645 | 0.167 | -3.871 | < 0.001 |
| Median - Guardrail, graded with ditch (baseline) | | | | |
| Median - Concrete barrier | 0.325 | 0.100 | 3.247 | 0.001 |
| Right shoulder width - 0 to 10 ft (baseline) | | | | |
| Right shoulder width - 11 to 14 ft | -0.080 | 0.080 | -1.002 | 0.316 |
| Left shoulder width - 0 to 8 ft (baseline) | | | | |
| Left shoulder width - 9 to 17 ft | -0.144 | 0.090 | -1.601 | 0.109 |
| Curve Length Percentage | 0.000 | 0.002 | -0.284 | 0.777 |
| log (AADT) | 1.495 | 0.108 | 13.823 | < 0.001 |

The results in Table 4-4 show that some of the independent variables, such as the curve length percentage and shoulder width, did not exhibit a clear relationship with the total number of secondary crashes. This finding is inconsistent with a previous study showing that curved segments lead to an increased risk of secondary crashes (Zhan et al., 2008). Also, Sarker et al. (2017) found that roads with broad right shoulders (width >14 ft) have fewer secondary crashes than roads with narrow right shoulders.
This is because a sufficiently wide right shoulder allows traffic incident management agencies to manage the incident more effectively without significantly compromising the roadway's capacity (Sarker et al., 2017). Based on the results in Table 4-4, the frequency of secondary crashes shows no relationship with the speed limit of the road segment. This finding is inconsistent with previous studies, in which speed limit was one of the key variables affecting secondary crash likelihood (Karlaftis et al., 1999a; Hirunyanitiwattana, 2006; Sarker et al., 2017). However, several independent variables were shown to correlate strongly with secondary crash frequency. Secondary crash frequency increased on road segments with a concrete barrier median, consistent with the finding of a previous study (Sarker et al., 2017). That study considered two median types, raised and non-raised, and found that roads with a raised median have more secondary crashes than roads without one. Secondary crash frequency is lower on segments with three and four lanes than on segments with two lanes, which is consistent with a previous study in which the number of lanes is among the key variables that affect secondary crash occurrence (Sarker et al., 2017). Based on Table 4-4, the coefficient of the variable Urban areas - Large Urbanized is positive, indicating that the number of secondary crashes increases in large urbanized areas with a population of more than 200,000. The reason could be that a larger population leads to higher traffic volume, which increases the number of crashes and, consequently, the number of secondary crashes. Sarker et al. (2017) analyzed the effect of land use on secondary crash occurrences and found that land use is among the key variables that affect secondary crash occurrences.
The study considered suburban and urban areas, and the results show that the number of secondary crashes is higher in urban areas (Sarker et al., 2017). The results in Table 4-4 show that annual average daily traffic (AADT) is statistically significant, and that the number of secondary crashes increases as AADT increases. One possible explanation is that higher traffic volume implies shorter time headways between vehicles, which leaves drivers less time to take crash avoidance maneuvers when they encounter hazardous situations. This may increase the risk of a secondary crash. This result is consistent with the finding from previous studies that crash risk increases with traffic volume (Khattak, Wang and Zhang, 2009, 2011; Zhang and Khattak, 2011; Mishra et al., 2016; Sarker et al., 2017).

CHAPTER 5. CONCLUSION

Crashes constitute a significant source of traffic congestion, in addition to reducing transportation system reliability and efficiency, particularly on limited-access freeways. The congestion caused by primary crashes often exposes upstream vehicles to a heightened risk of secondary crashes. Therefore, transportation agencies have taken various measures to minimize the potential for such crashes and to mitigate their resultant impacts. Although secondary crashes are relatively infrequent, they constitute a considerable safety concern and significantly impact traffic operations. Despite substantive research efforts, there is still significant uncertainty about the magnitude and nature of secondary crashes. The spatial and temporal impact of primary crashes on the road is closely related to occurrences of secondary crashes. Past studies have proposed manual, static, and dynamic approaches to identify secondary crashes. Static methods have defined secondary crashes based on fixed spatial and temporal thresholds.
In this approach, a fixed spatiotemporal window is assumed with respect to the time and location of the primary crash. However, this approach often overestimates the rate of secondary crashes by classifying all events within these windows as secondary in nature. Furthermore, the static approach applies the same window sizes to all types of primary crashes regardless of the upstream traffic flow, density, and speed. In contrast, the dynamic approach identifies a spatiotemporal impact area for each primary crash that varies based upon traffic flow characteristics. In general, more severe crashes result in greater speed reductions and have impacts that extend further spatially and last longer.
In this work, a large database of all crashes occurring on Michigan Interstate roads in 2018 was manually reviewed to identify actual secondary crashes, and a control set of secondary crashes was defined based on information from police crash reports. The results of the manual approach were then used to assess the accuracy of the static method in identifying secondary crashes. Based on the manual approach, about seven percent of all interstate crashes were recorded by police officers as being secondary in nature. In addition, the role of static window sizes was explored. This study suggests that, while predicting secondary crashes with fixed-size windows yields a significant overestimate, the window-based counts are linearly correlated with the confirmed number of secondary crashes regardless of the window size, traffic flow, density, and speed.
This research further proposed a secondary crash identification method on freeways by tracking the spatiotemporal evolution of traffic flow.
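The static-window comparison summarized above can be illustrated with a minimal sketch. It is not the procedure used in this study: the `Crash` fields, the window sizes, the sample records, and the `verified` set (standing in for the outcome of a manual report review) are all hypothetical, and mileposts are assumed to increase in the direction of travel so that upstream crashes have smaller mileposts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crash:
    crash_id: str
    route: str        # route and direction, e.g. "I-94 EB" (illustrative)
    milepost: float   # assumed to increase in the direction of travel
    minute: float     # minutes since an arbitrary reference time


def static_secondary_ids(crashes, max_miles, max_minutes):
    """Return ids of crashes inside the fixed window of any earlier crash."""
    flagged = set()
    for primary in crashes:
        for other in crashes:
            if other.crash_id == primary.crash_id or other.route != primary.route:
                continue
            lag = other.minute - primary.minute          # time after the primary
            upstream = primary.milepost - other.milepost  # distance upstream
            if 0 < lag <= max_minutes and 0 <= upstream <= max_miles:
                flagged.add(other.crash_id)
    return flagged


def sensitivity_specificity(all_ids, flagged, verified):
    """Compare window-based flags with manually verified secondary crashes."""
    tp = len(flagged & verified)
    fn = len(verified - flagged)
    fp = len(flagged - verified)
    tn = len(all_ids - flagged - verified)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical records, for illustration only.
crashes = [
    Crash("A", "I-94 EB", 10.0, 0),
    Crash("B", "I-94 EB", 9.2, 25),   # upstream of A, shortly after it
    Crash("C", "I-94 EB", 8.5, 50),   # inside A's window but assumed unrelated
    Crash("D", "I-94 EB", 30.0, 40),  # far downstream of A
    Crash("E", "I-94 WB", 9.5, 30),   # opposite direction
]
verified = {"B"}  # hypothetical result of a manual report review

flagged = static_secondary_ids(crashes, max_miles=2.0, max_minutes=60)
sens, spec = sensitivity_specificity({c.crash_id for c in crashes}, flagged, verified)
print(sorted(flagged), sens, spec)
```

In this toy case the fixed window flags crash C as well as the verified secondary crash B, which is the kind of overestimation the static approach is prone to. Repeating the call over a grid of `max_miles` and `max_minutes` values is one way to examine how sensitivity and specificity change with window size.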
Using a large database of all crashes on interstate roads in Detroit, Michigan, this work proposed a secondary crash identification approach that integrates a speed contour plot with the spatiotemporal evolution of the primary crash impact area. Real-time travel speed data for every 15-minute interval were collected from the Regional Integrated Transportation Information System (RITIS). To identify the crash impact area, the daily speed was compared with the yearly average speed for the corresponding day of the week. For each primary crash, a spatiotemporal speed matrix and the corresponding speed contour plot were constructed for every segment. An area is considered congested when the daily speed is lower than the average speed. If there is an existing crash in the section, the speed reduction is assumed to be due to that crash. Further, if another crash occurs within the primary crash impact area, it is considered a secondary crash.
In addition, the number of secondary crashes identified by the dynamic method is highly dependent on the cut-off speed used to identify periods during which the primary crash introduced non-recurrent congestion. Different scenarios were considered for these threshold values, such as cut-off speeds of 5 mph and 10 mph, as well as reductions of 1, 1.65, 2, and 3 standard deviations below the long-term average speed for each day-of-week/time-of-day combination. The dynamic approach results show that the share of crashes identified as secondary in the Detroit area varies from 3 to 10 percent across these scenarios. The 5-mph cut-off scenario and the 3-standard-deviation scenario were considered the least and the most sensitive, respectively.
Identifying the factors that lead to secondary crashes is the first step toward preventing their occurrence. Existing studies have used several statistical models to analyze the risk of secondary crash occurrence.
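The dynamic identification steps described above (comparing observed speeds with long-term averages under a chosen cut-off, then checking whether a later crash falls inside the resulting impact area) can be sketched as follows. This is a simplified illustration rather than the implementation used in this research: the function names and speed values are hypothetical, and treating the impact area as the set of congested cells connected to the primary crash cell is an assumption made here for the example.

```python
def impact_mask(speeds, mean_speeds, std_speeds, cutoff_mph=None, n_std=None):
    """Flag (segment, interval) cells whose speed is below the chosen threshold.

    Exactly one of cutoff_mph (fixed drop below the mean) or n_std
    (number of standard deviations below the mean) must be given.
    """
    if (cutoff_mph is None) == (n_std is None):
        raise ValueError("give exactly one of cutoff_mph or n_std")
    mask = []
    for row_v, row_m, row_s in zip(speeds, mean_speeds, std_speeds):
        mask_row = []
        for v, m, s in zip(row_v, row_m, row_s):
            drop = cutoff_mph if cutoff_mph is not None else n_std * s
            mask_row.append(v < m - drop)
        mask.append(mask_row)
    return mask


def impact_area(mask, start):
    """Congested cells connected to the primary crash cell (assumed rule).

    Cells are (segment index, 15-minute interval index); only intervals at
    or after the primary crash interval are considered.
    """
    rows, cols = len(mask), len(mask[0])
    if not mask[start[0]][start[1]]:
        return set()
    area, stack = {start}, [start]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and start[1] <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in area:
                area.add((nr, nc))
                stack.append((nr, nc))
    return area


# Hypothetical speeds (mph): rows are segments ordered upstream to downstream,
# columns are consecutive 15-minute intervals.
speeds = [
    [66, 64, 52, 63],
    [65, 50, 48, 58],
    [64, 40, 45, 61],
]
mean_speeds = [[65] * 4 for _ in range(3)]  # long-term average per cell
std_speeds = [[5] * 4 for _ in range(3)]    # long-term standard deviation

primary_cell = (2, 1)    # primary crash on the downstream segment, 2nd interval
candidate_cell = (1, 3)  # later crash on the adjacent upstream segment

area_5mph = impact_area(impact_mask(speeds, mean_speeds, std_speeds, cutoff_mph=5), primary_cell)
area_10mph = impact_area(impact_mask(speeds, mean_speeds, std_speeds, cutoff_mph=10), primary_cell)
area_3std = impact_area(impact_mask(speeds, mean_speeds, std_speeds, n_std=3), primary_cell)

print(candidate_cell in area_5mph, candidate_cell in area_10mph)
```

With these made-up values the candidate crash lies inside the impact area under the 5-mph cut-off but outside it under the 10-mph cut-off, which illustrates why the number of crashes classified as secondary depends on the threshold scenario.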
The current research adopted logistic regression and negative binomial models to identify characteristics that distinguish secondary from primary crashes. The proposed methodological approach and research findings provide insights into the effects of traffic conditions, geometric characteristics, weather conditions, and primary crash characteristics on the probability of multiple secondary crashes on freeways. The logistic regression model suggests that the number of lanes, weather conditions, posted speed limit, crash severity (particularly crashes resulting in fatal injury), the number of units involved in the crash, and the involvement of emergency medical services are among the key variables associated with secondary crash occurrence. The negative binomial model suggests that annual average daily traffic (AADT), large urbanized areas (with a population of more than 200,000), and segments where median concrete barriers are present are among the key variables associated with secondary crash occurrence. These results provide helpful information for developing policies and strategies to prevent the occurrence of secondary crashes. Moreover, the developed model can be incorporated into advanced traffic control systems on freeways to help mitigate the risk of secondary crashes and to allow agencies to prepare for circumstances under which that risk is elevated.
Based on the comparison of the proposed approach with the static and dynamic methods, the proposed approach is expected to reduce the misidentification of secondary crashes. In addition, the results may support strategies to mitigate secondary crashes, including improved traffic management policies and advanced intelligent transportation warning systems. Because this study examined only 2018 data on interstate roads in the Detroit area, it may not be a comprehensive representation of the whole state.
As such, additional research is warranted to understand differences that may exist on freeways with different traffic and geometric characteristics.
The static and dynamic windows provide a fundamental tool to quantify how the occurrence of a secondary crash is influenced by primary crash severity. The tool could also help determine how quickly information about the occurrence and location of traffic incidents should be relayed to upstream drivers to prevent secondary crashes. A dynamic approach could be used to locate critical times and zones, based on the yearly average speed profile, so that appropriate strategies can be adopted to reduce the risk of secondary crash occurrence. In addition, identifying zones where secondary crashes are likely will allow pre-emptive deployment of responding agencies such as highway patrols, emergency medical services, and towing agencies.
Both static and dynamic methods, the two most common approaches used to define the impact area of the primary crash, have limitations that restrict their practical application. Although the dynamic method has been shown to yield more accurate results, applying it requires real-time traffic data, which are available only in limited locations. On the other hand, the static method, which relies on predefined and fixed spatiotemporal thresholds, does not yield reliable results. Secondary crashes caused by non-crash incidents and the effect of crashes in the opposite traffic direction deserve more investigation.
In summary, the static method may fail to capture the impact area of primary crashes and often overestimates the number of secondary crashes by treating all nearby events as secondary. Dynamic approaches address this limitation by determining the spatiotemporal thresholds of primary crashes based on real-time traffic flow characteristics such as speed and density.
Further investigation of dynamic methods is recommended for future study. A complete understanding of secondary crash characteristics and of contributing factors with respect to traffic, geometric conditions, and crash details can simplify and accelerate the identification of secondary crashes without analyzing individual reports. While most automatic methods for identifying secondary crashes remain limited to spatiotemporal boundary analysis, it has been demonstrated that the dynamic method is substantially more relevant in locations where traffic flow is monitored and recorded. Ultimately, this research provides important insights that can aid road agencies in more proactive management of traffic crashes and other incident clearance activities.
That said, there are some practical limitations, and the following research tasks are recommended as next steps building upon the results of this research:
• The role of prevailing traffic characteristics in secondary crashes should be investigated in greater detail. This study shows that speed reductions have pronounced impacts on secondary crash occurrence. However, additional information, such as traffic volume levels and other measures, may help to further our understanding of these relationships. In general, many secondary crashes occur during congested traffic conditions.
• Conducting additional case studies and varying spatiotemporal thresholds depending on the prevailing traffic conditions is expected to improve the accuracy of the thresholds used in the static model.
• In a dynamic approach, the effects of special events and holidays, road maintenance and its effect on average speed, the percentage of lanes closed, and shoulder blockage should also be investigated.
• In addition, the role of attributes such as work zones, design features, vehicle technology, and pavement conditions in secondary crash occurrence should be investigated, as these factors could affect the average speed in a segment.
BIBLIOGRAPHY
Chang, G.-L. and Steven, R. (2002) ‘Performance Evaluation of CHART (Coordinated Highways Action Response Team) Year 2002 (Final Report)’, University of Maryland, College Park and Maryland State Highway Administration.
Chung, Y. (2013) ‘Identifying primary and secondary crashes from spatiotemporal crash impact analysis’, Transportation Research Record, (2386), pp. 62–71.
Guo, J. et al. (2017) ‘Short-term traffic flow prediction using fuzzy information granulation approach under different time intervals’, IET Intelligent Transport Systems. Institution of Engineering and Technology, 12(2), pp. 143–150.
Hirunyanitiwattana, W. S. P. M. (2006) ‘Identifying secondary crash characteristics for California highway system’.
Imprialou, M. I. M. et al. (2014) ‘Methods for defining spatiotemporal influence areas and secondary incident detection in Freeways’, Journal of Transportation Engineering, 140(1), pp. 70–80.
Jalayer, M., Baratian-Ghorghi, F. and Zhou, H. (2015) ‘Identifying and characterizing secondary crashes on the Alabama state highway systems’, Advances in Transportation Studies, (37), pp. 129–140.
Karlaftis, M. G. et al. (1999a) ‘ITS impacts on safety and traffic management: an investigation of secondary crash causes’, Journal of Intelligent Transportation Systems, 5(1), pp. 39–52.
Karlaftis, M. G. et al. (1999b) ‘ITS Impacts on Safety and Traffic Management: An Investigation of Secondary Crash Causes’, ITS Journal, 5(1), pp. 39–52.
Khattak, A. J., Wang, X. and Zhang, H. (2010) ‘Spatial analysis and modeling of traffic incidents for proactive incident management and strategic planning’, Transportation Research Record, (2178), pp. 128–137.
Khattak, A. J., Wang, X. and Zhang, H. (2011) ‘iMiT: A Tool for Dynamically Predicting Incident Durations, Secondary Incident Occurrence, and Incident Delays’, TRB 90th Annual Meeting Compendium of Papers DVD, (January), pp. 1–17.
Khattak, A., Wang, X. and Zhang, H. (2009) ‘Are incident durations and secondary incidents interdependent?’, Transportation Research Record, (2099), pp. 39–49.
Kitali, A. E., Alluri, P., Sando, T. and Wu, W. (2019) ‘Identification of Secondary Crash Risk Factors using Penalized Logistic Regression Model’, Transportation Research Record, 2673(11), pp. 901–914.
Kitali, A. E., Alluri, P., Sando, T. and Lentz, R. (2019) ‘Impact of Primary Incident Spatiotemporal Influence Thresholds on the Detection of Secondary Crashes’, Transportation Research Record, 2673(10), pp. 271–283.
Kopitch, L. and Saphores, J.-D. M. (2011) ‘Assessing Effectiveness of Changeable Message Signs on Secondary Crashes’, No. 11-427.
Mishra, S. et al. (2016) ‘Effect of primary and secondary crashes: Identification, visualization, and prediction’, No. CFIRE 09-05.
Moore, J. E., Giuliano, G. and Cho, S. (2004) ‘Secondary accident rates on Los Angeles freeways’, Journal of Transportation Engineering, 130(3), pp. 280–285.
Owens, N. et al. (2010) ‘Traffic incident management handbook’, Washington, DC: Federal Highway Administration, Office of Transportation Operations, (Report No. Vol. 9. FHWA-HOP-10-013).
Ozbay, K. and Kachroo, P. (1999) ‘Incident management in intelligent transportation systems’, MA: Artech House Publishers.
Park, H., Gao, S. and Haghani, A. (2017) ‘Sequential interpretation and prediction of secondary incident probability in real time’, No. 17-062.
Park, H. and Haghani, A. (2016a) ‘Real-time prediction of secondary incident occurrences using vehicle probe data’, Transportation Research Part C: Emerging Technologies. Elsevier Ltd, 70, pp. 69–85.
Park, H. and Haghani, A. (2016b) ‘Stochastic Capacity Adjustment Considering Secondary Incidents’. IEEE, 17(10), pp. 2843–2853.
Park, H., Haghani, A. and Hamedi, M. (2013) ‘Quantifying non-recurring congestion impact on secondary incidents using probe vehicle data’, 54th Annual Transportation Research Forum, TRF 2013, (March 2013), pp. 6–17.
Raub, R. A. (1997a) ‘Occurrence of secondary crashes on urban arterial roadways’, Transportation Research Record, (1581), pp. 53–58.
Raub, R. A. (1997b) ‘Secondary crashes: An important component of roadway incident management’, Transportation Quarterly, 51(3), pp. 93–104.
Sarker, A. A. et al. (2015) ‘Development of a Secondary Crash Identification Algorithm and occurrence pattern determination in large scale multi-facility transportation network’, Transportation Research Part C: Emerging Technologies. Elsevier Ltd, 60, pp. 142–160.
Sarker, A. A. et al. (2017) ‘Prediction of secondary crash frequency on highway networks’, Accident Analysis and Prevention. Elsevier Ltd, 98, pp. 108–117.
Skabardonis, A. et al. (1995) Freeway Service Patrol Evaluation. Berkeley, CA: California PATH Research Report. California.
Smith, B. L. and Ulmer, J. M. (2003) ‘Freeway Traffic Flow Rate Measurement: Investigation into Impact of Measurement Time Interval’, Journal of Transportation Engineering, 129(3), pp. 223–229.
Sun, C. C. and Chilukuri, V. (2010) ‘Dynamic incident progression curve for classifying secondary traffic crashes’, Journal of Transportation Engineering, 136(12), pp. 1153–1158.
Sun, C. and Chilukuri, V. (2007) ‘Secondary Accident Data Fusion for Assessing Long-Term Performance of Transportation Systems’, (MTC Project 2005-04), pp. 1–35.
Tedesco, S. et al. (1994) ‘Development of a model to assess the safety impacts of implementing IVHS user services’, IVHS America Annual Meeting. 2 Volumes.
Tian, Y., Chen, H. and Truong, D. (2016) ‘A case study to identify secondary crashes on Interstate Highways in Florida by using Geographic Information Systems (GIS)’, Advances in Transportation Studies, 2.
Vlahogianni, E. I. et al. (2010) ‘Freeway operations, spatiotemporal-incident characteristics, and secondary-crash occurrence’, Transportation Research Record, (2178), pp. 1–9.
Vlahogianni, E. I., Karlaftis, M. G. and Orfanou, F. P. (2012) ‘Modeling the effects of weather and traffic on the risk of secondary incidents’, Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, 16(3), pp. 109–117.
Wang, J., Xie, W., et al. (2016) ‘Identification of freeway secondary accidents with traffic shock wave detected by loop detectors’, Safety Science. Elsevier Ltd, 87, pp. 195–201.
Wang, J., Liu, B., et al. (2016) ‘Modeling secondary accidents identified by traffic shock waves’, Accident Analysis and Prevention. Elsevier Ltd, 87, pp. 141–147.
Wang, Z. and Jiang, H. (2020) ‘Identifying Secondary Crashes on Freeways by Leveraging the Spatiotemporal Evolution of Shockwaves in the Speed Contour Plot’, Journal of Transportation Engineering, Part A: Systems. American Society of Civil Engineers (ASCE), 146(2), 04019072.
Wang, Z., Qi, X. and Jiang, H. (2018) ‘Estimating the spatiotemporal impact of traffic incidents: An integer programming approach consistent with the propagation of shockwaves’, Transportation Research Part B: Methodological, 111, pp. 356–369.
Xu, C. et al. (2016) ‘Real-time estimation of secondary crash likelihood on freeways using high-resolution loop detector data’, Transportation Research Part C: Emerging Technologies. Elsevier Ltd, 71, pp. 406–418.
Yang, B., Guo, Y. and Xu, C. (2019) ‘Analysis of Freeway Secondary Crashes with a Two-Step Method by Loop Detector Data’, IEEE Access. IEEE, 7, pp. 22884–22890.
Yang, H. et al. (2014) ‘Development of online scalable approach for identifying secondary crashes’, Transportation Research Record, 2470(26), pp. 24–33.
Yang, H. et al. (2017) ‘Use of ubiquitous probe vehicle data for identifying secondary crashes’, Elsevier, pp. 138–160.
Yang, H. et al. (2018) ‘Methodological evolution and frontiers of identifying, modeling and preventing secondary crashes on highways’, Accident Analysis and Prevention. Elsevier Ltd, 117, pp. 40–54.
Yang, H., Bartin, B. and Ozbay, K. (2013) ‘Investigating the Characteristics of Secondary Crashes on Freeways’, 92nd Annual Meeting of the Transportation Research Board, Washington, DC, 2.
Zhan, C. et al. (2008) ‘Understanding the characteristics of secondary crashes on freeways’, 87th Annual Meeting of the Transportation Research Board, TRB, No. 08-1835.
Zhan, C., Gan, A. and Hadi, M. (2009) ‘Identifying secondary crashes and their contributing factors’, Transportation Research Record, (2102), pp. 68–75.
Zhang, H., Cetin, M. and Khattak, A. J. (2015) ‘Joint analysis of queuing delays associated with secondary incidents’, Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, 19(2), pp. 192–204.
Zhang, H. and Khattak, A. (2011) ‘Spatiotemporal patterns of primary and secondary incidents on urban freeways’, Transportation Research Record, (2229), pp. 19–27.
Zhang, X. et al. (2020) ‘Identifying secondary crashes using text mining techniques’, Journal of Transportation Safety and Security. Taylor & Francis, 12(10), pp. 1338–1358.
Zhang, X., Green, E. and Chen, M. (2019) ‘Impact of Primary Incident Spatiotemporal Influence Thresholds on the Detection of Secondary Crashes’, Transportation Research Record. Elsevier Ltd, 2673(3), pp. 1–16.
Zheng, D. et al. (2014) ‘Identification of Secondary Crashes on a Large-Scale Highway System’, Transportation Research Record: Journal of the Transportation Research Board, 2432(1), pp. 82–90.
Zheng, D. et al. (2015) ‘Analyses of multiyear statewide secondary crash data and automatic crash report reviewing’, Transportation Research Record, 2514, pp. 117–128.