# «Spatial, Temporal and Size Distribution of Freight Train Delays: Evidence from Sweden Niclas A. Krüger* a,b,c Inge Vierth a,b Farzad Fakhraei ...»

#### 4.2.1 Rank-size distribution of stations

In this part we examine arrival delays (at final destinations) by their aggregate size per station in order to find how delay proneness is distributed among terminal stations. Double-logarithmic rank-size distributions have been used to characterize and analyze many phenomena, because the plot reveals a linear pattern if the data follow the discrete version of the power law. To compute the rank-size distribution, we first rank the stations by total delay in descending order and then calculate the natural logarithm of both the rank and the delay size. Figure 5 ranks the 20 stations with the largest arrival delays. The station with the largest cumulative delay is assigned rank 1, which is zero on the logarithmic scale.

**Figure 5: Rank-size distribution for both years**

A double-log model is fitted to the data via OLS regression, and the very high R-squared shows that the rank-size distribution is well described by the estimated linear relationship.
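The ranking and double-log fit described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `ols_line` and `rank_size_fit` are hypothetical, and the station delays are synthetic values generated from an exact power law so that the recovered slope is known in advance.

```python
import math

def ols_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared

def rank_size_fit(total_delays):
    """Rank sizes in descending order and fit ln(size) = a + b*ln(rank)."""
    sizes = sorted(total_delays, reverse=True)
    log_rank = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    log_size = [math.log(size) for size in sizes]
    return ols_line(log_rank, log_size)

# Synthetic (hypothetical) cumulative delays in hours for 20 stations,
# generated as size = 5000 * rank**-0.8 and passed in unsorted order.
station_delays = [5000.0 * rank ** -0.8 for rank in range(20, 0, -1)]

intercept, slope, r_squared = rank_size_fit(station_delays)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

Rank 1 maps to ln(1) = 0, so the fitted intercept is the log of the predicted delay at the top-ranked station.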

Table 2 ranks the 5 stations most prone to delay out of 270 stations in total. As can be seen, the rankings differ somewhat between 2008 and 2009. Hallsberg marshaling yard has the largest cumulative delay time in both years, followed by Borlänge, but the lower ranks change between the two years, both in the order of stations and in which stations appear in the top 5. Among the 20 stations most prone to delay we find 7 marshaling yards in 2008 and 6 in 2009, out of 13 marshaling yards in Sweden in total. A possible reason is the high degree of connectivity of the marshaling yards, which receive trains from many different origins. The last column in Table 2 shows how many different origins are linked to each station (in column 1) during the year. In 2008 (2009), 52 (59) origin stations send trains with Hallsberg marshaling yard as their destination and 42 (43) with Borlänge as their destination. The markedly higher number of links associated with Hallsberg and Borlänge makes them central hubs in the rail network. Hallsberg alone experiences 10 percent of all delays, the top 5 stations together experience 25 percent of all delays, and the top 20 stations (out of 270) experience more than 50 percent of all delays. These results show how vulnerable the network is to problems at these highly connected stations. Figure 6 shows the total arrival delays in hours geographically.
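The concentration figures quoted above (share of all delays at the top 1, 5 and 20 stations) are cumulative shares of the ranked totals. A minimal sketch, assuming per-station totals are available as a plain list; the function name `top_k_share` and the 270 station values are hypothetical and were chosen only so that the shares resemble those reported in the text.

```python
def top_k_share(total_delays, k):
    """Fraction of the overall delay carried by the k largest stations."""
    sizes = sorted(total_delays, reverse=True)
    return sum(sizes[:k]) / sum(sizes)

# Hypothetical total delays for 270 stations (arbitrary units).
station_delays = [100.0, 60.0, 40.0, 30.0, 20.0] + [18.0] * 15 + [1.92] * 250

for k in (1, 5, 20):
    print(f"top {k:>2} stations: {top_k_share(station_delays, k):.1%} of all delay")
```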

We expect a high degree of correlation between the total number of freight trains arriving at a certain station and the total arrival delay in that station. In order to explore this issue we use scatter plots and regression analysis (see Figure 7). As expected, there is a linear relationship between the number of trains and total arrival delay that fits relatively well. However, total arrival delay in Hallsberg is higher than expected, given the number of trains arriving. One possible explanation is that passenger trains contribute to congestion in Hallsberg.

**Figure 7: Scatter plot of no. of trains vs. arrival delay in 2008 (left) and 2009 (right)**

#### 4.2.2 Transmission of delays in the freight train network

In this section we explore how a delay propagates within the network. In a heavily used rail network close to its capacity limits, we would expect delays to increase along the trip to the final destination. The simplest way to examine this question is to estimate the impact of the delay at the origin on the delay at the final destination. Figure 8 shows the scatter plot of departure and arrival delays.

**Figure 8: Departure delay vs. arrival delay in 2008 (left) and 2009 (right)**

One difference between the years that becomes visible when plotting the data is that in 2009 the number of trains with no or small departure delay but high arrival delay is smaller than in 2008. The observations of both years were merged. To quantify the relationship between departure delay at the origin and relative arrival time (arrival delay) at the destination, OLS estimation is used.

Distance is included to control for the higher probability of arrival delay on longer trips, where disturbances accumulate (see Equation 4). Table 3 shows the regression results.

A positive intercept in the regression model gives the expected delay at the destination when the departure delay is zero. If the arrival delay increased by one minute for every minute of departure delay, the slope coefficient would equal one. However, the slope coefficient for departure delay is significantly smaller than one, which implies that part of the initial delay is recovered along the trip. This is contrary to the hypothesis outlined above, according to which train delays are subject to a vicious cycle, so that a train that is delayed once becomes even more delayed on the way to its destination.
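The interpretation of the intercept and slope can be illustrated with a small OLS sketch. It assumes a specification that is linear in departure delay and distance, which is how the text describes the regression; the exact form of Equation 4 is not reproduced here. All names (`solve_linear_system`, `ols`) and all train observations are hypothetical, and the synthetic arrival delays are generated without noise so that the coefficients are recovered exactly.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical trains: departure delay (minutes) and trip distance (km).
departure_delay = [0.0, 10.0, 30.0, -5.0, 60.0, 15.0]
distance_km = [100.0, 400.0, 250.0, 600.0, 150.0, 500.0]
# Synthetic arrival delay: 5 min baseline, 70% of the departure delay carried
# over, plus 0.01 min per km (illustrative values, not estimates from Table 3).
arrival_delay = [5.0 + 0.7 * d + 0.01 * km
                 for d, km in zip(departure_delay, distance_km)]

rows = [[1.0, d, km] for d, km in zip(departure_delay, distance_km)]
intercept, slope_departure, slope_distance = ols(rows, arrival_delay)
print(f"intercept={intercept:.2f}, departure={slope_departure:.2f}, "
      f"distance={slope_distance:.3f}")
```

A departure-delay coefficient below one, as in this synthetic example, corresponds to partial recovery of the initial delay; judging whether an estimated coefficient is significantly below one additionally requires its standard error, which this sketch does not compute.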

One explanation is the high number of early arrivals in the data. They imply that the timetable allows early departures because a considerable number of slots are available, which suggests that the network is not heavily utilized in certain parts of the Swedish rail network and during parts of the day (i.e. during the night, when many freight trains run). Hence, it would be fruitful to analyze delay data for passenger trains (which have less slack) in order to reveal the true nature of delay propagation, but this is outside the scope of the present paper.

### 4.3 Temporal delay distribution

In this section we explore the distribution of delays over time, aggregating delays by day. Figure 9 shows the result of this daily aggregation as a time series for 2008 and 2009. The most pronounced difference is the large delays in December 2009, caused by a combination of harsh winter conditions and high demand (many trains) due to the Christmas holiday.

By superimposing the delays for 2008 and 2009 on each other, we can see that there is a high degree of periodicity in the data, showing the same pattern in both years. However, there are deviations from the periodicities. Among the different periodicities (within weeks, months, seasons and years), the weekly variations seem to be most pronounced.

We sum the delays for every day and use the same procedure as described above for delays at stations, that is, we assign rank 1 to the day with the most delay minutes and rank the other days accordingly. Figure 10 shows the rank-size distribution of all days in the data set on a log-log scale. Interestingly, the tail of the rank-size distribution seems to follow a power law. This differs from the delay distributions per train (size distribution) and per station (spatial distribution), which both apparently follow an exponential distribution. Hence, the worst days are very different from a normal day, which might indicate a cascading spread of delays within the network. We can also calculate the probability that an event (a certain total amount of delay on any given day) occurs in the network. There is a small but positive probability that a daily disturbance twice the size of the largest one observed during 2008 and 2009 will occur in a given future year. This probability is determined by the slope of the linear fit to the tail of the rank-size distribution in Figure 10.
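How the tail slope translates into a probability can be sketched as follows. If the tail follows ln(size) = a + b·ln(rank), the rank implied for a given size is the expected number of days at least that large in a period as long as the sample. The fitted values below are hypothetical (not read from Figure 10), the function name `expected_exceedances` is an assumption, and converting the expected count into a yearly probability relies on the additional assumption that such days arrive as a Poisson process.

```python
import math

def expected_exceedances(intercept, slope, size):
    """Rank implied by the fitted tail ln(size) = intercept + slope*ln(rank).

    The implied rank equals the expected number of days with total delay of
    at least `size` in a period as long as the sample.
    """
    return math.exp((math.log(size) - intercept) / slope)

# Hypothetical tail fit: worst observed day has 4000 delay hours, slope -0.5.
intercept = math.log(4000.0)
slope = -0.5
sample_years = 2  # the data cover 2008 and 2009

worst_day = math.exp(intercept)
count_in_sample = expected_exceedances(intercept, slope, 2 * worst_day)
count_per_year = count_in_sample / sample_years

# Assumption: extreme days occur as a Poisson process with this yearly rate.
prob_in_a_year = 1.0 - math.exp(-count_per_year)
print(f"expected days per year: {count_per_year:.3f}, "
      f"P(at least one in a year): {prob_in_a_year:.3f}")
```

A steeper (more negative) slope in the rank-size plot means a heavier tail, so the implied probability of a day twice as bad as the worst observed day rises.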

### 4.4 Capacity utilization and expected delay

In Section 4.2 we saw a strong but imperfect relationship between the amount of delay and the number of trains, that is, more trains simply means more delay. However, as capacity utilization increases with the number of trains, we would expect the average delay per train to increase as well, since congestion leads to knock-on effects. Capacity utilization differs over time, and rail demand exhibits periodic cycles within days, weeks and months. In fact, for the years examined here, the ton-kilometers transported by rail were 17% lower in 2009 than in 2008 due to the economic contraction. In a sense, these exogenous variations allow us to examine the link between capacity utilization and the expected delay per train. All else being equal, we would expect the average delay to increase as capacity utilization increases.

First, we examine the within-week variation (see Figure 11). We expect more delays during workdays, since more commuter passenger trains are running, which increases the chance of train meets that lead to more delay. In addition, more companies send and receive goods on weekdays. Figure 11 shows total arrival delays at terminal stations by day of the week, in relation to the total number of trains. Tuesdays and Wednesdays appear to be the busiest weekdays, which results in a higher total level of delays. As can be seen, even though the number of freight trains decreases by more than half during weekends, the average delay is almost unaffected. The total number of passenger trains is constant across weekdays at circa 230 thousand, whereas during weekends the number falls just below 150 thousand, which corresponds to a decrease in capacity utilization of about 35 percent during weekends. Hence, we would expect a significant dip in average delays since capacity is freed up during weekends, but no such effect can be observed.

Next we aggregate arrival delays over months so that the strong within-week periodicity is filtered from the data. Figure 12 compares the total number of trains and the average delay per train per month. We can see from Figure 12 that the annual trend, which is negative due to the economic downturn of 2009, is more pronounced than the seasonal variation (for example, there is a peak during autumn in both years). Interestingly, even as the total number of freight trains falls during 2009, there is no visible effect on the average delay. No clear-cut relationship between the number of trains and average delay emerges. The correlation coefficient is -0.07, implying at most a slight tendency for delay per train to decrease as capacity utilization increases.
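The correlation between monthly train counts and average delay per train can be computed as a Pearson coefficient. The sketch below is only an illustration of that calculation: the function name `pearson` is an assumption and the twelve monthly values are invented, so the printed coefficient has no bearing on the -0.07 reported above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly aggregates: number of trains and mean delay (minutes).
trains_per_month = [9100, 8800, 9300, 9000, 8700, 8200,
                    7600, 8100, 8900, 9200, 9000, 8000]
avg_delay_min = [21.0, 18.5, 19.0, 22.5, 20.0, 19.5,
                 21.5, 18.0, 20.5, 19.0, 22.0, 23.5]

r = pearson(trains_per_month, avg_delay_min)
print(f"correlation between trains and average delay: {r:.2f}")
```

The same function applies to the hourly comparisons later in this section, with hours of the day instead of months as the unit of observation.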

**Figure 12: Arrival delay and total number of trains aggregated over months**

It could be argued that the weak (and actually negative) link between capacity utilization and average delay arises because traffic decreases in off-peak times but stays the same during peak times, so that the missing link is just an illusion. To investigate this issue we identify peak times by plotting the total number of trains using the rail network at any given hour (see Figure 13). Figure 13 shows that the only peak time is around 6 am in both years. It can be seen that the number of freight trains fell in 2009 during both peak and off-peak hours. Figure 14 analyzes the impact of the decrease in freight transportation demand between 2008 and 2009 on average delay in 2009. In general, there is no evidence in Figure 14 that reductions in traffic demand, whether during peak or off-peak hours, decrease average delay in 2009; the correlation coefficient is -0.31. Hence, the link between capacity utilization and expected delay per train is best described as weak, although extreme delays might bias the average delay estimates upward, which could explain the negative relationship.

We also analyze the relationship between the total number of trains, including passenger trains, and the average delay per freight train in order to control for potential interaction effects, that is, whether passenger trains affect the average delay of freight trains. The correlation coefficient between all train arrivals during a certain time of the day and average delay per freight train is 0.09.

**Figure 14: Average delay in 2009 and reduction in number of trains in percent**

## 5 Implications

### 5.1 Implications for inventory management and the value of transportation time variability

The results have implications for how many parts companies have to keep in buffer stocks in order to provide a sufficient service level. The buffer stock size is also related to the value of transport time variability (VTTV), since in the absence of variability no buffer stock would be needed and hence no additional costs of variability would arise.

A representative company determines a required service level α, defined as the ratio of the number of on-time deliveries to total deliveries. Based on this service level, the company reacts to a stochastic delivery time by holding a buffer stock s, which is a function of the standard deviation (or, more generally, the uncertainty) of transport time: s = f(σ), ceteris paribus. The cost of holding a buffer stock, and hence the cost of variability in transport times, consists of the cost of physically storing the goods and the capital cost of the goods stored. Hence we can compute the cost of variability under certain simplifying conditions.
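One such simplifying condition, used in the worked example of this section, is normally distributed transport time, in which case the buffer stock is the standard normal percentile for the service level multiplied by σ and by demand per unit of time. A minimal sketch under that assumption; the function name `buffer_stock` is hypothetical, and the inputs (4400 units per month, 22 working days, 12 production hours per day, σ of 1 hour, 99% service level) are the figures quoted in the example.

```python
from statistics import NormalDist

def buffer_stock(service_level, sigma, demand_rate):
    """Buffer stock s = z * sigma * q for normally distributed transport time.

    sigma and demand_rate must use the same time unit (here: hours).
    """
    z = NormalDist().inv_cdf(service_level)  # standard normal percentile
    return z * sigma * demand_rate

daily_need = 4400 / 22        # parts per working day
hourly_need = daily_need / 12  # parts per production hour

s = buffer_stock(service_level=0.99, sigma=1.0, demand_rate=hourly_need)
print(f"buffer stock: {s:.1f} parts")
```

Because the buffer stock is linear in σ under this assumption, halving the standard deviation of transport time halves the required buffer.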

More specifically, we illustrate with the following example. A monthly production of 4400 units implies that the monthly need for a certain input is 4400 as well (assuming one input unit per unit of output). If production takes place on 22 working days a month, the daily need is 200 parts. If production is based on just-in-time deliveries, the required service level would be about 99%, that is, 99% of all deliveries should be on time in order to avoid stockout costs. It can be shown (Hansmann, 2006) that if transport time is normally distributed, the required buffer stock is (e is the value of a given percentile for the standardized normal distribution and q is the demand in production per time unit):

$$s = e_{99} \cdot \sigma \cdot q \tag{5}$$

Note that σ and q have to be expressed in the same time unit; hence if the standard deviation is expressed in hours, we need to compute the input demand per hour as well. From the data we can compute that between Malmö and Hallsberg the average planned transport time for rail freight is 6 hours and the standard deviation is 1 hour. Assuming that the company faces this average transport time and variability, the buffer stock becomes (we need to order 100 parts each time if the company produces during 12 hours per day, that is, 200/12 parts per hour):