At the 1% and 5% significance levels, RANK is the best approach and MV-TSMM the second best. At the 10% significance level, MV-TSMM performs slightly better than RANK. It is worth mentioning that while the power of the RANK test appears to be the same in Panels A and B, this is not the case for the MV-TSMM, whose power increases when contaminating events are present. Furthermore, STAR, V-TSMM and BMP cope much better with the presence of contaminating events, showing no severe changes in power. This does not hold for GARCH, whose power falls in the presence of contaminating events.

Taking all the evidence from Tables 1 and 2 together, the traditional BMP and BETA-1 approaches seem to perform well in the presence of contaminating events. Yet, for BETA-1 this comes at a significant cost in terms of reduced power. Like Aktas et al. (2007a) and Kolari and Pynnönen (2010), among others, our analysis reveals that under the simulated setting the RANK test appears to be an attractive alternative, since it seems resilient to the presence of contaminating events and preserves the highest levels of power among the candidate tests in the absence of an event-induced increase in variance. The more elaborate tests of the regime-switching family (STAR, V-TSMM and MV-TSMM) do not seem to outperform the (less complex) traditional tests overall in the absence of an event-induced increase in variance. Nonetheless, Harrington and Shrider (2007) provide theoretical and empirical evidence to support the notion that all events induce variance. Therefore, the following section compares the statistical performance of the battery of tests we employ in a more realistic setting, where we also simulate a variance increase on the event date.

3.1.2. Tests with variance increase in the event window

Table 3 reports rejection rates for different cross-sectional test statistics when an event creates no abnormal returns but increases variance in the event window. Panel A and Panel B of Table 3 show that the RANK test is poorly specified when an event-induced increase in return variance is present. The traditional BMP continues to exhibit relatively good performance in terms of the specification tests, both in the absence and in the presence of contaminating events. Furthermore, for the family of regime-switching models, rejection rates are quite close to the expected ones and do not differ much between Panels A and B, providing supportive evidence that these tests are robust to contaminating events under an event-induced increase in variance.
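The kind of specification check described above can be illustrated with a small Monte Carlo sketch. It is not the simulation design used in this paper. It assumes i.i.d. normal abnormal returns with unit variance in the estimation window, a single event day whose variance is multiplied by a hypothetical factor `var_mult`, and textbook forms of a Corrado-type rank statistic and the BMP statistic (without forecast-error correction), both compared against the normal critical value 1.96. All function and parameter names are illustrative.

```python
import math
import random


def sample_sd(xs):
    """Sample standard deviation with n - 1 in the denominator."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def ranks(xs):
    """1-based ranks of xs (ties are not expected for continuous draws)."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    out = [0] * len(xs)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def rejection_rates(n_sims=500, n_firms=30, n_est=60, var_mult=4.0,
                    abnormal=0.0, crit=1.96, seed=0):
    """Empirical rejection rates of a rank test and a BMP-type test.

    Assumption: abnormal returns are i.i.d. N(0, 1) in the estimation
    window and N(abnormal, var_mult) on the event day (the last day).
    """
    rng = random.Random(seed)
    n_days = n_est + 1
    mid_rank = (n_days + 1) / 2          # expected rank under the null
    event_sd = math.sqrt(var_mult)
    rej_rank = rej_bmp = 0
    for _ in range(n_sims):
        sars = []                        # standardized event-day abnormal returns
        rank_dev = [0.0] * n_days        # cross-sectional mean rank deviation per day
        for _ in range(n_firms):
            est = [rng.gauss(0.0, 1.0) for _ in range(n_est)]
            ar0 = abnormal + rng.gauss(0.0, event_sd)
            sars.append(ar0 / sample_sd(est))
            for t, k in enumerate(ranks(est + [ar0])):
                rank_dev[t] += (k - mid_rank) / n_firms
        # Rank statistic: event-day mean rank deviation over its time-series sd.
        s_k = math.sqrt(sum(d * d for d in rank_dev) / n_days)
        t_rank = rank_dev[-1] / s_k
        # BMP statistic: mean standardized return over its cross-sectional standard error.
        mean_sar = sum(sars) / n_firms
        t_bmp = mean_sar / (sample_sd(sars) / math.sqrt(n_firms))
        rej_rank += abs(t_rank) > crit
        rej_bmp += abs(t_bmp) > crit
    return rej_rank / n_sims, rej_bmp / n_sims
```

Under these assumptions, calling `rejection_rates()` with `abnormal=0.0` gives empirical sizes at a nominal 5% level, and setting `abnormal` to a positive value turns the same routine into a rough power calculation. The rank statistic is scaled by the time-series variability of ranks, which is dominated by estimation-window days, so it does not adapt to a variance jump on the event day. The BMP statistic is scaled by the event-day cross-section.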

Table 4 reports the power analysis for the different test statistics when an event creates an increase in both returns and variance. Comparing the results of Tables 2 and 4, we observe that the presence of an event-induced increase in return variance drastically affects the power of all tests. In particular, the traditional BMP test statistic exhibits a severe reduction in power and appears to be extremely sensitive to an event-induced increase in variance. Similarly, BETA-1 and GARCH perform even worse than BMP and appear to be extremely weak in detecting abnormal performance in the presence of variance-increasing events. In contrast to its severe misspecification in Table 3, RANK shows the highest power among all traditional test statistics. The regime-switching family of models (STAR, V-TSMM and MV-TSMM) nevertheless seems to be the most attractive approach in this investigation, since a comparison of Panel A and Panel B of Table 4 shows a significant increase in the power of these three tests.

Since the performance of the test statistics differs significantly with respect to the specification error (as shown in Table 3), the power analysis tabulated in Table 4 is not directly comparable across the different tests. Therefore, to further scrutinize our findings, we employ a graphical method proposed by Davidson and MacKinnon (1998), namely size-power curves, which allows the comparison of alternative test statistics that have different sizes. The results are presented in Figure 1: Panel A depicts the size-power curves in the absence of contaminating events in the estimation window, whereas Panel B depicts the behavior of the tests in the presence of contaminating events.
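As a rough illustration of how a size-power curve can be built, the sketch below plots (as pairs of numbers) the empirical distribution function of p-values simulated under the alternative against that of p-values simulated under the null, evaluated on a common grid of nominal levels. The function names, the grid and the simulated p-values are assumptions made for this example only and do not reproduce the data behind Figure 1.

```python
import math
import random


def edf(values, x):
    """Empirical distribution function: share of values not exceeding x."""
    return sum(v <= x for v in values) / len(values)


def size_power_curve(p_null, p_alt, grid):
    """Pairs (actual size, power), one per nominal level in grid."""
    return [(edf(p_null, x), edf(p_alt, x)) for x in grid]


def two_sided_p(z):
    """Two-sided p-value of z under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


rng = random.Random(1)
# Hypothetical test statistics: too dispersed under the null (an oversized
# test) and shifted away from zero under the alternative.
p_null = [two_sided_p(rng.gauss(0.0, 1.3)) for _ in range(2000)]
p_alt = [two_sided_p(rng.gauss(2.0, 1.3)) for _ in range(2000)]

grid = [i / 100 for i in range(1, 100)]
curve = size_power_curve(p_null, p_alt, grid)
```

Because power is read against the actual size on the horizontal axis rather than the nominal level, an oversized test gains no advantage from rejecting too often under the null, which is what makes curves for differently sized tests comparable.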

Consistent with the results in Aktas et al. (2007a), the size-power curves of BETA-1 and GARCH reveal that these are the least powerful tests. The widely applied BMP and RANK tests perform much better than BETA-1 and GARCH, yet their performance is markedly inferior to that of the regime-switching family models. In particular, the graphical evidence in Figure 1 strongly supports that the STAR event study model provides the most powerful test statistic overall, since its curve dominates all other test statistic curves, both without and with contaminating events. The MV-TSMM and the TSMM test are the second best choices.

Overall, the empirical evidence so far lends credence to the use of the regime-switching family models, and in particular of the STAR event study model, which appears to be the most accurate and robust method in the presence of both an event-induced increase in return variance and contaminating events.20

20 To further guard against erroneous inferences that may arise in the presence of cross-sectional correlation and non-normal stock returns, we also apply the adjusted BMP test (cBMP) of Kolari and Pynnönen (2010) and the generalized RANK test (GRANK) of Kolari and Pynnönen (2011). Overall, cBMP and GRANK perform better than their original counterparts (i.e., the BMP and RANK tests, respectively). Our empirical results and inferences, however, are unchanged regarding the superiority of the regime-switching models, and in particular of the STAR event study method, over all other test statistics (the same holds true for the analysis that follows in Section 3.2). For the sake of brevity, we omit the results of these two tests from the tables; yet for illustration purposes and completeness, we include their size-power curves in Figures 1 and 2.

3.2. Real sample of M&As data

Undoubtedly, the traditional simulation-based event study approach has been routinely used to investigate the specification and power of standard event study methods in the presence of artificially contaminated events and event-induced increases in variance. It remains interesting, however, to validate empirically whether similar model rankings in terms of specification and power tests are also observed when dealing with real-world situations. Therefore, we focus on actual return generating processes coming from M&As, where estimation period contamination and an event-induced increase in return variance should emerge naturally for firms that engage in this particular corporate activity.

The results from the M&As are presented in Table 5. Panels A and B present the rejection rates without and with an event-induced increase in variance, respectively. Likewise, Panels C and D provide the corresponding power analysis. Overall, we observe that the most powerful approaches are again from the regime-switching family, in particular the STAR, V-TSMM and MV-TSMM models. Under the event-induced increase in return variance setting, these models exhibit rejection rates close to the expected ones, as well as the highest levels of power among all rival test statistics. In general, these results are in line with the ones we observed in the simulated environment under the event-induced increase in return variance case.

Figure 2 depicts the size-power curves with an event-induced return of 1% and an event-induced increase in return variance using the M&As sample. The overall model rankings are very similar to the ones observed with the simulated data. The size-power curves of BETA-1 and GARCH indicate that these are still the least powerful tests with real stock returns. The RANK test performs better than BETA-1 and GARCH, whilst, in contrast to the results we obtain with the simulated data, the BMP test is now significantly inferior to the RANK test and only slightly better than BETA-1 and GARCH. Despite the good behavior exhibited by the RANK test, its performance is markedly inferior to that of the regime-switching family models. In particular, the graphical evidence in Figure 2 supports that the STAR event study model provides the most powerful test statistic overall, since its curve dominates all other test statistic curves. The MV-TSMM and the TSMM test are again the second best choices. A noticeable difference we observe with the real stock returns is that the performance wedge between the regime-switching models and the remaining traditional test statistics is much greater than in the simulated cases.

We further investigate the sensitivity of event study residuals to extreme market conditions. In particular, from the sample of M&As we pick the 20% of deals with either the lowest mean stock return performance or the highest volatility in the estimation window.21 We draw our motivation from prior literature. For instance, Klein and Rosenfeld (1987) suggest that traditional event study methods may suffer from serious deficiencies due to high autocorrelation that may emerge in the time series of the resulting abnormal returns if the event days take place during either bull or bear markets (see also Chiang et al. 2013).

Moreover, Campbell et al. (2001) recognize that an increase in idiosyncratic volatility might affect the inferences of event study analysis, since abnormal event-related returns are largely determined by the volatility of individual stock returns relative to the market. All in all, with this M&As sub-sample analysis we endeavour to clarify whether the specification and power performance of the test statistics we investigate is stable under different market conditions.
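The sub-sample selection described above can be sketched as follows. This is a minimal illustration, assuming that each deal comes with a list of estimation-window returns, and that "lowest mean" and "highest volatility" are measured by the sample mean and sample standard deviation of those returns. The function name, the deal identifiers and the return values are hypothetical.

```python
import statistics


def extreme_subsamples(est_returns, fraction=0.20):
    """Pick the deals with the lowest mean return and the highest volatility.

    est_returns maps a deal identifier to its list of estimation-window
    returns. Returns two lists of identifiers, each holding `fraction`
    of the deals (at least one).
    """
    k = max(1, round(fraction * len(est_returns)))
    by_mean = sorted(est_returns, key=lambda d: statistics.mean(est_returns[d]))
    by_vol = sorted(est_returns, key=lambda d: statistics.stdev(est_returns[d]),
                    reverse=True)
    return by_mean[:k], by_vol[:k]


# Hypothetical estimation-window returns for five deals.
deals = {
    "A": [0.01, 0.02, 0.03],
    "B": [-0.05, -0.04, -0.06],
    "C": [0.10, -0.10, 0.00],
    "D": [0.000, 0.001, -0.001],
    "E": [0.020, 0.020, 0.021],
}
lowest_mean, highest_vol = extreme_subsamples(deals)
```

The two lists are selected independently, so a deal may appear in both. Each sub-sample is then passed to the same specification and power analysis as the full sample.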

Table 6 presents the results for the sub-sample of M&As that exhibit the lowest mean returns in the estimation window. When there is no event-induced increase in variance, the most powerful test in Panel C is the MV-TSMM, yet its rejection rates are not as expected (Panel A). Although STAR is the second most powerful test, its rejection rates (Panel A) are close to the expected ones. The rejection rates of GARCH are similar to those of the other tests, but GARCH is the test with the lowest statistical power.

21 All conclusions regarding the mean return remain unaltered if we instead pick the 20% of deals with the highest mean returns in the estimation window.

BETA-1 is similar to the RANK test regarding rejection rates and power. BMP and V-TSMM have similar rejection rates, but V-TSMM clearly outperforms BMP in terms of power. Panel B (Panel D) reports rejection rates (power) in the presence of an event-induced increase in variance. We notice that the power of all tests is severely reduced in the presence of an event-induced increase in variance. The rejection rates for the STAR and V-TSMM tests are similar to the ones reported in Panel A, and these tests are also the ones with the highest power.

BETA-1 and GARCH are the least powerful tests under the event-induced increase in variance scheme. BMP has similar rejection rates to those in Panel A, but its power is significantly lower in Panel D than in Panel C. The rejection rates of MV-TSMM are similar to those of V-TSMM, but the power of MV-TSMM is lower than that of V-TSMM. The lower power of MV-TSMM relative to V-TSMM is in line with empirical findings such as those in Aktas et al. (2007a), which suggest that the market model parameters are the same under both regimes. However, this need not hold when contaminating events affect both the mean and the variance specification, in which case the power of the MV-TSMM could be higher. Another important observation in the presence of an event-induced increase in variance is that the rejection rates of RANK differ significantly from those of the other tests. This provides additional empirical evidence that the RANK test performs poorly with real stock return data. Overall, the empirical observations in Table 6 support the STAR event study method, which again exhibits reasonable performance under extreme stock returns occurring in the estimation window.

Table 7 presents results for the 20% of M&As that exhibit the highest return volatility in the estimation window. In general, this analysis again reveals that the regime-switching family of models dominates all other tests in terms of specification and power. The power of the STAR model is, however, much higher than that of V-TSMM and MV-TSMM; the differences in power between STAR and the other two models are most pronounced at the 1% significance level. Additionally, the rejection rates of STAR, V-TSMM and MV-TSMM are similar in Panels A and B, which demonstrates empirically that these models perform rather well when an event-induced increase in variance is present.

4. Discussion and conclusions

There is a variety of tests that are robust to an event-induced increase in variance caused by the cross-sectional variation in the effects of an event. According to Harrington and Shrider (2007), all events induce variance and therefore models that are robust to cross-sectional variation must be used.

Using simulated data, we observe that when there is no event-induced increase in variance the RANK test is an attractive approach, whereas with real M&A data the power of the RANK test deteriorates significantly. Furthermore, the RANK test seems to be poorly specified in the presence of an event-induced increase in variance, since in all results the rejection rates of this test statistic are extremely high compared to those of all other tests.

