«Automated Design of Finite State Machine Predictors Timothy Sherwood Brad Calder Department of Computer Science and Engineering University of ...»
We ﬁrst examine using our automated approach to automatically generate a conﬁdence estimation FSM to use for value prediction in a general purpose processor. The results show that we are able to automatically generate conﬁdence estimation FSMs the can achieve higher accuracy for greater coverage than previously proposed saturating up/down counters. More importantly, our framework automatically ﬁnds a Pareto optimal curve of custom FSMs solutions for trading off accuracy versus coverage.
We also present a customized branch prediction architecture that makes use of these custom built ﬁnite state machine predictors. The FSM predictors are generated for branches that are not easily captured by local two-bit counters. These custom state machines take up little area, and can efﬁciently and accurately capture the global correlation behavior of the target application. The global correlation is shown to be captured across input sets and results are presented for a variety of predictor sizes. For all of the programs examined, our custom predictors achieve a misprediction rate less than a general purpose predictor of twice it’s size or more. For two of the programs, the custom branch misprediction rates are lower than general purpose predictors of ﬁve times their size.
References  S. G. Abraham and S. A. Mahlke. Automatic and efﬁcient evaluation of memory hierarchies for embedded systems. In 32nd International Symposium on Microarchitecture, 1999.
 M. Burtscher and B.G. Zorn. Prediction outcome history-based conﬁdence estimation for load value prediction. Journal of Instruction-Level Parallelism, 1, 1999.
 B. Calder, P. Feller, and A. Eustace. Value proﬁling and optimization. Journal of Instruction Level Parallelism, 1999.
 B. Calder and G. Reinman. A comparative survery of load speculation architectures. Journal of Instruction Level Parallelism, 2, 2000.
 B. Calder, G. Reinman, and D. Tullsen. Selective value prediction. In 26th Annual International Symposium on Computer Architecture, June 1999.
 I.-C. Chen, J.T. Coffey, and T.N. Mudge. Analysis of branch prediction via data compression. In Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, October 1996.
 T-F. Chen and J-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 5(44):609–623, May 1995.
 R. J. Eickemeyer and S. Vassiliadis. A load instruction unit for pipelined processors. IBM Journal of Research and Development, 37:547–564, July 1993.
 J. Emer and N. Gloy. A language for describing predictors and its application to automatic synthesis. In 24th Annual International Symposium on Computer Architecture, June 1997.
 G. Ezer. Xtensa with user deﬁned dsp coprocessor microarchitectures. In Proceedings of the International Conference on Computer Design, 2000 (ICCD2000), pages 335–342, September 2000.
 J. A. Fisher, P. Faraboschi, and G. Desoli. Custom-ﬁt processors: Letting applications deﬁne architectures. In 29th International Symposium on Microarchitecture, pages 324–335, December 1996.
 M. J. Flynn and R. I. Winner. Asic microprocessors. In 23th International Symposium on Microarchitecture, pages 237–243, 1990.
 F. Gabbay and A. Mendelson. Speculative execution based on value prediction. EE Department TR 1080, Technion Israel Institue of Technology, November 1996.
 J. Gonzalez and A. Gonzalez. The potential of data value speculation to boost ilp. In 12th International Conference on Supercomputing, 1998.
 R. E. Gonzalez. Xtensa: A conﬁgurable and extensible processor. IEEE Micro, 20(2):60–70, March-April 2000.
 D. Grunwald, A. Klauser, S. Manne, and A. Pleskun. Conﬁdence estimation for speculation control. In 25th Annual International Symposium on Computer Architecture, June 1998.
 T.H. Heil and J.E. Smith. Selective dual path execution. Technical report, University of Wisconsin - Madison, November 1996.
 J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
 E. Jacobsen, E. Rotenberg, and J.E. Smith. Assigning conﬁdence to conditional branch predictions. In 29th International Symposium on Microarchitecture, December 1996.
 A. Klauser, A. Paithankar, and D. Grunwald. Selective eager execution on the polypath architecture. In 25th Annual International Symposium on Computer Architecture, June 1998.
 S. Leibson. Xscale (strongarm-2) muscles in. Microprocessor Report, September 2000.
 M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. In 17th International Conference on Architectural Support for Programming Languages and operating Systems, pages 138–147, October 1996.
 M.H. Lipasti and J.P. Shen. Exceeding the dataﬂow limit via value prediction. In 29th International Symposium on Microarchitecture, December 1996.
 S. Manne, A. Klauser, and D. Grunwald. Pipeline gating: Speculation control for energy reduction. In 25th Annual International Symposium on Computer Architecture, June 1998.
 S. McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 183–191, April 1989.
 S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
 H. Mulder and R. J. Portier. Cost-effective design of application speciﬁc vliw processors using the scarce framework.
In 22th International Symposium on Microarchitecture, 1989.
 E. Musoll. Predicting the usefulness of a block result: a micro-architectural technique for high-performance low-power processors. In 32nd International Symposium on Microarchitecture, November 1999.
 B. Ramakrishna Rau and Michael S. Schlansker. Embedded computing: New directions in architecture and automation.
In 7th International Conference on High-Performance Computing (HiPC2000), 2000.
 R. Razdan and M. D. Smith. A high-performance microarchitecture with hardware-programmable functional units. In 27th International Symposium on Microarchitecture, pages 172–180, 1994.
 G. Reinman and B. Calder. Predictive techniques for aggressive load speculation. In 31st International Symposium on Microarchitecture, 1998.
 R. Rudell and A. Sangiovanni-Vincentelli. Multiple-valued minimization for pla optimization. IEEE Transactions on Computer Aided Design, 6(5):727–750, 1987.
 Y. Sazeides and J. E. Smith. The predictability of data values. In 30th International Symposium on Microarchitecture, pages 248–258, December 1997.
 R. Schreiber, S. Aditya, B.R. Rau, V. Kathail, S. Mahlke, S. Abraham, and G. Snider. High-level synthesis of nonprogrammable hardware accelerators. Technical report, Hewlett Packard Reseach Labs, 2000. HPL-2000-31.
 T. Sherwood and B. Calder. Loop termination prediction. In 3rd International Symposium on High Performance Computing, October 2000.
 T. Sherwood and B. Calder. Automated design of ﬁnite state machine predictors for customized processors. In 28th Annual Intl. Symposium on Computer Architecture, June 2001.
 T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to ﬁnd periodic behavior and simulation points in applications. In International Conference on Parallel Architectures and Compilation Techniques, September 2001.
 T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, October
 T. Sherwood, S. Sair, and B. Calder. Predictor-directed stream buffers. In 33rd International Symposium on Microarchitecture, December 2000.
 J. E. Smith. A study of branch prediction strategies. In 8th Annual International Symposium of Computer Architecture, pages 135–148. ACM, 1981.
 C.D. Snyder. Fpga processors cores get serious. Microprocessor Report, 14(9), September 2000.
 G.S. Sohi, S.E. Breach, and T.N. Vijaykumar. Multiscalar processors. In 22nd Annual International Symposium on Computer Architecture, pages 414–425, June 1995.
 A. Srivastava and A. Eustace. Atom: A system for building customized program analysis tools. In Proceedings of the Conference on Programming Language Design and Implementation, pages 196–205. ACM, 1994.
 J. Tubella and A. Gonzalez. Control speculation in multithreaded processors through dynamic loop detection. In 4th International Symposium on High Performance Computer Architecture, February 1998.
 G. Tyson, M. Farrens, J. Mathews, and A. Pleszken. Managing data caches using selective cache line replacement.
International Journal of Parallel Programming, 25(3), 1997.
 G. Tyson, K. Lick, and M. Farrens. Limited dual path execution. Technical Report CSE-TR 345-97, University of Michigan, 1997.
 S. Wallace, B. Calder, and D.M. Tullsen. Threaded multiple path execution. In 22nd Annual International Symposium on Computer Architecture, June 1998.
 K. Wang and M. Franklin. Highly accurate data value prediction using hybrid predictors. In 30th Annual International Symposium on Microarchitecture, December 1997.
 T.Y. Yeh and Y.N. Patt. A comparison of dynamic branch predictors that use two levels of branch history. In 20th Annual International Symposium on Computer Architecture, pages 257–266, San Diego, CA, May 1993. ACM.
 A. Zhi, A. Moshovos, S. Hauck, and P. Banerjee. Chimaera: A high performance architecture with a tightly-coupled