FREE ELECTRONIC LIBRARY - Online materials, documents

Pages:     | 1 | 2 || 4 | 5 |

«Automated Design of Finite State Machine Predictors Timothy Sherwood Brad Calder Department of Computer Science and Engineering University of ...»

-- [ Page 3 ] --

A stride value predictor [7, 8, 33] keeps track of not only the last value brought in by an instruction, but also the difference between that value and the previous value. This difference is called the stride. The predictor speculates that the new value seen by the instruction will be the sum of the last value seen and the stride. We chose to use the two-delta stride predictor [8, 33], which only replaces the predicted stride with a new stride if that new stride has been seen twice in a row. Each entry contains a tag, the predicted value, the predicted stride, the last stride seen, and a saturating up and down confidence counter. We use a table size of 2K entries for the stride value predictor, which means there are also 2K confidence counters used (one for each table entry). We performed value prediction for only load instructions.

6.2 Value Prediction Confidence There have been several papers that point out the benefit of applying value prediction to various troublesome instructions. It has further been shown that actually predicting the values of these instructions is error prone. It was noted by both Calder et. al. [5, 4] and by Burtscher and Zorn [2] that in order for value prediction to be a viable option, an improved confidence estimation technique is needed. Confidence estimation is a way for the processor to more prudently allocate resources based on whether or not it believes the values predicted are correct or not. In effect, two predictions are made, a value prediction and a meta-prediction as to whether or not the value prediction was correct.

In [5, 4], Calder et al. examined Saturating Up and Down (SUD) counters with several different saturation thresholds and wrong decrement values to vary the accuracy and coverage, and this was done by trial and error.

They showed that no one SUD counter worked best for all programs. A very accurate SUD counter was needed for mispredicted values when using squash recovery to obtain increases in performance, but this resulted in low coverage of potential value predictions. In contrast, when value prediction used re-execution recovery, it did not have to be as accurate, since the miss penalty is small, and the SUD counter could instead concentrate on achieving a high coverage.

6.3 Automatically Designing Confidence Predictors Instead of using a Saturating Up/Down counter for estimating the confidence of the value predictions we propose using our automatically designed FSM predictors to generate the confidence estimation hardware.

Each time a load was executed, we put into the trace, which was used to create the Markov model, whether the load was correctly value predicted (1) or not (0). We then applied the techniques in Section 4 to automatically generate an FSM predictor. This automatically generated FSM provides a confidence prediction each executed load predicting if it will going to be correctly value predicted or not.

We use the same design techniques as described in Section 4, except that instead of targeting a single application we target a suite of general purpose applications. In order to make sure that the predictors generated are not specialized to a specific application, but rather to a suite of applications, we provide cross-trained results. For each application in our suite, we combine the traces from all of the other programs excluding the application to be used for reporting results. So for example, when we show results for gcc, the predictor has been trained on all of the other programs in the suite, but not on gcc.

6.4 Results Figure 2 shows value prediction confidence for saturating up/down counters versus state machine predictors that were designed automatically with cross-training across all benchmarks. The x-axis is the Accuracy of the value predictor when using the various confidence schemes. The accuracy is the percent of value predictions that were marked as confident, that were in fact correct predictions. An accuracy of 100% means that every-time the confidence predictor marked a prediction as confident, it was a correct prediction. The y-axis is the coverage which is defined as the percent of correct value predictions that were allowed through by the confidence predictor.

A coverage of 100% means that all of the times that the value predictor predicted a value correctly, it was marked as confident by the confidence predictor.

Accuracy and coverage trade off with each other. It is possible to get higher accuracy by being conservative and only marking those value predictions that you are very sure are correct as being confident. On the other hand, higher coverage can be had by being more trusting of the value predictor. This tradeoff can be clearly seen in figure 2. The points for the up/down counters are the accuracy and tradeoff points for a variety of counter configurations. The points shown are for counters with a maximum value (number of states) of 5, 10, 20, and 40, miss penalties of 1, 2, 5, 10, and full, and for thresholds of 50% 80% and 90%.

This is to be compared to the lines in Figure 2 that show the coverage and accuracy tradeoffs that can be made by using automatically designed state machines with history lengths of size 2 to 10. These results indicate that there is a significant amount of coverage to be gained for any given accuracy. Take gcc for example, at a target accuracy of 80%, the best configuration of saturating up-down counter gets a coverage of less than 10%. In comparison, our automatically designed state machine (derived from cross-training) is capable of covering 20% of correct predictions, which would result in a more than twice as many correct value predictions at the same accuracy.

It is interesting to note that our automatically generated FSM predictors converge with the saturating updown counter results for extremely high accuracy requirements. This is because, in order to achieve the highest accuracies possible, the best solution is to simply wait for very long sequences of correct predictions before becoming confident. Being this conservative comes at price though, which is very low coverage.

7 Branch Predictors for Customized Processors In this section we examine applying our automated approach to designing FSM predictors to the customization of branch predictors for customized processors. We begin with a discussion of customized processors and describe our customized branch prediction architecture, and then the training approach we use for building the customized FSM branch predictors. We then evaluate the performance and area of the customized predictor and compare it to a range of prior branch predictors.

7.1 Customized Processors The rapidly growing embedded electronics industry demands high performance, low cost systems with increased pressure on design time. The gap between the two current methods for dealing with such systems, ASICs and off

–  –  –

40% 40% 20% 20%

–  –  –

40% 30% 20% 20% 10%

–  –  –

Figure 2: Value Prediction Confidence for saturating up/down counters versus state machine predictors that were designed automatically from cross-training across benchmarks. The x-axis is the Accuracy of the value predictor when using the various confidence schemes. The y-axis is the percent of correct value predictions that were allowed through by the confidence predictor.

the-shelf processors, leave many unsatisfied with the trade offs between long design cycles and lower performance.

The designer could choose to design a custom ASIC, which has the advantages of high performance at the cost of long design and verification times. An alternative choice is an off the shelf embedded processor. Embedded processors allow rapid development times and low development cost in terms of both time and money, but with performance typically lagging behind ASICs.

To address this problem there is an emerging technique which will add a new and important point in the spectrum of solutions. Automatically generated customized processors have the promise of delivering the needed performance with only slightly inflated design time.

A customized processor is a processor tailored to an individual application or a set of applications. The idea became a common research subject in the late eighties and early nineties with projects such as SCARCE [27] and The Architect’s Workbench (AWB) [12]. SCARCE was a flexible VLSI core for building application specific processors. AWB was a system built to help the designers evaluate design tradeoffs for building embedded processors.

Currently there is a resurgence of interest customized processors with many research projects and companies working in this domain. A great deal of the interest comes from increased system level integration, where it is increasingly common to place different parts of a system all onto one chip. To support this it is now common for vendors to sell descriptions of processor cores rather than actual silicon itself. These processor core descriptions can then be customized for a given application.

There are two major approaches to customized processors, one approach working towards pre-silicon customization, and the other approach pressing for post-silicon reconfiguration or adaptability. The first approach, which is being adopted by such systems as Hewlett Packard’s PICO project [29, 1, 34] and Tensilica’s Xtensa [15, 10], is intended to produce low-cost high-speed fixed hardware for embedded systems. The other approach attempts to take advantage of post-silicon customization through reconfigurability. Examples of systems that support reconfigurability are Altera’s NIOS [41] processor core and the CHIMEARA chip [50].

There is also another range of freedom for these chips, which is the design level. Some systems, such as CHIMEARA [50], and PRISC [30], concentrate their application tailoring at the level internal to a functional unit.

These systems work by tailoring the processor’s functional units to the application running on it. For example, they might merge commonly executed expressions into a single instruction. Other options are to perform the customization at the architectural level of registers and number of functional units such as Xtensa [15], SCARCE [27] and Lx [11]. This allows the system designer to make high level architectural tradeoffs for the application such as relieving register pressure or removing unused functional units. Still other designs have co-processors for a given application or type of application. Xtensa [10] supports the tight integration of co-processors into an architecture, while PICO [29, 1, 34] automatically generates co-processors in the form of a custom designed systolic array.

Even though there is a long history and many different projects, all of the projects have a similar high level overview. Begin with a configurable or parameterizable architectural template and customize it to fit the application at hand. Because all of these systems target a very specific application or suite of applications, and these applications are under the designers control, any one of these approaches could benefit from the techniques we present.

7.2 Customized Branch Prediction Architecture It is increasingly common for current embedded processors to have branch prediction. For example, Intel’s XScale (StrongARM-2) processor [21] has a 128 entry Branch Target Buffer (BTB), and each entry in the BTB has a 2bit saturating counter which is used for branch prediction. Our goal is to build a branch predictor for a given application that will have the performance of a large general purpose predictor, with about the same area that is already being used by embedded branch predictors.

Our approach is to take a baseline predictor such as the local 2-bit counters from the XScale architecture, and extend this with custom designed FSM predictors for the hard to predict branches. We use the standard two bit counters for most branches and use the custom FSM predictors for branches that do not work well with the default predictor. In doing this, we limit both the amount of additional area we have to add to get good performance and the amount of code we have to hard-wire into the processor.

The custom branch architecture we propose is seen in Figure 3. We extend XScale’s coupled BTB branch prediction architecture with a set of custom predictors that are hard-wired to particular branches. These custom predictors have the addresses that are associated with them locked down by the system software. While the state machines are fixed in hardware and are targeting specific branches, we wish to allow some software configurability

–  –  –




–  –  –

Figure 3: Customized branch prediction architecture. The architecture contains a traditional coupled BTB with 2-bit saturating up-down counters for conditional branch prediction. In addition, customized branch predictors are added for individual branches. A customized branch entry contains a tag (address), target and a custom FSM predictor for predicting the direction of the branch. The tag is associated with the branch that the FSM predictor was built for and is locked down by the system software.

should a re-compile of the software be needed after fabrication. This will allow the branches to move around in the address space but still use their custom state machines as long as the branch prediction patterns do not change.

The address of the branch is used to index into the BTB as well as the custom predictors. The custom branch entries perform a fully associative tag lookup to search for a match. If there is a match in the standard table then the two-bit saturating up-down counter is used to predict the bias of the branch. If instead there is a match in the custom table then the output on the current state of the corresponding state machine is used for prediction. In the next section we describe how we generated the FSM predictors hard-wired into this architecture.

Pages:     | 1 | 2 || 4 | 5 |

Similar works:

«Self-Regulating Software: Scalable, Predictable, Composable Distributed Systems Calton Pu Center for Experimental Research in Computer Systems Georgia Institute of Technology Atlanta, GA 30332-0280 Abstract Main Ideas: The difficulties in building scalable, predictable, and composable distributed systems are well known. We introduce the notion of self-regulating software to address these difficulties. A software module is self-regulating if it is able to satisfy a set of explicitly defined...»

«Case 14-22582 Doc 12 Filed 06/18/14 Entered 06/18/14 21:43:59 Desc Main Document Page 1 of 10 LOWENSTEIN SANDLER LLP Kenneth A. Rosen, Esq. Steven M. Skolnick, Esq. S. Jason Teele, Esq. Nicole Stefanelli, Esq. Shirley Dai, Esq. Anthony De Leo, Esq. 65 Livingston Avenue Roseland, New Jersey 07068 (973) 597-2500 (Telephone) (973) 597-2400 (Facsimile) Proposed Counsel to the Debtors and Debtors-in-Possession UNITED STATES BANKRUPTCY COURT DISTRICT OF NEW JERSEY In re: Chapter 11 Kid Brands, Inc.,...»

«Wanted: a Rector for. St. Margaret’s Church, Bagendon St. John the Evangelist, Elkstone All Saints, North Cerney THE CHURN VALLEY BENEFICE St. Mary the Virgin, Cowley St. Peter’s Church, Stratton St. Peter’s Church Rendcomb St. Mary Magdalene, Baunton St. Giles, Coberley St. James’ Church, Colesbourne The Churn Valley Benefice Statement The Churn Valley Benefice is looking for a new incumbent who will befriend us and lead us to minister in collaborative and creative ways across the...»

«441 BODENHAM PARISH COUNCIL Monday 1st February 2010 at Siward James Centre, at 7.30 pm Present: Cllr Tilford, Cllr Mullenger, Cllr Mitcheson. Cllr Davis, Cllr Ling, Cllr Knott, Cllr Herbert, Cllr Mrs Avery and Cllr Clark. Cllr Grumbley was also in attendance. There were 7 members of the public present at the start of the Meeting. Police Update.• PCSO Stephanie Annette was present and was invited to address the Meeting. She mentioned a recent burglary at Bodenham and appealed for residents to...»

«Disability Equality Training Trainers Guide by K Gillespie-Sells SRN; RNI; Dip N; Cert Ed and J Campbell B.A. (Hons); M.A. Joint Heads of Training LBDRT Published by: CENTRAL COUNCIL FOR EDUCATION AND TRAINING IN SOCIAL WORK Derbyshire House St. Chad's Street London WC1 8AD First Published: June 1991 ISBN: 0 904488 896 © Central Council for Education & Training in Social Work Designed by CCETSW and LBDRT staff Acknowledgements for illustrations: Colin Wheeler Manchester Disability Forum...»

«Tithe An Oireachtas An Comhchoiste um Dhlí agus Ceart, Comhionnanas, Cosaint agus Cearta na mBan An Tuarascáil Deiridh maidir leis an Tuarascáil ón gCoimisiún Fiosrúcháin Neamhspleách faoi Bhuamáil Bhaile Átha Cliath agus Mhuineacháin Márta 2004 Houses of the Oireachtas Joint Committee on Justice, Equality, Defence and Women’s Rights Final Report on the Report of the Independent Commission of Inquiry into the Dublin and Monaghan Bombings. March 2004 TABLE OF CONTENTS Chairman’s...»

«ESSR’14: 17th European Symposium on Radiopharmacy and Radiopharmaceuticals Pamplona, 25 April 2014 Preclinical safety testing of diagnostic and therapeutic radiopharmaceuticals regulatory requirements Rex FitzGerald SCAHT / Universität Basel Klingelbergstrasse 61 4056 Basel www.scaht.org Contents of talk • General preclinical safety testing requirements (ICH) • US and EU radiopharmaceutical regulations • ICH microdose approach • Swissmedic summary recommendations ESSR’14 /...»

«THE UNEQUAL YOKE Page 1 of 4 THE UNEQUAL YOKE Dear Ted, I am an elder in a church in Australia that runs along very similar lines to PBC. We are a new church and are encountering the problem of Christians dating unbelievers then moving into a more serious relationship. We have even had to apply church discipline to a couple who had moved in together (with success after much pain). Can you give me some scriptures other than the unequal yoke to help advise these people? We are serious about...»

«Vol. 89 (1996) No. 2 ACTA PHYSICA POLONICA A Proceedings of the II hit. School and Symposium on Physics in Materials Science, Jaszowiec 1995 REVIEW OF SURFACE RELAXATION AND RECONSTRUCTION PHENOMENA P. ZIELIŃSKI Institute of Nuclear Physics, Radzikowskiego 152, 31-342 Kraków, Poland The basic definitions concerning structure of surface are given. The principal experimental methods of the observation of surfaces are reviewed. A possibly heuristic explanation is given of principal theoretical...»

«2 Seat Trailer Owner’s manual and safety instructions Owner’s Manual Every effort has been made to ensure your trailer is of top quality and proven safe design, ready to provide you with many years of safe, happy use. Please read this manual and instructions fully before assembly and use. Please keep this manual for future reference. If you sell or give this product to someone else, please include this Owner’s Manual, and ask the new owner to read the instructions thoroughly before using....»



<<  HOME   |    CONTACTS
2017 www.thesis.dislib.info - Online materials, documents

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.