Automated Design of Finite State Machine Predictors
Timothy Sherwood and Brad Calder
Department of Computer Science and Engineering, University of ...


7.3 Generating Traces to Train FSM Branch Predictors

There are many different types of branch traces one could use to generate FSM predictors; we focus on one in this paper. For the custom branch prediction architecture in Figure 3, traces are generated on a global basis and then used to generate an FSM predictor for individual branches. To generate an FSM for a specific branch, the traces used to train the FSM generator could include the local history of the branch, the global history, or a combination of the two. We examined all of these and found it better to concentrate on capturing global correlation rather than local history patterns: global correlations tend to be more repeatable across different inputs, and our approach is efficient at finding and exploiting global correlations between branches.

The first step in building our custom branch prediction architecture is to profile the application with our baseline predictor, in this case the XScale predictor, which is a bimodal table of 2-bit counters. This identifies the branches causing the greatest number of mispredictions. For each of these branches we generate a Markov model as discussed in Section 4. To generate the Markov models for the branches we are concentrating on, we keep track of a single global history register of length N. When a branch is encountered in the trace, we update that branch's Markov model with the outcome of the branch, given the history in the global history register. The Markov table is then fed into the FSM generator, and a customized FSM predictor is generated for that branch. Since the number of global histories a given branch actually sees before it is predicted is small compared to the 2^N possible histories, the Markov models can be compressed significantly by storing only non-zero entries. For all the custom branch prediction results in this paper we use a history of length 9.
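The Markov-model construction described above can be sketched as follows. This is our reconstruction, not the paper's code; the trace format, branch addresses, and table layout are assumptions for illustration.

```python
from collections import defaultdict

HISTORY_LEN = 9  # the paper uses a global history of length 9

def build_markov_model(trace, target_pc):
    """Build a sparse Markov table for one branch: counts of
    taken/not-taken outcomes keyed by the global history pattern
    seen when that branch is encountered. Only histories that
    actually occur are stored, keeping the table small compared
    to the 2^N possible histories."""
    history = 0                           # global history register
    mask = (1 << HISTORY_LEN) - 1
    model = defaultdict(lambda: [0, 0])   # history -> [not-taken, taken]
    for pc, taken in trace:               # trace: (branch PC, outcome)
        if pc == target_pc:
            model[history][1 if taken else 0] += 1
        # every branch outcome shifts into the global history
        history = ((history << 1) | (1 if taken else 0)) & mask
    return dict(model)

# Hypothetical trace: branch 0x40's outcome always equals the
# outcome of the immediately preceding branch 0x20.
trace = []
for t in (True, False, True, True, False):
    trace.append((0x20, t))
    trace.append((0x40, t))
model = build_markov_model(trace, 0x40)
```

In this contrived trace the low bit of each stored history is the outcome of branch 0x20, so the sparse table directly exposes the global correlation that the FSM generator would go on to capture.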

For this design, only the branch that the custom predictor was generated for uses the FSM for prediction, based on a tag match as described earlier. We update the custom finite state machines differently from the standard local 2-bit counters: we update all of the custom predictors in parallel on every branch, rather than only on matching branches. On an update, the branch predictor moves each FSM predictor from one state to the next based upon the outcome of the branch.

7.4 Estimating Area of FSM Predictors

To enable high-level design choices before synthesis, it is important to be able to estimate the area that our automatically generated state machines will occupy. To establish a relationship between the state machine descriptions and their area, we took a random sample of the custom FSM predictors generated across all of the benchmarks we examined, accounting for 10% of all of the FSM predictors generated. We then synthesized these state machines with Synopsys.

Figure 4 shows the number of states in the state machine versus the area of the implementation. Each triangle represents one FSM predictor. The x-axis is the number of states in a given state machine, while the y-axis is the area as reported by Synopsys.


Figure 4: Area of a random sample of the custom FSM predictors taken from all of the benchmarks we examined. The area was determined by Synopsys and plotted against the number of states in each machine. The strong linear bound allows us to estimate area for design tradeoffs and evaluation.

The results show that, for most state machines, the area is linearly proportional to the number of states in the machine. This trend is drawn as a dashed line. However, this is not the case for all of the FSM predictors: for state machines with a large number of states, the area is much less than this approximation would predict. These FSM predictors have many states, but the machines are highly regular, so they can be optimized down to a size smaller than even some of the more chaotic machines with fewer states.

We use the linear trend line in Figure 4 to estimate the area of the remaining FSM predictors in the results that follow. Even though the approximation does not hold for all of the predictors, it does bound their area by the number of states in the state machine, so it yields conservative area estimates. For the rest of this paper we use this approximation to quantify area rather than performing synthesis on each state machine we wish to examine.
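As a sketch, the conservative estimate reduces to evaluating the trend line. The slope and intercept below are placeholders; the actual values come from fitting the Synopsys data in Figure 4.

```python
# Placeholder coefficients: the real slope and intercept are fit to
# the synthesized-area data in Figure 4, not these assumed values.
AREA_PER_STATE = 120.0   # area units per additional state (assumed)
AREA_FIXED = 50.0        # fixed overhead (assumed)

def estimate_area(num_states):
    """Linear upper-bound estimate of a custom FSM predictor's area.
    Highly regular machines synthesize well below this line, so the
    estimate is conservative."""
    return AREA_FIXED + AREA_PER_STATE * num_states
```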

7.5 Branch Prediction Results

We compare our customized predictor against three other branch predictors. The first is the gshare predictor of McFarling [26]. The second is a meta-chooser predictor that contains a two-level local history branch prediction table, a global history table, and a meta-chooser table that determines whether to use the local or global prediction for the current branch. We call this the Local Global Chooser (LGC) predictor; it is similar to the predictor found in the Alpha 21264. The final predictor is a set of per-branch two-bit counters, as found in the XScale processor, which has a 2-bit counter associated with every branch target buffer entry and predicts not-taken on a BTB miss.

Figure 5 contains the results for the six programs, comparing our custom FSM predictors to the gshare, LGC, and XScale predictors. The x-axis is the total area of the predictor, including the BTB structure, and the y-axis is the misprediction rate. The results for LGC and gshare were gathered over a range of sizes. We present results for custom predictors trained on inputs different from the ones used for measuring performance; these results are denoted custom-diff. The custom-same results use the same input for training and evaluation, and they provide a limit on how well the FSM predictors can perform for that input.

The curve for the custom FSM predictors is generated by increasing the number of branches that are customized. The top-left-most point on the curve is the XScale architecture with one custom branch predictor. The points immediately to the right use two custom branch predictors, then three, and so on. As we add more FSM predictors, the number of mispredictions is reduced while the area to implement the predictor grows. The results show that for all programs the misprediction rate decreases as we devote more chip area to branch prediction.

The first thing that is very noticeable is that there is little to no difference between custom-diff and custom-same. This implies that our training input has done a good job of capturing the behavior that is inherent to the program. One could certainly use more than one input for training, and indeed this would be a good way to verify that the generated models correctly capture the behavior of the program.

For all of the programs, the custom predictors achieve the lowest misprediction rate of any predictor for their size. To beat the performance of these small predictors you would need a general purpose predictor that is 2 to 5 times larger. For some of the programs, even these large table sizes cannot perform better than our custom predictors.

For the program compress, all of the benefit comes from the state machine for one branch: adding a single custom FSM predictor reduces the misprediction rate from 23% to 16.5%. Adding more FSM predictors simply increases the area with little to no improvement in misprediction rate. Moderate table sizes of an LGC can outperform our customized predictors here because the branch causing the most mispredictions benefits from local history. This branch would benefit from a loop count instruction in an embedded processor, or could easily be captured by customizing the branch predictor to perform loop termination prediction for that branch [35].

Figure 5: Misprediction rate versus estimated area for the six benchmarks examined. Results are shown for the baseline XScale predictor, gshare, a meta predictor with a chooser between local and global history (LGC), and the customized branch predictor. The customized branch predictor results cover training on the same input used for gathering results (custom-same) and training on a different input (custom-diff).

For g721, we can see that the XScale predictor does a very good job of capturing the behavior of most of the branches in the program. However, with a little extra area we can reduce the misprediction rate from 8% to just over 7%. For vortex and gs, the misprediction rate is reduced significantly from XScale's default local 2-bit counters: for gs it drops from just under 5% to just over 4%, and for vortex from 13% to 3%. For these two programs, a local-global chooser over twice the size of the custom predictor is needed to achieve a lower misprediction rate.

The best results are seen for ijpeg and gsm. For both of these programs the misprediction rate is far below that of even the largest table we examined. These programs do not benefit from local history, as can be seen by comparing how LGC performs against gshare for them. Since our scheme captures the global correlation so efficiently, we outperform the largest gshare by half a percent for ijpeg and over a full percent for gsm while using only a fifth of the area.

7.6 Custom Finite State Machine Examples

Figures 6 and 7 show two examples of simple finite state machines generated using the techniques presented. Figure 6 serves as a good starting point for understanding how the state machines are used to generate predictions.

The state machine in Figure 6 was automatically generated to target a particular branch in ijpeg. Each state is annotated with a prediction, shown in brackets. The state machine is updated by traversing an edge: an update with taken traverses the edge labeled 1, while not-taken traverses the edge labeled 0. This particular state machine was generated to capture a branch that is highly correlated with the branch two branches back in the history. That is, we would like to predict 1 if the history pattern is "10" or "11" and predict 0 in all other cases. As can be seen in the figure, this is successfully captured by the finite state machine.
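A minimal sketch of such a machine (our reconstruction of the idea, not the actual generated FSM): four states suffice, one per two-bit history, and each state predicts the older of the two outcomes it remembers.

```python
def make_two_back_fsm():
    """Four-state FSM whose prediction equals the outcome of the
    branch two branches back. State label (older, newer) records
    the last two outcomes; an update with outcome o moves
    (older, newer) -> (newer, o)."""
    fsm = {}
    for older in (0, 1):
        for newer in (0, 1):
            fsm[(older, newer)] = {
                "predict": older,   # pattern "1x" -> predict taken
                0: (newer, 0),
                1: (newer, 1),
            }
    return fsm

# Regardless of the start state, after two updates the prediction
# tracks the outcome of the branch two branches back.
fsm = make_two_back_fsm()
state = (0, 0)
outcomes = [1, 0, 1, 1, 0, 0, 1]
preds = []
for o in outcomes:
    preds.append(fsm[state]["predict"])
    state = fsm[state][o]
```

This illustrates the self-correcting property discussed below: any two consecutive updates put the machine into the state that predicts correctly for the "10"/"11" pattern, no matter where it started.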

If you start in any state of the machine and follow two transitions, first a 1 and then either a 0 or a 1, you will end up in a state labeled 1. The converse is also true: if you follow an edge labeled 0, followed by either a 1 or a 0, the state you end up in will predict 0. This property is maintained even in the most complicated predictors.

Figure 7: A slightly more complex state machine generated from gs that captures two patterns. Patterns that match 0x1x or 0xx1x predict taken, while all others predict not-taken.

Figure 7 shows a slightly more complex state machine, this time generated for one of the branches in gs. This state machine captures two different patterns, 0x1x and 0xx1x, where x is a "don't care". If, as above, you traverse any sequence of edges that matches either of these patterns, you end up in a state predicting 1; if the edges you traverse do not match these patterns, you end up in a state predicting 0.

For this branch there are four global history patterns that are seen the majority of the time: 001001010, which is biased taken; 010011010, which is biased not-taken; 010101010, which is biased taken; and 110010010, which is biased taken. Other patterns contribute, but these represent over 90% of the patterns seen by this state machine. If you trace through these patterns, or just their last five bits, starting from any state, you end up in a state that predicts correctly. Because of this, it does not matter that we update the branch predictors on branches they may never have been trained on: no matter what state the machine has gotten itself into, as long as the final sequence of branches leading up to the predicted branch is captured by one of the patterns, the state machine will make the correct prediction.
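The pattern-tracing argument above can be checked mechanically with a toy subset construction. This is our sketch of the idea, not the paper's generator; the pattern strings come from the Figure 7 caption, and we assume histories are read oldest bit first.

```python
def pattern_dfa(patterns):
    """Subset-construct a DFA over {0, 1} that, after consuming a
    bit stream, predicts 1 iff the most recent bits match one of
    the given patterns ('x' is a don't-care)."""
    n = len(patterns)
    start = frozenset((i, 0) for i in range(n))

    def step(nfa_state, bit):
        # A match may begin at any position, so position-0 states
        # are always live; completed matches do not advance.
        nxt = {(i, 0) for i in range(n)}
        for i, j in nfa_state:
            p = patterns[i]
            if j < len(p) and p[j] in ('x', str(bit)):
                nxt.add((i, j + 1))
        return frozenset(nxt)

    trans, predict, work = {}, {}, [start]
    while work:
        s = work.pop()
        if s in trans:
            continue
        # Predict taken iff some pattern just completed.
        predict[s] = int(any(j == len(patterns[i]) for i, j in s))
        trans[s] = {}
        for bit in (0, 1):
            trans[s][bit] = step(s, bit)
            work.append(trans[s][bit])
    return start, trans, predict

def run(bits, start, trans, predict):
    """Feed a history (oldest bit first) and return the prediction."""
    s = start
    for b in bits:
        s = trans[s][b]
    return predict[s]

start, trans, predict = pattern_dfa(["0x1x", "0xx1x"])
```

Running the four dominant gs histories through this DFA reproduces their biases: 001001010, 010101010, and 110010010 end in a predict-taken state, while 010011010 ends in a predict-not-taken state.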

These results show that our automated steps can be used to accurately identify branches that are highly correlated, and the nature of the correlation. This information could then be used to potentially guide additional compiler and hardware optimizations.

Note that for a given custom FSM predictor, only a Markov table with histories of length H leading up to the branch to be predicted is used to generate the FSM predictor. Therefore, under our update-on-every-branch policy, we may be updating a custom FSM predictor with branches that were never seen in the histories used to generate it. This is fine because the generated FSM guarantees that the predictor will be in the desired prediction state when we fetch the candidate branch: the FSM predictor is updated with state transitions for the H branches that come right before the branch assigned to it, so no matter what state it was in before those H updates, after them it will be in the desired prediction state.

8 Summary

Predictive finite state machines are a fundamental building block in computer architecture and are used to control and optimize all types of prediction and speculation. In this paper we present an automated approach to generating FSM predictors using profile information. This approach can be used to develop FSM predictors that perform well over a suite of applications, are tailored to a specific application, or even to a specific instruction.

The algorithm we present uses profiling to build a Markov model of the correlated behavior present in a target application. From this model we define a set of patterns to capture. These patterns are compressed into a usable form and then translated into a regular expression. From the regular expression we use subset construction to build a finite state machine, which is then optimized for our uses with Hopcroft's Partitioning Algorithm and Start State Reduction.
