FREE ELECTRONIC LIBRARY - Online materials, documents

Pages:     | 1 || 3 | 4 |   ...   | 5 |

«Automated Design of Finite State Machine Predictors Timothy Sherwood Brad Calder Department of Computer Science and Engineering University of ...»

-- [ Page 2 ] --

In this way, the problem of creating a FSM predictor reduces to that of finding a regular expression that describes L in a compact manner. Once we have the regular expression that recognizes L we can use established methods to convert it into a finite state machine that recognizes L. If we do this translation correctly, the finite state machine will indicate that it recognizes a member from L when it sees an input string in L. Because we picked L to be only the subset of s that we want to predict 1 on, we know that when we recognize this input string sequence that we should predict 1.

To find the language L we use profile information from the application. From the profile a model of the data is built, and this model can then be analyzed to find histories that are biased toward predicting 1. This set of histories is then compressed to a usable form and converted into a regular expression.

4.2 Modeling To design a FSM predictor we start by tracing the target application suite to create a representative sequence of predictions for the applications.

An issue is determining what to trace, which depends on the intended use of the predictor. For branch prediction, we have two states 0 and 1, and the trace consists of a series of branch outcomes. We will use the following

trace for the rest of the examples in this section:

–  –  –

The next step after trace generation, is to build up a statistical model for the data. For this we use an Nth order Markov Model. An Nth order Markov Model is a table of size 2N which contains P [1|last N inputs] for each of the possible 2N last N inputs in the trace. N serves as a limit to the amount of history that we may use in making our predictions.

For our example trace, we build up the following probabilities for a second order Markov table (N=2):

–  –  –

P [1|00] is generated by finding all times that 00 is followed by a 1, in this case there are 5 cases of the pattern 00, and 2 of them are followed by a 1.

The corresponding predict 0 probabilities are calculated by subtracting the predict 1 probability from 1 because we only have 2 symbols in our alphabet. Note that while this scales exponentially with the size of N, it is still very reasonable for even the largest values of N we have examined. Having more knowledge of history after a certain point does not improve accuracy and we found that point was well within the reach of the techniques described.

For the predictors we are generating we did not see the need to go beyond N = 10.

4.3 Pattern Definition Now that we have the probabilities that we need, we can continue with the next step which is picking the histories that we will eventually build the language L from. In the case of a branch predictor, where we simply wish to minimize the number of mispredictions, the task is straight forward. We simply pick all the histories that have a probability of preceding a 1 which is greater than or equal to 1/2 to form the language “predict 1”. In our example above, that would be the set of histories {01, 10, 11}. We would like to predict a 1 whenever one of these histories the trace is divided up into groups of four only for readability, it has nothing to do with the trace itself appears in the input based upon our profile, since they had a probability greater than 1/2. Of course histories with probability equal to 1/2 can go either way.

We can make further design trade offs at this point in the form of don’t care patterns. Some patterns only occur very rarely, and their inclusion into the set of histories will have almost no effect on the performance of the predictor but will make the job of minimizing more complicated. These history patterns can be placed into a third set, separate from the “predict 1” and “predict 0” sets, called the “don’t care” set. We have found that by placing only the 1% least seen histories in the “don’t care” set can reduce the size of the predictor by a factor of two with negligible impact on prediction accuracy.

Once we have passed this stage of the design flow, the function of the predictor is set and will not change significantly. The finite state machine that will be designed will return a 1 when histories in the “predict 1” set are seen, and 0 when histories in the “predict 0” set are seen. The output of the “don’t care” set is still undecided at this point, and the next stage will take advantage of this to compress the description of the sets.

4.4 Pattern Compression Now that we have our three sets, “predict 1”, “predict 0” and “don’t care”, the next step is to compress our description of the sets into a usable form. This is done by a standard logic minimization tool, where the input is a truth table. The input side of the truth table is the history patterns captured by our Markov Model, and the output side is the set that the history pattern belongs to. If we continue with our example, we have the out set partitioned

up as follows:

–  –  –

From this we generate a truth table, where all the histories that are contained in the “predict 1” set have there output defined as one, and likewise for the “predict 0” set.

{00 → 0, 01 → 1, 10 → 1, 11 → 1} We then use the logic minimization tool Espresso [32], to minimize this truth table down to a set of logic functions that describe the set of inputs that produce a 1. The logic function that is output is our compact description of the “predict 1” set, and it recognizes those histories that we wish to predict 1 or 0 on. This step also merges the “don’t care” set into the “predict 1” and “predict 0” sets in such a way as to minimize the number of unique terms. The logic function that is generated is a sum of products representation. The “predict 1” set will satisfy this function and the “predict 0” set will not.

–  –  –

Here we see the sum of products representation of our truth table, where an x represents an input that is unimportant to the outcome of the function. From this description we can now build our regular expression that will capture the language L.

4.5 Regular Expression Building Once we have the minimized representation of the set of histories we need to build a regular expression which will match these history patterns whenever they come up in the sample input. More specifically, whenever we see an input string, whose trailing N bits match the minimized patterns for the “predict 1” set we return true.

We can build a regular expression as follows. Each term is a clause, each 0 is a 0, each 1 is a 1, and each don’t care, denoted as x, matches either a 0 or a 1. Let use start with a single term, from above, (1 x). This translates to the regular expression 1{0|1} because it is a 1 followed by either a 0 or 1. Similarly (x 1) translates to {0|1}1.

Since a match can be caused by either one, we write the whole expression as {1{0|1}} | {{0|1}1}. However this is not quite complete.

Remember that the language L must describe all possible inputs for which it needs to return 1, not just the last two bits of the input. We overcome this problem by specifying the language to be any string that ends in the desired histories, which can be done by concatenating any number of 1s or 0s in the front of the histories. Thus,

our final regular expression for the above history is:

{0|1} { 1{0|1} | {0|1}1 } Once we have the desired regular expression there is still the matter of converting it to an efficient FSM.

–  –  –

Figure 1: The state machine on the left was generated from the sequence t shown in section 4.2 and optimized with Hopcroft’s partitioning algorithm. The state machine on the right is what results from removing the start-up states and re-numbering. This is the final state machine for t. Note that the patterns ending in 01, 10, and 11 are still captured correctly. The number in brackets shows the prediction produced by each state in the state machine.

4.6 FSM Creation The first step in building a FSM from a regular expression is the construction of a non-deterministic finite state machine, which is a state machine that can be in more than one state at a time. Building a non-deterministic FSM is a fairly straight forward process of enumerating paths. Once the non-deterministic FSM is completed it is converted to a deterministic state machine using subset construction [18]. Subset construction can sometimes lead to an exponential blow up in the number of states, and there are better algorithms known which take advantage of reduced alphabets, but we have found subset construction to be more than sufficient for the predictors we examine in this paper.

At this point we now have a fully implementable finite state machine, however there are two more steps we take to reduce the number of states used by the state machine. We start by applying Hopcroft’s partitioning algorithm [18]. This algorithm removes both unreachable and redundant states, although there are still unneeded states in the state machine.

4.7 Start State Reduction Since the state machine must recognize all strings in the language, there are unnecessary start up states that are only used at the beginning of the execution where history bits are undefined. Since we are only interested in the steady state operation of the state machine, i.e. those strings with a length greater than N, we can cut out those nodes that are only used in parsing strings less than N in length. This goes against what was said earlier about not changing the semantics of the state machine, but this optimization only effects the behavior of the state machine on a small constant number of strings. There can be up to 2N start-up states, and they typically account for around one half of all states in the machine.

If we look to Figure 1 the problem can be clearly seen. The figure shows the state machines generated from our original sequence t. In the state machine on the left, the states S1 and S3 are not need needed after just the first couple of accesses to the state machine. We would like to remove these to save both state and transition logic complexity. This can be done by removing all nodes unreachable from the steady state operation of the state machine. All start-up states are unreachable from the steady state of operation because you can never have undefined history past the start-up phase. We exploit this fact by simply removing all nodes unreachable from steady state. Figure 1 shows that the start-up states S1 and S3 have been removed and the remaining states have been renumbered. Note that the behavior of the finite state machine is still unchanged past the removed start-up states.

4.8 Synthesis At this point we have almost reached our final destination. We have a finite state machine that produces a prediction based on the input. The predictions are governed by the last N bits of the input string, and that information is efficiently encoded into a finite state machine, as can be seen in Figure 1. Starting from any node, and traversing the edges in the FSM following patterns (01, 10, 11) in our “predict 1” set will end at a node with a prediction of

1. Similarly, we will predict 0 for the pattern (00) in our “predict 0” set, starting from any node.

The final step is to actually do synthesis. Since the finite state machine is a well understood and commonly used primitive, every synthesis tool has some form of finite state machine input format. The job of synthesis is to find an efficient hardware implementation for the state machine. This includes finding a good encoding for the states and their transitions. We translate our description of the finite state machine to VHDL, which is then read and analyzed by the Synposys design tool.

5 Methodology We used two sets of benchmarks to gather the results for this paper. The first set of benchmarks were chosen because of their interesting confidence estimation behavior for value prediction as shown in [4]. These programs were groff, gcc, li, go, and perl.

The other set of benchmarks were chosen based on their applicability to an embedded environment and because they had interesting branch behavior. Three of the applications compress, ijpeg, and vortex are from the SPEC95 benchmark suite. We also include three programs from the MediaBench application suite. Gsm decode does full-rate speech transcoding, g721 decode does voice decompression, and gs is an implementation of a postscript interpreter.

We used ATOM [43] to profile the applications and generate the Markov Models over the full execution of the benchmarks. All applications were compiled on a DEC Alpha AXP-21264 architecture using the DEC C compiler under OSF/1 V4 operating system using full optimizations (-O4). Traces were gathered for 300 million instructions from the SimPoints recommended in [37, 38]. After the Markov models were generated, generating all of the FSM predictors for each program using our automated approach took from 20 seconds to 2 minutes on a 500 MHZ Alpha 21264.

6 FSM Predictors in General Purpose Processors Structures in general purpose processors are customized to a suite of common applications used to evaluate the architecture during its design. This can include the SPEC2000 and other benchmarks suites. Any customization applied to a general purpose processor needs to be general purpose, so specializing to individual programs is not practical. Therefore, we use an aggregate trace of several benchmarks together to represent the common behavior of the suite of programs. We then build a customized predictor to this aggregate trace, which is used to represent the general purpose behavior of the workload the architecture is designed to.

6.1 Value Prediction Several architectures have been proposed for value prediction including last value prediction [22, 23], stride prediction [13, 14], context predictors [33], and hybrid approaches [48, 31]. In this study we focus on using a stride-based value predictor, since it provides the most performance for a reasonable amount of area devoted to value prediction [3].

Pages:     | 1 || 3 | 4 |   ...   | 5 |

Similar works:

«The Sacraments: A Reformed Perspective Chapter 4: An Excursus on the Baptist use of Jeremiah Brian Schwertley One of the most common arguments by Reformed Baptists against the baptism of covenant infants is based on Jeremiah 31:31-36. It is argued that in this prophecy regarding the New Covenant we have a radical change of administration where there is a regenerate church membership; where no member of the New Covenant can fall away. Note the following examples. Jewett writes, According to the...»

«PRODUCT MARKET STUDY (PMS) Business Opportunities for Malaysian Companies in the Green Building & Construction Sector in KSA Prepared By: MATRADE JEDDAH June 2015 CONTENTS 1. EXECUTIVE SUMMARY 2. INTRODUCTION 3. OVERVIEW OF THE GREEN BUILDING AND CONSTRUCTION SECTOR 4. DETERMINANTS OF MARKET GROWTH Factors for Growth Hindrances of Growth PROJECTs IN GREEN BUILDING King Abdullah Financial District (KAFD) King Abdullah University of Science and Technology (KAUST) PENETRATING THE GREEN BUILDING IN...»

«Wolfgang E. Ernst Conferences (contributed and invited, most with unrefereed proceedings) 1 Übergangswahrscheinlichkeiten des neutralen Kryptons aus Bogenmessungen, W. E. Ernst and E. Schulz-Gulde, DPG-Frühjahrstagung 1976 in Hannover, Germany. 2 Eigendruckund Starkverbreiterung von roten Kri-Linien, W. E. Ernst, B.-H. Müller und Th. Zaengel, DPG Frühjahrstagung 1977 in Essen, Germany. 3 A Computer Controlled CW Visible-UV-Laser Spektrometer, J. Kasper, C. Pollock, W. E. Ernst, G. K. Ernst,...»

«BRAIN MAGNETIC RESONANCE IMAGE LATERAL VENTRICLES DEFORMATION ANALYSIS AND TUMOR PREDICTION Kai Xiao1, Sooi Hock Ho2, Aboul Ella Hassanien3 1,2 Faculty of Engineering and Computer science, University of Nottingham, Malaysia Campus, 43500 Selangor, Malaysia, Email: kcxk2x@nottingham.edu.my Faculty of Engineering and Computer science, University of Nottingham, Malaysia Campus, 43500 Selangor, Malaysia, Email: Ho.Sooi-Hock@nottingham.edu.my Information Technology Department, Faculty of Computer...»

«Level crossing collision between Insert document title train 3PW4 and a motor vehicle Location | Vic | 25 May 2012 Werribee, Date ATSB Transport Safety Report Investigation [Insert Mode] Occurrence Investigation Rail Occurrence Investigation RO-2012-007 XX-YYYY-#### Final – 18 December 2013 Cover photo: Transport Safety Victoria on behalf of the ATSB. Released in accordance with section 25 of the Transport Safety Investigation Act 2003 Publishing information Published by: Australian Transport...»

«NATA Board of Directors Meeting NATIONAL ATHLETIC TRAINERS’ ASSOCIATION, INC. Board of Directors Minutes Renaissance Grand Hotel, Majestic Ballroom C St. Louis, MO June 22, 2015 Directors Staff Jim Thornton, MA, ATC, CES David Saddler Tim Weston, MEd, ATC Rachael Oats, CAE Michael Goldenberg, MS, ATC Amy Callender Pat Aronson, PhD, ATC, PTA Kathy Crelly Tory Lindley, MS, ATC Anita James, CMP Eric McDonnell, MEd, ATC, LAT Katie Scott, MS, ATC Kathy Dieringer, EdD, ATC, LAT Kandy Cefoldo Guests...»

«“A Gateway to Reconnect; Mount St. Mary’s Traditions” By Isabella Notar ABSTRACT: This year Mount Saint Mary’s University is honoring its Bicentennial and NCAA Basketball Championship. The 80th Anniversary of the Mount’s sister church in China, however, is unknown and uncelebrated. As part of the Bicentennial and President Powell’s challenge to recognize the Mount’s religious past in China, I traveled this summer to the Chinese town where Bishop James Walsh lived to try to find...»

«A new(er) dimension to online learning communities: using web tools to engage students Lisa Cluett Student Services The University of Western Australia Judy Skene Student Services The University of Western Australia Postal address: M302, Student Services, The University of Western Australia, 35 Stirling Highway, Nedlands WA. 6009 Ph: 08 6488 2404 Fax: 08 6488 1119 Email: lisa.cluett@uwa.edu.au Dr Lisa Cluett Learning Skills Adviser Student Services The University of Western Australia As part of...»

«2016 WINTER CONFERENCE December 1 – 4, 2016 – Rochester, NY 81st Annual Winter Conference Sessions, Showcases, Concert Schedule *Disclaimer: Please note that the information presented here was as accurate as possible at the time of printing. Final session, showcase, guest performance date/time/ location information is subject to change prior to the start of the conference. Please consult the 2016 NYSSMA®Winter Conference Program Guide (available at the Winter Conference Registration Booth)...»

«Faculty of Concordia Seminary. Faithful To Our Calling, Faithful To Our Lord. Part 2. St. Louis: Concordia Seminary, [1973]. Public Domain. FAITHFUL TO OUR CALLING FAITHFUL TO OUR LORD AN AFFIRMATION IN TWO PARTS BY THE FACULTY OF CONCORDIA SEMINARY PART II I BELIEVE Personal Confessions of Faith and Discussion of Issues I BELIEVE INTRODUCTION St. Peter has told us, Be ready at all times to answer anyone who asks you to explain the hope you have in you (1 Peter 3:15). The Faculty of Concordia...»

«Date: 13/07/06 Ref: 45/4/29 Note: This letter has had personal details edited out. BERKSHIRE ACT 1986: APPEAL UNDER SECTION 37(6) (FIRE PRECAUTIONS IN LARGE STORAGE BUILDINGS) IN RELATION TO THE BOROUGH COUNCIL'S DECISION TO REJECT PLANS IN RESPECT OF THE CONSTRUCTION OF A WAREHOUSE BUILDING The appeal 1. Section 37 of the 1986 Act (Fire precautions in large storage buildings) applies to any building where more than 7,000 m3 of the building will be used for the purpose of storing or depositing...»

«TMSJ 17/2 (Fall 2006) 207-222 PROGRESSIONAL DIALOGUE & PREACHING: ARE THEY THE SAME? Richard L. Holland Director of D.Min. Studies Assistant Professor of Pastoral Ministries Under the influence of postmodernism and postconservativism, the Emerging Church is engaged in dismantling much of present-day worship practices in the loc al church. A leader in advoca ting ra dical changes in conventional preaching is Doug Pagitt in his book Preaching Re-Imagined. Pagitt’s name for traditional p...»

<<  HOME   |    CONTACTS
2017 www.thesis.dislib.info - Online materials, documents

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.