

Tournament Branch Predictor:

Inputs: PC\_1 (from IF/ID reg), // PC value from IF  
PC\_2 (from EX/MEM reg), // PC value from EX\_MEM reg  
alu\_out (from EX/MEM reg), // calculated offset for new address (jal/jalr)  
branch\_pcmux\_out (from reg), // next address for branches  
br\_type (from EX/MEM reg), // whether branch is direct (br)/indirect (jal/jalr)  
is\_stalled, // state of pipeline  
br\_en\_in // evaluated br\_en from EX/MEM pipeline reg  
br\_local\_prediction, // T/NT from local predictor  
br\_global\_prediction, // T/NT from global predictor  
/\* might be more efficient to pass in the entire pipeline register struct \*/  
if\_id\_reg // struct  
ex\_mem\_reg // struct

Outputs:

br\_out, // T/NT from tournament branch prediction logic  
bad\_prediction, // neither (L | G) predictor was correct  
table\_update, // which predictor (L/G) was correct  
br\_addr\_out, // branch address out

/\* Logic needed to control counters/shift registers for local predictor and global predictor \*/  
if !is\_stalled and ex\_mem\_reg.opcode is op\_br:  
    load global pattern history registers and allow history pattern tables (counters) to  
be incremented or decremented  
    if (br\_type == BR)  
        use branch\_pcmux\_out value from datapath as target address  
    else if (br\_type == JMP)  
        use alu\_out from EX/MEM pipeline register with computed offset  
    if (br\_en\_in from EX/MEM pipeline reg matches br\_global\_prediction):  
        increment tournament predictor counters  
    else if (br\_en\_in from EX/MEM pipeline reg matches br\_local\_prediction):  
        decrement tournament predictor counter  
    else if ex\_mem\_reg.opcode is a branch and br\_en\_in differs from prediction:  
        if br\_local\_predictor was right: decrement tournament counter  
        else if br\_global\_predictor was right: increment tournament predictor counter  
    if if\_id\_reg.opcode is a branch and if\_id\_reg.pc\_address is cached:  
        if global branch predictor was WT/T: update global predictor  
        if local branch predictor was WT/T: update local predictor  
    else if if\_id\_reg.opcode is jal/jalr and if\_id\_reg.pc\_address is cached:  
        update both local and global predictors  
    if tournament branch predictor was WT/T:  
        br\_out = br\_global\_prediction  
        update global predictor  
    else if tournament branch predictor was WNT/NT:  
        br\_out = br\_local\_prediction  
        update local predictor



Local Branch P

Eviction write Buffer:

Cache: read\_i, write\_i, resp\_output, rdata\_output, addr\_input, wdata\_input

Physical memory: resp\_input, rdata\_i, addr\_output, wdata\_output, read\_output, write\_output, resp\_input

Start:

Read\_instruction:

if(hit)

Stay in idle, set resp\_out to high and rdata\_out to write\_instruction

Else

Stay in idle, set resp\_out to resp\_in and set output to input

Write\_instruction:

Go to Pause state and set resp\_output to high

Write:

Sends buffer to memory and handles EWB hits

Go to idle state

stalling:

if(read)

Goes to read state

if(write\_instruction)

state goes to write

Else

Stay in pause

Waits for cache to finish reading or raises a write\_input, write back happens when a read is finished.

Read:

Go to read when resp\_i is low otherwise go to write

Sets output to input





8-way PLRU



module LRU;

inputs: CLK, rst, load, [2:0] LW, [2:0] LW

outputs: [2:0] OUT + OUT

A 3 level tree from perspective of implementation,  
there will be three levels to the tree.

module Node

input: CLK, rst, load, LA, currentway,

outputs: left, right



### Left and Right Policy

Set Left = high when Load, Load LA  
are high, otherwise Set Left to low  
Set Right = high vice versa of Left

Each way will be a node, we load LA  
into current-way when Load is high, otherwise  
maintain LA's value in a register.

Bits of LW determines which node to select at each level. LA is the most recent way. W0-w7 are outputs from each node, which is the current way value.

Progress report:

For this checkpoint Max worked on developing the data cache and instruction cache then integrated the two with the arbiter. I worked on designing advanced features, and forwarding. Michael worked on forwarding, testing forwarding and static branch predictor. We all worked on debugging and developed a testbench arbiter for verification. We are still currently testing our checkpoint at this time. Some of the bugs that we found were interfacing with the RISC-V monitor, the arbiter wasn't latched correctly. Another bug when an instruction read and instruction write occurred at the same time, the solution was to reevaluate the state machine.

Roadmap:

For the next checkpoint we plan on implementing PLRU, eviction write back buffer, and tournament branch predictor, and pipelined caches. I will implement the PLRU, Michael is going to implement the eviction write back buffer and pipeline caches, and Max is going to implement the tournament branch predictor.

