

**CS320 Fall 2014**  
**Homework#5**  
**Quiz in Lecture Thursday October 30, 2014**  
**There are no make-up quizzes!**

**Problem 1: Datapath timing**

Assuming the following timings for the components of the datapath. All other components have no delay.

- Data/Instruction Memory Read: 200ps
- Data Memory Write: 100ps
- Register File Read: 100ps
- Register File Write: 50ps
- Adders: 80ps
- ALUs: 200ps
- Logical Shifter/Sign Extension: 30ps
- Multiplexors and other gates: 10ps

- a. Calculate the delay for each single cycle instruction type (R-type, lw, sw, beq, j). **See attached pages**
- b. Calculate the delay for each clock cycle of the finite state machine (each stage of each instruction, i.e. each bubble). **See attached pages**
- c. What is the minimum clock cycle period at which the multi-cycle datapath can operate properly? **270ps**  
What is the length of each instruction type in ps (R-type, lw, sw, beq, j)?
  - lw  $270 \times 5 = 1350$ ps
  - sw/r-type  $270 \times 4 = 1080$  ps
  - beq/j  $270 \times 3 = 810$  ps

**Problem 2: MIPS Execution**

Based on the timings from Question 1, examine the following MIPS program P.

```

add $t0, $s4, $s3
and $t2, $t0, $s1
add $t1, $t1, $t1
add $t2, $t2, $t2
lw $t3, OPTION
beq $s0, $t3, PARSE_WRITE_SP_LE
and $t0, $t1, 0xff
and $t1, $t1, 0xff00
or $t1, $t1, $t0
and $t0, $t2, 0xff
and $t2, $t2, 0xff00
or $t2, $t2, $t0
PARSE_WRITE_SP_LE:
sw $t1, 0($a1)
lw $v0, WRITE_FILE
sw $t2, 0($a1)

```

- a. In the multicycle datapath, in which clock cycle is the `lw $t3, OPTION` instruction fetched? **17<sup>th</sup> cycle**
- b. In the multicycle datapath, how many cycles does it take Program P to execute, assuming the branch is not taken? **61 clock cycles**
- c. Which values (eg. MEM[\$s0], branch address, Instruction[15:0], etc) are stored in the A, B, ALUOut, and MDR registers during the execution cycle of the following instruction: `sw $t1, 0($a1)`

A: `Reg[rs] = Reg[$a1]`

B: `Reg[rt] = Reg[$t1]`

ALUOut: `Reg[rs] + Instr[15:0] = Reg[$a1] + 0`

MDR: Value read from memory (since Mem Rd = 0, it is the value last time read from the memory), which is the `sw` instruction itself.

Rtype

570ps

Critical path:  
 $PC \rightarrow \text{Imem} \rightarrow \text{Reg Rd} \rightarrow \text{ALUSrc} \rightarrow \text{ALU} \rightarrow \text{MemRd}$   
 $\rightarrow \text{Reg Write}$



LW 760ps  $\xrightarrow{\text{Rd}}$   $PC \rightarrow \text{Imem} \rightarrow \text{Reg Rd} \rightarrow \text{ALU} \rightarrow \text{D Mem} \rightarrow \text{MemtoReg} \rightarrow \text{RegWrite}$



SW

600ps  $\xrightarrow{\text{PC} \rightarrow \text{Imem}} \text{Reg Rd} \xrightarrow{\text{ALU}} \text{Dmem Write}$



Beg

540ps  $\xrightarrow{\text{PC} \rightarrow \text{Imem}} \text{Reg Rd} \xrightarrow{\text{ALUSrc}} \text{ALU} \xrightarrow{\text{AND gate}} \text{MUX} \xrightarrow{\text{mux}} \text{Jump}$



Jump

240ps PC → INem Rd → Shift left → Jump



# Stage 0: Instruction Fetch

$PC \rightarrow ALUSrcA \rightarrow ALU \rightarrow PCsource \rightarrow PC$  210ps



# Stage 1: Decode

IR → Sign Extend → Shift left 2 → ALUSrcB → ALU → ALUout

270ps



Stage 2:

IR  $\rightarrow$  Sign Extend  $\rightarrow$  ALUSrcB  $\rightarrow$  ALW  $\rightarrow$  ALUOut 240ps



Stage 3:

AWOut  $\rightarrow$  IorD  $\rightarrow$  Memory Rd  $\rightarrow$  MDR

210ps.



Stage 4: MDR  $\rightarrow$  Reg Write  
IR  $\rightarrow$  Reg Write

60ps



Stage 5:

ALUOut  $\rightarrow$  IorD  $\rightarrow$  Mem Write

110ps



Stage 6:

A  $\rightarrow$  ALU  $\rightarrow$  ALUOut  
B  $\rightarrow$  ALU  $\rightarrow$  ALUOut

210PS



Stage 7:

*AWOut → MemtoReg → RegFile Write*  
*IR ~~RegOut~~ → RegDst → RegFile Write*

6ops



Stage 8:

$A \rightarrow ALU \rightarrow AND\text{ gate} \rightarrow OR\text{ gate} \rightarrow PC\text{ control}$

23ops



Stage 9:

~~PC~~ | R → Shift left 2 → PC Source → PC

40ps



### Problem 3: Multi-cycle datapath

In class we covered the MIPS multi-cycle implementation also which operates for the same basic subset of MIPS instructions as in the single-cycle basic implementation. In this problem you will introduce functionality for additional MIPS instructions. Use this datapath and finite state machine to specify your changes. In all cases, try to find a solution that minimizes the number of clock cycles required for the new instruction. Explicitly state how many cycles it takes to execute the new instruction in your modified finite state machine. Use the Multi-cycle handout posted on PIAZZA.

Modify the datapath and the finite state machine to:

PLEASE IGNORE THE TIMING VALUES IN THE FOLLOWING SOLUTIONS. THEY ARE NOT CORRECT.

- Implement 'addi' instruction

$\text{Reg}[\text{rt}] \leftarrow \text{Reg}[\text{rs}] + \text{immed}[15:0];$   
# register rs is added to immediate value in the instruction and stored in rt

- Implement 'jal' instruction.

$\text{Reg}[31] \leftarrow \text{PC} + 4$  # \$ra register stores the return address  
 $\text{PC} \leftarrow \text{PC} + 4[31:28], (\text{Instruction}[25:0] \ll 2)$   
# PC is the top 4 bits of the PC+4 value concatenated with the lowest 26 bits of the jump address  
# shifted the left by 2 bits

- Implement a new 'wai' instruction. The instruction, `wai rt`, stands for "where am i" and puts the instruction address into a register specified by the rt field. (Immediate format)

$\text{Reg}[\text{rt}] \leftarrow \text{PC}$  # \$ra register stores the return address

- Implement 'bne'

$\text{if}(\text{Reg}[\text{rs}] \neq \text{Reg}[\text{rt}]) \text{ PC} \leftarrow \text{PC} + 4 + \text{Instruction}[15:0] \ll 2$

- Implement 'beqi' (Assume the rs field is a 5-bit 2's complement value)

$\text{if}(\text{Reg}[\text{rt}] == 2\text{'s complement value in rs field})$   
 $\text{PC} \leftarrow \text{PC} + 4 + \text{Instruction}[15:0] \ll 2$

- Implement 'bgtz' (Assume the rt register is set to \$0)

$\text{if}(\text{Reg}[\text{rs}] \geq 0) \text{ PC} \leftarrow \text{PC} + 4 + \text{Instruction}[15:0] \ll 2$

- Implement a new 'incbeq' instruction. The instruction, `incbeq rs rt label`, stands for "increment and branch on equal". Increment always occurs!

$\text{if}(\text{Reg}[\text{rs}] == \text{Reg}[\text{rt}]) \text{ PC} \leftarrow \text{PC} + 4 + \text{Instruction}[15:0] \ll 2$   
 $\text{Reg}[\text{rs}] \leftarrow \text{Reg}[\text{rs}] + 4$  #the increment always occurs

- Implement a new 'sneg' instruction. The instruction, `sneg rs`, stands for "set on negative". (Immediate format)

$\text{if } (\text{Reg}[\text{rs}] < 0) \text{ Reg}[\text{rt}] = 1, \text{ else } \text{Reg}[\text{rt}] \leftarrow 0$

- Implement a new 'lwdec' instructon. The instruction `lwdec rs offset(rt)`, loads a value and decrements the rt value by one memory word.

$\text{Reg}[\text{rt}] \leftarrow \text{Mem}[\text{Reg}[\text{rs}] + \text{sign-extended offset}]$   
 $\text{Reg}[\text{rs}] \leftarrow \text{Reg}[\text{rs}] - 4$

- Implement a new 'swr' instruction. The instruction `swr rd rt(rs)`, uses a register value for the offset when calculating the memory address to store data to.

$\text{Mem}[\text{Reg}[\text{rs}] + \text{Reg}[\text{rt}]] = \text{Reg}[\text{rd}]$

- Implement a new 'slti' instruction. The instruction `slti rs rt`, immediate sets the rs register to the immediate value if the value in register rs is less than register rt.

$\text{if}(\text{Reg}[\text{rs}] < \text{Reg}[\text{rt}])$   
 $\text{Reg}[\text{rs}] \leftarrow 2\text{'s complement immediate value}$

ADDI:

No datapath changes.



jal





wai: R-type format  
No Datapath Changes



bne:



BNE:



begi





bgtz

# Assume rt = \$0





incbeq rs, rt, label



incbeg



11Ops

16Ops

Sneha



32  
zero extend  
A [msb]  
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0



LDDec.



LD Dec.



swr rd, rt(rs)

$$\text{Mem}[\text{Reg}[rs] + \text{Reg}[rt]] = \text{Reg}[rd]$$



new Stage 110 ps

Swr







Problem 4: Show how the jump register instruction 'jr' can be implemented to the multi-cycle datapath by simply making changes to the finite state machine. (Hint: \$0 = \$zero = 0)

rt field of the instruction should be register \$0, which always holds the value 0.



Problem 5: Calculate the delay in the datapath after the addition of ALL instructions in Problem 1. Assume the same delays. What is the minimum clock cycle period at which this design can operate properly?

The answer will depend on how you implemented each instruction

Problem 6: Consider changes to the original multi-cycle datapath that alters the register file so that it has only one read port. Describe any changes that will need to be made to the datapath in order to support this modification. Use the datapath diagram to illustrate the changes. Modify the finite state machine to indicate how the instructions will work given your new datapath.

You will need to add control lines to Write registers A and B and a multiplexer (with a control line) to select between the Rs and Rt fields of the instruction.

State 1 is modified with select Rs and WriteA asserted. A new state has to be added before states 6 and 8 to select Rt and WriteB. After the new state control branches to either state 6 or state 8 based upon the op-code. Other states are not affected as on R-Format and branch instruction use input from two registers. Additionally, State 2 must be modified to additionally read the Rt register and store the contents in B. While, lw does not require to read Rt, the store word does. By making the modification to State 2, where the Register file is not

being used, we do not increase the number of clock cycles required for a sw instruction. Control signals to select Rt and WriteB must be specified.

Problem 7: Consider eliminating the two shift left by 2 units in the multicycle datapath. Instead of these units, the ALU will be used for shifting the values instead. What impact does this have on the original 5 instructions (lw, sw, R-type, beq, j)?

If the shift left by 2 is eliminated, then any time this happens will require that the ALU is used.

First, since the Decode stage adds the immediate value shifted left by 2 to calculate the branch address, now a stage in between the Fetch and Decode stages would need to be added to do the shifting of the sign extended value. The value would be in ALUOut and the input 3 of ALUSrcB would need to be modified to be the value from the ALUOut register.

The second Shift left unit is used in stage 9 for the Jump instruction. If this was removed, the datapath would need to go through the ALU and the PCSource must could eliminate input 2, and use the value out of the ALU instead.

Problem 8: In the single cycle datapath control unit for the original 5 instructions (lw, sw, R-type, beq, j), the MemtoReg control signal can be eliminated and the MemRead or ALUSrc control signals can be used to control the multiplexor instead. This reduces the number of control signals required (less logic gates to implement).

- a. Which other signals in the single cycle datapath can be eliminated and replaced by another existing control signal, or the inverse (NOT) of the signal.

RegDst and ALUOp1

RegDst & MemRead'

Branch & AluOp0

- b. Are there any signals in the multi-cycle datapath (basic 5 instructions only) which can be eliminated and replaced?

Yes, MemtoReg is only used in Stage 4 & 7. It can be replaced with RegDst' (which is also only used in Stage 4 & 7)