

# Processor design

## fetch

PC next instruction

- fetch instrn & inc PC [ instrn in IR ]
- use fields of instructions to select registers to read

→ done in meantime  
during exec



- execute depending on instruction

→ repeat

Ans: Why instruction mem & data mem  
(4 bytes) (different sizes)  
Either physical or logical division



Single cycle vs multicycle vs pipelined

(Entire instruction  
in 1 clock cycle)  
time as fast as  
Slowest instruction

↓  
each step  
of instruction  
In 1 clock  
cycle

↓  
process  
multiple  
instructions  
as can

instruction  
takes how much  
ever it needs  
(Dynamic  $\rightarrow$   
time/cycles)

Assembly like  
diff  
stages of  
(Fetch-  
Decode-  
Execute)  
at same time

Next lecture

Functional Elements first

Combinational Elements

(depends only on current values)

Sequential Elements (State)

(Operate on data)

(Contain state)

Eg : Comb. : ALU





## Combinational Elements

- Works as an *input  $\Rightarrow$  output function*, e.g., ALU
- Combinational logic *reads input data from one register and writes output data to another, or same, register*
  - read/write happens in a single cycle – combinational element cannot store data from one cycle to a future one



## Sequential / State Elements

define state of a machine

FF/Latch 1-bit state/memories  
elt's

## Sequential or State Elements

- State elements contain *data* in internal storage, e.g., registers and memory
- All state elements together *define the state of the machine*
  - What does this mean? Think of shutting down and starting up again...  $\begin{matrix} \text{1011} \\ \text{11} \end{matrix} \rightarrow \begin{matrix} \text{0101} \\ ? \end{matrix}$
- *Flipflops and latches* are 1-bit state elements, equivalently, they are 1-bit memories
- The *output(s)* of a flipflop or latch *always* depends on the bit value stored, i.e., its state, and can be called *1/0* or *high/low* or *true/false*
- The *input* to a flipflop or latch can change its state depending on whether it is clocked or not...

# State element

## Register file

### Arrays of DFF

R-type

\$0 - \$31

@posedge clock



### State elements on the Datapath: Register File

- Registers are implemented with arrays of D-flipflops



\$0 - \$31  
R-type

Register file with two read ports and one write port



## State elements on the Datapath: Register File

- Port implementation:



Read ports are implemented with a pair of multiplexors – 5 bit multiplexors for 32 registers



Write port is implemented using a decoder – 5-to-32 decoder for 32 registers. Clock is relevant to write as register state may change only at clock edge

Clock connected to all reg

Single  
cycle

~~Single long cycle~~



### Single Cycle Implementation

- Our first implementation of MIPS will use a single long clock cycle for every instruction
- Every instruction begins on one up (or down) clock edge and ends on the next up (or, down) clock edge
- This approach is not practical as it is much slower than a multicycle implementation where different instruction classes can take different numbers of cycles.
  - in a single-cycle implementation every instruction must take the same amount of time as the slowest instruction
  - in a multicycle implementation this problem is avoided by allowing quicker instructions to use fewer cycles
- Even though the single-cycle approach is not practical it is simple and useful to understand first.

Not practical  $\rightarrow$  too slow (you need to keep freq (slowest instruction))

## General Sequence of Instruction Execution

- For every instruction:
  - Send the *program counter* (PC) to the memory that contains the code and fetch the instruction from that memory.
  - Read one or two registers, using fields of the instruction to select the registers to read.
- For the load/store word instruction, we need to read only one register.
- Three types of instructions: memory-reference, arithmetic-logical, and branches. *bqr, j*      *lw/sw*      *add slt*
- All instruction classes, except jump, use the ALU after reading the registers.  
 $\text{as } j \rightarrow \text{just change PC}$     others need to inc PC  $\rightarrow$   
 $\text{PC} + 4$

Datapath

Store/Fetch inst  
& PC inst



Store & fetch  
inc PC

$\text{Inst} \leftarrow \text{MEM}[\text{PC}]$       [microinstructions]  
 $\text{PC} \leftarrow \text{PC} + 4$

## R type

Op<sub>c</sub>      SRL      target      dyt      shamt      func



check this  
for assembly  
Code order

### Datapath - Visualizing



add rd, rs, rt  
 $R[rd] \leftarrow R[rs] + R[rt];$

### Load store

2 regs , source offset  
base addr 16 bit offset  
ve true

16 bit  $\rightarrow$  sign extend  $\rightarrow$  32 bit

Data memory

MSB 0 all zero

MSB 1 all significant bits 1

## Datapath – Load/Store Instruction



Two additional elements used to implement load/stores

This is not opcode

blues

~~blues~~

lw rt, offset(r1)

$$r(rt) \leftarrow \text{Mem}[R[r1] + s_{\text{extend}}(\text{offset})]$$

Op rrt  $\leftarrow$  rt offset/imm16

## Datapath - Visualizing



sw rt, offset(rs)

$\text{Mem}[\text{R}[rs] + \text{sign\_ext}(\text{offset})] \leftarrow \text{R}(rt)$

RN2 is ~~get~~ used,  
Write register  
(WN)  
isn't

RD doesn't happen

WD does



Branch inst<sup>n</sup>

No shift hw reqd

simply connect wires for each shifted by 2 bits



Concatenation (MIPS branch how)

26 bits + 00  
32 bits unconditional jump

# Datapath without Branch

~~Obj exists~~

PC

inst mem



lw rt, offset(rs)



sw rt, offset(rs)

Why are we using different instruction mem & data memory

in a single cycle we need to

- (i) read from memory (instruction)
- (ii) write to memory (data)

read

So the necessary signals must be generated, all in 1 cycle

⇒ not possible (Well technically yeah but like, too complex ig)

We either both read & write (SW)  
 or both read & read (lw)  
 but in 1 cycle x D



R type or I type      x x

Lecture end (OMFG)

Single Cycle not ended //



is there an operation where we use both ALUs? I don't quite get how part



## Control Unit

Who sets?

- Control unit takes input from
  - the instruction opcode bits
- Control unit generates
  - ALU control input
  - write enable (possibly, read enable also) signals for each storage element
  - selector controls for each multiplexor

register file  
doesn't need register read  
not even

register memory (Data memory)

ALU → R type → (add, sub), ALU  
 I type → (lw, sw), base, base + offset  
 $X \overline{J} - j$   
separate add logic  
 (the top adder)



## Setting ALU Control Bits

| Instruction<br>opcode | AluOp | Instruction<br>operation | Funct Field | Desired<br>ALU action | ALU control<br>input |
|-----------------------|-------|--------------------------|-------------|-----------------------|----------------------|
| LW                    | 00    | load word                | xxxxxx      | add                   | 010                  |
| SW                    | 00    | store word               | xxxxxx      | add                   | 010                  |
| Branch eq             | 01    | branch eq                | xxxxxx      | subtract              | 110                  |
| R-type                | 10    |                          |             | add                   | 010                  |
| R-type                | 10    |                          | 100000      | add                   | 010                  |
| R-type                | 10    |                          | 100010      | subtract              | 110                  |
| R-type                | 10    |                          | 100100      | and                   | 000                  |
| R-type                | 10    |                          | 100101      | or                    | 001                  |
| R-type                | 10    | set on less              | 101010      | set on less           | 111                  |
| *                     |       |                          |             |                       |                      |

to decide  
either lw or sw we do  
JTH else.  
(lw)

| ALUOp | Funct field |        |    |    |    |    |    | Operation |
|-------|-------------|--------|----|----|----|----|----|-----------|
|       | ALUOp1      | ALUOp0 | F5 | F4 | F3 | F2 | F1 |           |
| 0     | 0           | X      | X  | X  | X  | X  | X  | 010       |
| 0     | 1           | X      | X  | X  | X  | X  | X  | 110       |
| 1     | X           | X      | X  | 0  | 0  | 0  | 0  | 010       |
| 1     | X           | X      | X  | 0  | 0  | 1  | 0  | 110       |
| 1     | X           | X      | X  | 0  | 1  | 0  | 0  | 000       |
| 1     | X           | X      | X  | 0  | 1  | 0  | 1  | 001       |
| 1     | X           | X      | X  | 1  | 0  | 1  | 0  | 111       |

bits  
res is always same (pos is same)

I type has diff opcodes

R type has diff function codes (Opcode is '0')

|        |        |       |       |       |       |       |
|--------|--------|-------|-------|-------|-------|-------|
| R-type | opcode | rs    | rt    | rd    | shamt | funct |
|        | 31-26  | 25-21 | 20-16 | 15-11 | 10-6  | 5-0   |

|                         |        |       |       |         |
|-------------------------|--------|-------|-------|---------|
| Load/store<br>or branch | opcode | rs    | rt    | address |
|                         | 31-26  | 25-21 | 20-16 | 15-0    |

- Observations about MIPS instruction format

- opcode is always in bits 31-26
- two registers to be read are always rs (bits 25-21) and rt (bits 20-16)
- base register for load/stores is always rs (bits 25-21)
- 16-bit offset for branch equal and load/store is always bits 15-0
- destination register for loads is in bits 20-16 (rt) while for R-type instructions it is in bits 15-11 (rd) (will require multiplexor to select)

for selecting reg to write to (either rt or rd)



Adding control to the MIPS Datapath III (and a new multiplexor to select field to specify destination register): what are the functions of the 9 control signals?

## Control Signals

| Signal name | Effect when deasserted                                                                             | Effect when asserted                                                                                   |
|-------------|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| RegDst      | The register destination number for the Write register comes from the <b>rt</b> field (bits 20-16) | The register destination number for the Write register comes from the <b>rd</b> field (bits 15-11)     |
| RegWrite    | None                                                                                               | The register on the Write register input is written with the value on the Write data input             |
| ALUSrc      | The second ALU operand comes from the second register file output (Read data 2)                    | The second ALU operand is the sign-extended, lower 16 bits of the instruction                          |
| PCSrc       | The PC is replaced by the output of the adder that computes the value of $PC + 4$                  | The PC is replaced by the output of the adder that computes the branch target                          |
| MemRead     | None                                                                                               | Data memory contents designated by the address input are put on the first Read data output             |
| MemWrite    | None                                                                                               | Data memory contents designated by the address input are replaced by the value of the Write data input |
| MemtoReg    | The value fed to the register Write data input comes from the ALU                                  | The value fed to the register Write data input comes from the data memory                              |

## Datapath with Control - II

2 new things  
ALU OP & branch



MIPS datapath with the control unit: input to control is the 6-bit instruction opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal



## Determining control signals for the MIPS datapath based on instruction opcode

| Instruction | RegDst | ALUSrc | Memto-Req | Reg Write | Mem Read | Mem Write | Branch | ALUOp1 | ALUp0 |
|-------------|--------|--------|-----------|-----------|----------|-----------|--------|--------|-------|
| R-format ✓  | 1 ✓    | 0      | 0         | 1         | 0        | 0         | 0      | 1 ✓    | 0 ✓   |
| lw ✓        | 0      | 1      | 1         | 1         | 1        | 0         | 0      | 0      | 0 ✓   |
| sw ✓        | X      | 1      | X         | 0         | 0        | 1         | 0      | 0      | 0 ✓   |
| beq ✓       | X      | 0      | X         | 0         | 0        | 0         | 1      | 0 ✓    | 1 ✓   |

it's kinda hard to memorize all this, revise this twice. After that's it's quite logical anyways

Why do you have an ALUop and function code, why not just one ;-?

**Ans:**

- i) As this is multilevel decoding  
-- it's easier to design control logic this way

RIDGE

Opcode is completely decoded in CISC

(not really having control signals)

- ii) Several instructions have very similar datapaths, instead of designing a single opcode and having a very large decoding unit we can divide them.

rdst (not rd!)

Opcode → Control signals → rd memread



ALU operations

→ 2 bits of ALU opt } 3 bits of ALU  
6 bits of funct code } control

\* making the control logic is  
outside the scope, in appendix B

→ Clock is inside control unit mostly

Sometimes

@ each the edge control signal generated

Sometimes not

↳ here clock is outside  
(in some archs)

↳ here clock  
is inside  
control unit

so some synchronization is required

(Like essentially, it depends on implementation)