



## Logic Gates



## Multiplexer (MUX):

| S | D <sub>0</sub> | D <sub>1</sub> | Y |
|---|----------------|----------------|---|
| 0 | 0              | 0              | 0 |
| 0 | 0              | 1              | 1 |
| 0 | 1              | 0              | 0 |
| 1 | 0              | 0              | 0 |
| 1 | 0              | 1              | 0 |
| 1 | 1              | 0              | 1 |
| 1 | 1              | 1              | 1 |



In general, wider ( $2^n : 1$ ) multiplexers can be achieved by two level logic, an array of tri-state buffers with a ( $n : 2^n$ ) decoder, or in a hierarchical design.

## AND:



## OR:



## XOR:



## Tri-State Buffer:



## Combinational Building Blocks

### Decoder (DEC):

|                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|
| A <sub>1</sub> | A <sub>0</sub> | Y <sub>3</sub> | Y <sub>2</sub> | Y <sub>1</sub> | Y <sub>0</sub> |
| 0              | 0              | 0              | 0              | 1              | 0              |
| 0              | 1              | 0              | 0              | 1              | 0              |
| 1              | 0              | 0              | 1              | 0              | 0              |
| 1              | 1              | 1              | 0              | 0              | 0              |

Decoder 10-2:0

The outputs of a decoder are also the corresponding minterms.



## Arithmetic

### Half Adder:



### Full Adder:



### Carry Propagation Adders (CPA):



### Carry Lookahead Adder:



For  $N > 16$  a CLA is generally faster, but still scales linearly with respect to propagation delay.

### (Parallel) Prefix Adder (PPA):



### Carry Save Adder (CSA):



### Left Shift:



### Right Shift:



### Arithmetic Right Shift:



### Comparator:



## Parallel Multiplier:



$$\begin{array}{r}
 1 & 0 & 1 & 0 \\
 \times & 0 & 1 & 0 & 1 \\
 \hline
 1 & 0 & 1 & 0 \\
 0 & 0 & 0 & 0 \\
 1 & 0 & 1 & 0 & \text{Partial} \\
 + & 0 & 0 & 0 & 0 & \text{Products} \\
 \hline
 0 & 1 & 1 & 0 & 0 & 1 & 0
 \end{array}$$

## Synchronous Sequential Logic

- Synchronous sequential logic circuits consist of interconnected circuit elements such that:
- Every circuit element is either a register or a combinational circuit.
  - At least one circuit element is a register.
  - All registers receive the same clock signal.
  - Every cyclic path contains at least one register.

## Sequential Building Blocks

### SR Latch:



### D Latch:



### D Flip-Flop:



A register is an array of D Flip-Flops that all have the same clock signal.

## Finite State Machines

### Mealy Machine:

- $(S, S_0, \Sigma, \Lambda, T, G)$
- $S$ : Finite set of states
  - $S_0 \in S$ : Initial State
  - $\Sigma$ : Input Alphabet
  - $\Lambda$ : Output Alphabet
  - $T: S \times \Sigma \rightarrow S$ : Transition Function
  - $G: S \times \Sigma \rightarrow \Lambda$ : Output Function



### Moore Machine:

- $(S, S_0, \Sigma, \Lambda, T, G)$
- $S$ : Finite set of states
  - $S_0 \in S$ : Initial State
  - $\Sigma$ : Input Alphabet
  - $\Lambda$ : Output Alphabet
  - $T: S \times \Sigma \rightarrow S$ : Transition Function
  - $G: S \rightarrow \Lambda$ : Output Function



### Timing:



## Transistors

MOSFETs (Metal Oxide Semiconductor Field Effect Transistor):



NOT:



## Logic Level Noise Margins



## Memories

- Flip-Flops / Latches (Registers)**
  - Very fast, parallel access
  - Expensive (2+ transistors per bit)
- SRAM (Static RAM)**
  - relatively fast, only one data word at a time
  - less expensive (6 transistors per bit)
- DRAM (Dynamic RAM)**
  - slower, reading destroys content (refresh), one data word at a time
  - cheaper (4 transistors per bit)
- Other (HDD, Flash, ...)**
  - much slower
  - no transistors involved, non-volatile



Transmission Gate: Tri-State Buffer (inverted): General:



## MIPS

Instruction Formats:

R-Type:

| opcode            | rs                   | rt                   | rd                   | shamt                | funct              |
|-------------------|----------------------|----------------------|----------------------|----------------------|--------------------|
| 31 ... (6bits)... | 26 25 ... (5bits)... | 21 20 ... (5bits)... | 16 15 ... (5bits)... | 11 10 ... (5bits)... | 6 5 ... (6bits)... |

I-Type:

| opcode            | rs                   | rt                   | imm                    |
|-------------------|----------------------|----------------------|------------------------|
| 31 ... (6bits)... | 26 25 ... (5bits)... | 21 20 ... (5bits)... | 16 15 ... (16 bits)... |

J-Type:

| opcode            | addr                  |
|-------------------|-----------------------|
| 31 ... (6bits)... | 26 25 ... (26bits)... |

Instructions:

| Mnemonic | Name                       | Type | Operation                                                      | Op | Ums | Fmt | Notes                             |
|----------|----------------------------|------|----------------------------------------------------------------|----|-----|-----|-----------------------------------|
| add      | Add                        | R    | $rd = rs + rt$                                                 | ✓  | 0   | 20  |                                   |
| addu     | Add Unsigned               | R    | $rd = rs + rt$                                                 | 0  | 21  |     | May generate overflow exception.  |
| addi     | Add Imm                    | I    | $rt = rs + \text{SignExtImm}$                                  | ✓  | 8   |     | Operands are considered unsigned. |
| addiu    | Add Imm Unsigned           | I    | $rt = rs + \text{SignExtImm}$                                  | -  | 9   |     |                                   |
| sub      | Subtract                   | R    | $rd = rs - rt$                                                 | ✓  | 0   | 22  |                                   |
| subu     | Subtract Unsigned          | R    | $rd = rs - rt$                                                 | 0  | 23  |     |                                   |
| and      | And                        | R    | $rd = rs \& rt$                                                | 0  | 24  |     |                                   |
| andi     | And Imm                    | I    | $rt = rs \& \text{ZeroExtImm}$                                 | C  | -   |     |                                   |
| or       | Or                         | R    | $rd = rs \mid rt$                                              | 0  | 25  |     |                                   |
| ori      | Or Imm                     | I    | $rt = rs \mid \text{ZeroExtImm}$                               | D  | -   |     |                                   |
| nor      | Nor                        | R    | $rd = \sim(rs \& rt)$                                          | 0  | 27  |     |                                   |
| sll      | Shift Left Logic           | R    | $rd = rt \ll \text{shamt}$                                     | 0  | 0   |     |                                   |
| srl      | Shift Right Logic          | R    | $rd = rt \gg \text{shamt}$                                     | 0  | 2   |     |                                   |
| sra      | Shift Right Arithmetic     | R    | $rd = rt \ggg \text{shamt}$                                    | 0  | 3   |     |                                   |
| slt      | Set Less Than              | R    | $rd = rs < rt ? 1 : 0$                                         | 0  | 2A  |     |                                   |
| slti     | Set Less Than Imm          | I    | $rt = rs < \text{SignExtImm} ? 1 : 0$                          | A  | -   |     |                                   |
| sltu     | Set Less Than Unsigned     | R    | $rd = rs < rt : PC = 4 + \text{BranchAddr}$                    | ✓  | 0   | 2B  |                                   |
| sltiu    | Set Less Than Imm Unsigned | I    | $rt = rs < \text{SignExtImm} ? 1 : 0$                          | B  | -   |     |                                   |
| beq      | Branch on eq               | I    | $\text{if } rs == rt : PC = 4 + \text{BranchAddr}$             | 4  | -   |     |                                   |
| beqz     | Branch on not eq           | I    | $\text{if } rs == rt : PC = 4 + \text{BranchAddr}$             | 5  | -   |     |                                   |
| jr       | Jump                       | J    | $PC = \text{JumpAddr}$                                         | 2  | -   |     |                                   |
| jal      | Jump and Link              | J    | $rt = \$sp, PC = \text{JumpAddr}$                              | 3  | -   |     |                                   |
| jr       | Jump Register              | R    | $PC = rs$                                                      | 0  | 8   |     |                                   |
| lw       | Load Word                  | I    | $rt = M[rs + \text{SignExtImm}]$                               | 23 | -   |     |                                   |
| lhu      | Load Half Unsigned         | I    | $rt = M[rs + \text{SignExtImm}] \& 15\text{b}0$                | 25 | -   |     |                                   |
| lbu      | Load Byte Unsigned         | I    | $rt = M[rs + \text{SignExtImm}] \& 1\text{b}0$                 | 24 | -   |     |                                   |
| lui      | Load Upper Imm             | I    | $rt = \$imm, 16\text{b}0$                                      | F  | -   |     |                                   |
| sw       | Store Word                 | I    | $M[rs + \text{SignExtImm}] = rt$                               | 28 | -   |     |                                   |
| sh       | Store Half                 | I    | $M[rs + \text{SignExtImm}] \& 15\text{b}0 = rt \& 15\text{b}0$ | 29 | -   |     |                                   |
| sb       | Store Byte                 | I    | $M[rs + \text{SignExtImm}] \& 1\text{b}0 = rt \& 1\text{b}0$   | 28 | -   |     |                                   |
| mfhi     | Move from Hi               | R    | $rd = hi$                                                      | 0  | 40  |     |                                   |
| mflo     | Move from Lo               | R    | $rd = lo$                                                      | 0  | 42  |     |                                   |
| mult     | Multiply                   | R    | $hi, lo_3 = rs * rt$                                           | 0  | 18  |     |                                   |
| multu    | Multiply Unsigned          | R    | $hi, lo_3 = rs * rt$                                           | ✓  | 0   | 43  |                                   |

## Assembly

.data  
static var: .word 0 // 4 bytes

.text

.global main

main:

```
addi $0 $0 0 // 1st arg
addi $1 $1 0 // 2nd arg
addi $2 $2 0 // 3rd arg
addi $3 $3 0 // 4th arg
addi $4 $4 -24 // increase stack for args
sw $4 16($sp) // 5th arg
sw $5 20($sp) // 6th arg
jal func // call function
addi $6 $6 $0 // decrease stack
addi $6 $6 0 // return value
end: j end // halt loop
```

func:

addi \$sp \$sp -36 // push stack frame

sw \$0 60(\$sp) // save registers

sw \$0 24(\$sp)

sw \$7 52(\$sp)

sw \$fp \$fp 36(\$sp)

addi \$fp \$fp 36 // set frame ptr

addi \$0 \$0 0 // 1st arg

addi \$1 \$1 0 // 2nd arg

addi \$2 \$2 0 // 3rd arg

addi \$3 \$3 0 // 4th arg

lw \$4 4(\$sp) // 5th arg

lw \$5 20(\$sp) // 6th arg

// body

addi \$0 \$t6 0 // return value

lw \$fp 56(\$sp) // restore registers

lw \$7 52(\$sp)

lw \$0 24(\$sp)

lw \$ra 60(\$sp)

addi \$sp \$sp 36 // pop stack frame

jr \$ra // return to caller

Stack frame:



Memory Layout:



## Single Cycle Microarchitecture



| Instruction | Op     | RegWrite | MemToReg | ALUSrc | Branch | MemWrite | MemToMem | ALUOp | Jump |
|-------------|--------|----------|----------|--------|--------|----------|----------|-------|------|
| R-Type      | 000000 | 1        | 1        | 0      | 0      | 0        | 0        | 10    | 0    |
| lw          | 100011 | 1        | 0        | 1      | 0      | 0        | 1        | 00    | 0    |
| sw          | 101011 | 0        | X        | 1      | 0      | 1        | X        | 00    | 0    |
| beq         | 000100 | 0        | X        | 0      | 1      | 0        | 0        | 01    | 0    |
| addi        | 001000 | 1        | 0        | 1      | 0      | 0        | 0        | 00    | 0    |
| j           | 000010 | 0        | X        | X      | X      | 0        | X        | XX    | 1    |



Critical Path:  
 $T_c = t_{pcq\_PC} + t_{mem} + \max(t_{RF\_read}, t_{ext}, t_{mux}) + t_{ALU} + t_{mem} + 2t_{mux} + t_{RF\_setup}$   
 Time to execute N Instructions =  $N \cdot \frac{CPT}{T_c}$

## Pipelined Architecture



$$T_c > \max($$

$$\begin{aligned} &t_{pcq} + t_{mem} + t_{setup}, \\ &2(t_{RF\_read} + t_{mem} + t_{ext} + t_{AND} + t_{mux} + t_{setup}), \\ &t_{pcq} + t_{mem} + t_{ALU} + t_{setup}, \\ &t_{pcq} + t_{mem} + t_{setup}, \\ &2(t_{pcq} + t_{mem} + t_{RF\_write}) \end{aligned}$$

$$\begin{aligned} &\text{fetch} \\ &\text{decode} \\ &\text{execute} \\ &\text{memory} \\ &\text{writeback} \end{aligned}$$

//Forwarding Logic:

```
assign ForwardAD = (RD1==RLE) & (RD2==RLE) & RegWriteE
assign ForwardBD = (RD1==RLE) & (RD2==RLE) & RegWriteE & RegWriteM
```

//Stalling Logic:

```
assign ldstall = ((RD == RLE) | (RD == RLE)) & MemToRegE
assign branchstall = (BranchD & RegWriteE & (WriteRegE == RD) | WriteRegE == RD) |
    (BranchD & MemToRegM & (WriteRegM == RD) | WriteRegM == RD)
```

//Stall signals:

```
assign StallF, StallD, FlushE = ldstall | branchstall
```

## Hazards

► Data Hazard: Data for next instruction is not ready  
► Control Hazard: Next Instruction is unknown

## Pipeline:



## Branch Resolution:



## Data Forwarding:



## Pipeline Stalling:



## Verilog:

Comments:  
// One-liner  
// Multiple  
lines //

Numeric Constants:  
8'b\_0110\_1010 // binary  
8'd152 // octal  
8'd106 // decimal  
8'h6A // hexadecimal  
"j" // ASCII  
7'bz // High Impedance

Nets and Variables:  
wire [3:0] w; // assign outside always  
reg [3:0] r; // assign inside always  
reg [2:0] men [3:0];

Parameters:  
parameter N=8;  
localparam S=2'd3;

Assignments:  
assign Output = A\*B;  
assign Zc,D3 = 2'd15&[1:C[3:0],C[2:0],E[3];

Operators:  
A[D:1][N:H] // Select  
&A, ~&A, IA, ~IA, IA, ^A // Reduction  
!A, ~A // Complement  
+A, -A // Unary  
{A,...,B} // Concatenate  
ENLA3 // Replicate  
A\*B, A/B, A%B, A+B, A-B // Arithmetic  
A<B, A>B, A>>B // Shift  
A>B, A>=B, A<=B, A=B // Relation  
A&B, A|B, A^B, A~B // Bitwise  
A&B, A|B // Logical  
A?B:C // Conditional

## Module:

```
module MyModule
#(parameter N=8) // optional parameter
(input rst, clk,
output [N-1:0] out);
  //impl
endmodule
```

## Case:

```
always @(*) begin
  case (num)
    2'b00: A = 8'd3; //blocking
    2'b01,
    2'b10: A = 8'd108;
    2'b11: A = 8'd2;
  endcase
end
```

## always @(\*):

```
caseex (dec)
  4'b0xxx: enc = 2'b00;
  4'b00xx: enc = 2'b01;
  4'b001x: enc = 2'b10;
  4'b0001: enc = 2'b11;
  default: enc = 2'b00;
endcase
```

## Generate:

```
genvar j;
wire [1:N]out [3:0];
generate
  for(j=0; j<20; j=j+1)
    begin: Gen_Modules
      MyModule #(N(1)) mod(
        .rst(RST), .clk(CLK),
        .out(out[j]));
    end
endgenerate
```

## Synchronous:

```
always @ (posedge clk) begin
  if(rst) B <= 0; // (non-blocking)
  else B <= B + 1'b1;
end
```

## Loop:

```
always @ (n) begin
  count = 0;
  for(j=0; j<8; j=j+1)
    count = count + input[i];
end
```

## Function:

```
function [3:0]F;
  input [3:0]A;
  input [3:0]B;
  begin
    F = {A[3]&B[1], B[2]&D[2]3};
  end
endfunction
```

## State Machine:

```
reg [3:0]state;
parameter S0 = 2'b00,
S1 = 2'b01,
S2 = 2'b10,
S3 = 2'b11;
```

## always @ (posedge clk):

```
if (rst) state = S0;
else case(state)
  S0: state = S1;
  S1: state = S2;
  S2: state = S3;
  S3: state = S0;
endcase
```

## end

## Testbenches:

```
'timescale 1ns/1ps // (unit time) / (resolution)
initial
always
#10 // delay by 10ns
$display ("%b bin, %h hex, %d dec, %o oct, %c ASCII, %s str, %t time",
...,$time)
$readmemh("filename", testbuf)
$finish
```

```
module d_latch(input d,en,rstn,
                  output reg q);
  always @ (en or rstn or d)
    if (rstn) q <= 0;
    else if (en) q <= d;
endmodule
```

```
module d_flipflop(input d,rstn,clk,
                   output reg q);
  always @ (posedge clk or negedge rstn)
    if (rstn) q <= 0;
    else q <= d;
endmodule
```

## Functional Verification:

