

## NPTEL Verilog

- Feature size → smallest feature of the transistor that you can fabricate.

### Steps in design flow

#### • Behavioral design

→ specify the design in terms of behavior  
(Boolean expressions or truth tables  
FSM, etc.)

#### • Data path design



#### • Logic design

generate a netlist of gates/flip flops or standard cell.

minimizing gates, no. of gate levels & signal transmission activity

#### • Physical design & Manufacturing

• fabrication

• Alternatively, FPGA (but less speed)

## Y-diagram



\* don't care notion is '?' .

\*   
 [Module 1] { module ...  
 ;  
 end module

[module 2]

[module 3]

, ASIC → Application specific integrated circuit  
FPGA → Field programmable gate array .



example

module example (A, B, C, D, E, F, Y);

input A, B, C, D, E, F;

output Y;

wire t1, t2, t3, Y;

delay

nand #1 G1 (t1, A, B);

and #2 G2 (t2, C, ~B, D);

nor #1 G3 (t3, E, F)

nand #1 G4 (Y, t1, t2, t3);

end module

We can combine the same type of gate ~~together~~ together

nand #1 G1 (t1, A, B);

AND (Y, t1, t2, t3);

module testbench;

reg A, B, C, D, E, F;

wire Y;

example DUT (t1, B, C, D, E, F, Y);

initial

binary

if any of begin  
change, it will monitor (\$time, "A=%b, B=%b,  
is show C=%b, D=%b, E=%b, Y=%b");  
A, B, C, D, E, F, Y);

#5 A=1; B=0; C=0; D=1; E=0; F=0;

#5 A=0; B=0; C=1; D=1; E=0; F=0;

#5 A=1; C=0;

#5 F=1;

#5 \$finish;

end  
end module

Simulation results:

|    |                                                 |
|----|-------------------------------------------------|
| 0  | A = X, B = X, C = X, D = X, E = X, F = X, Y = X |
| 5  | A = 1, B = 0, C = 0, D = 1, E = 0, F = 0, Y = X |
| 8  | A = 1, B = 0, C = 0, D = 1, E = 0, F = 0, Y = 1 |
| 10 | A = 0, B = 0, C = 1, D = 1, E = 0, F = 0, Y = 1 |
| 13 | A = 0, B = 0, C = 1, D = 1, E = 0, F = 0, Y = 0 |
| 15 | A = 1, B = 0, C = 0, D = 1, E = 0, F = 0, Y = 0 |
| 18 | A = 1, B = 0, C = 0, D = 1, E = 0, F = 0, Y = 1 |
| 20 | A = 1, B = 0, C = 0, D = 1, E = 0, F = 1, Y = 1 |

Command in Verilog:

- a) iverilog -o mysim example.v example.tcl
- b) vvp mysim

~~little modifications~~

~~reg A, B, C, D, E, F;~~

module testbench;

~~reg A, B, C, D, E, F;~~  
. wire Y;

example DV7(A, B, C, D, E, F, Y);

initial

begin

```
    $dumpfile ("example.vcd");
    $dumpvars(0, testbench);
    monitor ($time, "A = %b, B = %b, C = %b,
                D = %b, E = %b, F = %b, Y = %b",
                A, B, C, D, E, F, Y);
```

```

#5 A=1; B=0; C=0; D=1; E=0; F=0;
#5 A=0; B=0; C=1; D=1; E=0; F=0;
#5 A=1; C=0;
#5 E F=1;
#5 $finish

```

end  
end module

To display the waveforms  
and the command  
• gtkwave example.vcd

stick diagram

↳ Blue lines are metal connections  
red lines are poly silicon connections  
green & brown lines are the diffusions  
Black crosses are the interconnections.



Placet  
Router  
→

• In CMOS tech, it is easier to design NAND, NOR & NOT gates.

Point to note:

- The "assign" statement represents continuous assignment whereby the variable l on the LHS gets updated whenever the expression on the RHS changes  
assign variable = expression;
- The LHS must be a "net" type variable, typically a "wire".
- The RHS can contain both "register" and "net" type variables.
- A verilog module can contain any number of assign statements; they are typically placed in the beginning after the port declarations.
- The "assign" statement models behavioral design of sequential and combinational ckt's typically used to model assignments and instantiations.

## Data types in Verilog

- 1) Net (by default are 1 bit values, - default is state).
  - must be continuously driven (output of block)
  - cannot be used to store a value
  - used to model connections b/w continuous assignments and instantiations.

### 2) Registers

- retains the last value assigned to it.
- often used to represent storage elements, but sometimes it can translate to combinational nets also.

Net

• Various 'Net' data types  $\rightarrow$  wire, wor, word, tri,  
supply0, supply1, etc.

• "wire" and "tri" are equivalent

• "wor" and "word"  $\rightarrow$  insert an OR and AND  
gate, respectively at connections  
"supply0" and "supply1" model power  
supply connections.

~~wor~~ wor f;

assign f = A & B;

assign f = C | D;

$\hookrightarrow$ , here function is realized

$$f = (A \& B) | (C \mid D)$$

in case of word '&'

• supply0 & supply1 have the greatest signal  
strength.

module using supply-wire (A, B, C, f);

input A, B, C;

output f;

supply0 gnd;

supply1 vdd;

nand A1 (f1, vdd, A, B);

xor h2 (f2, C, gnd);

and h3 (f3, f1, f2);

end module

### Value level

0

1

x

z

represents

logic 0 state

logic 1 state

unknown logic state

high impedance state

### Initialization:

- all unconnected nets are set to "z".
- all register variables set to "x".

### Register

• It can hold a value

• eg: reg : most widely used

integer : used for loop counting

real : used to store floating-point numbers

time : keeps track of simulation time

(not used in synthesis)

reg z; // Single bit register variable

reg [15:0] bus; // A 16-bit bus

• reg must be used when we model a bit sequential hardware elements like counters, shift registers, etc.

Example 32 bit counter with synchronous reset.

- Count value increases at the positive edge of the clock.
- If "rs" is high, the counter is reset at the positive edge of the next clock.

```

module simple_counter (clk, rst, count);
    input clk, rst;
    output [31:0] count;
    reg [31:0] count;

    always @ (posedge clk) → synchronous
        begin
            if (rst) → reset
                count = 32'd0;
            else
                count = count + 1;
        end
endmodule

```



if for asynchronous reset →

always @ (posedge clk or posedge rst)

- "time" data type.
- In analog simulation is carried out with respect to a logical clock called simulation time.
- The "time" data type can be used to store simulation time.
- The system function "f\_time" gives the current simulation time.
- Example :

time cur\_time;  
initial

cur\_time = f\_time;

### Values

- Are declared by specifying a range [range1:range2], where range1 is always the most significant bit & range 2 is the least significant bit.

wire [7:0] sum; // MSB is sum[7], LSB is sum[0]



~~reg~~ [1:10] data



- Parts of a vector can be addressed and used in an expression.
- - A 32 bit instruction register, that contains a 6-bit opcode, three registers for operands of 5 bits each and 11-bit offset.

$\text{reg } [31:0] \text{ IR};$

opcode =  $\text{IR}[31:26]$ ;

$\text{reg } 1 = \text{IR}[25:21];$

$\text{reg } [5:0] \text{ opcode};$

$\text{reg } 2 = \text{IR}[20:16];$

$\text{reg } [4:0] \text{ reg } 1, \text{reg } 2, \text{reg } 3$

$\text{reg } 3 = \text{IR}[15:11];$

$\text{reg } [10:0] \text{ offset}$

$\text{offset} = \text{IR}[10:0];$



$$\text{sum} = \text{IR}[25:21] + \text{IR}[19:11]$$

### Multi-dimensional Arrays & Memories

- Multi-dimensional arrays of any dimension can be declared in Verilog.
- Example:

$\text{reg } [31:0] \text{ register\_bank}[15:0];$  // 16 32-bit registers  
 in integer matrix  $[7:0][15:0];$

- Memories can be modeled in Verilog as a 2-D array of registers.

reg  $\text{mem\_bit}[0:2047];$  // 2K 1-bit words  
 $\text{reg } [15:0] \text{ mem-word}[0:1023];$  // 1K 16-bit words

## Specifying constant values

- `cylinders <size> <base> <number>`

example :

`4'b0101` // 4 bit binary number 0101  
`2'b0` // log2 0 (2bit)

`12'hB3C` // 12 bit number in hex

`12'h8xF` // 12 bit number 1000 xxxx 1111

`25` // signed number , in 32bit .

## Parameters (similar to #define in c)

~~#parameter~~ constant with given name .

~~#parameter~~ `H1 = 25, LO = 5;`

~~#parameter~~ `up = 2b'00, down = 2b'01` ;

~~#parameter~~ `RED = 3b'100, YELLOW = 3b'010;`

~~#parameter~~ module counter (`clear, clock, count`) ;

~~#parameter~~ `N = 7` ;

~~input~~ `clear, clock` ;

~~output~~ `[0:N] count ; reg [0:N] count;`

~~always @ (negedge clock)~~

~~if (clear)~~ `count <= 0;`

~~else~~ `count <= count + 1;`

~~end module~~

Predefined logic gates in verilog

List of Primitive gates

and A (out, in1, in2);

nand A (out, in1, in2);

or A (out, in1, in2);

nor A (out, in1, in2);

xor A (out, in1, in2);

xnor A (out, in1, in2);

not A (out, in);

buf A (out, in);

bufif1 A (out, in, ctrl);

bufif0 A (out, in, ctrl);

notbufif0 A (out, in, ctrl);

notbufif1 A (out, in, ctrl); } things are  
gates with  
tristate controls.

and A (a, b, c, d, e, t);

output      inputs .



if  $ctrl = 1$ ,  $out = in$   
 $= 0$ ,  $out = z$



not of 1, 2 if 0

Q3

' Deriving exclusive-or using nand gates:

' timescale 10ns / 1ns

module exclusive-or (f, a, b);

input a, b;

output f;

wire #1, #2, #3;

nand #5 m1 (#1, a, b);

and #5 m2 (#2, #1, #1);

and #5 m3 (#3, #1, b);

nor #5 m4 (f, #2, #3);

end module.



## The 'timescale' Directive

- Often in a single simulation, delay values, in one module need to be specified in terms of some time unit, while those in some other module need to be specified in terms of some other time unit.

- The 'timescale' compiler directive can be used:

' timescale compiler directive can be written as

' timescale <reference-time-unit>/<time-precision>

- The <reference-time-unit> specifies the unit of measurement for time.

- The <time-precision> specifies the precision to which the delays are rounded off during simulation.

- Valid values for specifying time unit and time precision are  $1, 10, 100$ .

ex timescale 10ns / 1ns

- If we specify #5 as delay, it will mean 50ns.  
Units are s, ms, us, ps & fs.

Specifying connectivity during instantiation :-

↳ Positional Association (same order as definition)  
e.g. nand #1 A1 (+1, A, B) .  
↳ Explicit Association

example

module example (A, B, C, D, E, F, X);

· wire #1, #2, #3, Y;

nand #1 G1 (+1, A, B);

and #2 G2 (+2, C, ~B, D);

end module

positional association

module testbench

reg x1, x2, x3, x4, x5, x6; wire OUT;

example DUT (.OUT(Y), .x1(A), .x2(B), .x3(C),  
.x4(D), .x5(E), .x6(F));

initial  
begin

explicit association

end module

## Hardware modeling laws

- In terms of the hardware realization, the value computed can be assigned to:
  - A "wire"
  - A "flip-flop" (edge triggered storage cell)
  - A "latch" (level-triggered storage cell)
- A variable in Verilog can be either "net" or "register".
  - A "net" data type always map to a "wire" during synthesis.
  - A "register" data type maps either to a "wire" or a "storage cell" depending upon the context under which a value is assigned.

```
module reg-maps-to-wire (A, B, C, f1, f2);
```

```
    input A, B, C;
```

```
    output f1, f2;
```

```
    wire A, B, C;
```

```
    reg f1, f2;
```

```
    always @ (A or B or C)
```

```
begin
```

```
    f1 = ~ (A & B);
```

```
    f2 = f1 ^ C;
```

```
end
```

```
endmodule
```

whenever A or B or C changes, always

block implements

- The synthesis system will generate a wire for f1.



module a\_problem\_case ( $A, B, C, f_1, f_2$ );  
 input  $A, B, C$ ;  
 output  $f_1, f_2$ ;  
 wire  $A, B, C$ ;  
 reg  $f_1, f_2$ ;  
 always @ (A or B or C)

```

begin
   $f_2 = f_1 \wedge f_2;$ 
   $f_1 = \sim(A \wedge B);$ 
end
endmodule
  
```

The synthesis system will generate a wire for  $f_2$ , and a storage cell for  $f_1$ .



## Venilog Operators

### Arithmatic operators

| unary     | binary |
|-----------|--------|
| + A       | A + B  |
| - B       | A - B  |
| - (A + B) | A * B  |

\*  $\%$  → modulus  
 \*  $*$  → exponentiation

### Logical operators

! → logical negation  
 & → logical AND  
 || → logical OR

### Relational operation

!= not equal  
 == equal  
 >= greater or equal  
 <= less than or equal  
 >  
 <

### Bitwise operators

$\sim$  bitwise NOT  
 & bitwise AND  
 | bitwise OR  
 $\oplus$  bitwise exclusive - OR  
 $\ominus$  bitwise exclusive - NOR

## Reduction operator



op  
single bit

wire [3:0] z; wire y;

assign y = z & x;

## Shift operators:

>> shift right  
<< shift left  
>>> arithmetic shift right

ex

wire [15:0] data, reg;

assign target = data >> 3;

shift while  
preserving the sign

## Conditional Operator:

cond Expr ? true-expr : false-expr;

↳ assign a = (b > c) ? b : c;

## Concatenation Operator: { ... , ... , ... }

Joins together bits from two or more comma-separated expressions.

// An 8 bit adder description

```
module parallel-adder (sum, cout, in1, in2, cin);
    input [7:0] in1, in2;
    input cin;
    output [7:0] sum;
    output cout;
    assign #20 {cout, sum} = in1 + in2 + cin;
endmodule
```

### Modeling

Behavioral

Structural

Starting Point

Handwritten  
implement.

Example : 16x1 mux

```
module mux16to1 (in, sel, out);
    input [15:0] in;
    input [3:0] sel;
    output out;
```

```
assign out = in[sel];
endmodule
```

module muxtest;

```
reg [15:0] A; reg [3:0] S; wire F;
```

```
mux16to1 M (.in(A), .sel(S), .out(F));
```

```

initial
begin
    $dumpfile ("");
    $dumppvars (o,
    $mon, $var ($time, "A=%h, $S=%h, F=%b",
    A, S, F));
    #5 A = 16'h3f0'a; S = 4'h0;
    #5 S = 4'h1;
    #5 S = 4'h6;
    #5 S = 4'hc;
    #5 $finish;
end
endmodule

```

Whenever there is nonconditional statement  $\rightarrow$  a mux  
 or generator.

```

module generate_decoder (out, in, select);
    input in;
    input [0:1] select;
    output [0:3] out;
    assign out [select] = in;
endmodule

```

Non const. in expression on LHS  
 generates a decoder



• module level-sensitive-latch ( $D, Q, En$ );  
 input  $D, En$ ;  
 output  $Q$ ;

assign  $Q = en ? D : Q$ ;  
 end module

| $En$ | $D$ | $Q_n$     |
|------|-----|-----------|
| 0    | x   | $Q_{n-1}$ |
| 1    | 0   | 0         |
| 1    | 1   | 1         |

• Modelling SR-latch



module SR-latch ( $Q, Q-bar, S, R$ );

input  $S, R$ ;  
 output  $Q$ ;  $Q-bar$ ;

assign  $Q = \sim(R \& Q-bar)$ ;  
 assign  $Q-bar = \sim(S \& Q)$ ;  
 end module

```
module latchtest;  
reg s,k; wire q, qbar;  
c8-latch LAT(q, qbar, s, k);
```

initial

```
begin  
$monitor($time, "s=%b, R=%b, q=%b,  
qbar=%b", s, k, q, qbar);
```

s = 1'b0; R = 1'b1;

#5 s = 1'b1; R = 1'b1;

#5 s = 1'b1; R = 1'b0;

#5 s = 1'b1; R = 1'b1;

#5 s = 1'b0; R = 1'b0;

#5 s = 1'b1; R = 1'b1;

end

end module.

, while loop

```
integer mycount;
```

initial begin  
while (mycount <= 255)

```
begin  
$display ("My count : %d", mycount);
```

mycount = mycount + 1;

end

end

## Behavioral style : Procedural Assignment

- Two kinds of procedural blocks are supported in Verilog:
  - The "initial" block
    - Executed once at the beginning of simulation
    - Used only in test benches; cannot be used in synthesis.
  - The "always" block
    - A continuous loop that never terminates
- The procedural block defines:
  - A region of code containing sequential statements
  - The statements execute in the order they are written.

### The "initial" block

- All statements inside an "initial" statement constitutes an "initial block".
  - Grouped inside a "begin ... end" structure for multiple statements.
  - The statements start at time 0, and execute only once.
  - If there are multiple "initial" blocks, all the blocks will start to execute concurrently at time 0.
- The "initial" block is typically used to write test benches for simulation:
  - Specifies the stimulus to be applied to the design-under-test (DUT)
  - Specifies how the DUT outputs are to be

displayed / handled.  
Specifies the file where the waveform  
information is to be dumped.

```
module testbench-example;  
reg a, b, cin, sum, cout;
```

initial

```
    cin = 1'b0;
```

initial

begin

```
    #5 a = 1'b1; b = 1'b1;
```

```
    #5 b = 1'b0;
```

end

initial

```
#25 $finish;
```

endmodule

### The "always" block

- An "always" statement starts at time 0 and executes the statements inside the block repeatedly, and never stops.

```
module generating_clock;
```

```
    output reg clk;
```

initial

```
    clk = 1'b0; // initialized to 0 at time 0
```

always

```
    #5 clk = ~clk; // Toggle after time 5 units
```

initial

```
#500 $finish;
```

endmodule

- "initial" and "always" blocks can coexist within the same Verilog module.
- They all execute concurrently, "initial" only once & "always" repeatedly.
- The @ (event-expression) part is required for both combinational & sequential blk descriptions.

Basic syntax of "always" block :

```
always @ (event-expression)
begin
    sequential-statement - 1 ;
    sequential-statement - 2 ;

```

```
    sequential-statement - n ;
end
```

- Only "reg" type variable can be assigned within an "initial" or "always" block.
- Basic reason :
  - The sequential "always" block executes only when the event expression triggers.
  - At other times the block is doing nothing.
  - An object being assigned to must therefore remember the last value assigned (not continuously driven).
  - So, only "reg" type variables can be assigned within the "always" block.
  - Of course, any kind of variable may appear in the event expression (reg, wire, etc).

a) begin .. end

begin  
sequential-statement-1;  
sequential-statement-2;  
sequential-statement-n;  
end

→ if  $n=1$ , "begin..  
end" is not required

b) if ... else

if ( $<\text{expression}>$ )  
seq-statement;  
else if ( $\dots$ )

a single statement or  
a group of  
statements within  
"begin .. end"

else default-statement;

c) case

case ( $<\text{expression}>$ )

expr1 : seq-statement;  
expr2 : seq-statement;  
...

exprn : seq-statement;

default : default-statement;

end case #

Two variations: "case z" and "case n"

- The "case z" statement treats all "z" values  
in the case alternatives or the case expression  
as don't care.

- The "casex" statement treats all "x" and "z" values in the case item as don't cares.

`reg [3:0] state; integer next-state;  
casex (state)`

`4'b1xxx : next-state = 0;`

`4'bxx1x : next-state = 1;`

`4'bxx1x : next-state = 2;`

`4'bxxx1 : next-state = 3;`

`default : next-state = 0;`

`endcase`

- d) "while" loop

`while (<expression>)  
sequential-statement;`

- e) "for" loop

`for (expr1; expr2; expr3)  
sequential-statement`

`integer mycount;`

`reg [100:2] data;`

`integer i;`

`initial`

`for (mycount = 0; mycount <= 255; mycount = mycount + 1)`

`$display ("My count : %d", mycount);`

`initial`

`for (i = 1; i <= 100; i = i + 1)`

`data[i] = 1'b0;`

- a) "repeat" loop → fixed no. of times
- ```
repeat (<expression>)
sequential - statement;
```
- b) "forever" loop → executes forever
- ```
forever
sequential - statement;
```
- until \$finish is  
encountered in  
for branch.

The "forever" loop is typically used along with timing specifier.

- If delay is not specified, the simulator would execute this statement indefinitely without advancing time.
- Rest of design will never be executed.

### @ (event expression)

- The event expression specifies the event that is required to resume execution of the procedural block.
- The event can be any one of the following :
  - Change of a signal value.
  - Positive or negative edge occurring on signal (posedge or ~~negat~~ negedge).
  - List of above-mentioned events, separated by "or" or comma.
- A "posedge" is any transition from {0, x, z} to

1, and from 0 to  $\{2, \bar{1}\}$ .



examples

- @ (in) // "in" changes
- @ (a or b or c) // any of 'a', 'b' or 'c' change
- @ (a, b, c) // " ", " ", "
- @ (posedge clk) // positive edge of "clk"
- @ (posedge clk or negedge reset) // any variable changes
- @ ( $^{*}$ ) // any variable changes

// D flip-flop with synchronous set and reset

```
module dff (q, qbar, d, set, reset, clk);  
    input d, set, reset, clk;  
    output reg q; output qbar;
```

```
assign qbar = ~q;  
always @ (posedge clk)
```

```
begin  
    if (reset == 0) q <= 0;  
    else if (set == 1) q <= 1;  
    else q <= d;
```

```
end  
endmodule
```

// D flip-flop with asynchronous set and reset

```
module dff (q, qbar, d, set, reset, clk);  
    input d, set, reset, clk;  
    output reg q; output qbar;
```

assign qbar = ~q;

always @ (posedge clk or negedge set or negedge reset)

begin

```
    if (reset == 0) q <= 0;  
    else if (set == 0) q <= 1;  
    else q <= d;
```

end

endmodule

// transparent latch with enable

```
module latch (q, qbar, din, enable);  
    input din, enable;  
    output reg q; output qbar;
```

assign qbar = ~q;

always @ (din or enable)

begin

```
    if (enable) q = din;
```

end

endmodule



Latch => level triggers

enable

// A combinational logic example

```
module mux91 (in1, in0, s, f);
    input  [1:0] in1, in0, s;
    output reg f;
    always @ (in1 or in0 or s)
        if (s)
            f = in1;
        else
            f = in0;
endmodule
```

// Sequential logic example

```
module incomp-state-spec (curr-state, flag);
    input [0:1] curr-state;
    output reg [0:1] flag;
    always @ (curr-state)
        case (curr-state)
            0, 1 : flag = 2;
            3 : flag = 0;
        endcase
endmodule
```

( end module )

| curr-state | Flag |
|------------|------|
| 0          | 2    |
| 1          | 2    |
| 2          | ?    |
| 3          | 0    |

⇒ if curr-state is 2, then  
flag will not change.

( flag will map to a storage element (latch) ).

## // A small modification

```
module incomp-state - spec (curr-state, flag);  
    input [0:1] curr-state;  
    output reg [0:1] flag;  
    always @ (curr-state)  
        begin  
            flag = 0;  
            case (curr-state)  
                0, 1 : flag = 1;  
                3 : flag = 0;  
            endcase  
        end  
endmodule
```

. It is up to the designer to code the design in such a way that latch can be avoided where possible.

## // A simple Y-function ALU

```
module ALU-Y bit (f, a, b, op);  
    input [1:0] op;           input [7:0] a, b;  
    output reg [7:0] f;
```

parameter ADD = 2'b00, SUB = 2'b01, MUL = 2'b10,

DIV = 2'b11;

always @ (op)

ADD : f = a + b;

SUB : f = a - b;

$MUL : f = a * b ;$   
 $DIV : f = a / b ;$   
 end case  
 end module

```

module priority-encoder (in, code);
  input [7:0] in;
  output reg [3:0] code;
  always @ (in
    begin
      if (in[0]) code = 3'b000;
      else if (in[1]) code = 3'b001;
      else if (in[2]) code = 3'b010;
      else if (in[3]) code = 3'b011;
      else if (in[4]) code = 3'b100;
      else if (in[5]) code = 3'b101;
      else if (in[6]) code = 3'b110;
      else if (in[7]) code = 3'b111;
    end
  endmodule
  
```



module bcd-to-7seg (bcd, seg);  
input [3:0] bcd;  
output reg [6:0] seg;

always @ (bcd)

begin case

0 : seg = 6'b0000001;

1 : seg = 6'b001111;

2 : seg = 6'b010101;

3 : seg = 6'b0010010;

4 : seg = 6'b1001100;

5 : seg = 6'b0100100;

6 : seg = 6'b0100000;

7 : seg = 6'b0001111;

8 : seg = 6'b0000000;

9 : seg = 6'b0000100;

default : seg = 6'b111111;

endcase

endmodule

// An n-bit comparator  
module compare (A, B, lt, gt, eq);  
parameter word-size = 16;  
input [word-size - 1:0] A, B;  
output reg lt, gt, eq;

always @ (\*)

begin

gt = 0; lt = 0; eq = 0;

```

if (A > B) gt = 1;
else if (A < B) lt = 1;
else eq = 1
end
endmodule

```

```

module alu-example (alu_out, A, B, operation_en);
    input [2:0] operation; input [7:0] A, B;
    input en;
    output [7:0] alu_out; reg [7:0] alu_reg;
    assign alu_out = (en == 1) ? alu_reg : 4'b0;

```

always @ (\*)

begin, (operation)

3'b000 : alu\_reg = A + B;

3'b001 : alu\_reg = A - B;

3'b011 : alu\_reg = ~A;

default : alu\_reg = 4'b0;

end case

end module

- 2 types of procedural assignments :

Blocking (denoted by "=")

non-blocking (denoted by "<=")

i) Block assignment

- General syntax

Variable-name = [delay-or-event-control] expression;

- The " $=$ " operator is used to specify blocking assignment.
- Blocking assignment statements are executed in order they are specified in a procedural block.
  - the target of an assignment gets updated before the next sequential statement in the procedural block is executed.
  - they do not block execution of statements in other procedural blocks.
- Recommended to combinational ckt.

## ii) Non-blocking assignment

General syntax:

variable\_name <= [delay- or -event-control]  
expression;

- Non-blocking assignment statements allow scheduling of assignments without blocking execution of statements that follow them within the procedural block.
  - the assignment to the target gets scheduled for the end of simulation cycle (at the end of the procedural block)
  - statements subsequent to the instruction under consideration are not blocked by assignment.
  - Allows concurrent procedural assignment, suitable for subsequent logic.

integer a, b, c;

initial

begin

a = 10; b = 20; c = 15;

end

initial

begin

a <= #5 b + c;

a = 35 at time 5

b <= #5 a + 5;

b = 15 " "

c <= #5 a - b;

c = -10 " "

end

results

different from  
blocking assignment

Swapping values of two variables "a" and "b"

always @ (posedge clk)

a <= b;

always @ (posedge clk)

b <= a;

/ module non\_blocking-assign;

integer a, b, c, d;

reg clock;

always @ (posedge clk)

begin

a <= b + c;

d <= a - 3;

b <= d + 10;

c <= c + 1;

end

initial  
begin  
\$monitor (f\_time, "a = %yd, b = %yd,  
c = %yd, d = %yd ", a, b, c, d);

a = 30; b = 20; c = 15; d = 5;

clock = 0;  
forever #(\$5) clock = ~clock;

end

initial

# 100 \$finish;

endmodule

- It is recommended not to use blocking & non-blocking assignments in same "always" block.
- A variable cannot appear as the target of both a blocking & non-blocking assignment

~~v<sup>t</sup>~~ n = n + 5;      x  
n <= y;

// ring counter



1 0 0 0  
0 1 0 0  
0 0 1 0  
0 0 0 1

```

module my_count (clk, init, count);
    input clk, init;
    output reg[31:0] count;
    always @ (posedge clk)
        begin
            if (init) count = 8'b10000000;
            else
                begin
                    count <= count << 1;
                    count[0] <= count[7];
                end
        end
endmodule

```

- Whenever a clock-trigger assignment → non-blocking

### Generate Blocks

- "generate" statements allow Verilog code to be generated dynamically before the simulation or synthesis begins.
- Very convenient to create parameterized module descriptions.
- Reentrant instantiations can be carried out for various Verilog blocks:
- Modules, user-defined primitives, gates, continuous assignments, "initial" and "always" blocks, etc.
- Generated instances have unique identification names and can be referenced hierarchically.

of module xor-bitwise (f, a, b);  
parameters N = 16;  
input [N-1:0] a, b;  
output [N-1:0] f;  
genvar p;

generate for ( $p=0$ ;  $p < N$ ;  $p=p+1$ )  
begin xor #p  
    xor x[p] (f[p], a[p], b[p]);  
end  
endgenerate  
endmodule

module generate-test;  
reg [15:0] x, y;  
wire [15:0] out;

xor-bitwise G (.f(out), .a(x), .b(y));

initial begin  
\$monitor ("x:y.b, y:y.b, out:y.b", x,  
out);  
x = 16'haaaaa, y = 16'h00ff;  
#10 x = 16'h0ff; y = 16'h3333;  
#20 \$finish;

end  
endmodule

\ generating N-bit ripple carry adder  
using genvar & generate

module RCA (carry-out, sum, a, b, carry-in);  
 parameter N=8;  
 input [N-1:0] a, b; input carry-in;  
 output [N-1:0] sum; output carry-out;  
 wire [N:0] carry; // carry[N] is carry-out

assign carry[0] = carry-in;  
 assign carry-out = carry[N];

genvar i;  
 generate for (i=0; i< N; i++)

begin fa-loop

wire t1, t2, t3;

xor h1 (t1, a[i], b[i]), h2 (sum[i], t2,  
 carry[i]);

and h3 (t2, a[i], b[i]), h4 (t3, t1, carry[i]);  
 or h5 (carry[i+1], t2, t3);

end

end generate  
 endmodule

- Some of the relative hierarchical instance names that are generated are:  
 - fa-loop[0].h1, fa-loop[1].h2, fa-loop[2].h3
- Some of the nets ("wires") that are generated are:  
 - fa-loop[0].t1, fa-loop[1].t2, fa-loop[0].t3.

## User defined Primitives (UDP)

- They can specify:
  - Truth table for combinational functions.
  - State table for sequential functions.
  - Don't care, rising and falling edges, etc. can be specified.
- for combinational functions, truth table entries are specified as:  
 $\langle \text{input } 1 \rangle \langle \text{input } 2 \rangle \dots \langle \text{input } N \rangle : \langle \text{output} \rangle;$
- for sequential functions, state table entries are specified as:  
 $\langle \text{input } 1 \rangle \langle \text{input } 2 \rangle \dots \langle \text{input } N \rangle : \langle \text{present-state} \rangle$   
Some rules for using UDP:  $\langle \text{next-state} \rangle$

- The input terminals to a UDP can only be scalar variables.
  - Multiple input terminals can be used.
  - The input terminals are declared as "input".
  - Input entries in the table must be in the same order as the "input" terminal list.
- Only one scalar output terminal must be used.
  - The output terminal must appear in the by for combinational UDPs, the output terminal is declared as "output".
  - For sequential UDPs, the output terminal is declared as "reg".
- For sequential UDPs, the state can be initialized with an "initial statement".

## Some Guidelines

- User defined Primitives (UDPs) model functionality
  - They do not model timing or process technology
- A functional block can be modeled as a UDP <sup>only</sup> if it has exactly one output.
- if a block has more than one outputs, it has to be modeled as a module.
- As an alternative, multiple UDPs can be used, one per output.
- Inside the simulator, a UDP is typically implemented as a lookup table in memory.
- The UDP state tables should be specified as completely as possible.
  - for unspecified cases, the output is set to "x".

// Full adder sum generation using UDP primitive  
primitive udp.sum (sum, a, b, c);

    input a, b, c;

    output sum;

    table

| a | b | c | : | sum |
|---|---|---|---|-----|
| 0 | 0 | 0 | : | 0 ; |
| 0 | 0 | 1 | : | 1 ; |
| 0 | 1 | 0 | : | 1 ; |
| 0 | 1 | 1 | : | 0 ; |
| 1 | 0 | 0 | : | 1 ; |
| 1 | 0 | 1 | : | 0 ; |
| 1 | 1 | 0 | : | 0 ; |
| 1 | 1 | 1 | : | 1 ; |

end table

end primitive

We can also specify don't care input combinations as "?".

// Full adder carry generation  
// Using don't care ("?")

primitive udp-cy (cout, a, b, c);

input a, b, c;

output cout;

table

| a | b | c | cout |
|---|---|---|------|
| 0 | 0 | ? | 0    |
| 0 | ? | 0 | 0    |
| ? | 0 | 0 | 0    |
| 1 | 1 | ? | 1    |
| 1 | ? | 1 | 1    |
| ? | 1 | 1 | 1    |

end table

end primitive

// Instantiating UDP's

// A full adder description

module full-adder (sum, cout, a, b, c);

input a, b, c;

output sum, cout;

udp-sum SUM (sum, a, b, c);

udp-cy carry (cout, a, b, c);

end module

/\* A level-sensitive D type latch

primitive Dlatch (i, d, clk, clr);  
    input d, clk, clr;  
    output reg q;

initial

q = 0; // This is optional

table

| d | clk | clr | q | q-new |                              |
|---|-----|-----|---|-------|------------------------------|
| ? | ?   | 1   | : | ?     | 0 ; // latch is clear        |
| 0 | 1   | 0   | : | ?     | 0 ; // latch is reset        |
| 1 | 1   | 1   | : | ?     | 1 ; // latch is set          |
| ? | 0   | 0   | : | ?     | - ; // remain previous state |

end table

end primitive

/\* A T flip-flop

primitive TFF (q, clk, clr);  
    input clk, clr;  
    output reg q;

table

| clk  | clr  | q | q-new |                                 |
|------|------|---|-------|---------------------------------|
| ?    | 1    | : | ?     | 0 ; // ff is cleared            |
| ?    | (10) | : | ?     | - ; // ignore -ve edge of "clr" |
| (10) | 0    | : | 1     | : 0 ; // ff toggles on edge     |
| (10) | 0    | : | 0     | : 1 ; // do                     |
| (0?) | 0    | : | ?     | : - ; // ignore +ve edge of clk |

endtable  
end primitive

// constructing a 6-bit ripple counter using 7 flip-flops  
~~module~~ ripple-counter (count, clk, clr);  
input clk, clr;  
output [5:0] count;

TFF F0 (count[0], clk, clr);  
TFF F1 (count[1], count[0], clk, clr);  
TFF F2 (count[2], count[1], clk, clr);  
TFF F3 (count[3], count[2], clk, clr);  
TFF F4 (count[4], count[3], clk, clr);  
TFF F5 (count[5], count[4], clk, clr);

~~end primitive~~  
end module

// A negative edge insensitive JK flip-flop  
primitive JK\_FF (q, j, k, clk, clr);  
input j, k, clk, clr;

output reg q;  
table

| j | k | clk  | clr  | q | q-new | dk                  |
|---|---|------|------|---|-------|---------------------|
| ? | ? | ?    | 1    | : | 2 : 0 | // clear            |
| ? | ? | ?    | (10) | : | ?     | // ignore           |
| 0 | 0 | (10) | 0    | : | ?     | // no change        |
| 0 | 1 | (10) | 0    | : | ?     | // set condition    |
| 1 | 0 | (10) | 0    | : | ?     | // set condition    |
| 1 | 1 | (10) | 0    | : | 0 : 1 | // toggle condition |
| 1 | 1 | (10) | 0    | : | 1 : 0 | // toggle condition |
| ? | ? | (01) | 0    | : | ?     | // no change        |

- '3' cannot be used in output field.
- '-' can only be specified in output field.
- shortcut "x" → for rising edge, instead of (01).
- shortcut "t" → for falling edge, instead of (10).
- shortcut '\*' indicates any value change in signal.

## Testbench

How to write test benches ?

- Create a dummy template
- declare inputs to the design-under-test (DUT) as "reg", and the outputs as "wire".
  - Because we have to initialize the DUT inputs inside procedural block(s), typically "initial", where only "reg" type variables can be assigned.
- Instantiate the DUT.
- Initialization and Monitoring
  - Assign some known values to DUT inputs
  - Monitor the DUT outputs for functional verification.
- For synchronous sequential Jcts, we need clock generation logic.

## The Simulator directives

- \$display ("<format>", expr1, expr2, ...);
  - Used to print the immediate values of text or variables to std::out.
- \$monitor ("<format>", var1, var2, ...)
  - Similar in syntax to \$display, but does not print immediately.

- It will print the value(s) whenever the value of some variable(s) in the given list changes.
- Has the functionality of event-driven print.

• \$finish;

- Terminates the simulation process.

• \$dumpfile (<filename>);

- Specifies the file that will be used for storing the values of the selected variables so that they can be graphically visualized later.
- The file typically has an extension .vcd (Value Change Dump), and contains information about any value changes on the selected variables.

• \$dumpoff;

- This directive stops the dumping of variables. All variables are dumped with "x" values and the next change of variables will not be dumped.

• \$dumpon;

- This directive starts previously stopped dumping of variables.

• \$dumpvars (level, list\_of\_variables\_or\_modules);

- Specifies which variables should be dumped on to the .vcd file.

- Both the parameters are optional; if both are omitted, all variables are dumped.

- If level = 0, then all variables within the modules from the list will be dumped. If any module from the list contains module

instantiated, then all variables from these modules will also be dumped.

- If level > 1, then only listed variables and variables of listed modules will be dumped.
- \$dumpall ;
  - the current values of all variables will be written to the file, irrespective of whether there has been any change in their value or not.
- \$dumpLimit (filesize);
  - used to set the maximum size of the .vcd file.

// test bench for Full adder using \$display

module test\_bench;

reg a,b,c; wire sum, cout;

integer i;

full-adder FA (sum, cout, a, b, c);

initial

begin

for (i=0 ; i<8 ; i=i+1) .

  begin

    {a, b, c} = i; #5;

    \$display ("T=%2d a=%b, b=%b,  
              c=%b, sum=%b, cout=%b",

              \$time, a, b, c, sum, cout);

  end

  #5 \$finish

endmodule

for out path  
called adder

iven log -o adder fa.v La-test .v

I have bunch for shift register

make shift-test ;

reg clk, clr, in; wire out; integer i;  
shiftreg\_4bit SR (clk, clr, in, out);

initial

begin clk = 1'b0; #2 clr = 0; #5 clr = 1; end

always #5 clk = ~clk;

initial begin #2;

repeat (2)

begin #10 in = 0; #10 in = 0; #10 in = 1;  
#10 in = 1; end

end

initial

begin

#dumpfile ("shifter.vcd");  
#dumpvars 0, shift test;  
#200 #finish;

end

endmodule

- Shall display 1 if outputs are correct ; and display 0 otherwise !

module fulladder - test;

reg a, b, c;

wire s, cout;

integer correct;

fulladder FA (a, b, c, s, cout);

initial

begin

correct = 1;

#5 a=1; b=1; c=0; #5;

if ((s != 0) || (cout != 1))

correct = 0;

#5 a=1; b=1; c=1; #5;

if ((s != 2) || (cout != 1))

correct = 0;

#5 a=0; b=1; c=0; #5;

if ((s != 1) || (cout != 0))

correct = 0;

#5 \$display ("%d", correct);

end

end module

- The system task \$random can be used to generate a random number.
- It is called as : \$random (<seed>)
  - The value of <seed> is optional and is used to ensure that the same sequence

of random numbers are generated each time the test is run.

```
module test_adder;  
    reg [7:0] a, b;  
    wire [7:0] sum;  
    integer myseed;  
    adder ADD (sum, cout, a, b);  
  
initial myseed = 15;  
  
initial  
begin  
    repeat (5)  
    begin  
        a = $random (myseed);  
        b = $random (myseed); # 10;  
        $display ("T: %3d, a: %h, b: %h,  
        sum: %h", $time; a, b, sum);  
    end  
end  
endmodule
```

## Finite State Machine (FSM)

- A PSM can be represented either in the form of a state table or in the form of a state diagram
- Transition diagram
- Variation exist, e.g. Algorithmic State Machine (ASM) chart.

State table

"111" detection

| Reset | PS | Input | NS | Output |
|-------|----|-------|----|--------|
| 1     | -  | -     | A  | 0      |
| 0     | A  | 0     | A  | 0      |
| 0     | A  | 1     | B  | 0      |
| 0     | B  | 0     | A  | 0      |
| 0     | B  | 1     | C  | 0      |
| 0     | C  | 0     | A  | 0      |
| 0     | C  | 1     | D  | 1      |
| 0     | D  | 0     | A  | 0      |
| 0     | D  | 1     | D  | 1      |

State diagram



Mealy and Moore FSM

- A deterministic FSM can be mathematically defined as a 6-tuple  $(\Sigma, \Gamma, S, s_0, \delta, \omega)$  where  $\Sigma$  (sigma) is the set of input combinations,  $\Gamma$  (gamma) is the set of output combinations,  $S$  (capital S) is a finite set of states,  $s_0 \in S$  is the initial state,  $\delta$  (delta) is the

state transition function, and  $w$  is the output function.

Here,  $\delta: S \times \Sigma \rightarrow S$

- Present state ( $ps$ ) and present input determines the next state ( $ns$ ).

for Mealy machine,  $w: S \times \Sigma \rightarrow F$  (output depends on state + inputs)

for Moore machine,  $w: S \rightarrow F$  (output depends only on the state).



Ex 1

There are 3 lamps, RED, GREEN, and YELLOW, that should glow cyclically with a fixed time interval.

Some observations:

The PSM will have 3 states, corresponding to the glowing state of the lamps

The input set is NULL; state transition will

occurs whenever clock signal comes.

- This is a Mealy machine, since the lamp that will glow only depends on the state and not on the input (here null).



00  $s_0 \rightarrow$  RED  
01  $s_1 \rightarrow$  GREEN  
10  $s_2 \rightarrow$  YELLOW  
 $\equiv$   
11x 2 bits



module cyclic\_lamp(clock, light);

input clock;

output reg [0:2] light;

parameter  $s_0 = 0$ ,  $s_1 = 1$ ,  $s_2 = 2$ ;

parameter RED = 3'b100, GREEN = 3'b010,  
YELLOW = 3'b001;

reg [0:1] state;

always @ (posedge clock)

case (state)

$s_0$  : begin //  $s_0$  mean RED  
light <= GREEN ; state <=  $s_1$ ;  
end

$s_1$  : begin

light <= YELLOW ; state <=  $s_2$ ;

end

$s_2$  : begin

light <= RED ; state <=  $s_0$ ;

end

default : begin

light <= RED;  
state <= SO;

end

end case

end module

// testbench

module test\_cyclic\_lamp;  
reg clk;  
wire [0:9] light;

cyclic\_lamp LAMP (.clk / light);

always #5 clk = ~clk;

initial

begin

clk = 1'b0;  
#100 \$finish;

end

initial

begin

\$dumpfile("cyclic.vcd"); \$dumpvars(0,test\_cyclic\_lamp);

\$monitor(\$time, "RAY : %b", light);

end

endmodule

Output

0 RAY:xxx

5 RAY:100

15 RAY:010

25 RAY:001

35 RAY:100

45 RAY:010

55 RAY:001

65 RAY:100

75 RAY:010

85 RAY:001

95 RAY:100

- Some comments on solution:
  - The synthesis tool will generate five flip-flops:  
2 for state and 3 for light.
  - The three output lines are also getting stored in flip-flops.
  - We have used non-blocking assignment triggered by clock edge.
  - But actually we do not need separate flip-flops for the outputs, as the outputs can be directly generated from the state.
- How to achieve this?
  - . Modify the Verilog code such that all assignments to light is made in a separate "always" block.
  - . Use blocking assignment triggered by state change, and not by clock.

```

module cyclic_lamp (clock, light);
    input clock;
    output reg [0:2] light;
    parameter SO = 0, S1 = 1, S2 = 2;
    parameter RED = 3'b100, GREEN = 3'b010,
                  YELLOW = 3'b001;
    reg [0:1] state;
    always @ (posedge clk)
        case (state)
            0 : state <= S1;
            1 : state <= S2;
            2 : state <= S0;
            default : state <= S0;

```

always @ (state)  
case (state)

S0 : light = RED ;

S1 : light = GREEN ;

S2 : light = YELLOW ;

default: light = RED ;

end case

endmodule

- The synthesis tool will be generating only 2 flip-flops corresponding to the first clock-triggered "always" block.
- The second "always" block will be generating a combinational ckt that takes state as input and produces light as outputs.

| state (s, s0) | light R&Q | minimization      |
|---------------|-----------|-------------------|
| s0: 00        | 1 0 0     | $R = s_0 \cdot s$ |
| s1: 01        | 0 1 0     | $Q = s_0$         |
| s2: 10        | 0 0 1     | $X = s_0$         |
| 11            | X X X     |                   |

|   |   |   |   |   |   |     |
|---|---|---|---|---|---|-----|
| R | 0 | 1 | * | Q | 0 | 1   |
| 0 | 1 | 0 | * | 0 | 0 | (1) |
| 1 | 0 | X | * | 1 | 0 | (X) |

Fx 2

Design of a serial parity detector

- A continuous stream of bits is fed to a ckt in synchronism with a clock. The ckt will be generating a bit stream as output, where a 0 will indicate

"even number of 1's seen so far" and a 1 will indicate "odd number of 1's seen so far".  
 - Also a Moore Machine.



```

module parity_gen (x, clk, z);
    input x, clk;
    output reg z;
    reg even-odd; // the machine state
    parameter EVEN=0, ODD=1;
    always @ (posedge clk)
        case (even-odd)
            
```

```

                EVEN: even-odd <= x ? ODD: EVEN;
                ODD: even-odd <= x ? EVEN: ODD;
                default: even-odd <= EVEN;
            endcase
        
```

```

    always @ (even-odd)
        case (even-odd)
            EVEN: z = 0;
            ODD: z = 1;
        endcase
    
```

```

endmodule

```

- This design will not cause the synthesis tool to generate a latch for the output  $z$ .

1) test bench

```
module test-parity;
    reg clk, x; wire z;
    parity_gen PAK(x, clk, z);
    initial
        begin
            $dumpfile("parity.vcd");
            dumpvars(0, test-parity);
            clk = $b0;
        end
    always #5 clk = ~clk;
    initial
        begin
            #2 x=0; #10 x=1; #10 x=1; #10 x=1;
            #10 x=0; #10 x=1; #10 x=1; #10 x=0;
            #10 x=0; #10 x=1; #10 x=1; #10 x=0;
            #10 $finish;
        end
    endmodule
```



ex 3. Design of a segment detector.  
- A circuit accepts a serial bit stream "x" as input and produces a serial bit stream "z" as output.

- Whenever the bit pattern "0 110" appears in the input stream, it outputs,  $z = 1$ ; at all other times,  $z = 0$ .
- Overlapping occurrences of the pattern are also detected.
- This is a Mealy Machine.

$$x: 0110110$$

$$z: 0001001$$



```

// Sequence detector for pattern "0110"
module seq-detector (x, clk, reset, z);
    input x, clk, reset;
    output reg z;
    parameter SO = 0, S1 = 1, S2 = 2, S3 = 3,
    reg [0:1] PS, NS;
    always @ (posedge clk or posedge reset)
        if (reset) PS <= SO;
        else PS <= NS;

```

```

always @ (PS, x)
  case (PS)
    50 : begin
      z = x ? 0 : 0;
      NS = x ? 50 : S1;
    end
    S1 : begin
      z = x ? 0 : 0;
      NS = x ? S2 : S1;
    end
    S2 : begin
      z = x ? 0 : 0;
      NS = x ? S2 : S1;
    end
    S3 : begin
      z = x ? 0 : 1;
      NS = x ? S0 : S1;
    end
  endcase
endmodule

module test-sequence;
reg clk, x, reset; wire z;
seq-detector #SEG (x, clk, reset, z);
initial
begin
  $dumpfile ("sequence.vcd");
  $dumpvars(0, test-seq);
  clk = 1'b0; reset = 1'b1;
#15 reset = 1'b0;
end
always #5 clk = ~clk;

```

```

initial
begin
    #10 x=0; #10 x=0; #10 x=1; #10 x=0;
    #10 x=0; #10 x=1; #10 x=1; #10 x=0;
    #10 x=0; #10 x=1; #10 x=1; #10 x=0;
    #10 $finish;
end
endmodule

```

Ex 9

Design a sequence detector for the bit pattern "10 1010".



Ex 9 Serial Adder:

$$\begin{array}{r}
 A: 0\ 1\ 1\ 0 \\
 B: 0\ 0\ 1\ 0 \\
 \hline
 S: 0\ 1\ 0\ 0
 \end{array}$$



State : CARRY  
(1 bit)



## Datapath & Controller

- In a complex digital system, the hardware is typically partitioned into two ways:
  - Data path, which consists of the functional units where all computations are carried out.
    - Typically consists of registers, multiplexers, buffers, adders, multipliers, counters, and other functional blocks.
  - Control Path, which implements a finite-state machine and provides control signals to the data path in proper sequence.
    - In response to the control signals, various operations are carried out by the data path.
    - Also takes inputs from the data path regarding various status information.

reg[15:0] A, B, C, D;

$$\begin{aligned} A &= B + C \\ D &= A - C \end{aligned} \Rightarrow$$



Ex: Multiplication by Repeated Addition

- We consider a simple algorithm using repeated addition
  - Assume B is non-zero

- We identify the functional blocks required in the data path, and the corresponding control signals

- Then we design the FSM to implement the multiplication algorithm using the data path.





```

module MUL_datapath (eq2, LdA, LdB, LdP, clrP,
                      decB, data_in, clk);
  input LdA, LdB, LdP, clrP, decB, clk;
  input [15:0] data_in;
  output [eq2];
  output [15:0] x, y, z, Bout, Bus;
endmodule
  
```

parallel in parallel out → a register

begin  
PJPO A ( $x$ , bus, ldA, clk);  
PJPO P ( $y$ ,  $z$ , ldP, clrP, clk);  
CTR B (Bout, Bus, ldb, decB, clk);  
ADD AD ( $z$ ,  $x$ ,  $y$ );  
EQ2 COMP (eqz, Bout);  
end module

module PJPO1 (dout, din, ld, clk);  
input [15:0] din;  
input ld, clk;  
output reg [15:0] dout;  
always @ (posedge clk);  
if (ld) dout <= din;  
end module

module ADD (out, in1, in2);  
input [15:0] in1, in2;  
output reg [15:0] out;  
always @ (\*)  
out = in1 + in2;  
end module

module PJPO2 (dout, din, ld, clr, clk);  
input [15:0] din;  
input ld, clr, clk;  
output reg [15:0] dout;  
always @ (posedge clk);  
if (clr) dout <= 16'b0;  
else if (ld) dout <= din;  
end module

```
module MUL_EQ2 (eq2, data);
    input [15:0] data;
    output eq2;
    assign eq2 = (data == 0);
endmodule
```

```
module CTR (dout, din, ld, dec, clk);
    input [15:0] din;
    input ld, dec, clk;
    output eq [15:0] dout;
    always @ (posedge clk)
        if (ld) dout <= din;
        else if (dec) dout <= dout - 1;
endmodule
```

~~```
module MUL_datapath (eq2, LdA, LdB, LdP,
    clk, lddB, data_in, clk);
    input LdA, ddB, LdP, clk, decB, clk;
    input [15:0] data_in;
    output eq2, x, z, Bout, Bus;
endmodule
```~~~~```
P1 P01 A (x, Bus, LdA, clk);
P1 P02 B (y, z, LdP, clk);
CTR B (Bout, Bus, LdB)
```~~

## The control path

Module controller ( $LdA, LdB, LdP, clrP, decB, done$ ,  
 $clk, eqz, start$ );  
input  $clk, eqz, start$ ;  
output reg  $LdA, LdB, LdP, clrP, decB, done$ ;

reg [2:0] state;

parameter  $s0 = 3'b000, s1 = 3'b001, s2 = 3'b010,$   
 $s3 = 3'b011, s4 = 3'b100$ ;

always @ (posedge clk)

begin

case (state)

$s0 : if (start) state <= s1;$

$s1 : state <= s2;$

$s2 : state <= s3;$

$s3 : #2 if (eqz) state <= s4;$

$s4 : state <= s4;$

default : state <= s0;

end case

end

~~end module~~

always @ (state)

begin

case (state)

$s0 : begin #1 LdA = 0; LdB = 0; LdP = 0;$

$clrP = 0; decB = 0; end$

$s1 : begin #1 LdA = 1; end$

$s2 : begin #1 LdA = 0; LdB = 1; clrP = 1; end$

$s3 : begin #1 LdB = 0; LdP = 1; clrP = 0; decB = 1; end$

$s4 : begin #1 done = 1; LdB = 0; LdP = 0; decB = 0; end$

```
default : begin #3 LdA=0; LdB=0; LdP=0;  
           Clk=0; decB=0; end  
endcase  
end  
endmodule
```

// Testbench

```
module MUL_tb ;  
    reg [15:0] data_in ;  
    reg clk, start ;  
    wire done ;
```

```
MUL datapath D1( eq2, LdA, LdB, LdP, Clk, decB,  
controller CON( LdA, LdB, LdP, Clk, decB, done,  
clk, eq2, start ) ;
```

initial

begin

clk = 1'b0 ;

#3 start = 1'b1 ;

#500 \$finish ;

end

always #5 clk = ~clk ;

initial

begin

#17 data\_in = 17 ;

#10 data\_in = 5 ;

end

↑ variable  
↓ funcDP  
→ end

```

initial
begin
    $monitor ($time, "%d %b" DP.Y, done);
    $dumpfile ("mul.vcd");
    $dumpvars (0, MUL);
end
endmodule

```

|    | result |   |
|----|--------|---|
| 0  | x      | x |
| 1  | x      | 0 |
| 25 | 0      | 0 |
| 45 | 17     | 0 |
| 55 | 38     | 0 |
| 65 | 51     | 0 |
| 75 | 68     | 0 |
| 85 | 85     | 0 |
| 88 | 85     | 1 |

## A better style of Modeling Data / Control Path

- In the previous example, in the "always" block activated by clock edge, both state change as well as computation of the next state is performed.
- A better and recommended approach:
  - Only trigger the state change in the clock activated "always" block.
  - in a separate "always" block using blocking assignments, compute the next state.
  - As in the previous example, in a separate

"always" block, generate the control signals for the data path.

Ex2: GCD computation

We consider a simple algorithm using repeated subtraction.

- We identify the functional blocks required in the datapath, and the corresponding control signals.



- Then we design the FSM to implement the GCD computation algorithm using the data path.







module ACD-data-path (gt, lt, eq, ldA, ldB, sel1, sel2, sel\_in, data-in, clk);  
 input ldA, ldB, sel1, sel2, sel\_in, clk;  
 input [15:0] data-in;  
 output gt, lt, eq;  
 wire [15:0] Aout, Bout, X, Y, Bus, SubOut;

PIPE A (Aout, Bus, ldA, clk);  
 PIPE B (Bout, Bus, ldB, clk);  
 MUX Mux\_in1 (X, Aout, Bout, sel1);  
 MUX Mux\_in2 (Y, Aout, Bout, sel2);  
 SUB SB (SubOut, X, Y);  
 COMPARE COMP (lt, gt, eq, Aout, Bout);  
 endmodule

module PIPE (data-out, data-in, load, clk);  
 input [15:0] data-in;  
 input load, clk;  
 output reg [15:0] data-out;  
 always @ (posedge clk)  
 if (load) data-out <= data-in;  
 endmodule

```

module svB (out, in1, in2);
    input [15:0] in1, in2;
    output reg [15:0] out;
    always @(*)
        out = in1 - in2;
endmodule

```

```

module COMPARE (lt, gt, eq, data1, data2);
    input [15:0] data1, data2;
    output lt, gt, eq;
    assign lt = data1 < data2;
    assign gt = data1 > data2;
    assign eq = data1 == data2;
endmodule

```

```

module MVX (out, in0, in1, sel);
    input [15:0] in0, in1;
    input sel;
    output [15:0] out;
    assign out = sel ? in1 : in0;
endmodule

```

```

module controller (ldA, ldB, sel1, sel2, selin,
                  done, clk, lt, gt, eq, start);
    input clk, lt, gt, eq, start;
    output reg ldA, ldB, sel1, sel2, selin, done;
    reg [2:0] state;
    parameter s0 = 3'b000, s1 = 3'b001, s2 = 3'b010,
              s3 = 3'b011, s4 = 3'b100, s5 = 3'b101;

```

days @ (preedge clk)

begin

case (state)

s0 : if (edge) state <= s1;

s1 : state <= s2;

s2 : #2 if (edge) state <= s3;

else if (lf) state <= s3;

else if (gt) state <= s4;

s3 : #2 if (edge) state <= s5;

else if (lf) state <= s3;

else if (gt) state <= s4;

s4 : #2 if (edge) state <= s5;

else if (lf) state <= s3;

else if (gt) state <= s4;

s5 : state <= s5;

default : state <= s0;

endcase

and

days @ (state)

begin

case (state)

s0 : begin sel-in = 1; lfdA = 1; lfdB = 0;  
done = 0; end

s1 : begin sel-in = 1; lfdA = 0; lfdB = 1;  
end

s2 : if (edge) done = 1;

else if (lf) begin

sel1 = 1; sel2 = 0; sel-in = 0;

#1 lfdA = 0; lfdB = 1;

else if (gt) begin

sel1 = 0; sel2 = 1; sel-in = 0;

#1 lfdA = 1; lfdB = 0;

end

S3 : if (eq) done = 1 ;  
else if (lt) begin  
sel1 = 1; sel2 = 0; sel\_m = 0;  
#1 ldt = 0; ldb = 1 ;  
else if (gt) begin  
sel1 = 0; sel2 = 1; sel\_m = 0;  
#1 ldt = 1; ldb = 0 ;  
end

S4 : if (eq) done = 1 ;  
else if (lt) begin  
sel1 = 1; sel2 = 0; sel\_m = 0;  
#1 ldt = 0; ldb = 1 ;  
end  
else if (gt) begin  
sel1 = 0; sel2 = 1; sel\_m = 0;  
#1 ldt = 1; ldb = 0 ;  
end

S5 : begin  
done = 1; sel1 = 0; sel2 = 0; ldt = 0;  
ldb = 0;

default end : begin ldt = 0; ldb = 0; end  
endcase  
end  
endmodule

|| Testbench  
module ACD-test ;  
reg [5:0] data\_in ;  
reg clk, start ;  
wire done ;

```

reg [15:0] A, B;
acq datapath DP (gl, lt, eq, ldt, ldb, sel1,
sel2, sel-in, data-in, clk);
controller CON (ldA, ldb, sel1, sel2, sel-in, done,
clk, lt, gl, eq, start);
initial
begin
    clk = 1'b0;
#5 start = 1'b1;
#1000 $finish;
end
always #5 clk = ~clk;
initial
begin
#12 data-in = 143;
#10 data-in = 28;
end
initial
begin
$monitor ("%t, %d, %b", DP.Aout, done);
$dumpfile ("gcd.vcd");
dumpvars(0, gcd);
end
endmodule

```

- Modeling the control path using the alternate approach

```

module controller (ldt, ldb, sel1, sel2, sel-in, done,
clk, lt, gl, eq, start);
input clk, lt, gl, eq, start;
output reg ldt, ldb, sel1, sel2, sel-in, done;

```

reg [2:0] state, next-state;  
parameters S0 = 3'b000, S1 = 3'b001, S2 = 3'b010,  
S3 = 3'b011, S4 = 3'b100, S5 = 3'b101;

always @ (posedge clk)

begin  
state <= next-state;  
end

always @ (state)

begin  
case (state)

S0 : begin sel-in = 1; ldA = 1; ldB = 0;  
done = 0; end

S1 : begin sel-in = 1; ldA = 0; ldB = 1; end  
S2 : if (eq) begin done = 1; next-state = S5; end  
else if (lt) begin

sel1 = 1; sel2 = 0; sel-in = 0;  
next-state = S3; #1 ldA = 0; ldB = 1;

end.

else if (gt) begin

sel1 = 0; sel2 = 1; sel-in = 0; next-state =  
#1 ldA = 1; ldB = 0;

end

S3 : if (eq) begin done = 1; next-state = S5; end  
else if (lt) begin

sel1 = 1; sel2 = 0; sel-in = 0;

next-state = S3; #1 ldA = 0; ldB = 1;

end

else if (gt) begin

sel1 = 0; sel2 = 1; sel-in = 0;

next-state = S4; #1 ldA = 1; ldB = 0;

end

```

S4: if (eq) begin done = 1; next-state = S5; end
else if (gt) begin
    sel1 = 1; sel2 = 0; sel-in = 0;
    next-state = S3; #1 ldA = 0; ldB = 1;
end
else if (gt) begin
    sel1 = 0; sel2 = 1; sel-in = 0;
    next-state = S4; #1 ldA = 1; ldB = 0;
end

```

```

S5: begin
done = 1; sel1 = 0; sel2 = 0; ldA = 0;
ldB = 0; next-state = S5;
end

```

```

default: begin ldA = 0; ldB = 0; next-state = S0;
end

```

end case

end

end module

### Booth's multiplication

- In the conventional shift and add multiplication, for n-bit multiplication, we iterate n times.
- Add either 0 or the multiplicand to the  $2^n$  bit partial product (depending on the next bit of the multiplication).
- Shift the  $2^n$  bit partial product to the right.
- Essentially we need n additions and n shift operations.
- Booth's algorithm is an improvement, whereby we can avoid the additions whenever consecutive 0's and 1's are detected in the multiplier.
  - Makes the process faster

## Basic idea Behind Booth's Algorithm

- We inspect two bits of the multiplier ( $q_i, q_{i-1}$ ) at a time.
  - if the bits are same (00 or 11), we only shift the partial product.
  - If the bits are 01, we do an addition and then shift.
  - If the bits are 10, we do a subtraction and then shift.
- $q_{-1}$  is assumed to be equal to 0.
- Significantly reduced the number of additions / subtractions.





$$\text{Example 1: } (-10) \times 13$$

Assume 5-bit numbers.

$$M: (10110)_2$$

$$-M: (01010)_2$$

$$Q: (01101)_2$$

$$\text{Product} = -130$$

$$= (110111110)_2$$

$$\begin{array}{r} A \\ 00000 \\ 01101 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

initialization

$$\begin{array}{r} A \\ 01010 \\ 00101 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

$A = A - M$

$$\begin{array}{r} A \\ 00101 \\ 00110 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 1 \end{array}$$

shift

$$\begin{array}{r} A \\ 11011 \\ 00110 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 1 \end{array}$$

$A = A + M$

$$\begin{array}{r} A \\ 00111 \\ 10011 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

$$\begin{array}{r} A \\ 00011 \\ 11001 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

$$\begin{array}{r} A \\ 10111 \\ 11100 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

$A = A - M$

$$\begin{array}{r} A \\ 11011 \\ 11110 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

~~shift~~

Ans

Ex 2

$$-31 \times 28$$

Assume 6-bit nos.

$$M: (100001)_2$$

$$-M: (011111)_2$$

$$Q: (011100)_2$$

$$\text{Product} = -868$$

$$= (110010$$

$$011100)$$

$$\begin{array}{r} A \\ 000000 \\ 011100 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

initialization

$$\begin{array}{r} A \\ 000000 \\ 001110 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

$$\begin{array}{r} A \\ 000000 \\ 000111 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

$$\begin{array}{r} A \\ 011101 \\ 000111 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

$A = A - M$

$$\begin{array}{r} A \\ 000111 \\ 100011 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

$$\begin{array}{r} A \\ 000111 \\ 110001 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift

$$\begin{array}{r} A \\ 110010 \\ 111000 \\ 1 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

$A = A + M$

$$\begin{array}{r} A \\ 110010 \\ 011100 \\ 0 \end{array} \quad \begin{array}{r} Q \\ 1 \\ 0 \end{array}$$

shift



DATA PATH



Control Path



## 11 The datapath

```
module BOOTH (ldt, ldQ, ldM, clrA, clk, clrf,  
sftA, sftQ, addsub, decr, ldcnt, datain,  
clk, qm1, eq2);  
input ldt, ldQ, ldM, clrA, clk, clrf, sft,  
sftB, addsub, clk;  
input [15:0] datain;  
output qm1, eq2;  
wire [15:0] A, M, Q, Z;  
wire [4:0] count;
```

assign eq2 = ~count;

```
shiftreg A (A, Z, A[15], clk, ldt, clrA, sftA);  
shiftreg Q (Q, datain, A[0], clk, ldQ, clrQ, sftQ);  
dff QM1 (Q[0], qm1, clk, clrf);  
PIPO MR (datain, M, clk, ldM);  
ALU AS (Z, A, M, addsub);  
counter CN (count, decr, ldcnt, clk);  
endmodule
```

```
module shiftreg (data_out, data_in, s_in, clk,  
ld, clr, sft);  
input s_in, clk, ld, clr, sft;  
input [15:0] data_in;  
output reg [15:0] data_out;  
always @ (posedge clk)  
begin  
if (clr) data_out <= 0;  
else if (ld)  
data_out <= data_in;
```

```
else if (sft)
    data-out <= {s-in, data-out[15:1]};  
end  
endmodule
```

```
module PIPD (data-out, data-in, clk, load);
    input [15:0] data-in;
    input load, clk;
    output reg [15:0] data-out;
    always @ (posedge clk)
        if (load) data-out <= data-in;
endmodule
```

```
module DFF (d, q, clk, dr);
    input d, clk, dr;
    output reg q;
    always @ (posedge clk)
        if (dr) q = 0;
        else q <= d;
endmodule
```

```
module ALU (out, in1, in2, addsub);
    input [15:0] in1, in2;
    input addsub;
    output reg [15:0] out;
    always @(*)
        begin
            if (addsub == 0) out = in1 - in2;
            else out = in1 + in2;
        end
endmodule
```

```

module counter (data_out, decr, ldcnt, clk)
    input decr, clk;
    output [4:0] data_out;
    always @ (posedge clk)
        begin
            if (ldcnt) data_out <= 5'b10000;
            else if (decr) data_out <= data_out - 1;
        end
endmodule

```

```

module controller (ldA, clrA, stfA, ldG, clrg,
                   stfB, ldM, clrf, addsub, start, decr,
                   ldcnt, done, clk, q0, qm1);
    input clk, q0, qm1, start;
    output reg ldA, clrA, stfA, ldG, clrg, stfB,
           ldM, clrf, addsub, decr, ldcnt, done;

```

$\text{reg } [2:0] \text{ state};$   
 parameter  $s_0 = 3'b000, s_1 = 3'b001, s_2 = 3'b010,$   
 $s_3 = 3'b011, s_4 = 3'b100, s_5 = 3'b101, s_6 = 3'b110;$

$\text{always } @ (\text{posedge clk})$   
 $\text{case (state)}$   
 $s_0 : \text{if (start) state} \leftarrow s_1;$   
 $s_1 : \text{state} \leftarrow s_2;$   
 $s_2 : \#2 \text{ if } (\{q0, qm1\} == 2'b01) \text{ state} \leftarrow$   
 $\cdot \text{ else if } (\{q0, qm1\} == 2'b10) \text{ state} \leftarrow s_4;$   
 $\cdot \text{ else state} \leftarrow s_5;$   
 $s_3 : \text{state} \leftarrow s_5;$   
 $s_4 : \text{state} \leftarrow s_5;$   
 $s_5 : \#2 \text{ if } ((\{q0, qm1\} == 2'b01) \& !q2) \text{ state} \leftarrow s_1;$

else if  $\{(f_{q0}, q=1) = 2'b01\} \text{ if } !eq_2\}$  state  $\leftarrow s_4;$   
else if  $(eq_2) \text{ state } \leftarrow s_6;$   
 $s_6, \text{ state } \leftarrow s_6;$   
default: state  $\leftarrow s_0;$   
endcase  
~~end~~

always @ (state)

begin  
case (state)

$s_0 : \text{begin } drA = 0; ldA = 0; sfTA = 0; drB = 0;$   
 $ldQ = 0; sfTQ = 0; ldM = 0; drff = 0; done = 0;$   
end

$s_1 : \text{begin } drA = 1; drff = 1; ldcnt = 1; ldM = 1; end;$

$s_2 : \text{begin } drA = 0; drff = 0; ldcnt = 0; ldM = 0;$   
 $ldQ = 0; end$

$s_3 : \text{begin } ldA = 1; addsub = 1; ldQ = 0; sfTA = 0;$   
 $sfTQ = 0; decr = 0; end$

$s_4 : \text{begin } ldA = 1; addsub = 0; lgB = 0; sfTA = 0;$   
 $sfTQ = 0; decr = 0; end$

$s_5 : \text{begin } sfTA = 1; sfTQ = 1; lddt = 0; ldQ = 0;$   
 $decr = 1; end$

$s_6 : done = 1;$

default: begin drA = 0; sfTA = 0; ldA = 0;  
sfTQ = 0; end

endcase

end

endmodule

## Modeling Memory

How to Model Memory?

- Memory is typically included by instantiating a pre-designed module from a design library.
- Alternatively, we can model memories using two-dimensional arrays.

module memory\_model (...)

```
reg [7:0] mem[0:1023];
endmodule
```

→ each memory word  
is of type [7:0],  
i.e. 8 bits.

The memory words can be

accessed as  
mem[0], mem[1], ...  
mem[1023].

How to initialize Memory?

- By reading memory data patterns from a specified disk file.

- Used for simulation

- Used for test benches

- Two verilog functions can be used:

\$readmemb (filename, memname, startaddr, stopaddr)  
(Data is read in binary format)

\$readmemh (filename, memname, startaddr, stopaddr)  
(Data is read in hexadecimal format)

- if "startaddr" and "stopaddr" are omitted, the entire memory is read.

```
module memory_model (...);  
reg [7:0] mem[0:1023];  
initial
```

begin

\$readmemh ("mem.dat", mem);

end

endmodule

~~ex 3~~  
 module memory-model (. . . );  
 input [7:0] mem [0:1023];  
 initial  
 begin  
 read memb ("mem.dat", mem, 200, 50);  
 end  
 endmodule

~~ex 3~~  
 Single-port RAM with synchronous read/write  
 module ram\_1 (addr, data, clk, rd, wr, cs);  
 input [7:0] addr;  
 output [7:0] data;  
 reg [7:0] mem [1023:0];  
 reg [7:0] d\_out;  
 assign data = (cs & rd) ? d\_out : 8'bz;  
 always @ (posedge clk)  
 if (cs & wr & !rd) mem[addr] = data;  
 always @ (posedge clk)  
 if (cs & rd & !wr) d\_out = mem[addr];  
 endmodule



E<sup>x</sup> 4 || Single-port RAM with asynchronous read/write

```

module ram_2 (addr, data, rd, wr, cs);
    input [9:0] addr; input rd, wr, cs;
    inout [7:0] data;
    reg [7:0] mem [1023:0];
    reg [7:0] d_out;

    assign data = (cs & l_rd) ? d_out : 8'bz;

    always @ (addr or data or rd or wr or cs)
        if (cs && wr && !rd) mem[addr] = data;

    always @ (addr or rd or wr or cs)
        if (cs && rd && !wr) d_out = mem[addr];

endmodule

```

E<sup>x</sup> 5 A ROM / EEPROM (Erasable programmable ROM)

```

module rom (addr, data, rd_en, cs);
    input [7:0] addr; input rd_en, cs;
    output reg [7:0] data;

    always @ (addr or rd_en or cs)
        case (addr)
            0 : data = 22;
            1 : data = 45;
            7 : data = 19;

endmodule

```

- Some simulation or synthesis tool gives inconsistent behaviour when using the "inout" data type.  
Such "inout" bidirectional data should be avoided.
- A better way to design a memory unit is to keep the data input and data output bus symbol lines separate.



```

module ram_3(data-out, data-in, addr, wr, cs);
parameter addr-size = 10, word-size = 8;
memory-size = 1024;
input [addr-size-1:0] addr;
input [word-size-1:0] data-in;
input wr, cs;
output [word-size-1:0] data-out;
reg [word-size-1:0] mem[memory-size-1:0];
assign data-out = mem[addr];
always @ (wr or cs)
  if (wr) mem[addr] = data-in;
endmodule

```

- For bidirectional data bus, tristate buffers can be included explicitly.



```

tri [7:0] Bus;
wire [7:0] dataout, data-in;
assign Bus = read ? data_out : 8'h22;
assign data-in = write ? Bus : 8'h22;

```

|| testbench for ram-3

```

module RAM-test
    reg [7:0] address;
    wire [7:0] data-out;
    reg [7:0] data-in;
    reg write, select;
    integer k, myseed;

```

ram-3 RAM (data-out, data-in, address, write, select);

```

initial begin
    for (k=0; k <= 1023; k=k+1)
        begin
            data-in = (k+k) % 256; read = 0; write = 1; select = 1;
            #9 write = 0; select = 0;
        end
end

```

repeat (20)

begin

#2 address = random(myseed) % 1094;  
writ = 0; select = 1;  
& display ("Address : %5d", Data = %8d,  
address, data);

end

end

initial myseed = 3^r;

end module

## Modelling register banks

- A register bank or register file is a group of registers, any of which can be randomly accessed.
  - Commonly used in computers to store the user-accessible registers.
- For ex., in MIPS 32 processor, there are 32 32-bit registers, referred to as R0, R1, ..., R31.
- Can be implemented as independent or an array.
- Register banks often allow concurrent access e.g. MIPS 32 allows 2 register reads & 1 register write every clock cycle.
- It is assumed that the same register is not read & written simultaneously.



// 32 x 32 register file

```
module regbank_v3 (rdData1, rdData2, wrData,  
                    sr1, sr2, dr, write, clk);  
    input clk, write;  
    input [7:0] sr1, sr2; dr; // source & destination  
    input [31:0] wrData;  
    output [31:0] rdData1, rdData2;  
  
    reg [31:0] regfile [0:31];  
    assign rdData1 = regfile[sr1];  
    assign rdData2 = regfile[sr2];  
  
    always @ (posedge clk)  
        if (write) regfile[dr] <= wrData;  
endmodule
```

// 32 x 32 register file with reset facility

```
module regbank_v4 (rdData1, rdData2, wrData, rd,  
                    sr1, sr2, dr, write, reset, clk);  
    input clk, write, reset;  
    input [7:0] sr1, sr2, dr;  
    input [31:0] wrData;  
    output [31:0] rdData1, rdData2;  
    reg [31:0] regfile [0:31];  
  
    assign rdData1 = regfile[sr1];  
    assign rdData2 = regfile[sr2];
```

Date \_\_\_\_\_  
Page \_\_\_\_\_

```

always @ (posedge clk)
begin
    if (rst) begin
        for (k=0; k< 32; k=k+1) begin
            regfile[k] <= 0;
    end
    end
    else if begin
        if (wrk)
            regfile [adr] <= wrData;
    end
end
endmodule

```

```

// testbench
module regfile-test;
reg [7:0] sr1/sr9, dr;
reg [31:0] wrData;
reg wrk, rset, clk;
wire [31:0] rdData1, rdData2;
integer k;

regbank-v4 REG (rdData1, rdData2, wrData,
                 sr1, sr2, dr, wrk, reset, clk);
initial clk >0;
always #5 clk = !clk;
initial
begin
    $dumph ($"regfile.vcd");
    #1 rset=1; wrk=0;
    #5 rset=0;
end

```

```

init_if
begin
#17
for (k=0; k< 32 ; k=k+1)
begin
    dr = k; wrData = 10*k; write=1;
    #10 write=0;
end
#20
for (k=0 ; k < 32 ; k=k+2)
begin
    sr1 = k; sr2 = k+1;
    #5;
    fdisplay ("reg[%2d]=%d, reg[%2d]=%d");
    end(sr1, rdData1, sr2, rdData2);
    #200 $finish;
end
endmodule

```

## Pipelining

- A mechanism for overlapped execution of several inputs sets by partitioning some computation into a set of  $k$ -sub computations (or stages)
- Very significant speed up (generally  $k$ )
- Suppose we want to attain  $k$  times speed up for some computation.
  - Alternative 1: Replicate the hardware  $k$  times → cost also goes up  $k$  times.

- Alternative 2: split the computation into k stages  
→ very minimal cost increase

- Need for buffering

- in hardware pipeline, we need a latch between successive stages to hold the intermediate results temporarily.

Model of a Synchronous k-stage Pipeline



- The latches are made with master-slave flip flops and serve the purpose of isolating input from outputs.
- The pipeline stages are typically combinational circuits.
- When clock is applied, all latches transfer data to the next stage simultaneously.

Structure of the Pipeline

- Linear Pipeline: the stages that constitute the pipeline are executed one by one in sequence



- Non-linear Pipeline: The stages may not execute in a linear sequence



A possible sequence: A, B, A, B, C, A, G

## Reservation Table

- The Reservation Table is a data structure that represents the utilization pattern of successive stages in a synchronous pipeline.
- Basically a space-time diagram of the pipeline that shows precedence relationship among pipeline stages
  - X-axis shows the time steps
  - Y-axis shows stages

- Number of columns give evaluation time



- The reservation table for a 4-stage linear pipeline is shown

|                | S <sub>1</sub> | S <sub>2</sub> | S <sub>3</sub> | S <sub>4</sub> |
|----------------|----------------|----------------|----------------|----------------|
| S <sub>1</sub> | X              |                |                |                |
| S <sub>2</sub> |                | X              |                |                |
| S <sub>3</sub> |                |                | X              |                |
| S <sub>4</sub> |                |                |                | X              |

- Reservation table for a 3-stage dynamic multi-function pipeline is shown



- Contains feedforward and feedback connections.
- Two functions X and Y

|                | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| S <sub>1</sub> | X |   |   |   |   | X |   | X |
| S <sub>2</sub> |   | X |   | X |   |   |   |   |
| S <sub>3</sub> |   |   | X |   | X |   | X |   |

|                | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| S <sub>1</sub> | Y |   |   |   |   | Y |   |   |
| S <sub>2</sub> |   |   | X |   |   |   |   |   |
| S <sub>3</sub> |   | Y |   | Y |   |   | Y |   |

- Some characteristics:

- Multiple X's in a row: repeated use of the same stage in different cycles.
- Long runs X's in a row: extended use of a stage over more than one cycle.
- Multiple X's in a column: multiple stages are used in parallel during a clock cycle.

## Speedup and Efficiency

Some notations:

$\tau$  :: clock period of the pipeline

$t_i$  :: time delay of the circuitry in stages;

$d_L$  :: delay of a latch

Maximum stage delay  $t_m = \max\{t_i\}$

$$\tau = t_m + d_L$$

Pipeline frequency  $f = 1/\tau$

- Total time to process  $N$  data sets is given by

$$T_k = ((k-1) + N) \tau$$

$$\text{Speedup } S_K = \frac{N k \tau}{K \tau + (N-1) \tau} = \frac{N k}{k + N - 1} \quad \text{as } N \rightarrow \infty, \\ S_K \rightarrow k$$

clock skew / jitter / setup time

- The minimum clock period of the pipeline must satisfy the inequality:

$$\tau \geq t_{\text{skew+jitter}} + t_{\text{logic+setup}}$$

### Definitions :

- skew : Maximum delay difference b/w the arrival of clock signals at the stage latches.
- Jitter : Maximum delay difference b/w the arrival of clock signal at the same latch.
- Logic delay : Maximum delay of the slowest stage in the pipeline.
- Setup time : Minimum time a signal needs to be stable at the input of a latch before it can be capture.

~~Ex~~ A simple example

- We consider example of a very simple 3-stage pipeline.
  - Four  $N$ -bit unsigned integers  $A, B, C$  and  $D$  as inputs.
  - An  $N$ -bit unsigned integer  $F$  as output.
  - The following computations are carried out in stages :
- $s_1 : x_1 = A + B ; \quad x_2 = C - D ;$
  - $s_2 : x_3 = x_1 + x_2 ;$
  - $s_3 : F = x_3 * D ;$



All the intermediate variables are allocated storage in the inter-stage latches itself.

S1

S2

S3



L34

One stage per "always" block

```
module pipe_ex (f, A, B, C, D, ckb);
parameter N = 10;
input [N-1:0] A, B, C, D;
input ckb;
output [N-1:0] f;
reg [N-1:0] L12_x1, L12_x2, L12_D, L23_x3, L23_D,
L3Y_F;
assign f = L3Y_F;
```

always @ (posedge ckb) // stage 1

begin  
 $L_{12\_x1} \leftarrow \#4 A + b;$

$L_{12\_x2} \leftarrow \#4 C - l;$

$L_{12\_D} \leftarrow D;$

end

always @ (posedge ckb) // stage 2

begin

$L_{23\_x3} \leftarrow \#4 L_{12\_x1} + L_{12\_x2};$

$L_{23\_D} \leftarrow L_{12\_D};$

end

always @ (posedge clk) // step 3  
L3Y-F <= #6 193-x3 \* 193-D;  
endmodule

## // Testbench

module pipe\_1-test;

parameter N = 10;

wire [N-1:0] F;

reg [N-1:0] A, B, C, D;

reg clk;

pipe-ex MYPipe (F, A, B, C, D, clk);

initial clk = 0

always #10 clk = ~clk;

initial

begin

#5 A=10; B=9; C=6; D=3; // F=75 (9Bh)

#20 A=10; B=10; C=5; D=3; // F=66 (42h)

#90 A=90; B=11; C=1; D=4; // F=119 (70h)

#20 A=15; B=10; C=8; D=2; // F=69 (3Eh)

#20 A=8; B=15; C=5; D=0; // F=0 (00h)

#20 A=10; B=90; C=5; D=3; // F=66 (42h)

#90 A=10; B=90; C=30; D=1; // F=49 (31h)

#90 A=30; B=1; C=9; D=4; // F=116 (74h)

end

initial

begin

\$dumpfile ("pipe\_1.vcd");

\$dumpvars (0, pipe\_1-test);

\$monitor ("Time : %d, F = %d", \$time, F);

# 300 \$finish

endmodule

- A warning
- In this example, we have used a single phase clock to load all the inter-stage registers in the pipeline.
  - In a real pipeline, this may lead to race condition.
  - Possible solution:
    - Use master-slave flip-flops in the registers.
    - Use a two-phase clock to clock the alternate stages.

### A more complex Example

- Consider a pipeline that carries out the following stage-wise operations:
- Inputs: Three register addresses ( $rs_1$ ,  $rs_2$ , and  $rd$ ) for an ALU function ( $f_{func}$ ), and a memory address ( $addr$ ).
- Stage 1: Read two 16-bit numbers from the registers specified by " $rs_1$ " and " $rs_2$ ", and store them in A and B.
- Stage 2: Perform an ALU operation on A and B specified by " $f_{func}$ ", and store it in Z.
- Stage 3: Write the value of Z in the register specified by " $rd$ ".
- Stage 4: Also write the value of Z in memory location " $addr$ ".

The Assumptions :-

- There is a register bank containing 16 16-bit registers.
- 4 bits are required to specify a register address.
- 1 register reads and 1 register write can be

performed every clock cycle.

- Register addresses are "rs<sub>1</sub>", "rs<sub>2</sub>" and "rd".

- Assume that the memory is organized as  $256 \times 16$ .

- 8 bits are required to specify memory address.

- Every memory location contains 16 bits of data, which can be read in a single clock cycle.

- Memory address specified by "addr".

- The ALU function is selected by a 4-bit field "funct" as follows:

0000 : ADD

0011 : SELA

0110 : OR

1001 : NEGAB

0001 : SUB

0100 : SELB

0111 : XOR

1010 : SRA

0010 : MUL

0101 : AND

1000 : NEGA

1011 : SLA

output is A

shift right

shift left



Clocking issue in Pipeline

- It is important that the consecutive stages be supplied suitable clocks for correct operation.

- Two options:

  - Use master/slave flip-flops in the latches to avoid race condition.
  - Use non-overlapping two phase clock for the consecutive pipeline stages.



```
module pipe_ex2 (zout, rs1, rs2, rd, func, addr, clk1,
```

```
clk2);
```

```
input [3:0] rs1, rs2, rd, func;
```

```
input [7:0] addr;
```

```
input clk1, clk2; // Two phase clock
```

```
output [15:0] zout;
```

```
reg [15:0] L12_A, L12_B, L23_2, L34_2;
```

```
reg [3:0] L12_rd, L12_func, L23_rd;
```

```
reg [7:0] L12_addr, L23_addr, L34_addr;
```

```
reg [15:0] regbank [0:15]; // Register Bank
```

```
reg [15:0] mem [0:255]; // 256x16 memory
```

assign  $Z_{out} = L3Y - Z;$

always @ (posedge clk1) // stage 1

begin

~~$regbank[193\_rd] <= \#2 L93\_Z;$~~

~~$L3Y\_Z <= \#2 L93\_Z;$~~

end

begin

$L12\_A <= \#2 regbank[rs1];$

$L12\_B <= \#2 regbank[rs2];$

$L12\_rd <= \#2 rd;$

$L12\_func <= \#2 func;$

$L12\_addr <= \#2 addr;$

end

always @ (posedge clk2) // stage 2

begin

case (func)

0 :  $L93\_Z <= \#2 L12\_A + L12\_B;$

1 :  $L93\_Z <= \#2 L12\_A - L12\_B;$

2 :  $L93\_Z <= \#2 L12\_A * L12\_B;$

3 :  $L93\_Z <= \#2 L12\_A;$

4 :  $L93\_Z <= \#2 L12\_B;$

5 :  $L93\_Z <= \#2 L12\_A \& L12\_B;$

6 :  $L93\_Z <= \#2 L12\_A | L12\_B;$

7 :  $L93\_Z <= \#2 L12\_A ^ L12\_B;$

8 :  $L93\_Z <= \#2 - L12\_A;$

9 :  $L93\_Z <= \#2 - L12\_B;$

10 ;  $L93\_Z <= \#2 L12\_A >> 1;$

11 ;  $f93\_Z <= \#2 L12\_A << 1;$

default :  $L93\_Z <= \#2 16'hxxxx;$

end case

```
L23_rd <= #2 L12_rd;
L23_addr <= #2 L12_addr;
end
```

```
always @ (posedge clk1) // stage 3
begin
```

```
reg bank[L23_rd] <= #1 L23_2;
```

```
L34_2 <= #2 L23_2;
```

```
L34_addr <= #2 L23_addr;
```

```
always @ (negedge clk2) // stage 4
```

```
begin
```

```
mem[L34_addr] <= #2 L34_2;
```

```
end
```

```
end module
```

// testbench

```
module pipe2_test;
```

```
wire [15:0] z;
```

```
reg [3:0] s1, s2, rd, func;
```

```
reg [7:0] addr;
```

```
reg clk1, clk2;
```

```
integer k;
```

```
pipe#(2) MYPipe(z, s1, s2, rd, func, addr,
clk1, clk2);
```

initial

```
begin
```

```
clk1 = 0; clk2 = 0;
```

```
repeat (20)
```

```
begin
```

```
#5 clk1 = 1; #5 clk1 = 0;
```

Date \_\_\_\_\_  
Page \_\_\_\_\_

```

#5 clk2 = 1; #5 clk2 = 0;
end
end
initial
for (k=0; k<16; k=k+1)
    MYPIPE.regbank[k] = k; //initialise registers
begin
#5 rs1 = 3; rs2 = 5; rd = 10; func = 0; addr = 125; //ADD
#20 rs1 = 3; rs2 = 8; rd = 11; func = 2; addr = 126; //MUL
#20 rs1 = 10; rs2 = 5; rd = 14; func = 1; addr = 128; //SUB
#20 rs1 = 7; rs2 = 3; rd = 13; func = 11; addr = 127; //SLA
#20 rs1 = 10; rs2 = 5; rd = 15; func = 2; addr = 129; //SUB
#20 rs1 = 11; rs2 = 13; rd = 16; func = 0; addr = 130; //ADD

```

```

#60 for (k=125; k<131; k+1)
    $display ("Mem [%d] = %d", k, MYPIPE.mem)
end

```

```

initial
begin
    $dumpfile ("pipe9.vcd");
    $dumpvars (0, pipe9-test);
    $monitor ("time : %d, F=%d", $time, Z);
    #300 $finish
end
endmodule

```

## Pipeline implementation of a processor

- We shall first look at instruction set architecture of a popular Reduced instruction set architecture (RISC) i.e. MIPS32.

### MIPS32

- MIPS registers:
  - a) 32, 32-bit general purpose registers (GPRs), R0 to R31.  
Register R0 contains a constant 0, cannot be written.
  - b) A special-purpose 32-bit program counter (PC).  
Points to the next instruction in memory to be fetched and executed.
- No flag registers (zero, carry, sign, etc).
- Very few addressing modes (register, immediate, register indexed, etc).
  - Only load and store instructions can access memory.
- We assume memory word size is 32-bits (word addressable).

### Instruction Subset being considered

- Load & Store instructions

LW R9, 124(R8) //  $R9 = \text{Mem}[R8 + 124]$

SW R5, -10(R25) //  $\text{Mem}[R25 - 10] = R5$

- Arithmetic & Logic instructions

ADD R1, R2, R3 //  $R1 = R2 = R3$

ADD R1, R2, R0 //  $R1 = R2 + 0$

SUB R12, R10, R8 //  $R12 = R10 - R8$

AND R20, R2, R5 //  $R20 = R2 \& R5$

OR R11, R5, R6 //  $R11 = R5 | R6$

MUL R5, R6, R7 //  $R5 = R6 * R7$

SLT R9, R11, R12 // If  $R11 < R12$ ,  $R9 = 1$ ; else  $R9 = 0$

$\text{ADD} \quad R1, R2, 25 \quad // R2 = R2 + 25$   
 $\text{SUB} \quad R5, R1, 150 \quad // R5 = R1 - 150$   
 $\text{SLT} \quad R2, R10, 10 \quad // \text{if } R10 < 10, R2 = 1; \text{ else } R2 = 0$

### Branch Instructions

$\text{BEQZ} \quad R1, \text{Label} \quad // \text{Branch to Loop if } R1 = 0$   
 $\text{BNEQZ} \quad R5, \text{Label} \quad // \text{Branch to Label if } R5 \neq 0$

### Jump Instruction

$\text{J} \quad \text{Loop} \quad // \text{Branch to Loop unconditionally}$

### Miscellaneous instruction

$\text{HLT} \quad // \text{Halt execution}$

### MP3S instruction encoding

#### Instruction are of 3 type - r, i, j

##### a) r-type instruction

- Two source and one destination

In addition, for shift instructions, the number of bits to shift can also be specified (we are not considering such instructions here)



$\text{ADD} : \frac{\text{opcode}}{000000}$

$\text{MUL} : 000101$

$\curvearrowleft \text{Not used here}$

$\text{SUB} : 000\ 001$

$\text{HLT} : 111111$

$\text{AND} : 000\ 010$

$\text{OR} : 000\ 011$

$\text{SLT} : 000\ 100$

### b) I-type instruction Encoding

• contains 16-bit immediate data field

• supports one source and one destination registers



LW : 001000

SW : 001001

ADDI : 001010

SUBI : 001011

SLTI : ~~0010~~1001100

~~SETE~~ SNEQZ : 001101

BEGZ : 001110

### c) J-type instruction

• contains a 96-bit jump address field

• extended to 98 bits by padding two 0's on right



J : 010000

### Instruction cycle

IF  $\rightarrow$  ID  $\rightarrow$  EX  $\rightarrow$  MEM  $\rightarrow$  WB

J

DR  $\leftarrow$  Mem[PC];

NP  $\leftarrow$  PC + 1;

(next PC)

ID

$$A \leftarrow \text{Reg}[rs];$$

$$B \leftarrow \text{Reg}[rt];$$

$$\text{Imm} \leftarrow (\text{Reg}_{rs})^6 \# \# I_{R_{rs} \dots 0} \quad // \text{sign extension}$$

$$\text{Imm}^1 \leftarrow (\text{Reg}_{rs})^6 \# \# I_{R_{rs} \dots 0} \quad // \text{sign extension}$$

~~Ex~~  $\rightarrow$  ALU operator.

Mem references

$$\text{ALUOut} \leftarrow A + \text{Imm};$$

Reg - Reg

$$\text{ALUOut} \leftarrow A \text{ func } B;$$

Reg - imm

$$\text{ALUOut} \leftarrow A \text{ fun Imm};$$

Branch

$$\text{ALUOut} \leftarrow \text{NPC} + \text{Imm}; \quad \text{or BEQZ } R_i, \text{Label}$$

$$\text{cond} \leftarrow (A \text{ op } 0); \quad [\text{op is } ==]$$

MEM

Load inst

$$\text{PC} \leftarrow \text{NPC};$$

$$\text{(Load inst)} \quad \text{LMD} \leftarrow \text{Mem[ALUOut]};$$

Store

$$\text{PC} \leftarrow \text{NPC};$$

$$\text{Mem[ALUOut]} \leftarrow B;$$

other inst

$$\text{PC} \leftarrow \text{NPC};$$

Branch

$$\text{if (cond) PC} \leftarrow \text{ALUOut};$$

$$\text{else PC} \leftarrow \text{NPC};$$

WB

Reg - Reg

$\text{Reg}[rd] \leftarrow ALUout$

Reg - imm

$\text{Reg}[rt] \leftarrow ALUout;$

Load instr

$\text{Reg}[rt] \leftarrow LMD;$

Data Path  
(non-pipelined)



Pipelined

- We should be able to start a new memory in every clock cycle
- Each stage should be completed within 1 clock period.

Convention used

~~IF~~

The latencies  $\rightarrow$  IF-ID, ID-EX, EX-MEM, MEM-WB



a) Micro operations for stage IF

$$IF-ID-IR \leftarrow Mem[PC]$$

$IF-ID-NPC,_{\alpha}PC \leftarrow \begin{cases} if ((EX-MEM-IR[Op\_of\_{PC}] == branch) \\ \& EX-MEM-cond) \quad \{ EX-MEM\_ALUOut \} \\ else \{ PC + 1 \} \end{cases}$



b)

ID

$$\begin{aligned} ID-B1-A &\leftarrow Reg[IF-ID-IR[\alpha]] ; \\ ID-Ex-B &\leftarrow Reg[IF-ID-IR[\beta]] ; \\ ID-Ex-NPC &\leftarrow IF-ID-NPC ; \end{aligned}$$

$$ID\_Ex\_IR \leftarrow IF\_ID\_IR;$$

$$ID\_Ex\_Imm \leftarrow \text{sign-extend}(IF\_ID\_IR_{15-0});$$



From MEM\_WB\_ALVOut  
or MEM\_WB\_LMD

From MEM\_WB\_2R[rd]

C) Ex R-> ALV

$Ex\_MEM\_IR \leftarrow ID\_Ex\_IR;$

$Ex\_MEM\_ALVOut \leftarrow ID\_Ex\_A \text{ func } ID\_Ex\_B;$

~~R-M ALV~~

$Ex\_MEM\_IR \leftarrow ID\_Ex\_IR;$

$Ex\_MEM\_ALVOut \leftarrow ID\_Ex\_A \text{ func } ID\_Ex\_Imm;$

~~R-M ADD Load/Store~~

$Ex\_MEM\_IR \leftarrow ID\_Ex\_IR;$

$Ex\_MEM\_ALVOut \leftarrow ID\_Ex\_A + ID\_Ex\_Imm;$

$Ex\_MEM\_B \leftarrow ID\_Ex\_B;$

Branch

$Ex\_Mem\_ALVout \leftarrow LD\_Ex\_NP (+ ID\_Ex\_Imm);$   
 $Ex\_Mem\_cond \leftarrow (1 - Ex\_A == 0);$



d) MEM

ALU

$MEM\_WB\_IR \leftarrow Ex\_MEM\_IR;$

$MEM\_WB\_ALVout \leftarrow Ex\_MEM\_ALVout;$

Load

$MEM\_WB\_SR \leftarrow Ex\_MEM\_IR;$

$MEM\_WB\_LMD \leftarrow Mem[Ex\_MEM\_ALVout];$

Store

$MEM\_WB\_IR \leftarrow Ex\_MEM\_IR;$

$Mem[Ex\_MEM\_ALVout] \leftarrow Ex\_MEM\_B;$

To IF stage

To IF stage



e) ~~W/D~~

~~AB ALV~~

$\text{Reg}[\text{MEM\_WB\_JK}[\text{rd}]) \leftarrow \text{MEM\_WB\_ALVat},$

R-M ALV

$Ry[MEMWB\_IR[\delta^+]] \leftarrow MEM\_WB\_ALUout;$

Load

$\text{Re}[\text{MEM\_WB\_IR}(rt)] \leftarrow \text{MEM\_WB\_LMD};$



## Branch implementation

|        |    |                  |     |    |     |                  |    |
|--------|----|------------------|-----|----|-----|------------------|----|
| i = 27 | 28 | Ex               | MEM | WB |     |                  |    |
| → i+1  |    | IF               | ID  | Ex | MEM | WB               |    |
| → i+2  | *  |                  | IF  | ID | Ex  | MEM              | WB |
| → i+3  | *  |                  | IF  | ID | Ex  | MEM              | WB |
| →      |    |                  | IF  | ID | Ex  |                  |    |
| 3      | 3  |                  |     |    |     |                  |    |
| HLT    |    | REG2 - REG1 loop |     |    |     | HALTED [1]       |    |
|        |    |                  |     |    |     | TAKEN-BRANCH [1] |    |

- For general 3-bit variables are used:
- HALTED : set after a HLT instruction executes and removes the WB stage.
- TAKEN-BRANCH : set after the decision to take a branch is known. Required to disable the instructions that have already entered the pipeline from making any state change.

~~module pipe\_MIPS32 (clk1, clk2);~~  
~~input clk1, clk2; // Two clock~~  
~~reg [31:0] PC, IF\_ID - IR, IF\_ID;~~

~~E<sup>8</sup>~~ Add 3 nos 10, 20, 30 stored in processor registers.

- initialize register R1 with 10
- " " R2 with 20 ] ] ]
- " " R3 with 30 ] ] ]
- Add 3 nos & store the sum in R4, R5.

|        |            |                                                         |
|--------|------------|---------------------------------------------------------|
| ADD \$ | R1, R0, 10 | $\rightarrow 001010\ 00000\ 00001\ 0000000000010001000$ |
| ADD \$ | R2, R0, 20 | $\rightarrow 001010\ 00000\ 00010\ 00000000000101000$   |
| ADD \$ | R3, R0, 25 | $\rightarrow 001010\ 00000\ 00011\ 0000000000011001$    |
| ADD    | R4, R1, R2 | $\rightarrow 000000\ 00001\ 00010\ 00100\ 00000\ 00000$ |
| ADD    | R5, R4, R3 | $\rightarrow 000000\ 00100\ 00011\ 00101\ 00000\ 00000$ |
| HLT    |            | $\rightarrow 11111\ 00000\ 00000\ 00000\ 00000\ 00000$  |

In case of hazards, we insert dummy instr.

~~B7^2~~  
Load a word stored in memory location 120, add 45h, and store the result in memory location 121.

ADD \$ R1, R0, R0 → 0010010 00000 00001 00000000001111000  
 LW R2, 0(R1) → 001000 00001 00010 000000000 000000000  
 ADD \$ R2, R2, R5 → 001010 00010 00010 00000000000101101  
 SW R2, 1(\$R1) → 001001 00010 00001 00000000000000001  
 HALT → 11111 00000 00000 00000000000000000

~~Ex 3~~ Compute the factorial of number  $N$  stored in memory location 200. The result will be stored in memory location 198.

## - The steps:

- initialize register R10 with the memory address 200.
  - Load contents of memory location 200 into register R3.
  - initialize register R9 with the value 1