

## Digital System Designs:-

ASM:

# FINALS



| CS | In | NS  | Output |
|----|----|-----|--------|
| 00 | 0  | 00  | 0      |
| 00 | 1  | 01  | 0      |
| 01 | 0  | 001 | 0      |
| 01 | 1  | 10  | 0      |
| 10 | 0  | 10  | 0      |
| 10 | 1  | 00  | 1      |

Bits of CS & NS:-

No. of states = 4<sup>3</sup>

$2^{\textcircled{2}} = 4 \rightarrow 2 \text{ bits}$

DSD

Microprogrammed SM:-



## Another Case..



dependent upon states only  
not dependent on input.

$CS$ ;  
 $CS + 1 = NS$ ;  
 $\Rightarrow CS$  will always start  
from zero.



counter based state  
machine.

controls  
execution

## Decode units:-

Interprets instruction.

## Custom Processors Design: (ISA)

~~CPU~~ Instruction set Architecture.

set of instructions:

- ① ALU instructions (+, -, ×)
- ② load, store, branch, jump, subroutine.

↓

function calling.

Basic Types of instructions:-

→ ① ALU based      ② Control Based.  
(if/else, loop, goto)

## ① ALU based processor:-

sequential

$$\bullet R_i = R_j + R_k$$

$$R_i = R_j - R_k$$

$$R_i = R_j \cdot R_k$$

$$R_i = R_j \& R_k$$

$$R_i = R_j$$

$R \rightarrow$  Register file. (set of  
 $i, j, k \in \{0, 1, 2, 3\}$ )

$R_i \in R_0, R_1, R_2, R_3$

$R_j \rightarrow$   
 $R_k$

{ These have  
four Register  
each

\* All registers are 8 bit wide.



{ Execute  
Decode  
Fetch



Control Signals: sel*i*, sel*j*, sel*k*, op-sel



- |   |                      |   |                |
|---|----------------------|---|----------------|
| ① | $R_i = R_j + R_k$    | 0 | $sel - op = 0$ |
| ② | $R_i = R_j - R_k$    | 1 |                |
| ③ | $R_i = R_j \mid R_k$ | 2 |                |
| ④ | $R_i = R_j \& R_k$   | 3 |                |
| ⑤ | $R_i = R_j$          | 4 |                |

\* opcode  $\rightarrow$  no. of instrs. (5)  $\rightarrow$  b

\* operands  $\rightarrow R_i$  2 bits  
 $\downarrow$   $\rightarrow R_j$  2 bits  
 $R_k$  (2) bits

|               |                         |                         |                         |
|---------------|-------------------------|-------------------------|-------------------------|
| <u>opcode</u> | <u><math>R_i</math></u> | <u><math>R_j</math></u> | <u><math>R_k</math></u> |
|---------------|-------------------------|-------------------------|-------------------------|

$\underbrace{\quad\quad\quad}_{3 \text{ bits}}$   $\underbrace{\quad\quad\quad}_{6 \text{ bits}}$   $\rightarrow$  address stored here.

Dec, 2017.

Lecture #21

Customer processor Design:-  
(Sequential manner)

- ALU based instr.
- Goto Label

$\Rightarrow$  ALU based

$\Rightarrow$  Goto Label

(unconditional jump)

$$R_i = R_j \text{ op } R_k$$

$$R_i = R_j \text{ op } R_m \text{ or } R_k$$

$$R_i = R_j \text{ op const}$$

$$R_i = \text{const.}$$

$$R_i = R_k$$

$$\Rightarrow R_i = R_j \text{ op const}$$

opcode |  $R_i$  |  $R_j$  | const

\* Control Unit



If (opcode == Goto)

sel = 1;

else

sel = 0;

opcode | address/label  
3 bits      8 bits

\* opcode of Goto will be there.



- Conditional Jump (If / else).





comes from decode.

|      |   |                       |
|------|---|-----------------------|
| +    | 0 | if ( $op < 5$ )       |
| -    | 1 | sel 2 = 0             |
| i    | 2 | elseif ( $op = 5$ )   |
| g    | 3 | sel 2 = 3             |
| max  | 4 | else if ( $op == 6$ ) |
| Goto | 5 | sel 2 = 1             |
| if 1 | 6 | else sel 2 = 2        |
| if 2 | 7 |                       |

x      x

① if ( $R_i == 0$ ) goto label

else normal

opcode | Ri | address/label

$R_i \rightarrow [==] \rightarrow 0$

② if ( $R_i > R_j$ ) goto label

else normal

opcode | Ri | Rj | address/label

$R_i \rightarrow [ > ] \rightarrow R_j$

③ if ( $R_0 == 0$ ) goto label

else normal

opcode | address



④

if ( $R_0 = 0$ ) goto  $R_0$  else goto  $R_1$



Subroutine Call: (includes return function).

→ same as goto

→ return as goto -



⇒ if nested loop, then double subroutines, use two registers and implement LIFO.

## Lecture #22

FOR time / count: (count, end address)

5?

C.P.D:-

1- ALU 2- Goto 3- Cond. 4- Subroutine 5- Loop.

opcode | operands | ↳ includes many other things

choose format  
that has max. length.

opcode | count | address | → for Loop

x → x

\*  $f_s \leq f_{sys}$   
 → rate of input  
 → rate of system



$$CPU = 11 \frac{1}{2} TU$$

$$f_{sys} = \frac{1}{11}$$



$$CPU = 8 TU$$

$$f_{sys} = \frac{1}{8}$$

$y_p \rightarrow reg$   
 $reg \rightarrow reg$   
 $reg \rightarrow out$

In first cycle only addition occurs.



In second register saves value in second reg.  
In third we get output.

After 3 cycles, system acts normally.

$\Rightarrow$  delay is generally latency.

$\Rightarrow$  The # of cycles required for I/P to get resp. O/P  
2 clk-cycles

$\Rightarrow$  The no. of registers = latency.  
or cuts.

Feedforward outset:-





latency = 1 CLK cycles.

$$CP = 6 \text{ TU}$$

$$C.R. = \frac{6 \text{ TU}}{1 \text{ CLK}} = f_{sys}$$



\* Registers correspond to speed.

\* more registers more cost.

\* max. resource breakage.

\* 2-stage pipeline  
(1 cut / register)

## ALU Based Processor (sequential)

$$R_i = R_j + R_k$$

$R \rightarrow$  register file

$$R_i = R_j - R_k$$

(set of registers)

$$R_i = R_j \cdot R_k$$

$i, j, k \in \{0, 1, 2, 3\}$

$$R_i = R_j \oplus R_k$$

$$R_i \rightarrow R_0, R_1, R_2, R_3$$

$$R_i = R_j$$

all registers are 8 bit wide

$$\begin{matrix} R_j \\ R_k \end{matrix} \rightarrow$$



Control Signals: sel-i, sel-j, sel-K, sel-op



- |   |                        |
|---|------------------------|
| 0 | $R_i = R_j + R_k$      |
| 1 | $R_i = R_j - R_k$      |
| 2 | $R_i = R_j \mid R_k$   |
| 3 | $R_i = R_j \wedge R_k$ |
| 4 | $R_i = R_j$            |



0-4 same

5 Goto label

(Sequential manner)

~~Register~~

$$R_i = R_j \text{ op } R_k \longrightarrow$$

op G [ +, -, \*, /, %, ... ]

$$R_i = R_j \text{ op } R_m \text{ op } R_k \longrightarrow$$

|        |                |                |                |
|--------|----------------|----------------|----------------|
| opcode | R <sub>i</sub> | R <sub>j</sub> | R <sub>k</sub> |
|--------|----------------|----------------|----------------|

$$R_i = R_j \text{ op const} \longrightarrow$$

|        |                |                |       |
|--------|----------------|----------------|-------|
| opcode | R <sub>i</sub> | R <sub>j</sub> | const |
|--------|----------------|----------------|-------|

$$R_i = \text{const}$$

$$R_i = R_k$$

## Custom Processor Design

- ALU Based instr (sequential manner) datapath
- Goto label (unconditional jump) control unit



if (opcode == goto)

sel = 1

else

sel = 0

## Conditional Jump (if else)

$\text{sel} = 0$  ALU

$\text{sel} = 1$  Goto

$\text{Sel} = \text{cond}$  conditional  
result



if ( $\text{opcode} < 5$ )

$\text{sel2} = 0$

else if ( $\text{opcode} == 5$ )

$\text{sel2} = 3$

else if ( $\text{opcode} == 6$ )

$\text{sel2} = 1$

else

$\text{sel2} = 2$ .

$\text{sel2} = 0$  ALU

$\text{sel2} = \cancel{0} 2$  Goto

$\text{sel2} = 1$  if / else

To incorporate multiple if else



if  $R_i == 0$  goto  
else normal



if  $R_i > R_j$  goto  
else normal



if  $R_o \rightarrow$

if  $R_o == 0$

$R_i \times$



if  $R_o == 0$  goto  $R_o$  address  $\times$



Subroutine Call. (function)  $\rightarrow$  essentially goto  
return  $\rightarrow$  essentially goto



if  $C_1 == 0$   
normal  
else if  $C_1 == 1$   
jump



Slowest Path  $\rightarrow$  Critical Path  
in our design



$$CP = 8 \text{ TU}$$

$$f_{sys} = \frac{1}{8} \text{ TU}$$



$$CP = 6$$

latency : 2 cycles

latency = # of clock cycles req for i/p to get respective o/p  
 ↗ no. of pipeline registers / cuts

feedforward cutset



$$a \rightarrow y$$

$$b \rightarrow y$$

$$c \rightarrow y$$

$$10 \text{ TU}$$

$$10 \text{ TU}$$

$$8 \text{ TU}$$

register breaks path



CP still 10 TU due to b  $\rightarrow$  y



$$CP 8$$

333



2 stage pipeline (1 cut/register)

always cut b/w resources



2 stage pipeline

$$CPD = 2 \text{ TU}$$

$$f_{\text{sys}} = \frac{1}{2 \text{ TU}}$$



# of pipelined registers

# of delays from i/p to o/p (for all paths) must be same  
(equal to no of cuts)

|                   |   |
|-------------------|---|
| $a \rightarrow d$ | 1 |
| $b \rightarrow d$ | 1 |
| $c \rightarrow d$ | 1 |



ignore algorithmic registers in count

(delay)

Node transfer theorem



no. of delays = 1

no. of delays = 1

can only move across the node  
if registers present on all  
inputs



i) place reg equal to  
# of cts on all inputs

ii) location of ct

iii) move registers

move all ip registers to  
ALL op



first move this

then this





$$f_s > f_{sys}$$



## Optimization of Basic building blocks

reduce critical path delay by optimizing operational units



carry ripple adder

### Carry Select Adder

| $\begin{matrix} 1 \\ 0 \end{matrix}$ | $\begin{matrix} 1 \\ 0 \end{matrix}$ | $\begin{matrix} 1 \\ 0 \end{matrix}$ |
|--------------------------------------|--------------------------------------|--------------------------------------|
| 01                                   | 11                                   | 10                                   |
| 01                                   | 01                                   | 00                                   |
| 10                                   | 00                                   | 10 S                                 |
| 0                                    | 1                                    | 0 C                                  |
| 11                                   | 01                                   | 11 S                                 |
| 0                                    | 1                                    | 0 C                                  |

$cin = 0$        $cin = 1$

• Group Size: 2

2 full adder delay

12 full adders.  
faster but more resources

|        |                                      |
|--------|--------------------------------------|
| 110010 | $\begin{matrix} S \\ C \end{matrix}$ |
| 0      |                                      |
| 110011 | $\begin{matrix} S \\ C \end{matrix}$ |
| 0      |                                      |

$cin = 0$        $cin = 1$

can also have non uniform group size. (1-2-3)

Conditional Sum Adder:-

(Group size = 1)

|                     |                     |       |  |  |
|---------------------|---------------------|-------|--|--|
|                     | 0 1 0 0 1 1 1 0     |       |  |  |
|                     | 1 0 1 1 1 0 1 1     |       |  |  |
| G <sub>1</sub> .S=1 | 1 1 1 1 0 1 0 1 S   | cin=0 |  |  |
| delay=1             | 0 0 0 0 1 0 1 0 C   |       |  |  |
|                     | 0 0 0 0 0 1 0 1 0 S | cin=1 |  |  |
|                     | 1 1 1 1 1 1 1 1 C   |       |  |  |
| G <sub>2</sub> .S=2 | 1 1 1 1 0 1 0 1 S   | cin=0 |  |  |
|                     | 0 0 0 1 1 1 C       |       |  |  |
|                     | 0 0 0 0 1 0 1 0 S   | cin=1 |  |  |
|                     | 1 1 1 1 1 1 C       |       |  |  |
| G <sub>3</sub> .S=4 | 1 1 1 1 1 0 0 1 S   | cin=0 |  |  |
|                     | 0 0 0 1 1 C         |       |  |  |
|                     | 0 0 0 0 1 0 1 0 S   | cin=1 |  |  |
|                     | 1 1 1 1 1 1 C       |       |  |  |
| G <sub>4</sub> .S=8 | 0 0 0 0 1 0 0 1 S   | cin=0 |  |  |
|                     | 1 C                 |       |  |  |
|                     | 0 0 0 0 1 0 1 0 S   | cin=1 |  |  |
|                     | 1 C                 |       |  |  |

→ muxes are so many  
 → 3 mux delay (★?)

| cin | in1 | in2 | S | C |
|-----|-----|-----|---|---|
| 0   | 0   | 0   | 0 | 0 |
| 0   | 0   | 1   | 1 | 0 |
| 0   | 1   | 0   | 1 | 1 |
| 0   | 1   | 1   | 0 | 1 |

| <i>cin=1</i> | <i>int1</i> | <i>int2</i> | <i>s</i> | <i>c</i> |
|--------------|-------------|-------------|----------|----------|
| 1            | 0           | 0           | 10       | 0        |
| 1            | 0           | 1           | 0        | 1        |
| 1            | 1           | 0           | 0        | 1        |
| 1            | 1           | 1           | 1        | 1        |

$$S_0 = \text{in1} \wedge \text{in2}$$

$$C_0 = \sin 1 \& \sin 2$$

$$S_1 = \sin 2 \approx 1 \sin 2$$

$$C_1 = \sin 1 / \sin 2$$



where in  
this one  
only  
one gate  
delay  
because  
four  
equations  
execute  
in parallel

Full adder  
uses two  
gate delay  
as two  
 $\times$  ORs  
are  
used.

Casey Select.



\* Trade off b/w area(Resources) & Speed.

### Carry-LookHead-Adder :-

- Relatively less complex
- more good result
- it can actually calculate me carry.
- truth table of full adder.

|                   | in1 | in2 | ci <sub>i</sub> | sum | ci <sub>i+1</sub> /out | whatever cin     |
|-------------------|-----|-----|-----------------|-----|------------------------|------------------|
| delete            | 0   | 0   | 0               | 0   | 0                      | cout always 0    |
|                   | 0   | 0   | 1               | 1   | 0                      |                  |
| propagate         | 0   | 1   | 0               | 1   | 0                      |                  |
| (P <sub>i</sub> ) | 0   | 1   | 1               | 0   | 1                      | dependent on cin |
|                   | 1   | 0   | 0               | 1   | 0                      |                  |
| generate          | 1   | 1   | 0               | 0   | 1                      | whatever         |
| (g <sub>i</sub> ) | 1   | 1   | 1               | 1   | 1                      | cin → cout = 1   |

signals to propagate/generate

$$P_i = \text{in1}_i \wedge \text{in2}_i \quad \left\{ \begin{array}{l} \text{either propagate/generate.} \\ \text{or} \end{array} \right.$$

$$g_i = \text{in1}_i \vee \text{in2}_i$$

$$\text{cout} = c_{i+1} = (P_i \vee c_i) | g_i$$

$$\text{for generate } \left\{ \begin{array}{l} (0 \vee c_i) | 1 \\ 0 | 1 = 1 \end{array} \right.$$

$$\text{for propagate } \left\{ \begin{array}{l} (1 \vee 0) | 0 \rightarrow 1 | 0 = 1 \end{array} \right.$$

$$C_{i+1} = P_i C_i + g_i$$

for  $i=0$

$$C_1 = P_0 C_0 + g_1 \rightarrow 1 \cdot 0 + 0 = 0 \quad \text{After delay?}$$

for  $i=1$

$$C_2 = P_1 C_1 + g_1$$

$$C_2 = P_1 C_1 + g_1$$

$$C_x = P_1 (P_0 C_0 + g_0) + g_1$$

→ likewise for others.

2 carries @ max.  
2 gate delay

CLA contd.

calculate carry (CLA)

$$\text{4-F-A} \quad \left\{ \begin{array}{c} C_3 \ C_2 \ C_1 \ C_0 \\ \text{---} \\ 0 \ 1 \ 0 \ 0 \\ | \ | \ 0 \ 1 \\ 1 \ 0 \ 1 \ 1 \end{array} \right\} \rightarrow \begin{array}{l} \text{calculated from process} \\ \text{to add them we use} \end{array}$$

$C_1$

four F-A

→ carries from CLA

→ one full adder delay



CLA

CLA

CLA

CLA

sum

$$C_1 = P_0 g_0 + C_0$$

$$C_2$$

$$C_3$$

$$C_4$$

$$C_5 = P_4 C_4 + g_4$$

for next  
block

$$C_4 = g_3 + P_3 g_2 + P_3 P_2 g_1 + P_3 P_2 P_1 g_0 + P_3 P_2 P_1 C_0$$

$$C_4 = P_0 C_0 + G_{7_0}$$

$$C_8 = G_{7_1} + P_1 C_4$$

$$C_{12} = G_{7_2} + P_2 C_8$$

$$C_{16} = G_{7_3} + P_3 G_{12}$$

→ intermediate  
carries  
are calculated  
by box.

→ because  
sum ones  
are complex

$$\rightarrow C_4 \quad C_8 \quad C_{12}$$

are @ start so we should do  
something about d-  
 $P_0 C_0 + G_{7_0} \leftarrow$

This thing is done @ last  
large block

$$C_{i+4} = G_{1_i} + P_{1_i} C_i$$

15<sup>th</sup> Jan 2018.

Lec # 207

CARRY RIPPLE  
Carry select

Conditional sum  
Look ahead

Carry Save Reduction  
Dual carry  
Wallace Tree.

Fast multipliers/  
multi operand Addition

Carry-Save adder:-

$a * b \rightarrow 4\text{ bits} \rightarrow \text{overall}$ .

$a + b + c + d \rightarrow 4\text{ bits} \rightarrow \text{partial product}$ .

10

5

(3 Inputs  $\rightarrow$  2 Outputs)

Delay = 1 Full adder.



0 1 0 . 0  
0 0 1 1  
0 0 1 0

0 1 0 1      sum  
0 1 0 .      carry

1FA used.

Comp. (3:2)  $\rightarrow$  started off with 3 and got 2  
if we started first with 2 numbers  
and merge result with the 3rd  
uses 8 full adders on the whole.

The final answer will be  
generated using any kind of full adder.  
Total five full adder delay.

Example #2  $a + b + c + d$ .

normally (12)  $\rightarrow$  through mis (6)

- 1) Carry Save Reduction
- 2) Dual "
- 3) Wallace Tree "

only works in

3 numbers



## CARRY SAVE REDUCTION



## Dual Carry Save Reduction:-

6 bits  $\rightarrow$  divide into 2  
Carry save  
Reduction



or

8bit



8 partial products.

Dual carry  
save Reduction



$\downarrow \downarrow \rightarrow$  2 layers obtained

$[+]$   $\rightarrow$  Simple add.

Transform the following using compression:  
using CRA



a  $Q_{1,3}$   
b  $Q_{1,3}$   
c  $Q_{2,2}$

4 layers due  
to four P.P



automatically uses any  
of the previous multipliers  
and it does not mention that  
uses carry save adder

-mentioning Carry Save adder:-



Sign Extension Elimination:-

$\begin{matrix} \text{SxS} \\ \text{SxUS} \end{matrix} \rightarrow \begin{matrix} \text{sign} \\ \text{Extension} \end{matrix}$

\* sign extension  $\rightarrow$  more adder  
 $\rightarrow$  more adders  $\rightarrow$  slow speed

\* sign extension is mandatory.

$\Rightarrow$  first wrong inter.  $\rightarrow$  correct answer add some correction and it will make answer correct  
 $\Rightarrow$  through this method no correct intermediate ans but final is correct on hardware.

ignore this

|                                                                                                                                                                  |                                                                                                             |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| $\begin{array}{cccccc} & & 1 & 0 & 1 & S \\ \hline 1 & 1 & 1 & 1 & 0 & S \\   &   &   & 0 & 0 & 0 \\   &   & 1 & 0 & 1 & X \\   & 0 & 1 & 1 & X & X \end{array}$ | $\left\{ \begin{array}{l} 2's \text{ comp} \\ 101 \rightarrow 010 \\ \downarrow \\ 011 \end{array} \right.$ |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|

$\Rightarrow$  Correction vector

- 1) sign extend by 1.  
 2) Flip msB and place 1 @ location.

Demo

000  
 $\downarrow$

111 000  
 $\downarrow$

111 100  
 $\downarrow$

111 1100  
 $\downarrow$

111 100  
 $\downarrow$

100 000  
 $\downarrow$

becomes  
same  
number  
but we  
started

- 1) where there was

2's complement

= 1's comp + 1  
for last P.P

$\Rightarrow$  1's complement  
and place 1 @ LSB.

|                                                                                                                                                              |                   |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|
| $\begin{array}{cccccc} & & 1 & 0 & 1 & \\ \hline 1 & 1 & 1 & 1 & 0 & \\   &   &   & 0 & 0 & 0 \\   &   & 0 & 0 & 1 & X \\   & 0 & 1 & 0 & X & X \end{array}$ | $\rightarrow (B)$ |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|

ones separated

msB  $\leftarrow$  0 1 1 1 0 0 msB flip L-1

P.P { 1 1 1 0 0 0  
      1 1 0 0 0 0  
      1 0 0 0 0 0

IS  $\leftarrow$  0 0 0 1 0 0

Comp

1 0 1 0 1 0 0 0

$\downarrow$

drop as answer is 6 bit

correction  
vector found.

\* for  $3 \times 3$ ;  $5 \times 5$ ; correction  
vector will always  
be this

1 0 1 → (B)  
 1 1 0  
 1 0 0  
 0 0 1 X  
 1 1 0 X X

Ans  

$$\begin{array}{r}
 1 1 1 1 0 \\
 + 1 0 1 0 0 \\
 \hline
 1 0 0 0 1 1 0
 \end{array}$$
 → not correct and without correction vector and we took B.  
 ↓  
 000110 → Answer.

### Example #2.

$$(a * b) + c + d$$

Find correction vector for this.

$a = \begin{smallmatrix} Q_{1,1} & Q_{1,2} \\ Q_{2,1} & Q_{2,2} \\ Q_{3,1} & Q_{3,2} \end{smallmatrix}$   
 $b = \begin{smallmatrix} Q_{1,1} & Q_{1,2} \\ Q_{2,1} & Q_{2,2} \\ Q_{3,1} & Q_{3,2} \end{smallmatrix}$   
 $c = \begin{smallmatrix} Q_{1,1} & Q_{1,2} \\ Q_{2,1} & Q_{2,2} \\ Q_{3,1} & Q_{3,2} \end{smallmatrix}$   
 $d = \begin{smallmatrix} Q_{1,1} & Q_{1,2} \\ Q_{2,1} & Q_{2,2} \\ Q_{3,1} & Q_{3,2} \end{smallmatrix}$   
 is this Wallace?  $\times 13$   
 or any other comp. technique  $Q_{2,6}$   
 is comp.

Ans.  

$$\begin{array}{c|ccccc}
 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
 \end{array}$$
 add them all



### Graphical Form.

$$\textcircled{1} \quad a * b - c * d$$

$$CV_1 - CV_2$$

\textcircled{2} if formats are same then  
C.V will be same.

$$\textcircled{3} \quad a * b + c * d$$

↓  
two separate C.V

$$\textcircled{4} \quad a * b - c * d$$

Formats are same  
may both will subtract  $\rightarrow C.V = 0$

|                 |         |
|-----------------|---------|
| [1 0 0 1 0 0 0] | 4x4 C.V |
|-----------------|---------|

40% final.

1 verilog question.

1 Design Question (state machine) (most probably)  
for our ease, forgetting processor, numbers are easy to gain

{ processor  
 design  
 numerical based.