

## Programmable System Design:-

### Flexible System:-

Implement using flexible fabric.

FPGA: field programmable gated Arrays  
while in use.



PLG: implement using LUT.  
lookuptable.

Eg: AND gate:-

$$A \wedge B = C$$

| A | B | C |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |



This setup is LUT.

Redesign circuit →

Map on FPGA



- 1) map logic on LUT
- 2) interconnect.

### Programmable fabric:-

- \* Generic hardware; which can be programmed as per requirement.
- S/W code



Program

translates state of the system from one state to another.  
output depends on inputs & state.

mealy machine

Ex. in this state transition,  
all required computations  
are done.

Eg: program is a sequence of instructions given in assembly language.

$$\begin{array}{l} a \leftarrow b + c \\ d \leftarrow d + e \end{array}$$

After running program, we get output as

S:  $\xrightarrow{\text{Prog.}}$  S':

$$\begin{array}{ll} a=0 & a=20 \\ b=10 & b=10 \\ c=20 & c=20 \\ d=30 & d=70 \\ e=40 & e=40 \end{array}$$

Tanda

\* we need to have interface!

instruction set  $\leftarrow$  language the generic hardware

ISA: instruction set

Architecture.

Instruction Set:- → Structure of computer, a machine language  
programmer must understand; to write a correct  
machine description, (timing independent) program for the machine.

a hardware designer must understand; to design a correct  
implementation of computer. [generic hardware]

Eg: Statement in C++:-

$$f = (g+h)-(i+j)$$

Assembly instructions:-

add to, g, h

add t1, i, j

sub f, to, t1

{ op-code/mnemonic, operand, source/destination }

\* So; what instructions should an instruction set have?  
can it have (a+b) - (c+d) as a single instruction with  
four operands? I mean; is this worth it?  
isn't this too specific.  
unnecessary!

church's thesis:

A very primitive computer can compute anything that a  
fancy computer can compute.

- you only need

- 1) logical functions
- 2) read and write to memory
- 3) data dependent decisions. {if-else conditions??}

General ISA selection is for practical reasons of cost & performance. Computability is not an issue at all.

like; ISA might have direct instructions for additions.

→ nvidia have raytracing cores

→ Apple m1 has ML cores

→ normal processors have floating point cores.

\* If our processor doesn't have "t" instruction; then the S/W  
needs to send the (a+b) instruction in terms of

usually 't' instruction is available.

one of basics.

$$\rightarrow b_1 \wedge b_2$$

$$\rightarrow \neg b_1$$

etc...

"pair in the add".

But possible.

\* Every instruction be like:-



"need Some format for EVERY instruction".

→ computing System:- (oh lar! I'm tired!)



\* Whenever we write on a high-level language; it needs to be compiled; and this compiled binary contains machine languages & depends on ISA we are using.

All 3 have very different languages.

x86 - intel, AMD  
ARM - mobile, tablets, M1  
POWER - IBM.  
arm technologies.

# Now the question is, how do we develop such a language which is "sufficient".

we need

Logical operations / memory communication / control flow change.  
(con arithmetic too!)

data dependent actions.

like if ( $i == j$ ) then  
this line;

else that line.

a primitive computer got to

have these. Mine has many more. Hehe.

\* First, we need to decide upon interface.

if machine was fixed;



but since machine is generic:



both are stored in memory.

\* Format of instructions:-

- 1) what operation
- 2) who are operands
- 3) where to place result.

] we'll need "temporary" space.

Registers!

State of system:-

- content of memory

- content of registers.



1024 bytes (1KB)

10 bits; enough to position the memory.

\*

| operation | OP1 | OP2 | dest |
|-----------|-----|-----|------|
|-----------|-----|-----|------|

memory  
Eg; 200

register  
Eg; RS

addressing modes.

- Base + offset
- constant
- exact loc.

## kind of instructions

1) Add/Sub/AND/... (logical)

2) LW Load. read from memory.

Eg: load  $R_1$  a  
 register  $(R_1)$       memory location...  
                           (500)

3) SW Store write to memory.

Eg: store  $R_1$  a

3) Jump < to location's.  
 (unconditional branch).  
 maybe we have function.

(or)  
conditional branch:

BZ  $R_1$   $R_2$  loc. BEQ:  
 branch on equality.

we are here in program.

move to here!

Like "go to".

if  $R_1 == R_2$ ; then jump to 'loc'.

Otherwise go to seq. next instructions.

\* we'll have one special register; to keep track of our location in list of instructions.



(Keep pointer to the  
 NEXT instruction).

\* Okay! we have instructions set now!

How to design the generic hardware!  
(a.k.a. processor)

what be the components?

- 1) Resources to perform the job } DATAPATH
- 2) mechanism to instruct } controller.

\* Inside processor:-



\* DATAPATH is a slave.

CONTROLLER is intelligent.

\* Lets dive more into instructions....

ADD/AND/SUB; for simplicity; say that all operands are to be in registers.

Eg:

say; I have 20 operations overall  
 (LW) → 1. use (LW) for memory to register.  
 5 bits 4 bits 4 bits 4 bits

| Operation | OP1 | OP2 | dest |
|-----------|-----|-----|------|
|-----------|-----|-----|------|

add say OP1, OP2 are registers,  $r_u = r_2 + r_3$ .

we have 16 registers (ARM has 16)

∴ 4 bits enough to address

### LW instruction (memory read)

| Operation | OP1 | Dest |
|-----------|-----|------|
|-----------|-----|------|

location in memory

Say Base + Offset kind of mode.

Stored in a register.

give direct value (not stored in memory)  
 4 bits. 8 bits.

### Jump location:-

| operation | Loc. |
|-----------|------|
|-----------|------|

maybe relative to current value.

BEQ

| Oper  | OP1   | OP2           | Loc. |
|-------|-------|---------------|------|
| $r_i$ | $r_j$ | 8bit/16bit... |      |

if  $r_i == r_j$ ; go to loc.

otherwise  
 next instruction.

• uniform format of instructions:-

| Operation                       | OP1 | OP2 | Dest. |
|---------------------------------|-----|-----|-------|
| mostly<br>and base reg.<br>reg. |     |     |       |

Arithmetic:

| Operation | OP1 | OP2 | Dest. | Unused |
|-----------|-----|-----|-------|--------|
| 5 bits    | 4   | 4   | 4     |        |

LW, SW:

| Operation | OP1                                             | OP2                     | Immediat<br>value. |
|-----------|-------------------------------------------------|-------------------------|--------------------|
| 2 bits    | (reg--)<br>base address<br>where to load<br>imm | where<br>to load<br>reg | or 11 bit          |

BEQ:

2 bits.

| Operation | OP1 | OP2 | value.     |
|-----------|-----|-----|------------|
| BEQ       | ori | ri  | or 11 bits |

Jump:

| OP. | loc. | 1111 |
|-----|------|------|
| S   | 1111 | 1111 |

\* we need in byte addressable format!

24 bits JV.

ALU: (or functions in DATAPATH)  
L 29; Pg 17-21

See how ALU are made & control is provided by MUX.

Add ✓ Sub ✓ Shift ✓ AND ✓ OR ✓ MUX

Storage: - Registers

Programmable regis fcs + S/W (S/W)

R<sub>0</sub> — R<sub>15</sub>. Acting Simultaneously. + temp. registers + PC. H/W

register file.

• each register  $\rightarrow$  multiple flip flops.

• write in parallel; read in parallel.



We'll write at most 1 register at per operation.

We need 2 independent MUX;

Since, we'll need; parallelly two values in operation.



\* we'll need some more temporary registers.

\* we'll need a program counter.

\*

(PC)

at least able to address each **MEMORY**

- 4GB memory; 32 bit PC.

Say; 24 bit PC.

Storage:- RF + PC + temp. registers

needed for

intermediate  
data.

Steering logic:-

point-to-point.



"state machine" (aka CONTROLLER) needs to generate these control signals based on situation."

X — X

\* one more time...

(33)

say Processor has 8 registers.

↑  
3 bits  
to address.



Eg Say total 10 operations defined  
4 bits to symbolize

1) Arithmetic & logic-

16bit:

| Operation | OPR1     | OPR2     | DEST     |  |
|-----------|----------|----------|----------|--|
| ADD       | $r_{11}$ | $r_{12}$ | $r_{13}$ |  |

$$r_{13} = r_{11} + r_{12}$$

memory (hence  
instructions)  
is byte  
addressable.

2) load and store instructions:

operation, reg, mem-location

as

supplied directly in instruction.

basevalue;

Stored in register

| Operation | Base reg. | Dest reg | Offset                           |
|-----------|-----------|----------|----------------------------------|
| 4         | 3         | 3        | $6 // 2^6 = 64$ bytes in memory. |

$r_{12}$

store the data in memory into  $r_{11}$ .

Same for SW.

### 3) Branches:-

\* unconditional: GOTO; function calls...



normally

$$PC = PC + 2.$$

but now,

$$PC = PC + (x \cdot 2)$$

*pointer in memory: indicating the next instruction*

↳ no. of instructions to jump.  
each is 2 bytes!

\* conditional: (data-dependent)

BEQ  $g_1 g_2$  rel.loc.

| 16bit | 4         | 3     | 3     | 6        | b0 |
|-------|-----------|-------|-------|----------|----|
|       | OPERATION | OP1   | OP2   | rel.loc. |    |
| b1S   | BEQ       | $g_1$ | $g_2$ | y        |    |

if  $g_1 = g_2$  then

their stored data!  $PC = PC + 24$

else

$$PC = PC + 2$$

\* This is a minimal set of instructions.

< L30 pg 7 > } just write everything together.



\* memory API :-



memory  
is  
Byte Addressable only

\* Now, let's see our controller.

FSM!!

lets see what all our controller has to do,

what are its states

State transitions

What would be our

Steering logic...

\* 16 bit adder;

each register  $\rightarrow$  16 flipflops....

→ Arithmetic, Logic

## 1) AL instruction:-

ADD \$1 \$2 \$3

1. Bring instruction from memory (PC). S<sub>1</sub>

2. Understand instruction. Easy; since we have a clearly defined ISA. S<sub>2</sub>



3. Read OPR1 & OPR2. S<sub>3</sub>

4. Compute the instruction (Addition!) S<sub>4</sub>

5. Write result in dest. register. S<sub>5</sub>

6. Update PC.  $PC = PC + 2$ . [this job can be done in first step only.]

"these are the 5 states I need to pass through; to complete the instruction!"

State machine:-



\* now, we need to define the state transition logic

& the outputs for every state.

\* S<sub>1</sub>:



merge  $S_2 \& S_3 \rightarrow S'_2$

nothing much to do.

$S'_2$ :

now; IR has instruction....



we'll give  $A_1, A_2$  to muxes;

we'll get  $D_1, D_2$ .

values of reg.  
(stored)  
temporary  
registers!

$D_1, D_2$ ; will initially be written into RF using LW instruct.  
- By SOFTWARE.

$S_4$ : (execute instruction)

control given from instructions.



$S_5$ : (update in ~~RF~~ RF) maybe

memory will be updated later by SW instruction.



$\therefore$  Register address =  $A_3$

Value written =  $D_3$

$\therefore$  we've shown

$S_1 \rightarrow S'_2 \rightarrow S_4 \rightarrow S_5$

\* we get an idea of complete datapath:-



Registers: IR, t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub>.

Controls:

- for t<sub>1</sub>W
- t<sub>2</sub>W
- t<sub>3</sub>W
- for ALU

- ∴ we have a state machine, which can perform AL instruction.
- Let's see other instructions.



This is a state diagram  
for the datapath.

Top 2 lines of J1 = f1 & f2

## 2) Load Instruction:-

| operation | Base | Dest | Offset |
|-----------|------|------|--------|
| IS        | 1211 | 98   | 65     |

### State operations:-

S6: 1. read instruction from memory & update PC.

S7: 2. understand and read the base address of memory.

S8: 3. compute the address of memory (Baseaddr + Offset)

S9: 4. Read memory

S10: 5. update the registers.

### S6:-

same as S1.

S7:- (read base mem. address)

|                                |                 |
|--------------------------------|-----------------|
| $IR_{11-9} \rightarrow RF-A_1$ | $\bullet T_1-W$ |
| $RF-D_1 \rightarrow T_1$       |                 |



S8:- (compute the mem.loc.)

|                                              |                   |
|----------------------------------------------|-------------------|
| $T_1 \rightarrow ALU-A$                      | $\bullet ALU-add$ |
| $IR_{5-0} \rightarrow SE6 \rightarrow ALU-B$ | $\bullet T_3-W$   |
| $ALU-C \rightarrow T_3$                      |                   |



### S9.5 (memory read)

"read will happen during cycle; write  
will happen at end of  
clock!"



( ) no need of  
temp. registr.

### S10: (update reg).



### so Load instruction

$S_6 \rightarrow S_7 \rightarrow S_8 \rightarrow S_9 \rightarrow S_{10}$ .

We'll need datapath connectivity as :-



control signals → output of state machine

### S1) Store instruction

S1.1) read instruction from memory; & update PC.

S1.2) Understand & read Operands

S1.3) Compute the addresses

S1.4) write to memory.



S1.5

same S1



S1.6: (read both)



S1.7: (compute mem\_loc).



#### S14: (write to memory)



∴ Store instruction is

$$S_{11} \rightarrow S_{12} \rightarrow S_{13} \rightarrow S_{14}$$

4) BEQ instruction:= Bea R<sub>1</sub>.R<sub>2</sub> off.loc.

S15 1) read instruction. Update PC by 2.

S16 2) read operands, compute address for taken branch.

S17 3) execute! (Compare operands & make decision).  
if equal; PC = PC + 2. offset + 2  
else PC = PC + 2

Offset is defined!

additional increase

to the regular increase of 2 bytes.

S15: Same as S1.

✓ hmmm...

reading into instruction register.

3+3 not working

S6i: (read operands; pre-calculate the taken value for PC)



$t_3 \rightarrow$  address for "if" case

$PC \rightarrow$  address for "else" case

S7i: (compare; & decide for PC)



*we'll need a zero detector.*



### S) Jump:-

$$\text{do } PC = PC + 2 + 2(\text{Offset})$$

| Operation | offset         |
|-----------|----------------|
| 15<br>④   | 12 " ⑪<br>12 0 |

S18

1) Fetch the instruction. Update PC by 2.

S19

2) Compute the address

S20

3) Transfer to the address.

S18: Same as Step-1

( $PC = PC + 2$  done here!)

S19:

already increased by 2 in S18.



S20: Transfer t3 to PC



S18 → S19 → S20.

\* Now I want to combine all states:-



\* STGr:



further:-



Special logic provides 4-bit address with 16 PTO

Instruction  $\rightarrow$  Address  $\rightarrow$  Memory

decoders! - using decoders; when we don't know where to go.

1. operation



(like more than 1 child).

AL → S<sub>4</sub>

L → S<sub>8</sub>

S → S<sub>8</sub>

why not MUX?

BEQ → S<sub>17</sub>

J → S<sub>20</sub>

2.



L → S<sub>9</sub>

S → S<sub>14</sub>.

∴ we made State machine.

→ control points

1. wr. signals to registers.

2. MR & MW.

3. whenever we have MUXES,

Eg:



{ 3 inputs from  
ALU; t<sub>3</sub>  
or else....}

can't just twist these three.

use a MUX.



} denotes if  
3 is correct.  
check!

Eg: for ALU-A; how many different input wires?

PC → ALU-A } 2:1 MUX ✓  
t<sub>1</sub> → ALU-A }



ALU-B:

$t_2 \rightarrow ALU-B$

$t_2 \rightarrow ALU-B$  4:1 multiplexer

$SEG \rightarrow ALU-B$

$SEL_2 \rightarrow ALU-B$



\* Like this point out all control points.

\* Now we can fully define state machine.

State machine:-

- ① next state logic
- ② output  $\rightarrow$  control points