

L-12 / 30.08.2023 /

Memory is to store data & to access

Mid-term - 2

Instruction based not memory based

L-13 / 04.09.2023 /

## (\*) Accumulation based CPU

Operation Code

ADD R,  $\Rightarrow$  Only one operand field other operand or source field is set by default to accumulator register.

if 'I' present in OP code

ADI  $\rightarrow$  immediate  $\rightarrow$  next field is (not) field, is a data.

|                     | memory      | register                      | data itself |
|---------------------|-------------|-------------------------------|-------------|
| Mode                | memory mode | register                      | immediate   |
| Processing Time     | slowest     | moderate                      | fastest     |
| memory access count | 1+1         | $1+0$<br>for instruction read | 1+0         |

## General Purpose CPU

| Operation | Addressing Mode |
|-----------|-----------------|
|-----------|-----------------|

intel  $\Rightarrow$  Destination

Motorola  $\Rightarrow$  Source

MOV M<sub>2</sub> Ax

Destination for intel processor

ADD R<sub>1</sub> R<sub>2</sub>

\* Address of instruction  $\Rightarrow$  Where instruction is stored

\* Address of data  $\Rightarrow$  where data is stored

\* instruction data  $\Rightarrow$  Binary formate of a command.

\* Data  $\Rightarrow$  Data is our regularly used number.

\* How does the processor set the address of data?

$\Rightarrow$  from instruction

\* How does the processor set the address in PC,  
~~program counter~~ or instruction?

$\Rightarrow$  Operating System will load address in PC  
program counter.

next address generated by a integrated circuit

Seacal-D added to with the PC.

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**  
Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

④ Sometimes updated by preceding instruction.  
during execution of a program.

④ How does the CPU work?

⇒ read from slide (10-18).

④ Basic command/instruction  $\Rightarrow$  micro operation.

④ User Program



④ Instruction Set Architecture

RISC vs CISC

all included, flexible.

④ ADD R, M, ADD microoperations most

1. Read R,

2. Read from M,

3. Activate ALU to select Adder Unit

4. Add Operation

5. Active R, to save result.

Micro Operation.

Seagate

- ④ Micro programs were stored in control memory inside the control unit.

- ④ CISC Processor 70-80ns.

instruction used 10-20ns.

Co. of control memory unused

- ⑤ then introduced the

RISC Processor

→ more faster than before

Only LOAD & STORE command  
can access memory.



Control Unit

for CISC Processor

→ CPU will need more memory

3 ALU to be activated for ADD

4. Save to Dest Register by 1

L-14/ 11.09.2023/

## CISC Processor:

software base  
 Micro program control Unit

### Design of control unit:

Instruction



Decodes Instruction

## Instruction:

Large number of Instruction

Simple as well as complex

Different types of Memory mode

### Some Examples

ADD R<sub>1</sub> R<sub>2</sub> R<sub>3</sub> → Register Mode

ADD R<sub>1</sub> 15 R<sub>3</sub> → Immediate Mode

ADD M R<sub>2</sub> R<sub>3</sub> → Memory Mode

ADD M<sub>1</sub> M<sub>2</sub> M<sub>3</sub>

ADD R<sub>1</sub> [R<sub>2</sub>] R<sub>3</sub> → Memory indirect addressing mode

ADD R<sub>1</sub> [R<sub>2</sub>+R<sub>3</sub>] R<sub>4</sub> → Register pointing to memory

When, it enclosed, it means memory.

## ⊗ Program (H.L.)

Inst - 1 → M.C → 0110110110 → C.V.  
 Inst - 2  
 Inst - 3  
 Inst - n



⊗ micro operation < micro instruction < micro program < M.C < L.L < H.L

at least 1 or More

## ⊗ Micro Operation:

ADD R<sub>1</sub> R<sub>2</sub> R<sub>3</sub>

1. CPU will read data from R<sub>1</sub> C<sub>1</sub> C<sub>7</sub>
2. CPU will read from R<sub>2</sub> C<sub>2</sub> C<sub>8</sub>
3. ALU to be activated for ADD C<sub>11</sub>
4. Save to temp Register R<sub>3</sub> C<sub>15</sub>
5. Temp R<sub>3</sub> → R<sub>3</sub>

⊗ Full demonstration

available in Picture

**Seacal-D** C<sub>6</sub> C<sub>9</sub>

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

⊗ Parallel micro operation can make it faster

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

- ④ CISC -
  - Number of Instruction } More
  - Addressing Mode }
  - ↳ May contain simple & complex instruction

Decode  $\Rightarrow$  micro programmed control unit.

Each Instruction  $\Rightarrow$

microprogram

micro instruction

Single Bus



32 different operation

↳ 32 control signals to control or select ALU Operation.

④ We need 32 control

④ for every register, we need two

signal for other operations. control signal. One for  $R \rightarrow \text{BUS}$

another for  $\text{BUS} \rightarrow R$

④ 64 control signals to transfer content of Register to/from BUS.

Then, total control signals 128.

| $C_1$ | $C_2$ | $C_3$ | ... | $C_{128}$ |
|-------|-------|-------|-----|-----------|
| 1     | 0     | 0     | ... | 0         |
| 0     | 0     | 1     | ... | 0         |

Then, the length of a micro instruction is 128.

- ④ In a micro instruction, how many 1 contains, these are micro operation.
- ⑤ For micro instruction we need 128 bits for this example.
- ⑥ ALU have 256 instruction. Each instruction contains 16 micro instruction. Length of micro instruction is 128. Size of control memory?

$$\Rightarrow 256 \times 16 \times 128 \text{ bit}$$

$$= 2^8 \times 2^4 \times 2^7 \text{ bit}$$

$$= 2^{19} \text{ bit}$$

$$= \frac{2^{19}}{2^3} \text{ byte}$$

$$= 2^{16} \text{ byte}$$

$$= 2^6 \text{ kB}$$

= 64 kB control memory required.



- ⑦ Only one operation can be done at a time. That means in a micro instruction, there will be only one 1.

**Seacal-D**

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

⊗



⊗ For memory optimize, we can use decoder,

$$32 = 2^5$$



⇒ But there will be a delay.

⊗ For Register, we can use MUX to select Register

~~$2^6 = 64$~~



⊗ MUX for Register to BUS

⊗ DMUX for BUS to Register.

$$32 = 2^5$$

Next class Quiz-2

Design of a Program

## CISC Processor (Easy to Use)

→ Large no of Instruction (Simple + Complex)

→ Different type of Addressing Modes

 256 of instruction contains different type of addressing mode for a single instruction.

## CISC Processor

Instruction : 20 - 301. | memory = 601.  
Used : 70 - 801. | used = 20.

## RISC Processor:

Simple instruction

Simple Register Mode instruction

For memory access only two instruction available

LOAD & STORE

⇒ Control Unit is hardware base  
No need control memory

⇒ PIPE Lining

⇒ CPI  $\approx 1$  (for CISC it is 4/5 or higher)

**Seacal-D**

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

$$CISC \Rightarrow \downarrow I \times \frac{1}{100} \text{ Avg CPZ} \times C_{\text{same}}$$

$$RISC \Rightarrow \uparrow I \times \frac{1}{200} \text{ Avg CPZ} \times C_{\text{same}}$$

$\Rightarrow$  Then impact of Avg CPZ is higher than instruction.

So RISC Processor is more efficient and speedy.

$\Rightarrow$  Instruction count of same program will be increase upto 50% in RISC.

L-16 / 18.09.2023/

① Single Cycle Implementation

② Multi cycle Implementation

③ Pipelines (Single cycle)



- ⊗ Implementation time  $\Rightarrow$  Varies with operation & addressing mode.

### CPU-A

|        |                    |                              |                    |                             |
|--------|--------------------|------------------------------|--------------------|-----------------------------|
| LOAD   | $30 \mu\text{sec}$ | $\xrightarrow{\text{delay}}$ | $60 \mu\text{sec}$ | Single Cycle Implementation |
| STORE  | $60 \mu\text{sec}$ | $\xrightarrow{\text{delay}}$ | $60 \mu\text{sec}$ |                             |
| BRANCH | $40 \mu\text{sec}$ | $\xrightarrow{\text{delay}}$ | $60 \mu\text{sec}$ |                             |

\*  $\Rightarrow$  maximum ~~time~~ will be the time of a single cycle, OR slowest instruction.

### ⊗ Multi Cycle implementation:

$\Rightarrow$  Processing of instruction split into task.



⊗ Simple instruction may required few subtask only.

⊗ Now we need to set the clock to the highest time.

$$\text{CLK} \Rightarrow 20 \mu\text{sec}$$

**Seacal-D**

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

For Register Mode : Fetch, decode, Register Read, ALU,  
Result  
= 5 CLK Cycle

For memory mode : 6 CLK Cycles  
For Branch instruction : Fetch, Decode, Register Read  
= 3 CLK cycle.

L-17/20.09.2023/

Instruction Set Architecture Implementation :

⊗ Micro Architecture

Hardware is used to decode the instruction.

⊗ Operation/ Instruction

⊗ Addressing Modes → Depends on architecture

⇒ How the operands are addressed in operands fields.

RISC → ADD R<sub>1</sub> R<sub>2</sub> R<sub>3</sub> → Register Mode } CISC  
ADD R<sub>1</sub> M<sub>1</sub> R<sub>3</sub> → Memory Mode }

## \* Micro Architecture

### i) Single Cycle Implementation

### ii) Multi Cycle Implementation

### iii) Pipelining Implementation

## \* Single Cycle Implementation

⇒ CLK Duration is set by the slowest Instruction time.

## \* Multiple Cycle Implementation

⇒ First we need to make a list of sub task.

To read machine code from RAM

⇒ Fetch → Decode → Register → ALU → Result  
Read      Operation      Store

## \* ADD R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub>

⇒ Fetch → Decode → Register → ALU → Store Result  
Read      Operation

= 5 Sub task = 5 CLK cycle

## \* LOAD R<sub>1</sub>, M

⇒ Not in the list of Sub task.

**Seacal-D**

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)



⇒ Now it is complete.

⊗ But some subtask may not required for all Instruction.

⊗ LOAD  $R_1, R_2(16)$

$R_2 + 16 = \text{memory address}$

⇒ Fetch → Decode → Register → ALU → Memory → Register

Read ( $R_2$ )       $R_2 + 16$       Access      STORE



⊗ STORE  $R_1, R_2(16)$

⇒ Fetch → Decode → Register → ALU → Memory

Read      Operation      Access

= 5 CLK Cycle

⊗ ADD  $R_1 M_1 R_3$

⇒ Not suitable for the sequence of subtask list.

⇒ Then what is the list of sub task and the sequence?

⇒ Fetch → Decode → Register → Memory → ALU → Result Store  
 Read Access Operation Register

\* ADD R<sub>1</sub>, [R<sub>2</sub> + R<sub>3</sub>], R<sub>4</sub>

⇒ Fetch → Decode → Register → ALU → Memory → ALU  
 Read operation Access operation

(memory address calculation)

Result Store

= 7 cycle

20μ

5μ

5μ

5μ

20μ

5μ

\* Fetch → Decode → Register → ALU → Memory → Result  
 Read operation Access Store

Then,

Single Cycle Implementation = 60 μ

} CLK Speed

Multiple Cycle Implementation = 20 μ

\* RISC (followed Harvard Architecture)

↳ Separate Memory for Instruction & Data

⇒ Use Pipelining System.

**Seacal-D**

Calcium Carbonate (From Coral Source) and Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source) and Vitamin D<sub>3</sub> (Colecalciferol)



$\Rightarrow$  Then CLK cycle = 5 CLK

$\Rightarrow$  Processor are designed by assembly line or production line.

### \* Pipelining System



|                | 1  | 2  | 3   | 4   | 5   | 6   | 7   |  |
|----------------|----|----|-----|-----|-----|-----|-----|--|
| I <sub>1</sub> | IF | ID | ALU | MA  | WR  |     |     |  |
| I <sub>2</sub> |    | IF | ID  | ALU | MA  | WR  |     |  |
| I <sub>3</sub> |    |    | IF  | ID  | ALU | MA  | WR  |  |
| I <sub>4</sub> |    |    |     | IF  | ID  | ALU | MA  |  |
| I <sub>5</sub> |    |    |     |     | IF  | ID  | ALU |  |
| I <sub>6</sub> |    |    |     |     |     | IF  | ID  |  |
| I <sub>7</sub> |    |    |     |     |     |     | IF  |  |

The table shows the flow of instructions over time (t) through seven stages. The first instruction (I<sub>1</sub>) starts at stage 1 (IF). By stage 7, the first instruction has completed its cycle. Subsequent instructions (I<sub>2</sub>, I<sub>3</sub>, I<sub>4</sub>, I<sub>5</sub>, I<sub>6</sub>, I<sub>7</sub>) enter the pipeline sequentially, starting at stage 2, 3, 4, 5, 6, and 7 respectively. The last instruction (I<sub>7</sub>) is shown starting at stage 7.

- ① For 1<sup>st</sup> Instruction, we need 5 CLK cycle to get output.
- ② For 2<sup>nd</sup> Instruction, we need 1 more CLK cycle to get Output.
- ∴ 100 Instruction, How much CLK cycle need in this Pipeline System.

$$\Rightarrow 5 + (100-1) \times 1 = 104 \text{ CLK cycle}$$

$$\therefore 100 \text{ Instruction} = 104 \text{ CLK cycle}$$

$$\therefore CPI = \frac{104}{100} \approx 1$$

∴ For pipelining CPI reduced to 1.

∴ If we execute in normal Non Pipeline processor

$$\text{CLK cycle need} = 500$$

$$\therefore \text{Speed Up} = \frac{500}{104} \approx 5$$

∴ For Pipeline, Every instruction need to be independent.

If dependency exist, then <sup>processor</sup> we need to wait some CLK cycle.

- ⦿ Single Cycle Implementation
  - ⦿ Multi cycle Implementation
  - ⦿ Pipelining Implementation
  - ⦿ CPU Design depends on Instruction Set.

- ④ Single Cycle: Assume that, there are 100 instruction and  $I_{45}$  is the most slowest and takes 20 usec. Then CLK Cycle will set to 20 usec.



-  Multi Cycle Implementation:

## Operation Addressing Mode

ADD            R<sub>1</sub>    R<sub>2</sub>    R<sub>3</sub>

ADD R, M, M<sub>3</sub>

ADD  $R_1 [R_3 + R_5] \rightarrow R_6$   
refer to memory

④ Processor will allow faster instruction to process faster and slowest instruction takes its time to process its operation.

⑤ Processing task is split into sub task.



|                                                  |   |                                      |                         |                     |
|--------------------------------------------------|---|--------------------------------------|-------------------------|---------------------|
| ALU                                              |   |                                      |                         |                     |
| ADD R <sub>1</sub> R <sub>2</sub> R <sub>3</sub> | ✓ | ✓<br>R <sub>1</sub> , R <sub>2</sub> | ✓                       | ✗                   |
| LOAD R <sub>1</sub> R <sub>2</sub> (16)          | ✓ | ✓<br>R <sub>2</sub>                  | ✓<br>R <sub>2</sub> +16 | ✓<br>R <sub>1</sub> |
| STORE R <sub>2</sub> R <sub>3</sub> (16)         | ✓ | ✓<br>R <sub>2</sub> R <sub>3</sub>   | ✓<br>R <sub>3</sub> +16 | ✗                   |

Annotations: 'Read' points to the R<sub>2</sub>+16 connection, and 'Write' points to the R<sub>3</sub>+16 connection.

#### ⑥ RISC Addressing Mode:

ALU : ADD R<sub>1</sub> R<sub>2</sub> R<sub>3</sub> → Register Mode

ADD R<sub>1</sub> 64 R<sub>3</sub> → Immediate Mode

For CPU ↔ RAM ⇒ LOAD/STORE

just for memory access

⑦ Before ALU Operation, data must be loaded in a Register.

⊗ LOAD R, R<sub>2</sub>(16) → memory - always in indirect form  
If we give memory address directly  
STORE R<sub>2</sub>, R<sub>3</sub>(16) then instruction size will increase.

⊗ ALU instructions cannot access RAM. Only LOAD and STORE can access Memory.

⊗ CISC can access memory in ALU Instruction. For data transfer in general, MOV.

⊗ For CISC

ADD R<sub>1</sub> M, R<sub>2</sub>

⇒ Fetch → Decode → MA → ALU → Result Store

ADD R<sub>1</sub> [R<sub>2</sub>+R<sub>3</sub>] R<sub>4</sub>

⇒ Fetch → Decode → Calculate → MA → ALU → Result Store  
R.R. M.A  
ALU

Avg CPI > 1

⊗ Pipe Lining Implementation Ampti Applicable to RISC

processors.

R2SC



| Instruction | CLK |    |     |     |     |     |     |     |
|-------------|-----|----|-----|-----|-----|-----|-----|-----|
|             | 1   | 2  | 3   | 4   | 5   | 6   | 7   | 8   |
| $I_1$       | IF  | ID | ALU | MA  | WR  |     |     |     |
| $I_2$       |     | IF | ID  | ALU | MA  | WR  |     |     |
| $I_3$       |     |    | IF  | ID  | ALU | MA  | WR  |     |
| $I_4$       |     |    |     | IF  | ID  | ALU | MA  | WR  |
| $I_5$       |     |    |     |     | IF  | ID  | ALU | MA  |
| $I_6$       |     |    |     |     |     | IF  | ID  | ALU |
| $I_7$       |     |    |     |     |     |     | IF  | ID  |
| $I_8$       |     |    |     |     |     |     |     | IF  |

Next Class Exam

⇒ Assume all the instruction are all independent.

\* How many CLK cycle required for each instruction?

⇒ 5.

**Seacal-D**

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

- ⊗ Each instruction takes 5 CLK cycle.
- ⊗ Processing time ( $5 \times c$ )
- ⊗ How many CLK cycle required to get the output of first two instruction  
⇒ 6
- ⊗ First three?  
⇒ 7
- ⊗ Total CLK cycle of a program containing 100 instruction?  
 $\Rightarrow 5 + (100-1)$   
 $= 5 + 99 = 104$  CLK Cycle
- ⊗ Throughput of the System.
- ⊗ What does Pipe-Line improve?  
 $\Rightarrow$  Processing time of a program.

Because, each instruction is still same.

$$\text{⊗ Avg CPI} = \frac{104}{100} \approx 1$$

$$\text{RZSC: } \uparrow I \times \downarrow \text{Avg CPI} \times C_{\text{same}}^{\leq 2}$$

$$\text{CZSC: } \downarrow I \times \uparrow \text{Avg CPI} \times C_{\text{same}}^{> 1}$$

Processing time!

$$\text{RZSC} < \text{CZSC}$$

Avg CPI most effective

CISC  
ADD M<sub>1</sub> M<sub>2</sub> M<sub>3</sub> → RISC → LOAD R<sub>1</sub> M<sub>1</sub>

LOAD R<sub>2</sub> M<sub>2</sub>

ADD R<sub>1</sub> R<sub>2</sub> R<sub>3</sub>

STORE R<sub>3</sub> M<sub>3</sub>

L-19 / 27.09.2023

### Design of CPU

#### ⊗ Micro Architecture Implementation:

#### ⊗ Instruction:



⊗ Lets CPU has 32 Register each of 32 bits.

that means there are 32-D-FlipFlop with 32 input and output Line.

⊗ We can read two Register at a time.

④ We can save to one Register at a time.

Flip Flop used only for data storage, that's why we use D Flip-Flop. Flip Flop uses transistors to store data (A number of transistors)



**2R + 1W  
Model**



# Seacal-D

Calcium Carbonate (From Coral Source) and Vitamin D<sub>3</sub> (Colecalciferol)

# Seacal-DX

Calcium Carbonate (From Coral Source) and Vitamin D<sub>3</sub> (Colecalciferol)

⊗ If grouped like  $\Rightarrow$

$$1^{\text{st}} \text{ Operand} = R_0 - R_7$$

$$2^{\text{nd}} \text{ Operand} = R_8 - R_{15}$$

$$\text{Result Store} = R_{16} - R_{23}$$

Now design a micro architecture with 32 bit

Registers each of 32 bits.

⊗ Entry Registers will be disconnected

⊗ For grouping, Instruction code will be optimized.



### Fetch Unit

To read instruction from Instruction RAM



⊗ Machine Code 32 bits, RAM is Byte addressable.

⇒ then we need 4 memory location to store MC.

⇒ The PC will increase by 4 every time.

### Decode Unit



⊗ This Data Path is designed for Register Bank ALU only.

⊗ Picture available (Data)

④ Data Path Design for a CPU that support  
Register addressing ALU instruction:

⇒ 32 Register

⇒ Any Register can be used for any operand

⇒ Multi-cycle implementation

⇒ 1. Fetch

2. Decode & Register Read

3. ALU

4. Result Store



Can be organized in your own preference



✳ This Data Path is designed for  
Registers Base ALU only.

✳ Picture available (Date)

# Seacal-D

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

# Seacal-DX

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

④ For immediate mode,

there are two type design

- ① Fixed
- ② Flexible

① lets operand-2 is fixed for immediate field.



④ Modified Part only:



② for flexible mode we need two more selector just like ①.

⊗ LOAD R<sub>1</sub> R<sub>2</sub> (cy)

⊗ Modified Part Only



⊗ Picture Available (Date)

**Seacal-D**

Calcium Carbonate (From Coral Source) and  
Vitamin D<sub>3</sub> (Colecalciferol)

**Seacal-DX**

Calcium Carbonate (From Coral Source)  
and Vitamin D<sub>3</sub> (Colecalciferol)

④ Data Path Design for LOAD/STORE and BRANCH Instruction.

④ Register Architecture - 32 Registers

⇒  $[2R + 1W]$  model

④ Accumulator base CPU, one operand will read from AC and store to AC and other operand maybe Register or Memory.

④ LOAD R, R<sub>2</sub> (128)

→ Destination  
→ Source (Memory)

Nent Page



## ④ BRANCH Instruction

→ Conditional Branch Instruction



⇒ First, we need to find out, when this instruction will be TRUE or FALSE.

④  $\text{beq} \Rightarrow$  Branch if equal

Hence, if  $R_2 = R_3$  (TRUE)

else, FALSE

→ branch not equal



else, FALSE

④ We can use subtract to check these statement.

If, result is zero(0), then equal. Else, not equal.

④ Each instruction is 32 bit and Instruction RAM is Byte addressable.