

# 11 ALU, Data Path & Control Unit

Parts of general purpose computer (GPC) :-

→ we have two types of machines that have SPP are :  
① General purpose Computer ② Embedded Systems.

Embedded Systems are stored program computer in which programs are included to perform specific task only.

Components of GPC are -

- ① Memory
- ② ALU → Add, S, M, D, Shift, EX-OR etc
- ③ Registers [Collection of bits stored in sequence of flip flops]
- ④ Timing circuit to synchronize the instruction.
- ⑤ Control Signals
- ⑥ Flags [1 bit information as flip flop]
- ⑦ Common Bus or channel for connection among all components.

for timing signal we use sequence counters, such that using 'n' bit counter we can execute  $2^n$  instructions.

Control signal can be generated using the logic gate.

Bus can be maintained using the MUX

## Designing of bus



Generally we have lots of registers of size 16b, 32b, 64b but here we are taking only 4 registers of 4 bit each.

| S <sub>1</sub> | S <sub>0</sub> | O/p                                                         |
|----------------|----------------|-------------------------------------------------------------|
| 0              | 0              | D <sub>3</sub> D <sub>2</sub> D <sub>1</sub> D <sub>0</sub> |
| 0              | 1              | C <sub>3</sub> C <sub>2</sub> C <sub>1</sub> C <sub>0</sub> |
| 1              | 0              | B <sub>3</sub> B <sub>2</sub> B <sub>1</sub> B <sub>0</sub> |
| 1              | 1              | A <sub>3</sub> A <sub>2</sub> A <sub>1</sub> A <sub>0</sub> |

✓ No. of MUX = Size of Register

✓ Size of MUX = No. of registers

## Designing a Computer:

Assume the m/m size is  $4096 \times 16$  means complete complete m/m size 4096 in which you can read a 16 bit of word at each clock.

DR → Data register holds the operand fetched from m/m. Size of register should be 16 bit here.

AR → Address register stores the address.

Size of register should be 12 bits because m/m size is  $2^{12}$ .

Here DR can hold 16 bit but AR can consider only 12. So, the most significant bit of AR will get filled with 0's and left 12 bits should be same.

AC → Accumulator is a register available with ALU. It stores generally output of ALU. (16 bit).

IR → Instruction register used to store and decode instruction. It is of size equals to size of max. instruction (16 bit).



Here  $2^4 = 16$  different opn are possible.

We can also use operand address bits to increase no. of operations.

(PC) → Program Counter. (TR) → Temporary register.

IR/IOR → Input / Output Register: Read each bit as input and send to m/m or ALU. (say 8 bit here).

(ALU) → We are going to perform addition here.

Whatever the operation you want to select, select via S<sub>0</sub>, S<sub>1</sub>, S<sub>0</sub> (select lines).

Bus → Data, Address, Control

We are going to use data bus here.

Control bus contains control signals (LD, INC, CLR). These are not individual lines from control unit.



### Timing Circuit



→ Each time only one register will be activated.  
 → It is also known as hardware program control Units.

### Instruction Register (IR)



### Instruction Cycle :- Phases of Instr'n cycle :-

- ① fetching
- ② Decoding
- ③ Compute effective address
- ④ Execute



### fetching :

$$I_0 : AR \leftarrow PC$$

↓  
Add. will load  
Registers

PC

$$I_1 : IR \leftarrow M[AR]$$

↑  
Load content of AR to  
m/m [M] and load it  
to IR.

Particular register will get  
selected as

$$\begin{array}{c} S_2 \ S_1 \ S_0 \\ PC : \ 0 \ 1 \ 0 \\ \text{Memory: } 1 \ 1 \ 1 \end{array}$$



Instruction I<sub>0</sub> occurs at T<sub>0</sub> and I<sub>1</sub> occurs at T<sub>1</sub>.



We use the 'or' gate because it might be possible that more than 1 timing signal demand for access.

as IR get executed at the same time PC will also incremented as  $PC \leftarrow PC + 1$ . So we give  $T_1$  to INC in PC here too.

So, here instruction fetch is taking 2 Cycles ( $T_0, T_1$ ) but it might get vary from system to systems.

$T_0, T_1 \rightarrow$  fetch

Decoding:  $T_2 \oplus D_7 D_6 \dots D_0 \leftarrow$  Decode IR(14,13:12);  $AR \leftarrow IR(0,11)$



Memory size  $\rightarrow$  Register size, so in case of register we can't hold 12 bits operand. So we can reduce it by devoting few bits for operation codes. So this way we can generate more than 8 opcodes.

from decoder O/P : ~~D<sub>7</sub>~~ D<sub>0</sub> - D<sub>6</sub>  $\rightarrow$  Memory reference [J=0/1]



Memory reference address have both J=0,1 variants

Effective add. given  $\leftarrow$  (I=0)

- AND 0xxx
- ADD 1xxx
- LDA 2xxx
- STA 3xxx
- BUN 4xxx
- BSA 5xxx
- JSZ 6xxx

Branch One address  
Branch 8 save address  $\rightarrow$

$\leftarrow$  (I=1)  $\rightarrow$  Need to find effective address  
8xxx  $\rightarrow$  9xxx  $\rightarrow$  10xxx  $\rightarrow$  11xxx

9xxx  
Axxx  
Bxxx  
Cxxx  
Dxxx  
Exxx

I/O related

|     |       |
|-----|-------|
| INP | F 600 |
| OUT | F 400 |
| SKI | F 200 |
| SKO | F 100 |
| IOI | F 080 |
| IOF | F 040 |

### Register related

|     |      |     |      |
|-----|------|-----|------|
| CLA | 7600 | SZE | 7002 |
| CLE | 7400 | HLT | 7001 |
| CMA | 7200 |     |      |

There can be more than these instructions.

So in register and I/O instruction we are using the operand bits also.

T<sub>3</sub>  $\rightarrow$  AR  $\leftarrow$  M[CAR] : used in case of calculating effective address. So it is required for mem related J=1 but not for I=0.

If T<sub>3</sub> not required, so we can start execution here

### fetching Effective Address (T<sub>3</sub>)



Each instruction is no need to go through all the steps of instruction execution.

If  $D_7=0$ ,  $I=1$ , and  $T_3=1$  then we have to calculate the effective address.

$I=1$  for indirect addressing mode



$AR \leftarrow M[PC]$



→ wait for execution phase and implement  $I=1$  in next cycles

If  $D_7=0$ ,  $I=0$  and  $T_3=1$  : No need to do anything so,  $T_3$  is completely wasted or stall.

If  $D_7=1$  and  $I=1$ ,  $T_3=1$  : I/O No stalling go to next phase that is execution.

$D_7=1$   $I=0$ ,  $T_3=1$  : Go to execute phase / no waste of time.  
So in m/m related we start execution from  $T_4$  and in I/O and register related we start it from  $T_3$ .

### Execution [ fetch operands & execute] : ( $T_4$ )

Say, in m/m reference instruction,

|                    |                                                                               |
|--------------------|-------------------------------------------------------------------------------|
| AND D <sub>0</sub> | On m/m reference instruction we perform fetching effective address at $T_3$ . |
| ADD D <sub>1</sub> |                                                                               |
| LDA D <sub>2</sub> |                                                                               |
| STA D <sub>3</sub> |                                                                               |
| BUN D <sub>4</sub> |                                                                               |
| BSA D <sub>5</sub> |                                                                               |

AND :  $T_4 = 1$ ,  $D_0 = 1$        $DR \leftarrow M[AR]$

Assume 1st operand is already in AC and second we fetch from m/m into data register.



$T_5 = 1$ ,  $D_0 = 1$  :  $AC \leftarrow DR$  AND AC

In generally fetching m/m is slow and takes more than 1 clock cycle. But here we are assuming ideally.

So, to execute it takes ( $T_4$  to  $T_5$ ) for AND instr. [6 clocks]

| ( $T_0$ , $T_1$ ) | ( $T_1$ ) | ( $T_2$ ) | ( $T_3$ )     | ( $T_4$ ) | ( $T_5$ )  |
|-------------------|-----------|-----------|---------------|-----------|------------|
| fetch instrn      | Decode    | EA J      | fetch operand | Execution | (optional) |

After complete execution we need to clear the sequence counter to start from To again.

So make

$$T_5 : SC \leftarrow 0$$

- ② ADD D<sub>1</sub> :  $D_1 = 1$   $T_4 = 1$  |  $D_1 = 1$   $T_5 = 1$   
[6 Clocks]  $DR \leftarrow M[AR]$  |  $AC \leftarrow DR + AC$   
|  $E \leftarrow Cout$  [carry]  
 $SC \leftarrow 0$
- ③ LDA D<sub>2</sub> :  $D_2 = 1$   $T_4 = 1$  : To load something into AC, pass the word to bus and then word will pass to AC through ALU. But ALU is connected with bus. So we load it in DR first then it will point to AC.
- 
- DR  $\leftarrow M[AR]$

$$D_2 T_5 : AC \leftarrow DR, SC \leftarrow 0 \quad [6 \text{ Clocks}]$$

- ④ STA D<sub>3</sub> : It can be done in 1 cycle only.  
D<sub>3</sub> T<sub>4</sub> :  $M[AR] \leftarrow AC, SC \leftarrow 0$  [5 Clocks]

- ⑤ BUN D<sub>4</sub> : D<sub>4</sub> T<sub>4</sub> :  $PC \leftarrow AR, SC \leftarrow 0$

[5 Clocks] It will set Program counter to instruction execute next to

- ⑥ BSA D<sub>5</sub> : You come back to 101 after executing subroutine for this we save 101 at 200 and next instruction of subroutine is BUN 200 means you have to go 200 and find the next executing address there



BUN → Branch Unconditional.

$$D_5 T_4 : M[AR] \leftarrow PC$$

$$AR \leftarrow AR + 1$$

$$D_5 T_5 : PC \leftarrow AR; SC \leftarrow 0$$

Method we are using here doesn't support recursion.  
[6 Clocks]

ISZ D<sub>6</sub> : Increment and skip if zero. [7 clocks]

D<sub>6</sub> T<sub>4</sub> : DR  $\leftarrow$  M[AR]

D<sub>6</sub> T<sub>5</sub> : DR  $\leftarrow$  DR + 1

D<sub>6</sub> T<sub>6</sub> : M[AR]  $\leftarrow$  DR ; If (DR == 0) PC  $\leftarrow$  PC + 1  
SC  $\leftarrow$  0

Register reference instruction  $\rightarrow$  we start executing from T<sub>3</sub> and sometimes we can even finish at T<sub>3</sub>.



CIA  $\rightarrow$  Clear accumulator  
 $b_{11} = 1$

CLE :  $b_{10} = 1$

CMA :  $b_9 = 1$  Complement accumulator



Similarly we can implement total 12 instructions in this size of register by changing value of bi.

$\rightarrow$  All the register instruction requires only 4 clocks.

I/O related instruction : [Total 4 clocks required]



CPU works faster than I/O device so to make synchronization we need I/O flags

I flag  
↓  
0      1  
[Consumed by CPU]

O flag  
↓  
0      1  
[Data is available]

[Data is available]  
[Consumed by O/p device]



$$D_7 = 1 = I = T_3$$

- ① INP  $\rightarrow b_{11} = 1$
- ② OUT  $\rightarrow b_{10} = 1$
- ③ SKI  $\rightarrow b_9 = 1$
- ④ SKO  $\rightarrow b_8 = 1$
- ⑤ ION  $\rightarrow b_7 = 1$
- ⑥ IOF  $\rightarrow b_6 = 1$



b<sub>7</sub> [IE 6, 11]

$\downarrow$

We have different  
AND gate for  
each value of  
here.

$T_3 = 1$

| D <sub>7</sub> | I | Remark                |
|----------------|---|-----------------------|
| 0              | 0 | [stall]               |
| 0              | 1 | M [Effective Address] |
| 1              | 0 | Registers             |
| 1              | 1 | J10                   |

J10 flag

INP: Read INPUT: AC  $\leftarrow$  INPR [If Iflag=1]; FAI0  $\leftarrow 0$

OUT: Move o/b : OUTR  $\leftarrow$  AC ; FAI0  $\leftarrow 0$

SKI: if (FAI0 = 1) ; PC  $\leftarrow$  PC + 1 ; skip Input

SKO: if (FAI0  $\neq$  1) ; PC  $\leftarrow$  PC + 1 ; skip output

ION: Interrupt ON : INT  $\leftarrow 1$

IOF: Interrupt OFF : INT  $\leftarrow 0$

SC  $\leftarrow 0$   
in all

Interrupts are generally held by OS only.

(ii)

If there is any input coming from Input Interface the sequences

Input Interface  $\rightarrow$  INPR  
BUS  $\leftarrow$  AC  $\leftarrow$  ALU  
 $\hookrightarrow$  memory

1/m  $\rightarrow$  DR  $\rightarrow$  ALU  $\rightarrow$  AC  
OUTR  $\leftarrow$  BUS  $\leftarrow$



Programmed I/O: Suppose, our computer is programmed to execute 1 MIPS, then,

Say rate of input generate is 10 character per sec.

Say program for I/O devices is

200 SKI

201 BUN 200

202 INPR

~~Step~~ we can get input by just using last instruction but due to input flag check condition we are executing 1st instruction regularly, called busy waiting.

So 1 character = 10<sup>5</sup> use means in that time we can execute lot of instruction but we are executing only 1 instruction, so its waste of time.

Interrupt driven; ~~IEN = 1~~ [Program can be Interrupted]  
[Interrupt Enable]

We can't give control of IEN flag to user. It will always dealt in OS mode.

Interrupt driven HO

FGI → 1  
FGO → 1  
IEN → 1



During execution of ISR : IEN = 0.

Condition for writing an interrupt



R (say a flag)

R = 1 means program wants to be interrupted

Interrupt Cycle

check R at every T<sub>0</sub> clock

Instruction Cycle

R T<sub>0</sub> : fetch

R T<sub>1</sub> : fetch

R T<sub>2</sub> : Decode

T<sub>3</sub>} Execute

R T<sub>0</sub> : AR ← 0 ;

R T<sub>1</sub> : M[AR] ← PC ;

R T<sub>2</sub> : PC ← 0 ;

R T<sub>3</sub> : PC ← PC + 1 ;

IEN ← 0 ;

R ← 0 ; S ← 0

Temp. register (CTR)

(R)

R T<sub>0</sub> : AR ← 0 ;

TR ← PC

R T<sub>1</sub> : M[AR] ← TR ;

PC ← 0

R T<sub>2</sub> : PC ← PC + 1

IEN ← 0 ;

R ← 0 ; S ← 0

If no interrupt (R) then it follows

Normal execution

Two ways

You can find interrupt upto T<sub>2</sub> only. Else no interrupt

If  $T_0 \bar{T}_1 \bar{T}_2$  IEN and ( $T_0 \bar{I} \otimes T_0 \bar{O}$ ) is 1 then  $R = 1$ .

If  $R \neq 1$  in  $T_0$  then we will try to keep  $R \neq 1$  until  $T_1 \otimes T_2$  gets done.

So, to handle interrupt we are waiting  $T_1 \otimes T_2 \otimes T_3$ . This is an overhead.

So handle overheads we have DMA [Direct m/m access]

Till now we have



but in DMA



### Example of Control Unit

LOAD AR :  $R' T_0 + R' T_2 + D_7' I T_3$

CLR AR :  $R T_0$

INC AR :  $D_5 T_4$



→ Given example is Control Unit for Address Register. Similarly we combine all the control units to form a computer.

### Interrupt Enable Flag



$$D_7 = 1, I = 1$$

$$D_7 I T_3 B_7 : IEN \leftarrow 1$$

$$D_7 I T_3 B_6 : IEN \leftarrow 0$$

$$R T_2 : IEN \leftarrow 0$$

$I_{ON}, I_{OF}$  will work if  $B_7 = 1$

Programmable CU is faster, but it is not flexible. You can't change the operation of CU after complete manufacturing.

You can't add or delete more instruction.

so we need a Self Program control unit.

→ "Instr" set is fixed in hardware.

To perform an operation we need to perform several steps called 'micro operations'

$$\text{Ex: } AC \leftarrow AC + R_i$$

LDA 21  
LD R<sub>i,4</sub>  
ADD R<sub>i</sub>

3 operations

Each 'micro operations' are performed using Control Unit.

Hardwired → we use hardware as control units, but we can also do it using software called micro programs.

We have micro instructions -



Above MIR, increments the content of A and B called as control words.

Control word stored in control memory. [ROM]



Inside fetch MPC just gets increment but in switch over from fetch → Decode we have to load new address to flags



block in MIR otherwise INC occurs automatically.

Condition → This part depends upon no. of flags needed for n flags we need  $\lceil \log_2 k \rceil$  bits.

NA → size depends on size of control mtr. If control mtr have m words then  $\lceil \log_2 k \rceil$  bits.

Control Signal → Depends on no. of instructions required

Microprogramming can be done in 2 ways

- ① Horizontal: We can increase parallelism and its fast
- ② Vertical: It is slow because of extra use of decoders

For horizontal partition for  $N$  instruction we need  
 $N$  bit in control signal, in vertical partition we  
need  $\lceil \log N \rceil$  bits

- Need
- ① Control word is large
- ② High parallelism
- ③ Faster

- Need
- ① Control word is small
- ② No parallelism
- ③ Slower

→ we can also develop hybrid of both.

### Hardwired Control Unit

- ① Fixed Instructions
- ② High speed
- ③ Expensive
- ④ Complex

Ex → RISC

### Programmed Control Unit

- ① Flexible Instructions
- ② Slower as compare
- ③ Cheap
- ④ Simple

Ex → CISC

### I/O Interface

Need: ① Peripherals are electromechanical and electromagnetic, but CPU & memory are electronic devices. The signal conversion may be required.



- ① I/O devices have slower data transfer rates.
- ② Data Codes and formats ~~base~~ in I/O are different from CPU.
- ③ All devices are different from each other.

Interface gets activated according to the value kept by CPU on address bus.

Once Interface gets activated CPU generates commands to control bus and then access the data.

Types of commands -

- (1) Control Commands
- (2) Status Commands
  - ↳ To get the present status of devices
- (3) Input Command → To read
- (4) Output Command → To write.

### Isolated vs M/m mapped I/O



Peripherals → I/O device

### 2nd Model



Suppose CPU generate address then there is a conflict that the address is for main m/m or the peripherals. To resolve this we have control bus.

CB have 4 different lines along with its functionality

- (1) I/O Read      (2) I/O Write
- (3) Memory Read      (4) Memory Write
- (1), (2) = 1 means I/O op'n else m/m operation

This is called as Isolated I/O concept. If two configurations are m/m and I/O are connected on same side then why it is isolated. But we have to remember the same

I/O : Input / Output  
M/m : Load / Store

In m/m mapped I/O we divide main m/m into 2 parts

- (1) for m/m purpose
- (2) for I/O purpose



Adv.: We can use same Inst. for both m/m and I/O.

Address of I/O means address of register inside interface.

## Serial vs Parallel Transmission



| Serial                     | Parallel             |
|----------------------------|----------------------|
| ① Shift registers required | ① No shift registers |
| ② Burst Errors             | ② Bit errors         |
| ③ Cheap                    | ③ Costly             |
| ④ Less reliable            | ④ More reliable      |
| ⑤ Large distance           | ⑤ Less distance      |
| ⑥ Ex → Keyboard            | ⑥ Bus                |

for large distance b/w source and destination we can't perform task of both with just one clock as-



So we can't use clock in this condition.

with clock → Synchronous      without clock → Asynchronous.



① Strobe Signal.

② Handshaking Method.

① Strobe



Here destination can't say, I have received the data. So it is a problem.

## Handshaking



VD → Valid Data  
DA → Data Accepted

- Till now whatever we discuss is about transmit by the source but it can be possible that data should ask by the destination.
- Here we use single data line (serial) but we can use multiple data lines too (parallel) and rest is same.

## Asynchronous Serial: 1 char = 8 bits

Data is : 11010110



We need to add some bits to separate or show the start & end.

Say to send 8 bits (1 char) we need to send 11 bits.  
Suppose Keyboard is giving 7/b as 10 char per sec

So total  $1 \times 10 = 10$  bits needs to send.

→ In Async we have more than 1 clock and we need to again synchronize them.

## Types of I/O Transfer:

- ① Programmed I/O
- ② Interrupt driven I/O
- ③ DMA

① Both programmed I/O have a program like -

- ① Read the status flag
- ② If data is not available ( $status > 0$ ) go to step 1

③ Move the data

⋮

Disadvantage → By bypassing at step ②, so CPU time gets wasted.

- ② Whenever device is ready with data, it will interrupt by ISR.

Disadv: Less overhead than Programmed I/O but exists.

→ Sensor Interrupts always takes highest priority.

Also, Interrupts can be generated by timers, traps, I/O.

Traps means you stuck in infinite loop (Busy waiting, using extra space, divide by 0).

Priority of I/O devices



### Priority based Interrupts

| HD | Mouse | Printer | KB |         |
|----|-------|---------|----|---------|
| 1  | 1     | 1       | 1  | HD      |
| 0  | 1     | 1       | 1  | Mouse   |
| 0  | 0     | 1       | 1  | Printer |
| 0  | 0     | 0       | 1  | KB      |

In general we have different ISR for each device



Suppose KB interrupt gets interrupted in between by HD ISR, then we have to go

for ISR of HD in between.

So, here is a problem that how can we get to know about the address of ISR of HD.

- ① Software Based
- ② Hardware based

Software means pure software but hardware means interplay of both software and hardware.

In software approach we do polling, means we go to each device and ask him that, he is the one who perform ISR according to priority (Non-Vectorized)

Disadv.: Time consuming.



### Daisy Chaining



INTACK used to accept Interrupt by higher priority devices

Devices are connected in order of increasing priority if ACK signal reaches to device generates the interrupt that device will not allow INTACK signal to go forward.

I/O device provide interrupt signal and address of ISR



Each I/O generates vector address for ISR and load on Bus to support CPU.

This implementation is faster than previous implementation.



|   | $J_0$ | $J_1$ | $J_2$ | $J_3$ | $y$ | $x$ |
|---|-------|-------|-------|-------|-----|-----|
| 1 | X     | X     | X     | X     | 0   | 0   |
| 0 | 1     | X     | X     | X     | 0   | 1   |
| 0 | 0     | 1     | X     | X     | 1   | 0   |
| 0 | 0     | 0     | 1     | X     | 1   | 1   |

for Disk Interrupt, Vector Address → 00 00 00 00  
for Printer Interrupt, VAD is  
01 0 0 0 0 0 0

$JST = 1$  means at least one device generate interrupt

If  $JEN = 1$  means there is an interrupt arises and we are allowing it.

If  $JEN = 0$  then it means CPU is busy and we can't handle this interrupt.

As Enable becomes 1, ISR gets loaded, firstly ISR update the mask register according to priority.

### Interrupt Overhead.

In order to perform operations like read from Input (INP) and write (OUT), we do perform lots of other instruction in support of these instruction. All these are mounted as overhead here.

### Interrupt Cycle :

$$SP \leftarrow SP - 1 \quad (\text{Stack Pointer})$$
$$M[SP] \leftarrow PC$$

$$INTACK \leftarrow 1$$

$$PC \leftarrow VAD$$

$$IEN \leftarrow 0$$

→ Stack pointer grow upwards  
[ \$000 → 0 ]

### Initial operations of ISR

These are overheads -

- ① Clear lower level mask bit
- ② clear IST.
- ③ Save content of processor register.
- ④ Set IEN flag.
- ⑤ Proceed with service routine

### Final Operations of ISR

These are also overheads -

- ① Clear IEN
- ② Save content of processor register
- ③ Clear the bit in the interrupt register belongs to source that has been serviced
- ④ Set lower level priority bits in mask registers
- ⑤ Restore return address into PC and set IEN

- Absolute & direct addressing are same thing. (G2)
  - ↳ Load direct from given add.
- ↗ In relative addressing size of address get reduced and useful in case of branching. [PC + Address.]
- ↗ In operand forwarding if data is fetch from memory then we can do read register along with m/m write and execution just after m/m write [2006 : 2.33]
- In relative add. requires instruction size, allows indexing of array with same instruction, enable relocation.
- ↗ Relative add. are calculated from absolute add. so they can't be greater than ~~far~~ faster than absolute add.
- ↗ RFE (Return from exception) is a privileged instruction performed by OS to switch between kernel → user after the end of handling an exception, so it is a trap instr.
- ↗ If one trap instruction is executing, no other trap can execute at the same time.
- ↗ Control hazards penalties can be partially or completely removed by dynamic branch predictions.
- In delayed branching we generally keep Load/Store Instruction in delayed slots.
- ↗ In pipeline, if WAW hazards also available in instr along with RAW then do. (OF) after complete the execution (ALU) phases of previous instruction.

### RISC

- Reduced Instrn set
- Less no. of Instrn
- Fixed length Instrn
- more registers
- Few no. of addressing modes
- Less cost and cheap
- Single cycle instructions
- only in registers. (Reg → Reg.)
- Hardwired control unit
- CPI = 1 using pipeline

### CISC

- Complex Instrn set
- Large no. of Instrn
- Variable length Instrn format
- Large no. of addressing mode
- High cost and powerful
- Several cycle instruction
- Manipulate direct m/m
- Hardwired control unit and micro op queue
- No pipeline used

→ On RISC

$$\text{Window size} = \text{Local Reg.} + \text{Global Reg.} + (\text{IN} + \text{OUT}) \\ = L + G + 2C$$

64

$$\text{Register file size} = (\text{No. of register window}) * (L + C) + G$$

→ Total no. of bits required to represent control

~~No parallel signal in vertical programmed IO is, K (say).~~  
No parallel signal in vertical programmed IO is, K (say).  
~~Storage~~  $K = \lceil \log_2(\text{Group}_1) \rceil + \lceil \log_2(\text{Group}_2) \rceil + \dots$

In horizontal programmed IO: ] parallelism and  
 $K = (G_1 + G_2 + \dots)$  faster

→ Memory mapped IO uses the same set of read, write control lines for both memory & I/O transfers.

→ Isolated IO has distinct input & output instructions.

→ M/m mapped IO is faster, less complex and consumes less power as compare to IO mapped.

① Implied : No address CLC.

② Immediate : MOVE R1, 35H → direct value is given

③ Register mode : Load from one register to another also called direct register mode : MOV R1, R2

④ Register Indirect Mode : MOV R1, [R2]

⑤ Direct / Absolute : Only one m/m reference ADD R1, [0301]

⑥ Indirect Mode : Symbol @ + () .

⑦ Base register : Effective add. calculate by adding base register value to address field value

PC = Base + Relative Value

⑧ Relative : Register contains offset and we get effective address by adding it to PC

PC = PC + Relative

→ In question of RAW dependency is like.

$$R_0 \leftarrow M[LocI] \rightarrow I_1$$

$$R_1 \leftarrow R_0 + R_2 \rightarrow I_2$$

then,  $I_2$  will do execution phase after completion of M/m access phase of  $R_0$  (m/m access occur after exec of  $I_1$ )

→ Execution Add  $R_1, R_2 \rightarrow 3$  cycles → Direct  
Add  $R_1, @R_2 \rightarrow 4$  cycle → Indirect

→ Relative add. mode used to reduce instruction size, all indexing of array, easy relocation but doesn't enable fast address calculation than absolute add.

→ A CPU has 24 bit (3B) instruction and starting address in 300 then all consecutive address are 303, 306, 309

→ LOAD Rd, 1000 (Rs) means  $Rd \leftarrow \frac{[1000+1]}{\text{m/m location}}$

→ In even integer binary representation LSB should be '0's

→ In operand forwarding if case is -

$$\begin{aligned} R_1 &\leftarrow R_1 + R_2 \\ R_2 &\leftarrow R_1 + R_3 \end{aligned} \quad \text{fwd.}$$

then Ex of  $I_2$  will occur after Ex of  $I_1$

→ In delayed branching irrespective of condition, first instruction always executed

→ If FD and OF given as separate stages then in case of RAW dependency do perform OF stage after completion of first instruction.

→ Buffer between two stages come under consideration during pipeline otherwise we use only stage delay in non-pipeline nodes

- Interrupt driven I/O is faster than programmed I/O
- CALL and JUMP Instructions just check the value of flag doesn't change its
- In dairy chaining we gives non-uniform priority to all devices
- formatting floppy means writing identifier information on all tracks & sectors
- Swap space is used to save process data
- performance gain =  $\frac{E_{\text{programmed I/O}}}{E_{\text{interrupt I/O}}}$
- Burst block transfer (DMA) and vectored interrupt needs high Bus.
- ⇒ for a magnetic disk with concentric circular tracks, the latency is not linearly proportional to seek distance due to higher capacity of tracks on the periphery of the platters

- Based and relative addressing modes are suitable for program relocation at run time not absolute & indirect.
- In absolute addressing mode ~~address~~ is inside the ~~instruction~~ ~~address~~.
- Normally Add execution takes 3 cycles if all operands follows direct addressing mode but if any one follows indirect then it takes 4 cycles and 5 if both follow indirect.
- Notation ADD Rd, 1000 :  $Rd \leftarrow Rd + 1000$   
 ADD Rd, 1000(Rs) :  $Rd \leftarrow Rd + M[1000 + Rs]$
- In operand forwarding we provide o/b to IO phase along with EX phase in register or in case of m/m just after m/m phase.
- Speed Up =  $\frac{\text{Pipe depth}}{(1 + \text{No. of stalls})} = \frac{\text{no. of stages}}{(1 + \text{No. of stall instr})}$
- Time taken by  $n$  Ins =  $(CPI) * (\text{time taken by 1 instr})$
- Any implied register explicitly mentioned as an operand in operation is not possible.
- Throughput: No. of instructions per second.
- In cache avg. m/m access time If 'y' is given :  $t_{avg} = y^{ktn}$
- If in question given that ignore the search time in cache the used formula is  $\Rightarrow H_1 t_1 + (1 - H_1) H_2 t_2 + \dots$
- In direct mapping we use decoder MUX and in k-way set associative we use decoder MUX as well as  $(k \times 1)$  way selection MUXs.
- In accessing array using col<sup>n</sup> major order we generally possess all the elements as miss.
- In asynchronous serial data transfer we keep 0 is the beginning and 2-1's at the end to provide synchronization.