

# ALU and Data Path

Control unit



Santosh Kumar Verma  
Mentor  
GeeksforGeeks

# ALU and Data Path in Computer Organization

- As per the developments, a real need of complex operations are more required.
- These operations are handled by the ALU.



# Structure

The **central processing unit (CPU)** can be divided into two sections:

1. **Data section:** Memory, registers, ALU (adders, multiplier, etc.), and communication buses. Each step (fetch, decode, execute, save the result) requires communication (data transfer) paths between memory, registers and ALU. It is also known as the ***data path***.
2. **Control section:** Data path for each step is set up by control signals that set up dataflow directions on communication buses and select ALU and memory functions. Control signals are generated by a control unit consisting of one or more finite state machines.

# Data Path



# BUS

There are mainly three type of bus:-

1. **Address bus:** Transfers memory addresses from the processor to components like storage and input/output devices. It's one-way communication.
2. **Data bus:** carries the data between the processor and other components. The data bus is bidirectional.
3. **Control bus:** carries control signals from the processor to other components. The control bus also carries the clock's pulses. The control bus is unidirectional.



The bus can be dedicated, i.e., it can be used for a single purpose or it can be multiplexed.

# Registers

The following list of five registers for in-out signal data storage:

1. Program Counter: A CPU register in the computer processor which has the address of the next instruction to be executed from memory. (Instruction Pointer)

2. Instruction Register: holds the instruction currently being executed or decoded *Knows about the opcode*

3. Memory Address Register (MAR): is the CPU register that either stores the memory address from which data will be fetched, or the address to which data will be sent and stored. *IBAR*

4. Memory Data Register (MDR): is the register that stores the data being transferred to and from the immediate access storage. Also known as memory buffer register (MBR). *40 BR*

5. General Purpose Register (GPR): are used to store temporary data within the microprocessor. It is a multipurpose register.

$R_1, R_2, \dots, R_n | AX | BX | CX | DX$

6. Special function Register (BP, SP, SI, DI, ...)



A data path is a collection of functional units such as *arithmetic logic units* or multipliers that perform data processing operations, registers, and buses.

Along with the control unit, it composes the central processing unit (CPU). A larger data path can be made by joining more than one data path using multiplexers.



Figure 3.2 Computer Components: Top-Level View

# One Bus organization



# Two Bus organizations



# Three Bus organization



| Feature                   | One-Bus Data Path                                                    | Two-Bus Data Path                                                                      | Three-Bus Data Path                                                                       |
|---------------------------|----------------------------------------------------------------------|----------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| <u>Definition</u>         | Uses a single bus for all data transfers.                            | Uses two separate buses for source and destination.                                    | Uses three buses: <u>two for source</u> operands, <u>one for result</u> .                 |
| Buses                     | One common bus for all data transfers.                               | Two separate buses for source and destination operands.                                | Three buses: two for reading data, one for writing result.                                |
| <u>Registers Usage</u>    | <u>Shared register</u> for reading and writing.                      | <u>One read bus</u> and <u>one write bus</u> .                                         | <u>Two read buses</u> (two sources can be read simultaneously) and <u>one write bus</u> . |
| <u>Data Throughput</u>    | <u>Lower throughput</u> due to a single bus constraint.              | <u>Moderate throughput</u> , as <u>read</u> and <u>write</u> can occur simultaneously. | Higher throughput since two reads and one write can happen concurrently.                  |
| <u>Control Complexity</u> | <u>Simpler control logic</u> .                                       | <u>Medium control complexity</u> .                                                     | <u>More complex control logic</u> due to multiple buses.                                  |
| <u>Performance</u>        | Slowest among the three, as only one operation can happen at a time. | <u>Faster than one-line</u> but slower than three-line.                                | <u>Fastest</u> due to parallelism in fetching and writing.                                |
| <u>Power Consumption</u>  | Lowest power consumption.                                            | Moderate power consumption.                                                            | Higher power consumption due to additional buses.                                         |
| <u>Cost</u>               | Low cost, fewer components.                                          | Moderate cost.                                                                         | Higher cost due to increased complexity and bus lines.                                    |



Write the content of file1 into file2.

open a file

user → kernel

System calls

R  
↑  
w

Von Neumann / Harvard Archit

What it does?

Instruction format

Addressing modes

Data path & ALU

fetch  
decode  
execute  
write back

CU → Control signals to do

the tasks in  
Sync.

## Control Unit and Operations

Add R<sub>1</sub>, (R<sub>2</sub>)

R<sub>1</sub> ← R<sub>1</sub> + (R<sub>2</sub>)

Memory [R<sub>2</sub>]



MAR ← R<sub>2</sub>

temp<sub>2</sub> ← Memory(MAR).  
← R<sub>1</sub> + temp<sub>2</sub>

Keep CPU busy every time

↑  
performance

Various M-ops } → operation

# Instruction Micro-Operations

- A computer executes a program of instructions (or instruction cycles)
- Each instruction cycle has a number of steps or phases:



- These can be seen as micro-operations
  - Each step does a modest amount of work
  - Atomic operation of CPU

- The **Control Unit (CU)** is a vital component of the Central Processing Unit (CPU) in a computer.
- Its primary function is to direct the operations of the processor by telling the system's memory, ALU (Arithmetic Logic Unit), and input/output devices **how to respond to the instructions** that have been sent to the processor.

# what the control unit does:

- 1. Instruction Fetching:** The CU fetches instructions from memory and places them in the instruction register.
- 2. Decoding Instructions:** It interprets the instruction's opcode and determines what operations need to be performed.
- 3. Control Signal Generation:** It generates the necessary control signals to control other parts of the CPU and memory, enabling data transfer and execution of the instruction.
- 4. Data Routing:** The CU determines where data should be moved (e.g., between registers or between a register and memory).
- 5. Timing and Synchronization:** It ensures that operations are carried out in the correct sequence and are synchronized with the system clock.
- 6. Managing CPU cycles:** The control unit manages how the CPU transitions between fetch, decode, execute, and store cycles for each instruction.

## **Constituent Elements of its Program Execution**



# Types of Micro-operation

---

- Transfer data between registers
- Transfer data from register to external
  - (*memory*/*IO*)
- Transfer data from external to register
  - (*memory*/*IO*)
- Perform arithmetic or logical ops
- Interrupts / jump / branching

# Functions of Control Unit

- Sequencing
  - Causing the CPU to step through a series of micro-operations
- Execution
  - Causing the performance of each micro-op
- This is done using Control Signals

# Control Signals

---

- Clock
  - One micro-instruction (or set of parallel micro-instructions) per clock cycle
- Instruction register
  - Op-code for current instruction
  - Determines which micro-instructions are performed
- Flags (CY, AC, P, S, Z, D -) - PSW
  - State of CPU
  - Results of previous operations
- From control bus
  - Interrupts
  - Acknowledgements —

# **Control Signals - output**

---

- Within CPU
  - Cause data movement
  - Activate specific functions
- Via control bus
  - To memory
  - To I/O modules

# A Simple Computer & its Control Unit



# **Examples of Control Signal Sequence**

# Flowchart for Instruction Cycle



# Fetch - 4 “Control” Registers Utilized

- Program Counter **(PC)**
  - Holds address of next instruction to be fetched
- Memory Address Register **(MAR)**
  - Connected to address bus
  - Specifies address for read or write op
- Memory Buffer Register **(MBR)**
  - Connected to data bus
  - Holds data to write or last data read
- Instruction Register **(IR)**
  - Holds last instruction fetched



# Fetch Cycle

- Address of next instruction is in PC
- Address (MAR) is placed on address bus  
 $t1:$   $\underline{\text{MAR} \leftarrow (\text{PC})}$
- Control unit issues READ command
- Result (data from memory) appears on data bus
- Data from data bus copied into MBR  
 $t2:$   $\underline{\text{MBR} \leftarrow (\text{memory})}$
- PC incremented by 1 (in parallel with data fetch from memory)  
-  $\text{PC} \leftarrow (\text{PC}) + 1$
- Data (instruction) moved from MBR to IR  
 $t3:$   $\underline{\text{IR} \leftarrow (\text{MBR})}$
- MBR is now free for further data fetches



3 cc / 3 CPU cycles

# Fetch Cycle



Fetch Cycle:

t1: MAR  $\leftarrow$  (PC)

t2: MBR  $\leftarrow$  (memory)

PC  $\leftarrow$  (PC) + 1

t3: IR  $\leftarrow$  (MBR)

# Fetch Cycle

- Let  $T_x$  be the time unit of the clock. Then:

$t_1: MAR \leftarrow (PC)$

$t_2: MBR \leftarrow (\text{memory})$

$PC \leftarrow (PC) + 1$

$t_3: IR \leftarrow (MBR)$

- Is this equally correct? Why?

$t_1: MAR \leftarrow (PC)$

$t_2: MBR \leftarrow (\text{memory})$

$t_3: PC \leftarrow (PC) + 1$

$IR \leftarrow (MBR)$

valid  
if  
two / three Bus Data path

invalid  
if  
one Bus data path

# Basic Rules for Clock Cycle Grouping

---

- Proper sequence must be followed
  - MAR  $\leftarrow$  (PC) must precede MBR  $\leftarrow$  (memory)
- Conflicts must be avoided
  - Must not read & write same register at same time
  - MBR  $\leftarrow$  (memory) & IR  $\leftarrow$  (MBR) must not be in same cycle
- Also: PC  $\leftarrow$  (PC) + 1 involves addition
  - Use ALU ?
  - May need additional micro-operations

# Indirect Cycle (Memory)



Indirect Cycle:

- t1: MAR ← (IR<sub>address</sub>)
- t2: MBR ← (memory)
- t3: IR<sub>address</sub> ← (MBR<sub>address</sub>)

memory indirect

- IR is now in same state as if direct addressing had been used

Add X<sub>1</sub>, (1234H)



- (What does this say about IR size?)

IR address ← MBR ← (memory)  
inconsistent

# Interrupt Cycle



## Interrupt Cycle:

- t1: MBR ← (PC)
- t2: MAR ← save-address
- PC ← routine-address
- t3: memory ← stack  
memory ← (MBR)

- This is a minimum. May be additional micro-ops to get addresses
- ~~Note~~ N.B. saving context is done by interrupt handler routine, not micro-ops

# Execute Cycle: ADD R1, memory



Execute Cycle: ADD R1, X  
operand

- t1: MAR  $\leftarrow$  (IR<sub>address</sub>)
- t2: MBR  $\leftarrow$  (memory)
- t3: R1  $\leftarrow$  R1 + (MBR)

- Different for each instruction
- Note no overlap of micro-operations

# Execute Cycle: ISZ X



Execute Cycle: ISZ X (inc and skip if zero)

- t1: MAR  $\leftarrow$  (IR<sub>address</sub>)
- t2: MBR  $\leftarrow$  (memory)
- t3: MBR  $\leftarrow$  (MBR) + 1
- t4: memory  $\leftarrow$  (MBR)  
if (MBR) == 0 then  
skip PC  $\leftarrow$  (PC) + 1 - explicit

- Notes:

- "if" is a single micro-operation
- Micro-operations done during t4

# Execute Cycle: BSA X



Execute: BSA X (Branch and Save Address)

- |              |                                       |
|--------------|---------------------------------------|
| t1: MAR      | $\leftarrow$ (IR <sub>address</sub> ) |
| MBR          | $\leftarrow$ (PC)                     |
| t2: PC       | $\leftarrow$ (IR <sub>address</sub> ) |
| stack memory | $\leftarrow$ (MBR)                    |
| t3: PC       | $\leftarrow$ (PC) + 1                 |



- BSA X - Branch and save address  
Address of instruction following BSA is saved in X
- Execution continues from X+1

# Control Signals



|            | Micro-operations                                      | Timing | Active Control Signals |
|------------|-------------------------------------------------------|--------|------------------------|
| Fetch:     | $t_1: \text{MAR} \leftarrow (\text{PC})$              | $t_1$  | $C_2$                  |
|            | $t_2: \text{MBR} \leftarrow \text{Memory}$            | $t_2$  | $C_3, C_R$             |
|            | $\text{PC} \leftarrow (\text{PC}) + 1$                | $t_3$  |                        |
| Indirect:  | $t_1: \text{MAR} \leftarrow (\text{IR(Address)})$     | $t_1$  | $C_3$                  |
|            | $t_2: \text{MBR} \leftarrow \text{Memory}$            | $t_2$  | $C_5, C_R$             |
|            | $\text{IR(Address)} \leftarrow (\text{MBR(Address)})$ | $t_3$  |                        |
| Interrupt: | $t_1: \text{MBR} \leftarrow (\text{PC})$              | $t_1$  | $C_1$                  |
|            | $t_2: \text{MAR} \leftarrow \text{Save-address}$      | $t_2$  |                        |
|            | $\text{PC} \leftarrow \text{Routine-address}$         | $t_3$  |                        |
|            | $t_4: \text{Memory} \leftarrow (\text{MBR})$          | $t_4$  | $C_{12}, C_W$          |

$C_R$  = Read control signal to system bus.  
 $C_W$  = Write control signal to system bus.



# Control Unit Design

## a. Hardwired

faster

(switches | logic gate | combination circuit)

## b. Micro programmed

take time

(small codes)

# Hardwired Implementation

- » Control unit is viewed as a sequential logic circuit
- » Used to generate fixed sequences of control signals
- » Implemented using any of a variety of “standard” digital logic techniques
- » Principle advantages
  - ◆ High(er) speed operation
  - ◆ Smaller implementations (component counts)
- » Modifications to the design can be hard to do
- » Favored approach in RISC style designs

# **Hard Wired Control Unit**

---

- The Cycles (Fetch, Indirect, Execute, Interrupt) are constructed as a State Machine
- The Individual instruction executions can be constructed as State Machines
  - Common sections can be shared. There is a lot of similarity
- One ALU is implemented. All instructions share it

# State Machine

---

- Combinational logic

- Determine outputs at each state.
  - Determine next state.

- Storage elements

- Maintain state representation.



# State Diagram

---

Shows **states** and **actions** that cause **transitions** between states.



# Example State Machine

---



# Problems With Hard Wired Designs

---

- Sequencing & micro-operation logic gets complex
- Difficult to design, prototype, and test
  - Cost effective*
- Resultant design is inflexible, and difficult to build upon (Pipeline, multiple computation units, etc.)
- Adding new instructions requires major design and adds complexity quickly.

# Control Unit Organization

The Control Memory contains sequences of microinstructions that provide the control signals to execute instruction cycles, e.g. Fetch, Indirect, Execute, and Interrupt.

Tasks of Control Unit:

- Microinstruction sequencing
- Microinstruction execution

May be expected to complete instruction execution in "1" clock cycle. How is this possible?



# Recall: Micro-sequencing

| Micro-operations | Timing                                                                | Active Control Signals |
|------------------|-----------------------------------------------------------------------|------------------------|
| Fetch:           | $t_1: MAR \leftarrow (PC)$                                            | $C_2$                  |
|                  | $t_2: MBR \leftarrow Memory$                                          | $C_5, C_R$             |
|                  | $PC \leftarrow (PC) + 1$                                              |                        |
| Indirect:        | $t_3: IR \leftarrow (MBR)$                                            | $C_4$                  |
|                  | $t_1: MAR \leftarrow (IR(Address))$                                   | $C_8$                  |
|                  | $t_2: MBR \leftarrow Memory$                                          | $C_5, C_R$             |
| Interrupt:       | $t_3: IR(Address) \leftarrow (MBR(Address))$                          | $C_4$                  |
|                  | $t_1: MBR \leftarrow (PC)$                                            | $C_1$                  |
|                  | $t_2: MAR \leftarrow Save-address$<br>$PC \leftarrow Routine-address$ |                        |
|                  | $t_3: Memory \leftarrow (MBR)$                                        | $C_{12}, C_W$          |

$C_R$  = Read control signal to system bus.  
 $C_W$  = Write control signal to system bus.

# **Simple Control Memory**

- I1-I4 Control Memory addresses
  - O1-O16 Control Signals

# Example of Control Memory Organization

## Microinstructions:

- Generate Control Signals
- Provide Branching
- Do both



# Typical Microinstruction Formats



(a) Horizontal microinstruction



(b) Vertical microinstruction

# **Horizontal vs Vertical Microprogramming**

---

## **Horizontal Microprogrammed**

- Unpacked
- Hard
- Direct

## **Vertical Microprogrammed**

- Packed
- Soft
- Indirect

# Microinstruction Encoding

## Direct Encoding



(a) Direct encoding

# Microinstruction Encoding

## Indirect Encoding



(b) Indirect encoding

# Horizontal Micro-programming

---

- Wide control memory word
- High degree of parallel operations possible
- Little encoding of control information
- Fast

## **Vertical Micro-programming**

---

- Width can be much narrower
- Control signals encoded into function codes – need to be decoded
- More complex, more complicated to program, less flexibility
- More difficult to modify
- Slower

## **Next Address Decision**

---

- Depending on ALU flags and control buffer register:
  - Get next instruction
    - Add 1 to control address register
  - Jump to new routine based on jump microinstruction
    - Load address field of control buffer register into control address register
  - Jump to machine instruction routine
    - Load control address register based on opcode in IR

# **Microprogrammed Control Unit**



# **Advantages and Disadvantages of Microporgramming**

---

## **Advantage:**

- Simplifies design of control unit
  - Cheaper
  - Less error-prone
  - Easier to modify

## **Disadvantage:**

- Slower

# Design Considerations

---

- Necessity of speed
- Size of microinstructions
- Address generation
  - Branches
    - Both conditional and unconditional
    - Based on current microinstruction, condition flags, contents of IR
    - Based on format of address information
      - + Two address fields
      - + Single address field
      - + Variable format

# Address Generation

---

| Explicit             | Implicit         |
|----------------------|------------------|
| Two-field            | Mapping          |
| Unconditional Branch | Addition         |
| Conditional branch   | Residual control |

# Branch Control: Two Address Fields

Branch based upon:

- Instruction Opcode
- Address 1
- Address 2

Does require a wide microinstruction, but no address calculation is needed



# Branch Control: Single Address Field

Branch based upon:

- Next instruction
- Address
- Opcode

Does require more circuitry, e.g. adder



# Branch Control: Variable Format

One bit determines microinstruction format:

- Control signal format
- Branch format

Does require even more circuitry, and is slowest.



Consider the following data path of a CPU

The, *ALU*, the bus and all the registers in the data path are of identical size. All operations including incrementation of the *PC* and the *GPRs* are to be carried out in the *ALU*. Two clock cycle are needed for memory read operation-the first one for loading address in the *MAR* and the next one for loading data from the memory but into the *MDR*.

The instruction "add R0, R1" has the register transfer interpretation  $R0 \leftarrow R0 + R1$ . The minimum number of cycles needed for execution cycle of this instruction is

- A. 2
- B. 3
- C. 4
- D. 5



The microinstructions stored in the control memory of a processor have a width of 26 bits. Each microinstruction is divided into three fields: a micro-operation of 13 bits, a next address field ( $X$ ), and a  $MUX$  select field ( $Y$ ). There are 8 status bits in the inputs of the  $MUX$ .

How many bits are there in the  $X$  and  $Y$  fields, and what is the size of the control memory in number of words?

- A. 10, 3, 1024
- B. 8, 5, 256
- C. 5, 8, 2048
- D. 10, 3, 512



Consider the following sequence of micro-operations

$MBR \leftarrow PC$

$MAR \leftarrow X$

$PC \leftarrow Y$

$Memory \leftarrow MBR$

Which one of the following is a possible operation performed by this sequence?

- A. Instruction fetch
- B. Operand fetch
- C. Conditional branch
- D. Initiation of interrupt service

Consider the following datapath of a simple non-pipelined *CPU*. The registers  $A$ ,  $B$ ,  $A_1$ ,  $A_2$ ,  $MDR$ , the bus and the *ALU* are 8-bit wide.  $SP$  and  $MAR$  are 16-bit registers. The *MUX* is of size  $8 \times (2:1)$  and the *DEMUX* is of size  $8 \times (1:2)$ . Each memory operation takes 2 *CPU* clock cycles and uses  $MAR$  (Memory Address Register) and  $MDR$  (Memory Data register).  $SP$  can be decremented locally.

The *CPU* instruction "*push r*", where  $r=A$  or  $B$ , has the specification

$$M[SP] \leftarrow r$$

$$SP \leftarrow SP - 1$$

How many *CPU* clock cycles are needed to execute the "*push r*" instruction?

- A. 2
- B. 3
- C. 4
- D. 5



## Horizontal micro programming

- A** does not require use of signal decoders.
- B** Results in larger sized micro instructions than vertical micro programming
- C** Uses one bit for each control signal
- D** All of the above

Arrange the following configuration for CPU in decreasing order of operating speeds:  
Hard wired control, Vertical microprogramming, Horizontal microprogramming.

1. Hard wired control, Vertical microprogramming, Horizontal microprogramming.
2. Hard wired control, Horizontal microprogramming, Vertical microprogramming.
3. Horizontal microprogramming, Vertical microprogramming, Hard wired control.
4. Vertical microprogramming, Horizontal microprogramming, Hard wired control.

Consider a CPU where all the instructions require 7 clock cycles to complete execution. There are 140 instructions in the instruction set. It is found that 125 control signals are needed to be generated by the control unit. While designing the horizontal microprogrammed control unit, single address field format is used for branch control logic. What is the minimum size of the control word and control address register?

- A 125, 7
- B 125, 10
- C 135, 9
- D 135, 10

An instruction set of a processor has 125 signals which can be divided into 5 groups of mutually exclusive signals as follows:

Group 1 : 20 signals, Group 2 : 70 signals, Group 3 : 2 signals, Group 4 : 10 signals, Group 5 : 23 signals.

How many bits of the control words can be saved by using vertical microprogramming over horizontal microprogramming?

A

0

B

103

C

22

D

55