

# ELECTRONICS AND COMMUNICATION TECHNOLOGIES: ELECTRONICS SYSTEMS

LM Cyber Security – Fall 2024

**Federico Baronti, Luca Crocetti**

Dip. Ing. Informazione

Via G. Caruso, 16 – Stanza B-1-09

050 2217581 – [federico.baronti@unipi.it](mailto:federico.baronti@unipi.it)



Office hours:

Friday 14-16. Please, contact me in advance before showing up.  
We can also arrange an appointment remotely on Microsoft  
Teams.

# **COMPUTER ORGANIZATION**

# Processor's building blocks

- PC provides instruction address
- Instruction is fetched into IR
- Instruction address generator updates PC
- Instruction in IR is decoded and operands in registers are read
- ALU performs some computation
- For load/store instruction a data memory access is performed
- Control circuitry supervises instruction execution by generating the control signals needed to perform the required actions



# A digital processing system

- datapath



# A multi-stage digital processing system

- datapath



# Why multi-stage?

- Processing moves from one stage to the next in each clock cycle
- Such a multi-stage system **is the basis for pipelined** operation
  - High-performance processors have a pipelined organization
  - Pipelining enables the execution of successive instructions to be overlapped
- We will get back to **pipeline** later. Let's now focus on the basics of the multi-stage architecture of a **RISC-style processor**

# Instruction execution

- Pipelined organization is most effective if all instructions can be executed in the same number of steps.
- Each step is carried out in a separate hardware stage.
- Processor design will be illustrated using **five** hardware stages.
- **How can instruction execution be divided into five steps?**
  - Let's start from some representative RISC instructions

Fetch      Decode      Compute      Memory      Write  
F            D            C            M            W

# A memory access instruction:

I-type

ldw R5, X(R7)

$R5 \leftarrow \text{Mem32}[X] + R7$

signed  
extension  
to 32 bits

- F 1. Fetch the instruction and increment the program counter.
- D 2. Decode the instruction and read the contents of register R7 in the register file.
- C 3. Compute the effective address = X + [R7].
- M 4. Read the memory source operand at effective address.
- W 5. Load the operand read from memory into the destination register, R5.



# A computational instruction:

R-type

add R3, R4, R5 ;  $R3 \leftarrow R4 + R5$

- F 1. Fetch the instruction and increment the program counter.
- D 2. Decode the instruction and read registers R4 and R5.
- C 3. Compute the sum  $[R4] + [R5]$ . ALU
- H 4. No action.  CN is a short circuit
- W 5. Load the result into the destination register, R3.

- Stage 4 (memory access) is not involved in this instruction.



# 5-stage Architecture of a RISC Processor

- F 1. Fetch an instruction and increment the program counter.
  - D 2. Decode the instruction and read registers from the register file.
  - C 3. Perform an ALU operation.
  - M 4. Read or write memory data if the instruction involves a memory operand.
  - W 5. Write the result into the destination register.
- 
- This sequence determines the hardware stages needed.

# A 5-stage implementation of a RISC processor

- Instruction processing moves from stage to stage in every clock cycle, starting with **fetch**.
- The instruction is **decoded** and the source registers are read in stage 2.
- Computation** takes place in the ALU in stage 3.



# A 5-stage implementation of a RISC processor

- ...
- If a **memory** operation is involved, it takes place in stage 4.
- The result (if any) of the instruction is **written** in the destination register in stage 5.



# The datapath – Stages 2 to 5

IR

- Register file,  
used in stages 2 and 5.
  - (Inter-stage registers RA,  
RB, RZ, RM, RY needed to  
carry data from one stage  
to the next)



32 REGISTERS

# Hardware components: Register file

is involved in 2 phases : DECODE and WRITE  
(may be)

- A 2-port register file is needed to read the two source registers at the same time.
- It may be implemented using a 2-port memory.

Eg. ADD R3, R4, R5



# The datapath – Stages 2 to 5

- Register file,  
used in stages 2 and 5
  - (Inter-stage registers RA, RB, RZ,  
RM, RY needed to carry data  
from one stage to the next)
- ALU stage
- Memory stage
- Final stage to store result  
to the register file



LD / ST

# Memory stage

- For a calculation instruction:
  - MuxY selects [RZ] to be placed in RY.
- For a memory instruction:
  - RZ provides memory address, and MuxY selects read data to be placed in RY.
  - RM provides data for a memory write operation.
- In subroutine calls or exception handling:
  - Input 2 of MuxY is used to save the return address in the link register



stw R4, 100(R5)  
E.A.

# Memory access

- When data are found in the cache, access to memory can be completed in one clock cycle.
- Otherwise, read and write operations may require several clock cycles to load data from main memory into the cache.
- A control signal is needed to indicate that memory function has been completed (**MFC**).  
E.g., for step 1:



# Instruction memory and IR control signals



# Data memory interface



# Instruction address generator

is involved in F, C  
all instr.

- Connections to registers RY and RA are used to support subroutine call and return instructions

~~F F~~ PC\_Enable

~~[z] D C | re [w]~~

for br needs  
to take into  
consideration  
the condition

callr Ri  
some instructions: CALL | RET | jmp. | br  
CALL CLR JMP | JMPI | BR  
JMP | JMPI | BR  
condition





*R-type*

# Example: add R3, R4, R5

1. Memory address  $\leftarrow [PC]$ ,  
Read memory, Wait for  
MFC, IR  $\leftarrow$  Memory data,  
 $PC \leftarrow [PC] + 4$
2. Decode instruction,  
 $RA \leftarrow [R4]$ ,  $RB \leftarrow [R5]$
3.  $RZ \leftarrow [RA] + [RB]$
4.  $RY \leftarrow [RZ]$
5.  $R3 \leftarrow [RY]$



$$R_0 = 0$$

$\begin{array}{|c|c|c|c|c|c|} \hline & 7 & 5 & 21 & 5 & 0 \\ \hline \text{OpCode} & \text{Imm} = X & \text{R7} & \text{R5} & \text{R0} & \text{R1} \\ \hline \end{array}$

I-type

# Example: ldw R5, X(R7)

1. Memory address  $\leftarrow [PC]$ ,  
Read memory, Wait for  
MFC, IR  $\leftarrow$  Memory data,  
 $PC \leftarrow [PC] + 4$
2. Decode instruction,  
 $RA \leftarrow [R7]$
3.  $RZ \leftarrow [RA] + \text{Immediate value } X$
4. Memory address  $\leftarrow [RZ]$ ,  
Read memory, Wait for  
MFC,  $RY \leftarrow$  Memory data
5.  $R5 \leftarrow [RY]$



# Example: <sup>ST</sup> stw R6, X(R8)

1. Memory address  $\leftarrow [PC]$ ,  
Read memory, Wait for  
MFC, IR  $\leftarrow$  Memory data,  
 $PC \leftarrow [PC] + 4$
2. Decode instruction,  
 $RA \leftarrow [R8]$ ,  $RB \leftarrow [R6]$
3.  $RZ \leftarrow [RA] + \text{Immediate value } X$ ,  $RM \leftarrow [RB]$
4. Memory address  $\leftarrow [RZ]$ ,  
Memory data  $\leftarrow [RM]$ ,  
Write memory, Wait for  
MFC
5. No action



# Unconditional branch

1. Memory address  $\leftarrow [PC]$ , Read memory,  
Wait for MFC, IR  $\leftarrow$  Memory data,  
 $PC \leftarrow [PC] + 4$
2. Decode instruction
3.  $PC \leftarrow [PC] + \text{Branch offset}$
4. No action
5. No action



# Conditional branch:

## Branch\_if\_[R5]=[R6] LOOP

1. Memory address  $\leftarrow [PC]$ , Read memory, Wait for MFC, IR  $\leftarrow$  Memory data,  $PC \leftarrow [PC] + 4$
2. Decode instruction,  $RA \leftarrow [R5]$ ,  $RB \leftarrow [R6]$
3. Compare  $[RA]$  to  $[RB]$ ,  
If  $[RA] = [RB]$ , then  
 $PC \leftarrow [PC] + \text{Branch offset}$
4. No action
5. No action



# Subroutine call with indirection:

Call<sup>callr</sup>  
register R9

CALL  
RET

1. Memory address  $\leftarrow [PC]$ , Read memory, Wait for MFC, IR  $\leftarrow$  Memory data,  $PC \leftarrow [PC] + 4$
2. Decode instruction,  $RA \leftarrow [R9]$
3.  $PC-Temp \leftarrow [PC]$ ,  
 $PC \leftarrow [RA]$
4.  $RY \leftarrow [PC-Temp]$
5. Register LINK  $\leftarrow [RY]$



SUBROUTINE LINKAGE  
The return address is stored in one of the general purpose regs

# Subroutine RETurn

CALL  
RET

1. Memory address  $\leftarrow [PC]$ , Read memory, Wait for MFC, IR  $\leftarrow$  Memory data,  $PC \leftarrow [PC] + 4$
2. Decode instruction,  $RA \leftarrow$  Register LINK
3.  $PC \leftarrow [RA]$
4. No action
5. No action



START

# What are the various Control signals that need to be generated?

Assumption: We are executing only 1 instr. at a time (ex. IR won't change during execution)

- ↓
- Select multiplexer inputs to route the flow of data
- This simplifies the design

Pipelining will introduce issues to be taken into account.

- Set the function performed by the ALU
- Generation of enable signals for the registers
- Determine when data are written into the PC, the IR, the register file, and the memory

→ Those values are connected with BUSES to the IR, but not necessarily used. Depends on the op.

# Register file control signals



Instruction Format



We have an immediate value ↗ I



Under the assumption of no pipelining, \* They can be reentered anywhere.  
NOTE: ① is selected in case of a CALL. In ② we will have  
the return address.

Possible Optimization: look at implementation of the CALLR

CALLR is an R-Type MSR. The info that comes in the LINK can be taken by the  
implementation. In general for a call/callr, the destination reg. is LINK reg.  
that is fixed at hardware.

COMPUTE PHASE

# ALU control signals



# Result selection



# Control signals of instruction address generator



# Control signal generation

5-stage architecture: F D C M W

- Circuitry must be implemented to generate control signals so actions take place in correct sequence and at correct time.
- There are two basic approaches:
  - hardwired control and microprogramming
- Hardwired control involves implementing circuitry that considers step (ring) counter, IR, ALU result, and external inputs. You can just think like combinational network
- Step (Ring) counter keeps track of execution progress, one clock cycle for each of the five steps described (unless a memory access takes longer than one cycle).

design by hand

→ description of a circuit using a language.

Structure of a counter

STEP  
RING

COUNTER

10000 is reset ①  
configuration

5 D flipflops that form a shift register with a feedback from last to first one.



| ①                                                                                         | (F) T1 | (D) T2 | (C) T3 | (M) T4 | (W) T5 |
|-------------------------------------------------------------------------------------------|--------|--------|--------|--------|--------|
| NOTE: When switch on this piece of hw, we want the first FF as 1 and all the others as 0. | 1      | 0      | 0      | 0      | 0      |
|                                                                                           | 0      | 1      | 1      | 1      | 1      |
|                                                                                           | 0      | 0      | 0      | 0      | 0      |
|                                                                                           | 0      | 0      | 0      | 0      | 0      |
|                                                                                           | 0      | 0      | 0      | 0      | 0      |

Size of the ring depends on the number of stages.

...

Note: if every operation required one clock cycle we wouldn't need a mechanism to keep the counter fixed. But this is not always true. This is implemented with a T\_EN signal. It's an enable for all flip flops (load, store, interaction with the cache etc.)

[WRITE BACK vs WRITE THROUGH are used in cache]

JMP SLIDE 30: enable for register file

$$\overline{RF\_Write} = T_5 \cdot \left( \begin{array}{c} \text{addl} \\ \text{or} \\ \vdots \\ \text{addi} \\ \text{or} \\ \text{load} \\ \text{call} \\ \text{callr} \end{array} \right)$$

↳ Signal is one when  
we are not writing and we are executing some instr.

$\overline{T_{EN}}$

$$PC\_enable = T_1 \cdot MFC + T_3 \cdot \left( \begin{array}{c} \text{cell} \\ \text{callr} \\ \text{br} \\ \text{jmp} \\ \text{cond} \\ \text{jmp} \\ \text{cond} \\ \text{ret} \end{array} \right)$$

PC enable is active in phase 1 or phase 3

Only if memory function completed is high

# Hardwired generation of control signals

① you generate signals complement for generating other signals



# CISC processors

- CISC-style processors have more complex instructions. (*longer than 1 word*)
- The full set of instructions cannot all be implemented in a fixed number of steps.
- Execution steps for different instructions do not all follow a prescribed sequence of actions.
- Hardware organization should therefore enable a flexible flow of data and actions to accommodate CISC.

Recall that there's a clear flow of info in RISC. In a CISC the flow is more sophisticated and requires buses. Usually we have 3 buses in a CISC.

# Hardware organization for a CISC computer



# Bus

- An example of an interconnection network.
- When functional units are connected to a common bus, tri-state drivers are needed.



# A 3-bus interconnection network

$$R5 \leftarrow R5 + R6$$



Example 1: Add R5, R6

1. Memory address  $\leftarrow [PC]$ ,  
Read memory, Wait for  
MFC, IR  $\leftarrow$  Memory data,  
 $PC \leftarrow [PC] + 4$

Prog. Counter on bus B.  
What is read on  
memory goes on bus C  
in IR.

2. Decode instruction
3.  $R5 \leftarrow [R5] + [R6]$

→ R5 and R6 need to be executed.

Just 3 clock cycles!

# A 3-bus interconnection network

$\text{Mem32[R7+x]} \leftarrow \text{Mem32[R7+x]}$  &  $R9$

## Example 2\*: And X(R7), R9

1. Memory address  $\leftarrow [PC]$ , Read memory, Wait for MFC,  $IR \leftarrow$  Memory data,  $PC \leftarrow [PC] + 4$ 
  - during decode CPU understands what and the word has to be read.
2. Decode instruction
3. Memory address  $\leftarrow [PC]$ , Read memory, Wait for MFC, **Temp1**  $\leftarrow$  Memory data,  $PC \leftarrow [PC] + 4$
4. **Temp2**  $\leftarrow [Temp1] + [R7]$  *Indirizzo dell'operando in memoria*
5. Memory address  $\leftarrow [Temp2]$ , Read memory, Wait for MFC, **Temp1**  $\leftarrow$  Memory data
6. **Temp1**  $\leftarrow [Temp1]$  AND [R9]
7. Memory address  $\leftarrow [Temp2]$ , Memory data  $\leftarrow [Temp1]$ , Write memory, Wait for MFC



\*X is stored as a second word of the instruction

# A 3-bus interconnection network

$\text{Mem32[R7+x]} \leftarrow \text{Mem32[R7+x]}$  & R9

## Example 2\*: And X(R7), R9



1. Memory address  $\leftarrow$  [PC], Read memory, Wait for MFC, **IR**  $\leftarrow$  Memory data, **PC**  $\leftarrow$  [PC] + 4
  - Annotations: "during decode CPU sends 1 word" and "that word has to be read."
2. Decode instruction
3. Memory address  $\leftarrow$  [PC], Read memory, Wait for MFC, **Temp1**  $\leftarrow$  Memory data, **PC**  $\leftarrow$  [PC] + 4
4. **Temp2**  $\leftarrow$  **Temp1** + [R7]
  - Annotations: "Indirizzo dell'operando" and "in memoria"
5. Memory address  $\leftarrow$  [Temp2], Read memory, Wait for MFC, **Temp1**  $\leftarrow$  Memory data
6. **Temp1**  $\leftarrow$  [Temp1] AND [R9]
7. Memory address  $\leftarrow$  [Temp2], Memory data  $\leftarrow$  [Temp1], Write memory, Wait for MFC

\*X is stored as a second word of the instruction

# References

- C. Hamacher, Z. Vranesic, S. Zaky, N. Manjikian  
"Computer Organization and Embedded Systems,"  
*McGraw-Hill International Edition*
  - Chapter V: Basic Processing Unit