

← support XIP →

| SRAM                                                                                                                                                                                                                                                                                                                                                                                               | DRAM                                                                                                                                                                                                                                                                                                                                                                                                                                                     | EEPROM                                                                                                            | NAND Flash                                                                                                                                                                                                                                                                | NOR Flash                                                                                                                                                                                                                                                                                                             | Magnetic HDD                                                                                                                                                                                                                                    |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul style="list-style-type: none"> <li>- in system memory</li> <li>- volatile</li> <li>- Large <math>(4-6</math> transistors per cell) <math>\xrightarrow{1 \text{ bit}}</math></li> <li>- Fast</li> <li>- low power consumption (active &amp; standby)</li> <li>- does NOT need periodic refresh</li> </ul> <p>able to access random address locations within the memory at any point in time</p> | <ul style="list-style-type: none"> <li>- in system memory</li> <li>- volatile</li> <li>- small (<math>1-3</math> transistors per cell)</li> <li>- slower <math>\xrightarrow{\text{more}} \text{ more power}</math></li> <li>- higher power <math>\xrightarrow{\text{refresh}} \text{ periodic refresh}</math></li> <li>- periodic refresh (mainly integrity)</li> <li>- single transistor design</li> </ul>                                              | <ul style="list-style-type: none"> <li>- non-volatile</li> <li>- can electrically program/erase device</li> </ul> | <ul style="list-style-type: none"> <li>- in main storage memory</li> <li>- non-volatile</li> <li>- behaves like a NAND gate.</li> <li>- does NOT support XIP</li> <li>- lower cost per bit than NOR</li> <li>- e.g. USB, SSD</li> </ul>                                   | <ul style="list-style-type: none"> <li>- in system memory*</li> <li>- non-volatile</li> <li>- behaves like a NOR gate.</li> <li>- supports XIP</li> <li>- allows random word/byte programming (eraser done at block level)</li> <li>- use as system memory to store program code or general storage memory</li> </ul> | <ul style="list-style-type: none"> <li>- non-volatile</li> <li>- stores data by magnetizing a thin film of ferromagnetic media on the platter</li> <li>- mainly on desktop, data centres, basically large secondary storage devices.</li> </ul> |
|                                                                                                                                                                                                                                                                                                                                                                                                    | <ul style="list-style-type: none"> <li>- memory array takes up most of the space</li> <li>- uses capacitor as its storage elements</li> <li>- Upgraded version: SDRAM</li> <li>↳ make use of a clock signal from host to synchronize data transfer <math>\xrightarrow{\text{faster transfer rate}}</math></li> <li>- defects memory device for system memory</li> <li>- pipeline architecture</li> <li>↳ a lot faster, overlapping operations</li> </ul> |                                                                                                                   | <ul style="list-style-type: none"> <li>- can be erased in larger block size than EEPROM</li> <li>- faster than EEPROM (write operation for large data)</li> <li>- cost less than EEPROM</li> <li>- suitable for system requiring large amt of non-volatile mem</li> </ul> | <p>SSD based on flash</p> <p>NAND prefetched</p>                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                 |

| Parallel Data Transfer                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                                                                                                      | Serial Data Transfer                                                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                   |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Pros                                                                                                                                                                                                                                                                    | Cons                                                                                                                                                                                                                                                                                                 | Pros                                                                                                                                                                                                                                                                          | Cons                                                                                                                                                                                                                                                                                              |
| <ul style="list-style-type: none"> <li>&gt;&gt; Faster data transfer rate<br/>(more bits can be transferred at one time ; given same clock rate as serial)</li> <li>&gt;&gt; Hardware interface design tend to be simpler as only strobe signals are needed.</li> </ul> | <ul style="list-style-type: none"> <li>&gt;&gt; Signal Skew &amp; Cross Talk<br/>(limits max clk speed &amp; transfer distance)</li> <li>&gt;&gt; Bulky hardware if data width is large</li> <li>&gt;&gt; need more space to route the PCB traces</li> <li>&gt;&gt; higher hardware costs</li> </ul> | <ul style="list-style-type: none"> <li>&gt;&gt; Less affected by Signal Skew &amp; cross talk (less wires → able to support higher frequency data)</li> <li>&gt;&gt; Transfer data reliably over a longer distance</li> <li>&gt;&gt; Lower cost (less wires, etc.)</li> </ul> | <ul style="list-style-type: none"> <li>&gt;&gt; Data transfer rate lower given same clk rate since only 1 data line available</li> <li>&gt;&gt; more complex hardware interface as it needs to handle serial to parallel conversion (processors typically only process in bytes, etc.)</li> </ul> |

### Synchronous

- >> has common clock signal
- E.g. receiver to latch in data at every rising edge
- >> Master - Slave ; Master provide clk signal
- >> Allow faster transfer rate  
(no data overhead needed)

### Asynchronous

- >> no common clock signal; so devices have to agree on a pre-fixed clock frequency to use for data transfer
- >> receiver knows transmitting clk rate & no. of bits to be transferred with each data pack.
- >> transferred by frame; every frame has data bits + overhead

### Burst Mode

- >> Allow faster data transfer
- >> CPU may be suspended for longer period of time
- > may not be suitable for real-time applications
- >> Suitable for application where transfer is bursty  
E.g. HDD file transfer

### Cycle Stealing

- >> Slower data transfer rate
- >> Interleaving DMA data transfer with CPU instructions allow CPU to continue executing its program while DMA is doing data transfer
- >> CPU performance slower ; more responsive than Burst
- >> Suitable for real time application

### Transparent Mode

- >> slowest data transfer rate but best CPU respond
- >> may not affect CPU performance at all
- >> more complex hardware needed to detect when CPU not using bus

### Polling

#### Pros

- >> Programmer has complete control over entire process
- >> Easiest method to test & debug

#### Cons

- >> CPU waits in loop → cannot perform other task until data transfer done
- >> Program execution of CPU held up while waiting for I/O device to get ready

#### Pros

- >> Efficient use of CPU resources
- >> CPU can continue with other tasks before interrupts
- >> allows prioritization & pre-emption

#### Cons

- >> More hardware
- >> more complex & difficult to debug

>> Inefficient use of CPU resources

## Signal Chain Sub-System

- >> ADC - DAC
- >> Parallel and serial (UART / SPI) Digital Interfaces
- >> Interrupt Controller
- >> Direct Memory Access Controller

## Analog & Digital Signals

- >> real world signals analog by nature
- >> Analog : continuous voltage level & in time domain
- >> digital processing : more flexibility in implementation and are more tolerant to noise and component aging
- >> digital signals are discrete time representation of analog signal ; has discrete voltage levels

>> Sampling frequency needs to be at least twice the signal frequency (Nyquist Theorem)

F.g. signal @ 1k hertz, sample @ 2k hertz / s

>> analog signal level is assigned to the nearest discrete voltage level allowed in process.



- Digital Processors works only in the digital domain so typical process these days is to convert the real world analog signal to digital signal, allow the processor to work on the digital data, and reconstruct back the analog signal to be output to the real world.

## Electrical Signal Interface

- >> Interface: boundary where  $\geq 2$  devices meet to exchange information
- >> modes of connection: Single-bit (serial) / Multi-bit (parallel) Data Transfer ; can be single or bi-direction.



## Input / Output / Bi-Directional Pins

- Semiconductor devices interface to the outside world via pins.
- Depending on the device design and configuration, these pins are either an input, output or bi-direction (input and output) pin.



## Interface Compatibility

- >> Interfacing devices requires compatibility in :
- Electrical signal level

Have to be the same  
↳ Communication protocol (Handshaking & Data Signals)

## Electric signal level

- >> if exceed max input voltage: spoilt / reduced reliability
- >> output voltage should not exceed max. allowable input voltage



- **Device #1**
  - Output Voltage level for logic '1' =  $V_{1+}$  (e.g. 5V)  
For typical value of  $V_{1+}$ , look for  $V(OH)$  parameter in device datasheets.
  - Output Voltage level for logic '0' =  $V_{1-}$  (e.g. 0V)  
For typical value of  $V_{1-}$ , look for  $V(OL)$  parameter in device datasheets.
- **Device #2**
  - Min Input voltage range to recognize as logic '1' =  $V(IH)$   
max. voltage device outputs for input logic 1
  - Max Input voltage range to recognize as logic '0' =  $V(IL)$   
min. voltage device outputs for input logic 0
- In order for Device #2 to sense the logic level correctly,
  - Condition 1:  $V_{1+} \geq V(IH)$  (e.g.  $V_{1+} \geq 2.0V$ )
  - Condition 2:  $V_{1-} \leq V(IL)$  (e.g.  $V_{1-} \leq 0.8V$ )

KEY:  
 $V_{OH} \geq V_{IH}$   
 $V_{OL} \leq V_{IL}$

## Electrical Signal level Standards

>>  $V_{OH} > V_{IH}$

cater to potential drop in voltage when travelling across wires with resistance

>>  $V_{IL} > V_{OL}$

cater to ground noise in electrical signal; to bump up signal

## Differential Signals

- Differential signals has better noise tolerance so is able to be clocked at higher frequency.
- $V(CM) = \text{Common Mode Ground.}$



## Communication Protocols

- >> refers to how data are formatted during transmission
- >> Eg. No. of bits in a transmission frame; what synchronization to use; Data width; Types of data and its formatting (VSB, UART, SPI, etc.)

## Parallel Data Transfer

- >> Multiple bits of data transferred simultaneously between two devices
- >> Synchronous in nature as some sort of strobe signal is needed to inform receiver when to latch data
- >> higher transfer rate than serial (same clk rate)
- >> more prone to Signal Skew and Crosstalk

## Signal Skew

- If for some reason (due to circuit design, external interference), the signal in one or more data lines took different amount of time to reach the receiver, then there is a skew between the signals in the parallel data bus.
- Below example illustrate the effect of signal skew. Data is latched by the receiver on rising edge of the strobe signal.
- This result in wrong data ( $D=0$  instead of 1) being latched by the receiver.



## Signal Skew – Contributing Factors

- Signal Skew is due to variation in propagation delay between signals from the same data bus.
- Propagation delay is the time taken for signal to travel between two points in a circuit.
- Capacitance and Resistance of the physical data line is a major contributor to circuit propagation delay.
  - The larger the resistance and capacitance, the larger the delay. Illustrated in the equation  $\tau = RC$  where  $\tau$  is the time constant dictating rate of change of voltage levels in the data line concerned.
- Variation in resistance and capacitance of the signal lines can be due to
  - Variation in PCB trace length/width (lead to change in impedance).
  - Connecting active components (capacitors, inductors, IC etc) to some of the signal lines.

## Crosstalk

## Crosstalk – Electrical Circuit

- Crosstalk are undesired coupling of signals from one circuit to another circuit.
- In a parallel bus context, the close placement of the data lines in PCB routing or cabling enables the effect of electrical signal in one trace/wire to be coupled over to the other. Creating undesired interference (crosstalk).
- Crosstalk can be transmitted electrically or via electromagnetic radiation (the trace/wire acts like an antenna).

Limitations signal skew & cross talk place on system design

>> max. no. of lines on data bus, max. clk rate, max. no. of bytes that can be transferred in one block, min. delay between each byte transfer

| Parallel Data Transfer                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                      | Serial Data Transfer                                                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                   |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Pros                                                                                                                                                                                                                                                                   | Cons                                                                                                                                                                                                                                                                                                 | Pros                                                                                                                                                                                                                                                                          | Cons                                                                                                                                                                                                                                                                                              |
| <ul style="list-style-type: none"> <li>&gt;&gt; Faster data transfer rate<br/>(more bits can be transferred at one time; given same clock rate as serial)</li> <li>&gt;&gt; Hardware interface design tend to be simpler as only strobe signals are needed.</li> </ul> | <ul style="list-style-type: none"> <li>&gt;&gt; Signal skew &amp; Cross Talk<br/>(limits max clk speed &amp; transfer distance)</li> <li>&gt;&gt; Bulky hardware if data width is large</li> <li>&gt;&gt; need more space to route the PCB traces</li> <li>&gt;&gt; higher hardware costs</li> </ul> | <ul style="list-style-type: none"> <li>&gt;&gt; Less affected by Signal skew &amp; Cross talk (less wires → able to support higher frequency data)</li> <li>&gt;&gt; Transfer data reliably over a longer distance</li> <li>&gt;&gt; Lower cost (less wires, etc.)</li> </ul> | <ul style="list-style-type: none"> <li>&gt;&gt; Data transfer rate lower given same CLK rate since only 1 data line available</li> <li>&gt;&gt; more complex hardware interface as it needs to handle serial to parallel conversion (Processors typically only process in bytes, etc.)</li> </ul> |
|                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                   |

### Serial Data Transfer

>> Data transferred one bit at a time over a single data line.

### Serial Data Transfer Mode

- >> Simplex: Data transfer in 1 direction
- >> Half-Duplex: Data transfer in both directions, but Rx & Tx can only be transmitter or receive at any one point in time
- >> Full Duplex: Simultaneous Rx And Tx



### Synchronous

>> has common clock signal  
E.g. receive to latch in data at every rising edge of clock

### A synchronous

>> no common clock signal;  
so devices have to agree on a pre-fixed clock frequency to use for data transfer



### Synchronous Serial Transfer

- >> Common clock signal between transmitter & receiver to synchronize the data transfer
- >> Master - Slave Configuration

# Calculations

## MAX DATA TRANSFER RATE

>> Max strobe rate  $\times$  No. of bits

↖ Max strobe rate  $\rightarrow$  determined by slowest device in the chain

E.g.

NOR Flash (NOR0001-1M)

- Max data strobe rate supported by DRAM = 10MHz
- Max Processor Parallel bus strobe rate = 200MHz
  - $\Rightarrow$  Max supported data strobe rate on this link = 10MHz
- 16-bit interface  $\Rightarrow$  16 bits data transferred with each strobe
  - $10\text{M} * 16 = 160\text{Mbps}$

## DATA TRANSFER READING

2) Fig. 2 shows the logical waveform of an asynchronous data frame.



>> 7-bit ASCII character transmitted .  
 ↗ 1001101  $\rightarrow$  0x4D , 'M'

>> Even Parity : if Data + Parity bit = even no. of 1s

>> odd Parity : if Data + Parity bit = odd no. of 1s

>> if baud rate is 9600 :

$$1 \text{ cycle!} \quad T = \frac{1}{\text{baud rate}} = \frac{1}{9600} = 0.104 \text{ ms}$$

>> Data transfer rate of serial interface in characters per second (cps) :

1 packet  $\rightarrow$  1 Start , 7 Data , 1 Parity & 1 Stop

Total : 10 bits

$$\frac{\text{baud rate}}{\text{no. of bits per packet}} = \frac{9600}{10} = 960 \text{ cps}$$

$$>> \text{if no parity bit : } \frac{9600}{9} = 1066 \text{ cps}$$

$\rightarrow$  output is more but data packet integrity cannot be verified.

(f) Assume the baud rate of the transmitter is 4800, but the baud rate of the receiver is configured as 9600. Based on Fig. 2, determine, if any, the ASCII character(s) that will be received.

[Suggested Solution]

### Possible multiple-sampling scenario

4800 baud rate: 7-bits:  $1001101_2 \rightarrow 0x4D$

| Original Frame |                | P | E |  |  |
|----------------|----------------|---|---|--|--|
| S              | D (lsb to msb) | 0 | 1 |  |  |
| ~              | 1011001        | 0 | 1 |  |  |

### Oversampled Frames (2x)

| Frame #1 |                | Frame #2 |   | P | E              |   |   |
|----------|----------------|----------|---|---|----------------|---|---|
| S        | D (lsb to msb) | P        | E | S | D (lsb to msb) | P | E |
| 0        | 0110011        | 1        | 1 | 0 | 0001100        | 1 | 1 |

Frame #1: 0x66  $\rightarrow$  lowercase letter 'f'

Frame #2: 0x18  $\rightarrow$  special character 'CAN'

Parity errors will be flagged for both frames (odd numbers of '1's received).

## MAX BAUD RATE

Consider the system diagram in Figure 1 of case study notes.

- The processor UART peripheral in its Serial I/O module is configured to receive data with format of 1 start-bit, 7 data-bits, 1 parity-bit and 1 stop-bit.
- Each time the UART peripheral receives a character, it'll store the data into a buffer.
- It will then interrupt the CPU to notify CPU that there is data available.
- The CPU will then execute the Interrupt Service Routine (ISR) to read the character received.
- Interrupt Latency is the term used to describe the time between interrupt request and entrance to ISR.
- The minimum interrupt latency for the CPU is 10  $\mu\text{s}$  and it takes 90  $\mu\text{s}$  for the CPU to execute the instructions in the ISR (read the received data from the buffer).
- Assume that the UART data is transferred back to back i.e. no delays between each UART packets.

>> Total time required to transfer one character = Interrupt Latency + ISR execution

$$T_c = \frac{1}{(10+90)} = 10000 \text{ SRS per second}$$

>> Each char transfer involved 10 bits on UART

$$10 \text{ bits} * \frac{10000}{T_c} = 100 \text{ kbits per second}$$

## INTERRUPTS

The specifications of the system in Figure 1 (case study notes) requires the buttons to be asserted 400 times over a 24-hour period. Given that the processor consumes 0.1mA current when it is idle and 50mA when it is in active mode i.e. running code, answer the following.

- (a) Supposed polled IO technique is used to sample the button status. Given that the processor poll the buttons every 100ms, each poll requires the processor to be active for 10ms, what is the average current consumed over the 24-hour period?

$$\text{Polling time} = \frac{40 \times 0.1 + 10 \times 50}{100} \text{ ms}$$

(a) Polled every 100ms, each poll takes 10ms  
 ⇒ 90ms(idle), 10ms(active).  
 ⇒ Average current =  $0.9 \times 0.1\text{mA} + 0.1 \times 50\text{mA} = 5.09\text{mA}$

- (b) If interrupt-driven I/O is used to sample the button status, what is the average current consumed in the same 24-hour period? Given that it takes 20ms to service each interrupt, including interrupt latency and ISR execution.

(b) Interrupt mechanism.  
 ⇒ Interrupted 400 times over 24-hour  
 ⇒ Active time =  $400 \times 20\text{ms} = 8000\text{ms}$   
 ⇒ Average current consumed over 24-hour  
 $= [8 \times 50\text{mA} + ((24 \times 60 \times 60) - 8) \times 0.1\text{mA}] / [24 \times 60 \times 60]$   
 $= 0.1046\text{mA}$

Active time:  $400 \text{ times} \times 20\text{ms} = 8000 \text{ ms}$   
 not active  $\downarrow$   
 $8000 + [(24 \times 60 \times 60) - 8] \times 0.1$   
 seconds in 24 hrs

- (c) Given that a battery with capacity of 2000mAh means that it can supply 2000mA continuously for 1 hour, or 1000mA continuously for 2 hours. How long would a battery of 2000mAh last when used in (a) and (b) above?

### (c) Use case (a) (Polling)

- a. Average current consumed 5.09mA
- b. Battery life =  $2000\text{mAh} / 5.09\text{mA} = 393 \text{ hours} = 16.4 \text{ days}$

### Use case (b) (Interrupt)

- a. Average current consumed 0.104 mA
- b. Battery life =  $2000\text{mAh} / 0.104\text{mA} = 19231 \text{ hours} = 801 \text{ days} = 2.2 \text{ years}$

battery capacity

inverse current

## DMA

- The processor's system bus is capable of supporting simultaneous transfer of up to 3 bytes of data at one time;
- DMA is used to transfer video data from Memory to Display Controller module.
- Each video pixel data consist of three bytes (Red, Green, Blue) and are transferred simultaneously on the system bus on each bus cycle.
- Each transfer on the system bus takes 5ns and transfer of the bus control between the CPU and the DMAC takes 100ns.
- Assume that DMAC is using Fetch-and-Deposit DMA.
- Note that 1Kbyte = 1024 Byte.



- (a) Given that the video is output at a rate of 30 frames per second and each video frame has a resolution of 1920x1080 pixels. If burst-mode was used by the DMAC to burst one frame of video data at a time, would the DMAC be able to transfer each video frame completely?

### [Suggested Solution]

Each transfer of the system bus will transfer 3 bytes of video data (RGB) and take 2\*5ns.

Number of transfers in 1/30 seconds =  $1920 \times 1080 = 2073600$  transfers.

Total time taken

$$= 100\text{ns} + 2073600 \times 2 \times 5\text{ns} + 100\text{ns} = 0.02073602 \text{ sec} < 1/30\text{sec}$$

⇒ DMAC is able to transfer each frame of video before the start of the next frame request.

- (b) Repeat the calculation if the DMAC is using cycle-stealing mode, assume that DMAC needs to wait at least 5 instructions before it could request for control of the system bus again.

### [Suggested Solution]

Time taken for one transfer

$$= \text{Bus control request} + 1 \text{ transfer} + \text{Bus release} + 5 \times \text{CPU Clock cycles wait}$$

$$= 100\text{ns} + 2 \times 5\text{ns} + 100\text{ns} + 5 \times (1/400\text{MHz}) = 222.5\text{ns}$$

Total time taken for 2073600 transfers =  $2073600 \times 222.5\text{ns} = 0.461376 \text{ sec} > 1/30\text{sec}$

⇒ DMAC will not be able to complete the transfer of one video frame before the next frame request comes in

Note that the last 5\*CPU clock cycles wait is added in the above calculation to simplify the calculation process. But it'll not meet the requirement (>100%) even if it is taken into account.

Note: the '400 Mhz' for CPU clock is taken from the processor information in the case study notes.

Burst:

Bus control request + (no of frames × fetch × deposits) + Bus control request

Cycle Stealing:  
 Bus control request + (fetch × deposit) +  
 Bus control request × instruction wait ×  
 CPU clk

# CAPACITY

(h) Part Number: HDD001

| Overview                            | Electrical Interface        |
|-------------------------------------|-----------------------------|
| ■ 4 surfaces                        | • V(OH): 5V(max), 4.2V(min) |
| ■ 1024 tracks per surface           | • V(OL): 1.3V(max), 0V(min) |
| ■ 128 sectors per track             | • V(IH): 5V(max), 3.6V(min) |
| ■ 512 bytes/sector                  | • V(IL): 1.8V(max), 0V(min) |
| ■ Track-to-track seek time = 5 msec |                             |
| ■ Rotational speed = 5000 RPM       |                             |
| ■ MTTF = 500,000 hours              |                             |

(i) Part Number: HDD002

| Overview                            | Electrical Interface        |
|-------------------------------------|-----------------------------|
| ■ 8 heads                           | • V(OH): 5V(max), 4.2V(min) |
| ■ 1024 cylinders                    | • V(OL): 1.3V(max), 0V(min) |
| ■ 256 sectors per track             | • V(IH): 5V(max), 3.6V(min) |
| ■ 512 bytes/sector                  | • V(IL): 1.8V(max), 0V(min) |
| ■ Track-to-track seek time = 4 msec |                             |
| ■ Rotational speed = 10000 RPM      |                             |
| ■ MTTF = 1,000,000 hours            |                             |

>> Formula for capacity:

$\text{Surfaces} \times \text{tracks per surface} \times \text{sectors per track}$   
 $\times \text{bytes/sector}$

>> Formula for access time:

$$T_A = \text{Seek time} + \text{Average rotational delay}$$

SEEK  
ROTATIONAL DELAY

$$>> \text{RPM} \rightarrow \text{RPS} : \frac{5000 \text{ rpm}}{60 \text{ s}} \rightarrow \sim 83 \text{ rps}$$

a. What is the capacity of each drive?

[Suggested Solution]

HDD001:

Capacity =  $4 \times 1024 \times 128 \times 512 = 268,435,456 \text{ Bytes} = 256 \text{ Mbytes}$   
Note: 1Mbyte =  $2^{20}$  bytes

HDD002:

Capacity =  $8 \times 1024 \times 256 \times 512 = 1,073,741,824 \text{ bytes} = 1 \text{ GByte}$

Note:

1GByte =  $2^{30}$  bytes,

# of cylinders = # of tracks of each surface,

# of heads = # of valid surfaces

b. For HDD001,  
i. What is its access time?

[Suggested Solution]

RPS = 5000/60

Average rotational delay =  $T_{R,Avg} = 0.5/\text{RPS} = 0.5/(5000/60) = 6 \text{ ms}$

Seek time =  $T_s = 5 \text{ msec}$

Access time,  $T_A = T_s + T_{R,Avg} = 5 + 6 = 11 \text{ ms}$

- ii. What is the time needed to transfer a 4Kbyte file stored in random non-consecutive sectors on different tracks? Assume that every sector is on a different track.

[Suggested solution]

RPS = 5000/60 per second

Access time  $T_A = 11 \text{ ms}$

$\text{Surface} \times \text{tracks per surface}$   
 $\text{bytes per sector}$   
 $\text{bytes per track} \times \text{bytes per sector}$

$\text{no. of sectors} =$

$$\# \text{ of sectors} = 4 \times 1024 / 512 = 8$$

$$\text{Transfer time } T_f \text{ for 1 sector} = 512 / ((\text{RPS} * \text{Dr} * \text{Ds}) = 512 / ((5000/60) * 128 * 512) = 93.75 \text{ us}$$

$$\text{Total time, } T_{\text{TOTAL}} = 8 * (T_A + T_f) = 88.75 \text{ ms}$$

Note: This illustrate the effect for HDD fragmentation. If a file is fragmented, i.e. stored in non-consecutive sectors/clusters, the effective performance of the HDD data transfer will drop, as shown by the total access time for the two examples above.

- iii. After defragmenting HDD001, what would be the time needed to transfer a 280Kbyte file?

[Suggested solution]

Defragmentation => file is stored in consecutive sectors.

However, 280KByte is large than size of one track (128\*512=64KByte).

It occupies 4 full tracks and 24 Kbyte on the fifth track.

RPS = 5000/60 per second

Access time  $T_A = 11 \text{ ms}$

$T_f$  for one full track = 1/RPS = 12 ms

$$\text{Total time for 1 track, } T_{\text{TOTAL}} = 11 + 12 \text{ ms} = 23 \text{ ms}$$

$$4 \text{ tracks} = 4 * 23 = 92 \text{ ms}$$

Last 24KByte:

$$\text{Transfer time } T_f = N / (\text{RPS} * \text{DT} * \text{DS}) = (24 * 1024) / ((5000/60) * 128 * 512) = 4.5 \text{ ms}$$

$$\text{Total time, } T_{\text{TOTAL}} = 11 + 4.5 \text{ ms} = 15.5 \text{ ms}$$

Total time needed to transfer 280KByte file =  $92 + 15.5 = 107.5 \text{ ms}$

Note:  $T_A$  is required when moving from one track to the other since the R/W head need to be positioned at the starting sector of the next track.

**Note: Do not refer to the processor configuration in the case study notes for this tutorial.  
A smaller system will be used instead.**

## 8.1 Cache

1. Given a processor system with the following characteristics

- Processor has a direct-mapped cache with 32 cache blocks and a cache size of 512 bytes.
- Cache Memory Access time = 5ns.
- Cache Hit rate = 0.9
- 64Kbyte DRAM used as the main memory.
- DRAM Memory access time = 200ns

a. In doing cache mapping analysis, how many **blocks** would the main memory be partitioned to?

[Suggested Solution]

$$\text{Cache blocks} = \frac{\text{Cache size}}{\text{Cache blocks}}$$

Cache block size = 512/32 = 16 Bytes

$$\text{Number of main memory blocks} = 64\text{KByte}/16\text{Byte} = (2^{16})/(2^4) = 2^{12} = 4096$$

b. What is the format of a memory address as seen by the cache (i.e. determine the sizes of the tag, block and offset fields)?

[Suggested Solution]

$$\text{Cache Block Size} = 16 \Rightarrow \text{Offset Field} = 4 \text{ bits}$$

$$\text{Number of blocks in the cache} = 32 \Rightarrow \text{Block Field} = 5 \text{ bits}$$

$$\text{Main Memory Size} = 64\text{KByte} \Rightarrow 16 \text{ address bits}$$

$$\text{Tag Field bits} = \text{Main Memory Address bits} - \text{Offset} - \text{Block} = 7$$

$$\Rightarrow \text{TAG:BLOCK:OFFSET} = 7:5:4$$

c. CPU needs to read a byte from main memory address 0x0DB63.

- i. Which cache block would CPU look at to search for the required data?
- ii. How many main memory blocks could potentially be mapped to the same cache block as that of 0x0DB63?
- iii. How does the CPU know if the cache block identified in (i) above contains the data that it needs?
- iv. What is the purpose of the ‘offset’ field in the cache mapping?

**[Suggested Solution]**

- i.  $0x0DB63 = \text{0000 } \textbf{1101 } \textbf{1011 } \textbf{0110 } \textbf{0011}$   
 $\Rightarrow$  Block 10110b = Block 22
  - ii. Number of MM blocks = 4096  
Number of cache blocks = 32  
 $\Rightarrow$  each cache block is a potential destination to  $4096/32 = 128$  MM blocks.
  - iii. By looking at the TAG value entry of the corresponding cache block  
For 0x0DB63, the TAG value for the cache block should be equal to **00001101 101b**
  - iv. Offset refers to the offset of the byte of interest from the cache block boundary. For 0x0DB63, the byte is the byte 3 of block 22. First byte of the block is byte 0.
- d. What is the effective access time of the memory in this system?

**[Suggested Solution]**

Assume cache memory and main memory access do not overlap.

$$\text{EAT} = H * \text{Cache}_{\text{Access}} + (1-H) * (\text{Cache}_{\text{Access}} + \text{Mem}_{\text{Access}}) = 0.9 * 5\text{ns} + (1-0.9) * (5\text{ns} + 200\text{ns}) = 25\text{ns.}$$

## 8.2 Virtual Memory

2. In a processor system with the following characteristics,

- 1 MByte Virtual memory space
- 64 Kbyte DRAM as main memory
- Paging scheme used for virtual memory management, Page Table as shown in Table 8.2
- Virtual Page size = 1 KByte
- TLB with 4 entries

**Table 8.2 – Page Table**

| Virtual Page Number | Valid Bit | Page Frame Number |
|---------------------|-----------|-------------------|
| 0                   | 1         | 1                 |
| 1                   | 1         | 2                 |
| 2                   | 0         | -                 |
| 3                   | 1         | 16                |
| 4                   | 1         | 9                 |

a. How many bits are required for each virtual address?

[Suggested Solution]

$$1 \text{ Kb} = 2^{10}$$

$$1 \text{ Mbyte} = 2^{20} \text{ byte} \Rightarrow 20 \text{ bits}$$

$$1 \text{ Mb} = 2^{20}$$

$$1 \text{ Gb} = 2^{30}$$

b. How many bits are required for each physical address?

[Suggested Solution]

$$64 \text{ KByte} = 2^6 * 2^{10} = 2^{16} \Rightarrow 16 \text{ bits}$$

c. What is the maximum number of entries in the page table in Table 7.2?

[Suggested Solution]

$$\text{Number of virtual pages} = 2^{20} / 2^{10} = 2^{10} = 1024$$

$$\Rightarrow \text{Max number of entries} = 1024$$

d. What is the maximum number of valid entries in the page table in Table 7.2?

[Suggested Solution]

$$\text{Number of Physical Frames} = 2^{16} / 2^{10} = 64$$

$$\Rightarrow \text{Max number of valid entries} = 64$$

e. With reference to Table 7.2, answer the following. Indicate when a page fault occurs.

- (i) The compiler mapped the UART routine to virtual address 0x005F0 – 0x006FF, where in the DRAM would you be able to find the UART routine?

[Suggested Solution]

0x005F0 → virtual page number 1 → mapped to page frame number 2  
 0x006FF → virtual page number 1 → mapped to page frame number 2

| Virtual Address (20 bits) | Physical Address (16 bits) |
|---------------------------|----------------------------|
| 0000 0000 0101 1111 0000  | 0000 1001 1111 0000        |
| 0000 0000 0110 1111 FFFF  | 0000 1010 1111 FFFF        |

Physical Address = 0x09F0 – 0xAFF

- (ii) The compiler mapped the I2C routine to virtual address 0x009C0 - 0x009DF, where in the DRAM would you be able to find the I2C routine?

[Suggested solution]

0x009C0 → virtual page number 2 → page fault occurs (valid bit =0).

0x009DF → virtual page number 2 → page fault occurs (valid bit =0).

- (iii) What happens when there is a page fault?

[Suggested Solution]

Page fault => the required data/code is not in the main memory

=> OS needs to retrieve the required information from the storage memory and update the paging table accordingly.

f. What memory are the Page Table and TLB resided?

[Suggested Solution]

Main Page Table => Main Memory

TLB => Internal Fast Memory close to CPU

g. What is the function and effect of a TLB?

[Suggested Solution]

- ⇒ Cache the frequently used Page table information
- ⇒ Leads to shorter access time for paging information => increase in system performance.

---

(Not necessary to be covered during tutorial)

[Optional, but students are encouraged to attempt these questions]

3. Consider a system with the following characteristics.

- Direct mapped cache of 32 cache blocks and cache block size of 32 bytes
  - Cache uses Physical Address for address mapping
  - Virtual Memory page size 2048 bytes
  - Virtual Memory size is 1Mbyte. Physical Memory size is 64KByte
  - Extracts of Page Table (valid entries)
    - Virtual Page 0 → Physical Frame 9
    - Virtual Page 1 → Physical Frame 3
    - Virtual Page 2 → Physical Frame 5
    - Virtual Page 3 → Physical Frame 2
    - Virtual Page 4 → Physical Frame 7
  - The main program is 5KByte in size and starts at virtual address 0x01006
- (i) Assuming that the compiler allocates the program sequentially in the virtual memory, what is the physical address of the start and end of the main program?

[Suggested Solution]

Virtual Address (Start) = 0x01006, end = 0x02406. (5KByte size)  
0x1006 => 0001 0000 0000 0110 => Page 2 => Physical Frame 5 (101b)  
Physical address = 0010 1000 0000 0110 = **0x2806**  
0x2406 => 0010 0100 0000 0110 => Page 4 => Physical Frame 7 (111b)  
Physical Address = 0011 1100 0000 0110 => **0x3C06**

- (ii) Which cache block should the CPU check in the cache for the start of the main program? What is the corresponding TAG value used to check for cache hit/miss?

[Suggested Solution]

Physical Address used for tagging. Hence, TAG:BLK:OFFSET = 6:5:5  
Start address = 0x2806 = 0010 1000 0000 0110  
TAG value = 001010 = **0xA**, BLK = 000000 = **0**

## 9.1 Computer Arithmetic

1. Describe and explain the pros and cons of fixed and floating number system, particularly the representable range and precision. Illustrate using example of 32-bit fixed point and the 32-bit floating point number in single precision IEEE754 format.

[Suggested Solution]

- Floating Point (IEEE754)
  - Max Range  $\approx 2 \cdot 2^{128}$  ( $\sim 2^{128}$  to  $-2^{128}$ ).
  - Max Precision (near to zero) less than  $2^{-126}$
- Fixed Point
  - Max Range (Radix right of LSB, unsigned)  $\approx 2^{32}$
  - Max Precision (Radix left of MSB)  $2^{-32}$
- Floating point yield a larger range and better precision at small numbers with the same number of bits representation.
- One usually needs the best precision when the numbers are small.
- However, Fixed point number has the advantage of having uniform precision across entire range.
- Floating point number's precision changes across the range and the very coarse precision at the two end of the range may not be desirable to the intended algorithm.

2. An array consisting of the length of 256 wires is given by  $L[0], L[1], \dots, L[255]$ . Assume that user are not allowed to test for any overflow during computation, describe a scheme to compute the average length of the 256 wires that will yield a result with the highest precision based on the following specifications:

- 16-bit registers are used for storing the data and result.
- Only Single-Precision (16-bit) and Fixed-Point arithmetic is used.
- Maximum possible length of each wire is 0x3FF and is an integer.

No coding is required, illustrate your answer in the form of a mathematical expression and justify your answer.

[Suggested Solution]

**Solutions**

- Unsigned arithmetic used to maximise the range since the wire length cannot be negative
- 16-bit => maximum 0xFFFF
- Max length of one wire = 0x3FF => Max number of wire length that can be accumulated before the result will potentially overflow = 64
- Math Expression =  
$$\{(L[0]+...+L[63])/64 + (L[64]+...+L[127])/64 + (L[128]+...+L[191])/64 + (L[192]+...+L[255])/64\}/4$$

Always perform Division last to avoid truncating the LSBs too early in the computation. But need to take note of overflow when using addition, subtraction and multiplication.

## 9.2 Pipelines

3. Consider a processor (not ARM processor) with 4 pipeline stages: Fetch Instruction (F), Decode (D), Execute (E) and Store (S). Assume that

- Branch target address is calculated at the execute stage
- Instruction length for every instruction is one word long
- Each pipeline stage take 1 cycle to complete
- Delay Branching is not enabled.

How many cycles does the code in Figure 9.2 take? You can ignore all pipeline conflicts you see in the code, just focus on the pipeline behaviour and execution cycles.

```

    MOV  R6, #0x900 ; I1
    MOV  R5, #0      ; I2
    MOV  R4, #0x800 ; I3
    MOV  R3, #0x300 ; I4
Loop LDR  R0, [R3]   ; I5
      LDR  R1, [R4]   ; I6
      ADD  R2, R1, R0 ; I7
      ADD  R5, R5, #1 ; I8
      ADD  R4, R4, #1 ; I9
      ADD  R3, R3, #1 ; I10
      CMP  R5, #5     ; I11
      BNE  Loop       ; I12
      ADD  R4, R4, R2 ; I13
      STR  R2, [R6]   ; I14

```

Figure 9.2

### [Suggested Solution]

- I1 to I4: 7 cycles (I1 will take 4 cycles to pass through the pipeline)
- I5 to I12. For the first four iterations, each iteration takes (8+2) cycles. This includes 2 cycles discarded in each iteration.
- I5 to I12. Fifth iteration takes 8 cycles since the branch is not taken so no instructions are discarded.
- I13 to I14: 2 cycles.
- Total =  $7 + (8+2)*4 + 8 + 2 = 57$  cycles.

**Solutions**

4. Consider a processor (not ARM processor) with 4 pipeline stages: Fetch Instruction (F), Decode (D), Execute (E) and Store (S). Assume that

- Branch target address is calculated at the execute stage
- Instruction length for every instruction is one word long
- Each pipeline stage takes 1 cycle to complete
- No Resource Conflicts
- Delayed Branching is enabled

Identify and describe ALL pipeline conflicts the code in Figure 9.3 has when executed in the pipeline processor above. Suggest workaround for pipeline conflicts identified.

```

    MOV  R6, #0x900 ; I1
    MOV  R5, #0      ; I2
    MOV  R4, #0x800 ; I3
    MOV  R3, #0x300 ; I4
Loop LDR  R0, [R3]   ; I5
      LDR  R1, [R4]   ; I6
      ADD  R2, R1, R0 ; I7
      ADD  R5, R5, #1 ; I8
      CMP  R5, #5     ; I9
      BNE  Loop       ; I10
      ADD  R4, R4, #1 ; I11
      ADD  R3, R3, #1 ; I12
      ADD  R4, R4, R2 ; I13
      STR  R2, [R6]   ; I14

```

**Figure 9.3**

### [Suggested Solution]

- I4 and I5. Data dependency. R3 register value is used in I5 before I4 completes loading R3 register. Resolved by
  - swapping I3 and I4, OR swapping I5 ad I6, OR
  - inserting a NOP instruction between I4 and I5
- I6 and I7. Data dependency. R1 register value is used in I7 before I6 completes loading R1 register. Resolved by
  - swapping I7 and I8, OR
  - inserting a NOP instruction between I6 and I7
- I8 and I9. Data dependency. R5 register value is used in I9 before I8 completes loading R1 register. Resolved by
  - swapping I7 and I8, OR
  - inserting a NOP instruction between I8 and I9

- I5 and I12. Data dependency. I11 and I12 are delay slot instructions. During the first four iterations of the loop, I5 uses R3 before I12 can load the latest increment to R3. Resolved by
  - swapping I11 and I12, and add a NOP before I13.
- No branch conflicts since delayed branching is used.
- No data dependency between I13 and I14 since I14 is just reading and not writing to R2

(Not necessary to be covered during tutorial)

[Optional, but students are encouraged to attempt these questions]

### 9.3 Rounding Error

5. You have been tasked to write a program that calculates the actual time based on a counter that is incremented once every 0.10 seconds. For example, if the counter value is 3,600,000, you would expect the actual time to be 100 hours ( $(3,600,000 \times 0.1) / (60 \times 60)$ ).

Suppose you have decided to use a 24-bit fixed point representation as shown in Figure 8.4 to store the value of 0.10 seconds ( $2^{-4} + 2^{-5} + 2^{-8} + 2^{-9} + 2^{-12} + 2^{-13} + \dots$ ).

|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | . | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

**Figure 9.3 – Fixed point representation of 0.1010**

- Approximate the round-off error (in decimal) of  $0.10_{10}$  due to the fixed-point representation.
- What is the effect of this round-off error on the time calculated if the counter value is 3,600,000?

[Suggested Solution]

- The decimal value of the fixed point representation in Figure 8.4 is  $0.0999999046325684_{10}$ .

Hence, the round-off error is approximately  $0.000000095_{10}$  ( $0.10_{10} - 0.0999999046325684_{10}$ ).

- The time calculated will be off by approximately **0.34 seconds** ( $3600000_{10} \times 0.000000095_{10}$ ).

Note: This question is based on an actual example of disasters caused by bad numeric. The inaccurate calculation of time has contributed to fatalities when it was used for intercepting enemy missiles during Gulf War. While an error of 0.34 seconds may not seem like much, a missile travelling at 1,676 meters per second would have travelled approximately 500 meters in that time.

## Important Formulas (106)

HDD Transfer Rate:

$$\text{Revolutions Per Second (RPS)} = \text{RPM} / 60$$

$$\text{Average Rotational Delay } (T_{R,AU}) = \frac{0.5}{\text{RPS}}$$

$$\text{Transfer Time } (T_T) = \frac{N}{\text{RPS} * D_r * D_s}$$

↑                      ↑                      ↑  
 Revolutions    Track    Sector  
 per sec       Density   Density

$$\text{Access Time } (T_A) = T_s + T_R$$

↑                      ↑  
 Average    Average Rotational  
 Seek        Delay

## Direct-Mapped Caching

$$\text{Main/System Memory Size} = 64 \text{ KBytes}$$

$$\text{Cache Size} = 256 \text{ Bytes}$$

$$\text{Cache Block Size} = 16 \text{ Bytes}$$

Where would Data at main memory address 0x1106 be mapped to?

$$\text{Offset bits} = \log_2(\text{Cache Block Size}) = \log_2(16) = 4$$

$$\text{No of Cache blocks} = 256 / 16 = 16$$

$$\text{Blk bits} = \log_2(\text{Number of Cache Blocks}) = \log_2(16) = 4$$

$$\text{Main Memory Size} = 64 \text{ KB} (\text{Convert to bytes}) = 64 * 1024 = 65536$$

$$\text{Main Memory Size in bits} = \log_2(65536) = 16 \text{ bits}$$

$$\text{Tag bits} = 16 - 4 - 4 = 8$$

$$\text{Mapping Format} = 8:4:4$$

Apply to the main memory address 0x1106 (convert to binary)

$$0x1106 = \underbrace{0001}_{\# \text{TAG}} \underbrace{0001}_{\# \text{Blk}} \underbrace{0000}_{\# \text{Offset}}$$

$$\text{TAG} = 0x11 \quad \text{Blk} = 0x0 \quad \text{Offset} = 0x6$$

Consider a system with the following characteristics.

- Direct mapped cache of 16 cache blocks and block size 16 bytes
- Cache uses Virtual Address for address mapping
- Virtual Memory page size 1024 bytes
- Virtual Memory size 1Mbyte. Physical Memory size 64 KByte
- Extracts of Page Table (valid entries)
  - Virtual Page 0 → Physical Frame 2
  - Virtual Page 1 → Physical Frame 5
  - Virtual Page 2 → Physical Frame 8
  - Virtual Page 9 → Physical Frame 3

From which cache block would the CPU retrieve the data for a virtual address 0x00119? What is the corresponding tag value of the cache block if it is a cache hit?

(5 marks)

$$\begin{aligned} \log_2(16) &= 4 \\ \log_2(16) &= 4 \\ (16 * 1024) &= 65536 \\ (65536) & \end{aligned}$$

$$\begin{aligned} 10^2 &= 100 \\ 100 & \end{aligned}$$

$$10 - 4 - 4 = 8$$



Cache Miss Rate  $\rightarrow$   $1 - \text{Hit Rate} \rightarrow (1-H)$

### Effective Access Time

$$EAT = H \times \text{Access}_c + (1-H) \times (\text{Access}_c + \text{Access}_{mn})$$

↑      ↑      ↑      ↑      ↑  
Hit Rate    Access time    Cache    Access time    Access time  
              for cache      miss rate      for cache      for main memory

Example: Consider a system with a main memory access time of 200ns supported by a cache having 10ns access time and a hit rate of 99%.

$$\text{Hit Rate} = 0.99$$

$$\text{Access}_c = 10\text{ns} \quad 0.99 \times (10) + (0.01) (10 + 200)$$

$$\text{Access}_{mn} = 200\text{ns} \quad = 12\text{ns}$$

## Paging Example :

Virtual Address Space  $\rightarrow$  8 kb  $\rightarrow$  8192 bytes  $\rightarrow 2^{13} \rightarrow$  13 bits

Physical Address Space  $\rightarrow$  4 kb  $\rightarrow$  4096 bytes  $\rightarrow 2^{12} \rightarrow$  12 bits

Page Size  $\rightarrow$  1 kb  $\rightarrow$  1024 bytes  $\rightarrow 2^{10} \rightarrow$  10 bits (offset)

$$\text{Virtual Page field} = 13 - 10 = 3$$

$$\text{Physical Frame field} = 12 - 10 = 2$$

### Virtual Address Examples :

0X1553 / 0X0FAD

1. 0X1553 (Break it up to binary) Follow the virtual addr  
Space 13 bits, therefore  
for 1, omit the 0s
- 1 0101 0101 0011 13 bits representation
- virtual page field offset 101 in decimal is 5. Check  
page 5 if there is a valid bit.

| Page | Frame | Valid Bit |
|------|-------|-----------|
| 0    | -     | 0         |
| 1    | 3     | 1         |
| 2    | 0     | 1         |
| 3    | -     | 0         |
| 4    | -     | 0         |
| 5    | 1     | 1         |
| 6    | 2     | 1         |
| 7    | -     | 0         |

it is found in frame 1.

In physical frame  $\rightarrow$  0101 0101 0011  
physical frame offset retained

2. 0XFAD (Break it up to binary)

- 0 1111 1010 0000 13 bits representation
- virtual page field offset 011 in decimal is 3. Check the  
Page 3 if there is a valid bit.

It has a valid bit of 0,

therefore, there is a page

fault.

| Page | Frame | Valid Bit |
|------|-------|-----------|
| 0    | -     | 0         |
| 1    | 3     | 1         |
| 2    | 0     | 1         |
| 3    | -     | 0         |
| 4    | -     | 0         |
| 5    | 1     | 1         |
| 6    | 2     | 1         |
| 7    | -     | 0         |

## Arithmetic

### Two's Complement Representation

MSB = 0 (positive number)

MSB = 1 (negative number)

Example: (Convert to Negative)

+106: 01101010

-106: 10010110

(After the first 1, negate the rest)

$$-128 + 16 + 4 + 2 = -106$$

### Examples of Carry and Overflow:

(-4): 1100

(-2): 1110

→ Carry bit thrown.

$$\begin{array}{r} 1 \\ \hline 1 & 0 & 0 \\ + & 1 & 1 & 0 \\ \hline 1 & 0 & 1 & 0 \end{array}$$

No overflow, sign still same

(-4): 1100

(-65): 1010

→ Carry bit thrown

$$\begin{array}{r} 1 \\ \hline 1 & 0 & 0 \\ + & 1 & 0 & 1 & 0 \\ \hline 0 & 1 & 1 & 0 \end{array}$$

Sign change, overflow occurred

## Carry Bit

Example: (48 + (-19) = 29)

→ Carry bit thrown away ~

$$\begin{array}{r} 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ + & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\ \hline 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \end{array}$$

### Overflow (For signed numbers)

In (A+B): (Addition)

MSB(A) = MSB(B), MSB(Result) ≠ MSB(A)

Overflow occurred

In (A-B): (Subtraction)

MSB(A) ≠ MSB(B), MSB(Result) ≠ MSB(A)

### Overflow Occurred

## Sign Extension

- In two's complement, sign extension is needed to convert a smaller size operand to a larger size operand



- Sign extension simply copies the sign bit (MSB) into the higher order bits

## Multi-Precision Arithmetic

ADD R0,R0,R1 ; add lower word with carry out

ADC R2,R2,R3 ; add upper word with carry in



Range and Precision:

Floating Point Representation:  $\pm \underbrace{9.99}_{\text{Sign}} \times \underbrace{1.11100000000000000000000000000000}_\text{Mantissa} \times \underbrace{11100000000000000000000000000000}_\text{Exponent}$

Floating Point can be represented from  $(-9.99 \times 10^{-99} \text{ to } +9.99 \times 10^{99})$

Normalisation: Avoid synonymous representation by maintaining one non-zero

digit before radix  $\rightarrow \pm \underbrace{5.42}_{\text{Nonzero}} \times 10^{\pm 03}$  (Normalised)

Underflow: (occurs when value is too small to be represented)

Smallest positive Normalised number:  $+1.00 \times 10^{-99}$

Smallest negative Normalised number:  $-1.00 \times 10^{-99}$

Floating Point Standards

Single Precision (32 bits)

1 bit Sign + 8 bit Exponent + 23 bit Fraction

Double Precision (64 bits)

1 bit Sign + 11 bit Exponent + 52 bit Fraction

Conversion of Single Precision to decimal

$$(-1)^S \times (1.F)_2 \times 2^{E - \text{Bias}}$$

Sign bit  $\rightarrow S=0$  (positive),  $S=1$  (negative)

Exponent  $\rightarrow$  Binary number left on radix

Value of Exponent =  $E - \text{Bias}$  (127 for Single, 1023 for Double)

Fraction  $\rightarrow$  Example  $\rightarrow (1.11)_2 \rightarrow (1 + 1 \times 2^{-1} + 1 \times 2^{-2})$  (Expressed this way)

Example: 0 1 0 1 1 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Sign = 0

$$\text{Fraction} = (1.111)_2 = 1 + 2^{-1} + 2^{-2} + 2^{-3} = 1.875$$

Exponent = 1011 0010 = 178

Value in decimal =  $+1.875 \times 2^{178}$

$E - \text{Bias} = 178 - 127 = 51$

### Representable Range for Normalised Single Precision

- In normalised mode, exponent is from 00000001 to 11111110
- Smallest magnitude normalised number



Exponent =  $(00000001)_2 = 1$ ; E - Bias =  $1 - 127 = -126$

1 + Fraction =  $(1.000\dots000)_2 = 1$

Value in decimal =  $1 \times 2^{-126}$

- Largest magnitude normalised number



Exponent =  $(11111110)_2 = 254$ ; E - Bias =  $254 - 127 = 127$

1 + Fraction =  $(1.111\dots111)_2 \approx 2$

Value in decimal =  $2 \times 2^{127} \approx 2^{128}$



### IEEE 754 Encoding



| Mode         | Sign  | Exponent             | Fraction      |
|--------------|-------|----------------------|---------------|
| Normalized   | 1 / 0 | 00000001 to 11111110 | Anything      |
| Denormalized | 1 / 0 | 00000000             | Non zero      |
| Zero         | 1 / 0 | 00000000             | 0000 ... 0000 |
| Infinity     | 1 / 0 | 11111111             | 0000 ... 0000 |



### Addition of floating point numbers

#### Rounding Error Mitigation

Suppose Given :  $1.23 e + 01$  (Steps to take)

$4.56 e + 00$

- Add Guard digit to maintain precision

$1.230 e + 01$

$4.560 e + 00$

- Align the digits

$$\begin{array}{r} 1.230 e + 01 \\ 0.456 e + 01 \\ \hline \end{array} \quad \begin{array}{r} 1.686 e + 01 \\ \hline \end{array}$$

- Round Off  $\rightarrow 1.686 e + 01 \rightarrow 1.69 e + 01$

Example:

Align the  $1.00 \times 10^0$  Exponent

$$1.23 \times 10^3 + 1.00 \times 10^0 +$$

$$1.230 e + 03$$

$$1.00 \times 10^0 + 1.00 \times 10^0 +$$

$$0.001 e + 03 \text{ (Align)}$$

$$1.00 \times 10^0 + 1.00 \times 10^0$$

$$1.231 e + 03 \text{ (Round off)}$$

$$1.230 e + 03 \text{ (Repeat of 1,2,3)}$$

$$+ 0.001 e + 03$$

:

$$1230 e + 03 \text{ Final Ans}$$

Example:

$$1.00 \times 10^0 + 1.00 \times 10^0 +$$

$$1.000 e + 00$$

$$1.00 \times 10^0 + 1.00 \times 10^0 +$$

$$2.000 e + 00$$

$$1.00 \times 10^0 + 1.23 \times 10^3$$

:

5.000e + 00 (Needs Alignment)

$$+ 1.230e + 03$$

↓

$$+ 0.005e + 03 \text{ (Aligned)}$$

$$+ 1.230e + 03$$

$$\underline{1.235e + 03} \text{ (Round)}$$

$$1.240 + 03 \text{ (Rounded Value)}$$