

# System Design using Microcontroller :-

## Frame at:-

### Objective

- \* System design using microcontrollers
- \* Architectural features & Programming aspects of (ARM controller/Processor)
- \* Memory management in system design Application

~~Objectives~~

ARM Architecture

ARM Programming model - I  
(C.S)

ARM Programming mode - 2  
(T.I.S)

ARM Programming  
Memory management

## UNIT - I

### ARM Architecture

### ARM families

Design Philosophy

Registers

Pipeline

Intelligent &  
Virtual Table

## UNIT - II :-

### ARM Programming Model - I

Data Processing  
Instruction

### Instruction Set

conditional instructions

Branch Load

Set PC  
Instructions

## UNIT - III :-

### ARM Programming Model - II

Register  
Usage

other  
Branch  
instruction

### Thumb Instruction Set

Data  
Processing  
Instruction

Software Interrupt Instructions

Stack  
single &  
Multi Register  
Lock - Store  
Instructions

## Unit - IV :-

C. Programs  
using function calls



Legendra N. S.  
S.W. Value  
Ref

## Unit - V :-

Cache  
Architecture



# System Design using MicroControlled

## Unit - I

### Introduction :-

1) What is ARM ?

- ✓ → ARM originally Acorn RISC Machine, later known Advanced Reduced Instruction Set Computing (RISC) Machine (ARM) which is family of RISC architectures for Computer Processor. (1983 & 1985)

✓ → ARM makes 32-bit & 64-bit RISC multi-core Processors.

→ RISC Processors are designed to perform a smaller number of types of Computer Instructions so that they can operate at a higher speed, performing millions of Instructions Per Second (MIPS). By skipping unneeded instructions and optimizing pathways.

→ ARM Processors are extensively used in Consumer Electronics device such as smart phones, tablets, multimedia players .. etc..

### Features:-

- ✓ → Load / Store Architecture.
- ✓ → An orthogonal instruction set. → uses all addressing modes (orthogonal means instruction type & Addressing mode vary independently)
- ✓ → Mostly single cycle execution
- ✓ → Enhanced power saving design
- ✓ → 64 & 32-bit execution stated for stable high performance
- ✓ → Hardware virtualization support.

2) ARM Embedded Systems :-

- ✓ The ARM Processor Core is a key component of many successful 32-bit Embedded Systems, which are widely used in mobile phones, handheld organizers, multitude of other everyday portable consumer devices.
- ✓ → ARM Core is not a single Core but whole family of design principles and a common instruction set.

Example:- ARM's most successful core is ARM7TDMI. It provides upto '120 RISC NIPS' and is known high core density and low power consumption, making it ideal for mobile embedded devices.

### 3) The RISC Design Philosophy :-

- ✓ → aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed.
- ✓ → It constraints on reducing the complexity of instructions performed by hardware because it is easier to provide greater flexibility and intelligence in software rather than hardware.
  - ✓ As a result RISC design places greater demands on the compiler. In contrast, the traditional complex instruction set computer (CISC) relies more on the hardware for instruction functionality and consequently the CISC instructions are more complicated.
- Implemented in 4-design sides:-
  - Instruction (IPR)
  - Pipeline
  - Registers
  - Load store Architecture.

#### ① Instructions :-

- ✓ → RISC processors have reduced no. of instruction codes classes.
- ✓ → These classes provide simple operations that can each execute in single cycle.
- ✓ → The compiler (S) programmers synthesize complicated operations by combining several simple instructions.
- ✓ → Each instruction is fixed length to allow the pipeline to fetch next instruction before decoding the current instruction.

#### ② Pipelines :- The sequencing of instructions is broken down into smaller units that can be executed in parallel by pipeline.

- Ideally pipeline advances by one step on each cycle for maximum throughput.
- Instructions can be decoded in one pipeline stage.

(3)

Registers - RISC Machines have large no. of General purpose Registers  
 → Registers act as just local memory store for all data processing operations.



(4) Load - Store Architecture :-

The processor operates on data held in Registers. Separate load and store instruction transfer data b/w the register bank and external memory.

4) The ARM Design Philosophy :-

ARM Processor has been specially designed to be small to reduce power consumption and external battery operation.

blog:- 1<sup>st</sup> portable Embedded System require some form of battery Power.

Application such as Mobiles & Personal digital Assistants (PDA)

- ✓ → High core density is another major design requirement since Embedded system have limited memory due to cost / physical size restriction.
- E.S are price sensitive and use slow and low cost memory devices.
- To reduce area of the die taken up by Embedded Processor.
- ✓ → ARM has incorporated hardware debug technology within the Processor to that Software engineers can view register in the Processor if breaking code.
- ✓ → ARM core is not like RISC architecture because of the constraints of primary applications - the Embedded system.

## 5) Instruction Set for Embedded Systems:-

⇒ Variable cycle execution for certain instructions:-

✓ Not every ARM instruction executed in a single cycle.

E.g.: Load etc. multiple instructions may in no. of execution cycle depending upon no. of register being transferred.

✓ → The transfer can occur on sequential memory address, which increases performance since sequential memory access are often faster than random access.

→ Code density also improved since multiple register transfer are common operations at start and end of function.

⇒ Infinite barrel shifted leading to more complex instructions:-

✓ The infinite barrel shifted is a hardware component that pre-selects one of 16 registers before it is used by instruction.

✓ → It expands capability of many instructions to improve the performance and code density.

⇒ Thumb 16-bit instruction set:-

→ ARM enhanced the RISC core by adding a second 16-bit instruction set called Thumb that permits the ARM core to execute either 16/32-bit instructions.

→ The 16-bit instruction improve code density by about 30% over 32-bit fixed length instructions.

⇒ Conditional execution:- An instruction only executed only when specified condition has been satisfied.

⇒ Enhanced instructions:- Enhanced DSP instructions were added to standard ARM instruction set support fast 16x16-bit multiplier operations and saturation.

(5)

## Embedded Systems Hardware :-

" " Can control many different devices from small sensible found on production line , to real time Control systems , All devices use combination of Hardware and Software Components.



fig:- ARM based Embedded device , a micro controller:-

- ① ARM Processor Controls the Embedded device . Different version of ARM Processor are available to suit the desired operating characteristics.
- ② Controllers Co-ordinate important functional blocks of the System . Two commonly found controllers are interrupt and memory controller.
- ③ Peripherals Provides off-chip capability External to chip
- ④ ARM Bus Technology:- E.S use different bus technology than those designed for X86 PCs. the most common PC bus technology, the Peripheral Component Interconnect (PCI) bus, connects such devices as video card and hard disks to the X86 processor bus. This type is called external (or) off-chip.

Those are 2-different classes of devices attached to the bus.

→ ① ARM Processor (it is bus master) - a logical device capable of initiating data transfer with another device attached to the bus across same bus.

② peripherals tend to bus slaves - logical devices capable only of responding to transfer request from a bus master device.

- The bus has 2-Architectural level
- ① Physical level that covers electrical characteristics and ~~bus~~ width (16, 32 (or) 64-bits).
- ② Deal with Protocol → The logical rules that govern the communication between processor & peripherals

## ii) AMBA Bus Protocol:-

- ✓ AMBA → Advanced Microcontroller Bus Architecture  
introduced in 1996 adopted as on-chip bus architecture used for ARM Processors.
- ✓ First AMBA buses introduced were ARM System Bus (ASB) and ARM Peripheral Bus (APB), later introduced another bus design ARM High Performance Bus (AHB).
- AMBA peripheral designer can reuse the same design on multiple projects.
- ✓ AHB provides higher data throughput than ASB b/c it is based on centralized multiplexed bus scheme, rather than ASB bidirectional bus design.
- ✓ AHB bus to run at higher clock speeds and to be the first ARM bus to support width of 64 & 128 bits.
- ✓ ARM introduced 2-variations on Buses
- ↳ Multilayered AHB
  - ↳ AHB Lite.

- In contrast, AHB allows single bus master to be active on bus at any time.
- ✓ → Multilayered AHB bus allows multiple active bus masters.
- ✓ → AHB IPte is a sheet of AHB bus and it is limited to a single bus master.
- ✓ → AHB & multilayered AHB support same protocol for Master & Slave but have different interconnect.  
↳ They permit operations in parallel and allows for higher throughput rates.

(iii)

### Memory :-

CS has to have some form of memory to store and execute code.

Compsale  
Power, Power Consumption  
Performance.

### Characteristics

Hierarchy, width, type.

If memory has to run twice as fast to maintain a desired bandwidth then the memory power requirement may be higher.

(not to speed up data transfer)

### ~~Hierarchy~~ Hierarchy :-

The Cache is placed b/w main memory and the Cache used to speed up data transfer b/w Processor and main memory.



→ Cache provides an overall increase in performance but with a loss of predictable execution time.  
→ Main memory size around (256 KB to 256 MB) depending on application

### Width :-

The memory width is no. of bits the memory return on each access typically 8, 16, 32, 64 bits.

If an uncache system using 32-bit ARM instruction and 16-bit wide memory chip, the processor have to make 2-memory fetches per instruction, each requires 16-bit ~~memory~~ loads.

This Obviously has effect on system Performance.

(8)

→ If the Core executes 16-bit thumb instructions it will achieve better Performance with 16-bit memory.

### ③ Instructions:

| <u>Instruction Size</u> | <u>8-bit memory</u> | <u>16-bit memory</u> | <u>32-bit memory</u> |
|-------------------------|---------------------|----------------------|----------------------|
| ARM 32-bit              | 4-Cycles            | 2-Cycle              | 1-Cycle              |
| Thumb 16-bit            | 2-Cycles            | 1-Cycle              | 1-Cycle              |

### Different types of memory

ROM → is least flexible of all memory types. It contains image that is permanently set at Production time and can not be reprogrammed but it is slow to write.

Flash ROM → can be written/Read

DRAM (Dynamic RAM)

SRAM (Static RAM)

on Power up memory controller allows to be active.  
(allow to initialize the code to be executed)

### ④ Peripherals:-

A Peripheral ranges from single memory controller to a more complex 802.11 wireless device.

#### Interrupt Controller:-

① Standard IC :- sends interrupt signal. Can be programmed to ignore (all mask) and individual (one)

set of device

② Vector IC :- Associate a Priority and a handler address to each interrupt

⑦

### Embedded System

#### Software :-



8

Data Path:-

- Data enters the processor through the Data bus.
- Instruction decoder translates instructions before they are executed.
- Each instruction executed belongs to a particular instruction set.
- ARM processors like all RISC processors, uses a load-store architecture.  
ie 2-instruction types are transferring data.
- ✓ (i) Load instruction Copy data from memory to register in the code ie  
Reg → memory.
- ✓ (ii) Store " " " " Reg → memory.
- Data items are placed in Reg file - a storage bank made up of 32-bit Reg
- ARM code is 32-bit processor.  
↳ most instruction treat the register as holding signed & unsigned 32-bit value.
- (i) The sign extend hardware converts 8-bit to 16-bit (signed) → 32-bit value  
↳ as they read from memory and placed in Reg.
- ↳ ARM instructions typically have 2-source Registers, "Rn & Rm" and a single result or destination register, Rd.
- ↳ Source operands are read from register file using internal label A & B.
- ALU (Arithmetic Logic Unit) ie MAC (multiply-accumulate unit) takes the Reg values Rn & Rm from the A & B buses and computes a result.
- Load & store instructions use the ALU to generate an address to be held in address register and broadcast on address bus.
- Imp feature in ARM is Register Rm alternately be processed in barrel shifted before it enters the ALU.
- Both barrel shifted & ALU can calculate wide range of expression and Address after processing through functional unit the result in Rd is back in Reg file using Result bus.
- Barrel Shifter is combinational circuit which shift data to left/right by arbitrary no. of position in same cycle & it's off.
- ~~Barrel~~ & increment logic can update register content for sequential access independent of ALU.



(a) Registers :- (holds either data or address)



CPSR → Some program  
SR

Giant Program  
Status Register

→ All Reg are of 32-bit size

→ There are upto 18 Active Registers : 16-data registers & 2-Program Status Reg.  
( $R_0 - R_{15}$ )

→ ARM Registers has 3-Registers assigned to particular task (or) special function

$R_{13}, R_{14}, R_{15}$ .

✓ →  $R_{13}, R_{14}, R_{15}$  & CPSR are used for special-Purpose Reg.

✓ →  $R_{13}$  is traditionally used as stack pointer (SP) and stores the head of stack.

✓ →  $R_{14}$  is called the Link register (LR) and it whole code puts the return address where it calls subroutine.

(11)

$R_{15}$  is Program Counter (PC) Contains the address of next instruction

to be fetched by Processor.

→ In ARM registers ( $x_0 - x_{13}$ ) are orthogonal — any instruction that you apply for ' $x_i$ ' you can equally well apply to any of other registers.

✓ ~~CPSR~~ → A-Program Status Reg : CPSR ≈ SPSR.  
(Current & save Program Status Register). /.

### ⑩ Gentle Program Status Register :-

→ ARM Code use CPSR to monitor and control internal operations.

- ✓ → It is 32-bit register.
- It shows basic layout of generic Program Status Register.
- ✓ → Divided in 4-fields each 8-bit wide :-  
↳ flags ; status ; extension and control.
- ✓ → In Gentle design extension and status field used for future use.
- ✓ → The Control field contains Processor mode , state and interrupt mask bit.
- ✓ → The flag field contains conditional flags.
- ✓ → ARM Processor Core have Extra bit allocated . for e.g. 3-bit which can be found in flag field is only available on Jazelle-Enabled Processors . which execute 8-bit instruction.



### a) Processor mode :-

→ determines which register is active and access right to CPSR.

Each Processor mode is Privileged (8) or non Privileged.

↳ A Privileged mode allow full read - write access to CPSR.

↳ non " " " condition flags

→ There are total 7-processor in that

↳ 6-Privileged mode (abort; fast; interrupt; request;  
interrupt request; supervisor;  
system and undefined)

↳ 1-non Privileged mode (User).

→ The processor enters Abort mode when there is a failed attempt to access memory.

✓ → fast interrupt & interrupt request & interrupt request modes correspond to 2-interrupt levels available on ARM processor.

✓ → Supervisor mode that the processor is in after reset and in

→ ~~Processor is Abort mode~~ ~~there is failed attempt to access memory~~.

→ Processor in Abort mode that an OS kernel operates.

→ General mode that an OS kernel operates.

✓ → System mode is special version of User mode allows full read-write access to CPSR.

→ undefined mode is used when Processor encounters an instruction.

✓ that is undefined or not supported by implementation.

→ User mode for programs & applications

## Banked Registers :-



fast  
interrupt  
request

|            |
|------------|
| \$0 - fig  |
| \$1 - fig  |
| \$10 - fig |
| \$11 - fig |
| \$12 - fig |
| \$13 - fig |
| \$14 - fig |

SPSR - fig

Interrupt  
request

|            |
|------------|
| \$13 - iqr |
| \$14 - iqr |

SPSR - iqr

Supervisor

|            |
|------------|
| \$13 - SVC |
| \$14 - SVC |

CPSR - SVC

undefined

|             |
|-------------|
| \$13 - undf |
| \$14 - undf |

SPSR - undf

Abort

|            |
|------------|
| \$13 - Abt |
| \$14 - Abt |

CPSR - Abt

- Every processor mode except user mode can change mode by writing directly to the mode bit of CPSR.
- All processor mode except system mode have a set of associated bank registers that are a subset of main 16-registers.
- The banked reg map one-to-one onto a user mode register. i.e. when processor mode changes banked reg change new mode will replace an existing reg.
- In interrupt mode, the instructions you execute still access registers named \$13 & \$14. Cuz these registers are banked registers \$13 - iqr & \$14 - iqr.
- The user registers \$13-UX & \$14-UX are not affected by interrupt referencing these registers. A program still has normal access to the other registers \$0 to \$12.

- ✓ → the processor mode can be changed by a program that writes directly to CPSR (Processor Core has to be privileged) or by hardware when core responds to an exception or interrupt.



- ✓ → the terms exception and interrupt cause a mode change as follows:  
reset ; interrupt request ; fast interrupt request ; software interrupt;  
data abort ; prefetch abort and undefined instruction.
- ✓ → the user registers are replaced with registers X<sub>3</sub>-IRQ and X<sub>14</sub>-IRQ respectively.
- ✓ → while X<sub>14</sub>-IRQ contains the return address and X<sub>3</sub>-IRQ contains the stack pointer for interrupt request mode.
- the stack pointed to by SPSR, which stored the previous modes CPSR, which appears a new register in interrupt request mode.
- ✓ → to return back to user mode, a special return instruction used that instructs the core to restore the original CPSR from SPSR-IRQ and back in user registers X<sub>3</sub> & X<sub>14</sub>.  
↳ CPSR can only be modified and read in a user mode-privileged mode. There is no CPSR available in privileged mode.

| <u>Mode</u>            | <u>Abbreviation</u> | <u>Privileged</u> | <u>mode (4:0)</u> |
|------------------------|---------------------|-------------------|-------------------|
| Abort                  | abt                 | Yes               | 10111             |
| fast interrupt Request | fiq                 | Yes               | 10001             |
| Interrupt request      | irq                 | Yes               | 10010             |
| Supervisor             | svc                 | Yes               | 10011             |
| System                 | Sys                 | Yes               | 11011             |
| undefined              | und                 | Yes               | 11011             |
| User                   | usr                 | No                | 10000             |

- when a mode change
- ✓ → Note:- CPSR is not copied into SPSR due to a Program writing directly to CPSR.
  - ↳ forced due to a Program writing directly to CPSR.
  - ✓ → the saving of CPSR is avoided only when an exception (R) interrupt is raised.
  - ✓ → the current active Processor mode occupied 5-least significant bits of CPSR. when Power is on it is Supervisor mode which is Privileged.
  - ↳ starting in " mode is useful. since Initialization code can use full access to CPSR to setup the stacks for each of other modes.

(c)

### State and Instruction set :-

- State determines which instruction set is being executed.
- State determines which instruction set is being executed.
- There are 3-instruction sets → ARM ; Thumb ; Jazelle
- ↳ ARM instruction set is only active when the Processor is in ARM state.
- ↳ ARM instruction set is only active when the Processor is executing purely 16-bit instructions.
- ↳ Once in thumb state the Processor is executing purely 16-bit instructions.
- ↳ The Jazelle-J and ThumbT-bits in the PSR reflect the state of Processor.

- ✓ → when both J-bit & T-bit are zero (0) the processor is in ARM state and execute ARM instructions. this the case when Power applied to the core.
- ✓ → when T-bit is ('1') Processor is in thumb state.
- to change state the core executes specialized branch instruction.

|   | ARM & Thumb Instruction Set Features | ARM (CPSR T=0)                    | Thumb (CPSR T=1)                                   |
|---|--------------------------------------|-----------------------------------|----------------------------------------------------|
| 1 | Instruction size                     | 32-bit                            | 16-bit                                             |
| 2 | Core instruction                     | 38                                | 30                                                 |
| 3 | Conditional execution                | most                              | only branch instructions                           |
| 4 | Data processing instructions         | access to barrel shifted and ALU  | separate barrel shifted and ALU instructions       |
| 5 | Program status register              | Read - write in Privileged mode   | no direct access                                   |
| 6 | Register usage                       | 15-General Purpose registers + PC | 8-General Purpose registers + 7 high Register + PC |

- ✓ → 3rd instruction Jazelle. executes 8-bit instruction and is a hybrid mix of software and hardware design to speed up the execution of Java bytecodes.

↳ to execute Java bytecode requires Jazelle technology plus a special modified version of Java virtual machine.

↳ Hardware portion of Jazelle only support a subset of the Java byte codes, the rest are emulated in software.

### Jazelle Instruction Set Features

|                   | Jazelle (CPSR T=0 ; J=1)                                                                              |
|-------------------|-------------------------------------------------------------------------------------------------------|
| Instruction size  | 8-bit                                                                                                 |
| Core instructions | over 60% of Java bytecode are implemented in hardware; the rest of codes are implemented in software. |

## Interrupt Masking:-

→ Used to stop specific interrupt request from interrupting processor

There are 2-types interrupt request levels



✓ → The CPSR has 2-interrupt mask bit 7 and 6 (or S and F) which control the masking IRQ and FIQ.

✓ → The S bit masks IRQ when set to binary '1', and F bit " FIQ " " " " 1".

## (e) Conditional flags:-

→ are updated by comparisons and the result of ALU operations that specify S-instruction suffix.

Eg:- SUBS      Subtract  
              instruction results in a register value of zero if CPSR is set.

then Z-flag in then i.e this subtract instruction specifically update the CPSR.

### Conditional flag:-

| Flag | flag name  | Set when                                                |
|------|------------|---------------------------------------------------------|
| Q    | Saturation | the result causes an overflow and/or saturation         |
| V    | Overflow   | the result causes signed overflow                       |
| C    | Carry      | the result causes an unsigned carry                     |
| Z    | Zero       | the result is zero frequently used to indicate Equality |
| N    | Negative   | bit 31 of the result is binary 1                        |

→ When processor codes to include DSP extensions, the Q-bit indicates if an overflow (or) saturation has occurred in enhanced DSP instruction.

↳ The flag is "sticky" in the sense that hardware set the flag

↳ To clear flag you need to write to CPSR directly.

Jazelle - Enabled Processor J-bit reflect state of core if it is + the core is in Jazelle state. (18)

↳ J-bit generally not available on some processor cores.  
→ most ARM instructions can be executed conditionally on the value of Condition flag.

↳ These flags are located in most significant bit in CPSR.



C-flag set

Conditional

Processor is in ARM state (0/0)

Coz neither J/F-bit is set

(F) Conditional Execution:- controls whether (R) not core will execute an instruction.

Conditional Mnemonics:-

| mnemonic | Name                             | Conditional flag |
|----------|----------------------------------|------------------|
| EQ       | Equal                            | Z                |
| NE       | not Equal                        | N                |
| CS HS    | Carry set / unsigned high & same | C                |
| CC LO    | Carry clear / unsigned low.      | N                |
| MI       | minus / negative                 | n                |
| PL       | plus / Positive or zero          | V                |
| VS       | overflow                         | V                |
| VC       | no overflow                      | V                |
| H<       | unsigned high                    | ZC               |
| LS       | unsigned low & same              | Z(N)V            |
| GE       | signed greater than & equal      | NV & NV          |
| LT       | signed less than                 | NV & NV          |
| GT       | signed greater than              | NzV(0) NZV       |
| LE       | " less than & equal              | Z, NV(0) NV      |
| AL       | always (unconditional)           | Ignored.         |

## 11 Pipeline :-

- Is the mechanism a RISC Processor uses to execute instructions.
- Using a pipeline speed up execution by fetching the next instruction while other instructions are being decoded and executed.



→ Fetch loads an instruction from memory.

→ Decode identifies the instruction to be executed.

→ Execute processes the instruction and writes the result back to a register.



## 12 Pipeline executing characteristics:-

The ARM pipeline has not processed an instruction unit of passes completely through the execute stage.



Q. Shows Instruction sequence on a ARM-7 Pipeline.

(20)

The MSR instruction (Move to Status Register from an ARM Register) (CPSR to CPSR) is used to enable IRQ interrupt which only occurs once the MSR instruction completes the execute stage of pipeline.

↳ It changes 1-bit in the CPSR to enable the IRQ interrupt.  
→ Once ADD instruction enters the execute stage of pipeline, IRQ interrupt are enabled.



Note:- When the pipeline is in thumb state the PC is the instruction address plus 4.

- <sup>1st</sup> Execution of a branch instruction (B) branching by the direct modification of the PC causes the ARM core to flush the pipeline.
- <sup>2nd</sup> ARM 10 uses branch prediction, which reduces the effect of a pipeline flush by predicting possible branched and loading the new branch address prior to execution of instruction.
- <sup>3rd</sup> Instruction in execution state will complete even though an interrupt has been raised. Other instruction in pipeline will be abandoned and processor will start filling the pipeline from the appropriate entry in vector table.

## Exceptions, Interrupts & Vector Tables:-

When exception (or) interrupt occurs the processor sets the PC to a specific memory address.

Where the address is within a special address range called Vector table.

- The memory map address 0x00000000 is reserved for the vector table, ~~which is at 32 bit word~~. On some processors the vector table can be optionally located at a higher address in memory (starting at the offset 0xFFFF0000). Operating systems such as Linux and Microsoft Embedded products can take advantage of this feature.
- When an exception (or) interrupt occurs the processor suspended normal execution and starts loading instructions from the exception vector table. Each vector table entry contains a form of branch instruction pointing to start of a specific routine.

### Vector Table:-

| <u>Exception / interrupt</u> |
|------------------------------|
| Reset                        |
| Undefined instruction        |
| Software interrupt           |
| Prefetch abort               |
| Data abort                   |
| Reserved                     |
| Interrupt Request            |
| Fast Interrupt Request       |

| <u>Shorthand</u> | <u>Address</u> | <u>High Address</u> |
|------------------|----------------|---------------------|
| RESET            | 0x 00 00 0000  | 0xFFFF0000          |
| UNDEF            | 0x 00 00 0004  | 0xFFFF0004          |
| SWI              | 0x00000008     | 0xFFFF0008          |
| PABT             | 0x0000000C     | 0xFFFF000C          |
| DABT             | 0x 00 00 0010  | 0xFFFF0010          |
| -                | 0x 00 00 0014  | 0xFFFF0014          |
| IRQ              | 0x00000018     | 0xFFFF0018          |
| FIQ              | 0x 00 00 001C  | 0xFFFF001C          |

- Reset vector is the location of first instruction executed by the processor when power is applied. This instruction branches to initialization code.
- Undefined instruction vector is used when the processor cannot decode an instruction.
- Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is frequently used as the mechanism to invoke an OS routine.

② Bus fault about vector occurs when the Processor attempts to fetch an instruction from an address without sufficient access permission.  
The actual abort occurs in the decode stage.

- Data about vector is similar to a Bus fault about but is raised when an instruction attempts to access data memory without sufficient access permissions.
- Interrupt request vector is used by external hardware to interrupt the normal execution flow of the Processor. It can only be raised or not masked in CPSR.
- Fast interrupt request vector is similar to Interrupt req but is received for hardware requiring faster response times. It can only be raised if FIQs are not masked in CPSR.

### ⑬ Architecture Revisions :-

Early ARM Processor implementation executes a specific instruction set architecture (ISA).

#### ① Nomenclature :-

ARM {x} {y} {z} {i} {f} {D} {M} {I} {E} {J} {F} {-S}

x - family

y - memory management / protection unit

z - Cache

i - Thumb 16-bit decoder

f - STAG debug

M - fast multiplier

I - Embedded ICE monitor

E - Enhanced instructions (includes TDMI)

J - Jazelle

F - Vector floating point unit

S - Synthesizable version.

Architecture Evolution:-

Introduced in 1985 from original version (1-6).  
 One of most significant change to ISA was introduction of Thumb instruction set in ARMv4T (the ARM7TDMI Processor).

⑥ ARM Processor Families:-

Families are based on ARM7; ARM7T; ARM-10; ARM-11 etc.

7, 9, 10, 11 indicates core design

Revision history:-RevisionExample Core ImplementationISA Enhancement:-

① ARMv1

ARM 1

First ARM Processor  
26-bit addressing

② ARM v2

ARM 2

32-bit multipliex  
32-bit Co-processor support

③ ARM v2A

ARM 3

on-chip Cache  
Atomic swap instruction

④ ARMv3

ARM 6 and ARM7TDI

32-bit addressing  
Separate spsr & cpsr  
New modes - undefined instruction and abort  
MMU support - virtual memory

⑤ ARMv3M

ARM7M

Signed and unsigned long multiply instructions

⑥ ARMv4

Strong ARM

Load-store instructions for signed and unsigned  
halfword/bytes  
new mode - system  
Reserve SWI space for architecturally  
defined operations  
26-bit addressing mode no longer supported

⑦ ARMv4T

ARM7TDMI &amp; ARM9T

Thumb

Superset of ARMv4T  
extra instructions added for changing state  
between ARM and Thumb  
Enhanced multiply instructions  
extra DSP-type instructions  
faster multiply accumulate

⑨ ARMv5TEJ

ARM17EJ &amp; ARM1926EJ

Java Acceleration

Improved multiprocessor instruction  
unsigned and mixed endian data handling  
new multimedia instructions

ARM 11

## Description of CPSR:-

(24)

| Registers | Bits | Architectures | Description    |
|-----------|------|---------------|----------------|
| Mode      | 4:0  | all           | Processor mode |
| R         | 5    | ARMV4T        | Thumb State    |
| S         | 6    | all           | interrupt mask |
| SQZF      | 24   | ARM V5TEJ     | Jazelle State  |
| J         | 27   | ARM V5TE      | Condition flag |
| Q         | 28   | all           | "              |
| V         | 29   | all           | "              |
| C         | 30   | all           | "              |
| Z         | 31   | all           | "              |
| N         |      |               |                |

## ① ARM family attribute comparison:-

|                  | ARM 7       | ARM 9                  | ARM 10                | ARM 11                 |
|------------------|-------------|------------------------|-----------------------|------------------------|
| ① Pipeline depth | 3-stage     | 5-stage                | 6-stage               | 8-stage                |
| ② Typical MHz    | 80          | 190                    | 260                   | 335                    |
| ③ mW/MHz         | 0.06mW/MHz  | 0.19mW/MHz<br>(+Cache) | 0.5mW/MHz<br>(+Cache) | 0.26mW/MHz<br>(+cache) |
| ④ MIPS/MHz       | 0.97        | 1.1                    | 1.3                   | 1.2                    |
| ⑤ Architecture   | Von-Neumann | Harvard                | Harvard               | Harvard                |
| ⑥ Multiplier     | 8x32        | 8x32                   | 16x32                 | 16x32                  |

## ARM7 family:-

→ Core has Von-Neumann style architecture, with both data & instructions use the same bus.

→ Core has 3-stages pipeline & executes the architecture ARMv4T instruction set.

fetch → one significant variation in ARM7 family is ARM7TDMI-S.

decode → the ARM7TDMI-S has same operating characteristics as standard ARM7TDMI.

execute → ARM720T is most flexible member of ARM7 family bcoz it includes MMU.

→ The presence MMU means the ARM720T is capable of handling the Linux & Microsoft Embedded Platform Operating Systems, which includes 8K Cache.

→ Virtualizable can be selected to a higher address by setting a Co-Processor 15-Register.

→ Another variation is ARM7EJ-S Processor also synthesizable.

→ ARM7EJ-S is quite different since it includes a five-stage pipeline and executes ARMv5TEJ instructions.

→ This version of ARM7 is the only one that provides both Java acceleration and Enhanced instructions but without any memory protection.

| CPU Core   | MMU/MPU | Cache                                  | Satellite | Thumb | ISA   | E <sup>a</sup> |
|------------|---------|----------------------------------------|-----------|-------|-------|----------------|
| ARM7TDMI   | none    | none                                   | no        | Yes   | V4T   | no             |
| ARM7EJ-S   | none    | none                                   | yes       | Yes   | V5TEJ | yes            |
| ARM720T    | MMU     | Unified - 8K Cache                     | no        | Yes   | V4T   | no             |
| ARM920T    | MMU     | Separate - 16K/16KD+1 Cache            | no        | Yes   | V4T   | no             |
| ARM922T    | MMU     | Separate 8K/8K D+1 Cache               | no        | Yes   | V4T   | no             |
| ARM926EJ-S | MMU     | Separate Cache & TCMs configurable     | yes       | Yes   | V5TEJ | yes            |
| ARM940T    | MPU     | Separate - 4K/4K D+1 Cache             | no        | Yes   | V4T   | no             |
| ARM946E-S  | MPU     | Separate - Cache and TCMs configurable | no        | Yes   | V5TE  | yes            |

|             |              |                                           |     |     |      |     |      |
|-------------|--------------|-------------------------------------------|-----|-----|------|-----|------|
| M966E-S     | none         | Separate - ICMs<br>Configurable           | no  | Yes | V5TE | Yes | (26) |
| ARM1020 E   | MMU          | Separate - 32K/32K<br>D+I<br>Cache        | no  | Yes | V5TE | Yes |      |
| ARM1022E    | mmu          | Separate - 16K/16K<br>D+I<br>Cache        | no  | Yes | V5TE | Yes |      |
| ARM1026EJ-S | mmu &<br>MPU | Separate - Cache<br>and ICMs Configurable | Yes | Yes | V5TE | Yes |      |
| ARM1136J-S  | mmu          | Separate - Cache<br>and ICMs Configurable | Yes | Yes | V6   | Yes |      |
| ARM1136JFS  | mmu          | Separate Cache &<br>ICMs Configurable     | Yes | Yes | V6   | Yes |      |

- (15) ARM-9 Family:- Announced in 1997. ARM-9 Processor can run at higher clock frequencies than ARM7. Extra stages improve performance of processor.
- The memory system has been redesigned to follow Harvard Architecture which separates data D & instructions I bus.
- 1st Processor in ARM9 family was ARM920T which includes separate D+I Cache and an MMU. This Processor can be used by OS running virtual memory support.
- ARM920T is a variation on the ARM920T but with half the D+I Cache size.
- ARM940T includes a smaller D+I Cache and a MPU. → ARM940T is designed for application not for OS.
- Both ARM920T and ARM940T execute the architecture V4T instruction.

ARM9E-S: This core is synthesizable version of ARM9 with Extension E. There are 2 variants both execute architecture of VSTE. (27)

ARM946E-S

ARM966E-S

Instructions

which also support optional Embedded trace macrocell (ETM)

→ ARM946E-S includes ICM (rightly buffered memory), cache, and an MMU, which this processor is designed for use in Embedded Control applications. that provide deterministic real-time response; ARM966E does not have MMU.

→ ARM926EJ-S is latest core in ARM9 Product line is announced in 2000, which was designed for use in small portable Java-enabled devices such as 3G phones and personal digital assistants (PDAs)

✓ → ARM926EJ-S is 1<sup>st</sup> ARM Processor core to include the Jazelle technology accelerates Java byte code execution. Which features MMU, D+I Cache, ICMs, and non-zero wait states memory.

## 16) ARM10 Family :- announced in 1999

It extends the ARM9 Pipeline to 6-stages Supports optional  
vector floating point (VFP) unit, which adds 7-stages to ARM10 pipeline

→ The VFP increases floating point performance and is compliant with the IEEE 754-1985 floating point standard.

→ ARM1020E is 1<sup>st</sup> Processor to use an ARM10E core. like the ARM9E includes E-Enhanced instructions. It has separate 32K D+I caches; optional vector floating point unit and an MMU.

↳ It also has 64-bit dual bus interface

→ ARM1026EJ-S is very similar to ARM926EJ-S but with both MMU and MMU. This Processor has the performance of ARM10 with flexibility of ARM926EJ-S.

## ARM11 family :-

(28)

→ ARM1136J-S, announced in 2003 designed for high performance and power efficient applications.

→ It is 1<sup>st</sup> Processor implementation to execute architecture ARMv6 instructions.

→ It incorporates 8-stages pipeline with separate load file and arithmetic pipelines included. In ARMv6 instructions are single instructions, multiple data (SIMD) extensions for media processing specifically designed to increase video processing performance.

The ARM1136JF-S is an ARM1136J-S with addition of vector floating point unit for fast floating point operations.

## ARM7 EJ-S (5-stage)

Instruction address  
ARM State  
PC

PC-4

PC-8

PC-12

PC-16

Thumb State  
PC

PC-2

PC-4

PC-6

PC-8



## 6-Stage Jazelle Pipeline

insts fetched from memory

decoding of Java bytecode to opcodes

Registers used in instruction are decoded

Shift and ALU operation

Data access to Java memory

write back to reg bank

## Hardware Extensions :-

(29)

Hardware Extensions placed next to ARM Core improve performance, manage resources; provide extra functionality.

### 3 - Hardware extensions

- ↳ Cache
- ↳ tightly coupled memory (Tcm)
- ↳ memory management (mmu)
- ↳ co-processor interface.

① Cache & Tightly Coupled Memory :-

The Cache is a block of fast memory placed b/w main memory & core.



( Von-Neumann Architecture  
with Cache ).

ARM has 2 forms of Cache. First is found attached to Von-Neumann style cores. It combines both data and instruction into a single unified Cache.



( Harvard Architecture )

Second form is Harvard-Architecture Separates Cache & data; Instruction

Cache Provides an overall increase in Performance but  
Expense of Predictable Execution.

→ But in real time system it is paramount that Code execution  
is deterministic the time taken for loading and storing instructions (8)  
data must be predictable This is achieved using a form of  
memory called ICM (ightly Coupled Memory).

→ ICM is fast SRAM located close to Core guarantees the  
clock cycles required to fetch instructions (8) data - critical for  
real time algorithms requiring deterministic behaviours.



fig:- Combining of both Technologies

Memory Management:- ARM Core have 3 different types of mm.  
① No Protection; a memory protection unit (MPU)

② Providing limited protection and a memory management unit (MMU)  
③ Providing full protection.

Non Protected Memory:- is fixed and provides very little  
flexibility used for small, simple CS that require no protection.

- Memory Protection Unit :- employ simple system that we limited no. of memory regions which are controlled by set of special co-processor registers, and each region is defined with specific access permission.
- Memory Management Unit (MMU) :- are most comprehensive memory management hardware available on the ARM
  - It uses set of translation tables to provide fine grained control over memory. These tables are stored over main memory and provide a virtual to physical address map as well as access permissions.