



# + Unit 2

## The Central Processing Unit

# RoadMap

## Chapter

- Machine instruction characteristics
  - Elements of a machine instruction
  - Instruction representation
  - Instruction types
  - Number of addresses
  - Instruction set design
- Types of operands
  - Numbers
  - Characters
  - Logical data

## Instruction Sets: Characteristics and Functions

- Intel x86 and ARM data types
- Types of operations
  - Data transfer
  - Arithmetic
  - Logical
  - Conversion
  - Input/output
  - System control
  - Transfer of control
- Intel x86 and ARM operation types

# 1. Machine Instruction Characteristics



- The operation of the processor is determined by the instructions it executes, referred to as machine instructions or *computer instructions*
- The collection of different instructions that the processor can execute is referred to as the processor's *instruction set*
- Each instruction must contain the information required by the processor for execution



# Elements of a Machine Instruction

ADD

## Operation code (opcode)

- Specifies the operation to be performed. The operation is specified by a binary code, known as the operation code, or opcode

## Source operand reference

- The operation may involve one or more source operands, that is, operands that are inputs for the operation

## Result operand reference

- The operation may produce a result

## Next instruction reference

- This tells the processor where to fetch the next instruction after the execution of this instruction is complete



**Figure 12.1 Instruction Cycle State Diagram**

# Source and result operands can be in one of four areas:

## 1) Main or virtual memory

- As with next instruction references, the main or virtual memory address must be supplied

## 2) I/O device

- The instruction must specify the I/O module and device for the operation. If memory-mapped I/O is used, this is just another main or virtual memory address

## 3) Processor register

- A processor contains one or more registers that may be referenced by machine instructions.
- If more than one register exists each register is assigned a unique name or number and the instruction must contain the number of the desired register

## 4) Immediate

- The value of the operand is contained in a field in the instruction being executed

# Instruction Representation

- Within the computer each instruction is represented by a sequence of bits
- The instruction is divided into fields, corresponding to the constituent elements of the instruction



Figure 12.2 A Simple Instruction Format



# Instruction Representation

- Opcodes are represented by abbreviations called *mnemonics*
- Examples include:
  - ADD Add
  - SUB Subtract
  - MUL Multiply
  - DIV Divide
  - LOAD Load data from memory
  - STOR Store data to memory
- Operands are also represented symbolically
- Each symbolic opcode has a fixed binary representation
  - The programmer specifies the location of each symbolic operand



# Instruction Types

- Arithmetic instructions provide computational capabilities for processing numeric data
- Logic (Boolean) instructions operate on the bits of a word as bits rather than as numbers, thus they provide capabilities for processing any other type of data the user may wish to employ



- Movement of data into or out of register and or memory locations

| <u>Instruction</u> | <u>Comment</u>            |
|--------------------|---------------------------|
| SUB Y, A, B        | $Y \leftarrow A - B$      |
| MPY T, D, E        | $T \leftarrow D \times E$ |
| ADD T, T, C        | $T \leftarrow T + C$      |
| DIV Y, Y, T        | $Y \leftarrow Y \div T$   |

(a) Three-address instructions

| <u>Instruction</u> | <u>Comment</u>            |
|--------------------|---------------------------|
| MOVE Y, A          | $Y \leftarrow A$          |
| SUB Y, B           | $Y \leftarrow Y - B$      |
| MOVE T, D          | $T \leftarrow D$          |
| MPY T, E           | $T \leftarrow T \times E$ |
| ADD T, C           | $T \leftarrow T + C$      |
| DIV Y, T           | $Y \leftarrow Y \div T$   |

(b) Two-address instructions

| <u>Instruction</u> | <u>Comment</u>              |
|--------------------|-----------------------------|
| LOAD D             | $AC \leftarrow D$           |
| MPY E              | $AC \leftarrow AC \times E$ |
| ADD C              | $AC \leftarrow AC + C$      |
| STOR Y             | $Y \leftarrow AC$           |
| LOAD A             | $AC \leftarrow A$           |
| SUB B              | $AC \leftarrow AC - B$      |
| DIV Y              | $AC \leftarrow AC \div Y$   |
| STOR Y             | $Y \leftarrow AC$           |

(c) One-address instructions

**Figure 12.3 Programs to Execute  $Y = \frac{A - B}{C + (D \times E)}$**

# Table 12.1

## Utilization of Instruction Addresses (Nonbranching Instructions)

| Number of Addresses | Symbolic Representation | Interpretation              |
|---------------------|-------------------------|-----------------------------|
| 3                   | OP A, B, C              | A $\leftarrow$ B OP C       |
| 2                   | OP A, B                 | A $\equiv$ A OP B           |
| 1                   | OP A                    | AC $\leftarrow$ AC OP A     |
| 0                   | OP                      | T $\leftarrow$ (T - 1) OP T |

AC = accumulator

T = top of stack

(T - 1) = second element of stack

A, B, C = memory or register locations

## 2. Types of Operands



# + Numbers

- All machine languages include numeric data types
- Numbers stored in a computer are limited:
  - Limit to the magnitude of numbers representable on a machine
  - In the case of floating-point numbers, a limit to their precision
- Three types of numerical data are common in computers:
  - Binary integer or binary fixed point
  - Binary floating point
  - Decimal
- Packed decimal
  - Each decimal digit is represented by a 4-bit code with two digits stored per byte
  - To form numbers 4-bit codes are strung together, usually in multiples of 8 bits





# Characters

- A common form of data is text or character strings
- Textual data in character form cannot be easily stored or transmitted by data processing and communications systems because they are designed for binary data
- Most commonly used character code is the International Reference Alphabet (IRA)
  - Referred to in the United States as the American Standard Code for Information Interchange (ASCII)
- Another code used to encode characters is the Extended Binary Coded Decimal Interchange Code (EBCDIC)
  - EBCDIC is used on IBM mainframes



# Logical Data

- An  $n$ -bit unit consisting of  $n$  1-bit items of data, each item having the value 0 or 1
- Two advantages to bit-oriented view:
  - Memory can be used most efficiently for storing an array of Boolean or binary data items in which each item can take on only the values 1 (true) and 0 (false)
  - To manipulate the bits of a data item
    - If floating-point operations are implemented in software, we need to be able to shift significant bits in some operations
    - To convert from IRA to packed decimal, we need to extract the rightmost 4 bits of each byte

## Table 12.2

# x86 Data Types

| Data Type                                       | Description                                                                                                                                                                                                                                         |
|-------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| General                                         | Byte, word (16 bits), doubleword (32 bits), quadword (64 bits), and double quadword (128 bits) locations with arbitrary binary contents.                                                                                                            |
| Integer                                         | A signed binary value contained in a byte, word, or doubleword, using two's complement representation.                                                                                                                                              |
| Ordinal                                         | An unsigned integer contained in a byte, word, or doubleword.                                                                                                                                                                                       |
| Unpacked binary coded decimal (BCD)             | A representation of a BCD digit in the range 0 through 9, with one digit in each byte.                                                                                                                                                              |
| Packed BCD                                      | Packed byte representation of two BCD digits; value in the range 0 to 99.                                                                                                                                                                           |
| Near pointer                                    | A 16-bit, 32-bit, or 64-bit effective address that represents the offset within a segment. Used for all pointers in a nonsegmented memory and for references within a segment in a segmented memory.                                                |
| Far pointer                                     | A logical address consisting of a 16-bit segment selector and an offset of 16, 32, or 64 bits. Far pointers are used for memory references in a segmented memory model where the identity of a segment being accessed must be specified explicitly. |
| Bit field                                       | A contiguous sequence of bits in which the position of each bit is considered as an independent unit. A bit string can begin at any bit position of any byte and can contain up to 32 bits.                                                         |
| Bit string                                      | A contiguous sequence of bits, containing from zero to $2^{32} - 1$ bits.                                                                                                                                                                           |
| Byte string                                     | A contiguous sequence of bytes, words, or doublewords, containing from zero to $2^{32} - 1$ bytes.                                                                                                                                                  |
| Floating point                                  | See Figure 12.4.                                                                                                                                                                                                                                    |
| Packed SIMD (single instruction, multiple data) | Packed 64-bit and 128-bit data types                                                                                                                                                                                                                |

$0 - 2^k - 1$



**Figure 12.4 x86 Numeric Data Formats**

# Single-Instruction-Multiple-Data (SIMD) Data Types



- Introduced to the x86 architecture as part of the extensions of the instruction set to optimize performance of multimedia applications
- These extensions include MMX (multimedia extensions) and SSE (streaming SIMD extensions)
- Data types:
  - Packed byte and packed byte integer
  - Packed word and packed word integer
  - Packed doubleword and packed doubleword integer
  - Packed quadword and packed quadword integer
  - Packed single-precision floating-point and packed double-precision floating-point

# ARM Data Types



ARM processors support data types of:

- 8 (byte)
- 16 (halfword)
- 32 (word) bits in length

All three data types can also be used for two's complement signed integers

For all three data types an unsigned interpretation is supported in which the value represents an unsigned, nonnegative integer



## Alignment checking

- When the appropriate control bit is set, a data abort signal indicates an alignment fault for attempting unaligned access

## Unaligned access

- When this option is enabled, the processor uses one or more memory accesses to generate the required transfer of adjacent bytes transparently to the programmer



**Figure 12.5 ARM Endian Support - Word Load/Store with E-bit**

# 3. Type Of Operations

## Common Instruction Set Operations (page 1 of 2)

| Type          | Operation Name        | Description                                                                                            |
|---------------|-----------------------|--------------------------------------------------------------------------------------------------------|
| Data Transfer | Move (transfer)       | Transfer word or block from source to destination                                                      |
|               | Store                 | Transfer word from processor to memory                                                                 |
|               | Load (fetch)          | Transfer word from memory to processor                                                                 |
|               | Exchange              | Swap contents of source and destination                                                                |
|               | Clear (reset)         | Transfer word of 0s to destination                                                                     |
|               | Set                   | Transfer word of 1s to destination                                                                     |
|               | Push                  | Transfer word from source to top of stack                                                              |
|               | Pop                   | Transfer word from top of stack to destination                                                         |
| Arithmetic    | Add                   | Compute sum of two operands                                                                            |
|               | Subtract              | Compute difference of two operands                                                                     |
|               | Multiply              | Compute product of two operands                                                                        |
|               | Divide                | Compute quotient of two operands                                                                       |
|               | Absolute              | Replace operand by its absolute value                                                                  |
|               | Negate                | Change sign of operand                                                                                 |
|               | Increment             | Add 1 to operand                                                                                       |
|               | Decrement             | Subtract 1 from operand                                                                                |
| Logical       | AND                   | Perform logical AND                                                                                    |
|               | OR                    | Perform logical OR                                                                                     |
|               | NOT (complement)      | Perform logical NOT                                                                                    |
|               | Exclusive-OR          | Perform logical XOR                                                                                    |
|               | Test                  | Test specified condition; set flag(s) based on outcome                                                 |
|               | Compare               | Make logical or arithmetic comparison of two or more operands; set flag(s) based on outcome            |
|               | Set Control Variables | Class of instructions to set controls for protection purposes, interrupt handling, timer control, etc. |
|               | Shift                 | Left (right) shift operand, introducing constants at end                                               |
|               | Rotate                | Left (right) shift operand, with wraparound end                                                        |

(Table can be found on page 426 in textbook.)



| Type                | Operation Name     | Description                                                                                               |
|---------------------|--------------------|-----------------------------------------------------------------------------------------------------------|
| Transfer of Control | Jump (branch)      | Unconditional transfer; load PC with specified address                                                    |
|                     | Jump Conditional   | Test specified condition; either load PC with specified address or do nothing, based on condition         |
|                     | Jump to Subroutine | Place current program control information in known location; jump to specified address                    |
|                     | Return             | Replace contents of PC and other register from known location                                             |
|                     | Execute            | Fetch operand from specified location and execute as instruction; do not modify PC                        |
|                     | Skip               | Increment PC to skip next instruction                                                                     |
|                     | Skip Conditional   | Test specified condition; either skip or do nothing based on condition                                    |
|                     | Halt               | Stop program execution                                                                                    |
|                     | Wait (hold)        | Stop program execution; test specified condition repeatedly; resume execution when condition is satisfied |
|                     | No operation       | No operation is performed, but program execution is continued                                             |
| Input/Output        | Input (read)       | Transfer data from specified I/O port or device to destination (e.g., main memory or processor register)  |
|                     | Output (write)     | Transfer data from specified source to I/O port or device                                                 |
|                     | Start I/O          | Transfer instructions to I/O processor to initiate I/O operation                                          |
|                     | Test I/O           | Transfer status information from I/O system to specified destination                                      |
| Conversion          | Translate          | Translate values in a section of memory based on a table of correspondences                               |
|                     | Convert            | Convert the contents of a word from one form to another (e.g., packed decimal to binary)                  |

## Table 12.3

# Common Instruction Set Operations (page 2 of 2)

(Table can be found on page 426 in textbook.)

# Table 12.4

## Processor Actions for Various Types of Operations

|                     |                                                                                                                                                                                                                               |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data Transfer       | Transfer data from one location to another                                                                                                                                                                                    |
|                     | If memory is involved: <ul style="list-style-type: none"><li>Determine memory address</li><li>Perform <u>virtual</u>-to-actual-memory address transformation</li><li>Check cache</li><li>Initiate memory read/write</li></ul> |
|                     | May involve data transfer, before and/or after                                                                                                                                                                                |
|                     | Perform function in ALU                                                                                                                                                                                                       |
|                     | Set condition codes and flags                                                                                                                                                                                                 |
| Logical             | Same as arithmetic                                                                                                                                                                                                            |
| Conversion          | Similar to arithmetic and logical. May involve special logic to perform conversion                                                                                                                                            |
| Transfer of Control | Update program counter. For subroutine call/return, manage parameter passing and linkage                                                                                                                                      |
| I/O                 | Issue command to I/O module                                                                                                                                                                                                   |
|                     | If memory-mapped I/O, determine memory-mapped address                                                                                                                                                                         |

(Table can be found on page 427 in textbook.)

# Data Transfer

Most fundamental type of machine instruction



Must specify:

- Location of the source and destination operands
- The length of data to be transferred must be indicated
- The mode of addressing for each operand must be specified

# Table 12.5

## Examples of IBM EAS/390 Data Transfer Operations

| Operation Mnemonic | Name            | Number of Bits Transferred | Description                                                      |
|--------------------|-----------------|----------------------------|------------------------------------------------------------------|
| L                  | Load            | 32                         | Transfer from memory to register                                 |
| LH                 | Load Halfword   | 16                         | Transfer from memory to register                                 |
| LR                 | Load            | 32                         | Transfer from register to register                               |
| LER                | Load (Short)    | 32                         | Transfer from floating-point register to floating-point register |
| LE                 | Load (Short)    | 32                         | Transfer from memory to floating-point register                  |
| LDR                | Load (Long)     | 64                         | Transfer from floating-point register to floating-point register |
| LD                 | Load (Long)     | 64                         | Transfer from memory to floating-point register                  |
| ST                 | Store           | 32                         | Transfer from register to memory                                 |
| STH                | Store Halfword  | 16                         | Transfer from register to memory                                 |
| STC                | Store Character | 8                          | Transfer from register to memory                                 |
| STE                | Store (Short)   | 32                         | Transfer from floating-point register to memory                  |
| STD                | Store (Long)    | 64                         | Transfer from floating-point register to memory                  |



- Most machines provide the basic arithmetic operations of add, subtract, multiply, and divide
- These are provided for signed integer (fixed-point) numbers
- Often they are also provided for floating-point and packed decimal numbers
- Other possible operations include a variety of single-operand instructions:
  - Absolute
    - Take the absolute value of the operand
  - Negate
    - Negate the operand
  - Increment
    - Add 1 to the operand
  - Decrement
    - Subtract 1 from the operand



## Arithmetic

## Table 12.6

### Basic Logical Operations

| P | Q | NOT P | P AND Q | P OR Q | P XOR Q | P=Q |
|---|---|-------|---------|--------|---------|-----|
| 1 | 0 | 1     | 0       | 0      | 0       | 1   |
| 2 | 1 | 1     | 0       | 1      | 1       | 0   |
| 3 | 0 | 0     | 0       | 1      | 1       | 0   |
| 4 | 1 | 0     | 1       | 1      | 0       | 1   |

Left Right



0 0 0 | 0 0 0 0



0 0 | 1 0 0 0 0



| 1 | 0 0 0



| 0 | 1 0 0 0 0



| 1 0 0 | 0



| 0 1 0 | 0

**Figure 12.6 Shift and Rotate Operations**

## Table 12.7

# Examples of Shift and Rotate Operations

| Input    | Operation                       | Result   |
|----------|---------------------------------|----------|
| 10100110 | Logical right shift (3 bits)    | 00010100 |
| 10100110 | Logical left shift (3 bits)     | 00110000 |
| 10100110 | Arithmetic right shift (3 bits) | 11110100 |
| 10100110 | Arithmetic left shift (3 bits)  | 10110000 |
| 10100110 | Right rotate (3 bits)           | 11010100 |
| 10100110 | Left rotate (3 bits)            | 00110101 |



Instructions that  
change the  
format or  
operate on the  
format of data



An example  
is converting  
from  
decimal to  
binary

# Conversion



An example of a  
more complex  
editing  
instruction is the  
EAS/390  
Translate (TR)  
instruction



# Input/Output



- Variety of approaches taken:
  - Isolated programmed I/O
  - Memory-mapped programmed I/O
  - DMA
  - Use of an I/O processor
- Many implementations provide only a few I/O instructions, with the specific actions specified by parameters, codes, or command words



# System Control

Instructions that can be executed only while the processor is in a certain privileged state or is executing a program in a special privileged area of memory

Typically these instructions are reserved for the use of the operating system

Examples of system control operations:

A system control instruction may read or alter a control register

An instruction to read or modify a storage protection key

Access to process control blocks in a multiprogramming system



# Transfer of Control

- Reasons why transfer-of-control operations are required:
  - It is essential to be able to execute each instruction more than once
  - Virtually all programs involve some decision making
  - It helps if there are mechanisms for breaking the task up into smaller pieces that can be worked on one at a time
- Most common transfer-of-control operations found in instruction sets:
  - Branch
  - Skip
  - Procedure call



**Figure 12.7 Branch Instructions**

# Skip Instructions

Includes an implied address

Typically implies that one instruction be skipped, thus the implied address equals the address of the next instruction plus one instruction length

Because the skip instruction does not require a destination address field it is free to do other things

Example is the increment-and-skip-if-zero (ISZ) instruction



# Procedure Call Instructions

- Self-contained computer program that is incorporated into a larger program
  - At any point in the program the procedure may be invoked, or *called*
  - Processor is instructed to go and execute the entire procedure and then return to the point from which the call took place
- Two principal reasons for use of procedures:
  - Economy
    - A procedure allows the same piece of code to be used many times
  - Modularity
- Involves two basic instructions:
  - A call instruction that branches from the present location to the procedure
  - Return instruction that returns from the procedure to the place from which it was called



(a) Calls and returns



(b) Execution sequence

**Figure 12.8 Nested Procedures**



**Figure 12.9 Use of Stack to Implement Nested Procedures of Figure 12.8**



(a) P is active



(b) P has called Q

**Figure 12.10 Stack Frame Growth Using Sample Procedures P and Q**

# x86 Operation Types



- The x86 provides a complex array of operation types including a number of specialized instructions
- The intent was to provide tools for the compiler writer to produce optimized machine language translation of high-level language programs
- Provides four instructions to support procedure call/return:
  - CALL
  - ENTER
  - LEAVE
  - RETURN
- When a new procedure is called the following must be performed upon entry to the new procedure:
  - Push the return point on the stack
  - Push the current frame pointer on the stack
  - Copy the stack pointer as the new value of the frame pointer
  - Adjust the stack pointer to allocate a frame

# Table 12.8

## x86 Status Flags

| Status Bit | Name            | Description                                                                                                                                                    |
|------------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CF         | Carry           | Indicates carrying or borrowing out of the left-most bit position following an arithmetic operation. Also modified by some of the shift and rotate operations. |
| PF         | Parity          | Parity of the least-significant byte of the result of an arithmetic or logic operation. 1 indicates even parity; 0 indicates odd parity.                       |
| AF         | Auxiliary Carry | Represents carrying or borrowing between half-bytes of an 8-bit arithmetic or logic operation. Used in binary-coded decimal arithmetic.                        |
| ZF         | Zero            | Indicates that the result of an arithmetic or logic operation is 0.                                                                                            |
| SF         | Sign            | Indicates the sign of the result of an arithmetic or logic operation.                                                                                          |
| OF         | Overflow        | Indicates an arithmetic overflow after an addition or subtraction for two's complement arithmetic.                                                             |

R

## Table 12.9

x86

# Condition Codes for Conditional Jump and SETcc Instructions

| Symbol     | Condition Tested                                | Comment                                                                |
|------------|-------------------------------------------------|------------------------------------------------------------------------|
| A, NBE     | CF=0 AND ZF=0                                   | Above; Not below or equal (greater than, unsigned)                     |
| AE, NB, NC | CF=0                                            | Above or equal; Not below (greater than or equal, unsigned); Not carry |
| B, NAE, C  | CF=1                                            | Below; Not above or equal (less than, unsigned); Carry set             |
| BE, NA     | CF=1 OR ZF=1                                    | Below or equal; Not above (less than or equal, unsigned)               |
| E, Z       | ZF=1                                            | Equal; Zero (signed or unsigned)                                       |
| G, NLE     | [(SF=1 AND OF=1) OR (SF=0 and OF=0)] AND [ZF=0] | Greater than; Not less than or equal (signed)                          |
| GE, NL     | (SF=1 AND OF=1) OR (SF=0 AND OF=0)              | Greater than or equal; Not less than (signed)                          |
| L, NGE     | (SF=1 AND OF=0) OR (SF=0 AND OF=1)              | Less than; Not greater than or equal (signed)                          |
| LE, NG     | (SF=1 AND OF=0) OR (SF=0 AND OF=1) OR (ZF=1)    | Less than or equal; Not greater than (signed)                          |
| NE, NZ     | ZF=0                                            | Not equal; Not zero (signed or unsigned)                               |
| NO         | OF=0                                            | No overflow                                                            |
| NS         | SF=0                                            | Not sign (not negative)                                                |
| NP, PO     | PF=0                                            | Not parity; Parity odd                                                 |
| O          | OF=1                                            | Overflow                                                               |
| P          | PF=1                                            | Parity; Parity even                                                    |
| S          | SF=1                                            | Sign (negative)                                                        |

(Table can be found on page 440 in the textbook.)

R

Table 12.10

# MMX Instruction Set

| Category      | Instruction          | Description                                                                                                                   |
|---------------|----------------------|-------------------------------------------------------------------------------------------------------------------------------|
| Arithmetic    | PADD [B, W, D]       | Parallel add of packed eight bytes, four 16-bit words, or two 32-bit doublewords, with wraparound.                            |
|               | PADDS [B, W]         | Add with saturation.                                                                                                          |
|               | PADDUS [B, W]        | Add unsigned with saturation                                                                                                  |
|               | PSUB [B, W, D]       | Subtract with wraparound.                                                                                                     |
|               | PSUBS [B, W]         | Subtract with saturation.                                                                                                     |
|               | PSUBUS [B, W]        | Subtract unsigned with saturation                                                                                             |
|               | PMULHW               | Parallel multiply of four signed 16-bit words, with high-order 16 bits of 32-bit result chosen.                               |
|               | PMULLW               | Parallel multiply of four signed 16-bit words, with low-order 16 bits of 32-bit result chosen.                                |
| Comparison    | PMADDWD              | Parallel multiply of four signed 16-bit words; add together adjacent pairs of 32-bit results.                                 |
|               | PCMPEQ [B, W, D]     | Parallel compare for equality; result is mask of 1s if true or 0s if false.                                                   |
|               | PCMPGT [B, W, D]     | Parallel compare for greater than; result is mask of 1s if true or 0s if false.                                               |
|               | PACKUSWB             | Pack words into bytes with unsigned saturation.                                                                               |
| Conversion    | PACKSS [WB, DW]      | Pack words into bytes, or doublewords into words, with signed saturation.                                                     |
|               | PUNPCKH [BW, WD, DQ] | Parallel unpack (interleaved merge) high-order bytes, words, or doublewords from MMX register.                                |
|               | PUNPCKL [BW, WD, DQ] | Parallel unpack (interleaved merge) low-order bytes, words, or doublewords from MMX register.                                 |
| Logical       | PAND                 | 64-bit bitwise logical AND                                                                                                    |
|               | PNDN                 | 64-bit bitwise logical AND NOT                                                                                                |
|               | POR                  | 64-bit bitwise logical OR                                                                                                     |
|               | PXOR                 | 64-bit bitwise logical XOR                                                                                                    |
| Shift         | PSLL [W, D, Q]       | Parallel logical left shift of packed words, doublewords, or quadword by amount specified in MMX register or immediate value. |
|               | PSRL [W, D, Q]       | Parallel logical right shift of packed words, doublewords, or quadword.                                                       |
|               | PSRA [W, D]          | Parallel arithmetic right shift of packed words, doublewords, or quadword.                                                    |
|               | MOV [D, Q]           | Move doubleword or quadword to/from MMX register.                                                                             |
| Data Transfer | EMMS                 | Empty MMX state (empty FP registers tag bits).                                                                                |
| State Mgt     |                      |                                                                                                                               |

(Table can be found on page  
442 in the textbook.)

Note: If an instruction supports multiple data types [byte (B), word (W), doubleword (D), quadword (Q)], the data types are indicated in brackets.

# x86 Single-Instruction, Multiple-Data (SIMD) Instructions

R

- 1996 Intel introduced MMX technology into its Pentium product line
  - MMX is a set of highly optimized instructions for multimedia tasks
- Video and audio data are typically composed of large arrays of small data types
- Three new data types are defined in MMX
  - Packed byte
  - Packed word
  - Packed doubleword
- Each data type is 64 bits in length and consists of multiple smaller data fields, each of which holds a fixed-point integer



MMX code sequence performing this operation:

|          |              |                                              |
|----------|--------------|----------------------------------------------|
| pxor     | mm7, mm7     | ;zero out mm7                                |
| movq     | mm3, fad_val | ;load fade value replicated 4 times          |
| movd     | mm0, imageA  | ;load 4 red pixel components from image A    |
| movd     | mm1, imageB  | ;load 4 red pixel components from image B    |
| punpkblw | mm0, mm7     | ;unpack 4 pixels to 16 bits                  |
| punpkblw | mm1, mm7     | ;unpack 4 pixels to 16 bits                  |
| psubw    | mm0, mm1     | ;subtract image B from image A               |
| pmulhw   | mm0, mm3     | ;multiply the subtract result by fade values |
| paddw    | mm0, mm1     | ;add result to image B                       |
| packuswb | mm0, mm7     | ;pack 16-bit results back to bytes           |

**Figure 12.11 Image Compositing on Color Plane Representation**

# ARM Operation Types



Load and store  
instructions

Branch  
instructions

Data-processing  
instructions

Multiply  
instructions

Parallel addition  
and subtraction  
instructions

Extend  
instructions

Status register  
access  
instructions

R

Table 12.11

# ARM Conditions for Conditional Instruction Execution

| Code | Symbol | Condition Tested                                                                        | Comment                                               |
|------|--------|-----------------------------------------------------------------------------------------|-------------------------------------------------------|
| 0000 | EQ     | $Z = 1$                                                                                 | Equal                                                 |
| 0001 | NE     | $Z = 0$                                                                                 | Not equal                                             |
| 0010 | CS/HS  | $C = 1$                                                                                 | Carry set/unsigned higher or same                     |
| 0011 | CC/LO  | $C = 0$                                                                                 | Carry clear/unsigned lower                            |
| 0100 | MI     | $N = 1$                                                                                 | Minus/negative                                        |
| 0101 | PL     | $N = 0$                                                                                 | Plus/positive or zero                                 |
| 0110 | VS     | $V = 1$                                                                                 | Overflow                                              |
| 0111 | VC     | $V = 0$                                                                                 | No overflow                                           |
| 1000 | HI     | $C = 1 \text{ AND } Z = 0$                                                              | Unsigned higher                                       |
| 1001 | LS     | $C = 0 \text{ OR } Z = 1$                                                               | Unsigned lower or same                                |
| 1010 | GE     | $N = V$<br>$[(N = 1 \text{ AND } V = 1)$<br>$\text{OR } (N = 0 \text{ AND } V = 0)]$    | Signed greater than or equal                          |
| 1011 | LT     | $N \neq V$<br>$[(N = 1 \text{ AND } V = 0)$<br>$\text{OR } (N = 0 \text{ AND } V = 1)]$ | Signed less than                                      |
| 1100 | GT     | $(Z = 0) \text{ AND } (N = V)$                                                          | Signed greater than                                   |
| 1101 | LE     | $(Z = 1) \text{ OR } (N \neq V)$                                                        | Signed less than or equal                             |
| 1110 | AL     | —                                                                                       | Always (unconditional)                                |
| 1111 | —      | —                                                                                       | This instruction can only be executed unconditionally |

(Table can be found on  
Page 445 in the textbook.)

# RoadMap

## Chapter

- Addressing modes
  - Immediate addressing
  - Direct addressing
  - Indirect addressing
  - Register addressing
  - Register indirect addressing
  - Displacement addressing
  - Stack addressing
- Assembly language

## Instruction Sets: Addressing Modes and Formats

- x86 addressing modes
- ARM addressing modes
- Instruction formats
  - Instruction length
  - Allocation of bits
  - Variable-length instructions
- X86 instruction formats
- ARM instruction formats

# 4. Addressing Modes





**Figure 13.1 Addressing Modes**

# Table 13.1

## Basic Addressing Modes

| Mode              | Algorithm                         | Principal Advantage | Principal Disadvantage     |
|-------------------|-----------------------------------|---------------------|----------------------------|
| Immediate         | $\text{Operand} = A$              | No memory reference | Limited operand magnitude  |
| Direct            | $\text{EA} = A$                   | Simple              | Limited address space      |
| Indirect          | $\text{EA} = (A)$                 | Large address space | Multiple memory references |
| Register          | $\text{EA} = R$                   | No memory reference | Limited address space      |
| Register indirect | $\text{EA} = (R)$                 | Large address space | Extra memory reference     |
| Displacement      | $\text{EA} = A + (R)$             | Flexibility         | Complexity                 |
| Stack             | $\text{EA} = \text{top of stack}$ | No memory reference | Limited applicability      |



# Immediate Addressing

- Simplest form of addressing
- Operand = A
- This mode can be used to define and use constants or set initial values of variables
  - Typically the number will be stored in twos complement form
  - The leftmost bit of the operand field is used as a sign bit
- Advantage:
  - No memory reference other than the instruction fetch is required to obtain the operand, thus saving one memory or cache cycle in the instruction cycle
- Disadvantage:
  - The size of the number is restricted to the size of the address field, which, in most instruction sets, is small compared with the word length

# Direct Addressing

Address field contains the effective address of the operand

Effective address (EA) = address field (A)

Was common in earlier generations of computers

Requires only one memory reference and no special calculation

Limitation is that it provides only a limited address space





# Indirect Addressing

- Reference to the address of a word in memory which contains a full-length address of the operand

$$\underline{\text{EA}} = \underline{(A)} \text{ Op}$$

- Parentheses are to be interpreted as meaning *contents of*

- Advantage:

- For a word length of  $N$  an address space of  $2^N$  is now available

*bit*

$$8 \rightarrow 2^8 = 256$$

- Disadvantage:

- Instruction execution requires two memory references to fetch the operand
  - One to get its address and a second to get its value

- A rarely used variant of indirect addressing is multilevel or cascaded indirect addressing

- $\text{EA} = (\dots ((A)) \dots)$

- Disadvantage is that three or more memory references could be required to fetch an operand



# Register Addressing



Address field  
refers to a  
register rather  
than a main  
memory address

$$EA = R$$

## Advantages:

- Only a small address field is needed in the instruction
- No time-consuming memory references are required

## Disadvantage:

- The address space is very limited



# Register Indirect Addressing

- Analogous to indirect addressing
  - The only difference is whether the address field refers to a memory location or a register
- $EA = (R)$
- Address space limitation of the address field is overcome by having that field refer to a word-length location containing an address
- Uses one less memory reference than indirect addressing



# Displacement Addressing



- Combines the capabilities of direct addressing and register indirect addressing
- $\underline{EA = A + (R)}$  B
- Requires that the instruction have two address fields, at least one of which is explicit
  - The value contained in one address field (value = A) is used directly
  - The other address field refers to a register whose contents are added to A to produce the effective address
- Most common uses:
  - Relative addressing
  - Base-register addressing
  - Indexing

# Relative Addressing

The implicitly referenced register is the program counter (PC)

- The next instruction address is added to the address field to produce the EA
- Typically the address field is treated as a twos complement number for this operation
- Thus the effective address is a displacement relative to the address of the instruction

Exploits the concept of locality

Saves address bits in the instruction if most memory references are relatively near to the instruction being executed

# Base-Register Addressing

- The referenced register contains a main memory address and the address field contains a displacement from that address
- The register reference may be explicit or implicit
- Exploits the locality of memory references
- Convenient means of implementing segmentation
- In some implementations a single segment base register is employed and is used implicitly
- In others the programmer may choose a register to hold the base address of a segment and the instruction must reference it explicitly



# Indexing

- The address field references a main memory address and the referenced register contains a positive displacement from that address
- The method of calculating the EA is the same as for base-register addressing
- An important use is to provide an efficient mechanism for performing iterative operations
- Autoindexing
  - Automatically increment or decrement the index register after each reference to it
  - $EA = A + (R)$
  - $(R) \leftarrow (R) + 1$
- Postindexing
  - Indexing is performed after the indirection
  - $EA = (A) + (R)$
- Preindexing
  - Indexing is performed before the indirection
  - $EA = (A + (R))$





# Stack Addressing



- A stack is a linear array of locations
  - Sometimes referred to as a *pushdown list* or *last-in-first-out queue*
- A stack is a reserved block of locations
  - Items are appended to the top of the stack so that the block is partially filled
- Associated with the stack is a pointer whose value is the address of the top of the stack
  - The stack pointer is maintained in a register
  - Thus references to stack locations in memory are in fact register indirect addresses
- Is a form of implied addressing
- The machine instructions need not include a memory reference but implicitly operate on the top of the stack





**Figure 13.2 x86 Addressing Mode Calculation**

# Table 13.2

## x86 Addressing Modes

R

| Mode                                    | Algorithm                                         |
|-----------------------------------------|---------------------------------------------------|
| Immediate                               | $\text{Operand} = A$                              |
| Register Operand                        | $\text{LA} = R$                                   |
| Displacement                            | $\text{LA} = (\text{SR}) + A$                     |
| Base                                    | $\text{LA} = (\text{SR}) + (B)$                   |
| Base with Displacement                  | $\text{LA} = (\text{SR}) + (B) + A$               |
| Scaled Index with Displacement          | $\text{LA} = (\text{SR}) + (I) \cdot S + A$       |
| Base with Index and Displacement        | $\text{LA} = (\text{SR}) + (B) + (I) + A$         |
| Base with Scaled Index and Displacement | $\text{LA} = (\text{SR}) + (I) \cdot S + (B) + A$ |
| Relative                                | $\text{LA} = (\text{PC}) + A$                     |

|     |   |                                                 |
|-----|---|-------------------------------------------------|
| LA  | = | linear address                                  |
| (X) | = | contents of X                                   |
| SR  | = | segment register                                |
| PC  | = | program counter                                 |
| A   | = | contents of an address field in the instruction |
| R   | = | register                                        |
| B   | = | base register                                   |
| I   | = | index register                                  |
| S   | = | scaling factor                                  |

STRB r0, [r1, #12]



STRB r0, [r1, #12]!



STRB r0, [r1], #12



Figure 13.3 ARM Indexing Methods



# ARM Data Processing Instruction Addressing and Branch Instructions

R

- Data processing instructions
  - Use either register addressing or a mixture of register and immediate addressing
  - For register addressing the value in one of the register operands may be scaled using one of the five shift operators
  
- Branch instructions
  - The only form of addressing for branch instructions is immediate
  - Instruction contains 24 bit value
    - Shifted 2 bits left so that the address is on a word boundary
  - Effective range  $\pm 32\text{MB}$  from the program counter

LDMxx r10, {r0, r1, r4}  
 STMxx r10, {r0, r1, r4}



**Figure 13.4 ARM Load/Store Multiple Addressing**

# 5. Instruction Formats

Define the layout of the bits of an instruction, in terms of its constituent fields

Must include an opcode and, implicitly or explicitly, indicate the addressing mode for each operand

For most instruction sets more than one instruction format is used



# Instruction Length



- Most basic design issue
- Affects, and is affected by:
  - Memory size
  - Memory organization
  - Bus structure
  - Processor complexity
  - Processor speed
- Should be equal to the memory-transfer length or one should be a multiple of the other
- Should be a multiple of the character length, which is usually 8 bits, and of the length of fixed-point numbers

# Allocation of Bits



1/0

2

Number of addressing modes

Number of operands

Register versus memory

Number of register sets

Address range

Address granularity

8  
3

4  
2



10bit

M

### Memory Reference Instructions

| Opcode | D/I | Z/C | Displacement |   |    |
|--------|-----|-----|--------------|---|----|
| 0      | 2   | 3   | 4            | 5 | 11 |

### Input/Output Instructions

| 1 | 1 | 0 | Device |  |  | 3 | 9 | Opcode |
|---|---|---|--------|--|--|---|---|--------|
| 0 |   | 2 | 3      |  |  | 8 |   | 11     |

### Register Reference Instructions

#### Group 1 Microinstructions

|   |   |   |   |     |     |     |     |     |     |     |     |
|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|
| 1 | 1 | 1 | 0 | CLA | CLL | CMA | CML | RAR | RAL | BSW | IAC |
| 0 | 1 | 2 | 3 | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |

#### Group 2 Microinstructions

|   |   |   |   |     |     |     |     |     |     |     |    |
|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|----|
| 1 | 1 | 1 | 1 | CLA | SMA | SZA | SNL | RSS | OSR | HLT | 0  |
| 0 | 1 | 2 | 3 | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11 |

#### Group 3 Microinstructions

|   |   |   |   |     |     |   |     |   |   |    |    |
|---|---|---|---|-----|-----|---|-----|---|---|----|----|
| 1 | 1 | 1 | 1 | CLA | MQA | 0 | MQL | 0 | 0 | 0  | 1  |
| 0 | 1 | 2 | 3 | 4   | 5   | 6 | 7   | 8 | 9 | 10 | 11 |

D/I = Direct/Indirect address

Z/C = Page 0 or Current page

CLA = Clear Accumulator

CLL = Clear Link

CMA = CoMplement Accumulator

CML = CoMplement Link

RAR = Rotate Accumultator Right

RAL = Rotate Accumulator Left

BSW = Byte SWap

IAC = Increment ACCumulator

SMA = Skip on Minus Accumulator

SZA = Skip on Zero Accumulator

SNL = Skip on Nonzero Link

RSS = Reverse Skip Sense

OSR = Or with Switch Register

HLT = HaLT

MQA = Multiplier Quotient into Accumulator

MQL = Multiplier Quotient Load

**Figure 13.5 PDP-8 Instruction Formats**



| Opcode | Register | I  | Index Register | Memory Address |    |
|--------|----------|----|----------------|----------------|----|
| 0      | 8 9      | 12 | 14             | 17 18          | 35 |

I = indirect bit

**Figure 13.6 PDP-10 Instruction Format**

# Variable-Length Instructions



- Variations can be provided efficiently and compactly
- Increases the complexity of the processor
- Does not remove the desirability of making all of the instruction lengths integrally related to word length
  - Because the processor does not know the length of the next instruction to be fetched a typical strategy is to fetch a number of bytes or words equal to at least the longest possible instruction
  - Sometimes multiple instructions are fetched

|    |              |             |                  |   |              |        |                        |   |              |                        |  |
|----|--------------|-------------|------------------|---|--------------|--------|------------------------|---|--------------|------------------------|--|
| 1  | Opcode<br>4  | Source<br>6 | Destination<br>6 | 2 | Opcode<br>7  | R<br>3 | Source<br>6            | 3 | Opcode<br>8  | Offset<br>8            |  |
| 4  | Opcode<br>8  | FP<br>2     | Destination<br>6 | 5 | Opcode<br>10 |        | Destination<br>6       | 6 | Opcode<br>12 | CC<br>4                |  |
| 7  | Opcode<br>13 |             | R<br>3           | 8 | Opcode<br>16 |        |                        |   |              |                        |  |
| 9  | Opcode<br>4  | Source<br>6 | Destination<br>6 |   |              |        | Memory Address<br>16   |   |              |                        |  |
| 10 | Opcode<br>7  | R<br>3      | Source<br>6      |   |              |        | Memory Address<br>16   |   |              |                        |  |
| 11 | Opcode<br>8  | FP<br>2     | Source<br>6      |   |              |        | Memory Address<br>16   |   |              |                        |  |
| 12 | Opcode<br>10 |             | Destination<br>6 |   |              |        | Memory Address<br>16   |   |              |                        |  |
| 13 | Opcode<br>4  | Source<br>6 | Destination<br>6 |   |              |        | Memory Address 1<br>16 |   |              | Memory Address 2<br>16 |  |

Numbers below fields indicate bit length

Source and Destination each contain a 3-bit addressing mode field and a 3-bit register number

FP indicates one of four floating-point registers

R indicates one of the general-purpose registers

CC is the condition code field

**Figure 13.7 Instruction Formats for the PDP-11**

| Hexadecimal Format                                                                 | Explanation                                                                                                                                                                               | Assembler Notation and Description                                                                                                                            |
|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
|   | Opcde for RSB                                                                                                                                                                             | RSB<br>Return from subroutine                                                                                                                                 |
|   | Opcde for CLRL<br>Register R9                                                                                                                                                             | CLRL R9<br>Clear register R9                                                                                                                                  |
|   | Opcde for MOVW<br>Word displacement mode,<br>Register R4<br>356 in hexadecimal<br>Byte displacement mode,<br>Register R11<br>25 in hexadecimal                                            | MOVW 356(R4), 25(R11)<br>Move a word from address<br>that is 356 plus contents<br>of R4 to address that is<br>25 plus contents of R11                         |
|  | Opcde for ADDL3<br>Short literal 5<br>Register mode R0<br>Index prefix R2<br>Indirect word relative<br>(displacement from PC)<br>Amount of displacement from<br>PC relative to location A | ADDL3 #5, R0, @A[R2]<br>Add 5 to a 32-bit integer in<br>R0 and store the result in<br>location whose address is<br>sum of A and 4 times the<br>contents of R2 |

R

Figure 13.8 Examples of VAX Instructions

R



**Figure 13.9 x86 Instruction Format**

|                                    | 31   | 30 | 29 | 28 | 27     | 26 | 25 | 24 | 23           | 22    | 21            | 20           | 19            | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|------------------------------------|------|----|----|----|--------|----|----|----|--------------|-------|---------------|--------------|---------------|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|
| data processing<br>immediate shift | cond | 0  | 0  | 0  | opcode | S  | Rn | Rd | shift amount | shift | 0             | Rm           |               |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |
| data processing<br>register shift  | cond | 0  | 0  | 0  | opcode | S  | Rn | Rd | Rs           | 0     | shift         | 1            | Rm            |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |
| data processing<br>immediate       | cond | 0  | 0  | 1  | opcode | S  | Rn | Rd | rotate       |       | immediate     |              |               |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |
| load/store<br>immediate offset     | cond | 0  | 1  | 0  | P      | U  | B  | W  | L            | Rn    | Rd            |              | immediate     |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |
| load/store<br>register offset      | cond | 0  | 1  | 1  | P      | U  | B  | W  | L            | Rn    | Rd            | shift amount | shift         | 0  | Rm |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |
| load/store<br>multiple             | cond | 1  | 0  | 0  | P      | U  | S  | W  | L            | Rn    |               |              | register list |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |
| branch/branch<br>with link         | cond | 1  | 0  | 1  | L      |    |    |    |              |       | 24-bit offset |              |               |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |

S = For data processing instructions, signifies that the instruction updates the condition codes

S = For load/store multiple instructions, signifies whether instruction execution is restricted to supervisor mode

P, U, W = bits that distinguish among different types of addressing\_mode

B = Distinguishes between an unsigned byte (B==1) and a word (B==0) access

L = For load/store instructions, distinguishes between a Load (L==1) and a Store (L==0)

L = For branch instructions, determines whether a return address is stored in the link register

## Figure 13.10 ARM Instruction Formats

|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---|
| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |   |
| 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

ror #0 - range 0 through 0x000000FF - step 0x00000001



|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---|
| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |   |
| 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

ror #8 - range 0 through 0xFF000000 - step 0x01000000

|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---|
| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |   |
| 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

ror #30 - range 0 through 0x0000003FC - step 0x00000004

## Figure 13.11 Examples of Use of ARM Immediate Constants



**Figure 13.12 Expanding a Thumb ADD Instruction into its ARM Equivalent**



# Thumb-2 Instruction Set

- The only instruction set available on the Cortex-M microcontroller products
- Is a major enhancement to the Thumb instruction set architecture (ISA)
  - Introduces 32-bit instructions that can be intermixed freely with the older 16-bit Thumb instructions
  - Most 32-bit Thumb instructions are unconditional, whereas almost all ARM instructions can be conditional
  - Introduces a new If-Then (IT) instruction that delivers much of the functionality of the condition field in ARM instructions
- Delivers overall code density comparable with Thumb, together with the performance levels associated with the ARM ISA
- Before Thumb-2 developers had to choose between Thumb for size and ARM for performance



| Halfword 1 [15:13] | Halfword1 [12:11] | Length                | Functionality                                 |
|--------------------|-------------------|-----------------------|-----------------------------------------------|
| Not 111            | xx                | 16 bits (1 halfword)  | 16-bit Thumb instruction                      |
| 111                | 00                | 16 bits (1 halfword)  | 16-bit Thumb unconditional branch instruction |
| 111                | Not 00            | 32 bits (2 halfwords) | 32-bit Thumb-2 instruction                    |

**Figure 13.13 Thumb-2 Encoding**

| Address |      | Contents |     |      |
|---------|------|----------|-----|------|
| 101     | 0010 | 0010     | 101 | 2201 |
| 102     | 0001 | 0010     | 102 | 1202 |
| 103     | 0001 | 0010     | 103 | 1203 |
| 104     | 0011 | 0010     | 104 | 3204 |
|         |      |          |     |      |
| 201     | 0000 | 0000     | 201 | 0002 |
| 202     | 0000 | 0000     | 202 | 0003 |
| 203     | 0000 | 0000     | 203 | 0004 |
| 204     | 0000 | 0000     | 204 | 0000 |

(a) Binary program

| Address |      | Contents |  |
|---------|------|----------|--|
| 101     | 2201 |          |  |
| 102     | 1202 |          |  |
| 103     | 1203 |          |  |
| 104     | 3204 |          |  |
|         |      |          |  |
| 201     | 0002 |          |  |
| 202     | 0003 |          |  |
| 203     | 0004 |          |  |
| 204     | 0000 |          |  |

(b) Hexadecimal program

| Address |     | Instruction |
|---------|-----|-------------|
| 101     | LDA | 201         |
| 102     | ADD | 202         |
| 103     | ADD | 203         |
| 104     | STA | 204         |
|         |     |             |
| 201     | DAT | 2           |
| 202     | DAT | 3           |
| 203     | DAT | 4           |
| 204     | DAT | 0           |

(c) Symbolic program

| Label  | Operation | Operand |
|--------|-----------|---------|
| FORMUL | LDA       | I       |
|        | ADD       | J       |
|        | ADD       | K       |
|        | STA       | N       |
|        |           |         |
| I      | DATA      | 2       |
| J      | DATA      | 3       |
| K      | DATA      | 4       |
| N      | DATA      | 0       |

(d) Assembly program

**Figure 13.14 Computation of the Formula  $N = I + J + K$**

R



# RoadMap

## Chapter

- Processor organization
- Register organization
  - User-visible registers
  - Control and status registers
- ✓ Instruction cycle
  - The indirect cycle
  - Data flow
- The x86 processor family
  - Register organization
  - Interrupt processing

## Processor Structure and Function

- Instruction pipelining
  - Pipelining strategy
  - Pipeline performance
  - Pipeline hazards
  - Dealing with branches
  - Intel 80486 pipelining
- The Arm processor
  - Processor organization
  - Processor modes
  - Register organization
  - Interrupt processing

# 6. Processor Organization

## Processor Requirements:

- Fetch instruction
  - The processor reads an instruction from memory (register, cache, main memory)
- Interpret instruction
  - The instruction is decoded to determine what action is required
- Fetch data
  - The execution of an instruction may require reading data from memory or an I/O module
- Process data
  - The execution of an instruction may require performing some arithmetic or logical operation on data
- Write data
  - The results of an execution may require writing data to memory or an I/O module
- In order to do these things the processor needs to store some data temporarily and therefore needs a small internal memory





**Figure 14.1 The CPU with the System Bus**



**Figure 14.2 Internal Structure of the CPU**

# 7. Register Organization



- Within the processor there is a set of registers that function as a level of memory above main memory and cache in the hierarchy
- The registers in the processor perform two roles:

## User-Visible Registers

- Enable the machine or assembly language programmer to minimize main memory references by optimizing use of registers

## Control and Status Registers

- Used by the control unit to control the operation of the processor and by privileged operating system programs to control the execution of programs



# User-Visible Registers

Referenced by means of  
the machine language  
that the processor  
executes



## Categories:

- **General purpose**
  - Can be assigned to a variety of functions by the programmer
- **Data**
  - May be used only to hold data and cannot be employed in the calculation of an operand address
- **Address**
  - May be somewhat general purpose or may be devoted to a particular addressing mode
  - Examples: segment pointers, index registers, stack pointer
- **Condition codes**
  - Also referred to as flags
  - Bits set by the processor hardware as the result of operations



# Table 14.1

## Condition Codes

| Advantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Disadvantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ol style="list-style-type: none"><li>Because condition codes are set by normal arithmetic and data movement instructions, they should reduce the number of COMPARE and TEST instructions needed.</li><li>Conditional instructions, such as BRANCH are simplified relative to composite instructions, such as TEST AND BRANCH.</li><li>Condition codes facilitate multiway branches. For example, a TEST instruction can be followed by two branches, one on less than or equal to zero and one on greater than zero.</li><li>Condition codes can be saved on the stack during subroutine calls along with other register information.</li></ol> | <ol style="list-style-type: none"><li>Condition codes add complexity, both to the hardware and software. Condition code bits are often modified in different ways by different instructions, making life more difficult for both the microprogrammer and compiler writer.</li><li>Condition codes are irregular; they are typically not part of the main data path, so they require extra hardware connections.</li><li>Often condition code machines must add special non-condition-code instructions for special situations anyway, such as bit checking, loop control, and atomic semaphore operations.</li><li>In a pipelined implementation, condition codes require <u>special synchronization</u> to avoid conflicts.</li></ol> |

# Control and Status Registers

Four registers are essential to instruction execution:

- Program counter (PC)
  - Contains the address of an instruction to be fetched
- Instruction register (IR)
  - Contains the instruction most recently fetched
- Memory address register (MAR)
  - Contains the address of a location in memory
- Memory buffer register (MBR)
  - Contains a word of data to be written to memory or the word most recently read



# + Program Status Word (PSW)



Register or set of registers that contain status information

Common fields or flags include:

- Sign
- Zero
- Carry
- Equal
- Overflow
- Interrupt Enable/Disable
- Supervisor

DS/ 1 Use



**Figure 14.3 Example Microprocessor Register Organizations**

# 8. Instruction Cycle

Includes the following stages:

Fetch

Execute

Interrupt

Read the next instruction from memory into the processor

Interpret the opcode and perform the indicated operation

If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt



**Figure 14.4 The Instruction Cycle**



**Figure 14.5 Instruction Cycle State Diagram**



MBR = Memory buffer register

MAR = Memory address register

IR = Instruction register

PC = Program counter

**Figure 14.6 Data Flow, Fetch Cycle**



**Figure 14.7 Data Flow, Indirect Cycle**



**Figure 14.8 Data Flow, Interrupt Cycle**

# 9. Instruction Pipelining

## Pipelining strategy :

Similar to the use of an assembly line in a manufacturing plant

To apply this concept to instruction execution we must recognize that an instruction has a number of stages



New inputs are accepted at one end before previously accepted inputs appear as outputs at the other end



**Figure 14.9 Two-Stage Instruction Pipeline**

# + Additional Stages

- Fetch instruction (FI)
  - Read the next expected instruction into a buffer
- Decode instruction (DI)
  - Determine the opcode and the operand specifiers
- Calculate operands (CO)
  - Calculate the effective address of each source operand
  - This may involve displacement, register indirect, indirect, or other forms of address calculation
- Fetch operands (FO)
  - Fetch each operand from memory
  - Operands in registers need not be fetched
- Execute instruction (EI)
  - Perform the indicated operation and store the result, if any, in the specified destination operand location
- Write operand (WO)
  - Store the result in memory



**Figure 14.10 Timing Diagram for Instruction Pipeline Operation**

Time →

← Branch Penalty →

|                | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
|----------------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| Instruction 1  | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |    |
| Instruction 2  |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |    |
| Instruction 3  |    |    | FI | DI | CO | FO | EI | WO |    |    |    |    |    |    |
| Instruction 4  |    |    |    | FI | DI | CO | FO |    |    |    |    |    |    |    |
| Instruction 5  |    |    |    |    | FI | DI | CO |    |    |    |    |    |    |    |
| Instruction 6  |    |    |    |    |    | FI | DI |    |    |    |    |    |    |    |
| Instruction 7  |    |    |    |    |    |    | FI |    |    |    |    |    |    |    |
| Instruction 15 |    |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |    |
| Instruction 16 |    |    |    |    |    |    |    |    | FI | DI | CO | FO | EI | WO |

**Figure 14.11 The Effect of a Conditional Branch on Instruction Pipeline Operation**



Figure 14.12 Six-Stage Instruction Pipeline

|    | FI | DI | CO | FO | EI | WO |
|----|----|----|----|----|----|----|
| 1  | I1 |    |    |    |    |    |
| 2  | I2 | I1 |    |    |    |    |
| 3  | I3 | I2 | I1 |    |    |    |
| 4  | I4 | I3 | I2 | I1 |    |    |
| 5  | I5 | I4 | I3 | I2 | I1 |    |
| 6  | I6 | I5 | I4 | I3 | I2 | I1 |
| 7  | I7 | I6 | I5 | I4 | I3 | I2 |
| 8  | I8 | I7 | I6 | I5 | I4 | I3 |
| 9  | I9 | I8 | I7 | I6 | I5 | I4 |
| 10 |    | I9 | I8 | I7 | I6 | I5 |
| 11 |    |    | I9 | I8 | I7 | I6 |
| 12 |    |    |    | I9 | I8 | I7 |
| 13 |    |    |    |    | I9 | I8 |
| 14 |    |    |    |    |    | I9 |

(a) No branches

|    | FI  | DI  | CO  | FO  | EI  | WO  |
|----|-----|-----|-----|-----|-----|-----|
| 1  | I1  |     |     |     |     |     |
| 2  | I2  | I1  |     |     |     |     |
| 3  | I3  | I2  | I1  |     |     |     |
| 4  | I4  | I3  | I2  | I1  |     |     |
| 5  | I5  | I4  | I3  | I2  | I1  |     |
| 6  | I6  | I5  | I4  | I3  | I2  | I1  |
| 7  | I7  | I6  | I5  | I4  | I3  | I2  |
| 8  | I15 |     |     |     |     | I3  |
| 9  | I16 | I15 |     |     |     |     |
| 10 |     | I16 | I15 |     |     |     |
| 11 |     |     | I16 | I15 |     |     |
| 12 |     |     |     | I16 | I15 |     |
| 13 |     |     |     |     | I16 | I15 |
| 14 |     |     |     |     |     | I16 |

(b) With conditional branch

Figure 14.13 An Alternative Pipeline Depiction

S-  
-



(a)



(b)

**Figure 14.14 Speedup Factors with Instruction Pipelining**

# Pipeline Hazards

Occur when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution

There are three types of hazards:

- Resource
- Data
- Control

Also referred to as a *pipeline bubble*



|    | Clock cycle |    |    |    |    |    |    |    |   |
|----|-------------|----|----|----|----|----|----|----|---|
|    | 1           | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9 |
| I1 | FI          | DI | FO | EI | WO |    |    |    |   |
| I2 |             | FI | DI | FO | EI | WO |    |    |   |
| I3 |             |    | FI | DI | FO | EI | WO |    |   |
| I4 |             |    |    | FI | DI | FO | EI | WO |   |

(a) Five-stage pipeline, ideal case

|    | Clock cycle |    |      |    |    |    |    |    |    |
|----|-------------|----|------|----|----|----|----|----|----|
|    | 1           | 2  | 3    | 4  | 5  | 6  | 7  | 8  | 9  |
| I1 | FI          | DI | FO   | EI | WO |    |    |    |    |
| I2 |             | FI | DI   | FO | EI | WO |    |    |    |
| I3 |             |    | Idle | FI | DI | FO | EI | WO |    |
| I4 |             |    |      |    | FI | DI | FO | EI | WO |

(b) I1 source operand in memory

**Figure 14.15 Example of Resource Hazard**

D 5

**ADD EAX, EBX**

**SUB ECX, EAX**

| Clock cycle |    |    |    |      |    |    |    |    |    |
|-------------|----|----|----|------|----|----|----|----|----|
| 1           | 2  | 3  | 4  | 5    | 6  | 7  | 8  | 9  | 10 |
| FI          | DI | FO | EI | WO   |    |    |    |    |    |
|             | FI | DI |    | Idle | FO | EI | WO |    |    |
| I3          |    | FI |    |      | DI | FO | EI | WO |    |
| I4          |    |    |    |      | FI | DI | FO | EI | WO |

**Figure 14.16 Example of Data Hazard**



# Types of Data Hazard

- Read after write (RAW), or true dependency
  - An instruction modifies a register or memory location
  - Succeeding instruction reads data in memory or register location
  - Hazard occurs if the read takes place before write operation is complete
- Write after read (WAR), or antidependency
  - An instruction reads a register or memory location
  - Succeeding instruction writes to the location
  - Hazard occurs if the write operation completes before the read operation takes place
- Write after write (WAW), or output dependency
  - Two instructions both write to the same location
  - Hazard occurs if the write operations take place in the reverse order of the intended sequence

# Control Hazard



- Also known as a *branch hazard*
- Occurs when the pipeline makes the wrong decision on a branch prediction
- Brings instructions into the pipeline that must subsequently be discarded
- Dealing with Branches:
  - Multiple streams
  - Prefetch branch target
  - Loop buffer
  - Branch prediction
  - Delayed branch



# Multiple Streams

A simple pipeline suffers a penalty for a branch instruction because it must choose one of two instructions to fetch next and may make the wrong choice



A brute-force approach is to replicate the initial portions of the pipeline and allow the pipeline to fetch both instructions, making use of two streams



## Drawbacks:

- With multiple pipelines there are contention delays for access to the registers and to memory
- Additional branch instructions may enter the pipeline before the original branch decision is resolved

# Prefetch Branch Target

- When a conditional branch is recognized, the target of the branch is prefetched, in addition to the instruction following the branch
- Target is then saved until the branch instruction is executed
- If the branch is taken, the target has already been prefetched
- IBM 360/91 uses this approach





# Loop Buffer

- Small, very-high speed memory maintained by the instruction fetch stage of the pipeline and containing the  $n$  most recently fetched instructions, in sequence
- Benefits:
  - Instructions fetched in sequence will be available without the usual memory access time
  - If a branch occurs to a target just a few locations ahead of the address of the branch instruction, the target will already be in the buffer
  - This strategy is particularly well suited to dealing with loops
- Similar in principle to a cache dedicated to instructions
  - Differences:
    - The loop buffer only retains instructions in sequence
    - Is much smaller in size and hence lower in cost

## Branch address



**Figure 14.17 Loop Buffer**

# Branch Prediction

- Various techniques can be used to predict whether a branch will be taken:

1. Predict never taken
2. Predict always taken
3. Predict by opcode

- These approaches are static
- They do not depend on the execution history up to the time of the conditional branch instruction

1. Taken/not taken switch
2. Branch history table

- These approaches are dynamic
- They depend on the execution history



**Figure 14.18 Branch Prediction Flow Chart**



**Figure 14.19 Branch Prediction State Diagram**



(a) Predict never taken strategy



(b) Branch history table strategy

**Figure 14.20 Dealing with Branches**



# Interrupt Processing

## Interrupts and Exceptions

### ■ Interrupts

- Generated by a signal from hardware and it may occur at random times during the execution of a program
- Maskable
- Nonmaskable

### ■ Exceptions

- Generated from software and is provoked by the execution of an instruction
- Processor detected
- Programmed

### ■ Interrupt vector table

- Every type of interrupt is assigned a number
- Number is used to index into the interrupt vector table

# Processor Modes

R

ARM  
architecture  
supports seven  
execution  
modes

Most application  
programs execute in  
user mode

- While the processor is in user mode the program being executed is unable to access protected system resources or to change mode, other than by causing an exception to occur

Remaining six  
execution modes  
are referred to as  
privileged modes

- These modes are used to run system software

Advantages to defining  
so many different  
privileged modes

- The OS can tailor the use of system software to a variety of circumstances
- Certain registers are dedicated for use for each of the privileged modes, allows swifter changes in context

# Exception Modes

R

Have full access  
to system  
resources and can  
change modes  
freely

Entered when  
specific  
exceptions occur

**Exception modes:**

- Supervisor mode
- Abort mode
- Undefined mode
- Fast interrupt mode
- Interrupt mode

**System mode:**

- Not entered by any exception and uses the same registers available in User mode
- Is used for running certain privileged operating system tasks
- May be interrupted by any of the five exception categories



# Thank You!!!