

ECE 485/585  
Computer Organization and Design

Lecture 4: Instruction Set Architecture  
Fall 2022

Won-Jae Yi, Ph.D.

Department of Electrical and Computer Engineering  
Illinois Institute of Technology

# Procedure Calling

- A procedure (function) call is one way to implement abstraction in software.

- Steps required

1. Place parameters in registers (\$a0 – \$a3)
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform procedure's operations
5. Place result in register for caller (return value in \$v0, \$v1)
6. Return to place of call (the return address)

# Register Usage

- \$a0 – \$a3: arguments (reg's 4 – 7)
- \$v0, \$v1: result values (reg's 2 and 3)
- \$t0 – \$t9: temporaries
  - Can be overwritten by callee
- \$s0 – \$s7: saved
  - Must be saved/restored by callee
- \$gp: global pointer for static data (reg 28)
- \$sp: stack pointer (reg 29)
- \$fp: frame pointer (reg 30)
- \$ra: return address (reg 31)

- \$fp and \$sp together bound the current stack frame, so their difference gives its size



| Name      | Register number | Usage                                        | Preserved on call? |
|-----------|-----------------|----------------------------------------------|--------------------|
| \$zero    | 0               | The constant value 0                         | n.a.               |
| \$v0-\$v1 | 2-3             | Values for results and expression evaluation | no                 |
| \$a0-\$a3 | 4-7             | Arguments                                    | no                 |
| \$t0-\$t7 | 8-15            | Temporaries                                  | no                 |
| \$s0-\$s7 | 16-23           | Saved                                        | yes                |
| \$t8-\$t9 | 24-25           | More temporaries                             | no                 |
| \$gp      | 28              | Global pointer                               | yes                |
| \$sp      | 29              | Stack pointer                                | yes                |
| \$fp      | 30              | Frame pointer                                | yes                |
| \$ra      | 31              | Return address                               | yes                |

# Procedure Call Instructions

- Procedure call: jump and link

→ **jal ProcedureLabel**

- Address of following instruction put in \$ra
- Jumps to target address

- Procedure return: jump register

→ **jr \$ra**

- Copies \$ra to the program counter (PC)
- Can also be used for computed jumps
  - e.g., for case/switch statements
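The effect of jal and jr on the program counter can be sketched in C. This is a minimal model under stated assumptions: the `Cpu` struct and the function names `jal` and `jr_ra` are hypothetical, only the PC and \$ra are modeled, and branch delay slots are ignored.

```c
#include <stdint.h>

/* Hypothetical machine state: only the PC and $ra are modeled. */
typedef struct {
    uint32_t pc;  /* address of the instruction being executed */
    uint32_t ra;  /* register 31, the return address */
} Cpu;

/* jal target: put the address of the following instruction in $ra, then jump. */
static void jal(Cpu *cpu, uint32_t target) {
    cpu->ra = cpu->pc + 4;
    cpu->pc = target;
}

/* jr $ra: copy the return address into the program counter. */
static void jr_ra(Cpu *cpu) {
    cpu->pc = cpu->ra;
}
```

Calling `jal` from address 0x00400000 and then `jr_ra` leaves the PC at 0x00400004, the instruction after the call.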

# Leaf Procedure Example

- C code (the procedure name leaf_example becomes a LABEL in assembly):

```
int leaf_example(int g, int h, int i, int j)
{
  int f;                 /* f is kept in $s0 */
  f = (g + h) - (i + j);
  return f;
}
```

- Arguments g, ..., j in \$a0, ..., \$a3
- f in \$s0 (hence, need to save \$s0 on stack)
  - \$s registers hold saved data, such as variables the caller expects to survive the call
- Result in \$v0
- The caller invokes the procedure with jal leaf_example

# Leaf Procedure Example

- MIPS code:

```
leaf_example:
    addi $sp, $sp, -4     # save $s0 on stack
    sw   $s0, 0($sp)
    add  $t0, $a0, $a1    # procedure body
    add  $t1, $a2, $a3
    sub  $s0, $t0, $t1
    add  $v0, $s0, $zero  # result
    lw   $s0, 0($sp)      # restore $s0
    addi $sp, $sp, 4
    jr   $ra              # return
```
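The save/restore discipline of this leaf procedure can be mirrored step by step in C. The sketch below is only an illustration: the `Machine` struct, the stack size, and the name `leaf_example_model` are assumptions, and each statement is annotated with the MIPS instruction it stands for.

```c
#include <stdint.h>

enum { STACK_WORDS = 16 };  /* hypothetical stack size, in words */

/* Hypothetical machine state: a few registers and a small stack. */
typedef struct {
    int32_t  s0;                  /* callee-saved register */
    int32_t  v0;                  /* result register */
    uint32_t sp;                  /* stack pointer, as a byte offset */
    int32_t  stack[STACK_WORDS];  /* stack memory, one word per element */
} Machine;

static void leaf_example_model(Machine *m, int32_t a0, int32_t a1,
                               int32_t a2, int32_t a3) {
    m->sp -= 4;                   /* addi $sp, $sp, -4   */
    m->stack[m->sp / 4] = m->s0;  /* sw   $s0, 0($sp)    */
    int32_t t0 = a0 + a1;         /* add  $t0, $a0, $a1  */
    int32_t t1 = a2 + a3;         /* add  $t1, $a2, $a3  */
    m->s0 = t0 - t1;              /* sub  $s0, $t0, $t1  */
    m->v0 = m->s0;                /* add  $v0, $s0, $zero */
    m->s0 = m->stack[m->sp / 4];  /* lw   $s0, 0($sp)    */
    m->sp += 4;                   /* addi $sp, $sp, 4    */
}
```

After the call, the caller's value of \$s0 and the stack pointer are unchanged, and only \$v0 carries the result.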

# Local Data on the Stack



- Local data allocated by callee
  - e.g., C automatic variables
- Procedure frame (activation record)
  - Used by some compilers to manage stack storage

# Memory Layout

- Text: program code
- Static data: global variables
  - e.g., static variables in C, constant arrays and strings
  - \$gp initialized to address allowing  $\pm$  offsets into this segment
- Dynamic data: heap
  - e.g., malloc in C, new in Java
- Stack: automatic storage



# Character Data

- Byte-encoded character sets
  - ASCII: 128 characters
    - 95 graphic, 33 control
  - Latin-1: 256 characters
    - ASCII, +96 more graphic characters
- Unicode: 32-bit character set
  - Used in Java, C++ wide characters, ...
  - Most of the world's alphabets, plus symbols
  - UTF-8, UTF-16: variable-length encodings
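To make "variable-length encoding" concrete, here is a small sketch of how many bytes UTF-8 spends on one Unicode code point. The function name `utf8_length` is hypothetical, and the sketch only checks range boundaries (it does not reject surrogate code points).

```c
#include <stdint.h>

/* Number of bytes UTF-8 uses for one code point (0 if out of range). */
static int utf8_length(uint32_t code_point) {
    if (code_point <= 0x7F)     return 1;  /* ASCII range: one byte */
    if (code_point <= 0x7FF)    return 2;
    if (code_point <= 0xFFFF)   return 3;
    if (code_point <= 0x10FFFF) return 4;
    return 0;                              /* beyond the Unicode range */
}
```

ASCII characters therefore keep their one-byte encoding, while characters further up the code space take two to four bytes.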

# ASCII Representation

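| ASCII value | Character | ASCII value | Character | ASCII value | Character | ASCII value | Character | ASCII value | Character | ASCII value | Character |
|-------------|-----------|-------------|-----------|-------------|-----------|-------------|-----------|-------------|-----------|-------------|-----------|
| 32          | space     | 48          | 0         | 64          | @         | 80          | P         | 96          | \`        | 112         | p         |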
| 33          | !         | 49          | 1         | 65          | A         | 81          | Q         | 97          | a         | 113         | q         |
| 34          | "         | 50          | 2         | 66          | B         | 82          | R         | 98          | b         | 114         | r         |
| 35          | #         | 51          | 3         | 67          | C         | 83          | S         | 99          | c         | 115         | s         |
| 36          | \$        | 52          | 4         | 68          | D         | 84          | T         | 100         | d         | 116         | t         |
| 37          | %         | 53          | 5         | 69          | E         | 85          | U         | 101         | e         | 117         | u         |
| 38          | &         | 54          | 6         | 70          | F         | 86          | V         | 102         | f         | 118         | v         |
| 39          | '         | 55          | 7         | 71          | G         | 87          | W         | 103         | g         | 119         | w         |
| 40          | (         | 56          | 8         | 72          | H         | 88          | X         | 104         | h         | 120         | x         |
| 41          | )         | 57          | 9         | 73          | I         | 89          | Y         | 105         | i         | 121         | y         |
| 42          | *         | 58          | :         | 74          | J         | 90          | Z         | 106         | j         | 122         | z         |
| 43          | +         | 59          | ;         | 75          | K         | 91          | [         | 107         | k         | 123         | {         |
| 44          | ,         | 60          | <         | 76          | L         | 92          | \         | 108         | l         | 124         | \|        |
| 45          | -         | 61          | =         | 77          | M         | 93          | ]         | 109         | m         | 125         | }         |
| 46          | .         | 62          | >         | 78          | N         | 94          | ^         | 110         | n         | 126         | ~         |
| 47          | /         | 63          | ?         | 79          | O         | 95          | _         | 111         | o         | 127         | DEL       |

# Byte/Halfword Operations

- Could use bitwise operations
- MIPS byte/halfword load/store
  - String processing is a common case



- Endianness: which byte of a multi-byte value is stored at the lowest byte address
  - Big endian: the most significant byte of the data is stored at the least significant (lowest) byte address
  - Little endian: the most significant byte of the data is stored at the most significant (highest) byte address
- Ex) lb \$t0, offset(\$t1), with effective address 0x10010000
  - lb is a signed operation: the loaded byte is sign-extended to 32 bits (lbu zero-extends instead)
  - i) Big endian: \$t0 = 0x00000055 (0x55 = 0101 0101)
  - ii) Little endian: \$t0 = 0xFFFFFF88 (0x88 = 1000 1000)
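The interaction between byte order and sign extension can be sketched in C. The function names below are hypothetical, and the word 0x55667788 used in the usage note is an assumed value, chosen only because its first and last bytes reproduce the 0x55 and 0x88 results of the example.

```c
#include <stdint.h>

/* Byte found at the lowest address of a stored 32-bit word. */
static uint8_t byte_at_lowest_address(uint32_t word, int big_endian) {
    return big_endian ? (uint8_t)(word >> 24)     /* most significant byte first */
                      : (uint8_t)(word & 0xFFu);  /* least significant byte first */
}

/* lb: sign-extend the loaded byte to 32 bits. */
static uint32_t lb(uint8_t byte) {
    return (byte & 0x80u) ? (0xFFFFFF00u | byte) : byte;
}

/* lbu: zero-extend the loaded byte to 32 bits. */
static uint32_t lbu(uint8_t byte) {
    return (uint32_t)byte;
}
```

With the assumed word 0x55667788, `lb` on the lowest-address byte yields 0x00000055 on a big-endian machine and 0xFFFFFF88 on a little-endian one, whereas `lbu` would yield 0x00000088 in the little-endian case.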

# String Copy Example

- C code (naïve):
  - Null-terminated string

```
void strcpy (char x[], char y[])
{ int i;
  i = 0;
  while ((x[i]=y[i]) != '\0')
    i += 1;
}
```

- Addresses of x, y in \$a0, \$a1
  - i in \$s0



# String Copy Example

- Called with jal strcpy

- MIPS code:

```
strcpy:
    addi $sp, $sp, -4      # adjust stack for 1 item
    sw   $s0, 0($sp)       # save $s0
    add  $s0, $zero, $zero # i = 0
L1: add  $t1, $s0, $a1     # addr of y[i] in $t1
    lbu  $t2, 0($t1)       # $t2 = y[i]
    add  $t3, $s0, $a0     # addr of x[i] in $t3
    sb   $t2, 0($t3)       # x[i] = y[i]
    beq  $t2, $zero, L2    # exit loop if y[i] == 0
    addi $s0, $s0, 1       # i = i + 1
    j    L1                # next iteration of loop
L2: lw   $s0, 0($sp)       # restore saved $s0
    addi $sp, $sp, 4       # pop 1 item from stack
    jr   $ra               # and return
```

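Note: x and y are byte (char) arrays, so i is added to the base addresses of x and y without scaling. For word arrays, the index would first be multiplied by 4 with a 2-bit left shift (sll), giving byte offsets 0, 4, 8, C (binary 0000, 0100, 1000, 1100) from the base address.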

# 32-bit Constants



- Most constants are small
  - 16-bit immediate is sufficient
- For the occasional 32-bit constant: lui rt, constant
  - Copies 16-bit constant to left 16 bits of rt
  - Clears right 16 bits of rt to 0
- Example: load the following 32-bit constant to \$s0
  - 0000 0000 0011 1101 0000 1001 0000 0000
  - lui \$s0, 61 gives 0000 0000 0011 1101 0000 0000 0000 0000
  - ori \$s0, \$s0, 2304 gives 0000 0000 0011 1101 0000 1001 0000 0000
- li \$s0, constant is an assembler macro (pseudoinstruction) that expands into such a lui/ori pair to load \$s0
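The lui/ori arithmetic can be checked with a few lines of C. This is a sketch only: `lui`, `ori`, and `load_constant` are hypothetical helper names that model the register result of each instruction, not an assembler.

```c
#include <stdint.h>

/* lui: place the 16-bit constant in the upper half, clear the lower half. */
static uint32_t lui(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* ori: OR the zero-extended 16-bit constant into the register value. */
static uint32_t ori(uint32_t rs, uint16_t imm) {
    return rs | imm;
}

/* An li-style expansion: split a 32-bit value into its lui and ori halves. */
static uint32_t load_constant(uint32_t value) {
    return ori(lui((uint16_t)(value >> 16)), (uint16_t)(value & 0xFFFFu));
}
```

For the example constant, `ori(lui(61), 2304)` produces 0x003D0900, which is 4000000 in decimal.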

## Addressing Modes

# Branch Addressing

- Two ways to name a target address
  - Absolute: the address itself, e.g., 0x2000
  - Relative: a displacement/offset, i.e., how far the target location (LABEL) is from the current location
- Branch instructions specify
  - Opcode, two registers, target address
  - e.g., beq \$rs, \$rt, LABEL or bne \$rs, \$rt, LABEL
- Most branch targets are near branch
  - Forward or backward
- PC-relative addressing
  - Target address = PC + offset × 4
  - PC already incremented by 4 by this time

$$\text{PC}_{\text{new}} = \text{PC}_{\text{current}} + 4$$

$$\text{target} = \text{PC}_{\text{new}} + (\text{offset} \times 4)$$
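The PC-relative calculation can be written out in C. In this sketch the function name `branch_target` is hypothetical, `pc` is the address of the branch instruction itself, and the addresses used in the usage note are purely illustrative.

```c
#include <stdint.h>

/* Target of a taken branch: the already-incremented PC plus offset words. */
static uint32_t branch_target(uint32_t pc, int16_t offset) {
    uint32_t pc_new = pc + 4;                          /* PC after the fetch */
    return pc_new + (uint32_t)((int32_t)offset * 4);   /* offset counts words */
}
```

A branch at 0x8000C with offset 2 lands on 0x80018, and a negative offset moves backward: offset -5 from a branch at 0x80010 reaches 0x80000.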





# Jump Addressing

- Jump (j and jal) targets could be anywhere in text segment
  - Encode the address in the instruction (absolute rather than PC-relative addressing)
  - The 26-bit address field holds a word address


# Target Addressing Example

- Loop code from earlier example
  - Assume Loop at location 80000 (the worked encodings below treat the addresses as hexadecimal, i.e., Loop at 0x80000)

R-type encoding (add \$t1, \$t1, \$s6):  
000000 01001 10110 01001 00000 100000 = 0x01364820

① bne \$t0, \$s5, Exit  
I-type: [ op (6) | rs (5) | rt (5) | Imm (16) ]  
Target = PC + 4 + (offset × 4): 0x80018 = (0x8000C + 4) + (offset × 4), ∴ Imm = 2  
000101 01000 10101 0000 0000 0000 0010 = 0x15150002

② j Loop → 0x80000  
J-type: [ op (6) | Imm (26) ]  
Imm × 4 = 0x80000, ∴ Imm = 0x20000  
000010 00 0000 0010 0000 0000 0000 0000 = 0x08020000
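Hand-assembled words like these are easy to get wrong, so it can help to pack the fields mechanically. The sketch below uses hypothetical helper names (`encode_r`, `encode_i`, `encode_j`); the field widths are the standard MIPS R-, I-, and J-type layouts, and the register numbers follow the register table (\$t0 = 8, \$t1 = 9, \$s5 = 21, \$s6 = 22).

```c
#include <stdint.h>

/* R-type: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
static uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct) {
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}

/* I-type: op(6) | rs(5) | rt(5) | immediate(16) */
static uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, uint16_t imm) {
    return (op << 26) | (rs << 21) | (rt << 16) | imm;
}

/* J-type: op(6) | word address(26) */
static uint32_t encode_j(uint32_t op, uint32_t imm26) {
    return (op << 26) | (imm26 & 0x03FFFFFFu);
}
```

For instance, `encode_r(0, 9, 22, 9, 0, 32)` gives 0x01364820 (add \$t1, \$t1, \$s6), `encode_i(5, 8, 21, 2)` gives 0x15150002 (bne \$t0, \$s5 with offset 2), and `encode_j(2, 0x80000 >> 2)` gives 0x08020000 (j to 0x80000).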

# Branching Far Away

- If branch target is too far to encode with 16-bit offset, assembler rewrites the code
- Example

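```
    beq $s0, $s1, L1      # original branch; L1 is too far away
# is rewritten by the assembler as:
    bne $s0, $s1, L2      # inverted test skips the jump
    j   L1                # j reaches the distant target
L2: ...
```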
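Whether a branch needs this rewrite comes down to whether its word offset fits in a signed 16-bit field. A sketch of that check, with hypothetical function names and illustrative addresses (it assumes word-aligned addresses less than 2 GiB apart):

```c
#include <stdint.h>

/* Word offset from the instruction after the branch to the target. */
static int32_t branch_offset(uint32_t branch_pc, uint32_t target) {
    return (int32_t)(target - (branch_pc + 4)) / 4;
}

/* Nonzero when the offset does not fit the 16-bit immediate field. */
static int needs_rewrite(uint32_t branch_pc, uint32_t target) {
    int32_t offset = branch_offset(branch_pc, target);
    return offset < INT16_MIN || offset > INT16_MAX;
}
```

The reachable window is 32767 words forward and 32768 words backward from the instruction after the branch; anything beyond that needs the bne/j pair.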

# Addressing Mode Summary

- 3 modes
  1. Register operand
  2. Immediate operand
  3. Register + offset (displacement)



# Concluding Remarks

- Design principles
  - 1. Simplicity favors regularity
  - 2. Smaller is faster
  - 3. Make the common case fast
  - 4. Good design demands good compromises
- Layers of software/hardware
  - Compiler, assembler, hardware
- MIPS: typical of RISC ISAs
  - c.f. x86

# Concluding Remarks

- Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

| Instruction class | MIPS examples                        | SPEC2006 Int | SPEC2006 FP |
|-------------------|--------------------------------------|--------------|-------------|
| Arithmetic        | add, sub, addi                       | 16%          | 48%         |
| Data transfer     | lw, sw, lb, lbu,<br>lh, lhu, sb, lui | 35%          | 36%         |
| Logical           | and, or, nor, andi,<br>ori, sll, srl | 12%          | 4%          |
| Cond. Branch      | beq, bne,slt,<br>slti, sltiu         | 34%          | 8%          |
| Jump              | j, jr, jal                           | 2%           | 0%          |

### MIPS operands

| Name                  | Example                                                                          | Comments                                                                                                                                                                                      |
|-----------------------|----------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 registers          | \$s0-\$s7, \$t0-\$t9, \$zero, \$a0-\$a3, \$v0-\$v1, \$gp, \$fp, \$sp, \$ra, \$at | Fast locations for data. In MIPS, data must be in registers to perform arithmetic, register \$zero always equals 0, and register \$at is reserved by the assembler to handle large constants. |
| $2^{30}$ memory words | Memory[0], Memory[4], ..., Memory[4294967292]                                    | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential word addresses differ by 4. Memory holds data structures, arrays, and spilled registers.                 |

### MIPS assembly language

| Category           | Instruction                      | Example             | Meaning                                                   | Comments                              |
|--------------------|----------------------------------|---------------------|-----------------------------------------------------------|---------------------------------------|
| Arithmetic         | add                              | add \$s1,\$s2,\$s3  | $\$s1 = \$s2 + \$s3$                                      | Three register operands               |
|                    | subtract                         | sub \$s1,\$s2,\$s3  | $\$s1 = \$s2 - \$s3$                                      | Three register operands               |
|                    | add immediate                    | addi \$s1,\$s2,20   | $\$s1 = \$s2 + 20$                                        | Used to add constants                 |
| Data transfer      | load word                        | lw \$s1,20(\$s2)    | $\$s1 = \text{Memory}[\$s2 + 20]$                         | Word from memory to register          |
|                    | store word                       | sw \$s1,20(\$s2)    | $\text{Memory}[\$s2 + 20] = \$s1$                         | Word from register to memory          |
|                    | load half                        | lh \$s1,20(\$s2)    | $\$s1 = \text{Memory}[\$s2 + 20]$                         | Halfword memory to register           |
|                    | load half unsigned               | lhu \$s1,20(\$s2)   | $\$s1 = \text{Memory}[\$s2 + 20]$                         | Halfword memory to register           |
|                    | store half                       | sh \$s1,20(\$s2)    | $\text{Memory}[\$s2 + 20] = \$s1$                         | Halfword register to memory           |
|                    | load byte                        | lb \$s1,20(\$s2)    | $\$s1 = \text{Memory}[\$s2 + 20]$                         | Byte from memory to register          |
|                    | load byte unsigned               | lbu \$s1,20(\$s2)   | $\$s1 = \text{Memory}[\$s2 + 20]$                         | Byte from memory to register          |
|                    | store byte                       | sb \$s1,20(\$s2)    | $\text{Memory}[\$s2 + 20] = \$s1$                         | Byte from register to memory          |
|                    | load linked word                 | ll \$s1,20(\$s2)    | $\$s1 = \text{Memory}[\$s2 + 20]$                         | Load word as 1st half of atomic swap  |
|                    | store conditional word           | sc \$s1,20(\$s2)    | $\text{Memory}[\$s2 + 20] = \$s1; \$s1 = 0 \text{ or } 1$ | Store word as 2nd half of atomic swap |
|                    | load upper immed.                | lui \$s1,20         | $\$s1 = 20 * 2^{16}$                                      | Loads constant in upper 16 bits       |
| Logical            | and                              | and \$s1,\$s2,\$s3  | $\$s1 = \$s2 \& \$s3$                                     | Three reg. operands; bit-by-bit AND   |
|                    | or                               | or \$s1,\$s2,\$s3   | $\$s1 = \$s2 \mid \$s3$                                   | Three reg. operands; bit-by-bit OR    |
|                    | nor                              | nor \$s1,\$s2,\$s3  | $\$s1 = \sim (\$s2 \mid \$s3)$                            | Three reg. operands; bit-by-bit NOR   |
|                    | and immediate                    | andi \$s1,\$s2,20   | $\$s1 = \$s2 \& 20$                                       | Bit-by-bit AND reg with constant      |
|                    | or immediate                     | ori \$s1,\$s2,20    | $\$s1 = \$s2 \mid 20$                                     | Bit-by-bit OR reg with constant       |
|                    | shift left logical               | sll \$s1,\$s2,10    | $\$s1 = \$s2 << 10$                                       | Shift left by constant                |
|                    | shift right logical              | srl \$s1,\$s2,10    | $\$s1 = \$s2 >> 10$                                       | Shift right by constant               |
| Conditional branch | branch on equal                  | beq \$s1,\$s2,25    | if( $\$s1 == \$s2$ ) go to PC + 4 + 100                   | Equal test; PC-relative branch        |
|                    | branch on not equal              | bne \$s1,\$s2,25    | if( $\$s1 \neq \$s2$ ) go to PC + 4 + 100                 | Not equal test; PC-relative           |
|                    | set on less than                 | slt \$s1,\$s2,\$s3  | if( $\$s2 < \$s3$ ) $\$s1 = 1$ ; else $\$s1 = 0$          | Compare less than; for beq, bne       |
|                    | set on less than unsigned        | sltu \$s1,\$s2,\$s3 | if( $\$s2 < \$s3$ ) $\$s1 = 1$ ; else $\$s1 = 0$          | Compare less than unsigned            |
|                    | set less than immediate          | slti \$s1,\$s2,20   | if( $\$s2 < 20$ ) $\$s1 = 1$ ; else $\$s1 = 0$            | Compare less than constant            |
|                    | set less than immediate unsigned | sltiu \$s1,\$s2,20  | if( $\$s2 < 20$ ) $\$s1 = 1$ ; else $\$s1 = 0$            | Compare less than constant unsigned   |
| Unconditional jump | jump                             | j 2500              | go to 10000                                               | Jump to target address                |
|                    | jump register                    | jr \$ra             | go to \$ra                                                | For switch, procedure return          |
|                    | jump and link                    | jal 2500            | $\$ra = PC + 4$ ; go to 10000                             | For procedure call                    |

FIGURE 2.1 MIPS assembly language revealed in this chapter. This information is also found in Column 1 of the MIPS Reference Data Card at the front of this book.

# ARM & MIPS Similarities

- ARM: the most popular embedded core
- Similar basic set of instructions to MIPS

|                       | ARM                       | MIPS                      |
|-----------------------|---------------------------|---------------------------|
| Date announced        | 1985                      | 1985                      |
| Instruction size      | 32 bits                   | 32 bits                   |
| Address space         | 32-bit flat               | 32-bit flat               |
| Data alignment        | Aligned                   | Aligned                   |
| Data addressing modes | 9                         | 3                         |
| Registers             | $15 \times 32\text{-bit}$ | $31 \times 32\text{-bit}$ |
| Input/output          | Memory mapped             | Memory mapped             |

# Compare and Branch in ARM



- Uses condition codes for result of an arithmetic/logical instruction
  - Negative, zero, carry, overflow
  - Compare instructions to set condition codes without keeping the result
- Each instruction can be conditional
  - Top 4 bits of instruction word: condition value
  - Can avoid branches over single instructions

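- For comparison:
  - MIPS compares and branches in one instruction: bne \$rs, \$rt, label
  - MC68K uses condition codes: SUB D3, D1 sets them, then BEQ LABEL tests them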

# Instruction Encoding (ARM vs MIPS)



Example: if ( $x \leq 0$ ) $x = 0$; else $x = 1$;

- MIPS (x in \$t0)

bgt \$t0, \$zero, LABEL1  
add \$t0, \$zero, \$zero  
j Exit  
LABEL1: addi \$t0, \$zero, 1  
Exit: ...

- ARM (x in r0)

CMP r0, #0  
MOVLE r0, #0  
MOVGT r0, #1



# Addressing Modes

| Addressing mode                           | ARM | MIPS |
|-------------------------------------------|-----|------|
| Register operand                          | X   | X    |
| Immediate operand                         | X   | X    |
| Register + offset (displacement or based) | X   | X    |
| Register + register (indexed)             | X   | —    |
| Register + scaled register (scaled)       | X   | —    |
| Register + offset and update register     | X   | —    |
| Register + register and update register   | X   | —    |
| Autoincrement, autodecrement              | X   | —    |
| PC-relative data                          | X   | —    |

FIGURE 2.33 Summary of data addressing modes. ARM has separate register indirect and register + offset addressing modes, rather than just putting 0 in the offset of the latter mode. To get greater addressing range, ARM shifts the offset left 1 or 2 bits if the data size is halfword or word.

# Addressing Modes

|                   | Instruction name              | ARM                | MIPS          |
|-------------------|-------------------------------|--------------------|---------------|
| Register-register | Add                           | add                | addu, addiu   |
|                   | Add (trap if overflow)        | adds; swivs        | add           |
|                   | Subtract                      | sub                | subu          |
|                   | Subtract (trap if overflow)   | subs; swivs        | sub           |
|                   | Multiply                      | mul                | mult, multu   |
|                   | Divide                        | —                  | div, divu     |
|                   | And                           | and                | and           |
|                   | Or                            | orr                | or            |
|                   | Xor                           | eor                | xor           |
|                   | Load high part register       | —                  | lui           |
|                   | Shift left logical            | lsl <sup>1</sup>   | sllv, sll     |
|                   | Shift right logical           | lsr <sup>1</sup>   | srlv, srl     |
|                   | Shift right arithmetic        | asr <sup>1</sup>   | srav, sra     |
|                   | Compare                       | cmp, cmn, tst, teq | slt/i, slt/iu |
| Data transfer     | Load byte signed              | ldrsb              | lb            |
|                   | Load byte unsigned            | ldrb               | lbu           |
|                   | Load halfword signed          | ldrsh              | lh            |
|                   | Load halfword unsigned        | ldrh               | lhu           |
|                   | Load word                     | ldr                | lw            |
|                   | Store byte                    | strb               | sb            |
|                   | Store halfword                | strh               | sh            |
|                   | Store word                    | str                | sw            |
|                   | Read, write special registers | mrs, msr           | move          |
|                   | Atomic Exchange               | swp, swpb          | ll;sc         |

# Unique Features of ARM

- ARM does not have \$zero
  - Separate opcodes to perform some operations that MIPS can do with \$zero
- ARM supports multiword arithmetic
- Novel interpretation of 12-bit immediate field
  - 8 LSBs are zero-extended to 32-bit value
  - Then, rotated right by twice the number given in the upper 4 bits of the field
  - Capable of representing all powers of two in a 32-bit word
- Operand shifting not limited to immediate values
  - 2<sup>nd</sup> reg. of all arithmetic/logical operations has option to be shifted before the operation
- Has instructions to save groups of registers
  - Any of the 16 registers can be loaded from or stored to memory in a single instruction
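The 12-bit immediate interpretation described above can be sketched in C. The function names are hypothetical; the decoding follows the description: the low 8 bits are zero-extended, then rotated right by twice the value of the upper 4 bits.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by amount (taken modulo 32). */
static uint32_t rotate_right(uint32_t value, unsigned amount) {
    amount &= 31u;
    return amount ? (value >> amount) | (value << (32u - amount)) : value;
}

/* Decode an ARM 12-bit immediate: 4-bit rotate field, 8-bit value. */
static uint32_t arm_decode_immediate(uint32_t field12) {
    uint32_t imm8 = field12 & 0xFFu;       /* zero-extended 8 LSBs */
    unsigned rot  = (field12 >> 8) & 0xFu; /* upper 4 bits of the field */
    return rotate_right(imm8, 2u * rot);
}
```

For example, the field 0x4FF decodes to 0xFF000000, and 0x102 decodes to 0x80000000, a power of two that would not fit in a plain 12-bit constant.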

# Intel x86 ISA

- Evolution with backward compatibility
  - 8080 (1974): 8-bit microprocessor
    - Accumulator, plus 3 index-register pairs
  - 8086 (1978): 16-bit extension to 8080
    - Complex instruction set (CISC)
  - 8087 (1980): floating-point coprocessor
    - Adds FP instructions and register stack
  - 80286 (1982): 24-bit addresses, MMU
    - Segmented memory mapping and protection
  - 80386 (1985): 32-bit extension (now IA-32)
    - Additional addressing modes and operations
    - Paged memory mapping as well as segments

# Intel x86 ISA

- Further evolution...
  - i486 (1989): pipelined, on-chip caches and FPU
    - Compatible competitors: AMD, Cyrix, ...
  - Pentium (1993): superscalar, 64-bit datapath
    - Later versions added MMX (Multi-Media eXtension) instructions
    - The infamous FDIV bug
  - Pentium Pro (1995), Pentium II (1997)
    - New microarchitecture (see Colwell, *The Pentium Chronicles*)
  - Pentium III (1999)
    - Added SSE (Streaming SIMD Extensions) and associated registers
  - Pentium 4 (2001)
    - New microarchitecture
    - Added SSE2 instructions

# Intel x86 ISA

- And further...
  - AMD64 (2003): extended architecture to 64 bits
  - EM64T – Extended Memory 64 Technology (2004)
    - AMD64 adopted by Intel (with refinements)
    - Added SSE3 instructions
  - Intel Core (2006)
    - Added SSE4 instructions, virtual machine support
  - AMD64 (announced 2007): SSE5 instructions
    - Intel declined to follow, instead...
  - Advanced Vector Extension (announced 2008)
    - Longer SSE registers, more instructions
- If Intel didn't extend with compatibility, its competitors would!
  - Technical elegance ≠ market success

# Basic x86 Registers


| Name   | Use                                  |
|--------|--------------------------------------|
| EAX    | GPR 0                                |
| ECX    | GPR 1                                |
| EDX    | GPR 2                                |
| EBX    | GPR 3                                |
| ESP    | GPR 4                                |
| EBP    | GPR 5                                |
| ESI    | GPR 6                                |
| EDI    | GPR 7                                |
| CS     | Code segment pointer                 |
| SS     | Stack segment pointer (top of stack) |
| DS     | Data segment pointer 0               |
| ES     | Data segment pointer 1               |
| FS     | Data segment pointer 2               |
| GS     | Data segment pointer 3               |
| EIP    | Instruction pointer (PC)             |
| EFLAGS | Condition codes                      |

# Basic x86 Addressing Modes

- Two operands per instruction

| Source/dest operand | Second source operand |
|---------------------|-----------------------|
| Register            | Register              |
| Register            | Immediate             |
| Register            | Memory                |
| Memory              | Register              |
| Memory              | Immediate             |

- Memory addressing modes
  - Address in register
  - $\text{Address} = R_{\text{base}} + \text{displacement}$
  - $\text{Address} = R_{\text{base}} + 2^{\text{scale}} \times R_{\text{index}}$  (scale = 0, 1, 2, or 3)
  - $\text{Address} = R_{\text{base}} + 2^{\text{scale}} \times R_{\text{index}} + \text{displacement}$
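The most general of these forms subsumes the others, which a short C sketch makes visible. The function name `effective_address` and the register values in the usage note are assumptions for illustration; setting the index or displacement to 0 recovers the simpler modes.

```c
#include <stdint.h>

/* Address = base + 2^scale * index + displacement, with scale in 0..3. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  unsigned scale, int32_t displacement) {
    return base + (index << scale) + (uint32_t)displacement;
}
```

With base 0x1000, index 3, scale 2, and displacement 8, the address is 0x1014, which is how element 3 of a 4-byte array located 8 bytes into a structure would be reached.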

# x86 Instruction Encoding

Typical x86 instruction formats (figure panels):

a. JE EIP + displacement  
b. CALL  
c. MOV EBX, [EDI + 45]  
d. PUSH ESI  
e. ADD EAX, #6765  
f. TEST EDX, #42

- Variable length encoding
  - Postfix bytes specify addressing mode
  - Prefix bytes modify operation
    - Operand length, repetition, locking, ...

# Implementing IA-32

- Complex instruction set makes implementation difficult
  - Hardware translates instructions to simpler microoperations
    - Simple instructions: 1–1
    - Complex instructions: 1–many
  - Microengine similar to RISC
  - Market share makes this economically viable
- Comparable performance to RISC
  - Compilers avoid complex instructions

# Fallacies

- Powerful instruction  $\Rightarrow$  higher performance
  - Fewer instructions required
  - But complex instructions are hard to implement
    - May slow down all instructions, including simple ones
  - Compilers are good at making fast code from simple instructions
- Use assembly code for high performance
  - But modern compilers are better at dealing with modern processors
  - More lines of code  $\Rightarrow$  more errors and less productivity

# Fallacies

- Backward compatibility  $\Rightarrow$  instruction set doesn't change
  - But they do accrete more instructions

