

Poll: Anything positive about your weekend you'd like to share?

# EECS 370 - Lecture 4

ARM

byte be like



## Instruction Set Architecture (ISA) Design Lectures

- Lecture 2: ISA - storage types, binary and addressing modes
- Lecture 3 : LC2K
- Lecture 4 : ARM**
- Lecture 5 : Converting C to assembly – basic blocks
- Lecture 6 : Converting C to assembly – functions
- Lecture 7 : Translation software; libraries, memory layout

**M** Live Poll + Q&A: [slido.com #eeecs370](https://slido.com/#eeecs370)

Poll and Q&A Link

**M**

## Agenda

- ARM overview and basic instructions**
- Memory instructions
  - Handling multiple data widths
- Sample Problems

## ARMv8 ISA

- LC2K is intended to be an extremely bare-bones ISA
  - "Bare minimum"
  - Easy to design hardware for, really annoying to program on (as you'll see in P1m)
  - Invented for our projects, not used anywhere in practice
- ARM (specifically v8) is a much more powerful ISA
  - Used heavily in practice (most smartphones, some laptops & supercomputers)
  - Subset (LEG) is focus of hw and lecture



**M**

6

**M**

7

## ISA Types

### Reduced Instruction Set Computing (RISC)

- Fewer, simpler instructions
- Encoding of instructions are usually the same size
- Simpler hardware
- Program is larger, more tedious to write by hand
- E.g. LC2K, RISC-V, ARM (kinda)
- More popular now

### Complex Instruction Set Computing (CISC)

- More, complex instructions
- Encoding of instructions are different sizes
- More complex hardware
- Short, expressive programs, easier to write by hand
- E.g. x86
- Less popular now

|                | LC2K           | LEG            |
|----------------|----------------|----------------|
| # registers    | 8              | 32             |
| Register width | 32 bits        | 64 bits        |
| Memory size    | $2^{18}$ bytes | $2^{64}$ bytes |
| # instructions | 8              | 40-ish         |
| Addressability | Word           | Byte           |

## ARM vs LC2K at a Glance

We'll discuss what this means in a bit →

**M**

8

**M**

9

## ARM Instruction Set—LEGv8 subset

- The main types of instructions fall into the familiar classes we saw with LC2K:
  - Arithmetic**
    - Add, subtract, (multiply not in LEGv8)
  - Data transfer**
    - Loads and stores—LDUR (load unscaled register), STUR, etc.
  - Logical**
    - AND, ORR, EOR, etc.
    - Logical shifts, LSL, LSR
  - Conditional branch**
    - CBZ, CBNZ, B.cond
  - Unconditional branch (jumps)**
    - B, BR, BL

## LEGv8 Arithmetic Instructions

- Format: three operand fields
  - Dest. register usually the first one – check instruction format
  - ADD X3, X4, X7 // X3 = X4 + X7
  - LC2K generally has the destination on the right!!!!
  - E.g. add 1 2 3 // r1 + r2 -> r3 LC2K: reg1 reg2 dest X1 → t0 X2 → t1
  - LEG: dest reg1 reg2 X3 → g X4 → h X5 → i X6 → j
- C-code example:  $f = (g + h) - (i + j)$

|                           |   |                            |
|---------------------------|---|----------------------------|
| <b>ADD      t0, g, h</b>  | → | <b>ADD      X1, X3, X4</b> |
| <b>ADD      t1, i, j</b>  |   | <b>ADD      X2, X5, X6</b> |
| <b>SUB      f, t0, t1</b> |   | <b>SUB      X7, X1, X2</b> |

**M**

10

**M**

11

## LEGv8 R-instruction Encoding

- Register-to-register operations
- Consider ADD X3, X4, X7
  - $R[Rd] = R[Rn] + R[Rm]$
  - Rd = X3, Rn = X4, Rm = X7
- Rm = second register operand
- shamt = shift amount
  - not used in LEG for ADD/SUB and set to 0
- Rn = first register operand
- Rd = destination register
- ADD opcode is 100001011000, what are the other fields?

| opcode  | Rm     | shamt  | Rn     | Rd     |
|---------|--------|--------|--------|--------|
| 11 bits | 5 bits | 6 bits | 5 bits | 5 bits |

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

12

## LEGv8 Logical Instructions

- Logical operations are bit-wise
- For example assume
  - $X9 = 11111111\ 11111111\ 11111111\ 00000000\ 00000000\ 00001101\ 11000000$
  - $X13 = 00000000\ 00000000\ 11011010\ 00000000\ 00000000\ 00111100\ 00000000$
  - AND X2, X13, X9 would result in
    - $X2 = 00000000\ 00000000\ 11011010\ 00000000\ 00000000\ 00001100\ 00000000$
- AND and OR correspond to C operators & and |
- For immediate fields the 12 bit constant is padded with zeros to the left—zero extended

| Category | Instruction            | Example         | Meaning             | Comments                            |
|----------|------------------------|-----------------|---------------------|-------------------------------------|
| Logical  | and                    | AND X1, X2, X3  | $X1 = X2 \& X3$     | Three reg. operands: bit-by-bit AND |
|          | inclusive or           | ORR X1, X2, X3  | $X1 = X2 \mid X3$   | Three reg. operands: bit-by-bit OR  |
|          | exclusive or           | EOR X1, X2, X3  | $X1 = X2 \wedge X3$ | Three reg. operands: bit-by-bit XOR |
|          | and immediate          | ANDI X1, X2, 20 | $X1 = X2 \& 20$     | Bit-by-bit AND reg. with constant   |
|          | inclusive or immediate | ORRI X1, X2, 20 | $X1 = X2 \mid 20$   | Bit-by-bit OR reg. with constant    |
|          | exclusive or immediate | EORI X1, X2, 20 | $X1 = X2 \wedge 20$ | Bit-by-bit XOR reg. with constant   |
|          | logical shift left     | LSL X1, X2, 10  | $X1 = X2 \ll 10$    | Shift left by constant              |
|          | logical shift right    | LSR X1, X2, 10  | $X1 = X2 \gg 10$    | Shift right by constant             |

14

## Pseudo Instructions

- Instructions that use a shorthand “mnemonic” that expands to pre-existing assembly instruction
- Example:
  - MOV X12, X2 // the contents of X2 copied to X12—X2 unchanged
- This gets expanded to:
  - ORR X12, XZR, X2
- What alternatives could we use instead of ORR?

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

## Agenda

- ARM overview and basic instructions
- **Memory instructions**
  - Handling multiple data widths
- Sample Problems

## I-instruction Encoding

- Format: second source operand can be a register or immediate—a constant in the instruction itself
- e.g., ADD X3, X4, #10 //although we write "ADD", this is "ADDI"
- Format: 12 bits for immediate constants 0-4095

| opcode  | immediate | Rn     | Rd     |
|---------|-----------|--------|--------|
| 10 bits | 12 bits   | 5 bits | 5 bits |

• Don't need negative constants because we have SUBI

• C-code example:  $f = g + 10$

ADDI X7, X5, #10

• C-code example:  $f = g - 10$

SUBI X7, X5, #10

13

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

12

## LEGv8 Shift Logical Instructions

### LSR X6, X23, #2

$X23 = 11111111\ 11111111\ 11111111\ 00000000\ 00000000\ 00000000\ 11011010\ 00000010$

### C equivalent

•  $X6 = X23 \gg 2;$

### LSL X6, X23, #2

• What register gets modified?

• What does it contain after executing the LSL instruction?

Poll: Why is shifting so valuable?

- Makes multiplying easier
- Allows quicker 2's-complement conversions
- Allows for more complex branching behavior
- It's always a good time to get shifty

• In shift operations Rm is always 0—shamt is 6 bit unsigned

| opcode  | Rm     | shamt  | Rn     | Rd     |
|---------|--------|--------|--------|--------|
| 11 bits | 5 bits | 6 bits | 5 bits | 5 bits |

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

15

## Class Problem #1

- Show the C and LEGv8 assembly for extracting the value in bits 15:10 from a 64-bit integer variable



Assume the variable is in X1

$x = x \gg 10$   
 $x = x \& 0x3F .. \{011111\}$

Want these bits

- Poll: Which operations did you use (select all)?
- and ✓
  - or
  - add
  - left shift
  - right shift ✓
- LDR X2 X1 #10  
ANDI X2 X2 #63  
0x3F

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

17

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

Memory Diagram:

## Word vs Byte Addressing

- A **word** is a collection of bytes

- Exact size depends on architecture
- in LC2K and ARM, 4 bytes
- Double word is 8 bytes

### LC2K is word addressable

- Each **address** refers to a particular **word** in memory
- Wanna move forward one int? Increment address by **one**
- Wanna move forward one char? Uhhh...



- ARM (and most modern ISAs) is **byte addressable**

- Each **address** refers to a particular **byte** in memory
- Wanna move forward one int? Increment address by **four**
- Wanna move forward one char? Increment address by **one**

M Live Poll + Q&A: [slido.com #eecs370](https://slido.com/#eecs370)

18

M

19

## LEGv8 Memory Instructions

- Like LC2K, employs base + displacement addressing mode
  - Base is a register
  - Displacement is 9-bit immediate  $\pm 256$  bytes—sign extended to 64 bits
- Unlike LC2K (which always transfers 4 bytes), we have several options in LEGv8

| Category | Instruction Example | Meaning                                                                    | Comments                              |
|----------|---------------------|----------------------------------------------------------------------------|---------------------------------------|
|          | LDUR register       | LDUR X1, [X2, 40] X1 = Memory[X2 + 40]                                     | Doubleword from memory to register    |
|          | store register      | STUR X1, [X2, 40] Memory[X2 + 40] = X1                                     | Doubleword from register to memory    |
|          | load signed word    | LDURSW X1, [X2, 40] X1 = Memory[X2 + 40]                                   | Word from memory to register          |
|          | store word          | STURW X1, [X2, 40] Memory[X2 + 40] = X1                                    | Word from register to memory          |
|          | load half           | LDURH X1, [X2, 40] X1 = Memory[X2 + 40]                                    | Halfword from memory to register      |
|          | store half          | STURH X1, [X2, 40] Memory[X2 + 40] = X1                                    | Halfword register to memory           |
|          | load byte           | LDURB X1, [X2, 40] X1 = Memory[X2 + 40]                                    | Byte from memory to register          |
|          | store byte          | STURB X1, [X2, 40] Memory[X2 + 40] = X1                                    | Byte from register to memory          |
|          | move wide with zero | MOVZ X1, 20, LSL 0 X1 = 20 or 20 * 2 <sup>16</sup> or 20 * 2 <sup>48</sup> | Loads 16-bit constant, rest zeros     |
|          | move wide with keep | MOVK X1, 20, LSL 0 X1 = 20 or 20 * 2 <sup>16</sup> or 20 * 2 <sup>48</sup> | Loads 16-bit constant, rest unchanged |

Look over formatting on your own

## D-Instruction fields

- Data transfer
- opcode and op2 define data transfer operation
- address is the  $\pm 256$  bytes displacement
- Rn is the base register
- Rt is the destination (loads) or source (stores)
- More complicated modes are available in full ARMv8

| opcode  | address | op2    | Rn     | Rt     |
|---------|---------|--------|--------|--------|
| 11 bits | 9 bits  | 2 bits | 5 bits | 5 bits |

## Agenda

- ARM overview and basic instructions
- Memory instructions
  - Handling multiple data widths
- Sample Problems

## LEGv8 Memory Instructions

- Registers are 64 bits wide
- But sometimes we want to deal with non-64-bit entities
  - E.g. ints (32 bits), chars (8 bits)
- When we load smaller elements from memory, what do we set the other bits to?
  - Option A: set to zero
 
  - Option B: sign extend
 
- We'll need different instructions for different options

## Load Instruction Sizes



How much data is retrieved from memory at the given address?

| Desired amount of data to transfer?        | Operation                        | Unused bits in register?                                               | Example                                                                  |
|--------------------------------------------|----------------------------------|------------------------------------------------------------------------|--------------------------------------------------------------------------|
| 64-bits (double word or whole register)    | LDUR (Load unscaled to register) | N/A                                                                    | 0xFEDC_BA98_7654_3210                                                    |
| 16-bits (half-word) into lower bits of reg | LDURH                            | Set to zero                                                            | 0x0000_0000_0000_3210                                                    |
| 8-bits (byte) into lower bits of reg       | LDURB                            | Set to zero                                                            | 0x0000_0000_0000_0010                                                    |
| 32-bits (word) into lower bits of reg      | LDURSW (load signed word)        | Sign extend (0 or 1 based on most significant bit of transferred word) | 0x0000_0000_7654_3210<br>or<br>0xFFFF_FFFF_F654_3210 (depends on bit 31) |

## Load Instruction in Action



## Load Instruction in Action – other example

```

int my_big_number = -534159618; // 0xE0295EFE in 2's complement

int inc_number() {
    my_big_number++;
    // load value from mem into reg
    // then increment it
}

```

LDURSW X3, [X4, #0]

Sign extend (0xE0295EFE) to 64 bits → 0xFFFFFFFFE0295EFE



## But wait...

```

int my_big_number = -534159618; // 0xE0295EFE in 2's complement

```

|    |      |
|----|------|
| FE | 2604 |
| 5E | 2605 |
| 29 | 2606 |
| EO | 2607 |

- If I want to store this number in memory... should it be stored like this?
- ... or like this?

|    |      |
|----|------|
| EO | 2604 |
| 29 | 2605 |
| 58 | 2606 |
| FE | 2607 |

## Big Endian vs. Little Endian

- Endian-ness: ordering of bytes within a word
  - Little – Bigger address holds more significant bits
  - Big – Opposite, smaller address hold more significant bits
- The Internet is big endian, x86 is little endian, LEG and ARMv8 can switch
  - But in general assume little endian. (Figures from Wikipedia)



**M** Live Poll + Q&A: slido.com #eecs370

28

## Store Instructions

- Store instructions are simpler—there is no sign/zero extension to consider (do you see why?)

| Desired amount of data to transfer?        | Operation                      | Example               |
|--------------------------------------------|--------------------------------|-----------------------|
| 64-bits (double word or whole register)    | STUR (Store unscaled register) | 0xFEDC_BA98_7654_3210 |
| 16-bits (half-word) from lower bits of reg | STURH                          | 0x0000_0000_0000_3210 |
| 8-bits (byte) from lower bits of reg       | STURB                          | 0x0000_0000_0000_0010 |
| 32-bits (word) from lower bits of reg      | STURW                          | 0xFFFF_FFFF_F654_3210 |

29

## Agenda

- ARM overview and basic instructions
- Memory instructions
  - Handling multiple data widths
- Sample Problems

**M**

28

## Example Code Sequence

What is the final state of memory once you execute the following instruction sequence? (assume X5 has the value of 0)



31

## Example Code Sequence

What is the final state of memory once you execute the following instruction sequence? (assume X5 has the value of 0)



30

**M** Live Poll + Q&A: slido.com #eecs370

31

## Example Code Sequence

What is the final state of memory once you execute the following instruction sequence? (assume X5 has the value of 0)



32

## Example Code Sequence (2)

What is the final state of memory once you execute the following instruction sequence? (assume X5 has the value of 0)



34

**M**

35

## Example Code Sequence (2)

What is the final state of memory once you execute the following instruction sequence? (assume X5 has the value of 0)



40

**M**

39

**M**