

# **Chapter 3:**

# **Assembly Language Programming I**

## **Topics**

- Assembly Programmer's Execution Model
- Accessing Information
  - Registers
  - Memory
- Arithmetic operations

# Announcements

- **Data Lab is due Friday Sept 12 by 11:55 pm**
  - Sign up for grading time slots with your TA
- **Prof. Han traveling next Tuesday to Washington DC**
  - Guest lecture next Tuesday by TA Yogesh Virkar
  - No office hours next Tuesday, but will be back for Wednesday office hours 3-4 pm
- **First midterm probably the week of Oct 6**
- **Essential that you read the textbook in detail & do the practice problems**
  - Read Chapter 3.1-3.14, skip 3.12 for now

# Chapter Mapping

Chapter 3

Lines of  
Source  
code

Pre-processor  
&  
Compiler

add a,b  
sub a,b  
move a...  
  
Lines of  
Assembly  
code

Assembler  
&  
Linker

Chapter 2

Operating  
System

10101010

Memory



10101010  
00001010  
01010111  
  
Lines of  
Binary  
code &  
data  
  
11101110  
00111011  
10000111

Chapters 3, 4 and 6

# Intel x86 Processors

- **Totally dominate PC computer market**
- **Evolutionary design**
  - Backwards compatible up until 8086 16-bit CPU, introduced in 1978
  - Then 80286, 80386, 80486, Pentium, ..., Intel Core i7 – hence the name x86
  - Added more features as time goes on
- **Complex instruction set computer (CISC)**
  - Many different instructions with many different formats
    - But, only small subset encountered with Linux programs
  - Hard to match performance of Reduced Instruction Set Computers (RISC)
  - But, Intel has done just that!



# Assembly Language

- **Specific to a CPU**
  - We will be using x86 assembly language
  - ARM processors will have a different assembly language, etc.
  - 32-bit processors will have different assembly language than 64-bit processors
- **Different styles for IA32 assembly code**
  - We will be using gcc/GNU-style assembly language
  - There is also the Intel style of assembly language
    - Example: switches order of source and destination compared to gcc/GNU

# Assembly Programmer's View



## Programmer-Visible State

- **EIP Program Counter**
  - Address of next instruction
- **Register File**
  - Heavily used program data
- **Condition Codes**
  - Store status information about most recent arithmetic operation
  - Used for conditional branching

- **Memory**
  - Byte addressable array
  - Code, user data, (some) OS data
  - Includes call stack used to support procedures

# Turning C into Object Code

- Code in files p1.c p2.c
- Compile with command: gcc -O p1.c p2.c -o p
  - Use optimizations (-O)
  - Put resulting binary in file p



# Compiling Into Assembly

## C Code

```
int sum(int x, int y)
{
    int t = x+y;
    return t;
}
```

## Generated Assembly

```
_sum:
    pushl %ebp
    movl %esp,%ebp
    movl 12(%ebp),%eax
    addl 8(%ebp),%eax
    movl %ebp,%esp
    popl %ebp
    ret
```

Obtain with command

```
gcc -O -S code.c
```

Produces file code.s

Note how the assembly:

- Maps to C code (**t = x+y** maps to the **addl** instruction)
- Contains only simple data types (long ints with the “**l**” suffix)
- Moves data between memory and registers

# Disassembling Object Code

## Disassembled

```
00401040 <_sum>:  
 0:      55          push    %ebp  
 1:  89 e5         mov     %esp,%ebp  
 3:  8b 45 0c      mov     0xc(%ebp),%eax  
 6:  03 45 08      add     0x8(%ebp),%eax  
 9:  89 ec         mov     %ebp,%esp  
 b:  5d           pop    %ebp  
 c:  c3           ret  
 d:  8d 76 00      lea     0x0(%esi),%esi
```

## Disassembler

`objdump -d p`

- Useful tool for examining object code
- Analyzes bit pattern of series of instructions
- Produces approximate rendition of assembly code
- Can be run on either a .out (complete executable) or .o file

# Alternate Disassembly

## Object

0x401040:  
0x55  
0x89  
0xe5  
0xb8  
0x45  
0x0c  
0x03

## Disassembled

|                    |      |                 |
|--------------------|------|-----------------|
| 0x401040 <sum>:    | push | %ebp            |
| 0x401041 <sum+1>:  | mov  | %esp, %ebp      |
| 0x401043 <sum+3>:  | mov  | 0xc(%ebp), %eax |
| 0x401046 <sum+6>:  | add  | 0x8(%ebp), %eax |
| 0x401049 <sum+9>:  | mov  | %ebp, %esp      |
| 0x40104b <sum+11>: | pop  | %ebp            |
| 0x40104c <sum+12>: | ret  |                 |
| 0x40104d <sum+13>: | lea  | 0x0(%esi), %esi |

0x45  
0x08  
0x89  
0xec  
0x5d  
0xc3

## Within gdb Debugger

gdb p

disassemble sum

■ Disassemble procedure

x/13b sum

■ Examine the 13 bytes starting at sum

# Moving Data

## Moving Data

`movl Source, Dest`

- Move 4-byte (“long”) word
- Lots of these in typical code

## Operand Types

- Register: One of 8 integer registers
  - But %esp and %ebp reserved for special use
  - Others have special uses for particular instructions
- Memory: 4 consecutive bytes of memory
  - Various “address modes”
- Immediate: Constant integer data
  - Like C constant, but prefixed with ‘\$’
  - E.g., \$0x400, \$-533
  - Encoded with 1, 2, or 4 bytes

## CPU registers



# Representing Instructions

- **For historical reasons (16-bit processors), Intel terminology considers a “word” to be 16 bits long**
  - ‘`movw %ax, %dx`’, where the ‘`w`’ in `movw` implies a 16 bit quantity is about to be moved
  - ‘`movl %eax, %edx`’, the ‘`l`’ in `movl` implies a “long” 32-bit quantity is about to be moved.
  - **See text for more `mov` instructions:** `movb`, `movw`, `movl`, `movsbw`, `movsbl`, `movswl`, `movzbw`, `movzbl`, `movzwl`
- **Does the width of a C ‘long’ int == an x86 assembly language ‘long’ word?**
  - Yes, they’re both 32-bits or 4 bytes on a 32-bit x86 machine
  - No, a C ‘long’ is 64-bits on a 64-bit x86 machine, while an x86 assembly language ‘long’ is still 32-bits wide
    - i.e. a `movl` on a 64-bit machine is still going to move a 32-bit quantity

# Moving Data Examples



## Examples

1. Moving the value in one register to another
2. Moving a value at a memory location to a register
3. Moving a register value to a memory location

# movl Operand Combinations

|      | Source     | Destination                                                 | C Analog                     |
|------|------------|-------------------------------------------------------------|------------------------------|
| movl | <i>Imm</i> | <i>Reg</i> movl \$0x4,%eax<br><i>Mem</i> movl \$-147,(%eax) | temp = 0x4;<br>*p = -147;    |
|      | <i>Reg</i> | <i>Reg</i> movl %eax,%edx<br><i>Mem</i> movl %eax,(%edx)    | temp2 = temp1;<br>*p = temp; |
|      | <i>Mem</i> | <i>Reg</i> movl (%eax),%edx                                 | temp = *p;                   |

- Cannot do memory-memory transfers with single instruction
  - i.e. can't do: `movl (%eax), (%edx)`

# Simple Addressing Modes

**Normal**

**(R)**

**Mem[Reg[R]]**

- Register R specifies memory address

`movl (%ecx), %eax`

**Displacement D(R)**

**Mem[Reg[R]+D]**

- Register R specifies start of memory region
- Constant displacement D specifies offset

`movl 8(%ebp), %edx`



Go to memory address `%ebp+8`  
and fetch the data located there

# Using Simple Addressing Modes

```
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
```

swap:

```
pushl %ebp  
movl %esp,%ebp  
pushl %ebx
```

} Set Up

```
movl 12(%ebp),%ecx  
movl 8(%ebp),%edx  
movl (%ecx),%eax  
movl (%edx),%ebx  
movl %eax,(%edx)  
movl %ebx,(%ecx)
```

} Body

```
movl -4(%ebp),%ebx  
movl %ebp,%esp  
popl %ebp  
ret
```

} Clean Up

# C Pointers – A Quick Recap

- **int count=1;**
  - Declare an integer named count
  - This allocates 4 bytes in memory for the variable count
  - Initialize count to the value 1
- **char \*p1;**
  - Declare p1 as a *pointer* to a char, i.e. the value of p1 is interpreted as a memory address (4 bytes wide on 32-bit systems)
  - The pointer is allocated space in memory (4 bytes, not 1)
- **int \*p2 = &count;**
  - Declare p2 as a pointer to an integer
  - Allocates 4 bytes in memory for the pointer (32-bit)
  - Initializes its value to the memory address of the count variable

# C Pointers (2)

```
int count=1;
```

```
char *p1;
```

```
int *p2 = &count;
```

- Assume the variables are laid out in memory as shown
- We see p2 storing the memory address of count, i.e. p2 is *pointing at* count
- p1 is uninitialized and not yet pointing at any character



For brevity, the two most significant bytes of address are not shown

# Using Simple Addressing Modes

```
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
```

swap:

```
pushl %ebp  
movl %esp,%ebp  
pushl %ebx
```

} Set Up

```
movl 12(%ebp),%ecx  
movl 8(%ebp),%edx  
movl (%ecx),%eax  
movl (%edx),%ebx  
movl %eax,(%edx)  
movl %ebx,(%ecx)
```

} Body

```
movl -4(%ebp),%ebx  
movl %ebp,%esp  
popl %ebp  
ret
```

} Finish

# Understanding Swap

```
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
```

Memory  
(shown as 4 bytes wide, but  
in reality byte addressable)



| Register | Variable |
|----------|----------|
|----------|----------|

|      |    |
|------|----|
| %ecx | yp |
| %edx | xp |
| %eax | t1 |
| %ebx | t0 |

|                     |                  |
|---------------------|------------------|
| movl 12(%ebp), %ecx | # ecx = yp       |
| movl 8(%ebp), %edx  | # edx = xp       |
| movl (%ecx), %eax   | # eax = *yp (t1) |
| movl (%edx), %ebx   | # ebx = *xp (t0) |
| movl %eax, (%edx)   | # *xp = eax      |
| movl %ebx, (%ecx)   | # *yp = ebx      |

# Understanding Swap

## CPU Registers

|      |       |
|------|-------|
| %eax |       |
| %edx |       |
| %ecx |       |
| %ebx |       |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |



```

movl 12(%ebp), %ecx    # ecx = yp
movl 8(%ebp), %edx     # edx = xp
movl (%ecx), %eax      # eax = *yp (t1)
movl (%edx), %ebx      # ebx = *xp (t0)
movl %eax, (%edx)       # *xp = eax
movl %ebx, (%ecx)       # *yp = ebx

```

# Understanding Swap

## CPU Registers

|      |       |
|------|-------|
| %eax |       |
| %edx |       |
| %ecx | 0x120 |
| %ebx |       |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |



```

movl 12(%ebp),%ecx    # ecx = yp
movl 8(%ebp),%edx     # edx = xp
movl (%ecx),%eax      # eax = *yp (t1)
movl (%edx),%ebx      # ebx = *xp (t0)
movl %eax,(%edx)       # *xp = eax
movl %ebx,(%ecx)       # *yp = ebx

```

# Understanding Swap

CPU Registers

|      |       |
|------|-------|
| %eax |       |
| %edx | 0x124 |
| %ecx | 0x120 |
| %ebx |       |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |



```

movl 12(%ebp),%ecx    # ecx = yp
movl 8(%ebp),%edx     # edx = xp
movl (%ecx),%eax      # eax = *yp (t1)
movl (%edx),%ebx      # ebx = *xp (t0)
movl %eax,(%edx)       # *xp = eax
movl %ebx,(%ecx)       # *yp = ebx

```

# Understanding Swap

CPU Registers

|      |       |
|------|-------|
| %eax | 456   |
| %edx | 0x124 |
| %ecx | 0x120 |
| %ebx |       |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |



```

movl 12(%ebp), %ecx    # ecx = yp
movl 8(%ebp), %edx     # edx = xp
movl (%ecx), %eax      # eax = *yp (t1)
movl (%edx), %ebx      # ebx = *xp (t0)
movl %eax, (%edx)       # *xp = eax
movl %ebx, (%ecx)       # *yp = ebx

```

# Understanding Swap

CPU Registers

|      |       |
|------|-------|
| %eax | 456   |
| %edx | 0x124 |
| %ecx | 0x120 |
| %ebx | 123   |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |



```

movl 12(%ebp), %ecx    # ecx = yp
movl 8(%ebp), %edx     # edx = xp
movl (%ecx), %eax      # eax = *yp (t1)
movl (%edx), %ebx      # ebx = *xp (t0)
movl %eax, (%edx)       # *xp = eax
movl %ebx, (%ecx)       # *yp = ebx

```

# Understanding Swap

CPU Registers

|      |       |
|------|-------|
| %eax | 456   |
| %edx | 0x124 |
| %ecx | 0x120 |
| %ebx | 123   |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |



```

movl 12(%ebp), %ecx    # ecx = yp
movl 8(%ebp), %edx     # edx = xp
movl (%ecx), %eax      # eax = *yp (t1)
movl (%edx), %ebx      # ebx = *xp (t0)
movl %eax, (%edx)       # *xp = eax
movl %ebx, (%ecx)       # *yp = ebx

```

# Understanding Swap

CPU Registers

|      |       |
|------|-------|
| %eax | 456   |
| %edx | 0x124 |
| %ecx | 0x120 |
| %ebx | 123   |
| %esi |       |
| %edi |       |
| %esp |       |
| %ebp | 0x104 |

\*yp (t1)  
xp  
yp  
\*xp (t0)



```

movl 12(%ebp), %ecx    # ecx = yp
movl 8(%ebp), %edx     # edx = xp
movl (%ecx), %eax      # eax = *yp (t1)
movl (%edx), %ebx      # ebx = *xp (t0)
movl %eax, (%edx)       # *xp = eax
movl %ebx, (%ecx)       # *yp = ebx

```

# C operators – Assembly Equivalents?

## Operators

( ) [ ] -> .  
! ~ ++ -- + - \* & (type) sizeof  
\* / %  
+ -  
<< >>  
< <= > >=  
== !=  
&  
^  
|  
&&  
||  
?:  
= += -= \*= /= %= &= ^= != <<= >>=  
,

Many of these C operators have direct x86 assembly equivalents

# Some Assembly Arithmetic Operations

## Format                  Computation

### Two Operand Instructions

|                             |                                        |
|-----------------------------|----------------------------------------|
| <code>addl Src,Dest</code>  | $Dest = Dest + Src$                    |
| <code>subl Src,Dest</code>  | $Dest = Dest - Src$                    |
| <code>imull Src,Dest</code> | $Dest = Dest * Src$ Signed multiply    |
| <code>sall Src,Dest</code>  | $Dest = Dest \ll Src$ Also called shll |
| <code>sarl Src,Dest</code>  | $Dest = Dest \gg Src$ Arithmetic       |
| <code>shrl Src,Dest</code>  | $Dest = Dest \gg Src$ Logical          |
| <code>xorl Src,Dest</code>  | $Dest = Dest \wedge Src$               |
| <code>andl Src,Dest</code>  | $Dest = Dest \& Src$                   |
| <code>orl Src,Dest</code>   | $Dest = Dest \mid Src$                 |

# Some Arithmetic Operations

## Format                  Computation

### One Operand Instructions

**incl Dest**                   $Dest = Dest + 1$

**decl Dest**                   $Dest = Dest - 1$

**negl Dest**                   $Dest = - Dest$

**notl Dest**                   $Dest = \sim Dest$

# Indexed Addressing Modes

## Most General Form

$D(Rb, Ri, S)$

$\text{Mem}[\text{Reg}[Rb] + S * \text{Reg}[Ri] + D]$

- $D$ : Constant “displacement” 1, 2, or 4 bytes
- $Rb$ : Base register: Any of 8 integer registers
- $Ri$ : Index register: Any, except for  $\%esp$ 
  - Unlikely you’d use  $\%ebp$ , either
- $S$ : Scale: 1, 2, 4, or 8

## Example

- `movl 8(%eax,%edx,4), dst`      is equivalent to  
`movl (%eax+4*%edx+8), dst`

## Special Cases (can leave out some elements)

$(Rb, Ri)$

$\text{Mem}[\text{Reg}[Rb] + \text{Reg}[Ri]]$

$D(Rb, Ri)$

$\text{Mem}[\text{Reg}[Rb] + \text{Reg}[Ri] + D]$

$(Rb, Ri, S)$

$\text{Mem}[\text{Reg}[Rb] + S * \text{Reg}[Ri]]$

# Address Computation Examples

|      |        |
|------|--------|
| %edx | 0xf000 |
| %ecx | 0x100  |

| Expression    | Computation      | Address |
|---------------|------------------|---------|
| 0x8(%edx)     | 0xf000 + 0x8     | 0xf008  |
| (%edx,%ecx)   | 0xf000 + 0x100   | 0xf100  |
| (%edx,%ecx,4) | 0xf000 + 4*0x100 | 0xf400  |
| 0x80(,%edx,2) | 2*0xf000 + 0x80  | 0x1e080 |

# `leal` Instruction for Address Computation

`leal Src, Dest`

- *Src* is indexed address mode expression
- Set *Dest* (must be register) to address denoted by expression

You should think of this as “Compute Using Effective Address”

- Example:

`leal 10(%edx, %edx, 4), %eax`

$$\underbrace{\quad}_{\%edx} \underbrace{+ 4 * \underbrace{\%edx}_{\%edx}}_{\%edx} + 10$$

$$= 5 * \%edx + 10$$

Therefore “ $\%eax = 5 * \%edx + 10$ ”

- Compare to:

`movl 10(%edx, %edx, 4), %eax`

means “ $\%eax = \text{Mem}[5 * \%edx + 10]$ ”

# **leal Instruction for Address Computation**

## **Uses**

- Computing address without doing memory reference
  - E.g., translation of  $p = \&x[i]$ ;
- Computing arithmetic expressions of the form  $x + k^*y$ 
  - $k = 1, 2, 4$ , or  $8$ .

# An Arithmetic Example in Assembly

```
int arith  
  (int x, int y, int z)  
{  
    int t1 = x+y;  
    int t2 = z+t1;  
    int t3 = x+4;  
    int t4 = y * 48;  
    int t5 = t3 + t4;  
    int rval = t2 * t5;  
    return rval;  
}
```

```
arith:  
  pushl %ebp  
  movl %esp,%ebp  
  
  movl 8(%ebp),%eax  
  movl 12(%ebp),%edx  
  leal (%edx,%eax),%ecx  
  leal (%edx,%edx,2),%edx  
  sall $4,%edx  
  addl 16(%ebp),%ecx  
  leal 4(%edx,%eax),%eax  
  imull %ecx,%eax  
  
  movl %ebp,%esp  
  popl %ebp  
  ret
```



# Understanding arith

```
int arith  
    (int x, int y, int z)  
{  
    int t1 = x+y;  
    int t2 = z+t1;  
    int t3 = x+4;  
    int t4 = y * 48;  
    int t5 = t3 + t4;  
    int rval = t2 * t5;  
    return rval;  
}
```



```
movl 8(%ebp),%eax  
movl 12(%ebp),%edx  
leal (%edx,%eax),%ecx  
leal (%edx,%edx,2),%edx  
sall $4,%edx  
addl 16(%ebp),%ecx  
leal 4(%edx,%eax),%eax  
-36imull %ecx,%eax
```

# Understanding arith

```
int arith  
    (int x, int y, int z)  
{  
    int t1 = x+y;  
    int t2 = z+t1;  
    int t3 = x+4;  
    int t4 = y * 48;  
    int t5 = t3 + t4;  
    int rval = t2 * t5;  
    return rval;  
}
```



|                         |                      |
|-------------------------|----------------------|
| movl 8(%ebp),%eax       | # eax = x            |
| movl 12(%ebp),%edx      | # edx = y            |
| leal (%edx,%eax),%ecx   | # ecx = x+y (t1)     |
| leal (%edx,%edx,2),%edx | # edx = 3*y          |
| sall \$4,%edx           | # edx = 48*y (t4)    |
| addl 16(%ebp),%ecx      | # ecx = z+t1 (t2)    |
| leal 4(%edx,%eax),%eax  | # eax = 4+t4+x (t5)  |
| -37imull %ecx,%eax      | # eax = t5*t2 (rval) |

# Understanding arith

```
int arith
    (int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
```



```
movl 8(%ebp),%eax          # eax = x
movl 12(%ebp),%edx          # edx = y
leal (%edx,%eax),%ecx      # ecx = x+y (t1)
leal (%edx,%edx,2),%edx      # edx = 3*y
sall $4,%edx                # edx = 48*y (t4)
addl 16(%ebp),%ecx          # ecx = z+t1 (t2)
leal 4(%edx,%eax),%eax      # eax = 4+t4+x (t5)
-38imull %ecx,%eax          # eax = t5*t2 (rval)
```

# Understanding arith

```
int arith
    (int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
```



|                         |                      |
|-------------------------|----------------------|
| movl 8(%ebp),%eax       | # eax = x            |
| movl 12(%ebp),%edx      | # edx = y            |
| leal (%edx,%eax),%ecx   | # ecx = x+y (t1)     |
| leal (%edx,%edx,2),%edx | # edx = 3*y          |
| sall \$4,%edx           | # edx = 48*y (t4)    |
| addl 16(%ebp),%ecx      | # ecx = z+t1 (t2)    |
| leal 4(%edx,%eax),%eax  | # eax = 4+t4+x (t5)  |
| -39imull %ecx,%eax      | # eax = t5*t2 (rval) |

# Understanding arith

```
int arith
    (int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
```



|                         |                      |
|-------------------------|----------------------|
| movl 8(%ebp),%eax       | # eax = x            |
| movl 12(%ebp),%edx      | # edx = y            |
| leal (%edx,%eax),%ecx   | # ecx = x+y (t1)     |
| leal (%edx,%edx,2),%edx | # edx = 3*y          |
| sall \$4,%edx           | # edx = 48*y (t4)    |
| addl 16(%ebp),%ecx      | # ecx = z+t1 (t2)    |
| leal 4(%edx,%eax),%eax  | # eax = 4+t4+x (t5)  |
| -40imull %ecx,%eax      | # eax = t5*t2 (rval) |

Note that compiler can reorder the original code (t2 now computed after t4)

# **Supplementary Slides**

# Assembly Characteristics

## Minimal Data Types

- “Integer” data of 1, 2, or 4 bytes
  - Data values
  - Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory

## Primitive Operations

- Perform arithmetic function on register or memory data
- Transfer data between memory and register
  - Load data from memory into register
  - Store register data into memory
- Transfer control
  - Unconditional jumps to/from procedures
  - Conditional branches

# Machine Instruction Example

```
int t = x+y;
```

```
addl 8(%ebp),%eax
```

Similar to  
expression  
 $x += y$

```
0x401046: 03 45 08
```

## C Code

- Add two signed integers

## Assembly

- Add 2 4-byte integers
  - “Long” words in GCC parlance
  - Same instruction whether signed or unsigned
- Operands:

|    |          |           |
|----|----------|-----------|
| x: | Register | %eax      |
| y: | Memory   | M[%ebp+8] |
| t: | Register | %eax      |

» Return function value in %eax

## Object Code

- 3-byte instruction
- Stored at address 0x401046

# Object Code

## Code for sum

```
0x401040 <sum>:  
 0x55      • Total of 13  
 0x89      bytes  
 0xe5      • Each  
 0x8b      instruction 1,  
 0x45      2, or 3 bytes  
 0x0c      • Starts at  
 0x03      address  
 0x45      0x401040  
 0x08  
 0x89  
 0xec  
 0x5d  
 0xc3
```

## Assembler

- Translates .s into .o
- Binary encoding of each instruction
- Nearly-complete image of executable code
- Missing linkages between code in different files

## Linker

- Resolves references between files
- Combines with static run-time libraries
  - E.g., code for malloc, printf
- Some libraries are *dynamically linked*
  - Linking occurs when program begins execution

# What Can be Disassembled?

```
% objdump -d WINWORD.EXE

WINWORD.EXE:      file format pei-i386

No symbols in "WINWORD.EXE".
Disassembly of section .text:

30001000 <.text>:
30001000: 55          push    %ebp
30001001: 8b ec        mov     %esp,%ebp
30001003: 6a ff        push    $0xffffffff
30001005: 68 90 10 00 30  push    $0x30001090
3000100a: 68 91 dc 4c 30  push    $0x304cdc91
```

- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly source

# Intel x86 Evolution: Milestones

| <i>Name</i>  | <i>Date</i> | <i>Transistors</i>                                                                                                                            | <i>MHz</i> |
|--------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------|------------|
| · 8086       | 1978        | 29K                                                                                                                                           | 5-10       |
|              |             | ■ Intel's first 16-bit processor. Basis for IBM PC & DOS                                                                                      |            |
|              |             | ■ 1MB address space                                                                                                                           |            |
| · 386        | 1985        | 275K                                                                                                                                          | 16-33      |
|              |             | ■ Intel's first 32 bit processor , referred to as IA32                                                                                        |            |
|              |             | ■ Added “flat addressing” and support for MMU with paging                                                                                     |            |
|              |             | ■ Capable of running Unix                                                                                                                     |            |
|              |             | ■ 32-bit Linux/gcc uses no instructions introduced in later models                                                                            |            |
| · Pentium 4F | 2005        | 230M                                                                                                                                          | 2800-3800  |
|              |             | ■ A 64-bit processor. Intel's Itanium 64-bit CPU line debuting in 2001 was less successful due to performance & backward compatibility issues |            |
|              |             | ■ Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line                                                                   |            |

# Intel x86 Processors: Overview



IA: often redefined as latest Intel architecture

# Intel x86 Processors, contd.

## ▪ Machine Evolution

|               |      |      |
|---------------|------|------|
| ■ 486         | 1989 | 1.9M |
| ■ Pentium     | 1993 | 3.1M |
| ■ Pentium/MMX | 1997 | 4.5M |
| ■ PentiumPro  | 1995 | 6.5M |
| ■ Pentium III | 1999 | 8.2M |
| ■ Pentium 4   | 2001 | 42M  |
| ■ Core 2 Duo  | 2006 | 291M |



## ▪ Added Features

- Instructions to support multimedia operations
  - Parallel operations on 1, 2, and 4-byte data, both integer & FP
- Instructions to enable more efficient conditional operations

## ▪ Linux/GCC Evolution

- Very limited

# Pentium Pro (P6)

## History

- Announced in Feb. '95
- Basis for Pentium II, Pentium III, and Celeron processors
- Pentium 4 similar idea, but different details

## Features

- Dynamically translates instructions to more regular format
  - Very wide, but simple instructions
- Executes operations in parallel
  - Up to 5 at once
- Very deep pipeline
  - 12–18 cycle latency

# Pentium Pro Block Diagram



Microprocessor Report  
2/16/95

# PentiumPro Operation

Translates instructions dynamically into “Uops”

- 118 bits wide
- Holds operation, two sources, and destination

Executes Uops with “Out of Order” engine

- Uop executed when
  - Operands available
  - Functional unit available
- Execution controlled by “Reservation Stations”
  - Keeps track of data dependencies between uops
  - Allocates resources

Consequences

- Indirect relationship between IA32 code & what actually gets executed
- Tricky to predict / optimize performance at assembly level

# New Species: IA64, then IPF, then Itanium,...

| <i>Name</i>                                | <i>Date</i> | <i>Transistors</i>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|--------------------------------------------|-------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| · Itanium                                  | 2001        | 10M                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                                            |             | <ul style="list-style-type: none"><li>■ First shot at 64-bit architecture: first called IA64<ul style="list-style-type: none"><li>■ DEC's Alpha CPU was 64-bit in 1992, Sun SPARC had 64-bit in 1995</li></ul></li><li>■ Radically new VLIW instruction set designed for high performance</li><li>■ Can run existing IA32 programs in compatibility mode<ul style="list-style-type: none"><li>● On-board “x86 engine” – not great performance</li></ul></li><li>■ Joint project with Hewlett-Packard</li></ul> |
| · Itanium 2                                | 2002        | 221M                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                            |             | <ul style="list-style-type: none"><li>■ Big performance boost</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| · Itanium 2 Dual-Core                      | 2006        | 1.7B                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| · Itanium has not taken off in marketplace |             | <ul style="list-style-type: none"><li>■ Lack of backward compatibility, no good compiler support,</li><li>– Pentium 4 got too good</li></ul>                                                                                                                                                                                                                                                                                                                                                                   |

# x86 Clones: Advanced Micro Devices (AMD)

- **Historically**

- AMD has followed just behind Intel
  - A little bit slower, a lot cheaper

- **Then**

- Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
  - Built Opteron: tough competitor to Pentium 4
  - In 2003, developed x86-64, their own extension to 64 bits

- **Recently**

- Intel much quicker with dual core design
  - Intel currently far ahead in performance
  - em64t backwards compatible to x86-64

# Intel's 64-Bit

- Intel Attempted Radical Shift from IA32 to IA64
  - Totally different architecture (Itanium)
  - Executes IA32 code only as legacy
  - Performance disappointing
- AMD Stepped in with Evolutionary Solution
  - x86-64 (now called “AMD64”)
- Intel Felt Obligated to Focus on IA64
  - Hard to admit mistake or that AMD is better
- 2004: Intel Announces EM64T extension to IA32
  - Extended Memory 64-bit Technology
  - Almost identical to x86-64!
- Meanwhile: EM64t well introduced, wide adoption now
- Our focus will be on 32-bit assembly language programming
  - But we will point out 64-bit equivalents as we go

# Other Processors

## ARM-based RISC processors

- Large market in mobile smartphones, e.g. Android phones and tablets as well as iPhone/iPad
- low power, low cost, reasonably high performance for mobile space

## CISC vs. RISC – not evident there's a clear winner

- Market forces affect outcome, not just technical issues
- For more, see RISC vs. CISC and RISC.

# CISC Properties

**Instruction can reference different operand types**

- Immediate, register, memory

**Arithmetic operations can read/write memory**

**Memory reference can involve complex computation**

- $R_b + S^*R_i + D$
- Can be used for arithmetic!

**Instructions can have varying lengths**

- IA32 instructions 1-15 bytes

# RISC Properties

**Only load/store instructions can access and move to/from memory**

**Data instructions only manipulate registers, not memory**

**Each instruction takes one clock cycle**

**Complex computations are performed by many instructions**

**Deep pipelining**

# CISC vs RISC

**“A common misunderstanding of the phrase "reduced instruction set computer" is the mistaken idea that instructions are simply eliminated, resulting in a smaller set of instructions. In fact, over the years, RISC instruction sets have grown in size, and today many of them have a larger set of instructions than many CISC CPUs.”**

and

**“The term "reduced" in that phrase was intended to describe the fact that the amount of work any single instruction accomplishes is reduced—at most a single data memory cycle—compared to the "complex instructions" of CISC CPUs that may require dozens of data memory cycles in order to execute a single instruction.”**

-- Wikipedia

# Other Processors (2)

Intel's Atom targeting small/embedded low power space too, e.g. netbooks

- IA32-compatible

## Graphics Processing Units (GPUs)

- Designed for highly parallel data processing, such as computing the 3D rendering of a graphics scene
- General-purpose CPUs are sequential with some parallelism

# Data Representations: IA32 + x86-64

- Sizes of C Objects (in Bytes)

| <i>C Data Type</i>   | <i>Typical 32-bit</i> | <i>Intel IA32</i> | <i>x86-64</i> |
|----------------------|-----------------------|-------------------|---------------|
| • <b>unsigned</b>    | 4                     | 4                 | 4             |
| • <b>int</b>         | 4                     | 4                 | 4             |
| • <b>long int</b>    | 4                     | 4                 | 8             |
| • <b>char</b>        | 1                     | 1                 | 1             |
| • <b>short</b>       | 2                     | 2                 | 2             |
| • <b>float</b>       | 4                     | 4                 | 4             |
| • <b>double</b>      | 8                     | 8                 | 8             |
| • <b>long double</b> | 8                     | 10/12             | 16            |
| • <b>char *</b>      | 4                     | 4                 | 8             |

*Or any other pointer*

# Whose Assembler?

## Intel/Microsoft Format

```
lea    eax, [ecx+ecx*2]
sub   esp, 8
cmp   dword ptr [ebp-8], 0
mov   eax, dword ptr [eax*4+100h]
```

## GAS/Gnu Format

```
leal  (%ecx,%ecx,2),%eax
subl $8,%esp
cmpl $0,-8(%ebp)
movl $0x100(,%eax,4),%eax
```

## Intel/Microsoft Differs from GAS in syntax

- Operands listed in opposite order

mov Dest, Src

movl Src, Dest

- Constants not preceded by '\$', Denote hex with 'h' at end

100h

\$0x100

- Operand size indicated by operands rather than operator suffix

sub

subl

- Addressing format shows effective address computation

[eax\*4+100h]

\$0x100(,%eax,4)

# C operators – Assembly Equivalents?

## Operators

( ) [ ] -> .  
! ~ ++ -- + - \* & (type) sizeof  
\* / %  
+ -  
<< >>  
< <= > >=  
== !=  
&  
^  
|  
&&  
||  
?:  
= += -= \*= /= %= &= ^= != <<= >>=  
,

## Associativity

left to right  
right to left  
left to right  
right to left  
right to left  
left to right

**Note: Unary +, -, and \* have higher precedence than binary forms**