

# Notes ASC - cours + carte

## Cuprins

- 2-7 flaguri + registri
- 8 op. perf. by ALU
- 9-11 2' complement + representations
- 12-17 Overflow
- 18-21 { Address registers  
Address computation
- 22-25 Far and near address
- 26-27 Pointer arithmetic
- 28-37 Assembly language basics
- 38-42 Instruction
- 42-43 .....
- 43-49 Machine inst. nplus.
- 49-51 Multimodul

# I The x86 Microprocessor architecture



## Registers

EAX - accumulator

EBX - base register

ECX - counter register - used mostly for repetitive loops

EDX - data register - mostly used with EAX  
when the result exceeds a doubleword  
(flosit if we dd \* dd -> dg)

lword - 16 bits

dword - 32 bits

qword - 64 bits

ESP - (stack pointer) - points to the last element put on the stack (top)

EBP - (base pointer) - points to the first elem. put on the stack (base)

ESP + EBP = stack registers (LIFO memory area)

EDI - destination index } used for accessing  
ESI - source index      } element  
index registers

! All of them are dword registers (32 bits)

=> there are also 16 bits registers which contains (each) other 2 registers that are 8 bits



$a[1042] = * (a + 1042) \Rightarrow$  ~~xx~~ odd pointers

$\downarrow$        $\downarrow$        $\downarrow$   
 address base index  
 (displacement)

$x = x + 1$   
 $\downarrow$        $\searrow$   
 address      continuous link  
 $\downarrow$       dim memory

## Flags

CF - (carry flag) = the transport flag

set to 1 = (unsigned overflow) and esto  
 o cifra im plus (in afloa dom. de repres)

set to 0 = otherwise

$$\begin{array}{r}
 1001\ 0011 + 147 + \\
 0111\ 0011 \\
 \hline
 0000\ 0110
 \end{array}$$

$\Rightarrow$  carry digit  $\Rightarrow \boxed{CF = 1}$

$$\begin{array}{r} 93h + \\ 73h \\ \hline 106h \end{array}$$

$$\begin{array}{r} -109 + \\ 115 \\ \hline 06 \end{array}$$

$93h = (10010011)_2$        $\begin{cases} 154 & \text{unsigned} \\ -109 & \text{signed} \end{cases}$

PF - (parity flag) - if the representation of the least significant byte has an odd number of '1'  $\Rightarrow PF=0$ , otherwise  $PF=1$  (nr. par de 1)

AF - (auxiliary flag) - dacă există un transport carry de la bitul 3 la bitul 4

ZF - (zero flag) -

$$\begin{array}{r} 0000 \quad 0101 \\ 0000 \quad 0001 \\ \hline 0000 \quad 0110 \end{array} \Rightarrow \text{non-zero result} \Rightarrow ZF=0$$

$$\begin{array}{r} 0000 \quad 0011 \\ 1111 \quad 1101 \\ \hline 0000 \quad 0000 \end{array} \Rightarrow \text{zero result} \quad ZF=1$$

SF - (sign flag) - the most significant bit is 1  $\Rightarrow$  SF = 1, otherwise SF = 0

$$\begin{array}{r} 1000 \quad 0000 \\ 1111 \quad 1111 \\ \hline 0111 \quad 1111 \end{array}$$

$\Rightarrow$  SF = 0

TF (trap flag) - debugging flag

IF (interrupt flag)

DF (direction flag) - for operating string functions

set to 0 - string parsing in ascending order

set to 1 - string parsing in descending order

OF (overflow flag) - signed overflow  
if the result (signed interpretation) didn't

fit the reserved space  $OF=1$  otherwise

$OF=0$

## Flags categories

- a) set as direct effect of the LPO (*last performed operation*) - CF, PF, AF, ZF, SF, OF
- b) flags set by the programmer to influence the next instructions - CF, AF, TF, IF



this are set by special instructions

CLC - CF=0      STC - CF=1      CMC - complements the value of CF

CLD - AF=0      STD - AF = 1

CLI - IF=0      STI - IF = 1      => mu merge μ 32 bits programming

**Flag** = indicator represented on one bit. A configuration of the flag briefing for every instruction's execution

**Flag Register** = 16 bits, but only 9 are used

## II Operations performed by ALU

ALU = arithmetic logic unit

- ① Arithmetic operators (\*, :)
- ② bit-shifting operators
- ③ logical operations (AND, OR, XOR, NOT)

→ it sets the flags

$$\begin{array}{r} C^f=1 & 1001\ 0011 + \\ & \swarrow 0111\ 0011 \\ & \textcircled{1} \underline{\underline{0}}0000110 \end{array}$$

$SF=0$

mibble = semioctet

- A representation in base 2 has always two possible interpretation in base 10

rechte : signed [-128, 127]  
unsigned [0, 127]

### III Signed and unsigned representation

unsigned : can represent only positive numbers

the unsigned repres. of a pos. number =  
the number in base 2

signed : can represent positive and negative

14 on 8 bits: 0001 0001

-14 on 8 bits: 1110 1111

1 0000 0000 -

0001 0001

1110 1111

# Interpretation

unsigned :

$$\begin{aligned} 1110\ 1111 &= 1 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 0 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 \\ &= 239 \end{aligned}$$

signed :

① 1110 1111 → we keep this and inverse the rest

↓ it is a negative number

$$\begin{aligned} ② 1110\ 1111 &= 0001\ 0001 = 1 \cdot 2^4 + 1 \cdot 2^0 = 17 \\ &\Rightarrow -17 \end{aligned}$$

## 2's complement

"we interpret representations and represent interpretations"

$$\begin{aligned} 8 \text{ bits} \quad 2^8 &= 256 \Rightarrow \{\text{0, } 255\} \text{ or } \{-128, 127\} \\ 16 \text{ bits} \quad 2^{16} &= 65536 \Rightarrow \{\text{0, } 65535\} \text{ or } \{-32768, +32767\} \end{aligned}$$

$$\textcircled{1} \quad 1001\ 0011 \Rightarrow 0110\ 1101 = 109 = -109$$

$93_{10} = 1001\ 0011$

$$0110\ 1101 = 109 = -109$$

A number that starts with 0 in base 2 has the same value in base 10 in both signed and unsigned

The minimum of bits to represent

$O_m$  "bits" we may represent  
 $2^m$  values : unsigned values  $\{0 \dots 2^m - 1\}$   
: signed values  $\{-2^{(m-1)}, 2^{(m-1)} - 1\}$

on 2 bits  $\Rightarrow \{0, 3\} \quad \{-2, +1\}$  } we can represent  
3 on 2 bits,  
but not -3  
 $\Rightarrow -3$  can be represented on 3 bits

## IV Overflow concept

Overflow = situation / condition which expresses the fact that the result of the last operation performed didn't fit the reserved space for it or does not belong to the admissible representation interval for that size or that operation is a mathematical nonsense in that particular interpretation (signed or unsigned)

$$\begin{array}{r} 1001\ 0011 + \\ \underline{1011\ 0011} \\ \hline \textcolor{red}{1} 0100\ 0110 \end{array}$$

$$\begin{array}{r} 147 + \\ \underline{179} \\ \hline \textcolor{blue}{326} \end{array}$$

$$\begin{array}{r} 93h + \\ \underline{B3h} \\ 1\ 46h \end{array}$$

$$\begin{array}{r} -109 + \\ \underline{-77} \\ \hline \textcolor{blue}{-186} \end{array} !!!$$

} both OF = 1 CF = 1

OF - adding two positive numbers and getting a negative result  
- adding two negative numbers and

getting a positive result  
- when an operation exceeds the signed number range

1001 0011 +

0111 0011  
0000 0110

1001 0011 =  $1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^0 = 15$  ✓  
⇒

0110 1101 =  $1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^0 = -109$

0111 0011 =  $1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^2 + 1 \cdot 2^0 = 115$

⇒ signed = unsigned 115

CF is set to 1

OF is set to 0 (0000 0110 = 6 which

fits on [-128, 127]

0110 0010 - (98)

1100 1000  
1001 1010 (200)

$$\begin{array}{rcl} 1001 \ 1010 & = & 154 \quad \text{unsigned} \\ & = & -102 \quad \text{signed} \end{array}$$

$$\begin{array}{rcl} 0110 \ 0010 & = & 98 \quad \text{unsigned} \\ & = & 98 \quad \text{signed} \end{array}$$

$$\begin{array}{rcl} 1100 \ 1000 & = & 200 \quad \text{unsigned} \\ & = & -56 \quad \text{signed} \end{array}$$

unsigned

$$\begin{array}{r} 98 \\ - \\ 200 \\ \hline -102 \end{array} \Rightarrow \text{not OK} \Rightarrow CF=1$$

it is diff. from 154

signed

$$\begin{array}{r} 98 \\ - \\ (-56) \\ \hline 154 \end{array} \Rightarrow \text{not OK} \Rightarrow OF=1$$

it is diff. from -102

The multiplication operand does not produce overflow  $\Rightarrow$  the space reserved is enough for both interpretations

Anyway,  $b * b = b$  }  
 $w * w = w$  }  
 $d * d = d$  }

$CF = OF = 0$

$b * b = w$  }  
 $w * w = d$  }  
 $d * d = \text{gxxord}$  }

$CF = OF = 1$

| the worst effect is in the case of  
• **division** : if the quotient (catul) does not fit in the reserved space  $w * 1/b = b$ ,  
 $w = dd / w$ ,  $gw / dd = dd$ )  $\Rightarrow$  division overflow will signal „Run-time error“

Why do we need CF and OF simulation?

=> when performing add / sub operation in base 2, in fact 2 operations are actually performed simultaneously in base 10 : one in signed version and the other in unsigned

BUT

- unlike add and sub , which work the same in base 2 (regardless of the signed / unsigned) , multiplication and division work differently in the signed and unsigned case

Multiplication and division do not work the same binary in the 2 interpretations => this is why we need to specify before

the operation how we want the  
operands to be interpreted

=> mul / div (unsigned)

imul / idiv (signed)

also, we have ADC (add with  
carry) that is used when we  
want to add things bigger  
than 8 bits

• the same for SBB (subtract  
with borrow)

## V Address registers

## Address computation

- Address of a memory location - no. of consecutive bytes from the beginning of the RAM memory and the beginning of that memory location
- segment - a logical section of a program's memory (featured by its basic address (beginning), by its limit (size) and by its type)

Segment - block of memory (physical segment)  
- variable-sized block of memory (logical segment) occupied by a program's code or data

offset = the number of bytes between the beginning of that segment and that particular memory location

segment selector = numeric value of 16 bits which selects uniquely the accessed segment and has features



$$a_7a_6a_5a_4a_3a_2a_1a_0 := b_7b_6b_5b_4b_3b_2b_1b_0 + 0706050403020100$$

computed address in hexa      base address      address specification

An address specification is also named  
**FAR address**

The x86 processors have a memory access control mechanism called **paging**, which is independent of address segmentation.

**Paging** implies dividing the virtual memory into pages, which are associated (translated) to the physical memory

The x86 architecture allows 4 types of segments:



the needed information for identifying  
the current active segment selector

$$a[\gamma] = *(\overset{\text{pointed}}{a + \gamma}) \text{ arithmetic}$$

↓  
dereferencing  
operator

{ FS, GS = extended, the appeared  
on the 32 bits programming  
EIP = extension of IP (instruction  
pointer)

CS:EIP → the address of the current  
executing instruction

• size of an instruction = 1-15 bits

At any given moment during run  
time there is only one **active**

segment of any type

## Far addresses and Near addresses

- to address a RAM memory location two values are needed
  - 1) to indicate the segment
  - 2) to indicate the offset inside the segment

**Near address** = an address for which only the offset is specified (the segment address being implicitly taken from a segment register)

! always inside one of the 4 active segments

**Far address** = an address for which the programmer explicitly specifies a

segment selector and it can be specified in one of the following 3 ways - offset specification :  $S_3, S_2, S_1, S_0$

- segment register : CS, SS, ES  
FS or GS

- FAR [variable], where variable is of type farword and contains the 6 bytes representing the FAR address

! There are 3 ways of expressing an operand :

i) register mode : mov  $\text{ECX}, 17$   
 $\hookrightarrow$  specified in register mode

ii) immediate mode : mov ECX, 17  
giving you right now  
the constant

iii) memory addressing mode: `mov eax, [a+7]`  
(every variable is in memory, but not  
everything from the memory is a variable)

### OFFSET ADDRESS

$[\text{base}] + [\text{index} * \text{scale}] + [\text{constant}]$



the content of one of the registers EAX, EBX, ECX, EDX, EBP, ESI, EDI or ESP as **base**;  
the content of one of the registers EAX, EBX, ECX, EDX, EBP, ESI or EDI as **index**;  
scale to multiply the value of the index register with 1, 2, 4 or 8;  
the value of a numeric constant, on a byte, word or on a doubleword.

`[ ]` - optionality (not all 3 of them  
are needed)

## Modus to address the memory

- direct addressing -> only the constant is present
- base addressing -> if there is one of the base registers
- scale-indexed addressing -> if there is one of the index registers present

indirect addressing  $\Leftrightarrow$  we have at least one register between [ ]

jmp instructions  $\Rightarrow$  relative addressing

indicates the position of the next instruction to be run relative to the current position

## VI Pointers arithmetic

= the set of arithmetic operations allowed to be performed with pointers , this meaning using arithmetic expressions which have addresses as operands

Only 3 operations

① subtracting two addresses

address - address = OK ( $g - p$  = substr of 2 pointers)  $\Rightarrow$  scalar value

② adding a numerical value to a pointer  $\Rightarrow$  a pointer

address + numerical constant (identification of an element by indexing -  $a[7]$ )  $g + g$

③ subtracting a numerical constant from a pointer  $\Rightarrow$  a pointer

address - numerical constant

var - this is an address

[var] - is its contents

L-value vs R-value. LHS vs RHS of an assignment



mov eax, [esp \* 5] - syntax error  
↳ cannot be index

mov eax, ebx OK!

mov eax, [ebx \* 3] OK!

mov eax, [ebx + ebx \* 3]

`mov eax, [ebp + 4]` - syntax errors

## VII Assembly language basics

an assembler  $\Rightarrow$  generates bytes

an assembled works with:

labels = user-defined names for pointing  
to data or memory areas

instructions = mnemonics which suggests  
the underlying action

directives = indications given to the  
assembler for correctly generating  
the corresponding bytes

\$ location counter - the size of an array  
= an offset

\$ - \$\$ = the distance from the beginning of the  
segment or a scalar = current size of section

\$ = an offset = address = pointer type  
address of here

$\$ \$$  = an offset = address = pointer type  
address of start of current section

## Source line format

[label [:]] [prefixes] [mnemonic] [operands] ; [comments]

ex: jmp here ; label + mnemonic + operand + comment  
repz cmpsd; prefix + mnemonic + comment

## Categories of labels

- 1) **code labels** => at the level of instructions sequence for defining the destinations of the control transfers during program execution
- 2) **data labels** => symbolic identification for some memory location

[v] => the value of the variable  
v => the address of the variable

Ex: `mov EAX, et`  $\Rightarrow$  the address (offset)  
`mov EAX, [et]`  $\Rightarrow$  the content from the  
address et  
`lea eax, [v]`  $\Rightarrow$  the address (offset)

**mnemonics**  $\leftarrow$  instruction names  
                  directives names

- instruction - guides the processor
- directives - guides the assembler

## Expressions

expression - operands + operators

## Operands specification modes

- 1) immediate operands  $\rightarrow$  the value is computed at assembly time
- 2) memory operands  $\rightarrow$  the value is computed at loading time

3) register operands  $\rightarrow$  the value is computed at run-time

① immediate operands = constant numeric  
- binary, octal, decimal, hexa values

② register operands

- direct using `mov eax, ebx`
- indirect usage and addressing `mov eax,[ebx]`

③ memory addressing operands

direct addressing operand = constant or symbol representing the address (labels, procedures names, values of the loc counter)

- its offset is computed at assembly time
- the effective address always refers to a segment register
  - CS - for code labels
  - SS - when using EBP or ESP as base

DS - for the rest of data accesses

indirect addressing operands  $\rightarrow$  use

registers for pointing to memory address

[base\_register + index\_register \* scale + constant]

example for DS, SS

mov eax, [ebx] DS: [ebx]

mov eax, [ebp] SS: [ebp]

mov eax, [ebp $\leftarrow 2\right] \Leftrightarrow$  mov eax, [ebp + ebp] SS

mov eax, [ebp $\times 3\right] \Leftrightarrow$  mov eax, [ebp + ebp $\times 3\right] SS$

mov eax, [ebp $\leftarrow 4\right] DS$  (ebp nu mai e baza)

mov eax, [ebx + esp] } SS

mov eax, [esp + ebx] }

mov eax, [ebx + esp $\times 2\right] DS$   
  ↳ base

mov eax, [ebx + ebp] DS

mov eax, [ebp + ebx] SS

## Using operators

operators = used for combining, comparing, modifying and analyzing the operands

{ + - performs addition at assembly time  
ADD - performs addition during run-time

## Bit-shifting operators

expression  $\gg$  how many

expression  $\ll$  how many

mov ax, 01110111b  $\ll$  3; AX = 00000011 10111000b

add bh, 01110111b  $\gg$  3; the source operator is 00001110b

## Bitwise operators

| OPERATOR | SYNTAX               | MEANING         |
|----------|----------------------|-----------------|
| $\sim$   | $\sim$ expresie      | Bits complement |
| &        | expr1 & expr2        | Bitwise AND     |
|          | expr1   expr2        | Bitwise OR      |
| $\wedge$ | expr1 $\wedge$ expr2 | Bitwise XOR     |

Examples (we assume that the expression is represented on a byte):

|                                                                                                 |                    |           |
|-------------------------------------------------------------------------------------------------|--------------------|-----------|
| $\sim 11110000b$                                                                                | ; output result is | 00001111b |
| $01010101b \& 11110000b$                                                                        | ; output result is | 01010000b |
| $01010101b   11110000b$                                                                         | ; output result is | 11110101b |
| $01010101b ^ 11110000b$                                                                         | ; output result is | 10100101b |
| ! - logical negation (similar with C) ; $!0 = 1$ ; $!(\text{anything different from zero}) = 0$ |                    |           |

$\&$  - and       $1 - 0 \rightarrow 0$   
 $0 - 1 \rightarrow 0$   
 $1 - 1 \rightarrow 1$   
 $0 - 0 \rightarrow 0$

$\sim$  not       $0 \rightarrow 1$   
 $1 \rightarrow 0$

$|$  or       $1 - 0 \rightarrow$   
 $0 - 1 \rightarrow 1$   
 $1 - 1 \rightarrow 1$   
 $0 - 0 \rightarrow 0$

$\wedge$  xor       $1 - 0 \rightarrow 1$   
 $0 - 1 \rightarrow 1$   
 $1 - 1 \rightarrow 0$   
 $0 - 0 \rightarrow 0$

xor ax, ax  $\Rightarrow$  ax = 0

## The segment specification operator

(:) perform the far address computation  
of a variable or label

segment : expression

ex:    [ss:bx+4] offset relative to ss  
      [es:082h] offset relative to es

## Type operators

• they specify the type of some expressions  
or operands

byte[A]

dw/dword[A]

byte / word / dword / gword specifiers have

always only the task of clarifying an  
ambiguity

ex: mov [v], 0  $\Rightarrow$  syntax error  
- operation size not specified

examples where a data size specifier is needed

mov [mem], 12

(i) div [mem]

(i) mul [mem]

push [mem]

pop [mem]

## Directives

- 1) the segment directive = allows targeting the bytes of code or data
- the numeric value assigned to the segment name is the segment address (32 bits)

- the optional arguments give to the linker-editor and the assembler the necessary info. regarding the way in which segments must be loaded and combined in memory

the [type] allows selecting the usage mode of the segment

- code (or text) - the segment will contain code
- data (or bss) - data segment
- rdata - def. of constant data

### Data definition derivatives

data definition = declaration + allocation

# VIII

## Assembly language instructions

|                           |                                                                                                                                                         |
|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>MOV</b> <i>d,s</i>     | <i>&lt;d&gt; &lt;-&gt; &lt;s&gt;</i><br>(b-b, w-w, d-d)                                                                                                 |
| <b>PUSH</b> <i>s</i>      | $\text{ESP} = \text{ESP} - 4$ and transfers („pushes”) <i>&lt;s&gt;</i> in the stack (s – doubleword)                                                   |
| <b>POP</b> <i>d</i>       | Eliminates („pops”) the current element from the top of the stack and transfers it to <i>d</i> ( <i>d</i> – doubleword) ; $\text{ESP} = \text{ESP} + 4$ |
| <b>XCHG</b> <i>d,s</i>    | <i>&lt;d&gt; <math>\leftrightarrow</math> &lt;s&gt;</i> ; <i>s,d</i> – have to be L-values !!!                                                          |
| <b>[reg_segment] XLAT</b> | $\text{AL} \leftarrow \text{DS:[EBX+AL]}$ or $\text{AL} \leftarrow \text{segment:[EBX+AL]}$                                                             |
| <b>CMOVcc d, s</b>        | <i>&lt;d&gt; <math>\leftarrow</math> &lt;s&gt;</i> if cc (conditional move) is true                                                                     |
| <b>PUSHA / PUSHAD</b>     | Pushes in the stack EAX, ECX, EDX, EBX, ESP, EBP, ESI and EDI                                                                                           |
| <b>POPA / POPAD</b>       | Pops EDI, ESI, EBP, ESP, EBX, EDX, ECX and EAX from stack                                                                                               |
| <b>PUSHF</b>              | Pushes EFlags in the stack                                                                                                                              |
| <b>POPF</b>               | Pops the top of the stack and transfers it to Eflags                                                                                                    |
| <b>SETcc d</b>            | <i>&lt;d&gt; <math>\leftarrow</math> 1</i> if cc is true, otherwise <i>&lt;d&gt; <math>\leftarrow</math> 0</i> (byte set on condition code)             |

push s  
pop d

} *s,d* must be **doublewords**

push eax  $\Leftrightarrow$  sub esp, 4 ; prepare (allocate) space in order to store the value  
mov [esp], eax ; store the value in the allocated space

pop eax  $\Leftrightarrow$  mov eax, [esp] ; load in eax the value from the top of the stack  
add esp, 4 ; clear the location

**XCHG** op<sup>1</sup>, op<sup>2</sup>  $\Rightarrow$  interchange variables

[reg-segment] **XLAT**

- data in ebx original type pure in al ebx+al

ex: `lbox a b c d e f }` =>  
`al 2`

=> im al o nā purnā codul ascii  
al lui c

**LEA** gen-rg, content of mem-of

`lea eax, [v]` => loads the offset  
of v

(=> `mov eax, v`)

**pushf** transfers all the flags on  
top of the stack

**popf**

## Type conversion instructions

|                   |                                                                                                                     |
|-------------------|---------------------------------------------------------------------------------------------------------------------|
| <b>CBW</b>        | converts the byte from AL to the word in AX (sign extension)                                                        |
| <b>CWD</b>        | converts the word from AX to the doubleword in DX:AX (sign extension)                                               |
| <b>CWDE</b>       | converts the word from AX to the doubleword in EAX (sign extension)                                                 |
| <b>CDQ</b>        | converts the doubleword from EAX to the quadword in EDX:EAX (sign extension)                                        |
| <b>MOVZX d, s</b> | loads in d (REGISTER !), which must be of size larger than s (reg/mem), the UNSIGNED contents of s (zero extension) |
| <b>MOVSX d, s</b> | load in d (REGISTER !), which must be of size larger than s (reg/mem), the SIGNED contents of s (sign extension)    |

[http://www.c-jump.com/CIS77/ASM/DataTypes/T77\\_0270\\_sext\\_example\\_movsx.htm](http://www.c-jump.com/CIS77/ASM/DataTypes/T77_0270_sext_example_movsx.htm) !!!!!!!!

# Shifts and rotates

Bit shifting instructions can be classified in the following way:

- Logic shifting instructions
  - left - **SHL**
  - right - **SHR**

- Arithmetic shifting instructions
  - left - **SAL**
  - right - **SAR**

Bit rotating instructions can be classified in the following way:

- Rotating instructions without carry
  - left - **ROL**
  - right - **ROR**

- Rotating instructions with carry
  - left - **RCL**
  - right - **RCR**

For giving a suggestive definition for shifts and rotates let's consider as an initial configuration one byte having the value  $X = abcdefgh$ , where a-h are binary digits, h is the least significant bit, bit 0, a is the most significant one, bit 7, and k is the actual value from CF (CF=k). We then have:

SHL X,1 ;has the effect  $X = bcdefgh0$  and CF = a  
SHR X,1 ;has the effect  $X = 0abcdefg$  and CF = h  
SAL X,1 ; identically to SHL  
SAR X,1 ;has the effect  $X = aabcdefg$  and CF = h  
ROL X,1 ;has the effect  $X = bcdefgha$  and CF = a  
ROR X,1 ;has the effect  $X = habcdefg$  and CF = h  
RCL X,1 ;has the effect  $X = bcdefghk$  and CF = a  
RCR X,1 ;has the effect  $X = kabcdefg$  and CF = h

# Branching, jumps , loops

- jmp operand - unconditional jump
- call operand - transfer control to the procedure identified by operand
- ret [m] - transfer control to the first instruction after CALL

**jmp**      **oprand**  
                label / register / mem. address

```
mov ax,1  
jmp AdunaDoi  
AdunaUnu: inc ax  
            jmp urmare  
AdunaDoi: add ax,2  
urmare: . . .
```

## Conditional jump instructions

cmp d, A - fictious subst d-1  
test d, A - d AND

$OF = 0$   
 $CF = 0$

$OF, SF, ZF, AF, PF$  and  
 $CF$

$SF, ZF, PF$  - modified

## Repetitive instructions

LOOP → repetitive run of instruction block, as long as the value of the

CX register is different from 0

## CALL and RET instructions

a CALL instruction may be

- a procedure name
- the name of a register containing an address
- a memory address

| MNEMONICA         | SEMNIFICATIE (salt dacă..<<relație>>)                             | Condiția verificată |
|-------------------|-------------------------------------------------------------------|---------------------|
| JB<br>JNAE<br>JC  | este inferior<br>nu este superior sau egal<br>există transport    | CF=1                |
| JAE<br>JNB<br>JNC | este superior sau egal<br>nu este inferior<br>nu există transport | CF=0                |
| JBE<br>JNA        | este inferior sau egal<br>nu este superior                        | CF=1 sau ZF=1       |
| JA<br>JNBE        | este superior<br>nu este inferior sau egal                        | CF=0 și ZF=0        |
| JE<br>JZ          | este egal<br>este zero                                            | ZF=1                |
| JNE<br>JNZ        | nu este egal<br>nu este zero                                      | ZF=0                |
| JL<br>JNGE        | este mai mic decât<br>nu este mai mare sau egal                   | SF≠OF               |
| JGE<br>JNL        | este mai mare sau egal<br>nu este mai mic decât                   | SF=OF               |
| JLE<br>JNG        | este mai mic sau egal<br>nu este mai mare decât                   | ZF=1 sau SF≠OF      |
| JG<br>JNLE        | este mai mare decât<br>nu este mai mic sau egal                   | ZF=0 și SF=OF       |
| JP<br>JPE         | are paritate<br>paritatea este pară                               | PF=1                |
| JNP<br>JPO        | nu are paritate<br>paritatea este impară                          | PF=0                |
| JS                | are semn negativ                                                  | SF=1                |
| JNS               | nu are semn negativ                                               | SF=0                |
| JO                | există depășire                                                   | OF=1                |
| JNO               | nu există depășire                                                | OF=0                |

Tabelul 4.1. Instrucțiunile de salt condiționat

## String contents

a6 dd '123', '345', 'abcd' ; 3 doublewords are defined, their contents being  
31 32 33 00|33 34 35 00|61 62 63 64|

a6 dd '1234' ; 31 32 33 34

coduri in ascii

a6 dd '12345' ; 31 32 33 34|35 00 00 00|

a7 dd '2345'

; |32 33 34 35|

a8 dd 12345678h ; |78 56 34 12|

-> No little endian

=> little endian

## Conversions classification

a). Destructive – cbw, cwd, cwde, cdq, movzx, movsx, mov ah,0; mov dx,0; mov edx,0

Non-destructive – Type operators: byte, word, dword, qword

b). Signed - cbw, cwd, cwde, cdq, movsx

Unsigned – movzx, mov ah,0; mov dx,0; mov edx,0, byte, word, dword, qword

c). by enlargement – all the destructive ones ! + word, dword, qword

by narrowing – byte, word, dword

d). implicit vs explicit conversions

e = a+b+c e,b = float , a,c – integer

i=c //only in C NOT in C++ !

## IX Machine instructions representation

- a x86 machine instruction = represents a sequence of 1 to 15 bytes
  - = has maximum 2 operands (source / destination operands)
    - ↓
      - from these 2 operands, only one may be stored in the RAM memory. The other one must be either one EU register, either an integer constant

## internal format of an instruction



- instruction
  - address size
  - operand size
  - segment override
- specify for  
 some instructions  
 the nature and  
 the storage of  
 operands (register or  
 memory)
- displacement  
 + immediate  
 = constant

## Instruction prefix

- string manipulation instruction prefixes  
 (explicitly provided by programmers)

**F3h = REP, REPE**

**F2h = REPNE**

where

- **REP** repeats instruction the number of times specified by *iteration count ECX*.
  - **REPE** and **REPNE** prefixes allow to terminate loop on the value of **ZF** CPU flag.
- 0xF3 is called REP when used with MOVS/LODS/STOS/INS/OUTS  
 (instructions which don't affect flags)
- 0xF3 is called REPE or REPZ when used with CMPS/SCAS
- 0xF2 is called REPNE or REPNZ when used with CMPS/SCAS, and is not documented for other instructions.
- Intel's insn reference manual REP entry only documents F3 REP for MOVS, not the F2 prefix.

Related string manipulation instructions are:

- **MOVS**, move string ; **STOS**, store string
- **SCAS**, scan string ; **CMPS**, compare string, etc.

- **segment override prefix**  $\Rightarrow$  memory access to use specified segment instead of default segment (provided by the programmers)

**2Eh = CS**

**36h = SS**

**3Eh = DS**

**26h = ES**

**64h = FS**

**65h = GS**

#### Definition

Instruction prefixes are assembly language constructs that appear optionally in the composition of a source line (explicit prefixes) or in the internal format of an instruction (prefixes generated implicitly by the assembler in two cases) and that modify the standard behavior of those instructions (in the case of explicit prefixes) or which signals the processor to change the default representation size of operands and/or addresses, sizes established by assembly directives (BITS 16 su BITS 32).

- main task of an assembler - generating bytes

- at any given moment only one segment of every type may be active
- in 32 bits programming  $\Rightarrow$  the segment registers CS, DS, SS, ES contain the values of the selectors of the currently active segments
- CS : EIP  $\rightarrow$  contain the address of the currently executed instruction
- an assembly language instruction doesn't allow both of its explicit operands to be from the memory

[prefixes] + code + [ModeR/M] + [SIB] + [displacement] + [immediate]

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|

3 - can express register-type operands and / or indirect addressing memory-type operands in which only {base} appears

if  $[index * scale]$  also appears  $\Rightarrow$  SIB 14  
is also needed

3+4  $\Rightarrow$  base + index \* scale

5+6  $\Rightarrow$  constant

if "3" tells us that we have a register operand  $\Rightarrow$  4,5,6 are absent

5  $\rightarrow$  expresses the direct addressing memory access

6  $\rightarrow$  numerical constants

- direct addressing = direct access to the memory operand based on its offset, without mudding / specifying any registers in the offset specification formula

CS:EIP – The FAR (complete, full) address of the currently executing instruction

EIP – automatically incremented by the current execution

CS – contains the segment selector of the currently active segment and it can be changed only if the execution will switch to another segment

Mov cs, [var] – forbidden - syntactic ok (illegal instruction in OllyDbg...)

Mov eip, eax - syntax error – symbol 'eip' undefined

Jmp FAR somewhere ; CS and EIP will be both modified !

Jmp start1 ; NEAR jmp – only the offset will be modified, so only EIP !

## X Multi-module programming

### Volatile vs. non-volatile resources

- **volatile** resources are represented by those registers that **the calling convention is defining them as belonging to the called subroutine**, thus, the caller being responsible **as part of the call code** to save their values (if the called subroutine is using them) and after that, at the end of the call to restore the initial (old) values. So: who is saving the volatile resources ? **The caller** (as part of the call code) . Who is restoring in the end those values ? Also **the caller but NOT as part of a certain call/entry or exit code**. Just restore them after the call in the regular code as a mandatory responsibility.
- **non-volatile** resources are any memory addresses or registers which do not belong explicitly to the called subroutine, but if this one needs to modify those resources, it is necessary that the called subroutine to save them at the entry as part of the entry code and restore them back at exit, as part of the exit code. So: who is saving the non-volatile resources ? **The callee** (apelatul = the **called** subroutine, as part of the entry code) . Who is restoring in the end these values ? Also **the callee** (as part of the exit code).

### Call code (THE CALLER):

- a). Saving the volatile resources (EAX, ECX, EDX, EFLAGS)
- b). Passing parameters
- c). Saving the returning address and performing the call

### Entry code (THE CALLEE – called subroutine):

- a). Building the new stackframe                   PUSH EBP,  
                                                         MOV EBP, ESP
- b). Allocating space for local variables   SUB ESP, nr\_bytes
- c). Saving non-volatile resources exposed to be modified

### Exit code (THE CALLEE):

- a). Restoring non-volatile resources
- b). Freeing the space allocated for local variables [ADD ESP, nr\_bytes\_locals] – mentioned here just as a reverse for the above b) from the entry code, but not really necessary because deallocating the stackframe (mov esp, ebp) includes this action anyway from a practically point of view.
- c). Deallocating the stackframe   MOV ESP, EBP (if we know exactly the size of the stackframe , ADD ESP, sizeof(stackframe) solves similarly...) and restoring the base of the   POP EBP caller stackframe (old EBP)           (a, b c – the reverse of the entry code)
- d). Returning from the subroutine (RET) and deallocating passed parameters (if we have a STDCALL function)   -   (reverse of b + c from the call code)

It is still to be done the reverse of a) from call code. It is the task of the CALLER to do it together with a possible parameters take out from the stack (if it is a CDECL function).

