

## End-Semester Examination

Allotted time: 60 minutes

Marks: 32

### Instructions:

- There are four questions with sub-parts, printed over two sides of a sheet. Please check if your sheet has all the questions. Marks will be normalized by a factor of 10/32.
- Discussions amongst the students are not allowed. Any dishonesty shall be penalized heavily.
- Be clear in your arguments. Vague arguments shall not be given full credit.
- Please note that in class we discussed NASM x86 assembly and in assignments we discussed GNU x86 assembly. The main difference is in the order of operands. NASM uses destination first but GNU Assembly does not use destination as a first operand.

1. Consider a 5-bit floating-point representation based on the IEEE floating-point format, with one sign bit, two exponent bits ( $k = 2$ ), and two fraction bits ( $n = 2$ ).

(a) Compute the bias.

(b) What is the binary representation of  $-\frac{31}{8}$  and  $\frac{56}{32}$  in the above floating point representation.

[2+3+3 marks]

2. Explain the given GNU x86 assembly snippet.

```
push %ebp  
mov %esp,%ebp  
mov 0x8(%ebp),%eax  
# lea (%esi,%ebx,4), %edx corresponds to setting %edx to the address %esi+%ebx*4  
lea (%eax,%eax,4),%eax  
lea 0xffffffff(%eax,%eax,4),%eax  
mov %ebp,%esp  
pop %ebp  
ret
```

[8 marks]

3. The 5 stages of the processor have the following latencies:

|    | Fetch | Decode | Execute | Memory | Writeback |
|----|-------|--------|---------|--------|-----------|
| a. | 300ps | 400ps  | 350ps   | 550ps  | 100ps     |
| b. | 200ps | 150ps  | 100ps   | 190ps  | 140ps     |

Assume that when pipelining, each pipeline stage costs 20ps extra for the registers between pipeline stages.

- (a) Non-pipelined processor: what is the cycle time? What is the latency of an instruction? What is the throughput?
- (b) Fully Pipelined processor: What is the cycle time? What is the latency of an instruction? What is the throughput?
- (c) If you could split one of the pipeline stages into 2 equal halves, which one would you choose? What is the new cycle time? What is the new latency? What is the new throughput?
4. (a) For a direct-mapped cache design with a 32-bit address and byte-addressable memory, the following bits of the address are used to access the cache:

|    | Tag   | Index | Offset |
|----|-------|-------|--------|
| a. | 31-10 | 9-5   | 4-0    |
| b. | 31-12 | 11-6  | 5-0    |

For each configuration (a and b):

- i. What is the cache block size (in words)?
- ii. How many entries (blocks) does the cache have?

- (b) You have a 2-way set associative L1 cache that is 8KB, with 4-word cache lines. Writing and reading data to L2 takes 10 cycles. You get the following sequence of writes to the cache – each is a 32-bit address in hexadecimal:

0x1000

0x1004

0x1010

0x11c0

0x2000

0x21c0

0x3400

0x3404

0x3f00

0x2004

0x1004

- i. How many cache misses occur with an LRU policy?
- ii. How many cache misses occur with a FIFO policy?
- iii. Would the miss-rate increase or decrease if the cache was the same size, but direct-mapped (in above case)? Explain.

[2+3+3 marks]

[1.5+1.5 marks]

[2+1+2 marks]

## x86-64 Reference Sheet (GNU assembler format)

### Instructions

#### Data movement

`movq Src, Dest` Dest = Src  
`movsbq Src, Dest` Dest (quad) = Src (byte), sign-extend  
`movzbq Src, Dest` Dest (quad) = Src (byte), zero-extend

#### Conditional move

`cmove Src, Dest` Equal / zero  
`cmovev Src, Dest` Not equal / not zero  
`cmove Src, Dest` Negative  
`cmovea Src, Dest` Nonnegative  
`cmoveg Src, Dest` Greater (signed >)  
`cmovege Src, Dest` Greater or equal (signed ≥)  
`cmovel Src, Dest` Less (signed <)  
`cmovele Src, Dest` Less or equal (signed ≤)  
`cmovea Src, Dest` Above (unsigned >)  
`cmoveae Src, Dest` Above or equal (unsigned ≥)  
`cmoveb Src, Dest` Below (unsigned <)  
`cmovebe Src, Dest` Below or equal (unsigned ≤)

#### Control transfer

`cmpq Src2, Src1` Sets CCs Src1 Src2  
`testq Src2, Src1` Sets CCs Src1 & Src2  
`jmp label` jump  
`je label` jump equal  
`jne label` jump not equal  
`js label` jump negative  
`jns label` jump non-negative  
`jg label` jump greater (signed >)  
`jge label` jump greater or equal (signed ≥)  
`jl label` jump less (signed <)  
`jle label` jump less or equal (signed ≤)  
`ja label` jump above (unsigned >)  
`jb label` jump below (unsigned <)  
`pushq Src`  $\%rsp = \%rsp - 8$ ,  $Mem[\%rsp] = Src$   
`popq Dest` Dest =  $Mem[\%rsp]$ ,  $\%rsp = \%rsp + 8$   
`call label` push address of next instruction, `jmp label`  
`ret`  $\%rip = Mem[\%rsp]$ ,  $\%rsp = \%rsp + 8$

### Arithmetic operations

`leaq Src, Dest` Dest = address of Src  
`incq Dest` Dest = Dest + 1  
`decq Dest` Dest = Dest - 1  
`addq Src, Dest` Dest = Dest + Src  
`subq Src, Dest` Dest = Dest - Src  
`imulq Src, Dest` Dest = Dest \* Src  
`xorq Src, Dest` Dest = Dest ^ Src  
`orq Src, Dest` Dest = Dest | Src  
`andq Src, Dest` Dest = Dest & Src  
`negq Dest` Dest = - Dest  
`notq Dest` Dest = ~ Dest  
`salq k, Dest` Dest = Dest ≪ k  
`sarq k, Dest` Dest = Dest ≫ k (arithmetic)  
`shrq k, Dest` Dest = Dest ≫ k (logical)

### Addressing modes

- Immediate  
 $\$val Val$   
 $val$ : constant integer value  
`movq $7, %rax`
- Normal  
 $(R) Mem[Reg[R]]$   
 $R$ : register R specifies memory address  
`movq (%rcx), %rax`
- Displacement  
 $D(R) Mem[Reg[R]+D]$   
 $R$ : register specifies start of memory region  
 $D$ : constant displacement D specifies offset  
`movq 8(%rdi), %rdx`
- Indexed  
 $D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+D]$   
 $D$ : constant displacement 1, 2, or 4 bytes  
 $Rb$ : base register: any of 8 integer registers  
 $Ri$ : index register: any, except %esp  
 $S$ : scale: 1, 2, 4, or 8  
`movq 0x100(%rcx,%rax,4), %rdx`

### Instruction suffixes

**b** byte  
**w** word (2 bytes)  
**l** long (4 bytes)  
**q** quad (8 bytes)

### Condition codes

**CF** Carry Flag  
**ZF** Zero Flag  
**SF** Sign Flag  
**OF** Overflow Flag

### Integer registers

|                   |                  |
|-------------------|------------------|
| <code>%rax</code> | Return value     |
| <code>%rbx</code> | Callee saved     |
| <code>%rcx</code> | 4th argument     |
| <code>%rdx</code> | 3rd argument     |
| <code>%rsi</code> | 2nd argument     |
| <code>%rdi</code> | 1st argument     |
| <code>%rbp</code> | Callee saved     |
| <code>%rsp</code> | Stack pointer    |
| <code>%r8</code>  | 5th argument     |
| <code>%r9</code>  | 6th argument     |
| <code>%r10</code> | Scratch register |
| <code>%r11</code> | Scratch register |
| <code>%r12</code> | Callee saved     |
| <code>%r13</code> | Callee saved     |
| <code>%r14</code> | Callee saved     |
| <code>%r15</code> | Callee saved     |

# CS107 x86-64 Reference Sheet

## Common instructions

|                        |                                                                                               |
|------------------------|-----------------------------------------------------------------------------------------------|
| <b>mov</b> src, dst    | dst = src                                                                                     |
| <b>movsbl</b> src, dst | byte to int, sign-extend                                                                      |
| <b>movzbl</b> src, dst | byte to int, zero-fill                                                                        |
| <b>cmove</b> src, reg  | reg = src when condition holds,<br>using same condition suffixes as jmp                       |
| <br>                   |                                                                                               |
| <b>lea</b> addr, dst   | dst = addr                                                                                    |
| <br>                   |                                                                                               |
| <b>add</b> src, dst    | dst += src                                                                                    |
| <b>sub</b> src, dst    | dst -= src                                                                                    |
| <b>imul</b> src, dst   | dst *= src                                                                                    |
| <b>neg</b> dst         | dst = -dst (arith inverse)                                                                    |
| <br>                   |                                                                                               |
| <b>imulq</b> S         | signed full multiply<br>R[%rdx]:R[%rax] <- S * R[%rax]                                        |
| <b>mulq</b> S          | unsigned full multiply<br>same effect as <b>imulq</b>                                         |
| <br>                   |                                                                                               |
| <b>idivq</b> S         | signed divide<br>R[%rdx] <- R[%rdx]:R[%rax] mod S<br>R[%rax] <- R[%rdx]:R[%rax] / S           |
| <b>divq</b> S          | unsigned divide - same effect as <b>idivq</b>                                                 |
| <b>cqto</b>            | R[%rdx]:R[%rax] <- SignExtend(R[%rax])                                                        |
| <br>                   |                                                                                               |
| <b>sal</b> count, dst  | dst <= count                                                                                  |
| <b>sar</b> count, dst  | dst >= count (arith shift)                                                                    |
| <b>shr</b> count, dst  | dst >= count (logical shift)                                                                  |
| <b>and</b> src, dst    | dst &= src                                                                                    |
| <b>or</b> src, dst     | dst  = src                                                                                    |
| <b>xor</b> src, dst    | dst ^= src                                                                                    |
| <b>not</b> dst         | dst = ~dst (bitwise inverse)                                                                  |
| <br>                   |                                                                                               |
| <b>cmp</b> a, b        | b-a, set flags                                                                                |
| <b>test</b> a, b       | a&b, set flags                                                                                |
| <br>                   |                                                                                               |
| <b>set</b> dst         | sets byte at dst to 1 when condition holds, 0 otherwise, using same condition suffixes as jmp |
| <br>                   |                                                                                               |
| <b>jmp</b> label       | jump to label (unconditional)                                                                 |
| <b>je</b> label        | jump equal ZF=1                                                                               |
| <b>jne</b> label       | jump not equal ZF=0                                                                           |
| <b>js</b> label        | jump negative SF=1                                                                            |
| <b>jns</b> label       | jump not negative SF=0                                                                        |
| <b>jg</b> label        | jump > (signed) ZF=0 and SF=OF                                                                |
| <b>jge</b> label       | jump >= (signed) SF=OF                                                                        |
| <b>jl</b> label        | jump < (signed) SF!=OF                                                                        |
| <b>jle</b> label       | jump <= (signed) ZF=1 or SF!=OF                                                               |
| <b>ja</b> label        | jump > (unsigned) CF=0 and ZF=0                                                               |
| <b>jae</b> label       | jump >= (unsigned) CF=0                                                                       |
| <b>jb</b> label        | jump < (unsigned) CF=1                                                                        |
| <b>jbe</b> label       | jump <= (unsigned) CF=1 or ZF=1                                                               |

|                 |                                            |
|-----------------|--------------------------------------------|
| <b>push</b> src | add to top of stack<br>Mem[-%rsp] = src    |
| <b>pop</b> dst  | remove top from stack<br>dst = Mem[%rsp++] |
| <b>call</b> fn  | push %rip, jmp to fn                       |
| <b>ret</b>      | pop %rip                                   |

## Condition codes/flags

|    |               |
|----|---------------|
| ZF | Zero flag     |
| SF | Sign flag     |
| CF | Carry flag    |
| OF | Overflow flag |

## Addressing modes

Example source operands to **mov**

### Immediate

**mov \$0x5, dst**

\$val  
source is constant value

### Register

**mov %rax, dst**

%R  
R is register  
source in %R register

### Direct

**mov 0x4033d0, dst**

0xaddr  
source read from Mem[0xaddr]

### Indirect

**mov (%rax), dst**

(%R)  
R is register  
source read from Mem[%R]

### Indirect displacement

**mov 8(%rax), dst**

D(%R)  
R is register  
D is displacement  
source read from Mem[%R + D]

### Indirect scaled-index

**mov 8(%rsp, %rcx, 4), dst**

D(%RB,%RI,S)  
RB is register for base  
RI is register for index (0 if empty)  
D is displacement (0 if empty)  
S is scale 1, 2, 4 or 8 (1 if empty)  
source read from:  
Mem[%RB + D + S\*%RI]

## CS107 x86-64 Reference Sheet

### Registers

|                         |                     |
|-------------------------|---------------------|
| %rip                    | Instruction pointer |
| %rsp                    | Stack pointer       |
| %rax                    | Return value        |
| %rdi                    | 1st argument        |
| %rsi                    | 2nd argument        |
| %rdx                    | 3rd argument        |
| %rcx                    | 4th argument        |
| %r8                     | 5th argument        |
| %r9                     | 6th argument        |
| %r10, %r11              | Callee-owned        |
| %rbx, %rbp,<br>%r12-%15 | Caller-owned        |

### Instruction suffixes

|   |                            |
|---|----------------------------|
| b | byte                       |
| w | word (2 bytes)             |
| l | long /doubleword (4 bytes) |
| q | quadword (8 bytes)         |

Suffix is elided when can be inferred from operands. e.g. operand %rax implies q, %eax implies l, and so on

### Register Names

| 64-bit register | 32-bit sub-register | 16-bit sub-register | 8-bit sub-register |
|-----------------|---------------------|---------------------|--------------------|
| %rax            | %eax                | %ax                 | %al                |
| %rbx            | %ebx                | %bx                 | %bl                |
| %rcx            | %ecx                | %cx                 | %cl                |
| %rdx            | %edx                | %dx                 | %dl                |
| %rsi            | %esi                | %si                 | %sil               |
| %rdi            | %edi                | %di                 | %dil               |
| %rbp            | %ebp                | %bp                 | %bp1               |
| %rsp            | %esp                | %sp                 | %sp1               |
| %r8             | %r8d                | %r8w                | %r8b               |
| %r9             | %r9d                | %r9w                | %r9b               |
| %r10            | %r10d               | %r10w               | %r10b              |
| %r11            | %r11d               | %r11w               | %r11b              |
| %r12            | %r12d               | %r12w               | %r12b              |
| %r13            | %r13d               | %r13w               | %r13b              |
| %r14            | %r14d               | %r14w               | %r14b              |
| %r15            | %r15d               | %r15w               | %r15b              |

## NASM Intel x86 Assembly Language Cheat Sheet

| Instruction                   | Effect                                                                                                                    | Examples                                               |
|-------------------------------|---------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|
| <b>Copying Data</b>           |                                                                                                                           |                                                        |
| <code>mov dest,src</code>     | Copy src to dest                                                                                                          | <code>mov eax,10</code><br><code>mov eax,[2000]</code> |
| <b>Arithmetic</b>             |                                                                                                                           |                                                        |
| <code>add dest,src</code>     | <code>dest = dest + src</code>                                                                                            | <code>add esi,10</code>                                |
| <code>sub dest,src</code>     | <code>dest = dest - src</code>                                                                                            | <code>sub eax, ebx</code>                              |
| <code>mul reg</code>          | <code>edx:eax = eax * reg</code>                                                                                          | <code>mul esi</code>                                   |
| <code>div reg</code>          | <code>edx = edx:eax mod reg</code><br><code>eax = edx:eax ÷ reg</code>                                                    | <code>div edi</code>                                   |
| <code>inc dest</code>         | Increment destination                                                                                                     | <code>inc eax</code>                                   |
| <code>dec dest</code>         | Decrement destination                                                                                                     | <code>dec word [0x1000]</code>                         |
| <b>Function Calls</b>         |                                                                                                                           |                                                        |
| <code>call label</code>       | Push eip, transfer control                                                                                                | <code>call format_disk</code>                          |
| <code>ret</code>              | Pop eip and return                                                                                                        | <code>ret</code>                                       |
| <code>push item</code>        | Push item (constant or register) to stack.<br>I.e.: <code>esp=esp-4; memory[esp] = item</code>                            | <code>push dword 32</code><br><code>push eax</code>    |
| <code>pop [reg]</code>        | Pop item from stack and store to register<br>I.e.: <code>reg=memory[esp]; esp=esp+4</code>                                | <code>pop eax</code>                                   |
| <b>Bitwise Operations</b>     |                                                                                                                           |                                                        |
| <code>and dest, src</code>    | <code>dest = src &amp; dest</code>                                                                                        | <code>and ebx, eax</code>                              |
| <code>or dest,src</code>      | <code>dest = src   dest</code>                                                                                            | <code>or eax,[0x2000]</code>                           |
| <code>xor dest, src</code>    | <code>dest = src ^ dest</code>                                                                                            | <code>xor ebx, 0xffffffff</code>                       |
| <code>shl dest,count</code>   | <code>dest = dest &lt;&lt; count</code>                                                                                   | <code>shl eax, 2</code>                                |
| <code>shr dest,count</code>   | <code>dest = dest &gt;&gt; count</code>                                                                                   | <code>shr dword [eax],4</code>                         |
| <b>Conditionals and Jumps</b> |                                                                                                                           |                                                        |
| <code>cmp b,a</code>          | Compare b to a; must immediately precede any of the conditional jump instructions                                         | <code>cmp eax,0</code>                                 |
| <code>je label</code>         | Jump to label if <code>b == a</code>                                                                                      | <code>je endloop</code>                                |
| <code>jne label</code>        | Jump to label if <code>b != a</code>                                                                                      | <code>jne loopstart</code>                             |
| <code>jg label</code>         | Jump to label if <code>b &gt; a</code>                                                                                    | <code>jg exit</code>                                   |
| <code>jge label</code>        | Jump to label if <code>b ≥ a</code>                                                                                       | <code>jge format_disk</code>                           |
| <code>jl label</code>         | Jump to label if <code>b &lt; a</code>                                                                                    | <code>jl error</code>                                  |
| <code>jle label</code>        | Jump to label if <code>b ≤ a</code>                                                                                       | <code>jle finish</code>                                |
| <code>test reg,imm</code>     | Bitwise compare of register and constant; should immediately precede the <code>jz</code> or <code>jnz</code> instructions | <code>test eax,0xffff</code>                           |
| <code>jz label</code>         | Jump to label if bits were <b>not</b> set ("zero")                                                                        | <code>jz looparound</code>                             |
| <code>jnz label</code>        | Jump to label if bits <b>were</b> set ("not zero")                                                                        | <code>jnz error</code>                                 |
| <code>jmp label</code>        | Unconditional relative jump                                                                                               | <code>jmp exit</code>                                  |
| <code>jmp reg</code>          | Unconditional absolute jump; arg is a register                                                                            | <code>jmp eax</code>                                   |
| <b>Miscellaneous</b>          |                                                                                                                           |                                                        |
| <code>nop</code>              | No-op (opcode 0x90)                                                                                                       | <code>nop</code>                                       |
| <code>hlt</code>              | Halt the CPU                                                                                                              | <code>hlt</code>                                       |

Instructions with no memory references must include 'byte', 'word' or 'dword' size specifier.

Arguments to instructions: Note that it is not possible for **both** src and dest to be memory addresses.

Constant (decimal or hex):    10 or 0xff                  Fixed address:    [200] or [0x1000+53]

Register:                        eax bl                  Dynamic address:    [eax] or [esp+16]

32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp (points to first used location on top of stack)

16-bit registers: ax, bx, cx, dx, si, di, sp, bp

8-bit registers: al, ah, bl, bh, cl, ch, dl, dh