

## 程序代写代做 CS 编程辅导 Computer Architecture and Low Level Programming



Dr. Vasilios Kelefouras

WeChat: cstutorcs

Email: v.kelefouras@plymouth.ac.uk

Assignment Project Exam Help  
Website:

<https://www.plymouth.ac.uk/staff/vasilios>

QQ: 749389476

<https://tutorcs.com>

# Outline

程序代写代做 CS编程辅导

2

- Memory hierarchy
- Cache memories
- Temporal and spatial locality
- Cache design



WeChat: cstutorcs

- Cache hit/miss

Assignment Project Exam Help

- Direct mapped, set associative, fully associative

Email: tutorcs@163.com

- Write policies

- Replacement policy

QQ: 749389476

<https://tutorcs.com>

- Stack memory

# Memory Wall Problem

程序代写代做 CS编程辅导

3



# Memory Hierarchy

程序代写代做 CS编程辅导

4



Taken from [https://www.researchgate.net/publication/281805561\\_MTJ-based\\_hybrid\\_storage\\_cells\\_for\\_normally-off\\_and\\_instant-on\\_computing/figures?lo=1](https://www.researchgate.net/publication/281805561_MTJ-based_hybrid_storage_cells_for_normally-off_and_instant-on_computing/figures?lo=1)

# Cache memories

## 程序代写代做 CS编程辅导

5

- Wouldn't it be nice if we could have a balance between fast and cheap memory?
- The solution is to add from the processor to the main memory 3 levels of cache memories, which are small, fast, but expensive memories



WeChat: cstutorcs  
Assignment Project Exam Help  
Email: tutorcs@163.com  
QQ: 749389476

- Cache memories occupy the largest part of the chip area
- They consume a significant amount of power consumption
- Add complexity to the design
- Cache memories are of key importance regarding performance

# Memory Hierarchy (2)

程序代写代做 CS编程辅导

6



- Consider that CPU needs to execute a load instruction
  - First it looks at L1 data cache. If the datum is there then it loads it and no other memory is accessed.
  - If the datum is not in the L1 data cache (**L1 miss**), then the CPU looks at the L2 cache
  - If the datum is in L2 (**L2 hit**) then no other memory is accessed. Otherwise (**L2 miss**), the CPU looks at L3 etc.

WeChat: cstutorcs

Assignment Project Exam Help

L1 cache access time: Email: [tutorcs@163.com](mailto:tutorcs@163.com)

L2 cache access time : 6-14 CPU cycles

L3 cache access time : QQ: [749389476](https://tutorcs.com)

DDR access time : 100-200 CPU cycles

<https://tutorcs.com>

# Data Locality

## 程序代写代做 CS编程辅导

7

- Regarding **static** programs it is difficult and time consuming to figure out what data will be the “most recently accessed” before a program actually runs
  - In static programs the memory path is known at compile time
- Regarding **dynamic** programs it is impossible
- This makes it hard to know what to store into the small, precious cache memory
- But in practice, most programs exhibit locality, which the cache can take advantage of
  - The principle of **temporal locality** says that if a program accesses one memory address, there is a good chance that it will access the same address again
  - The principle of **spatial locality** says that if a program accesses one memory address, there is a good chance that it will also access other nearby addresses



WeChat: cstutorcs

Assignment / Project / Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutortutors.com>

# Temporal Locality in Program Instructions

程序代写代做 CS编程辅导

8

- The principle of **temporal locality** says that if a program accesses one memory address, there is a good chance it will access the same address again
- Loops are excellent examples of temporal locality in programs
  - The loop body will be executed many times
  - The computer will need to access those same few locations of the instruction memory repeatedly
- For example:



WeChat: cstutorcs  
Assignment Project Exam Help

Email: tutorcs@163.com

Loop: *lw, \$t0, 0(\$s1)*

add \$t0, \$t0, \$s2

sw \$t0, 0(\$s1)

addi \$s1, \$s1, -4

bne \$s1, \$0, Loop

QQ: 749389476  
<https://tutorcs.com>

- Each instruction will be fetched over and over again, once on every loop iteration

# Temporal Locality in Data

程序代写代做 CS编程辅导

9

- Programs often access the same variables over and over, especially within loops, e.g., below, **sum**, **i** and **B[k]** are repeatedly read/written
- Commonly-accessed variables can be kept in registers, but this is not always possible as there is a limited number of registers**



WeChat: cstutorcs

- Sum** and **i** variables are a) of small size, b) reused many times, and therefore it is efficient to remain in the CPU's registers
- B[k]** remains unchanged during the innermost loop and therefore it is efficient to remain in a CPU register
- The whole **A[ ]** array is accessed 3 times and therefore it will remain in the cache (depending on its size)

<https://tutorcs.com>

```
sum = 0;  
for (k = 0; k < 3; k++)  
    for (i = 0; i < N; i++)  
        sum = sum + A[i] + B[k];
```

# Spatial Locality in Program Instructions

程序代写代做 CS编程辅导

10

- The principle of spatial locality says that if a program accesses one memory address, there is a good chance that it will also access other nearby addresses



```
sub    $sp, $sp, 16  
sw    $ra, 0($sp)  
sw    $s0, 4($sp)  
sw    $a0, 8($sp)  
sw    $a1, 12($sp)
```

WeChat:estutorcs  
Assignment Project Exam Help

Email: tutorcs@163.com

- Every program exhibits spatial locality, because instructions are executed in sequence ~~most of the time~~ (however, branches might occur) - if we execute an instruction at memory location  $i$ , then we will probably also execute the next instruction, at memory location  $i+1$
- Code fragments such as loops exhibit both temporal and spatial locality

# Spatial Locality in Data

程序代写代做 CS编程辅导

11

- Programs often access data stored in contiguous memory locations
  - Arrays, like  $A[ ]$  in the code below are always stored in memory contiguously – this transformation is performed by the compiler



WeChat: cstutorcs  
sum = 0;  
for (i = 0; i < N; i++)  
 sum = sum + A[i];

Email: tutorcs@163.com

QQ: 749389476

|       |      |      |      |
|-------|------|------|------|
| A[0]  | A[1] | A[2] | A[3] |
| ..... |      |      |      |
|       |      |      |      |
|       |      |      |      |
|       |      |      |      |

<https://tutorcs.com>



# How caches take advantage of temporal locality

程序代写代做 CS编程辅导

12



- Every time the processor reads an address in main memory, a copy of the data is also stored in the cache
  - The next time that same address is read, we can use the copy of the data in the cache instead of accessing the slower DDR
  - So the first read is a little slower than before since it goes through both main memory and the cache, but subsequent reads are much faster
- This takes advantage of temporal locality, commonly accessed data are stored in the faster cache memory

WeChat: **tutorcs**  
Assignment Project Exam Help  
Email: [tutorcs@163.com](mailto:tutorcs@163.com)  
QQ: 749389476  
<https://tutorcs.com>



# How caches take advantage of Spatial locality

程序代写代做 CS编程辅导

13

- When the CPU reads location  $i$  in memory, a copy of that data is placed in memory, a copy of that data is placed in memory, a
- But instead of just copying one element of location  $i$ , it copies several values into the cache at once (cache line)**



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476  
<https://tutorcs.com>

Again, the initial load incurs a performance penalty, but we're gambling on spatial locality and the chance that the CPU will need the extra data

L1 data cache

|      |      |      |      |
|------|------|------|------|
| A[0] | A[1] | A[2] | A[3] |
| A[4] | A[5] | A[6] | A[7] |
| .... |      |      |      |
|      |      |      |      |

Main memory



Cache lines - 256 bit

Words 32 bit

# Definitions: Hits and misses

程序代写代做 CS编程辅导

14

- A **cache hit** occurs if the cache contains the data that we're looking for. Hits are desirable, because the cache can return the data much faster than main memory
- A **cache miss** occurs if the cache does not contain the requested data. This is inefficient, since the CPU must then wait for addressing the slower next level of memory
- There are two basic measurements of cache performance
  - The **hit rate** is the percentage of memory accesses that are handled by the cache
  - The **miss rate** ( $1 - \text{hit rate}$ ) is the percentage of accesses that must be handled by the slower lower level memory
- Typical caches have a hit rate of 95% or higher, so in fact most memory accesses will be handled by the cache and will be dramatically faster



WeChat: [tutorcs](#)

Assignment Project Exam Help

Email: [tutorcs@163.com](mailto:tutorcs@163.com)

QQ: 749389476

<https://tutorcs.com>

# Important Questions

## 程序代写代做 CS编程辅导

15



1. When we fetch a block of data from main memory to the cache, where exactly should we put it?

WeChat: cstutorcs

2. How can we tell if a word is already in the cache, or if it has to be fetched from main memory first?

Assignment Project Exam Help

Email: tutorcs@163.com

3. Eventually, the small cache memory might fill up. To load a new block from main memory, we'd have to replace one of the existing blocks in the cache... which one?

QQ: 749389476  
<https://tutorcs.com>

4. In a write request, are we going to write to all memories in memory hierarchy or not?



# A simple cache design

程序代写代做 CS编程辅导

- Caches are divided into blocks which may be of various sizes
  - The number of blocks in memories are always in power of 2
  - For now consider that each block contains just one byte (not true in practice). Of course this cannot take advantage of spatial locality
- Here is an example of cache with eight blocks, each holding one byte

WeChat: cstutorcs  
Assignment Project Exam Help  
index 8-bit data

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>



# Where should we put data in the cache? (1)

程序代写代做 CS编程辅导

17

- A **direct-mapped** cache is the simplest approach: each main memory address maps to exactly one cache block.
- In the following figure, a 16-entry main memory and a 4-entry cache (four 1-entry blocks) are shown.
- Memory locations 0, 4, 8 and 12 all map to cache block 0.
- Addresses 1, 5, 9 and 13 map to cache block 1, etc.



# Where should we put data in the cache? (2)

程序代写代做 CS编程辅导

18

- One way to figure out which block a particular memory should go to is to use the modulo (remainder) operator



memory address

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

WeChat: cstutors  
Assignment Project Forum Help  
Email: tutorcs@163.com  
QQ: 74938947  
<https://tutorcs.com>

*the modulo operation finds the remainder after division of one number by another*



$$14 \bmod 4 = 2$$

# Where should we put data in the cache? (3)

程序代写代做 CS编程辅导

19

- An equivalent way to find which cache block a memory address maps to is to look at the least significant bits of the address
- In a four-entry cache we would check the two least significant bits of our memory addresses
- Again, you can check that address  $14_{10}$  ( $1110_2$ ) maps to cache block  $2_{10}$  ( $10_2$ )
- Taking the least  $k$  bits of a binary value is the same as computing that value mod  $n$



ment of a memory address in the cache  
bits of the address

memory  
address

0000

0001

0010

0011

0100

0101

0110

0111

1000

1001

1010

1011

1100

1101

1110

1111

WeChat: cstutorcs

Assignment Project Forum Help

Email: tutorcs@163.com

QQ: 74938047

<https://tutorcs.com>



# How can we find data in the cache?

程序代写代做 CS编程辅导

20

- How can we tell if a word is already in the cache, or if it has to be fetched from main memory first?
- If we want to read memory at address  $i$ , we can use the mod trick to determine which cache block would contain  $i$
- But other addresses might also map to the same cache block. How can we distinguish between them?
- For instance, cache block 2 could contain data from addresses 2, 6, 10 or 14



memory  
address

0  
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15

WeChat: cstutores  
Assignment Project Exam Help  
Email: tutorcs@163.com  
QQ: 74938947  
<https://tutorcs.com>



Index  
0  
1  
2  
3

# Adding tags

## 程序代写代做 CS编程辅导

21

- The solution is to add tags to the cache, which supply the rest of the address bits to let us distinguish between different memory locations that map to the same cache block



# what's in the cache

## 程序代写代做 CS编程辅导

22

- Now we can tell exact addresses of main memory are stored in the cache, by concatenating the block tags with the block indices



WeChat: cstutorcs

| Index | Tag | Data                   | Main memory address in cache block |
|-------|-----|------------------------|------------------------------------|
| 00    | 00  | Assignment             | Project Exam Help                  |
| 01    | 11  | Project                | 11 + 01 = 1101                     |
| 10    | 01  | Email: tutorcs@163.com | 01 + 10 = 0110                     |
| 11    | 01  | QQ: 749389476          | 01 + 11 = 0111                     |

'+' does not refer to addition here

<https://tutorcs.com>

# the valid bit

## 程序代写代做 CS编程辅导

23

- Initially, the cache is empty. It does not contain valid data, but trash.
- Thus, we add a **valid bit** to each cache block
  - When the system is initialized, all the valid bits are set to 0
  - When data is loaded into a particular cache block, the corresponding valid bit is set to 1

WeChat: cstutorcs

| Index | Valid Bit | Tag           | Data   | Main memory address in cache block |
|-------|-----------|---------------|--------|------------------------------------|
| 00    | 1         | 00 Email:     | 63.com | $00 + 00 = 0000$                   |
| 01    | 0         | 11            |        | Invalid                            |
| 10    | 0         | QQ: 749389476 |        | Invalid                            |
| 11    | 1         | 01            |        | $01 + 11 = 0111$                   |

<https://tutorcs.com>

# cache hit

## 程序代写代做 CS编程辅导

24

- Every memory has its **memory controller**, a HW mechanism responsible for finding the words in memory, loading/storing them.
- When the CPU tries to read from memory, the address will be sent to the **cache controller**
  - The lowest  $k$  bits of the address will index a block in the cache
  - If the block is valid and the tag matches the upper  $(m - k)$  bits of the  $m$ -bit address, then that data will be sent to the CPU
- Here is a diagram of a 32-bit memory address and a  $2^{10}$ -byte cache



# cache miss

## 程序代写代做 CS编程辅导

25

- In a two level memory system, L1 Cache misses are somehow expensive, but L2 cache misses are very expensive
- However, the slower main memory accesses are inevitable on an L3 cache miss



WeChat: cstutorcs

Assignment Project Exam Help

- The simplest thing to do is to stall the pipeline until the data from main memory can be fetched (and also copied into the cache)

Email: tutorcs@163.com  
QQ: 749389476

<https://tutorcs.com>

# Copying a block into the cache

程序代写代做 CS编程辅导

26

- After data is read from memory, putting a copy of that data into the cache is straightforward
  - The lowest  $k$  bits of the address specify a cache block
  - The upper  $(m - k)$  address bits are stored in the block's tag field
  - The data from main memory is stored in the block's data field
  - The valid bit is set to 1



WeChat: cstutorcs

Assignment Project Exam Help



# What if the cache fills up?

程序代写代做 CS编程辅导

27

- Eventually, the small cache might fill up. To load a new block from DDR, we'd have to replace one of the existing blocks in the cache... which one?



- A miss causes a new block to be loaded into the cache, automatically overwriting any previously stored data
- Normally, the **least recently used (LRU)** replacement policy is used, which assumes that least recently used data are less likely to be requested than the most recently used ones
- So, in a cache miss, **cache throws out the cache line that has been unused for the longest time**

WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# A more realistic direct mapped cache memory

程序代写代做 CS编程辅导

28

- Normally, one cache line 8/256 bits of data. In most programming languages, an integer uses 4 bytes)



8/256 bits of data. In most programming languages, an integer uses 4 bytes)



# Associativity

## 程序代写代做 CS编程辅导

29

- The replacement policy determines where in the cache a copy of a particular entry of main memory will be placed.
- So far, we have seen direct mapped cache only



WeChat: cstutorcs

Assignment Project Exam Help

- There are three types of caches regarding associativity

□ Direct mapped

Email: tutorcs@163.com

□ N-way Associative

QQ: 749389476

□ Fully associative

<https://tutorcs.com>

# Associative Caches

程序代写代做 CS编程辅导

30

- Block 12 placed in 8 block



**Fully associative:**  
block 12 can go  
anywhere



Block-frame address



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476



Block no. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

<https://tutorcs.com>

**2 way Set associative** (like having two half size direct mapped caches):  
block 12 can go in either of the two block 0 ( $12 \bmod 4 = 0$ )



# Set-associative cache memories

程序代写代做 CS编程辅导

31

- A 4-way associative cache consists of four direct-mapped caches that work in parallel
- Normally, L1 caches are 4-way associative, while L2/L3 caches are of 16/24 way associative, i.e., 16/24 direct mapped caches in parallel
- Data are found in one cache among four by using an address which is stored in one of the four caches



WeChat: cstutorcs

Assignment Project Exam Help

- Set associative caches are the most used as they present the best compromise between cost and performance  
QQ: 749389476

<https://tutorcs.com>

# The 4-way set-associative cache architecture

程序代写代做 CS编程辅导

32



# Cache misses

## 程序代写代做 CS编程辅导

33

- **Compulsory miss** (or cold start access to a block)

- Normally, compulsory misses are insignificant



- **Capacity miss:**

- Cache cannot contain all blocks accessed by the program

WeChat: cstutorcs

- Solution: increase cache size

- **Conflict miss:** Assignment Project Exam Help

- Multiple memory locations are mapped to the same cache location

Email: tutorcs@163.com

- Solution 1: increase cache size

- Solution 2: increase associativity

QQ: 749389476

<https://tutorcs.com>

# Write policies - In a write request, are we going to write to all memories in memory hierarchy or not?

34



- **Write through** – Data are written to all memories in memory hierarchy
  - Disadvantage: Data are written into multiple memories every time and thus it takes more time
- **Write back** – Data are written only to L1 data cache; only when this block is replaced, it is written in L2. If this block is replaced from, then it is written in main memory
  - Requires a “dirty bit”
  - more complex to implement, since it needs to track which of its locations have been written over, and mark them as dirty for later writing to the lower memory

<https://tutorcs.com>

# Write policies (2)

## 程序代写代做 CS编程辅导

35

### What happens on a write miss?



- a decision needs to be made about whether there was a write miss, whether or not data would be loaded into the cache. This can be handled by these two approaches:
  - **Write allocate** : datum at the missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read misses
  - **No-write allocate** : datum at the missed-write location is not loaded to cache, and is written directly to DDR. In this approach, data are loaded into the cache on read misses only
- Both write-through and write-back policies can use either of these write-miss policies, but usually they are paired in this way:
  - **A write-back cache uses write allocate**, hoping for subsequent writes (or even reads) to the same location, which is now cached
  - **A write-through cache uses no-write allocate**. Here, subsequent writes have no advantage, since they still need to be written directly to DDR

# Memory Allocation

## 程序代写代做 CS编程辅导

36



# Stack

## 程序代写代做 CS编程辅导

37

- There are 4 parts in memory:

- code, global variables and heap.**

- Stack is a block of memory that is used for function calls and local variables.

WeChat: cstutorcs

- Stack size is fixed during compilation – cannot ask for more during runtime.

Assignment Project Exam Help

- Actual data in the stack grows and shrinks as local variables and other pointers are added and removed accordingly.

Email: tutorcs@163.com

- Every function call is allocated a stack frame** – a block of memory – required for holding function specific data. The size of this is determined during compile-time.

QQ: 749389476

- In x64, the stack frame size is a multiple of 16 bytes
- Little-endian form is used



# Virtual and Physical Memory

程序代写代做 CS编程辅导

38

- A virtual address is a memory address that a process uses to access its own memory
  - The virtual address is not the same as the actual physical RAM address in which it is stored
  - When a process accesses a virtual address, the Memory Management Unit hardware translates the virtual address to physical
  - The OS determines the mapping from virtual address to physical address

Email: tutorcs@163.com

QQ: 749389476

https://tutorcs.com



# How can I find the memory addresses of an array? 程序代写代做 CS编程辅导

39

- We can print the virtual memory addresses only
- The hardware addresses are managed by the memory management unit
- However, if you know how the cache works you can have a very good guess about where the data are stored



WeChat: cstutorcs

Assignment Project Exam Help

//print virtual memory address  
Email: datuss@163.com

for (i=0; i<4; i++)

    for (j=0; j<4; j++) QQ: 749389476

        printf("\nThe address of element (%d,%d) is %p", i, j, &A[i][j]);

<https://tutorcs.com>

# Stack and Heap

## 程序代写代做 CS编程辅导

40

### □ Stack memory

- The stack is a special part of RAM that can be used by functions as temporary storage...

- To save register state.

- For local variables.

Assignment Project Exam Help

- For large input parameters / return values.

- Stack works as Last In First Out (LIFO)

QQ: 749389476

### □ Heap memory

- Dynamic allocation

<https://tutorcs.com>



# Why Stack is needed?

程序代写代做 CS编程辅导

41

- Calling subroutines
- Using registers in subroutines
- Using local variables in subroutines with the same name as in others
  - less RAM memory is used as these local variables are no longer needed when the function is ended
- Passing and returning large arguments to subroutines



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Why Stack is needed?

程序代写代做 CS编程辅导

42

- Every function being called has its own space in stack
- Every time a function is called, it stores into the stack the following:
  - the return address – thus the CPU knows where to return afterwards
  - its input operands (in x86, the first 6 operands are stored into registers)
  - Its local variables – this way, we can use the registers in the subroutine without being overwritten, e.g., consider using edx in both caller and callee



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Stack related instructions and registers

程序代写代做 CS编程辅导

43

## Registers



- **ESP Stack pointer** – points to the last element used in the stack.  
Implicitly manipulated by stack instructions (e.g. push, pop, call, ret, etc.)
- **EBP Base pointer** – used to reference function parameters and local variable within a stack frame.

## Instructions

Assignment Project Exam Help

- **push M/L/R** Pushes an M/L/R value on top of the stack (ESP - 4).
- **pop R** Remove and restore an R value from the top (ESP + 4).
- **call label** Calls a function with a label. This results into pushing the next instruction address on the stack.
- **ret** Returns to caller function – return value is usually stored in eax register.

# Every time we call a function... 程序代写代做 CS编程辅导

44



- The operands of the called function are stored into **rdi,rsi,rdx,rcx,r8**, then into the stack (if more space is needed).
- The return value is stored to **rax** (**eax** for 32bit - it is the same register)

WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

- Remember this when doing reverse engineering...

<https://tutorcs.com>

# How Stack Works?

程序代写代做 CS编程辅导

45

- Let's have a look at the 3<sup>rd</sup> question of this lab session ...



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Caller and Callee – example (1)

程序代写代做 CS编程辅导

46

- In the C++ code below, addFunc is the caller function, and addFunc is the callee function.
- We will convert this into assembly and see how it works with the memory.



WeChat: cstutorcs

```
int main() {  
    int a = 2;  
    int b = 3;  
    int c = addFunc(a, b);  
    return 0;  
}  
  
int addFunc(int a, int b) {  
    return a + b;  
}
```

Assignment Project Exam Help  
Email: tutorcs@163.com  
QQ: 749389476  
<https://tutorcs.com>

# Caller and Callee -example (2)

程序代写代做 CS编程辅导

47

```
6 .data ; data segment  
7     ; define your vari  
8     a DWORD 2  
9     b DWORD 4  
10    .code ; code segment  
11        main PROC C ;  cdecl calling convention -- caller  
12            push b ; push b into stack  
13            push a ; push a into stack  
14            call addFunc ; call addition function  
15            add esp, 8 ; remove the  
16            INVOKE ExitProcess, 0 ; exit process  
17        main ENDP  
18        addFunc PROC C ; cdecl calling convention -- callee  
19            push ebp ; store whatever ebp is for the caller  
20            mov ebp, esp ; get the current stack pointer into the base pointer  
21            mov ebx, [ebp + 8] ; this is a  
22            mov eax, [ebp + 12] ; this is b  
23            add eax, ebx  
24            pop ebp ; restore ebp for the caller  
25            ret ; return  
26        addFunc ENDP  
27    END main
```



WeChat: cstutorcs  
Assignment Project Exam Help  
Email: tutorcs@163.com  
QQ: 749389476  
<https://tutorcs.com>

# Caller and Callee -example (3)

程序代写代做 CS编程辅导

48



# Walking through the code...

程序代写代做 CS编程辅导

49



The screenshot shows a debugger interface. At the top, there is assembly code:

```
11     main    cdecl calling convention -- caller
12     p        push b into stack
```

Below the assembly code is a QR code with the text "Tutor CS" next to it. Underneath the QR code, the word "Registers" is highlighted in blue. The registers are listed as follows:

|                |                |                   |                |
|----------------|----------------|-------------------|----------------|
| EAX = 8AC2BA10 | EBX = 00F9F000 | ECX = 0118100A    | EDX = 0118100A |
| ESI = 0118100A | EDI = 0118100A | EIP = 0118101C    | ESP = 00DEF8C0 |
| EBP = 00DEF8D0 | EF = 00000246  | WeChat: cstutorcs |                |

- We start the debugger [Assignment Project Exam Help](#)
- Before running line 12, the [Email: tutorcs@163.com](mailto:tutorcs@163.com)
- Each address is pointing to 4 bytes of [QQ: 749389476](#) memory.
- Decreasing addresses has nothing in there to being with.
- Push the last parameter of the function first (LIFO)!



The screenshot shows a memory dump window titled "Memory 1". It displays memory addresses from 0x00DEF8A8 to 0x00DEF8C8. The memory dump is as follows:

| Address    | Value       | Content |
|------------|-------------|---------|
| 0x00DEF8A8 | 00 00 00 00 | ....    |
| 0x00DEF8AC | 00 00 00 00 | ....    |
| 0x00DEF8B0 | 00 00 00 00 | ....    |
| 0x00DEF8B4 | 00 00 00 00 | ....    |
| 0x00DEF8B8 | 00 00 00 00 | ....    |
| 0x00DEF8BC | 00 00 00 00 | ....    |
| 0x00DEF8C0 | 44 87 f7 75 | D.÷u    |
| 0x00DEF8C4 | 00 f0 f9 00 | .δù.    |
| 0x00DEF8C8 | 20 87 f7 75 | .÷u     |

# Walking through the code...

程序代写代做 CS编程辅导

50



Assignment Project Exam Help

Memory 1

- The value of b is pushed on the top of the stack.
- Last used address is now ESP =  $0x00DEF8C0 - 4 = 0x00DEF8BC$
- Little Endian – lower bits on the lower addresses (see red highlighted line on the right).

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

|            |    |    |    |    |      |
|------------|----|----|----|----|------|
| 0x00DEF8A8 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8AC | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B0 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B4 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B8 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8BC | 04 | 00 | 00 | 00 | .... |
| 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u |
| 0x00DEF8C4 | 00 | f0 | f9 | 00 | .δù. |
| 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u  |

# Walking through the code...

程序代写代做 CS编程辅导

51



Assignment Project Exam Help

Memory 1

- The value of a is pushed on the top of the stack.
- Last used address is now ESP =  $0x00DEF8BC - 4 = 0x00DEF8B8$
- Little Endian – lower bits on the lower addresses (see red highlighted line on the right).

Email: [tutorcs@163.com](mailto:tutorcs@163.com)

QQ: 749389476

<https://tutorcs.com>

|            |    |    |    |    |      |
|------------|----|----|----|----|------|
| 0x00DEF8A8 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8AC | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B0 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B4 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B8 | 02 | 00 | 00 | 00 | .... |
| 0x00DEF8BC | 04 | 00 | 00 | 00 | .... |
| 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u |
| 0x00DEF8C4 | 00 | f0 | f9 | 00 | .δù. |
| 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u  |

# Walking through the code...

程序代写代做 CS编程辅导

52



```
18      addFunc F:\...\addFunc.c:18 cdecl calling convention -- callee
19      push    whatever ebp is for the caller ≤1ms elapsed
```

Registers

```
EAX = 8AC2BA10 EBX = 00F9F000 ECX = 0118100A EDX = 0118100A
ESI = 0118100A EDI = 01181004 EIP = 00181037 ESP = 00DEF8B4
EBP = 00DEF8D0 EFL = 00000246
```

WeChat: estutorcs

Email: tutorcs@163.com

- What happened here? ⇒ We jumped into the function that we called: addFunc.
- Last used address is now ESP = <https://tutorcs.com>  
 $0x00DEF8B8 - 4 = 0x00DEF8B4$ .
- Why was that weird number pushed to the stack?

| Memory 1   |    |    |    |    |      |  |
|------------|----|----|----|----|------|--|
| 0x00DEF8A8 | 00 | 00 | 00 | 00 | ...  |  |
| 0x00DEF8AC | 00 | 00 | 00 | 00 | ...  |  |
| 0x00DEF8B0 | 00 | 00 | 00 | 00 | ...  |  |
| 0x00DEF8B4 | 2d | 10 | 18 | 01 | -... |  |
| 0x00DEF8B8 | 02 | 00 | 00 | 00 | ...  |  |
| 0x00DEF8BC | 04 | 00 | 00 | 00 | ...  |  |
| 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u |  |
| 0x00DEF8C4 | 00 | f0 | f9 | 00 | .δù. |  |
| 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u  |  |

# Walking through the code...

程序代写代做 CS编程辅导

53



```
call a           addtition function
01181028  call    inc (01181037h)
add es           the
0118102D  add    esp,8
```

WeChat: cstutorcs

- We pushed the instruction address that comes after the function call would finish, so that we can resume normal operation after fucntion call has finished.

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

| Memory 1   |    |    |    |    |      |
|------------|----|----|----|----|------|
| 0x00DEF8A8 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8AC | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B0 | 00 | 00 | 00 | 00 | .... |
| 0x00DEF8B4 | 2d | 10 | 18 | 01 | -... |
| 0x00DEF8B8 | 02 | 00 | 00 | 00 | .... |
| 0x00DEF8BC | 04 | 00 | 00 | 00 | .... |
| 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u |
| 0x00DEF8C4 | 00 | f0 | f9 | 00 | .δù. |
| 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u  |

# Walking through the code...

程序代写代做 CS编程辅导

54



19 | push ebp  
20 | mov ebp,

tever ebp is for the caller  
the current stack pointer into the base pointe

Registers

EAX = 8AC2BA10 EBX = 00F9F000 ECX = 0118100A EDX = 0118100A  
ESI = 0118100A EII = 0118100A EDI = 01181038 ESP = 00DEF8B0  
EBP = 00DEF8D0 EFL = 00000246

WeChat: cstutorcs

- Now, we have saved the EBP state in the stack.
- This is because we are going to re-write it in the next statement.
- This will enable us to explicitly manipulate bits of memory.
- Last used address is now ESP =  $0x00DEF8B4 - 4 = 0x00DEF8B0$ .

Assignment Project Exam Help

Memory 1

|            |    |    |    |    |       |
|------------|----|----|----|----|-------|
| 0x00DEF8A8 | 00 | 00 | 00 | 00 | ....  |
| 0x00DEF8AC | 00 | 00 | 00 | 00 | ....  |
| 0x00DEF8B0 | d0 | f8 | de | 00 | DfP.  |
| 0x00DEF8B4 | 2d | 10 | 18 | 01 | -.... |
| 0x00DEF8B8 | 02 | 00 | 00 | 00 | ....  |
| 0x00DEF8BC | 04 | 00 | 00 | 00 | ....  |
| 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u  |
| 0x00DEF8C4 | 00 | f0 | f9 | 00 | .δù.  |
| 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u   |

# Walking through the code...

程序代写代做 CS编程辅导

55

```
20      mov ebp,          the current stack pointer into the base pointe
21      mov ebx,          this is a ≤1ms elapsed
```



Registers

EAX = 8AC2BA10 EBX = 00F9F000 ECX = 0118100A EDX = 0118100A  
EST = 0118100A EPI = 0118100A EIP = 0118103A ESP = 00DEF8B0  
EBP = 00DEF8B0 EFL = 00000246

- We copied the ESP into the EBP register.
- Within addFunc, we will use the EBP register to access stack rather than ESP.
- We want to access a and b values from the stack next. Where are they?
- Last used address is now ESP =  $0x00DEF8B4 - 4 = 0x00DEF8B0$ .

WeChat: cstutorcs

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

| Memory     |             |
|------------|-------------|
| 0x00DEF8A8 | 00 00 00 00 |
| 0x00DEF8AC | 00 00 00 00 |
| 0x00DEF8B0 | d0 f8 de 00 |
| 0x00DEF8B4 | 2d 10 18 01 |
| 0x00DEF8B8 | 02 00 00 00 |
| 0x00DEF8BC | 04 00 00 00 |
| 0x00DEF8C0 | 44 87 f7 75 |
| 0x00DEF8C4 | 00 f0 f9 00 |
| 0x00DEF8C8 | 20 87 f7 75 |

# Walking through the code...

程序代写代做 CS编程辅导

56



```
20      mov ebp,  
21      mov ebx,
```

the current stack pointer into the base pointer  
this is a ≤1ms elapsed

Registers

|                |                |                |                |
|----------------|----------------|----------------|----------------|
| EAX = 8AC2BA10 | EBX = 00F9F000 | ECX = 0118100A | EDX = 0118100A |
| EST = 0118100A | EDI = 0118100A | ESP = 00DEF8B0 | ESP = 00DEF8B0 |
| EIP = 00DEF8B0 | EFL = 00000246 |                |                |

WeChat: estutorcs

Assignment Project Exam Help

- Currently ESP = 0x00DEF8B0.
- Email: [tutorcs@163.com](mailto:tutorcs@163.com)
- a is in address 0x00DEF8B8 = ESP + 8
- b is in address 0x00DEF8BC = ESP + 12
- Why? ⇒ We pushed the <https://tutorcs.com> address following on from the addFunc call and then the EBP address

| Memory | 0x00DEF8A8 | 00 | 00 | 00 | 00 | ....  |
|--------|------------|----|----|----|----|-------|
|        | 0x00DEF8AC | 00 | 00 | 00 | 00 | ....  |
|        | 0x00DEF8B0 | d0 | f8 | de | 00 | Dfp.  |
|        | 0x00DEF8B4 | 2d | 10 | 18 | 01 | -.... |
|        | 0x00DEF8B8 | 02 | 00 | 00 | 00 | ....  |
|        | 0x00DEF8BC | 04 | 00 | 00 | 00 | ....  |
|        | 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u  |
|        | 0x00DEF8C4 | 00 | f0 | f9 | 00 | .δù.  |
|        | 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u   |

# Walking through the code...

程序代写代做 CS编程辅导

57

```
21 m [p + 8] ; this is a  
22 m [p + 12] ; this is b  
23 a  
24 p store ebp for the caller ≤1ms elapsed
```



Registers

EAX = 00000006 BX = 00201002 ECX = 0118100A EDX = 0118100A  
ESI = 0118100A EDI = 0118100A EIP = 01181042 ESP = 00DEF8B0  
EBP = 00DEF8B0 EFL = 00000206

WeChat: cstutores  
Assignment Project Exam Help

- We perform three steps now to add a and b.
- Stack is unaltered – only changed general purpose registers eax and ebx using the ebp to access values from the stack.
- The resulting sum is in eax =  $2 + 4 = 6$

Email: tutorcs@163.com

QQ: 749389476

<http://tutorcs.com>

Memory 1

|            |    |    |    |    |       |
|------------|----|----|----|----|-------|
| 0x00DEF8A8 | 00 | 00 | 00 | 00 | ....  |
| 0x00DEF8AC | 00 | 00 | 00 | 00 | ....  |
| 0x00DEF8B0 | d0 | f8 | de | 00 | Dfp.  |
| 0x00DEF8B4 | 2d | 10 | 18 | 01 | -.... |
| 0x00DEF8B8 | 02 | 00 | 00 | 00 | ....  |
| 0x00DEF8BC | 04 | 00 | 00 | 00 | ....  |
| 0x00DEF8C0 | 44 | 87 | f7 | 75 | D.÷u  |
| 0x00DEF8C4 | 00 | f0 | f9 | 00 | .ðù.  |
| 0x00DEF8C8 | 20 | 87 | f7 | 75 | .÷u   |

# Walking through the code...

程序代写代做 CS编程辅导

58



```
24 ; restore ebp for the caller
25 return ≤ 1ms elapsed

Registers
EAX = 00000006 EBX = 00000002 ECX = 0118100A EDX = 0118100A
ESI = 0118100A EDI = 0118100A EIP = 01181043 ESP = 00DEF8B4
EBP = 00DEF8D0 EFL = 00000000

WeChat: estutorcs
```

- We now restored EBP to original state so that the caller function do not see any changes to it, and therefore can use it as if nothing happened. **QQ: 749389476**
- Pressing next now return to the next instruction in the main function.
- Note that ESP has changed: it now “removed” (i.e. not tracking the piece of memory) EBP.

| Assignment Project Exam Help |             | Memory 1 |  |
|------------------------------|-------------|----------|--|
| 0x00DEF8A8                   | 00 00 00 00 | ....     |  |
| 0x00DEF8AC                   | 00 00 00 00 | ....     |  |
| 0x00DEF8B0                   | d0 f8 de 00 | Dfp.     |  |
| 0x00DEF8B4                   | 2d 10 18 01 | -....    |  |
| 0x00DEF8B8                   | 02 00 00 00 | ....     |  |
| 0x00DEF8BC                   | 04 00 00 00 | ....     |  |
| 0x00DEF8C0                   | 44 87 f7 75 | D..u     |  |
| 0x00DEF8C4                   | 00 f0 f9 00 | .ðù.     |  |
| 0x00DEF8C8                   | 20 87 f7 75 | .÷u      |  |

# Walking through the code...

程序代写代做 CS编程辅导

59



```
15 ; remove the
16 Process, 0 ; exit process ≤ 1ms elapsed
```

Registers

```
EAX = 00000006 EBX = 00000002 ECX = 0118100A EDX = 0118100A
ESI = 0118100A EDI = 0118100A EIP = 01181030 ESP = 00DEF8C0
EBP = 00DEF8D0 EFL = 00000216
```

WeChat: cstutorcs

Email: tutorcs@163.com

- After the addFunc we reset the ESP to the original starting point by [QQ:749389476](#) discounting the places where we put a and b).  
<https://tutorcs.com>
- The values are still there in the stack, but will be overwritten if we have other functions using the stack.

|            | Memory      | Value | Description |
|------------|-------------|-------|-------------|
| 0x00DEF8A8 | 00 00 00 00 | ..... |             |
| 0x00DEF8AC | 00 00 00 00 | ..... |             |
| 0x00DEF8B0 | d0 f8 de 00 | Dfp.  |             |
| 0x00DEF8B4 | 2d 10 18 01 | -.... |             |
| 0x00DEF8B8 | 02 00 00 00 | ..... |             |
| 0x00DEF8BC | 04 00 00 00 | ..... |             |
| 0x00DEF8C0 | 44 87 f7 75 | D.÷u  |             |
| 0x00DEF8C4 | 00 f0 f9 00 | .òù.  |             |
| 0x00DEF8C8 | 20 87 f7 75 | .÷u   |             |

程序代写代做 CS编程辅导



Thank you  
WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

Date

29/10/2019

School of Computing  
(University of Plymouth)