

# "Procedure Calling"

\* Let's assume procedure like a SPJ.

Execution process of SPJ

(i) leaves with a secret plan.

(ii) acquires resources.

(iii) performs the task.

(iv) covers traces.

(v) returns to the point of origin with  
desired answers.

Execution of a procedure

Step 1: Put parameters in a place where the procedure  
can access them.  $(X_{10} - X_{17})$

Step 2: Transfer control to the procedure.

Step 3: Acquire storage resources needed for the proc.

Step 4: Perform the desired task.

Step 5: Put the result value in a place where  
the calling program can access it.

Step 6: Return control to the point of origin.  $\rightarrow X_1$

[A proc. can be called from several  
points in a program]

JAL  $\Rightarrow$  Jump and Link instruction

Syntax: JAL  $X_1$ , procedureLabel  
 $\uparrow$  Fixed  
Jump Destination

Explanation: Jump to procedureLabel and write return Address to  $X_1$ .

JALR  $\Rightarrow$  Jump and Link Register

Syntax: JALR  $X_0$ ,  $O[X_1]$   
 $\downarrow$  Jump Destination

\* The calling program (caller) puts the parameters values in  $X_{10} - X_{17}$ .

\* Uses  $Jal X_1, abc$  to branch to procedure label  $abc$  (callee)

\* Callee performs the calculations, places the results in the same parameter registers

\* Returns control to the caller using  $jalr X_0, O(X_1)$

PC (Program Counter)  $\Rightarrow$  is the register that holds the address of the current instruction being executed.

Jal  $x_1$ , procedureLabel  
↑

$(PC+4)$  is stored.

Jal  $x_0$ , Label  $\Rightarrow$  unconditional branch within a procedure.  
↑

0 is hardwired;

Discard the return Address

What if compiler needs more registers for a procedure than the 8 arg. registers?

$\Rightarrow$  If the callee require any register that is already in use by the caller, callee must restore the values that were contained before the procedure was invoked.

First store the values  
in stack

You want to  
store 3 register  
datas.

$\Rightarrow$  each register size = 64 bits

High Add.

= 8 locations

SP  $\Rightarrow$  24  
23  
22  
21  
20  
19  
18  
17

Stack

$\Rightarrow$  to store data of 3 registers  
 $= (3 \times 8) = 24$  locations

are needed

(LIFO)

16  
15  
14  
13  
12  
11  
10  
9  
8  
7  
6  
5  
4  
3  
2  
1  
0

(i) Decrease the SP by the #locations you need

(ii) Then store the values in stack

Low Add.

```
def leaf_example(g, h, i, j):
    f = (g+h) - (i+j)
    return f
```

leaf\_procedure:

addi SP, SP, -24

sd x5, 16(SP)  
sd x6, 8(SP)  
sd x20, 0(SP)

} stored into stack

$g = x_{10}$

$h = x_{11}$

$i = x_{12}$

$j = x_{13}$

$f = x_{20}$  } save these  
 $\text{temp} = x_5, x_6$  } 3 in the stack first.

Add  $x_5, x_{10}, x_{11}$

Add  $x_6, x_{12}, x_{13}$

Sub  $x_{20}, x_5, x_6$

Addi  $x_{10}, x_{20}, 0$  } store the return value

$| d \quad x_{20}, \quad 0(sp) \quad \} \quad$   
 $| d \quad x_6, \quad 8(sp) \quad \} \quad \text{stored into}$   
 $| d \quad x_5, \quad 16(sp) \quad \} \quad \text{stack}$

addi sp, sp, 24

jalr x0, 0[x1]

# To avoid saving and restoring a register whose value is never used,

$x_5 - x_7$  and  $x_{28} - x_{31} \Rightarrow$  temp. Register values are not preserved by the callee.

$x_8 - x_9$  and  $x_{18} - x_{27} \Rightarrow$  saved register values must be preserved by the callee.

Hence, from the above example we do not need to restore the values of  $x_5, x_6$

## "Memory Layout"



heap - Dynamic Data - Linked List

constants  
Global Variables - arrays, strings ] global pointers  $x_3$   
RISC-V machine code

FIGURE 2.13 The RISC-V memory allocation for program and data.

$\Rightarrow$  RISC-V convention for allocation of memory when running the Linux OS.

# Local Data on Stack



The segment of the stack that contains a procedure's saved registers, and local variables called **Procedure Frame / Activation Period.**

# What if there are more than 8 parameters?

→ saved reg.  
→ arg reg.  
→ return add. reg.

→ place the extra parameters on the stack just above the frame pointer.

→ Procedure will expect first eight params to be in  $(x_{10} - x_{17})$  & rest in memory, addressable via the frame pointer.

## RISC-V byte/halfword/word load/store

- Load byte/halfword/word: Sign extend to 64 bits in rd
  - `lb rd, offset(rs1)`
  - `lh rd, offset(rs1)`
  - `lw rd, offset(rs1)`
- Load byte/halfword/word unsigned: Zero extend to 64 bits in rd
  - `lbu rd, offset(rs1)`
  - `lhu rd, offset(rs1)`
  - `lwu rd, offset(rs1)`
- Store byte/halfword/word: Store rightmost 8/16/32 bits
  - `sb rs2, offset(rs1)`
  - `sh rs2, offset(rs1)`
  - `sw rs2, offset(rs1)`

Try this  $\Rightarrow X_{10} = 1111\ 1111\ 1111\ 0000$

Add:  $X_{10}$ ,  $X_{10}$ , 65520



then how do we do this?



LUI = Load Upper Immediate — use it to form 32 bit  
immediates

Syntax = LUI rd, constant

- \* copies the 20 bit data into rd's [31:12]
- \* copy whatever you have in bit 31 to bits [63:32]
- \* copy 0s in [11:0] of rd

load this 64 bit value into  $X_{10} \Rightarrow$



lui  $X_{10}$ , 976

$X_{10} = 0000 0000 0000 0000 0000 0000 0000 \boxed{0000 0000 0011 1101 0000} 0000 0000 0000$

addi  $X_{10}$ ,  $X_{10}$ , 1280

$X_{10} = 0000 0000 0000 0000 0000 0000 0000 \boxed{0000 0000 0011 1101 0000} 0101 0000 0000$

U type instruction  $\rightarrow LUI$



\* each field has unique name and size.



## Branching Instructions

**SB type instruction** — beq, bne, blt, bge

31

|   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|
| 1 | 6 | 5 | 5 | 3 | 4 | 1 | 2 |
|---|---|---|---|---|---|---|---|

0

# represents branch addresses  
in multiples of 2.

# each field has unique name and size.

|          |            |     |     |        |           |         |        |
|----------|------------|-----|-----|--------|-----------|---------|--------|
| imm [12] | imm [10:5] | r22 | r21 | funct3 | imm [4:1] | imm [1] | opcode |
|----------|------------|-----|-----|--------|-----------|---------|--------|

# forwarded & backward moving.  
# the unusual encoding of imm.

simplifies datapath design.

Loop:

- #80000 slli x10, x22, 3 ✓ — line 1
- #80004 add x10, x10, x25 ✓ — line 2
- #80008 ld x9, 0(x10) ✓ — line 3
- #80012 bne x9, x24, Exit ✓ — line 4
- #80016 addi x22, x22, 1 ✓ — line 5 ✓
- #80020 beq x9, x9, loop — line 6 ✓

Exit:

- #80024 — line 7 ✓

$$3 \times 4 = 12$$

0000 0000 1100

0 0000 0000 1100  
12 11 10:5 4:1

Discard it.

|      |         |       |       |     |      |   |         |
|------|---------|-------|-------|-----|------|---|---------|
| 0    | 000 000 | 11000 | 01001 | XXX | 0110 | 0 | xxxxxxx |
| [12] | [10:5]  |       |       |     |      |   | OP code |

|      |         |       |       |     |      |   |         |
|------|---------|-------|-------|-----|------|---|---------|
| 1    | 111 111 | 00000 | 00000 | XXX | 0110 | 1 | xxxxxxx |
| [12] | [10:5]  |       |       |     |      |   | OP code |

$5 \times 4 = 20$  but its going upward so -20.

= 0000 0001 0100

= 0 0000 0001 0100

= 1 111 1110 1011

+ 1

$\overline{1 111 1110 1100} \Rightarrow -20$  in 2's comp.

|   |         |       |       |     |      |   |         |
|---|---------|-------|-------|-----|------|---|---------|
| 1 | 111 111 | 11000 | 01001 | XXX | 0110 | 1 | xxxxxxx |
|---|---------|-------|-------|-----|------|---|---------|

0

(i) Form the 12 bit number

11 111 111 0110

(ii) Detect if positive or negative number.

if neg perform 2's comp again,

1111 1111 0110

0000 0000 1001

$+ 1$   
 $\overline{0000 0000 1010} \Rightarrow 10$

(iii) PC  $\oplus$  imm  $\times 2$  offset

Based on the sign bit

## UJ type instruction - Jal

31

|   |    |   |   |   |   |
|---|----|---|---|---|---|
| 1 | 10 | 1 | 8 | 5 | x |
|---|----|---|---|---|---|

0

# uses 20 bit

immediates for  
larger jumps.

# each field has unique name and size.

|                     |                     |                  |                        |    |        |
|---------------------|---------------------|------------------|------------------------|----|--------|
| imm <sub>20:8</sub> | imm <sub>10:1</sub> | imm <sub>1</sub> | imm <sub>[10:12]</sub> | rd | opcode |
|---------------------|---------------------|------------------|------------------------|----|--------|

# For more large jump, (32 bit)

lui: load address [31:12]  
to a temp reg.

Jal x<sub>0</sub>, 2000

|   |              |   |           |       |        |
|---|--------------|---|-----------|-------|--------|
| 0 | 1111 0100 00 | 0 | 0000 0000 | 00000 | opcode |
|---|--------------|---|-----------|-------|--------|

jalr: add address [11:0] and  
jump to target.

$$2000 = 0000 \ 0000 \ 0111 \ 1101 \ 0000$$

$$= 0 \underbrace{0000 \ 0000}_{20} \ / \underbrace{1111 \ 1101 \ 0000}_{10:1}$$

Given a branch on register x<sub>10</sub> being equal to zero,

beq x<sub>10</sub>, x<sub>0</sub>, L1

replace it by a pair of instructions that offers a much greater  
branching distance.

### Answer

These instructions replace the short-address conditional branch:

```
bne x10, x0, L2
jal x0, L1
L2:
```