

# CODE GENERATION

## LECTURE 20

### SECTION 8.1 - 8.4

ROKAN UDDIN FARUQUI

Associate Professor  
Dept of Computer Science and Engineering  
University of Chittagong, Bangladesh  
Email: *rokan@cu.ac.bd*



# The compiler Backend

What have we obtained by the front-end transformations?

Assuming we have produced relatively low-level IR (like 3-address code), we can assume that all **syntactic** and **semantic** errors have been eliminated.



# Outline

1 Design Issues

2 Target Language

3 Basic Blocks and Flow Graphs



# What code generation issues are important?

- Correctness (preserving the semantics) of the program.
- Speed of the target program (efficient use of the target machine's resources)
- Speed and structure of the code generator (length, easy to implement, test and maintain).



# What code generation issues are important?

- Correctness (preserving the semantics) of the program.
- Speed of the target program (efficient use of the target machine's resources)
- Speed and structure of the code generator (length, easy to implement, test and maintain).



# What code generation issues are important?

- Correctness (preserving the semantics) of the program.
- Speed of the target program (efficient use of the target machine's resources)
- Speed and structure of the code generator (length, easy to implement, test and maintain).



# What code generation issues are important?

## Hilbert's 10th Problem

1936 — Alonzo Church — Computable Function  
Alan Turing → Turing Machine

$$L = \{ a^n b^n c^n \mid n > 0 \}$$

Generating an optimal target program for a given source program is **undecidable**. Many subproblems encountered in code generation are computationally **intractable**

- there are heuristics to generate good code
- optimization on the IR helps



# What code generation issues are important?

Generating an optimal target program for a given source program is **undecidable**. Many subproblems encountered in code generation are computationally **intractable**

- there are heuristics to generate good code
- optimization on the IR helps



# Code Generator: Input



- ➊ Three-address presentations (quadruples, triples, ...)
- ➋ Virtual machine presentations (bytecode, stack-machine, ...)  
Text
- ➌ Linear presentation (postfix, ...)
- ➍ Graphical presentation (syntax trees, DAGs, ...)



# Code Generator: Input

- ① Three-address presentations (quadruples, triples, ...)
- ② Virtual machine presentations (bytecode, stack-machine, ...)
- ③ Linear presentation (postfix, ...)
- ④ Graphical presentation (syntax trees, DAGs, ...)



# Code Generator: Input

- ① Three-address presentations (quadruples, triples, ...)
- ② Virtual machine presentations (bytecode, stack-machine, ...)
- ③ Linear presentation (postfix, ...)
- ④ Graphical presentation (syntax trees, DAGs, ...)



# Code Generator: Input

- ① Three-address presentations (quadruples, triples, ...)
- ② Virtual machine presentations (bytecode, stack-machine, ...)
- ③ Linear presentation (postfix, ...)
- ④ Graphical presentation (syntax trees, DAGs, ...)



# Code Generator : Target Program

## ① Instruction set architecture (RISC, CISC)

**CISC** MANY, SPECIALIZED instructions, FEW registers, variety of addressing-modes (CISC requires micro code)

**RISC** FEW, SIMPLE, ORTHOGONAL instructions, MANY registers, simple addressing-modes (RISK drives hardware directly)

**Stack Machine** Instruction set is organized with NO REGISTERS, all operations done on stack (e.g. ADD instructions whence no arguments..)

- ➊ Producing absolute machine-language program
- ➋ Producing relocatable machine-language program
- ➌ Producing assembly language programs



# Code Generator : Target Program

- ① Instruction set architecture (RISC, CISC)

✓ **CISC** MANY, SPECIALIZED instructions, FEW registers, variety of addressing-modes (CISC requires micro code)

✓ **RISC** FEW, SIMPLE, ORTHOGONAL instructions, MANY registers, simple addressing-modes (RISK drives hardware directly)

**Stack Machine** Instruction set is organized with NO REGISTERS, all operations done on stack (e.g. ADD instructions whence no arguments..)

- ② Producing absolute machine-language program
- ③ Producing relocatable machine-language program
- ④ Producing assembly language programs

# Code Generator : Target Program

- ① Instruction set architecture (RISC, CISC)

**CISC** MANY, SPECIALIZED instructions, FEW registers, variety of addressing-modes (CISC requires micro code)

**RISC** FEW, SIMPLE, ORTHOGONAL instructions, MANY registers, simple addressing-modes (RISK drives hardware directly)

**Stack Machine** Instruction set is organized with NO REGISTERS, all operations done on stack (e.g. ADD instructions whence no arguments..)

- ② Producing absolute machine-language program
- ③ Producing relocatable machine-language program
- ④ Producing assembly language programs

# Code Generator : Target Program

- ① Instruction set architecture (RISC, CISC)

**CISC** MANY, SPECIALIZED instructions, FEW registers, variety of addressing-modes (CISC requires micro code)

**RISC** FEW, SIMPLE, ORTHOGONAL instructions, MANY registers, simple addressing-modes (RISK drives hardware directly)

**Stack Machine** Instruction set is organized with NO REGISTERS, all operations done on stack (e.g. ADD instructions whence no arguments..)

- ② Producing absolute machine-language program
- ③ Producing relocatable machine-language program
- ④ Producing assembly language programs

# Main Tasks of Code Generator

$t_1 = a + b$   
 $t_2 = t_1 - c$

**Instruction selection** choosing appropriate target-machine instructions to implement the IR statements

**Registers allocation and assignment** deciding what values to keep in which registers

**Instruction ordering** deciding in what order to schedule the execution of instructions



# Instruction Selection

The complexity of mapping IR program into code-sequence for target machine depends on:

- ① Level of IR (high-level or low-level)
- ② Nature of instruction set (data type support)
- ③ Desired quality of generated code (speed and size)



# Instruction Selection

The complexity of mapping IR program into code-sequence for target machine depends on:

- ① Level of IR (high-level or low-level)
- ② Nature of instruction set (data type support)
- ③ Desired quality of generated code (speed and size)



# Instruction Selection

The complexity of mapping IR program into code-sequence for target machine depends on:

- ① Level of IR (high-level or low-level)
- ② Nature of instruction set (data type support)
- ③ Desired quality of generated code (speed and size)



# Instruction Selection

- ➊ How closely does the IR match the target language?
- ➋ How good can the code be?
- ➌ What kind of target architecture are we considering?



# Instruction Selection

- ➊ How closely does the IR match the target language?
- ➋ How good can the code be?
- ➌ What kind of target architecture are we considering?



# Instruction Selection

- ➊ How closely does the IR match the target language?
- ➋ How good can the code be?
- ➌ What kind of target architecture are we considering?



# Register allocation and assignment

A register is the **FASTE<sup>N</sup> COMPUTATIONAL UNIT** on the target machine. When not enough registers to hold all values, the remaining values reside in memory

**Allocation** : Selecting the set of variables that will reside in registers at each point in the program

**Assignment** : Picking the specific register that a variable will reside in

- Hard! (NP complete).
- Operating system conventions.
- Hardware restrictions:
  - Special purpose registers (stack, index, frame, etc.)
  - Register pair (even/odd registers)

# Evaluation Order

- Selecting the order in which computations are performed
- Affects the efficiency of the target code
- Picking a best order is NP-complete
- Some orders require fewer registers than others



# Cost?

Everything is a trade-off when generating efficient code.

- Instruction code size.
- Instruction memory access time.
- Instruction execution time.



# Outline

1 Design Issues

2 Target Language

3 Basic Blocks and Flow Graphs



# Simple Target-Machine

- Load/store operations

$LD \ dst, \ addr$

$ST \ x, \ r$

8-bit Microprocessor

$a=1000$

$a[4]=$

$b=a[i]$

- Computation operations:

$OP \ dst, \ src1, \ src2$

- Jump operations:

$BR \ L$

- Conditional jumps:

$Bcond \ r, \ L$

- Byte addressable

- $n$  registers:  $R0, R1, \dots, Rn - 1$



# Simple Target-Machine

$*a(r) = \text{contents}(\text{contents}(a + \text{contents}(r)))$

## ~~Addressing modes:~~

- variable name
- $a(r)$  means:  $\text{contents}(a + \text{contents}(r))$
- $*a(r)$  means:  $\text{contents}(\text{contents}(a + \text{contents}(r)))$
- immediate:  $\#constant$  (e.g.  $LD\ R1,\ #100$ )



# Cost

- cost of an instruction  $= 1 +$  cost of operands
- cost of register operand  $= 0$
- cost involving memory and constants  $= 1$
- cost of a program  $=$  sum of instruction costs



# Example: $x = y - z$

| Code        | Instructions                                       | Notes Effects                                      |
|-------------|----------------------------------------------------|----------------------------------------------------|
| $x = y - z$ | LD R1, y<br>LD R2, z<br>SUB R1, R1, R2<br>ST x, R1 | $R1 = y$<br>$R2 = z$<br>$R1 = R1 - R2$<br>$x = R1$ |



Example:  $b = a[i]$

$a[4]$

| Code       | Instructions                                          | Notes Effects                                                                            |
|------------|-------------------------------------------------------|------------------------------------------------------------------------------------------|
| $b = a[i]$ | LD R1, i<br>MUL R1, R1, 8<br>LD R2, a(R1)<br>ST b, R2 | $R1 = i$<br>$R1 = R1 * 8$<br>$R2 = \text{contents}(a + \text{contents}(R1))$<br>$b = R2$ |



# Example: $x = *p$

| Code     | Instructions                                | Notes Effects                                                           |
|----------|---------------------------------------------|-------------------------------------------------------------------------|
| $x = *p$ | LD R1, p<br>LD R2, <u>0(R1)</u><br>ST x, R2 | $R1 = p$<br>$R2 = \text{contents}(0 + \text{contents}(R1))$<br>$x = R2$ |



# Outline

1 Design Issues

2 Target Language

3 Basic Blocks and Flow Graphs



# Idea

- ① **Basic Blocks** are maximal sequences of consecutive three-address instructions,
  - ① control flow can only enter through first instruction,
  - ② control flow can only leave from last instruction.



# Basic Blocks

- Graph presentation of intermediate code
- Nodes of the graph are called basic blocks
- Edges indicate which block follows which other block.
- The graph is useful for doing better job in:
  - Register allocation
  - Instruction selection



# Algorithm: Determine Basic Blocks

**Input:** A (numbered) sequence of three-address instructions.

**Output:** A mapping from the instruction (numbers) to basic blocks.

**Method:** Generate set of leaders.

- ➊ The **first** instruction in sequence is the **leader**.
- ➋ Any instruction that is the target of branch is a **leader**.
- ➌ Any instruction that follows a branch is a **leader**.

Now create a basic block per leader, and associate the leader with it. Associate all non-leaders with the basic block of the nearest proceeding leader.

# Example: $10 \times 10$ Identity Matrix

```
1  for i from 1 to 10 {  
2    for j from 1 to 10 {  
3      a[ i , j ] = 0.0;  
4    } }  
5    for i from 1 to 10 {  
6      a[ i , i ] = 1.0; }  
7  
8
```



- Numbers are doubles (of size 8 bytes)



# Three-address code:

```
1. i = 1
2. j = 1
3. t1 = 10 * i
4. t2 = t1 + j
5. t3 = 8 * t2 // each element a double
6. t4 = t3 - 88 // unused, allocated space, used
7. a[t4] = 0.0
8. j = j + 1
9. if j <= 10 goto 3
10. i = i + 1
11. if i <= 10 goto 2
12. i = 1
13. t5 = i - 1
14. t6 = 88 * t5 // strength reduction (reducing complexity)
15. a[t6] = 1.0
16. i = i + 1
17. if i <= 10 goto 13
```

1.  $i = 1$
2.  $j = 1$
3.  $t1 = 10 * i$
4.  $t2 = t1 + j$
5.  $t3 = 8 * t2$
6.  $t4 = t3 - 88$
7.  $a[t4] = 0.0$
8.  $j = j + 1$
9. if  $j \geq 10$  goto 3
10.  $i = i + 1$
11. if  $i \geq 10$  goto 2
12.  $i = 1$
13.  $t5 = i - 1$
14.  $t6 = 88 * t5$
15.  $a[t6] = 1.0$
16.  $i = i + 1$
17. if  $i \leq 10$  goto 13





# Flow graphs

- **Flow Graph** is the graph with basic blocks as nodes and branches as directed edges
- Picture of basic blocks with arrows for possible control flows.
- A loop is sequence of distinct basic blocks where
  - ① Entry to first block is loop entry is only label where control comes from basic blocks outside the sequence.
  - ② It is possible to reach the loop entry from any block in the sequence.



# DAG Representation of Basic Blocks

- Leaves for initial values of variables (we may not know the values so we use  $a_0$ ,  $b_0$ , ...)
- Node for each expression
- Node label is the expression operation
- Next to the node we put the variable(s) for which the node produced last definition
- Children of a node consist of nodes producing last definition of operands



# Finding Local Common Subexpressions

```

a = b + c
b = a - d
c = b + c
d = a - d
    
```



```

a = b + c
d = a - d
c = d + c
    
```

Construct the DAG for the basic block

```
d = b * c  
e = a + b  
b = b * c  
a = e - d
```



# Dead Code Elimination

From the basic block DAG:

- Remove any root node that has no live variables
- Repeat until no nodes can be removed



```
a = b + c;  
b = b - d  
c = c + d  
e = b + c
```

