

Code Generation:

- \* The final phase in our compiler model is the Code Generator.
- \* It takes as input the Intermediate Representation (IR) produced by the front end of the compiler, along with relevant symbol table information, and produces as output a semantically equivalent target program.



\* The target program must preserve the semantic meaning of the source program and be of high quality i.e., it must make effective use of the available resources of the Target machine.

\* Moreover, (it must make effective) the Code Generator itself must run efficiently.

\* Compilers that need to produce efficient target programs, include an optimization phase prior to code generation.

The optimizer maps the IR into IR from which more efficient code can be generated.

\* In General, the code optimization and code generation phases of a compiler, often referred to as the back end.

## Issues in the Design of a Code Generator:

The issues in the design of code generators are,

- Input to the code generator.
- Target programs.
- Instruction selection
- Register Allocation
- Evaluation order.

Input to the code generator: The input to the code generator consists of the Intermediate Representation of the source program, together with information in the symbol table that is used to determine the runtime addresses of the data objects denoted by the names.

In the IR.

\* we assume that paid to code generator, the I<sub>RP</sub> has been validated by the front end, i.e., typechecking, syntax, semantics etc.

Target program: The output of the code generation is the target program. The target program may take a variety of forms i.e.; absolute machine language, relocatable machine language (e.g.) assembly language.

(3)

Absolute machine language: output → that it can be placed in a fixed location in memory and immediately executed.

Relocatable machine code: output allows subprograms to be compiled separately. Although we must pay the added expense of linking and loading if we produce relocatable object modules.

Producing an assembly code as output makes the process of code generation easier as we can generate symbolic instructions.

Instruction selection: the nature of the instruction set of the target machine determine the difficulty of instruction selection. For each type of three address statement, we can design a code that outlines the target code to be generated for that construct. For example, every three address code statement of the form  $x := y + z$ , where  $x, y$  and  $z$  are statically allocated, can be translated into the code sequence.

LOAD Y, R<sub>0</sub> /\* load y into Register R<sub>0 \*/ R<sub>0</sub> = y</sub>

ADD Z, R<sub>0</sub> /\* add z to R<sub>0 \*/ R<sub>0</sub> = R<sub>0</sub> + z</sub>

MOV R<sub>0</sub>, X /\* Store R<sub>0</sub> into X \*/ X = R<sub>0</sub>

The quality of the generated code is determined by its speed and size. i.e,

For example the sequence of three address stmt

(4)

$$a = b + c$$

$$d = a + e$$

would be translated into

|                                  |                                 |
|----------------------------------|---------------------------------|
| 1, LD R0,b // $R_0 \leftarrow b$ | 2, a = a + 1                    |
| ADD R0,R0,c // $R_0 = R_0 + c$   | LD R0,a // $R_0 \leftarrow a$   |
| ST a,R0 // $a = R_0$             | ADD R0,R0,#1 // $R_0 = R_0 + 1$ |
| LD R0,a // $R_0 \leftarrow a$    | ST a,R0 // $a = R_0$            |
| ADD R0,R0,e // $R_0 = R_0 + e$   |                                 |
| ST d,R0 // $d = R_0$             |                                 |

Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if 'a' is not subsequently used.

Instruction used i.e., which instruction should be used in case there are multiple instructions that do the same job.

Register Allocation:— A key problem in code generation is deciding which values to hold in what registers.

\* Registers are the fastest computational unit on the target machine. but we usually do not have enough of them to hold all values.

\* Values not held in registers need to reside in memory.

\* Instructions involving Register operands are invariably shorter and faster than those involving operands in memory, so efficient utilisation of Registers is particularly important.

→ the use of Registers is often subdivided into two sub-problems:

1. Register allocation: during which we select the set of variables that will reside in registers at each point in the program.

2. Register assignment: during which we pick the specific register that a variable will reside in.

\* store long life time values that are often used in Registers.

### Even odd Register pairs:

certain machine require register pairs (even odd register) for some operands and result.

Ex! Some machines, integer multiplication and integer division involve register pairs. The multiplication instruction is of the form

$$M \times_1 Y$$

where  $X$ , the multiplicand, is the even register of an even/odd register pair and  $Y$ , the multiplier, is the odd register.

The product occupies the entire even/odd register pair. (6)

The division instruction is of the form.

D X/Y

where the dividend occupies an even/odd register pair whose even register is X; the divisor is Y. After division, the even register holds the remainder and the odd register the quotient.

Evaluation order!: the order in which computations are performed can affect the efficiency of the target code.

The order in which the instructions will be executed, thus increases performance of the code.

Example:

$a+b - (c+d)*e$

↓

$t_1 := a+b$       MOV a, R<sub>0</sub>  
 $t_2 := c+d$       ADD b, R<sub>0</sub>  
 $t_3 := c*t_2$       MOV R<sub>0</sub>, t<sub>2</sub>  
 $t_4 := t_1 + t_3$     MOV c, R<sub>1</sub>  
                      ADD d, R<sub>1</sub>  
                      MOV e, R<sub>0</sub>  
                      MUL R<sub>1</sub>, R<sub>0</sub>  
                      MOV t<sub>1</sub>, R<sub>1</sub>  
                      SUB R<sub>0</sub>, R<sub>1</sub>  
                      MOV R<sub>1</sub>, t<sub>4</sub>

Re-ordered instructions and code.

$t_2 := c+d$       MOV c, R<sub>0</sub>  
 $t_3 := ext_2$       ADD d, R<sub>0</sub>  
 $t_1 := a+b$       MOV e, R<sub>1</sub>  
 $t_4 := t_1 - t_3$     MUL R<sub>0</sub>, R<sub>1</sub>  
                      MOV a, R<sub>0</sub>  
                      ADD b, R<sub>0</sub>  
                      SUB R<sub>1</sub>, R<sub>0</sub>  
                      MOV R<sub>0</sub>, t<sub>4</sub>

when instructions are independently, their evaluation order can be changed to utilize and save instruction cost.

## Peephole optimization:-

- \* while most production compilers produce good code through careful instruction selection and Register allocation;
- \* A simple but effective technique for locally improving the target code is "peephole optimization", which is done by examining a sliding window of target instructions and replacing instruction sequences within the peephole by a shorter and faster sequence.
- \* peephole optimization can also be applied directly after intermediate code generation to improve the intermediate representation.
- \* the peephole is a small, sliding window on a program. the code in most peepholes need not be contiguous, although some implementations do require this.

Characteristics of peephole optimization are:

- \* Redundant-instruction elimination
- \* flow-of-control optimizations
- \* Algebraic simplifications
- \* use of machine idioms.

Redundant-instruction Elimination: If the target code contains the instruction sequence

Mov R, a

Mov a, R

We can Delete the second instruction if it is an unlabeled instruction, because first instruction ensure that the value

$a$  is already in Register R. If it is labeled - there is no guarantee that instruction ① will always be executed before instruction ②.

LD a, R0

ST R0, a

In a target program, we can delete the store instruction because whenever it is executed, the first instruction will ensure that the value of  $a$  has already been loaded into Register R0.

### Eliminating Unreachable code:

- \* Another opportunity for peephole optimization is the Removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed.
- \* This operation can be repeated to eliminate a sequence of instructions.

For example:

Sum = 0

if (sum)

printf ("%d", sum);

In above source code if (sum) statement, it never get executed, so, it is say that unreachable instruction and we can eliminate it.

Ex:-2

if a < b goto L1

goto L2

a = b

L1: a = b + 2

L2: b = a + 2

Flow of control optimisations: If we have jumps to jumps, then ③

→ these unnecessary jumps can be eliminated in either intermediate code (or) Target code.

Ex-1

If we have the jump sequence as shown below,

Goto L<sub>1</sub>

.....

L<sub>1</sub>: Goto L<sub>2</sub>

Then this can be replaced by,

Goto L<sub>2</sub>

.....

L<sub>1</sub>: Goto L<sub>2</sub>

if there are now no jumps to L<sub>1</sub>, then it may be possible to eliminate the statement L<sub>1</sub>: goto L<sub>2</sub> provided by an unconditional jump.

Similarly, the sequence if a < b goto L<sub>1</sub> can be transformed as

if a < b goto L<sub>1</sub>

L<sub>1</sub>: goto L<sub>2</sub>

can be replaced by,

if a < b goto L<sub>2</sub>

L<sub>1</sub>: goto L<sub>2</sub>.

Finally, suppose there is only one jump to L<sub>1</sub> and L<sub>1</sub> is preceded by an unconditional goto. Then the sequence,

Goto L<sub>1</sub>

L<sub>1</sub>: if a < b goto L<sub>2</sub>

L<sub>3</sub>:

may be replaced by the sequence

if a < b goto L<sub>2</sub>

Ex: if x > y goto L<sub>4</sub>

L<sub>1</sub>: goto L<sub>2</sub>

L<sub>2</sub>: goto L<sub>3</sub>

L<sub>3</sub>: goto L<sub>4</sub>

L<sub>4</sub>: goto L<sub>5</sub>

L<sub>5</sub>: goto L<sub>6</sub>

L<sub>6</sub>: goto L<sub>7</sub>

L<sub>7</sub>: goto L<sub>8</sub>

L<sub>8</sub>: goto L<sub>9</sub>

L<sub>9</sub>: goto L<sub>10</sub>

L<sub>10</sub>: goto L<sub>11</sub>

L<sub>11</sub>: goto L<sub>12</sub>

L<sub>12</sub>: goto L<sub>13</sub>

L<sub>13</sub>: goto L<sub>14</sub>

L<sub>14</sub>: goto L<sub>15</sub>

L<sub>15</sub>: goto L<sub>16</sub>

L<sub>16</sub>: goto L<sub>17</sub>

L<sub>17</sub>: goto L<sub>18</sub>

L<sub>18</sub>: goto L<sub>19</sub>

L<sub>19</sub>: goto L<sub>20</sub>

L<sub>20</sub>: goto L<sub>21</sub>

L<sub>21</sub>: goto L<sub>22</sub>

L<sub>22</sub>: goto L<sub>23</sub>

L<sub>23</sub>: goto L<sub>24</sub>

L<sub>24</sub>: goto L<sub>25</sub>

L<sub>25</sub>: goto L<sub>26</sub>

L<sub>26</sub>: goto L<sub>27</sub>

⇒ if x > y goto L<sub>10</sub>

• sufficient exploit will be

## Algebraic Simplification and Reduction in Strength

(4)

\* The algebraic identities that could be used to simplify DAG's. These algebraic identities can also be used by a peephole optimizer to eliminate three address statement such as.

$$x = x + 0$$

(8)

Such pattern  $x = x + 1$  in the peephole.

Similarly, Reduction-in-strength transformations can be applied on the peephole to replace expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example,  $x^2$  is invariably cheaper to implement as  $x*x$  than as a call to an exponentiation routine.

## use of machine idioms:-

The Target machine may have hardware instructions to implement certain specific operations efficiently. Detecting situations that permit the use of these instructions can reduce execution time significantly. For example

Some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value. The use of the modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing.

These modes can also be used for implementation of statements like

$$x := x + 1$$

inc i

$$i = i - 1$$

Dec i

## Target Machine:-

Our target computer models a three-address machine with load and store operations, computation operations, jump operations and conditional jumps.

- \* The underlying computer is a byte-addressable machine with  $n$  general-purpose Registers,  $R_0, R_1, \dots, R_{n-1}$ .

It has two-address instruction of the form:

opcode source, destination.

In which opcode is an operation, and source and destination address fields.

opcode-operations, it is typically

(shown with MOV - move content of source to destination)

ADD (add content of source to destination)

SUB (subtract content of source from destination)

In this, the result will be in the destination field.

- \* The source and destination of instruction are specified by combining Registers and memory locations with address modes.

the address modes together with their Assembly language forms and associated costs. ⑨

- \* Addressing modes involving Registers have zero additional cost, indeed, essentially each instruction having cost is 1 unit.
- \* we shall assume each target-language instruction has an associated cost, for simplicity, we take the cost of an instruction to be one plus the cost associated with the addressing mode of the operands. This cost corresponds to the length in words of the instruction.

Cost of Instruction:

$$1 + \text{cost(Source mode)} + \text{cost(Destination mode)}$$

Target Machine - Addressing modes:

| Mode     | F&m   | Address         | Added cost |
|----------|-------|-----------------|------------|
| Absolute | M     | M               | 1          |
| Register | R     | R               | 0          |
| Indexed  | (C,R) | C + contents(R) | 1          |

|                    |                |                           |   |     |
|--------------------|----------------|---------------------------|---|-----|
| Indirect Registers | $\times R$     | contents (R)              | 0 | (3) |
| Indirect Indexed   | $\times CCR$ ) | contents (C+contents (R)) | 1 |     |
| Literal            | # C            | source to be a constant.  | 1 |     |

It is Immediate Address mode.

Example:

| S.N | Instruction   | operation                                            | cost |
|-----|---------------|------------------------------------------------------|------|
| 1   | MOV R0, M     | store content of Register (R0) into memory location. | 2    |
| 2   | MOV R0, R1    | store content of Register (R0) into Register (R1)    | 2    |
| 3   | MOV M, R0     | store content of (M) into Register R0                | 2    |
| 4   | MOV #1(R0), M | store contents (#1+contents (R0)) into M             | 3    |
| 5   | MOV #1, R0    | loads the constant 1 into Register R0                | 2    |

Some of the difficulties in generating code for this machine can be seen by considering what code to generate for a three-address statement of the form  $a := b + c$ . This statement can be implemented by many different instruction sequences.

1 address mode cost

(4)

Instruction cost

|            |                        |     |   |
|------------|------------------------|-----|---|
| 1)         | Mov b, R <sub>0</sub>  | 1+1 | 2 |
|            | ADD C, R <sub>0</sub>  | 1+1 | 2 |
|            | Mov R <sub>0</sub> , a | 1+1 | 2 |
| total cost |                        |     | 6 |
| 2).        | Mov b, a               | 1+2 | 3 |
|            | ADD c, a               | 1+2 | 3 |
| total cost |                        |     | 6 |

Assuming R<sub>0</sub>, R<sub>1</sub> and R<sub>2</sub> contain the address of a, b and c respectively.

|            |                                       |     |   |
|------------|---------------------------------------|-----|---|
| 3)         | Mov *R <sub>1</sub> , *R <sub>0</sub> | 1+0 | 1 |
|            | ADD *R <sub>2</sub> , *R <sub>0</sub> | 1+0 | 1 |
| total cost |                                       |     | 2 |

|            |                                     |     |   |
|------------|-------------------------------------|-----|---|
| 4)         | ADD R <sub>2</sub> , R <sub>1</sub> | 1+0 | 1 |
|            | Mov R <sub>1</sub> , a              | 1+1 | 2 |
| total cost |                                     |     | 3 |

\*\* In order to generate good code for this machine, we must utilize its addressing capabilities efficiently.

### Example:

1, the three-address statement  $x = y - z$  can be implemented by the machine instructions.

LD R<sub>1</sub>, Y      // R<sub>1</sub> = Y

LD R<sub>2</sub>, Z      // R<sub>2</sub> = Z

SUB R<sub>1</sub>, R<sub>1</sub>, R<sub>2</sub>      // R<sub>1</sub> = R<sub>1</sub> - R<sub>2</sub>

ST X, R<sub>1</sub>      // X = R<sub>1</sub>

2, suppose 'a' is an array whose elements are 8-byte values, perhaps real numbers. Also assume elements of 'a' are indexed starting at '0'. we may execute the three address code  $b = a[i]$  by the machine instruction.

LD R<sub>1</sub>, i      // R<sub>1</sub> = i

MUL R<sub>1</sub>, R<sub>1</sub>, 8      // R<sub>1</sub> = R<sub>1</sub> \* 8

LD R<sub>2</sub>, a(R<sub>1</sub>)      // R<sub>2</sub> = contents(a + contents(R<sub>1</sub>))

ST b, R<sub>2</sub>      // b = R<sub>2</sub>.

3, To implement a simple pointer indirection, such as the three-address code  $x = *p$ , we can use machine instruction

LD R<sub>1</sub>, P      // R<sub>1</sub> = P

LD R<sub>2</sub>, O(R<sub>1</sub>)      // R<sub>2</sub> = contents(O(contents(R<sub>1</sub>)))

ST X, R<sub>2</sub>      // X = R<sub>2</sub>

The Assignment through a pointer  $x = p$  is similarly implemented in machine code by. (6)

LD R<sub>1</sub>, P      // R<sub>1</sub> = p.

LD R<sub>2</sub>, Y      // R<sub>2</sub> = y

ST O(R<sub>1</sub>), R<sub>2</sub>      // contents(Of contents(R<sub>1</sub>)) = R<sub>2</sub>

4) Consider a conditional-Jump Three-address instruction like

If  $x < y$  goto L

The machine-code equivalent would be something like

LD R<sub>1</sub>, X      // R<sub>1</sub> = x

LD R<sub>2</sub>, Y      // R<sub>2</sub> = y

SUB R<sub>1</sub>, R<sub>1</sub>, R<sub>2</sub>      // R<sub>1</sub> = R<sub>1</sub> - R<sub>2</sub>

BLTZ R<sub>1</sub>, M      // if R<sub>1</sub> < 0 Jump to M

Here, M is the label that represents the first machine instruction generated from the 3-address instruction that label L.

Note: In BLTZ, it is a conditional Jump instruction.

Cond x, L, where x is a register, L is a label.

### Exercises:

(1)

Generate code for the following 3-address statements assuming all variables are stored in memory locations.

- a)  $x = 1$    b)  $x = a$    c)  $x = a + 1$    d)  $x = a + b$    e)  $x = b * c$

$y = a + x$  } Two Statements

|    |                                                     |                                                                                                                        |                                              |                                           |
|----|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|-------------------------------------------|
| 2) | $y = x_2$<br>$z = z + 4$<br>$*p = y$<br>$p = p + 4$ | 3, 'a' and 'b' are arrays whose elements are 4-byte values.<br>1, $x = a[i]$<br>$y = b[j]$<br>$a[i] = y$<br>$b[j] = x$ | 2, $x = a[i]$<br>$y = b[i]$<br>$z = x * y$ . | 3, $x = a[p]$<br>$y = b[x]$<br>$a[i] = y$ |
|----|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|-------------------------------------------|

Determine the costs of the following instruction sequences.

a) LD R0, Y

c2 LD R0, C

LD R1, Z

LD R1, I

ADD R0, R0, R1

MUL R1, R1, 8

ST X1, R0

ST a(R1), R0

b) LD R0, I

d) LD R0, P

MUL R0, R0, 8

LD R1, D(R0)

LD R1, a(R0)

ST X1, R1

ST b, R1

e) LD R0, P

f) LD R0, X

LD R1, X

LD R1, Y

ST O(R0), R1

SUB R0, R0, R1

BLTZ \*R3, R0.