

## UNIT-II

SDD (Syntactic Directed Definition)

SDD is a context free grammar together with attributes & rules.

Attributes are associated with grammar symbols & rules are associated with productions.

If  $x$  is a symbol &  $\alpha$  is one of its attribute then we write  $x.\alpha$

Two kinds of attributes for non terminals :-

- 1) Synthesized attribute
- 2) Inherited attribute.

Synthesized attribute :-

A synthesized attribute at node 'N' is defined only in terms of attribute values at the children of 'N' except 'N' itself.

Inherited attribute :-

An inherited attribute at node 'N' is defined only in terms of attribute values at parent, siblings & itself.

Annotated parse tree :-



Differences between S-attributed & L-attributed.

L-attributed.

S - Attributed

1. uses only synthesized attribute

2. semantic actions are placed at right end of a production

$\text{S} \rightarrow \text{A} \rightarrow \text{B} \text{C}$ ,  $\text{A} \rightarrow \text{B} \text{C} \text{D}$

(n) → indicates the end of the string

3. Attributes are evaluated during bottom up parsing.

4. S-attributed grammar is based on LR grammar.

is implemented using

19/10/23

$$3 \times 5 + 4n = 19n$$

| productions                                      | semantic rules                                                                                   |
|--------------------------------------------------|--------------------------------------------------------------------------------------------------|
| $\text{E} \rightarrow \text{E}n$                 | $\text{L}.\text{val} = \text{E}.\text{val}$                                                      |
| $\text{E} \rightarrow \text{e} + \text{T}$       | $\text{E}.\text{val} = \text{e}.\text{val} + \text{T}.\text{val}$                                |
| $\text{e} \rightarrow \text{T}$                  | $\text{E}.\text{val} = \text{T}.\text{val}$                                                      |
| $\text{T} \rightarrow \text{T}_1 * \text{F}$     | $\text{T}.\text{val} = \text{T}_1.\text{val} * \text{F}.\text{val}$                              |
| $\text{F} \rightarrow (\text{E})   \text{digit}$ | $\text{F}.\text{val} = (\text{E}.\text{val})$<br>$\text{F}.\text{val} = \text{digit}.\text{val}$ |

## YACC (Yet another Compiler-compiler)



- ① Declaration
- ② Translation rule
- ③ Auxiliary procedure

% { #include < stdio.h > } ; Ordinary declaration

### Translation rules :-

In the part of YACC specification, a set of productions are

$\langle \text{head} \rangle \rightarrow \langle \text{body}_1 \rangle \mid \langle \text{body}_2 \rangle \mid \dots \mid \langle \text{body}_n \rangle$ .

could be written in YACC as

$\langle \text{head} \rangle : \langle \text{body}_1 \rangle \{ \text{semantic action}_1 \}$

$\quad \quad \quad | \langle \text{body}_2 \rangle \{ \text{semantic action}_2 \}$ .

A YACC semantic action is a sequence of 'C' statement.

The symbol  $\$ \$$  refers to the attribute-value

associated with the non-terminal on the left hand side.

$\$_i$  refers to the value associated with the  $i$ th grammar symbol on the right hand side.

$$e \rightarrow e + t \mid t$$

$$t \rightarrow t * f \mid f$$

$f \rightarrow (e) \mid \text{digit}.$

100.

$$e : e + t \quad \{ \$ = \$1 + \$3 \}$$

11

$$t : t * f \quad \{ \$ = \$1 * \$3 \}$$

12

$$f : (e) \quad \{ \$ = (\$2) \}$$

1 digit

13

Auxiliary procedures:

The third section of YACC specification consists of supporting 'C' routines.

10 \$ include <ctype.h>

11

9 \$ token digit

100.

$$\text{expr} : \text{expr} + \text{term} \quad \{ \$ = \$1 + \$3 \}$$

1 term.

$$\text{term} : \text{term} * \text{factor} \quad \{ \$ = \$1 * \$3 \}$$

1 factor.

$$\text{factor} : (\text{expr}) \quad \{ \$ = (\$2) \}$$

100.

yylex()

```
{  
    int c;  
    c = getchar();  
    if (isdigit(c))  
    {  
        yylval = c - '0';  
        return DIGIT;  
    }  
    return c;  
}
```

~~YACC~~ → to compile

YACC quename.y

cc y.tab.c.

• la.out.

## Type Checking:-

Type checking is a methodology to check whether the source language follows both syntactic & semantic conventions of the source language.

Type checking ensures that the errors will be detected & reported.

ICG - Intermediate code generator



fig: Position of a type checker.

Type Checking can be off 2 types:-

- ① Static type checking (Compile time).
- ② Dynamic type checking. (Runtime).

During static & dynamic checking, the following errors can be detected.

- ① Type-check
- ② Flow of control checks.
- ③ Uniqueness checks.
- ④ Name related checks.

Type checking of expressions :-

$e \rightarrow \text{literal}$  &  $e.\text{type} = \text{char}$

$e \rightarrow \text{num}$  &  $e.\text{type} = \text{integer}$

$e \rightarrow \text{id.}$  &  $e.\text{type} = \text{lookup(id.entry)}$

$e \rightarrow e_1 \text{ mod } e_2$  &  $e.\text{type} = \text{if } e_1.\text{type} = \text{integer and } e_2.\text{type} = \text{integer then integer else type-error}$

$\underline{e_1 := a[10]}$        $\checkmark$        $s \cdot E \in \text{array}(10, \text{integer})$   
 $\underline{e \rightarrow e_1[e_2]}$ .  $\{$   $e \cdot \text{type} = \text{if } e_2 \cdot \text{type} = \text{integer} \text{ and}$   
 $e_1 \cdot \text{type} = \text{array}(S, t) \text{ then } t$   
 $\text{else type-error}$   $\}$

$\underline{e_1 := e \rightarrow e_1 \uparrow}$

$\{$   $e \cdot \text{type} = \text{if } e_1 \cdot \text{type} = \text{pointer}(t) \text{ then } t$   
 $\text{else type-error}$   $\}$

## Type checking of statements

$s \rightarrow id := e_1; s \cdot \text{type} = \text{if } id \cdot \text{type} = t \cdot \text{type} \text{ then}$   
 $\quad \quad \quad \text{void}$   
 $\quad \quad \quad \text{else type-error}$

$s \rightarrow \text{if } e \text{ then } s_1 \& s \cdot \text{type} = \text{if } e \cdot \text{type} = \text{true}$   
 $\quad \quad \quad \text{then } s_1 \cdot \text{type}$   
 $\quad \quad \quad \text{else type-error}$

$s \rightarrow \text{while } e \text{ do } s_1$   
 $\quad \quad \quad \& s \cdot \text{type} = \text{if } e \cdot \text{type} = \text{true} \text{ then } s_1 \cdot \text{type}$   
 $\quad \quad \quad \text{else type-error}$   
 $s \rightarrow s_1; s_2 \mid$   
 $\quad \quad \quad \{ s \cdot \text{type} = \text{if } s_1 \cdot \text{type} = \text{void} \text{ and } s_2 \cdot \text{type} = \text{void}$   
 $\quad \quad \quad \text{then void else}$   
 $\quad \quad \quad \text{else type-error} \}$

\*\*\*.

## Type checking of functions

$e \rightarrow e_1(e_2) \& e \cdot \text{type} = \text{if } e_2 \cdot \text{type} = t \text{ and}$   
 $e_1 \cdot \text{type} = s \rightarrow t \text{ then } t$   
 $\quad \quad \quad \text{else type-error}$

## Type Conversions:-

Type conversion is the method of converting a variable from its datatype to another datatype depending on the operation & other Operands.

$\epsilon \rightarrow \text{num}$

$\{\epsilon.\text{type} = \text{integer}\}$

$\epsilon \rightarrow \text{num.num}$

$\{\epsilon.\text{type} = \text{float}(\text{real})\}$

$\epsilon \rightarrow \text{id}$

$\{\epsilon.\text{type} = \text{lookup(id.entry)}\}$

$\epsilon \rightarrow E_1 \text{ OP } E_2$

$\epsilon.\text{type} =$  if  $E_1.\text{type} = \text{integer}$  and  
 $E_2.\text{type} = \text{integer}$  then +  
else  
type-error.  
else

~~$\epsilon.\text{type} =$~~  if  $E_1.\text{type} = \text{integer}$  and  $E_2.\text{type} = \text{float}$   
then +  
else  
type-error.

~~$\epsilon.\text{type} =$~~  if  $E_1.\text{type} = \text{real}$  and  $E_2.\text{type} = \text{integer}$   
then real

~~else~~

~~if  $E_1.\text{type} = \text{real}$  and  $E_2.\text{type} = \text{real}$~~

~~then real~~

~~else~~

~~type-error.~~

## Chomsky hierarchy of languages :-

=

(1) Recursively enumerable language (Type 0)

(2) Regular language (Type 3)

(3) Context-free language (Type 2)

(4) Context-sensitive language (Type 1)

(5) Recursively enumerable language (Type 0)



Chomsky hierarchy represents the classes of languages that are accepted by different machines. Therefore, every lang. of type 3 is also of type 2, 1, type 1 & type 0. Similarly, type 2 is also of type 1 & type 0. Similarly, type 1 is also of type 0.

Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0.

| Grammar                                   | Languages                  | Recognizing<br>Automata             | Production<br>rules                 | Example                                                   |
|-------------------------------------------|----------------------------|-------------------------------------|-------------------------------------|-----------------------------------------------------------|
| Type - 3<br>Regular grammar               | Regular                    | FSA                                 | $A \rightarrow a, A \rightarrow ab$ | $L = \{a^n   n \geq 0\}$                                  |
| Type - 2<br>CFG                           | Context-free               | PDA<br>(Pushdown automata)          | $A \rightarrow a$                   | $L = \{a^n b^n   n \geq 0\}$                              |
| Type - I<br>Context sensitive<br>grammar. | Context-sensitive          | LBA (Linear<br>Bounded<br>automata) | $d A B \rightarrow d \beta B$       | $d = \{a^n b^n   n \geq 0\}$                              |
| Type - 0.<br>unrestricted<br>grammar      | Recursively<br>enumerable. | TM<br>(Turing<br>Machine)           | $\varphi \rightarrow \alpha$        | $\varphi = \{w   w$<br>describes<br>a terminating<br>T.M. |

### \* Type 0 Grammar :-

Type 0 grammar is known as unrestricted grammar.

There is no restriction on the grammar rules of these type of languages. These languages can be efficiently modeled by turing machine.

### \* Type 1 Grammar :-

Type 1 grammar is known as context sensitive grammar. It follows the following rules.

1) The context sensitive grammar may have more than one symbol on the LHS of production rules.

2) The number of symbols on LHS must not exceed the no. of symbols RHS.

3)  $A \rightarrow \epsilon$  is not allowed unless A is a start symbol.

## Type 2 Grammar :-

Type 2 grammar is known as Context free grammar.

Context free languages are the languages which can be represented by the CFG. ( $G = \{V, T, P\}$ )

## Type 3 Grammar :-

Type 3 Grammar is known as regular grammar.

Regular lang's are those lang's which can be described using regular expressions. These lang's can be modelled by DFA or NFA.

Construct SLR, CLR, LALR for following grammar,

$$A \rightarrow id \mid B = D.$$

$$B \rightarrow id$$

$$D \rightarrow a \mid B.$$

$$A \rightarrow id$$

$$A \rightarrow B = D$$

$$B \rightarrow id$$

$$D \rightarrow a$$

$$D \rightarrow B$$

goto( $I_0, A$ )

$$A' \rightarrow A \cdot$$

goto( $I_0, id$ )

$$A \rightarrow id \cdot$$

goto( $I_0, B$ )

$$A \rightarrow B \cdot$$

$$A \rightarrow B = D \cdot$$

$$A' \rightarrow \cdot A \cdot$$

$$A \rightarrow \cdot id$$

$$A \rightarrow \cdot B = D$$

$$B \rightarrow \cdot id$$

$$D \rightarrow \cdot a$$

$$D \rightarrow \cdot B$$

goto( $I_0, E$ )

$$A \rightarrow B = \cdot D \cdot$$

$$D \rightarrow \cdot a$$

goto( $I_4, B$ )

$$A \rightarrow \cdot B = D \cdot$$

$$B \rightarrow \cdot id$$

goto( $I_4, D$ )

$I_0$

$I_4$

$I_5$

goto ( $I_4$ , A)

$\wedge D \rightarrow a \cdot y \cdot I_5 G_6$

goto ( $I_4$ , B)

$D \rightarrow B \cdot y \cdot I_4$

goto ( $I_4$ , id)

$B \rightarrow id \cdot y \cdot I_8$

- ①  $A \rightarrow id$   $I_2$ .
- ②  $A \rightarrow B = D$   $I_6$ .
- ③  $B \rightarrow id$   $I_2$
- ④  $D \rightarrow a$   $I_4$
- ⑤  $D \rightarrow B$   $I_7$ .

$$\text{FOLLOW}(A) = \{\$, y\} \quad \text{FOLLOW}(B) = \{=, \$\}$$

$$\text{FOLLOW}(D) = \{y\}$$

$$\text{FIRST}(B) = \{id\} \quad \text{FIRST}(D) = \{a, id\}$$

| status | id    | =     | a     | y | A | B  | D |
|--------|-------|-------|-------|---|---|----|---|
| $I_0$  | $S_2$ |       | $S_4$ |   | 1 | 3. |   |
| $I_1$  |       |       |       |   |   |    |   |
| $I_2$  |       |       |       |   | 8 | 13 |   |
| $I_3$  |       | $S_4$ |       |   |   |    |   |
| $I_4$  | $S_8$ |       | $S_6$ |   | 7 |    | 5 |
| $I_5$  |       |       | $S_4$ |   |   | 4  | 6 |
| $I_6$  |       |       |       |   | 7 |    |   |
| $I_7$  |       |       |       |   |   |    |   |
| $I_8$  |       |       |       |   | 3 |    |   |

14/11/23

## equivalence of type expressions :-

① Structural equivalence

② Name equivalence.

Structural Equivalence :-

Structural Equivalence of type expressions tells that two type expressions are structurally equivalent if & only if they are identical i.e. the two type expression should be of same basic type.

Name equivalence :-

Name equivalence views each type name as a distinct type. So, two type expressions are name equivalent if they are identical.

Overloading of functions by operation :-

An overloaded symbol has diff meaning depending on its context. Overloading is resolved when a unique meaning is determined for each occurrence of a name.

example:- + operator in java denotes either string concatenation (or) addition, depending on the type of its operands.

```
void e1() void e2(String s)
{ {
```

## Unit-IV

### Intermediate Code Generator :-



### Absract Syntax Tree :-

Abstract Syntax Tree is a hierarchical structure of the ip pgm. In a Syntax Tree the parent nodes are operators & leaf nodes are the symbol of the expression. We use the following func. to create the nodes of the Syntax tree :-

i) mknode(operator, left, right)

It creates an operator node with label op & two fields containing pointers to left & right.

ii) mkleaf(id, entry)

It creates an identifier node with label id & a field containing entry, a pointer to the symbol table entry for the identifier.

iii) mkleaf(num, value)

Creates a num node with label num & a field containing val, the value of a number.

① : a - 4 \* c

p1 = mkleaf(id, a)

p2 = mkleaf(num, 4)

p3 = mknode(-, p1, p2)

p4 = mkleaf(id, c)

p5 = mknode(+, p3, p4)



$p_1 = \text{mkleaf}(\text{id}, b)$   
 $p_2 = \text{mkleaf}(\text{id}, c)$   
 $p_3 = \text{mknode}(-, p_1, p_2)$

$p_4 = \text{mkleaf}(\text{id}, d)$   
 $p_5 = \text{mknode}(*, p_1, p_4)$   
 $p_6 = \text{mkleaf}(\text{id}, a)$   
 $p_7 = \text{mknode}(+, p_6, p_5)$   
 $p_8 = \text{mkleaf}(\text{id}, b)$

$p_9 = \text{mkleaf}(\text{id}, c)$

$p_{10} = \text{mknode}(-, p_8, p_9)$

$p_{11} = \text{mkleaf}(\text{id}, d)$

$p_{12} = (*, p_{10}, p_{11})$ .       $p_{13} = \overset{\text{mknode}}{(+, p_7, p_{12})}$

*Time*  
**DAG (Directed Acyclic Graph)**



$p_1 = \text{mkleaf}(\text{id}, b)$   
 $p_2 = \text{mkleaf}(\text{id}, c)$   
 $p_3 = \text{mknode}(-, p_1, p_2)$   
 $p_4 = \text{mkleaf}(\text{id}, a)$   
 $p_5 = \text{mknode}(*, p_1, p_3)$   
 $p_6 = \text{mknode}(+, p_4, p_5)$   
 $p_7 = \text{mkleaf}(\text{id}, d)$   
 $p_8 = \text{mknode}(*, p_3, p_7)$   
 $p_9 = \text{mknode}(+, p_6, p_8)$



\* Three - address code :-

Three - address code is a sequence of stmts of the general form  $x = y \text{ op } z$  where  $x, y, z$  are operands &  $\text{op}$  is any operator.

$$x = \underline{y} \underline{\text{op}} \underline{z}$$
$$x = \underline{y} \underline{\text{op}} \underline{y}$$

$\text{op} \rightarrow \text{operator}$ .

where  $x, y, z$  are operands &  $\text{op}$  is any operator.

Ex:-  $x = y * z$ .

Here  $t_1$  &  $t_2$  are compiler generated temporary names.

$$t_1 = y * z$$

$$t_2 = x + t_1$$

Ex:-  $a = b * -c + b * -c$ .

$$t_1 = -c$$

$$t_3 = -c$$

$$t_2 = b * t_1$$

$$t_4 = b * t_3$$

$$t_5 = t_2 + t_4$$

Accdg.

Syntax  
tree

DAG :-  $t_1 = -c$

$$t_2 = b * t_1$$

$$t_3 = t_2 + t_2$$

$$a = t_3$$

Implementation of 3 address code:-

Implement the 3 address code using 3 representations

- 1) Quadruples
- 2) Triples
- 3) Indirect triples

## Quadruple:-

A Quadruple is a record structure with 4 fields such as operator (op), arg<sub>1</sub>, arg<sub>2</sub>, result.

The op field contains an internal code for the operator.

The three address Stmt  $x = y \text{ op } z$ . is represented by y in arg<sub>1</sub>, z in arg<sub>2</sub> & x in result.

Triples :- op, arg<sub>1</sub>, arg<sub>2</sub>.

To avoid entering temporary name into symbol table refers to a temporary value by the position of the stmt.

Indirect triples :- op arg<sub>1</sub>, arg<sub>2</sub>, tmpno

Another representation of 3 address code that has been considered is that testing of pointers to triples rather than using the triples themselves. This implementation is known as indirect triples.

Ex:- Implement the 3 address code for following stmt  
 $x + y * z$ .

$$t1 = y * z$$

$$t2 = x + t1$$

① Quadruples :-

| line no # | OP | arg <sub>1</sub> | arg <sub>2</sub> | result |
|-----------|----|------------------|------------------|--------|
| 1         | *  | y                | z                | t1     |
| 2         | +  | x                | t1               | t2     |

### ② Triples

$$(1) t1 = y * 3$$

$$(2) t2 = x + t1$$

| line no # | op | arg1 | arg2 |
|-----------|----|------|------|
| 1         | *  | y    | 3    |
| 2         | +  | x    | (1)  |

### ③ Indirect triple

| line no # | tripeno | op | arg1 | arg2  |
|-----------|---------|----|------|-------|
| 1         | 101     | *  | x    | 3     |
| 2         | 102     | +  | y    | (101) |

anything

$$\underline{\text{ex}}: a = b * (-c) + b * (-c)$$

$$(1) t1 = -c$$

$$(2) t2 = b * t1 \quad (4) t4 = b * t3$$

$$(3) t3 = -c \quad (5) t5 = t2 + t4 \quad (6) a = t5$$

(a) Quadruple:

| line no # | op | arg1 | arg2 | result |
|-----------|----|------|------|--------|
| 1         | -  | c    | null | t1     |
| 2         | *  | b    | t1   | t2     |
| 3         | -  | c    | null | t3     |
| 4         | *  | b    | t3   | t4     |
| 5         | +  | t2   | t4   | t5     |
| 6         | =  | t5   | -    | a      |

Triple

| line no # | OP | arg1 | arg2 |
|-----------|----|------|------|
| 1         | -  | c    |      |
| 2         | *  | b    | (1)  |
| 3         | -  | c    |      |
| 4         | *  | b    | (3)  |
| 5         | +  | (2)  | (4)  |
| 6         | =  | (5)  |      |

(c) Indirect Triple

| line no # | Triple no | OP | arg1  | arg2  |
|-----------|-----------|----|-------|-------|
| 1         | 101       | -  | c     |       |
| 2         | 102       | *  | b     | (101) |
| 3         | 103       | -  | c     |       |
| 4         | 104       | *  | b     | (103) |
| 5         | 105       | +  | (102) | (104) |
| 6         | 106       | =  | (105) |       |

Ex:- Implement the following using DAG.

$$(x+y) * (y+z) + (x+y+z)$$

$$t1 = x+y$$

$$t2 = y+z$$

$$t3 = t1 * t2$$

$$t4 = t1 + z$$

$$t5 = t3 + t4$$

Quadruple

| line no | OP | arg1 | arg2 | result |
|---------|----|------|------|--------|
| 1       | +  | x    | y    | t1     |
| 2       | +  | y    | z    | t2     |
| 3       | *  | t1   | t2   | t3     |
| 4       | +  | t1   | z    | t4     |
| 5       | +  | t3   | t4   | t5     |

## Triple

| line no # | op | aug1 | aug2 |
|-----------|----|------|------|
| 1         | +  | x    | y    |
| 2         | +  | (y)  | z    |
| 3         | *  | (1)  | (2)  |
| 4         | +  | (1)  | (2)  |
| 5         | +  | (2)  | (4)  |

## Indirect triple

| line no # | Triple no | op | aug1  | aug2  |
|-----------|-----------|----|-------|-------|
| 1         | 101       | +  | x     | y     |
| 2         | 102       | +  | y     | z     |
| 3         | 103       | *  | (101) | (102) |
| 4         | 104       | +  | (101) | z     |
| 5         | 105       | +  | (103) | (104) |

$$(Q) -(x+y)* (y+z) + (x+y+z)$$

$$t_1 = x+y$$

$$t_5 = t_1 + z$$

$$t_2 = -t_1$$

$$t_6 = t_4 + t_5$$

$$t_3 = y+z$$

$$t_4 = (-t_2 * t_3)$$

| language construct          | ICF<br>Intermediate code form   | meaning                                                                                                     |
|-----------------------------|---------------------------------|-------------------------------------------------------------------------------------------------------------|
| Assignment statement        | $x = y \text{ op } z.$          | binary operation is performed.                                                                              |
| Assignment statement        | $x = \text{op } z.$             | Unary operation is performed.                                                                               |
| copy statement              | $x = y$                         | here the value of y is assigned to x.                                                                       |
| unconditional jump          | goto d.                         | the control flow goes to the stmt labelled by d.                                                            |
| Conditional jump            | if $\text{relational operator}$ | If, n relational operators is true then it executes the goto/stmt                                           |
| array statement             | $x = y[i]$                      | The value at $i^{\text{th}}$ index of array y is assigned to x.                                             |
| Address & Pointer statement | $x = \&y$<br>$x = *y$           | The value of x will be having the address/location of y.<br>They are pointers whose value is assigned to x. |

Construct the syntax tree, postfix notation & 3 addrs code for  $a = b + c + d$       left right root  $\leftarrow$  post

$$a = b + c + d$$



Write the 3 address code for the following stmt.

if  $A < B$  1 else 0.

- ① if  $A < B$  goto 2. else goto 3. optional
- ②  $T_1 = 1$
- ③  $T_2 = 0$

Write the 3 address code for the following expression.

① if  $A < B$  and  $C < D$  then  $t = 1$  else  $t = 0$ .

- ① if  $A < B$  goto(2).  $(5) + t_2 = 1$ .
- ② goto(4). goto(3).
- ③ if  $(C < D)$  goto(5)
- ④  $t_1 = 0$ .

② if  $A < B$  or  $C < D$  then  $t = 1$  else  $t = 0$ .

- ① if  $A < B$  goto(3) else goto(4)
- ② if  $C < D$  goto(3) else goto(4).
- ③  $t_1 = 1$
- ④  $t_2 = 0$

Convert the following code int pgm into 3 address code.

{ if ( $a > b$ ) then  
     $n++$ ;  
else  
     $m--$ ;  
     $c++$ ;  
} while ( $c < 5$ )

if ( $a < b$ ) then  
     $n++;$   
else  
     $n--;$

① if ( $a < b$ ) goto (2) else goto (3)

②  $m = n + 1;$

③  $m = n - 1.$

b) Generate the 3 address code for the following stmts.

switch (ch)

{ case 1:  $c = a + b;$   
    break;

case 2:  $c = a - b;$   
    break;

3 -

if  $ch = 1$  goto  $\alpha_1$ .

if  $ch = 2$  goto  $\alpha_2$ .

$\alpha_1 \quad T_1 = a + b$

$C = T_1$

goto last.

$\alpha_2 \quad T_2 = a - b$

and then  $C_2 = T_2$  and then goto last.

goto last.

last.

\* \* \* Q) Convert the following program into 3 address code.

```

Prod = 0;
i = 1;
do
    Prod = Prod + a[i] * b[i];
    i = i + 1;
} while (i <= 10);

```

- (1) Prod = 0 } copy stmts.
- (2) i = 1
- (3)  $T_1 = 4 * i \rightarrow$  int contains 4 bytes
- (4)  $T_2 = a[T_1]$
- (5)  $T_3 = a * i$
- (6)  $T_4 = b[T_3]$
- (7)  $T_5 = T_2 * T_4$
- (8)  $T_6 = Prod + T_5$
- (9) Prod =  $T_6$
- (10)  $T_7 = i + 1$
- (11)  $i = T_7$
- (12) If ( $i < 10$ ) goto(3)

### Translation of simple & control flow statements :-

21/11/23

The translation of boolean expressions into 3 address code in the context of stmts such as

$$S \rightarrow \text{if}(B) S_1$$

$$S \rightarrow \text{if}(B) S_1 \text{ else } S_2$$

$$S \rightarrow \text{while}(B) S_1$$

$S \rightarrow if(B) S_1$



$S \rightarrow if(B) S_1 \text{ else } S_2$



$S \rightarrow \text{while}(B) S_1$



With a boolean expression B, we associate two labels

- i) B . true :- The label to which control flows if B is true.
- ii) B . false :- The label to which control flows if B is false.

S . next denoting a label for the inst immediately after code for S.

## Statement

$S \rightarrow \text{if}(B) S_1$

## Semantic rule.

$B \cdot \text{true} = \text{new label}()$

$B \cdot \text{false} = S \cdot \text{next} = S_1 \cdot \text{next}$

$S \cdot \text{code} = B \cdot \text{code} \parallel \text{label}(B \cdot \text{true})$   
 $\parallel S_1 \cdot \text{code} \parallel$

$S \rightarrow \text{if}(B) S_1 \text{ else } S_2$

$B \cdot \text{true} = \text{new label}()$

$B \cdot \text{false} = \text{new label}()$

$S_1 \cdot \text{next} = S_2 \cdot \text{next} = S \cdot \text{next}$

$S \cdot \text{code} = B \cdot \text{code} \parallel \text{label}(B \cdot \text{true})$

$S_1 \cdot \text{code} \parallel \text{gen('goto' } S \cdot \text{next})$

$\parallel \text{label}(B \cdot \text{false}) \parallel S_2 \cdot \text{code}.$

$S \rightarrow \text{while}(B) S_1$

$\text{begin} = \text{new label}()$

$B \cdot \text{true} = \text{new label}()$

$B \cdot \text{false} = S \cdot \text{next}$

$S_1 \cdot \text{next} = \text{begin}$

$S \cdot \text{code} = \text{label(begin)} \parallel (B \cdot \text{code}) \parallel$

$\text{label}(B \cdot \text{true}) \parallel (S_1 \cdot \text{code}) \parallel$

$\text{gen('goto' begin)}$ .

$S \rightarrow S_1 S_2$



$S \cdot \text{code} = S_1 \cdot \text{code} \parallel S_2 \cdot \text{code}.$

$S \cdot \text{next} = \text{new label}()$

$P \cdot \text{code}_2 = S \cdot \text{code} \parallel \text{label}(S \cdot \text{next})$

$S \rightarrow \text{assign}$

$S \cdot \text{code} = \text{assign code}.$

## Storage organization :-



## Activation record :-



An activation record is a contiguous block of storage that manages info required by a single execution of procedures.

### Temporaries :-

Stores temporary e.g. intermediate values of an expression.

### local Data :-

Stores local data of the called procedure.

### Machine status :-

Before the procedure is called it stores the machine status in registers.

### Access link :-

Stores the info of the data which is outside the local scope.

### Control link :-

Stores the address of the activation record of the caller procedure.

Actual parameters :- Stores actual parameters i.e. parameters which are used to send i/p to the called procedure.

Returned value :- Stores return value.

24/11/23

### Storage allocation strategies :-

There are mainly three types of storage allocation strategies -

- 1) static allocation
- 2) heap "
- 3) stack "

① Static Allocation :- lays out or assigns the storage for all the data objects at compile time.

C & C++ use static allocation. The memory will be allocated in a static address location.

- Adv:-
- 1) It is easy to understand.
  - 2) The memory is allocated & once only at compile time & remains the same through out the program completion.

Ds Adv :-

- 1) Not highly scalable.
- 2) Static storage allocation is not very efficient.

2) Heap Allocation

Heap Allocation is used where the stack allocation lacks if we want to retain the values of the local variable after the activation record ends. LIFO scheme does not work for the allocation & deallocation of the activation record.

C, C++, Python, Java.

Adv:-

- 1. Heap allocation is the most flexible allocation scheme.
- 2. Heap allocation is useful where we have data whose size is not fixed & can change during the runtime.

Ds Adv

- 1. Heap allocation is slower as compared to stack allocation.
- 2. There is a chance of memory leaks.

③ Stack Allocation

Stack is commonly known as dynamic allocation.

Dynamic allocation means the allocation of memory at runtime.

Stack is a DS that follows the LIFO principle so whenever there are multiple activation records created it will be pushed or popped in the stack as activations begin & ends.

### Access to non-local names

1) static    [Accesslink  
Display]

2) Dynamic    [Shallow  
Deep]

### Program :-

main

```

{ int a=0;
  int b=0;

  { int b=1;
    { int a=2;
      pf("y-d1.d\n",a,b);
    }
    pf("y-d2.d\n",a,b);
    { int b=3;
      pf("y-d3.d\n",a,b);
      pf("y-d4.d\n",a,b);
    }
    pf("y-d5.d\n",a,b);
  }
}

```

| Declarations | Scope                          |
|--------------|--------------------------------|
| int a=0,     | B <sub>0</sub> -B <sub>2</sub> |
| int b=0;     | B <sub>0</sub> -B <sub>1</sub> |
| int b=1;     | B <sub>1</sub> -B <sub>3</sub> |
| int a=2,     | B <sub>2</sub>                 |
| int b=3;     | B <sub>3</sub>                 |

O/P:

21,  
013  
011  
010.



Access link for finding non-local data:-



(a)



(a)



## Parameter passing techniques

```
void swap(int x, int y)
{ }
```

parameter passing refers to the exchange of information between methods, functions & procedures.

parameters passing are of two types

- Actual parameters
- Formal parameters

### Actual parameters:

The variables specified in the function call are called actual parameters.

### Formal parameters

The variables declared in the called func is called formal parameters.

```
void swap(int x, int y)
```

```
{ int t;
```

```
  t=x;
```

```
  x=y;
```

```
  y=t;
```

```
}
```

```
main()
```

```
{ int a=1, b=2;
```

```
  swap(a,b);
```

```
  printf("a=%d, b=%d", a, b)
```

```
}
```

## Terminologies in parameter passing :-

i-value :- The i-value of the expression refers to the location in memory. The i-values are placed on the LHS of an assignment operator.

r-value :- The value of an expression is its r-value. They are basically placed on the RHS of the assignment operator.

ex:- (A-B)

i-value      r-value

## Methods for parameter passing :-

Call by value

Call by reference

Call by copy-restore

Call by name.

### Call by value :-

Caller evaluates the actual parameters. It also passes r-values of actual parameters to formal parameters.

### Call by reference :-

Caller evaluates actual parameters. It passes i-values to formal parameters (call by address / call by location).  
ex:- swap(&a, &b).

### Call by copy-restore :-

It is the hybrid between call by value & call by reference.

Caller evaluates actual parameters & passes r-values to formal parameters.  
i-values of actual parameters are determined before the call direction.

## Call by name

here procedure body is called instead of the procedure names.

Actual parameters are literally substituted for the formal parameters.

\* \* Symbol table & its implementation :-

<name, information>

allocate

① list

free

lookup

insert

set-attribute

get-attribute

② Linked List



A compiler need to collect & use information about the names appearing in the source pgm. This, info is entered into a data structure called symbol table.

Each entry in the symbol table is a pair of the form <name, information>.

ex:-  $p = i + \tau * 60$  ;

< $p$ , id1>

< $=$ , assignment operator>

< $i$ , id2>

< $*$ , multiplication operator>

< $+$ , addition operator>

< $\tau$ , id3>

< $60$ , constant>

## Operations of a symbol table:-

allocate :- To allocate a new empty symbol table.

free :- To remove all entries and free storage of symbol table.

lookup :- to search for a name & returns pointer to its entry.

insert :- to insert a name in a symbol table & returns to its entry.

get-attribute :- to associate an attribute with a given entry

get-attribute :- to get an attribute associated with a given entry.

## Implementation of symbol Table :-

1) List :- In this method an array is used to store names & associated information.



2) linked list :- here a link field is added to each record.

Searching of names is done in order specified by the link of the link field.

FIRST gives the position of the first record on the linked list.



3) BST :- A more efficient approach to symbol table organization is to add two link fields to each record. These fields to link the records into a binary search tree.

### a) Hash table :-



The Hash Table consists of  $K$  words numbered from  $0, 1, \dots, K-1$ . These words are pointers into the Storage Table. To determine whether name is in the symbol table we apply a hash func(h), such that  $h(\text{name})$  is an integer btw 0 to  $K-1$ .

### Code Optimization techniques:-

|   |    |    |
|---|----|----|
| 7 | 12 | 23 |
|---|----|----|

#### ① Local optimization techniques:-

##### i) Common sub expression elimination :-

Common sub expression elimination is a compiler optimization that searches for instances of identical expressions & analyzes whether it is used by replacing them with a single variable holding the compute value.

$$a = b * c + g$$

$$d = b * c * e$$

$$\text{temp} = b * c$$

$$a = \text{temp} + g$$

$$d = \text{temp} * e$$



i) Copy propagation :-

Assignment of the form  $v = v$  is called copy statement  
(Same example as above).

i) Deadcode elimination :-

A variable is alive at a point in a program if its value can be used subsequently. Otherwise it is dead at the point.

iv) Reduction in Strength :-

The transformation of replacing an expensive operation such as multiplication by cheaper one as addition is known as strength reduction.

$\text{temp} = 5;$

```
for i=1 to 10 do
```

$$x = i * 5$$

end.

$\Rightarrow$

```
for i=1 to 10 do
```

$$x = \text{temp}$$

$$\text{temp} = \text{temp} + 5$$

end.

## v) Code motion

### (i) loop optimization :-

#### i) loop invariant method (or) code motion :-

In this optimization technique the computation inside the loop is avoided & thereby the computation overhead on compiler is avoided. This ultimately optimizes code generation.

for  $i=0$  to 10 do begin

$$k = i + a/b;$$

sop(k);

.....

end;

$$t = a/b.$$

for  $i=0$  to 10 do begin

$$k = i + t$$

.....

end.

while( $i \leq \text{limit}-2$ )  $\Rightarrow t = \text{limit}-2$

while( $P_2 = t$ )

#### ii) loop fusion :-

In loop fusion method several loops are merged into one loop.

ex:- for  $i=1$  to  $n$  do

  for  $j=1$  to  $m$  do

$$a[i,j] = 10$$

$\Rightarrow$  for  $i=1$  to  $n+m$  do

$$a[i] = 10.$$

#### iii) loop unrolling :-

In this method the no. of jumps & tests can be reduced by writing the code  $k$  times.

ex: int i=1;  
 while (i <= 100)  
 {  
 a[i] = b[i];  
 i++;  
 }

int i=1  
 while (i <= 100)  
 { a[i] = b[i];  
 i++;  
 }

### Basic Block :-

basic block contains a sequence of stmts. The flow of control enters at the beginning of the stmt & leave at the end without any branch.

The following sequence of three address stmts form a basic block:-

$t_1 = x * x$   
 $t_2 = x + y$   
 $t_3 = z * t_2$   
 $t_4 = t_1 + t_3$   
 $t_5 = y * y$   
 $t_6 = t_4 + t_5$

### Basic Block Construction:-

#### Algorithm:-

partition into basic blocks.

Input:- It contains the sequence of 3 address stmts

Output:- It contains a list of one basic block with each three address stmts in exactly one block.

Method :- First identify the leaders in a code.  
The rules for finding leaders are as follows:-

- i) The first stmt is a leader.
- ii) Statement 'L' is a leader if there is an unconditional or unconditional goto stmts.
- iii) Instruction 'L' is a leader if it immediately follows goto or conditional goto stmts.

for each leader, its basic block consists of the leaders & all entries in 3 address code. It doesn't include the next leaders or end of the program.

begin

prod=0;

i=1;

do. begin

prod = prod + a[i] \* b[i];

i = i + 1;

end

while i <= 10

end do

8/12/23

1) prod=0

2) i=1

3)  $t_1 = A \cdot i^0$

4)  $t_2 = a[t_1]$

5)  $t_3 = A \cdot i^1$

6)  $t_4 = b[t_3]$

7)  $t_5 = t_2 * t_4$

8)  $t_6 = \text{prod} + t_5$

9)  $\text{prod} = t_6$

10)  $t_7 = i + 1$

11)  $i = t_7$

12) if  $i <= 10$  goto 3

prod = 0

i = 1

$T_1 = A \cdot i$

$T_2 = a[T_1]$

$T_3 = A \cdot i$

$T_4 = b[T_3]$

$T_5 = T_2 * T_4$

$T_6 = \text{prod} * T_5$

prod =  $T_6$

$T_7 = i + 1$

$i = T_7$

if ( $i <= 10$ ) goto 3.

Flow Graph :- flow graph is a directed graph. It contains the flow of control info for the set of basic blocks.

Once an intermediate code program is partitioned into basic blocks we represent the flow of control between them by a flow graph. The nodes of the flow graph are the basic blocks.

If there is an edge from block B to block C, it means only it is possible for the instructions in block C to immediately follow the last instruction in block B.

Therefore 'B' is the predecessor of 'C' & 'C' is a successor of 'B'.



DAG representation for Basic Block :-

A DAG for basic block is a directed acyclic graph with the following labels on nodes.

i) The leaves of the graph are labelled by a unique identifier & that identifier can be variable names or constants.

ii) Internal nodes of the graph is labelled by an operator symbol.

3) DAG provides a good way to determine the common sub expression.

Algorithm for construction of DAG :-

Input :- It contains a basic block.

Output :- It contains the following information

- Each node contains a label. for leafs the label is an identifier.
- Each node contains a list of attached identifiers to hold the computed values.

Case (i)  $x = y \text{ op } z$

case (ii)  $x = \text{op } y$

case (iii)  $x := y$

Method :-

Step 1 :- If y operand is undefined then create node(y).  
If z operand is undefined then for case (i).

Create node(z)

Step 2 :- for case (i) Create node(op) whose right child is node(z) & left child is node(y).

for case (ii) check whether there is node(op) with one child node(y).

for case (iii) node(n) will be node(y)

Construct DAG for the following expressions :-

$$T_0 = a + b$$

$$T_1 = T_0 + c$$

$$d = T_0 + T_1$$



(Q)  $T_1 = a + b$        $T_4 = T_1 - T_3$   
 $T_2 = a - b$        $T_5 = T_4 + T_3$   
 $T_3 = T_1 * T_2$



Write in steps.

construct DAG for the following 3 address statements :-

(1)  $S_1 = A * i$   
(2)  $S_2 = A[S_1]$ .  
(3)  $S_3 = A * i$

(4)  $S_4 = B[S_3]$   
(5)  $S_5 = S_2 + S_4$   
(6)  $S_6 = prod + S_5$

(7)  $prod = S_6$   
(8)  $S_7 = i + 1$   
(9)  $i = S_7$ .  
(10)  $\text{if } i \leq 20 \text{ goto } 1$



Step 7 :-  $\text{prod} = 56$



Step 8:-



$$\text{Step 9: } \rho = s_7$$



Step 10:



## Code Generation Algorithm

It generates target code for a sequence of three address statements.

for each operator in a stmt - there is a corresponding target lang. operator.

### Register & Address Descriptors

#### Register Descriptors



It keeps track of what is currently in each register.

Initially all the registers are empty.

#### Address Descriptors

It keeps track of the location where the current value of the name can be found.

The location may be a register or memory address.

### Algorithms :-

for a 3 address instruction  $x = y + z$  do the following:

- ① Use getreg( $x = y + z$ ) to select registers for  $x, y, z$ . call them as  $R_x, R_y, R_z$ .
- ② If  $y$  is not in  $R_y$  then issue an instruction  $LD R_y y'$  (load  $R_y, y'$ ), where  $y'$  is one of the memory location for  $y$ .
- ③ Similarly if  $z$  is not in  $R_z$  issue an instruction  $LD R_z z'$ , where  $z'$  is a memory location for  $z$ .
- ④ Issue the instruction at  $R_x, R_y, R_z$

$$t = a - b$$

$$u = a - c$$

$$v = t + u$$

$$a = d$$

$$d = v + u$$

here  $t, u, v$  are the temporary variables.  $a, b, c, d$  are the normal variables.

Step 1:- Initially all registers are empty.

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
|                |                |                |

|   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|
| a | b | c | d | t | u | v |
| a | b | c | d | t | u | v |

Step 2:-  $t = a - b$

LD R<sub>1</sub>, a

LD R<sub>2</sub>, b

SUB R<sub>2</sub> R<sub>1</sub>, R<sub>2</sub>

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> | a | b | c | d | t | u | v |
|----------------|----------------|----------------|---|---|---|---|---|---|---|
| a              | t              |                | a | b | c | d | t | u | v |

Step 3:-  $u = a - c$ .

LD R<sub>3</sub>, c.

SUB R<sub>1</sub> R<sub>1</sub>, R<sub>3</sub>

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> | a | b | c | d | t | u | v |
|----------------|----------------|----------------|---|---|---|---|---|---|---|
| u              | t              | c              | a | b | c | d | t | u | v |

Step 4:-  $v = t + u$

ADD R<sub>3</sub> R<sub>2</sub>, R<sub>1</sub>

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> | a | b | c | d | t | u | v |
|----------------|----------------|----------------|---|---|---|---|---|---|---|
| u              | t              | v              | a | b | c | d | t | u | v |

Step 5:-  $a = d$ .

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
| u              | a              | d              |

|   |                |   |   |   |                |                |
|---|----------------|---|---|---|----------------|----------------|
| a | b              | c | d | t | u              | v              |
| a | R <sub>2</sub> | b | c | d | R <sub>1</sub> | R <sub>3</sub> |

Step 6:-  $d = v + u$

Add R<sub>1</sub> R<sub>3</sub>, R<sub>1</sub>

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
| t              | d              | a              |

|   |                |   |   |   |                |                |
|---|----------------|---|---|---|----------------|----------------|
| a | b              | c | d | t | u              | v              |
| a | R <sub>2</sub> | b | c | d | R <sub>1</sub> | R <sub>3</sub> |

ST a, R<sub>2</sub>

ST d, R<sub>1</sub>

of perform code generation algorithm for the given expression :-

$$d = (a-b) + (a-c) + (a-c)$$

first convert into three address code

$$t_1 = a - b$$

$$t_2 = a - c$$

$$t_3 = t_1 + t_2$$

$$d = t_3 + t_2$$

Step 1:- Initially all registers are empty

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
|                |                |                |

| a | b | c | d | t <sub>1</sub> | t <sub>2</sub> | t <sub>3</sub> |
|---|---|---|---|----------------|----------------|----------------|
| a | b | c | d |                |                |                |

Step 2:-  $t_1 = a - b$ .

LD R<sub>1</sub>, a

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
| a              | t <sub>1</sub> |                |

| a | b | c | d | t <sub>1</sub> | t <sub>2</sub> | t <sub>3</sub> |
|---|---|---|---|----------------|----------------|----------------|
| a | b | c | d | R <sub>2</sub> |                |                |

LD R<sub>2</sub>, b

SUB R<sub>2</sub>, R<sub>1</sub>, R<sub>2</sub>

Step 3:-  $t_2 = a - c$ .

LD R<sub>3</sub>, c

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
| t <sub>2</sub> | t <sub>1</sub> | c              |

| a | b | c | d | t <sub>1</sub> | t <sub>2</sub> | t <sub>3</sub> |
|---|---|---|---|----------------|----------------|----------------|
| a | b | c |   | R <sub>2</sub> | R <sub>1</sub> |                |

SUB R<sub>1</sub>, R<sub>1</sub>, R<sub>3</sub>

Step 4:-

$$t_3 = t_1 + t_2$$

ADD R<sub>3</sub>, t<sub>1</sub>, t<sub>2</sub>

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
| t <sub>2</sub> | t <sub>1</sub> | t <sub>3</sub> |

| a | b | c | d | t <sub>1</sub> | t <sub>2</sub> | t <sub>3</sub> |
|---|---|---|---|----------------|----------------|----------------|
| a | b | c | d | R <sub>3</sub> | R <sub>1</sub> |                |

ADD R<sub>3</sub>, R<sub>2</sub>, R<sub>1</sub>

Step 5:-

$$d = t_3 + t_2$$

~~ADD R<sub>2</sub>, R<sub>3</sub>, R<sub>1</sub>~~

| R <sub>1</sub> | R <sub>2</sub> | R <sub>3</sub> |
|----------------|----------------|----------------|
| t <sub>2</sub> | t <sub>3</sub> |                |

| a | b | c | d              | t <sub>1</sub> | t <sub>2</sub> | t <sub>3</sub> |
|---|---|---|----------------|----------------|----------------|----------------|
| a | b | c | R <sub>2</sub> | R <sub>1</sub> |                |                |

ADD R<sub>2</sub>, R<sub>3</sub>, R<sub>1</sub>

ST. d, R<sub>2</sub>

## Peephole Optimization Techniques :-

It replaces short sequence of target instructions by a shorter or faster sequence.

### ① Redundant instruction elimination :-

(a) eliminating redundant load & store

ex- LD R0, a, f  
ST a, R0, f

(b) eliminating unreachable code

ex- if debug == 1 goto L1

goto L2 x (unreachable code)

L1 - print debugging info.

L2 .

### ② flow of control optimization :-

Here, unnecessary jumps can be avoided.

goto L1.

goto L2.

L1: goto L2

L1: goto L2

L2: a = b

L2: a = b

### ③ Algebraic express/simplifications

$$n = n + 0 \quad n = n * 1$$

### ④ Use of machine idioms

i = i + 1 → Inc i

i = i - 1 → Dec i.

## Register Allocation & Assignment :-

- (1) Global register allocation
- (2) Usage Count
- (3) Registers assignment for outer loops
- (4) Register allocation by graph coloring



$\sum \text{use}(x_1 B) + 2 * \text{live}(x_1 B)$   
blocks B in L

loop = L    register = x

$\text{use}(x_1 B)$  is no. of times  $x$  used & not preceded by an assignment to  $x$  in some block.  
 $\text{live}(x_1 B)$  is 1 if  $x$  is live on exit, otherwise it is zero.

|   | B1                                 | B2                                   | B3                                 | B4                                 | Total |
|---|------------------------------------|--------------------------------------|------------------------------------|------------------------------------|-------|
| a | $0 + 2 * 1 = 2$                    | $1 + 2 * 0 = 1$                      | $1 + 2 * 0 = 1$                    | $0 + 2 * 0 = 0$                    | 4     |
| b | $2 + 2 * 0 = 2$                    | $0 + 2 * 0 = 0$                      | $0 + 2 * 1 = 2$                    | $0 + 2 * 1 = 2$                    | 6     |
| c | $1 + 2 * 1 = 2$<br>$1 + 2 * 0 = 1$ | $10 + 2 * 1 = 12$<br>$0 + 2 * 0 = 0$ | $1 + 2 * 1 = 3$<br>$1 + 2 * 0 = 1$ | $1 + 2 * 0 = 3$<br>$1 + 2 * 0 = 1$ | 11    |
| d | $1 + 2 * 1 = 3$<br>$1 + 2 * 1 = 3$ | $1 + 2 * 1 = 3$<br>$1 + 2 * 0 = 1$   | $1 + 2 * 1 = 3$<br>$1 + 2 * 0 = 1$ | $1 + 2 * 1 = 3$<br>$1 + 2 * 0 = 1$ | 12    |
| e | $0 + 2 * 1 = 2$<br>$0 + 2 * 1 = 2$ | $0 + 2 * 1 = 2$<br>$0 + 2 * 0 = 0$   | $0 + 2 * 1 = 2$<br>$0 + 2 * 1 = 2$ | $0 + 2 * 1 = 2$<br>$0 + 2 * 0 = 0$ | 8     |
| f | $1 + 2 * 1 = 3$<br>$1 + 2 * 0 = 1$ | $0 + 2 * 1 = 2$<br>$1 + 2 * 0 = 1$   | $1 + 2 * 1 = 3$<br>$1 + 2 * 0 = 1$ | $0 + 2 * 1 = 2$<br>$1 + 2 * 0 = 1$ | 10    |

$\text{live}(n_1, B) + \text{live}(n_2, B)$

$$a = b + c$$

$a \rightarrow$  defined variables

$d, c, f, e, b, r_a$   
 $R_1, R_2, R_3, R_4, R_5, R_6$

$b, c \rightarrow$  used variables ..

def chains - definition used chains

| $n_1$ | $R_2$ | $R_3$ | $R_4$ | $R_5$ | $R_6$ |
|-------|-------|-------|-------|-------|-------|
| d     | c     | f     | e     | b     | a     |

| $a - 4$ | $b - 6$ | $c - 3$ | $d - 6$ | $e - 4$ | $f - 4$ |
|---------|---------|---------|---------|---------|---------|
| $R_1$   | $R_2$   | $R_3$   |         |         |         |

Global register Allocation :- To save some of these stores & corresponding loads, we might arrange to assign registers to frequently used variables & keep this registers consistent across block boundaries.

### ③ Register Assignment for outer loop :-

If an outer loop  $L_1$  contains an inner loop  $L_2$  - the names allocated registers in  $L_2$  need not be allocated in  $L_1-L_2$ . If 'x' is allocated to register in  $L_2$  then load & on entrance of  $L_2$  & store it on exit from  $L_2$ .

### ④ Registers allocation by graph coloring :-

When a register is needed for computation but all registers are in use - the contents of one register must be stored into memory location.  
Here 2 passes are used :-

- 1) Target machine instructions are selected.
- 2) A register interference graph is constructed,

nodes are symbolic registers & it connects to nodes.

A graph is said to be colored if each node has been assigned a color in such a way that no two adjacent nodes have the same color.





## Introduction to assembly language instructions

Address in the target code:

There are  $n$ -general purpose registers  $R_0, R_1, \dots, R_{n-1}$ .

The two address instruction of the form  $op\ source\ dest$  where  $op$  is any operator code,  $source$  &  $dest$  are data fields.

MOV - Moves from  $src$  to  $dest$ .

ADD - Add  $source$  to  $dest$ .

SUB - Subtracts  $src$  from  $dest$ .

Addressing mode

Absolute

Register

Indirect

Indirect-registers

Indirect Indirect

Literal

form

Address

Added cost.

$m$

$m$

$R$

$R$

$C(R)$

(+ content( $R$ ))

1

$\#R$

Content( $R$ )

0

$\#C(R)$

Content

1

$\#C$

( $C + \text{content}(R)$ )

$C$

1

Instruction cost = 1 + cost of the src & dest Addressing mode.

Consider the following code sequence :-

- 1) MOV B1R0.                    2) MOV B1A  
ADD C1R0.                    ADD C1A.  
MOV R0A.
- 3) MOV A1R0                    SUB R1R0  
ADD B1R0                    MOV R0A.  
MOV C1R1                    ADD C1R1  
SUB E1R1.

Calculate the cost of the above instructions :-

(1)  $\text{MOV B1R0} = 1+1+0=2$   
 $\text{ADD C1R0} = 1+1+0=2$   
 $\text{MOV R0A} = 1+0+1=\frac{2}{6}$ .

(2)  $\text{MOV B1A} = 1+1+1=3$   
 $\text{ADD C1A} = 1+1+1=\frac{3}{6}$ .

(3)  $\text{MOV A1R0} = 1+1+0=2$                     (4)  $\text{MOV R0,R1} = \frac{1+0+0}{1}=1$   
 $\text{ADD B1R0} = 1+1+0=2$                     (5)  $\text{MOV R0,M} = 1+0+1=2$   
 $\text{MOV C1R1} = 1+1+0=2$                     ADD H1, R0  
 $\text{ADD C1R1} = 1+1+0=2$                      $= 1+1+0=2$   
 $\text{SUB E1R1} = 1+1+0=2$                     SUB 2(R0), 48(R1)  
 $\text{SUB R1R0} = 1+0+0=1$                      $1+2(1)+1$   
 $\text{MOV R0,A} = 1+0+1=2$

## live Variable Analysis

→ The variable  $OUT[B] = \cup IN[S]$

$S$  is a successor of  $B$

$$IN[B] = USE[B] \cup (OUT[B] - DEF[B])$$

$$IN[B] = \emptyset, \text{ for all } B \text{ (initialization only)}$$



Step 1:

$$OUT[B_4] = IN[B_2] = \emptyset$$

$$IN[B_4] = USE[B_4] \cup [\emptyset - \emptyset]$$

$$IN[B_4] = \{a, j\}$$

Step 2:  $OUT[B_3] = IN[B_4] = \{a, j\}$

$$IN[B_3] = USE[B_3] \cup (OUT[B_3] - DEF[B_3])$$

$$IN[B_3] = \{v_2\} \cup (\{a, j\} - \{a\})$$

$$IN[B_3] = \{v_2, j\}$$

Step 3 :-

$$\begin{aligned}
 \text{OUT}[B_2] &= \text{IN}[B_3] \cup \text{IN}[B_4] \\
 &= \{v_2, j\} \cup \{a, j\} = \{v_2, a, j\} \\
 \text{IN}[B_2] &= \text{use}[B_2] \cup [\text{OUT}[B_2] - \text{DEF}[B_2]] \\
 &= (i, j) \cup \{v_2, a, j\} - \emptyset \\
 \text{IN}[B_2]. &= \{i, j, a, v_2\}
 \end{aligned}$$

Step 4 :-

$$\begin{aligned}
 \text{OUT}[B_1]. &= \text{IN}[B_2] = \{i, j, a, v_2\} \\
 \text{IN}[B_1] &= \text{use}[B_1] \cup (\text{OUT}[B_1] - \text{DEF}[B_1]) \\
 &= \{m, n, v_1\} \cup (\{i, j, a, v_2\} - \{i, j, a\}) \\
 &= \{m, n, v_1\} \cup \{v_2\} \\
 &= \{m, n, v_1, v_2\}.
 \end{aligned}$$

Step 5 :-

$$\begin{aligned}
 \text{OUT}[B_4] &= \text{IN}[B_2] = \{a, i, j, v_2\} \\
 \text{IN}[B_4] &= \text{use}[B_4] \cup [\text{OUT}[B_4] - \text{DEF}[B_4]] \\
 &= \{a, j\} \cup (\{a, i, j, v_2\} - \{i\}) \\
 &= \{a, i, v_2\}.
 \end{aligned}$$

Step 6 :-

$$\begin{aligned}
 \text{OUT}[B_3] &= \text{IN}[B_4] = \{a, j, v_2\} \\
 \text{IN}[B_3] &= \text{use}[B_3] \cup (\text{OUT}[B_3] - \text{DEF}[B_3]) \\
 &= \{v_2\} \cup (\{a, j, v_2\} - \{a\}) \\
 &= \{j, v_2\}.
 \end{aligned}$$

Step 7 :-

$$\begin{aligned}
 \text{OUT}[B_2] &= \text{IN}[B_3] \cup \text{IN}[B_4] \\
 &= \{j, v_2\} \cup \{a, j, v_2\} \\
 &= \{a, j, v_2\}.
 \end{aligned}$$

$$\begin{aligned}
 \text{IN}[B_2] &= \text{use}[B_2] \cup (\text{OUT}[B_2] - \text{DEF}[B_2]) \\
 &= (i, j) \cup (\{a, j, v_2\} - \{\emptyset\}) \\
 &= \{i, j, a, v_2\}
 \end{aligned}$$

## Data Flow Analysis

Basic Terminologies :-

Definition point - a point in a program containing some definition.

Reference point - a point in a program containing a reference to a data item

Evaluation point - a point in a program containing evaluation of expression

Dataflow Analysis is a technique to analyze how data flows through a program.



Data flow Properties :-

- 1) Available expression
- 2) Reaching Definitions
- 3) Live variable Analysis
- 4) Busy expression

Available expression :- An expression is said to be available at program point  $x$ , if along path it reaches to  $x$ . An expression is available at its evaluation point.



Reaching Definition :- A definition D reaches a point X if there is path from D to X in which D is not killed, i.e. not defined.



D<sub>1</sub> is reaching definition for B<sub>2</sub> but not for B<sub>3</sub> since it is killed by D<sub>2</sub>.

Live Variable :- A var is said to be live at some point p if from p to end the variable is used before it is redefined else it becomes dead.



a is live at block B<sub>1</sub>, B<sub>2</sub>, B<sub>4</sub> but killed at B<sub>5</sub>

Busy Expression :- An expression is busy along a path if its evaluation exists along that path & none of its operand definition exists before its evaluation along the path.



## Dataflow Equations :-

part 2.

Input :- kill( $B_1$ ) and gen( $B_1$ ) for every basic block  $B_1$   
Output :- in( $B$ ) and out( $B$ ) for every basic block  $B$ .

For each  $B$  repeat

out( $B$ ) := gen( $B$ )

some changes to any out( $B$ ) occurs repeat -

in( $B$ ) :=  $\bigcup_{B' \in pred(B)}$  out( $B'$ )

out( $B$ ) := gen( $B$ )  $\cup$  (in( $B$ ) - kill( $B$ ))

where

out( $S$ ) is the information at the end of the statements in  $S$ .

gen( $S$ ) is the info generated by the block.

in( $S$ ) is the info at the beginning of the block.

kill( $S$ ) is the info killed / removed by the block.

(S)

entry

$a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$



$d : a := b + c$

gen( $S$ ) =  $\{d\}$

kill( $S$ ) =  $\{d\}$

in( $S$ ) =  $\{d\}$   
 out( $S$ ) =  $\{d\}$

gen( $B_1$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

kill( $B_1$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

in( $B_1$ ) =  $\{d\}$   
 out( $B_1$ ) =  $\{d\}$

gen( $B_2$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

kill( $B_2$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

in( $B_2$ ) =  $\{d\}$   
 out( $B_2$ ) =  $\{d\}$

Dataflow equations for an assignment :-



$a_1 := m-1$   
 $a_2 := n$   
 $d := 1$

$a_1 := m$   
 $a_2 := n$   
 $d := a + b$

$a_1 := m$   
 $a_2 := n$   
 $d := a + b + 1$

gen( $B_1$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

kill( $B_1$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

in( $B_1$ ) =  $\{d\}$   
 out( $B_1$ ) =  $\{d\}$

gen( $B_2$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

kill( $B_2$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

in( $B_2$ ) =  $\{d\}$   
 out( $B_2$ ) =  $\{d\}$

gen( $B_3$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

kill( $B_3$ ) =  $\{d\}$   
 $a_1 : i = m-1$   
 $a_2 : j = n$   
 $d : a := 1$

in( $B_3$ ) =  $\{d\}$   
 out( $B_3$ ) =  $\{d\}$

Data-flow equations for a sequence of statements :-

$$\begin{aligned} \text{gen}(s) &= \text{gen}(s_1) \\ \text{kill}(s) &= \text{kill}(s_1) \\ \text{in}(s_1) &= \text{in}(s) \cup \text{gen}(s_1) \\ \text{out}(s) &= \text{out}(s_1). \end{aligned}$$



$$\begin{aligned} \text{gen}(s) &= \text{gen}(s_2) \cup (\text{gen}(s_1) - \text{kill}(s_2)) \\ \text{kill}(s) &= \text{kill}(s_2) \cup (\text{kill}(s_1) - \text{gen}(s_2)) \\ \text{in}(s_1) &= \text{in}(s) \\ \text{in}(s_2) &= \text{out}(s_1) \\ \text{out}(s) &= \text{out}(s_2) \end{aligned}$$

Data flow equations for an alternative :-



Relocatable code :-

$$\begin{aligned} \text{gen}(s) &= \text{gen}(s_1) \cup \text{gen}(s_2) \\ \text{kill}(s) &= \text{kill}(s_1) \cap \text{kill}(s_2) \\ \text{in}(s_1) &= \text{in}(s) \\ \text{in}(s_2) &= \text{in}(s) \\ \text{out}(s) &= \text{out}(s_2) \cup \text{out}(s_1). \end{aligned}$$

Data flow equations for a while loop :-



### Object code forms (Op or Target generator)

- ① Absolute code
- ② Relocatable code
- ③ Assembly code

Absolute code :-

Absolute code is a machine code that contains references to actual addresses within program address space. The generated code can be placed directly in memory at execution starts immediately. Generally small programs can be compiled at execution quickly.

Relocatable code :-

Producing a relocatable machine. Large programs often allow sub programs to be compiled separately. A set of relocatable object modules can be linked together at link time with the help of the linking loader.

Assembly code :-

Producing an assembly lang as op makes the process of code generation. But generating an assembly code as op makes code generation process slower.

## Algorithm - copy propagation

Input :- A flow graph containing used definition i.e and chains reaching to block B.

Output :- A graph after applying copy propagation transformation.

Method :- For each simt  $n=y$  perform following

Steps :-

Step 1 :- determine the simts in which  $n$  is used. Thus Stmt should be reachable from def of  $n$ .

Step 2 :- There should not be any def of  $n$  only occur prior to use( $n$ ) in the block containing Stmt s.

Step 3 :- If 's' satisfies the cond mentioned in Step 2 then replace all uses of  $n$  by  $s$ .

Bounding Step 1 by y.

$$\begin{array}{l} n = t_3 \\ a[t_1] = t_2 \\ a[t_3] = a \end{array}$$

copy simt

use

$$\begin{array}{l} y = n + 3 \\ a[t_5] = y \end{array}$$

$$\begin{array}{l} n = t_3 \\ a[t_1] = t_2 \\ a[t_3] = t_3 \end{array}$$

↓

$$\begin{array}{l} a[t_4] = t_2 \\ a[t_5] = t_3 \end{array}$$

↓

$$\begin{array}{l} y = t_3 + 3 \\ a[t_5] = y \end{array}$$