

## UNIT-V CONTROL FLOW & DATA FLOW ANALYSIS

flow graph - It is a directed graph in which the flow control info is added to the basic block.

→ The nodes to the flow graph are represented by basic block.

→ The block whose leader is the first stmt is called 'initial stmt. block'

3. Add code:

ex:  $i=0$   
 $sum=0$   
while ( $i \leq 10$ )  
 $\{ sum = sum + i;$   
 $\}$   
3) initial stmt. through phd analysis  
exit dominant pairing of dominos & short

1)  $i=0$

2)  $sum=0$

3) if ( $i \leq 10$ ) goto 4

4)  $t_1 = sum + i$

5)  $sum = t_1$



Loop: It is a collection of nodes in the flow graph.

→ 1) All such nodes are strongly connected that means  
there is path from node to any other node within  
that loop

2) The collection of nodes has unique entry that means  
there is only one path. One path from a node  
outside the loop to the node inside the loop.  
3) The loop that contains no other loops is called  
inner loop

Ex:



Loops in a flow Graph:

### 1) Dominators



- node 1 is the initial node & it dominates every node as it is a initial node.
- Node 2 dominates 3, 4, 5 as there exists only one path from initial node to node 2 which is going through the

$$1 \rightarrow 2 \rightarrow 3 \rightarrow 5$$

$$1 \rightarrow 2 \rightarrow 4 \rightarrow 5.$$

### 2) Natural nodes loops

→ Loop is a flow graph can be denoted by  $n \rightarrow d$  such that

$d$  dominates  $n$  these edges are called back edges

→ If there is  $p \rightarrow q \rightarrow$  where  $p$  is the tail &  $q$  is head.  
head dominates tail.



### 3) Inner loop

→ It is a loop that contains no other loop.

ex:



The inner loop is  $4 \rightarrow 2$

The edges are given by

$$2 \rightarrow 3 \quad 3 \rightarrow 4$$



### 4) Reducible flow graph

- It is a flow graph contains 2 edges. One is forward edge & the other is backward edge.
- forward edges form an acyclic graph
- Backward edges are such edges whose heads dominates tail.

ex:



- The above flow graph is reducible. We can reduce the graph by removing the backward edges  $3 \rightarrow 2$  edge. Similarly removing the backward edge  $5 \rightarrow 1$ .

### ⑤ Non-reducible flow graph:

- It is a flow graph in which there are no backwards edges.
- forward edges may produce cycle in the graph.



### Global optimization:

- It is a program in which each node is represented in the form of flow graph.
- flow graph is a graphical representation in which each node represents the basic block & edges represents the flow of control from one block to another.
- There are two types 1) Control flow analysis  
2) Data flow analysis.

#### 1) Control flow analysis:

ex. if ( $i \leq 10$ )

{

  sum = B[0];

  j = 0;

  L1: if ( $A[i] < B[i]$ )

    {

$J_1 = j$ ;

      if ( $B[i] >= 0$ )

        {

          sum = sum + B[j];

        }

$j = j + 1$ ;

→ It determines the info about presence of loops, nesting of loops; nodes visited by execution of specific node.

→ The analysis is made on the flow of control by examining the program flow graph.

if ( $J < N$ ) goto L2

}

$i = i + 1$

if ( $i < N$ ) goto 4 L1

}

printf (" sum = %d\n", sum);

}



2.ex: L :  $a = k + 2$

$$c = d - b$$

$$d = a + b$$

if ( $d > i$ ) goto E

$$f = b - d$$

$$k = d - 2$$

$$b = a + f$$

if ( $d < i$ ) goto B

$$d = b * 2$$

$$g = b * d$$

$$B: i = i + 1$$

$$b = d - 1$$

goto L

$$E: K = a - e$$

$$f = e + K$$

$$c = d + b$$

~~DATA FLOW ANALYSIS~~

Properties: → analysis is made on the data flow.  
ie it determines the info regarding the definition & use of the data.

### ⇒ DATA FLOW ANALYSIS

→ The data-flow analysis is a process of computing values of data-flow properties

1. A program point containing the definition is called definition point

2. A program point at which reference to a data item is called reference point.

3. A program point at which some evaluating expression is given is made called evaluation point

Ex:  $w_1 : x = 3$  → definition point.



$w_2 : y = x$  → Reference point

$w_3 : z = a * b$  → Evaluation point

## PROPERTIES

- Available  
 1) Evaluatable exp  
 2) Reaching definition.  
 3) Live variables  
 4) Busy exp

### ① Available Exp



1. Exp is available at its evaluation time.
2. The exp ~~should~~ can not be modified before their use.

That exp should be appearing in all blocks after the modifications are allowed. The exp  $4 * i$  is available exp because this exp is not being changed by any of the block before appearing in B4 block.

ADV:  
 → The use of available is to eliminate the common subexpression.

### ② Reaching definition



The definition  $d_1$  is said to a reaching definition for block B2. But, the def<sup>n</sup>  $d_1$  is not a reaching def<sup>n</sup> of block B3 becoz it is killed by def<sup>n</sup>  $d_2$  in B2.

ADV

→ It is used in constant & variable propagation.

### ③ Live variables:

→



→ The variable  $x=5$  is live variable at some point P if there is a path from P to  $c_{exit}$  along the which the value of  $x$  is used before it is redefined. Otherwise the variable is said to be dead at that point.

→  $x$  is live at block B1, B3, B4 but killed at B6.

ADV

→ It is used for dead code elimination.

④

An exp is said to be a busy exp along some path  $P_i$  to  $P_j$  if e is evaluated along  $P_i \dots P_j$ .

→ Busy exp are useful in code movement optimization.

## REDUNDANT COMMON SUB-EXPRESSION

⇒ Copy propagation.

$a := b \rightarrow \text{copy stmt}$



→ we will replace  $x$  by  $t_3$   
 → eliminating the copy statement finally we get



⇒ Iterative data-flow analysis.

→ We can perform a data-flow analysis by computing  
 data-flow equations using

- 1) available exp
- 2) reaching Def<sup>n</sup>
- 3) Live variable analysis
- 4) Logical AND (or) Logical OR
- 5) Difference.

4)

logical AND      logical OR      Difference

A B ANB

0 0 0

0 1 0

1 0 0

1 1 1

A B AUB

0 0 0

0 1 1

1 0 1

1 1 1

A B A-B

0 0 0

0 1 0

1 0 1

1 1 0

Data flow equations:

→ Equations representing the expressions that are appearing in the program. These equations, <sup>used in</sup> computing live variables, available expressions and the reaching definition.

→ Data flow equations are of types

1. forward

2. Backward

→ Available expression & reaching def<sup>n</sup> is called forward analysis

→ Live variable analysis is called backward analysis

→  $Out[s] = gen[s] \cup [in[s] - kill[s]]$

→ s represents the statement for which the dataflow eqn can be written. gen represents the information generated within the statement. In represents the gathered information

→ before entering loop. kill represents the information that is killed within the controlled flow.

Live variable analysis :

→ Data flow equations are

$use[s] \rightarrow$  represents set of variables used by Sipat

$def[s] \rightarrow$  " " " " defined by S.

→ every stmt uses some set of variables (read from them)  
 & defines some set of variables (writes to them)

$$\text{Ex: } x = y + z$$

def =  $x$   
 defined value =  $y, z$

use =  $x$   
 def =  $x$ .

### Algorithm for data-flow equations using live variable analysis

$$\text{int}[i] = \emptyset$$

$$\text{out}[i] = \emptyset$$

Repeat until no change for all  $i$ .

$$\text{out}[i] = \bigcup_{j \in \text{succ}[i]} \text{in}[j]$$

$$i \in \text{succ}[i]$$

$$\text{in}[i] = \text{use}[i] \cup (\text{out}[i] - \text{def}[i])$$

End for

End Repeat.

Ex:  $\text{in}[i]$  &  $\text{out}[i]$  for above prog using  $\text{use}[]$  &

def []



→ first we define  $\text{use}[i] \in \text{def}[i]$  for each block.

for above flow graph,

→ Initially we assume  $\text{in}[i] \& \text{out}[i] = \emptyset$ .

|   | $\text{in}[i]$ | $\text{out}[i]$ |
|---|----------------|-----------------|
| 1 | $\emptyset$    | $\emptyset$     |
| 2 | $\emptyset$    | $\emptyset$     |
| 3 | $\emptyset$    | $\emptyset$     |
| 4 | $\emptyset$    | $\emptyset$     |
| 5 | $\emptyset$    | $\emptyset$     |
| 6 | $\emptyset$    | $\emptyset$     |
| 7 | $\emptyset$    | $\emptyset$     |
| 8 | $\emptyset$    | $\emptyset$     |

$$\text{out}[2] = \text{in}[3] \cup \text{in}[4] \\ = \emptyset \cup \emptyset = \emptyset.$$

$$\text{in}[2] = \text{use}[2] \cup (\text{out}[2] - \text{def}[2]) \\ = \infty \cup (\emptyset - \emptyset) \\ = \infty.$$

$$\text{out}[3] = \text{in}[5] \oplus = \emptyset$$

$$\text{in}[3] = m \cup (\emptyset - z) \\ \text{in}[3] = m.$$

$$\text{out}[4] = \emptyset$$

$$\text{in}[4] = \text{use}[4] \cup (\text{out}[4] - \text{def}[4]) \\ = \infty \cup (\emptyset - \emptyset) \\ = \infty.$$

$$\text{out}[5] = \text{in}[6]$$

$$= \emptyset$$

$$\text{in}[5] = \text{use}[5] \cup (\text{out}[5] - \text{def}[5]) \\ = m, x \cup (\emptyset - y) \\ = m, x$$

$$\text{out}[6] = \text{in}[7] \cup \text{in}[8]$$

$$= \emptyset \cup \emptyset = \emptyset$$

$$\text{in}[6] = \text{use}[6] \cup (\text{out}[6] - \text{def}[6]) \\ = \infty \cup (\emptyset - \emptyset) = \infty.$$

$$\text{out}[7] = \text{in}[2]$$

$= \emptyset$ .

$$\text{in}[7] = \text{use}[7] \cup (\text{out}[7] - \text{def}[7])$$

$$= z \cup (x - m)$$

$$= x, z$$

$$\text{out}[8] = \text{in}[2]$$

$= x$

$$\text{in}_{\text{out}}[8] = \text{use}[8] \cup (\text{out}[8] - \text{def}[8])$$

$$= y \cup (x - m) = x, y$$

After 1<sup>st</sup> pass we get.

| in[i] | out[i]      |
|-------|-------------|
| 1     | $\emptyset$ |
| 2     | $\{x\}$     |
| 3     | $\{m\}$     |
| 4     | $\{x\}$     |
| 5     | $\{m, x\}$  |
| 6     | $\{x\}$     |
| 7     | $\{x, z\}$  |
| 8     | $x$ .       |

$$\text{out}[1] = \text{in}[2] = x$$

$$\text{in}[1] = x$$

$$\text{out}[2] = \text{in}[3] \cup \text{in}[4]$$

$$= m, x, z$$

$$\text{in}[2] = m, x$$

$$\text{out}[3] = \text{in}[5] = m, x$$

$$\text{in}[3] = m, x$$

$$\text{out}[4] = \emptyset$$

$$\text{in}[4] = x.$$

$$\text{out}[5] = \text{in}[6] = [e]$$

$= x$

$$\text{in}[5] = m, x$$

$$\text{out}[6] = x, y, z$$

$$\text{in}[6] = x, y, z$$

$$\text{out}[7] = m, x$$

$$\text{in}[7] = x, z$$

$$\text{out}[8] = m, x$$

$$\text{in}[8] = x, y$$

After 2<sup>nd</sup> pass we get.

|   | in[i]          | out[i]    |
|---|----------------|-----------|
| 1 | $\emptyset, x$ | $x$       |
| 2 | $m, x$         | $m, x$    |
| 3 | $m, x$         | $m, x$    |
| 4 | $m, x$         | $x$       |
| 5 | $x, y, z$      | $x, y, z$ |
| 6 | $x, z$         | $m, x$    |
| 7 | $x, y$         | $m, x$    |

→ since in[4] & out[4] are same as pass 1, so we will not consider.

$$\text{out}[i] = m, x$$

$$\text{in}[i] = x$$

$$\text{out}[2] = m, x \}$$

$$\text{in}[2] = m, x \}$$

out[3] - since in[2] & out[2] are same as in pass 1

we will not consider

$$\text{out}[3] = m, x \} \quad \text{we will not consider.}$$

$$\text{in}[3] = m, x \}$$

$$\text{out}[5] = x, y, z \}$$

$$\text{in}[5] = x, m, z \}$$

$$\text{out}[6] = x, y, z \}$$

$$\text{and in}[6] = x, y, z \}$$

$$\text{out}[7] = m, x \}$$

$$\text{in}[7] = x, z \}$$

$$x = [x]_{ai} = [1]_{fu}$$

$$x = [1]_{ai}$$

$$[+]_{ai} \cup [x]_{ai} = [x]_{fu}$$

since in[6], in[7], in[8] are not considered.

$$\text{out}[8] = m, x \}$$

$$\text{in}[8] = x, y \}$$

$$0 = [+]_{fu}$$

$$x = [+]_{ai}$$

$out[i]$      $in[i]$

(1)  $\{m, x\}$      $\{x\}$

(5)  $\{x, y, z\}$      $\{x, z, m\}$

$out[1] = m, x$     }  
 $in[1] = x$     }

$out[5] = x, y, z$     } we will not consider.  
 $in[5] = x, z, m$

finally we can write the  $in[i]$  &  $out[i]$  values as.

|   | $in[i]$         | $out[i]$      |
|---|-----------------|---------------|
| 1 | $\{x\}$         | $\{m, x\}$    |
| 2 | $\{m, x\}$      | $\{m, x\}$    |
| 3 | $\{m, x\}$      | $\{m, x\}$    |
| 4 | $\{\emptyset\}$ | $\{x\}$       |
| 5 | $\{m, y, z\}$   | $\{x, y, z\}$ |
| 6 | $\{x, y, z\}$   | $\{x, y, z\}$ |
| 7 | $\{x, z\}$      | $\{m, x\}$    |
| 8 | $\{x, y\}$      | $\{m, x\}$    |



Compute the following 3 address code

$$t_1 = a + b$$

$$t_2 = c + d$$

$$t_3 = e - t_2$$

$$t_4 = t_1 - t_3$$

Consider the DAG as an example. Explain the process of code generation from DAG



Three address code

$$t_1 = a + b$$

$$t_2 = c + d$$

$$t_3 = e - t_2$$

$$t_4 = t_1 - t_3$$

Target code

MOV a, R<sub>0</sub>

ADD b, R<sub>0</sub>

MOV c, R<sub>1</sub>

ADD d, R<sub>1</sub>

MOV e, R<sub>2</sub>

sub R<sub>1</sub>, R<sub>2</sub>

sub R<sub>0</sub>, R<sub>2</sub>

Register description

R<sub>0</sub> contains a

R<sub>0</sub> contains t<sub>1</sub>

R<sub>1</sub> contains c

R<sub>1</sub> contains t<sub>2</sub>

R<sub>2</sub> contains e

R<sub>2</sub> contains t<sub>3</sub>

R<sub>2</sub> contains t<sub>4</sub>

$$Q) (a + (b * c)) - (c + (d * e))$$

$$t_1 = b * c$$

$$t_2 = a + t_1$$

$$t_3 = d * e$$

$$t_4 = c + t_3$$

$$t_5 = t_2 - t_4$$



Three add  
Code

$$t_1 = b * c$$

$$t_2 = a + t_1$$

$$t_3 = d * e$$

$$t_4 = c + t_3$$

$$t_5 = t_2 - t_4$$

Target code

MOV b, R0

MUL c, R0

MOV a, R1

ADD R0, R1

MOV d, R2

MUL e, R2

MOV C, R3

ADD R2, R3

SUB R1, R3

Register description

R0 contains b

" " t1

R1 contains a

" " t2

R2 contains d

" " t3

R3 contains c

" " t4

R3 contains t5

Algorithm for code generation:

Gen-code(operator, operand1, operand2)

{

if (operand1.addressmode == 'R')

{

if (operator == '+')

generate (ADD operand2, R0);

elseif (operator == '\*')

generate (MUL operand2, R0);

elseif (operator == '/')

generate (DIV operand2, R0);

elseif (operand2.addressmode == 'R')

if (operator == '+')

generate (ADD operand1, R0);

elseif (operator == '\*')

generate (MUL operand1, R0);

elseif (operator == '/')

generate (DIV operand1, R0);

else

{ generate (MOV operand2, R0);

if (operator == '+')

generate (ADD operand2, R0);

else if (operator == '-')

generate (SUB operand2, R0);

elseif (operator == '\*')

generate (mul operand2, R0):

Generate code for following C program using  
g code generation algorithm.

$x = ++f(a)$

Param a

call f, 1

return t1

$t1 = t1 + 1$

$x = t1$

CODE GENERATION: MOV a, R0

GOTO 100 /\* At location 100 in  
f(a) \*/

MOV AX, R1

ADD #1, R1

MOV R1, X

(2)  $x = f(a) / g(b, c)$

MOV a, R0

GOTO 100 /\* At location 100 in f(a) \*/

MOV AX, R1

ADD

Q) Generate code for  $x = f(f(a))$

Ans Param a

call f,1  
return t1  
param t1  
call f,1  
return t2  
 $x = t_2$

code Gen?:

MOV A, R<sub>0</sub> /\* Parameter a in R<sub>0</sub> \*/  
GOTO 100 /\* At location 100 in f(a), \*/  
MOV AX, R<sub>1</sub> /\* return address of  
Ax is stored in R<sub>1</sub>, \*/  
MOV R<sub>1</sub>, R<sub>2</sub> /\* take parameter  
R<sub>1</sub> in Register R<sub>2</sub> \*/  
GOTO 200 /\* At loc 200 in f(f(a)) \*/  
MOV AX, R<sub>2</sub> /\* return address of  
Ax is stored in R<sub>2</sub> \*/  
MOV R<sub>3</sub>, X /\* finally x is in R<sub>3</sub> \*/

### Object code forms



→ Object code forms are 3 types

1. Absolute code
2. Relocatable code
3. Assembler code.

1.

→ It is a machine code that contains reference to actual address within program address space.

→ The generated code can be placed directly in the memory & execution starts immediately.

2.

→ producing a relocatable machine language program as o/p allows sub programs to be compiled separately.

→ A set of relocatable object modules can be linked together & loaded for execution with the help of linker/loader

→ high flexibility

3.

→ Producing an assembly language program as o/p makes the process of code generation easier & faster

→ We can generate symbolic instructions & use the macro facilities of the assembler to help in gen'g of code.

## Register Allocation & Assignment

→ During register allocation select appropriate set of variables that will reside in registers.

### Register Assignment.

→ During register assignment pick up the specific register in which corresponding variables will reside.

$$a = a[I] + 1$$

MOV I, R<sub>0</sub>

MOV a[R<sub>0</sub>], R<sub>1</sub>

ADD R<sub>1</sub>, #1

MOV R<sub>1</sub>, X

$$a[I] = a[I] + b[J]$$

MOV J, R<sub>0</sub>

MOV b[R<sub>0</sub>], R<sub>1</sub>

MOV I, R<sub>2</sub>

MOV a[R<sub>2</sub>], R<sub>3</sub>

ADD R<sub>1</sub>, R<sub>3</sub>

MOV R<sub>3</sub>, a[R<sub>2</sub>]

$$a[I] = b[c[I]]$$

MOV I, R<sub>0</sub>

MOV c[R<sub>0</sub>], R<sub>1</sub>

MOV b[R<sub>1</sub>], R<sub>2</sub>

MOV R<sub>2</sub>, a[R<sub>0</sub>]

→ Register Allocation & assignments are 4 types

1. Global register allocation

2. Usage count

3. Register assignment for outer loop

4. Bipartite coloring for register assignment.

## 1. Global register allocation:

- It has a strategy of storing the most frequently used variables in fixed registers throughout the loop
- Another strategy is to assign some fixed no. of global variables registers to hold the most active values in each inner loop.
- The registers not already allocated may be used to hold values local to one block.
- In certain languages like C, can do the register allocation by using register declaration.

## 2. Usage count:

- It is the count for the use of some variable in some register used in any basic block. The usage count uses the idea of above how many units of count can be saved by selecting a specific variable for global register allocation.

$$\leq (\text{use}(x, B) + 2 * \text{live}(x, B))$$



→ The usage count for block B, for variable a is

$$use(a, B) = 0$$

$$2 * live(a, B)$$

$$2 * 1 = 2$$

$$\leq (0+2) = 2.$$

3. Register assignment for outer loop:



There are 2 loops, L<sub>1</sub> & L<sub>2</sub>. L<sub>1</sub> is the outer loop & L<sub>2</sub> is the inner loop.

→ If a is allocated in loop L<sub>2</sub> then it should not be allocated in L<sub>1</sub>-L<sub>2</sub>.

→ If a is allocated in L<sub>1</sub> & it is not allocated in L<sub>2</sub> then store in entrance to L<sub>2</sub> & load while leaving L<sub>2</sub>.

→ If a is allocated in L<sub>2</sub> & not in L<sub>1</sub> then load a entrance of L<sub>2</sub> & store on exit from L<sub>2</sub>.

4. Graph colouring for register assignment:

→ In the first pass, the specific machine instruction is selected for register allocation for each variable a symbolic register is allocated.

→ Graph colouring technique is applied for this register inference graph.

- The K colour can be assumed to be no. of assignable registers.
- In graph coloring the technique no two adjacent nodes can have same colour. Hence in register inference graph using such graph colouring principle each node is assigned to the symbolic registers so that no symbolic registers can interplace with each other with assigned physical registers.

