

dataflow diagrams : data dependency graphs that illustrate a computation

It is useful when

1. exploring disadvantages or advantages in design
2. estimate area and performance of design
3. design datapath-centric systems, systematically

$$y = a + b + c + d + e$$



If a and b are a 8-bit input value,

$$a + b = x_1$$

The max. value of  $a = 10 = 255$ .

$$\text{So, max value of } x_1 = 255 + 255 \\ = 510$$

size of  $x_1$  is 9 bits.

In general, adding n bits with itself gives  $n + 1$

$$\text{So, } \begin{array}{ccccccc} x_2 & = & x_1 & + & c \\ \text{10} & & \text{9} & & \text{8} \\ 765 & & 510 & & 255 \end{array}$$

For  $x_3 = x_2 + d$ ,  $x_3$  can be 11 based on general rule

Based on max. values,  $x_3 \leq 1020$  so 10 bits still works!

For  $x_4 = x_3 + e = 11$  bits are needed. For  $y$ ,  $\max(y) = 1530$  so 11 bits work.

To find the latency,



→ 6 clock cycles latency

With appropriate control logic,  
this can be done with one adder,  
multiplexer to choose operand &  
FSM to control num.



If we use more clock adder,  
we can reduce latency per clock cycle.

Note that output takes it's  
own clock cycle =

Every time we cross a clock cycle boundary, data value must be stored.

single binary val  $\rightarrow$  D-ff

multiple binary vals  $\rightarrow$  DR

clock



Each O  $\rightarrow$  register

Total: 21 registers

### Performance Estimation

performance of a computation is

$$\text{Performance} \propto \frac{1}{t_{\text{execution}}}$$

Longer processing  
time  
worse performance

where

$$t_{\text{execution}} = \text{Latency} \times \text{Clock period.}$$

# of clock  
cycles required  
to produce  
output.

combinational  
circuit = 0 latency



*Design for slowest step in the pipeline //*

Assume:

single adder = 8 ns, register = 2 ns

$$\text{In first design, } t_{\text{execution}} = 10 \text{ ns} \times 6 \\ = 60 \text{ ns}$$

$$\text{In this design, } t_{\text{execution}} = 18 \text{ ns} \times 4 \\ = 72 \text{ ns}$$

### Area estimation

max # of blocks in clock cycle = total # of components.

max # of signals in clock cycle = total # of registers

max # of unconnected signal tails = # inputs

max # of unconnected signal heads = # outputs



|              |                       |
|--------------|-----------------------|
| Inputs       | 6                     |
| Outputs      | 1                     |
| Registers    | 6                     |
| Adders       | 2                     |
| Clock Period | 1 adder + 1 register. |
| Latency.     | 4                     |

not two  
adders cause  
simultaneous  
calculation

## Designing with Dataflow Diagrams

Example:

functional requirement

$$\text{output} = (a \times d) + c + (d \times b) + b$$

performance requirement

max clk period: 1 reg delay + (2 addition delays + 1 multiply delay)

max latency: 4 clock periods.

resource requirement

max of three inputs & one output

max 2 adders

max 2 multipliers

unlimited regs.

### Design Process

1. Scheduling
2. I/O allocation
3. Initial High-level model
4. Register allocation
5. Datapath allocation
6. Multiplex insertion to connect datapath comp.
7. FSM
8. Optimization

### Scheduling

1. Create a data dependency graph.

$$y = (a \times d) + c + (d \times b) + b$$



2. schedule operations into clock periods



3. reschedule to meet signal requirements.

We start with 4 inputs X violates 3 inputs rule.



delaying b/d is bad cause  
they are involved in critical  
computations

c is part of a longer chain  
a is optimal.

4. Fix clock period violation (clock period  $z_{\parallel}$ )



5. optimize.



## Input / Output allocation



Initial High-level model → Build a FSM where

- every state repr. a clock boundary
- add a reset signal to ensure in first state.

See code, for FSM and TB.

## Register allocation.



| state | i <sub>1</sub> | i <sub>2</sub> | o <sub>1</sub> |
|-------|----------------|----------------|----------------|
| 1     | d              | b              | —              |
| 2     | a              | c              | —              |
| 3     | —              | —              | —              |
| 4     | —              | —              | o <sub>1</sub> |

| state | m <sub>1</sub> | a <sub>1</sub>                | a <sub>2</sub>                |
|-------|----------------|-------------------------------|-------------------------------|
| 1     | —              | —                             | —                             |
| 2     | h <sub>1</sub> | h <sub>2</sub>                | —                             |
| 3     | h <sub>1</sub> | h <sub>4</sub>                | h <sub>2</sub> h <sub>5</sub> |
| 4     | —              | h <sub>3</sub> h <sub>5</sub> | h <sub>2</sub> a <sub>1</sub> |

const. h<sub>1</sub> No h<sub>2</sub> No h<sub>3</sub> No h<sub>4</sub> No h<sub>5</sub> No h<sub>6</sub> No h<sub>7</sub> No h<sub>8</sub> No

→ Based on this, we need 5 num  
and 3 clk-enabled reg

| state | h <sub>1</sub>   | h <sub>2</sub>   | h <sub>3</sub>   | h <sub>4</sub>   | h <sub>5</sub>   |
|-------|------------------|------------------|------------------|------------------|------------------|
| 1     | 1 i <sub>1</sub> | 1 i <sub>2</sub> | x                | x                | x                |
| 2     | 0 —              | 0 —              | 1 m <sub>1</sub> | 1 i <sub>1</sub> | 1 i <sub>2</sub> |
| 3     | x                | 1 m <sub>1</sub> | 0 —              | x                | 1 a <sub>1</sub> |
| 4     | 1 a <sub>2</sub> | —                | —                | x                | —                |

constant? Still uses No  
input go  
continue  
clk

No new input  
go 1,

Final note: register size analysis.



$d$        $b$

For  $i_1$  and  $i_2$ ,

$$h_1 = 16b \quad h_2 = 16b$$

$a$        $c$

For  $i_1$  and  $i_2$

$$h_4 = 16b \quad h_5 = 16b$$

For  $i_1 \times i_2$ ,

$$h_3 = 32 \quad h_5 = 17 \text{ bits}$$

$$h_2 = 32$$

Then,

$$h_1 = 34 \text{ bits. } \{ \text{2 more additions} \}$$

regs.

$$h_1 \quad 34b$$

$$h_2 \quad 32b$$

$$h_3 \quad 32b$$

$$h_4 \quad 16b$$

$$h_5 \quad 17b$$

wires.

$$m1a \quad 16$$

$$m1b \quad 16$$

$$m1o \quad 32$$

$$16in$$

$$a1b$$

$$a1o$$

$$32$$

$$33$$

$$ala$$

$$a1b$$

$$a1o$$

$$17$$

$$33$$

$$ala$$

$$a1b$$

$$a1o$$

$$32$$

$$33$$

$$a2a$$

$$a2b$$

$$a2o$$

$$32$$

$$33$$

$$a2a$$

$$a2b$$

$$a2o$$

$$34$$

$$\left. \begin{array}{l} \\ \\ \\ \\ \end{array} \right\} 33 \text{ inputs}$$

$$\left. \begin{array}{l} \\ \\ \\ \\ \end{array} \right\} \text{adder}$$