

## Exercise 16 Gated Clocks:



Recirculating seems rather wasteful: we need a 2:1 mux for each flop. Plus, all the capacitance and gates inside the flops on the clock network will be toggling on every clock, even when a flop is only holding its value.
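
For reference, a recirculating flop of the kind described above might look like the following minimal sketch. The module and signal names (`recirc_reg`, `en`, `d`, `q`) and the width are hypothetical and are not taken from **mult\_accum.sv**.

```systemverilog
// Hypothetical illustration of a recirculating (enable) register.
// Names and width are assumptions, not taken from mult_accum.sv.
module recirc_reg #(parameter int WIDTH = 16) (
  input  logic             clk,
  input  logic             en,   // load enable
  input  logic [WIDTH-1:0] d,
  output logic [WIDTH-1:0] q
);
  // The clock reaches every flop on every cycle. When en is low, the
  // 2:1 mux feeds q back into the flop so the stored value is held.
  always_ff @(posedge clk)
    q <= en ? d : q;
endmodule
```

Synthesis typically maps the `en ? d : q` expression to one 2:1 mux per bit in front of each flop, which is the area cost the text refers to.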

Wouldn't it be better to just gate the clock? That gives pretty much the same functionality but should save area and power. Yes, but you have to do it right.




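Something as simple as gating the clock directly with the enable (for example, ANDing the two) does not work. Recall that the enable is probably coming from a state machine, so it changes slightly after the rising edge of the clock, while the clock is still high.

The resulting gated clock can have glitches and chopped-off pulses. That's pretty messed up!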
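
The problem can be made concrete with a sketch of the naive approach. This assumes the simple gate is a plain AND of clock and enable; the module and signal names are hypothetical.

```systemverilog
// Hypothetical illustration of the naive approach. DO NOT use this.
module naive_gated_reg #(parameter int WIDTH = 16) (
  input  logic             clk,
  input  logic             en,   // assumed to come from a state machine flop on clk
  input  logic [WIDTH-1:0] d,
  output logic [WIDTH-1:0] q
);
  logic gclk;

  // BROKEN: en changes shortly after posedge clk, while clk is still high.
  // A 0->1 change on en creates a late, shortened pulse on gclk;
  // a 1->0 change chops the current pulse off early.
  assign gclk = clk & en;

  always_ff @(posedge gclk)
    q <= d;
endmodule
```

Either kind of distorted pulse can clock the flops at the wrong time or with too little pulse width, which is why the enable must not be allowed to change while the clock is high.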




We have to use an active-low-enable latch (transparent while the clock is low) to latch the clock enable signal. This ensures the enable is set up and stable for the next high phase of the clock.
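
A minimal sketch of the latch-based gate described above follows. The module and signal names (`clk_gate`, `en_lat`, `gclk`) are hypothetical and are not taken from the exercise files; in practice, many cell libraries provide an integrated clock-gating cell that would be used instead of discrete logic.

```systemverilog
// Hypothetical illustration of a latch-based clock gate.
// Names are assumptions, not taken from the exercise files.
module clk_gate (
  input  logic clk,
  input  logic en,    // enable, e.g. from a state machine clocked by clk
  output logic gclk   // gated clock for the downstream flops
);
  logic en_lat;

  // Latch is transparent while clk is low and holds while clk is high,
  // so en_lat cannot change during the high phase of clk.
  always_latch
    if (!clk)
      en_lat <= en;

  // With en_lat stable whenever clk is high, the AND output is either
  // a full-width clock pulse or a steady low.
  assign gclk = clk & en_lat;
endmodule
```

A register that previously recirculated would then be clocked by `gclk` and simply load `d` on every `posedge gclk`, with no mux on its data input.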

We also have to balance the clock tree from the clock root to all leaf nodes.




- Work in groups of two
- On the website you will find two files:
  - **mult\_accum.sv** (design that uses recirculating flops)
  - **mult\_accum.dc** (synthesis script)
- Synthesize **mult\_accum.sv** using the provided script
  - Note the area, and the power estimate
  - Note whether or not it passes timing
- Now modify the design to use gated clocks instead of recirculating.
- Modify the synthesis script as needed (minor change to read in **mult\_accum\_gated.sv** instead of **mult\_accum.sv**)
- Again synthesize and remark on the difference in both area and timing.
- Submit the area and dynamic power numbers for both designs to the dropbox for this exercise. (both people submit)

No clock gating, area:

| Metric                         | Value       |
|--------------------------------|-------------|
| Number of ports:               | 83          |
| Number of nets:                | 796         |
| Number of cells:               | 689         |
| Number of combinational cells: | 607         |
| Number of sequential cells:    | 81          |
| Number of macros/black boxes:  | 0           |
| Number of buf/inv:             | 84          |
| Number of references:          | 39          |
|                                |             |
| Combinational area:            | 1569.085075 |
| Buf/Inv area:                  | 130.630018  |
| Noncombinational area:         | 535.227282  |
| Macro/Black Box area:          | 0.000000    |
| Net Interconnect area:         | 356.419649  |
|                                |             |
| Total cell area:               | 2104.312356 |
| Total area:                    | 2460.732005 |

Power:

| Power Group   | Internal Power (uW) | Switching Power (uW) | Leakage Power (pW) | Total Power (uW) | ( % )     | Attrs |
|---------------|---------------------|----------------------|--------------------|------------------|-----------|-------|
| io_pad        | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| memory        | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| black_box     | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| clock_network | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| register      | 0.0000         | 0.4498          | 1.1407e+07    | 11.8565     | ( 18.62%) |       |
| sequential    | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| combinational | 0.0000         | 6.6058          | 4.5210e+07    | 51.8156     | ( 81.38%) |       |
| Total         | 0.0000 uW      | 7.0556 uW       | 5.6616e+07 pW | 63.6721 uW  |           |       |

Timing :

| Point                                        | Incr  | Path   |
|----------------------------------------------|-------|--------|
| clock clk (rise edge)                        | 0.00  | 0.00   |
| clock network delay (ideal)                  | 0.00  | 0.00   |
| input external delay                         | 0.75  | 0.75 r |
| A[1] (in)                                    | 0.00  | 0.75 r |
| mult_15/A[1] (mult_accum_DW02_mult_0)        | 0.00  | 0.75 r |
| mult_15/U16/Y (NAND2X0_RVT)                  | 0.04  | 0.79 f |
| mult_15/U14/Y (XOR2X2_RVT)                   | 0.12  | 0.92 r |
| mult_15/S2_2_5/S (FADDX1_RVT)                | 0.16  | 1.08 f |
| mult_15/S2_3_4/S (FADDX1_RVT)                | 0.17  | 1.25 r |
| mult_15/S2_4_3/CO (FADDX1_RVT)               | 0.12  | 1.37 r |
| mult_15/S2_5_3/S (FADDX1_RVT)                | 0.18  | 1.55 f |
| mult_15/S2_6_2/S (FADDX1_RVT)                | 0.17  | 1.72 r |
| mult_15/S4_1/CO (FADDX1_RVT)                 | 0.12  | 1.84 r |
| mult_15/U35/Y (XOR2X2_RVT)                   | 0.14  | 1.97 f |
| mult_15/FS_1/A[7] (mult_accum_DW01_add_1)    | 0.00  | 1.97 f |
| mult_15/FS_1/U12/Y (AND2X1_RVT)              | 0.07  | 2.04 f |
| mult_15/FS_1/U48/Y (AO2I1X1_RVT)             | 0.10  | 2.14 r |
| mult_15/FS_1/U46/Y (OA2I1X1_RVT)             | 0.07  | 2.21 r |
| mult_15/FS_1/U41/Y (OA2I1X1_RVT)             | 0.07  | 2.28 r |
| mult_15/FS_1/U39/Y (OA2I1X1_RVT)             | 0.07  | 2.36 r |
| mult_15/FS_1/U37/Y (AND2X1_RVT)              | 0.06  | 2.42 r |
| mult_15/FS_1/U36/Y (AO22X1_RVT)              | 0.08  | 2.50 r |
| mult_15/FS_1/U6/Y (XOR2X2_RVT)               | 0.11  | 2.61 f |
| mult_15/FS_1/SUM[13] (mult_accum_DW01_add_1) | 0.00  | 2.61 f |
| mult_15/PRODUCT[15] (mult_accum_DW02_mult_0) | 0.00  | 2.61 f |
| U89/Y (MUX21X2_RVT)                          | 0.08  | 2.69 f |
| prod_reg_reg[15]/D (DFFX1_RVT)               | 0.01  | 2.70 f |
| data arrival time                            |       | 2.70   |
|                                              |       |        |
| clock clk (rise edge)                        | 2.50  | 2.50   |
| clock network delay (ideal)                  | 0.00  | 2.50   |
| prod_reg_reg[15]/CLK (DFFX1_RVT)             | 0.00  | 2.50 r |
| library setup time                           | -0.05 | 2.45   |
| data required time                           |       | 2.45   |
|                                              |       |        |
| data required time                           |       | 2.45   |
| data arrival time                            |       | -2.70  |
| slack (VIOLATED)                             |       | -0.25  |
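
As a quick check on the report above, setup slack is the data required time minus the data arrival time. Using the numbers from the table (in the time units reported by the tool):

$$
\begin{aligned}
t_{\text{required}} &= 2.50 - 0.05 = 2.45 \\
\text{slack}_{\text{setup}} &= t_{\text{required}} - t_{\text{arrival}} = 2.45 - 2.70 = -0.25
\end{aligned}
$$

The negative result means the data arrives 0.25 later than the 2.50 clock period, less the setup time, allows. Note that the last cell before the flop on this path is the mux at U89.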

With clock gating, area (went down):

| saed32rvt_tt0p85v25c (File: /cae/apps/data/saed32) |             |
|----------------------------------------------------|-------------|
| Number of ports:                                   | 83          |
| Number of nets:                                    | 767         |
| Number of cells:                                   | 653         |
| Number of combinational cells:                     | 508         |
| Number of sequential cells:                        | 84          |
| Number of macros/black boxes:                      | 0           |
| Number of buf/inv:                                 | 76          |
| Number of references:                              | 46          |
|                                                    |             |
| Combinational area:                                | 1383.559942 |
| Buf/Inv area:                                      | 121.734978  |
| Noncombinational area:                             | 581.481484  |
| Macro/Black Box area:                              | 0.000000    |
| Net Interconnect area:                             | 285.114879  |
|                                                    |             |
| Total cell area:                                   | 1965.041426 |
| Total area:                                        | 2250.156306 |

Power (went up):

| Power Group   | Internal Power (uW) | Switching Power (uW) | Leakage Power (pW) | Total Power (uW) | ( % )     | Attrs |
|---------------|---------------------|----------------------|--------------------|------------------|-----------|-------|
| io_pad        | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| memory        | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| black_box     | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| clock_network | 0.0000         | 6.8038          | 3.2385e+05    | 7.1276      | ( 10.78%) |       |
| register      | 0.0000         | 0.3484          | 1.4014e+07    | 14.3621     | ( 21.73%) |       |
| sequential    | 0.0000         | 0.0000          | 0.0000        | 0.0000      | ( 0.00%)  |       |
| combinational | 0.0000         | 6.1306          | 3.8473e+07    | 44.6035     | ( 67.49%) |       |
|               |                |                 |               |             |           |       |
| Total         | 0.0000 uW      | 13.2827 uW      | 5.2811e+07 pW | 66.0933 uW  |           |       |
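
To put the two reports side by side, the relative changes in total area and total power work out as follows. This uses only the totals from the tables above.

$$
\begin{aligned}
\Delta_{\text{area}} &= \frac{2250.156306 - 2460.732005}{2460.732005} \approx -8.6\% \\
\Delta_{\text{power}} &= \frac{66.0933 - 63.6721}{63.6721} \approx +3.8\%
\end{aligned}
$$

One thing worth noticing when reading the power numbers: the gated report attributes 7.1276 uW to the `clock_network` group, while the ungated report shows 0.0000 uW there. The `register` and `combinational` groups together drop from 63.6721 uW to 14.3621 + 44.6035 = 58.9656 uW. The source does not say why the ungated clock network reports zero, so treat the "went up" total with some caution.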

Timing (better); the report shown is the min-delay (hold) check into the enable latch:

- Startpoint: en_stg2_reg (rising edge-triggered flip-flop clocked by clk)
- Endpoint: clk_en_lat2_reg (positive level-sensitive latch clocked by clk')
- Path Group: clk
- Path Type: min

| Des/Clust/Port   | Wire Load Model | Library              |
|------------------|-----------------|----------------------|
| mult_accum_gated | 8000            | saed32rvt_tt0p85v25c |

| Point                             | Incr  | Path   |
|-----------------------------------|-------|--------|
| clock clk (rise edge)             | 2.50  | 2.50   |
| clock network delay (ideal)       | 0.00  | 2.50   |
| en_stg2_reg/CLK (DFFX1_RVT)       | 0.00  | 2.50 r |
| en_stg2_reg/Q (DFFX1_RVT)         | 0.13  | 2.63 r |
| clk_en_lat2_reg/D (LATCHX1_RVT)   | 0.01  | 2.64 r |
| data arrival time                 |       | 2.64   |
|                                   |       |        |
| clock clk' (fall edge)            | 2.50  | 2.50   |
| clock network delay (ideal)       | 0.00  | 2.50   |
| clk_en_lat2_reg/CLK (LATCHX1_RVT) | 0.00  | 2.50 f |
| library hold time                 | -0.04 | 2.46   |
| data required time                |       | 2.46   |
|                                   |       |        |
| data required time                |       | 2.46   |
| data arrival time                 |       | -2.64  |
| slack (MET)                       |       | 0.18   |
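
For a min (hold) check, the slack is the data arrival time minus the data required time, the opposite sign convention from the setup check. Using the numbers from the report above:

$$
\begin{aligned}
t_{\text{required}} &= 2.50 - 0.04 = 2.46 \\
\text{slack}_{\text{hold}} &= t_{\text{arrival}} - t_{\text{required}} = 2.64 - 2.46 = 0.18
\end{aligned}
$$

The positive slack says the new enable value reaches the latch input 0.18 after the point where the latch must be holding, so the latch does not capture the next cycle's enable by mistake.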