

+5.3

## 1. Code Debugging and Simulation (10pts)

A. (6pts) Identify syntax and semantic errors. Correct them and put annotations.

There are 12 errors and 0.5pts for each.

```
module 2value_comparator (result, clk, rst, valueA[3:0], valueB[3:0]);  
    /* ①不能数字間接 */  
    /* ②不用 [3:0] */  
  
    input clk;  
    input rst; /* Active High Asynchronous Reset */  
    /* ③註記錯誤 */  
  
    signed input [3:0] valueA;  
    signed input [3:0] valueB;  
    /* ④ always block 有問題，要加 reg */  
    output [1:0] result;  
  
    /* ⑤ assign 用 wire */  
    reg BigA, BigB;  
  
    assign BigA = (valueA >= valueB)? 1'b1: 1'b0;  
    assign BigB = (valueA <= valueB)? 1'b1: 1'b0;  
    /* ⑥ 加 @ */  
    /* ⑦ 用 posedge */  
    always @(posedge clk or rst);  
    /* ⑧ 不用分號 */  
  
    begin  
        if(rst)  
            result <= 2'b00;  
        else  
            result <= {BigA, BigB};  
    end  
endmodule⑨ 不用 s
```

+3.5

**B. (4pts)** Please draw the waveform for the circuit in part A. Suppose that the signal result is reset to zero in the beginning. Use "xx" to indicate values that cannot be determined given the provided information.

The diagram illustrates the state of four signals over eight clock cycles. The **clk** signal is a square wave. The **valueA[3:0]** and **valueB[3:0]** signals are 4-bit binary values. The **result** signal is a 2-bit binary value.

| clk | valueA[3:0] | valueB[3:0] | result |
|-----|-------------|-------------|--------|
| 0   | 4'b0000     | 4'b0000     | 2'b11  |
| 1   | 4'b0111     | 4'b0011     | 2'b10  |
| 2   | 4'b0011     | 4'b0111     | 2'b01  |
| 3   | 4'b1011     | 4'b0111     | 2'b01  |
| 4   | 4'b0011     | 4'b1111     | 2'b10  |
| 5   | 4'b1011     | 4'b1111     | 2'b01  |
| 6   | -           | -           | -      |
| 7   | -           | -           | -      |

Notes: In the valueA[3:0] row, the fourth column has a circled '0' under the first bit and a circled '1' under the fourth bit. The fifth column has a circled '3' under the third bit and a circled '5' under the fourth bit. The sixth column has a circled '3' under the third bit and a circled '5' under the fourth bit. In the valueB[3:0] row, the fourth column has a circled '3' under the third bit and a circled '7' under the fourth bit. The fifth column has a circled '7' under the third bit and a circled '1' under the fourth bit. The sixth column has a circled '1' under the third bit and a circled '1' under the fourth bit. The seventh column has a circled '1' under the third bit and a circled '1' under the fourth bit. In the result row, the second column has a circled '0' under the first bit and a circled '1' under the second bit. The third column has a circled '1' under the first bit and a circled '0' under the second bit. The fourth column has a circled '0' under the first bit and a circled '1' under the second bit. The fifth column has a circled '1' under the first bit and a circled '0' under the second bit. The sixth column has a circled '0' under the first bit and a circled '1' under the second bit. The seventh column has a circled '1' under the first bit and a circled '0' under the second bit. The eighth column has a circled '0' under the first bit and a circled '1' under the second bit.

## 2. Finite State Machine and Simulation (10pts)

Given a Finite-State-Machine (FSM) as below.

```
module FSM (clk, rst, in, out_r);
parameter S0 = 2'b00, S1 = 2'b01, S2 = 2'b10, S3 = 2'b11;

input clk, rst, in;
output [1:0] out_r;

reg [1:0] out_r, out;
reg [1:0] current_state, next_state;
// Next State Logic
always @(*) begin
    case(current_state)
        S0: next_state = (in == 1'b0)? S0 : S2;
        S1: next_state = (in == 1'b1)? S0 : S3;
        S2: next_state = (in == 1'b0)? S1 : S3;
        S3: next_state = (in == 1'b1)? S1 : S2;
        default: next_state = 2'b01;
    endcase
end
// Current State Memory & Output Register
always@(posedge clk or posedge rst)
begin
    if(rst) begin
        current_state <= 0;
        out_r <= 0;
    end
    else begin
        current_state <= next_state;
        out_r <= out;
    end
end
// Output Logic
always@(*) begin
    out[1] = (~in) ~^ current_state[0];
    out[0] = in ^ current_state[0] ^ current_state[1];
end
endmodule
```

|    |    | $\bar{in} = 0$ | $\bar{in} = 1$ |        |        |
|----|----|----------------|----------------|--------|--------|
|    |    | out[1]         | out[0]         | out[1] | out[0] |
| S0 | 00 | 0              | 0              | 1      | 1      |
| S1 | 01 | 1              | 1              | 0      | 0      |
| S2 | 10 | 0              | 1              | 1      | 0      |
| S3 | 11 | 1              | 0              | 0      | 1      |

(a) (5pts) Please draw a state transition graph below for this FSM.

Example



- (b) (5pts) The FSM is included as a design under test (DUT) in the testbench. After simulation, the command window shows the outputs. Please finish the outputs based on operation of the FSM.

```

module testbench;
reg clk, rst;
reg in;
wire [1:0] out;

FSM DUT(.clk(clk), .rst(rst), .in(in), .out_r(out));

always @(*)begin
    $monitor("%t %b %b %b %b", $time, clk, rst, in, out);
End

endmodule

```

### Monitor Output Response:

| Time | clk | reset | in | out |    |
|------|-----|-------|----|-----|----|
| 0    | 0   | 0     | 0  | xx  | CS |
| 1    | 1   | 1     | 0  | 00  | S0 |
| 2    | 0   | 0     | 1  | 00  | S0 |
| 3    | 1   | 0     | 1  | 11  | S2 |
| 4    | 0   | 0     | 1  | 11  | S2 |
| 5    | 1   | 0     | 1  | 10  | S3 |
| 6    | 0   | 0     | 1  | 00  | S3 |
| 7    | 1   | 0     | 1  | 01  | S1 |
| 8    | 0   | 0     | 0  | 01  | S1 |
| 9    | 1   | 0     | 0  | 11  | S3 |
| 10   | 0   | 0     | 1  | 11  | S3 |
| 11   | 1   | 0     | 1  | 01  | S1 |
| 12   | 0   | 0     | 0  | 01  | S1 |
| 13   | 1   | 0     | 0  | 11  | S3 |
| 14   | 0   | 0     | 0  | 10  | S3 |
| 15   | 1   | 0     | 0  | 10  | S2 |
| 16   | 0   | 0     | 0  | 00  | S2 |



### 3. Logic Synthesis + Blocking & Non-Blocking (10 pts)

Please draw the corresponding circuits (in the right column) according to the Verilog codes (in the left column). You can use AND, OR, NAND, NOR, XOR, XNOR, NOT, MUX, D Flip-Flop, Latch in the circuit diagram.

| (a) Verilog Code (2 pts)                                                                                            | Circuit Diagram |
|---------------------------------------------------------------------------------------------------------------------|-----------------|
| <pre>always @(*) begin     X = A&amp;B;     Y = (X)? A: B; end</pre>                                                |                 |
| (b) Verilog Code (2 pts)                                                                                            | Circuit Diagram |
| <pre>always @ (posedge clk) begin     A &lt;= ~D;     B &lt;= ~A ^ ~D;     C &lt;= B;     D &lt;= C ~^ D; end</pre> |                 |
| (c) Verilog Code ( 3pts)                                                                                            | Circuit Diagram |
| <pre>always@(A or B or C ) begin     if (C)         D = ~A &amp; B; end</pre>                                       |                 |
| (d) Verilog Code ( 3pts)                                                                                            | Circuit Diagram |
| <pre>always@( posedge clk) begin     if (!C)         D &lt;= A   ~B; end</pre>                                      |                 |

#### 4. Timing Analysis (25 pts)

##### A. (7 pts) <Timing Path>

The following figure shows a circuit with three primary inputs (A, B, C) and two primary outputs (PO1, PO2). The three sequential elements have individual data inputs ( $s_1$ ,  $s_2$ ,  $s_3$ ) and clocks ( $\text{clk}_1$ ,  $\text{clk}_2$ ,  $\text{clk}_3$ ). The cloud-shape modules are combinational elements.



(a) (4 pts) Please list all start points and end points.

start  
A, B, C  
 $\text{clk}_1, \text{clk}_2, \text{clk}_3$

共 6 個

end  
 $s_1, s_2, s_3$   
PO1, PO2

共 5 個

+4

(b) (3 pts) Please list all timing paths by their start point and end point.

(Ex:  $\text{clk}_1$  to  $s_2$  is a timing path)

$A \rightarrow s_1$   
 $B \rightarrow \text{PO2}$   
 $B \rightarrow s_2$   
 $B \rightarrow s_3$   
 $C \rightarrow s_3$

$\text{clk}_1 \rightarrow s_2$   
 $\text{clk}_2 \rightarrow \text{PO1}$   
 $\text{clk}_3 \rightarrow \text{PO2}$

+3

### B. (12 pts) <Hold Time & Setup Time>

Suppose that the timing characteristics of the flip-flops in the circuit are the same. Their timing diagrams and parameters can be described as follows:



$$T_1 = 0.4\text{ns}, T_2 = 0.1\text{ns}$$

The circuit below operates at the clock frequency of 250MHz. Suppose that the rise, fall delays for each combinational element are the same.



$$250 \times 10^6 = 4\text{ns}$$

$$T_{cq} = T_1 + T_2 = 0.5\text{ns}$$

$$T_{cq,cd} = T_1 = 0.4\text{ns}$$

(a) (3 pts) Write the timing inequality for setup time and hold time.

setup time

$$T_{cq} + T_{logic} + T_{su} < T_{clk}$$

$$\Rightarrow T_{su} < T_{clk} - T_{cq} - T_{logic}$$

$$T_{su} < 4\text{ns} - 0.5\text{ns} - (3\text{ns} + 2\text{ns})$$

$$\Rightarrow T_{su} < -1.5\text{ns}$$

hold time

$$T_{cq,cd} + T_{logic,cd} > T_h$$

$$0.4\text{ns} + 1\text{ns} > T_h$$

$$T_h < 1.4\text{ns}$$

(b) (2 pts) If  $T_{setup}$  (Setup Time) = 0.4ns,  $T_{hold}$  (Hold Time) = 1.5ns

Are there setup time and hold time violations in this circuit? Use the timing inequalities in (a) for setup/hold time at the clock frequency to check them.

$T_{su} < -1.5 \text{ ns}$ ,  $0.4 \text{ ns} > -1.5 \text{ ns} \Rightarrow$  setup time violation

+2

$T_h < 1.4 \text{ ns}$ ,  $1.5 \text{ ns} > 1.4 \text{ ns} \Rightarrow$  hold time violation

(c) (3 pts) Following part (b), if there are setup/hold time violations in this circuit, how to perform "retiming" to solve these issues? Suppose that all of combinational elements cannot be further separated. Please draw your circuit and write the timing inequalities of the modified circuits after retiming.

retiming: 調整 DFF 的位置

+3

for setup: 使 longest path ( $3+2\text{ns}$ ) 变短

for hold: 使 shortest path ( $1\text{ns}$ ) 变长

$$\Rightarrow T_{su} < 4 - 0.5 - 3 = 0.5 \Rightarrow T_{su} < 0.5 \text{ ns}, \therefore T_{su} = 0.4 \text{ ok}$$

$$\Rightarrow T_h < 0.4 \text{ ns} + 2 \text{ ns} = 2.5 \Rightarrow T_h < 2.5 \text{ ns}, \therefore T_h = 1.5 \text{ ok}$$



(d) (4 pts) If there is clock skew in this circuit, as shown in bellow. Please write the timing inequality for setup time and hold time at the clock frequency of 200MHz without any timing violation.



$$200 \text{ MHz} = 5 \text{ ns}$$

setup time

$$T_{cq} + T_{logic} + T_{su} \leq T_{clk} + T_{skew}$$

$$\Rightarrow 0.3 + (2+2) + T_{su} \leq 5 + 2$$

$$T_{su} \leq 2.5 \text{ ns}$$

hold time

$$T_{cq,cd} + T_{logic,cd} \geq T_h + T_{skew}$$

$$0.4 + (1+2) \geq T_h + 2$$

$$T_h \leq 1.4 \text{ ns}$$

C. (6 pts) <Required Time & Slack>

The following figure shows a combinational circuit with primary inputs A, B, C, and D. Arrival time of these primary inputs are 0.1, 0.2, 0.15, and 0.1 respectively. A multiplier has a delay of 0.2 and a adder has a delay of 0.1. Assume delay values of wires are zero.

(a) (2 pts) Write down the arrival time of the output of each edge.



+2

(b) (2 pts) Assume that the required time at primary output is 0.8, please compute required time and slack for each gate and primary inputs. Write down your answer on each edge. (Required time/slack)



+2

(c) (2 pts) Please indicate the critical path and show the critical path by required time values.



## 5. Coding for Synthesis (15 pts)

### A. (8 pts) Loadable, Up-Down Counter

A counter with an initial value and up/down count function is shown below.



Please complete the Verilog code of this counter:

```
module counter (clk, rst, init, load, updown, cnt);
```

*// I/O & Reg/Wire Declaration*

```
input clk, rst;  
input [3:0] init;  
input load, updown;  
  
output reg [3:0] cnt;  
reg [3:0] next_cnt;
```

*// You can declare new signals here if you need*

*// Combinational Logic for the Counter (4 pts)*

```
always@(*) begin
```

```
    case ({load, updown})
```

```
        2'b00 : next_cnt = cnt + 1;
```

```
        2'b01 : next_cnt = cnt - 1;
```

```
        default : next_cnt = init;
```

4

end case

```
end
```

// Sequential Logic for the Counter (4 pts)

```
always@(posedge clk) begin  
    if (rst)  
        Cnt <= 0;  
    else  
        Cnt <= next_cnt;  
  
end  
endmodule
```

+4

**B. (7 pts) Gray Counter**

Create a three-bit gray counter with positive edge reset. The preset is initialized to a value of 000. Recall that a Gray Counter changes only one-bit at a time. For example, a 2-bit Gray counter has a count sequence 00, 10, 11, 01 corresponding to a decimal count values of 0, 1, 2, 3 respectively. In the Gray count sequence, only one bit changes between adjacent count values, and the left-most bit is changed as long as it does not result in a code word that has been visited earlier.

clk: 1-bit clock input, all actions on positive edge

rst: 1-bit reset, causes reset on positive edge (asynchronous)

gray\_out: 3-bit result

0 0 0  
1 0 0  
1 1 0  
0 1 0

module gray\_counter (clk, rst, gray\_out);

1 1 1  
1 0 1  
0 0 1

input clk, rst;

output [2:0] gray\_out;

reg [2:0] gray\_out;  
reg [2:0] gray\_tmp;  
reg [2:0] count;

```
always @ (posedge clk or posedge rst)
```

```
begin
```

```
if (rst)
```

```
    count <= 0 ;
```

```
else
```

```
    count <= count + 1 ;
```

```
end
```

```
always @ (posedge clk or posedge rst)
```

```
begin
```

```
if (rst)
```

```
    gray_out <= 0 ;
```

```
else
```

```
    gray_out <= gray_tmp ;
```

```
end
```

```
always @ (*)
```

```
begin
```

```
case (count)
```

```
3'd0 : gray_tmp = 3'b100 ;
```

```
3'd1 : gray_tmp = 3'b110 ;
```

```
3'd2 : gray_tmp = 3'b010 ;
```

```
3'd3 : gray_tmp = 3'b011 ;
```

```
3'd4 : gray_tmp = 3'b111 ;
```

```
3'd5 : gray_tmp = 3'b101 ;
```

```
3'd6 : gray_tmp = 3'b001 ;
```

```
3'd7 : gray_tmp = 3'b000 ;
```

```
default : gray_tmp = 3'b000 ;
```

```
endcase
```

```
end
```

```
endmodule
```

## 6. Synthesis Issues (28 pts)

### A. (6 pts) < Important files related to Design Compiler >

Please explain the meaning of the following terminologies and where to use them:

- (a) (2 pts) DesignWare library (e.g: dw\_fundation.sldb)
- (b) (2 pts) Standard Delay File (e.g: CHIP\_syn.sdf)
- (c) (2 pts) .synopsys\_dc.setup

- (a) 合成用時那個 gate 的 library，也就是合成時那些 gate 的來源和資訊
- (b) 合成後用來記錄時間資訊的檔案，用來模擬用
- (c) 初步 design compiler 的設定檔，包含 library 的路徑等資訊

### B. (4 pts) < Synopsys Design Constraints File( SDC ) >

Please explain the meaning of the following command and why we use them in Design Compiler:

- (a) (2 pts) set\_dont\_touch\_network [get\_clocks\_clk]  
set\_ideal\_network [get\_ports clk]
- (b) (2 pts) set\_clock\_uncertainty 0.1 [get\_clocks clk]

- (a) 在合成時不要去侵入 clk，把他當成 ideal network，因為合成只參考 APR 再去處理
- (b) clk tree 有些地方會比較長，因此並不一定会同時抵達 register，(skew)  
因此加上 uncertainty 確保在 worst case 也能達到正的 slack

### C. (2 pts) < Post-sim >

Explain what does Standard Delay Format (SDF) file do in gate-level simulation?

SDF 檔紀錄了合成後每個 gate 的時間資訊，如 delay。  
如果在 simulation 時沒有 SDF 檔，那會有 time violation，  
有了 delay 的資訊，才能在 gate-level 做 simulation

#### D. (10 pts) < Area Report >

Below is the area report after synthesis.

```
*****
Report : area
Design : ALU
*****
...
Number of cells: 90
Number of references: 10
Combinational area: 1939.291626
Noncombinational area: 2049.062256
Net Interconnect area: undefined
Total cell area: 3988.353760
Total area: undefined
```

- (a) (2 pts) It sometimes shows “undefined” in total area. Please explain why and describe how to fix it.
- (b) Following part(a),
- (2 pts) In cell-based IC design flow, we will focus on total cell area instead of total area, please explain why.
  - (3 pts) The total cell area will be underestimated in this situation. Please briefly explain why.
- (c) (3 pts) With the same RTL code, if we reduce the clock cycle, what part in the report will increase? Please briefly explain why.

(a) 沒有 wire-load model 的資訊，因此在合成時，會不知道 Net 的 area  
fix：在 design compiler 中要 set-wire-load-model .....

由背面

+10

(e) 前面

b) i: 因為合成時我們只要做出 gate-level 的 cell, 而 cell 放在哪, 以及 cell 之間怎麼連線並沒有做優化.

到 APR 才會做, 因此只需求考慮 Total cell area 就好

ii: 因為沒有 set wire load model, 沒有線的資訊, 自然沒有線的 delay, 導致可以運算的時間更多, 可以用面積小速度慢的 cell 來做, 因此 area 會較小

c) combinatorial area 會上升, 因為 cycle time 變短, 導致要用面積大, 速度快的 cell 來算, 因此 area 會變大

+6

E. (6 pts) < External IP issue >

If there's a memory module in DUT, and we generate several files from a Memory Compiler. For example, the generated files are **rom\_1024x4\_t13\_slow\_syn.db**, **rom\_1024x4\_t13.v**.

(a) (2 pts) Please explain what these files are for and what will happen if we do not include these files.

.db 檔類似 library, 記錄了 delay, 溫度等資訊

.v 檔是合成之後的 gate-level 檔.

如果沒有 .db, 那就會缺少 timing 的資訊, 會有 violation

如果沒有 .v, 那就沒有 gate-level 的接線及功能, 就無法做 simulation

(b) (2 pts) Please modify .synopsys\_dc.setup as shown below in order to make design compiler access the memory file. (JUST NEED TO POINT OUT WHERE TO MODIFY)

+9

## 7. Design for Testability (10 pts)

Given the circuit below, please answer the following questions.



- (a) How many single stuck-at fault (SSF) are there in the circuit? (2pts)  
(b) Generate all possible input test patterns for fault  $n_1/0$  (4pts)  
(c) Generate all possible input test patterns for fault  $n_3/1$  (4pts)

(a)  $8 \times 2 = 16$  #

(b)  $n_1 = 1 \rightarrow X_1 = 0 \& X_2 = 0$

To pass  $n_1$  to  $Y$ ,  $n_3 = 1 \rightarrow (n_2, X_4) = (0, 0), (0, 1), (1, 0)$

$\Rightarrow (X_3, X_4) = (1, 0), (1, 1), (0, 1)$  (0, 0, 0, 0)

$\Rightarrow (X_1, X_2, X_3, X_4) : (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)$

(c)  $n_3 = 0 \rightarrow n_2 = 1 \& X_4 = 1 \Rightarrow X_3 = 0$

To pass  $n_3$  to  $Y$ ,  $n_1 = 1 \Rightarrow X_1 = 0 \& X_2 = 0$

$\Rightarrow (X_1, X_2, X_3, X_4) = (0, 0, 0, 1)$  #

```
set search_path "Your_path/CBDK_TSMC013_Arm/CIC/SynopsysDC  
$search_path"  
set target_library "slow.db fast.db" 加入 rom_1024x4_t13-slow-syn.db  
set link_library "* $target_library dw.foundation.db"  
set symbol_library "tsmc13.sdb generic.sdb"  
set synthetic_library "dw.foundation.sldb"  
...  
...
```

- (c) (2 pts) Should we synthesis rom\_1024x4\_t13.v with DUT.v ? Please explain why.

不用，因 rom\_1024x4\_t13.v 已經為 gate-level 且有  
考慮 timing 的問題，可直接用，佳處合成 DUT