

# EEM16 Homework #4

TIAN YE

TOTAL POINTS

**100 / 100**

QUESTION 1

## 1 Problem #1 20 / 20

✓ - 0 pts Correct

- 1 pts Does Not Loop Back to Initial in part A after Done
- 1 pts Multiple Adders or Dividers in Part B
- 6 pts Did Partial B
- 10 pts Did Not Attempt Part B

QUESTION 2

## 2 Problem #2 20 / 20

✓ - 0 pts Correct

- 1 pts Does Not Redirect Back to the Start if mod = x or mod = 0 for Flow Diagram
- 3 pts Incomplete Flow Diagram
- 3 pts Incomplete Controller FSM
- 4 pts Incomplete Datapath
- 0 pts No Flow Diagram
- 6 pts No Controller FSM
- 8 pts No Datapath

QUESTION 3

## 3 Problem #3 20 / 20

✓ - 0 pts Correct

- 4 pts Incorrect Cycle Time in Part A
- 4 pts Incorrect Number of DFFs in Part B
- 2 pts Incorrect Latency for Part C
- 2 pts Incorrect Cycle Time for Part C
- 4 pts Incorrect for DFF in Part D
- 2 pts Incorrect Latency for Part E
- 2 pts Incorrect Cycle Time for Part E

QUESTION 4

## 4 Problem #4 20 / 20

✓ - 0 pts Correct

- 2 pts Partially Incorrect Minimum Cycle Time in

Part A

- 2 pts Partially Incorrect New Minimum Cycle Time in Part D
- 5 pts Incorrect Minimum Cycle Time in Part A
- 5 pts Incorrect Hold Time Violation in Part B
- 5 pts Incorrect Clock Skew in Part C
- 5 pts Incorrect New Minimum Cycle Time in Part D

QUESTION 5

## 5 Problem #5 20 / 20

✓ - 0 pts Correct

- 1 pts Partially Incorrect Input
- 1 pts Incorrect Contamination Delay of the Circuit
- 2 pts Incorrect Propagation Time Delay
- 3 pts Incorrect Minimum Cycle Time
- 3 pts Did not attempt Part E
- 3 pts Did not attempt Part F & G
- 3 pts Did not attempt Part H
- 20 pts Did Not Attempt Problem 5

## Homework #4 v20180516

(Deadline: 11:59PM PDT, Wednesday June 6, 2018)

Name (Last, First): Ye, Tian

Student Id #: 704931660

### INSTRUCTIONS

This homework is to be done individually. You may use any tools or refer to published papers or books, but may not seek help from any other person or consult solutions to prior exams or homeworks from this or other courses (including those outside UCLA). You're allowed to make use of tools such as Logisim, WolframAlpha (which has terrific support for boolean logic) etc.

You must submit all sheets in this file based on the procedure below. Because of the grading methodology, it is much easier if you print the document and answer your questions in the space provided in this problem set. It can be even easier if you answer in electronic form and then download the PDF. Answers written on sheets other than the provided space will not be looked at or graded. Please write clearly and neatly - if we cannot easily decipher what you have written, you will get zero credit.

**SUBMISSION PROCEDURE:** You need to submit your solution online at Gradescope (<https://gradescope.com>). Please see the following guide from Gradescope for submitting homework. You'd need to upload a PDF and mark where each question is answered.

[http://gradescope-static-assets.s3-us-west-2.amazonaws.com/help/submitting\\_hw\\_guide.pdf](http://gradescope-static-assets.s3-us-west-2.amazonaws.com/help/submitting_hw_guide.pdf)

**Problem #1**

The following is pseudo-code for an algorithm (Newton's method) for calculating square root,  $y=\text{root}(x)$ . The input,  $x$ , and output,  $y$ , are floating point numbers. The computation is triggered with a signal,  $\text{go}=1'b1$ , for 1 cycle that loads the input to a register that holds  $x$ . The computation completes by asserting a signal,  $\text{done}=1'b1$ , for 1 cycle. Implement this code as a controller and datapath

*initial*

```

y = 1
i = 0
while (i<128) {
    y = 0.5*(y + x/y)
    i++
}

```

- (a) Draw a flow chart for this algorithm
- (b) Draw the controller FSM state diagram and the datapath using building blocks that are available below. Be sure to indicate the outputs that control the datapath and inputs from the datapath. Write any boolean logic needed for those control signals of the datapath. For the datapath you can assume available to you the following building blocks: multiplexers (any number of inputs), any number of registers, one magic adder that can add either floating point or integer, and one magic divider. The variable  $y$  is the iterated solution to the algorithm. Use as few states as possible in your controller. You don't need to do any checks on the bounds of the floating point number. Assume a START state where the FSM is initialized. Upon completion return to the START state.
- (c) If instead of iterating for 128 times we are looking for a certain percent accuracy, for instance 1%. What part would you change in the design, and how would you change it? Show the change in by writing pseudo-code



(IMPORTANT: Keep this page in submission even if left unused)



$$c. \frac{|y - \sqrt{x}|}{\sqrt{x}} \leq 0.01$$

$$y - \sqrt{x} \leq 0.01\sqrt{x}$$

$$y - \sqrt{x} \geq -0.01\sqrt{x}$$

$$\therefore \text{while } (y^2 \geq 1.01^2 x \text{ || } y^2 \leq .99^2 x) /$$

*execute*

}

**(IMPORTANT: Keep this page in submission even if left unused)**

**Problem #2**

The following is pseudo-code for calculating modulo-7 in a binary (fixed-point) system. The input,  $x[31:0]$ , and output,  $mod$ , are 32-bit integers. The computation is triggered with a signal,  $go=1'b1$ , for 1 cycle that loads the input word to register  $x$ . The computation completes by asserting a signal,  $done=1'b1$ , for 1 cycle.

```
while (x > 7) {
    x = x >> 3 + x[2:0]
}
mod = (x < 7) ? x : 0;
```

Implement this code as a controller and datapath. In your implementation, similar to problem #1, first draw a flow diagram. Then draw a datapath and a controller FSM state diagram. Write any boolean expressions for the logic for any control signals to the datapath. For the datapath, as an added constraint, use as few hardware blocks or gates as possible. You are not constrained by the number of states. Indicate what building blocks or logic gates you need. Assume a START state where the FSM is waiting for the  $go=1'b1$ . Upon completion, return to the START state. The  $mod$  output of the previous computation should be asserted and maintained in the START state until the next computation starts.



**(IMPORTANT: Keep this page in submission even if left unused)**

**(IMPORTANT: Keep this page in submission even if left unused)**

**Problem #3**

The following shows delays of blocks of logic that are atomic (cannot be further divided). If  $\text{data}_{\text{in}}$  are from registers and  $\text{data}_{\text{out}}$  are stored by registers. The registers have  $t_{\text{dCLK-Q}} = 4$ ,  $t_{\text{setup}} = 2$  and  $t_{\text{hold}} = -1$ . Note that there is no contamination delay in the problem.



- What is the cycle time of this logic?
- If we are to minimize the cycle time, show on the figure below where you would insert registers. How many would you insert (show each register with a vertical line through the arrows)?
- What is the cycle time of the design in (b)? What is the latency in (b)?
- Note that the design in (b) requires a lot of registers in order to minimize the cycle time. Meanwhile by relaxing the cycle time slightly to be <22, one can come to a design that uses far fewer registers without sacrificing much cycle time. Show on the figure below where you would insert the registers. How many would you insert?
- What is the cycle time of the design in (d)? What is the latency in (d)?

$$\begin{aligned}
 a. \quad t_{\text{cyc}} &\geq t_{\text{pclk} \rightarrow Q} + t_{\text{ployer}} + t_{\text{arry}} = 58 \\
 c. \quad \text{cycle time: } 19 & \quad \text{latency: } 114 \\
 \cancel{\text{cycle time: } 19} & \quad \cancel{\text{latency: } 114} \\
 e. \quad \text{cycle time: } 21 & \quad \text{latency: } 94
 \end{aligned}$$

Figure provided to answer (b)



(IMPORTANT: Keep this page in submission even if left unused)

Figure provided to answer (d)



**(IMPORTANT: Keep this page in submission even if left unused)**

**Problem #4**

The following shows a conceptual state machine. The registers have the following characteristics:  $(t_{cCLK-Q}, t_{dCLK-Q}) = (5,40)$ ,  $t_{\text{setup}} = 20$  and  $t_{\text{hold}} = 10$ .



- (a) If the combinational logic has delay of  $(t_{CCN}, t_{DCN}) = (10,200)$ , and clock skew of  $\Delta=0$ , what is the minimum cycle time?
- (b) With the same delay as in (a), is there a hold time violation?
- (c) What is the amount of clock skew,  $\Delta$ , such that there would be a hold time violation?
- (d) Assuming the  $\Delta$  in (c), what is the new minimum cycle time?

a.  $t_{\text{cycle}} \geq 260$   
*no skew is not*

c.  $10 \leq 15 - 2\Delta$

$5/2 \leq \Delta$

*Clock skew greater than 2.5 would cause hold time violation.*

d. *Minimum cycle time is any value greater than 265.*

**(IMPORTANT: Keep this page in submission even if left unused)**

**(IMPORTANT: Keep this page in submission even if left unused)**

**Problem #5**

Hamming code is a way to correct errors when storing and/or transmitting information. This problem does not require you to know the details of a Hamming code. With a 12-bit word of data,  $d[11:0]$ , typically an additional 4 bits,  $p[3:0]$ , of information is appended to be able to correct for any errors in the resulting 16 bits. Each of the  $p[3:0]$  ensures parity for 8 of the 16 bits. The 8 bits for each  $p[3:0]$  are a carefully chosen combination of the 16 bits. Each of the 4 parity is calculated by XORing 16-bits together, and 8 of those bits are forced to 1'b0. By choosing which of the 8-bits to force to 1'b0, one can adjust and program the Hamming code.

Parity is checked by XORing 16 inputs together (8 of the inputs are selected to be 1'b0). The XOR gate you are to use is shown below. The propagation delay from A-Y,  $(t_{cA-Y}, t_{dA-Y}) = (3, 8)$ , and delay from B-to-Y,  $(t_{cB-Y}, t_{dB-Y}) = (2, 5)$ . Notice that the two inputs are not symmetric. The MUX that selects between the 1'b0 input versus the data bit has propagation delay from M-Y,  $(t_{cA-Y}, t_{dA-Y}) = (3, 6)$ .



A bit-cell style design is shown below where each bit position calculates the parity with the input and passes the result to the next bit position.



- To minimize the critical path delay of the entire parity checker, which input of the XOR (A or B) would you assign the MUX output?
- What is the critical path of this circuit, show on a logic gate diagram?
- What is the delay of this circuit (contamination and propagation)?

*combination : 6 propagation : 69*

Assume that the inputs are from flip-flops (DFF) and the parity output is an input to a flip-flop. The flip-flop (DFF) has the following characteristics:  $(t_{cCLK-Q}, t_{dCLK-Q}) = (3, 4)$ ,  $t_{setup} = 2$  and  $t_{hold} = 1$ .

- What is the minimum cycle time?

*95*

To speed up the parity calculation, we can pipeline the design so that the calculation is performed in two cycles. Each cycle would compute the parity for only 8 bits so the cycle time is shorter.

- (e) Show the change in the design needed to pipeline the design.
- (f) The pipelining enables higher throughput. What is the new cycle time?
- (g) With the cycle time in (f), what is the latency for computing the parity?

As an alternative to pipelining, the logic can be restructured so that multiple bits are processed in parallel in a single cycle instead of serially

- (h) Draw the design using logic gates that minimizes delay
- (i) What is the cycle time of this implementation?



f. 55

g. 110

**(IMPORTANT: Keep this page in submission even if left unused)**

(IMPORTANT: Keep this page in submission even if left unused)



**(IMPORTANT: Keep this page in submission even if left unused)**