

Answer each of the following problems on the paper provided. Note: The organization of your answers will influence the grading of your exam. If you feel that information you need to solve a problem is missing, state that and state the assumptions under which you will solve the problem. If the assumptions are reasonable and correct, the problem will be graded under those assumptions. You may use a calculator. Devices with communications capability, e.g., cell phones, smartphones, tablets, and laptops, are not calculators. Otherwise, the exam is closed book, closed computer, and closed neighbor.

Carefully read the questions, take your time, think, and then answer.

1. Consider a simple cache (L1) and main memory system with the following parameters.

L1 access time: 3 clock cycles

Main Memory access time: 100 clock cycles

L1 hit rate: 0.85

Main Memory is accessed **after** checking the L1 cache, i.e. the L1 cache is checked first then, on a miss, Main Memory is accessed.

(A) What is the effective memory access time in clock cycles? 17.55
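The arithmetic for problem 1 can be checked with a short sketch. The function names are hypothetical, and the two functions encode two different assumptions about what a miss costs: under the sequential reading stated in the problem, a miss pays the L1 check plus the main-memory access; under a simpler weighted-average reading, a miss is charged the main-memory time only.

```python
def emat_sequential(hit_rate, cache_time, memory_time):
    """Assumes the cache is checked first, so a miss costs cache_time + memory_time."""
    return hit_rate * cache_time + (1 - hit_rate) * (cache_time + memory_time)


def emat_weighted(hit_rate, cache_time, memory_time):
    """Assumes a miss is charged memory_time only (no L1 check added)."""
    return hit_rate * cache_time + (1 - hit_rate) * memory_time


print(round(emat_sequential(0.85, 3, 100), 2))  # 18.0
print(round(emat_weighted(0.85, 3, 100), 2))    # 17.55
```

The two readings differ by exactly the miss rate times the L1 access time, 0.15 × 3 = 0.45 clock cycles.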

2. Consider an L1 cache, an L2 cache, and main memory system with the following parameters.

L1 access time: 3 clock cycles

L2 access time: 10 clock cycles

Main Memory access time: 100 clock cycles

L1 hit rate: 0.81

L2 hit rate: 0.88

Because the L1 and L2 caches are accessed simultaneously, an L1 miss leaves only 7 more clock cycles of L2 access time.  
The L1 and L2 caches are accessed simultaneously. If L1 hits, access to L2 is aborted. If L1 and L2 both miss, then Main Memory is accessed.

(A) What is the effective memory access time in clock cycles? 6.382
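A minimal sketch of the problem 2 calculation follows. It assumes the L2 hit rate is local (measured over L1 misses only), and the parameter `include_l2_on_miss` is a hypothetical switch: when true, a miss in both caches costs the 10-cycle L2 lookup plus the main-memory access; when false, it is charged the main-memory time alone.

```python
def emat_parallel_l1_l2(h1, h2, t1, t2, t_mem, include_l2_on_miss=True):
    """L1 and L2 start together; h2 is assumed to be the local L2 hit rate."""
    l1_hit = h1 * t1                      # L1 hit: L2 access is aborted
    l2_hit = (1 - h1) * h2 * t2           # L1 miss, L2 hit: total time is t2
    miss_cost = t2 + t_mem if include_l2_on_miss else t_mem
    both_miss = (1 - h1) * (1 - h2) * miss_cost
    return l1_hit + l2_hit + both_miss


print(round(emat_parallel_l1_l2(0.81, 0.88, 3, 10, 100), 3))         # 6.61
print(round(emat_parallel_l1_l2(0.81, 0.88, 3, 10, 100, False), 3))  # 6.382
```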

3. An integrated circuit manufacturer uses 30 cm diameter wafers. The integrated circuit die area is 1.1 cm by 1.1 cm.

(A) Estimate the number of complete dies on the wafer: 523

$$\frac{\pi \left(\frac{30}{2}\right)^2}{1.1 \times 1.1} - \frac{\pi \cdot 30}{\sqrt{2 \cdot 1.1 \times 1.1}} = 523$$
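The estimate in problem 3 can be reproduced with the sketch below, which applies the same formula (wafer area over die area, minus an edge-loss term) and rounds down to whole dies. The function name is hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_side_cm):
    """Estimate of complete dies: area ratio minus the edge-loss correction."""
    die_area = die_side_cm * die_side_cm
    area_term = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area
    edge_term = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area)
    return area_term - edge_term


print(math.floor(dies_per_wafer(30, 1.1)))  # 523
```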

4. Suppose your application is 80% parallelizable. What is the SpeedUp for each of the following cases?

(A) Two (2) processors are used: 1.67

(B) Four (4) processors are used: 2.5

$$\frac{1}{(1-0.8) + \frac{0.8}{2}} = 1.67$$
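Both cases of problem 4 follow from Amdahl's law. The sketch below uses a hypothetical helper `amdahl_speedup` and assumes the parallel portion scales perfectly with the number of processors.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: the serial part is unchanged, the parallel part is divided."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / processors)


print(round(amdahl_speedup(0.8, 2), 2))  # 1.67
print(round(amdahl_speedup(0.8, 4), 2))  # 2.5
```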

5. The Texas Instruments AM5K2E04 contains four (4) ARM Cortex-A15 cores.

The L2 cache is 4 MB and shared by all cores. Each core has separate instruction and data caches of 32 KB each.

(A) The L1 cache is 32 KB, 4-way set associative with 64 B blocks.

How many lines are in the L1 cache? 128 lines

$$\frac{32\text{ KB}}{64\text{ B} \times 4} = 128$$

(B) The L2 cache is 4 MB with 64 B blocks and 8,192 lines.

How many blocks are there (associativity) per cache line? 8 blocks/line
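Both parts of problem 5 use the relation cache size = lines × associativity × block size. A minimal sketch, with hypothetical helper names and assuming 1 KB = 1024 B:

```python
KB = 1024
MB = 1024 * KB


def cache_lines(cache_bytes, block_bytes, ways):
    """Number of lines (sets) in a set-associative cache."""
    return cache_bytes // (block_bytes * ways)


def blocks_per_line(cache_bytes, block_bytes, lines):
    """Associativity: how many blocks share one line (set)."""
    return cache_bytes // (block_bytes * lines)


print(cache_lines(32 * KB, 64, 4))         # 128
print(blocks_per_line(4 * MB, 64, 8192))   # 8
```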

6. Consider a floating-point execution unit with 10 pipeline stages. Each stage takes one (1) clock cycle. The CPU is able to issue one new computation per clock cycle.

(A) Suppose there are 100 computations to perform. How many clock cycles does this set of computations require? 109 clock cycles

$$10 + 99 \times 1 = 109$$

(B) Suppose the 100 computations have to be broken up into 10 batches of 10 computations each. Batches cannot overlap in the floating-point pipeline.

How many clock cycles does this set of computations require?

$$190 \text{ (clock cycles)}$$
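The two cases of problem 6 can be checked with the sketch below. It assumes the first result takes one cycle per stage to emerge and each later result follows one cycle behind; in the batched case every batch must drain before the next one starts. The function names are hypothetical.

```python
def pipeline_cycles(computations, stages=10):
    """Fill the pipeline once, then complete one computation per cycle."""
    return stages + (computations - 1)


def batched_cycles(batches, batch_size, stages=10):
    """Batches cannot overlap, so each batch pays the full fill latency."""
    return batches * pipeline_cycles(batch_size, stages)


print(pipeline_cycles(100))     # 109
print(batched_cycles(10, 10))   # 190
```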

7. An organization has three primary applications, call them X, Y, and Z.

Computer system A executes these applications in 110 seconds, 150 seconds, and 90 seconds, respectively. Computer system B executes these applications in 95 seconds, 145 seconds, and 100 seconds, respectively.

What is the average speed-up of system B with respect to system A?

$$\text{A: } \frac{110 + 150 + 90}{3} = \frac{350}{3} = 116.67 \text{ s.}$$
  
$$\text{B: } \frac{95 + 145 + 100}{3} = \frac{340}{3} = 113.33 \text{ s.}$$
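Problem 7 does not say how the speed-ups should be averaged, so the sketch below shows two common conventions rather than a single definitive answer: the ratio of the mean execution times, and the geometric mean of the per-application speed-ups. The variable and function names are hypothetical.

```python
import math

times_a = [110, 150, 90]   # seconds for X, Y, Z on system A
times_b = [95, 145, 100]   # seconds for X, Y, Z on system B


def ratio_of_means(old, new):
    """Speed-up computed from the mean execution times."""
    return (sum(old) / len(old)) / (sum(new) / len(new))


def geometric_mean_speedup(old, new):
    """Geometric mean of the per-application speed-ups."""
    ratios = [o / n for o, n in zip(old, new)]
    return math.prod(ratios) ** (1 / len(ratios))


print(round(ratio_of_means(times_a, times_b), 3))          # 1.029
print(round(geometric_mean_speedup(times_a, times_b), 3))  # 1.025
```

Under either convention system B is roughly 3% faster than system A on this workload.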

8. A single processor has a failures in time (FIT) rate of 100,000. What is the mean time to failure (MTTF) for this system?

$$10^5 \text{ FIT} = \frac{10^9}{x}$$

$$x = \frac{10^9}{10^5} = 1 \times 10^{4} \text{ hours}$$

10,000 hours
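The conversion in problem 8 relies on FIT being defined as failures per billion (10^9) device-hours. A minimal sketch with a hypothetical helper name:

```python
HOURS_PER_FIT_WINDOW = 10 ** 9  # FIT counts failures per 10^9 hours


def mttf_hours(fit):
    """Mean time to failure in hours for a given FIT rate."""
    return HOURS_PER_FIT_WINDOW / fit


print(mttf_hours(100_000))  # 10000.0
```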

9. We have an application that processes a sequence (array) of operand pairs. A pipeline processor has five (5) stages. Each stage takes one (1) time unit to execute. Compare this pipeline processor to a monolithic (single) processor that takes five (5) time units to carry out a computation on a pair of operands. How long does the sequence of operand pairs need to be to achieve a speed-up of 5?

Completion times for 1, 2, 3, … operand pairs are 5, 6, 7, … time units on the pipeline processor and 5, 10, 15, … time units on the monolithic processor.

For a sequence of $x$ operand pairs:

$$T_{\text{monolithic}} = 5x, \qquad T_{\text{pipeline}} = 5 + (x - 1) \cdot 1$$

$$\frac{5x}{5 + (x - 1) \cdot 1} = 5$$

$$5x = 25 + 5x - 5$$

$$0 \neq 20$$

A speed-up of exactly 5 is never reached for any finite sequence length; the speed-up $5x / (x + 4)$ only approaches 5 as $x$ grows. The same limit follows from Amdahl's law with the whole computation pipelined:

$$S = \frac{1}{(1 - FP) + FP / \text{SpeedUp}}, \qquad FP = 1, \quad \text{SpeedUp} = 5$$
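The pipeline-versus-monolithic comparison in problem 9 can also be explored numerically. The sketch below assumes the pipeline needs 5 time units for the first operand pair and 1 more for each additional pair, while the monolithic processor needs 5 time units per pair. The function name is hypothetical.

```python
def pipeline_speedup(pairs, stages=5):
    """Monolithic time (stages * pairs) over pipelined time (stages + pairs - 1)."""
    return (stages * pairs) / (stages + pairs - 1)


for n in (1, 10, 100, 1000, 10000):
    print(n, pipeline_speedup(n))
```

Every printed value stays below 5, and the gap to 5 shrinks as the sequence gets longer.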