

Department of Electrical and Computer Engineering  
The University of Texas at Austin

EE 460N Spring 2014  
Aater Suleman, Instructor  
Stephen Pruett, Jay Patel, Chirag Sahuja, TAs  
Exam 1  
February 26, 2013

Name: Solution

|                        | Average | Median |
|------------------------|---------|--------|
| Problem 1 (25 points): | 13      | 14     |
| Problem 2 (15 points): | 10      | 11     |
| Problem 3 (10 points): | 8       | 8      |
| Problem 4 (25 points): | 5       | 2      |
| Problem 5 (25 points): | 12      | 13     |
| Total (100 points):    | 47      | 49     |

**You have 75 minutes to take this exam.**

Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided.

Note: Please be sure your name is recorded on each sheet of the exam.

Please sign the following. I have not given nor received any unauthorized help on this exam.

Signature: \_\_\_\_\_ 

**GOOD LUCK!**

Name: \_\_\_\_\_

**Problem 1 (25 points):**

**Part a (5 points):** On a chip you designed, the two longest paths, P1 and P2, take 7 ns and 8 ns respectively. The remaining paths take only 2 ns each. For the same amount of engineering effort, you can *either* shrink P2 alone by 5 ns *or* shrink both P1 and P2 by 2 ns each. Which optimization will provide higher performance? Explain your answer.

→ Shrink . P1 & P2 by 2ns each .  
after the optimization

P1 - 5ns      P2 - 6ns

**Part b (5 points):** An aggie hired by Little Computer Inc to design the memory circuit for LC-3b made an error: He connected MAR[3] to memory address pin 15 on the memory and connected MAR[15] to memory address pin 3 on the memory. Will this microarchitecture work?

Circle one: Yes / No

Specify the range of memory addresses, if any, for which this microarchitecture will fail.

**Part c (5 points):** Software can choose to ignore a particular interrupt by clearing the  
corresponding to the interrupt.

mask bit

Name: \_\_\_\_\_

**Problem 1 continued**

**Part d (5 points):** When an exception is detected, the currently executing instruction is rolled back to its start before the exception service routine can execute. An aggie working at Little Computer Inc finds this rollback wasteful and wants to embark on a project to optimize exception handling. Will you allow her to continue this project or tell her to move on to the other pending tasks? Explain your answer.

No.

Exceptions are save.

**Part e (5 points):** Although a  cache is prone to cache consistency issues, such issues can

be prevented if we can ensure that the sum of the number of  bits and

bits is less than or equal to the number of  bits.

Name: \_\_\_\_\_

**Problem 2 (15 points):**

Answer parts (a-d) for a PIPT, write-back cache. Assume:

- Physical address is 16-bits.
- The tag store entry is 1 byte.
- There are no unused bits in the tag store entry.
- Cache block size is 8 bytes.
- The cache uses a tree replacement policy that requires 7 bits.

*Hint: In the tree replacement policy, the replacement bits are not a part of the tag store entry.*

**Part a (2 points):** What is the cache's associativity?

7 bits for tree replacement

$$7+1 = \# \text{ways}$$

8 ways

**Part b (3 points):** How many bits of the physical address are used as the index?



7 bits

**Part c (5 points):** How big is the data store? Answer in bytes.

$$\frac{2^7}{\cancel{\# \text{sets}}} \cdot \frac{8}{\cancel{\# \text{ways}}} \cdot \frac{8}{\cancel{\text{cache block size}}} = 2^{7+3+3} = 2^{13}$$

2<sup>13</sup> bytes

**Part d (5 points):** How big is the tag store? Answer in bits.

$$\frac{2^7}{\cancel{\# \text{sets}}} \cdot \frac{8}{\cancel{\# \text{ways}}} \cdot \frac{1}{\cancel{TSE \text{ bits/size}}} \cdot \frac{8}{\cancel{\text{byte}}} + \frac{7 \cdot 2^7}{\cancel{\# \text{sets}}} = 2^{7+3+3} + 7 \cdot 2^7$$

2<sup>13</sup> + 7 · 2<sup>7</sup> bits

Name: \_\_\_\_\_

**Problem 3 continued:**

Assume:

- The logic blocks contain all the necessary gates and latches to complete the memory accesses.
- The logic blocks are implemented to maximize performance.
- Each memory chip takes 100 cycles to access.
- The system only performs reads (and no write operations).

$\Delta$  indicates time for logic.  
Any  $\Delta$  was ~~not~~ accepted.

**Part a (5 points):** How many cycles will it take to execute the data load (not instruction fetch) of LDW R0, x3000 on M1 and M2?

Cycles with M1:  $100 + \Delta$

Explain in less than 20 words.

Need to access same row in both chips, which can be done in one memory cycle.

Cycles with M2:  $100 + \Delta$

Explain in less than 20 words.

Need to access same row in both chips, which can be done in one memory cycle.

**Part b (5 points):** How many cycles will it take to execute the data load (not instruction fetch) of LDW R0, x3001 on M1 and M2?

Cycles with M1:  $200 + \Delta$

Explain in less than 20 words.

Need to access different rows in each chip, but can only send same address to both. Must take two memory cycles.

Cycles with M2:  $100 + \Delta$

Explain in less than 20 words.

Need to access different rows in each chip, and can be done in one memory cycle because mux allows different addresses to be sent to each chip.

Name: Solution

**Problem 4 (25 points):**

We have added an 8-entry, 2-way write-through PIPT cache and an x86 style 2-level virtual-to-physical translation scheme to the LC-3b. Each virtual address is 16 bits. The most significant 5-bits, VA[15:11], are used to access the system page table, the next 5 bits, VA[10:6], are used to access the process page table, and least significant 6-bits, VA[5:0], are the byte in page bits. The PTE only contains a valid bit as its most significant bit and the PFN as its least significant bits. All remaining bits are reserved and always set to zeros.



**Notes:**

- Physical address is 10 bits.
- The cache implements a perfect LRU replacement policy.
- The cache block size is 4 bytes.
- Instructions and data share the same cache.
- The cache is initially empty.
- The memory accesses to system page table, the user page table, and user pages are **all** cached.
- No page faults happen during the execution of this instruction.
- Only for the sake of this problem, assume that the LC-3b is **big endian** (most significant byte is in least significant address) and does not permit unaligned accesses.

VA: | 0 0 11 0 0 0 | 1 0 0 0 0 1 0 0 |

The PC is loaded with address x3184 and one instruction is run to completion. The figure below shows the contents of the cache after the end of this instruction. **Your job:** Use this data to answer the questions on the next page.

| LRU | User Inst/Data |        |    |        |
|-----|----------------|--------|----|--------|
|     | V0             | Tag0   | V1 | Tag1   |
| 0   | 0              | 101100 | 0  | 010010 |
| 1   | 1              | 011100 | 0  | 011100 |
| 0   | 0              | 011000 | 0  | 110110 |
| 0   | 1              | 111100 | 1  | 110000 |

  

| Data0 |     |     |     | Data1 |     |     |     |
|-------|-----|-----|-----|-------|-----|-----|-----|
| *80   | x70 | xAE | xFD | x80   | x06 | x80 | x77 |
| x60   | x43 | x59 | xAE | xFD   | xE9 | x46 | x57 |
| x53   | x74 | x65 | x70 | x68   | x65 | x6E | x21 |
| x80   | x0C | x00 | x0F | x80   | x07 | x80 | x06 |

\*Cache after execution of first instruction

\*The order of bytes in the data store is byte0 (00), byte1 (01), byte2 (10), byte3 (11).

\*V0, Tag0, and Data0 correspond to Way 0 and V0, Tag1, and Data1 correspond to Way 1.

\*If the LRU bit is 0, the data in Way 0 will be evicted next.

Part c)  $\begin{array}{r} 1111001100 \\ \text{PA}_3 \\ \hline 001100 \end{array}$

$$\text{SBR} = 11,1100,0000$$

$$= x3C0$$

Part d)  $\begin{array}{r} 11000011100 \\ \text{PA}_2 \\ \hline \end{array}$

$$011100,0100 \leftarrow \text{PA}_1$$

$$x6043 = 01101000 01000011$$

LDW R0, R1, #3

Only 3 things in cache  
so Load must be to same Cache line

**PROBLEM IS CONTINUED ON THE NEXT PAGE!!!**

Name: Solution

**Problem 4 continued**

**Part a (2 points):** How big is each page (in bytes)?

Answer:

$2^6$

**Part b (3 points):** What is the size of a PTE (in bytes)?

$2^6$  bytes/page  
 $2^5$  PTE's/page

$\frac{2^6}{2^5}$

Answer:

2

**Part c (5 points):** What is the value of SBR?

Answer:

x3cc

**Part d (10 points):** Fill in the table in the order of physical memory accesses. Note that some rows in this table may be left blank.

| Physical Address | Data              | Explanation of data |
|------------------|-------------------|---------------------|
| x3cc             | x800c             | PTE of PTE of Inst. |
| x30c             | x8007             | PTE of Inst.        |
| x1c4             | x6043             | Inst.               |
| x3cc             | x800c             | PTE of PTE of Data  |
| x30c             | x8007             | PTE of Data         |
| x1c4 OR<br>x1c6  | x6043 OR<br>x59ae | Data                |

**Part e (5 points):** What is the value of R0 and R1? (There are two possible answers to this question and either one will get full credit)

R0:

x6043 OR x59ae

R1:

~~x3184 OR x3185~~

x318e OR x3189

Either word on cache line is OK

Name: \_\_\_\_\_

**Problem 5 continued:**

**Your job** Implement ADD32 by modifying the datapath and the state diagram.

**Part a (12 points):** Complete the state diagram to implement ADD32 by filling in the shaded bubbles and state assignment for the last state.

**HINT:** You may find it helpful to look at part b while solving part a.



Figure 3: State diagram for the LC-3b

Name: \_\_\_\_\_

**Problem 5, Part b continued:**

Complete the logic for blocks X and Y below by filling in *only* the three shaded boxes.

