

# Homework 1

## CDA 4102/5155: Fall 2025

**Due Date:** September 16, 2025 **11:30pm**

**Total Points:** 20 points

You are not allowed to give or receive help in completing this assignment. Submit the PDF version of your submission in e-learning before the deadline. Late submissions (by email attachment to nkim2@ufl.edu) are accepted up to 24 hours after the deadline with a 20% penalty, regardless of whether the submission is 10 minutes or 10 hours late. Submissions more than 24 hours late will receive no credit. Handwritten (scanned PDF) submissions will NOT be accepted. Please write your solution in LaTeX or Microsoft Word and convert it to PDF. Please do not take any help from an LLM (e.g., ChatGPT) or any other sources. ***Please do not include the questions in your solution (PDF) since it affects the plagiarism checker.*** Please include the following sentence at the top of your submission (PDF):

**I have neither given nor received any unauthorized aid on this assignment.**

---

1. [2 pts] Consider a processor consisting of independent functional units for performing load and store instructions. Designers can improve the performance of the load instruction by 10 times, but this improvement reduces the performance of the store instruction by 5 times. **Compute the overall performance** (speedup) for a program consisting of 30% load instructions and 10% store instructions. **Indicate whether this modification is beneficial.** Assume that the remaining 60% of instructions are not affected by the above modification. Please show major steps in your computation.
2. [2 pts] Consider a system consisting of four components with Mean-Time-To-Failure (MTTF) values of 1000, 500, 250 and 25 days, respectively. The system fails if any one of the components fails. Please show major steps in your computation.
  - (i) What is the MTTF of the system?
  - (ii) Assuming Mean-Time-To-Repair (MTTR) is one day, what is the availability of the system?
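
As a reminder of the standard formulas behind Questions 1 and 2, the following is a minimal sketch, not a solution. It assumes that each fraction is a fraction of the original execution time, that component lifetimes are exponentially distributed with independent failures (so failure rates add), and that availability is MTTF / (MTTF + MTTR). The function names and all numbers are hypothetical and deliberately differ from the values in the questions.

```python
def overall_speedup(changes):
    """Generalized Amdahl's law.

    `changes` is a list of (fraction_of_original_time, speedup_factor) pairs.
    A slowdown by a factor of k is written as a speedup factor of 1/k.
    Assumption: the fractions are fractions of original execution time.
    """
    affected = sum(fraction for fraction, _ in changes)
    new_time = (1.0 - affected) + sum(fraction / factor for fraction, factor in changes)
    return 1.0 / new_time


def series_mttf(component_mttfs):
    """MTTF of a system that fails when any component fails.

    Assumption: exponential lifetimes and independent failures,
    so the system failure rate is the sum of the component failure rates.
    """
    return 1.0 / sum(1.0 / mttf for mttf in component_mttfs)


def availability(mttf, mttr):
    """Fraction of time the system is up."""
    return mttf / (mttf + mttr)


# Hypothetical example: 40% of the time is sped up 4x, 20% is slowed down 2x.
print(overall_speedup([(0.4, 4.0), (0.2, 0.5)]))

# Hypothetical example: two components with MTTF of 200 and 300 days, MTTR of 5 days.
system_mttf = series_mttf([200.0, 300.0])
print(system_mttf, availability(system_mttf, 5.0))
```

An overall speedup below 1 means the modification makes the program slower.
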
3. [4 pts] Assume that values A, B, C and D reside in memory. Write the code sequence for

$$D = C + (B / A)$$

for four instruction-set architectures: i) Stack, ii) Accumulator, iii) Register-memory and iv) Register-register (Load-Store). These four architectures are shown in Figure A.1 on page A-4 of Appendix A.

- Note that $C + (B / A)$ is equivalent to $(B / A) + C$. Please do not use any other transformations.
- Assume that in stack operation, the first operand is pushed onto the stack first. For example, for $X = Y - Z$, first “Push Y”, then “Push Z”, and then “Sub”. For the accumulator-based architecture, you can use an intermediate result for later computations as long as the accumulator holds the first source operand. For the other two architectures, the first source operand is always the left one (e.g., Sub R1, B means $R1 = R1 - B$, and Sub R1, R2, R3 means $R1 = R2 - R3$).
- The four variables (A, B, C, D) represent the four memory locations.
- Please use only one register (R1) for the register-memory architecture. Reuse R1, if needed.
- You can use only two registers (R1 and R2) for the register-register architecture. Reuse them, if needed. Note that it is okay to use the same register as both source and destination (e.g., Add R1, R1, R2).
- If you use any memory operands in register-memory (e.g., Store R1, C or Add R1, B) or register-register architectures (e.g., Load R1, C or Store R1, C), the memory operand should be the last one. For example, it is invalid to use Store C, R1 or Add C, R2.
- You cannot use any temporary variables or any other registers (except as outlined above).
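
The stack operand-order convention described above can be illustrated with a small simulator. This is only a sketch under assumptions: the function `run_stack` is hypothetical, and the mnemonics `Pop`, `Add`, `Mul` and `Div` are assumed here (only `Push` and `Sub` appear in the notes). It runs the $X = Y - Z$ example from the notes, not the expression asked for in the question.

```python
def run_stack(program, memory):
    """Run a tiny stack-machine program against a dict of memory locations."""
    operations = {
        "Add": lambda first, second: first + second,
        "Sub": lambda first, second: first - second,
        "Mul": lambda first, second: first * second,
        "Div": lambda first, second: first / second,
    }
    stack = []
    for line in program:
        parts = line.split()
        opcode = parts[0]
        if opcode == "Push":
            # Push the value stored at the named memory location.
            stack.append(memory[parts[1]])
        elif opcode == "Pop":
            # Pop the top of the stack into the named memory location.
            memory[parts[1]] = stack.pop()
        else:
            # The operand pushed first is the first (left) source operand.
            second = stack.pop()
            first = stack.pop()
            stack.append(operations[opcode](first, second))
    return memory


# X = Y - Z with hypothetical values Y = 9 and Z = 4.
result = run_stack(["Push Y", "Push Z", "Sub", "Pop X"], {"Y": 9, "Z": 4})
print(result["X"])  # prints 5
```

Swapping the two `Push` lines would compute $Z - Y$ instead, which is why the push order matters for non-commutative operations.
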
4. [4 pts] The following table shows the run times of four programs P1, P2, P3, P4 on two computers M1 and M2. Determine **which computer is better** (faster) using arithmetic mean, weighted arithmetic mean and geometric mean. Please show the major steps in your calculations.

| Program | Weight | M1 (seconds)    | M2 (seconds)    |
|---------|--------|-----------------|-----------------|
| P1      | 20%    | 10              | 20              |
| P2      | 40%    | 50              | 60              |
| P3      | 20%    | 30              | 30              |
| P4      | 20%    | 20              | 10              |
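
For reference, the three means named in Question 4 can be sketched as follows. This is a minimal illustration, not a solution: the function names are hypothetical and the run times and weights are made up, not taken from the table above. For run times, a lower mean indicates the faster machine.

```python
import math


def arithmetic_mean(times):
    """Plain average of the run times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Average of the run times weighted by how often each program runs."""
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)


def geometric_mean(times):
    """n-th root of the product of the run times, computed via logarithms."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical run times (seconds) and weights for two programs on one machine.
times = [2.0, 8.0]
weights = [0.25, 0.75]
print(arithmetic_mean(times))                    # 5.0
print(weighted_arithmetic_mean(times, weights))  # 6.5
print(geometric_mean(times))                     # approximately 4.0
```
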

5. [3+5 pts] You are trying to decide which system to purchase and are considering two different computers, M1 and M2. M1 has a clock rate of 1 GHz, and M2 has a clock rate of 500 MHz. Both computers support three classes of instructions: A, B and C. The respective CPI (Cycles per Instruction) values are shown in the following table. This question has two independent parts.

| Instruction class | CPI for M1 | CPI for M2 |
|-------------------|------------|------------|
| A                 | 5          | 2          |
| B                 | 6          | 3          |
| C                 | 8          | 2          |

- a) If the number of instructions executed in a certain program is divided equally among the three classes of instructions, which machine is faster? Please show the major steps in your calculations.
- b) The following table shows the percentage of the three classes of instructions in the code generated by three specific compilers C1, C2 and C3. For example, when C1 is used, the generated assembly code has 20% of class A instructions, 70% of class B instructions and 10% of class C instructions. Determine the most beneficial machine (M1 or M2) and compiler (C1, C2 or C3) pair. Show the major steps in your calculations.

| Instruction Class | C1  | C2  | C3  |
|-------------------|-----|-----|-----|
| A                 | 20% | 30% | 40% |
| B                 | 70% | 30% | 40% |
| C                 | 10% | 40% | 20% |
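
As a reminder, execution time follows from CPU time = instruction count × average CPI / clock rate, where the average CPI is the instruction-mix-weighted sum of the per-class CPI values. The sketch below illustrates this with hypothetical classes, CPI values and clock rate that are not taken from the tables above. The function names are also hypothetical. It assumes the instruction count is the same for every configuration being compared.

```python
def average_cpi(mix, cpi):
    """Weighted CPI: sum over classes of (fraction of instructions) * (class CPI)."""
    return sum(mix[name] * cpi[name] for name in mix)


def cpu_time(instruction_count, mix, cpi, clock_hz):
    """CPU time in seconds = instruction count * average CPI / clock rate."""
    return instruction_count * average_cpi(mix, cpi) / clock_hz


# Hypothetical machine: two instruction classes X and Y, 2 GHz clock.
cpi = {"X": 1, "Y": 4}
mix = {"X": 0.5, "Y": 0.5}
print(average_cpi(mix, cpi))          # 2.5
print(cpu_time(1e9, mix, cpi, 2e9))   # 1.25
```

When the instruction count is equal across the options, comparing average CPI divided by clock rate is enough to rank them.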