

## EECS151/251A Fall 2022 Final

|                                          |           |                                           |
|------------------------------------------|-----------|-------------------------------------------|
| Name of the person on left<br>(or aisle) | Your Name | Name of the person on right<br>(or aisle) |
|                                          | Your SID  |                                           |

| Question    | 1  | 2  | 3  | 4  | 5 | 6 | 7  | Total     |
|-------------|----|----|----|----|---|---|----|-----------|
| Clobber     | MT | MT | -  | -  | - | - | MT |           |
| Max. points | 11 | 12 | 14 | 10 | 8 | 9 | 11 | <b>75</b> |

### Exam Notes:

1. You have 170 minutes to work, starting at 8:10 AM and ending at 11 AM Pacific Time.
2. Before 8:10 AM Pacific Time, you may write down your name, SID, names of the persons next to you, etc., on the first page, but you may NOT begin working.
3. Please write your name and SID on every page.
4. Problems are NOT organized in order of increasing difficulty. If you find that it takes too long to come up with a solution, consider moving on to the next one.
5. Both versions of the RISC-V green card are provided at the end of the document.
6. Q1,2, and 7 can be used for midterm clobber. If you get higher scores on these questions than your midterm scores, the new scores will be used for your midterm scores.

**Question 1: Animation State Machine [11 points]**

We are going to design an FSM to control Kirby, a video game character. Kirby has a special ability to inhale air to gain power so that it can then exhale to attack. As a result, we decide to support four states in Kirby's FSM: **Idle** (00), **Walk** (01), **Inhale** (10), and **Exhale** (11). Every cycle, Kirby can transition to a new state or stay at its current state based on its input. The player can control Kirby with two buttons as input, **walk** and **breath**. 1 indicates that a button is pressed and 0 indicates that it is not.

We make the following assumptions about this FSM:

- The input is encoded in  $\text{input} = \{\text{walk}, \text{breath}\}$ , e.g., an input of  $2^b10$  means that the walk button is pressed and the breath button is not.
- The breath button has higher priority than the walk button. i.e. when pressing both the walk and the breath buttons, Kirby will inhale/exhale but not walk.
- Kirby cannot walk while inhaling/exhaling
- Kirby can inhale infinitely long as long as the breath button was pressed and will only exhale for one cycle after the breath button is released (i.e. a 1 to 0 transition).
- When no button is pressed, the FSM has an input of {0,0} and is **NOT** stalled.



(a) Idle      (b) Walk      (c) Inhale      (d) Exhale

Figure 1: The animation you have in your gallery, thanks to Nintendo



Figure 2: Kirby's FSM

- (a) **Complete the FSM.** [7 points] We decided to design a Moore FSM for Kirby, and an incomplete FSM diagram is shown in Figure 2. Answer the following questions to complete the FSM.

i) What input triggers the state transition **A**?

- 00
- 01
- 11
- None

ii) What input triggers the state transition **B**?

- 00
- 01
- 10
- None

iii) What input triggers the state transition **C**?

- 00
- 01
- 10
- None

iv) What input triggers the state transition **D**?

- 11
- 01, 11
- 01
- None

v) What input triggers the state transition **E**?

- 11
- 01, 11
- 01
- None

vi) What input triggers the state transition **F**?

- 10, 00
- 01, 11
- 10, 11
- None

vii) What input triggers the state transition **G**?

- 10
- 01, 11
- 11
- None

(b) **Boolean Algebra.** [4 points] Next we will derive the `next_state` boolean expression based on the state machine. Input and state encoding: Input[0] refers to the breath button, Input[1] refers to the walk button. State encoding is shown in Figure 2. Take the **Inhale** state as an example, where State[0] = 0, and State[1] = 1.

i) What is the boolean algebra equation for `next_state[1]`?

- `Input[0] + State[1] * State[0]`
- `Input[0] + State[1] * !State[0]`
- `Input[1] + State[0] * !State[1]`
- `!Input[0] + State[0] * !State[1]`

ii) What is the boolean algebra equation for `next_state[0]`?

- `Input[1] * !Input[0] + State[1] * !State[0] * Input[0]`
- `Input[1] * !Input[0] + State[1] * !State[0] * !Input[0]`
- `!Input[1] * Input[0] + State[1] * !State[0] * Input[1]`
- `Input[1] * !Input[0] + !State[1] * State[0] * Input[0]`

**Question 2: RISC-V and Pipelining****[12 points]**

This problem explores two possible optimizations for the 5-stage RISC-V pipeline design we covered in the lecture: a) an **alternative stage** design, and b) **multithreading**. Here are the basic assumptions that you can make for both of the subproblems.

- Similar to the assumptions we have in lecture, register file, IMEM, and DMEM are all synchronous-write and asynchronous-read, where read and write can happen in one cycle.
- While some components such as immediate generator and load extension logics are omitted in the diagrams, the designs have all components needed to support RV32I.

For reference, here is the diagram of a **baseline** 5-stage pipeline design as we have learned from the lecture.



- a) (6 points) Below, we have an **alternative** 5-stage pipeline design where the memory and execute stages are swapped, compared to the baseline.

Note the additional Add block in the Decode stage, since we need to calculate address for the DMEM. Branches are predicted always-not-taken.



For each of the 3 RISC-V programs given below, select whether it will take **more**, **the same** or **fewer cycles** in the new design, compared to the baseline F-D-X-M-W design shown previously.

Briefly explain the reason why. You do not have to give the exact number of cycles.

Name:

Student ID:

---

i) (2 points) The program on the right takes:

More cycles

The same cycles

```
lw    x2, 4(x1)  
add  x3, x2, x1
```

Fewer cycles

Because:

---

ii) (2 points) The program on the right takes:

More cycles

```
beqz x1, L0 // x1 was 0
```

The same cycles

```
L0:
```

```
    xor  x2, x2, x1
```

Fewer cycles

```
    and  x2, x2, x1
```

Because:

---

iii) (2 points) The program on the right takes:

More cycles

```
add  x2, x2, x1
```

The same cycles

```
sw   x3, 4(x2)
```

Fewer cycles

Because:

---



Name:

Student ID:

---

- ii) (4 points) Now, let's run the same program on the **multithreaded** design. The two threads are both running the same program given below, starting from the same cycle. Fill in the pipeline table.

For convenience, we format instructions in different threads using *inst.thread*, e.g. `lw.1` and `add.2`.

| Instruction        | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|--------------------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|
| <code>lw.1</code>  | F | D | X | M | W |   |   |   |   |    |    |    |    |    |    |    |
| <code>lw.2</code>  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |
| <code>lh.1</code>  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |
| <code>lh.2</code>  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |
| <code>add.1</code> |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |
| <code>add.2</code> |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |
| <code>sw.1</code>  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |
| <code>sw.2</code>  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |

**Question 3: Logical Effort and Path Delay [14 points]**

For parts (a) and (b), suppose that you are working in a process node where for a minimum size transistor ( $W = 1$ ),

- NMOS on resistance  $R_{on,n} = R$
- PMOS on resistance  $R_{on,p} = 2R$
- Gate capacitance  $C_g = C$  (same for both NMOS and PMOS)

Assume  $\gamma = 1$ .

**a) Inverter design**

- i) How should a minimum inverter be sized in this technology? Specify the NMOS width  $W_n$  and PMOS width  $W_p$ . [1 point]

$$W_n = \underline{\hspace{2cm}}$$

$$W_p = \underline{\hspace{2cm}}$$

- ii) What is the  $R_{eq}$  and  $C_{in}$  of the minimum size inverter? Express in terms of  $R$  and  $C$ , which are defined above. [1 point]

$$R_{eq} = \underline{\hspace{2cm}}$$

$$C_{in} = \underline{\hspace{2cm}}$$

b) You are in charge of designing the following CMOS gate:



- i) Write the boolean logic function that is implemented by this gate. [1 point]
- ii) How should each transistor be sized to have the same output current as the minimum size inverter? [2 points]

$$W_{A,NMOS} = \text{_____} \quad W_{A,PMOS} = \text{_____}$$

$$W_{B,NMOS} = \text{_____} \quad W_{B,PMOS} = \text{_____}$$

$$W_{C,NMOS} = \text{_____} \quad W_{C,PMOS} = \text{_____}$$

Name: \_\_\_\_\_

Student ID: \_\_\_\_\_

- iii) Compute the logical effort (LE) and normalized parasitic delay (P) of the gate. [2 points]

$$LE_A = \text{_____} \quad LE_B = \text{_____} \quad LE_C = \text{_____} \quad P = \text{_____}$$

- iv) Your friend proposes an alternate gate topology (as shown in Figure (b)), where the stacked transistors are flipped.



(a) Original



(b) Flipped transistor stacks

Which one would you pick? Briefly explain your answer. [1 point]

- c) Suppose that you are now working in a **different** technology and want to optimize the following gate network. For this technology node, you're given the following information about the gates and assume  $\gamma = 1$ :

- $LE_{NAND} = 5/3$ ,  $P_{NAND} = 2$
- $LE_{NOR} = 4/3$ ,  $P_{NOR} = 2$



- i) Compute gate sizes  $a$  and  $b$  that would result in the minimum delay from In to Out. [4 points]
- ii) What is the total delay from In to Out? Express in terms of  $\tau_{inv}$ . [2 points]

**Question 4: Flipflop Timing****[10 points]**

The EECS151 Staff team is trying to make a 4-stage, pipelined CPU, and they need your help to answer some critical timing problems.

a) Simon is in charge of designing the pipeline diagram. Next page shows a simplified diagram of his design. [2 points]

- The WB stage is implicitly implemented in the ID/EX stage, and you do not have to consider it
- You should consider the register file as a combinational logic block
- The delay of each block is as follows
  - Mux: 50 ps
  - Regfile: 200 ps
  - ALU: 400 ps
  - Mem: 600 ps
  - ImmGen: 150 ps
- Register parameters
  - Clk-Q delay: 20 ps
  - Setup time: 50 ps
  - Hold time: 40 ps
  - Clock jitter: 10 ps
  - There is no clock skew in **part a)**



- i) What is the minimum clock period of this design?
- 700 ps
  - 710 ps
  - 720 ps
  - 730 ps
  - 740 ps
- ii) What is the minimum delay of the combinational logic (shown below) to avoid hold time violations? *Hint:* The function of combinational logic block does not matter.



- 10 ps
- 0 ps
- 10 ps
- 20 ps
- 30 ps

- b) The team was able to tape out the chip and get it back from the manufacturers. However, Ella and Jennifer discovered some timing issues while testing the chip **at the minimum clock period**. The output  $Y$  should be VDD after the positive clock edge, but is 0 instead. (The dashed line shows the correct output.) [8 points]



- i) What are the possible waveforms of  $X$  to result in the incorrect output waveform? Circle all that apply.



Figure 4: A



Figure 5: B



Figure 6: C



Figure 7: D

- ii) Ella thinks it's a setup time violation. What following actions should she try in order to fix the chip? Select all that apply.
- A. Raise clock frequency alone.
  - B. Lower clock frequency alone.
  - C. Raise VDD alone.
  - D. Lower VDD alone.
  - E. There is nothing she can do about it.
- iii) Jennifer thinks it's a hold time violation. What following actions should she try in order to fix the chip? Select all that apply.
- A. Raise clock frequency alone.
  - B. Lower clock frequency alone.
  - C. Raise VDD alone.
  - D. Lower VDD alone.
  - E. There is nothing she can do about it.
- iv) Fortunately, Professor Shao had the foresight to implement a clock skew controller with a variable resistor in the clock distribution network, shown in the diagram below. Which of the following statements are true? Select all that apply.



- A. If it is a setup time violation, she should increase the resistance.
- B. If it is a setup time violation, she should decrease the resistance.
- C. If it is a hold time violation, she should increase the resistance.
- D. If it is a hold time violation, she should decrease the resistance.
- E. If she increases the resistance, she might be able to run the chip at a higher frequency.

**Question 5: Arithmetics****[8 points]****a) (3 points) True or False**

- (i) \_\_\_ Recall in HW8, compared to Radix-2, a Radix-4 Parallel-Prefix Carry-lookahead Adder (CLA) reduces the depth of the adder tree by a factor of 2 (excluding the input and output stages). Theoretically, if we use a Radix-16 CLA, we can reduce the depth of the adder tree by another factor of 2, compared to a Radix-4 CLA.
- (ii) \_\_\_ In a Parallel-Prefix Carry-lookahead Adder with a carry-in bit,  $G_{i:0}$  is completely equivalent to  $C_{out,i}$ , where  $G_{i:0}$  is the "group generate" from bit 0 to bit  $i$ , and  $C_{out,i}$  is the carry-out bit of bit  $i$ .
- (iii) \_\_\_ A shift-and-add multiplier that reuses the adder circuit for all partial products can be implemented without using registers (purely combinational).

**b) (5 points) Parallel Array Multiplier**

Consider the following unit cell for a single-cycle parallel array multiplier:



- (i) Write down the Boolean expression for the Sum and Carry-out bit. You may use 2-input *AND*, *OR* and *XOR* only.

$$Sum_{out} = \underline{\hspace{10cm}}$$

$$C_{out} = \underline{\hspace{10cm}}$$

Name:

Student ID:

---

- (ii) Recall the principle of multipliers: Computing the partial products, and adding the partial product terms together at the correct bit offsets. Implement a 2-bit by 2-bit multiplier by connecting several of such unit cells (**you can represent each cell using a box without the internal circuit details**). Clearly mark the following input/output:  $b_1, b_0, a_1, a_0, p_3, p_2, p_1, p_0$  where  $p$  is the product. You may also use constants like 1s and 0s.

**Question 6: SRAM****[9 points]**

Consider the 6-Transistor (6T) SRAM shown below and answer the following questions.

**a) (5 points) True or False**

- (i)  Assuming all transistors have the channel same length. The size of the transistors should be ordered as  $M_1 > M_5 > M_2$  for proper read-stability and writability.
- (ii)  Decreasing the supply voltage  $V_{DD}$  for the cross-coupled inverters (while keeping WL and BL at the original nominal voltage) will hurt writability.
- (iii)  Decreasing the width of the access transistors helps the writability.
- (iv)  Increasing the width of  $M_1, M_3$  helps the read stability.
- (v)  If the SRAM is optimized for writability, then it is more difficult to write a "0" than a "1" into the cell in write mode.

**b) (4 points) Fill the blanks**

- (i) An SRAM made of 6-T cells like above can atmost support \_\_\_\_\_ read port(s) and \_\_\_\_\_ write port(s).
- (ii) Before a read operation, the bitlines need to be \_\_\_\_\_ before turning on the wordlines.
- (iii) To decode a large-width address, it is preferable to use predecoders to break the decoding into smaller groups first, followed by the final decoders to do the rest. Suppose you have a 128-word SRAM with 64-bit words. The row-decoding logic uses 5-bit predecoders. In this case, each predecoder output will need to drive \_\_\_\_\_ final decoders.

**Question 7: Veri-long****[11 points]**

In the classic puzzle “Tower of Hanoi”, we have three pegs (rods) and  $N$  disks each sized differently. Each of the disks go on one of the pegs. Disk 0 is the **smallest** disk and  $N - 1$  is the **largest** disk. Initially, the disks form a pyramid on peg 0 (A).



Initial state of a Hanoi puzzle with  $N=4$ .

The goal is to move the stack to peg 2 (C) by taking the **top disk** from a stack and place it on another stack or an empty rod, **one at a time**. Importantly, **no disk may be above a smaller disk**. In this question, you will build a hardware accelerator in Verilog that outputs a move every cycle. The outputted sequence of moves should solve the puzzle with  $N = \text{num\_disks}$  cycles. Assume your Hanoi solver accelerator module has the following inputs and outputs.

```
input start,
input clk,
input [5:0] num_disks,
output [1:0] src_peg,
output [1:0] dst_peg,
output done
```

- a) (1 point) **Boundary conditions.** We use a 32-bit binary number to represent the disks on each peg, where bit  $i$  indicates whether disk  $i$  is on the peg (1) or not (0). We will use 3 of these 32-bit numbers to represent the 3 pegs. The code to initialize the game state with  $\text{num\_disks}$  disks when  $\text{start}$  is asserted is already provided. Once a solution is found, write the logic to assert  $\text{done}$ .

```
reg [31:0] pegs[3];
wire [31:0] all_disks;

assign all_disks = (1 << num_disks) - 1;
always @ (posedge clk) begin
    if (start) begin
        pegs[0] <= all_disks;
        pegs[1] <= 32'd0;
        pegs[2] <= 32'd0;
    end
end
```

```
// Check for done
assign done = pegs[0] _____ &&
          pegs[1] _____ &&
          pegs[2] _____;
```

**b)** (4 points) **Finding a move.** To solve the puzzle, we will repeat these 3 steps until solved:

- **Step AB:** make the legal move between pegs A and B,
- **Step AC:** make the legal move between pegs A and C,
- **Step BC:** make the legal move between pegs B and C.

For simplicity, we will assume that we always have an even number of disks.

i) (2 points) **State machine.** We will use a 2-bit number state to indicate which step we are on: AB=00, AC=01, BC=10. Fill out the expressions for peg\_x and peg\_y, which are **the two pegs concerned in the current step**, in no particular order.

```
reg [1:0] peg_x, peg_y, state, next_state;
assign next_state = (state + 2'd1) % 2'd3;
always @* begin
  case (state)
    2'd0: {peg_x, peg_y} = _____; // AB
    2'd1: {peg_x, peg_y} = _____; // AC
    2'd2: {peg_x, peg_y} = _____; // BC
    default: {peg_x, peg_y} = 4'd0;
  endcase
end
```

ii) (2 points) What are valid ways of updating state? Select all that apply.

- always** @\* state = start ? 2'd0 : next\_state;
- assign** state = clk ? (start ? 2'd0 : next\_state) : state;
- always** @**(posedge** clk) state <= start ? 2'd0 : next\_state;
- always** @**(clk, start)** state = start ? 2'd0 : next\_state;
- initial** state = 2'd0;
 **always** @**(posedge** clk) state <= next\_state;

**c)** (0 points) **Locating top disks.** Recall that any movement is made only to the top disk of a peg so we need to locate the top disk on each peg. Good news that your teammate has implemented the code for you. You don't need to worry about the implementation details but just note that peg\_x\_top\_index and peg\_y\_top\_index store the indexes of the top disks on peg x and peg y.

**d)** (4 points) **Picking the legal direction.** After we pick the two pegs, we need to choose whether we want to move from peg x to peg y or vice versa. We designate the *forward* direction as peg  $x \rightarrow$ peg y. The forward direction is valid if following two conditions are met:

1. (peg\_x is not empty)
2. (peg\_y has no smaller disk than the top disk from peg\_x)

Initially, you wrote `pegs[peg_y][peg_x_top_idx-1:0] == 0` to test for **the second condition**. But turns out, this is not valid Verilog - slices cannot have a variable length! Select all valid expressions for the second condition that circumvents this restriction.

- `peg_x_top_idx < peg_y_top_idx`
- `((pegs[peg_y] & ((32'd1 << peg_x_top_idx) - 32'd1)) == 0);`
- `((pegs[peg_y] & ((1'd1 << peg_x_top_idx) - 1'd1)) == 0);`
- `((pegs[peg_y] & (32'hffffffff)[peg_x_top_idx-1:0]) == 0);`
- `(&(pegs[peg_y] ^ {peg_x_top_idx{1'b1}}) == 1'b0);`

**e)** (2 points) **Making a move.** Now that we know the two pegs and the direction of the move, we should perform the move to update our state and output the move.

```

wire [4:0] disk_index; // target disk
assign disk_index = fwd_dir_valid ? peg_x_top_idx : peg_y_top_idx;

assign src_peg = fwd_dir_valid ? peg_x : peg_y; // src peg of the move
assign dst_peg = fwd_dir_valid ? peg_y : peg_x; // dst peg of the move

always @(posedge clk) begin
    if (~start) begin
        pegs[src_peg] <= _____;
        pegs[dst_peg] <= _____;
    end
end

```

Congratulations! You have just made a Tower of Hanoi accelerator. Your accelerator dissects the iterative algorithm into a state machine, uses a combinational logic circuit to find which of the 2 moves are legal, and updates the state with that knowledge.





# CS 61C Reference Card

Version 1.5.0

|            | Instruction              | Name                               | Description                                                                                 | Type | Opcode   | Funct3 | Funct7   |
|------------|--------------------------|------------------------------------|---------------------------------------------------------------------------------------------|------|----------|--------|----------|
| Arithmetic | <b>add rd rs1 rs2</b>    | ADD                                | $rd = rs1 + rs2$                                                                            | R    | 011 0011 | 000    | 000 0000 |
|            | <b>sub rd rs1 rs2</b>    | SUBtract                           | $rd = rs1 - rs2$                                                                            | R    | 011 0011 | 000    | 010 0000 |
|            | <b>and rd rs1 rs2</b>    | bitwise AND                        | $rd = rs1 \& rs2$                                                                           | R    | 011 0011 | 111    | 000 0000 |
|            | <b>or rd rs1 rs2</b>     | bitwise OR                         | $rd = rs1   rs2$                                                                            | R    | 011 0011 | 110    | 000 0000 |
|            | <b>xor rd rs1 rs2</b>    | bitwise XOR                        | $rd = rs1 ^ rs2$                                                                            | R    | 011 0011 | 100    | 000 0000 |
|            | <b>sll rd rs1 rs2</b>    | Shift Left Logical                 | $rd = rs1 \ll rs2$                                                                          | R    | 011 0011 | 001    | 000 0000 |
|            | <b>srl rd rs1 rs2</b>    | Shift Right Logical                | $rd = rs1 \gg rs2$ (Zero-extend)                                                            | R    | 011 0011 | 101    | 000 0000 |
|            | <b>sra rd rs1 rs2</b>    | Shift Right Arithmetic             | $rd = rs1 \gg rs2$ (Sign-extend)                                                            | R    | 011 0011 | 101    | 010 0000 |
|            | <b>slt rd rs1 rs2</b>    | Set Less Than (signed)             | $rd = (rs1 < rs2) ? 1 : 0$                                                                  | R    | 011 0011 | 010    | 000 0000 |
|            | <b>sltu rd rs1 rs2</b>   | Set Less Than (Unsigned)           |                                                                                             | R    | 011 0011 | 011    | 000 0000 |
|            | <b>addi rd rs1 imm</b>   | ADD Immediate                      | $rd = rs1 + imm$                                                                            | I    | 001 0011 | 000    |          |
|            | <b>andi rd rs1 imm</b>   | bitwise AND Immediate              | $rd = rs1 \& imm$                                                                           | I    | 001 0011 | 111    |          |
|            | <b>ori rd rs1 imm</b>    | bitwise OR Immediate               | $rd = rs1   imm$                                                                            | I    | 001 0011 | 110    |          |
|            | <b>xori rd rs1 imm</b>   | bitwise XOR Immediate              | $rd = rs1 ^ imm$                                                                            | I    | 001 0011 | 100    |          |
|            | <b>slli rd rs1 imm</b>   | Shift Left Logical Immediate       | $rd = rs1 \ll imm$                                                                          | I*   | 001 0011 | 001    | 000 0000 |
|            | <b>srlti rd rs1 imm</b>  | Shift Right Logical Immediate      | $rd = rs1 \gg imm$ (Zero-extend)                                                            | I*   | 001 0011 | 101    | 000 0000 |
|            | <b>sraisi rd rs1 imm</b> | Shift Right Arithmetic Immediate   | $rd = rs1 \gg imm$ (Sign-extend)                                                            | I*   | 001 0011 | 101    | 010 0000 |
|            | <b>slti rd rs1 imm</b>   | Set Less Than Immediate (signed)   | $rd = (rs1 < imm) ? 1 : 0$                                                                  | I    | 001 0011 | 010    |          |
|            | <b>sltiu rd rs1 imm</b>  | Set Less Than Immediate (Unsigned) |                                                                                             | I    | 001 0011 | 011    |          |
| Memory     | <b>lb rd imm(rs1)</b>    | Load Byte                          | $rd = 1 \text{ byte of memory at address } rs1 + imm, \text{ sign-extended}$                | I    | 000 0011 | 000    |          |
|            | <b>lbu rd imm(rs1)</b>   | Load Byte (Unsigned)               | $rd = 1 \text{ byte of memory at address } rs1 + imm, \text{ zero-extended}$                | I    | 000 0011 | 100    |          |
|            | <b>lh rd imm(rs1)</b>    | Load Half-word                     | $rd = 2 \text{ bytes of memory starting at address } rs1 + imm, \text{ sign-extended}$      | I    | 000 0011 | 001    |          |
|            | <b>lhu rd imm(rs1)</b>   | Load Half-word (Unsigned)          | $rd = 2 \text{ bytes of memory starting at address } rs1 + imm, \text{ zero-extended}$      | I    | 000 0011 | 101    |          |
|            | <b>lw rd imm(rs1)</b>    | Load Word                          | $rd = 4 \text{ bytes of memory starting at address } rs1 + imm$                             | I    | 000 0011 | 010    |          |
|            | <b>sb rs2 imm(rs1)</b>   | Store Byte                         | Stores least-significant byte of $rs2$ at the address $rs1 + imm$ in memory                 | S    | 010 0011 | 000    |          |
|            | <b>sh rs2 imm(rs1)</b>   | Store Half-word                    | Stores the 2 least-significant bytes of $rs2$ starting at the address $rs1 + imm$ in memory | S    | 010 0011 | 001    |          |
|            | <b>sw rs2 imm(rs1)</b>   | Store Word                         | Stores $rs2$ starting at the address $rs1 + imm$ in memory                                  | S    | 010 0011 | 010    |          |



