

## 211 Review: Storing Data in Memory

How much data can be stored in 1 mem. address?

→ 1 byte; aka 8 bits

→ The top row is "byte address 0"

→ An int is size = 4 bytes, so it will span multiple addresses in memory!

→ 4 bytes → 4 addresses in memory.

→ Ex:  $329,421 = 0b0000\ 0000\ 0000\ 0101\ 0000\ 0110\ 1100\ 1101$

→ The leftmost byte of a number, e.g. **0000 0000**

What is the Least Significant Byte (LSB)?

(LSB)

What is Little Endian Ordering?

→ Where the LSB of an int goes in the first byte address

→ AKA, fill the table from "right to left":  
• the last 2 nibbles of num → 1<sup>st</sup> address  
• nibbles 5 & 6 → 2<sup>nd</sup> address  
• nibbles 2 & 2 → 4<sup>th</sup> address

→ Where the LSB of an int goes in the largest byte address

→ **Little Endian**

→  $(\text{Index[element]} \times \text{size of(element type)}) + \text{base\_address}$

→ Ex: An int array with 10 elements

- Each element requires 4 bytes/addresses
- element 0 → address  $0 \times 4 = \text{addr } 0$  (or whatever the base addr. is)
- element 8 → address  $8 \times 4 = \text{addr } 32$

→ Ex:  $\text{arr} = [5, 90, 100, 40]$

1 row = 1 byte

| Addr. | Data      |
|-------|-----------|
| 0     | 0000 0101 |
| 1     | 0000 0000 |
| 2     | 5         |
| 3     | 0000 0000 |
| 4     | 0101 1010 |
| 5     | 0000 0000 |
| 6     | 90        |
| 7     | 0000 0000 |
| 8     | 0110 0100 |
| 9     | 0000 0000 |
| 10    | 100       |
| 11    | 0000 0000 |
| 12    | 0010 1000 |
| 13    | 0000 0000 |
| 14    | 40        |
| 15    | 0000 0000 |

1 row = 4 bytes

| Addr.      | Data                |
|------------|---------------------|
| 0x00000000 | 5                   |
| 0x00000004 | 90                  |
| 0x00000008 | 100                 |
| 0x0000000C | 40                  |
| 0x00000010 | 0000 0000 0000 0000 |
| 0x00000014 | 0000 0000 0000 0000 |
| 0x00000018 | 0000 0000 0000 0000 |
| 0x0000001C | 0000 0000 0000 0000 |
| 0x00000020 | 0000 0000 0000 0000 |

| Byte Address | Value           |
|--------------|-----------------|
| 0            | 0b000000101 (5) |
| 1            | 0b000000100 (4) |
| 2            | 0b11111110 (-2) |

**Little Endian**

| Byte Address | Value      |
|--------------|------------|
| 0            | 11 00 1101 |
| 1            | 0000 0110  |
| 2            | 0000 0101  |
| 3            | 0000 0000  |

**Big Endian**

| Byte Address | Value      |
|--------------|------------|
| 0            | 0000 0000  |
| 1            | 0000 0101  |
| 2            | 0000 0110  |
| 3            | 11 00 1101 |

What is Big Endian Ordering?

Which do we use in this class?

How do you compute the address of an element in an array?

Diagrams to visualize arrays?

How do we know how many bits are needed to address our memory?

→ It depends on the size of our total memory!

→  $\text{bits\_per\_addr} = \log_2(\text{size of memory in bytes})$

→ Ex:

- Memory = 16 bytes ... each address is  $\log_2(16) = 4$  bits (eg, addr 1 = 0010)
- Memory = 4 GB ...
  - 1 GB =  $2^{30}$  bytes
  - each address is  $\log_2(4 \cdot 2^{30}) = 32$  bits
  - eg, addr 1 is 0000 0000 0000 0000 0000 0000 0000 0010.

# Digital Logic

What is a logic gate?

- A device that performs a boolean function to produce a single binary output.
- Acts as a building block for digital circuits.
- Several types of logic gates, each of which represent a single logical operator.
- Represents NOT logical operator
- Can be represented by a symbol, truth table, or equation:

Symbol



Equation

$$Y = \bar{A}$$

Truth Table

| A | Y |
|---|---|
| 0 | 1 |
| 1 | 0 |

## Gate

What is an AND Gate?

Symbol



Equation

$$Y = AB$$

or

$$Y = A \times B$$

Truth Table

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

What is an OR Gate?



$$Y = A + B$$

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

What is an XOR Gate?



$$Y = A \oplus B$$

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

What is a NAND Gate?



the circle indicates negation

$$Y = \overline{A \times B}$$

$$Y = \overline{AB}$$

| A | B | Y |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

→ Y is true when either A OR B is true, but not both. "XOR" = exclusive OR

What is a NOR Gate?



the circle indicates negation

$$Y = \overline{A + B}$$

$$Y = \overline{A+B}$$

| A | B | Y |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

→ Y is true when (A and B) is false...just negate the result of AB.

What is a XNOR Gate?



the circle indicates negation

$$Y = \overline{A \oplus B}$$

$$Y = \overline{A \oplus B}$$

| A | B | Y |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

→ Negate the result of (A XOR B)

Example: How do you form an equation from a logic diagram?



How do you do the reverse?

How do you form an equation from a truth table?

Equation

$$(\overline{AB} \times \overline{C+D}) + \overline{E} = (\overline{AB})(\overline{C+D}) + \overline{E}$$

→ Equation:  $Y = \overline{(A+B)} \overline{CD} + \overline{E}$



→ Go through each line of the table where  $Y=1$  & create an equation that describes it using AND statements.

→ Then, your final equation is the addition (aka OR) of all those equations

• "sum of products" form

Example?

| A | B | C | Y |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |

$$Y = \overline{A}\overline{B}\overline{C} + \overline{A}B\overline{C} + A\overline{B}C$$

# Transistors

## - Basic Circuits -

What is **voltage**?

→ The force that makes currents flow!

→ Measured in **Volts (V)**

→ The rate of flow of electrons.

→ Measured in **Amperes (A)**

→ A Dam with Water & the top and a reservoir at the bottom.

→ **Voltage**  $\approx$  The desire of the water to flow downhill.

• NOT the actual movement of the water. **Voltage itself is static**.

→ **Current**  $\approx$  The actual flow of the water downstream.

• The size of the opening affects the amount of water (current) that can flow.

• The wider the "opening", the higher the current

→ A circuit that is fully connected & allows electricity to flow uninterrupted.

→ When a switch (and circuit) is closed, we know the voltage on both ends of the circuit.



What is an **open circuit**?

→ A circuit that contains a broken connection.

→ Electricity stops flowing @ point where connection was lost.

→ When a switch is open, we don't necessarily know the voltage at the end of the circuit - We'd need to see the full circuit.



What are **switches**?

What is the symbol?

→ Controlled by physical contact. **SYMBOL:**

→

-Transistors-

What are **transistors**?

→ Like switches, but controlled by a **voltage**, rather than physical contact.

What "states" do transistors operate in?

→ Just 2 states - ON or OFF ... it's binary.

→ **This is why all machines communicate in binary (1s and 0s)!**

ON OFF

Wait... I thought you said transistors are controlled by voltage?

→ They are, but we can use binary terms rather than exact voltage levels for discussing them.

→ **high voltage** = "logic high" = logic 1 = **1**

→ **low voltage** = "logic low" = logic 0 = **0**

What is the symbol for transistors?

→ The terminal labeled with the blue dot determines whether current can flow between the terminals labeled with the red dot



What are the terminals in an NMOS transistor?

1. **Gate** : controls whether transistor is on or off

2. **Source** : endpoint

3. **Drain** : endpoint



What does it mean if the gate voltage is high?

→ a.k.a. "logic 1"

→ Current can flow between the source and the drain... they will be connected & will have the same voltage.

→ e.g., if  $V_s = \text{logic 0}$ ,  $V_d$  also  $\Rightarrow \text{logic 0}$ . And vice versa.



What does it mean if the gate voltage is low?

→ SOURCE & drain are not connected; current cannot flow.

→ The voltage of the drain will be unknown.



What is a PMOS Transistor?

→ Same 3 terminals as with NMOS transistor.

→ The gate voltages are reversed:

- Gate voltage LOW ( $\text{logic 0}$ ) = current can flow
- Gate voltage HIGH ( $\text{logic 1}$ ) = current cannot flow

#### NMOS

How do we define the behavior of a transistor?

• behaves as an open switch when  $V_g$  is low.

• behaves as a closed switch when  $V_g$  is high.

#### PMOS

• behaves as an open switch when  $V_g$  is high.

• behaves as a closed switch when  $V_g$  is low.

## What is Moore's Law?

- A trend describing the reduction in transistor size in machines/computers
- Moore's Law: The # of transistors in a given area on a chip doubles every 2 years
  - ~2,000 transistors in 1970 → > 10 billion today!
- The trend is coming to an end now, as the industry develops alternative technologies to continue improving chip performance.

# Building Logic Gates with Transistors & CMOS

RECALL: Ex of a logic gate?

→ Inverter gate:  $A \rightarrow y$  ( $y = \bar{A}$ )

→ See pg 3 of notes.

What is power?

→ A component (of a circuit?) that produces a logic high (1) value

→ Symbol: T

→ Always connected to output via a pmos transistor.

• RECALL: pmos transistor is closed/ON when input = 0

→ A component that produces a logic low (0) value.

→ Symbol: ↓

→ Always connected to output via an nmos transistor.

• RECALL: nmos transistor is closed/ON when input = 1

→ RECALL: switches are a way to physically control the opening & closing of circuits, while transistors operate on logic values.



Visual example of an inverter with switches?

How does this inverter work when input = 0?

→ When input = 0, we manually close the top/power switch.

→ When input = 1, we manually close the bottom/ground switch.



→ Since power produces logic = 1 & power is connected to output, output = 1



→ Since ground produces logic = 0 & ground is connected to output, output = 0.

How does it work when input = 1?

How would we build this inverter with transistors?

→ Replace the power switch with a pmos transistor. Replace ground switch w/ an nmos transistor.

→ Then, invert the output of the power transistor (with the o symbol)



How does this inverter work when input = 1?

- If input = 1, pmos tr. is OPEN, so power is not connected to output.
- If input = 1, nmos tr. is CLOSED, so ground is connected to output.



How does this inverter work when input = 0?

- If input = 0, pmos tr. is CLOSED, so power is connected to output.
- If input = 0, nmos tr. is OPEN, so ground is not connected to output.



What is a CMOS?

- CMOS = "Complementary MOS" ... the pMOS & nMOS transistors complement each other to form the logic gate.

→ The inverter described above is a CMOS inverter.

→ What "drives" the output whenever the output is 1.

→ ALWAYS associated with PMOS transistor

→ What "drives" the output whenever output is 0.

→ ALWAYS associated with NMOS transistor.

→ Only one network will be on at a time.

→ Think of them as an AND function. When 2 switches A and B are on/closed

→ Current = switchA AND switchB



What is the pull-up network?

→ The inverter described above is a CMOS inverter.

→ What "drives" the output whenever output is 0.

→ ALWAYS associated with NMOS transistor.

→ Only one network will be on at a time.

→ Think of them as an AND function. When 2 switches A and B are on/closed

→ Current = switchA AND switchB



What are "switches in series"?

How do they look?

→ Think of them as an OR function. Current will flow if A OR B is closed.

→ Current = switchA OR switchB



What are switches in parallel?

How do we build a CMOS for the NAND logic gate?

→ RECALL: NAND  $y = \overline{A \times B} / Y = \overline{AB}$

→ When  $AB$  is true, we want output = 0, so we can build "switches in series" (aka

$A \text{ AND } B$ ) and connect them to GROUND, since GROUND always sends



- This part of the CMOS will result in something (namely, logic = 0) being sent to Y whenever  $(AB)$  is true.  $A=1, B=1 \Rightarrow \text{Ground} = 1 \Rightarrow Y = 0$

What do we build for when A and B aren't both logic = 1?

→ When  $A = 0$  OR  $B = 0$ , we want output = 1. So we can build "switches in parallel" (aka  $A \text{ OR } B$ ), but invert the values being sent to the A & B transistors.

- If  $A = 0$  and  $B = 0$ , send 1 and 1 to the 2 switches. Since at least one is 1, output of 1 is received by power.  $A=0, B=0 \Rightarrow \text{Power} = 1 \Rightarrow Y = 1$
- If  $A = 0$  and  $B = 1$ , send 1 and 0 to the 2 switches. Since at least one is 1, output of 1 is received by power.  $A=0, B=1 \Rightarrow \text{Power} = 1 \Rightarrow Y = 1$



How do we combine these 2 pieces to form the CMOS NAND?

→ RECALL: NOR  $y = \overline{A+B}$

| A | B | Y |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

- Lets build "switches in series" & invert the inputs so that result = 1 if and only if  $A = 0$  and  $B = 0$

- Since we actually want the output to be 1 (output = result) when  $A = 0$  and  $B = 0$ , we should connect these switches to power.



| A | B | Y |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

- We want output = 0 whenever  $A + B$  ( $A$  OR  $B$  is 1) = TRUE
- To do this, let's build switches in parallel so that whenever  $A=1$  or  $B=1$ , result = 1
- Since we actually want output = 0 when the above "result" = 1, we should connect these switches to ground.



How do we combine these 2 pieces to form the CMOS NOR?

How do we build a CMOS for the AND logic gate?

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

→ RECALL:  $\text{AND} = Y = AB$

- When  $A \& B$  is true, send output to POWER; use switches in series.
- Use switches in parallel; send inverted inputs so that the result = 1 whenever these are true



→ But actually, this does not work!

Why aren't the diagram & method above correct for CMOS "AND"?

→ because we can't use NMOS transistors in the pull-up network?

→ Solution: Use a NAND gate & add an inverter:



1) When the NAND gate sends an output of 0 through GND, Vdd receives 0 = 2, & thus  $Y = 1$

2) When NAND sends output = 1 through Vdd, GND receives output 1, & thus sends output = 0 to Y.

How do we build a CMOS for the OR gate?

→ Similar concept: A NOR Gate + an inverter:



1) When  $A=0 \& B=0$ , Vdd sends output = 1 to inverter. Inverter sends output = 1 to GND and  $o_p=0$  to Vdd. Thus, GND sends output = 0 to Y

2) When  $A=0$  and  $B=1$ , GND sends output = 0 to inverter. Inverter sends output = 1 to Vdd, and output = 0 to GND. Thus, Vdd sends output = 1 to Y.



How many transistors are required to build each type of gate?

| Gate | # of transistors           |
|------|----------------------------|
| NOT  | 2 ( $\neg A \rightarrow$ ) |
| AND  | 6 (Fig. 3)                 |
| OR   | 6 (Fig. 4)                 |
| NAND | 4 (Fig. 1)                 |
| NOR  | 4 (Fig. 2)                 |
| XOR  | 12                         |
| XNOR | 12                         |

Why do AND and OR require 6 transistors?

→ Note that AND and OR gates require

$$4 \text{ (NAND/NOR gate)} + 2 \text{ (inverter)} = 6 \text{ transistors!}$$

## Digital Logic Pt. 2

Why do we want to minimize transistor count when building logic gates?

→ Smaller amt. of transistors Reduces:

- Delay from input to output
- The area of the circuit taken up; this is good b/c we can pack more logic (i.e. transistors) in a given space.
- Power consumption
- The cost of the circuit

What is a logically equivalent circuit to an inverter?

→ The "bubble" on an AND or OR symbol:



6 (OR) + 2 (NOT) = 8 transistors

is equivalent to



4 (NOR) transistors

But obviously, one uses much less transistors than the other!



This is the better, conventional way to draw NAND



This is the better, conventional way to draw NOR

What is a logically equivalent circuit for NAND?

What about NOR?

How do we minimize transistor count on logic gate diagrams?

Example!

1. Replace AND and OR gates with NAND gates, or NOR gates.
2. Add bubbles to make the circuit logically equivalent.
3. Cancel out the free bubbles.
4. Draw the NAND and NOR gates in their conventional form.



1)



2)



Added inverter bubbles to  
make the circuit equivalent  
After replacing the AND and  
OR with NANDs

3)



These 2 cancel out  
These 2 cancel out

4)



Replace  $\ominus$  with  $\ominus$

## - Gates with more than 2 inputs -

What does the symbol for  
a 3-input AND gate look like?



OR (logical equivalent)



Symbol for 3-input OR gate?



OR (logical equivalent)



Symbol for 3-input NAND gate?



OR (logical equivalent)



Symbol for 3-input NOR gate?



OR (logical equivalent)



Symbol for 3-input XNOR gate?



OR (logical equivalent)



Symbol for 3-input XNOR  
gate?



OR (logical equivalent)



# The Laws of Boolean Algebra

What do we use boolean algebra for?

→ To create logically equivalent circuits that use fewer transistors ↗  
(MOTIVATION: Recall notes on Digital Logic pt. 2.)

What are boolean algebra laws?

→ Laws about boolean statements where:

- Letters ( $A, B, C$ , etc.) represent literals       $\cdot " + " = \text{OR}$
- $\cdot " \cdot " (\text{aka multiplication}) = \text{AND}$

→ Every Boolean alg law comes in a pair of statements: the AND version and the OR version.

What is the Identity Law?

→  $A + 0 = A$  (Any literal A or'd with 0 will always output A).

→  $A \cdot 1 = A$  (Any literal A and'd with 1 will always output A).

What is the Null Law?

→  $A + 1 = 1$  (A OR 1 always = 1)

→  $A \cdot 0 = 0$  (A AND 0 always = 0)

What is the Idempotent Law?

→  $A + A = A$  (A OR A = A)

→  $A \cdot A = A$  (A AND A = A)

What is an example of optimizing a circuit?

→ The equation  $y = AA + BC$

produces this circuit:

- it has  $6 + 6 + 6 = 18$  transistors



→ Using the idempotent law  $AA = A$ , we can reduce the equation to  $y = A + BC$ , producing the circuit

- it has  $6 + 6 = 12$  transistors



What is the Commutative Law?

→ By converting the gates into NAND gates (RECALL: Digital Logic 2), we can further reduce the circuit to 10 transistors:



What is the Associative Law?

→  $A + B = B + A$

→  $A \cdot B = B \cdot A$

→  $(A + B) + C = A + (B + C)$

→  $(A \cdot B) \cdot C = A \cdot (B \cdot C)$

→  $A \cdot (B + C) = (A \cdot B) + (A \cdot C)$

↳  $(A \text{ AND } (B \text{ OR } C)) = ((A \text{ AND } B) \text{ OR } (A \text{ AND } C))$

→  $A + (B \cdot C) = (A + B) \cdot (A + C)$

↳  $((A \text{ OR } B) \text{ AND } (A \text{ OR } C)) = ((A \text{ OR } B) \text{ AND } (A \text{ OR } C))$

What is the Distributive Law?

Proof for the distributive law (LDR version)?

→ Let's use the other laws to prove the truth of  $A + (B \cdot C) = (A+B) \cdot (A+C)$

1. Distribute out the terms: 
$$\begin{aligned} A + (BC) &= AA + AC + AB + BC \\ &\quad \downarrow \\ &= A + AC + AB + BC \\ &= A(\cancel{1} + C + B) + BC \\ &= A(\cancel{1}) + BC \\ &= A + BC \end{aligned}$$

What is DeMorgan's Theorem?

→ Used to simplify large NOT bars (e.g.  $\overline{A+B}$ )

→  $\overline{A+B} = \overline{A} \cdot \overline{B}$  (NOT (A OR B) = NOT(A) AND NOT(B))

→  $\overline{AB} = \overline{A} + \overline{B}$  (NOT (A AND B) = NOT(A) OR NOT(B))

→ This theorem is actually how we created the "conventional" NAND gate:



→ DeMorgan's: flipping the inputs ( $\overline{AB} \rightarrow AB$ ), operations ( $AB \rightarrow A+B$ ), and the outputs ( $A+B \rightarrow \overline{A} + \overline{B}$ )... thus  $\overline{AB} \rightarrow \overline{A} + \overline{B}$

→ When converting a truth table into a boolean equation!

1. Create an equation in "sum of products" form (RECALL)
2. Use the laws to simplify!

When would we need to use these laws for simplification?

# Karnaugh Maps (K Maps)

What are K Maps?

- A tool used to simplify boolean expressions
  - An alternate tool to using boolean algebra laws.
- K-Maps provide a graphical method to find simplified boolean expressions.
- Benefit: If used properly, K Maps always derive the most simplified equation! Unlike boolean alg, where you aren't always 100% sure if you've maximized simplification
- A diagram/grid that makes it easy to identify combinable terms.
- Each box in the K-map corresponds to a single row/entry in the truth table.

How do K Maps work?

Example of a K Map for

a 3-input Function?

Given

this

truth table:

| A | B | C | Y |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |

| A \ BC |   | B              |                |                |                |
|--------|---|----------------|----------------|----------------|----------------|
|        |   | 00             | 01             | 11             | 10             |
| 0      | 0 | R <sub>0</sub> | R <sub>1</sub> | R <sub>3</sub> | R <sub>2</sub> |
|        | 1 | R <sub>4</sub> | R <sub>5</sub> | R <sub>7</sub> | R <sub>6</sub> |

Because in Row 2,  
A=0, B=1, C=0

→ The "0" & "1" on y-axis indicate values of A

→ The "00", "01", "11", "10" are values of B & C, respectively.  
 $B=0 \downarrow$        $B=0 \downarrow$        $B=1 \downarrow$        $B=1 \downarrow$   
 $C=0 \quad C=0 \quad C=1 \quad C=0$

How do we fill in the K-Map?

→ With the Y (output) of each row!

| A \ BC |   | B  |    |    |    |
|--------|---|----|----|----|----|
|        |   | 00 | 01 | 11 | 10 |
| 0      | 0 | 0  | 1  | 1  | 0  |
|        | 1 | 0  | 0  | 0  | 0  |

What is gray code?

→ The ordering that we used for the B and C values in the ex above.

→ In Gray Code, successive values differ by only 1 bit - unlike binary:

Gray Code: 00, 01, 11, 10

Binary: 00, 01, 10, 11



eg L, R, above, or below

Why is gray code useful for K-Maps?

→ Each square in the K-Map differs from its adjacent squares by one literal:

| A \ BC |   | B  |    |    |    |
|--------|---|----|----|----|----|
|        |   | 00 | 01 | 11 | 10 |
| 0      | 0 | 0  | 1  | 1  | 0  |
|        | 1 | 0  | 0  | 0  | 0  |

• Entry 0 & Entry 1 both share " $\bar{A}\bar{B}$ ", and only differ by the value of C literal

• Entries 1 & 3 both differ only in their B literal

So how do you **use** a K-Map  
to find simplified Boolean  
equations?

Example of using a K-Map?

How do you **read** a  
grouped K-Map?

What are the **Grouping**  
**Rules** for K-Maps?

1. Make the K-Map "table", & fill in all squares (according to truth table)

|   |    | 00 | 01 | 11 | 10 |   |
|---|----|----|----|----|----|---|
|   |    | 0  | 0  | 1  | 1  | 0 |
|   |    | 1  | 0  | 0  | 0  | 0 |
| A | BC |    |    |    |    |   |

2. Group the terms together -- all terms where the entry is **1** must get grouped.

We'll learn later about how to create the groups.

|   |    | 00 | 01 | 11 | 10 |   |
|---|----|----|----|----|----|---|
|   |    | 0  | 0  | 1  | 1  | 0 |
|   |    | 1  | 0  | 0  | 0  | 0 |
| A | BC |    |    |    |    |   |

"Grouping" Entries 1 & 3

3. Find the literals that each group has in common, and "OR" them in the final eq.

• Entry 1:  $\bar{A} \bar{B} C$

• Entry 3:  $\bar{A} B C$

• Both entries share  $\bar{A}$  and  $C$ ! ANSW:  $Y = \bar{A} C$

|   |    | 00 | 01 | 11 | 10 |   |
|---|----|----|----|----|----|---|
|   |    | 0  | 0  | 0  | 0  | 0 |
|   |    | 1  | 1  | 1  | 1  | 0 |
| A | BC |    |    |    |    |   |

2. For each group, analyze each entry in the group. Extract the terms that all entries have in common (there should be 2?). AND these terms together

• Left group:  $A = 1, B = 0 \rightarrow A\bar{B}$

• Right group:  $A = 1, C = 1 \rightarrow A C$

2. OR the terms from step 1 together to create the final equation.

$$Y = A\bar{B} + A C$$

1. All squares in each group must contain only **1**s, and every cell containing a **1** must be in at least one group.

|   |   | 00 | 01 |
|---|---|----|----|
|   |   | 0  | 0  |
|   |   | 1  | 1  |
| A | B |    |    |

|   |   | 00 | 01 |
|---|---|----|----|
|   |   | 0  | 0  |
|   |   | 1  | 1  |
| A | B |    |    |

X

2. Groups may be horizontal or vertical, but NOT diagonal.

3. Groups must contain  $2^n$  cells, where  $n = 0, 1, 2, \dots$  aka, must be a power of 2. e.g. 1, 2, 4, 8, ... cells.

|   |    | 00 | 01 | 11 | 10 |  |
|---|----|----|----|----|----|--|
|   |    | 0  | 1  | 1  | 0  |  |
|   |    | 1  | 1  | 1  | 1  |  |
| A | BC |    |    |    |    |  |

|   |    | 00 | 01 | 11 | 10 |  |
|---|----|----|----|----|----|--|
|   |    | 0  | 0  | 0  | 0  |  |
|   |    | 1  | 1  | 1  | 0  |  |
| A | BC |    |    |    |    |  |

X

4. Each group must be as large as possible.

| ABC |    | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|----|
|     |    | 0  | 1  | 1  | 1  |
|     |    | 1  | 0  | 0  | 1  |
| A   | BC |    |    |    |    |

| ABC |    | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|----|
|     |    | 0  | 1  | 1  | 1  |
|     |    | 1  | 0  | 0  | 1  |
| A   | BC |    |    |    |    |

X

5. Groups may wrap around the table.

| ABC |    | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|----|
|     |    | 0  | 1  | 0  | 1  |
|     |    | 1  | 1  | 0  | 0  |
| A   | BC |    |    |    |    |

\* Here, the group is  $\bar{A}\bar{B}\bar{C}$ ,  $A\bar{B}\bar{C}$ ,  $\bar{A}B\bar{C}$ , and  $AB\bar{C}$   
 $(Y = \bar{C})$

6. When multiple groups, every group should contain at least one unique 1 (aka unique entry).

| ABC |    | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|----|
|     |    | 0  | 1  | 1  | 1  |
|     |    | 1  | 0  | 0  | 1  |
| A   | BC |    |    |    |    |

| ABC |    | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|----|
|     |    | 0  | 1  | 1  | 1  |
|     |    | 1  | 0  | 0  | 1  |
| A   | BC |    |    |    |    |

the bottom group  
is redundant

X

How do we create a K-Map  
From a truth table with 4  
variables?

→ Truth table :

| Row # | A | B | C | D | Y |
|-------|---|---|---|---|---|
| 0     | 0 | 0 | 0 | 0 | 0 |
| 1     | 0 | 0 | 0 | 1 | 0 |
| 2     | 0 | 0 | 1 | 0 | 1 |
| 3     | 0 | 0 | 1 | 1 | 1 |
| 4     | 0 | 1 | 0 | 0 | 1 |
| 5     | 0 | 1 | 0 | 1 | 1 |
| 6     | 0 | 1 | 1 | 0 | 0 |
| 7     | 0 | 1 | 1 | 1 | 1 |
| 8     | 1 | 0 | 0 | 0 | 0 |
| 9     | 1 | 0 | 0 | 1 | 0 |
| 10    | 1 | 0 | 1 | 0 | 1 |
| 11    | 1 | 0 | 1 | 1 | 0 |
| 12    | 1 | 1 | 0 | 0 | 1 |
| 13    | 1 | 1 | 0 | 1 | 1 |
| 14    | 1 | 1 | 1 | 0 | 0 |
| 15    | 1 | 1 | 1 | 1 | 0 |

→ K-Map (orange numbers indicate corresponding Row) :

| AB |    | 00 | 01 | 11 | 10 |
|----|----|----|----|----|----|
|    |    | 00 | 0  | 2  | 3  |
|    |    | 01 | 4  | 5  | 7  |
| CD |    |    |    |    |    |
| 00 | 00 | 0  | 2  | 3  | 2  |
| 01 | 01 | 4  | 5  | 7  | 6  |
| 11 | 11 | 12 | 13 | 15 | 14 |
| 10 | 10 | 8  | 9  | 11 | 10 |

using Gray code order so that adjacent cells differ by  
only 1 literal

this cell is 0110 ... therefore it corresponds to Row 6.

| AB |    | 00 | 01 | 11 | 10 |
|----|----|----|----|----|----|
|    |    | 00 | 1  | 0  | 0  |
|    |    | 01 | 1  | 1  | 0  |
| CD |    |    |    |    |    |
| 00 | 00 | 1  | 0  | 0  | 0  |
| 01 | 01 | 1  | 1  | 1  | 0  |
| 11 | 11 | 1  | 1  | 1  | 1  |
| 10 | 10 | 1  | 0  | 0  | 0  |

Example of reading a  
4-variable K-Map?

How do you put a boolean equation in **Sum of Products (SOP) Form?**

How do we simplify a boolean equation using a K-Map?

→ EXAMPLE:  $Y = ABC + AB\bar{C} + \bar{A}CB + ACB + \bar{A}\bar{D}\bar{C}\bar{D} + \bar{A}B + \bar{C}$

→ Step 1: Put the equation in SOP form.

$$ABC + AB\bar{C} + \bar{A}CB + ACB + \bar{A}\bar{D}\bar{C}\bar{D} + \bar{A}B + \bar{C}$$

( )      ↓      ↓      ↓

$$ABC + AB\bar{C} + \bar{A}B + \bar{C}B + ACB + AB + C + D + AC + \bar{B}C$$

→ Step 2: Fill out the K-Map.

$$Y = ABC + AB\bar{C} + \bar{A}B + AC + ACB + AB + C + D + AC + \bar{B}C$$

|  |  | CD | 00 | 01 | 11 | 10 |
|--|--|----|----|----|----|----|
|  |  | AB | 00 | 01 | 11 | 10 |
|  |  | 00 | 0  | 1  | 1  | 1  |
|  |  | 01 | 1  | 1  | 1  | 1  |
|  |  | 11 | 1  | 1  | 1  | 1  |
|  |  | 10 | 0  | 1  | 1  | 1  |

→ each term in the SOP-form equation can tell you where to place a 1 in the K-Map.  
e.g.  $AB\bar{C} \rightarrow A=1, B=1, C=0$  then  
 $Y=1 \dots$  entry 15 has a 1

→ Step 3: Group the entries & find the simplified equation.

|  |  | CD | 00 | 01 | 11 | 10 |
|--|--|----|----|----|----|----|
|  |  | AB | 00 | 01 | 11 | 10 |
|  |  | 00 | 0  | 1  | 1  | 1  |
|  |  | 01 | 1  | 1  | 1  | 1  |
|  |  | 11 | 1  | 1  | 1  | 1  |
|  |  | 10 | 0  | 1  | 1  | 1  |

• Group 1:  $D=1$

• Group 2:  $C=1$

• Group 3:  $B=1$

ANS:  $Y = B + C + D$

# Quiz 0 Review

## Logic Gates

| Name | Equation                    | Symbol                                                                                                                                                                 | Transistor Count |
|------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|
| AND  | $Y = A \cdot B$             |                                                                                       | 6                |
| OR   | $Y = A + B$                 |                                                                                       | 6                |
| NOT  | $Y = \bar{A}$               | $A \rightarrow \bar{A} \rightarrow Y$                                                                                                                                  | 2                |
| NAND | $Y = \bar{A} \cdot \bar{B}$ |  OR  | 4                |
| NOR  | $Y = \overline{A + B}$      |  OR  | 4                |
| XOR  | $Y = A \oplus B$            |                                                                                       | 12               |
| XNOR | $Y = \overline{A \oplus B}$ |                                                                                       | 12               |

## 2.11 Review

| Unit   | bits       | example  |
|--------|------------|----------|
| byte   | 8          | 00001101 |
| nibble | 4          | 1101     |
| Word   | 16 or 32 ? |          |

→ With  $n$  bits,

- you can represent  $2^n$  distinct values.

- 2's complement most negative num:  $-(2^{n-1})$

- LARGEST NUM:  $2^n - 1$

- 2's complement most positive num:  $(2^{n-1}) - 1$

→ LSB = RIGHTMOST 8 bits. Stored in the 1<sup>st</sup> mem address (little endian)

→ MSB = LEFTMOST 8 bits. Stored in the last mem address

$42,185 = 0b0000\ 0000\ 0000\ 0000\ 1010\ 0100\ 1100\ 1001$



→ Bits needed to address memory =  $\log_2 (\text{size in bytes of total memory})$

| Address | Value     |
|---------|-----------|
| 0       | 1100 1001 |
| 1       | 1010 0100 |
| 2       | 0000 0000 |
| 3       | 0000 0000 |

# Quiz 0 Review

## 21) Review

| Decimal | 0     | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11    |
|---------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Binary  | 00000 | 00001 | 00100 | 00110 | 01000 | 01010 | 01100 | 01110 | 10000 | 10010 | 10100 | 10110 |
| Hex     | 0x0   | 0x1   | 0x2   | 0x3   | 0x4   | 0x5   | 0x6   | 0x7   | 0x8   | 0x9   | 0xA   | 0xB   |

| Decimal | 12   | 13   | 14   | 15   | 16    | 17    | 18    | 19    |
|---------|------|------|------|------|-------|-------|-------|-------|
| Binary  | 1100 | 1101 | 1110 | 1111 | 10000 | 10001 | 10010 | 10011 |
| Hex     | 0xC  | 0xD  | 0xE  | 0xF  | 0x10  | 0x11  | 0x12  | 0x13  |

→ Converting decimal to hex:

1. Fill out this chart  $\overline{4096} \ \overline{256} \ \overline{16} \ \overline{1}$ , where every blank can have a max value of F (aka 11)

Ex  $50 \rightarrow \underline{0} \ \underline{0} \ \underline{3} \ \underline{2}$  because  $3(16) + 2(1) = 50$

→ Finding address of element in array: (`index [ei] x sizeof(element type)`) + base addr

Ex • Element 10 in int array ; base addr = `0x0000300C`

- $10 \times \text{sizeof(int)} = 4 = 40$
- Add decimal num from step 1 to decimal version of last relevant digits w/ base address:

- $0C \rightarrow 12$
- $40 + 12 = 52$

- Convert ans to decimal:  $52 \rightarrow 16(3) + 1(4) \rightarrow 0x34$

- Add result to base addr (replacing last digits w/ result):

$$\begin{array}{r}
 0x0000\ 300\textcolor{yellow}{0} \\
 + 0x0000\ 0034 \\
 \hline
 0x0000\ 3034 \quad // 
 \end{array}$$

## Transistors

→ nMOS :  $V_g$  low = open

$V_g$  high = closed = ON

→ pMOS :  $V_g$  low = closed = ON

$V_g$  high = open

nMOS → "normal" MOS →

The normal thinking would be that "high" power results in it flowing through. Therefore, in nMOS,  $V_g$  high  $\Rightarrow$  circuit closed  $\Rightarrow$  power on.

## Multiplexers

What is a 2:1 multiplexer?  
(2:1 mux)

What is the 2:1 mux symbol?

Example of a regular schematic  
for an if-statement?

How could we replace this  
diagram with a 2:1 mux?

What is a 4:1 multiplexer?  
(4:1 mux)

How does the 4:1 mux diagram  
look?

How can you implement the  
same truth table with  
a 4:1 mux made out of 2  
2:1 muxes?

→ A circuit that chooses an output from among 2 inputs, based on the value of a select signal.

→ Multiplexers are useful for implementing if-statements.



These labels tell us how the mux chooses the output based on the value of the select signal

Statement

```
if (S == 0) : Y = A
else if (S == 1) : Y = B
```

Diagram



Truth Table

| S | A | B | Y |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

SOP Equation

$$\begin{aligned}
 Y &= \bar{S}A\bar{B} + \bar{S}AB + S\bar{A}B + \\
 &\quad SAB \\
 &= A\bar{S}(\bar{B}+B) + BS(\bar{A}+A) \\
 &= A\bar{S} + BS
 \end{aligned}$$



- When  $S=0$ , it signals output  $= Y = A$
- When  $S=1$ , it signals output  $= Y = B$

→ This mux diagram replaces/avoids having to draw that bigger more complicated gate diagram.

→ A circuit that chooses an output from among 4 inputs, based on the value of a select signal.

→ Since we need 4 signaling options,  $S$  must be a 2-bit wire that can send a 2-bit signal ( $n=2$ ,  $2^n = 4$  "bit combinations")

→ EX: For the Truth Table:



| S <sub>1</sub> | S <sub>0</sub> | Y |
|----------------|----------------|---|
| 0              | 0              | A |
| 0              | 1              | B |
| 1              | 0              | C |
| 1              | 1              | D |

- When  $S=0000$ ,  $Y = A$
- When  $S=0001$ ,  $Y = B$
- When  $S=0010$ ,  $Y = C$
- When  $S=0011$ ,  $Y = D$



# Definitions in Digital Logic

What is a literal?

→ A single variable. May be complemented.

→ eg  $A, B, \bar{A}$

What is a product term?

→ An AND of individual literals.

→ eg  $A\bar{B}C$ . But NOT  $\bar{ABC}$  (because " $\bar{BC}$ " isn't a literal)

→ A product term in which all variables appear once.

→ eg  $\bar{A}\bar{B}\bar{C}, \bar{ABC}, \bar{ABC}$ , or  $ABC$

→ but NOT  $A, \bar{A}C, BC$ , etc.

How do you derive an equation using minterms?

Truth Table

| A | B | C | F |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

Minterm

Minterm name

|                         |       |
|-------------------------|-------|
| $\bar{A}\bar{B}\bar{C}$ | $m_0$ |
| $\bar{A}\bar{B}C$       | $m_1$ |
| $\bar{A}BC$             | $m_2$ |
| $ABC$                   | $m_3$ |
| $A\bar{B}\bar{C}$       | $m_4$ |
| $A\bar{B}C$             | $m_5$ |
| $AB\bar{C}$             | $m_6$ |
| $ABC$                   | $m_7$ |

1. Write the minterm for each row.

2. Take the sum of the minterms for rows where output = 1

Equation

$$F = \bar{A}\bar{B}\bar{C} + \bar{A}\bar{B}C + A\bar{B}\bar{C} + ABC$$

OR

$$F(A,B,C) = \sum(m_0, m_2, m_3, m_7)$$

What is sum of products?

→ When 2 or more product terms are summed (OR) together.

$$\rightarrow \text{e.g. } Y = AB + A\bar{C}$$

$$\rightarrow \text{but NOT } Y = (A+B)(C+A)$$

→ An SOP form in which each product contains all literals, e.g.

$$F = \bar{A}\bar{B}\bar{C} + \bar{A}BC + A\bar{B}C + ABC$$

→ When you use Boolean algebra to simplify the canonical SOP Form, e.g.

$$F = \bar{A}\bar{B}\bar{C} + C(\bar{A}B + A\bar{C} + B) \Rightarrow \bar{A}\bar{B}\bar{C} + C(\bar{A}B + A)$$

$$\Rightarrow \bar{B}(A\bar{C}) + AC + AB \Rightarrow \bar{A}\bar{C} + AC$$

What is a maxterm?

→ A term in which all variables appear once, as literals OR'd together

$$\rightarrow \text{eg } \bar{A} + B + \bar{C}$$

→ To write the maxterm for a truth table row, sum the complement of each literal's value together.

$$\rightarrow \text{Eg: } A \ B \ C \ Y$$

|   |   |   |   |
|---|---|---|---|
| 0 | 1 | 0 | 1 |
|---|---|---|---|

$$\rightarrow A + \bar{B} + C = Y$$

How do you derive an equation using minterms?

1. Write the minterm for each row

2. Take the product of the minterms for rows where output = 0

Equation

$$F = (A + B + \bar{C})(A + \bar{B} + \bar{C})(\bar{A} + B + C)(\bar{A} + \bar{B} + C)$$

OR

$$F = \prod(M_1, M_3, M_4, M_6)$$

Why does this derived equation work?

Truth Table

| A | B | C | F |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

| Maxterm                       | Minterm name |
|-------------------------------|--------------|
| $\bar{A} + \bar{B} + \bar{C}$ | $M_0$        |
| $A + B + \bar{C}$             | $M_1$        |
| $A + \bar{B} + C$             | $M_2$        |
| $A + \bar{B} + \bar{C}$       | $M_3$        |
| $\bar{A} + B + C$             | $M_4$        |
| $\bar{A} + B + \bar{C}$       | $M_5$        |
| $\bar{A} + \bar{B} + C$       | $M_6$        |
| $\bar{A} + \bar{B} + \bar{C}$ | $M_7$        |

→ Creating the canonical SOP equation for  $\bar{F}$

$$\bar{F} = \bar{A}\bar{B}C + \bar{A}BC + A\bar{B}\bar{C} + AB\bar{C}$$

And negating both sides + applying DeMorgan's

$$F = \bar{A}\bar{B}C + \bar{A}BC + A\bar{B}\bar{C} + AB\bar{C}$$

$$F = (A + B + \bar{C})(A + \bar{B} + \bar{C})(\bar{A} + B + C)(\bar{A} + \bar{B} + C)$$

Actually yields the same term!

## K-Map Definitions

What is an **implicant**?

- Any product term (RECALL:  $AB$ ,  $\bar{B}C$ , etc) whose output for a given Boolean equation is **1**.
- aka, in a K-Map, the terms that define any group of **1s**.

Example?

|  |  | BC | 00 | 01 | 11 | 10 |
|--|--|----|----|----|----|----|
|  |  | A  | 0  | 0  | 0  | 0  |
|  |  |    | 0  | 0  | 1  | 0  |
|  |  |    | 0  | 0  | 1  | 0  |
|  |  |    | 1  | 0  | 1  | 0  |

Implicants

$\bar{A}BC$        $BC$   
 $A\bar{B}C$        $AC$   
 $ABC$

What is a prime implicant?

- An implicant that is not a subset of any other implicant.
- aka, in a K-Map, an implicant that corresponds to a group which can **NOT** be covered by any other group.

Example?

|  |  | CD | 00 | 01 | 11 | 10 |
|--|--|----|----|----|----|----|
|  |  | AB | 00 | 1  | 1  | 0  |
|  |  |    | 00 | 1  | 1  | 0  |
|  |  |    | 00 | 1  | 1  | 0  |
|  |  |    | 01 | 0  | 1  | 0  |
|  |  |    | 11 | 1  | 1  | 0  |
|  |  |    | 10 | 1  | 1  | 0  |

- red = implicants
- prime implicants:

$\bar{C}D$ ,  $\bar{B}\bar{C}$

What is an **essential prime implicant**?

- A prime implicant where :
- At least 1 element is not covered by 1 or more other prime implicants.
- aka, in a K-Map, a group that is necessary to use in the final solution to cover all 1s.

Example?

|  |  | CD | 00 | 01 | 11 | 10 |   |
|--|--|----|----|----|----|----|---|
|  |  | AB | 00 | 1  | 1  | 0  | 0 |
|  |  |    | 00 | 1  | 1  | 0  | 0 |
|  |  |    | 00 | 1  | 1  | 0  | 0 |
|  |  |    | 01 | 0  | 1  | 1  | 0 |
|  |  |    | 11 | 1  | 1  | 1  | 0 |
|  |  |    | 10 | 1  | 1  | 0  | 0 |

- red → essential prime implicants , aka the final groups (in this case)
- orange → non-essential prime implicant.

What is an **non-essential prime implicant**?

→ A prime implicant that :

- contains NO elements which can't be covered by another prime implicant group

What is the formal procedure for using K-Maps to derive equations?

- Convert truth table to K-Map.

|    | AB \ CD | 00 | 01  | 11 | 10 |
|----|---------|----|-----|----|----|
| 00 | 0 0     | 1  | 1   |    |    |
| 01 | 1 1     | 1  | 1   |    |    |
| 11 |         | 1  | 0 0 |    |    |
| 10 | 0 0     | 1  | 1   |    |    |

- Include all essential prime implicants.

$$B\bar{C}, \bar{B}C$$

AB \ CD

|    | AB \ CD | 00 | 01 | 11  | 10 |
|----|---------|----|----|-----|----|
| 00 | 0 0     | 1  | 1  |     |    |
| 01 | 1 1     | 1  | 1  |     |    |
| 11 |         | 1  | 1  | 0 0 |    |
| 10 | 0 0     | 1  | 1  |     |    |

- Include non-essential prime implicants as needed to cover all 1s.

|    | AB \ CD | 00 | 01 | 11  | 10 |
|----|---------|----|----|-----|----|
| 00 | 0 0     | 1  | 1  |     |    |
| 01 | 1 1     | 1  | 1  |     |    |
| 11 |         | 1  | 1  | 0 0 |    |
| 10 | 0 0     | 1  | 1  |     |    |

- choosing either of the non-essential primes (green or orange) will give us the most simplified equation.

$$Y = B\bar{C} + \bar{B}C + \bar{A}L = B\bar{C} + C(\bar{B} + \bar{A})$$

OR

$$Y = B\bar{C} + \bar{B}C + \bar{A}B = \bar{B}C + B(\bar{A} + \bar{C})$$

## Lab 0

What is a tunnel?

Example?

→ A way to draw an "invisible wire" to bind 2 points together

→ Tunnels are grouped by case sensitive labels

→ Normal circuit:



→ With tunnels:



## Adder/Subtractor

How can we label the parts of a binary addition operation?

$$\begin{array}{r}
 \boxed{1} \quad \boxed{1} \quad \boxed{1} \\
 + \quad \boxed{1} \quad \boxed{0} \quad \boxed{1} \\
 \hline
 \boxed{1} \quad \boxed{0} \quad \boxed{0}
 \end{array}$$

1. Column 0

2. Column 0 sum ( $S_0$ )

3. Carry-out from column 0 ( $C_{o,0}$ ); Carry-in to column 1 ( $C_{in,1}$ )

4. Column 1 sum ( $S_1$ )

5. Carry-out column 1 ( $C_{o,1}$ ); Carry-in to column 2 ( $C_{in,2}$ )

6. Column 2 sum ( $S_2$ )

7. Carry-out from column 2 ( $C_{o,2}$ )

→ A circuit that can add together 2 one-bit values

→ It can add the 2 bits in column 0 and output 2 things:

- The sum  $S$

- The carry-out bit  $C_0$



What is a half-adder circuit?

What is the truth table and equation ( $\leftrightarrow$ ) for half adder?

| A | B | $C_0$ | S |
|---|---|-------|---|
| 0 | 0 | 0     | 0 |
| 0 | 1 | 0     | 1 |
| 1 | 0 | 0     | 1 |
| 1 | 1 | 1     | 0 |

$$S = \overline{A}B + A\overline{B} \approx A \oplus B \text{ (exclusive OR)}$$

$$C_0 = AB$$

What is the circuit diagram?



What is a full adder circuit?

→ A circuit that can add together three one-bit values and output the sum  $S$  and the CO-bit  $C_o$ .

→ This circuit can be used to add the values of a column of binary addition when there was a CO-bit, e.g. column 1:

$$\begin{array}{r}
 \boxed{1} \quad \boxed{1} \quad \boxed{1} \\
 + \quad \boxed{1} \quad \boxed{0} \quad \boxed{1} \\
 \hline
 \boxed{1} \quad \boxed{1} \quad \boxed{0}
 \end{array}$$



→ The 3 inputs of the addition in this ex are A, B, and  $C_i$  (the CO-bit from previous column).

| A | B | $C_i$ | $C_o$ | S |
|---|---|-------|-------|---|
| 0 | 0 | 0     | 0     | 0 |
| 0 | 0 | 1     | 0     | 1 |
| 0 | 1 | 0     | 0     | 1 |
| 0 | 1 | 1     | 1     | 0 |
| 1 | 0 | 0     | 0     | 1 |
| 1 | 0 | 1     | 1     | 0 |
| 1 | 1 | 0     | 0     | 0 |
| 1 | 1 | 1     | 1     | 1 |

\* Notice that when an odd # of inputs = 1 (1 or 3 inputs),  $S=1$ .

A 3-input XOR can be used to represent this

$$S = A \oplus B \oplus C_i$$

$$C_o = AC_i + AB + BC_i \quad (\text{K-Map used})$$

What is the truth table & derived equation?

What would the full adder diagram look like?



How can we create a circuit to add 2 4-bit values?

(4-bit Adder)

→ For  $C_{0,3} C_{0,2} C_{0,1} C_{0,0}$ , we can use 4 full-adders to add together  $A_3 A_2 A_1 A_0$  and  $B_3 B_2 B_1 B_0$ .



How do we create a circuit for binary subtraction?

→ RECALL: With 2's complement,  $A - B \approx A + (-B)$ . To binary subtract 2 numbers A and B, we have to negate B and then add it to A.

→ Instead of building a separate circuit, we can modify the ripple carry adder (diagram above) s.t. it can perform subtraction AND addition.  
• How? By modifying it s.t. it can negate B.

→ RECALL: to negate a binary num, negate each bit & then add 1

→ Solution: negate each input of B, and send in  $K=1$  as the carry-in for column 0:



How can we modify the ripple carry adder to create the subtractor circuit?

→ Goal: Same circuit to perform both addition and subtraction

→ Solution: Let  $C_{in,0} = K$ , and use a 2:1 mux with  $K$  as the signal, to determine whether to send  $B$  or  $\bar{B}$  to the full adder!

• if  $K=0$ : binary addition, 2:1 mux outputs  $B$

• if  $K=1$ : binary subtraction, 2:1 mux outputs  $\bar{B}$



How do we create an adder-subtractor circuit?

What is an alternate way  
to implement Adder-Subtractor?

→ Using XOR instead of a 2:1 mux.

• RECALL:  $N \oplus 1 = \bar{N}$        $N \oplus 0 = N$  (boolean rules)

→ If we XOR  $B_i$  with K,

• when  $K=0$  (addition),  $B_i \oplus K = B_i$

• when  $K=1$  (subtraction),  $B_i \oplus K = \bar{B}_i$

How would this diagram look?



## Comparator and Shifter

What is an equality

comparator circuit?

How would you design a circuit to compare 4-bit inputs and output 1 if they are equal?

How do we design a circuit to perform a left-shift on a 4-bit input?

What would be the truth table?

The diagram?

→ A circuit that determines whether the 2 inputs are equal



→ Notice that when comparing 2 1-bit inputs,  $\overline{A \oplus B}$  ( $XNOR$ ) = 1 when  $A=B$  and 0 when  $A \neq B$ :



→ Alternatively, we can save transistors by converting the AND gate to a NOR gate and subsequently turning the XNOR gates into XOR gates.

→ Since  $n=4$ , shifting an input A by more than  $n-1=3$  bits will always produce 000000. So we only need to support the  $A \ll 1$ ,  $A \ll 2$ , and  $A \ll 3$  operations in our circuit.

$$\text{Ex: } A = 0b1101$$

$$A \ll 1 = 0b10$$

$$A \ll 3 = 1000$$

$$A \ll 2 = 0100$$

$$A \ll 4 = 0000$$

→ Solution: Have a 2-bit "shift" input that acts as a select signal to the output Y. Its 2 bits, so can represent up to 4 values.



→ Let  $A_3, A_2, A_1, A_0$  depict the value of A, and same for Y.

Output Value

| Shift Amount |  | $Y_3$ | $Y_2$ | $Y_1$ | $Y_0$ |
|--------------|--|-------|-------|-------|-------|
| 0            |  | $A_3$ | $A_2$ | $A_1$ | $A_0$ |
| 1            |  | $A_2$ | $A_1$ | $A_0$ | 0     |
| 2            |  | $A_1$ | $A_0$ | 0     | 0     |
| 3            |  | $A_0$ | 0     | 0     | 0     |



## Arithmetic Logic Unit (ALU)

What is a boolean unit?

- A circuit/logic unit that can perform 4 boolean operations (AND, OR, XOR, NOR)
- Encompasses the logic for all 4 operations, and uses a 2-bit control signal (RECALL: multiplexers!) to tell the circuit which operation to perform.



How does the Boolean Unit operate?

- The value of the bool select signal indicates which operation to perform:

| Bool | Operation | Expression       |
|------|-----------|------------------|
| 00   | AND       | $AB$             |
| 01   | OR        | $A+B$            |
| 10   | XOR       | $A \oplus B$     |
| 11   | NOR       | $\overline{A+B}$ |

What is the diagram for a boolean unit?

- The unit will perform all of the operations, and then output the one that was requested based on the value of "bool".



What is the Bidirectional Shifter?

- A circuit/logic unit that can perform 3 shift operations: left shift, logical right shift, or arithmetic right shift.

- Encompasses the logic for all 3 operations, and uses a 2-bit control signal to tell the circuit which operation to perform.

RECALL: What is a "logical" vs "arithmetic" right shift?

| Operation              | Expression | Meaning                                                                                                                                                                   | Example            |
|------------------------|------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|
| logical right shift    | $A >> B$   | Shift B to the right by A bits, and pad the left side with 0s                                                                                                             | $1100 >> 2 =$<br>  |
| arithmetic right shift | $A >>> B$  | Shift B to the right by A bits, and pad the left side with the MSB of B!<br>• If MSB(B)=1, B is neg., so we pad with 1s.<br>• If MSB(B)=0, then $B \ll A \approx B \gg A$ | $1100 >>> 2 =$<br> |

How does the bidirectional shifter operate?

→ The 2-bit select signal "Bool" indicates which operation to perform:



What is the add/sub?

→ RECALL: The adder/subtractor logic unit!

→ Uses a 1-bit control signal to tell the circuit which operation to perform.

| Sub | Operation | Expression |
|-----|-----------|------------|
| 0   | Add       | $A + B$    |
| 1   | Sub       | $A - B$    |



What is an Arithmetic Logic Unit?

What are "Shift" and "Math"?

→ A circuit/logic unit that can perform add/sub, bidirectional shift, AND boolean operations!

→ The value of the Shift and Math select signals are used to indicate which operation to perform.

- Shift (2-bit): if Shift = 1, perform a shift operation based on the value of Bool. If Shift = 0, perform a boolean operation based on the value of Bool.
- Math (1-bit): if Math = 1, perform an add/sub op. based on the value of Sub. if Math = 0, perform a boolean or shift op. based on the values of Shift and Bool.

Table of all operations in the ALU?

| Math | Shift | Sub | Bool | Operation  |
|------|-------|-----|------|------------|
| 1    | X     | 0   | XX   | $A + B$    |
| 1    | X     | 1   | XX   | $A - B$    |
| 0    | 1     | X   | 00   | $B \ll A$  |
| 0    | 1     | X   | 10   | $B \gg A$  |
| 0    | 1     | X   | 11   | $B \gg> A$ |
| 0    | 0     | X   | 00   | A AND B    |
| 0    | 0     | X   | 01   | A OR B     |
| 0    | 0     | X   | 10   | A XOR B    |
| 0    | 0     | X   | 11   | A NOR B    |

{ when Math=1, we don't care abt the values of Shift and Bool; we just need to know if doing ADD or SUB }

{ when Math=0, we don't care abt the value of Sub. }

What is the ALU diagram?



What is the Z Flag?

→ A component of the ALU that outputs logic 1 if & only if the Result = 0



## Waveform Diagrams

What are waveform diagrams?

How does a waveform look?

What are rising and falling edges?

Example of a waveform diagram?

→ A way to represent the values of our signals over time.

→ Each one-bit signal will have its own waveform.

→ Like a square wave, where a low value  $\approx$  logic 0 and a high value  $\approx$  logic 1:



→ Rising edge: When a signal "rises" from logic low (0) to logic high (1).

→ Falling edge: When a signal "falls" from logic high (1) to logic low (0).

→ Take the following circuit:

- At time = 0 ps, we will

set input  $A=0$  and  $B=1$



- At time  $t=40$  ps, we will change input  $A$  to  $A=1$ .



What is propagation delay?

→ The time taken for a signal to travel from input to output

- it doesn't happen immediately, as we've been assuming thus far.

→ REASON: Time for electricity to travel through a wire; capacitors charging & discharging, etc.

→ Denoted:  $t_{pd}$  = time of propagation delay.

→ When we change an input (e.g. A or B), the change in the output's waveform diagram won't be reflected immediately - b/c first there is a delay of  $\approx$  picoseconds.

→ The exact value of  $t_{pd}$  for different logic gates will be provided by Gecce; don't need to calculate.

How does propagation delay affect our waveform diagrams?

Example of computing propagation delay?

→ Let  $t_{pd}(\text{NOT gate}) = 10$  and  $t_{pd}(\text{OR gate}) = 20$ .



→ If the value of input A changes at  $t = n$ , the value won't be inverted until  $n + 10 \text{ ps}$ .

→ If the value of input A changes at  $t = n$ , the value of Y won't change until

$$n + 10 \text{ ps} (\text{NOT gate}) + 20 \text{ ps} (\text{OR gate}) = n + 30 \text{ ps}$$

• Therefore, delay from A  $\rightarrow$  Y = 30 ps

→ Delay from B to Y = 20 ps

→ Given: A and B start at 0. at  $t = 40 \text{ ps}$ , we will set A = 1.

→ To visualize the p.d., we also track a "point C" placed right after the inverter.



How do we draw the waveform diagram for this circuit?

What is the longest combinational path?

→ Given a circuit and the  $t_{pd}$  values for each logic gate, the LCP is the path from an input to Y that has the greatest propagation delay.

→ Ex



→ LCP = A  $\rightarrow$  Y :

$$15 \text{ ps} + 20 \text{ ps} + 40 \text{ ps} = 75 \text{ ps}$$

→ Shortest path = B  $\rightarrow$  Y: 20 ps + 40 ps = 60 ps

| Gate | $t_{pd}(\text{ps})$ |
|------|---------------------|
| NOT  | 15                  |
| NAND | 20                  |
| AND  | 30                  |
| OR   | 40                  |

## Flip - Flops

What is a D Flip Flop?

- A digital electronic circuit that is used to delay ("D" = delay) the change of state of its output signal using clock timing.

→ Diagram:



- D = the value we want to store
- Q = the stored value (from D) at any given point.
- $\bar{Q}$  = inverted stored value
- C = the Clock; determines when D is stored.

What is a clock?

- A one-bit signal that oscillates (or "toggles") back and forth between 0 and 1 at a constant/consistent pace (e.g. "Clock period = 100 ps")

- The purpose of the clock is to ensure that our circuits stay in sync.
- Recall Calculus: The period of a clock is the time between one rising edge and the next:



- The period is how we define/quantify the speed of a clock; e.g. "A clock with clock period = 700 ps".

- The period determines how fast our circuit runs; the shorter the period, the faster our circuit.

### 1. Positive-Edge triggered:

- Q stores the value of D when the clock goes from 0 → 1
- Q updates on every rising edge of the clock

### 2. Negative-Edge triggered:

- Q stores the value of D when the clock goes from 1 → 0
- Q updates on every falling edge of the clock

- Basically, we have 1 or more inputs and some logic circuit that outputs a value D based on the input(s) — e.g., a NAND circuit, a full adder, an ALU, etc.

- The Flip-Flop adds a variable Q which starts at some specified value (e.g. 0) and then, every time the clock has a rising edge — aka every p seconds, where p = the period length — the value of Q is updated to equal the value of D at that point in time.



What are the 2 types of D Flip-Flops?

How does a positive-edge D flip-flop work?

Example of a waveform diagram for a pos. edge-triggered flip flop?

What is Clock-to-Q delay?

Example?

- The amount of time that it takes for the "update" of D's value to appear on the output (Q) after the clock trigger.
- AKA, the propagation delay for Q — it doesn't immediately update when CLK goes from 0 → 1 ... there is a  $t_{pd}$  in between.
- The same waveform diagram from the previous ex, but with  $\text{clk-to-delay} = 1\text{ns}$ :



### Registers Motivation: Back-to-back additions

How would we perform back-to-back additions with one adder circuit?

- Let  $t_{pd}$  for our adder = 600 ps — meaning that after we input A and B, it will take 600 ps for a value and a carry-out bit to be output.
- Ex: Performing  $7+7$ ,  $18+7$ , and  $30+8$ , back-to-back.



What are the steps to doing this?

1. At time  $t=0\text{ps}$ , set  $A=7$  and  $B=7$ . Initially,  $S$  and  $C_0=0$ .
2. We can't start addition #2 until the circuit outputs  $S=14$  —  $600\text{ps}$  b/c of  $t_{pd}$ .  
At  $t=600\text{ps}$ , we can now change  $A=18$  and  $B=7$ .
3. At  $t=1200\text{ps}$ ,  $S$  has updated to  $=25$ . Now we are ready to perform the next op. so we set  $A=30$  and  $B=8$ .
4. Once we are done w/ this last one, we can set  $A=D$  and  $B=0$ .

| Time (ps) | A  | B | S  | $C_0$ |
|-----------|----|---|----|-------|
| 0         | 7  | 7 | 0  | 0     |
| 600       | 18 | 7 | 14 | 0     |
| 1200      | 30 | 8 | 25 | 0     |
| 1800      | 0  | 0 | 38 | 0     |

→ In total, performing these 3 operations took  $1800\text{ps}$ .

## - Registers -

What is a register?

→ A combination of multiple Flip Flops together

→ Used for storing multi-bit values



→ Ex: A 2-bit register

Why do we use registers?

→ If we wanted to do back-to-back operations like in the prev. addition example, we can't just manually set the input values every 600 ps.

- Instead, we use a **Clock** and we use registers to update the inputs at the proper times.

What is an N-bit

positive-edge triggered register?



• "en" = Enable

• if en=1, the register will read D on the rising edge of the clock (and store it in Q). If en=0, the register will not read D.

Ex: How would we use registers to perform the additions?



How would the timing work?

→ Each register's stored value will update on the rising edge of the clock.

This means that when we begin (e.g. 0 ps), the 1<sup>st</sup> reg will have nothing, and A & B will be storing "7" and "7".

- Upon the first rising edge (+ the CLK-to-Q delay), the register will output the vals of A and B that they store - meaning we can begin the 1<sup>st</sup> computation.

→ Due to the schedule of when the registers update, we can perform one addition per clock cycle.

→ We should set the length of the clock cycle s.t. each computation can be done in 1 cycle:



# Timing

Example of performing back-to-back additions with registers?



→ In this ex, we will perform  $30 + 8$ ,  $18 + 7$ , and  $7 + 7$ . We will complete one addition per clock cycle.

→ All values = 0

→ On the Rising Edge, the registers will update to hold values for A and B:

- $\text{next\_A} = 30$
- $\text{next\_B} = 8$

→ After the CLK-to-Q delay, the values stored in registers will get stored in Q:

- $\text{curr\_A} = 30$
- $\text{curr\_B} = 8$
- $\text{curr\_sum} = 38$
- $\text{prev\_sum} = 0$

→ After the adder propagation delay (Add pd), the adder will output the sum of Q(A) and Q(B):

- $\text{next\_A} = 30$
- $\text{next\_B} = 8$
- $\text{curr\_A} = 30$
- $\text{curr\_B} = 8$
- $\text{curr\_sum} = 38$
- $\text{prev\_sum} = 0$

→ The sum will not be sent to output Y until the next rising edge (plus CLK-to-Q delay) because curr\_sum first has to go through that final register.

→ Cycle 1 begins on the next Rising Edge of the clock, when the new operands are sent in:

- $\text{next\_A} = 18$
- $\text{next\_B} = 7$
- $\text{curr\_A} = 30$
- $\text{curr\_B} = 8$
- $\text{curr\_sum} = 38$
- $\text{prev\_sum} = 0$

→ After CLK-to-Q delay:

- $\text{next\_A} = 18$
- $\text{next\_B} = 18$
- $\text{curr\_A} = 18$
- $\text{curr\_B} = 7$
- $\text{curr\_sum} = 38$
- $\text{prev\_sum} = 38$

(Now, the value of the first add operation is "available")

→ After Adder pd:

- $\text{next\_A} = 18$
- $\text{next\_B} = 7$  (the new sum is output by the adder)
- $\text{curr\_A} = 18$
- $\text{curr\_B} = 7$
- $\text{curr\_sum} = 25$
- $\text{prev\_sum} = 38$

What happens in Cycle 2?

→ On next Rising edge:

- next-A = 7
- curr-A = 18
- curr-sum = 25
- next-B = 7
- curr-B = 7
- prev-sum = 38

→ After clk-to-q delay:

- next-A = 7
- curr-A = 7
- curr-sum = 25
- next-B = 7
- curr-B = 7
- prev-sum = 25

→ After Adder pd:

- next-A = 7
- curr-A = 7
- curr-sum = 14
- next-B = 7
- curr-B = 7
- prev-sum = 25

→ After the next rising edge & clk-to-q delay, the sum  $7+7=14$  will be sent to Y. i.e., prev-sum = 14.

→ We can adjust the length of the clock cycle - aka time between one rising edge and the next - based on our needs! It's up to us.

→ There is a minimum clock period at which the circuit will work properly, so our clock period can be set to any value  $\geq$  min. clock period.

→ The most optimized (i.e. shortest) time within which the circuit can complete one operation.

• We don't include the clk-to-q delay of the final output register when calculating the min. clock period.

→ Min clock period = clk-to-q delay of the input registers + adder propagation delay

↳ but the longest path has 1 adder

→ Min clock period = clk-to-q delay + longest combinational delay

→ The length (in times) of the longest path between 2 registers.

• AKA longest combinational path

→ Basically, the amt of times it takes for each operation unit in a path between 2 registers

→ Does NOT include the clk-to-q delay time of the registers.

→ 1/min clock period

What is the max clock frequency?

# Pipelining

What is a single-cycle implementation?

What is pipelining?

Example of a single-cycle circuit?

- An implementation of a circuit such that it completes one operation in a single clock cycle.
- Breaking our circuit down into multiple stages that can operate simultaneously in order to make more efficient use of our hardware.

• We create "stages" by adding registers.



→ This circuit performs  $Y = A + B + C + D$

→ Let clk-to-q delay = 20ps & adder pd = 480ps

→ The longest path btwn 2 registers has 2 adders, so  $480 + 480 = 960\text{ps}$

→ 20ps (clk-to-q delay) + 960ps (LCD) = 980ps

→ If we run this circuit at the min clock period, it will take  $980\text{ps} \times 10 = 9800\text{ps}$  to complete 10 operations.

Ex: What is the longest combinational delay?

Ex: What is the min clock period?

How do we implement pipelining on this circuit?



How are these 2 circuits different?

- The single-cycle impl. performs  $A + B$  and  $C + D$  simultaneously, then immediately does the sum of the 2.
- The pipelined impl. performs  $A + B$  and  $C + D$  simultaneously, but then sends the outputs into 2 registers.

Ex: How does the pipelined circuit work?

→ Given: CLK-to-q delay = 20ps and Adder pd = 480ps  
→ We will set our clock period to 500ps & do three operations, starting at t=0 ps.

1.  $10 + 9 + 8 + 7$
2.  $5 + 4 + 3 + 2$
3.  $10 + 10 + 20 + 20$



What will happen at... .

$t = 20\text{ps}$ ?

→ At the beginning, nA, nB, nC, nD will hold the operands for the 1<sup>st</sup> addition.

→ The registers will update to reflect the input values:

- nA, CA = 10
- nC, CC = 8
- nB, CB = 9
- nD, CD = 7

$t = 500\text{ps}$ ?

→ After adder pd, the outputs are updated.

→ Additionally, since we broke our circuit into 2 stages of addition, we can now start operation 2!

- |          |           |            |
|----------|-----------|------------|
| • nA = 5 | • CA = 10 | • CAB = 19 |
| • nB = 4 | • CB = 9  | • CCD = 15 |
| • nC = 3 | • CC = 8  |            |
| • nD = 2 | • CD = 7  |            |

$t = 520\text{ps}$ ?

→ The inputs for op2 move past the registers and the inputs to the 2<sup>nd</sup> adder also move past the registers.

→ Why? B/c  $t = 500\text{ps}$  was the rising edge of the clock, and then it took 20ps of CLK-to-q delay for the registers to update.

- |          |          |             |
|----------|----------|-------------|
| • nA = 5 | • CA = 5 | • CAB = 19  |
| • nB = 4 | • CB = 4 | • CCD = 15  |
| • nC = 3 | • CC = 3 | • prAB = 19 |
| • nD = 2 | • CD = 2 | • prCD = 15 |

$t = 1000\text{ps}$ ?

→ 480ps later, all 3 of the adders have performed & output values.

→ Now, we can also start operation 3!

- |           |          |             |             |
|-----------|----------|-------------|-------------|
| • nA = 10 | • CA = 5 | • CAB = 9   | • CSUM = 34 |
| • nB = 10 | • CB = 4 | • CCD = 5   |             |
| • nC = 20 | • CC = 3 | • prAB = 19 |             |
| • nD = 20 | • CD = 2 | • prCD = 15 |             |

$t = 1000 \text{ ps}$ ?

→ After the R.E. at  $t = 1000 \text{ ps}$ , the registers update to hold the values of A, B, L, D for op. 3, A+B and C+D for op. 2, and A+B+C+D for op. 1.

→ Operation 1 is now complete.

|           |           |            |             |
|-----------|-----------|------------|-------------|
| $nA = 10$ | $cA = 10$ | $cAB = 9$  | $cSum = 34$ |
| $nB = 10$ | $cB = 10$ | $cCD = 5$  | $pSum = 34$ |
| $nC = 20$ | $CC = 20$ | $prAB = 9$ |             |
| $nD = 20$ | $cD = 20$ | $prCD = 5$ |             |

$t = 1500 \text{ ps}$ ?

→ All 3 adders perform addition operations & output values.

|          |           |            |             |
|----------|-----------|------------|-------------|
| $nA = 0$ | $cA = 10$ | $cAB = 20$ | $cSum = 14$ |
| $nB = 0$ | $cB = 10$ | $cCD = 40$ | $pSum = 34$ |
| $nC = 0$ | $CC = 20$ | $prAB = 9$ |             |
| $nD = 0$ | $cD = 20$ | $prCD = 5$ |             |

$t = 1520 \text{ ps}$ ?

→ The values of A+B and C+D for op. 3 and A+B+C+D for op. 2 get sent through the registers.

→ Operation 2 is now complete.

→ It took 2 clock cycles (Op. 1 was done at 1000ps, and our clock period was 500)

→ The longest circuit between 2 registers has 1 adder, so LCP = 480ps

→ 20 ps (clk-to-q delay) + 480 ps (LCP) = 500 ps

→ Op 1 finished at 1000 ps, but operations 2 and beyond will finish at 1500, 2000, 2500...etc. ps, so for 10 operations:

$$500 \text{ ps} * (10+1) = 5,500 \text{ ps}$$

|                                | Single Cycle | Pipelined |
|--------------------------------|--------------|-----------|
| longest comb. path             | 960 ps       | 480 ps    |
| min clock period               | 980 ps       | 500 ps    |
| # of cycles to complete 1 op.  | 1            | 2         |
| Time to complete 10 ops at MCP | 9800 ps      | 5,500 ps  |

The pipelined circuit is much faster!

Comparison of the single cycle & pipelined impls of  $y = A+B+C+D$ ?

→ To perform n operations:

• Single Cycle:  $(n)(\text{clock period})$

• Pipelined:  $(n+1)(\text{clock period})$

What are the execution times for both implementations?

Why is the pipelined impl. usually faster?

- Because it has a significantly smaller MCP.
- $n(x)$  v.s.  $(n+1)(x \div 2)$  ... the 2nd expression will often yield a smaller value.

### -Metrics-

What is latency?

- Let SC-Y denote the single-cycle example circuit performing  $Y = A + B + C + D$  and let PL-Y denote the pipelined example.
- The amount of time it takes to complete a single operation, from beginning to end.
  - Ex: Latency of SC-Y = 1 clock cycle = 980 ps
  - Latency of PL-Y = 2 clock cycles =  $2 \times 500 = 1000$  ps
- The number of operations that can be completed in a given amount of time.
  - PL-Y has a higher throughput than SC-Y.

What is speedup?

- The measure of how much faster the pipelined impl. is in comparison to the single-cycle impl:
$$\text{SPEEDUP} = \frac{\text{Time to complete } n \text{ ops on SC impl.}}{\text{Time to complete } n \text{ ops on Pipelined impl.}}$$
- Ex: The speedup of operation  $Y = A + B + C + D$  for 10 operations =  $9800 \text{ ps} / 5500 \text{ ps} = 1.78$ .

## 311 Quiz 1 Review

→ TOPICS: K-maps - including "terms", Multiplexers, Digital logic terms - Canonical SOP, maxterms, minterms, Adder, comparator, shifter, TTL, Waveform diagrams, timings on FF diagrams.

### K-Maps

→ Find the literals that each group has in common, and OR them for the final term.

#### Grouping Rules:

- Each group must contain only 1s, and all 1s have to get grouped
- Groups must be  $2^n$  cells (e.g. 1, 2, 4, 8, ...)
- Each group must be as large as possible
- If  $> 2$  groups, each group must have at least 1 unique cell.
- Ways to wrap around the table:



#### STEPS:

1. Create K-map
2. Include all essential prime implicants
3. Include all non-essential prime implicants as needed to cover all 1s.

#### Implicants

→ Implicant = any product/minterm in the SOP form of an equation.

• Ex:  $Y = MN + MN\bar{D} + \bar{N}D$       █ = implicant ... aka every square with a "1" in the K-map.

→ Prime Implicant = All possible groups that can be formed in a K-map. The pi. itself is the term that defines a given group

→ Essential Prime Implicant = groups that cover at least one minterm that cannot be covered by another prime implicant.

### Multiplexers

→ Chooses output based on a select signal:

→ If # of inputs = n, select signal S

must be at least  $\log_2(n)$  bits.

• aka,  $2^{(\text{# of bits})} \geq n$



• if  $S = 00$ ,  $Y = A$ . if  $S = 01$ ,  $Y = B$ ,  
if  $S = 10$ ,  $Y = C$ , if  $S = 11$ ,  $Y = D$ .

## Intro to MIPS

RECALL: What is Comp 311 about?

about? high-level prog. languages

## Converting

```
1 int square(int num)
2     return num * num
3 }
```

## Assembly, low-level language

## Machine Code

```
1  sequence $sp,sp,-8
2    addiu $sp,$sp,-8
3    au $tp,(4*$sp)
4    move $tp,$sp
5    addiu $sp,$sp,+8
6    lw $t1,($tp)
7    nop
8    mult $t1,$t2
9    mtc $t2
10   move $sp,$tp
11   lw $tp,(4*$sp)
12   addiu $sp,$sp,+8
13   jr $t1
14   nop
```

Inty →

۱۰

15

(Easy for humans to read/write)

(human readable)

(Machine feedable)

What is MIPS?

- Microprocessor without Interlocked Pipeline Stages
  - A computer architecture & assembly language.
    - Historically used in routers, embedded systems, gaming consoles (eg PlayStation)
    - A little old / outdated today.
  - All assembly langs are very similar, but MIPS is the easiest to learn. Once you know MIPS, you can pick other langs up very easily.
  - **X86** : Developed by Intel, commonly used in servers, desktop & laptop computers.

What are some other computer architectures?

- All Macbooks from 2005-2021 use **x86\_64** architecture on their 64-bit CPUs.
- ARM: Used in most smartphones & tablets.
  - All Macbooks after 2021 use the **arm64** architecture.
- A set of rules that define the interface between the hardware and software providing a way for the SW to tell the hw what to do.
  - A full vocabulary that combines instructions with registers, addressing models, and data types
- Every ISA is specific to a **processor architecture**; for ex, the processor architecture assoc. w/ MIPS is RISC.
- RECALL: COMP 211 notes: "Compilation System overview", "Compiler steps".

- ALU Unit -

What is the ALU Unit in a  
MIPS processor?

- A unit that performs ALL the ALU operations, on a 32-bit input.
    - This is the thing that we built in Lab 2!
  - The ALU Unit employs a 5-bit ALUOp Control Signal Input that tells it which operation to perform



What are the control signals associated with each operation?

| AW Operation | Function         |
|--------------|------------------|
| 0b 00000     | AND              |
| 0b 00100     | OR               |
| 0b 01000     | XOR              |
| 0b 01100     | NDR              |
| 0b 00001     | add              |
| 0b 10001     | sub              |
| 0b 00011     | set on less than |



\* assume all registers are connected to the same clock.

What is the "set on less than" (SLT) operation?

→ Performs the operation "A < B".

- If  $A < B$ , ALU Result = 1. If  $A \geq B$ , ALU Result = 0.

How does the ALU work?

→ It performs one operation per cycle.

→ For each cycle, we must set 3 inputs: A, B, and ALUDP.

### MIPS Processor

RECALL: What is a register?

→ an operand which has to do with memory/hardware

→ MIPS defines 32 general purpose registers, \$0 through \$31, each of which have their own meanings. Each one can hold a 32-bit value. \$0 always holds val=0.

What is the Register File?

→ A small piece of memory for storing intermediate results of computations.

→ stored in hardware (aka the registers)

→ Contains the 32 32-bit registers, \$0 to \$31:



What is the MIPS processor?

→ A piece of logic that allows us to perform dependent operations!

• e.g.  $a = 5 + 4$ ,  $b = 7 + 8$ , and  $c = a + b$

→ How? By combining the RegFile with the ALU Unit so that we can read from and write to registers.

How do we read from the register file?



→ Set the "read reg." inputs to the 5-bit val. of the number of the register you want to read.

→ The value stored at that register will appear on the corresponding output.

Example of reading from the Register File?



How do you write to the Register File?



- Set the "write register" to the num. of the register where you want to store data.
- Set the "write data" input to the value you want to store.
- RegWrite = a control signal that we set to 1 when writing, and 0 when not writing.

→ When RegWrite = 1, we can still simultaneously read data on output if we want to.

→ EX: Writing 52 to register \$17:



After the next rising edge of the clock, the RegFile will update:



### Putting it all Together

How are the Reg File & ALU unit combined in the MIPS processor?

Example?

- The "read data" outputs are given as inputs A and B to the ALU
- The output of the ALU operation is wired as the input to the "write data"!
- Let's perform  $\$4 = \$2 + \$3$ , where the vals stored at \$2 and \$3 are 5 and 6.



## R-Format Instructions

RECAP: What are the inputs & outputs of a MIPS processor?



How is assembly code turned into binary code?

What are R-format instructions?

What is the syntax of R-format instructions?

- An assembly instruction can be translated into a 32-bit binary string that can then be input into the MIPS processor!
- The 32-bit data is broken up into several "sections"
- Assembly instructions to perform an operation where the registers are the operands.
  - e.g., not adding a constant val, but rather 2 vals are stored at 2 registers.

**(Operation) [Rd], [Rs], [Rt]**

1. "operation": the operation being performed (e.g. AND, ADD, etc.)
2. \$rd: the destination register ; where the output of the operation will be stored.
3. \$rs, \$rt : the source & target registers whose stored values will be the inputs to the operation (e.g. "A" and "B")

→ EX:

|               |                         |                    |            |
|---------------|-------------------------|--------------------|------------|
| <u>add</u>    | <u>\$4</u>              | <u>\$2</u>         | <u>\$3</u> |
| the operation | ↓                       | ↓                  | ↓          |
|               | the destination operand | the source operand |            |

Field:

| opcode            | rs                | rt                | rd                | shamt            | funct           |
|-------------------|-------------------|-------------------|-------------------|------------------|-----------------|
| 31:26<br>(6 bits) | 25:21<br>(5 bits) | 20:16<br>(5 bits) | 15:11<br>(5 bits) | 6:10<br>(5 bits) | 5:0<br>(5 bits) |

→ The 5-bit binary nums of the 3 registers in the operation (rs, rt, rd) are stored in the "rs", "rt", and "rd" Fields.

→ They are specified by the ISA, and they tell the processor which operation to perform.

→ The values at opcode & funct are used to determine the value of the ALUOp control signal.

What are the "opcode" and "funct" fields?

What are the opcode & funct fields for common operations?

| Instruction | Opcode              | Funct                | ALUOp: |
|-------------|---------------------|----------------------|--------|
| add         | 000000 <sub>2</sub> | 10 0000 <sub>2</sub> | 00001  |
| sub         | 000000 <sub>2</sub> | 10 0010 <sub>2</sub> | 10001  |
| and         | 000000 <sub>2</sub> | 10 0100 <sub>2</sub> | 00000  |
| or          | 000000 <sub>2</sub> | 10 0101 <sub>2</sub> | 00100  |
| nor         | 000000 <sub>2</sub> | 10 0111 <sub>2</sub> | 01100  |
| slt         | 000000 <sub>2</sub> | 10 1010 <sub>2</sub> | 00011  |

Example converting assembly to binary?

→ EXAMPLE: the instruction add \$12, \$7, \$10

2) interpret the instruction and identify register Fields:

$$\begin{array}{c} \$12 = \$7 + \$10 \\ \downarrow \quad \downarrow \quad \downarrow \\ rd \quad rs \quad rt \end{array}$$

2) Fill out Fields:

| Opcode | rs    | rt    | rd    | shamt | funct  |
|--------|-------|-------|-------|-------|--------|
| 100000 | 00111 | 01010 | 01100 | 00000 | 100000 |

ANS: 0b1000 0000 1110 1010 0110 0000 0010 0000

→ The 32-bit inst. is taken as an input, and then split into substrings corresponding to the different fields!

| Field  | bits of inst |
|--------|--------------|
| funct  | 5:0          |
| shamt  | 10:6         |
| rd     | 15:11        |
| rt     | 20:16        |
| rs     | 25:21        |
| opcode | 31:26        |



How is the 32-bit r-instruction interpreted by the MIPS processor?

Summary: What are all of the R-format operations?

→ Syntax: OP \$rd, \$rs, \$rt , R[rd] = R[rs] OP R[rt]

• R[rd] denotes "RegFile[rd]", aka, the contents stored at register rd.

| NAME | OPERATION                          |
|------|------------------------------------|
| add  | $R[rd] = R[rs] + R[rt]$            |
| sub  | $R[rd] = R[rs] - R[rt]$            |
| and  | $R[rd] = R[rs] \text{ AND } R[rt]$ |
| or   | $R[rd] = R[rs] \text{ OR } R[rt]$  |
| nor  | $R[rd] = R[rs] \text{ NOR } R[rt]$ |
| slt  | $R[rd] = (R[rs] < R[rt]) ? 1 : 0$  |

If  $R[rs] < R[rt]$ ,  $R[rd] = 1$ . Else,  $R[rd] = 0$ .

## I-format Instructions

What are i-format instructions?

→ Where the instruction operands are a combination of a register and a 16-bit constant (an "immediate" operand)

→ Used when you want to perform an op between a register and an immediate value (i.e. a constant that is not in a register)

• As opposed to R-type instructions, which perform operations between 2 registers.

(operation) [ \$rt ], [ \$rs ], [ immediate ]

↑ result stored here

What is the syntax of an i-format instruction?

→ In i-format instructions, we need the last 16 bits of the inst. to be designated for storing the constant val, so we don't have an rd field. Instead, we store the result in the reg specified by rt. We write to rt.

→ Ex:  $\$5 = \$4 + 12 \rightarrow \text{addi } \$5, \$4, 12$

→ Syntax:  $R(rt) = R(rs) \text{ OP (immediate)}$

→ Using i-format instructions with the source operand register \$0, since it always holds value = 0 !

→ Ex:

$a = 5 \rightarrow \text{addi } \$10, \$0, 5 \quad (\$10 \text{ now holds var } a)$   
 $b = 3 \rightarrow \text{addi } \$11, \$0, 3 \quad (\$11 \text{ holds var } b)$   
 $c = a + b \rightarrow \text{add } \$12, \$10, \$11 \quad (\$12 \text{ holds var } c = a + b)$

What are the fields of the 32-bit i-instruction?

| Field: | Opcode | rs    | rt    | immediate |
|--------|--------|-------|-------|-----------|
| Bits:  | 31:26  | 25:21 | 20:16 | 15:0      |

(6 bits) (5 bits) (5 bits) (16 bits)

Example converting assembly → binary?

→ The rd, shamt, and funct fields are excluded to make room for the immediate value.

→ INST: addi \$5, \$0, 2

| Opcode | rs    | rt    | immediate           |
|--------|-------|-------|---------------------|
| 001000 | 00000 | 00101 | 0000 0000 0000 1100 |

• ANS: 0b 0010 0000 0000 0101 0000 0000 0000 1100

What are the opcodes for i-format inst. operations?

| Instruction | Opcode               | ALU Op |
|-------------|----------------------|--------|
| addi        | 00 1000 <sub>2</sub> | 00001  |
| andi        | 00 1100 <sub>2</sub> | 00000  |
| ori         | 00 1101 <sub>2</sub> | 00100  |
| slti        | 00 1010 <sub>2</sub> | 00011  |

What will be input into the ALU for an i-instruction?

- The 32-bit value stored at \$rs, aka R[rs], aka "Read Data 1"
- The 16-bit value specified for the immediate, 15:0 ... EXCEPT, we need to bit-extend it to 32 bits!



→ There are 2 ways to extend the immediate value:

- \* Zero-extension
- \* Sign-extension

→ DEFN: pad immediate with 16 zeroes at the beginning.

$$\text{immediate (16)} = A_{15} A_{14} A_{13} \dots A_0$$

$$\text{immediate (32)} = 0000000000000000A_{15} A_{14} A_{13} \dots A_0$$

→ USE: For logical operations, where we do not want to preserve the decimal value of a binary # whose MSB=1 (aka a negative number)

- \* andi and ori operations will use a zero-extended immediate value.

→ DEFN: pad immediate with 16 zeroes if MSB is 0, or 16 ones if MSB=1

$$\text{immediate (16)} = A_{15} A_{14} A_{13} \dots A_0$$

$$\text{immediate (32)} =$$

$$A_{15} A_{15} A_{14} A_{13} \dots A_0$$

(where  $A_{15}$  = MSB)

→ USE: for arithmetic operations, where we want to preserve the decimal value.

- \* addi and slli will use a sign-extended immediate value.

| NAME | OPERATION                                               |
|------|---------------------------------------------------------|
| addi | $R[rt] = R(rs) + \text{sign\_ext(immediate)}$           |
| andi | $R[rt] = R(rs) \text{ AND zero-ext(immediate)}$         |
| ori  | $R[rt] = R(rs) \text{ OR zero-ext(immediate)}$          |
| slli | $R[rt] = (R(rs) < \text{sign\_ext(immediate)}) ? 1 : 0$ |

→ DG Datapath for r-instructions:



→ If executing an i-instruction, there are 3 key differences:

- \* The 5-bit value denoting which register will store the result of the operation (aka the Write Reg) will be the \$rt instead of the \$rd.
- \* The 2nd 32-bit input to the ALU will be the immediate instead of  $R(rt)$ , aka "Read Data 2".
- \* The 16-bit immediate value will have to be either 0- or sign-extended before being input to the ALU.

What is zero-extension and when should we use it?

What is sign extension and when should we use it?

What are all of the i-format operations?

How does the datapath to the MIPS Processor change for i-instructions?

How do we modify the MIPS processor hardware to execute both R and I instructions?

1) Add a 2:1 mux with a control signal that takes as inputs:

- The 5-bit value of **\$rd** (aka bits 15:11, aka the write register for r-instructions), AND
  - The 5-bit value of **\$rt** (aka bits 20:16, aka the write register for i-instructions)
- And uses a **RegDst** control signal to determine which value will get sent to the Write Register input.

2) Add a 2:1 mux that takes as inputs:

- The 32-bit value **R[rt]** (aka "Read Data 2" aka the 2<sup>nd</sup> operand for r-instructions, AND
- The 32-bit value **Extended\_immediate** (aka [extension + bits 15:0] aka the 2<sup>nd</sup> operand for i-instructions)

And uses an **ALUSrc** control signal to determine which value will get sent to the second ALU input.

3) A piece of logic that takes the **immediate** value (aka bits 15:0) as an input, and outputs the extended 32-bit imm. value.

- It has an **ExtType** control signal that specifies whether the value should be zero- or sign- extended.





Summary: Control Signals for all r & i inst operations?

→ Whenever we are doing a boolean or logic operation (and, or, nor, xor, add, sub, slt),  $\text{RegWrite} = 1$

→ R-instructions:

- $\text{ALU Src} = 0$  (taking  $R[rt]$  as the 2<sup>nd</sup> source operand)
- $\text{RegDst} = 1$  (storing the result in \$rd)
- $\text{ExtType} = X$  (Don't care) (no value that requires extension)

→ I-instructions.

- $\text{ALU Src} = 1$  (taking bits 15:0 (extended), aka imm., as 2<sup>nd</sup> source operand)
- $\text{RegDst} = 0$  (storing the result in \$rt)
- $\text{ExtType} = 0$  for boolean operations
  - andi
  - ori
  - nori
- $\text{ExtType} = 1$  for logic operations
  - addi
  - slti

# MIPS Programming

What are Labels?

Example?

- A way to "mark" sections of assembly code that you may want the program to be able to jump to or repeatedly execute. Like the beginning & end of a loop.
- EX: for-loops.

→ `loop.c:`

```
sum = 0;
for (i=0; i<5; i++) {
    sum += 2
}
sum = 10
```

→ `loop.o:`

- We'll store sum and i in \$10, \$8
- We store the value representing the "terminating point" of i, aka 5, in \$9

- We enclose the instructions in `loop.c`'s for-loop in a labeled:

`addi $10 $0 0`      } initialize sum, i, and terminating point  
`addi $8 $0 0`  
`addi $9 $0 5`

`LoopStart:` → Label "LoopStart"

`addi $10 $10 2` → sum = sum + 2

`addi $8 $8 1` → i++ (i = i + 1)

`LoopEnd:` → Label "LoopEnd"

`addi $10 $10 10` → sum = sum + 10

How do we actually utilize the labels?

What are branch instructions?

What are all of the branch instructions?

- With branch instructions that check whether the terminating condition (in this case  $i \geq 5$ ) has been reached!

- Instructions that evaluate some condition. If the condition is true, they tell the program to "branch" to a different spot in the code. Specifically, they define a label that should be branched to if the condition is true.

• EX: If  $i = 5$ , end the for-loop & go to the label for the code to be executed after the loop ends.

| Instruction                | Format                           | Meaning: Branch to Label if... |
|----------------------------|----------------------------------|--------------------------------|
| Branch on equal            | <code>beq \$rs \$rt Label</code> | $\$rs == \$rt$                 |
| Branch on not equal        | <code>bne \$rs \$rt Label</code> | $\$rs != \$rt$                 |
| B.D. greater than          | <code>bgt \$rs \$rt Label</code> | $\$rs > \$rt$                  |
| B.D. greater than or equal | <code>bge \$rs \$rt Label</code> | $\$rs \geq \$rt$               |
| B.D. less than             | <code>blt \$rs \$rt Label</code> | $\$rs < \$rt$                  |
| B.D. less than or equal    | <code>ble \$rs \$rt Label</code> | $\$rs \leq \$rt$               |

→ Note that for Branch instructions, RegWrite = 0!

Example loop.0 using branch instructions?

addi \$10 \$0 0  
addi \$8 \$0 0 → i = 0  
addi \$9 \$0 5 → terminate pt = 5

LoopStart:

beq \$8 \$9 LoopEnd → if R[\$8] = R[\$9] - aka, if i=5, immediately  
branch to the LoopEnd label to continue executing inst.  
addi \$10 \$10 2  
addi \$8 \$8 1  
j LoopStart → This is a jump instruction that tells the program to  
LoopEnd:  
addi \$10 \$10 10 → jump to the line labeled LoopStart; aka, jump back to  
beg. of loop after executing the body.

What are Native Instructions?

→ Instructions that are supported by the datapath (the MIPS hardware diagram thing)  
AKA inst. that have actual hardware support.  
• Includes: all the instructions we learned up until now  
• Includes: beq, bne

What are Pseudo Instructions?

→ Ones that aren't supported by the datapath; they only exist to make programs easier to read & write.  
→ When the program is assembled, pseudo instructions are converted to one or more native instructions.

• Includes: bgt, bge, blt, ble

...  
ble \$8 \$9 LoopEnd  
...

jump to "LoopEnd" if \$8 ≤ \$9

is converted  
to

...  
slt \$1 \$9 \$8  
beq \$1 \$0 LoopEnd  
...

if \$9 < \$8, let \$1 = 1.  
else, let \$1 = 0.

if \$1 = 0, aka \$9 ≥ \$8,  
branch to LoopEnd.  
else, don't branch.

### - Register Usage -

What is register 0 (\$0) for?

→ ALWAYS holds the value 0. Cannot be written to.

→ Useful for when you need the value 0 (like initializing variables with addi \$X \$0 0)

→ Used by the assembler to resolve pseudo instructions.

→ You CAN write to this register, but the assembler may override the value when it needs to resolve a pseudo-inst.

• Shouldn't rely on \$1 to store values you might need later.

What is register 1 (\$1) for?

## MIPS Memory Model

RECALL: What is the structure of a computer's main memory?

- Data stored in a "primary storage" component like RAM, accessible to the CPU via a BUS (communication system between diff pieces of hardware)
- Unlike Register data, which is stored in the register file which is actually located in the CPU.
- 1. Stack: Used to manage function calls & local variables.
  - STACK grows down
- 2. Dynamic Data aka Heap: Used for dynamic memory allocation.
  - HEAP grows up
- 3. Static Data: Variables allocated at compile-time.
  - e.g., statically allocated arrays.
- 4. Text: Programs (aka actual code).
- 5. Reserved memory for the OS



How does Memory compare to the Register File?

- Memory:
  - large
  - takes longer to access
  - NOT part of the CPU
- Register File:
  - small
  - much, much, much faster to access
  - part of the CPU

→ We need MM b/c we can't store all of our data in the RF

→ We store values/data that we are currently working with in the RF.

→ **lw**; used to read 4 bytes (aka 1 word) of data from memory and store it in a register.

→ **lw \$rt offset(\$rs)**

→ **\$rs**: a register that (already) holds a memory address.

→ **offset**: an immediate value that is added to **R(rs)** to obtain the address

• e.g., **lw \$rt 4(\$rs)** ≈ loading a word from  $4 + [\text{addr stored at } \$rs]$

→ **\$rt**: Where we store the 4 byte (32-bit) data that we read from addr  $(\text{offset} + R(rs))$

→  **$R[rt] = M[R[rs]] + \text{offset}$** , where "M" means Main Memory.

What is the **load word** instruction?

What do \$rt, offset, and \$rs mean in the lw inst.?

Formula for the lw instruction?

Example of the **lw** instruction?

- Let  $\$10 = 0x00004000$ , and let  $A = [5, 90, 100, 40, 11]$  be an int array stored at base addr.  $0x00004000$  (aka  $A[0]$ ). The size of an int is 4 bytes.
- Ex: **lw \$11 8(\$10)** will store the value 100 in register **\$11**.
- $\$10$  holds  $0x00004000$  & offset = 8, so we want to read the data at address  $0x00004000 + 8 = 0x00004008$
- Since  $\text{sizeof(int)} = 4$  bytes,  $0x00004008$  holds the 3<sup>rd</sup> element of  $A$ .

What is the **store word** instruction?

What do **srt**, **offset**, and **rs** mean in the **sw** inst.?

What is the formula?

Example using **sw**?

How do we store arrays in memory?

Example of using **.data** to store data in static memory?

How do we actually get a memory address into a register?

What is the syntax of the **la** inst.?

Example using **la**?

- **sw**: Used to store 4 bytes (1 word) of data from the RF into memory!
- **rs**: A register that (already) holds a memory address.
- **offset**: An immediate value that is added to **R(rs)** to obtain the address where we want to store the data.
- **srt**: Holds the data/value that we want to store at the MM address.
- $M[R[rs] + offset] = R[srt]$

→ Let  $\$10 = 0x00004000$ ,  $\$7 = 8$ , and let  $A = [5, 90, 100, 40, 11]$  be an int array stored at base addr.  $0x00004000$  (aka  $A[0]$ ).

→ Ex: **sw \$7 12(\$10)** will modify array **A** into:  
 $A = [5, 90, 100, 8, 11]$

→ We store statically allocated arrays in the static data segment (below the heap).  
→ To store ANY data in the static data segment, we use the **.data** directive in our assembly code.

The name of the array



→ E.g., the value stored in **rs** for load-word & store-word instructions.  
→ Ans: With the **load address** instruction, **la**.

**la \$rt Label**

- Gets the start address of the array/variable specified by **Label** and stores it in **srt**.
- For an array, **la** gets the start address of the array.



How do we store data in the **text** segment of memory?

Example of using the **.text** directive?

→ Using the **.text** directive

→ We will store our actual assembly code in the **text** segment, using the **.text** directive.



→ Let **arr = [10, 15, 100, 2, 4]**

**ex1.o**

**.data**

**arr: .word 10 15 100 2 4**

**.text**

**la \$10 arr**

**addi \$8 \$10 20**

**LoopBegin:**

**beq \$10 \$8 LoopEnd**

**lw \$6 0(\$10)**

**addi \$6 \$6 1**

**sw \$6 0(\$10)**

**addi \$10 \$10 4**

**j LoopBegin**

**LoopEnd:**

• Store base addr of arr in **\$10**

• set **\$8** to be the "terminating point" of the loop; since there are 5 elements in arr, once we've incremented **\$10** 4 times, we will have looped through all elements & are finished.

end the loop (by jumping to "LoopEnd") when

**R[\$10] = R[\$8]**

**let **\$6** store value of element at addr **\$10****

**increment the element by 1**

**Store the new value at **\$6** back in its spot (addr **\$10**)**

**increment the "curr-address" by 4 to point to the next element in arr.**

What is the **Static instruction count**?

→ The number of native instructions that a program has.

→ AKA; if you convert every pseudo-instruction into a native one, & then count up all the lines of assembly code written. Labels don't count.

→ Ex: **ex1.o** has 9 static instructions.

• every inst. is native except **la**

• **la \$6 arr** resolves to → **lui \$1, 4D97**  
**ori \$10, \$1, 0**

→ How much space the program takes up in memory.

What does the static inst. count tell us?

What is the dynamic instruction count?

- The # of native instructions that actually get executed.
- E.g., if there is a for-loop that has 5 inst. & 3 iterations, the dynamic inst. count ( $DIC$ ) =  $5 \times 3 = 15$
- ex1.o has  $3 + 6(5) + 1 = 34$  dynamic instructions.
- Gives us an idea of the runtime.
- The DIC & SIC are 2 metrics to look at when trying to improve program efficiency.

What does the DIC tell us?

How can we reduce the instruction count of ex1.o?

|                                                                                                                                                                                                                                       |                                                                                                                                                                                                          |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <p>ex1.o</p> <pre>.data arr: .word 10 15 100 2 4 .text la \$10 arr addi \$8 \$10 20 LoopBegin:     beg. \$10 \$8 LoopEnd     lw \$6 0(\$10)     addi \$6 \$6 1     sw \$6 0(\$10)     addi \$10 \$10 4     j LoopBegin LoopEnd:</pre> | <p>ex1.o</p> <pre>.data arr: .word 10 15 100 2 4 .text la \$10 arr addi \$8 \$10 20 LoopBegin:     lw \$8 0(\$6)     addi \$8 \$8 1     sw \$8 0(\$6)     addi \$6 \$6 4     bne \$6 \$7 LoopBegin</pre> |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

the assembler always uses \$1  
for storing temp. values when  
receiving instructions.

What are some pseudo instructions & their translations?

- `beq $rs imm Label` → `addi $1 $0 imm`  
`beg. $rs $1 Label`
- `bge $8 $9 Label` → `slt $1 $8 $9` → \$1 will equal 1 if 8 > 9, and 0 if 8 ≥ 9  
`beg. $1 $0 Label`
- `ble $9 $14 Label` → `slt $1 $9 $14`  
`beg. $1 $0 Label`

What is the load immediate instruction?

→ Sets \$rd to the immediate value:

`li $rd immediate`

$R[rd] = \text{immediate}$

What is the move instruction?

→ Copies the value of \$rs into \$rd:

`move $rd $rs`

$R[rd] = R[rs]$

Example using these?

→ `addi $8 $0 4` → `li $8 4`

→ `addi $7 $6 0` → `move $7 $6`

## Load Word and Store Word: Encoding & Hardware Support

RECALL: What are the "load word" and "store word" instructions?

How are lw and sw instructions encoded in binary?

What "operation" is being done for lw and sw?

How do we extend the datapath to support lw and sw?

→ LOAD WORD: lw \$rt immediate (\$rs)

- Store the data value at mem. address [ $\$rs + imm$ ] in  $\$rt$

→ STORE WORD: sw \$rt immediate (\$rs)

- Store the data inside  $\$rt$  at mem. address [ $\$rs + imm$ .]

→ Same as i-format!

| Field: | opcode | rs    | rt    | immediate |
|--------|--------|-------|-------|-----------|
| Bits:  | 31:26  | 25:21 | 20:16 | 15:0      |

(6 bits) (5 bits) (5 bits) (16 bits)

- lw: opcode = 100011

- sw: opcode = 101011

→ Computing the address, aka  $R[\$rs] + \text{SignExtend}(\text{immediate})$

- We can use the ALU for this!

- Since it is addition, always sign extend the immediate.



→ From the current DP, we can obtain 3 things:

- $R[\$rt]$ , aka the value to place in memory for sw instructions
- $\$rt$ , aka the register to store the mem. data for lw instructions
- $R[\$rs] + imm$ , aka the memory address in question

→ To perform operations involving memory, we add a new component & new control signals!

What is the extended Datapath?



What is the **MemRead** control signal?

→ Tells us if we have to read from memory

- **lw** = reading from Mem = **MemRead = 1**
- **sw** = Writing to MEM = **MemRead = 0**

→ When **MemRead = 1**, the "Memory unit" obtains the value at "Address" in MainMem, and places it in the "Read Data" slot that gets output

→ Tells us if we have to write to memory

- **lw** : **MemWrite = 0**
- **sw** : **MemWrite = 1**

→ When **MemWrite = 1**, the memory unit takes the data placed in "Write Data" & puts it in Main Mem. at the address.

→ Tells us what output to send to the **Write Register**: the output of the memory unit processes — aka the data from some addr. in MM — OR the output of the **ALU**.

• For add, sub, etc. ops we learned so far, we don't care about the memory unit, so **MemtoReg = 0**

• For **lw**, **MemtoReg = 1**

What is the **MemtoReg** control signal?

# The Program Counter (PC)

What is the Stored Program Concept?

- RSCAU: Instructions (like `add $8 $7 $6`) are stored in memory as 32-bit binary numbers.
- Since inst. are 32 bits, each take up 4 bytes in memory. In the Text section.
- The instructions for a given program are stored in subsequent memory addresses that increment by 4.

How does the code that we write get "resolved" for execution?

Example of the Stored program concept?

notice that address increments by 4

- 1) Pseudo-instructions are converted into native ones
- 2) Addresses for labels and offsets get resolved.

Address in memory where the inst. is stored → machine code representation of the inst. → Assembled program (code that is actually executed) → original program (the code that you wrote)

| Address    | Code       | Basic           | Source                 |
|------------|------------|-----------------|------------------------|
| 0x00003000 | 0x20060000 | addi \$b,\$0,0  | la \$6 arr             |
| 0x00003004 | 0x10c70014 | addi \$7,\$6,20 | addi \$7,\$6,20        |
| 0x00003008 | 0x8cc68000 | lw \$8,0(\$6)   | lw \$8,0(\$6)          |
| 0x0000300C | 0x10880001 | addi \$8,\$8,1  | addi \$8,\$8,1         |
| 0x00003010 | 0xacc80000 | sw \$8,0(\$6)   | sw \$8,0(\$6)          |
| 0x00003014 | 0x20c60004 | addi \$6,\$6,4  | addi \$6,\$6,4         |
| 0x00003018 | 0x14c7ffff | bne \$6,\$7,-5  | bne \$6,\$7, LoopBegin |

What is the program counter (PC)?

- A register that holds the address of the instruction being currently executed!
- Holds the 32-bit address in a 32-bit register

→ The PC is not inside the register file.

→ For EX, when executing `addi $8 $8 1` from the prog. above,  $PC = 0x00003000$

→ Our datapath executes one instruction per clock cycle. So on every clock cycle, the datapath will increment the PC value by 4 to fetch the next instruction.

→ The PC is what automates the process of sending 32-bit instructions as inputs to our data path!



How do we use the PC to do this?

- The PC holds the address of the instruction that should be executed in. On every clock cycle, we need to do the following :
- 1) Increment  $PC = PC + 4$
  - 2) Send the PC val into an "instruction memory" unit to extract the 32-bit instruction at that address
  - 3) Send the 32-bit instruction into our datapath

How do we add the PC and  
Instruction Memory to our datapath?



# Branch Instruction Encoding

How do we encode branch instructions in binary?

→ In i-format!      beg rs rt Label

→ For ex., beg \$10 \$11 LoopBegin

Field:

Bits:

| Opcode | rs    | rt    | immediate |
|--------|-------|-------|-----------|
| 000100 | 01010 | 01011 | ???       |

| Instruction | Opcode |
|-------------|--------|
| beg         | 000100 |
| bne         | 000101 |

What is the target instruction?

What is the "byte offset to target instruction"?

How do you generate the value of the immediate to encode a branch instruction?

→ The instruction that the program will jump to if the branch is taken

→ The distance, in bytes, between the current branch instruction itself, and the target instruction.

→ The byte offset will always be a multiple of 4, which means that the last 2 bits of the byte offset will always be 0. Because 0x0, 4, 8, C, etc. all have 00 at the end.

→ We actually calculate the byte offset relative to the next instruction, e.g. PC+4.

1. Obtain the value of PC + 4 - aka, the next inst. below the branch instruction, b/c the PC is currently on the branch instruction.

2. Compute the byte offset between PC+4 and the target instruction (the inst. that we would branch to)

$$\text{offset} = (\text{target addr}) - (\text{PC} + 4)$$

3. Since each inst. are 4 addresses away from each other, the byte offset is always a multiple of 4, and its last 2 bits are 0. So we remove the last 2 bits.

$$\text{immediate} = (\text{target addr} - (\text{PC} + 4)) \gg 2$$

EX: beg \$4 \$5 elseif



How do we compute the branch target address given the immediate?

→ EX: The instruction 0x15320007 translates to:

| Opcode | rs    | rt    | immediate           |
|--------|-------|-------|---------------------|
| 000101 | 01001 | 10010 | 0000 0000 0000 0111 |

→ Assume the current PC = 0x0000301C

1. Extract the immediate value: 0b 0000 0000 0000 0111

2. Sign-extend it to 30 bits: 0b 00 0000 0000 0000 0000 0000 0000 0111

3. Append 2 0s to the end of it: 0b 0000 0000 0000 0000 0000 0000 0001 1100

4. Add PC + 4 to it: 0x 0000001C + 0x 0000301C

$$\begin{array}{r} \text{The branch} \\ \text{target address} \\ \hline 0x 0000303C \end{array}$$

this syntax means concatenate

$$\text{Branch Target Addr} = \text{PC} + 4 + \{ \text{30-bit sign-ext imm.}, 0b00 \}$$

## - Datapath Support for Branching -

How do we obtain the branch target address?

→ RECALL: to generate the branch target address:

1. Sign Extend the imm. value to 30 bits
2. Append 2 0s
3. Add it to PC + 4

→ We can achieve step 2 by left-shifting the 30-bit imm. by 2!

How do we do this on the datapath?



What do we do if we shouldn't branch upon a branch inst.?

→ The branch target addr. = the value we should set PC to **IF** we want to branch.

→ If not, Set **PC = PC + 4**, like usual

→ We will need a mux to make this decision:



How will we set the control signal for the mux?

→ Using a piece of logic that interprets the **beq** and **bne** branch instructions.

→ This logic will output 1 to the Mux if we should branch, and 0 otherwise.

→ Using 2 new 1-bit control signals to indicate them:

- **beq**: 1 if inst. is beq, 0 otherwise
- **bne**: 1 if inst. is bne, 0 otherwise

→ **beq \$rs \$rt Label**: if **\$rs = \$rt**, we want to branch.

→ **bne \$rs \$rt Label**: if **\$rs ≠ \$rt**, we want to branch.

→ SUMMARY: Output 1 to the Mux control signal IF:

- **beq = 1 AND \$rs = \$rt** OR
- **bne = 1 AND \$rs ≠ \$rt**

How do we implement this logic?

→ Using the Z-Flag/Zero Output of the ALU!!.

- RECALL: The ALU has a 1-bit "Zero Output" that is set to 1 whenever the ALU Output is 0.

→ If we set the ALUOp to perform subtraction, the Z-Flag will tell us whether  $srs = srt$

- If  $srs = srt$ , ALU Output =  $srs - srt = 0$  and ZeroOutput = 1

- If  $srs \neq srt$ , ALU Output =  $srs - srt \neq 0$  and ZeroOutput = 0



How do we implement all of this logic on the whole datapath?



① Uses beq & bne control signals to determine whether we should jump

② Computes the branch target addr. using the immediate.

# Jump Instructions

RECALL: What are jump instructions?

How are jump instructions encoded in binary?

What is the problem?

How can we reduce the size of the address to 30 bits?

How do we get rid of the last 4 extra bits?

Why do we do this?

What is the JumpAddress given a binary j-instruction?

Given a program with a jump inst., how do we compute the binary encoding of the partial address?

→ An instruction that tells the program to jump to an address (specified by the label) to continue execution.

j Label

\*aka, set PC to the address of Label

→ In a new instruction format called j-format:

|        |                      |
|--------|----------------------|
| opcode | partial address (PA) |
| 31     | 26 25                |

opcode = 000010

→ We can only fit a 26-bit address in the instruction, but the PC needs a 32-bit address.

→ We can chop off/not store the last 2 bits, since the last 2 bits of any instruction address are always 0 — due to word alignment:

|            |      |
|------------|------|
| 0b0...0000 | (0)  |
| 0b0...0100 | (4)  |
| 0b0...1000 | (8)  |
| 0b0...1100 | (12) |
| 0b0...1110 | (24) |

→ By limiting the range of address values we can jump to

→ When setting the PA field of a jump instruction, we will chop off the top 4 bits of the target address.

→ When constructing the new PC Addr given the PA field of a jump instruction, we will set the top 4 bits to be the top 4 bits of (PC+4).

→ Setting the top 4 bits to be those of (PC+4) gives us an address range that is nearby to where we are currently located.

• To jump outside this range, we'd have to use a different instruction.

JumpAddr = { (PC+4)[31:28], partial\_address, 0b00 }  
 bits 31:28 of PC+4      bits 25:0 of j-instr

→ Ex, where the program begins at address 0x00003000:

```

1 .data
2
3 A:    .word 8, 9, 14, 15
4
5 .text
6
7     la $5 A      # address of A
8     addi $6 $0 0 # sum
9     addi $7 $0 4 # size of array
10    addi $8 $0 0 # loop counter
11
12 LoopBegin:
13     beq $8 $7 LoopEnd # exit loop when we reach the end of the array
14     sll $9 $8 2      # compute byte offset
15     add $9 $9 $5      # compute address of A[i]
16     lw $9 0($9)      # get the value of A[i]
17     add $6 $6 $9      # sum += A[i]
18     addi $8 $8 1      # i++
19     j LoopBegin      # jump back to start of array
20 LoopEnd:
  
```

Address:

|            |
|------------|
| 0x00003000 |
| 0x00003004 |
| 0x00003008 |
| 0x0000300C |
| 0x00003010 |

|            |
|------------|
| 0x00003010 |
| 0x00003014 |
| 0x00003018 |
| 0x0000301C |
| 0x00003020 |
| 0x00003024 |
| 0x00003028 |

1. Determine the address we want to jump to:

The address of the first inst. after the label → 0x00003010

2. Convert it to binary: 0b0000 0000 0000 0000 0011 0000 0001 00

3. Remove top 4 bits & bottom 2 bits: 0b0000 0000 0000 0011 0000 0001 00

ANS:

|        |                                  |
|--------|----------------------------------|
| 000010 | 0000 0000 0000 0011 0000 0001 00 |
| 31     | 26 25                            |

0

**SUMMARY:** Given a 32-bit jump instruction, how do we calculate the jump address - aka, the address we have to set PC to?

→ Ex: Instruction 0x08000300, where PC = 0x01001000

|        |    |    |                      |    |    |    |    |    |    |    |    |    |    |    |
|--------|----|----|----------------------|----|----|----|----|----|----|----|----|----|----|----|
| 00     | 00 | 10 | 00                   | 00 | 00 | 00 | 00 | 00 | 00 | 11 | 00 | 00 | 00 | 00 |
| opcode |    |    | partial-address (PA) |    |    |    |    |    |    |    |    |    |    |    |

1. Extract the partial address bits

0b 00 0000 0000 0000 0011 0000 0000

2. Shift the value left by 2 (in order to append 2 zeroes)

0b 00 0000 0000 0000 0011 0000 0000 00

3. Concatenate the upper 4 bits of (PC+4)

PC+4 = 0x01001004

(PC+4)[31:28] = 0b0000

ANS: jump address = 0b 0000 0000 0000 0000 0000 1100 0000 0000

### - Hardware Support for jump instructions -

How do we perform the process described above, with our hardware?

1. Use a splitter to extract the PA bits of the 32-bit instruction.

2. Use a shift logic unit to append 2 0s by left shifting by 2

3. Use a concatenate logic unit to concatenate the upper 4 bits of (PC+4)



How do we add this logic on the datapath?

→ Using a new 1-bit jump control signal:



What are all of the control signal values for a **jump** instruction?

| Control Signal | Value |
|----------------|-------|
| RegDst         | X     |
| Jump           | 1     |
| beq            | X     |
| bne            | X     |
| MemRead        | 0 *   |
| MemtoReg       | X     |
| ALUOp          | XXXXX |
| MemWrite       | 0 *   |
| Ausrc          | X     |
| RegWrite       | 0 *   |
| ExtType        | X     |

These 3 should never be "don't care's". If we aren't writing to a reg, or reading or writing from MM, set these to 0.

What are all of the control signal values for a **beq** instruction?

| Control Signal | Value       |
|----------------|-------------|
| RegDst         | X           |
| Jump           | 0           |
| beq            | 1           |
| bne            | 0           |
| MemRead        | 0 *         |
| MemtoReg       | X           |
| ALUOp          | Subtraction |
| MemWrite       | 0 *         |
| Ausrc          | 0           |
| RegWrite       | 0 *         |
| ExtType        | 1           |

Why? Because we want to perform  $\$rs - \$rt$ .

What is the range of addresses we can jump to for a jump inst.?

→ For the given PC at the time of inst., we can jump to any address that has the same top 4 bits as current PC!

• Ex: If  $PC = 0x00700000$ , our range is  $0x00000000$  to  $0xFFFFFFF0$  (not  $0xFFFFFFFF$  b/c of word alignment). That's a total of ~65,273,855 addresses!

What is the range of addresses we can jump to for a branch inst.?

→ If  $PC = 0x00700000$ , range is  $0x006E0004$  to  $0x00710000$

## Function Calls

What is the calling convention?

- A scheme for how functions receive arguments & return values
- **Caller**: A function that calls another function
- **Callee**: A function called by another function
- Before making the function call, the caller places the parameter args in registers  
**\$a0 - \$a3 (4-7)**

What is the protocol for callers and callees?

How does the callee know where to return when it finishes execution?

So how does a caller actually call a function?

- When finished, the callee places the return values in registers **\$r0 - \$v1 (2-3)**
- **Return Address**: the address of the inst. where the callee will return.
- When the caller calls the callee, they place the ret. addr. in a special register (**\$ra = \$31**) so the callee knows where to return.
- Using the **jump-and-link (JAL)** instruction! To jump to a function.
- Calling a function means 2 things on the 'hardware' side:
  1. Setting PC to the address of the function being called
  2. Setting reg **\$ra** to the address we want the program to resume executing after the function is over.
- Syntax: **jal Label**

What does the JAL instruction do?

- Sets the PC to be the address of "Label" —aka the callee's label
- AND automatically sets register **\$ra** to be the return address.  
"return addr = address of the inst. in main right after the JAL instruction"

How does the callee return execution to the caller when its done?

→ Using the **jump register** instruction!

**jr \$ra**

sets the PC to be the address value stored at **\$ra**

| Number | Name   | Uses                                                                                                                        |
|--------|--------|-----------------------------------------------------------------------------------------------------------------------------|
| \$0    | \$zero | • Always holding the value 0                                                                                                |
| \$1    | \$at   | • Resolving pseudo instructions                                                                                             |
| \$2    | \$v0   | • Where callee places return values<br>• Where we place the "operation number" when performing <b>syscall</b>               |
| \$3    | \$v1   | • Where the OS places value read from user input after <b>syscall</b>                                                       |
| \$4    | \$a0   | • Where caller places function parameter args<br>• Where we place the value we want to print when performing <b>syscall</b> |
| \$5    | \$a1   | • }                                                                                                                         |
| \$6    | \$a2   | • } Where caller places function parameter args                                                                             |
| \$7    | \$a3   | • }                                                                                                                         |
| \$31   | \$ra   | • Stores return address upon a function call                                                                                |

Example program performing  
a function call?

→ C program:

```
int sum3(int a, int b, int c) {  
    return a + b + c;  
}  
  
int main() {  
    int a = 2;  
    int b = 5;  
    int c = 8;  
  
    int y = sum3(a, b, c);  
  
    printf("%d\n", y);
```

→ Assembly program (MIPS):

PC Address

| PC Address | Text               |                                                                                           |
|------------|--------------------|-------------------------------------------------------------------------------------------|
| 0x00003000 | main:              |                                                                                           |
| 0x00003004 | addi \$a0 \$0 1    | store int a, b, and c in the argument registers a0, a1, a2                                |
| 0x00003008 | addi \$a1 \$0 5    |                                                                                           |
| 0x0000300C | addi \$a2 \$0 8    |                                                                                           |
| 0x00003010 | jal sum3           | jump to Label sum3 AND set \$ra to return instruction addr (PC+4): \$ra = 0x00003010      |
| 0x00003014 | addi \$v0 \$v0 0   | We know that the return value y is stored in \$v0. Move it to \$a0 (\$a4) so we can print |
| 0x00003018 | addi \$v0 \$0 1    | Set \$v0 (\$a2) to 1 to indicate "print an integer from \$a4"                             |
| 0x0000301C | syscall            |                                                                                           |
| 0x00003020 | addi \$v0 \$v0 10  | Set \$f2 to 10 for "exit program".                                                        |
| 0x00003024 | syscall            | NECESSARY b/c otherwise prog will keep executing the stuff below                          |
| 0x00003028 | sum3:              |                                                                                           |
| 0x0000302C | add \$v0 \$a0 \$a1 | add the values in all arg. registers and store ans in \$v0                                |
| 0x00003030 | add \$v0 \$v0 \$a2 |                                                                                           |
|            | jr \$ra            | return to main; resume execution at PC = \$ra = 0x00003010                                |

## Shift Instructions

What is the "shift left logical" instruction?

Why is SLL useful?

Example?

- **SLL \$rd \$rt shamt**
  - Shift the contents of \$rt to the LEFT by shamt & store result in \$rd
  - $R[rd] = R[rt] \ll shamt$
- For computing the byte offset when we need to index an array!
- RECALL:  $x \ll y = x \cdot 2^y$
- Given an array, if we want to access index #b, we shift b left by 2 and add the result to our base address!

```
.data
arr: .word 1 5 10 15 20
.text
addi $a1 $0 3           → we want arr[3]
la $5 arr              → stores address of arr[0] in $5
sll $a1 $a1 2           → $a1 = 3 << 2 = 3 * 4 = 12, the byte offset
add $6 $5 $a1            → stores addr of arr[0] + 12 = addr of arr[3] in $6
lw $7 0($6)             → $7 = arr[3] = 15
```

What are the 2 right shift instructions?

→ Shift right logical:

- Pad left side with 0s

$$R[rd] = R[rt] \gg shamt$$

→ Shift right arithmetic:

- Pad left side with MSB to preserve sign

# Storing Variables on the Stack

When is the register \$ra not enough to help return from function calls?

→ When we have nested function calls.

→ **RCAU**: \$ra is used to store the return address when a function call is made using `jal`.

→ Ex:

| Addr | .data                                                             |
|------|-------------------------------------------------------------------|
|      | hello: .asciz "Hello "                                            |
|      | world: .asciz "world!\n"                                          |
|      | text                                                              |
|      | main:                                                             |
| 0    | jal print_helloWorld → 1) jump to print_helloWorld & set \$ra =   |
| 4    | jal print_hello → PC + 4 = 4                                      |
| 8    | li \$v0 10                                                        |
| 12   | syscall                                                           |
|      | print_helloWorld:                                                 |
| 16   | jal print_hello → 2) jump to print_hello & set \$ra = PC + 4 = 20 |
| 20   | jal print_world → 4) jump to print_world & set \$ra = PC + 4 = 24 |
| 24   | jr \$ra → 6) jump to address \$ra = 24 ????                       |
|      | print_hello:                                                      |
| 28   | la \$a0 hello                                                     |
| 32   | li \$v0 4                                                         |
| 36   | syscall                                                           |
| 40   | jr \$ra → 3) Finished w/ print_hello. Jump to addr. \$ra = 20     |
|      | print_world:                                                      |
| 44   | la \$a0 world                                                     |
| 48   | li \$v0 4                                                         |
| 52   | syscall                                                           |
| 56   | jr \$ra → 5) Finished w/ print_world. Jump to addr. \$ra = 24     |

What has happened in this example?

→ At step 6, we are ready to return to main, as it was the one that called print\_helloWorld

• We expected `jr $ra` to take us to  $PC = 20$

→ BUT, at this point `$ra` was overwritten, so we have lost the return address to main and are stuck infinitely jumping to address 24 !!!

→ By saving the ret. addr. somewhere else before we overwrite it. Saving it to another register is not a sustainable solution b/c it can still get overwritten.

→ Instead, we will store the RA on the stack.

How can we fix this problem?

When would we save the RA on the stack?

- When the callee begins, save the value of \$ra to the stack.
- When the callee is ready to return, get the val. from the stack & place it back into \$ra, before the jr \$ra instruction.

```
print-helloWorld:  
    # Save $ra=20 on stack  
    jal print-hello  
    jal print-world  
    # Restore ra=20 from stack  
    jr $ra
```

→ Now, step 6 (from Sx on prev page) will take us back to main by setting PC=20!

### - The stack pointer -

RECALL: What is the stack?

- An area of memory used to store temporary function data, such as local variables.
- The stack grows DOWN (high → low mem. addresses)

What is the stack pointer?

- A special register \$sp (which is actually \$29) that is used to point to the last item that was placed on the stack.
- \$sp holds the address of the most recently placed val on the stack.



How do we store a new item on the stack?

Example?

- The stack grows down, so we must decrement the addr. value stored in \$sp by 4 to move to a new spot on the stack.

① Current stack: \$sp = 0x00001034

and points to the last val added, 12. \$sp → 0x00001034

| Address    | Data |
|------------|------|
| 0x00001030 | 12   |
| 0x0000102C |      |
| 0x00001028 |      |
| 0x00001024 |      |

→ To store a new item, the value of \$ra:

1) Decrement sp by 4

2) store val in memory

addi \$sp \$sp -4

sw \$ra 0(\$sp)

② Updated stack:

| Address           | Data       |
|-------------------|------------|
| 0x00001034        | 12         |
| \$sp → 0x00001030 | 0x00003004 |
| 0x0000102C        |            |
| 0x00001028        |            |
| 0x00001024        |            |

How do we remove an item from the stack?

- "Remove" aka to fetch back the value we stored
- When we want to restore a val from the stack, we simply
  - 1. read it into a register (with `lw`)
  - 2. and then move the stack pointer back up
- We don't have to "clear" the value from memory; by moving `sp` back up, we've "unallocated" that stack space. B/c next time we want to store on the stack, we will just overwrite this value.

Example?

① Current stack: `$sp = 0x00001030`

② Read / "remove" / restore value of `ra`:

```
lw $ra 0($sp)
addi $sp $sp 4
```

③ Updated stack:

| Address    | Data       |
|------------|------------|
| 0x00001034 | 12         |
| 0x00001030 | 0x00003004 |
| 0x0000102C |            |
| 0x00001028 |            |
| 0x00001024 |            |

`$sp → 0x00001034`

| Address    | Data       |
|------------|------------|
| 0x00001034 | 12         |
| 0x00001030 | 0x00003004 |
| 0x0000102C |            |
| 0x00001028 |            |
| 0x00001024 |            |

### - Calling Convention with Function - Saved values -

RECALL: Why do we have a calling convention?

- Imagine each function in a program was written by a different person, & they didn't know how the other person wrote their func, or what registers they used.
- Then, we can't necessarily rely on a val. saved in a reg. to not be overwritten, so we must have rules/conventions for preserving all parts of the program (for ex, param args in `$a0-$a3` & return values in `$r0-$r1`)

→ If a callee overwrites caller-saved values! Ex: `main:`

```
addi $s1 $0 0
la $s2 array-1
la $s3 array-2
lw $a0 0($s2)
jal fun
```

fun overwrites  
the values in `$s1`  
and `$s2`!

`fun:`

```
addi $s1 $a0 1
sub $s2 $0 $a0
mul $r0 $s1 $s2
jr $ra
```

What is another example of when we might need to save values to the stack?

How can we fix this problem?

Example of the fix?

- By having the callee save the values in the caller-saved registers to the stack before overwriting them.
- Then, before returning to caller, the callee restores the values from the stack!
- The callee "fun":

```
Fun:  
addi $sp $sp -8 → allocate space to store 2 reg. values on the stack  
by decrementing $sp by 8.  
sw $s2 4($sp) → Store contents of $s2 at addr [spaddr + 4]  
sw $s1 0($sp) → Store contents of $s1 at addr [spaddr]  
  
addi $s1 $a0 1  
sub $s2 $0 $a0  
mul $v0 $s1 $s2  
  
lw $s1 0($sp)  
lw $s2 4($sp)  
addi $sp $sp 8 → move sp back upto "deallocate"  
  
jr $ra
```

Now, callee can use registers \$s1 and \$s2 w/o worrying

Before returning to the caller, restore the values of \$s1 and \$s2 by reading from the stack

Registers that the caller uses, BUT expects that the callee may overwrite them.

The callee CAN overwrite these registers w/o saving them

If caller intends to use them after a func. call, it is the caller's responsibility to save them.

Registers that the caller expects the callee to NOT overwrite - caller expects the vals in these registers to be saved.

If the callee wants to use these registers, it is the callee's responsibility to save their values to the stack and then later restore them.

| Registers | Name      | Type                        |
|-----------|-----------|-----------------------------|
| \$2-3     | \$v0-\$v1 | callee stores return values |
| \$4-7     | \$a0-\$a3 | caller stores param. args   |
| \$8-15    | \$t0-\$t7 | registers for callee to use |
| \$16-23   | \$s0-\$s7 | registers for caller to use |
| \$24-25   | \$t8-\$t9 | registers for callee to use |
| \$31      | \$ra      | return address pointer      |

If the caller is ALSO a callee, they are responsible for saving their own RA before calling another func.

What is a caller-saved register in the calling convention?

What are caller-saved registers?

What are the designated caller- and callee-saved registers in MIPS?