



# K-map / Karnaugh Map

## Definition of terms for two-level simplification

- Implicant** every group ที่มีผลลัพธ์เดียวกัน  $A \wedge ACB \rightarrow AB = \text{implicant}$ 
  - single element of ON-set or DC-set or any group of these elements that can be combined to form a subcube
- Prime implicant** ของ implicant
  - implicant that can't be combined with another to form a larger subcube
- Essential prime implicant** ที่ต้องใช้ prime implicant
  - prime implicant is essential if it alone covers an element of ON-set
  - will participate in ALL possible covers of the ON-set
  - DC-set used to form prime implicants but not to make implicant essential
- Objective:**
  - grow implicant into prime implicants (minimize literals per term)
  - cover the ON-set with as few prime implicants as possible (minimize number of product terms)

Example



- Sum-of-products**
  - AND gates to form product terms (minterms)
  - OR gate to form sum

$$\text{if term AND} : \text{True} \rightarrow \text{out put} = \text{True}$$



- Product-of-sums**

- OR gates to form sum terms (maxterms)
- AND gates to form product

$$\text{if term OR} : \text{False} \rightarrow \text{out put} = \text{False}$$



III - Working with

## การลดรูปโดยใช้ Karnaugh Map

- การลดรูป โดยใช้ Karnaugh Map มีวิธีการดังนี้

- ใช้ Minterm (1) หรือ Maxterm (0) ลงใน Karnaugh Map
- จับกลุ่มด้วยที่อยู่ติดกัน โดยมีหนลักษณะที่ว่า จับกลุ่มได้ครึ่งจะ 2^n ตัว คือ 1 ตัว หรือ 2 ตัว หรือ 4 ตัว หรือ 8 ตัว หรือ 16 ตัว ... (ก คือเลขจำนวนเต็ม บวก)
- การจับกลุ่มจะต้องจับกลุ่มให้ได้มากที่สุด เท่าที่ทำได้
- ด้วยที่ถูกจับกลุ่มไปแล้ว ก็สามารถนำไปจับกลุ่มกับด้วยอีกได้
- Karnaugh Map มีคุณสมบัติมีความโดยรอบ
  - ตัวริมข่าย จับกลุ่ม กับตัวริมขวา ได้
  - ตัวริมบน จับกลุ่ม กับตัวริมล่าง ได้



Digital Circuit Design

Karnaugh Map I

// K-map : ห้ามต่อไป จนกว่าจะรู้ว่า รูปแบบอย่างไร



$$\begin{aligned} & \text{implicant คือ } 1 \text{ term ที่ไม่ต้องมี input ที่ซ้ำกัน} \\ & \text{ในรูป คือ prime implicant} \\ & (A+B+C+\bar{D})(A+B+\bar{C}+\bar{D})(A+\bar{B}+\bar{C}+\bar{D}) \\ & [(A+B+\bar{D}) + (\bar{C}+\bar{D})F] \cdot [(A+\bar{B}+\bar{D}) + (\bar{C}+\bar{D})F] \\ & (A+\bar{B}+\bar{D}) \cdot (A+\bar{B}+\bar{D}) \\ & A+\bar{D} \end{aligned}$$

$$\text{จำนวน cube} = 4$$

## Convert using OA/AND gate $\rightarrow$ NAND/NOR gate



- AOI function:** three stages of logic — AND, OR, Invert

$\rightarrow$  AND OR invert gate logical concept

↳ บล็อกวงจรที่มี AND, OR, Invert



possible implementation  
implement in AOI function



2x2 AOI gate symbol

3x2 AOI gate symbol



- Multiplexers/selectors: general concept

$\rightarrow \text{input} + \text{Selector} = \text{output}$  : Selector เลือก index ของ input ที่ต้องไป output  
 $\rightarrow$  ร่างร่าง Select & output ของ process ของ output ของ process  
 $\rightarrow \forall (\text{data bit} / \text{input}) = \text{data bit} / \text{output}$   
 $\rightarrow \text{input} = 2^n \text{ m} \rightarrow \text{Selector} = n \text{ bits}$



- Example: // 8 input  $\rightarrow$  Selector  $\rightarrow$  2 input  $\rightarrow$  Sector

$$\begin{aligned} & F(A, B, C) = m_0 + m_2 + m_6 + m_7 \\ & \text{variable} = A'B'C' + A'BC' + ABC' + ABC \\ & = A'B(C') + A'B(C') + AB'(0) + AB(1) \end{aligned}$$



|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | C  | F |
|---|---|---|---|---|---|---|---|---|----|---|
| 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | C' | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | C' | 1 |
| 2 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | C' | 0 |
| 3 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | C' | 1 |
| 4 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0  | 0 |
| 5 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0  | 1 |
| 6 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1  | 0 |
| 7 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1  | 1 |

IV - Combinational Logic Technologies  
© Copyright 2004, Gaetano Borriello and Randy H. Katz

## Decoders/demultiplexers: general concept

- Selector Selects [what output is equal to input] [else is default value]
  - when give input we process in process like as if value = input else value = default
  - $A(Data\ bit / I\ output) = Data\ bit \oplus (input)$
  - $out\ out = 2^n \text{ bits} \rightarrow \text{Selector} = n \text{ bits}$



# PROM & PAL & PLA // with product of sum



| Feature     | Programmable Logic                       | Fixed Logic                                    |
|-------------|------------------------------------------|------------------------------------------------|
| Flexibility | Reconfigurable for different tasks       | Optimized for one specific function            |
| Cost        | More expensive for smaller volumes       | Cheaper for large-scale production             |
| Performance | Slightly slower due to reprogrammability | Faster, optimized for specific tasks           |
| Power       | Higher power consumption                 | Lower power consumption                        |
| Use Cases   | Prototyping, dynamic systems             | High-volume, performance-critical applications |

**Figure 1.** Basic configurations of (a) PROM, (b) PAL, (c) PLA

- ROM – full AND plane, general OR plane
    - ❑ cheap (high-volume component)
    - ❑ can implement any function of n inputs
    - ❑ medium speed
  - PAL – programmable AND plane, fixed OR plane
    - ❑ intermediate cost
    - ❑ can implement functions limited by number of terms
    - ❑ high speed (only one programmable plane that is much smaller than ROM's decoder)
  - PLA – programmable AND and OR planes
    - ❑ most expensive (most complex in design, need more sophisticated tools)
    - ❑ can implement any function up to a product term limit
    - ❑ slow (two programmable planes)

| Feature              | PAL (Programmable Array Logic)                          | PLA (Programmable Logic Array)                               |
|----------------------|---------------------------------------------------------|--------------------------------------------------------------|
| Structure            | Fixed AND array, programmable OR array                  | Fully programmable AND and OR arrays                         |
| Flexibility          | Less flexible due to fixed AND array                    | More flexible with programmable AND and OR arrays            |
| Complexity           | Simpler design, lower complexity                        | More complex, allowing for more customized logic             |
| Programming          | Only the OR array is programmable                       | Both AND and OR arrays are programmable                      |
| Speed                | Typically faster due to simpler design                  | Slower due to additional complexity and flexibility          |
| Power Consumption    | Lower power consumption                                 | Higher power consumption due to more programmable elements   |
| Cost                 | Less expensive                                          | More expensive                                               |
| Logic Implementation | Can implement simpler logic functions                   | Can implement more complex logic functions                   |
| Typical Use          | Used for simpler logic functions and faster performance | Used for more complex designs with greater logic flexibility |
| Reconfigurability    | Limited reconfigurability due to fixed AND array        | High reconfigurability with fully programmable logic arrays  |



ROM

- Two dimensional array of 1s and 0s
    - entry (row) is called a "word"
    - width of row = word-size
    - index is called an "address"
    - address is input
    - selected word is output



# Problem: 7 segment / sensor / calendar

BCD to 7-segment display controller problem show 10-6



08234  
56889

$$\begin{aligned} C0 &= A + B D + C + B' D' \\ C1 &= C D' + C D + B' \\ C2 &= B + C' + D \\ C3 &= B' D' + C D' + B C' D + B' C \\ C4 &= B' D' + C D' \\ C5 &= A + C' D' + B D' + B C' \\ C6 &= A + C D' + B C + B' C \end{aligned}$$

Truth table

| A | B | C | D | C0 | C1 | C2 | C3 | C4 | C5 | C6 |
|---|---|---|---|----|----|----|----|----|----|----|
| 0 | 0 | 0 | 0 | 1  | 1  | 1  | 1  | 1  | 1  | 0  |
| 0 | 0 | 0 | 1 | 0  | 1  | 1  | 0  | 0  | 0  | 0  |
| 0 | 0 | 1 | 0 | 1  | 1  | 0  | 1  | 1  | 0  | 1  |
| 0 | 0 | 1 | 1 | 1  | 1  | 1  | 1  | 0  | 0  | 1  |
| 0 | 1 | 0 | 0 | 0  | 1  | 1  | 0  | 0  | 1  | 1  |
| 0 | 1 | 0 | 1 | 1  | 0  | 1  | 1  | 0  | 1  | 1  |
| 0 | 1 | 1 | 0 | 1  | 0  | 1  | 1  | 1  | 1  | 1  |
| 0 | 1 | 1 | 1 | 1  | 1  | 1  | 0  | 0  | 0  | 0  |
| 1 | 0 | 0 | 0 | 1  | 1  | 1  | 1  | 1  | 1  | 1  |
| 1 | 0 | 0 | 1 | 1  | 1  | 1  | 0  | 0  | 1  | 1  |
| 1 | 0 | 1 | - | -  | -  | -  | -  | -  | -  | -  |
| 1 | 1 | - | - | -  | -  | -  | -  | -  | -  | -  |

PLA



## Position of sensors

- A to B distance = specification - 5%
- A to C distance = specification + 5%



problem size obj

Truth table

- show don't cares

| A | B | C | Function   |
|---|---|---|------------|
| 0 | 0 | 0 | do nothing |
| 0 | 0 | 1 | do nothing |
| 0 | 1 | 0 | do nothing |
| 0 | 1 | 1 | do nothing |
| 1 | 0 | 0 | too short  |
| 1 | 0 | 1 | don't care |
| 1 | 1 | 0 | in spec    |
| 1 | 1 | 1 | too long   |

logic implementation now straightforward just use three 3-input AND gates

"too short" =  $AB'C'$   
(only first sensor tripped)

"in spec" =  $A B C'$   
(first two sensors tripped)

"too long" =  $A B C$   
(all three sensors tripped)

## Calendar subsystem

problem

- Determine number of days in a month (to control watch display)
- used in controlling the display of a wrist-watch LCD screen
- inputs: month, leap year flag
- outputs: number of days

## Use software implementation to help understand the problem

```
Integer number_of_days (month, leap_year_flag) {
    switch (month) {
        case 1: return (31);
        case 2: if (leap_year_flag == 1)
                    then return (29)
                    else return (28);
        case 3: return (31);
        case 4: return (30);
        case 5: return (31);
        case 6: return (30);
        case 7: return (31);
        case 8: return (31);
        case 9: return (30);
        case 10: return (31);
        case 11: return (30);
        case 12: return (31);
        default: return (0);
    }
}
```



## Discrete gates

- $28 = m^8 m^4 m^2 m^1' \text{leap}'$
- $29 = m^8 m^4 m^2 m^1' \text{leap}$
- $30 = m^8 m^4 m^1' + m^8 m^1$
- $31 = m^8 m^1 + m^8 m^1'$

| month | leap | 28 | 29 | 30 | 31 |
|-------|------|----|----|----|----|
| 0000  | -    | -  | -  | -  | -  |
| 0001  | -    | 0  | 0  | 0  | 1  |
| 0010  | 0    | 1  | 0  | 0  | 0  |
| 0011  | 1    | 0  | 1  | 0  | 0  |
| 0100  | -    | 0  | 0  | 1  | 0  |
| 0101  | -    | 0  | 0  | 0  | 1  |
| 0110  | -    | 0  | 0  | 1  | 0  |
| 0111  | -    | 0  | 0  | 0  | 1  |
| 1000  | -    | 0  | 0  | 0  | 1  |
| 1001  | -    | 0  | 0  | 1  | 0  |
| 1010  | -    | 0  | 0  | 0  | 1  |
| 1011  | -    | 0  | 0  | 1  | 0  |
| 1100  | -    | 0  | 0  | 0  | 1  |
| 1101  | -    | -  | -  | -  | -  |
| 111-  | -    | -  | -  | -  | -  |

leap year  
 $\Rightarrow 4 \cdot 4 = 0 \quad k \cdot 100 \neq 0 \quad \& \cdot 400 = 0$

## Activity: divisible-by-4 circuit

X BCD coded year  
Y8 Y4 Y2 YM1 - YH8 YH4 YH2 YH1 - YT8 YT4 YT2 YT1 - YO8 YO4 YO2 YO1

- BCD coded year
  - Y8 Y4 Y2 YM1 - YH8 YH4 YH2 YH1 - YT8 YT4 YT2 YT1 - YO8 YO4 YO2 YO1
- Only need to look at low-order two digits of the year  
all years ending in 00, 04, 08, 12, 16, 20, etc. are divisible by 4
  - if tens digit is even, then divisible by 4 if ones digit is 0, 4, or 8
  - if tens digit is odd, then divisible by 4 if the ones digit is 2 or 6.

- Translates into the following Boolean expression  
(where Y1 is the year's tens digit low-order bit,  
Y08 is the high-order bit of year's ones digit, etc.):

$$\begin{aligned} &YT1' (Y8' YO4' YO2' YO1' + Y8 YO4' YO2' YO1' + YO8 YO4' YO2' YO1') \\ &+ YT1 (Y8' YO4' YO2' YO1' + Y8 YO4' YO2' YO1') \end{aligned}$$

- Digits with values of 10 to 15 will never occur, simplify further to yield:

$$YT1' YO2' YO1' + YT1 YO2' YO1'$$

## Divisible-by-100 and divisible-by-400 circuits

100  
Divisible-by-100 just requires checking that all bits of two low-order digits are all 0:

$$YT8' YT4' YT2' YT1' \cdot YO8' YO4' YO2' YO1' \Rightarrow XX00$$

400  
Divisible-by-400 combines the divisible-by-4 (applied to the thousands and hundreds digits) and divisible-by-100 circuits

$$(YM1' YH2' YH1' + YM1 YH2 YH1') \rightarrow AB00 \wedge AB \cdot 4 = 0$$

$$\cdot (YT8' YT4' YT2' YT1' + YO8' YO4' YO2' YO1') \Rightarrow XX00$$

## Combine 4&100&400

Label results of previous three circuits: D4, D100, and D400

$$\begin{aligned} \text{leap\_year\_flag} &= D4 (D100 \cdot D400') \\ &= D4 \cdot D100' + D4 \cdot D400 \\ &= D4 \cdot D100' + D400 \end{aligned}$$

## Implementation of leap year flag



# Number system

① Signed & magnitude

↳ last bit sign bit  $\oplus 0$ : +

↳ another bit = value bit

pro: easy implement

con: hard subtraction, lost 1 bit ( $1000 = 0000$ )



② 1's complement

$\oplus \rightarrow 0 \oplus 0 \oplus \oplus \oplus$

toggle All bit

$$N' = (2N-1) - N \rightarrow X-N = X+N' + X + (2N-1)-N$$

pro: easy subtraction

con: hard implementation, lost 1 bit ( $1111 = 0000$ )



③ 2's complement

$\oplus \rightarrow 0 \oplus 0 \oplus \oplus \oplus$

toggle All bit  $\rightarrow +Value(1)$

$$N' = 2N - N \rightarrow X-N = X+N = X+(2N)-N$$

pro: easy subtraction, use all bit

con: hard implementation



Can't ignore it completely // 2's complement

needed to check for overflow (see next two slides)

When there is no overflow, carry-out may be true but can be ignored

- M + N when N > M:

$$M^* + N = (2n - M) + N = 2n + (N - M)$$

ignoring carry-out is just like subtracting 2n

- M + N where N + M  $\leq 2n-1$

$$(-M) + (-N) = M^* + N^* = (2n - M) + (2n - N) = 2n - (M + N) + 2n$$

ignoring the carry, it is just the 2s complement representation for  $-(M + N)$

|          |         |
|----------|---------|
| 0 1 1 1  | 1 0 0 0 |
| 5        | 1 0 0 1 |
| 3        | 0 0 1 1 |
| -8       | 1 1 1 0 |
| overflow | 0 1 1 1 |

origin next  $\xrightarrow{+8}$  -8  
carry in next cycle  
overflow next cycle

## Adder

① Adder 1 : 4-bit adder with a borrow

- Use an adder to do subtraction thanks to 2s complement representation
- $A - B = A + (-B) = A + B' + 1$
- control signal selects B or 2s complement of B



truth table for full Adder

| Ai | Bi | Cin | Sum | Cout |
|----|----|-----|-----|------|
| 0  | 0  | 0   | 0   | 0    |
| 0  | 0  | 1   | 1   | 0    |
| 0  | 1  | 0   | 1   | 0    |
| 0  | 1  | 1   | 0   | 1    |
| 1  | 0  | 0   | 1   | 0    |
| 1  | 0  | 1   | 0   | 1    |
| 1  | 1  | 0   | 0   | 1    |
| 1  | 1  | 1   | 1   | 1    |

pro: least transistor

cons: last value sum & carry slow because  $(sum \& carry)_i = input + carry_{i-1}$

② Adder 2 : 4-bit adder with carry notation

- Carry generate:  $G_i = A_i \oplus B_i$
- must generate carry when  $A = B = 1$  in function carry i

- Carry propagate:  $P_i = A_i \oplus B_i$
- carry-in will equal carry-out here

- Sum and Cout can be re-expressed in terms of generate/propagate:

$$S_i = A_i \oplus B_i \oplus G_i$$

$$= P_i \oplus G_i$$

$$C_i+1 = A_i B_i + A_i G_i + B_i G_i$$

$$= A_i B_i + C_i (A_i + B_i)$$

$$= A_i B_i + C_i (A_i \oplus B_i)$$

$$= G_i + C_i P_i$$

$$= G_i + C_i (A_i \oplus B_i)$$

$$= G_i + C_i + A_i B_i$$

$$= G_i + C_i + P_i$$

$$= G_i + C_i + (A_i \oplus B_i)$$

$$= G_i + C_i + G_i$$

$$= C_i + 1$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

$$= C_i + P_i$$

$$= C_i + (A_i \oplus B_i)$$

## Sequential circuit: circuit with feed back

Output = function( inputs, { past inputs, memory? } )

### ① simple circuit with feedback



$\Rightarrow$  if  $Q_{\text{old}} \rightarrow Q_{\text{new}}$  with stored value  $\bar{Q}$

If  $Q_{\text{old}} = 1$  then load required  $\rightarrow Q_{\text{new}} = 0$   $\Rightarrow$  if  $Q_{\text{old}} = 0$  then reverse value

if  $Q_{\text{old}} = 1$  then start end  $\Rightarrow$  start  $\Rightarrow$  end

| remembered | closed                             | opened                               |
|------------|------------------------------------|--------------------------------------|
| closed     | Value = $\text{load}(\text{now})$  | Value = $\text{load}(\text{now})$    |
| opened     | Value = $\text{load}(\text{last})$ | Value = 0 ( $\bar{Q}_{\text{old}}$ ) |

Latch: memory in this clock

flip flop: memory in this clock

### ② SR latch: easy implement but data changing all the time

Cross-coupled NOR gates Implemented by NOR

- similar to inverter pair, with capability to force output to 0 (reset=1) or 1 (set=0) (not used)



X.1: hard to understand  
(inversion)  
easy to implement real circuit

X.2: easy to understand  
(inversion)  
hard to implement real circuit

Cross-coupled NAND gates Implemented by NAND

- similar to inverter pair, with capability to force output to 0 (reset=0) or 1 (set=0) (2)



X.2: easy to understand  
(inversion)

| S | R | $Q(t)$ | $Q(t+\Delta)$ |
|---|---|--------|---------------|
| 0 | 0 | 0      | 0             |
| 0 | 0 | 1      | hold          |
| 0 | 1 | 0      | reset         |
| 0 | 1 | 1      | 0             |
| 1 | 0 | 0      | 1             |
| 1 | 0 | 1      | 1             |
| 1 | 1 | 0      | X             |
| 1 | 1 | 1      | X             |

| S | R | $Q(t)$ | $S$ |
|---|---|--------|-----|
| 0 | 0 | X      | 1   |
| 1 | 0 | X      | 1   |



$S=0, R=0$

$\rightarrow$  NOR<sub>1</sub>, Output  $Q=1$  ( $\bar{R}=0$ )  $\rightarrow$  strong wire  $\bar{Q}$   
 $\wedge$  NOR<sub>2</sub>, Output  $Q'=1$  ( $\bar{S}=0$ )  $\rightarrow$  strong wire  $\bar{Q}'$   
 $\rightarrow$  NOR<sub>1</sub>,  $\bar{Q}$  wire  $\bar{\bar{Q}}' \rightarrow$  output  $Q=0+Q' \Rightarrow Q=Q'$   
 $\wedge$  NOR<sub>2</sub>,  $\bar{Q}'$  wire  $\bar{\bar{Q}}$   $\rightarrow$  output  $Q'=0+Q \Rightarrow Q'=Q$



$S=0, R=1$

$\rightarrow$  NOR<sub>1</sub>, Output  $Q=0$  ( $\bar{R}=1$ )  $\times$  NOR<sub>2</sub>, Output  $Q'=0$  ( $\bar{S}=0$ )  $\rightarrow$  strong wire  $\bar{Q}$   
 $\wedge$  NOR<sub>2</sub>, Output  $Q'=0$  ( $\bar{S}=0$ )  $\rightarrow$  strong wire  $\bar{Q}'$   
 $\rightarrow$  NOR<sub>1</sub>,  $\bar{Q}$  wire  $\bar{\bar{Q}}' \rightarrow$  output  $Q=0+Q' \Rightarrow Q=0$   
 $\wedge$  NOR<sub>2</sub>, Output  $Q'=0$



$S=1, R=0$

$\rightarrow$  NOR<sub>1</sub>, Output  $Q=1$  ( $\bar{R}=0$ )  $\times$  NOR<sub>2</sub>, Output  $Q'=0$  ( $\bar{S}=1$ )  $\rightarrow$  strong wire  $\bar{Q}'$   
 $\wedge$  NOR<sub>2</sub>, Output  $Q'=0$  ( $\bar{S}=1$ )  $\rightarrow$  strong wire  $\bar{Q}$   
 $\rightarrow$  NOR<sub>1</sub>,  $\bar{Q}'$  wire  $\bar{\bar{Q}} \rightarrow$  output  $Q=0+Q' \Rightarrow Q=0$   
 $\wedge$  NOR<sub>2</sub>, Output  $Q=0$



$S=1, R=1$

$\rightarrow$  NOR<sub>1</sub>, Output  $Q=0$  ( $\bar{R}=1$ )  $\times$  NOR<sub>2</sub>, Output  $Q'=0$  ( $\bar{S}=1$ )  $\rightarrow$  strong wire  $\bar{Q}$   
 $\wedge$  NOR<sub>2</sub>, Output  $Q'=0$  ( $\bar{S}=1$ )  $\rightarrow$  strong wire  $\bar{Q}'$   
 $\therefore Q=Q' \rightarrow$  bug



# FSM (Finite State Machine)

## Common FSM

Example problem: bit counter

① Create State diagram (problem)



② Create truth table (input: Each index of present State, Output: Each index of next state)

| present state | next state |
|---------------|------------|
| 0   000       | 001   1    |
| 1   001       | 010   2    |
| 2   010       | 011   3    |
| 3   011       | 100   4    |
| 4   100       | 101   5    |
| 5   101       | 110   6    |
| 6   110       | 111   7    |
| 7   111       | 000   0    |

③ Create K-map from truth table (1 output = 1 K-map)



N1 <= C1'  
N2 <= C1C2' + C1'C2  
<= C1 xor C2  
N3 <= C1C2C3' + C1'C3 + C2'C3  
<= (C1C2)C3' + (C1' + C2')C3  
<= (C1C2)C3' + (C1C2)C3  
<= (C1C2) xor C3

④ Create Circuit



→ output from K-map

⑤ Example problem: Shift register (insert new input in leftmost String 3 bits)

⑥ Truth table

| In | C1 | C2 | C3 | N1 | N2 | N3 |
|----|----|----|----|----|----|----|
| 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 0  | 0  | 0  | 1  | 0  | 0  | 1  |
| 0  | 0  | 1  | 0  | 0  | 0  | 1  |
| 0  | 1  | 0  | 0  | 0  | 1  | 0  |
| 1  | 0  | 0  | 0  | 1  | 0  | 0  |
| 1  | 0  | 1  | 0  | 0  | 1  | 0  |
| 1  | 1  | 0  | 0  | 0  | 1  | 1  |
| 1  | 1  | 1  | 0  | 1  | 0  | 1  |
| 0  | 0  | 0  | 1  | 0  | 0  | 0  |
| 1  | 0  | 0  | 1  | 0  | 0  | 1  |
| 1  | 0  | 1  | 0  | 1  | 0  | 0  |
| 1  | 1  | 0  | 0  | 1  | 1  | 0  |
| 1  | 1  | 1  | 0  | 1  | 1  | 1  |



⑦ Example problem: 2-bit up-down counter (2 inputs)

direction: D = 0 for up, D = 1 for down

count: C = 0 for hold, C = 1 for count

① State diagram



② Truth table

state input(C,D) next state

S1 S0 C D N1 N0

0 0 0 0 0 0

0 0 0 1 0 0

0 0 1 0 0 1

0 0 1 1 1 1

0 1 0 0 0 0

0 1 0 1 0 1

0 1 1 0 0 1

0 1 1 1 0 0

1 0 0 0 1 0

1 0 0 1 1 0

1 0 1 0 0 1

1 0 1 1 0 0

1 1 0 0 0 0

1 1 0 1 0 1

1 1 1 0 0 0

1 1 1 1 1 0

1 1 1 1 1 1

③ K-map

$$\begin{aligned} N_1 &= CS_1 \\ &+ CDS_0S_1' + CDS_0S_1 \\ &+ CD'S_0S_1' + CD'S_0 \\ &= CS_1 \\ &+ C(D'(S_1 \oplus S_0) + D(S_1 = S_0)) \end{aligned}$$

$$N_0 = CS_0' + C'S_0$$

④ Circuit



## Moore & Mealy: types of FSM

| Feature                   | Mealy Machine                                                                   | Moore Machine                           |
|---------------------------|---------------------------------------------------------------------------------|-----------------------------------------|
| Output Dependency         | Depends on current state and input                                              | Depends only on current state           |
| Output Association        | Associated with transitions                                                     | Associated with states                  |
| Generality                | More general; can require fewer states                                          | Less general; may require more states   |
| Response Speed            | Faster response to inputs (input change $\Rightarrow$ state change (immediate)) | Slower response; changes at clock edges |
| Implementation Complexity | More complex due to input dependency                                            | Simpler; outputs tied to states         |



① Example problem

Suitable abstract representation

- tabulate typical input sequences:
  - 3 nickels
  - nickel, dime
  - dime, nickel
  - two dimes
- draw state diagram:
  - inputs: N, D, reset
  - outputs: open chute
- assumptions:
  - assume N and D asserted for one cycle
  - each state has a self loop for N = D = 0 (no coin)



② Implement by moore

① State diagram



② truth table

| present state | inputs | next state | output | present state | inputs | next state | output |
|---------------|--------|------------|--------|---------------|--------|------------|--------|
| 0¢            | 0 0    | 0¢         | 0      | 0 0           | 0 0    | 0 0        | 0      |
|               | 1 0    | 5¢         | 0      | 0 1           | 0 1    | 1 0        | 0      |
|               | 1 1    | 10¢        | 0      | 1 1           | 1 1    | 1 1        | 0      |
| 5¢            | 0 0    | 5¢         | 0      | 0 1           | 0 1    | 10¢        | 0      |
|               | 1 0    | 10¢        | 0      | 1 1           | 1 1    | 15¢        | 0      |
| 10¢           | 0 0    | 10¢        | 0      | 0 1           | 0 1    | 15¢        | 0      |
|               | 1 0    | 15¢        | 0      | 1 1           | 1 1    | 20¢        | 0      |
| 15¢           | —      | 15¢        | 1      | 1 1           | 1 1    | 20¢        | 1      |

③ K-map

| D1  | Q1  | N |
|-----|-----|---|
| 0 0 | 1 1 | X |
| 0 1 | 1 1 | X |
| 1 0 | X X | 1 |
| 1 1 | 1 1 | Y |

| D0  | Q1  | N |
|-----|-----|---|
| 0 0 | 1 1 | X |
| 1 0 | 1 1 | X |
| 0 1 | X X | 1 |
| 1 1 | 1 1 | Y |

| Open | Q1  | N |
|------|-----|---|
| 0 0  | 1 0 | X |
| 0 1  | 0 1 | X |
| 1 0  | X X | 1 |
| 1 1  | 0 0 | Y |

$$D1 = Q1 + D + Q0 N$$

$$D0 = Q0' N + Q0 N' + Q1 N + Q1 D$$

$$OPEN = Q1 Q0$$

④ Circuit



## ① implement by moody

### ① State diagram



### ② truth table

| present state | inputs | next state | output |
|---------------|--------|------------|--------|
| Q1 Q0         | D N    | D1 D0      | open   |
| 0 0           | 0 0    | 0 0        | 0      |
|               | 0 1    | 0 1        | 0      |
|               | 1 0    | 1 0        | 0      |
|               | 1 1    | - -        | -      |
| 0 1           | 0 0    | 0 1        | 0      |
|               | 0 1    | 1 0        | 0      |
|               | 1 0    | 1 1        | 1      |
|               | 1 1    | - -        | -      |
| 1 0           | 0 0    | 1 0        | 0      |
|               | 0 1    | 1 1        | 1      |
|               | 1 0    | 1 1        | 1      |
|               | 1 1    | - -        | -      |
| 1 1           | - -    | 1 1        | 1      |

### ③ k-map

| Open | Q1  |
|------|-----|
| 0 0  | 1 0 |
| 0 1  | 1 1 |
| X X  | 1 X |
| 0 1  | 1 1 |

N

© Copyright 2004, G2

$$D0 = Q0'N + Q0N' + Q1N + Q1D$$

$$D1 = Q1 + D + Q0N$$

$$OPEN = Q1Q0 + Q1N + Q1D + Q0D$$

### ④ Circuit



## FSM Chart

### ① register :

• មិនត្រូវ value Y ពេលគាំទ្រ

• ដើម្បីសម្រាប់ការបញ្ចូនការយុទ្ធសាស្ត្រ

នៃការ Load X

X ↓



### ② Variable $q_i$ in FSM

• គឺជាបិន្ទុណិតនៃ code នៃរបៀប

• នឹង declare និងនិត្តន៍ register

ex ①  $x = \text{input}$



②  $X = \text{input}$

loop :

$$X = X - Y$$



① if  $q_i$  in FSM



// now control unit A: In=True inform to my State  $q_{out}$   
Q in out in each block

② Loop  $q_i$  in FSM : តើ If នៅក្នុងរាយការណ៍

វិធាននៃ Control unit នៃការ State ខ្លះ

④ នៅលើ output



// Selector=output comparator និង check this output

⑤ clear  $q_i$  in Control Unit Set State=000 (state start)

# Computer Architecture

①

- ISA = Instruction set Arch: tecture  
↳ set of instructions the computer's processor can execute.



- Moore's Law  
↳ The number of transistors on microchips  $\times 2$  every 2 years

②

- ① Performance =  $\frac{1}{\text{Execution time}}$
- ② Speedup ( $n$ ) =  $\frac{\text{Performance new (1/f)} - \text{Execution time (old)}}{\text{Performance old (1/f)} - \text{Execution time (new)}}$
- ③ Elapsed time (sumTime) = CPU time + I/O time + Memory time  
↳ Response time = max(minimum)

- Throughput : speed (Task / time)

- ↳ ex Unit • MIPS = millions of instructions per second
- FLOPs = Floating point Operations per second

- Benchmarks (measurements)

- ① Real Benchmark = real app
- ② Microbench mark = part of software
- ③ Kernels = specific operation
- ④ Toys benchmark = small programs (ex sorting)
- ⑤ Synthetic benchmarks = mimic behavior of real programs

- SPEC Benchmark Suites

- ↳ Standard Performance evaluation Cooperation (SPEC)

- Performance Comparison

$$\text{① Arithmetic Mean} = \frac{\sum x_i}{n}$$

$$\text{② Weighted Arithmetic Mean} = \frac{\sum (w_i x_i)}{n}$$

$$\text{③ Geometric Mean} = \sqrt[n]{\prod x_i}$$

$$\text{• CPU Time}$$

$$\text{① CPU Time} = \text{Instruction} \times (\text{PFI} \times \text{cycle Time})$$

clock per Instruction ( $\frac{1}{\text{instruction}}$ )

$$= (\text{Number of clock}) \times \text{cycle Time}$$

$$\text{② CPI}_{\text{average}} = \sum (\text{CPI} \times \text{Weight})$$

(instruction)

$$\text{③ Cycle Time (second)} = \frac{1}{\text{Clock Rate (MHz)}} \quad (T = \frac{1}{f})$$

- Amdahl's Law : Law of diminish return

$$\text{① Overall Speedup} = \frac{\text{old Speed}}{\text{new Speed}} = \frac{1}{\sum P_i} = \frac{1}{(1-P) + \frac{P}{n}}$$

$P$  (percent) = fraction of overall time     $\therefore \sum P_i = 100\%$

$P_i$  (rate) =  $r_i$  (the fraction of time spent in  $i$ th function)

- ③ CPU Architecture



- Load accA  
 $\rightarrow accA = accA + b$
- Store  $x \leftarrow accA$



- implement like postfix

- ③ Memory - Memory Architecture



$\text{ex MUL } k, a, b ; k = a * b$

- ④ Register - memory Architecture



$\text{ex LD } \$r_1, a ;$   
 $MUL \$r_1, b ; k_{r1} = \$r_1 * b = a * b$

- ⑤ Register - Register Architecture



$\text{ex LD } \$r_1, a ;$   
 $LD \$r_2, b ;$   
 $MUL \$r_3, \$r_1, \$r_2 ; k_{r3} = \$r_1 * \$r_2$

## Pros

- ① easy implement
  - ② easy compiler
  - ③ fewer instruction, Easy compiler
  - ④ easy encode, Good code density
  - ⑤ simple to encode, easy parallelism
- 2 Paradigms of CPU Architecture
    - ↳ CISC: powerful instruction (complex) ex Intel
    - ↳ RISC: basic for building block ex ARM (low E consume)

## Cons

- Bottleneck, Bad for parallelism, ↑ memory
- Bottle neck, Difficult for optimization
- ↑ memory traffic
- limited register
- ↑ instruction count, performance computer

## Memory

★ Little Endian easy implement Adding (Handle carry)

$\rightarrow [78 \ 56 \ 34 \ 12]$

o Low-order byte stores at the lowest address

$\rightarrow [12 \ 34 \ 56 \ 78]$

o Used by Intel

★ Big Endian easy compare & sign bit

o High-order byte stores at the lowest address

o Used by IBM PowerPC, Sun

★ Note that modern processor (ie. ARM from ARM6 onwards) have the option of operating in either little-endian or big-endian mode.

! • Unrestricted Alignment : no alignment order declare

↳ Pros : Software simple

↳ Cons : expensive logic, Bad Performance

! • Restricted Alignment : start index % size == 0

↳ Pros : burden is on the software (compiler)

↳ Cons : Good performance

• caller & callee : caller calls callee

↳ ex main(caller) call function Add(callee)

• handlers : handling of interrupts and exceptions

① Vector Table : Contains @ interruption/exception handler.

② Handler Table : Contains functions/blocks that handle the interruption/exception  
ex if an interrupt occurs, the vector table point to @ in Handler table.



## (1) Single cycle processor

- 1 instruction per 1 clock
- ∴ 1 clock must be longest for any instruction time.
- ∴ Sum time =  $\sum$  (max time instruction)

• ALU = Arithmetic Logic Unit



## (2) Multiple cycle processor

- ★ Instructions are explained with RTL.
- ★ ORI rt, rs, imm ; (I-type) ; Opcode 010000
  - R[rt]  $\leftarrow$  R[rs] | zero\_ext(imm);
  - PC  $\leftarrow$  PC + 4
- ★ ORUI ; (I-type) ; Opcode 010001
  - R[rt]  $\leftarrow$  R[rs] | zero\_pad(imm);
  - PC  $\leftarrow$  PC + 4
- ★ ADD rd, rs, rt ; (R-type)
  - R[rd]  $\leftarrow$  R[rs] + R[rt];
  - PC  $\leftarrow$  PC + 4
- ★ LW rt, imm(rs) ; (I-type)
  - Load word
  - R[rt]  $\leftarrow$  MEM[R[rs]] + sign\_ext(imm);
  - PC  $\leftarrow$  PC + 4
- ★ SW rt, imm(rs) ; (I-type)
  - Store word
  - MEM[R[rs]] + sign\_ext(imm)  $\leftarrow$  R[rt];
  - PC  $\leftarrow$  PC + 4
- ★ BEQ rs, rt, imm\*4 ; (I-type)
  - Branch if equal
  - If (R[rs] == R[rt]) then
  - PC  $\leftarrow$  PC + 4 + (sign\_ext(imm) \* 4)
  - else
  - PC  $\leftarrow$  PC + 4
- ★ JMP addr\*4 ; (J-type) ; Jump
  - PC  $\leftarrow$  (addr \* 4)

• new memory : size memory  $\times \frac{1}{speed}$



• new Data path : Add register in each step



## (3) & (7) pipeline

• pipeline = The idea is to start a new task as soon as the resource are available.



single cycle

Time = instruction  $\times$  cycle time

multiple cycle

Time = instruction  $\times$  CPI  $\times$  cycle time

pipeline

Time = (All tasks)  $\times$  cycle time

## Pipeline Hazards

- ① Structural hazard: Limited in resource  
 ↳ some component required at the same time.

Sol: ① Add more instances (e.g. ALU)

② separate memory



popular

③ Redesign/Add dummy



- ② Data hazard: dependency of work / data.

↳ required data but data not yet.

Sol: ① stall

② Re-design/Reschedule (Reorder Command)



Type ① Read after Write (RAW): Read in in



② Write after Write (WAW)



③ Write after Read (WAR): Read in out



! write = assign value

Read = use value / Arithmetic

- ④ Control hazard: don't know what to do next.

↳ In case jump. If work on jump → don't know preinstruction should be next-instruction / jump-instruction

Sol: ① stall

② Guess: work ↳ correct: no delay

incorrect: delay 1 clock

③ Reschedule

| Without delay slot         |                  | With delay slot            |  |
|----------------------------|------------------|----------------------------|--|
| (1) ADD \$r1, \$r2, \$r3   |                  | (2) BEQ \$r2, \$r3, BRANCH |  |
| (2) BEQ \$r2, \$r3, BRANCH | \$r4, \$r5, #100 | (1) ADD \$r1, \$r2, \$r3   |  |
| (3) ORI \$r4, \$r5, #100   |                  | (3) ORI \$r4, \$r5, #100   |  |
| BRANCH: SW \$r1, 100(\$r3) |                  | BRANCH: SW \$r1, 100(\$r3) |  |

# Flynn's Taxonomy $\Rightarrow$ 4 types of processor based on instruction stream, Data stream

Uniprocessor : Single instruction stream & Single data stream (SISD)

Vector processor : Single instruction stream & Multi data stream (SIMD)

Fault tolerant computing : Multi instruction stream & Single data stream (MISD)

Multi core : Multi instruction stream & Multi data stream (MIMD)



## ILP (Instruction level Parallelism)

↳ និងសារណ៍ “ចំណែន” ការសំរាប់អនុវត្តន៍យ៉ាវ

- Pipelining Processor ;  $T(n) = \frac{(k + (n-1))}{x} T_c$

(ILP = k)

\* instruction  
Time / clock duration  
state of pipeline

- Super Scalar ;  $T(n) = \frac{(k + \frac{(n-x)}{x})}{x} T_c$

(ILP = kx)

degree of superscalar

! SuperScalar may have out of order execute.  
will get WAR / WAW  
↳ Solve with using In order completion

- Super Pipeline ;  $T(n) = (K + \frac{(n-1)}{m}) T_c$

(ILP = km)

\* minorcycle



Tip: mix SuperScalar & SuperPipeline

- Super Pipeline SuperScalar ;  $T(n) = (K + \frac{(n-x)}{mx}) T_c$

(ILP = kmx)

! state នៃបន្ទូលនៅ ALU នៅ x degree



- VLIW (very long instruction word) ;  $T(n) = (k + \frac{(n-x)}{x}) T_c$

\* degree of VLIW

! state នៃបន្ទូលនៅ ALU នៅ x degree



Compiler  $\Rightarrow$  កំណត់នរណៈរបៀបរៀបចំ ILP នៃវា

- Small basic block : less Tom's for filling available slot.

- Large basic block : more Tom's for filling slot.

Loop Unroll  $\Rightarrow$  մի կողման օրուային շարադարձություն

| Normal Loop                                                                                                                                                                     | Unroll + Rescheduling                                                                                                                                                                                                                                                                                                                                                                                                                       |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <pre>LOOP:     LW    \$f0, \$r1 (stall)     ADD   \$f2, \$f0, \$f1 (stall)     SW    \$f2, \$r1     SUB   \$r1, \$r1, #4     BNE   \$r1, LOOP</pre> <p>8 cycles / Iteration</p> | <pre>LOOP:     LW    \$f0, 0(\$r1)     LW    \$f3, 4(\$r1)     LW    \$f5, 8(\$r1)     LW    \$f7, 12(\$r1)     ADD   \$f2, \$f0, \$f1     ADD   \$f4, \$f3, \$f1     ADD   \$f6, \$f5, \$f1     ADD   \$f8, \$f6, \$f1     SW    \$f2, 0(\$r1)     SW    \$f4, 4(\$r1)     SW    \$f6, 8(\$r1)     SW    \$f8, 12(\$r1)     SUB   \$r1, \$r1, #16     BNE   \$r1, LOOP</pre> <p>14 cycles / 4 Iteration<br/>= 3.5 cycles per iteration</p> |

! լից մ շե  
յ ր շ ա ռ է շ ա վ լ ե

## Dynamic scheduling (Hardware)

- Out of order execute.
- loop dynamic unroll
- Multiple instruction dynamic

## Static scheduling (Software/compiler)

- loop static unroll
- Multiple instruction static

## Chip with Multicore

- ↳ have multiple superscalar inside
- ↳ also be viewed as multiple pipes

## Thread level parallelism

- մետք սլոթ պահպան կամ լուսաց ցանց  $\Rightarrow$  ուն ինստրուկցիան ուն թեադ
- instruction in pipeline come from same thread
- մետքայի ժամանակում "Simultaneous Multitasking" (SMT)

## Cache : օյլու CPU

Խորհրդական համակարգություն կամ համակարգչային համակարգ  
 $\Rightarrow$  մ մեմ հիերարխի



## Memory Tech

- RAM (random access memory)  
(capacitor)
- DRAM (Dynamic random access memory)  
↳ high density, low power, slow, cheap  
(transistor)
- SRAM (static random access memory)  
↳ good for cache  
↳ low density, high power, expensive, fast  
\* operation on Power on
- Semi random access tech.



## Locality

- Temporal  $\rightarrow$  Time  $\sim$  9 នៃជាន់ទេ
- Spacial  $\rightarrow$  space  $\sim$  9 ស្រុក 9 ភូមិ

Hit : can find data in cache

Miss : can't find data in cache

Block : 1 ភូមិ និង ឱ្យបាន ចូល ឱ្យនៅ cache

## Directed mapped

### Set Cache in Memory



Given block size, cache size

block size  $\Rightarrow 2^{\text{word}}$   $\Rightarrow 2^4 \Rightarrow 16$

Cache size  $\Rightarrow 16 \times 4 \Rightarrow 64$

## Performance

- Mem access time  $\Rightarrow$  hit time + (miss rate  $\times$  miss penalty)
- CPI  $\Rightarrow$  ideal CPI + (Miss rate  $\times$  Miss Penalty)

## Source of miss

- (cold miss)
  - Compulsory miss  $\Rightarrow$  No data in cache
  - Capacity miss  $\Rightarrow$  cache size is too small
  - Conflict miss  $\Rightarrow$  have free space in cache but ms map prevent Yj

## Cache Config

5 វិធី ; cache size, block size, Associativity, replacement, write buffer

Block size : larger size mean have more spatial but higher miss penalty (less temporal)

Two-way set associative : 2 និង 2 ឯកតាម Tag ឱ្យ index ដោរក្នុង (mu=Tag)

Less conflict ;

4-way set associative: more complex, 4 comparator, Large mux



## replacement algorithm

Least Recently Used (LRU)  $\Rightarrow$  ឯកសារក្នុង RAM ដែលមិនបានប្រើបាយ

Round Robin  $\Rightarrow$  រៀបចំសម្រាប់តីវិញ្ញាបន្ទូរ

| Rep. | bench | Data  |           |
|------|-------|-------|-----------|
|      |       | Miss  | Miss rate |
| RR   | gcc   | 36805 | 0.018     |
| LRU  | gcc   | 33126 | 0.017     |
| RR   | go    | 7255  | 0.005     |
| LRU  | go    | 9073  | 0.006     |

## Write management

write back : ត្រូវបានរាយការណ៍នៅ ram ដែលរាយការណ៍នៅលើ replacement

pro: good for repeat write, fast

write through : ត្រូវបានរាយការណ៍នៅ ram  $\begin{cases} \text{pro: prefer for multiprocessor} \\ \text{cons: bad for repeat write, slow} \end{cases}$

cons: incoherent data

| Write allocate              | write not allocate                 |
|-----------------------------|------------------------------------|
| - loading data before write | - write direct without 1st loading |

# Mix Məyōñkən

ISA (instruction set architecture)

សំវិធ

- LW (load word)

- SW (store word)

- ORI (or immediate) in zero-extend

- ORUI (or unsigned immediate) in zero-padded

- BEQ (branch if equal)

ALU (arithmetic logic unit)

CPU (central processing unit)

MIPS (million instruction per second)

SMT (simultaneous multithreading)

FLOPs (floating point operations per second)

ROM (read only memory)

SPEC (standard performance evaluation cooperation)

RAM (random access memory)

CPI (clock per instruction)

DRAM (dynamic random access memory)

RISC (reduce instruction set computing)

SRAM (static random access memory)

CISC (complex instruction set computing)

FIFO (first in first out)

ABI (application binary interface)

LRU (least recently used)

RTL (register transfer language)

TLB (translation lookaside buffer)

RTS (request to send)

NAS (network attached storage)

Wr (write the data back to register)

SAN (storage area network)

WAR (write after read)

SSD (solid state drive)

WAW (write after write)

HDD (hard disk drive)

RAW (read after write)

JBOD (just a bunch of disk/drive)

ILP (instruction-level parallelism)

VLIW (very long instruction word)

SISD (single instruction stream & single data stream)

SIMD (single instruction stream & multi data stream)

MISD (multi instruction stream & single data stream)

MIMD (multi instruction stream & multi data stream)

## replacement algorithm

- Least Recently Used (LRU)  $\rightarrow$  នូវការសរសៃត្រូវបានបង្កើតឡើង
- Round Robin  $\rightarrow$  ត្រូវការសរសៃត្រូវបានបង្កើតឡើង  
 $\hookrightarrow$  FIFO, Random

| Rep. | bench | Data  |           |
|------|-------|-------|-----------|
|      |       | Miss  | Miss rate |
| RR   | gcc   | 36805 | 0.018     |
| LRU  | gcc   | 33126 | 0.017     |
| RR   | go    | 7255  | 0.005     |
| LRU  | go    | 9073  | 0.006     |

## Write management

- write back: និងពារីនីវិវាទនាមពារីនីនៃការផ្លើត្រូវបានបង្កើតឡើង
- write through: និងពារីនីរាយពារីនី

pro: good for repeat write, fast

cons: incoherent data

pro: prefer for multiprocessor

cons: bad for repeat write, slow

| Write allocate              | Write not allocate                |
|-----------------------------|-----------------------------------|
| - loading data before write | - write direct without it loading |

Q1



Virtual Address = @ in software

page table = mapping from physical

page space = 2<sup>10</sup> byte storage

page = visual block } size = 2<sup>10</sup> byte word 2<sup>10</sup> block = 2<sup>10</sup> offset

page frame = physical block

page fault: miss is called block [W<sub>1</sub> | W<sub>2</sub> | W<sub>3</sub> | W<sub>4</sub>]

size physical memory = 2<sup>10</sup> (from bit 10)

size physical memory = 2<sup>10</sup>

Q2: Add TLB (translation look-aside Buffer)



upgrade



∴ Setup Tag in Cache "OCN"

∴ Check Cache "OCN"  $\rightarrow$  access Cache tag



Good performance

low reliability

∴ Good performance

high reliability



security

fast recovery



SAN = block storage

fixed connect

super hierarchy



NAS = file storage

file server



∴ Good performance

high reliability



∴ Good performance

high reliability



∴ 1 disk  $\downarrow$   
∴ 2 disk  $\downarrow$   
(with security)