

| <b>Law/Theorem</b> | <b>Law of Addition</b>                    | <b>Law of Multiplication</b>                |
|--------------------|-------------------------------------------|---------------------------------------------|
| Identity Law       | $x + 0 = x$                               | $x \cdot 1 = x$                             |
| Complement Law     | $x + x' = 1$                              | $x \cdot x' = 0$                            |
| Idempotent Law     | $x + x = x$                               | $x \cdot x = x$                             |
| Dominant Law       | $x + 1 = 1$                               | $x \cdot 0 = 0$                             |
| Involution Law     | $(x')' = x$                               |                                             |
| Commutative Law    | $x + y = y + x$                           | $x \cdot y = y \cdot x$                     |
| Associative Law    | $x + (y + z) = (x + y) + z$               | $x \cdot (y \cdot z) = (x \cdot y) \cdot z$ |
| Distributive Law   | $x \cdot (y + z) = x \cdot y + x \cdot z$ | $x + y \cdot z = (x + y) \cdot (x + z)$     |
| Demorgan's Law     | $(x + y)' = x' \cdot y'$                  | $(x \cdot y)' = x' + y'$                    |
| Absorption Law     | $x + (x \cdot y) = x$                     | $x \cdot (x + y) = x$                       |

## Simply the logical Expression function

$$\underline{F} = AB + A(B+C) + B(B+C)$$

$$F = AB + |AB + AC + BB + \underline{BC} \text{ (Distributive Law)}$$

$$F = AB + AC + \cancel{BB} + \underline{BC} \text{ (Idempotent Law)}$$

$$F = AB + AC + \cancel{B} + \underline{BC} \text{ (Idempotent Law)}$$

$$F = AB + AC + \cancel{B} + BC$$

$$F = AB + AC + \cancel{B} \text{ (Absorption Law)}$$

$$F = \cancel{AB} + \cancel{B} + AC$$

$$F = \cancel{B} + AC$$

# Chapter 1

## Introduction

Digital Logic Design and Computer Organization with Computer Architecture for Security

1

## Impact of Technologies (1)

- Impact of innovations in integrated chip (IC) technologies**
  - Moore's law:
    - 35% increase in transistor density per year
    - 40% to 50% increase in transistor count per year
    - Has been used as a guide to design each next generation of microprocessors that revolutionized personal computers
    - Also resulted in increased power use and heat dissipation

Digital Logic Design and Computer Organization with Computer Architecture for Security

4

## In this Chapter

- Digital systems
- Number systems
- Digital circuits
- Computer organization
- Computer architecture
- Computer security
  - Security through hardware
- Other chapters

Digital Logic Design and Computer Organization with Computer Architecture for Security

2

## Impact of Technologies (2)

- Impact of innovations in application developments**
  - Revolutionizing the way digital systems are designed
  - Digital circuits are described in HDL
  - CAD tools simulate (validate) HDL descriptions
  - CAD tools synthesize (translate) HDL descriptions to circuits

Digital Logic Design and Computer Organization with Computer Architecture for Security

5

## Digital Systems

- Computers, iPad, cell phones, digital cameras, etc. created digital revolution, changing ways we:
  - Communicate, work, are entertained, shop
- Digital systems are in everything we see and use
  - Cars, grocery checkout equipment, utility meters, set-top, boxes, emergency equipment, etc.
- Thus, more data is created, processed, stored, transmitted, and accessed
  - Results in demands for more powerful computers
    - Personal computers
    - Large computers used in
      - E-commerce, banking, search engines, research
  - Create more chances for unauthorized access to data and information

Digital Logic Design and Computer Organization with Computer Architecture for Security

3

## Number Systems

- Digital circuits make logical decisions as True or False logic values
- A voltage range defines each logic value.
  - E.g., 5-volt power source
    - 2.4 to 5V as True
    - 0 to 0.8V as False
  - Lower voltage sources help save power in battery powered devices
- True and False values as 1 and 0 form binary numbers to represent
  - Characters
    - 8-bit, or 256 ASCII codes
    - 16-bit numbers, or over 65,000 Unicodes
  - Pixels to create images
  - Audio and video data
  - Integer and real numbers used in computations

Digital Logic Design and Computer Organization with Computer Architecture for Security

6

## Digital Systems as Von Neumann machines

- Computer system that contains**
  - One or processors
    - Each consisting of one or more processing cores (CPUs)
  - Memory
  - I/O devices
  - OS and application programs
- Embedded system that is**
  - A complete system as circuit board or ASIC or FPGA , known as SoC
    - Contain CPU(s) and memory
    - Dedicated software known as firmware
  - May include transmitter/receiver modules
  - May include signal conversion modules (converters)
    - Analog-to-Digital (A/D)
    - Digital-to-Analog (D/A)
  - Applications:**
    - Cell phones, digital camcorder, etc.
    - Host device controller interface
      - E.g., USB

Digital Logic Design and Computer Organization with Computer Architecture for Security

7

## Digital Logic Design (2)

- Requires logic gates**
  - Gates perform logic operations
  - Modern gates are built as CMOS circuits
  - Complementing MOS transistors reduce power consumption and heat dissipation
- CMOS chips can be fan-cooled when hot**
  - Thus, enabled personal computers
  - Exclusively used today in all types of digital systems

Digital Logic Design and Computer Organization with Computer Architecture for Security

10

## Von Neumann Computer

- Processing core (CPU) consists of**
  - Data path that includes
    - Digital circuit modules perform computations
    - Storage modules store computed data
  - Controller that orders data path operations
- As microcomputer that includes**
  - Multicore processor (s)
  - Memory
  - Interconnection medium
  - I/O devices
  - I/O device controller and interface
  - Potential for bottleneck between faster processor and slower memory



Digital Logic Design and Computer Organization with Computer Architecture for Security

8

## Digital Logic Design (3)

- Require logic circuits**
  - Logic circuits implement logic expressions
  - All NAND or all NOR gates show implementation details
  - Two types of digital circuits:
    - Combinational: Outputs are generated concurrently**
      - Outputs depend on current inputs only
      - E.g., Adders and selectors
    - Sequential: Outputs are generated in sequence (in steps)**
      - Output depends on current inputs as well as previous inputs
      - Uses combinational circuits to generate outputs
      - E.g., registers, counters, and control units

Digital Logic Design and Computer Organization with Computer Architecture for Security

11

## Digital Logic Design (1)

- Requires logic expressions**
  - Example:**  $f = ((\text{NOT } a) \text{ AND } b) \text{ OR } c$ 
    - NOT, AND, and OR indicate Boolean logic operators
- Evaluation**
  - Suppose  $a = 0$ ,  $b = 1$ , and  $c = 0$

$f = (\text{NOT } 0) \text{ AND } 1 \text{ OR } 0$   
 $= (1 \text{ AND } 1) \text{ OR } 0$   
 $= 1 \text{ OR } 0$   
 $= 1$
- Requirement: Minimal expressions to reduce hardware**

What function does f perform?  
 $f = 1$  when  $a, b, c$  forming a 3-bit number is prime

Digital Logic Design and Computer Organization with Computer Architecture for Security

9

## Effect of Increased Power Consumption and Heat Dissipation

- Examples,**
  - 2 watts for Intel 80386 processor
  - 130 watts for 3.3 GHz (Giga Hertz) Intel Core i7 processor
    - 65 times more watts
- Problem: Harder to make processors any faster**
  - Affects computer organization, programming model, and OS
  - Current Solutions:**
    - Divide tasks into subtasks using multithreaded programming
    - OS assigns processing cores to perform subtasks
      - Creates thread-level parallelism
    - Use multiprocessor systems to perform many independent and dependent tasks faster
      - Modern Supercomputers for scientific computations
      - Warehouse-scale computers for
        - Interactive applications (Facebook, Google, etc.)
        - Large-scale storage and computing (e.g., cloud computing)

Digital Logic Design and Computer Organization with Computer Architecture for Security

12

## Computer Organization

### Specifies implementation details:

- Circuit and their physical relationship that makeup**
  - Processing core
    - data path organization
    - Example: 32-bit Intel vs. AMD processors
      - Two different data paths but same instruction set
  - processor,
  - memory,
  - I/O device controller and interface
  - Interconnection of a computer components
- Memory organization**
  - Cache, SDRAM, multi-channel, etc.

Digital Logic Design and Computer Organization with Computer Architecture for Security

13

## Computer Security

- Protecting digital assets (programs and data) from malware attacks
- Protecting digital assets from unauthorized access by employees
- Affects all
  - individuals, government, business organizations
- Possible devices also subject to *physical attacks*
- Application examples
  - Secure data storage
  - Secure communication
  - Secure e-commerce
  - Etc.
- How hardware can help?
  - Hardware more secure than software and disk storage
  - Computer security through hardware
    - Secure co-processor
      - E.g., Crypto-processor
    - Secure processor
      - Also, supports secure execution
      - Also, guards against physical attacks

Digital Logic Design and Computer Organization with Computer Architecture for Security

16

## Computer Architecture (1)

### Specifies design concepts:

- Pipelining**
  - Concept of an assembly line
- E.G., pipelining CPU data path**

| Execute  | Load r1, B | Load r2, C | Add r3, r1, r2 | Store A, r3 |
|----------|------------|------------|----------------|-------------|
| Decode   | Load r1, B | Load r2, C | Add r3, r1, r2 | Store A, r3 |
| Fetch    | Load r1, B | Load r2, C | Add r3, r1, r2 | Store A, r3 |
| Time (T) | 1          | 2          | 3              | 4           |

If each stage is 2 minutes, how many cars built in a year (assume perfect scenario)? **260,000+**

What can go wrong?

Digital Logic Design and Computer Organization with Computer Architecture for Security

14

## Remaining Chapters (1)

- Combinational circuits**
  - Design methodology for small circuits (Ch2)
    - Circuits with fewer (e.g.,  $\leq 4$ ) inputs
  - Design methodology for large circuits (Ch3)
    - Circuits with many inputs (e.g., 32-bit Adder)
- Sequential circuits**
  - Basic core modules (Ch4)
    - Basic storage elements
  - Design methodology for small circuits (Ch5)
    - Registers, counters, etc.
  - Design methodology for large circuits (Ch6)
    - Data paths and control units

Digital Logic Design and Computer Organization with Computer Architecture for Security

17

## Computer Architecture (2)

- Parallelism**
  - Single Instruction Multiple Data (SIMD)**
    - E.g., Intel's Streaming SIMD Extension (SSE) instruction set
    - E.g., AMD's 3DNow instruction set
    - Also in GPUs
  - Instruction level parallelism (ILP)**
    - Also called Superscalar processor (i.e., processing core)
    - E.g., processing cores in Intel Core i7
  - Multiple Instructions Multiple Data (MIMD)**
    - Multicore Processors
      - E.g., Intel Core i7
    - Multiprocessor Systems**
      - Shared memory: Processors communicate using memory
      - Message passing: Processors communicate by sending/receiving messages

Digital Logic Design and Computer Organization with Computer Architecture for Security

15

## Remaining Chapters (2)

- Memory (Ch7)**
  - Memory organization
  - Memory timing
- Processing core (CPU) design, a very complex sequential circuit (Ch8)**
  - CPU data path and control
- Microcomputer organization, history and modern designs (Ch9)**
  - CPU, memory, I/O device interconnections
  - Device communication
- Memory system (Ch10)**
  - Cache memory organization
  - Main memory as physical memory
  - Disk space as virtual memory
- Computer security for computer architects (Ch11), an introduction**
  - Threat models
  - HW and SW security models, policies, and mechanisms
  - Trusted computing base as secure co-processor or secure processor

Digital Logic Design and Computer Organization with Computer Architecture for Security

18

# Chapter 2

## COMBINATIONAL CIRCUITS SMALL DESIGNS

Digital Logic Design and Computer Organization with Computer Architecture for Security

1

## Signal Naming Standards

- Active-high signal polarity**
  - 1 represents signal is active, asserted, enabled
  - 0, otherwise
  - E.g., signal labeled as  $x$  without a pre- or post-symbol
- Active-low signal polarity**
  - 0 represents signal is active, asserted, enabled
  - 1, otherwise
  - E.g., signal labeled as  $\_x, x', /x$ , or  $x\#$ 
    - With a pre- or post-symbol

| c | x | y |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Digital Logic Design and Computer Organization with Computer Architecture for Security

4

## In this Chapter

- Small Combinational Circuits**
  - Fewer inputs (e.g.,  $\leq 4$  inputs)
  - Circuits modeled as Truth Tables
  - Circuit minimization techniques
- Circuit implementation options**
  - NANDs only
  - NORs only
- Timing diagram**
  - Signal propagation delay
  - Understanding signal hazards ("glitches")
- Other types of logic gates**
- Design examples**
- Introduction to design with HDL**

Digital Logic Design and Computer Organization with Computer Architecture for Security

2

## Primitive Logic Gates with Truth Tables

| NOT                                                                                                         | AND                       | OR                    |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
|-------------------------------------------------------------------------------------------------------------|---------------------------|-----------------------|---|---|---|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|---|---|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|---|---|---|
| $x \rightarrow f$                                                                                           | $x \cdot y \rightarrow f$ | $x + y \rightarrow f$ |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| <table border="1"> <tr><td>0</td><td>1</td><td>f</td></tr> <tr><td>1</td><td>0</td><td>0</td></tr> </table> | 0                         | 1                     | f | 1 | 0 | 0 | <table border="1"> <tr><td>0</td><td>0</td><td>0</td></tr> <tr><td>0</td><td>1</td><td>0</td></tr> <tr><td>1</td><td>0</td><td>0</td></tr> <tr><td>1</td><td>1</td><td>1</td></tr> </table> | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | <table border="1"> <tr><td>0</td><td>0</td><td>0</td></tr> <tr><td>0</td><td>1</td><td>1</td></tr> <tr><td>1</td><td>0</td><td>1</td></tr> <tr><td>1</td><td>1</td><td>1</td></tr> </table> | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 0                                                                                                           | 1                         | f                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                           | 0                         | 0                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                           | 0                         | 0                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                           | 1                         | 0                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                           | 0                         | 0                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                           | 1                         | 1                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                           | 0                         | 0                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                           | 1                         | 1                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                           | 0                         | 1                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                           | 1                         | 1                     |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |

  

| NAND                                                                                                                                                                                        | NOR                              | XOR                        |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------|----------------------------|---|---|---|---|---|---|---|---|---|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|---|---|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|---|---|---|
| $\overline{x \cdot y} \rightarrow f$                                                                                                                                                        | $\overline{x + y} \rightarrow f$ | $x \oplus y \rightarrow f$ |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| <table border="1"> <tr><td>0</td><td>0</td><td>1</td></tr> <tr><td>0</td><td>1</td><td>1</td></tr> <tr><td>1</td><td>0</td><td>1</td></tr> <tr><td>1</td><td>1</td><td>0</td></tr> </table> | 0                                | 0                          | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | <table border="1"> <tr><td>0</td><td>0</td><td>1</td></tr> <tr><td>0</td><td>1</td><td>0</td></tr> <tr><td>1</td><td>0</td><td>0</td></tr> <tr><td>1</td><td>1</td><td>0</td></tr> </table> | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | <table border="1"> <tr><td>0</td><td>0</td><td>0</td></tr> <tr><td>0</td><td>1</td><td>1</td></tr> <tr><td>1</td><td>0</td><td>1</td></tr> <tr><td>1</td><td>1</td><td>0</td></tr> </table> | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| 0                                                                                                                                                                                           | 0                                | 1                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                                                                                                           | 1                                | 1                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 0                                | 1                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 1                                | 0                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                                                                                                           | 0                                | 1                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                                                                                                           | 1                                | 0                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 0                                | 0                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 1                                | 0                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                                                                                                           | 0                                | 0                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                                                                                                           | 1                                | 1                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 0                                | 1                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 1                                | 0                          |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |                                                                                                                                                                                             |   |   |   |   |   |   |   |   |   |   |   |   |

  

| XNOR                                                                                                                                                                                        |   |   |   |   |   |   |   |   |   |   |   |   |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|---|---|---|
| $\overline{x \oplus y} \rightarrow f$                                                                                                                                                       |   |   |   |   |   |   |   |   |   |   |   |   |
| <table border="1"> <tr><td>0</td><td>0</td><td>1</td></tr> <tr><td>0</td><td>1</td><td>0</td></tr> <tr><td>1</td><td>0</td><td>0</td></tr> <tr><td>1</td><td>1</td><td>1</td></tr> </table> | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| 0                                                                                                                                                                                           | 0 | 1 |   |   |   |   |   |   |   |   |   |   |
| 0                                                                                                                                                                                           | 1 | 0 |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 0 | 0 |   |   |   |   |   |   |   |   |   |   |
| 1                                                                                                                                                                                           | 1 | 1 |   |   |   |   |   |   |   |   |   |   |

Digital Logic Design and Computer Organization with Computer Architecture for Security

5

## Small Combinational Circuits

- Example: 2-bit unsigned multiplier,  $P = A * B$**
- Block diagram and truth table**
  - Labeling of input and output signals
- Implementation options**
  - LUT
    - Easier, slower, configurable
  - Logic circuit
    - Faster, less hardware

Digital Logic Design and Computer Organization with Computer Architecture for Security

3

## SOP Expressions

- Based on input values that produce 1 as output**
- Each such input is expressed as a product term**
- Circuit performs AND-OR logic**

Theorem 1:  $\overline{x}\overline{y} = \overline{x} + \overline{y}$   
Theorem 2:  $\overline{x+y} = \overline{x}\overline{y}$

Digital Logic Design and Computer Organization with Computer Architecture for Security

6

## POS Expressions

- Based on input values that produce 0 for  $f$  (an output)
  - Same input values produce 1 for  $\bar{f}$
- Find expression for  $f$  by complementing  $\bar{f}$
- Each such input is expressed as a sum term

$$f = (x + \bar{y})(\bar{x} + y)$$

- Circuit performs OR-AND logic

$$\text{SOP of } \bar{f} = \bar{x}y + x\bar{y}$$

$$f = (\bar{x}\bar{y})(\bar{x}y)$$

- Can be implemented with NOR gates
- DeMorgan's theorems convert OR-AND circuit into NOR-only circuit
- Also can use signal negation with Dual principle

$$\text{Dual of } \bar{f} = (\bar{x} + y)(x + \bar{y})$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Why minimize logic expressions

- Eliminates redundancies
- Requires fewer gates
- Fewer inputs per gates
- Less wire
- Less power usage
- Reduces circuit delay

How many gates and types for SOP?

Canonical SOP:  
3 NOTs,  
four 3-input ANDs,  
one 4-input OR.

Minimal SOP:  
One NOT,  
two 2-input AND,  
one 2-input OR

Implement with NAND gates

| x | y | z | f |
|---|---|---|---|
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |

Canonical SOP:  $f = \bar{x}\bar{y}z + \bar{x}yz + xy\bar{z} + xyz$   
Minimal SOP:  $f = \bar{x}\bar{y}z$

Implement with NOR gates

| x | y | z | f |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

Canonical POS:  $f = (x + y + z)(\bar{x} + \bar{y} + z)(\bar{x} + y + \bar{z})(\bar{x} + y + z)$   
Minimal POS:  $f = (x + z)(\bar{x} + y)$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Show mathematically

- NAND implementation

$$f = \bar{f} \rightarrow f = \bar{\bar{x}}\bar{y} + \bar{x}\bar{y} \rightarrow \bar{x} + \bar{y} = \bar{x}\bar{y}$$

Theorem 2  
 $\bar{x} + \bar{y} = \bar{x}\bar{y}$

$$f = \overline{(\bar{x}\bar{y})(\bar{x}\bar{y})}$$

NAND, NAND

- NOR implementation

$$f = (x + \bar{y})(\bar{x} + y) = \overline{(x + \bar{y})(\bar{x} + y)} = \overline{(x + \bar{y})} + \overline{(\bar{x} + y)}$$

NOR, NOR  
NOR, again

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Karnaugh map (K-Map) Layouts

| yz: | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|
| x:  | 0  | 1  | 3  | 2  |
| 1   | 4  | 5  | 7  | 6  |

2 × 4

| z:  | 0  | 1 |   |
|-----|----|---|---|
| xy: | 00 | 0 | 1 |
| 01  | 2  | 3 |   |
| 11  | 6  | 7 |   |
| 10  | 4  | 5 |   |

4 × 2

| yz: | 00 | 01 | 11 | 10 |   |
|-----|----|----|----|----|---|
| wx: | 00 | 0  | 1  | 3  | 2 |
| 01  | 4  | 5  | 7  | 6  |   |
| 11  | 12 | 13 | 15 | 14 |   |
| 10  | 8  | 9  | 11 | 10 |   |

4 × 4

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Canonical Expression

$g = x\bar{y} + \bar{x}z + xyz$  Non-Canonical

$g = \bar{x}\bar{y}\bar{z} + \bar{x}\bar{y}z + xy\bar{z} + xyz$  Canonical, every term has all the variable names

Min Terms vs. Canonical expression

For example,  $g(x, y, z) = \sum(0, 1, 6, 7)$

$$g(x, y, z) = \sum((000)2, (001)2, (110)2, (111)2)$$

$g = \bar{x}\bar{y}\bar{z} + \bar{x}\bar{y}z + xy\bar{z} + xyz$

$p2(a1, a0, b1, b0) = \sum(10, 11, 14)$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## SOP and POS K-maps

| yz: | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|
| x:  | 0  |    |    | 1  |
| 1   |    | 1  | 1  | 1  |

SOP

$$g(x, y, z) = \sum(2, 6, 7)$$

| yz: | 00 | 01 | 11 | 10 |
|-----|----|----|----|----|
| x:  | 0  | 0  | 0  |    |
| 1   | 0  | 0  |    |    |

POS

$$g(x, y, z) = \prod(0, 1, 3, 4, 5)$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Minimizing SOP Expressions

$\sum(2,3,6,7) = \bar{x}y\bar{z} + \bar{x}yz + xy\bar{z} + xyz$

$= \bar{x}y(\bar{z} + z) + xy(\bar{z} + z)$  Factor out smaller terms and simplify

$= \bar{x}y + xy$  Factor out y and simplify

$= y(\bar{x} + x)$  Simplify

$= y$

|     |    |    |    |    |
|-----|----|----|----|----|
| yz: | 00 | 01 | 11 | 10 |
| x:  | 0  |    | 1  | 1  |
|     | 1  |    | 1  | 1  |

Each pair of adjacent terms reduces to a simplified expression with one less variable.

Digital Logic Design and Computer Organization with Computer Architecture for Security

13

## Don't-Care Signal values

- Example: Displaying BCD numbers

A 7-segment display unit and converter

| Inputs        | Outputs                            |
|---------------|------------------------------------|
| 0 0 0 0 0 0 0 | fa 1 fb 1 fc 1 fd 0 fe 1 ff 1 fg 0 |
| 0 0 0 0 0 0 1 | fa 0 fb 1 fc 0 fd 0 fe 1 ff 1 fg 0 |
| 0 0 0 0 0 1 0 | fa 1 fb 0 fc 1 fd 0 fe 1 ff 0 fg 1 |
| 0 0 0 0 1 0 0 | fa 1 fb 1 fc 0 fd 0 fe 1 ff 0 fg 1 |
| 0 0 0 0 1 0 1 | fa 0 fb 1 fc 0 fd 0 fe 1 ff 0 fg 1 |
| 0 0 0 1 0 0 0 | fa 1 fb 1 fc 0 fd 0 fe 1 ff 0 fg 1 |
| 0 0 0 1 0 0 1 | fa 0 fb 1 fc 1 fd 0 fe 1 ff 0 fg 1 |
| 0 0 0 1 0 1 0 | fa 1 fb 1 fc 1 fd 0 fe 1 ff 0 fg 1 |
| 0 0 0 1 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 0 1 0 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 0 0 1 0 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 0 1 0 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 0 1 0 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 0 1 0 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 0 0 1 0 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 0 1 0 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 0 1 0 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 1 0 0 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 0 1 0 0 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 1 0 0 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 1 0 0 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 1 0 0 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 0 1 0 0 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 1 0 0 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 1 0 0 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 1 0 1 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 0 1 0 1 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 1 0 1 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 1 0 1 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 1 0 1 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 0 1 0 1 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 0 1 0 1 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 0 1 0 1 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 0 0 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 0 0 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 0 0 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 0 0 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 0 0 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 0 0 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 0 0 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 0 0 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 0 1 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 0 1 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 0 1 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 0 1 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 0 1 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 0 1 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 0 1 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 0 1 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 1 0 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 1 0 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 1 0 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 1 0 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 1 0 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 1 0 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 1 0 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 1 0 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 1 1 0 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 1 1 0 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 1 1 0 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 1 1 0 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 1 1 1 0 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 0 fg 0 |
| 1 0 1 1 1 0 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 0 fg 1 |
| 1 0 1 1 1 1 0 | fa 1 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |
| 1 0 1 1 1 1 1 | fa 0 fb 1 fc 1 fd 1 fe 1 ff 1 fg 1 |

?

Assuming that w, x, y, and z will not exceed 9, what should we enter in the table for inputs 10 through 15?

Digital Logic Design and Computer Organization with Computer Architecture for Security

16

## K-Map Minimization Rules

- Min/max terms that differ in only one bit are adjacent (an Implicant). A K-map is assumed to wrap around on both sides.
- A set of adjacent min/max terms may be combined to form a large group (a Prime Implicant). The number of terms in each group must be powers of 2 e.g., 2, 4, 8, or 16 terms.
- Each group of min/max terms must contain at least a single term that doesn't belong to any other group (no redundant groups), an Essential Prime Implicant
- All terms must be grouped.

Digital Logic Design and Computer Organization with Computer Architecture for Security

14

## K-Map with Don't Cares

$f(w, x, y, z) = \Sigma(1, 9, 14) + \Sigma_d(3, 7, 11)$

|     |    |    |    |    |
|-----|----|----|----|----|
| yz: | 00 | 01 | 11 | 10 |
| wx: | 00 | 1  | d  | d  |
|     | 01 |    | d  | d  |
|     | 11 |    | ①  |    |
|     | 10 | 1  | d  | d  |

$f(w, x, y, z) = \bar{x}z + wx\bar{y}\bar{z}$

Digital Logic Design and Computer Organization with Computer Architecture for Security

17

## More K-map Examples (no slides)

Digital Logic Design and Computer Organization with Computer Architecture for Security

15

## Logic Minimization Algorithm

- Based on K-Map minimization technique

- Compare neighboring min/max terms two at a time (e.g., 0000 with 0001) to produce all Implicants
- Write the Implicant with a dash (e.g., 000-) for the bit that changes
- Repeat steps 1 and 2 for neighboring terms with matching dashes (e.g., 000- with 100- to get -00-)
- Prime implicants: Repeat step 3 until all prime implicants are identified
- Essential prime implicants: Choose a minimum set among the prime implicants

Digital Logic Design and Computer Organization with Computer Architecture for Security

18

## Minimization Software

$f(w, x, y, z) = \Sigma(0, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13)$

**Input File**

```
#Inputs: 4, Outputs: 1
i 4
o 1
#Input bit labels
#list w x y z
#output bit label
.o f
#list of min-terms separated by space and a single output bit separated
#list of bits
0000 1
0010 1
0011 1
0100 1
0101 1
0110 1
0111 1
1000 1
1010 1
1100 1
1101 1
#end of list
.e
```

**Output**

```
#Inputs: 4, Outputs: 1
#Input bit labels
#Output bit label
#List of min-terms and output
#end of list
.i 4
.o 1
#list w x y z
.p 0
.p 3
.p -1
.p -0
.p 0-1
.p 0-0
.p -1-1
.e
```

$\bar{x}\bar{z} + \bar{w}y + xy$

- Can be used with don't care inputs too

Digital Logic Design and Computer Organization with Computer Architecture for Security 19

## Other Gates

- Open collector (o.c.) buffer**
  - Application:** Wired-logic with a large fan-in
  - E.g., wired-AND or wired-OR logic
  - Many application areas**
- Tri-state buffer**
  - Used to create a bus for multiple modules to transmit data
  - Modules not outputting to the bus should be electrically isolated**

| In | Out |
|----|-----|
| 0  | 0   |
| 0  | 1   |
| 1  | 1   |
| 1  | 0   |

Digital Logic Design and Computer Organization with Computer Architecture for Security 22

## Circuit Timing Diagram

- Circuits have gate and signal wire delays
- Gates may have different output signal *rise* and *fall* times
- Circuits have different signal paths from inputs to outputs
  - These may result in signals reaching each gate at different times
  - Can cause unwanted signal change (glitch) at some outputs
  - Must wait for the longest signal propagation delay before the output(s) of a circuit can be used (e.g., stored in a register)

Digital Logic Design and Computer Organization with Computer Architecture for Security 20

## Small combinational design examples

- Full-adder circuit**
- Multiplexer circuit**
  - Selects data one from 2 or more inputs
- Decoder circuit**
  - Translates an input value to a corresponding signal
- Encoder circuit**
  - Translates an active input signal to a corresponding signal number

Digital Logic Design and Computer Organization with Computer Architecture for Security 23

## Circuit Fan-In and Fan-Out

- Fan-in:** Number of gate inputs
- Fan-out:** Number of places a gate output can connect to
- Buffer gate to increase a gate's fan-out**

(a) Buffer gate and its truth table.

(b) Increasing fan-out

Digital Logic Design and Computer Organization with Computer Architecture for Security 21

## Logic Implementation

- ASIC (application specific integrated chip)**
- FPGA (Field Programmable Gate Arrays)**

Altera FPGA board      FPGA internal

Legend: CLB (yellow), Interconnection switch block (grey), I/O Block (white), Wiring channel (blue).

Digital Logic Design and Computer Organization with Computer Architecture for Security



## Behavioral Model

```

module full_adder
(
    input a, b, cin,
    output reg s, cout
);
    always@((a or b or cin))
        case ((a, b, cin))
            3'b000: begin s = 0; cout = 0; end
            3'b001: begin s = 1; cout = 0; end
            3'b010: begin s = 1; cout = 0; end
            3'b011: begin s = 0; cout = 1; end
            3'b100: begin s = 1; cout = 0; end
            3'b101: begin s = 0; cout = 1; end
            3'b110: begin s = 0; cout = 1; end
            3'b111: begin s = 1; cout = 1; end
            default:begin s = 0; cout = 0; end
        endcase
endmodule
  
```

Digital Logic Design and Computer Organization with Computer Architecture for Security

28



## A summary of Verilog HDL operators

| Precedence | Operator Type | Symbol                                              | Example                                                                                      |
|------------|---------------|-----------------------------------------------------|----------------------------------------------------------------------------------------------|
| Highest    | Unary         | $\neg$ , $\sim$                                     | $a, \neg a$ (negate a), $\neg a$ (logical not), $\sim a$ (bitwise not)                       |
|            | Exponential   | $\wedge\wedge$                                      | $a^3$ (a cubed)                                                                              |
|            | Arithmetic 1  | $\cdot$ , $/$ , $\%$                                | $a \cdot b$ (multiply), $a / b$ (divide), $a \% b$ (mod)                                     |
|            | Arithmetic 2  | $+$ , $-$                                           | $a + b$ (add), $a - b$ (subtract)                                                            |
|            | Shift:        | $\ll$ , $\gg$                                       | $a \ll 2$ (shift left twice)                                                                 |
|            | Logical       | $\ll<$ , $\gg>$                                     | $a \gg 3$ (shift right 3 times extending the sign bit)                                       |
|            | Arithmetic    | $<<$ , $>>$                                         | $a \gg b$ (a greater or equal to b)                                                          |
|            | Relational    | $<$ , $\leq$ , $\geq$ , $>$                         | $a < b$ if a is identical to b excluding x and z                                             |
|            | Equality      | $\equiv$ , $\neq$                                   | $a \equiv b$ if a is identical to b including x and z                                        |
|            | Bit-wise      | $\&=$ , $\&=$                                       | $a \&= b$ if a is identical to b including x and z                                           |
|            | Basics        | $\&$ , $\mid$ , $\sim$ , $\wedge$                   | $a \& b$ (and), $a \mid b$ (or), $\neg a$ (not), $a \wedge b$ (xor)                          |
|            | Combined      | $\&\&$ , $\mid\mid$ , $\neg\wedge$ , $\wedge\wedge$ | $a \&\& b$ (hand), $a \mid\mid b$ (nor), $\neg a \wedge b$ (xnor), $a \wedge\wedge b$ (xnot) |
|            | Logical       | $\&\&$ , $\mid\mid$ , $!$                           | $a \&\& b$ (and), $a \mid\mid b$ (or), $\neg a$ (not)                                        |
| Lowest     | Conditional   | $?:$                                                | $(a \geq b) ? a : b$ , $b : a$                                                               |

Digital Logic Design and Computer Organization with Computer Architecture for Security

29



# Chapter 3

## COMBINATIONAL CIRCUITS LARGE DESIGNS

Digital Logic Design and Computer Organization with Computer Architecture for Security

1

## In this Chapter

- Design methodologies for large combinational circuits
  - Bit-parallel
  - Bit-serial
- Integer arithmetic as examples
  - Add, subtract, multiply, and divide as four basic arithmetic operations
- IEEE floating-point number standards
  - Floating-point Data Space
  - Floating-point arithmetic
  - Floating-point unit (FPU)

Digital Logic Design and Computer Organization with Computer Architecture for Security

2

## Top-Down Design Methodology

- Bit-parallel
  - Partition n-bit design problem into smaller n-bit design problems
  - E.g., 8-bit ALU designed using 8-bit adder/subtractor and 8-bit bit-wise logic
- Bit-serial
  - Partition n-bit design problem into a fewer-bit design problem (called slice)
  - E.g., 8-bit ALU designed using eight 1-bit ALU modules
- Hybrid
  - Design uses bit-parallel and bit-serial modules

Digital Logic Design and Computer Organization with Computer Architecture for Security

3

## Carry Propagate Adder (A bit-serial adder)

- Use FA slices
- Carry bits generated sequentially, one at a time
- Propagation delay proportional to number of carry bits
- Assuming SOP expressions for sum and carry bits and 0.1 ns delay for NANDs determine:
  - ACPA(8)
  - ACPA(32)
- CPA is the slowest

ACPA(8) = 1.7 ns

ACPA(32) = 6.5 ns, too slow

Digital Logic Design and Computer Organization with Computer Architecture for Security

4

## Carry Look-Ahead (CLA) Adder

- Goal: Generate carry bits in parallel
- Let's examine FA expressions
  - Easy to generate  $p$  and  $g$  bits in parallel
  - Carry bits are dependent, but can substitute carry expressions to break dependency
  - Once carry bits are known, easy to generate sum bits in parallel
  - $\Delta CLA(8) = ?$

**0.8 ns**

FA expression from Ch2:

$$s_l = a_l \oplus b_l \oplus c_{l-1}$$

$$c_l = (a_l \oplus b_l)c_{l-1} + a_l b_l$$

Let,  
 $p_l = a_l \oplus b_l$   
 $g_l = a_l b_l$

$$s_l = p_l \oplus c_{l-1}$$

$$c_l = g_l + p_l c_{l-1}$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

5

## Observations

- If keep substituting previous carry expression in next carry expression will run into Fan-in and fan-out problems
- Solution: Generate some carry bits sequentially and some in parallel (next slide)

Digital Logic Design and Computer Organization with Computer Architecture for Security

6

## Large CLA Adder

- Group carry bits into equal sized sets with sets resulting in no fan-in or fan-out problem
- Generate carry bits in two steps
  - What is the longest signal path from inputs to outputs?
- Determine  $\Delta\text{CLA}(32)$

$\Delta\text{CLA}(32) = 1.2 \text{ ns}$

Example: For simplicity assume  $n = 8$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Subtractor

- Similar expressions as FA:

$$d_i = x_i \oplus y_i \oplus b_{i-1}$$

$$b_i = (\overline{x_i} \oplus y_i)b_{i-1} + \overline{x_i}y_i$$

- Can use adder to do subtraction if both are needed
  - 2's complement adder/subtractor

|                                                                                                 |                                                                                           |
|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| $\dots bi \quad bi-1 \quad \dots$ $- \dots \quad yi \quad \dots$ $-----$ $\dots di \quad \dots$ | $\dots 0 \quad 1 \quad \dots$ $- \dots \quad 1 \quad \dots$ $-----$ $\dots 0 \quad \dots$ |
|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Twos Complement Adder/Subtractor

- $A - B$  can be viewed as adding  $A$  with  $-B$
- Circuits for  $A + B$  and  $A + (-B)$  are similar except second input is negated when subtracting
- What do we know about converting negative numbers to 2's complement representation?
  - Flip bits
    - Can be done with NOT gates
  - Add 1
- Addition:
  - Do not flip  $B$  bits
  - Set carry-in to 0
  - Carry-out not part of the result
- Subtraction:
  - Flip  $B$  bits
  - Set carry-in to 1
  - Carry-out not part of the result
- How to combine into one circuit? Use a control bit  $m$  for mode.
  - Add when  $m = 0$
  - Subtract when  $m = 1$
  - Need an inverter circuit controlled by  $m$
- Potential problem?
  - Result can overflow and become incorrect
  - Need overflow detection logic

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Arithmetic Overflow

- When sum of two positive numbers is negative
  - I.e., Sign of result becomes 1
  - Applies to subtraction too
    - $A - B$  when  $A > 0$  and  $B < 0$
- When sum of two negative numbers is positive
  - Sign of result becomes 0
  - Applies to subtraction too
    - $A - B$  when  $A < 0$  and  $B > 0$
- A simple rule to detect overflow
  - Overflow when carry-in to sign bit position  $\neq$  carry-out from sign bit position

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Arithmetic Logic Unit (ALU)

- Performs arithmetic or bit-wise logic functions
  - A function code specifies which operation to perform
  - A complex combinational circuit
- Need to use bit-parallel or bit-serial design methodology
- Overflow flag (OVF) can only be active when performing arithmetic operations
  - Must be masked otherwise

Example

| f2 | f1 | f0 | Function    |
|----|----|----|-------------|
| 0  | 0  | 0  | Add         |
| 0  | 0  | 1  | Sub         |
| 0  | 1  | 0  | Increment   |
| 0  | 1  | 1  | Decrement   |
| 1  | 0  | 0  | Bitwise AND |
| 1  | 0  | 1  | Bitwise OR  |
| 1  | 1  | 0  | Bitwise NOT |
| 1  | 1  | 1  | Not Defined |

$A = a_{n-1} \dots a_1 a_0$     $B = b_{n-1} \dots b_1 b_0$

$R = r_{n-1} \dots r_1 r_0$

ovf

Digital Logic Design and Computer Organization with Computer Architecture for Security

## ALU Bit-Parallel Design

- Identify different types of operations
  - n-bit Arithmetic
    - Add, subtract, increment, decrement
    - Can combine into one 2's complement adder/subtractor
  - n-bit Bit-wise operators
    - NOT, AND, OR
- Assume you have these modules draw a data path
  - Use MUX to select only one output
  - Include other necessary circuits
    - Circuit to convert input  $F$  into intermediate truth signals
    - Circuit to mask OVF during bit-wise operations
- Design modules and assemble

$A = a_{n-1} \dots a_1 a_0$     $B = b_{n-1} \dots b_1 b_0$

$R = r_{n-1} \dots r_1 r_0$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Example ALU Modules

- Design Arithmetic module**
  - Use n-bit 2's complement adder/subtractor
  - Left input always A
  - Right input either B or 1
    - Need a circuit that outputs B if add/sub or 1 if inc/dec.
    - Use known modules when possible
- Design n-bit 4-to-1 MUX**
  - Bit-parallel: Design using n-bit 2-to-1 MUXes (bit-parallel)
  - Bit-serial: Design using 1-bit 4-to-1 MUX slices
- Design Map and Mask circuits**
  - Create truth tables
  - Find minimal SOP/POS expressions

Digital Logic Design and Computer Organization with Computer Architecture for Security 13

## ALU Bit-Serial Design

- Consider n copies of 1-bit ALU slices
  - Create truth table for 1-bit ALU slice
  - Use Espresso
- May use larger slices
  - 2-bit or 4-bit, for example
  - Larger slices may be designed bit-parallel
- Can be slow for large n

| r2 | r1 | r0 | a | b | c1 | c0 | r | Function      |
|----|----|----|---|---|----|----|---|---------------|
| 0  | 0  | 0  | 0 | 0 | 1  | 0  | 1 | Add           |
| 0  | 0  | 0  | 0 | 1 | 0  | 1  | 0 |               |
| 0  | 0  | 1  | 1 | 1 | 0  | 1  | 1 |               |
| 1  | 0  | 0  | 0 | 0 | 0  | 1  | 0 |               |
| 1  | 0  | 0  | 1 | 0 | 1  | 0  | 1 |               |
| 1  | 1  | 0  | 1 | 1 | 1  | 0  | 1 |               |
| 1  | 1  | 1  | 1 | 1 | 1  | 1  | 1 |               |
| 0  | 0  | 1  | 0 | 0 | 0  | 1  | 1 | Sub           |
| 0  | 0  | 1  | 0 | 1 | 0  | 1  | 1 |               |
| 0  | 1  | 0  | 0 | 0 | 1  | 1  | 0 |               |
| 0  | 1  | 0  | 1 | 0 | 1  | 1  | 0 |               |
| 1  | 0  | 0  | 0 | 0 | 0  | 1  | 0 |               |
| 1  | 0  | 0  | 1 | 0 | 0  | 1  | 0 |               |
| 1  | 1  | 0  | 0 | 0 | 0  | 0  | 0 |               |
| 1  | 1  | 0  | 1 | 0 | 0  | 0  | 0 |               |
| 0  | 1  | 0  | 0 | d | 1  | 0  | 1 | Increment     |
| 1  | 1  | 0  | 0 | d | 0  | 0  | 0 |               |
| 0  | 1  | 1  | 0 | d | 1  | 1  | 0 | Decrement     |
| -1 | 0  | 0  | 0 | 1 | d  | d  | 1 | Bit-wised AND |
| 1  | 0  | 1  | 0 | d | 1  | d  | 1 | Bit-wised OR  |
| 1  | 1  | 0  | 0 | d | 1  | d  | 1 | Bit-wised NOT |
| 1  | 1  | 1  | 1 | d | d  | d  | d | Not Defined   |

Truth Table for 1-bit ALU slice 14

## Other design examples (unsigned multiplier)

**Algorithm**

$$\begin{array}{r} \begin{array}{r} 1 & 0 & 1 & 1 \\ \times & 1 & 0 & 1 & 1 \\ \hline 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ + & 1 & 0 & 0 & 1 \\ \hline 1 & 1 & 0 & 0 & 1 & 1 \end{array} P = B \cdot A \end{array}$$

- Bit-parallel**
  - Addends are added one at a time after adding 1<sup>st</sup> two
  - Less concurrency in the data path
  - Slower, longer propagation delay
- Bit-serial**
  - Addend bits are added vertically, the way numbers are added by hand
  - More concurrency in the data path
  - Faster, shorter propagation delay

Digital Logic Design and Computer Organization with Computer Architecture for Security 15

## Unsigned Divider (restoring)

- Similar to how we divide by hand
  - In each step, remainder can be + or -
  - If remainder positive, use in the next step
  - Else, restore
  - Compute next numerator bit and repeat
- Bit-parallel**
  - Requires subtractor and MUX modules
- Bit-serial**
  - Can use 1-bit combined subtractor/MUX slices
  - See Exercise section

**Bit-parallel**

**Bit-serial**

Digital Logic Design 90%+ system, bo=0 for 0 divisor with 0.0001+ Accuracy 16

## Real Number Arithmetic

**IEEE 754 FP number Standards**

- Single, 32 bits**
  - 1-bit sign, 8-bit biased exponent (bias = 127), 23-bit fraction
  - Stored as a 32-bit number in memory
- double, 64 bits**
  - 1-bit sign, 11-bit biased exponent (bias = 1023), 52-bit fraction
  - Stored as a 64-bit number in memory
- Extended, 80 bits**
  - 1-bit sign, 15-bit bias exponent (bias = 16383), 64-bit fraction
  - Stored in 80-bit registers only (no memory representation)

Digital Logic Design and Computer Organization with Computer Architecture for Security 17

## FP number Data Space (assume 32-bit FP numbers)

- Normal**
  - 1 ≤ Biased exponent ≤ 254
- Denormal**
  - Biased exponent = 0 and fraction ≠ 0
- Zero**
  - Biased exponent = 0 and fraction = 0
- Infinity**
  - Biased exponent = 255 and fraction = 0
  - E.g.,  $\frac{1}{0}$
- Not-a-number (NaN)**
  - Biased exponent = 255 and fraction ≠ 0
  - E.g.,  $\sqrt{-1}$

Digital Logic Design and Computer Organization with Computer Architecture for Security 18

## Data Space Illustration (1-Dimensional)

- Bold and thin lines indicate real numbers stored as FP numbers in computer
- More fraction bits implies more thin lines
- More exponent bits implies more bold lines



Digital Logic Design and Computer Organization with Computer Architecture for Security

19

## Two-Dimensional Illustration

- Easier to identify data space regions
- Easier to mark specific FP numbers or domain or range of a function
  - Eg. The largest FP number
  - E.g., for test generation purposes
  - region identified by  $(-1, 1)$  or  $[-1, 1]$ , for example



Digital Logic Design and Computer Organization with Computer Architecture for Security

20

## FP Arithmetic

- Requires integer arithmetic
  - Operates on exponent and fraction numbers independently
  - Typically combinational arithmetic circuits
- Requires shift operations
  - Typically combinational shifter circuits
  - Used to line up implicit decimal points
    - E.g., during FP add
  - Used for normalizing results
    - Result converted to standard format
  - Used for rounding results
    - 64-bit fraction in register is converted to 23 or 52 bits format for storage
      - "float" data type: 23-bit fraction
      - "double" data type: 52-bit fraction
    - The resultant fraction is rounded
      - Based on the value of the bits lost
      - May require another normalization step

Digital Logic Design and Computer Organization with Computer Architecture for Security

21

## FP Add

### (e.g., S = A + B)

1. Switch operands (if necessary)
  - For  $S = A + B$ ,  $|A| \geq |B|$
2. Align decimal points and compute result  $R.F = A.F + B.F$
3. Normalize R.F
4. Round R.F to produce S.F
- Example

Digital Logic Design and Computer Organization with Computer Architecture for Security

22

## FP subtract, multiply, divide

- Subtraction
  - Lineup decimal points
  - Compute  $A - B$  if  $A.s = B.s$  or  $A + B$  if  $A.s \neq B.s$ .
- Multiplication
  - Integer multiply fractions
  - Add exponents
  - XOR the sign bits
- Division
  - Integer divide fractions
  - Subtract exponents
  - XOR the sign bits
- The rounding and normalization steps are the same as in FP add

Digital Logic Design and Computer Organization with Computer Architecture for Security

23

# Chapter 4

## SEQUENTIAL CIRCUITS: CORE MODULES

## In this Chapter

- Basic modules for saving sequential circuit's state
  - Latches and flip flops
- Timing constraints
- Other flip-flops
- HDL models

## Sequential Circuit Example

- Register retains state of circuit as *sum*
- Adder computes next *sum* (circuit's next state)
- Timing signal must guarantee *sum* is computed before register loading

## SR latch core circuit

- Three basic functions:
  - retain previous value (keeps *q* as is)
  - store 0 (reset *q*)
  - store 1 (set *q*)
- However, four possible cases, only three used
  - s* = 0 and *r* = 0, retain *q*
  - s* = 0 and *r* = 1, reset
  - s* = 1 and *r* = 1, set
  - s* = 1 and *r* = 1, not used
    - makes *q* =  $\bar{q}$ , invalid
    - causes *q* and  $\bar{q}$  to oscillate when both *s* and *r* instantly become 0 next

(a)

(b)

## SR Latch (added clock signal)

- Goal: Prevent Case 4 (*s* = 1 and *r* = 1 affecting core circuit)
- Includes a signal gating logic to time sampling of inputs (*s* and *r*)
  - Only when both *s* and *r* are both stable (not changing) should affect core circuit
- Naming convention
  - positive-level, inputs are sampled when *clk* = 1
  - negative-level, inputs are sampled when *clk* = 0
- Assume positive-level (figure below)
  - clk* = 0, core retains *q* value
  - clk* = 1, *s* and *r* gated (sampled), enter the core
  - Timing constraint: *s* and *r* must stabilize at or before *clk* changes to 1 (sampling level)
    - s* and *r* called synchronous inputs

## Asynchronous Inputs

- Function: Changes *q* and  $\bar{q}$  independent of *clk*
- Used for circuit initialization during startup

SR latch, NAND Version And Symbol

(a) SR latch, NAND version

(b) SR latch symbol



## Other flip-flops

- JK flip-flop**
  - Uses two inputs called  $j$  and  $k$
  - $j = k = 1$  will not cause oscillation
  - Better when wires require less space than gates
    - Not necessarily true today
- T (toggle) flip-flop**
  - $q$  alternates every clock cycle when  $t = 1$
  - E.g., consider when  $t = 1$ ,  $q$  controls a 2-to-1 MUX
- Both can be designed to operate edge triggered

(a) JK Flip-flop

| J | K | Q(t) | Q(t+1) |
|---|---|------|--------|
| 0 | 0 | 0    | 0      |
| 0 | 1 | 0    | 1      |
| 1 | 0 | 1    | 0      |
| 1 | 1 | 1    | 0      |

(b) T flip-flop

| T | Q(t) | Q(t+1) |
|---|------|--------|
| 0 | 0    | 0      |
| 1 | 1    | 0      |

## D Latch HDL Model (Behavioral)

```
module d_latch
(
  input clock, _reset, _preset, d,
  output reg q,
  output nq
);
  assign nq = ~q;
  always@(clock or !_reset or !_preset or d) begin
    if(!_reset) q <= 0;
    else if(!_preset) q <= 1;
    else if(clock) q <= d;
  end
endmodule
```

Designed like a combinational circuit.

Except, note the missing else here  
(creates a D-latch)

“<=“ called non-blocking or concurrent assignment

## D flip-flop HDL Model (Behavioral)

```
module dff
(
  input clock, _reset, _preset, d, e,
  output reg q,
  output nq
);
  assign nq = ~q;
  always@(posedge clock, _reset, _preset) begin
    if(!_reset) q <= 0;
    else if(!_preset) q <= 1;
    else if(e) q <= d;
  end
endmodule
```

Note the missing synchronous inputs  $d$  and  $e$  here

Note also the missing else here  
(creates a D flip-flop)

# Chapter 5

## SEQUENTIAL CIRCUITS: Small Designs

Digital Logic Design and Computer Organization with Computer Architecture for Security

1

## In this Chapter

- Sequential circuit design models
- Design examples:
  - registers, counters, sequence recognizer
- Sequential circuit timing
- Interfacing sequential circuits

Digital Logic Design and Computer Organization with Computer Architecture for Security

2

## Sequential Circuit as a Finite State Machine (FSM)

- Requires flip-flop(s) to save circuit state
- Requires combinational circuit(s) to generate next circuit state and output(s)
- Two types of combinational circuits:
  - Functions of the circuits cannot be determined in advance
    - FSM design is modeled with a finite state diagram (FSD)
  - Functions of the circuits can be determined in advance
    - E.g., MUX, adder, ALU
    - No need for an FSD

Digital Logic Design and Computer Organization with Computer Architecture for Security

3

## A simple design Example (Parallel-load register with enable)

- Assume unknown combinational circuit

- Design 1-bit register 1st
  - We have seen D flip-flop with enable in Ch4
  - Here, the flip-flop formally modeled as FSD (next slide)
- Combine register slices to create 4-bit register below
  - enable*, *clk*, *\_reset* connect to all

Digital Logic Design and Computer Organization with Computer Architecture for Security

4

## FSM design steps (1-bit parallel-load register example)

- Draw FSD and detailed block diagram of 1-bit register slice
- Convert FSD to truth table (also called transition table)

| Current State | External Inputs | Next State |
|---------------|-----------------|------------|
| 0             | 0               | 0          |
| 0             | 0               | 0          |
| 0             | 1               | 0          |
| 0             | 1               | 1          |
| 1             | 0               | 1          |
| 1             | 0               | 1          |
| 1             | 1               | 0          |
| 1             | 1               | 1          |

- Find minimal SOP or POS expression(s) for the output(s)

$d = \bar{e}q + ex$

Digital Logic Design and Computer Organization with Computer Architecture for Security

5

## Some design rules

- If cannot determine function(s) of combinational circuit(s) in advance:
  - Model FSM as FSD
    - May need to design bit-slice 1st
  - Determine number of flip flops
  - Convert the FSD to truth table
  - Find minimal expressions for next state variable(s) and output(s)
  - Draw the complete circuit with flip-flops
- Otherwise
  - Use bit-serial design with known modules
  - Or, bit-parallel design with known modules

Digital Logic Design and Computer Organization with Computer Architecture for Security

6

## Multi-Function Register

- Performs one of many functions
- Can be designed bit-serial or bit-parallel
- If cannot determine function(s) of combinational circuit(s) in advance
  - Model a bit-slice (e.g., 1-bit) as FSD
- Otherwise: Oh, I can use a MUX
  - Bit-serial: Use 1-bit 4-to-1 MUXes, for example
  - Bit-parallel: Use n-bit 4-to-1 MUX



| f1 | f2 | Action                                                 |
|----|----|--------------------------------------------------------|
| 0  | 0  | Clear (synchronous reset)                              |
| 0  | 1  | Load X                                                 |
| 1  | 0  | Arithmetic shift (shifts right repeating the sign bit) |
| 1  | 1  | Right shift (shifts right entering 0 from left)        |

Digital Logic Design and Computer Organization with Computer Architecture for Security

7

## FSM

### (A formal view)

#### Three types of FSMs:

- **Moore**
  - External inputs do not synchronously affect outputs
  - Outputs called Moore
- **Mealy**
  - External inputs synchronously affect outputs
  - Outputs called Mealy
- **Hybrid**
  - Generates both Moore and Mealy outputs



Digital Logic Design and Computer Organization with Computer Architecture for Security

8

## Sequence Recognizer

Suppose FSM recognizes overlapping input sequence "101":

- Inputs processed one bit at a time (one per clock cycle)
- Not possible to determine the functions of the combinational circuits in advance
  - Model FSD as FSD
- Reset initializes the machine to known state
- Can be designed either as Moore or Mealy machine



Digital Logic Design and Computer Organization with Computer Architecture for Security

9

## Moore Sequence Recognizer ("101")

- FSD has 4 states (4 bubbles)
- E.g., labeled A to D
  - E.g., A being the initial state
  - D being the acceptance state where output z becomes 1
- z is called Moore output
  - I.e., synchronized to clk, can only change on clk edge

Digital Logic Design and Computer Organization with Computer Architecture for Security

10

## Ways to encode FSM states

- **Binary encoded states**
  - Assign unique binary numbers to states (each bubble in FSD)
  - Advantage: Minimum number of flip-flops
  - Disadvantage: More complex combinational circuits
- **One hot design**
  - Use one flip-flop per state
    - Only one flip-flop is set during each clock cycle
  - Disadvantage: Requires maximum number of flip-flops
  - Advantage: Less complex combinational circuits
  - Better with PLDs (e.g., FPGAs)

Digital Logic Design and Computer Organization with Computer Architecture for Security

11

## Example

- Binary Encoded State Values
  - Fewer flip-flops
  - Relatively more complex CCs
- One-hot State Values
  - More flip-flops
  - Simpler CCs



Digital Logic Design and Computer Organization with Computer Architecture for Security



12





## Sequential Circuit Timing (2)

- Possible problem: Next  $clk1$  reaches FF2 before previous  $clk2$  reaches FF1
  - If  $\Delta_{cc} \approx \tau - \tau_{cs}$ ,  $d2$  may be still changing when  $clk1$  arrives
    - FSM malfunction due to setup or hold time violation at FF2
- Normal operation
  - When  $\Delta_{cc} < \tau - \tau_{cs}$

Digital Logic Design and Computer Organization with Computer Architecture for Security 25

## Clock Frequency Estimation – With Clock Skew

Add  $\tau_{cs}$  to the clock period determined in Ch. 4

$$\tau \geq \tau_{cq\_max} + \tau_{pd\_max} + \tau_{st} + \tau_{cs}$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

26

## Interfacing Sequential Circuits

- How to handle asynchronous inputs
  - Asynchronous inputs may change at any time with respect to *clock* signal
    - They can cause setup or hold time violation if directly enter FSM
- Solution: Use synchronization flip-flop(s)
  - Asynchronous input  $d$  may violate setup or hold time of synchronizing FF, not FFs of FSM
    - $q$  of synchronizing FF will eventually stabilize if setup or hold time violated
  - May use two synchronizing FFs if  $q$  slow to stabilize

Digital Logic Design and Computer Organization with Computer Architecture for Security 27

## FSM HDL Model (All Behavioral)

- E.g., consider modeling sequence “101” recognizer
- Single module with three “always” blocks:
  - Combinational circuit NSG
    - Use “case” statement to list states
      - E.g., Moore: four states A, B, C, and D (the  $q$  bits)
    - Use “if-else” statements to model FSD arcs
      - Uses current state A, B, C, or D and input  $x$  to model next state (the  $d$  bits)
  - Combinational circuit OG
    - Use “case” and/or “if-else” to model outputs
      - Moore: if current state is D then  $z = 1$  else  $z = 0$
      - Mealy: if current state is D and  $x = 1$  then  $z = 1$  else  $z = 0$
  - Flip-flops
    - Model FFs with either asynchronous or synchronous reset and enable (if necessary)

Digital Logic Design and Computer Organization with Computer Architecture for Security

28

# Chapter 6

## SEQUENTIAL CIRCUITS: LARGE DESIGNS

Digital Logic Design and Computer Organization with Computer Architecture for Security

1

## In this chapter

- Recall from Ch1:
  - Complex sequential circuit = data path + control unit
  - Data path types
  - Control unit types
  - Performance parameters
  - Design examples
    - Unsigned sequential multiplier
    - 2's complement sequential multiplier
    - A simple GPU

Digital Logic Design and Computer Organization with Computer Architecture for Security

2

## Complex Data Paths

- Include combinational circuit modules
  - ALU, MUXs, decoders, wired-logic, etc.
  - Sometimes custom combinational circuits
- Include small sequential modules
  - Registers and counters
  - Sometimes custom sequential circuits
- Can include buses
  - With buffers and tristate buffers
- Include wires for interconnecting the modules creating multiple paths for data
  - Each path identifies a unique data path operation
    - Computation, memory access, etc.

Digital Logic Design and Computer Organization with Computer Architecture for Security

3

## Control Units

1. Monitor events as input signals
  - External event signals
    - E.g., an “start” signal that starts a task performed on the data path
  - Data path event signals
    - Arithmetic overflow flag signal
    - A counter reaching a target value
    - Etc.
2. Generate control as output signals
  - Signals to control data path modules
    - E.g., ALU function signals, MUXs’ selection signals, register enable signals, etc.
  - Signals to other interfacing modules (if any)
    - E.g., A “done” signal when done completing a task

Digital Logic Design and Computer Organization with Computer Architecture for Security

4

## Register Transfer Notation (RTN)

- Formally describes a data path operation
- May use an arbitrary or an HDL syntax
- Examples:
  - `CNTR ← CNTR + 1 //incrementing counter`
  - `CNTR <= CNTR + 1; //Verilog HDL`
  - `R ← R[7]//R[7:1] //Arithmetic right shift`
  - `R <= R >>> 1; //arithmetic right shift (Verilog)`
  - `R <= {R[7], R[7:1]}; //arithmetic right shift (Verilog)`
  - `M[x] ← R; //memory transfer (write)`
  - `R ← M[x]; //memory transfer (read)`
  - Etc.

Digital Logic Design and Computer Organization with Computer Architecture for Security

5

## Data Path Types

1. Single-cycle
  - Performs an operation specified by one or more RTNs during a single clock cycle
  - Uses more hardware but simple controller
  - Can be the lowest (e.g., execute a program)
2. Multi-cycle
  - Performs an operation specified by one or more RTNs using multiple clock cycles
  - Uses less hardware but more complex controller
    - Operations share hardware modules
    - Faster than single-cycle
3. Pipelined
  - Operates like an assembly line
  - Efficient when performing stream of operations
    - Stream of instructions
    - Stream of FLOPs
    - Etc.
  - Multiple clock cycles per operation
  - Concurrent processing
  - More hardware like single-cycle
  - High performance

Digital Logic Design and Computer Organization with Computer Architecture for Security

6

## Design Example (Comparing different data paths)

- Problem:** Consider two complex RTNs:
  - $R \leftarrow A + B + C + D$  and  $R \leftarrow A + B + C - D$
  - Each called **fused** for involving three or more operands
  - 3-operand fused instructions common in some modern processors
    - E.g., Intel's multiply-add instruction
    - $R \leftarrow A + B * C$
    - Useful for matrix operations (e.g., matrix multiplication)
  - One instruction instead of two
    - FMA vs. FMUL and FADD
- Advantage of fused RTN:**
  - Reduces arithmetic rounding errors (refer to FP arithmetic)
  - Reduces number of instructions
- Complex fused RTNs application**
  - Consider reconfigurable CPU

Digital Logic Design and Computer Organization with Computer Architecture for Security

7

## 1. Single-Cycle Data Path

- One clock cycle to perform  $R \leftarrow A + B + C + D$  or  $R \leftarrow A + B + C - D$
- Hardware required**
  - two adders
  - One adder/subtractor
  - One register
- Controller required**
  - A mode signal deciding  $A + B + C + D$  or  $A + B + C - D$
- Estimated data path clock period = ?**

$$\tau \geq 2\Delta_{ADD} + \Delta_{ADD/SUB} + \tau_{st} + \tau_{cq} + \tau_{cs}$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

8

## 2. Multi-Cycle Data Path

- One or more clock cycles per RTN**
  - Divide complex RTNs into simple ones
  - Multiple clock cycles to perform  $R \leftarrow A + B + C + D$  or  $R \leftarrow A + B + C - D$
- More complex controller**
  - Operations are done in steps
- Less hardware**
  - One adder/subtractor, two MUXes, and a register
  - Data path modules used multiple times
    - E.g., adder/subtractor module used 3 times
- Estimated data path clock period = ?**

$$\tau \geq \Delta_{MUX1} + \Delta_{ADD/SUB} + \Delta_{MUX2} + \tau_{st} + \tau_{cq} + \tau_{cs}$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

9

## 3. Pipelined Data Path

- Three clock cycles per RTN**
  - Cycle 1:  $R \leftarrow A$
  - Cycle 2:  $R \leftarrow R + B$
  - Cycle 3:  $R \leftarrow R + C$
  - Cycle 4: if  $m = 0$  then  $R \leftarrow R + D$   
else  $R \leftarrow R - D$
- Hardware similar to single-cycle except organized into stages**
- Efficient when performing  $A_i + B_i + C_i \pm D_i$  for  $i = 0$  to  $N-1$  for large  $N$**
- Requires less total time to complete a job**
- Estimated clock period = ?**

$$\tau \geq \Delta_{ADD/SUB} + \tau_{st} + \tau_{cq} + \tau_{cs}$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

10

## Charting Pipeline Behavior

Final Stage → Stage 3   Stage 2   Stage 1   Cycle

|     |              |              |              |              |     |
|-----|--------------|--------------|--------------|--------------|-----|
|     | X0 ← A0 + B0 | Y0 ← X0 + C0 | R0 ← Y0 ± D0 | R1 ← Y1 ± D1 | ... |
|     | X1 ← A1 + B1 | Y1 ← X1 + C1 | Y2 ← X2 + C2 | ...          |     |
|     | X2 ← A2 + B2 | Y3 ← X3 + C3 | X3 ← A3 + B3 | Y4 ← X4 + C4 | ... |
| 1   |              |              |              |              |     |
| 2   |              |              |              |              |     |
| 3   |              |              |              |              |     |
| 4   |              |              |              |              |     |
| 5   |              |              |              |              |     |
| ... |              |              |              |              |     |

**Horizontal Organization**

Final Stage ↓

|     |              |              |              |  |
|-----|--------------|--------------|--------------|--|
|     | Stage 1      | Stage 2      | Stage 3      |  |
| 1   | X0 ← A0 + B0 | Y0 ← X0 + C0 | R0 ← Y0 ± D0 |  |
| 2   | X1 ← A1 + B1 | Y1 ← X1 + C1 | Y2 ← X2 + C2 |  |
| 3   | X2 ← A2 + B2 | Y3 ← X3 + C3 | X3 ← A3 + B3 |  |
| 4   | X3 ← A3 + B3 | Y4 ← X4 + C4 | Y5 ← X5 + C5 |  |
| 5   | X4 ← A4 + B4 | Y6 ← X6 + C6 | Y7 ← X7 + C7 |  |
| ... | ...          | ...          | ...          |  |

**Vertical Organization**

Digital Logic Design and Computer Organization with Computer Architecture for Security

11

## Performance Parameters

- Processing time (T)**
  - Time required to perform a job (e.g.,  $N$  tasks)
- Speedup**
  - Calculated as a ratio
    - $\frac{T_{slow}}{T_{fast}}$ , T of slower system divided by T of faster system
    - E.g., System A is 1.5 times faster than system B
- Efficiency**
  - Calculated as a ratio
    - $\frac{S}{S_{ideal}}$ , speedup divided by ideal speedup
    - E.g., 100% efficiency when all hardware modules utilized all the time
- Throughput**
  - Calculated as ratio
    - $\frac{N}{T}$ , Number of tasks performed divided by T
    - E.g., number of tasks performed per second

Digital Logic Design and Computer Organization with Computer Architecture for Security

12

## Speedup (Pipelining vs. Single-cycle)

- $N$  tasks:  $A_i + B_i + C_i \pm D_i$  for  $i = 0$  to  $N-1$

$$\text{Speedup} = \frac{T_{\text{single-cycle}}}{T_{\text{pipeline}}} \quad \text{Assume } \tau_{\text{single-cycle}} \approx K \tau_{\text{pipeline}}$$

- For  $N = 3$  and  $k = 3$ , speedup = ? 1.8
- For  $N = 1000$ ,  $k = 3$ , speedup = ? 2.99
- For very large  $N$ ? (e.g., as  $N \rightarrow \infty$ ) Speedup approaches  $k$ , the number of stages

Digital Logic Design and Computer Organization with Computer Architecture for Security 13

## Efficiency of Pipelining

- $N$  tasks:  $A_i + B_i + C_i \pm D_i$  for  $i = 0$  to  $N-1$

$$\text{Efficiency} = \frac{\text{Speedup}}{K}$$

- For  $N = 3$  and  $k = 3$ , Efficiency = ? 60%
- For  $N = 1000$  and  $k = 3$ , Efficiency = ? 99.8%
- For very large  $N$ ? (e.g., as  $N \rightarrow \infty$ ) 100%

Digital Logic Design and Computer Organization with Computer Architecture for Security 14

## Throughput of Pipelining

- $N$  tasks:  $A_i + B_i + C_i \pm D_i$  for  $i = 0$  to  $N-1$

$$\text{Throughput} = \frac{N}{T_{\text{pipeline}}}$$

- For  $N = 3$  and  $k = 3$ , the throughput is about?  $0.6 \tau^{-1}$
- For  $N = 1000$  and  $k = 3$ , the throughput is about?  $0.998 \tau^{-1}$
- Throughput as  $N \rightarrow \infty$ ?  $\tau^{-1}$  or  $f$
- Standards
  - MIPS
    - E.g., 100 MIPS CPU
  - FLOPS
    - E.g., 1T FLOPS system
  - Benchmarks, more typical
    - SPEC CPU2006 for measuring performance of computer systems
    - SPECviewperf for measuring performance of computer-graphic systems

Digital Logic Design and Computer Organization with Computer Architecture for Security 15

## Types of Control Units

- Modeled as FSD for multi-cycle data paths
- Micro-programmed Control (programmable)
  - Easy to modify after implementation
  - Also used if FSD would be very large
- Pipeline control for pipelined data paths

Digital Logic Design and Computer Organization with Computer Architecture for Security 16

## 1. FSD-based Control Unit

- Design steps:
  - Draw FSD for the controller
    - Assume CU triggered by external signal *start*
    - 4 clock cycles to perform  $A + B + C \pm D$
    - Specify data path operations with RTNs
  - Complete design (assuming all structural)
    - Determine specific data path control signals
    - Draw detailed block diagram the controller
    - Construct truth tables
    - Find minimal expressions
    - Combine with data path to complete design

Digital Logic Design and Computer Organization with Computer Architecture for Security 17

## 2. Microprogrammed Control

**Example**

```

0:if start == 0 then go to 0;
1:R ← A;
2:R ← R + B;
3:R ← R + C;
4:if mode == 0 then R ← R + D; go to 0;
5:R ← R - D; go to 0;
  
```

- Described as a program
  - Conditional statements use signal values
  - RTNs specifies data path operations
  - Branching changes program flow
- Tree pieces of hardware:
  - A 2-function counter, called microprogram counter (MPC)
  - A memory called control memory (CM) stores micro-instructions representation in binary
  - A MUX controls the functions of MPC

Digital Logic Design and Computer Organization with Computer Architecture for Security 18

## 3. Pipeline Control

- As oppose to other two controllers, signals are generated at same time but applied to data path at different times
- Example, signal mode enters pipeline but not used until stage 3

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Circuit Energy Consumption

- Energy is a function of capacitance and voltage
- Consider a CMOS NOT gate
- 0-1 transition at output draws energy from power source
  - ½ stored in capacitance
  - ½ dissipated as heat
$$E_{Dynamic}^{0 \rightarrow 1} = \int_{v=0}^{v=V_{DD}} C v \, dv = \frac{1}{2} C V_{DD}^2 \text{ Joules}$$
- 1-0 transition doesn't draw energy from power source but ½ charge in capacitance dissipates as heat
 
$$E_{Dynamic}^{1 \rightarrow 0} = \frac{1}{2} C V_{DD}^2 \text{ Joules}$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Circuit Power Consumption

- Power is energy consumed over time
  - 1 Watts = Amounts of Joules consumed in one second
- During each clock period some signals make 1-0 and some 0-1 transitions
- Energy consumed for all 0-1 transitions during one clock period:
 
$$E_{Dynamic} = \frac{1}{2} C_{Total} V_{DD}^2 \text{ Joules}$$
- Power consumed during one clock period
 
$$P_{Dynamic} = \frac{E_{Dynamic}}{\tau} = \frac{1}{2} C_{Total} V_{DD}^2 f \text{ watts}$$
- Signals not changing during clock period consume only static power
 
$$P_{Static} = V_{DD} I_{DD}$$

$I_{DD}$ , DC or leakage current

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Some Ways to Reduce Dynamic Power Consumption

- Reduce total capacitance,  $C_{Total}$
- Reduce supply voltage,  $V_{DD}$
- Reduce clock frequency,  $f$ .
- Reduce glitches

$$P_{Dynamic} = \frac{1}{2} C_{Total} V_{DD}^2 f$$

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Power vs. Energy

- Energy is independent of clock frequency
  - Increasing clock frequency increases power, not energy consumption
- Consider two CPUs A and B
  - Suppose A consumes 20% more power but executes a program quicker; requires 40% less time than B
  - Which CPU is more energy efficient?

CPU A uses 72% of energy used by CPU B

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Unsigned Sequential Multiplier

- Hardware:
  - Three registers A, B, and P
    - A and B are n-bits, P is n+1 bits
  - A Mod n+1 counter (CNTR)
  - An n-bit adder
  - Basic algorithm:
    - Initialization
      - Load values and clear register P and CNTR
      - $P \leftarrow P + A$  if next multiplier bit ( $b_0$ ) is 1
      - Skipping over 0 multiplier bits
    - Right shift P //B, increment CNTR
    - If not done go to 2
    - Else, final product is in P//B

|                                                                                                                               |
|-------------------------------------------------------------------------------------------------------------------------------|
| $\begin{array}{r} 1001 & A \\ \times 1011 & B \\ \hline 1001 & \\ 0000 & \\ 1001 & \\ \hline 1100011 & P = B * A \end{array}$ |
|-------------------------------------------------------------------------------------------------------------------------------|

Digital Logic Design and Computer Organization with Computer Architecture for Security

## Multiplier Algorithm

```

A ← A_value; B ← B_value; P ← 0; CNTR ← 0;
Do
    if(B[0] == 1)
        P ← P + A;
    {P, B} ← {P, B} >> 1;
    CNTR ← CNTR + 1;
While CNTR < n

```

Digital Logic Design and Computer Organization with Computer Architecture for Security

25

## Design Entry Options (Data Path + FSD-based Controller)

### Option I: All structural

- Design all modules at gate level
- Recommended for schematic-only design tools

### Option II: Hybrid

- Behavioral models for modules
- Combine modules to create structural model

### Option III: All behavioral

- Behavioral model of NSG
- Behavioral model of OG
  - Describes data path
  - OG not combinational circuit in this case
- Behavioral model of FFs

Digital Logic Design and Computer Organization with Computer Architecture for Security

26

## Multiplier Data Path (Structural)



Digital Logic Design and Computer Organization with Computer Architecture for Security

27

## Multiplier Controller (FSD)

- Assumes external input signal **start\_asyn**

- Uses one synchronizing FF (Ch5)

- Assumes external Moore output **done\_moore**

- indicates product is ready when asserted

- Draw controller FSD

- E.g., Mealy

Digital Logic Design and Computer Organization with Computer Architecture for Security

28

## Multiplier Model (All Behavioral)

- One module with five “always” blocks:
  1. Model of synchronization FF
  2. Model of **done-Moore** FF
  3. Combination circuit model of controller NSG
  4. Model of OG as data path (sequential circuit)
  5. Model of controller FFs

Digital Logic Design and Computer Organization with Computer Architecture for Security

29

## Synchronizing FF

- Resets when **\_reset** or **done** asserted

- Otherwise, loads asynchronous input **start\_asyn**.

```

always@(posedge clock or negedge _reset or posedge done)
begin
    if(_reset == 0 || done == 1)
        start <= 1'b0;
    else
        start <= start_asyn;
end

```

Digital Logic Design and Computer Organization with Computer Architecture for Security

30

## Converting done-Mealy to done-Moore

- Asynchronously de-asserts done-Moore when \_reset or start\_asyn asserted
- Synchronously asserts done-Moore when done-Mealy asserted

```
always@(posedge clock or negedge _reset or posedge start_asyn)
begin
    if(_reset == 0 || start_asyn == 1)
        done_moore = 1'b0;
    else
        done_moore <= done; //done = 1 kept for one clock cycle
end
endmodule
```

Digital Logic Design and Computer Organization with Computer Architecture for Security

31

## OG defines data path

```
case(current_state)
Start: if(start == 1)
begin // initialize
    a <= a_value;
    b <= b_value;
    p <= 0;
    cntr <= 0;
end
Check: begin
    if (cntr < 8)
        if(b[0] == 1)
            p <= p[7:0] + a;
        else
            begin
                {p, b} <= {p, b} >> 1;
                cntr <= cntr + 1;
            end
    end
Shift_Inc: begin
    {p, b} <= {p, b} >> 1;
    cntr <= cntr + 1;
end
endcase
end
```

Digital Logic Design and Computer Organization with Computer Architecture for Security

32

## FSD Verification

- Was the FSD description correct?
- Tools can generate FSD from the HDL description for verification purposes



Digital Logic Design and Computer Organization with Computer Architecture for Security

33

## Simulation Results

- $0x03 * 0x7f = 0x017d$ , longer multiplication time
- $0x7f * 0x03 = 0x17d$ , shorter multiplication time
- Optionally, can implement an initial switching circuit to switch A with B if A < B.



Digital Logic Design and Computer Organization with Computer Architecture for Security

34

## 2's Complement Multiplier

- Multiplier B and multiplicand A are 2's complement numbers resulting in 2's complement product
- Basic data path operation is  $P \leftarrow P + A$  or  $P \leftarrow P - A$
- Examines B bits two bits at a time overlapping
- Hardware:
  - Similar to unsigned multiplier except for adder/subtractor
- Algorithm:
 

```
A <- A_value[n-1]/A_value; B <- B_value//0; P <- 0; CNTR <- 0;
Do
    if(B[1] ⊕ B[0] == 1) //overlapping two bits
        P <- P ± A; //where m = B[1]
    {P, B} <- {P, B} >> 1;
    CNTR <- CNTR + 1;
  While CNTR < n
E.g., if B_value = (1111)2s = -1, then P = A * -1 = -A for arbitrary A_value
```

Digital Logic Design and Computer Organization with Computer Architecture for Security

35

## 2's Complement Multiplier Microprogram Controller

### Multiplier microprogram

|   |                                                |
|---|------------------------------------------------|
| 0 | If start == 0 go to 0;                         |
| 1 | P <- 0, A <- A_value, B <- B_value, CNTR <- 0; |
| 2 | If CNTR == n go to 6;                          |
| 3 | If B[1] ⊕ B[0] == 0 go to 5                    |
| 4 | P <- P ± A;                                    |
| 5 | {P, B} >> 1, CNTR <- CNTR + 1, go to 2;        |
| 6 | done = 1, go to 0;                             |

### Controller



Digital Logic Design and Computer Organization with Computer Architecture for Security

36

## 2D Virtual Object Rotation

- Rotating object by  $\beta$  degrees requires rotating each vector by  $\beta$  degrees
 
$$X' = \cos \beta * X - \sin \beta * Y$$

$$Y' = \sin \beta * X + \cos \beta * Y$$
- Implementation requires cosine and sin functions
  - More hardware
- **CORDIC algorithm simpler**
  - Can use integer arithmetic
  - Addition, subtraction, and left shift for multiplication
  - Rotation done in steps
    - Factor out  $\cos \beta$ 

$$X' = \cos \beta * (X - \tan \beta * Y)$$

$$Y' = \cos \beta * (\tan \beta * X + Y)$$
    - E.g., 55° rotation is done in 4 steps:
      - 45°, 7°, 2°, 1°
      - Note,  $\tan 45^\circ = 1$ ,  $\tan 7^\circ = 1/8$ ,  $\tan 2^\circ = 1/32$ ,  $\tan 1^\circ = 1/64$
  - Done in steps results in a gain
    - E.g., for 55°, gain is  $\frac{\cos 45^\circ \cos 7^\circ \cos 2^\circ \cos 1^\circ}{\cos 45^\circ \cos 7^\circ \cos 2^\circ \cos 1^\circ}$
    - Rotated object looks bigger

Digital Logic Design and Computer Organization with Computer Architecture for Security 37

## CORDIC Rotation Algorithm

- Fix number of steps for  $|\beta| \leq 90^\circ$ 
  - Rotation can be done in 7 steps
    - E.g., for 45°, steps are 45°, 27°, -14°, -7°, 4°, 2°, 1°
  - $\tan \beta * X$  and  $\tan \beta * Y$  are computed by shifting X and Y by 0 bits in step 1, 2 bits in step 2, ..., and 6 bits in step 7
- An initial rotation of 90° or 180° if  $|\beta| > 90^\circ$
- Replace  $\beta$  with  $\beta \bmod 360^\circ$  if  $|\beta| > 360^\circ$ ,
- Algorithm for  $|\beta| \leq 90^\circ$ :
 

```

if ( $\beta >= 0$ )
    d = 1;
else
    d = -1;
for (i = 0; i < 7, i++)
    x = d * (x >> i) + Y_i; //arithmetic right shift (integer division)
    y = d * (x >> i) + Y_i; //tan-12-i read from table or
     $\beta = \beta - d * \tan^{-1}2^{-i}$  //hard coded as 45, 27, 14, 7, 4, 2, 1
endfor
      
```

Digital Logic Design and Computer Organization with Computer Architecture for Security 38

## Pipelined CORDIC Rotation

- Loop unrolling: 7 stages for 7 iterations of the for-loop
- Hard coded rotation angles
- Wired shifts implement integer division
- Simpler controller
  - All stages perform the same computations but operate on different data

Digital Logic Design and Computer Organization with Computer Architecture for Security 39

## Simulation Result

- Illustrates pipeline vector transformation
- At 1 GHz clock, one billion peak transformations per second
- Typically million vectors per virtual object

Digital Logic Design and Computer Organization with Computer Architecture for Security 40

## Rotated Virtual Object

- Shows a gain of  $1.653 = 1/0.6048$ 
  - CPU must multiply each new coordinate point by 0.6048 to get the actual size
 
$$0.6048 = \cos 47^\circ \cdot \cos 27^\circ \cdot \cos 14^\circ \cdot \cos 7^\circ \cdot \cos 2^\circ \cdot \cos 1^\circ$$
- New coordinate points include small error due to integer division used in the CORDIC algorithm

Digital Logic Design and Computer Organization with Computer Architecture for Security 41

# Chapter 7

## Memory

Digital Logic Design and Computer Organization with Computer Architecture for Security

1

## In this chapter

- Memory technologies
- Memory design
  - Memory cell
  - Memory chip internal organization
- Memory communication protocols
- Data storage schemes
- UMA vs. NUMA system architectures

Digital Logic Design and Computer Organization with Computer Architecture for Security

2

## Logical organization

- Number of addresses by word size
  - E.g.,  $1K \times 8$
  - E.g.,  $512 \times 16$
- Total size in bytes
  - 1KB
  - 1MB
  - 1GB
  - Etc.

| Address | Binary (10 bits) | Data             |
|---------|------------------|------------------|
| 0'      | 0000000000       | 00010001         |
| 1'      | 0000000001       | 00011100         |
| 2'      | 0000000002       | 00111100         |
| 3'      | 0000000003       | 11000000         |
| ...     | ...              | ...              |
| 1023'   | 1111111111       | 1000000011001100 |

(a)  $1K \times 8$  Memory

| Address | Binary (9 bits) | Data             |
|---------|-----------------|------------------|
| 0'      | 000000000       | 0001000001010001 |
| 1'      | 000000001       | 0001110001010001 |
| 2'      | 000000002       | 0011110001010001 |
| 3'      | 000000003       | 1100000001010001 |
| ...     | ...             | ...              |
| 511'    | 111111111       | 1000000011001100 |

(b)  $512 \times 16$  Memory

Digital Logic Design and Computer Organization with Computer Architecture for Security

3

## Memory Technologies

- Non-Volatile
  - Each memory cell retains 0 or 1 indefinitely
  - Word accessible
  - ROM
  - PROM
  - EEPROM
    - Can be written limited number (~ 100,000) of times
    - Older technologies EPROM
    - Applications: boot loader, LUT, firmware
  - Block accessible (as secondary memory storage)
    - Magnetic disk
    - Flash memory (EEPROM based)
- Volatile
  - Each memory cell retains 0 or 1 as long as powered
  - Word accessible only
  - SRAM
  - DRAM
    - SDRAM (DDR, DDR2, DDR3, etc.) as modern DRAM

Digital Logic Design and Computer Organization with Computer Architecture for Security

4

## RAM cells

- SRAM
  - Hardware
    - 6 transistors
  - Retains data while powered
  - fast
- DRAM
  - Hardware
    - One transistor
    - One small capacitor
  - Much smaller than SRAM cell
  - Cheaper per bit
  - Slow

(a) An SRAM Cell

(b) A DRAM Cell

(c) NMOS Transistor

Digital Logic Design and Computer Organization with Computer Architecture for Security

5

## Organization and Access

- 2D Organization (cell array)
  - Rectangular as the die
  - Requires fewer total number of wires
- Read/Write Operation
  1. First select a row
    - Also called row activation
  2. Then select one or more cells from activated row to either read or write
- Burst access
  - Access multiple cells in specific order typically from a single row
  - Cells form a block of data (e.g., 32B)
- Page access
  - Access many cells from one or more rows
  - Cells form a large block of data (e.g., 4KB)

(a) One-dimensional organization

(b) Two-dimensional organization (shown for read)

Digital Logic Design and Computer Organization with Computer Architecture for Security

6

## Multi-bank

- Allows seamless access
  - Cells read/written may belong to different banks
- Can overlap operations
  - Activating a row in one bank while read/writing cells from already activated row in another bank

Digital Logic Design and Computer Organization with Computer Architecture for Security 7

## Memory Interface

- Requires address lines (address bus)
  - Address for DRAM is provided in two cycles
- Requires control lines (control bus)
  - Indicating enabling, reading, and writing
- Requires data lines (data bus)
  - Bi-directional data bus
  - Separate input and output data lines

Digital Logic Design and Computer Organization with Computer Architecture for Security (a) Logical View 8

(b) An SRAM block diagram

(c) A DRAM block diagram 8

## SDRAM (Synchronous DRAM)

- Interface signals form memory command
- Synchronous operation makes design of computers easier, cheaper
- Today SDRAM technologies are used for main memory

Digital Logic Design and Computer Organization with Computer Architecture for Security 9

## SRAM Cell Model

- Real RAM cells cannot be simulated with logic simulation tools
- It can be modeled with SR latch and tri-state buffers to mimic similar behavior
- Register converts Hi-Z output to 0

Digital Logic Design and Computer Organization with Computer Architecture for Security 10

(a) SRAM Cell logic model

(b) SRAM Cell block diagram

(c) A pull-down tri-state buffer and its truth table

## Memory Design

- Memory chip**
  - Internal organization
    - Single or multi-banked
    - Bi-directional data bus
  - Access protocol defines signal timing
- Memory module**
  - Wider data bus than memory chip
- Memory unit**
  - Wider address bus than memory module

Digital Logic Design and Computer Organization with Computer Architecture for Security 11

## Memory Chip

- Requires two decoders
  - Row decoder activates a row
  - Column decoder selects one or more cells
- Input and output tri-state buffers to implement bi-directional data bus

Digital Logic Design and Computer Organization with Computer Architecture for Security 12

## Memory Module

- Also called memory card
- 32- or 64-bit data bus
  - Wider if ECC
- For building memory unit(s) as main memory

Digital Logic Design and Computer Organization with Computer Architecture for Security

13

## Memory Unit

- Maps logical memory space to physical memory space
- Different mapping options
  - High-order interleaving
  - Low-order interleaving (later)
  - Hybrid
    - E.g., NUMA architectures

Digital Logic Design and Computer Organization with Computer Architecture for Security

14

## Memory Access

- Follows specific communication protocol and signal timing
- Memory Cycle
  - Starts when address decoding begins
  - WAits to activate a row and select cell(s)
  - Completes read or write operation
  - Ends cycle
- Timing parameters
  - Access time
    - Read: From start until data appears on data bus
    - Write: From start until data is written to memory cells
  - Transfer time
    - Time to transfer data to/from memory
- Memory latency
  - Access time + transfer time

Digital Logic Design and Computer Organization with Computer Architecture for Security

15

## SDRAM

- Concurrent memory operations
- Read Protocol:
  - Issue burst size
  - Issue row address
  - Wait for row to activate (fixed number of clock cycles)
  - Issue column address
  - Repeat step 4 as needed
    - Timing depends on burst size
  - Data placed on data bus, one per clock cycle, seamlessly

Digital Logic Design and Computer Organization with Computer Architecture for Security

16

## DDR SDRAM

- Operation similar to SDRAM
- Data placed on data bus on rising as well as falling clock edges
  - Two data items per clock cycle
  - Doubling the bandwidth of SDRAM
    - Doubling number of data bytes per second

Digital Logic Design and Computer Organization with Computer Architecture for Security

17

## Data Interleaving

- High-Order Interleaving
  - Data for consecutive memory addresses are stored in the same memory module/unit
  - Advantage:
    - Divides memory space into two or more disjoint sub-spaces
    - Each sub-space may be accessed by a separate processor
- Low-Order (fine) Interleaving
  - Data for consecutive memory address are stored in different memory modules/units
  - Advantage:
    - Increases memory bandwidth

Digital Logic Design and Computer Organization with Computer Architecture for Security

18



# **Chapter 8**

# **INSTRUCTION SET ARCHITECTURE**

# **Chapter 8: Instruction Set architecture**

- **Introduction**
- **Type of Instructions**
- **High level language program to execution**
- **Instruction cycle**
- **Instruction Set architecture (ISA)**
- **Addressing Modes and Machine code**
- **Types of ISA**
- **Design Examples ACC ISA**
- **Sparc and Pentium Assembly program architecture**
- **Performance Parameters**

# Chapter 8: Introduction

- The preceding chapters covered digital design concepts and , Datapath and Memory organization
- Modern CPUs implement pipelining and instruction-level parallelism (ILP) to increase performance
- Data path fetches an instruction from memory and decodes the instruction

# Chapter 8: Introduction (cont'd)

- An instruction set architecture (ISA) refers to Data path that executes a program.
  - Single cycle
  - Multicycle
  - Pipeline
- Data-dependent instructions go through the pipeline stages
  - Can lead to stall in the pipeline
  - May reduce pipeline efficiency
- Branch instructions change execution flow
  - jne (Jump if not equal to)
  - Jge (Jump if greater than or equal to)
  - Jmp (Will jump to specified address)

# Type of Instructions

- A processor is designed for general-purpose programming
  - Graphics processing Units (GPU)
  - Digital signal processing (DSP)
- Special purpose instructions
  - SIMD
  - Computer security related instructions
- Data-manipulation instructions
- Data-movement instructions
- Program-flow control instructions

# High level language program to execution

As shown in Fig. 8.1, a software program is typically written in a high-level language, such as C/C++ or Java, and is translated by a compiler into assembly



FIGURE 8.1 Basic program translation and execution process.

# Instruction Cycle

- Data path has four main tasks
  - Fetch
  - Decode
  - Execute
    - May access data memory (another cache)
  - Writes Results



FIGURE 8.2 An instruction data path with instruction memory (IM) and data memory (DM).

# Instruction Set architecture (ISA)

- Types of Instruction set architecture:
  - Stack ISA
  - Accumulator ISA
  - CISC ISA (complex instruction set computer)
  - RISC ISA (reduced instruction set computer)
- An opcode is part of instruction set which tells the hardware what operation needs to be performed
  - Assembly language mnemonic form, an opcode is a command such as ADD, MOV, SUB or JMP
  - For example: MOV AL, 34h
- An instruction includes a set of operands that are specified explicitly, implicitly

# Addressing Modes and Syntax Examples

- Immediate
  - E.g., Add R1, 9;
- Direct
  - E.g., ADD R1, (M[9]);
- Register
  - E.g., ADD R1, R2;
- Register direct
  - E.g., ADD R1, (R2);
- Register indexed
  - E.g., ADD R1, R2, (M[9]);

| Operand Notation | Addressing Mode                                                                                              |
|------------------|--------------------------------------------------------------------------------------------------------------|
| V                | I, immediate: V is an immediate input operand, a 2's complement number.                                      |
| (V)              | D, direct: V is a memory address and (V) indicates the content of memory address V (i.e., M[V]).             |
| R                | R, register: Indicates an input data register source or a destination register or both                       |
| R, (V)           | X, indexed: V is a memory address and R + V is the address of the next data item in memory (i.e., M[R + V]). |

TABLE 8.1 Examples of Addressing Modes

# Machine Instruction Format (Examples)

|                     | <b>Example</b> | <b>Instruction</b> |
|---------------------|----------------|--------------------|
| <b>Stack:</b>       | 1              | ADD                |
| <b>Accumulator:</b> | 2              | ADD 9              |
| <b>CISC:</b>        | 3              | ADD R1, -9         |
| <b>CISC:</b>        | 4              | ADD R1, (9)        |
| <b>CISC:</b>        | 5              | ADD R1, R2, (9)    |
| <b>CISC:</b>        | 6              | ADD R1, R2         |
| <b>RISC:</b>        | 7              | ADD R3, R1, R2     |

Table 8.2

| <b>Example</b>     | <b>Instruction Format</b>        |
|--------------------|----------------------------------|
| 1: ADD             | Op-code                          |
| 2: ADD 9           | Op-code   I   data               |
| 3: ADD R1, -9      | Op-code   RI   r   data          |
| 4: ADD R1, (9)     | Op-code   RD   r   address       |
| 5: ADD R1, (R2), 9 | Op-code   RX   r1   r2   address |
| 6: ADD R1, R2      | Op-code   RR   r1   r2           |
| 7: ADD R1, R2, R3  | Op-code   RR   r1   r2   r3      |

Fig 8.3

# Types of ISA – Stack (1)

- Arithmetic instructions have no explicitly declared operands
- Operands are on stack inside CPU
- E.g., ADD
- Example:  $A = B * (C + D);$ 
  - Requires converting statement into reverse polish notation
    - $CD+B*=A$
  - Reverse polish notation converted to assembly program?
- Short instructions (advantage)
- Stack as LIFO buffer (disadvantage)

# Stack ISA - Example of assembly program

- Example Program :

| Instruction<br>number |                                                                                              |
|-----------------------|----------------------------------------------------------------------------------------------|
| 1:                    | PUSH (C) //stack $\leftarrow M[C]$                                                           |
| 2:                    | PUSH (D) //stack $\leftarrow M[D]$                                                           |
| 3:                    | ADD //stack $\leftarrow (C) + (D)$ , values popped, added,<br>//result pushed                |
| 4:                    | PUSH (B) //stack $\leftarrow M[B]$                                                           |
| 5:                    | MUL //stack $\leftarrow ((C) + (D)) * (B)$ , values popped, added,<br>//result pushed        |
| 6:                    | POP (A) // $M[A] \leftarrow (((C) + (D)) * (B))$ , value is popped<br>//and stored in memory |

# Types of ISA – Accumulator (2)

- One of the operands is a known register, called accumulator (Acc)
  - LD (C)
- Second operand is immediate or data from memory
- Acc always destination register
  - E.g., ADD 9 //Acc  $\leftarrow$  Acc + 9
  - E.g., ADD (C) //Acc  $\leftarrow$  Acc + M[C]
- Example Program: A = B \* (C + D); ?
- Simple data path, less hardware (advantage)
- Acc bottleneck (disadvantage)
  - E.g., A = (C + D) \* (E - F);

# Acc ISA - Example of assembly program

- Example Program:

1: LD (C)

2: ADD (D)

3: MUL (B)

4: ST (A)

# Types of ISA – CISC (3)

- CISC (complex instruction set computer)
- Many simple and complex instructions
- Multiple addressing modes
- Many working registers (e.g., 16)
  - Recent results kept inside CPU in registers
- Arithmetic instructions can access memory
  - E.g., ADD R1, R2 // $R1 \leftarrow R1 + R2$
  - E.g., ADD R1, (9) // $R1 \leftarrow R1 + M[9]$
- Example:  $A = B * (C + D);$ 
  - Program?
- Complex instruction set (advantage)
  - Fewer instructions per program
- Complex instruction set (disadvantage)
  - Complex data path
  - Limited pipelining of instruction cycle
  - Many instructions and addressing modes seldom used

# CISC-ISA: Example of assembly program

- Example Program:

1. LD R1, (C)
2. ADD R1, (D)
3. MUL R1, (B)
4. ST (A), R1

# Types of ISA – RISC-(4)

- RISC (reduced instruction set computer)
- Arithmetic instructions cannot access memory
- Many more working registers (e.g., 32)
- Implements only most commonly used instructions
- 3-operand instructions
  - E.g., ADD R3, R1, R2 //R3  $\leftarrow$  R1 + R2
  - E.g., ADD R2, R1, 9 //R2  $\leftarrow$  R1 + 9
- Only LD and ST instructions access memory
- Example Program: A = B \* (C + D);
- Simpler and highly pipelined data path (advantage)
- Requires compiler optimization to increase efficiency
- The architecture of all modern processing cores

# RISC-ISA: Example of assembly program

- Example Program:

1. LD R1, (C)
2. LD R2, (D)
3. ADD R3, R1, R2
4. LD R4, (B)
5. MUL R5, R3, R4
6. ST (A), R5

# Design Example: Acc-ISA

- We start with example high-level language program
- Design instruction set for example program
- Generate assembly program
- Generate binary machine instructions
- Create Acc-ISA data path
- Model in HDL
- Simulation results

# Example Instruction format for Acc-ISA

.code

|        |          |
|--------|----------|
| LD     | 0        |
| ST     | (sum)    |
| ST     | (i)      |
| L1:    |          |
| CMP    | 7        |
| JGT    | L2       |
| MVX    |          |
| LD     | X(array) |
| ADD    | (sum)    |
| ST     | (sum)    |
| LD     | (i)      |
| ADD    | 1        |
| ST     | (i)      |
| JMP    | L1       |
| L2:    | ...      |
| .data  |          |
| array: | RB 16    |
| i:     | RB 2     |
| sum:   | RB 2     |

*Example code #1*

```
int array[8];
int i, sum;
sum = 0;
for (i = 0; i < 8; i++)
    sum = sum + array[i];
```

Example 8.2. The listing of an Acc-ISA assembly language program for the program in Example code 1.

# Example Program

Example code #1:

```
int array[8];
int i, sum;
sum = 0;
for (i = 0; i < 8; i++)
    sum = sum + array[i];
```

## Acc-ISA instruction set?

- Create a list of Acc-ISA instructions to translate the program to assembly language program

| Op-Code | Instruction | Addressing Mode | Example       | Action                                   |
|---------|-------------|-----------------|---------------|------------------------------------------|
| 0       | NOP         |                 | NOP           | Do nothing                               |
| 1       | ADD         | Immediate       | ADD data      | ACC $\leftarrow$ ACC + data              |
| 2       |             | Direct          | ADD (address) | ACC $\leftarrow$ ACC + M[address]        |
| 3       | CMP         | Immediate       | CMP data      | if ACC == data then GTF = 1 else GTF = 0 |
| 4       | JGT         | Immediate       | JGT address   | PP $\leftarrow$ address if GTF = 1       |
| 5       | JMP         | Immediate       | JMP address   | PP $\leftarrow$ address                  |
| 6       | LD          | Immediate       | LD data       | ACC $\leftarrow$ data                    |
| 7       |             | Direct          | LD (address)  | ACC $\leftarrow$ M[address]              |
| 8       |             | Indexed         | LD X(address) | ACC $\leftarrow$ M[X + address]          |
| 9       | MVX         | Register        | MVX           | X $\leftarrow$ ACC                       |
| 10      | ST          | Direct          | ST (address)  | M[address] $\leftarrow$ ACC              |

ACC: Accumulator; GTF: Greater than flag; PP: Program pointer; X: Index register

**TABLE 8.3** Example Acc-ISA Instruction Set That Translates a High-Level Program into an Equivalent Assembly Language Program

# Acc-ISA Example Assembly Program



Figure 8.6 The instruction formats for the Acc-ISA example processor.

# Acc-ISA Machine Instructions

| Address | Instruction | Instruction in binary | Inhex |
|---------|-------------|-----------------------|-------|
| 0:      | LD 0        | 0000,0110;0000,0000   | 0600  |
| 2:      | ST 0xEC     | 0000,1010;1110,1100   | 0AEC  |
| 4:      | ST 0xEE     | 0000,1010;1110,1110   | 0AEE  |
| 6:      | CMP 7       | 0000,0011;0000,0111   | 0307  |
| 8:      | JGT 0x1A    | 0000,0100;0001,1010   | 041A  |
| A:      | MVX         | 0000,1001;0000,0000   | 0900  |
| C:      | LD X(0xF0)  | 0000,1000;1111,0000   | 08F0  |
| E:      | ADD (0xEC)  | 0000,0010;1110,1100   | 02EC  |
| 10:     | ST (0xEC)   | 0000,1010;1110,1100   | 0AEC  |
| 12:     | LD (0xEE)   | 0000,0111;1110,1110   | 07EE  |
| 14:     | ADD 1       | 0000,0001;0000,0001   | 0101  |
| 16:     | ST (0xEE)   | 0000,1010;1110,1110   | 0AEE  |
| 18:     | JMP 6       | 0000,0101;0000,0110   | 0506  |
| 1A:     | ...         |                       |       |

Example 8.5. The manually assembled output for the assembly program in Example 8.2.

# Acc-ISA Pipelined Data path

- Data path consists of four stages (block diagram)



Fig 8.10

# Acc-ISA example assembly program

- Code below converted to:
  - Assembly program

## Example 8.1. A program code listing

```
int array[8];  
int i, sum;  
sum = 0;  
for (i = 0; i < 8; i++)  
    sum = sum + array[i]
```

**Example 8.2.** The listing of an Acc-ISA assembly language program for the program in Example 8.1:

```
.code //start program code  
LD 0 //Initialize, ACC ← 0  
ST (sum) //M[sum] ← ACC  
ST (i) //M[i] ← ACC  
  
L1: CMP 7 //is i > 7? (is ACC == 7?)  
JGT L2 //exit for-loop if yes (PP ← L2)  
MVX //get next index (X ← ACC)  
LD X(array) //get next array element (ACC ← M[array  
//+ X])  
ADD (sum) //and add it to the partial sum (ACC ← ACC  
//+ M[sum])  
ST (sum) //store the partial sum in memory (M[sum]  
//← ACC)  
LD (i) //do i = i + 1: get i (ACC ← M[i]),  
ADD 1 //increment i (ACC ← ACC + 1), and  
ST (i) //save i (M[i] ← ACC).  
JMP L1 //loop back
```

# Single-cycle data path Acc-ISA processor

- Fetch, decode, execute, and write-back units
  - Decode unit contains a combinational circuit that inputs an op-code and generates all the control signals
  - Execute unit contains all the components necessary to execute an instruction
    - MUX is needed to choose either the operand when the operand is an immediate data or  $M[\text{operand}]$  when the operand is an address and indicates memory content



FIGURE 8.7 The Acc-ISA single-cycle data path to execute the program in Example 8.2.

# Pipeline Simulation

- Illustrates pipeline flush on jumps



# Sparc Example Assembly Program

- Note, arithmetic instructions access only registers
- Only “ld” and “st” instructions access memory

```
    st    %g0, [%fp-48]      //Store, Memory[fp -48] ← g0 (g0 always 0) (sum = 0)
    st    %g0, [%fp-44]      //Store, Memory[fp - 44] ← g0 (i = 0)

.LL5:
    ld    [%fp-44], %g1      //Load, g1 ← Memory[i]
    cmp   %g1, 7             //Compare, is g1 > 7?
    bg    .LL6                //Branch if greater than 7

    nop
    ld    [%fp-44], %g1      //Load, g1 ← Memory[i]
    sll   %g1, 2, %g2        //Compute ptr to array[i]: Shift Left Logical (i * 2)
    add   %fp, -8, %g1        //g1 ← fp - 8 (get memory location of array)
    add   %g2, %g1, %g1        //g1 ← g2 + g1 (array location + next i * 2)
    ld    [%fp-48], %g2      //Load sum, g2 ← Memory[fp -48]
    ld    [%g1-32], %g1      //Load array[i], g1 ← Memory[g1 - 32]
    add   %g2, %g1, %g1        //Array[i] + sum, g1 ← g2 + g1
    st    %g1, [%fp-48]      //Store sum, Memory[fp - 48] ← g1
    ld    [%fp-44], %g1      //Load i, g1 ← Memory[fp - 44]
    add   %g1, 1, %g1        //Increment, g1 ← g1 + 1
    st    %g1, [%fp-44]      //Store i, Memory[fp - 44] ← g1
    b     .LL5                //Branch to instruction ay LL5

    nop

.LL6:
```

# Pentium IV Machine instructions (CISC)

- Variable size instructions
- Requires more complex data path and control unit

|         |                         |      |                                |
|---------|-------------------------|------|--------------------------------|
| 401340: | c7 45 d0 00 00 00 00 00 | movl | \$0x0,0xffffffffd0(%ebp)       |
| 401347: | c7 45 d4 00 00 00 00 00 | movl | \$0x0,0xffffffffd4(%ebp)       |
| 40134e: | 83 7d d4 07             | cmpl | \$0x7,0xffffffffd4(%ebp)       |
| 401352: | 7f 13                   | jg   | 401367 <_main+0x77>            |
| 401354: | 8b 45 d4                | mov  | 0xffffffffd4(%ebp),%eax        |
| 401357: | 8b 54 85 d8             | mov  | 0xffffffffd8(%ebp,%eax,4),%edx |
| 40135b: | 8d 45 d0                | lea  | 0xffffffffd0(%ebp),%eax        |
| 40135e: | 01 10                   | add  | %edx,%eax                      |
| 401360: | 8d 45 d4                | lea  | 0xffffffffd4(%ebp),%eax        |
| 401363: | ff 00                   | incl | (%eax)                         |
| 401365: | eb e7                   | jmp  | 40134e <_main+0x5e>            |
| 401367: | b8 02 00 00 00          | ...  |                                |

# Performance Parameters

- **Cycles per instruction**
  - Average number of clock cycles used to execute each instruction

$$CPI = \frac{\text{Number of clock cycles used (N)}}{\text{Number of instructions executed (n)}}$$

- **Execution time (T) of a program**

$$T = CPI * n * \tau$$

# Supplemental Slides

# Acc-ISA Pipelined Data Path



## Stack ISA - Example of assembly program: A = B \* (C+D)

|                    | 4                      |                        | 2                                      |                        |                                                |                                |
|--------------------|------------------------|------------------------|----------------------------------------|------------------------|------------------------------------------------|--------------------------------|
| Logical Stack      | <u>3</u>               | <u>3</u>               | <u>7</u>                               | <u>7</u>               |                                                |                                |
|                    | Four registers         |                        |                                        |                        |                                                |                                |
| Hardware Stack     | 1       ...            | 0   1   ...            | 1       ...                            | 0   1   ...            | 0       ...                                    |                                |
|                    | 1       ...            | 0   1   ...            | 1       ...                            | 1   1   ...            | 1       ...                                    |                                |
|                    | 0       ...            | 1   0   ...            | 1       ...                            | 0   1   ...            | 1       ...                                    |                                |
|                    | 0       ...            | 0   0   ...            | 0       ...                            | 0   0   ...            | 1       ...                                    |                                |
|                    | Push 3,<br>Counter = 1 | Push 4,<br>Counter = 2 | Two Pops, Add,<br>Push,<br>Counter = 1 | Push 2,<br>Counter = 2 | Two Pops,<br>Multiply,<br>Push,<br>Counter = 1 | Pop,<br>Counter = 0<br>(empty) |
| Instruction number | 1                      | 2                      | 3                                      | 4                      | 5                                              | 6                              |

**Figure 8.4** An illustration of stack content when computing the reverse polish notation C D + B \* = A; it is assumed that (B) = 2, (C) = 3, and (D) = 4.

| Instruction number |                                                                                            |
|--------------------|--------------------------------------------------------------------------------------------|
| 1:                 | PUSH (C) //stack $\leftarrow M[C]$                                                         |
| 2:                 | PUSH (D) //stack $\leftarrow M[D]$                                                         |
| 3:                 | ADD //stack $\leftarrow (C) + (D)$ , values popped, added,<br>//result pushed              |
| 4:                 | PUSH (B) //stack $\leftarrow M[B]$                                                         |
| 5:                 | MUL //stack $\leftarrow ((C) + (D)) * (B)$ , values popped, added,<br>//result pushed      |
| 6:                 | POP (A) // $M[A] \leftarrow ((C) + (D)) * (B)$ , value is popped<br>//and stored in memory |

Acc ISA - Example of assembly program:  $A = B * (C+D)$

1. LD (C) // ACC  $\leftarrow M[C]$
2. ADD (D) // ACC  $\leftarrow ACC + M[D]$
3. MUL (B) // ACC  $\leftarrow ACC * M[B]$
4. ST(A) //  $M[A] \leftarrow ACC$

CISC-ISA: Example of assembly program:  $A = B * (C+D)$

$B = 4; C = 5; D = 10$

The value in R1 after execution of instruction No. 1 is 5.

The value of R1 after execution of instruction No. 3 is 60

1. LD R1, (C) //  $R1 \leftarrow M[C]$
2. ADD R1, (D) //  $R1 \leftarrow R1 + M[D]$
3. MUL R1, (B) //  $R1 \leftarrow R1 * M[B]$
4. ST (A), R1 //  $M[A] \leftarrow R1$

## RISC-ISA: Example of assembly program: $A = B * (C + D)$

1. LD R1, (C) //  $R1 \leftarrow M[C]$
2. LD R2, (D) //  $R2 \leftarrow M[D]$
3. Add R3, R1, R2 //  $R3 \leftarrow R1 + R2$
4. LD R4, (B) //  $R4 \leftarrow M[B]$
5. MUL R5, R3, R4 //  $R5 \leftarrow R3 * R4$
6. ST (A), R5 //  $M[A] \leftarrow R5$

Computation is performed by a RISC ISA.  $A = B * (C + D)$ . What is the value in R5 after the execution of code line # 6: ( $B = 5$ ;  $C = 10$ ;  $D = 15$ ) ie: Code line # 6 has been completed. (20 pts)

R5 ?

# Addressing Modes and Syntax Examples

- Immediate
  - E.g., Add R1, 9;
- Direct
  - E.g., ADD R1, (M[9]);
- Register
  - E.g., ADD R1, R2;
- Register direct
  - E.g., ADD R1, (R2);
- Register indexed
  - E.g., ADD R1, R2, (M[9]);

| Operand Notation | Addressing Mode                                                                                              |
|------------------|--------------------------------------------------------------------------------------------------------------|
| V                | I, immediate: V is an immediate input operand, a 2's complement number.                                      |
| (V)              | D, direct: V is a memory address and (V) indicates the content of memory address V (i.e., M[V]).             |
| R                | R, register: Indicates an input data register source or a destination register or both                       |
| R, (V)           | X, indexed: V is a memory address and R + V is the address of the next data item in memory (i.e., M[R + V]). |

TABLE 8.1 Examples of Addressing Modes

Immediate E.g., Add R1, 9;  
//

Direct E.g., ADD R1, (M[9]);

Register E.g., ADD R1, R2;

Register direct E.g., ADD R1, (R2);

Register indexed E.g., ADD R1, R2, (M[9]);

○ Register indexed E.g., **ADD R1, R2, (M[9]);**

- What is the value in R1 ?

RTN#1:  $R1 \leftarrow R1 + M[R2 + 9]$

- Example 8.2 Assembly code listing of an Acc-ISA assembly language program for the high level (c) program in Example code 1

## Example Instruction format for Acc-ISA

```

.code
    LD  0
    ST (sum)
    ST (i)
L1:   CMP  7
      JGT L2
      MVX
      LD   X(array)
      ADD (sum)
      ST   (sum)
      LD   (i)
      ADD  1
      ST   (i)
      JMP  L1
L2:   ...
.data
array: RB 16
i:     RB 2
sum:   RB 2

```

*Example code #1*

```

int array[8];
int i, sum;
sum = 0;
for (i = 0; i < 8; i++)
    sum = sum + array[i];

```

Example 8.2. The listing of an Acc-ISA assembly language program for the program in Example code 1.

.code L1: The listing of an Acc-ISA assembly language program for the program in  
 .code //start program code

```

LD  0
ST
ST

L1:
CMP  7
JGT L2
MVX
LD   X(array)
ADD (sum)
ST   (sum) )
LD   (i)

```

```
ADD 1
ST (i)
JMP L1 //loop back ( End of for loop)
```

## Program level Translation:

Now, here is an example of a real C If-Then-Else:

```
if (x == 10)
{
    x = 0;
}
else
{
    x++;
}
```

Which gets translated into the following assembly/machine code:

X = 5; 0x 05 = 5 in decimal;

X = 0xA which is equal decimal 10.

```
Mov eax, $x
Cmp eax, 0x0A ; 0x0A = 10
```

```
Jne    else
Mov    eax,    0
Jmp    end
Else:
Inc    eax
End;
Mov    $x,    eax
```

### Sequential (Seq) Circuit: Large Design

- Large Seq Circuit is made up of a data path and a control unit.
- Data path consists of seq and combinational circuit such as registers, counters, mux, decoders and ALU'S. (Arithmetic Logic Unit)
- High Clock frequency (freq) implies data path has a short maximum propagation delay.
- Max clock freq results in higher core junction temperature.
- Designer's goal is to stay within/under max  $T_j$  ( $T_j$ = temperature inside Computer Brain) to avoid signal integrity issues.



FIGURE 6.1 A diagram block of a large sequential circuit.

Register transfer notation (RTN) is used to describe an operation of a data path:

- Formally describes a data path operation
- Examples:

| RTN Example                                                                                                               |
|---------------------------------------------------------------------------------------------------------------------------|
| $R \leftarrow \text{value}$                                                                                               |
| $CNTR \leftarrow CNTR + 1$<br>Or<br>$CNTR <= CNTR + 1; \text{(Verilog)}$                                                  |
| $R \leftarrow 0//R[7..1]$<br>Or<br>$R <= R >> 1; \text{(Verilog)}$<br>Or<br>$R <= \{0, R[7:1]\}; \text{(Verilog)}$        |
| $R \leftarrow R[7]//R[7..1]$<br>Or<br>$R <= R >>> 1; \text{(Verilog)}$<br>Or<br>$R <= \{R[7], R[7:1]\}; \text{(Verilog)}$ |
| $R3 \leftarrow R1 + R2$<br>Or<br>$R3 <= R1 + R2; \text{(Verilog)}$                                                        |
| $M[X] \leftarrow R;$                                                                                                      |

### **Data path types:**

- Architecture of data path can be classified as single cycle, multiple cycles or pipelined.
- A single-cycled data path requires more hardware but a simpler control unit.
- A multicycle data path requires less hardware but generates results in several clock cycles.
- A pipelined data path also requires more hardware but can operate on multiple inputs concurrently.
- The single Data path contains two adders (+) modules and one adder/sub tractor (+/-) module.
- The single mode controls the functions of the adder/sub tractor modules.
- Time period is proportional to the propagation delay of the longest signal path that starts from the inputs of the first adder and ends at the input of the register.
- In general, a single cycle data path implements several simple and complex operations, its minimum clock would be proportional to the time required to complete most complex operations.

### **Sequential Circuit: Datapath ( Timing )**

Tcq: Clock to queue

Tst (T setup time): is defined as the minimum amount of time before the clock's active edge that the data must be stable for it to latch correctly.

Tcs: clock to skew

Thold or Th: Hold time is defined as the minimum amount of time after the clock's active edge during which data must be stable.

- Sequential circuit: single cycle data path:  $A + B + C + D$  or  $A + B + C - D$
- Data path contains two adder modules and one adder/subtractor module
- The single mode controls the functions of the adder/subtractor modules



**FIGURE 6.2** A single-cycle two-function data path that computes either  $A + B + C + D$  or  $A + B + C - D$  in one clock cycle.

### Sequential circuit: Single cycle data path architecture

-Data path that computes either the quantity

- $A + B + C + D$  or  $A + B + C - D$

-Equation that estimates the minimum clock period ( $\tau$ ) required to run the data path

- Add stands for Adder; Sub stands for subtractor
- $\Delta$  is delta time delay from input to output

$$\tau_s \geq 2\Delta_{add} + \Delta_{add/sub} + T_{st} + T_{cq} + T_{cs}$$

$$\tau_s = \tau_{\text{single-cycle}}$$

### Sequential circuit: Multicycle data path architecture

-Data path that computes either the quantity

- $A + B + C + D$  or  $A + B + C - D$

-Equation that estimates the minimum clock period ( $\tau$ ) required to run the data path

- Add stands for Adder; Sub stands for subtractor; Mux stands for Multiplexor
- $\Delta$  is delta time delay from input to output

$$\tau_m \geq \Delta_{\text{mux1}} + \Delta_{\text{add/sub}} + \Delta_{\text{mux2}} + T_{st} + T_{cq} + T_{cs}$$

$\tau_m = \tau$ -multicycle

-A multicycle data path requires that a computation be divided and computed in steps.

-A multi cycle algorithm to implement  $R \leftarrow A + B + C + D$  or  $A + B + C - D$ ; (5 possible simple operations)

Cycle 1:  $R \leftarrow A$

Cycle 2:  $R \leftarrow R + B$

Cycle 3:  $R \leftarrow R + C$

Cycle 4: If mode == 0, then  $R \leftarrow R + D$ ; otherwise  $R \leftarrow R - D$



FIGURE 6.3 A multicycle data path requiring four clock cycles to compute  $A + B + C + D$  or  $A + B + C - D$ .

## Seq circuit: Pipelined Data path architecture

-Data path that computes stream of quantities  $A_i + b_i + C_i +/- D_i$

- $A_i + b_i + C_i +/- D_i$
- $\tau$  = represents minimum clock period for pipeline data path architecture
- Equation that estimates the minimum clock period ( $\tau$ ) required to run the data path
  - Add stands for Adder; Sub stands for subtractor; Mux stands for Multiplexor
  - $\Delta$  is delta time delay from input to output

$$\tau_p \geq \Delta_{\text{add/sub}} + T_{st} + T_{cq} + T_{cs}$$

$\tau_p$  =  $\tau$ -pipeline

- Computing stream of quantities  $A_i + b_i + C_i +/- D_i$ .

Sequential Circuits: Large Designs 221



FIGURE 6.4 A two-function pipelined data path computing a stream of quantities  $A_i + B_i + C_i + D_i$  for  $i = 0, 1, 2, \dots$

- Seq circuit: Pipelined Data path architecture (Cont)

Refer to Figure 6.5 -Horizontal Pipeline chart

A pipeline uses more hardware, similar to a single-cycle data path, but operates with a higher-frequency clock, similar to a multicycle data path. Furthermore, it can process a stream of data a lot faster than the other two data paths. The clock period of a pipelined data path is proportional to the propagation delay of its



Fig 6.5

Pipeline chart for 3 stage pipeline

Refer to Figure 6.5 -Horizontal Pipeline chart (Cont)

- The pipeline chart in Fig. 6.5(a) has horizontal organization with clock cycles shown on the x-axis.
  - Pipeline chart illustrates various chart, does not include 1 cycle delay caused by interfacing registers
  - Has horizontal organization with clock cycles shown on x-axis.

## Sequential circuit: Pipeline performance

- In the example, R0 being the first result, requires 3 clock cycles to complete. (Ignore 1 cycle for interface registers)
- After R0, R1, R2.....Rn each output requires 1 clock cycle to produce results.
- Reduces time required to compute N final results.
- K stage (linear) pipeline requires K cycles for output.
- Equation Eq6.4 estimates total time ( $T_{pipeline}$ ) required to process stream size of N using K stage pipeline

$$T_{pipeline} = K * \tau_p + (N-1) * \tau_p \quad (\text{Eq 6.4})$$

- Estimates total time-  $T_{single}$  -cycle to process data stream of size N using single cycle data path.

$$T_{Single-cycle} = N * K * \tau_p \quad (\text{Eq 6.5})$$

- Speed up is performance parameter that measures performance of faster system to a slower system.
- Defined as ratio of the time required by a slower system over faster system.
- Equation that defines speedup between a faster pipeline data path compared slower single cycle Datapath.

$$\begin{aligned} \text{Speedup} &= T_{single-cycle}/T_{pipeline} = N * k * \tau_p / k * \tau_p + (n-1) * \tau_p \quad (\text{eq. 6.6}) \\ &= N * K / (K + N - 1) \end{aligned}$$

- Efficiency is performance parameter that measures how well a system's resource's are utilized.

- Overall efficiency of a system is defined as the ratio of its speedup to its maximum possible speedup.
- Efficiency = Speedup/K =  $N / (k+N-1)$

- As N approaches infinity, efficiency of pipeline approaches 100%.
- Throughput is performance parameter that measures a system's rate of processing
- Indicates the # of items (N) performed per second
- Eq. 6.8 defines the throughput of a linear pipeline with k stages

$$\text{Throughput} = \frac{N}{T_{\text{pipeline}}} = \frac{N}{[(k \tau) + (N-1) * \tau]}$$

Calculation Speedup:

$$\begin{aligned} \text{Speedup} &= \frac{T_{\text{single cycle}}}{T_{\text{pipeline}}} = \frac{NK \tau_p}{K \tau_p + (N-1) \tau_p} \\ &= \frac{NK \tau_p}{K \tau_p + N \tau_p - \tau_p} \\ &= \frac{\cancel{NK} \cancel{\tau_p}}{\cancel{\tau_p} (K + N - 1)} \end{aligned}$$

$$\text{Speedup} = \frac{NK}{(K + N - 1)}$$

Timing Diagram of D flip flop:

- $t_{cq}$  or  $t_{dqq}$ :
- $t_{st}$  or  $t_{sa}$ :
- $T_{es}$ :
- $t_h$ :



4-0-1

6-2-4



## Chapter 7 – Memory Tech introduction

- A register is storage for single value for quick access/readily available
- Memory is designed to store the code and data of programs during execution
- Storage memory technologies require less H/W than flip-flop
- Will go over commonly used memory technologies and their applications
- Storage size is defined in terms of bytes

| Unit | Reads        | Approximate Size |
|------|--------------|------------------|
| 1kb  | One Kilobyte | $10^3$ Bytes     |
| 1MB  | One Megabyte | $10^6$ Bytes     |
| 1 GB | One Gigabyte | $10^9$ bytes     |
| 1 TB | One Terabyte | $10^{12}$ Bytes  |

- Memory technologies are categorized as Read only memory or Random-Access memory
- ROM memory is nonvolatile and retains their content even when they are not powered
- RAM are volatile and would lose their content when not powered
- Examples of Non-Volatile memory technologies are magnetic disks, flash memory and optical disk such as CD ROM's
- The content of a ROM cell is fixed at logic 0 or 1
- An Electrically erasable PROM ([EEPROM](#)) is most commonly used today
- EEPROM cells can be programmed only a certain number of times
- Ram's functions as the Main storage for programs and data

- SRAM vs DRAM
  - SRAM is preferable for register files and L1/L2 caches
    - Faster access time
    - No Refreshes needed
    - Simpler Manufacturing.
    - Lower density
- DRAM is preferable for stand alone memory chips
  - Much higher capacity per area
  - Higher density.
  - Lower cost – i.e. cost vs performance trade off.
  - Continuously refreshed to keep the capacitors charged up.
- A memory timing diagram precisely illustrates memory communication protocols.
- Time required to select target cells and perform a read or write operation is called a memory access time
- A memory cycle includes both an access time and data transfer time.
- The access time is directly proportional to the size of the memory cell array.
- The decoders in turn determine the time required to activate a target row and access target cells

## A. Peak Memory Bandwidth Exercise

Example 1: Consider a **32-bit** data bus **SDRAM**. Given that the clock frequency of the bus is **100MHz**, what is the peak memory bandwidth in megabyte per second (MBs)?

32 -bit data bus:

1 byte = 8 bits, so 32 bits x 1 byte/8bits = 4Bytes per cycle

1 MB = 1,000,000 Bytes

1 Hz = 1 cycle/sec

1 MHz= 1 million Hz = 1,000,000 \* 1 cycle/sec = 1,000,000 cycles/sec

100 MHz = 100 Million Hz = 100 Million \* 1 cycle/sec = 100,000,000 cycles/sec

100,000,000 cycle/sec \* 4 Bytes/cycle = 400,000,000 Bytes/sec

400,000,000 Bytes/sec \* 1MB/1,000,000 Bytes = 400 MB/sec

### 7.5.1 SRAM

Figure 7.16 illustrates an SRAM read cycle from the memory point of view.



FIGURE 7.16 An SRAM read cycle from the memory point of view.

Figure 7.17 illustrates an SRAM memory write cycle.

A memory cycle is initiated by CPU and typically takes multiple CPU clock cycles to complete.



---

FIGURE 7.17 An SRAM write cycle from a memory point of view.

## Tri-State Buffer

A tri-state buffer is similar to a [buffer](#), but it adds an additional "enable" input that controls whether the primary input is passed to its output or not. If the "enable" inputs signal is **true**, the tri-state buffer behaves like a normal buffer. If the "enable" input signal is **false**, the tri-state buffer passes a *high impedance* (or hi-Z) signal, which effectively disconnects its output from the circuit.

Truth table for a tri-state buffer

| Enable Input | Input A | Output |
|--------------|---------|--------|
| false        | false   | hi-Z   |
| false        | true    | hi-Z   |
| true         | false   | false  |
| true         | true    | true   |

Tri-state buffers are often connected to a *bus* which allows multiple signals to travel along the same connection.

The *truth table* for a tri-state buffer appears to the right.

## Symbols

The symbol below can be used to represent a tri-state buffer.



A tri-state buffer is similar to a [buffer](#), but it adds an additional "enable" input that controls whether the primary input is passed to its output or not. If the "enable" inputs signal is **true**, the tri-state buffer behaves like a normal buffer. If the "enable" input signal is **false**, the tri-state buffer passes a *high impedance* (or hi-Z) signal, which effectively disconnects its output from the circuit.

Tri-state buffers are often connected to a *bus* which allows multiple signals to travel along the same connection.

The *truth table* for a tri-state buffer appears to the right.





## System Performance Advantages of Higher Density SRAMs

By Jayasree Nayar, Applications Engineer Staff, Cypress Semiconductor Corp.

### Executive Summary

With ever increasing system bandwidth requirements on the order of multigigabits/sec, SRAMs need to be optimized for higher density and performance as well as reliability, especially for look up table and packet buffering in networking applications. This article will address the different factors affecting system performance using Synchronous SRAMs.

With ever-increasing system bandwidth requirements of the order of multigigabits/sec, SRAMs need to be optimized for higher density and performance.

Today, Synchronous SRAMs are used in networking applications, such as L2/L3 caches and buffers, and SRAM performance is a significant factor that contributes to overall system performance. The need for high-density, high-performance SRAMs with high reliability is crucial for networking applications, especially for look-up table and statistics buffers.

Some of the advantages of higher density SRAMs include:

- Reduced board area which can be achieved by replacing two or more lower density parts with a higher density part having the same package footprint
- Increased density to handle more data traffic leading to better system performance
- Fewer interface signals and lower pin count leading to improved signal integrity, timing management, and skew control
- Lower cost
- Lower power
- Lower latency

This article will address the different factors affecting system performance using Synchronous SRAMs. Some of the critical factors that enhance system performance using Synchronous SRAMs are:

- Higher density
- Higher operating frequencies and wider data bus
- Lower latency
- Lower power
- Improved signal integrity

### Higher density

The higher the SRAM memory performance, the better will the system performance.

Higher SRAM memory performance can be enhanced by increasing the operating frequencies and increasing the capacity of storing data.

The latter can be achieved by increasing the density of the SRAMs. This enables more data to be buffered as well as increasing the number of transactions that can be performed in a fixed time interval. The greater the transaction speed, the higher the density required for the SRAMs. Therefore, increased density of the SRAMs enables it to handle more data traffic leading to better system performance.

Another advantage of using higher density SRAMs is reduced board space. System designers strive to reduce the board area occupied by the different components for reducing overall board cost. Because SRAM vendors may not be offering the

density required in their applications, system designers often have to use multiple SRAMs to meet the memory density requirements of the system. If higher density SRAMs are available, then 2 or more SRAMs can be replaced by a single SRAM. For example, a single 144M density SRAM can replace two 72M density SRAMs or four 36M density SRAMs mounted on the board thereby reducing board space occupied by the SRAMs (see Figures 1 and 2).



**Figure 1: Two 72M density SRAMs replaced by a single 144M density SRAM**



**Figure 2: Four 36M density SRAMs replaced by a single 144M density SRAM**



## ***Higher Operating Frequencies and Wider Data Bus***

The higher the frequency of the SRAMs and greater the bus width, the better the bandwidth. The more bandwidth, the more number of bits that can be processed per time interval, leading to better performance.

Below is the formula for calculating bandwidth in SRAMs:

### **For Single data rate SRAMs:**

- Bandwidth = Bus width x SRAM Frequency
  - Example: for a Single data rate Synchronous SRAM operating at 250MHz with a bus width of 36:
    - Bandwidth =  $36 * 250 * 10^6 = 9\text{Gb/sec.}$

### **For Double data rate SRAMs:**

- Bandwidth = Bus width x 2(DDR) x SRAM Frequency
  - Example: for a Double data rate Synchronous SRAM operating at 400MHz with a bus width of 36:
    - Bandwidth =  $36 * 2 * 400 * 10^6 = 28.8\text{Gb/sec.}$

## ***Lower Latency***

Lower latency results from the fact that the higher density part is capable of operating at higher frequencies. Reduced latency enables data to be driven out of the bus faster, leading to increased bus efficiency. Reduced latency also enables more transactions to be performed on the memory within a certain time interval.

## ***Lower power***

One of the main challenges faced by system designers is to lower overall system power. The greater the density of the SRAMs, the fewer SRAMs required to satisfy system density requirements. Reducing the number of SRAMs used also lowers overall power consumption.

Consider the case of the following 2 density SRAMs.

### 144Mb density SRAM with Vdd =1.8V

I<sub>dd</sub> =1070mA

Core power = V<sub>dd</sub> \* I<sub>dd</sub> =1.926W

where V<sub>dd</sub> is the voltage of the core power supply and I<sub>dd</sub> is the operating current

### 72Mb density SRAM with Vdd =1.8V

I<sub>dd</sub> =1000mA

Core power = V<sub>dd</sub> \* I<sub>dd</sub> =1.800W

where V<sub>dd</sub> is the voltage of the core power supply and I<sub>dd</sub> is the operating current

Bear in mind that two 72M SRAMs could be replaced with a single 144M SRAM.

Core power dissipated by two 72M SRAMs =  $1.8 * 2 = 3.6\text{W}$

Core Power dissipated by a single 144M SRAM = 1.926W

Please refer to Figures 3 & 4 below:



**Figure 3:**



**Figure 4:**

Therefore, though the power consumed by a single 72M SRAM is lower than a single 144M SRAM, the power consumed by a single 144M SRAM is lower than the power consumed by two 72M SRAMs combined by about ~47%.

### **Improved Signal Integrity**

The higher the density of the SRAMs, the fewer number of interface signals required, thus leading to improved signal integrity at high operating frequencies. This also eliminates the need to clamshell SRAMs to reduce board space, thus making the board routing less complicated and signal integrity better. Fewer traces also reduces crosstalk, thereby improving signal integrity, especially at high frequencies.



Cypress Semiconductor  
198 Champion Court  
San Jose, CA 95134-1709  
Phone: 408-943-2600  
Fax: 408-943-4730  
<http://www.cypress.com>

© Cypress Semiconductor Corporation, 2007. The information contained herein is subject to change without notice. Cypress Semiconductor Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in a Cypress product. Nor does it convey or imply any license under patent or other rights. Cypress products are not warranted nor intended to be used for medical, life support, life saving, critical control or safety applications, unless pursuant to an express written agreement with Cypress. Furthermore, Cypress does not authorize its products for use as critical components in life-support systems where a malfunction or failure may reasonably be expected to result in significant injury to the user. The inclusion of Cypress products in life-support systems application implies that the manufacturer assumes all risk of such use and in doing so indemnifies Cypress against all charges.

PSoC Designer™, Programmable System-on-Chip™, and PSoC Express™ are trademarks and PSoC® is a registered trademark of Cypress Semiconductor Corp. All other trademarks or registered trademarks referenced herein are property of the respective corporations.

This Source Code (software and/or firmware) is owned by Cypress Semiconductor Corporation (Cypress) and is protected by and subject to worldwide patent protection (United States and foreign), United States copyright laws and international treaty provisions. Cypress hereby grants to licensee a personal, non-exclusive, non-transferable license to copy, use, modify, create derivative works of, and compile the Cypress Source Code and derivative works for the sole purpose of creating custom software and/or firmware in support of licensee product to be used only in conjunction with a Cypress integrated circuit as specified in the applicable agreement. Any reproduction, modification, translation, compilation, or representation of this Source Code except as specified above is prohibited without the express written permission of Cypress.

Disclaimer: CYPRESS MAKES NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Cypress reserves the right to make changes without further notice to the materials described herein. Cypress does not assume any liability arising out of the application or use of any product or circuit described herein. Cypress does not authorize its products for use as critical components in life-support systems where a malfunction or failure may reasonably be expected to result in significant injury to the user. The inclusion of Cypress' product in a life-support systems application implies that the manufacturer assumes all risk of such use and in doing so indemnifies Cypress against all charges.

Use may be limited by and subject to the applicable Cypress software license agreement.

## Week 3

### DeMorgan's Theory Explained:

**Theorem 1:** Theorem states that two or more variables NAND'ed together is the same as the two terms inverted and OR'ed

$$\overline{A \cdot B} \equiv \overline{A} + \overline{B}$$

**Theorem 2:** Theorem states that two or more variables NOR'ed together is the same as the two variables inverted and AND'ed

$$\overline{A + B} \equiv \overline{A} \cdot \overline{B}$$

### DeMorgan's Theorem illustrated in Gates (Schematic):

$$\overline{A \cdot B} \equiv \overline{A} + \overline{B}$$



A NAND gate is equivalent to an inverted input OR gate

$$\overline{A + B} \equiv \overline{A} \cdot \overline{B}$$



A NOR gate is equivalent to an AND gate followed by inversion

Lets go video to work out an example: Refer to video example:

DeMorgan's Theorem

$$F_1 = F_2$$

Implementation of SOP expression with NAND Gates: Inverter



## Karnaugh Maps (K – Map)

- A Karnaugh map is a graphical representation of a logic function.
- The map for n-input logic function is an array with  $2^N$  cells,(One for each possible input combination or minterm.
  - o 2 variables → 4 cells; 3 variables → 8 cells
- To represent a logic function o a k-map, we simply copy 1's from the truth table to the corresponding cells in the K-Map.
- Implicants are squares or rectangles made up of minterms
- Pairs of adjacent “1” cells in the K-map have minterms that differ in ONLY one variable
- The minterm pairs can be combined into a single product term using the generalization of theorem T10:
  - o  $T10 = \text{Term} * Y + \text{Term} * Y' = (\text{Term})$
- Thus, we can use k-map to simplify the sum of the function.

# Three Variable K-Map.

Truth table

|    | X | Y | Z | F |
|----|---|---|---|---|
| R1 | 0 | 0 | 0 | 1 |
| R2 | 0 | 0 | 1 | 0 |
| R3 | 0 | 1 | 0 | 1 |
| R4 | 0 | 1 | 1 | 1 |
| R5 | 1 | 0 | 0 | 0 |
| R6 | 1 | 0 | 1 | 0 |
| R7 | 1 | 1 | 0 | 0 |
| R8 | 1 | 1 | 1 | 1 |

$$F = \overline{X}\overline{Y}\overline{Z} + \overline{X}\overline{Y}\overline{Z} + \overline{X}\overline{Y}Z + XY\overline{Z}$$

$$= YZ + \overline{X}Y + X\overline{Z}$$



## Combinational circuit -Small Design Summary

Truth table:

### LUT (LOOK UP TABLE)

Entire truth table ..

Advantage:

Disadvantage:.

### Minimal logic circuit Implementation

Advantage:

Disadvantage:

## Week 4

### Design of Full adder

Carry Propagate Adder (CPA)

- Full adder is an arithmetic logic unit or circuit to add two single digit numbers with Carry-in
- It is still combinational circuit with inputs Ax and Bx
  - o Sx – Sum (1bit); C-in; Cx – Carry out;
- Adder is also called Ripple Carry Adder (RCA)
- Fundamental circuit for an adder and has longest propagation delay that is proportional to the number of the carry bits.
- Equation to estimate the propagation delay of an n-bit CPA.  $\Delta F_{AC}$  and  $\Delta F_{AS}$  stand for delta time for carry bit and delta time for sum signal propagation delay of a Full Adder (FA)

$$\Delta CPA(n) = [ (n-1) * \Delta F_{AC} ] + \Delta F_{AS} \quad Eq(2.8)$$

- Using AND/OR/EOR gate,  $\Delta$  for AND/OR gate is = 0.1 ns and  $\Delta$  for EOR = 0.3ns,  $\Delta F_{AC}$  = 0.2ns and  $\Delta F_{AS}$  = 0.3ns, as per Eq(2.8)
  - o For 8 bit CPA (n=8), the equation to calculate propagation delay is as follows:

$$\begin{aligned}\Delta CPA(n) &= (n - 1)(0.2\text{ns}) + 0.3\text{ns} \\ &= (8-1)_-(0.2nx) + 0.3\text{ns}\end{aligned}$$

-

***Block Diagram of Full adder***



## Design of Full Adder

| Inputs |   |        | Outputs |         |
|--------|---|--------|---------|---------|
| A      | B | C - IN | Sum     | C - OUT |
| 0      | 0 | 0      | 0       | 0       |
| 0      | 0 | 1      | 1       | 0       |
| 0      | 1 | 0      | 1       | 0       |
| 0      | 1 | 1      | 0       | 1       |
| 1      | 0 | 0      | 1       | 0       |
| 1      | 0 | 1      | 0       | 1       |
| 1      | 1 | 0      | 0       | 1       |
| 1      | 1 | 1      | 1       | 1       |

## Schematic of a 4 bit Cary Propagate Adder (CPA)



### 4-bit binary addition

|           |        |        |        |        |
|-----------|--------|--------|--------|--------|
| A(3)      | A(2)   | A(1)   | A(0)   |        |
| B(3)      | B(2)   | B(1)   | B(0)   | +      |
| <hr/>     |        |        |        |        |
| Carry-Out | Sum(3) | Sum(2) | Sum(1) | Sum(0) |

### 4-bit binary addition

| A(3)      | A(2)   | A(1)   | A(0)   | +      |
|-----------|--------|--------|--------|--------|
| B(3)      | B(2)   | B(1)   | B(0)   |        |
| Carry-Out | Sum(3) | Sum(2) | Sum(1) | Sum(0) |

Binary Addition review:

| A | B | S | Cout |
|---|---|---|------|
| 0 | 0 | 0 | 0    |
| 0 | 1 | 1 | 0    |
| 1 | 0 | 1 | 0    |
| 1 | 1 | 0 | 1    |

$$1_2 + 1_2 = 10_2$$

$$1_{10} + 1 = 2_{10} = \textcolor{violet}{10}_2$$

$$1_2 + 1_2 + 1_2 = 11_2$$

$$\begin{array}{r} 9 \\ + 1 = \\ 10 \end{array}$$

A = 0<sub>green</sub>1<sub>green</sub>1<sub>red</sub>

B = 1<sub>green</sub>1<sub>green</sub>1<sub>red</sub>

| Inputs |      |      | Output |       |
|--------|------|------|--------|-------|
| A[i]   | B[i] | C-in | S      | C out |
| 1      | 1    | 0    | 0      | 1     |
| 1      | 1    | 1    | 1      | 1     |
| 1      | 1    | 1    | 1      | 1     |
| 0      | 1    | 1    | 0      | 1     |

## Design a full adder from ground up.... (Exercise video Note)

Design of a Full Adder (CPA - 1 bit - single cell)

Step 1 : Generate the truth table

With 3 inputs, we have 8 different total Combinations.

| Inputs<br>$A_x, B_x, C_{in}$ |       |          | Outputs<br>$S_x, C_{out}$ |           |  |
|------------------------------|-------|----------|---------------------------|-----------|--|
| Inputs                       |       |          | Output                    |           |  |
| $A_x$                        | $B_x$ | $C_{in}$ | $S_x$                     | $C_{out}$ |  |
| 1                            | 0     | 0        | 0                         | 0         |  |
| 2                            | 0     | 0        | 1                         | 0         |  |
| 3                            | 0     | 1        | 0                         | 0         |  |
| 4                            | 0     | 1        | 1                         | 1         |  |
| 5                            | 1     | 0        | 0                         | 1         |  |
| 6                            | 1     | 0        | 1                         | 0         |  |
| 7                            | 1     | 1        | 0                         | 1         |  |
| 8                            | 1     | 1        | 1                         | 1         |  |

$Z_{10} = \frac{1}{10^2}$

$\begin{array}{r} 1 \\ 0 \\ 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ \hline 0 \end{array}$

Step 2: Determine the logical Expression

$$S_x = \bar{A}_x \bar{B}_x C_{in} + \bar{A}_x B_x \bar{C}_{in} + A_x \bar{B}_x \bar{C}_{in} + A_x B_x C_{in}$$

$$C_{out} = \bar{A}_x B_x C_{in} + A_x \bar{B}_x C_{in} + A_x B_x \bar{C}_{in} + A_x B_x C_{in}$$

Step 3. Simplify Equation or logical Expression:

$$S_x = \bar{A}_x \bar{B}_x C_{in} + \bar{A}_x B_x \bar{C}_{in} + A_x \bar{B}_x \bar{C}_{in} + A_x B_x C_{in}$$

$$S_x = A_x \oplus B_x \oplus C_{in}$$

Simplify Using K-Map



$$S_x = A_x \oplus B_x \oplus C_{in}$$

$$C_{out} = \bar{A}_x B C_{in} + A_x \bar{B} C_{in} + A_x B \bar{C}_{in} + A_x B C_{in}$$

$$= A_x C_{in} + A_x B_x + B C_{in}$$



$$C_{out} = A_x C_{in} + A_x B_x + B C_{in}$$

Step 4: Design of the Circuit

$$S_x = A_x \oplus B_x \oplus C_{in} \quad \checkmark$$

$$C_{out} = A_x C_{in} + A_x B_x + B_x C_{in}$$



## Multiplexer

- Selecting of data or information is a critical function in a digital systems and computers
- A Multiplexer (Mux for short) is a digital switch; Mux is a circuit used to select and route any of the several inputs to an output signal
- Mux is a combinational circuit. it has the following:
  - o  $2^n$  inputs
  - o n control inputs
  - o one set of output
  - o In summary, has the following  $2^n$  input signals, n control inputs (selector signals), and a single set of output – Output signal.
- For a multiplexer, the value of the control inputs (selector signals) determines the data input that is selected.
- Multiplexer means many into one. a simple example of a non-electronic circuit of a mux is a single pole multi-position switch. Multi-position switches are widely used in many electronics circuit, however, circuits that operate at high speed require the multiplexer to be automatically selected. A mechanical switch cannot perform this task satisfactorily. Therefore, mux is used to perform high speed switching and are constructed for digital circuits
- Example:
  - o X and Y are inputs
  - o S is the selector signal
  - o r is the output

Design of a 1 bit, 2 to 1 mux

### 1. Block diagram of 1 bit, 2 to 1 mux



Design of 1bit, 2 to 1 Mux

1. Block diagram of 1bit, 2 to 1 Mux



2. 1 bit, 2 to 1 mux Truth Table

| Ctr | Ctrl | Input Signals |   | Output |
|-----|------|---------------|---|--------|
| 1   | 0    | X             | 0 | 0      |
| 2   | 0    | X             | 0 | 1      |
| 3   | 0    | X             | 1 | 0      |
| 4   | 0    | X             | 1 | 1      |
| 5   | 1    | Y             | 0 | 0      |
| 6   | 1    | Y             | 1 | 0      |
| 7   | 1    | Y             | 1 | 1      |

Outputs Y when  $s=0$ ; Outputs X when  $s=1$ .

$$r = \bar{s}\bar{X}Y + \bar{s}XY + s\bar{X}Y + sXY$$

Design of 1 bit, 2 to 1 Mux

3. Simplify the logical Expression:

Minimal SOP Expression:  $\bar{S}Y + S\bar{X}$

$$F = \bar{S}\bar{X}Y + \bar{S}XY + S\bar{X}\bar{Y} + S\bar{X}Y$$

Simplify Using K-map:



Conversion to Floating Point Representation:  $5.375_{10}$

Step 1: Convert decimal to Binary :  $5.375_{10}$

- First convert whole number or integer: 5

$$\begin{array}{r} 2 \\ \hline 5 \\ 4 \cancel{\sqrt{}} \\ \underline{1} \end{array} \quad \begin{array}{r} 2 \\ \hline 2 \\ 2 \cancel{\sqrt{}} \\ \underline{0} \end{array}$$

↓                  ↓  
1                  0.

$$2 \sqrt{1} \rightarrow !$$

$$5_{10} = 101_2$$

- Now we take fraction piece :  $0.375_{10}$

$$0.375 \times 2 = \underbrace{0}_{\downarrow} + 0.75$$

$$0.75 \times 2 = 1 + 0.5$$

$$0.5 \times 2 = 1 + 0 \quad \text{END}$$

$$0.375_{10} = 011_2$$

$$5.375_{10} = 101.011_2$$

Step 1:  $+5 \cdot 375_{10} = 101.011_2$

Step 2: Converting Binary to Scientific Notation:  $5 \cdot 375_{10} = 101.011_2$

Convert  $631.65_{10}$

$$631.65_{10} \xrightarrow[\text{Notation}]{\text{Scientific}} 6.3165 \times 10^2$$

Convert  $101.011_2$

$$101.011_2 \xrightarrow[\text{Notation}]{\text{Scientific}} 1.01011 \times 2^2$$

$$5 \cdot 375_{10} = 1.01011_2 \times 2^{8+2}$$

Step 2:  $1.01011 \times 2^7$

Step 3: Calculate the Biased Exponent:

Biased offset: 7

$$\text{Biased Expo} = \text{Unbiased Exp} + \text{Biased offset}$$

$$= 2 + 7$$

$$= 9_{10}$$

$$1.01011 \times 2^{2+7} = 1.01011 \times 2^9$$

$$\text{Biased Exponent} = 9_{10} = 1001_2$$

---

Biased offset

$N$  = # of bits used to store Biased Exponent

$$N = 4$$

$$\begin{aligned}\text{Biased offset} &= 2^{N-1} - 1 \\ &= 2^{4-1} - 1\end{aligned}$$

$$= 2^3 - 1$$

$$= 8 - 1$$

$$= 7$$

Step 3:  $1.01011 \times 2^9$

Step 4: Fusion of Floating Point:  $1.01011 \times 2^9$



Step 5: Convert to Hex

0x 4A C 0

# Hexadecimal Numbers

| Dec | Binary | Hex |
|-----|--------|-----|
| 0   | 0000   | 0   |
| 1   | 0001   | 1   |
| 2   | 0010   | 2   |
| 3   | 0011   | 3   |
| 4   | 0100   | 4   |
| 5   | 0101   | 5   |
| 6   | 0110   | 6   |
| 7   | 0111   | 7   |
| 8   | 1000   | 8   |
| 9   | 1001   | 9   |
| 10  | 1010   | A   |
| 11  | 1011   | B   |
| 12  | 1100   | C   |
| 13  | 1101   | D   |
| 14  | 1110   | E   |
| 15  | 1111   | F   |

$$= 18$$

$$0 \times 10 = 16$$

$$= 1 \times 10^1 + 8 \times 10^0$$

$$= 10 + 8 \times 1$$

$$0 \times 10 = 1 \times 16^1 + 0 \times 16^0 \\ = 16 + 0 \times 1$$

$$= 10 + 8 \\ = 18$$

$$= 16 + 0$$

$$= 16$$

FIVE STAR

Implementation of SOP Expression with  
NAND Gates.

$$\text{Theorem \#1 : } \overline{x \cdot y} \Leftrightarrow \bar{x} + \bar{y}$$

$$\text{Theorem \#2 : } \overline{x+y} \Leftrightarrow \bar{x} \cdot \bar{y}$$

$$f = \bar{x} \cdot \bar{y} + x \cdot y$$



Step 1: Replace each not gate with its equivalent NAND gate by connecting inputs of a two input NAND gates to the single input of the not gate



Step 2: Place two not gates on each end of the intermediate signals  $g$  and  $h$ . This will not change the output.  
E.G.  $\bar{\bar{g}} = g$

An AND-OR gate with two not gate replaced.



Step 3: Replace each AND-NOT gate with NAND gate.



Step 4: Replace OR gate with inverted input

with NAND gate.



$$\bar{X} + \bar{Y} \Leftrightarrow \overline{X \cdot Y}$$

## Sequential Circuit – Core modules

### **S-R Latches**

- Latches are basic building block of flip-flops (Basic memory Unit).
- Two types of memory elements based on the type of trigger that is suitable to operate.
  - Latches
  - Flip Flops
- Latches operate with enable signal, which is level sensitive, whereas flip-flops are edge sensitive.

### **D flip-flop**

- Basic memory storage device.
- For D flip-flop, a clock signal is needed to change states.
- Similar to S-R latch, only we use set and reset function. (Tie D input to s input and not D to R input to make S\_R flip flop into D flip flop.
- Block Diagram of D flip flop.



Internal circuit of D flip flop:

- D flip flop is designed using S-R latches.
- D input goes into S and D goes through inverter and feeds into R.



## Sequential Circuit: Core Modules (Cont)

Internal circuit of D flip flop:

(D flip flop is designed using S-R latches. D input goes into S and D goes through inverter and feeds into R)



Lets take look at the design of a S-R-flip flop built with NOR gates.

NOR

S-R Flip-Flop with  $\wedge$  Gates



| S | R | $Q_t$        | $Q_{t+1}$    |
|---|---|--------------|--------------|
| 0 | 0 | $Q_t$        | $Q_t$        |
| 0 | 1 | 0            | 0            |
| 1 | 0 | 1            | 1            |
| 1 | 1 | Undetermined | Undetermined |

CLK = Clock Signal

Case 1:  $S=0; R=0$

$$\begin{aligned} \overline{Q_{t+1}} &= \overline{0 + \bar{Q}_t} = \bar{\bar{Q}_t} = Q_t \\ \overline{Q_{t+1}} &= \overline{0 + Q_t} = \bar{Q}_t \end{aligned}$$

Case 2:  $S=0, R=1$

$$\begin{aligned} \overline{Q_{t+1}} &= \overline{1 + \bar{Q}_t} = \bar{1} = 0 \\ \overline{Q_{t+1}} &= \overline{0 + Q_t} = \bar{Q}_t \end{aligned}$$

Case 3:  $S=1; R=0$

$$\begin{aligned} \overline{Q_{t+1}} &= \overline{0 + \bar{Q}_t} = \bar{\bar{Q}_t} = Q_t = 1 \\ \overline{Q_{t+1}} &= \overline{1 + Q_t} = \bar{1} = 0 \end{aligned}$$

## S-R Flip-Flop with NOR Gates (Cont)

Case 4:  $S = 1$ ;  $R = 1$

$$Q_{t+1} = \overline{1 + \bar{Q}_t} = \bar{1} = 0 \quad 0$$

$$\overline{Q_{t+1}} = \overline{\bar{1} + \bar{Q}_t} = \bar{1} = 0 \quad 0$$

$0 = \text{Not Used}$

NAND as a Inverter

NAND gate



Inverter



Truth Table

| D | A | B | Q <sub>n</sub> |
|---|---|---|----------------|
| 0 | 0 | 0 | 1              |
| 1 | 1 | 1 | 0              |

| D | Q <sub>n</sub> |
|---|----------------|
| 0 | 1              |
| 1 | 0              |

## Sequential Circuit: Small Design Introduction

- All small and large sequential circuits are made of flip-flops and set of CC's – Combinational circuits.
- Contrary to CC's, a sequential circuit design has states and transitions from a current state to next state.
- A sequential circuit design problem is typically modeled as a finite state diagram (FSD).
- FSD consist of circles as states and arcs (arrows) as transitions, which specifies the behavior of sequential circuit.
- An FSD is systematically converted into circuit called a finite state machine. (FSM)
- Finite state machine designs are categorized into Mealy, Moore or hybrid machines.
- Mealy is a FSM whose output values are determined by its ***current state and current input***. Contrast to a Moore machine, whose output values are determined **solely by** its current inputs.
- Finite state diagram (FSD) describes the behavior of system
- A register is used as a storage module to save the output of a CC.
- FSM's are categorized into Mealy, Moore or hybrid machines.

## **Sequential Circuit: Small Design Introduction – Moore Vs Mealy**

### Moore Machines

- Output depends only upon present state
- If input changes, output does not change
- More number of states are required
- There is more hardware requirement
- They react slower to inputs ( One clock cycle later)
- Synchronous output and state generation
- Output is placed on states
- Easy to design

### Mealy Machine

- Output depends on present state as well as present input
- If input changes, output also changes
- Less number of states are required
- There is less hardware requirement
- They react faster to inputs
- Asynchronous output generation
- Output is placed on transitions
- It is difficult to design

Example : Design of a Moore FSM that detects overlapping seq "101"



Step 1: Create a Moore FSD - Finite State Diagram



Design of Moore FSM that detects Overlapping Sequence "101" - Cont.

Step 2 : Determine the minimum number of bits required to store the states

$$\text{Number of bits} = \log_2 [k] = \log_2 [4] = 2$$

$$k = \# \text{ of states}$$

Step 3 : From the FSD, create the Truth table for NSG & DG

$$A = 00, \quad B = 01,$$

$$C = 10, \quad D = 11$$

NSG

|   | Current state |       | Input X | Next state |       |   |
|---|---------------|-------|---------|------------|-------|---|
|   | $q_1$         | $q_0$ | X       | $d_1$      | $d_0$ |   |
| 1 | A             | 0     | 0       | 0          | 0     | A |
| 2 |               | 0     | 0       | 1          | 0     | B |
| 3 | B             | 0     | 1       | 0          | 1     | C |
| 4 |               | 0     | 1       | 1          | 0     | B |
| 5 | C             | 1     | 0       | 0          | 0     | A |
| 6 |               | 1     | 0       | 1          | 1     | D |
| 7 | D             | 1     | 1       | 0          | 1     | C |
| 8 |               | 1     | 1       | 1          | 0     | B |

### Step 3 - Cont

Design of a Moore FSM that detects Overlapping Sequence "101"

Step 3: Create Output Generator (OG)

| Current state |       | O.G.  | output |
|---------------|-------|-------|--------|
|               | $q_1$ | $q_0$ | $Z$    |
| 1             | 0     | 0     | A      |
| 2             | 0     | 1     | B      |
| 3             | 1     | 0     | C      |
| 4             | 1     | 1     | D      |



clk = Clock signal

Design of a Moore FSM that detects Overlapping Sequence "101"

Step 4: From the truth table, Determine Min SOP for each of the states - var d<sub>1</sub>, d<sub>0</sub>, output Z

$$\begin{aligned}
 d_1 &= \overline{x} \cdot \bar{q}_1 q_0 + x \cdot q_1 \bar{q}_0 + \bar{x} \cdot q_1 q_0 \\
 &= \overline{x} \cdot \bar{q}_1 q_0 + \bar{x} \cdot q_1 \bar{q}_0 + x \cdot q_1 \bar{q}_0 \\
 &= \overline{x} \cdot (\underbrace{\bar{q}_1 q_0 + q_1 \bar{q}_0}_{q_0 = 0}) + x \cdot q_1 \bar{q}_0 \\
 &\quad \left( \begin{array}{l} q_0 = 0 \\ q_0 = 1 \end{array} \right) \downarrow q_0 \\
 &= \overline{x} \cdot q_0 + x \cdot q_1 \bar{q}_0 \\
 d_1 &= \boxed{\overline{x} \cdot q_0 + x \cdot q_1 \bar{q}_0}
 \end{aligned}$$

|                               |               |
|-------------------------------|---------------|
| Complement Law                |               |
| $\bar{q}_0 (\bar{q}_1 + q_1)$ | $q_0 \cdot 1$ |
| $q_0 \cdot 1$                 | $q_0$         |

$$d_0 = \boxed{x}; \quad Z = q_1 q_0$$

$$\bar{q}_1 q_0 + q_1 \bar{q}_0 \Leftrightarrow q_0$$

$$q_0 = 0 \Rightarrow \text{whole expression} = 0$$

$$\bar{q}_1 \cdot 0 + q_1 \cdot 0 =$$

Design of a Moore FSM that detects "101" Sequence

Step 5: Built or design the circuit that detects sequence "101"



# D flip Flops



$Clk = \text{Clock}$

$d_o = \text{Input (1 bit)}$

$q_o = \text{Output}$

$\bar{q}_o = \text{Inverted Output}$

Design of Mealy FSM - Overlapping sequence "101"

\* Mealy output is assigned to the arcs and not to the states



Step 1: Create a Mealy Finite state Diagram



## Design of Mealy FSM - Overlapping Sequence "101"

Step 2: Determine the Min number of states/bits required to store the states

$$\text{Number of bits} = \log_2[k] = \log_2[3] \approx 2$$

K = Total # of states

Step 3: From FSD, Create the truth table.  
Let 00 = A, 01 = B, 10 = C, 11 = D.

NSG / OG

|   | Current<br>states<br>$z_1 z_0$ | Input<br>X | Next state<br>$d_1 d_0$ |                | Z |
|---|--------------------------------|------------|-------------------------|----------------|---|
|   |                                |            | d <sub>1</sub>          | d <sub>0</sub> |   |
| 1 | A                              | 0 0        | 0                       | 0 0            | A |
| 2 |                                | 0 0        | 1                       | 0 1            | B |
| 3 | B                              | 0 1        | 0                       | 1 0            | C |
| 4 |                                | 0 1        | 1                       | 0 1            | B |
| 5 | C                              | 1 0        | 0                       | 0 0            | A |
| 6 |                                | 1 0        | 1                       | 0 1            | B |
| 7 | D                              | 1 1        | 0                       | d d            | d |
| 8 |                                | 1 1        | 1                       | d d            | d |

Design of Mealy FSM - Overlapping Sequence "101"

Step 4: Determine the logical Expression

$$d_1 = q_0 \bar{x} ; d_1 = q_0 \bar{q}_1 \bar{x}$$

$$d_0 = x \quad z = q_1 x$$

Step 5: Draw the Circuit Diagram



Example: Design of Mealy Finite State machine that detects NON-OVERLAPPING Sequence "101"



Step 1: Create a mealy FSD - Finite state Diagram.

Let  $A = 00$ ,  $B = 01$ ,  $C = 10$ ,  $D = 11$

