

# Week 2

Monday, August 21, 2023 11:26 PM

- A variable has name, data type, address
- Variable occupies space in memory so it needs address which system keeps track of
- 16 billion places
- Can get address using address operator & (use %p for pointer)
- Pointer is hexadecimal number
- Address of a variable varies from run to run, cuz system allocates any free memory to the var
- **POINTER VARIABLE**

- Varibale that contains address of another variable



- a\_ptr is said to be **pointing to** variable a.
- Address of a is immaterial, we simply draw an arrow → blue box to the variable it points to.



- Declaring a pointer
- type \*pointer\_name (readright to left)
- Ex. int \*a\_ptr
- Assigning value to pointer
- Only an address may be assigned to pointer
- Ex. a\_ptr = &a;
- All variable contain random data before initialization
- Access variable through pointer
- Can access "a" directly as usual or indirectly thorugh a\_ptr by using the indirection operator (also called dereferencing operator)\*
- **printf("a = %d\n", \*a\_ptr);**
- \*a\_ptr = 456; changes value of a

- Incrementing pointer

- (\*p1)++

- If p is a pointer variable, what does p = p + 1 (or p++) mean?

```
int a; float b; char c; double d;
int *ap; float *bp;
char *cp; double *dp;

ap = &a; bp = &b; cp = &c; dp = &d;
printf("%p %p %p %p\n", ap, bp, cp, dp);

ap++; bp++; cp++; dp++;
printf("%p %p %p %p\n", ap, bp, cp, dp);

ap += 3;
printf("%p\n", ap);
```

Recall Lect#2a slide 16:  
int takes up 4 bytes  
float takes up 4 bytes  
char takes up 1 byte  
double takes up 8 bytes

ffbf0a4 ffbff0a0 ffbff09f ffbff090  
ffbf0a8 ffbff0a4 ffbff0a0 ffbff098  
ffbf0b4

IncrementPointers.c

## 1.9 Common Mistake

CommonMistake.c

```
int *n;
*n = 123;
printf("%d\n", *n);
```

What's wrong with this?  
Can you draw the picture?

n ?

- Where is the pointer n pointing to?
- Where is the value 123 assigned to?
- Result: Segmentation Fault (core dumped)
  - Remove the file "core" from your directory. It takes up a lot of space!

- Cuz the n is allocated random numbers as it's not initialized
- Math library <math.h> AND + compile with -lm option
- The only way to modify variables is to give it the address - restriction of C

## 2. Calling Functions (3/3)

To link to Math library

```
#include <stdio.h>
#include <math.h>

int main(void) {
    int x, y;
    float val;

    printf("Enter x and y: ");
    scanf("%d %d", &x, &y);
    printf("pow(%d, %d) = %f\n", x, y, pow(x, y));

    printf("Enter value: ");
    scanf("%f", &val);
    printf("sqrt(%f) = %f\n", val, sqrt(val));
}

return 0;
```

gcc -lm MathFunctions.c  
a.out

Enter x and y: 3 4  
pow(3,4) = 81.000000  
Enter value: 65.4  
sqrt(65.400002) = 8.087027

- User-Defined Functions

## 3. User-Defined Functions (6/6)

- Let's remove (or comment off) the function prototype for circle\_area() in WashersV2.c
- Messages from compiler:

```
WashersV2.c: In function 'main':
WashersV2.c:19:2: warning: implicit declaration of function
'circle_area' [-Wimplicit-function-declaration]
    rim_area = circle_area(d2) - circle_area(d1);
    ^
WasherV2.c: At top level:
WasherV2.c:27:8: error: conflicting types for 'circle-area'
:
```

- Without function prototype, compiler assumes the default (implicit) return type of int for circle\_area() when the function is used in line 19, which conflicts with the function header of circle\_area() when the compiler encounters the function definition later in line 27.

- Pass by value

## 4. Pass-by-Value and Scope Rule (1/4)

- In C, the actual parameters are passed to the formal parameters by a mechanism known as pass-by-value.

```
int main(void) {
    double a = 10.5, b = 7.8;
    printf("%.2f\n", sqrt_sum_square(3.2, 12/5));
    printf("%.2f\n", sqrt_sum_square(a, a+b));
    return 0;
}
```

```
double sqrt_sum_square(double x, double y) {
    double sum_square;
    sum_square = pow(x, 2) + pow(y, 2);
}
```

a      b  
10.5    7.8

Actual parameters:  
3.2 and 2.0

Formal parameters:  
x      y  
[ ]    [ ]

## 4. Pass-by-Value and Scope Rule (1/4)

- In C, the actual parameters are passed to the formal parameters by a mechanism known as **pass-by-value**.
- ```
int main(void) {  
    double a = 10.5, b = 7.8;  
    printf("%.2f\n", sqrt_sum_square(3.2, 12/5);  
    printf("%.2f\n", sqrt_sum_square(a, a+b));  
    return 0;  
}
```

  

```
double sqrt_sum_square(double x, double y) {  
    double sum_square;  
    sum_square = pow(x,2) + pow(y,2);  
    return sqrt(sum_square);  
}
```

a      b  
10.5    7.8

Actual parameters:  
3.2 and 2.0

Formal parameters:  
x      y  
3.2    2.0



## 4. Pass-by-Value and Scope Rule (2/4)

- Formal parameters are local to the function they are declared in.
- Variables declared within the function are also local to the function.
- Local parameters and variables are only accessible in the function they are declared – **scope rule**.
- When a function is called, an activation record is created in the call stack, and memory is allocated for the local parameters and variables of the function.
- Once the function is done, the activation record is removed, and memory allocated for the local parameters and variables is released.
- Hence, local parameters and variables of a function exist in memory only during the execution of the function. They are called **automatic variables**.
- In contrast, static variables exist in the memory even after the function is executed.



- The only way for a function to modify the value of a var outside scope is to use pointers

```
#include <stdio.h>  
void swap(int *, int *);  
int main(void) {  
    int a, b;  
    printf("Enter two integers: ");  
    scanf("%d %d", &var1, &var2);  
    swap(&a, &b);  
    printf("var1 = %d; var2 = %d\n", var1, var2);  
    return 0;  
}  
void swap(int *ptr1, int *ptr2) {  
    int temp;  
    temp = *ptr1; *ptr1 = *ptr2; *ptr2 = temp;  
}
```

In main(): a 3      b 2  
In swap(): ptr1      ptr2



- F
- ARRAYS

## 2.1 Array Declaration with Initializers

- As seen in `ArraySumV2.c`, an array can be initialized at the time of declaration.

```
// a[0]=54, a[1]=9, a[2]=10  
int a[3] = {54, 9, 10};  
  
// size of b is 3 with b[0]=1, b[1]=2, b[2]=3  
int b[] = {1, 2, 3};  
  
// c[0]=17, c[1]=3, c[2]=10, c[3]=0, c[4]=0  
int c[5] = {17, 3, 10};
```

Note what happens when fewer initial values are provided.

- The following initializations are incorrect:

```
int e[2] = {1, 2, 3}; // warning issued: excess elements  
  
int f[5];  
f[5] = {8, 23, 12, -3, 6}; // too late to do this;  
// compilation error
```



## 2.2 Arrays and Pointers

- Example: `int a[10]`



- When the array name `a` appears in an expression, it refers to the address of the first element (i.e. `&a[0]`) of that array.

`int a[3];` ↗  
`printf("%p\n", a);` ↗  
`printf("%p\n", &a[0]);`  
`printf("%p\n", &a[1]);`

These 2 outputs will always be the same.

Output varies from one run to another. Each element is of `int` type, hence takes up 4 bytes (32 bits).

- Cannot set an uninitialized array to equal another one
  - An array name is a fixed (constant) pointer ie it points to first element of array which cannot be changed.
  - You will be trying to alter the uninitialized array to make it point elsewhere
  - Instead you must use a for loop to copy over
- In array prototype don't need to specify the size of array
- \*\* C ignores length information of array as parameter. So you need to add size as another param

## 2.4 Array Parameters in Functions (3/3)

- Since an array name is a pointer, the following shows the alternative syntax for array parameter in function prototype and function header in the function definition

```
int sumArray(int *, int); // fn prototype
```

```
// function definition
int sumArray(int *arr, int size) {
    ...
}
```

- Compare this with the [ ] notation

```
int sumArray(int [], int); // fn prototype
```

```
// function definition
int sumArray(int arr[], int size) {
    ...
}
```

- Since array name is pointer, no need to pass address to function
- A function can always modify the content of the array its received
- STRINGS
- A string is a array of char with a null character '\0' at the end of the array
- Then you can use string functions by importing <string.h>
- Declare: char str[6];
- Assign: str[0] = 'e'; str[3] = '\0'

### 3.1 Strings: Basic

- Declaration of an array of characters

```
char str[6];
```

- Assigning character to an element of an array of characters

```
str[0] = 'e';
str[1] = 'g';
str[2] = 'g';
str[3] = '\0';
```



Without '\0', it is just an array of character, not a string.

- Initializer for string

- Two ways:

```
char fruit_name[] = "apple";
char fruit_name[] = {'a','p','p','l','e','\0'};
```

Do not need '\0' as it is automatically added.

- Single quote is for 1 character
- Double quote produces a string followed by null terminator

## 3.2 Strings: I/O (1/3)

- Read string from stdin (keyboard)

```
fgets(str, size, stdin) // reads size - 1 char,  
// or until newline  
scanf("%s", str); // reads until white space
```

- Print string to stdout (monitor)

```
puts(str); // terminates with newline  
printf("%s\n", str);
```

Note: There is another function `gets(str)` to read a string interactively. However, due to security reason, we avoid it and use `fgets()` function instead.

- `scanf` reads until first white space
- `fgets` reads until newline

## 3.2 Strings: I/O (3/3)

StringIO1.c

```
#include <stdio.h>  
#define LENGTH 10  
  
int main(void) {  
    char str[LENGTH];  
  
    printf("Enter string (at most %d characters): ", LENGTH-1);  
    scanf("%s", str);  
    printf("str = %s\n", str); Output:  
    return 0;  
}
```

Test out the programs with this input:

My book

str = My

StringIO2.c

```
#include <stdio.h>  
#define LENGTH 10  
  
int main(void) {  
    char str[LENGTH];  
  
    printf("Enter string (at most %d characters): ", LENGTH-1);  
    fgets(str, LENGTH, stdin);  
    printf("str = ");  
    puts(str); Output:  
    return 0;  
}
```

Output:

str = My book

StringIO2.c

Note that `puts(str)` adds a newline automatically.

- By changing the last part of the string to null, the length is changed

RemoveVowels.c

```
#include <stdio.h>  
#include <string.h>  
#include <ctype.h>  
  
int main(void) {  
    int i, len, count = 0;  
    char str[101], newstr[101];  
  
    printf("Enter a string (at most 100 characters): ");  
    fgets(str, 101, stdin); // what happens if you use scanf() here?  
    len = strlen(str); // strlen() returns number of char in string  
    if (str[len - 1] == '\n')  
        str[len - 1] = '\0';  
    len = strlen(str); // check length again  
  
    for (i=0; i<len; i++) {  
        switch (toupper(str[i])) {  
            case 'A': case 'E':  
            case 'I': case 'O': case 'U': break;  
            default: newstr[count++] = str[i];  
        }  
    }  
    newstr[count] = '\0';  
    printf("New string: %s\n", newstr);  
    return 0;  
}
```

## 3.4 String Functions (1/2)

- C provides a library of string functions
  - Must include <string.h>
  - Here are a few commonly used string functions
- **strlen(s)**
  - Return the number of characters in s
- **strcmp(s1, s2)**
  - Compare the ASCII values of the corresponding characters in strings s1 and s2.
  - Return
    - a negative integer if s1 is lexicographically less than s2, or
    - a positive integer if s1 is lexicographically greater than s2, or
    - 0 if s1 and s2 are equal.
- **strncmp(s1, s2, n)**
  - Compare first n characters of s1 and s2.



## 3.4 String Functions (2/2)

- **strcpy(s1, s2)**
  - Copy the string pointed to by s2 into array pointed to by s1.
  - Function returns s1.
  - Example:

```
char name[10];      M a t t * h e w \0 ? ?  
strcpy(name, "Matthew");
```
- The following assignment statement does not work:

```
name = "Matthew";
```
- What happens when string to be copied is too long?

```
strcpy(name, "A very long name");
```



- Don't use strcpy
- Use strncpy(s1, s2, n) copies at most n characters

## 3.5 Importance of '\0' in a String (1/2)

- To be treated as a string, the array of characters must be terminated with the null character '\0'.
- Otherwise, string functions will not work properly on it.
- For instance, the printf("%s", str) statement will print until it encounters a null character in str.
- Likewise, strlen(str) will count the number of characters up to (but not including) the null character.
- In many cases, a string that is not properly terminated with '\0' will result in illegal access of memory.



## 3.5 Importance of '\0' in a String (2/2)

- What is the output of this code?

```
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[10];
    str[0] = 'a';
    str[1] = 'p';
    str[2] = 'p';
    str[3] = 'l';
    str[4] = 'e';

    printf("Length = %d\n", strlen(str));
    printf("str = %s\n", str);

    return 0;
}
```

WithoutNullChar.c

One possible output:

Length = 5\*

str = apple; Ø<

Compare the output if you add:  
str[5] = '\0';

or, you have:

char str[10] = "apple";

printf() will print  
%s from the  
starting  
address of str  
until it  
encounters the  
\0 character.



- STRUCTURES

- A "group" can be a member of another "group"

- Structure type
  - "typedef struct" is a keyword
  - Then have elements in curly braces

```
typedef struct {
    int length, width, height;
} box_t;
```

- ```
typedef struct {
    int acctNum;
    float balance;
} account_t;
```
- ```
typedef struct {
    int stuNum;
    float score;
    char grade;
} result_t;
```
- Creates a new data type ex. Account\_t
- A type is not a variable
- No memory is allocated to a type
- Struct is only definition of type

- Structure variables

```
typedef struct {
    int stuNum;
    float score;
    char grade;
} result_t;
result_t result1 = { 123321, 93.5, 'A' };
```

```
typedef struct {
    int day, month, year;
} date_t;
typedef struct {
    int cardNum;
    date_t expiryDate;
} card_t;
card_t card1 = { 888888, { 31, 12, 2020 } };
```

- Also can be done using dot operator

- ```

result_t result2;
result2.stuNum = 456654;
result2.score = 62.0;
result2.grade = 'D';

card_t card2 = { 666666, {30, 6} };

card2.expiryDate.year = 2021;

```

- Use dot notation to print out

## 4.6 Reading a Structure Member

- The structure members are read in individually the same way as we do ordinary variables
- Example:

- ```

result_t result1;

printf("Enter student number, score and grade: ");

scanf("%d %f %c", &result1.stuNum, &result1.score,
      &result1.grade);

```



- Must provide address because we want scan to be able to modify
- Assigning structures
  - Can use the dot operator to access indiv. Member of a structure variable
  - If structure variable's name is used, we are referring to the entire structure
  - Asssignments can be done to structures
  - Result2 = result1;



## 4.8 Returning Structure from Function (1/3)

- Example:
  - Given this structure type

```

typedef struct {
    int max;
    float ave;
} result_t;

```
- Define a function func() that returns a structure of this type:

```

result_t func( ... ) {
}

```
- To call this function:

```

result_t result;
result = func( ... );

```



## 4.8 Returning Structure from Function (2/3)

```
#include <stdio.h> StructureEg2.c
typedef struct {
    int max;
    float ave;
} result_t;
result_t max_and_average(int, int, int);

int main(void) {
    int num1, num2, num3;
    result_t result;
    printf("Enter 3 integers: ");
    scanf("%d %d %d", &num1, &num2, &num3);
    result = max_and_average(num1, num2, num3);

    printf("Maximum = %d\n", result.max);
    printf("Average = %.2f\n", result.ave);
    return 0;
}
...
```

```
// Computes the maximum and average of 3 integers
result_t max_and_average(int n1, int n2, int n3) {
    result_t result;

    result.max = n1;
    if (n2 > result.max)
        result.max = n2;
    if (n3 > result.max)
        result.max = n3;

    result.ave = (n1+n2+n3)/3.0;

    return result;
}
```

- POINTERS TO STRUCTURES
- When you pass a structure in a function it's the same as assigning the structure to the param
- Which means the entire structure is copied. The members of the actual param is copied into the corresponding members of the formal param
- Pass by value

## 4.9 Passing Structure to Function (2/2)

```
// include statements and definition
// of player_t are omitted here for brevity
void print_player(char [], player_t);

int main(void) {
    player_t player1 = { "Brusco", 23, 'M' }, player2;
    strcpy(player2.name, "July");
    player2.age = 21;
    player2.gender = 'F';
    print_player("player1", player1);
    print_player("player2", player2);
    return 0;
}

// Print player's information
void print_player(char header[], player_t player) {
    printf("%s: name = %s; age = %d; gender = %c\n", header,
           player.name, player.age, player.gender);
}
```

- Ex. NearbyStores.c
- Combining structures and array is good

```
// include statements, definition of player_t,
// and function prototypes are omitted here for brevity
int main(void) {
    player_t player1 = { "Brusco", 23, 'M' };
    change_name_and_age(player1);
    print_player("player1", player1);
    return 0;
}

// To change a player's name and age
void change_name_and_age(player_t player) {
    strcpy(player.name, "Alexandra");
    player.age = 25;
}

// Print player's information
void print_player(char header[], player_t player) {
    printf("%s: name = %s; age = %d; gender = %c\n", header,
           player.name, player.age, player.gender);
}
```

- Value of player1 isn't changed because pass by value.
- To allow the function to modify content of the original structure variable can be done but you need to pass in the address of it

```

// #include statements, definition of player_t,
// and function prototypes are omitted here for brevity
int main(void) {
    player_t player1 = { "Brusco", 23, 'M' };

    change_name_and_age(&player1);
    print_player("player1", player1);
    return 0;
}

// To change a player's name and age
void change_name_and_age(player_t *player_ptr) {
    strcpy((*player_ptr).name, "Alexandra");
    (*player_ptr).age = 25;
}

// Print player's information
void print_player(char header[], player_t player) {
    printf("%s: name = %s; age = %d; gender = %c\n", header,
           player.name, player.age, player.gender);
}

```

- Dot operator has higher precedence than \* dereference
- So you need the parenthesis
- The arrow operator (->)
  - Shortcut



# tut01qns

Thursday, August 24, 2023 10:16 PM



tut01qns

**CS2100 Computer Organization**  
**Tutorial 1: C and Number Systems**  
**(Week 3: 28 Aug – 1 Sep 2023)**

**Note: Friday's tutorial groups will have tutorial 1 on 25 August in lieu of 1 September. Refer to Canvas announcement for more information.**

1. In 2's complement representation, “sign extension” is used when we want to represent an  $n$ -bit signed integer as an  $m$ -bit signed integer, where  $m > n$ . We do this by copying the MSB (most significant bit) of the  $n$ -bit number  $m - n$  times to the left of the  $n$ -bit number to create the  $m$ -bit number.

For example, we want to sign-extend 0b0110 to an 8-bit number. Here  $n = 4$ ,  $m = 8$ , and thus we copy the MSB bit 0 four ( $8 - 4$ ) times, giving 0b00000110.

Similarly, if we want to sign-extend 0b1010 to an 8-bit number, we would get 0b1111010.

Show that IN GENERAL, sign extension is value-preserving. For example, 0b00000110 = 0b0110 and 0b1111010 = 0b1010.

2. We generalize  $(r - 1)$ 's-complement (also called *radix diminished complement*) to include fraction as follows:

$$(r - 1)\text{'s complement of } N = r^n - r^m - N$$

where  $n$  is the number of integer digits and  $m$  the number of fractional digits. (If there are no fractional digits, then  $m = 0$  and the formula becomes  $r^n - 1 - N$  as given in class.)

For example, the 1's complement of 011.01 is  $(2^3 - 2^{-2}) - 011.01 = (1000 - 0.01) - 011.01 = 111.11 - 011.01 = 100.10$ . (Since 011.01 represents the decimal value 3.25 in 1's complement, this means that -3.25 is represented as 100.10 in 1's complement.)

Perform the following binary subtractions of values represented in 1's complement representation by using addition instead. (Note: Recall that when dealing with complement representations, the two operands must have the same number of digits.)

- (a) 0101.11 – 010.0101
- (b) 010111.101 – 0111010.11

Is sign extension used in your working? If so, highlight it.

Check your answers by converting the operands and answers to their actual decimal values.

3. Convert the following numbers to fixed-point binary in 2's complement, with 4 bits for the integer portion and 3 bits for the fraction portion.

- (a) 1.75
- (b) -2.5
- (c) 3.876
- (d) 2.1

Using the binary representations you have derived, convert them back into decimal. Comment on the compromise between range and accuracy of the fixed-point binary system.

4. [AY2010/2011 Semester 2 Term Test #1]

How would you represent the decimal value  $-0.078125$  in the IEEE 754 single-precision representation? Express your answer in hexadecimal. Show your working.

5. Given the partial C program shown below, complete the two functions: **readArray()** to read data into an integer array (with at most 10 elements) and **reverseArray()** to reverse the array. For **reverseArray()**, you are to provide two versions: an iterative version and a recursive version. For the recursive version, you may write an auxiliary/driver function to call the recursive function.

```
#include <stdio.h>
#define MAX 10

int readArray(int [], int);
void printArray(int [], int);
void reverseArray(int [], int);

int main(void) {
    int array[MAX], numElements;

    numElements = readArray(array, MAX);
    reverseArray(array, numElements);
    printArray(array, numElements);

    return 0;
}

int readArray(int arr[], int limit) {

    // ...
    printf("Enter up to %d integers, terminating with a negative
integer.\n", limit);
    // ...
}

void reverseArray(int arr[], int size) {

    // ...
}

void printArray(int arr[], int size) {
    int i;

    for (i=0; i<size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}
```

6. Trace the following program manually (do not run it on a computer) and write out its output.  
When you present your solution, draw diagrams to explain.

```
#include <stdio.h>

int main(void) {
    int a = 3, *b, c, *d, e, *f;

    b = &a;
    *b = 5;
    c = *b * 3;
    d = b;
    e = *b + c;
    *d = c + e;
    f = &e;
    a = *f + *b;
    *f = *d - *b;

    printf("a = %d, c = %d, e = %d\n", a, c, e);
    printf("*b = %d, *d = %d, *f = %d\n", *b, *d, *f);

    return 0;
}
```

Remember to post on the Canvas forum or QnA if you have any queries.

# Week 3

Friday, September 1, 2023 11:01 PM

MIPS - microprocessor without interlocked pipelined stages

- Instruction Set Architecture
  - Abstraction on the interface between hardware and low level software
  - ASSEMBLER converts from Assembly to Machine Code
- Components
  - Processor - performs computations
  - Memory - store code and data



- Bus is the wires and grants that connect
- Memory is actually much slower than processor. So most of the time would have been taken for processor waiting for values from memory
- So to avoid frequent access of memory,
  - We provide temporary storage for values in the processor known as REGISTERS
  - Registers work at speed of processor
- We need instructions to move data into registers - instruction called LOAD



- Arithmetic ops can work directly on registers only

## 3. Walkthrough: Control Flow (10/15)

- We need instructions to change the control flow based on condition:
- Repetition (loop) and Selection (if-else) can both be supported



- Both instruction and data are stored in memory
- Load store model
  - Limits memory ops and relies on register for storage during execution
  - Major types of assembly instruction
    - Memory - move vals between mem and register
    - Calculation - arithmetic other ops
    - Control flow - change the sequential execution

- General Purpose Registers
  - Fast memories in processor
    - Data is transferred from mem to register for faster processing
  - Limited
    - Typically 16-32 registers
    - Compiler associates var in program with registers
  - Registers don't have data type
    - Machine/assemble instruction assumes data stored in register is of correct type
  - Add - 2's/ addu - binary / fadd - floating
- MIPS has 32 registers (\$0, \$1 ... \$31)

#### 4. General Purpose Registers (2/2)

- There are 32 registers in MIPS assembly language:

↳ Can be referred by a number (\$0, \$1, ..., \$31) OR

↳ Referred by a name (eg: \$a0, \$t1)

| Name      | Register number | Usage                                       |
|-----------|-----------------|---------------------------------------------|
| \$zero    | 0               | Constant value 0                            |
| \$v0-\$v1 | 2-5             | Values for results and condition evaluation |
| \$a0-\$a3 | 4-7             | Arguments                                   |
| \$t0-\$t3 | 8-11            | Temporaries                                 |
| \$s0-\$s2 | 12-23           | Program variables                           |
| \$t4-\$t9 | 24-25           | More temporaries                            |
| \$gp      | 26              | Global pointer                              |
| \$sp      | 29              | Stack pointer                               |
| \$fp      | 30              | Frame pointer                               |
| \$ra      | 31              | Return address                              |

Sal (register 1) is reserved for the assembler.

\$k0-\$k1 (registers 26-27) are reserved for the operating system.

- MIPS assembly language

- Each instruction executes a simple cmd
- Each ln of assemble code contains at most 1 instruction
- # is a comment

##### 5.1 General Instruction Syntax



Naturally, most of the MIPS arithmetic/logic operations have three operands: 2 sources and 1 destination

- Variable mapping - mapping the values to registers in mips processor

- 5.2 Arithmetic Operation: Addition

| C Statement             | MIPS Assembly Code                |
|-------------------------|-----------------------------------|
| <code>a = b + c;</code> | <code>add \$s0, \$s1, \$s2</code> |

We assume the values of "a", "b" and "c" are loaded into registers \$s0, \$s1 and \$s2.

Known as **variable mapping**

Actual code to perform the reading will be shown later in **memory**.

**Important concept:**

MIPS arithmetic operations are mainly **register-to-register**

**NOTE:**  
The first two register mappings deals with step 1) i.e. load and step 2) i.e. store. All compilers are assumed to be in register for the set of instructions. We will talk about conditions in the next lecture.

##### 5.3 Arithmetic Operation: Subtraction

| C Statement             | MIPS Assembly Code                |
|-------------------------|-----------------------------------|
| <code>a = b - c;</code> | <code>sub \$s0, \$s1, \$s2</code> |

- Positions of \$s1 and \$s2 (i.e. source1 and source2) are important for subtraction

**NOTE:**  
`sub $s0, $s1, $s2`  
is basically  
`$s0 = $s1 - $s2`

##### 5.4 Complex Expression (1/3)

| C Statement                 | MIPS Assembly Code                                                                           |
|-----------------------------|----------------------------------------------------------------------------------------------|
| <code>a = b + c - d;</code> | <code>??? ??? ???</code><br><code>add \$s0, \$s1, \$s2</code><br><code>sub \$s0, \$s3</code> |

- A single MIPS instruction can handle at most two source operands.

→ Need to break a complex statement into multiple MIPS instructions

| MIPS Assembly Code                                                                                   | Use temporary registers \$t0 to \$t7 for intermediate results |
|------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| <code>add \$t0, \$s1, \$s2 # tmp0 = b + c</code><br><code>sub \$s0, \$t0, \$s3 # a = tmp0 - d</code> |                                                               |

##### 5.4 Complex Expression: Example (2/3)

| C Statement                         | Variable Mappings                                                                                                                                                      |
|-------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>f = (g + h) - (i + j);</code> | <code>\$s0 → variable f</code><br><code>\$s1 → variable g</code><br><code>\$s2 → variable h</code><br><code>\$s3 → variable i</code><br><code>\$s4 → variable j</code> |
| <code>t0 = g + h;</code>            | <code>\$s1 → variable g</code><br><code>\$s2 → variable h</code><br><code>\$t0 → intermediate t0</code>                                                                |
| <code>i1 = i + j;</code>            | <code>\$s3 → variable i</code><br><code>\$s4 → variable j</code><br><code>\$t1 → intermediate t1</code>                                                                |
| <code>f = t0 - t1;</code>           | <code>\$s0 → variable f</code><br><code>\$t0 → intermediate t0</code><br><code>\$t1 → intermediate t1</code>                                                           |

- Break it up into multiple instructions.

→ Use two temporary registers \$t0, \$t1

`add $t0, $s1, $s2 # tmp0 = g + h`  
`add $t1, $s3, $s4 # tmp1 = i + j`  
`sub $s0, $t0, $t1 # f = tmp0 - tmp1`

##### 5.4 Complex Expression: Exercise (3/3)

| C Statement                     | Variable Mappings                                                                                                                                                      |
|---------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>x = a + b + c + d;</code> | <code>\$s1 → variable a</code><br><code>\$s2 → variable b</code><br><code>\$s3 → variable c</code><br><code>\$s4 → variable d</code><br><code>\$s5 → variable x</code> |

##### 5.5 Constant/Immediate Operands

| C Statement             | MIPS Assembly Code              |
|-------------------------|---------------------------------|
| <code>a = a + 4;</code> | <code>addi \$s0, \$s0, 4</code> |

Immediate values are numerical constants

↳ Frequently used in operations

↳ MIPS supplies a set of operations specially for them

↳ "Add immediate" (`addi`)

↳ Syntax is similar to `add` instruction; but source2 is a constant instead of register

↳ The constant ranges from  $2^{31}$  to  $2^{31}-1$

Can you guess what number system is used?

Answer: 16-bit 2's complement number system

- There is no `subi`. Just use `addi` with negative constant

- Register Zero \$0

### 5.6 Register Zero (\$0 or \$zero)

- The number zero (0), appears very often in code.

Provide register zero (\$0 or \$zero) which always have the value 0

| C Statement | MIPS Assembly Code                                                                                                   |
|-------------|----------------------------------------------------------------------------------------------------------------------|
| $f = g;$    | <code>add \$f, \$g, \$zero</code>                                                                                    |
| $f = g + 0$ | $\begin{array}{l} \$g \rightarrow \text{variable } f \\ \$0 \rightarrow \text{variable } f \\ f = g + 0 \end{array}$ |

- The above assignment is so common that MIPS has an equivalent pseudo-instruction (`move`):

| MIPS Assembly Code           | Pseudo-Instruction                                                                                                       |
|------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| <code>move \$s0, \$s1</code> | "False" instruction that gets translated to corresponding MIPS instructions.<br>Provided for convenience in coding only. |

## LOGICAL OPERATIONS

- Arithmetic instructions view content of register as single quantity
- We can view register as 32 raw bits rather than 32 bit number

| Logical operation | C operator              | Java operator                       | MIPS instruction       |
|-------------------|-------------------------|-------------------------------------|------------------------|
| Shift Left        | <code>&lt;&lt;</code>   | <code>&lt;&lt;</code>               | <code>sll</code>       |
| Shift right       | <code>&gt;&gt;**</code> | <code>&gt;&gt;, &gt;&gt;&gt;</code> | <code>srl</code>       |
| Bitwise AND       | <code>&amp;</code>      | <code>&amp;</code>                  | <code>and, andi</code> |
| Bitwise OR        | <code> </code>          | <code> </code>                      | <code>or, ori</code>   |
| Bitwise NOT*      | <code>~</code>          | <code>~</code>                      | <code>nor</code>       |
| Bitwise XOR       | <code>^</code>          | <code>^</code>                      | <code>xor</code>       |

### 5.7 Logical Operations: Overview (2/2)

- Truth tables of logical operations

\* 0 represents false; 1 represents true

|                 |              |                     |
|-----------------|--------------|---------------------|
| <b>AND</b>      | <b>OR</b>    | <b>XOR</b>          |
|                 |              |                     |
| <b>NAND (a)</b> | <b>NOT a</b> | <b>Exclusive OR</b> |
|                 |              |                     |

### 5.8 Logical Operations: Shifting (1/2)

**Opcode:** →`sll` (shift left logical)

Move all the bits in a word to the left by a number of positions; fill the emptied positions with zeroes.

- E.g. Shift bits in \$s0 to the left by 4 positions

|  |                                                       |  |
|--|-------------------------------------------------------|--|
|  | <code>sll \$t2, \$s0, 4 # \$t2 = \$s0&lt;&lt;4</code> |  |
|--|-------------------------------------------------------|--|

**NOTE:**  
The emptied positions are filled with 0s

- Max shifts is 31
- Shifting left is same as multiply by  $2^n$
- Shifting right is same as divide by  $2^n$
- Shifting ls much faster than multiplication/division

### 5.9 Logical Operations: Bitwise AND

**Opcode:** `and` (bitwise AND)

Bitwise operation that leaves a 1 only if both the bits of the operands are 1

- E.g.: `and $t0, $t1, $t2`

|  |  |
|--|--|
|  |  |
|--|--|

- `and` can be used for masking operation:

- Place 0s into the positions to be ignored → bits will turn into 0s
- Place 1s for interested positions → bits will remain the same as the original.

**NOTE:**  
Bit-mask is setting the irrelevant part to 0 (i.e., masked).

### 5.9 Exercise: Bitwise AND

- We are interested in the last 12 bits of the word in register \$t1. Result to be stored in \$t0.
- Q: What's the mask to use?

|  |  |
|--|--|
|  |  |
|--|--|

If you want specific bits of a word, use a AND mask with 1s in the places u want

## 5.9 Exercise: Bitwise AND

- We are interested in the last 12 bits of the word in register \$t1. Result to be stored in \$t0.
  - Q: What's the mask to use?



**Notes:**  
The **and** instruction has an immediate version, **andi**

If you want specific bits of a word, use a AND mask with 1s in the places u want

## 5.10 Logical Operations: Bitwise OR

**Opcode:** or (bitwise OR)

Bitwise operation that places a 1 in the result if either operand bit is 1

**Example:** or \$t0, \$t1, \$t2 32-bit number

- The **or** instruction has an immediate version **ori**
- Can be used to force certain bits to 1s
- E.g.: ori \$t0, \$t1, 0xFFFF



- NOR operation can be used to get NOT instruction

## 5.11 Logical Operations: Bitwise NOR

- Strange fact 1:

- There is no **NOT** instruction in MIPS to toggle the bits (1 → 0, 0 → 1)
- However, a **NOR** instruction is provided:

**Opcode:** nor (bitwise NOR)

**Example:** nor \$t0, \$t1, \$t2

|   |   | NOR(b)  |                    |
|---|---|---------|--------------------|
| a | b | a NOR b | a <sub>NOT</sub> b |
| 0 | 0 | 1       | 1                  |
| 0 | 1 | 0       | 0                  |
| 1 | 0 | 0       | 1                  |
| 1 | 1 | 0       | 0                  |

- Question: How do we get a NOT operation?

- Question: Why do you think is the reason for not providing a NOT instruction?

- Nor \$t0, \$t0, \$zero. Do Nor op with zeros to get NOT
- No NOT op because want to keep instruction set small

- Can use XOR op to get NOT
  - By pairing with all 1s

## 5.12 Logical Operations: Bitwise XOR

**Opcode:** xor (bitwise XOR)

**Example:** xor \$t0, \$t1, \$t2

- Question: Can we also get **NOT** operation from XOR?
  - Yes, let \$t2 contain all 1s.
- Strange Fact 2:
  - There is no **NORI**, but there is **XORI** in MIPS
  - Why? NOTE:  
A possible reason is that there is not much need for **NORI**. So there is no reason to add this capability to keep the processor design simple.

To get not, can also do xor op with 1s

- Addi is for 16-bit 2s comple
- So for large constants? 32 bit constant?
- Use "load upper immediate" (lui) to set upper 16 bit
- Use "or immediate" (ori) to set lower order bits

## 6. Large Constant: Case Study

- Question: How to load a 32-bit constant into a register? e.g. 10101010 10101010 11110000 11110000

- Use "load upper immediate" (lui) to set the upper 16-bit:

```
lui $t0, 0xAAAA      #1010101010101010
1010101010101010 0000000000000000
```

Lower-order bits filled with zeros.

- Use "or immediate" (ori) to set the lower-order bits:

```
ori $t0, $t0, 0xF0F0 #1111000011110000
Higher-order bits filled with zeros. 1010101010101010 0000000000000000
ori 0000000000000000 1111000011110000
```

### Memory (RAM) Organization

- Can be viewed as a single dimension arr of memory locations
  - Each location of memory has address which is index into arr

- Given k-bit address, address space is of size  $2^k$



- The memory map above contains 1 byte in every location - this is byte addressing

- Memory -- transfer unit
  - With distinct memory address, can access
    - Single byte
    - Or single word (word addressable)
  - Word is
    - $2^n$  bytes
    - Common unit of transfer between processor and memory
    - Commonly coincide with register size, integer and instruction size

### 1.2 Memory: Word Alignment

#### Word alignment:

- Words are aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word.

Example: If a word consists of 4 bytes, then:



- To quickly check if a given memory address is word aligned? Do address/4 - which is word size
- MIPS memory instructions
  - MIPS is load-store register archi
  - 32 register - each 32 bit long
  - Each word contains 32 bits
  - Memory addresses are 32 bit long

| Name                                                                                                              | Examples                                                                         | Comments                                                                                                                                                                                                                    |
|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 registers                                                                                                      | \$s0-\$s7, \$t0-\$t9, \$zero, \$a0-\$a3, \$v0-\$v1, \$gp, \$fp, \$sp, \$ra, \$at | Fast processor storage for data.<br>In MIPS, data must be in registers to perform arithmetic.                                                                                                                               |
| 2 <sup>32</sup> memory words<br> | Mem[0], Mem[4], ..., Mem[4294967292]                                             | Accessed only by data transfer instructions.<br>MIPS uses byte addresses, so consecutive words differ by 4.<br>Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls. |

## 2.1 Memory Instruction: Load Word

- Example: `lw $t0, 4($s0)`



- Steps:

- Memory Address =  $\$s0 + 4 = 8000 + 4 = 8004$
- Memory word at  $\text{Mem}[8004]$  is loaded into  $\$t0$

## 2.2 Memory Instruction: Store Word

- Example: `sw $t0, 12($s0)`



- Steps:

- Memory Address =  $\$s0 + 12 = 8000 + 12 = 8012$
  - Content of  $\$t0$  is stored into word at  $\text{Mem}[8012]$
- Memory at address 8012*

- Only load and store instructions can access data in memory

## 2.3 Load and Store Instructions

- Only **load** and **store** instructions can access data in memory.

- Example: Each array element occupies a word.



- Handwritten notes:*
- Each array element occupies a word (4 bytes).
  - \$s3 contains the **base address** (address of first element, A[0]) of array A. Variable **h** is mapped to \$s2.
  - Remember arithmetic operands (for **add**) are registers, not memory!

- Load/store word would load 4 bytes. Load/store byte would get 1 byte

## 2.4 Memory Instructions: Others (1/2)

- Other than load word (**lw**) and store word (**sw**), there are other variants, example:
  - load byte (**lb**)
  - store byte (**sb**)
- Similar in format:
  - $\rightarrow lb \ $t1, 12(\$s3) \rightarrow 1\text{ byte}$
  - $\rightarrow sb \ \$t2, 13(\$s3) \rightarrow 1\text{ byte}$
- Similar in working except that one byte, instead of one word, is loaded or stored
  - Note that the offset no longer needs to be a multiple of 4

| Address | Content |
|---------|---------|
| 0       | 8 bits  |
| 1       | 8 bits  |
| 2       | 8 bits  |
| 3       | 8 bits  |
| 4       | 8 bits  |
| 5       | 8 bits  |
| 6       | 8 bits  |
| 7       | 8 bits  |
| 8       | 8 bits  |
| 9       | 8 bits  |
| 10      | 8 bits  |
| 11      | 8 bits  |
| :       |         |

## 2.4 Memory Instructions: Others (2/2)

- Align word*  
*unaligned*
- MIPS disallows loading/storing unaligned word using **lw**/**sw**.
  - Pseudo-instructions **unaligned load word (ulw)** and **unaligned store word (usw)** are provided for this purpose.
  - Other memory instructions:
    - lh** and **sh**: load halfword and store halfword
    - lwl**, **lwr**, **swl**, **swr**: load word left / right, store word left / right.
    - Etc...
- NOTE:** **ulw/usw** can be translated to sequence of **lb/lb** + other operations

## 2.5 Example: Array (assume 4 bytes per element)



- Registers don't have types
- Register can hold any 32 bit number
  - The number has no implicit data type and its interpreted according to the instruction that uses it

## SWAPPING ELEMENTS

### 2.7 Example: Swapping Elements



### 3. Making Decisions (2/2)

- Decision-making instructions
  - Alter the control flow of the program
  - Change the next instruction to be executed
- Two types of decision-making statements in MIPS
  - Conditional (branch)**
    - `bne $t0, $t1, label`
    - `beq $t0, $t1, label`
  - Unconditional (jump)**
    - `j label`
- A label is an "anchor" in the assembly code to indicate point of interest, usually as branch target.
  - Labels are NOT instructions!
  - NOT variable → No memory.

if (condition)  
{     }①  
else  
    }②

NOTE:  
Selection: branch/jump down  
Repetition: branch/jump up  
In both cases, it's a "goto" operation

$t_1 = t_1$   
↳ Next inst after  
 $t_0 \neq t_1$   
↳ branch to  
label.

### 3.1 Conditional Branch: `beq` and `bne`

- Processor follows the branch only when the condition is satisfied (true)
- beq \$r1, \$r2, L1**
  - Go to statement labeled L1 if the value in register \$r1 equals the value in register \$r2
  - `beq` is "branch if equal"
  - C code: `if (a == b) goto L1`
- bne \$r1, \$r2, L1**
  - Go to statement labeled L1 if the value in register \$r1 does not equal the value in register \$r2
  - `bne` is "branch if not equal"
  - C code: `if (a != b) goto L1`

### 3.2 Unconditional Jump (`j`)

- Processor **always** follows the branch
- j L1**
  - Jump to label L1 unconditionally
  - C code: `goto L1`
  - Technically equivalent to such statement
  - `beq $s0, $s0, L1`

→ beq \$s3, \$s4, L1  
→ j Exit  
L1: add \$s0, \$s1, \$s2  
→ Exit: —

**NOTE:**  
The general technique of compiling from C to MIPS with selection/repetition is to first remove the construct and replace it with either one of these:

→ `if (x == y) goto l; // this is beq`  
→ `if (x != y) goto l; // this is bne`

Some tricks can be used (e.g., inversion) to generate fewer lines of codes.

IF statements

### 3.3 IF statement (1/2)



Bne is more efficient here

### 3.3 IF statement (2/2)

| C Statement to translate                                            | Variables Mapping                                                                                                         |
|---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| <pre>if (i == j)   f = g + h; ] if else   [ f = g - h; ] else</pre> | $f \rightarrow \$s_0$<br>$g \rightarrow \$s_1$<br>$h \rightarrow \$s_2$<br>$i \rightarrow \$s_3$<br>$j \rightarrow \$s_4$ |

4 instr

```
bne \$s_3, \$s_4, Else
add \$s_0, \$s_1, \$s_2 ] if
j EXIT
Else: [ sub \$s_0, \$s_1, \$s_2 ] else
Exit:
```

Both cases are similar - cuz using 4 instruction

### 3.4 Exercise #1: IF statement

| MIPS code to translate into C                                  | Variables Mapping                                                       |
|----------------------------------------------------------------|-------------------------------------------------------------------------|
| <pre>beg \$s1, \$s2, Exit add \$s0, \$zero, \$zero Exit:</pre> | $f \rightarrow \$s_0$<br>$i \rightarrow \$s_1$<br>$j \rightarrow \$s_2$ |

- What is the corresponding high-level statement?

```
if (i != j) {
  f = 0;
}
```

NOTE:

Remember the inversion? Also works in "reverse" (or decompilation) process. `beq` becomes `if (i != j)`.

| C Statement to translate                                                         | Variables Mapping                                                       |
|----------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| <pre>Loop: if (j != k)       goto Exit;       [ i = i+1 ]       goto Loop;</pre> | $i \rightarrow \$s_3$<br>$j \rightarrow \$s_4$<br>$k \rightarrow \$s_5$ |

NOTE:  
This shows the process clearly:  
1. Convert from while to if...goto  
2. Convert from there to MIPS

\* What is the corresponding MIPS code?

```
Loop: bne \$s_4, \$s_5, Exit
      addi \$s_3, \$s_3, 1
      j Loop
      # if (j!=k) Exit
      # repeat loop
Exit:
```

ISA

### 4.1 Exercise #2: FOR loop

- Write the following loop statement in MIPS

| C Statement to translate                        | Variables Mapping                              |
|-------------------------------------------------|------------------------------------------------|
| <pre>for (i=0; i&lt;10; i++)   a = a + 5;</pre> | $i \rightarrow \$s_0$<br>$a \rightarrow \$s_2$ |

NOTE:  
Alternatively, if you know how to complete while-loop, then you can translate the for-loop into:

```
init: li $s_0, 0
      addi $s_1, $s_0, 10
      addi $s_2, $s_0, 5
      lsl $s_0, $s_0, 1
      j Loop
      # for loop
      # loop body
      # end of loop
      # exit
```

## 4.2 Inequalities (1/2)

$i=j$      $i \neq j$

- We have beg and bne, what about branch-if-less-than?
  - There is no real blt instruction in MIPS
- Use slt (set on less than) or slti.



NOTE:  
If we use the "ternary operator" discussed in tutorial, then:

$\$t0 = (\$s1 < \$s2) ? 1 : 0;$

## 4.2 Inequalities (1/2)

$i=j$      $i \neq j$

- We have beg and bne, what about branch-if-less-than?
  - There is no real blt instruction in MIPS
- Use slt (set on less than) or slti.



NOTE:  
If we use the "ternary operator" discussed in tutorial, then:

$\$t0 = (\$s1 < \$s2) ? 1 : 0;$

## Array and Loop

- To access arr elements in a loop
  - Initialization for result variable, loop counter, and array pointer
  - Label:
    - Work by
      - Calculating address
      - Load data
      - Perform task
    - Update loop counter and arr pointers
    - Compare and branch
- Count number of zeros in 40 elem word arr

5. Array and Loop: Version 1.0 (using index)

```
Address of A[0] → $t0
Result → $t1
# t1 < 40
addi $t2, $zero, 0
addi $t3, $zero, 40 # end point
loop: li $t4, 4
    addi $t5, $t2, $t4
    addi $t6, $t3, $t4
    lw $t7, 0($t5)
    beq $t7, $zero, skip
    addi $t8, $t5, 1
    addi $t9, $t6, 1
    addi $t10, $t1, 1
skip: addi $t5, $t5, 4
    addi $t6, $t6, 4
    addi $t10, $t10, 1
    b loop
end:
```

5. Array and Loop: Version 2.0 (using pointer)

```
Address of A[0] → $t0
Result → $t1
# A[0] → $t0
# t0 < 40
# Initialize the code using pointer with:
# needs = 0;
# pre = A[40];
# while t0 >= 0 and needs < 40:
#     if pre == 0:
#         needs++;
#     pre = pre + 4;
# needs -> t1;
# t1 -> Result;
# The resulting MIPS looks like the code
# on the left.
# Use of "pointers" can produce more efficient code!
```

# Week 4

Tuesday, September 5, 2023 9:24 PM

- Each mips instruction has fixed length of 32 bits
- MIPS instruction classification: R, I, J
  - Classified according to operands
  - **R-format** register format: op \$r1 \$r2 \$r3
    - Instructions which use 2 source registers and 1 destination register
      - Add, sub, and, or , nor, slt, etc
      - Srl, sll
  - **I format** - intermediate format: op \$r1, 4r2, Immd
    - Instructions which use 1 source register, 1 immediate value ad 1 destination register
      - Addi, andi, ori, slti, lw, sw, beq, bne
  - **J-format** (jump format: op Immd
    - Uses only 1 immediate val

\* For simplicity, register numbers (\$0, \$1, ..., \$31) will be used in examples here instead of register names

| Name      | Register number | Usage                                        |
|-----------|-----------------|----------------------------------------------|
| \$zero    | 0               | Constant value 0                             |
| \$v0-\$v1 | 2-3             | Values for results and expression evaluation |
| \$a0-\$a3 | 4-7             | Arguments                                    |
| \$t0-\$t7 | 8-15            | Temporaries                                  |
| \$s0-\$s7 | 16-23           | Program variables                            |

Sat (register 1) is reserved for the assembler  
\$k0-\$k1 (registers 26-27) are reserved for the operation system.

## R- FORMAT



5. R-Format (2/2) → add \$11, \$12, \$10

| Fields                    | Meaning |
|---------------------------|---------|
| opcode                    | = 0     |
| funct                     | =       |
| rs (Source Register)      | s11     |
| rt (Target Register)      | s12     |
| rd (Destination Register) | d10     |
| shamt                     | =       |

- R- format opcode is always 0

### 5.1 R-Format: Example (1/3)

| MIPS instruction     | NOTE:                                                                                                     |
|----------------------|-----------------------------------------------------------------------------------------------------------|
| → add \$8, \$9, \$10 | Find the value of <b>opcode</b> and <b>funct</b> from the MIPS green sheet that contains all information. |

| R-Format Fields | Value | Remarks                   |
|-----------------|-------|---------------------------|
| opcode          | 0     | (textbook pg 94 - 101)    |
| funct           | 32    | (textbook pg 94 - 101)    |
| rd              | 8     | (destination register)    |
| rs              | 9     | (first operand)           |
| rt              | 10    | (second operand)          |
| shamt           | 0     | (not a shift instruction) |

### 5.1 R-Format: Example (2/3)



### 5.1 R-Format: Example (3/3)



### 5. R-Format: Summary



### I Format Instruction

- Instructions with immediate vals
- 5 bit shamt field can only represent 0 to 31
- Immediates can be much larger

### 6. I-format (2/4)

- Define fields with the following number of bits each:
  - 6 + 5 + 5 + 16 = 32 bits



### 6. I-format (3/4)

- opcode**
  - Since there is no funct field, opcode uniquely specifies an instruction
- rs**
  - specifies the source register operand (if any)
- rt**
  - specifies register to receive result
  - note the difference from R-format instructions
- Continue on next slide.....

## 6. I-format (4/4)

- immediate:

- Treated as a signed integer
  - Except for bitwise operations (andi, ori, xor)
- 16 bits → can be used to represent a constant up to  $2^{16}$  different values
- Large enough to handle:
  - The offset in a typical lw or sw
  - Most of the values used in the addi, slli instructions

**NOTE:**

This should answer why we cannot put 32-bit constants in the instruction. Because the instruction itself is only 32-bit wide. So we can only use parts of it. Plus, we still need to encode opcode, rs and rt.



offset      lw + sd      16-bits

### 6.1 I-format: Example (2/2)

MIPS instruction

addi \$21, \$22, -50

Field representation in decimal:

|   |    |    |     |
|---|----|----|-----|
| 8 | 22 | 21 | -50 |
|---|----|----|-----|

Field representation in binary:

|        |       |       |                  |
|--------|-------|-------|------------------|
| 001000 | 10110 | 10101 | 1111111111001110 |
|--------|-------|-------|------------------|

2's comp binary

50 → 000...110010

2's comp → 111...001110

16-bits

Hexadecimal representation of instruction:

2 2 D 5 F F C E<sub>16</sub>

- Need to add preceding zeros

### 6.2 Try It Yourself #2

MIPS instruction

lw \$9, 12(\$8)

Field representation in decimal:

|        |    |    |     |
|--------|----|----|-----|
| opcode | rs | rt | imm |
| 35     | 8  | 9  | 12  |

000...1100

Field representation in binary:

|        |       |       |                  |
|--------|-------|-------|------------------|
| 100011 | 01000 | 01001 | 0000000000001100 |
|--------|-------|-------|------------------|

Hexadecimal representation of instruction:

8 D 0 9 0 0 0 C<sub>16</sub>

- Instructions are 32 bit long and stored in memory
- Word aligned
- PROGRAM COUNTER (PC)
  - A special register that keeps address of next instruction being executed in processor

## 6.4 Branch: PC-Relative Addressing (1/5)

- Use I-Format



- opcode specifies beq, bne

- rs and rt specify registers to compare
- immediate specifies the address of next instr.

- What can immediate specify?

- Immediate is only 16 bits
- Memory address is 32 bits
- → Immediate is not enough to specify the entire target address!

## 6.4 Branch: PC-Relative Addressing (2/5)

- How do we usually use branches?
    - Answer: if, else, while, for (bne, beq)
    - Loops are generally small:
    - Typically up to 50 instructions
  - Unconditional jumps are done using jump instructions (j), not the branches
  - Conclusion: A branch often changes PC by a small amount
- (large jump)*
- 

## 6.4 Branch: PC-Relative Addressing (3/5)

- Solution:**
  - Specify target address relative to the PC
  - Target address is generated as:  $PC = PC + \text{immed}$
  - PC + the 16-bit immediate field
  - The immediate field is a signed two's complement integer
- Can branch to  $\pm 2^{15}$  bytes from the PC:
  - Should be enough to cover most loops



## 6.4 Branch: PC-Relative Addressing (5/5)

- Branch calculation:
  - If the branch is not taken:  
 $PC = PC + 4$   
 ( $PC + 4$  is address of next instruction)
  - If the branch is taken:  
 $PC = (PC + 4) + (\text{immediate} \times 4)$
- Observations:
  - immediate field specifies the number of words to jump, which is the same as the number of instructions to "skip over"
  - immediate field can be positive or negative
  - Due to hardware design, add immediate to  $(PC+4)$ , not to PC (more in later topic)

## 6.5 Branch: Example (1/3)

```

Loop: beq $9, $0, End # rlt addr: 0
      add $8, $8, $10 # rlt addr: 4
      addi $9, $9, -1 # rlt addr: 8
      j Loop          # rlt addr: 12
End:   _____          # rlt addr: 16

```

- **beq** is an I-Format instruction →

| I-Format Fields | Value | Remarks          |
|-----------------|-------|------------------|
| opcode          | 4     |                  |
| rs              | 9     | (first operand)  |
| rt              | 0     | (second operand) |
| immediate       | ???   | (in base 10)     |

# instruction ?

- Immediate is # of instruction to skip
- # of instructions get counted from the instruction following the branch

## 6.5 Branch: Example (3/3)

```

Loop: beq $9, $0, End # rlt addr: 0
      add $8, $8, $10 # rlt addr: 4
      addi $9, $9, -1 # rlt addr: 8
      j Loop          # rlt addr: 12
End:   _____          # rlt addr: 16

```

- Field representation in decimal:

|        |    |    |           |
|--------|----|----|-----------|
| opcode | rs | rt | immediate |
| 4      | 9  | 0  | 3         |

To bin

Field representation in binary:

|        |       |       |                   |
|--------|-------|-------|-------------------|
| 000100 | 01001 | 00000 | 00000000000000011 |
|--------|-------|-------|-------------------|

To hex

↔

## 6.6 Try It Yourself #3

```

Loop: beq $9, $0, End # rlt addr: 0
      add $8, $8, $10 # rlt addr: 4
      addi $9, $9, -1 # rlt addr: 8
      beq $0, $0, Loop # rlt addr: 12
End:   _____          # rlt addr: 16

```

- What would be the immediate value for the second **beq** instruction? ↴?

## 6. I-Format: Summary



NOTE:  
please note the position of rs and rt here.  
The first register is NOT rt but is rs instead!

## 6. I-Format: Summary



- For branches we used PC relative addressing but the max we can jump is 32768 ( $2^{15}$ ). This is ok for branches since we don't need to branch too far
- But for general jumps, we can go anywhere in memory
- We however, cannot write whole address. Because we need to use 6 bits for opcode. Then the remaining 26 bits

## 7. J-Format (2/5) j label

- Define fields of the following number of bits each:



- As usual, each field has a name:



- Keep opcode field identical to R-format and I-format for consistency
- Combine all other fields to make room for larger target address

- Since we can only specify 26 bits of a 32 bit address
- Like branches, jumps only jump to word-aligned addresses so last 2 bits are always 00 (multiples of 4)
- So we assume address ends with 00 and leave it out

- So we can specify 28 bits of 32 bit address
- Then for remaining 4 bits, we use the 4 most significant bits of next address
- Can jump  $2^{26}$  instructions
- Maximum jump range is 256 mb boundary ( $2^{28}$  bytes)
- If program straddles 256mb boundary, there is jr instruction. But out of syllabus

### 7. J-Format (5/5)



- Effective address

### 7.1 J-Format: Example



- To get the actual/effective address (32 bits), take target address(26 bits) add 2 zeros to the end and add the MSB from PC to start

### 7.2 Branching Far Way

- Given the instruction

**beq** \$s0, \$s1, L1

Assume that the address L1 is farther away from the PC than what can be supported by **beq** and **bne** instructions

- Challenge: [bq/bre + j]

- Construct an equivalent code sequence with the help of unconditional (j) and conditional branch (beq, bne) instructions to accomplish this far away branching

NOTE:  
Discuss in forum

Bne \$s0, \$s1, End

j L1

End

## ADDRESSING MODES

- Register addressing
  - Operand is register
- Immediate addressing
  - Operand is constant within the instruction itself

### 8. Addressing Modes (1/3)

① Register addressing: operand is a register (short)



- ② Immediate addressing: operand is a constant within the instruction itself (addi, andi, ori, slti) 16-bits



- Base addressing (displacement addressing)
  - Operand is at the memory location whose address is sum of a register and a constant in the instruction (lw, sw)
  - Displacement is # of bytes

### 8. Addressing Modes (2/3)

③ Base addressing (displacement addressing):

operand is at the memory location whose address is sum of a register and a constant in the instruction (lw, sw)



- PC-relative addressing
  - Address is sum of PC and constant in instruction
  - Similar to Base addressing HOWEVER
    - Base address is now stored in PC
    - Displacement//offset is not #of bytes but # of instructions
  - Rmb PC is actually PC+4
- Pseudo-direct addressing
  - 26 bit of instruction concatenated with upper 4 bits of pc (j)

### 8. Addressing Modes (3/3)

④ PC-relative addressing: address is sum of PC and constant in the instruction (beq, bne)



- ⑤ Pseudo-direct addressing: 26-bit of instruction concatenated with upper 4-bits of PC (j)



## Summary (1/2)

- MIPS Instruction:  
32 bits representing a single instruction

|   |        |    |    |                |       |           |
|---|--------|----|----|----------------|-------|-----------|
| R | opcode | rs | rt | rd             | shamt | funct     |
| I | opcode | rs | rt |                |       | immediate |
| J | opcode |    |    | target address |       |           |

- Branches and load/store are both I-format instructions; but branches use PC-relative addressing, whereas load/store use base addressing
- Branches use PC-relative addressing  
Jumps use pseudo-direct addressing
- Shifts use R-format, but other immediate instructions (addi, andi, ori) use I-format

IN OTHER WORDS:  
beq, bne: ignore PC, count displacement  
j: use PC, ignore displacement

## Summary (2/2)

| MIPS assembly language |                         |                      |                                         |                                          |
|------------------------|-------------------------|----------------------|-----------------------------------------|------------------------------------------|
| Category               | Instruction             | Example              | Meaning                                 | Comments                                 |
| Arithmetic             | add                     | add \$s1, \$s2, \$s3 | \$s1 = \$s2 + \$s3                      | Three operands; data in registers        |
|                        | subtract                | sub \$s1, \$s2, \$s3 | \$s1 = \$s2 - \$s3                      | Three operands; data in registers        |
|                        | add immediate           | addi \$s1, \$s2, 100 | \$s1 = \$s2 + 100                       | Used to add constants                    |
|                        | load word               | lw \$s1, 100(\$s2)   | \$s1 = Memory[\$s2 + 100]               | Word from memory to register             |
|                        | store word              | sw \$s1, 100(\$s2)   | Memory[\$s2 + 100] = \$s1               | Word from register to memory             |
|                        | load byte               | lb \$s1, 100(\$s2)   | \$s1 = Memory[\$s2 + 100]               | Byte from memory to register             |
|                        | store byte              | sb \$s1, 100(\$s2)   | Memory[\$s2 + 100] = \$s1               | Byte from register to memory             |
| Data transfer          | load upper immediate    | lui \$s1, 100        | \$s1 = 100 * 2 <sup>16</sup>            | Loads constant in upper 16 bits          |
|                        | branch on equal         | beq \$s1, \$s2, 25   | if (\$s1 == \$s2) go to PC + 4 + 100    | Equal test; PC-relative branch           |
|                        | branch on not equal     | bne \$s1, \$s2, 25   | if (\$s1 != \$s2) go to PC + 4 + 100    | Not equal test; PC-relative              |
|                        | branch                  | set on less than     | slt \$s1, \$s2, \$s3                    | if (\$s2 < \$s3) \$s1 = 1; else \$s1 = 0 |
|                        | set less than immediate | slti \$s1, \$s2, 100 | if (\$s2 < 100) \$s1 = 1; else \$s1 = 0 | Compare less than constant               |
|                        | jump                    | j 2500               | go to 10000                             | Jump to target address                   |
|                        | jump register           | jr \$ra              | go to \$ra                              | For switch, procedure return             |
| Unconditional jump     | jump and link           | jal 2500             | \$ra = PC + 4; go to 10000              | For procedure call                       |

## ISA

- There are 2 major design philosophy for ISA
- CISC and RISC
- Complex Instruction Set Computer CISC
  - Intel x86-32
  - Single instruction performs complex operation
    - VAX archi had an instruction to multiply polynomials
    - VAX is a mini computer used yrs ago
  - There was smaller program size as memory was premium
  - So they try to optimize memory

- Complex implementation, no room for hardware optimization
- Reduced Instruction Set Computer (RISC)
  - MIPS, ARM
  - Keeps the instruction set small and simple, makes it easier to build/optimize hardware
  - Burden on software to combine simpler operations to implement high-level language statements
  - More and cheaper memory now
  - And it allows compiler optimization
  - These days RISC is used more
- Concepts in ISA design
  - Data Storage - where we get data
  - Memory Addressing Modes - how to access data from memory
  - Operations in instruction set - types operations
  - Instruction format - types of instruction formats
  - Encoding the instruction set - how instructions are encoded into binary bits

## 1. DATA Storage

- In MIPS, general purpose register architecture

### 3.1 Storage Architecture: Definition



- Common designs

- Stack Architecture

- Operands are **implicitly** on top of the stack
- LIFO
- Has push/pop operations



- Accumulator architecture

- One operand is **implicitly** in the accumulator (a special register)
- IBM 701 example

### 3.1 Storage Architecture: Animation

| Stack | Accumulator | Register (load-store) | Memory-Memory |
|-------|-------------|-----------------------|---------------|
|       | Load A ↗    | Register (load-store) | Memory-Memory |
|       | Add B ↗     |                       |               |
|       | Store C ↗   |                       |               |



- General-purpose register architecture
  - Only **explicit** operands
  - Register memory architecture
    - Where 1 operand in memory
      - Ex. Intel 80386
  - Register-register (or load store) architecture
    - MIPS/ARM

### 3.1 Storage Architecture: Animation

| Stack | Registers | Register (load-store) | Memory-Memory |
|-------|-----------|-----------------------|---------------|
|       |           | Load R1,A             |               |
|       |           | Load R2,B             |               |
|       |           | Add R3,R1,R2          |               |
|       |           | Store R3,C            |               |



- Memory-memory architecture
  - All operands in memory
  - Like the mini computers used in 70s

### 3.1 Storage Architecture: Example

| Stack  | Accumulator | Register (load-store) | Memory-Memory |
|--------|-------------|-----------------------|---------------|
| Push A | Load A      | Load R1,A             | Add C, A, B   |
| Push B | Add B       | Load R2,B             |               |
| Add    | Store C     | Add R3,R1,R2          |               |
| Pop C  |             | Store R3,C            |               |



- Modern processors use general purpose register (GPR)
- RISC computers use register-register (load/store) design
  - Coz simplify hardware design
  - Balanced pipeline
- CISC computers use a mixture of register-register and register-memory

## 2. Memory Addressing Mode

- Given k bit address, address space is of size  $2^k$
- Each memory transfer consists of 1 word of n bits
- Memory address register - stores address which the processor needs to read
- Memory Data register - stores data that needs to be written/read into memory
- Address bus is always unidirectional

- Data bus can go both ways
- There are also control lines - manages whether current memory access is Read/Write

### 3.2 Memory Address and Content

- Given k-bit address, the address space is of size  $2^k$
- Each memory transfer consists of one word of  $n$  bits



- ENDIANNES - relative ordering of bytes in multiple-byte word stored in memory
  - Big endian - LSB → MSB
  - Little endian - MSB → LSB

### 3.2 Memory Content: Endianness

#### ▪ Endianness:

- The relative ordering of the bytes in a multiple-byte word stored in memory

| Big-endian:                                                       | Little-endian:                                                    |
|-------------------------------------------------------------------|-------------------------------------------------------------------|
| Most significant byte stored in lowest address.                   | Least significant byte stored in lowest address.                  |
| Example:<br>IBM 360/370, Motorola 68000, MIPS                     | Example:<br>Intel 80x86, DEC VAX, DEC Alpha.                      |
| Example: 0xDE AD BE EF<br>Stored as: 0 DE<br>1 AD<br>2 BE<br>3 EF | Example: 0xDE AD BE EF<br>Stored as: 0 EF<br>1 BE<br>2 AD<br>3 DE |

NOTE: The endian-ness of MIPS is actually implementation specific.

- Addressing modes - ways to specify an operand in assembly
  - In mips, there are only 3
    - Register → operand is in a register (add \$t1, \$t2, \$t3)
    - Immediate → addi
    - Displacement → lw

### 3.2 Addressing Modes: Others

| Addressing mode    | Example            | Meaning                    |
|--------------------|--------------------|----------------------------|
| Register           | Add R4,R3          | R4 ← R4+R3                 |
| Immediate          | Add R4,#3          | R4 ← R4+3                  |
| Displacement       | Add R4,100(R1)     | R4 ← R4+Mem[100+R1]        |
| MIPS               |                    |                            |
| Register indirect  | Add R4,(R1)        | R4 ← R4+Mem[R1]            |
| Indexed / Base     | Add R3,(R1+R2)     | R3 ← R3+Mem[R1+R2]         |
| Direct or absolute | Add R1,(1001)      | R1 ← R1+Mem[1001]          |
| Memory indirect    | Add R1,@(R3)       | R1 ← R1+Mem[Mem[R3]]       |
| Auto-increment     | Add R1,(R2)+       | R1 ← R1+Mem[R2]; R2 ← R2+d |
| Auto-decrement     | Add R1,-(R2)       | R2 ← R2-d; R1 ← R1+Mem[R2] |
| Scaled             | Add R1,100(R2)(R3) | R1 ← R1+Mem[100+R2+R3*d]   |

### OPERATIONS in the instruction set

- Standard operations in an instruction set

- Frequently used instruction

### 3.3 Standard Operations



- Besides string/graphics, RISC processor supports all of above
- If we know frequently used ops/ can optimize them

### 3.3 Frequently Used Instructions *(Optimize)* *(Profile SW)*

| Rank | Integer Instructions      | Average % |
|------|---------------------------|-----------|
| 1    | Load                      | 22%       |
| 2    | Conditional Branch        | 20%       |
| 3    | Compare                   | 16%       |
| 4    | Store                     | 12%       |
| 5    | Add                       | 8%        |
| 6    | Bitwise AND               | 6%        |
| 7    | Sub                       | 5%        |
| 8    | Move register to register | 4%        |
| 9    | Procedure call            | 1%        |
| 10   | Return                    | 1%        |
|      | Total                     | 98%       |

Make these instructions fast!  
Amdahl's law - make the common cases fast!

→ Memory  
→ Cache performance  
→ most freq used instr

## Instruction Formats

- Instruction length
  - In mips, they are 32 bits
  - For 64 bit machines, instructions are 64 bit

### 3.4 Instruction Length



- Instruction Fields
  - Type and size of operands
  - Consists of opcode and operands
    - Opcode: unique code to specify desired op
    - Operands: zero or more additional information needed for op

## 3.4 Instruction Fields

*Algorithm  $\rightarrow$  0 / function  $\leftarrow$  OPY*

- An instruction consists of
  - opcode: unique code to specify the desired operation
  - operands: zero or more additional information needed for the operation
- The operation designates the type and size of the operands
  - Typical type and size: Character (8 bits), half-word (eg: 16 bits), word (eg: 32 bits), single-precision floating point (eg: 1 word), double-precision floating point (eg: 2 words).
- Expectations from any new 32-bit architecture:
  - Support for 8-, 16- and 32-bit integer and 32-bit and 64-bit floating point operations. A 64-bit architecture would need to support 64-bit integers as well.

### Encoding the instruction set

- MIPS has 32 registers
  - Need 5 bits for register indexing



## 3.5 Encoding Choices

- Three encoding choices: variable, fixed, hybrid.



- Fixed Length : encoding

- How can we fit multiple sets of instruction types into same number of bits
- Expanding Opcode scheme
  - Opcode has variable lengths for diff instructions
  - To maximize instruction bits

### 3.5 Fixed Length Instructions: Encoding (2/4)



### 3.5 Fixed Length Instructions: Encoding (3/4)

Use expanding opcode scheme:

- Extend the opcode for type-B instructions to 11 bits
- No wasted bits and result in a larger instruction set



Questions:

- How do we distinguish between Type-A and Type-B?
- How many different instructions do we really have?

Possible Answers:

- Think about MIPS: we simply set the opcode as all 0 for R-format but use the "extended opcode" called funct to distinguish.
- There are different ways to compute the many different instructions depending on what to maximize and/or minimize.

- We reserve 1 6 bit number to be assigned to type b
- $2^{6-1}$  type a opcodes possible
- 111111
- The 11 bits of type b will always start with 111111
- So we have only  $2^5$  opcodes of type b
- 11111100000 -> 11111111111

- We maximize # of instruction for type b and minimize for type A
- So opcode for type a will be only 1: 000000
- Type B:  $(2^{6-1}) * 2^5$  instructions
- Type A: only 1 instruction

# Week 5

Monday, September 18, 2023 11:41 PM

- C -> Assembly -> embedded binary -> processor
- Building a processor has 2 parts Datapath and control path
  - Data path is collection of components that process data
  - Performs arithmetic , logical, memory ops
  - Control tells the Datapath, memory and I/O devices what to do according to program instructions

## 2. MIPS Processor: Implementation

- Simplest possible implementation of a subset of the core MIPS ISA:
  - **Arithmetic and Logical operations**
    - add, sub, and, or, addi, andi, ori, slt
  - **Data transfer instructions**
    - lw, sw
  - **Branches**
    - beq, bne
- Shift instructions (**sll, srl**) and J-type instructions (**j**) will not be discussed:
  - Left as exercises ☺

Note  
andi and ori is not supported in this current processor design because we always do "sign extension" on immediate value.

Note  
sll and srl can be done by multiplication (which can be done by add with loop). j can be done by beq \$zero, \$zero, label if we ignore the difference related to 256MB blocks.

- Executing MIPS instruction overview
  - Program is stored in memory
  - The processor has a Program Counter (PC)
    - Which is address of next instruction to be executed
  - Each instruction is executed in 5 stages: fetch, decode, operand fetch, execute, result write (store)
  - Fetch
    - Getting instruction from memory
    - Address is in PC register
  - Decode
    - Find out the operation required
  - Operand fetch
    - Get operand needed for operation
  - Execute
    - Perform operation
  - Result write
    - Store result

## 4. MIPS Instruction Execution (1/2)

- Show the actual steps for 3 representative MIPS instructions
- Fetch and Decode stages not shown:
  - The standard steps are performed

|                    | <u>add \$rd, \$rs, \$rt</u>                                                                                      | <u>lw \$rt, ofst(\$rs)</u>                                                                                           | <u>beq \$rs, \$rt, ofst</u>                                                                                      | 16 bit                                                          |
|--------------------|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| 1<br>Fetch         | standard                                                                                                         | standard                                                                                                             | standard                                                                                                         |                                                                 |
| 2<br>Decode        |                                                                                                                  |                                                                                                                      |                                                                                                                  |                                                                 |
| 3<br>Operand Fetch | <ul style="list-style-type: none"> <li>Read [\$rs] as <u>opr1</u></li> <li>Read [\$rt] as <u>opr2</u></li> </ul> | <ul style="list-style-type: none"> <li>Read [\$rs] as <u>opr1</u></li> <li>Use <u>ofst</u> as <u>opr2</u></li> </ul> | <ul style="list-style-type: none"> <li>Read [\$rs] as <u>opr1</u></li> <li>Read [\$rt] as <u>opr2</u></li> </ul> |                                                                 |
| 4<br>Execute       | $Result = opr1 + opr2$                                                                                           | $MemAddr = opr1 + opr2$<br>Use <u>MemAddr</u> to read from memory                                                    | $Taken = (opr1 == opr2) ?$<br>$Target = (PC+4) + ofst \times 4$                                                  |                                                                 |
| 5<br>Result Write  | Result stored in \$rd                                                                                            | Memory data stored in \$rt                                                                                           | if (Taken)<br>$PC = Target$                                                                                      | $= opr = operand$<br>$= MemAddr = address$<br>$= ofst = offset$ |

## 4. MIPS Instruction Execution (2/2)

- Design changes:
  - Merge Decode and Operand Fetch – Decode is simple for MIPS
  - Split Execute into ALU (Calculation) and Memory Access

|                    | <u>add \$rd, \$rs, \$rt</u>                                                                                      | <u>lw \$rt, ofst(\$rs)</u>                                                                                           | <u>beq \$rs, \$rt, ofst</u>                                                                                      |
|--------------------|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| 1<br>Fetch         | Read instr from [PC]                                                                                             | Read instr from [PC]                                                                                                 | Read instr from [PC]                                                                                             |
| 2<br>Decode        |                                                                                                                  |                                                                                                                      |                                                                                                                  |
| 3<br>Operand Fetch | <ul style="list-style-type: none"> <li>Read [\$rs] as <u>opr1</u></li> <li>Read [\$rt] as <u>opr2</u></li> </ul> | <ul style="list-style-type: none"> <li>Read [\$rs] as <u>opr1</u></li> <li>Use <u>ofst</u> as <u>opr2</u></li> </ul> | <ul style="list-style-type: none"> <li>Read [\$rs] as <u>opr1</u></li> <li>Read [\$rt] as <u>opr2</u></li> </ul> |
| 4<br>ALU           | $Result = opr1 + opr2$                                                                                           | $MemAddr = opr1 + opr2$                                                                                              | $Taken = (opr1 == opr2) ?$<br>$Target = (PC+4) + ofst \times 4$                                                  |
| 5<br>Memory Access |                                                                                                                  | Use <u>MemAddr</u> to read from memory                                                                               |                                                                                                                  |
| Result Write       | Result stored in \$rd                                                                                            | Memory data stored in \$rt                                                                                           | if (Taken)<br>$PC = Target$                                                                                      |

### Fetch Stage

- Requirements
  - Use PC to fetch instruction from memory
    - PC is special register in processor
  - Increment PC by 4 to get address of next instruction
    - 32 bits of each MIPS instruction = 4 bytes
    - Expect branch/jump is different
  - Output to the decode stage

### 5.1 Fetch Stage: Block Diagram



- INSTRUCTIONAL MEMORY
  - Storage element for the instructions
  - A sequential circuit
  - Has an internal state that stores information

- Clock signal is assumed

## 5.1 Element: Instruction Memory

- Storage element for the instructions
  - It is a **sequential circuit** (to be covered later)
  - Has an internal state that stores information
  - Clock signal is assumed and not shown

- Supply instruction given the address
  - Given instruction address  $M$  as input, the memory outputs the content at address  $M$
  - Conceptual diagram of the memory layout is given on the right →



## 5.1 The Idea of Clocking

- It seems that we are reading and updating the PC at the same time:
  - How can it work properly?
- Magic of clock:**
  - PC is read during the **first half** of the clock period and it is updated with  $PC+4$  at the **next rising clock edge**



## Decode stage

- Gather data from instruction fields
- Read opcode to determine instruction type and field lengths
- Read data from all necessary registers
  - Ex. (add -> 2, addi -> 1, j -> 1)
- Input from previous stage fetch
- The register file, is basically if you input the register #, it will output the register value

## 5.2 Decode Stage: Block Diagram



## 5.2 Element: Register File

- A collection of 32 registers:
  - Each 32-bit wide; can be read/written by specifying register number
  - Read at most two registers per instruction
  - Write at most one register per instruction
- RegWrite** is a control signal to indicate:
  - Writing of register
  - 1(True) = Write. 0(False) = No Write

## 5.2 Element: Register File

- A collection of 32 registers:
  - Each 32-bit wide; can be read/written by specifying register number
  - Read at most two registers per instruction
  - Write at most one register per instruction
- RegWrite is a control signal to indicate:
  - Writing of register
  - 1(True) = Write, 0(False) = No Write



## 5.2 Decode Stage: R-Format Instruction



## 5.2 Decode Stage: I-Format Instruction



## 5.2 Decode Stage: Choice in Destination



- Multiplexer
  - Functions to select 1 input from multiple input lines
  - To choose correct write register based on instruction type
  - For I-Format instructions, multiplexer slhd choose 0
    - For R-format, choose 1

## 5.2 Multiplexer

- Function:**
    - Selects one input from multiple input lines
  - Inputs:**
    - $n$  lines of same width
  - Control:**
    - $m$  bits where  $n = 2^m$
  - Output:**
    - Select  $i^{th}$  input line if control =  $i$
- Can be Combined to Form Larger MUX
- ```
function Mux2(in0,in1,in2,in3,ctrl1) {
  return Mux(Mux(in0,in1,ctrl0),
             Mux(in2,in3,ctrl1),
             ctrl1);
}
```
- As a Function
- ```
// 2 input + 1 control + 1 output
function Mux(in0, in1, ctrl) {
  if(!ctrl) {
    return in0;
  } else {
    return in1;
  }
}
```



Control=0 → select  $\text{in}_0$  to  $\text{out}$   
 Control=3 → select  $\text{in}_3$  to  $\text{out}$

## 5.2 Decode Stage: Choice in Data 2



## ALU

- Arithmetic logic unit
- Execution stage
- Does arithmetic, shifting, logical ops
- For memory (lw, sw), address calc is done
- For branch ops, register comparison and target address calc is done

### 5.3 ALU Stage: Block Diagram



### 5.3 Element: Arithmetic Logic Unit

- **ALU (Arithmetic Logic Unit)**
    - Combinational logic to implement arithmetic and logical operations
  - **Inputs:**
    - Two 32-bit numbers
  - **Control:**
    - 4-bit to decide the particular operation
  - **Outputs:**
    - Result of arithmetic/logical operation
    - A 1-bit signal to indicate whether result is zero
- Handwritten notes: DP, Read Data, ALU op = Data, Ctrl Signals, -RegDst, -ALUSrc, -RegWrite, ALU*



### 5.3 ALU Stage: Non-Branch Instructions

- We can handle non-branch instructions easily:



## 5.3 ALU Stage: Branch Instructions

- Branch instruction is harder as we need to perform two calculations:
- Example: "beq \$9, \$0, 3"
  - 1. Branch Outcome:**
    - Use ALU to compare the register
    - The 1-bit "isZero?" signal is enough to handle equal/not equal check (how?)
- 

- 2. Branch Target Address:**
  - Introduce additional logic to calculate the address
  - Need PC (from Fetch Stage)
  - Need Offset (from Decode Stage)

**Note**  
Two things need to happen to actually take the branch.

1. The instruction is a branch instruction.
2. The condition of the branch is true.

- Exception : Although branch instruction is I format, the ALUsrc = 1 usually, but here the ALUsrc = 0.
- New control signal: PCSrc can be 0, 1
  - To choose either PC + 4 (0) or Branch target address (1)



## Memory Stage

- Only the load and store instructions perform operations in this stage
  - Use the memory address calculated by ALU
  - Read/write to data memory

## 5.4 Element: Data Memory

- Storage element for the data of a program
- **Inputs:**
  - Memory Address
  - Data to be written (Write Data) for store instructions
- **Control:**
  - Read and Write controls; only one can be asserted at any point of time
- **Output:**
  - Data read from memory (Read Data) for load instructions
- Memwrite and memread cannot both be 1



## 5.4 Memory Stage: Load Instruction



## 5.4 Memory Stage: Store Instruction

- Need Read Data 2 (from Decode stage) as the Write Data



## 5.4 Memory Stage: Non-Memory Inst.

- Add a multiplexer to choose the result to be stored



### Register Write Stage

- Most instructions write result of computation into register
  - But not stores, branches, jumps

## 5.5 Register Write Stage: Block Diagram



- Result Write stage has no additional element:
  - Basically just route the correct result into register file
  - The **Write Register** number is generated way back in the **Decode Stage**

## 5.5 Register Write Stage: Routing



# Week 6

Thursday, September 21, 2023 10:34 PM

- Control signals
- RegDst, RegWrite,
- RegDst - choose rt or rd as destination register
  - If r format, rd is chosen
    - Rd, rs, rt
  - If I format, rt is chosen
    - Rt, rs
- RegWrite - controls whether data from WD should be written to write register
- ALUSrc - choose right input, whether sign extended immediate operand or RD2
- ALUcontrol - performs different ops, like add,subtract
- MemWrite - control whether right data is write or read
- MemRead
- MemToReg - choose whether ALU result or read data
- PCSrc - if branch, then control where PC should go or simpl PC+4

## 1. Identified Control Signals



| Control Signal        | Execution Stage                  | Purpose                                               |
|-----------------------|----------------------------------|-------------------------------------------------------|
| RegDst                | Decode/Operand Fetch             | Select the destination register number                |
| RegWrite              | Decode/Operand Fetch<br>RegWrite | Enable writing of register                            |
| ALUSrc                | ALU                              | Select the 2 <sup>nd</sup> operand for ALU            |
| ALUcontrol            | ALU                              | Select the operation to be performed                  |
| MemRead /<br>MemWrite | Memory                           | Enable reading/writing of data memory                 |
| MemToReg              | RegWrite                         | Select the result to be written back to register file |
| PCSrc                 | Memory/RegWrite                  | Select the next PC value                              |

### Generating the Control Signals

- Generated based on instruction to be executed
- All R-format have 0 opcode, J format have 2, and remaining are for I format
- R-format instruction
  - RegDst = 1
    - Because we choose the rd as destination register not rt
  - ALUSrc = 0
    - RD2 is passed to ALU as second operand, not a immediate
  - ALUcontrol
  - MemToReg = 0
    - Since r-type instruction has a 6 bit function code, this is used to generate the control signals
- A combinational circuit is a circuit where the output is immediately affected by the input
- A sequential circuit has a clock input, output changes only depending on behavior of clock signal and input
  - Ex Program counter
  - Only changes at rising clock

### Implementing the control unit

### 3. MIPS Instruction Subset (Review)



- RegDst

- If control is 0, the rt taken

#### 4. Control Signal: RegDst

- False (0): Write register =  $\text{Inst}[20:16] \text{ rt}$
- True (1): Write register =  $\text{Inst}[15:11] \text{ rd}$



- RegWrite

- If control is 0, nothing happens
- If is 1, the data present in WD will be written to WR

#### 4. Control Signal: RegWrite

- False (0): No register write
- True (1): New value will be written



- ALUSrc

- Responsible to select the second operand
- 1st operand comes from rs
- Second chosen between rd2(rt) or sign extended immediate

- MemRead

- If 0, then nothing read from memory

- Otherwise, read from "address" and pass to "read data"
- MemWrite
  - If 0, then nothing written to memory
  - If 1, then whatever data is in "write data" is written to the "address"
- MemToReg
  - If 0, the ALU output written to register WD
  - If 1, then read data from memory is returned to WD
- PCSrc
  - Choose between PC+4 and Branch target address ( $PC + 4 + 4 * \text{immed}$ )
  - If 1, branch should be taken
  - For beq, when the 2 register vals are same, the branch address is taken and PCSrc = 1
  - Register vals being same is given from ALU. Ex. Calc zero
  - Then updates the PC value
  - The opcode, tells if it's a branch instruction

## ALUcontrol

- A 4 bit control signal
- When input changes, output changes immediately

Aaron Tan, NUS      Lecture #12: The Processor: Control      20

### 5. One Bit At A Time

**Note:** We will revisit this when we cover combinational circuits later.

= A simplified 1-bit MIPS ALU can be implemented as follows:

Acknowledgement: Image taken from NYU Course CSCI-UA.0436

- To achieve  $A - B$ , ALU sets B invert to 1. to do  $A + B'$ . Cin set to one
  - So  $A + B' + 1$  is eval  $(B' + 1) = -B$

### 5. One Bit At A Time (Aha!)

= Can you see how the **ALUcontrol** (4-bit) signal controls the ALU?

= Note: implementation for **slt** not shown

| ALUcontrol |         |           | Function |
|------------|---------|-----------|----------|
| Ainvert    | Binvert | Operation |          |
| 0          | 0       | 00        | AND      |
| 0          | 0       | 01        | OR       |
| 0          | 0       | 10        | add      |
| 0          | 1       | 10        | subtract |
| 0          | 1       | 11        | slt      |
| 1          | 1       | 00        | NOR      |

Acknowledgement: Image taken from NYU Course CSCI-UA.0436

NOR

- Uses DeMorgans law, that  $(A \text{ and } B)' = A' \text{ and } B'$

## 5. One Bit At A Time (Aha!)

- Can you see how the **ALUcontrol** (4-bit) signal controls the ALU?
- Note: implementation for **slt** not shown

| ALUcontrol |         |           | Function |
|------------|---------|-----------|----------|
| Ainvert    | Binvert | Operation |          |
| 0          | 0       | 00        | AND      |
| 0          | 0       | 01        | OR       |
| 0          | 0       | 10        | add      |
| 0          | 1       | 10        | subtract |
| 0          | 1       | 11        | slt      |
| 1          | 1       | 00        | NOR      |



Acknowledgement: Image taken from [NYU Course CSCI-UA.0436](#)

$$(A \circ B)' = A' + B'$$

$$(A + B)' = A' \cdot B'$$

|| 01

### Generating ALUcontrols signal

- They do depend on opcode(6bit) and function code (6 bit)
- We use a multilevel decoding approach
  - Use some of the input to reduce the cases, then generate full output
  - Done to reduce size of main controller
- INTERMEDIATE SIGNAL: **ALUop**
  - Use opcode to generate a 2bit **ALUop** signal
  - Represents whether lw,sw / beq / r-type
  - Use the **ALUop** signal and function code field (for R-type) to generate the 4 bit **ALUcontrol** signal

## 5. Intermediate Signal: **ALUop**

### Basic Idea:

1. Use Opcode to generate a 2-bit **ALUop** signal
  - Represents classification of the instructions:

| Instruction type | ALUop |
|------------------|-------|
| Add              | 00    |
| Sub              | 01    |
| funt             | 10    |

2. Use **ALUop** signal and Function Code field (for R-type instructions) to generate the 4-bit **ALUcontrol** signal



## 5. Generating ALUcontrol Signal

don't care

| Opcode | ALUop | Instruction Operation | Funct field | ALU action       | ALU control |
|--------|-------|-----------------------|-------------|------------------|-------------|
| Iw     | 00    | load word             | XXXXXX      | add              | 0010        |
| sw     | 00    | store word            | XXXXXX      | add              | 0010        |
| beq    | 01    | branch equal          | XXXXXX      | subtract         | 0110        |
| R-type | 10    | add                   | 10 0000     | add              | 0010        |
| R-type | 10    | subtract              | 10 0010     | subtract         | 0110        |
| R-type | 10    | AND                   | 10 0100     | AND              | 0000        |
| R-type | 10    | OR                    | 10 0101     | OR               | 0001        |
| R-type | 10    | set on less than      | 10 1010     | set on less than | 0111        |

Generation of 2-bit ALUop signal will be discussed later

D/P

| Instruction Type | ALUop |
|------------------|-------|
| Iw / sw          | 00    |
| beq              | 01    |
| R-type           | 10    |

| ALUcontrol | Function |
|------------|----------|
| 0000       | AND      |
| 0001       | OR       |
| 0010       | add      |
| 0110       | subtract |
| 0111       | slt      |
| 1100       | NOR      |

## 5. Design of ALU Control Unit (1/2)

- Input: 6-bit **Funct** field and 2-bit **ALUop**
- Output: 4-bit **ALUcontrol**
- Find the simplified expressions

|     | ALUop |     | Funct Field<br>( F[5:0] == Inst[5:0] ) |    |    |    |    |    | ALU control<br>A <sub>3</sub> A <sub>2</sub> A <sub>1</sub> A <sub>0</sub> |
|-----|-------|-----|----------------------------------------|----|----|----|----|----|----------------------------------------------------------------------------|
|     | MSB   | LSB | F5                                     | F4 | F3 | F2 | F1 | F0 |                                                                            |
| Iw  | 0     | 0   | X                                      | X  | X  | X  | X  | X  | 010                                                                        |
| sw  | 0     | 0   | X                                      | X  | X  | X  | X  | X  | 010                                                                        |
| beq | 0     | 1   | X                                      | X  | X  | X  | X  | X  | 110                                                                        |
| add | 1     | 0   | 1                                      | 0  | 0  | 0  | 0  | 0  | 010                                                                        |
| sub | 1     | 0   | 1                                      | 0  | 0  | 0  | 1  | 0  | 110                                                                        |
| and | 1     | 0   | 1                                      | 0  | 0  | 1  | 0  | 0  | 000                                                                        |
| or  | 1     | 0   | 1                                      | 0  | 0  | 1  | 0  | 1  | 001                                                                        |
| slt | 1     | 0   | 1                                      | 0  | 1  | 0  | 1  | 0  | 111                                                                        |

$$\begin{aligned} A_3 &= \\ A_2 &= \\ A_1 &= \\ A_0 &= \\ &\downarrow \\ \text{Current} & \end{aligned}$$

## 5. Design of ALU Control Unit (1/2)

- Input: 6-bit **Funct** field and 2-bit **ALUop**
- Output: 4-bit **ALUcontrol**
- Find the simplified expressions

k = 0/1

|     | ALUop |     | Funct Field<br>( F[5:0] == Inst[5:0] ) |    |    |    |    |    | ALU control<br>A <sub>3</sub> A <sub>2</sub> A <sub>1</sub> A <sub>0</sub> |
|-----|-------|-----|----------------------------------------|----|----|----|----|----|----------------------------------------------------------------------------|
|     | MSB   | LSB | F5                                     | F4 | F3 | F2 | F1 | F0 |                                                                            |
| Iw  | 0     | 0   | X                                      | X  | X  | X  | X  | X  | 010                                                                        |
| sw  | 0     | 0   | X                                      | X  | X  | X  | X  | X  | 010                                                                        |
| beq | 0     | 1   | X                                      | X  | X  | X  | X  | X  | 110                                                                        |
| add | 1     | 0   | 1                                      | 0  | 0  | 0  | 0  | 0  | 010                                                                        |
| sub | 1     | 0   | 1                                      | 0  | 0  | 0  | 1  | 0  | 110                                                                        |
| and | 1     | 0   | 1                                      | 0  | 0  | 1  | 0  | 0  | 000                                                                        |
| or  | 1     | 0   | 1                                      | 0  | 0  | 1  | 0  | 1  | 001                                                                        |
| slt | 1     | 0   | 1                                      | 0  | 1  | 0  | 1  | 0  | 111                                                                        |

$$\begin{aligned} \text{ALUcontrol3} &= 0 \\ \text{ALUcontrol2} &= ? \\ \text{ALUop0} + \text{ALUop1} \cdot F_1 & \end{aligned}$$

$$\begin{aligned} A_3 &= \\ A_2 &= \\ A_1 &= \\ A_0 &= \\ &\downarrow \\ \text{Current} & \end{aligned}$$



## 5. Control Design: Outputs

|        | RegDst | ALUSrc | MemToReg | RegWrite | MemRead | MemWrite | Branch | ALUop   |
|--------|--------|--------|----------|----------|---------|----------|--------|---------|
|        |        |        |          |          |         |          |        | op1 op0 |
| R-type | 1      | 0      | 0        | 1        | 0       | 0        | 0      | 1 0     |
| lw     | 0      | 1      | 1        | 1        | 1       | 0        | 0      | 0 0     |
| sw     | X      | 1      | X        | 0        | 0       | 1        | 0      | 0 0     |
| beq    | X      | 0      | X        | 0        | 0       | 0        | 1      | 0 1     |



## 5. Combinational Circuit Implementation



### Instruction Execution

- Instructions are executed within a single clock cycle

- This has a drawback
- So instead, multicycle implementation is used
  - 6. Solution #1: Multicycle Implementation
  - Break up the instructions into execution steps:
    - 1. Instruction fetch
    - 2. Instruction decode and register read
    - 3. ALU operation
    - 4. Memory read/write
    - 5. Register write
  - ▪ Each execution step takes one clock cycle
    - **Cycle time is much shorter, i.e., clock frequency is much higher**
  - Instructions take variable number of clock cycles to complete execution
  - Not covered in class:
    - See Section 5.5 of COD if interested
- Or use pipelining

## Week 7

Thursday, October 12, 2023 11:44 AM

### Logic gates

## 2. Boolean Algebra

### Boolean values:

- True (T or 1)
- False (F or 0)



### Connectives

- Conjunction (AND)
  - $A \cdot B$ ;  $A \wedge B$
- Disjunction (OR)
  - $A + B$ ;  $A \vee B$
- Negation (NOT)
  - $A'$ ;  $\bar{A}$ ;  $\neg A$



## 4. Precedence of Operators

2 + 3 \* 4

### Precedence from highest to lowest

- Not ( $'$ )
- And ( $\cdot$ )
- Or ( $+$ )

Note the difference with CS1231/CS1231S. Here in CS2100, AND has higher precedence than OR.

### Examples:

- $A \cdot B + C = (A \cdot B) + C$  Hence,  $A \cdot B + C$  is not ambiguous in CS2100.
- $X + Y = X + (Y')$
- $P + Q' \cdot R = P + ((Q') \cdot R)$

### Use parenthesis to overwrite precedence. Examples:

- $A \cdot (B + C)$  [Without parenthesis, it means  $A \cdot B + C$  or  $(A \cdot B) + C$ ]
- $(P + Q)' \cdot R$  [Without parenthesis, it means  $P + Q' \cdot R$  or  $P + (Q' \cdot R)$ ]

## 5. Laws of Boolean Algebra

| Identity laws                                 |                                             |
|-----------------------------------------------|---------------------------------------------|
| $A + 0 = 0 + A = A$                           | $A \cdot 1 = 1 \cdot A = A$                 |
| Inverse/complement laws                       |                                             |
| $A + A' = A' + A = 1$                         | $A \cdot A' = A' \cdot A = 0$               |
| Commutative laws                              |                                             |
| $A + B = B + A$                               | $A \cdot B = B \cdot A$                     |
| Associative laws *                            |                                             |
| $A + (B + C) = (A + B) + C$                   | $A \cdot (B \cdot C) = (A \cdot B) \cdot C$ |
| Distributive laws                             |                                             |
| $A \cdot (B + C) = (A \cdot B) + (A \cdot C)$ | $A + (B \cdot C) = (A + B) \cdot (A + C)$   |

\* Due to the associative laws,  $A + B + C$  is unambiguous. It may be evaluated as  $A + (B + C)$  or  $(A + B) + C$ . Likewise for  $A \cdot B \cdot C$ .

### Duality

- If And/Or operators and identity elements 0/1 in an equation are interchanged, it remains valid
- Ex.  $a \cdot (b+c) = (a+b) \cdot (a+c)$  is  $a \cdot (b+c) = (a \cdot b) + (a \cdot c)$
- If  $(x+y+z)' = x' \cdot y' \cdot z'$  is valid, then  $(x \cdot y \cdot z)' = x' + y' + z'$  is also valid
- If  $x+1 = 1$  is valid, then  $x \cdot 0 = 0$  is valid

## 7. Theorems

| Idempotency                                                 |                                |
|-------------------------------------------------------------|--------------------------------|
| $X + X = X$                                                 | $X \cdot X = X$                |
| One element / Zero element                                  |                                |
| $X + 1 = 1 + X = 1$                                         | $X \cdot 0 = 0 \cdot X = 0$    |
| Involution                                                  |                                |
| $(X')' = X$                                                 |                                |
| Absorption 1                                                |                                |
| $X + X \cdot Y = X$                                         | $X(X + Y) = X$                 |
| Absorption 2                                                |                                |
| $X \cdot X + X = X$                                         | $X(X + Y) = X \cdot Y$         |
| DeMorgan's (can be generalised to more than 2 variables)    |                                |
| $(X + Y)' = X' \cdot Y'$                                    | $(X \cdot Y)' = X' + Y'$       |
| Consensus                                                   |                                |
| $X \cdot Y + X \cdot Z + Y \cdot Z = X \cdot Y + X \cdot Z$ | $(X+Y)(X+Z)(Y+Z) = (X+Y)(X+Z)$ |

## 9. Complement Functions

- Given a Boolean function F, the complement of F, denoted as  $F'$ , is obtained by interchanging 1 with 0 in the function's output values.

| x | y | z | F1 | F1' |
|---|---|---|----|-----|
| 0 | 0 | 0 | 0  | 1   |
| 0 | 0 | 1 | 0  | 1   |
| 0 | 1 | 0 | 0  | 1   |
| 0 | 1 | 1 | 0  | 1   |
| 1 | 0 | 0 | 0  | 1   |
| 1 | 0 | 1 | 0  | 1   |
| 1 | 1 | 0 | 1  | 0   |
| 1 | 1 | 1 | 0  | 1   |

### Standard Forms

- Certain types of boolean expression lead to circuits that are desirable
- Sum-of-Products (SOP)
- Product-of-sums (POS)
- Literals
  - Boolean variable on its own or is complement
  - $x, x'$
- Product term
  - Single literal or a logical product (AND) of several literals
  - $x, x \cdot y \cdot z, A \cdot B$
- Sum term
  - Single literal or logical sum (OR) of several literals
  - $x, x+y+z, A'+B, A+B$
- SOP
  - Product term or logical sum (OR) of several product terms
  - $x, x \cdot y \cdot z'$ 
    - $A \cdot B + A' \cdot B'$
    - No brackets needed
- POS
  - Sum term or a logical product (AND) of several sum terms
    - $x \cdot (y+z)$
    - $(A+B) \cdot (A'+B')$
- Every Boolean expression can be expressed in SOP or POS

|     | Expression         | SOPP | POS7 |
|-----|--------------------|------|------|
| (1) | $X'Y'Z + X'Y'Z'$   | ✓    | ✗    |
| (2) | $(X+Y)(X+Y)(X+Z)$  | ✗    | ✓    |
| (3) | $X'Y'Z + Z'$       | ✓    | ✓    |
| (4) | $X(W + YZ)$        | ✗    | ✗    |
| (5) | $XWZ$              | ✓    | ✓    |
| (6) | $W'X'Y + V(X + W)$ | ✗    | ✗    |

## 11. Minterms and Maxterms (1/2)

- A minterm of  $n$  variables is a **product term** that contains  $n$  literals from all the variables.
- Example: On 2 variables  $x$  and  $y$ , the minterms are:  $x'y', x'y, x'y'$  and  $xy$ .
- A maxterm of  $n$  variables is a **sum term** that contains  $n$  literals from all the variables.
- Example: On 2 variables  $x$  and  $y$ , the maxterms are:  $x+y', x+y, x+y'$  and  $x+y$ .
- In general, with  $n$  variables we have up to  $2^n$  minterms and  $2^n$  maxterms.

## 11. Minterms and Maxterms (2/2)

- The minterms and maxterms on 2 variables are denoted by  $m0$  to  $m3$  and  $M0$  to  $M3$  respectively.

|   |   | Minterms | Maxterms |
|---|---|----------|----------|
| x | y | Term     | Notation |
| 0 | 0 | $x'y'$   | $m0$     |
| 0 | 1 | $x'y$    | $m1$     |
| 1 | 0 | $x'y'$   | $m2$     |
| 1 | 1 | $xy$     | $m3$     |

- Important fact: Each minterm is the complement of its corresponding maxterm. Likewise, each maxterm is the complement of its corresponding minterm.
- Example:  $m2 = x'y'$   
 $m2' = (x'y')' = x' + (y')' = x' + y = M2$

## Quiz Time Again!

- Ability to convert minterms and maxterms from its Boolean expression to its notation (and vice versa) is important.
- Test yourself with the following quiz, assuming that you are given a Boolean function on 4 variables A, B, C, D.

| Minterm                  | Maxterm          |                                       |                  |
|--------------------------|------------------|---------------------------------------|------------------|
| Boolean expression       | Minterm notation | Boolean expression                    | Maxterm notation |
| (1) $A'B'C'D$            | $m3$             | (1) $\bar{A}\bar{B}\bar{C}\bar{D}'$   | $M3$             |
| (2) $\bar{A}B'C'D'$      | $m10$            | (2) $\bar{A}'\bar{B}'\bar{C}\bar{D}'$ | $M13$            |
| (3) $A'B'C'D$            | $m11$            | (3) $A+\bar{B}+C+\bar{D}$             | $M0$             |
| (4) $A\bar{B}C\bar{D}'$  | $m14$            | (4) $\bar{A}+\bar{B}+C+\bar{D}$       | $M2$             |
| (5) $A\bar{B}'C'\bar{D}$ | $m9$             | (5) $\bar{A}'+B+C+\bar{D}'$           | $M9$             |

## 12. Canonical Forms

- Canonical/normal form: a unique form of representation.
- Sum-of-minterms = Canonical sum-of-products
- Product-of-maxterms = Canonical product-of-sums

### 12.1 Sum-of-Minterms

- Given a truth table, example:
- | x | y | z | F1 | F2 | F3 |
|---|---|---|----|----|----|
| 0 | 0 | 0 | 0  | 0  | 0  |
| 0 | 0 | 1 | 0  | 1  | 0  |
| 0 | 1 | 0 | 0  | 0  | 0  |
| 0 | 1 | 1 | 0  | 0  | 1  |
| 1 | 0 | 0 | 0  | 1  | 0  |
| 1 | 0 | 1 | 0  | 1  | 0  |
| 1 | 1 | 0 | 1  | 1  | 0  |
| 1 | 1 | 1 | 0  | 1  | 0  |
- $F1 = \underline{x'y'z'} = m6$
- $F2 = x'y'z + x'y'z' + x'y'z + x'y'z' + x'y'z + x'y'z' = m1 + m4 + m5 + m6 + m7 = \Sigma m(1,4,5,6,7) \text{ or } \Sigma m(1,4 - 7)$
- $F3 = x'y'z + x'y'z + x'y'z + x'y'z' = m1 + m4 + m5 = \Sigma m(1,3,4,5) \text{ or } \Sigma m(1,3 - 5)$

### 12.2 Product-of-Maxterms

- Given a truth table, example:
- | x | y | z | F1 | F2 | F3 |
|---|---|---|----|----|----|
| 0 | 0 | 0 | 0  | 0  | 0  |
| 0 | 0 | 1 | 0  | 1  | 0  |
| 0 | 1 | 0 | 0  | 0  | 0  |
| 0 | 1 | 1 | 0  | 0  | 1  |
| 1 | 0 | 0 | 0  | 1  | 1  |
| 1 | 0 | 1 | 0  | 1  | 0  |
| 1 | 1 | 0 | 1  | 0  | 0  |
| 1 | 1 | 1 | 0  | 1  | 0  |
- $F2 = (x+y+z) \cdot (x+y+z') \cdot (x+y'+z) \cdot (x+y'+z')$   
 $= M0 \cdot M2 \cdot M3 = \prod M(0,2,3)$
- $F3 = (x+y+z) \cdot (x+y+z') \cdot (x+y'+z) \cdot (x+y'+z')$   
 $= M0 \cdot M2 \cdot M6 \cdot M7 = \prod M(0,2,6,7)$

### 12.3 Conversion of Standard Forms

- We can convert between sum-of-minterms and product-of-maxterms easily.

Example:  $F2 = \Sigma m(1,4,5,6,7) = \prod M(0,2,3)$

- Why? See  $F2'$  in truth table.

- $F2' = m0 + m2 + m3$   
 Therefore,  
 $F2 = (m0 + m2 + m3)'$   
 $= m0' \cdot m2' \cdot m3'$  (by DeMorgan's)  
 $= M0 \cdot M2 \cdot M3$  (as  $m' = Mx$ )

| x | y | z | F2 | F2' |
|---|---|---|----|-----|
| 0 | 0 | 0 | 0  | 1   |
| 0 | 0 | 1 | 1  | 0   |
| 0 | 1 | 0 | 0  | 1   |
| 0 | 1 | 1 | 1  | 0   |
| 1 | 0 | 0 | 1  | 0   |
| 1 | 0 | 1 | 0  | 1   |
| 1 | 1 | 0 | 0  | 1   |
| 1 | 1 | 1 | 1  | 0   |

## 1. Logic Gates

| Gate symbols | Symbol set 1 | Symbol set 2<br>(ANSI/IEEE Standard 91-1984) |
|--------------|--------------|----------------------------------------------|
| AND          |              |                                              |
| OR           |              |                                              |
| NOT          |              |                                              |
| NAND         |              |                                              |
| NOR          |              |                                              |
| EXCLUSIVE OR |              |                                              |

### 1.1 Inverter/AND/OR Gates



### 1.2 NAND/NOR Gates



#### Logic Circuits

- Fan-in : the number of inputs of a gate
- Every input must be connected

▪ Example:  $F_1 = xy^2$  (note the use of a 3-input AND gate)

- 

▪ Example:  $F_2 = x + y'z$



- Example:  $F_3 = x'y + x'z$



- Draw solid circle to denote wires intersect

#### Universal Gates

- AND/OR/NOT gates are sufficient for building any function
- XOR gate for parity bit generation is useful
- {AND, OR, NOT} is complete set of logic

#### NAND Gate - universal

- NAND gate alone is complete set of logic
- {NAND} is a complete set of logic.
- Proof: Implement NOT/AND/OR using only NAND gates.



#### NOR Gate - universal

- {NOR} is complete set of logic
- {NOR} is a complete set of logic.
- Proof: Implement NOT/AND/OR using only NOR gates.



### 3.3 SOP and NAND Circuits (1/2)

- An SOP expression can be easily implemented using
  - 2-level AND-OR circuit
  - 2-level NAND circuit

Example:  $F = A \cdot B + C' \cdot D + E$

- Using 2-level AND-OR circuit



- Inverter is not considered a level

### 3.3 SOP and NAND Circuits (2/2)

- Example:  $F = A \cdot B + C' \cdot D + E$ 
  - Using 2-level NAND circuit

$$\begin{array}{c} \overline{A \cdot B} \\ \overline{C' \cdot D} \\ \overline{E} \end{array}$$



### 3.4 POS and NOR Circuits (2/2)

- Example:  $G = (A+B) \cdot (C'+D) \cdot E$ 
  - Using 2-level NOR circuit



- Add bubble to inputs of and/or circuit to get the nor circuit

- Quad NAND gate

### 5. PLA Example (2/3)

| Inputs | Inputs |   |   | Outputs |   |   |
|--------|--------|---|---|---------|---|---|
|        | A      | B | C | D       | E | F |
| 0      | 0      | 0 | 0 | 0       | 0 | 0 |
| 0      | 0      | 0 | 1 | 0       | 0 | 0 |
| 0      | 0      | 1 | 0 | 0       | 0 | 0 |
| 0      | 1      | 0 | 0 | 0       | 0 | 0 |
| 0      | 1      | 0 | 1 | 0       | 1 | 0 |
| 1      | 0      | 0 | 0 | 1       | 0 | 0 |
| 1      | 0      | 0 | 1 | 1       | 0 | 0 |
| 1      | 1      | 0 | 0 | 1       | 1 | 0 |
| 1      | 1      | 0 | 1 | 1       | 1 | 0 |
| 1      | 1      | 1 | 0 | 1       | 1 | 1 |
| 1      | 1      | 1 | 1 | 1       | 1 | 1 |



- Aim to minimize number of literals

### 2. Algebraic Simplification (2/3)

- Example 1: Simplify  $(x+y) \cdot (x+y') \cdot (x+z)$

$$\begin{aligned}
 & (x+y) \cdot (x+y') \cdot (x+z) \\
 &= (x \cdot x + x \cdot y' + y \cdot x + y \cdot y') \cdot (x+z) \quad (\text{distributivity}) \\
 &= (x + y \cdot y') \cdot (x+z) \quad (\text{idempotency}) \\
 &= (x + x \cdot (y \cdot y')) \cdot (x+z) \quad (\text{distributivity}) \\
 &= (x + x \cdot 0) \cdot (x+z) \quad (\text{complement}) \\
 &= (x \cdot x) \cdot (x+z) \quad (\text{identity}) \\
 &= x \cdot (x \cdot z) \quad (\text{idempotency}) \\
 &= x \cdot x' \cdot z \quad (\text{distributivity}) \\
 &= 0 \cdot z \quad (\text{complement}) \\
 &= 0
 \end{aligned}$$

## 2. Algebraic Simplification (3/3)

- Example 2: Find the simplified SOP expression of  $F(a,b,c,d) = a'b'c + ab'd + a'b'c' + cd + b'd'$

$$\begin{aligned}
 & a'b'c + \cancel{ab'd} + a'b'c' + cd + \cancel{b'd'} \\
 & = a'b'c + \cancel{ab} + \cancel{a'b'c'} + cd + \cancel{b'd'} \quad (\text{absorption 2}) \\
 & = \cancel{a'b'c} + \cancel{ab} + \cancel{b'c'} + cd + \cancel{b'd'} \quad (\text{absorption 2}) \\
 & = a + b + \cancel{cd} + \cancel{b'c'} \quad (\text{distributivity}) \\
 & = a + b + \cancel{cd} + b(c'd') \quad (\text{DeMorgan's}) \\
 & = a + b + \cancel{cd} + b \quad (\text{absorption 2}) \\
 & = a + b + \cancel{cd} \quad (\text{absorption 1})
 \end{aligned}$$

Number of literals reduced from 13 to 3.

## 3. Half Adder (1/2)

- Half adder** is a circuit that adds 2 single bits ( $X, Y$ ) to produce a result of 2 bits ( $C, S$ ).
- The black-box representation and truth table for half adder are shown below.



## 3. Half Adder (2/2)

- In canonical form (sum-of-minterms):
  - $C = XY$
  - $S = X'Y + X'Y'$

- Output S can be simplified further (though no longer in SOP form):
  - $S = X'Y + X'Y' = X \oplus Y$

- Implementation of a half adder



## Gray Code

- Unweighted, not arithmetic code
- Only a single bit change from one code value to the next
- Not restricted to decimal nbit =  $2^n$  val
- Good for error detection
- Reflected binary code

## 4. Gray Code (1/3)

- Unweighted (not an arithmetic code)
  - Only a single bit change from one code value to the next.
  - Not restricted to decimal digits:  $n$  bits  $\rightarrow 2^n$  values.
  - Good for error detection.
  - Named after Frank Gray; also called reflected binary code.

| Decimal | Binary | Gray Code | Decimal | Binary | Gray code |
|---------|--------|-----------|---------|--------|-----------|
| 0       | 0000   | 0000      | 8       | 1000   | 1000      |
| 1       | 0001   | 0001      | 9       | 1001   | 1001      |
| 2       | 0010   | 0011      | 10      | 1010   | 1011      |
| 3       | 0011   | 0010      | 11      | 1011   | 1100      |
| 4       | 0100   | 0110      | 12      | 1100   | 1101      |
| 5       | 0101   | 0111      | 13      | 1101   | 1011      |
| 6       | 0110   | 0101      | 14      | 1110   | 1001      |
| 7       | 0111   | 0100      | 15      | 1111   | 1000      |

- There are many gray code sequences

## 4. Gray Code (2/3)

- There are many Gray code sequences.
- Example: For 3 bits, here are some possible Gray code sequences:
  - These are NOT Gray codes (why?):
 

|     |     |     |     |     |     |
|-----|-----|-----|-----|-----|-----|
| 000 | 000 | 110 | 000 | 010 | 010 |
| 001 | 010 | 111 | 001 | 110 | 011 |
| 011 | 110 | 101 | 010 | 101 | 111 |
| 010 | 111 | 100 | 011 | 001 | 110 |
| 110 | 011 | 000 | 100 | 100 | 111 |
| 001 | 001 | 001 | 101 | 111 | 101 |
| 101 | 101 | 011 | 110 | 000 | 001 |
| 100 | 100 | 010 | 111 | 011 | 000 |

This is the standard Gray code.

## 4. Gray Code (3/3)

- Generating a 4-bit standard Gray code sequence:
 

|         |         |
|---------|---------|
| 0 0 0 0 | 1 1 0 0 |
| 0 0 0 1 | 1 1 0 1 |
| 0 0 1 1 | 1 1 1 1 |
| 0 0 1 0 | 1 1 1 0 |
| 0 1 1 0 | 1 0 1 0 |
| 0 1 1 1 | 1 0 1 1 |
| 0 1 0 1 | 1 0 0 1 |
| 0 1 0 0 | 1 0 0 0 |

- First 2 are same as binary, then draw a line and mirror it. Change next msb to 1. then do again after 4 numbers, then 8 numbers, then 16

## K-maps

- Systematic method to find simplified sum-of-products (SOP)
- Limited to 5/6 variables
  - Karnaugh-map (K-map) is an abstract form of Venn diagram, organised as a matrix of squares, where
    - Each square represents a minterm
    - Two adjacent squares represent minterms that differ by exactly one literal

## 5.1 2-Variable K-maps (1/3)

- Let the 2 variables be  $a$  and  $b$ . The K-map can be drawn as...
  - Alternative layouts of a 2-variable (a, b) K-map:



## 5.1 2-Variable K-maps (2/3)

- Alternative labelling of a 2-variable (a, b) K-map:





## 5.1 2-Variable K-maps (3/3)

- The K-map for a function is filled by putting
  - A '1' in the square corresponds to a minterm of the function
  - A '0' otherwise
- Example: Half adder.



## 5.1 3-Variable K-maps (1/2)

- As there are 8 minterms for 3 variables, so there are 8 squares in a 3-variable K-map.

- Example: Let the variables be  $a, b, c$ .



Above arrangement ensures that minterms of adjacent cells differ by only **ONE literal**. (Other arrangements which satisfy this criterion may also be used.)

## 5.1 3-Variable K-maps (2/2)

- There is **wrap-around** in the K-map:
  - $a'b'c'$  ( $m0$ ) is adjacent to  $a'b'c$  ( $m2$ )
  - $a'b'c'$  ( $m4$ ) is adjacent to  $a'b'c$  ( $m6$ )



Each cell in a 3-variable K-map has 3 adjacent neighbours.  
For example,  $m0$  has 3 adjacent neighbours:  $m1, m2$  and  $m4$ .

In general, each cell in an  $n$ -variable K-map has  $n$  adjacent neighbours.

### Quick Review Questions #1

- DLD page 106, questions 5-1 to 5-2.
- 5-1. The K-map of a 3-variable function  $F$  is shown below. What is the sum-of-minterms expression of  $F$ ?



- 5-2. Draw the K-map for this function  $A$ :  $A(x, y, z) = x'y + y'z + x'y'z$

$$A(x, y, z) = x'y + y'z + x'y'z$$

## 5.1 4-Variable K-maps (2/2)

- There are 2 wrap-arounds.
- Every cell has 4 neighbours.
- Example: The cell corresponding to minterm  $m0$  has neighbours  $m1, m2, m4$  and  $m8$ .



## 5.1 5-Variable K-maps (2/2)

- Organised as two 4-variable K-maps. One for  $v'$  and the other for  $v$ .



## 5.2 How to Use K-maps (1/7)

- Based on the Unifying Theorem (complement law):  
 $\underline{\mathbf{A + A' = 1}}$
- In a K-map, each cell containing a '1' corresponds to a minterm of a given function  $F$  where the output is 1.
- Each valid grouping of adjacent cells containing '1' then corresponds to a simpler product term of  $F$ .
  - A group must have size in powers of two: 1, 2, 4, 8, ...
  - Grouping 2 adjacent cells eliminates 1 variable from the product term; grouping 4 cells eliminates 2 variables; grouping 8 cells eliminates 3 variables, and so on. In general, grouping  $2^n$  cells eliminates  $n$  variables.

## 5.2 How to Use K-maps (4/7)

- Each group of adjacent minterms corresponds to a possible product term of the given function.

Here, there are 2 groups of minterms, A and B:

$$\begin{aligned} \mathbf{A} &= w'x'y'z' + w'x'y'z = w'x'y'(z' + z) = \underline{\mathbf{w'x'y'}} \\ \mathbf{B} &= w'x'y'z' + w'x'y'z + w'x'y'z' + w'x'y'z \\ &= w'x'y(z' + z) + w'x'y(z' + z) \\ &= w'x'y + w'x'y \\ &= w(x' + x)y \\ &= \underline{\mathbf{w.y}} \end{aligned}$$



## 5.2 How to Use K-maps (6/7)

- The larger the group (the more minterms it contains), the fewer is the number of literals in the associated product term.
  - Recall that a group must have size in powers of two.
  - Example: For a 4-variable K-map with variables w, x, y, z
 

|          |                                                    |
|----------|----------------------------------------------------|
| 1 cell   | = 4 literals. Examples: $w'x'y'z$ , $w'x'y'z'$     |
| 2 cells  | = 3 literals. Examples: $w'x'y$ , $w'y'z$          |
| 4 cells  | = 2 literals. Examples: $w'x$ , $x'y$              |
| 8 cells  | = 1 literal. Examples: $w$ , $y$ , $z$             |
| 16 cells | = no literal (i.e. logical constant 1). Example: 1 |

## 5.2 How to Use K-maps (7/7)

- Examples of valid and invalid groupings.



## 5.3 Converting to Minterms Form (2/2)

- Example:
 
$$\begin{aligned} F(A,B,C,D) &= A.(C+D).(B'+D') + C.(B+C'+A'.D) \\ &= A.(C'.D').(B'+D') + B.C + C.C' + A'.C.D \\ &\checkmark= \underline{A.B'.C'.D'} + \underline{A.C'.D'} + \underline{B.C} + \underline{A'.C.D} \end{aligned}$$

Expanding it to sum of minterms (unnecessary):

$$\begin{aligned} &A.B'.C'.D' + \underline{A.C'.D'} + B.C + A'.C.D \\ &= A.B'.C'.D' + A.C'.D'.(B+B') + \underline{B.C} + A'.C.D \\ &= A.B'.C'.D' + A.B.C'.D' + A.B'.C'.D' + B.C.(A+A') \\ &\quad + A'.C.D \\ &= A.B'.C'.D' + A.B.C'.D' + A.B.C + A'.B.C + \\ &\quad A'.C.D \\ &= A.B'.C'.D' + A.B.C'.D' + A.B.C.(D+D') + \\ &\quad A'.B.C.(D+D') + A'.C.D.(B+B') \\ &= A.B'.C'.D' + A.B.C'.D' + A.B.C.D + A.B.C.D' + \\ &\quad A'.B.C.D + A'.B.C.D' + A'.B'.C.D \end{aligned}$$



## 5.2 How to Use K-maps (2/7)

- Group as many cells as possible
  - The larger the group, the fewer the number of literals in the resulting product term.
- Select as few groups as possible to cover all the cells (minterms) of the function
  - The fewer the groups, the fewer is the number of product terms in the simplified SOP expression.

## 5.2 How to Use K-maps (5/7)

- Each product term that corresponds to a group,  $w'x'y'$  and  $w.y$ , represents the sum of minterms in that group.
- Boolean expression is therefore the sum of product terms (SOP) that represent all groups of the minterms of the function:
 
$$F(w,x,y,z) = \text{group A} + \text{group B} = \underline{\mathbf{w'x'y'}} + \underline{\mathbf{w.y}}$$

## 5.4 PIs and EPIS (1/3)

- To find the simplest (minimal) SOP expression from a K-map, you need to obtain:
  - Minimum number of literals per product term; and
  - Minimum number of product terms.
- Achieved through K-map using
  - Bigger groupings of minterms (prime implicants) where possible; and
  - No redundant groupings (look for essential prime implicants)
- Implicant: a product term that could be used to cover minterms of the function.

## 5.4 PIs and EPIS (2/3)

- Prime implicant (PI): a product term obtained by combining the *maximum possible number of minterms* from adjacent squares in the map. (That is, it is the biggest grouping possible.)
- Always look for prime implicants in a K-map.



## 5.4 PIs and EPIS (3/3)

- No redundant groups:



- Essential prime implicant (EPI): a prime implicant that includes at least one minterm that is not covered by any other prime implicant.

## Quick Review Questions #2

- DLD page 106, question 5-3.
- 5-3. Identify the prime implicants and essential prime implicants of the two K-maps below.



Finding simplified SOP exp

## 5.5 Finding Simplified SOP Expression (1/4)

- Algorithm

- Circle all prime implicants on the K-map.
- Identify and select all essential prime implicants for the cover.
- Select a minimum subset of the remaining prime implicants to complete the cover, that is, to cover those minterms not covered by the essential prime implicants.

## 5.5 Finding Simplified SOP Expression (3/4)



- DLD pages 106-107, questions 5-4 to 5-7.

- 5-4. Find the minimal SOP expression for G(A,B,C,D).

Prime implicants?

B·D, A'·B·C', A·C'·D, A·B·C, A'·C·D  
Essential prime implicants?  
All, except B·D, are essential.

$$G = A' \cdot B \cdot C' + A \cdot C' \cdot D + A \cdot B \cdot C + A' \cdot C \cdot D$$



Simplified POS Expression

## 5.6 Finding Simplified POS Expression (2/2)



This gives the SOP of  $F$  to be  
 $F = B'D' + AB$

To get POS of  $F$ , we have

$$\begin{aligned} F &= (B'D')' \cdot (AB)' \\ &= (B'D')' \cdot (A'B')' \quad (\text{DeMorgan}) \\ &= (B'D') \cdot (A'B')' \quad (\text{DeMorgan}) \end{aligned}$$

To go from SOP to POS, take the complement

## 5.7 Don't-Care Conditions (1/5)

- In certain problems, some outputs are not specified or are invalid. Hence, these outputs can be either '1' or '0'.
- They are called **don't-care conditions**, denoted by X (or d).
- Example: A circuit takes in a 3-bit value ABC and outputs 2-bit value FG which is the sum of the input bits. It is also known that inputs 000 and 111 never occur.

| A | B | C | F | G |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 |

| A | B | C | F | G |
|---|---|---|---|---|
| 0 | 0 | 0 | X | X |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | X | X |

## 5.7 Don't-Care Conditions (3/5)

### Comparison

- Without don't-cares:

$$\begin{aligned} F(A,B,C) &= \Sigma m(3, 5, 6, 7) \\ G(A,B,C) &= \Sigma m(1, 2, 4) \end{aligned}$$



$$F = A'C + AB + BC$$

$$G = A'B'C' + A'B'C + AB'C + A'B'C'$$

### With don't-cares:

$$\begin{aligned} F(A,B,C) &= \Sigma m(3, 5, 6) + \Sigma d(0, 7) \\ G(A,B,C) &= \Sigma m(1, 2, 4) + \Sigma d(0, 7) \end{aligned}$$



$$F = A'C + AB + BC$$

$$G = B'C' + A'B' + A'C'$$

## 5.7 Don't-Care Conditions (4/5)

- Suppose you are given the truth table for a function  $F(K,L,M,N)$  as follows:

- You are also told that the inputs K, L, M, N are taken from the outputs of two half adders as shown:



Then you may revise the truth table:

## 5.7 Don't-Care Conditions (5/5)

### K-map of $F$ :



Pls:  $K'M$ ,  $L'M$ , and  $K'M'N'$   
 (Note:  $K'L$  and  $MN$  not considered Pls as they consist of only X's.)

EPIs:  $K'M$  and  $K'M'N'$



Do we need to have an additional term  $A'B'$  to cover the 2 remaining X's?  
 No, because all the 1's (minterms) have been covered.

## 6. More Examples (2/6)

### Example #2:

$$F(A,B,C,D) = A \cdot B \cdot C + B' \cdot C \cdot D' + A \cdot D + B' \cdot C' \cdot D'$$



$A \cdot D$ ,  $A \cdot C$  and  $B' \cdot D'$  are EPIs, and they cover all the minterms.  
 So the answer is:  $F(A,B,C,D) = A \cdot D + A \cdot C + B' \cdot D'$

## 6. More Examples (4/6)

### Example #3 (with don't-cares):

$$F(A,B,C,D) = \Sigma m(2,8,10,15) + \Sigma d(0,1,3,7)$$



Answer:  $F(A,B,C,D) = B'D' + B \cdot C \cdot D$

## 6. More Examples (5/6)

- Find the simplest POS expression for example #2:  
 $F(A,B,C,D) = A \cdot B \cdot C + B' \cdot C \cdot D' + A \cdot D + B' \cdot C' \cdot D'$
- Draw the K-map of the complement of F, that is,  $F'$ .



From K-map,  
 $F' = A' \cdot B + A' \cdot D + B \cdot C' \cdot D'$   
Using DeMorgan's theorem,  
 $F = (A' \cdot B + A' \cdot D + B \cdot C' \cdot D')'$   
 $= (A+B')(A+D')(B'+C+D)$

## 6. More Examples (6/6)

- Find the simplest POS expression for example #3:  
 $F(A,B,C,D) = \Sigma m(2,8,10,15) + \Sigma d(0,1,3,7)$
- Draw the K-map of the complement of F, that is,  $F'$ .  
 $F'(A,B,C,D) = \Sigma m(4,5,6,9,11,12,13,14) + \Sigma d(0,1,3,7)$



From K-map,  
 $F' = B \cdot C' + B \cdot D' + B' \cdot D$   
Using DeMorgan's theorem,  
 $F = (B \cdot C' + B \cdot D' + B' \cdot D)'$   
 $= (B'+C)(B'+D)(B+D')$

+

# Week 8

Tuesday, October 17, 2023 2:27 AM

## 2. Analysis Procedure

- Given a combinational circuit, how do you analyze its function?

What is this circuit?



- Steps:
  - Label the inputs and outputs.
  - Observe the values of intermediate points and the outputs.
  - Draw the truth table.
  - Deduce the functionality of the circuit  $\Rightarrow$  Half adder.

## 3. Design Methods

- Different combinational circuit design methods:
  - Gate-level design method (with logic gates)
  - Block-level design method (with functional blocks)
- Design methods make use of logic gates and useful function blocks
  - These are available as Integrated Circuit (IC) chips.
  - Types of IC chips (based on packing density): SSI, MSI, LSI, VLSI, ULSI.
- Main objectives of circuit design:
  - Reduce cost (number of gates for small circuits; number of IC packages for complex circuits)
  - Increase speed
  - Design simplicity (re-use blocks where possible)

## 4. Gate-Level (SSI) Design: Half Adder (2/2)

- Obtain simplified Boolean functions.

Example:  $C = X \cdot Y$   
 $S = X' \cdot Y + X \cdot Y' = X \oplus Y$

| X | Y | C | S |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |

- Draw the logic diagram.



## 4. Gate-Level (SSI) Design: Full Adder (1/5)

- Half adder adds up only two bits.
- To add two binary numbers, we need to add 3 bits (including the carry).
  - Example:

$$\begin{array}{r} 1 & 1 & 1 \\ 0 & 0 & 1 \\ + & 0 & 1 \\ \hline 1 & 0 & 1 \end{array} \quad \begin{array}{l} \downarrow \\ \text{carry} \end{array}$$

$$\begin{array}{r} X \\ Y \\ \hline S \end{array}$$

## 4. Gate-Level (SSI) Design: Full Adder (2/5)

- Truth table:

| X | Y | Z | C | S |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |

Note:

Z - carry in (to the current position)  
 C - carry out (to the next position)



$$\Sigma \text{ Full Adder}$$

$$(X + Y + Z)$$

- Using K-map, simplified SOP form:

$$C = X \cdot Y + X \cdot Z + Y \cdot Z$$

$$S = X' \cdot Y' \cdot Z + X' \cdot Y \cdot Z' + X \cdot Y' \cdot Z' + X \cdot Y \cdot Z$$



## 4. Gate-Level (SSI) Design: Full Adder (3/5)

- Alternative formulae using algebraic manipulation:

$$\begin{aligned}
 C &= X \cdot Y + X \cdot Z + Y \cdot Z \\
 &= X \cdot Y + (X + Y) \cdot Z \\
 &= X \cdot Y + ((X \oplus Y) + X \cdot Y) \cdot Z \\
 &= X \cdot Y + (X \oplus Y) \cdot Z + X \cdot Y \cdot Z \\
 &= X \cdot Y + (X \oplus Y) \cdot Z \\
 X \oplus Y &= (X \oplus Y) + X \cdot Y \\
 X \oplus Y &= X' \cdot Y + X \cdot Y' \\
 (X \oplus Y) + X \cdot Y &= X' \cdot Y + X \cdot Y' + X \cdot Y \\
 &= X' \cdot Y + X \cdot Y' + X \cdot Y \\
 &= X' \cdot Y + X \cdot (Y' + Y) \\
 S &= X' \cdot Y \cdot Z + X' \cdot Y \cdot Z' + X \cdot Y \cdot Z + X \cdot Y \cdot Z' \\
 &= X' \cdot (Y \cdot Z + Y \cdot Z') + X \cdot (Y \cdot Z + Y \cdot Z') \\
 &= X' \cdot (Y \oplus Z) + X \cdot (Y \oplus Z)' \\
 &= X \oplus (Y \oplus Z)
 \end{aligned}$$

## 4. Gate-Level (SSI) Design: Full Adder (4/5)

- Circuit for above formulae:

$$C = X \cdot Y + (X \oplus Y) \cdot Z$$

$$S = X \oplus (Y \oplus Z) = (X \oplus Y) \oplus Z \text{ (XOR is associative)}$$



- Full adder made from 2 half adders + or gate

## 4. BCD to Excess-3 Code Converter (1/3)

| Digit | BCD code | Excess-3 code |
|-------|----------|---------------|
| 0     | 0000     | 0011          |
| 1     | 0001     | 0100          |
| 2     | 0010     | 0101          |
| 3     | 0011     | 0110          |
| 4     | 0100     | 0111          |
| 5     | 0101     | 1000          |
| 6     | 0110     | 1001          |
| 7     | 0111     | 1010          |
| 8     | 1000     | 1011          |
| 9     | 1001     | 1100          |



## 4. BCD to Excess-3 Code Converter (2/3)

- Truth table:

| BCD      | Excess-3 |   |   |   |   |   |   |   |
|----------|----------|---|---|---|---|---|---|---|
|          | A        | B | C | D | W | X | Y | Z |
| 0 0 0 0  | 0        | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 0 0 0  | 1        | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 0 0 1  | 0        | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 3 0 0 1  | 1        | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| 4 0 1 0  | 0        | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| 5 0 1 0  | 0        | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 6 0 1 1  | 0        | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 7 0 1 1  | 1        | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 8 1 0 0  | 0        | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| 9 1 0 0  | 1        | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| 10 1 0 1 | 0        | 1 | 0 | 1 | 0 | X | X | X |
| 11 1 0 1 | 1        | 0 | 1 | 1 | 1 | X | X | X |
| 12 1 1 0 | 1        | 1 | 0 | 0 | 0 | X | X | X |
| 13 1 1 0 | 1        | 1 | 0 | 1 | 1 | X | X | X |
| 14 1 1 1 | 0        | 1 | 1 | 1 | 0 | X | X | X |
| 15 1 1 1 | 1        | 1 | 1 | 1 | 1 | X | X | X |

- K-maps:



#### 4. BCD to Excess-3 Code Converter (3/3)



### 5. Block-Level Design

- More complex circuits can also be built using **block-level** method.
- In general, block-level design method (as opposed to gate-level design) relies on algorithms or formulae of the circuit, which are obtained by decomposing the main problem to sub-problems recursively (until small enough to be directly solved by blocks of circuits).
- First example shows how to create a 4-bit parallel adder using block-level design.
- Using **4-bit parallel adders** as building blocks, we can create the following:
  - BCD-to-Excess-3 Code Converter
  - 16-bit Parallel Adder

#### 5. 4-bit Parallel Adder (1/4)

- Consider a circuit to add two 4-bit numbers together and a carry-in, to produce a 5-bit result.



- 5-bit result is sufficient because the largest result is:

$$1111_2 + 1111_2 + 1_2 = 11111_2$$

#### 5. 4-bit Parallel Adder (4/4)

- Cascading 4 full adders via their carries, we get:



- Note that carry is propagated by cascading the carry from one full adder to the next.
- Called **Parallel Adder** because inputs are presented simultaneously (in parallel). Also called **Ripple-Carry Adder**.

## 5. BCD to Excess-3 Converter: Revisit (1/2)

- Excess-3 code can be converted from BCD code using truth table:
- Gate-level design can be used since only 4 inputs.
- However, alternative design is possible.
- Use problem-specific formula:  
Excess-3 code  
= BCD Code + 0011<sub>2</sub>

| BCD |   |   |   | Excess-3 |   |   |   |
|-----|---|---|---|----------|---|---|---|
| A   | B | C | D | W        | X | Y | Z |
| 0   | 0 | 0 | 0 | 0        | 0 | 0 | 1 |
| 1   | 0 | 0 | 0 | 0        | 1 | 0 | 0 |
| 2   | 0 | 0 | 0 | 1        | 0 | 1 | 0 |
| 3   | 0 | 0 | 0 | 1        | 0 | 1 | 1 |
| 4   | 0 | 1 | 0 | 0        | 0 | 1 | 1 |
| 5   | 0 | 1 | 0 | 1        | 1 | 0 | 0 |
| 6   | 0 | 1 | 0 | 1        | 1 | 0 | 1 |
| 7   | 0 | 1 | 0 | 1        | 1 | 1 | 0 |
| 8   | 0 | 1 | 0 | 0        | 0 | 0 | 1 |
| 9   | 0 | 1 | 0 | 0        | 1 | 0 | 1 |
| 10  | 1 | 0 | 0 | 1        | 0 | 1 | 0 |
| 11  | 1 | 0 | 0 | 1        | 1 | 0 | 0 |
| 12  | 1 | 0 | 0 | 0        | 0 | 0 | 0 |
| 13  | 1 | 0 | 0 | 0        | 1 | 0 | 0 |
| 14  | 1 | 0 | 0 | 1        | 0 | 0 | 0 |
| 15  | 1 | 0 | 0 | 1        | 1 | 0 | 0 |

## 5. BCD to Excess-3 Converter: Revisit (2/2)

- Block-level circuit:



Note: In the lab, input 0 (low) is connected to GND, 1 (high) to Vcc.

## 5. 16-bit Parallel Adder

- Larger parallel adders can be built from smaller ones.
- Example: A 16-bit parallel adder can be constructed from four 4-bit parallel adders:



## 7. Example: 6-Person Voting System

- Application: 6-person voting system.
- Use FAs and a 4-bit parallel adder.
- Each FA can sum up to 3 votes.



## 8. Magnitude Comparator (1/4)

- Magnitude comparator: compares 2 unsigned values A and B, to check if A>B, A=B, or A<B.
- To design an n-bit magnitude comparator using classical method, it would require  $2^{2n}$  rows in truth table!
- We shall exploit regularity in our design.
- Question: How do we compare two 4-bit unsigned values A (a<sub>3</sub>a<sub>2</sub>a<sub>1</sub>a<sub>0</sub>) and B (b<sub>3</sub>b<sub>2</sub>b<sub>1</sub>b<sub>0</sub>)?  
If (a<sub>3</sub> > b<sub>3</sub>) then A > B  
If (a<sub>3</sub> < b<sub>3</sub>) then A < B  
If (a<sub>3</sub> = b<sub>3</sub>) then if (a<sub>2</sub> > b<sub>2</sub>) ...

## 8. Magnitude Comparator (2/4)

Let  $A = A_3A_2A_1A_0$ ,  $B = B_3B_2B_1B_0$ ;  $x_i = A_iB_i + A'_iB'_i$



## 8. Magnitude Comparator (3/4)

- Block diagram of a 4-bit magnitude comparator



## 8. Magnitude Comparator (4/4)

- A function F accepts a 4-bit binary value ABCD, and returns 1 if  $3 \leq ABCD \leq 12$ , or 0 otherwise. How would you implement F using magnitude comparators and a suitable logic gate?



## 9. Circuit Delays (1/5)

- Given a logic gate with delay  $t$ . If inputs are stable at times  $t_1, t_2, \dots, t_n$ , then the earliest time in which the output will be stable is:

$$\max(t_1, t_2, \dots, t_n) + t$$



- To calculate the delays of all outputs of a combinational circuit, repeat above rule for all gates.

## 9. Circuit Delays (2/5)

- As a simple example, consider the full adder circuit where all inputs are available at time 0. Assume each gate has delay  $t$ .



## 9. Circuit Delays (4/5)

- Analyse the delay for the repeated block.



- Performing the delay calculation:



## 9. Circuit Delays (5/5)

- Calculating:

When  $i=1$ ,  $m=0$ ;  $S_1 = 2t$  and  $C_2 = 3t$   
When  $i=2$ ,  $m=3$ ;  $S_2 = 4t$  and  $C_3 = 5t$   
When  $i=3$ ,  $m=5$ ;  $S_3 = 6t$  and  $C_4 = 7t$   
When  $i=4$ ,  $m=7$ ;  $S_4 = 8t$  and  $C_5 = 9t$

- In general, an  $n$ -bit ripple-carry parallel adder will experience the following delay times:

$$S_n = ((n-1)2 + 2)t$$

$$C_{n+1} = ((n-1)2 + 3)t$$

- Propagation delay of ripple-carry parallel adders is proportional to the number of bits it handles.



# Week 9

Sunday, October 22, 2023 6:56 PM

## 1. Introduction (1/2)

- An integrated circuit (referred to as an **IC**, a **chip** or a **microchip**) is a set of electronic circuits on one small flat piece (or 'chip') of semiconductor material.
- Scale of integration: the number of components fitted into a standard size IC

| Name | Signification                 | Year | #transistors | #logic gates  |
|------|-------------------------------|------|--------------|---------------|
| SSI  | Small-scale Integration       | 1964 | 1 to 10      | 1 to 12       |
| MSI  | Medium-scale Integration      | 1968 | 10 to 500    | 13 to 99      |
| LSI  | Large-scale Integration       | 1971 | 500 to 20000 | 100 to 9999   |
| VLSI | Very large-scale integration  | 1980 | 20k to 1m    | 10k to 99999  |
| ULSI | Ultra-large-scale integration | 1984 | 1m and more  | 100k and more |

## 2. Decoders (1/5)

- Codes are frequently used to represent entities, eg: your name is a code to denote yourself (an entity).
- These codes can be identified (or decoded) using a decoder. Given a code, identify the entity.
- Convert binary information from  $n$  input lines to (a maximum of)  $2^n$  output lines.
- Known as  $n$ -to- $m$ -line decoder, or simply  $n:m$  or  $n \times m$  decoder ( $m \leq 2^n$ ).
- May be used to generate  $2^n$  minterms of  $n$  input variables.

## 2. Decoders (2/5)

- Example: If codes 00, 01, 10, 11 are used to identify four light bulbs, we may use a 2-bit decoder.



- This is a **2x4 decoder** which selects an output line based on the 2-bit code supplied.

| X | Y | F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> |
|---|---|----------------|----------------|----------------|----------------|
| 0 | 0 | 1              | 0              | 0              | 0              |
| 0 | 1 | 0              | 1              | 0              | 0              |
| 1 | 0 | 0              | 0              | 1              | 0              |
| 1 | 1 | 0              | 0              | 0              | 1              |

For an  $n$ -bit code, a decoder could select up to  $2^n$  lines for output

## 2. Decoders: Implementing Functions (1/3)

- A Boolean function, in **sum-of-minterms** form  $\Rightarrow$ 
  - decoder to generate the minterms, and
  - an OR gate to form the sum.
- Any combinational circuit with  **$n$  inputs** and  **$m$  outputs** can be implemented with an  **$n:2^m$  decoder** with  **$m$  OR gates**.
- Good when circuit has many outputs, and each function is expressed with a few minterms.

## 2. Decoders with Enable (1/2)

- Decoders often come with an **enable control** signal, so that the device is only activated when the enable,  $E = 1$ .

- Truth table:

| E | X | Y | F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> |
|---|---|---|----------------|----------------|----------------|----------------|
| 1 | 0 | 0 | 1              | 0              | 0              | 0              |
| 1 | 0 | 1 | 0              | 1              | 0              | 0              |
| 1 | 1 | 0 | 0              | 0              | 1              | 0              |
| 1 | 1 | 1 | 0              | 0              | 0              | 1              |
| 0 | d | d | 0              | 0              | 0              | 0              |

- Circuit of a **2x4 decoder with enable**:



## 2. Decoders (3/5)

- From truth table, circuit for **2x4 decoder** is:

- Note: Each output is a minterm ( $X'Y'$ ,  $X'Y$  or  $X:Y$ ) of a 2-variable function

| X | Y | F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> |
|---|---|----------------|----------------|----------------|----------------|
| 0 | 0 | 1              | 0              | 0              | 0              |
| 0 | 1 | 0              | 1              | 0              | 0              |
| 1 | 0 | 0              | 0              | 1              | 0              |
| 1 | 1 | 0              | 0              | 0              | 1              |



## 2. Decoders (4/5)

- Design a **3x8 decoder**.

| x | y | z | F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> | F <sub>4</sub> | F <sub>5</sub> | F <sub>6</sub> | F <sub>7</sub> |
|---|---|---|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| 0 | 0 | 0 | 1              | 0              | 0              | 0              | 0              | 0              | 0              | 0              |
| 0 | 0 | 1 | 0              | 1              | 0              | 0              | 0              | 0              | 0              | 0              |
| 0 | 1 | 0 | 0              | 0              | 1              | 0              | 0              | 0              | 0              | 0              |
| 0 | 1 | 1 | 0              | 0              | 0              | 1              | 0              | 0              | 0              | 0              |
| 1 | 0 | 0 | 0              | 0              | 0              | 0              | 1              | 0              | 0              | 0              |
| 1 | 0 | 1 | 0              | 0              | 0              | 0              | 0              | 1              | 0              | 0              |
| 1 | 1 | 0 | 0              | 0              | 0              | 0              | 0              | 0              | 1              | 0              |
| 1 | 1 | 1 | 0              | 0              | 0              | 0              | 0              | 0              | 0              | 1              |



## 2. Decoders: Implementing Functions (3/3)

$$S(x, y, z) = \sum m(1, 2, 4, 7)$$

$$C(x, y, z) = \sum m(3, 5, 6, 7)$$



The numbers 0,1,2 ... 7 come from ordering in truth table  
They are the minterm number

## 2. Decoders with Enable (2/2)

- In the previous slide, the decoder has a **one-enable** control signal, i.e. the decoder is enabled with  $E=1$ .
- In most MSI decoders, enable signal is **zero-enable**, usually denoted by  $E'$  or  $\bar{E}$ . The decoder is enabled when the signal is zero (low).

| E | X | Y | F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> |
|---|---|---|----------------|----------------|----------------|----------------|
| 1 | 0 | 0 | 1              | 0              | 0              | 0              |
| 1 | 0 | 1 | 0              | 1              | 0              | 0              |
| 1 | 1 | 0 | 0              | 0              | 1              | 0              |
| 1 | 1 | 1 | 0              | 0              | 0              | 1              |
| 0 | d | d | 0              | 0              | 0              | 0              |

Decoder with 1-enable

| E' | X | Y | F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> |
|----|---|---|----------------|----------------|----------------|----------------|
| 0  | 0 | 0 | 1              | 0              | 0              | 0              |
| 0  | 0 | 1 | 0              | 1              | 0              | 0              |
| 0  | 1 | 0 | 0              | 0              | 1              | 0              |
| 0  | 1 | 1 | 0              | 0              | 0              | 1              |
| 1  | d | d | 0              | 0              | 0              | 0              |

Decoder with 0-enable

## 2. Constructing Larger Decoders (2/4)



## 2. Constructing Larger Decoders (3/4)

= Construct a 4x16 decoder from two 3x8 decoders with one-enable and an inverter.



\* Note: The input w and its complement w' are used to select either one of the two smaller decoders.

## 2. Standard MSI Decoder (1/2)

- 74138 (3-to-8 decoder)



## 2. Standard MSI Decoder (2/2)

| INPUTS         |                |   | OUTPUTS        |                |                |                |                |                |                |                |
|----------------|----------------|---|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| G <sub>1</sub> | G <sub>2</sub> | C | Y <sub>0</sub> | Y <sub>1</sub> | Y <sub>2</sub> | Y <sub>3</sub> | Y <sub>4</sub> | Y <sub>5</sub> | Y <sub>6</sub> | Y <sub>7</sub> |
| X              | X              | X | X              | H              | H              | H              | H              | H              | H              | H              |
| L              | X              | X | X              | H              | H              | H              | H              | H              | H              | H              |
| L              | L              | X | L              | H              | H              | H              | H              | H              | H              | H              |
| M              | L              | L | H              | L              | H              | H              | H              | H              | H              | H              |
| H              | L              | L | H              | H              | L              | H              | H              | H              | H              | H              |
| H              | L              | L | H              | H              | H              | L              | H              | H              | H              | H              |
| H              | L              | L | H              | H              | H              | H              | L              | H              | H              | H              |
| H              | L              | L | H              | H              | H              | H              | H              | L              | H              | H              |
| H              | L              | L | H              | H              | H              | H              | H              | H              | L              | H              |
| H              | L              | L | H              | H              | H              | H              | H              | H              | H              | L              |

74138 decoder module.  
(c) Function table.

74138 decoder module.  
(d) Generic symbol.  
(e) IEEE standard logic symbol.

Source: The Data Book Volume 2, Texas Instruments Inc., 1985

For CBA = 001,  
We get Y<sub>1</sub> = 0 and others = 1 which  
is opposite of previous ex.  
We are looking for the 0

## 2. Decoders: Implementing Functions Revisit (1/2)

- Example: Implement the following function using a 3x8 decoder and an appropriate logic gate

$$f(Q, X, P) = \sum m(0, 1, 4, 6, 7) = \prod M(2, 3, 5)$$

- We may implement the function in several ways:

- Using a decoder with active-high outputs with an OR gate:  
 $f(Q, X, P) = m_0 + m_1 + m_4 + m_6 + m_7$
- Using a decoder with active-low outputs with a NAND gate:  
 $f(Q, X, P) = (m_0' \cdot m_1' \cdot m_4' \cdot m_6' \cdot m_7')$
- Using a decoder with active-high outputs with a NOR gate:  
 $f(Q, X, P) = (m_2 + m_3 + m_5)' [ = M_2 \cdot M_3 \cdot M_5 ]$
- Using a decoder with active-low outputs with an AND gate:  
 $f(Q, X, P) = m_2' \cdot m_3' \cdot m_5'$

## 2. Decoders: Implementing Functions Revisit (2/2)

$$f(Q, X, P) = \sum m(0, 1, 4, 6, 7) = \prod M(2, 3, 5)$$



## ENCODERS

Converse of decoding

Given a set of input lines, of which exactly 1 is high and rest are low, encoder provides a code that corresponds to that high input line

Contains 2^n or fewer input lines and n output lines

Implemented with OR gates

### 3. Encoders (2/4)

- Truth table:
- With K-map, we obtain:
  - $D_0 = F_1 + F_3$
  - $D_1 = F_2 + F_3$
- Circuit:



Simple 4-to-2 encoder

| F <sub>0</sub> | F <sub>1</sub> | F <sub>2</sub> | F <sub>3</sub> | D <sub>0</sub> | D <sub>1</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|
| 1              | 0              | 0              | 0              | 0              | 0              |
| 0              | 1              | 0              | 0              | 0              | 1              |
| 0              | 0              | 1              | 0              | 1              | 0              |
| 0              | 0              | 0              | 1              | 1              | 1              |
| 0              | 0              | 0              | 0              | X              | X              |
| 0              | 0              | 1              | 1              | X              | X              |
| 0              | 1              | 0              | 1              | X              | X              |
| 0              | 1              | 1              | 0              | X              | X              |
| 0              | 1              | 1              | 1              | X              | X              |
| 1              | 0              | 0              | 1              | X              | X              |
| 1              | 0              | 1              | 0              | X              | X              |
| 1              | 0              | 1              | 1              | X              | X              |
| 1              | 1              | 0              | 0              | X              | X              |
| 1              | 1              | 0              | 1              | X              | X              |
| 1              | 1              | 1              | 0              | X              | X              |
| 1              | 1              | 1              | 1              | X              | X              |

invalid

### 3. Encoders (3/4)

- Example: 8-to-3 encoder.
  - At any one time, only one input line of an encoder has a value of 1 (high), the rest are zeroes (low).
  - To allow for more than one input line to carry a 1, we need priority encoder.

| Inputs         |                |                |                |                |                |                |                | Outputs |   |   |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|---------|---|---|
| D <sub>0</sub> | D <sub>1</sub> | D <sub>2</sub> | D <sub>3</sub> | D <sub>4</sub> | D <sub>5</sub> | D <sub>6</sub> | D <sub>7</sub> | x       | y | z |
| 1              | 0              | 0              | 0              | 0              | 0              | 0              | 0              | 0       | 0 | 0 |
| 0              | 1              | 0              | 0              | 0              | 0              | 0              | 0              | 0       | 0 | 1 |
| 0              | 0              | 1              | 0              | 0              | 0              | 0              | 0              | 0       | 1 | 0 |
| 0              | 0              | 0              | 1              | 0              | 0              | 0              | 0              | 0       | 1 | 1 |
| 0              | 0              | 0              | 0              | 1              | 0              | 0              | 0              | 1       | 0 | 0 |
| 0              | 0              | 0              | 0              | 0              | 1              | 0              | 0              | 1       | 0 | 1 |
| 0              | 0              | 0              | 0              | 0              | 0              | 1              | 0              | 1       | 1 | 0 |
| 0              | 0              | 0              | 0              | 0              | 0              | 0              | 1              | 1       | 1 | 1 |

- Example: 8-to-3 encoder.



An 8-to-3 encoder

### 3. Priority Encoders (1/2)

- A priority encoder is one with priority
  - If two or more inputs are equal to 1, the input with the highest priority takes precedence.
  - If all inputs are 0, this input combination is considered invalid.
- Example of a 4-to-2 priority encoder:

| Inputs         |                |                |                | Outputs |   |   |
|----------------|----------------|----------------|----------------|---------|---|---|
| D <sub>0</sub> | D <sub>1</sub> | D <sub>2</sub> | D <sub>3</sub> | f       | g | v |
| 0              | 0              | 0              | 0              | X       | X | 0 |
| 1              | 0              | 0              | 0              | 0       | 0 | 1 |
| X              | 1              | 0              | 0              | 0       | 1 | 1 |
| X              | X              | 1              | 0              | 1       | 0 | 1 |
| X              | X              | X              | 1              | 1       | 1 | 1 |

Represents 2 rows with don't care values

Multiplexors and demultiplexers

### 3. Priority Encoders (2/2)

- Understanding "compact" function table

| Inputs         |                |                |                | Outputs |   |   |
|----------------|----------------|----------------|----------------|---------|---|---|
| D <sub>0</sub> | D <sub>1</sub> | D <sub>2</sub> | D <sub>3</sub> | r       | g | v |
| 0              | 0              | 0              | 0              | X       | X | 0 |
| 1              | 0              | 0              | 0              | 0       | 0 | 1 |
| X              | 1              | 0              | 0              | 0       | 1 | 1 |
| X              | X              | 1              | 0              | 1       | 0 | 1 |
| X              | X              | X              | 1              | 1       | 1 | 1 |

  

| Inputs         |                |                |                | Outputs |   |   |
|----------------|----------------|----------------|----------------|---------|---|---|
| D <sub>0</sub> | D <sub>1</sub> | D <sub>2</sub> | D <sub>3</sub> | f       | g | v |
| 0              | 0              | 0              | 0              | X       | X | 0 |
| 1              | 0              | 0              | 0              | 0       | 0 | 1 |
| X              | 1              | 0              | 0              | 0       | 1 | 1 |
| X              | X              | 1              | 0              | 1       | 0 | 1 |
| X              | X              | X              | 1              | 1       | 1 | 1 |

↑ highest precedence

### Multiplexers and Demultiplexers

- An application:



- Helps share a single communication line among a number of devices.
- At any time, only one source and one destination can use the communication line.

### 4. Demultiplexers (1/2)

- Given an input line and a set of selection lines, a demultiplexer directs data from the input to one selected output line.
- Example: 1-to-4 demultiplexer.



| S <sub>1</sub> | S <sub>0</sub> | Y <sub>0</sub> | Y <sub>1</sub> | Y <sub>2</sub> | Y <sub>3</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|
| 0              | 0              | 0              | 0              | 0              | 0              |
| 0              | 1              | 0              | 0              | 0              | 0              |
| 1              | 0              | 0              | 0              | 0              | 0              |
| 1              | 1              | 1              | 0              | 0              | 0              |

- Helps share a single communication line among a number of devices.
- At any time, only one source and one destination can use the communication line.



## 4. Demultiplexers (2/2)

- It turns out that the demultiplexer circuit is actually identical to a decoder with enable.



+

## 5. Multiplexers (1/4)

- A multiplexer is a device that has
  - A number of input lines
  - A number of selection lines
  - One output line
- It steers one of  $2^n$  inputs to a single output line, using  $n$  selection lines. Also known as a **data selector**.



## 5. Multiplexers (3/4)

- Output of multiplexer is
 
$$\sum m_i (S_1 \cdot S_0')$$
- Example: Output of a 4-to-1 multiplexer is:

$$Y = I_0(S_1 \cdot S_0') + I_1(S_1 \cdot S_0) + I_2(S_1 \cdot S_0') + I_3(S_1 \cdot S_0)$$

Note:  
Expressing  
 $I_0(S_1 \cdot S_0') + I_1(S_1 \cdot S_0) + I_2(S_1 \cdot S_0') + I_3(S_1 \cdot S_0)$   
in minterms notation, it is equal to  
 $I_0m_0 + I_1m_1 + I_2m_2 + I_3m_3$   
This is useful later (eg: slide 45).

## 5. Multiplexers (2/4)

- Truth table for a 4-to-1 multiplexer:

| I <sub>0</sub> | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | S <sub>1</sub> | S <sub>0</sub> | Y              |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| 0              | 0              | 0              | 0              | 0              | 0              | I <sub>0</sub> |
| 0              | 0              | 0              | 0              | 0              | 1              | I <sub>1</sub> |
| 0              | 0              | 0              | 0              | 1              | 0              | I <sub>2</sub> |
| 0              | 0              | 0              | 0              | 1              | 1              | I <sub>3</sub> |



## 5. Multiplexer IC Package

- Some IC packages have a few multiplexers in each package (chip). The selection and enable inputs are common to all multiplexers within the package.



## 5. Multiplexers (4/4)

- A 2<sup>n</sup>-to-1-line multiplexer, or simply 2<sup>n</sup>:1 MUX, is made from an  $n$ :2<sup>n</sup> decoder by adding to it  $2^n$  input lines, one to each AND gate.

- A 4:1 multiplexer circuit:



## 5. Constructing Larger Multiplexers (2/4)



- When S<sub>2</sub>S<sub>1</sub>S<sub>0</sub> = 000
- When S<sub>2</sub>S<sub>1</sub>S<sub>0</sub> = 001
- When S<sub>2</sub>S<sub>1</sub>S<sub>0</sub> = 110

## 5. Constructing Larger Multiplexers (3/4)

- Another implementation of an 8-to-1 multiplexer using smaller multiplexers:



## 5. Constructing Larger Multiplexers (4/4)

- A 16-to-1 multiplexer can be constructed from five 4-to-1 multiplexers:



## 5. Standard MSI Multiplexer (1/2)



74151A 8-to-1 multiplexer. (a) Package configuration. (b) Function table.

## 5. Standard MSI Multiplexer (2/2)



74151A 8-to-1 multiplexer. (c) Logic diagram. (d) Generic logic symbol.  
(e) IEEE standard logic symbol.

Source: The TTL Data Book Volume 2, Texas Instruments Inc., 1985.

## 5. Multiplexers: Implementing Functions (1/3)

- Boolean functions can be implemented using multiplexers.
- A  $2^n$ -to-1 multiplexer can implement a Boolean function of  $n$  input variables, as follows:
  - Express in sum-of-minterms form.
  - Connect  $n$  variables to the  $n$  selection lines.
  - Put a '1' on a data line if it is a minterm of the function, or '0' otherwise.

## 5. Multiplexers: Implementing Functions (2/3)

$$F(A, B, C) = \sum m(1, 3, 5, 6)$$

This method works because:

$$\text{Output} = I_0 \cdot m_0 + I_1 \cdot m_1 + I_2 \cdot m_2 + I_3 \cdot m_3 + I_4 \cdot m_4 + I_5 \cdot m_5 + I_6 \cdot m_6 + I_7 \cdot m_7$$

Supplying '1' to  $I_1, I_3, I_5, I_6$ , and '0' to the rest:

$$\text{Output} = m_1 + m_3 + m_5 + m_6$$

ABC corresponds to a minterm

From slide 34 (4:1 mux)  
Expressing  
 $I_0(S_1 \cdot S_0) + I_1(S_1 \cdot S_0) + I_2(S_1 \cdot S_0) + I_3(S_1 \cdot S_0)$   
in minterms notation, it is equal to  
 $I_0 m_0 + I_1 m_1 + I_2 m_2 + I_3 m_3$

## 5. Multiplexers: Implementing Functions (3/3)

- Example: Use a 74151A to implement  $f(x_1, x_2, x_3) = \sum m(0, 2, 3, 5)$



## 5. Using Smaller Multiplexers (1/6)

- Earlier, we saw how a  $2^n$ -to-1 multiplexer can be used to implement a Boolean function of  $n$  (input) variables.
- However, we can use a single smaller  $2^{(n-1)}$ -to-1 multiplexer to implement a Boolean function of  $n$  (input) variables.
- Example: The function  $F(A, B, C) = \sum m(1, 3, 5, 6)$  can be implemented using a 4-to-1 multiplexer (rather than an 8-to-1 multiplexer).

4 to 1 multiplexer is the next smallest after 8 to 1

## 5. Using Smaller Multiplexers (3/6)

- Procedure

- Express Boolean function in sum-of-minterms form.  
Example:  $F(A, B, C) = \sum m(0, 1, 3, 6)$
- Reserve one variable (in our example, we take the least significant one) for input lines of multiplexer, and use the rest for selection lines.  
Example:  $C$  is for input lines;  $A$  and  $B$  for selection lines.

## 5. Using Smaller Multiplexers (2/6)

- Let's look at this example:

$$F(A, B, C) = \sum m(0, 1, 3, 6) = A' \cdot B' \cdot C' + A' \cdot B' \cdot C + A' \cdot B \cdot C + A \cdot B \cdot C'$$



- Note: Two of the variables,  $A$  and  $B$ , are applied as selection lines of the multiplexer, while the inputs of the multiplexer contain  $1$ ,  $C$ ,  $0$  and  $C'$ .

## 5. Using Smaller Multiplexers (4/6)

- Draw the truth table for function, by grouping inputs by selection line values, then determine multiplexer inputs by comparing input line ( $C$ ) and function ( $F$ ) for corresponding selection line values.

| A | B | C | F | MUX input |
|---|---|---|---|-----------|
| 0 | 0 | 0 | 1 | 1         |
| 0 | 0 | 1 | 1 |           |
| 0 | 1 | 0 | 0 | C         |
| 0 | 1 | 1 | 1 |           |
| 1 | 0 | 0 | 0 | 0         |
| 1 | 0 | 1 | 0 |           |
| 1 | 1 | 0 | 1 | C'        |
| 1 | 1 | 1 | 0 |           |



## 5. Using Smaller Multiplexers (5/6)

- Alternative: What if we use **A** for input lines, and **B, C** for selector lines?



do

## 5. Using Smaller Multiplexers (6/6)

- Example: Implement the function below with 74151A:

$$f(x_1, x_2, x_3, x_4) = \sum m(0, 1, 2, 3, 4, 9, 13, 14, 15)$$



**Sequential logic**  
Latches and Flip flops

- Logic Circuits
  - There are combinations (dependent on input only) and sequential logic circuits (dependent also on memory)



Multivibrator: a class of sequential circuits

- There are bistable (2 states), monostable, and astable

Bistable logic devices - latches and flip flops

## 2. Memory Elements (1/3)

- Memory element: a device which can remember value indefinitely, or change value on command from its inputs.



Characteristic table:

| Command (state) | Q(0) | Q(1) | Q(0) or Q(1) current state |
|-----------------|------|------|----------------------------|
| Set             | 0    | 1    | Q(1)                       |
| Reset           | 1    | 0    | Q(0)                       |
| Memorize        | 0    | 0    | Q(0)                       |
| No Change       | 1    | 1    | Q(1)                       |



## 2. Memory Elements (2/3)

- Memory element with clock.



- Clock is usually a square wave.



PC is edge triggered

## 2. Memory Elements (3/3)

- Two types of triggering/activation



- Pulse-triggered

- Latches

- ON = 1, OFF = 0

- Edge-triggered

- Flip-flops

- Positive edge-triggered (ON = from 0 to 1; OFF = other time)

- Negative edge-triggered (ON = from 1 to 0; OFF = other time)

Latches are only active when high state

LATCHES

### 3.1 S-R Latch (1/3)

- Two inputs: S and R.

- Two complementary outputs: Q and Q'.

- When Q = HIGH, we say latch is in SET state.

- When Q = LOW, we say latch is in RESET state.

- For active-high input S-R latch (also known as NOR gate latch)

- R = HIGH and S = LOW  $\rightarrow$  Q becomes LOW (RESET state)

- S = HIGH and R = LOW  $\rightarrow$  Q becomes HIGH (SET state)

- Both R and S are LOW  $\rightarrow$  No change in output Q

- Both R and S are HIGH  $\rightarrow$  Outputs Q and Q' are both LOW (invalid!)

- Drawback: invalid condition exists and must be avoided.



Block diagram:



### 3.1 S-R Latch (2/3)

- Active-high input S-R latch:



### 3.1 Gated S-R Latch

- S-R latch + enable input (EN) and 2 NAND gates  $\rightarrow$  a gated S-R latch.



- Outputs change (if necessary) only when EN is high.



### 3.2 Gated D Latch (1/2)

- Make input R equal to S  $\rightarrow$  gated D latch.

- D latch eliminates the undesirable condition of invalid state in the S-R latch.



### 3.2 Gated D Latch (2/2)

- When EN is high:
  - D = HIGH  $\rightarrow$  latch is SET
  - D = LOW  $\rightarrow$  latch is RESET

- Hence when EN is high, Q "follows" the D (data) input.

- Characteristic table:

| EN | D | Q(t+1)         |
|----|---|----------------|
| 0  | 0 | 0 Reset        |
| 1  | 1 | 1 Set          |
| 0  | X | Q(t) No change |
| 1  | 1 | indeterminate  |

When EN=1,  $Q(t+1) = ?$

FLIP FLOPS

#### 4. Flip-flops (1/2)

- Flip-flops are synchronous bistable devices.
- Output changes state at a specified point on a triggering input called the clock.
- Change state either at the positive (rising) edge, or at the negative (falling) edge of the clock signal.



#### 4. Flip-flops (2/2)

- S-R flip-flop, D flip-flop, and J-K flip-flop.
- Note the " $>$ " symbol at the clock input.



#### 4.1 S-R Flip-flop

- S-R flip-flop: On the triggering edge of the clock pulse,
  - $R = \text{HIGH}$  and  $S = \text{LOW} \rightarrow Q$  becomes LOW (RESET state)
  - $S = \text{HIGH}$  and  $R = \text{LOW} \rightarrow Q$  becomes HIGH (SET state)
  - Both  $R$  and  $S$  are LOW  $\rightarrow$  No change in output  $Q$
  - Both  $R$  and  $S$  are HIGH  $\rightarrow$  Invalid!

- Characteristic table of positive edge-triggered S-R flip-flop:

| S | R | CLK | Q(t+1) | Comments  |
|---|---|-----|--------|-----------|
| 0 | 0 | X   | Q(t)   | No change |
| 0 | 1 | +   | 0      | Reset     |
| 1 | 0 | +   | 1      | Set       |
| 1 | 1 | -   | -      | Invalid!  |

X = irrelevant ("don't care")  
+ = clock transition LOW to HIGH

#### 4.2 D Flip-flop (1/2)

- D flip-flop: Single input  $D$  (data). On the triggering edge of the clock pulse,

  - $D = \text{HIGH} \rightarrow Q$  becomes HIGH (SET state)
  - $D = \text{LOW} \rightarrow Q$  becomes LOW (RESET state)

Hence,  $Q$  "follows"  $D$  at the clock edge.

- Convert S-R flip-flop into a D flip-flop: add an inverter.



A positive edge-triggered D flip-flop formed with an S-R flip-flop.



#### 4.3 J-K Flip-flop (1/2)

- J-K flip-flop:  $Q$  and  $Q'$  are fed back to the pulse-steering NAND gates.
- No invalid state.
- Include a toggle state
  - $J = \text{HIGH}$  and  $K = \text{LOW} \rightarrow Q$  becomes HIGH (SET state)
  - $K = \text{HIGH}$  and  $J = \text{LOW} \rightarrow Q$  becomes LOW (RESET state)
  - Both  $J$  and  $K$  are LOW  $\rightarrow$  No change in output  $Q$
  - Both  $J$  and  $K$  are HIGH  $\rightarrow$  Toggle

#### 4.4 T Flip-flop

- T flip-flop: Single input version of the J-K flip-flop, formed by tying both inputs together.



- Characteristic table:

| T | CLK | Q(t+1) | Comments  |
|---|-----|--------|-----------|
| 0 | 0   | Q(t)   | No change |
| 1 | 0   | Q(t)   | Toggle    |
| 0 | 1   | 1      |           |
| 1 | 1   | 0      |           |

$$Q(t+1) = J \cdot Q' + K \cdot Q$$

#### 5. Asynchronous Inputs (1/2)

- S-R, D and J-K inputs are synchronous inputs, as data on these inputs are transferred to the flip-flop's output only on the triggered edge of the clock pulse.
- Asynchronous inputs affect the state of the flip-flop independent of the clock; example: preset (PRE) and clear (CLR) [direct set (SD) and direct reset (RD)].
- When  $PRE = \text{HIGH}$ ,  $Q$  is immediately set to HIGH.
- When  $CLR = \text{HIGH}$ ,  $Q$  is immediately cleared to LOW.
- Flip-flop in normal operation mode when both  $PRE$  and  $CLR$  are LOW.

#### 5. Asynchronous Inputs (2/2)

- A J-K flip-flop with active-low PRESET and CLEAR asynchronous inputs.



## 5. Asynchronous Inputs (1/2)

- S-R, D and J-K inputs are synchronous inputs, as data on these inputs are transferred to the flip-flop's output only on the triggered edge of the clock pulse.
- Asynchronous inputs affect the state of the flip-flop independent of the clock; example: preset (PRE) and clear (CLR) [or direct set (SD) and direct reset (RD)].
- When PRE=HIGH, Q is immediately set to HIGH.
- When CLR=HIGH, Q is immediately cleared to LOW.
- Flip-flop in normal operation mode when both PRE and CLR are LOW.

A J-K flip-flop with active-low PRESET and CLEAR asynchronous inputs.



Figure 1

## 6.2 Sequential Circuits: Analysis (1/7)

- Given a sequential circuit diagram, we can analyze its behaviour by deriving its state table and hence its state diagram.
- Requires state equations to be derived for the flip-flop inputs, as well as output functions for the circuit outputs other than the flip-flops (if any).
- We use  $A(t)$  and  $A(t+1)$  (or simply  $A$  and  $A^*$ ) to represent the present state and next state, respectively, of a flip-flop represented by  $A$ .

Figure 1

## 6.2 Sequential Circuits: Analysis (2/7)

- Example using D flip-flops



Figure 1

## 6.2 Sequential Circuits: Analysis (3/7)

- From the state equations and output function, we derive the state table, consisting of all possible binary combinations of present states and inputs.
- State table
  - Similar to truth table.
  - Inputs and present state on the left side.
  - Outputs and next state on the right side.
- $m$  flip-flops and  $n$  inputs  $\rightarrow 2^{m+n}$  rows.

## 6.2 Sequential Circuits: Analysis (4/7)

- State table** for circuit of Figure 1:

| State equations:               |       | Output function:       |        |
|--------------------------------|-------|------------------------|--------|
| $A^* = A \cdot x + B \cdot x'$ |       | $y = (A + B) \cdot x'$ |        |
| $B^* = A'$                     |       |                        |        |
| Present State                  | Input | Next State             | Output |
| $A$                            | $B$   | $x$                    | $A^*$  |
| 0 0                            | 0     | 0                      | 0      |
| 0 0                            | 1     | 0                      | 1      |
| 0 1                            | 0     | 0                      | 0      |
| 0 1                            | 1     | 1                      | 1      |
| 1 0                            | 0     | 0                      | 1      |
| 1 0                            | 1     | 1                      | 0      |
| 1 1                            | 0     | 0                      | 0      |
| 1 1                            | 1     | 1                      | 0      |

Figure 1

## 6.2 Sequential Circuits: Analysis (5/7)

- Alternative form of state table:

| Full table |   | Present State | Input | Next State | Output |
|------------|---|---------------|-------|------------|--------|
| A          | B | x             | $A^*$ | $B^*$      | y      |
| 0          | 0 | 0             | 0     | 0          | 0      |
| 0          | 0 | 1             | 0     | 1          | 1      |
| 0          | 1 | 0             | 0     | 0          | 0      |
| 0          | 1 | 1             | 1     | 1          | 1      |
| 1          | 0 | 0             | 1     | 0          | 1      |
| 1          | 0 | 1             | 1     | 0          | 0      |
| 1          | 1 | 0             | 0     | 0          | 0      |
| 1          | 1 | 1             | 1     | 0          | 1      |

  

| Compact table |  | Present State | Next State | Output |
|---------------|--|---------------|------------|--------|
| AB            |  | $x=0$         | $x=1$      | $x=0$  |
| 00            |  | 00            | 01         | 0 0    |
| 01            |  | 00            | 11         | 1 0    |
| 10            |  | 00            | 10         | 1 0    |
| 11            |  | 00            | 10         | 1 0    |

## 6.2 Sequential Circuits: Analysis (6/7)

- From the state table, we can draw the state diagram.
- State diagram
  - Each state is denoted by a circle.
  - Each arrow (between two circles) denotes a transition of the sequential circuit (a row in state table).
  - A label of the form  $a/b$  is attached to each arrow where  $a$  (if there is one) denotes the inputs while  $b$  (if there is one) denotes the outputs of the circuit in that transition.
- Each combination of the flip-flop values represents a state. Hence,  $m$  flip-flops  $\rightarrow$  up to  $2^m$  states.

## 6.2 Sequential Circuits: Analysis (7/7)

- State diagram** of the circuit of Figure 1:

| Present State |     | Next State | Output  |
|---------------|-----|------------|---------|
| $A$           | $B$ | $x=0$      | $x=1$   |
| 00            |     | $A'B^*$    | $A'B^*$ |
| 01            |     | 00         | 11      |
| 10            |     | 00         | 10      |
| 11            |     | 00         | 10      |

DONE!



## 6.2 Flip-flop Input Functions (1/3)

- The outputs of a sequential circuit are functions of the present states of the flip-flops and the inputs. These are described algebraically by the circuit output functions.
  - In Figure 1:  $y = (A + B) \cdot x'$
- The part of the circuit that generates inputs to the flip-flops are described algebraically by the flip-flop input functions (or flip-flop input equations).
- The flip-flop input functions determine the next state generation.
- From the flip-flop input functions and the characteristic tables of the flip-flops, we obtain the next states of the flip-flops.

## 6.2 Flip-flop Input Functions (2/3)

- Example: circuit with a JK flip-flop.
- We use 2 letters to denote each flip-flop input: the first letter denotes the input of the flip-flop ( $J$  or  $K$  for JK flip-flop,  $S$  or  $R$  for SR flip-flop,  $D$  for D flip-flop,  $T$  for T flip-flop) and the second letter denotes the name of the flip-flop.



QR

## 6.2 Analysis: Example #2 (3/3)

- Draw the state diagram from the state table.

| Present state | A | B | Input | x | Next state | A' | B' | Flip-flop inputs    |
|---------------|---|---|-------|---|------------|----|----|---------------------|
| 0             | 0 | 0 | 0     | 0 | 0          | 1  | 0  | JA = B<br>KA = B x' |
| 0             | 0 | 1 | 0     | 1 | 1          | 1  | 1  | JA = B<br>KA = B x' |
| 0             | 1 | 0 | 0     | 1 | 1          | 0  | 1  | JA = B<br>KA = B x' |
| 0             | 1 | 1 | 0     | 1 | 0          | 0  | 0  | JA = B<br>KA = B x' |
| 1             | 0 | 0 | 0     | 0 | 0          | 0  | 0  | JA = B<br>KA = B x' |
| 1             | 0 | 1 | 0     | 0 | 1          | 0  | 1  | JA = B<br>KA = B x' |
| 1             | 1 | 0 | 0     | 0 | 0          | 0  | 0  | JA = B<br>KA = B x' |
| 1             | 1 | 1 | 0     | 0 | 1          | 0  | 1  | JA = B<br>KA = B x' |



## 6.2 Analysis: Example #2 (1/3)

- Given Figure 2, a sequential circuit with two J-K flip-flops  $A$  and  $B$ , and one input  $x$ .



- Obtain the flip-flop input functions from the circuit:

$$JA = B \quad JB = x' \quad KA = B x' \quad KB = A' x + A x' = A \oplus x$$

QR

## 6.2 Analysis: Example #2 (2/3)

- Fill the state table using the above functions, knowing the characteristics of the flip-flops used.

| Q | A | B | x | Q <sup>n+1</sup> | Combinatorial |    | Flip-flop Inputs    |
|---|---|---|---|------------------|---------------|----|---------------------|
|   |   |   |   |                  | A'            | B' |                     |
| 0 | 0 | 0 | 0 | 0                | 0             | 0  | JA = B<br>KA = B x' |
| 0 | 0 | 1 | 0 | 1                | 0             | 1  | JA = B<br>KA = B x' |
| 0 | 1 | 0 | 0 | 1                | 1             | 0  | JA = B<br>KA = B x' |
| 0 | 1 | 1 | 0 | 0                | 0             | 1  | JA = B<br>KA = B x' |
| 1 | 0 | 0 | 0 | 0                | 0             | 0  | JA = B<br>KA = B x' |
| 1 | 0 | 1 | 0 | 1                | 0             | 1  | JA = B<br>KA = B x' |
| 1 | 1 | 0 | 0 | 0                | 1             | 0  | JA = B<br>KA = B x' |
| 1 | 1 | 1 | 0 | 1                | 1             | 0  | JA = B<br>KA = B x' |

For J/K flip flop, must use table to figure out what is next state

QR

## 6.2 Analysis: Example #3 (1/3)

### 6.2 Analysis: Example #3 (1/3)

- Derive the state table and state diagram of this circuit.



## 6.2 Analysis: Example #3 (2/3)

- Flip-flop input functions:

$$JA = B \quad JB = KB = (A \oplus B)x' = A x + A' x'$$

KA = B'

- State table:

| Present state | A | B | Input | x | Next state | A' | B' | Flip-flop inputs    |
|---------------|---|---|-------|---|------------|----|----|---------------------|
| 0             | 0 | 0 | 0     | 0 | 0          | 1  | 0  | JA = B<br>KA = B' x |
| 0             | 0 | 1 | 0     | 1 | 1          | 1  | 1  | JA = B<br>KA = B' x |
| 0             | 1 | 0 | 0     | 1 | 1          | 0  | 1  | JA = B<br>KA = B' x |
| 0             | 1 | 1 | 0     | 1 | 0          | 1  | 0  | JA = B<br>KA = B' x |
| 1             | 0 | 0 | 0     | 0 | 0          | 0  | 0  | JA = B<br>KA = B' x |
| 1             | 0 | 1 | 0     | 1 | 0          | 1  | 1  | JA = B<br>KA = B' x |
| 1             | 1 | 0 | 1     | 1 | 1          | 0  | 0  | JA = B<br>KA = B' x |
| 1             | 1 | 1 | 1     | 0 | 1          | 0  | 1  | JA = B<br>KA = B' x |

QR

## 6.3 Flip-flop Excitation Tables (1/2)

- Excitation tables: given the required transition from present state to next state, determine the flip-flop input(s).

| Q <sup>n</sup> | Q <sup>n+1</sup> | JK Flip-flop |   | SR Flip-flop |   | D Flip-flop |   | T Flip-flop |   |
|----------------|------------------|--------------|---|--------------|---|-------------|---|-------------|---|
|                |                  | J            | K | S            | R | D           | T | D           | T |
| 0              | 0                | 0            | 0 | 0            | 0 | 0           | 0 | 0           | 0 |
| 0              | 1                | 1            | X | 0            | 1 | 0           | 1 | 1           | 1 |
| 1              | 0                | X            | 1 | 1            | 0 | 1           | 0 | 0           | 0 |
| 1              | 1                | X            | 0 | 0            | 1 | 1           | 1 | 1           | 1 |

QR

## 6.4 Sequential Circuits: Design

- Design procedure:

- Start with circuit specifications – description of circuit behaviour, usually a state diagram or state table.
- Derive the state table.
- Perform state reduction if necessary.
- Perform state assignment.
- Determine number of flip-flops and label them.
- Choose the type of flip-flop to be used.
- Derive circuit excitation and output tables from the state table.
- Derive circuit output functions and flip-flop input functions.
- Draw the logic diagram.

QR

## 6.4 Design: Example #1 (2/5)

- Circuit state/excitation table, using JK flip-flops.



## 6.4 Design: Example #1 (4/5)

- From state table, get flip-flop input functions.

| Present state | Input | Next state | Flip-flop inputs |    |    |    |    |
|---------------|-------|------------|------------------|----|----|----|----|
| A             | B     | $A'$       | $B'$             | JA | KA | JB | KB |
| 0             | 0     | 0          | 0                | 0  | 0  | 0  | 0  |
| 0             | 0     | 1          | 1                | 0  | 1  | 0  | 1  |
| 0             | 1     | 0          | 1                | 1  | 0  | 1  | 1  |
| 0             | 1     | 1          | 0                | 1  | 1  | 0  | 0  |
| 1             | 0     | 1          | 0                | 0  | 0  | 0  | 0  |
| 1             | 0     | 0          | 1                | 1  | 0  | 1  | 1  |
| 1             | 1     | 1          | 0                | 1  | 1  | 1  | 0  |
| 1             | 1     | 0          | 1                | 0  | 1  | 1  | 1  |
| 1             | 1     | 1          | 1                | 0  | 0  | 0  | 1  |
| 1             | 1     | 0          | 0                | 0  | 0  | 1  | 0  |
| 1             | 1     | 1          | 1                | 0  | 0  | 1  | 1  |

Equations:  
 $JA = Bx'$   
 $KA = Bx$   
 $KB = (A \oplus B)'$

## 6.4 Design: Example #1 (5/5)

- Flip-flop input functions:

$$JA = Bx'$$

$$KA = Bx$$

$$KB = (A \oplus B)'$$

Logic diagram:



## 6.4 Design: Example #2 (1/3)

$J_1 = x_1, J_2 = x_2, J_3 = x_3$

- Using D flip-flops, design the circuit based on the state table below. (Exercise: Design it using JK flip-flops.)

| Present state | Input | Next state | Output |      |   |
|---------------|-------|------------|--------|------|---|
| A             | B     | $x$        | $A'$   | $B'$ | y |
| 0             | 0     | 0          | 0      | 0    | 0 |
| 0             | 0     | 1          | 0      | 1    | 1 |
| 0             | 1     | 0          | 1      | 0    | 0 |
| 0             | 1     | 1          | 0      | 1    | 1 |
| 1             | 0     | 0          | 1      | 0    | 0 |
| 1             | 0     | 1          | 1      | 1    | 1 |
| 1             | 1     | 0          | 1      | 0    | 0 |
| 1             | 1     | 1          | 0      | 0    | 0 |

Since its D flip flop,

## 6.4 Design: Example #3 (1/4)

- Design involving unused states.

| Present state | Input | Next state | Output |      |   |
|---------------|-------|------------|--------|------|---|
| A             | B     | $x$        | $A'$   | $B'$ | y |
| 0             | 0     | 0          | 0      | 0    | 0 |
| 0             | 0     | 1          | 0      | 1    | 1 |
| 0             | 1     | 0          | 1      | 0    | 0 |
| 0             | 1     | 1          | 0      | 1    | 1 |
| 1             | 0     | 0          | 1      | 0    | 0 |
| 1             | 0     | 1          | 1      | 1    | 1 |
| 1             | 1     | 0          | 1      | 0    | 0 |
| 1             | 1     | 1          | 0      | 0    | 0 |

Given these  
Unused state 000  
 $y = B'x$

Derive these  
Are there other unused states?  
 $y = B'x$

## 6.4 Design: Example #3 (3/4)

- From state table, obtain expressions for flip-flop inputs (cont'd).

| Present state | Input | Next state | Output |      |   |
|---------------|-------|------------|--------|------|---|
| A             | B     | $x$        | $A'$   | $B'$ | y |
| 0             | 0     | 0          | 0      | 0    | 0 |
| 0             | 0     | 1          | 0      | 1    | 1 |
| 0             | 1     | 0          | 1      | 0    | 0 |
| 0             | 1     | 1          | 0      | 1    | 1 |
| 1             | 0     | 0          | 1      | 0    | 0 |
| 1             | 0     | 1          | 1      | 1    | 1 |
| 1             | 1     | 0          | 1      | 0    | 0 |
| 1             | 1     | 1          | 0      | 0    | 0 |

## 7. Memory (1/4)

- Memory stores programs and data.

### Definitions:

- 1 byte = 8 bits
- 1 word: in multiple of bytes, a unit of transfer between main memory and registers, usually size of register.
- 1 KB (kilo-bytes) =  $2^{10}$  bytes; 1 MB (mega-bytes) =  $2^{20}$  bytes; 1 GB (giga-bytes) =  $2^{30}$  bytes; 1 TB (tera-bytes) =  $2^{40}$  bytes.

- Desirable properties: fast access, large capacity, economical cost, non-volatile.

- However, most memory devices do not possess all these properties.

## 6.4 Design: Example #1 (4/5)

- From state table, get flip-flop input functions.

| Present state | Input | Next state | Flip-flop inputs |      |    |    |    |    |
|---------------|-------|------------|------------------|------|----|----|----|----|
| A             | B     | $x$        | $A'$             | $B'$ | JA | KA | JB | KB |
| 0             | 0     | 0          | 0                | 0    | 0  | 0  | 0  | 0  |
| 0             | 0     | 1          | 0                | 1    | 0  | 1  | 0  | 1  |
| 0             | 1     | 0          | 1                | 0    | 1  | 0  | 1  | 0  |
| 0             | 1     | 1          | 0                | 1    | 1  | 0  | 1  | 1  |
| 1             | 0     | 0          | 1                | 0    | 0  | 0  | 0  | 0  |
| 1             | 0     | 1          | 1                | 1    | 1  | 1  | 1  | 1  |
| 1             | 1     | 0          | 1                | 0    | 0  | 1  | 1  | 0  |
| 1             | 1     | 1          | 0                | 0    | 0  | 0  | 0  | 1  |

Equations:  
 $JA = Bx'$   
 $KA = Bx$   
 $KB = (A \oplus B)'$

## 6.4 Design: Example #2 (2/3)

- Determine expressions for flip-flop inputs and the circuit output.

## 6.4 Design: Example #2 (3/3)

- From derived expressions, draw logic diagram:



## 6.4 Design: Example #3 (2/4)

- From state table, obtain expressions for flip-flop inputs.

| Present state | Input | Next state | Output |      |   |
|---------------|-------|------------|--------|------|---|
| A             | B     | $x$        | $A'$   | $B'$ | y |
| 0             | 0     | 0          | 0      | 0    | 0 |
| 0             | 0     | 1          | 0      | 1    | 1 |
| 0             | 1     | 0          | 1      | 0    | 0 |
| 0             | 1     | 1          | 0      | 1    | 1 |
| 1             | 0     | 0          | 1      | 0    | 0 |
| 1             | 0     | 1          | 1      | 1    | 1 |
| 1             | 1     | 0          | 1      | 0    | 0 |
| 1             | 1     | 1          | 0      | 0    | 0 |

## 6.4 Design: Example #3 (3/4)

- From derived expressions, obtain expressions for flip-flop inputs.

| Present state | Input | Next state | Output |      |   |
|---------------|-------|------------|--------|------|---|
| A             | B     | $x$        | $A'$   | $B'$ | y |
| 0             | 0     | 0          | 0      | 0    | 0 |
| 0             | 0     | 1          | 0      | 1    | 1 |
| 0             | 1     | 0          | 1      | 0    | 0 |
| 0             | 1     | 1          | 0      | 1    | 1 |
| 1             | 0     | 0          | 1      | 0    | 0 |
| 1             | 0     | 1          | 1      | 1    | 1 |
| 1             | 1     | 0          | 1      | 0    | 0 |
| 1             | 1     | 1          | 0      | 0    | 0 |

$$SA = Bx$$

$$SB = A'B'x$$

$$SC = Cx$$

$$RD = Cx'$$

$$RA = Cx'$$

$$RB = B'C + Bx$$

$$RC = x$$

$$y = Ax$$

$$AB = B'C + Bx$$

## 7. Memory (2/4)

### Memory hierarchy



## 7. Memory (3/4)



Memory Address Register  
Memory Data Register  
1 for read/ 0 for write

## 7.2 Read/Write Operations

### Write operation:

- Transfers the address of the desired word to the address lines.
- Transfers the data bits (the word) to be stored in memory to the data input lines.
- Activates the Write control line (set Read/Write to 0).

### Read operation:

- Transfers the address of the desired word to the address lines.
- Activates the Read control line (set Read/Write to 1).

| Memory Enable | Read/Write | Memory Operation        |
|---------------|------------|-------------------------|
| 0             | X          | None                    |
| 1             | 0          | Write to selected word  |
| 1             | 1          | Read from selected word |

## 7. Memory (4/4)

- A memory unit stores binary information in groups of bits called **words**.
- The data consists of  $n$  lines (for  $n$ -bit words). **Data input lines** provide the information to be stored (**written**) into the memory, while **data output lines** carry the information **out (read)** from the memory.
- The **address** consists of  $k$  lines which specify which word (among the  $2^k$  words available) to be selected for reading or writing.
- The control lines **Read** and **Write** (usually combined into a single control line **Read/Write**) specifies the direction of transfer of the data.



## 7.3 Memory Cell

- Two types of RAM
  - Static RAMs use flip-flops as the memory cells.
  - Dynamic RAMs use capacitor charges to represent data. Though simpler in circuitry, they have to be constantly refreshed.
- A single memory cell of the static RAM has the following logic and block diagrams:



## 7.4 Memory Arrays (1/4)

- Logic construction of a  $4 \times 3$  RAM (with decoder and OR gates):



- An array of RAM chips: memory chips are combined to form larger memory.

- A  $1K \times 8$  RAM chip:



Block diagram of a 1K x 8 RAM chip

## 7.4 Memory Arrays (3/4)

- $4K \times 8$  RAM.



First 2 bit determine which ram to activate

## 7.4 Memory Arrays (4/4)



= 2M x 32 memory module

= Using 512K x 8 memory chips.

# Week 11

Monday, November 6, 2023 9:35 PM

## Pipelining

### 1. Introduction: Pipelining Lessons

- Pipelining doesn't help latency of single task:
  - It helps the throughput of entire workload
- Multiple** tasks operating simultaneously using different resources
- Possible delays:
  - Pipeline rate limited by **slowest** pipeline stage
  - Stall for **dependencies**

### 2. MIPS Pipeline Stages (1/2)

- Five** Execution Stages
  - IF: Instruction Fetch
  - ID: Instruction Decode and Register Read
  - EX: Execute an operation or calculate an address
  - MEM: Access an operand in data memory →  $lw/sw$
  - WB: Write back the result into a register →  $R_f/I_f$
- Idea:**  $IF/ID/EX/MEM/WB$
- Each execution stage takes 1 clock cycle
- General flow of data is from one stage to the next

#### Exceptions:

- Update of PC and write back of register file – more about this later...



### 2. Pipelined Execution: Illustration



It took 5 clock cycles for first instruction to complete, thereafter 1 instruction completes at each subsequent clock cycle

### 3. MIPS Pipeline: Datapath (1/3)

- Single-cycle** implementation:
  - Update all state elements (PC, register file, data memory) at the end of a clock cycle
- Pipelined** implementation:
  - One cycle per pipeline stage
  - Data required for each stage needs to be stored separately (why?)

### 3. MIPS Pipeline: Datapath (2/3)

- Data used by **subsequent instructions**:
  - Store in programmer-visible state elements: PC, register file and memory
- Data used by **same instruction** in later pipeline stages:
  - Additional registers in datapath called **pipeline registers**
  - IF/ID: register between IF and ID
  - ID/EX: register between ID and EX
  - EX/MEM: register between EX and MEM
  - MEM/WB: register between MEM and WB

Why no register at the end of WB stage?

### 3. MIPS Pipeline: Datapath (3/3)





Pipeline datapath & registers

### 3. Pipeline Datapath: IF Stage



- At the end of a cycle, **IF/ID** receives (stores):
  - Instruction read from InstructionMemory[ PC ]
  - PC + 4
- PC + 4
  - Also connected to one of the MUX's inputs (another coming later)

### 3. Pipeline Datapath: ID Stage



| At the beginning of a cycle<br><b>ID/EX</b> register supplies:                                                                                                           | At the end of a cycle<br><b>IP/EX</b> receives:                                                                                       |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| <ul style="list-style-type: none"> <li>Register numbers for reading two registers</li> <li>16-bit offset to be sign-extended to 32-bit</li> <li><u>PC + 4</u></li> </ul> | <ul style="list-style-type: none"> <li>Data values read from register file</li> <li>32-bit immediate value</li> <li>PC + 4</li> </ul> |

### 3. Pipeline Datapath: EX Stage



| At the beginning of a cycle<br><b>ID/EX</b> register supplies:                                                                               | At the end of a cycle<br><b>EX/MEM</b> receives:                                                                                                                        |
|----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul style="list-style-type: none"> <li>Data values read from register file</li> <li>32-bit immediate value</li> <li><u>PC + 4</u></li> </ul> | <ul style="list-style-type: none"> <li>(PC + 4) + (Immediate x 4)</li> <li>ALU result</li> <li><u>isZero?</u> signal</li> <li>Data Read 2 from register file</li> </ul> |

### 3. Pipeline Datapath: MEM Stage



| At the beginning of a cycle<br><b>EX/MEM</b> register supplies:                                                                                                         | At the end of a cycle<br><b>MEM/WB</b> receives:                                       |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| <ul style="list-style-type: none"> <li>(PC + 4) + (Immediate x 4)</li> <li>ALU result</li> <li><u>isZero?</u> signal</li> <li>Data Read 2 from register file</li> </ul> | <ul style="list-style-type: none"> <li>ALU result</li> <li>Memory read data</li> </ul> |

### 3. Pipeline Datapath: WB Stage



| At the beginning of a cycle<br><b>MEM/WB</b> register supplies:                        | At the end of a cycle                                                                                                                              |
|----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul style="list-style-type: none"> <li>ALU result</li> <li>Memory read data</li> </ul> | <ul style="list-style-type: none"> <li>Result is written back to register file (if applicable)</li> <li><u>There is a bug here.....</u></li> </ul> |

Write back is not passed in the timeline

### 3. Corrected Datapath (1/2)

- Observe the "Write register" number
  - Supplied by the **IF/ID** pipeline register
  - It is NOT the correct write register for the instruction now in **WB** stage!
- Solution:**
  - Pass "Write register" number from **ID/EX** through **EX/MEM** to **MEM/WB** pipeline register for use in **WB** stage
  - i.e. let the "Write register" number follow the instruction through the pipeline until it is needed in **WB** stage

### 3. Corrected Datapath (2/2)



Control path changes  
Each control signal belongs to a particular pipeline stage

### 4. Pipeline Control: Grouping

= Group control signals according to pipeline stage

|        | RegDst | ALUSrc | MemTo Reg | Reg Write | Mem Read | Mem Write | Branch | ALUop |     |
|--------|--------|--------|-----------|-----------|----------|-----------|--------|-------|-----|
| R-type | 1      | 0      | 0         | 1         | 0        | 0         | 0      | op1   | op0 |
| lw     | 0      | 1      | 1         | 1         | 1        | 0         | 0      | 0     | 0   |
| sw     | X      | 1      | X         | 0         | 0        | 1         | 0      | 0     | 0   |
| beq    | X      | 0      | X         | 0         | 0        | 0         | 1      | 0     | 1   |

  

|        | EX Stage |        |         | MEM Stage |           |        | WB Stage  |           |   |
|--------|----------|--------|---------|-----------|-----------|--------|-----------|-----------|---|
|        | RegDst   | ALUSrc | ALUop   | Mem Read  | Mem Write | Branch | MemTo Reg | Req Write |   |
|        |          |        | op1 op0 |           |           |        |           |           |   |
| R-type | 1        | 0      | 1 0     | 0         | 0         | 0      | 0         | 0         | 1 |
| lw     | 0        | 1      | 0 0     | 1         | 0         | 0      | 1         | 1         |   |
| sw     | X        | 1      | 0 0     | 0         | 1         | 0      | X         | 0         |   |
| beq    | X        | 0      | 0 1     | 0         | 0         | 1      | X         | 0         |   |

### 4. Pipeline Control: Implementation



### 4. Pipeline Control: Datapath and Control



### 5. Different Implementations



## 5. Comparison of Performance (1/2)

| Single-Cycle Processor                                                    |  |
|---------------------------------------------------------------------------|--|
| Cycle time: $CT_{\text{seq}} = \max(T_1, T_2, \dots, T_N)$                |  |
| - $T_k$ = Time for operation in Stage $k$                                 |  |
| - $N$ = Number of stages                                                  |  |
| Execution Time for $I$ instructions:                                      |  |
| - $Time_{\text{seq}} = I \times CT_{\text{seq}}$                          |  |
| Cycle time = 8ns                                                          |  |
| Time to execute 100 instructions = $100 \times 8\text{ns} = 800\text{ns}$ |  |

## Multicore Processor

| (Cycle time: $CT_{\text{multicore}} = \frac{I}{N} \times T_0$ )                        |  |
|----------------------------------------------------------------------------------------|--|
| Execution Time for $I$ instructions:                                                   |  |
| - $Time_{\text{multicore}} = I \times \text{Average CPI} \times CT_{\text{multicore}}$ |  |
| - Average CPI needed as each instruction takes different number of cycles              |  |

## 5. Comparison of Performance (2/2)

### Single-Cycle Processor

- Cycle time:  $CT_{\text{seq}} = \max(T_1, T_2, \dots, T_N)$
- $T_k$  = Time for operation in Stage  $k$
- $N$  = Number of stages
- Execution Time for  $I$  instructions:
- $Time_{\text{seq}} = I \times CT_{\text{seq}}$

| Instruction   | Inst Mem | Reg read | ALU | Data Mem | Reg write | Total |
|---------------|----------|----------|-----|----------|-----------|-------|
| ALU (eg: add) | 2        | 1        | 2   | 1        | 1         | 6     |
| lw            | 2        | 1        | 2   | 2        | 1         | 8     |
| sw            | 2        | 1        | 2   | 2        | 1         | 7     |
| beq           | 2        | 1        | 2   | 1        | 1         | 6     |

Assume 100 instructions.

Cycle time = 8ns

Time to execute 100 instructions =  $100 \times 8\text{ns} = 800\text{ns}$

### Pipelining Processor

- Cycle time:  $CT_{\text{pipeline}} = \max(T_k) + T_d$
- $T_d$  = Overhead for pipelining, e.g. pipeline register
- Cycles needed for  $I$  instructions:
- $I + N - 1$  (need  $N - 1$  cycles to fill up the pipeline)
- Execution Time for  $I$  instructions:
- $Time_{\text{pipeline}} = (I + N - 1) \times CT_{\text{pipeline}}$

Assume pipeline register latency = 0.5ns

Cycle time = 2ns + 0.5ns = 2.5ns

Time to execute 100 instructions =  $(100 \times 5 - 1) \times 2.5\text{ns} = 260\text{ns}$

$\approx 260\text{ns}$

## 5. Pipeline Processor: Ideal Speedup

- Assumptions for ideal case:
  - Every stage takes the same amount of time  $\rightarrow \sum_{k=1}^N T_k = N \times T_1$
  - No pipeline overhead  $\rightarrow T_d = 0$
  - $I \gg N$  (Number of instructions is much larger than number of stages)

$$\begin{aligned} \text{Speedup}_{\text{pipeline}} &= \frac{\text{Time}_{\text{seq}}}{\text{Time}_{\text{pipeline}}} \\ &= \frac{I \times \sum_{k=1}^N T_k}{(I+N-1) \times (\max(T_k) + T_d)} = \frac{(I \times N \times T_1)}{(I+N-1) \times T_1} \\ &\approx \frac{N \times T_1}{T_1} \\ &= N \end{aligned}$$

**Conclusion:**  
Pipeline processor can gain  $N$  times speedup, where  $N$  is the number of pipeline stages.

### Review Question

- Given this code:
  - add \$t0, \$s0, \$s1
  - sub \$t1, \$s0, \$s1
  - sll \$t2, \$s0, 2
  - srl \$t3, \$s1, 2
- a) How many cycles will it take to execute the code on a single-cycle datapath?
- b) How long will it take to execute the code on a single-cycle datapath, assuming a 100 MHz clock?
- c) How many cycles will it take to execute the code on a 5-stage MIPS pipeline?  $(5 + 1) \times 4 = 24$
- d) How long will it take to execute the code on a 5-stage MIPS pipeline, assuming a 500 MHz clock?

1sec =  $10^9$  ns

Pipelining can potentially speed up by 5

### • Pipeline hazards

#### 1. Pipeline Hazards

- Speedup from pipeline implementation:
  - Based on the assumption that a new instruction can be "pumped" into pipeline every cycle
- However, there are **pipeline hazards**.
  - Problems that prevent next instruction from immediately following previous instruction
  - Structural hazards:** RF | M | ALU
    - Simultaneous use of a hardware resource
  - Data hazards:**
    - Data dependencies between instructions
  - Control hazards:**
    - Change in program flow



## 1. Graphical Notation for Pipeline



- = Horizontal = the stages of an instruction
- = Vertical = the instructions in different pipeline stages

Structural Hazard

## 2. Structural Hazard: Example

- If there is only a single memory module:



## 2. Solution 1: Stall the Pipeline



## 2. Solution 2: Separate Memory

- Split memory into **Data** and **Instruction** memory



## 2. Quiz (1/2)

Is there another conflict?



## 2. Quiz (2/2)

Recall that registers are very fast memory.

Solution: Split cycle into **half**; first half for writing into a register; second half for reading from a register.



Data dependency

### 3. Instruction Dependencies

- Instructions can have relationship that prevent pipeline execution:
  - Although a partial overlap maybe possible in some cases
- When different instructions access (read/write) the same register
  - Register contention is the cause of dependency
  - Known as **data dependency**
- When the execution of an instruction depends on another instruction
  - Control flow is the cause of dependency
  - Known as **control dependency**
- Failure to handle dependencies can affect **program correctness!**

#### 3. Data Dependency: RAW

##### "Read-After-Write" Definition:

- Occurs when a later instruction **reads** from the destination register **written** by an earlier instruction
- Also known as **true data dependency**



##### Effect of incorrect execution:

- If i2 reads register \$1 before i1 can write back the result, i2 will get a **stale result (old result)**

#### 3. Other Data Dependencies

- Similarly, we have:
  - WAR**: Write-after-Read dependency
  - WAW**: Write-after-Write dependency
- Fortunately, these dependencies **do not cause any pipeline hazards**
- They affect the processor only when instructions are executed out of program order:
  - i.e. in Modern SuperScalar Processor



#### 4. RAW Dependency: Hazards?

- Suppose we are executing the following code fragment:

```
sub $2, $1, $3    #i1
and $12, $2, $5   #i2
or $13, $6, $2    #i3
add $14, $2, $2   #i4
sw $15, 100($2)  #i5
```

- Note the multiple uses of register \$2

- Question:
  - Which are the instructions require special handling?

#### 4. RAW Data Hazards

- Value from prior instruction is needed before write back



#### 4.1 RAW Data Hazards: Forwarding



#### 4.2 Data Hazards: LOAD Instruction Solution



### 4.3 Exercise #1

- How many cycles will it take to execute the following code on a 5-stage pipeline
  - without** forwarding?
  - with** forwarding?

```
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
```

### 4.3 Exercise #2

- How many cycles will it take to execute the following code on a 5-stage pipeline
  - without** forwarding?
  - with** forwarding?

```
lw $2, 20($3)
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
```

### 4.3 Exercise #1: Without Forwarding → 11-cycles

```
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
```

|     |   |   |   |   |   |   |   |   |   |    |    |
|-----|---|---|---|---|---|---|---|---|---|----|----|
| sub | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| and |   |   |   |   |   |   |   |   |   |    |    |
| or  |   |   |   |   |   |   |   |   |   |    |    |
| add |   |   |   |   |   |   |   |   |   |    |    |
| sw  |   |   |   |   |   |   |   |   |   |    |    |

### 4.3 Exercise #1: With Forwarding →

```
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
```

|     |   |   |   |   |   |   |   |   |   |    |    |
|-----|---|---|---|---|---|---|---|---|---|----|----|
| sub | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| and |   |   |   |   |   |   |   |   |   |    |    |
| or  |   |   |   |   |   |   |   |   |   |    |    |
| add |   |   |   |   |   |   |   |   |   |    |    |
| sw  |   |   |   |   |   |   |   |   |   |    |    |

### 4.3 Exercise #2: Without Forwarding → 11-cycles

```
lw $2, 20($3)
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
```

|     |   |   |   |   |   |   |   |   |   |    |    |
|-----|---|---|---|---|---|---|---|---|---|----|----|
| lw  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| and |   |   |   |   |   |   |   |   |   |    |    |
| or  |   |   |   |   |   |   |   |   |   |    |    |
| add |   |   |   |   |   |   |   |   |   |    |    |
| sw  |   |   |   |   |   |   |   |   |   |    |    |

### 4.3 Exercise #2: With Forwarding → 10 cycles

```
lw $2, 20($3)
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
```

|     |   |   |   |   |   |   |   |   |   |    |    |
|-----|---|---|---|---|---|---|---|---|---|----|----|
| lw  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| and |   |   |   |   |   |   |   |   |   |    |    |
| or  |   |   |   |   |   |   |   |   |   |    |    |
| add |   |   |   |   |   |   |   |   |   |    |    |
| sw  |   |   |   |   |   |   |   |   |   |    |    |

### 5. Control Dependency

- Definition:**
  - An instruction *j* is control dependent on *i* if *i* controls whether or not *j* executes
  - Typically *i* would be a branch instruction
- Example:**

```
    $3 = $5
11: beq $3, $5, label
12: add($1), $2, $4
```

Effect of incorrect execution:

  - If i2 is allowed to execute before i1 is determined, register \$1 maybe incorrectly changed!



### 5. Control Dependency: Example



### 6. Control Hazards: Stall Pipeline?



### 6. Control Hazards: Reducing the Penalty

- Branching is very common in code:
  - A 3-cycle stall penalty is too heavy!
- Many techniques invented to reduce the control hazard penalty:
  - Move branch decision calculation to earlier pipeline stage
    - Early Branch Resolution**
  - Guess the outcome before it is produced
    - Branch Prediction**
  - Do something useful while waiting for the outcome
    - Delayed Branching**

Early branch resolution

### 6.1 Reduce Stalls: Early Branch (1/3)

- Make decision in **ID** stage instead of **MEM**
  - Move branch target address calculation
  - Move register comparison → cannot use ALU for register comparison any more



### 6.1 Reduce Stalls: Early Branch (2/3)



### 6.1 Reduce Stalls: Early Branch (3/3)



- Wait until the branch decision is known:
  - Then fetch the correct instruction
- Reduced from 3 to **1 clock cycle delay**

### 6.1 Early Branch: Problems (2/3)

#### Solution:

- Add forwarding path from ALU to ID stage
- One clock cycle delay is still needed

<- problem is if s0 is produced in previous instruction. Since then s0 will not be ready by the time beq requires s0



### 6.1 Early Branch: Problems (3/3)

- Problem is worse with **load** followed by **branch**

#### Solution:

- MEM to ID forwarding and 2 more stall cycles!
- In this case, we ended up with 3 total stall cycles  
→ no improvement!



### Branch Prediction

-branch is not taken

## 6.2 Reduce Stalls: Branch Prediction

- There are many branch prediction schemes
  - We only cover the simplest in this course ☺
- Simple prediction:
  - All branches are assumed to be **not taken**
  - Fetch the successor instruction and start pumping it through the pipeline stages
- When the actual branch outcome is known:
  - Not taken:** Guessed correctly → No pipeline stall
  - Taken:** Guessed wrongly → Wrong instructions in the pipeline → **Flush** successor instruction from the pipeline

## 6.2 Branch Prediction: Correct Prediction



## 6.2 Branch Prediction: Wrong Prediction



### 6.2 Exercise #3: Branch Prediction

- How many cycles will it take to execute the following code on a 5-stage pipeline **with** forwarding and ...
  - \* without branch prediction? → What will branch be under?
  - \* with branch prediction (predict **not taken**)? → Assume branch is **not taken**
- addi \$s0, \$zero, 10  
Loop: addi \$s0, \$s0, -1  
bne \$s0, \$zero, Loop  
sub \$t0, \$t1, \$t2
- Decision making moved to ID stage
- Total instructions =  $1 + 10 \times 2 + 1 = 22$
- Ideal pipeline =  $4 + 22 = 26$  cycles

$$26 = (22 \text{ instructions} - 1) + 5 \text{ cycles}$$

### 6.2 Exercise #3: Without Branch Prediction

| 1                   | 2  | 3  | 4  | 5   | 6  | 7   | 8  | 9 | 10 | 11 |
|---------------------|----|----|----|-----|----|-----|----|---|----|----|
| → addi <sup>1</sup> | IF | ID | EX | MEM | WB |     |    |   |    |    |
| → addi <sup>2</sup> | IF | ID | EX | MEM | WB |     |    |   |    |    |
| → bne               |    |    | IF | ID  | EX | MEM | WB |   |    |    |
| → addi <sup>2</sup> |    |    | IF | ID  | EX | MEM | WB |   |    |    |

→ Data dependency between (addi \$s0, \$s0, -1) and bne incurs 1 cycle of delay. There are 10 iterations, hence 10 cycles of delay.  
 → Every bne incurs a cycle of delay to execute the next instruction. There are 10 iterations, hence 10 cycles of delay.  
 → Total number of cycles of delay = 20.  
 → Total execution cycles = 26 + 20 = 46 cycles.

### 6.2 Exercise #3: With Branch Prediction

| 1                   | 2  | 3  | 4  | 5   | 6  | 7   | 8   | 9  | 10 | 11 |
|---------------------|----|----|----|-----|----|-----|-----|----|----|----|
| → addi <sup>1</sup> | IF | ID | EX | MEM | WB |     |     |    |    |    |
| → addi <sup>2</sup> | IF | ID | EX | MEM | WB |     |     |    |    |    |
| → bne               |    |    | IF | ID  | EX | MEM | WB  |    |    |    |
| sub                 |    |    |    | IF  |    |     |     |    |    |    |
| → addi <sup>2</sup> |    |    |    | IF  | ID | EX  | MEM | WB |    |    |

Predict not taken.

- The data dependency remains, hence 10 cycles of delay for 10 iterations.
- In the first 9 iterations, the branch prediction is wrong, hence 1 cycle of delay.
- In the last iteration, the branch prediction is correct, hence saving 1 cycle of delay.
- Total number of cycles of delay = 19.
- Total execution cycles = 26 + 19 = 45 cycles.

Delayed branch

## 6.3 Reduce Stalls: Delayed Branch

- Observation:**
  - Branch outcome takes X number of cycles to be known
  - X cycles stall
- Idea:**
  - Move non-control dependent instructions into the X slots following a branch
  - Known as the **branch-delay slot**

## 6.3 Delayed Branch: Example

| Non-delayed branch                                                                                      | Delayed branch                                                                                          |
|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
| <pre>or \$8, \$9, \$10 add \$1, \$2, \$3 sub \$4, \$5, \$6 beq \$1, \$4, Exit xor \$10, \$1, \$11</pre> | <pre>add \$1, \$2, \$3 sub \$4, \$5, \$6 beq \$1, \$4, Exit or \$8, \$9, \$10 xor \$10, \$1, \$11</pre> |

• Idea:

- Move **non-control dependent instructions** into the X slots following a branch
  - Known as the **branch-delay slot**
- These instructions are executed **regardless of the branch outcome**
- In our MIPS processor:
  - Branch-Delay slot = 1 (with the early branch)

Taken      Not taken



- The "or" instruction is moved into the delayed slot:
  - Get executed regardless of the branch outcome
  - Same behavior as the original code!

### 6.3 Delayed Branch: Observation

• Best case scenario

- There is an instruction **preceding the branch** which **can be moved** into the delayed slot
  - Program correctness must be preserved!

• Worst case scenario

- Such instruction cannot be found
  - Add a no-op (**nop**) instruction in the branch-delay slot
- Re-ordering** instructions is a common method of program optimization
  - Compiler must be smart enough to do this
  - Usually can find such an instruction at least 50% of the time

## 1. Data Transfer: The Big Picture



Registers are in the datapath of the processor. If operands are in memory we have to **load** them to processor (registers), operate on them, and **store** them back to memory.

## 1. Memory Technology Today: DRAM

### DDR SDRAM

- Double Data Rate
  - Synchronous Dynamic RAM
- The dominant memory technology in PC market
- Delivers memory on the positive and negative edge of a clock (double rate)
- Generations:
  - DDR (MemClkFreq x 2(double rate) x 8 words)
  - DDR2 (MemClkFreq x 2(multiplier) x 2 x 8 words)
  - DDR3 (MemClkFreq x 4(multiplier) x 2 x 8 words)
  - DDR4 (released in 2014)
  - DDR5 (in Q3 2021)



## 1. Faster Memory Technology: SRAM



|              | Capacity    | Latency     | Cost/GB      |
|--------------|-------------|-------------|--------------|
| Register     | 100s Bytes  | 20 ps       | \$\$\$\$     |
| SRAM         | 100s KB     | 0.5-5 ns    | \$\$\$       |
| DRAM         | 100s MB     | 50-70 ns    | \$           |
| Hard Disk    | 100s GB     | 5-20 ms     | Cents        |
| <b>Ideal</b> | <b>1 GB</b> | <b>1 ns</b> | <b>Cheap</b> |

## 1. Slow Memory Technology: Magnetic Disk



Typical high-end hard disk:  
Average Latency: 4 - 10 ms  
Capacity: 500-2000GB

## 2.1 Cache: Types of Locality

### Temporal Locality

- If an item is referenced, it will tend to be referenced again soon



### Spatial Locality

- If an item is referenced, nearby items will tend to be referenced soon

### Different Locality for

- Instructions
- Data

## 2.1 Working Set: Definition

### Set of locations accessed during $\Delta t$

- Different phases of execution may use different working sets

Our aim is to capture the working set and keep it in the memory closest to CPU



## 2.2 Two Aspects of Memory Access



### How to make SLOW main memory appear faster?

- Cache** – a small but fast SRAM near CPU
- Hardware managed:** Transparent to programmer

### How to make SMALL main memory appear bigger than it is?

- Virtual memory**
- OS managed:** Transparent to programmer
- Not in the scope of this module (covered in CS2106)

## 2.2 Memory Access Time: Terminology



### Hit: Data is in cache (e.g., X)

- Hit rate: Fraction of memory accesses that hit
- Hit time: Time to access cache

### Miss: Data is not in cache (e.g., Y)

- Miss rate = 1 - Hit rate
- Miss penalty: Time to replace cache block + hit time

Hit time < Miss penalty

## 2.2 Memory Access Time: Formula

### Average Access Time

$$= \text{Hit rate} \times \text{Hit Time} + (1 - \text{Hit rate}) \times \text{Miss penalty}$$

Example:

- Suppose our on-chip SRAM (cache) has **0.8 ns** access time, but the fastest DRAM (main memory) we can get has an access time of **10ns**. How high a **hit rate** do we need to sustain an average access time of **1ns**?

Let  $h$  be the desired hit rate.

$$1 = 0.8h + (1 - h) \times (10 + 0.8)$$

$$= 0.8h + 10.8 - 10.8h$$

$$10h + 0.8 \rightarrow h = 0.98$$

Hence we need a hit rate of **98%**.

### 3. Memory to Cache Mapping (1/2)

- Cache Block/Line:**
  - Unit of transfer between memory and cache
- Block size is typically one or more words
  - e.g.: 16-byte block  $\geq 4\text{-word block}$
  - $1 \text{ word} \geq 4 \text{ bytes}$
  - 32-byte block  $\geq 8\text{-word block}$
- Why is the block size bigger than word size?

### 3. Memory to Cache Mapping (2/2)



### 4. Direct Mapping Analogy



### 4. Direct Mapped Cache: Cache Index



### 4. Direct Mapped Cache: Cache Tag



### 4. Direct Mapped Cache: Mapping



### 4. Direct Mapped Cache: Cache Structure



Along with a data block (line), cache also contains the following administrative information (overheads):

1. **Tag** of the memory block
2. **Valid bit** indicating whether the cache line contains valid data

**When is there a cache hit?**

(Valid[Index] = TRUE) AND  
(Tag[index] = Tag[memory address])

### 4. Cache Mapping: Example



#### 4. Cache Circuitry: Example



#### 5. Reading Data: Setup

Given a direct mapped 16KB cache:

- 16-byte blocks x 1024 cache blocks

Trace the following memory accesses:

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |
| 0000000000000000 | 0000000001 | 1100   |
| 0000000000000000 | 0000000001 | 0100   |
| 0000000000000010 | 0000000001 | 1000   |
| 0000000000000000 | 0000000001 | 0000   |

#### 5. Reading Data: Load #1-2

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |

Step 2. Data in block 1 is invalid [Cold/Compulsory Miss]

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 0     |     | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 0     |     |                 |                 |                  |                   |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #1-3

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |

Step 3. Load 16 bytes from memory; Set Tag and Valid bit

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 0   | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 0     |     |                 |                 |                  |                   |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #1-4

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |

Step 4. Return Word1 (byte offset = 4) to Register

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 0   | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 0     |     |                 |                 |                  |                   |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #2-1

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 1100   |

Step 1. Check Cache Block at index 1

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 0   | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 0     |     |                 |                 |                  |                   |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #2-2

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 1100   |

Step 2. [Cache Block is Valid] AND [Tags match] → Cache hit!

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 0   | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 0     |     |                 |                 |                  |                   |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #2-3

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 1100   |

Step 3. Return Word3 (byte offset = 12) to Register [Spatial Locality]

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 0   | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 0     |     |                 |                 |                  |                   |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #3-1

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |

Step 1. Check Cache Block at index 3

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 0     |     | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 1     | 0   | I               | J               | K                | L                 |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #3-3

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |

Step 3. Load 16 bytes from memory; Set Tag and Valid bit

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 0   | A               | B               | C                | D                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 1     | 0   | I               | J               | K                | L                 |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

#### 5. Reading Data: Load #4-3

| Tag              | Index      | Offset |
|------------------|------------|--------|
| 0000000000000000 | 0000000001 | 0100   |

Step 4. Return Word1 (byte offset = 4) to Register

| Index | Valid | Tag | Word0 Bytes 0-3 | Word1 Bytes 4-7 | Word2 Bytes 8-11 | Word3 Bytes 12-15 |
|-------|-------|-----|-----------------|-----------------|------------------|-------------------|
| 0     | 0     |     |                 |                 |                  |                   |
| 1     | 1     | 2   | E               | F               | G                | H                 |
| 2     | 0     |     |                 |                 |                  |                   |
| 3     | 1     | 0   | I               | J               | K                | L                 |
| 4     | 0     |     |                 |                 |                  |                   |
| 5     | 0     |     |                 |                 |                  |                   |
| 1022  | 0     |     |                 |                 |                  |                   |
| 1023  | 0     |     |                 |                 |                  |                   |

## 5. Reading Data: Load #5-1



## 5. Reading Data: Load #5-2

= Load from 00000000000000000000000000000001 0000

**Step 2. Cache block is Valid but Tags mismatch [Cold miss]**



## 5. Reading Data: Load #5-3

= Load from 00000000000000000000000000000001 0000

**Step 3. Replace block 1 with new data: Set Tag**



## 5. Reading Data: Load #5-4

= Load from 00000000000000000000000000000001 0000

**Step 4. Return Word0 (byte offset = 0) to Register**



## 5. Reading Data: Summary



## 6. Types of Cache Misses

- Compulsory misses**
  - On the first access to a block; the block must be brought into the cache
  - Also called cold start misses or first reference misses
- Conflict misses**
  - Occur in the case of direct mapped cache or set associative cache, when several blocks are mapped to the same block/set
  - Also called collision misses or interference misses
- Capacity misses**
  - Occur when blocks are discarded from cache as cache cannot contain all blocks needed

## Exercise #1: Setup Information



## Exercise #2: Load #2



| Index | Valid | Tag | Word0 | Word1 |
|-------|-------|-----|-------|-------|
| 0     | 1     | 0   | (A)   | B     |
| 1     | 0     |     |       |       |
| 2     | 0     |     |       |       |
| 3     | 0     |     |       |       |

## Exercise #2: Load #3



| Index | Valid | Tag | Word0 | Word1 |
|-------|-------|-----|-------|-------|
| 0     | 1     | 0   | X I   | X J   |
| 1     | 1     | 0   | C     | D     |
| 2     | 0     |     |       |       |
| 3     | 0     |     |       |       |

## Exercise #2: Tracing Memory Accesses

- Using the given setup in exercise #1, trace the following memory loads:
  - Load from addresses: 4, 0, 8, 12, 36, 0, 4
  - Note that "A", "B", ..., "J" represent word-size data
  - Assume 1 word = 4 bytes

| Addr | Data |
|------|------|
| 0    | A    |
| 4    | B    |
| 8    | C    |
| 12   | D    |
| ...  | ...  |
| 32   | I    |
| 36   | J    |
| ...  | ...  |

## Exercise #2: Load #1

Addresses: 4, 0, 8, 12, 36, 0, 4

| Index | Valid | Tag | Word0 | Word1 |
|-------|-------|-----|-------|-------|
| 0     | 0     | 1   | 0     | B     |
| 1     | 0     |     |       |       |
| 2     | 0     |     |       |       |
| 3     | 0     |     |       |       |

## Exercise #2: Load #3

MISS HIT MISS HIT MISS HIT

Addresses: 4, 0, 8, 12, 36, 0, 4

Address 8 = 00000000000000000000000000000000 01 000

| Index | Valid | Tag | Word0 | Word1 |
|-------|-------|-----|-------|-------|
| 0     | 1     | 0   | A     | B     |
| 1     | 0     |     | C     | D     |
| 2     | 0     |     |       |       |
| 3     | 0     |     |       |       |

## Exercise #2: Load #7

MISS HIT MISS HIT MISS MODE HIT

Addresses: 4, 0, 8, 12, 36, 0, 4

Address 4 = 00000000000000000000000000000000 00 100

| Index | Valid | Tag | Word0 | Word1 |
|-------|-------|-----|-------|-------|
| 0     | 1     | 0   | X A   | B     |
| 1     | 1     | 0   | C     | D     |
| 2     | 0     |     |       |       |
| 3     | 0     |     |       |       |

## 7. Writing Data: Store #1-1

Store X to 00000000000000000000000000000001 1000

**Step 2.** Cache block is Valid AND [Tags match] → Cache hit!

| Index | Valid | Tag | Data Word0 Bytes 0-3 | Data Word1 Bytes 4-7 | Data Word2 Bytes 8-11 | Data Word3 Bytes 12-15 |
|-------|-------|-----|----------------------|----------------------|-----------------------|------------------------|
| 0     | 1     | 0   | A                    | R                    | C                     | D                      |
| 1     | 1     | 0   |                      |                      |                       |                        |

Writing data

## 7. Writing Data: Store #1-1

Store X to 00000000000000000000000000000001 1000

**Step 1.** Check Cache Block 1

| Index | Valid | Tag | Data Word0 Bytes 0-3 | Data Word1 Bytes 4-7 | Data Word2 Bytes 8-11 | Data Word3 Bytes 12-15 |
|-------|-------|-----|----------------------|----------------------|-----------------------|------------------------|
| 0     | 0     |     |                      |                      |                       |                        |
| 1     | 1     | 0   |                      |                      |                       |                        |

\* Store X to [0000000000000000] [0000000001] 1000

#### Step 1. Check Cache Block 1



\* Store X to [0000000000000000] [0000000001] 1000

#### Step 2. [Cache Block is Valid] AND [Tags match] → Cache hit!



\* Store X to [0000000000000000] [0000000001] 1000

#### Step 2. Replace Word2 (offset = 8) with X



## 8. Changing Cache Content: Write Policy

- Cache and main memory are inconsistent
  - Modified data only in cache, not in memory!
- Solution 1: Write-through cache**
  - Write data both to cache and to main memory
- Solution 2: Write-back cache**
  - Only write to cache
  - Write to main memory only when cache block is replaced (evicted)

## 8. Write-Back Cache

- Problem:**
  - Quite wasteful if we write back every evicted cache blocks
- Solution:**
  - Add an additional bit (**Dirty bit**) to each cache block
  - Write operation will change dirty bit to 1
    - Only cache block is updated, no write to memory
  - When a cache block is replaced:
    - Only write back to memory if dirty bit is 1

## 8. Write-Through Cache



- Problem:**
  - Write will operate at the speed of main memory!
- Solution:**
  - Put a write buffer between cache and main memory
    - Processor: writes data to cache + write buffer
    - Memory controller: write contents of the buffer to memory

## 8. Handling Cache Misses

- On a **Read Miss**:
  - Data loaded into cache and then load from there to register
- Write Miss option 1: Write allocate**
  - Load the complete block into cache
  - Change only the required word in cache
  - Write to main memory depends on write policy
- Write Miss option 2: Write around**
  - Do not load the block to cache
  - Write directly to **main memory only**

## 8. Writing Data: Summary



## Associative Cache

### 2. Block Size Trade-off (1/2)

**Average Access Time**  
= Hit rate × Hit Time + (1-Hit rate) × Miss penalty

- Larger block size:
  - Takes advantage of spatial locality
  - Larger miss penalty: Takes longer time to fill up the block
  - If block size is too big relative to cache size → Too few cache blocks → miss rate will go up

Miss Penalty

Miss Rate

Exploits Spatial Locality

Fewer blocks; compromises temporal locality

Average Access Time

### 2. Block Size Trade-off (2/2)



## 3. Set Associative (SA) Cache

### Compulsory misses

- On the first access to a block, the block must be brought into the cache
- Also called cold start misses or first reference misses

### Conflict misses

- Occur in the case of direct mapped cache or set associative cache, when several blocks are mapped to the same block/set
- Also called collision misses or interference misses

### Capacity misses

- Occur when blocks are discarded from cache as cache cannot contain all blocks needed

## 3. Set Associative Cache: Mapping



## 3. Set Associative Cache: Example



## 3. Set Associative Cache: Structure



An example of 2-way set associative cache

Each set has two cache blocks

A memory block maps to a **unique set**

In the set, the memory block can be placed in **either** of the N cache blocks in the set

Need to search both to look for the memory block

## 3. Set Associative Cache: Circuitry



### 3. Advantage of Associativity (1/3)



### 3. Advantage of Associativity (2/3)



### 3. Advantage of Associativity (3/3)



### 3. SA Cache Example: Setup

= Given:  
Memory access sequence: 4, 0, 8, 36, 0  
2-way set-associative cache with a total of four 8-byte blocks → **total of 2 sets**  
Indicate hit/miss for each access

|                    |   |           |   |        |
|--------------------|---|-----------|---|--------|
| 31                 | 4 | 3         | 2 | 0      |
| Offset, N = 3 bits | + | Set Index | + | Offset |

Block Number = 32 – 3 = 29 bits  
2-way associative, number of sets = 2 = 2<sup>1</sup>  
Set Index, M = 1 bits  
Cache Tag = 32 – 3 = 1 = 28 bits

### 3. SA Cache Example: Load #4

= Load from 36 → [00000000000000000000000000000019] 0 100

**Check:** Both blocks in Set 0 are invalid [Cold Miss]  
**Result:** Load from memory and place in Set 0 - Block 0

| Set Index | Block 0 |     |      |       | Block 1 |     |    |    |
|-----------|---------|-----|------|-------|---------|-----|----|----|
|           | Valid   | Tag | W0   | W1    | Valid   | Tag | W0 | W1 |
| 0         | 1       | 0   | M[0] | M[4]  | 0       |     |    |    |
| 1         | 1       | 0   | M[8] | M[12] | 0       |     |    |    |

### 3. SA Cache Example: Load #5

= Load from 0 → [00000000000000000000000000000000] 0 100

**Check:** [Valid but tag mismatch] Set 0 - Block 0  
[Invalid] Set 0 - Block1 [Cold Miss]

| Set Index | Block 0 |     |      |       | Block 1 |     |       |       |
|-----------|---------|-----|------|-------|---------|-----|-------|-------|
|           | Valid   | Tag | W0   | W1    | Valid   | Tag | W0    | W1    |
| 0         | 1       | 0   | M[0] | M[4]  | 1       | 2   | M[32] | M[36] |
| 1         | 1       | 0   | M[8] | M[12] | 0       |     |       |       |

### Fully Associative Cache

#### 4. Fully Associative Cache

= **Fully Associative Cache**  
A memory block can be placed in any location in the cache

##### Key Idea:

- Memory block placement is no longer restricted by cache index or cache set index
- Can be placed in any location, **BUT**
- Need to search all cache blocks for memory access

#### 4. Fully Associative Cache: Circuity

= Example: 2<sup>12</sup> × 2<sup>12</sup> = 2<sup>24</sup> bytes  
= 4KB cache size and 16-Byte block size  
= Compare tags and valid bit in parallel



#### 4. Fully Associative Cache: Mapping



### No Conflict Miss

#### 4. Cache Performance

**Observations:**

- Cold/compulsory miss remains the same irrespective of cache size/associativity.
- For the same cache size, conflict miss goes down with increasing associativity.
- Conflict miss is 0 for FA caches.
- For the same cache size, capacity miss remains the same irrespective of associativity.
- Capacity miss decreases with increasing cache size.



## 5. Block Replacement Policy (1/3)

### • Set Associative or Fully Associative Cache:

- Can choose where to place a memory block
- Potentially replacing another cache block if full
- Need **block replacement policy**

### • Least Recently Used (LRU)

- How: For cache hit, record the cache block that was accessed
- When replacing a block, choose one which has not been accessed for the longest time
- Why: Temporal locality

## 5. Block Replacement Policy (2/3)

### • Least Recently Used policy in action:



## 5. Block Replacement Policy (3/3)

### • Drawback for LRU

- Hard to keep track if there are many choices

### • Other replacement policies:

- First in first out (FIFO)
- Random replacement (RR)
- Least frequently used (LFU)

## 6. Additional Examples #1

|     | Addr      | Tag | Index | Offset |
|-----|-----------|-----|-------|--------|
| 4:  | 00...0000 | 00  | 100   |        |
| 8:  | 00...0001 | 01  | 000   |        |
| 36: | 00...0010 | 00  | 100   |        |
| 48: | 00...0011 | 10  | 000   |        |
| 68: | 00...0100 | 00  | 100   |        |
| 0:  | 00...0000 | 00  | 000   |        |
| 32: | 00...0011 | 00  | 000   |        |

### • Direct-Mapped Cache:

- Four 8-byte blocks

### • Memory accesses:

| Index | Valid | Tag | Word0       | Word1       |
|-------|-------|-----|-------------|-------------|
| 0     | 0     | 1   | M[0] M[32]  | M[32] M[36] |
|       | -0    | 1   | M[0] M[32]  | M[32] M[36] |
|       | -0    | 2   | M[0] M[64]  | M[64] M[68] |
| 1     | 0     | 1   | M[8] M[12]  |             |
| 2     | 0     | 1   | M[48] M[52] |             |
| 3     | 0     |     |             |             |

## 6. Additional Examples #2

|     | Addr      | Tag | Index | Offset |
|-----|-----------|-----|-------|--------|
| 4:  | 00...0000 | 00  | 100   |        |
| 8:  | 00...0001 | 01  | 000   |        |
| 36: | 00...0010 | 00  | 100   |        |
| 48: | 00...0011 | 00  | 000   |        |
| 68: | 00...0100 | 00  | 100   |        |
| 0:  | 00...0000 | 00  | 000   |        |
| 32: | 00...0010 | 00  | 000   |        |

### • Fully-Associative Cache:

- Four 8-byte blocks

### • LRU Replacement Policy

### • Memory accesses:

4, 8, 36, 48, 68, 0, 32

| Index | Valid | Tag | Word0 | Word1                 |
|-------|-------|-----|-------|-----------------------|
| 0     | 0     | 1   | -0 8  | M[0] M[64] M[4] M[68] |
| 1     | 0     | 1   | -1 0  | M[8] M[0] M[12] M[4]  |
| 2     | 0     | 1   | 4     | M[32] M[36]           |
| 3     | 0     | 1   | 6     | M[48] M[52]           |

## 6. Additional Examples #3

|     | Addr      | Tag | Index | Offset |
|-----|-----------|-----|-------|--------|
| 4:  | 00...0000 | 0   | 100   |        |
| 8:  | 00...0001 | 0   | 000   |        |
| 36: | 00...0010 | 0   | 100   |        |
| 48: | 00...0011 | 0   | 000   |        |
| 68: | 00...0100 | 0   | 100   |        |
| 0:  | 00...0000 | 0   | 000   |        |
| 32: | 00...0010 | 0   | 000   |        |

### • 2-way Set-Associative Cache:

- Four 8-byte blocks

### • LRU Replacement Policy

### • Memory accesses:

4, 8, 36, 48, 68, 0, 32

| Set Index | Block 0 |     |       |       | Block 1 |        |       |             |
|-----------|---------|-----|-------|-------|---------|--------|-------|-------------|
|           | Valid   | Tag | Word0 | Word1 | Valid   | Tag    | Word0 | Word1       |
| 0         | 0       | 1   | M[0]  | M[4]  | 2       | -M[32] | M[36] |             |
| 1         | 0       | 1   | M[48] | M[52] | 3       | M[64]  | M[68] | M[32] M[36] |

## 8. Exploration: Improving Cache Penalty

### Average Access Time

$$= \text{Hit rate} \times \text{Hit Time} + (1 - \text{Hit rate}) \times \text{Miss penalty}$$

So far, we tried to improve Miss Rate:

- Larger block size
- Larger Cache
- Higher Associativity

What about **Miss Penalty**?

Are you a RAW hazard? Come find out how many cycles you take. Start with 0 cycles.

Q1. Do you have forwarding?

No - Exit with 2 cycles

Yes - Go to Q2

Q2. Is your first instruction lw?

No - Go to Q4 (Forwarding from EX)

Yes - Go to Q3 (Forwarding from MEM)

Q3. Is your second instruction sw?

No - Add one cycle and go to Q4

Yes - Exit (MEM-MEM forward)

Q4. Is your second instruction beq with early branching?

No - Exit (Forwarding to EX/MEM)

Yes - Add 1 cycle and exit (Forwarding to ID)

RAW Dependency (Generic)

If no forwarding, need to wait until WB stage of the first instruction for writing back to the registers(first half)

In the second half, do the ID stage of the second instruction to read registers.

Thus 2 cycle delay.

If there is forwarding, it will occur from EX/MEM register to the EX stage of the second instruction.

Thus no delay.

RAW Dependency (First one is Load)

If there is no forwarding, it's the same as the generic one. Need to wait until WB then read during ID.

Thus 2 cycle delay.

If there is forwarding, it will occur from MEM/WB register to the EX stage of the second instruction.

(Load instruction's data is produced only in MEM stage, unlike EX stage in generic.)

Thus 1 cycle delay.

If there is forwarding and the second instruction is Save Word WITH THE SAME REGISTER AS FIRST ARGUMENT, then there isn't any delay because the info is needed only at MEM of sw

Eg

lw \$1, 0(\$5)

sw \$1, 0(\$6)

Thus 0 cycle delay.

RAW Dependency (Second one is Branch) DO GENERIC FIRST

If there is no early branching, the branch calculation is done at MEM. On top of any existing RAW dependencies, there won't be any extra cycle delays.

0 cycle delay.

If there is early branching, the branch calculation is done at ID. On top of any existing RAW dependencies, there will be an extra +1 delay if there is forwarding, because the data needs to be produced at EX stage, forward to ID.

Thus 1 cycle delay. (If Forwarding)

Thus 0 cycle delay (If no Forwarding)

Control Dependency (First one is Branch)

If there is no branch prediction, the branch calculation is done at MEM or ID depending on setup. On top of any existing early branching or not, there will be a guaranteed 1 cycle delay because you need the calculation to finish before doing the next IF.

Thus 1 cycle delay.

If there is branch prediction, you can run the IF in the background while the branch calculation is happening. If the prediction is wrong, simply flush and IF again (+1 delay like before). If the prediction is correct, no delay as the correct instruction is already there.

Thus 0/1 cycle delay depending on prediction correctness.