

# Midterm Notes

## 1. Details on PPT

Classes of computers  
 | personal computers: general purpose, cost/performance tradeoff  
 | server computers: network based, high capacity, performance  
 | supercomputers: highest capability  
 | embedded computers: stringent power/cost constraints

Common number system  
 | binary  
 | decimal  
 | hexadecimal



Components of the computer

| Input (data transferred to memory)

| Output (data read from memory)

Memory

Processor  
 | datapath: 用于数据处理和传输的硬件 (register, I/O, ...)  
 | control: 负责协调管理数据的操作 (clock signal, instruction decoding, ...)

ISA (native data types)  
 | instructions  
 | registers  
 | addressing modes  
 | memory architecture  
 | interrupt and exception handling  
 | external I/O

包括  
 | 一种叫 RISC 的, MIPS 和 RISC  
 | 较简便, 轻量级的

## 2. Digital Logic

{ digital waveforms: discrete (e.g.: > certain threshold → )

analog waveforms: continuous

logic  
 | truth table  
 | logical expression  
 | Graphic form (k-map)

Universal Gate (单一 Gate 可搭建所有电路): NAND 和 NOR

Gate: NOT  $\rightarrow$  AND  $\rightarrow$  OR  $\rightarrow$  INAND  $\rightarrow$  NOR  $\rightarrow$  XOR  $\rightarrow$

{ combinational logic circuits: no memory  $\Rightarrow$  output can be expressed as logic functions of input

sequential logic circuits: with memory! cannot

Combinational logic

① Multiplexer (selector)

$2^n$  to 1 multiplexor  
 $n^+$  selection input (which to choose)  $\rightarrow$  这个 n 位输入对后面 binary number 就是选择 input  
 1<sup>+</sup> output (相当于先在前面的电路里用 not gate 做 n 位 selector  
 input 转成 2<sup>n</sup> 相位, 然后每相位和一个 input - 起进 and gate  
 也就是说必须把 n 位相位定好相位, input 才能连去)

② Decoder ( $n$  to  $2^n$  decoder)

$n$  to  $2^n$  decoder  
 $n$  input  
 $2^n$  outputs (每个 input 选择 In<sub>1</sub> 还是 not In<sub>1</sub>, 每种组合对应一个 response)  
 one-hot vector  
 ③ minterm and maxterm  
 canonical form  
 SOP (min): 所有结果为 1 的 input 算起来之后的和  
 POS (max): 所有结果为 0 的 input 算起来的和

④ two-level logic and PLA

⑤ K-map

要注意的点: 表中 entry 都是 minterm  
 简化方法可能唯一  
 是上一行和下一行 最左边一行和最右边一行也是 adjacent 的  
 相邻的 2<sup>n</sup> 才能被圈

Sequential logic

① S-R Latch  
  
 NOR Gate implementation:  
 $R = 1, S = 1 \Rightarrow \text{forbidden state} \rightarrow 1 - 0 \Rightarrow 0 - 0 \Rightarrow 1' (x)$   
 $R = 0, S = 0 \Rightarrow \text{Latch}$   
 $R = 0, S = 1 \Rightarrow \text{Set}$   
 $R = 1, S = 0 \Rightarrow \text{Reset}$   
 NAND Gate implementation:  
 $S = 1 \text{ AND } Q = 0 \Rightarrow \text{forbidden state} \rightarrow 0 - 1 \Rightarrow 1 - 0 \Rightarrow 0' (x)$   
 $R = 0, S = 1 \Rightarrow \text{Latch}$   
 $R = 0, S = 0 \Rightarrow \text{Reset}$   
 $R = 1, S = 0 \Rightarrow \text{Set}$

② D-Latch  
  
 $WE = 1: Q = D$   
 $WE = 0: Q = \text{previous value}$

③ Register  
 a collection of D-latches, controlled by a common WE

④ clock signal

a shared signal that controls all the circuit element about time

clock speed / clock frequency  $\times$  cycle time =

edge-triggering terminology (state change only occur in edge time)

$\hookrightarrow$  flip-flops (latch can do state change any time)

⑤ Master-Slave D Flip-flop

built by a pair of D-latches



## 3. Data representation

### 2's complement

① signed integer

$$(6)_10 = (0110)_2 \quad 0110 \xrightarrow{\text{flip}} 1001 \xrightarrow{\text{add 1}} 1010$$

$$(-6)_10 = (1010)_2 \quad 1010 \xrightarrow{\text{flip}} 0101 \xrightarrow{\text{add 1}} 0110$$

∴ 对 k-bit 来说, 能表示的是  $[-2^{k-1}, 2^{k-1}]$

Overflow & Underflow  
 $\xrightarrow{-2^{k-1}} \xleftarrow{2^{k-1}}$   
 underflow  
 overflow  
 representable

(unsigned int 0 ~  $2^{k-1}$ , 退 trivial - 点.)

(signed extension: for signed ints  
 zero extension: for unsigned ints  
 (more in MIPS))

② floating point numbers

finite range, limited precision

(能表示的数不是 equally distributed, 前密后疏)

$$\begin{array}{c} \overline{1} \quad \overline{8} \quad \overline{23} \\ \overline{1} \quad \overline{11} \quad \overline{52} \end{array} \quad \begin{array}{c} \text{(32 bits, single precision)} \\ \text{(14 bits, double precision)} \end{array} \quad \begin{array}{c} 7 \text{ 位精度} \\ 16 \text{ 位精度} \end{array}$$

算法:  $(\text{significand\_bits} + 1) \times \log_2 2$

IEEE 754 Standard

bias 目的: 更容易做 comparison (直接 bit-by-bit 比较)

1° normalized case

拿到一个数, 先化为  $2^{\pm k} \times \dots$  的形式  
 整数: 整数, 小数部分分别拆了再合起来  
 小数化二进制用“归整整数部分”法

然后把二进制表示写出来, 化为 normalized scientific representation

然后 signifand 表示出来

exponent 是原数的 exp + bias 之后转为二进制源码的值

最后加 1 sign bit, 0 for 正, -1 for 负

注意: exponent 在这个 case 里不能 = 000...0 或 111...1

假设是 1 位的那:

$$\begin{array}{l} \text{representable range:} \\ \text{max: } (-1)^k \times 1.11\dots1 \times 2^{(1-k)\times 2^{\pm 1}} = -1.1\dots1 \times 2^{\pm 1} \approx -2^{128} \\ \text{min: } (-1)^k \times 1.00\dots0 \times 2^{(1-k)\times 2^{\pm 1}} = 2^{\pm 1} \\ \text{+min: } (-1)^k \times 1.00\dots0 \times 2^{(1-k)\times 2^{\pm 1}} = 2^{\pm 1} \\ \text{+max: } (-1)^k \times 11\dots1 \times 2^{(1-k)\times 2^{\pm 1}} = 11\dots1 \times 2^{\pm 1} \approx 2^{128} \end{array}$$

underflow & overflow 都是 exponent 意义上的, significand 控制的更多是精度

underflow =  $-2^{128} \approx 2^{-128} \approx 2^{128} \approx 2^{128} + 2^{128}$

说明 significand 有错吗!!!

2° de-normalized case

规定 exponent = 00...0 = the exponent of  $(00\dots0)_2 = -127 = -126$

significand 直接读, 但没有 default 1

bit 位数看是 0 000...0 01100...0

value =  $(-1)^k \times 2^{-126} \times (0.01100\dots0)_2$

if exponent to significand 全是 0, 那就表示 0

3° Special case

$+ \infty$ : significand 是全 0: infinite number, 正负 sign bit

$+ \text{NaN}$ : significand 不是全 0: NaN

一般习惯用 hexdecimal 的时候在前面加“0x”

④ characters  $\Rightarrow$  unsigned bytes!

usually 1 byte (8-bit), follow ASCII standard

## 4. MIPS

Doing abstraction makes life easier!

Ideally for each ISA there is an assembly language (satisfying/following the ISA standard)

但其实对于一个 ISA 来说, 有的几种 在细微差异, 也挺常见

Assembler 把它转为 machine code

### Basic Knowledge

① Register: inside processor, 处理速度 (transfer 速度) 比内存快很多  
 in MIPS: 32 registers [number tradeoff]

data 从 memory 里拿出来, 到 processor 里才能做操作

② data transfer [lw (load word)]

sw (store word)

memory location: Offset (base address)

↳ must be constant

③ Endianness

data 之间 总体都是从 地址到 地址的

寻址读取永远先读 memory value 小的 byte, 但写时是

0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07

但都是先读 0x00 再读 0x01, 再根据 endianness 来决定怎么 “interpret” 数据

④ Immediate operands

(addi, no addi!)

constant zero: \$zero (read-only)

↳ 但当成 register 用 (add \$t2,\$t1,\$zero)

addi \$t2,\$zero,5

Memory register constant  $\rightarrow$  快

⑤ Logical operation [bit-by-bit, 每位分步操作/logical operation 再合起来]

and, or, nor, andi, ori (with an immediate value)

shift [sll 整体左移/右移 k 格 e.g.: sll destination, source, shift\_amount]

得是 constant it's register

### MIPS Program

① Areas in memory:  
 Reserved, text, data, heap, stack  
 小地址  $\rightarrow$  大地址

当我们在 data 中 define variable 的时候, 占用的内存都是连续的

· data  
 hi.word 1234  $\Rightarrow$  12345 = 可通过 12345 来访问 h[3]=4

· si.word 5  
 .word 12345

declare float: .byte 1 byte →

.word 4 byte →

.double 8 byte →

.ascii str (string)

.asciz str (string, 末尾 + '0')

## ② Making decisions in MIPS

Conditional jump

beg: branch if equal  
e.g.: beg reg1, reg2, L1  
go to label L1 if reg1==reg2

bne: branch if not equal  
e.g.: bne reg1, reg2, L2  
go to label L2 if reg1!=reg2

Unconditional jump

j: jump  
e.g.: jump L1  
jr: jump register  
e.g.: jr reg [reg存着 address]  
通常在 procedure call/case语句里用

这个会起始 implement if-else-if-else语句

## ③ Loops

Basic blocks 就是一串 sequentially 执行的代码块 (不能从中间进去/出去)

One of the first early phases of compilation is breaking the program into basic blocks.

A compiler identifies basic blocks for optimization

An advanced processor can accelerate execution of basic blocks

## ④ Comparison

slt: set on less than

slti: set on less than immediate

e.g.: slt reg1, reg2, reg3

e.g.: slti \$0, \$52, 10

reg2<reg3: reg1=1

\$t0=1 if \$52<10

otherwise: reg1=0

如果重做 if-else jump 操作, 只能组合两条 instruction

如果把 comparison 和 jump 统一起, 那单条 instruction 就会变慢 [slower clock]

而且 beg, bne 才是 common case

pseudo instructions

$blt(t2)$   
 $bge(s2)$  branch if  $\begin{cases} l \text{(less)} + t \text{(than)} \\ g \text{(great)} + e \text{(equal)} \\ \text{(to) zero, former就赢了} \end{cases}$

## ⑤ Representing instructions

All 32 bits, 4 bytes, 3 formats

- R-format (register)
- I-format (immediate)
- J-format (Jump)

decoding 的时候看 opcode就知道是哪个 format!

|   | opcode | rs    | rt    | rd    | shamt | funct     |   |
|---|--------|-------|-------|-------|-------|-----------|---|
| 1 | 26-25  | 21-20 | 16-15 | 11-10 | 6-5   |           | 0 |
| 1 | opcode | rs    | rt    |       |       | immediate |   |
| J | opcode | 26-25 | 21-20 | 16-15 |       |           | 0 |

R: 注意 结果(也就是写 code 时候的第一个)是存在 rd 里的

rs, rt 的顺序和 code 里一样

shamt: shift amount

funct: 当 opcode 为 (00000) 时, 有很多种指令, 这时候就靠 function code 区分

I: rs: 有着 value register

rt: 有着 address b register

immediate: constant offset

J: to be continued (有个 11h~26bit 的东西)

| Instruction | Type | op               | rs  | rt  | rd  | shamt    | funct            | const/address |
|-------------|------|------------------|-----|-----|-----|----------|------------------|---------------|
| add         | R    | 0                | reg | reg | reg | 0        | 32 <sub>10</sub> | -             |
| sub         | R    | 0                | reg | reg | reg | 0        | 34 <sub>10</sub> | -             |
| and         | R    | 0                | reg | reg | reg | 0        | 36 <sub>10</sub> | -             |
| or          | R    | 0                | reg | reg | reg | 0        | 37 <sub>10</sub> | -             |
| slt         | R    | 0                | 0   | reg | reg | constant | 0                | -             |
| addi        | I    | 8 <sub>10</sub>  | reg | reg | -   | -        | 2 <sub>10</sub>  | constant      |
| andi        | I    | 16 <sub>10</sub> | reg | reg | -   | -        | 0                | address       |
| ori         | I    | 16 <sub>10</sub> | reg | reg | -   | -        | 0                | address       |
| lw          | I    | 43 <sub>10</sub> | reg | reg | -   | -        | 0                | address       |

I: reg: a register number between 0 and 31

constant/address: a constant or a 16-bit address (offset)

## ⑥ Other MIPS instructions

Load/Store a byte

lb \$rt, offset(\$rs) sign extension

lbw \$rt, offset(\$rs) zero extension

sb \$rt, offset(\$rs) 把 rt 最右边的 byte 存到 offset(\$rs)

Logical operation

nor \$rd, \$rs, \$rt 把 rt 换成 zero 来 implement logical not

sltiu: sli 最后一个只能是 register, 这个可以是 immediate

slr: same

## ⑦ Procedures

General purpose registers for procedure calling:

\$(a0-a3): arguments (reg 4-7) For parameter passing: main@>procedure

\$(v0, \$v1, \$v2, \$v3): result values (reg 2 and 3) procedure@>main@>return@>main@>main

\$(s0, \$t0, \$s1, \$t1): local variables (reg 8-11) procedure@>main@>return@>main@>main

\$(s0-s7): saved (reg 8-15) saved@>main@>procedure@>main@>main@>main

\$(sp): stack pointer (reg 20) stack pointer@>main@>procedure@>main@>main

\$(fp): frame pointer (reg 30) frame pointer@>main@>procedure@>main@>main

\$(ra): return address (reg 31) return address@>main@>procedure@>main@>main

PC: 在当前正在执行的 instruction 的地址 (后面会讲, 在 instruction fetch之后就会自动执行 PC=PC+4)

jal ProcedureLabel1 {当前 "jal" 指令的下一行的 address 被 assign给 ra (不是 pseudo)}

jump to ProcedureLabel1 → 等待这个 procedure

往堆栈里 jro 回到原处位置

由于在 nested procedure 里, ra 会被 override, 引入 \$sp 来辅助

caller 开头先把 ra 的值 push 到 stack 里去: addi \$sp, \$sp, -4

sw ra, 0(\$sp)

然后在 procedure 结束之前 (jr ra 前一行) 把值从 stack 里 pop 出来 (lw ra, 0(\$sp))

addi \$sp, \$sp, 4

(leaf procedure 不用做 push/pop 但是做肯定没问题)

→ 因为 ra 不会被 override

## ⑧ char & string

lb & sb 用法 (load/store char, 1bit sign extension)

pseudo instruction: la {lw: load upper immediate

ori {lui: load lower immediate}

la \$s1, int {lw \$s1, int [把前面的 16位移位的前四位, then set the lower 16 bits of \$s1 to 0]}

ori \$s1, int {ori \$s1, int [真把后面的 16位和 \$s1 的后16位, 或者 \$s1, 前16位不变]}

la \$s1, int {lw \$s1, int [把前面的 16位移位的前四位, then set the lower 16 bits of \$s1 to 0]}

ori \$s1, int {ori \$s1, int [真把后面的 16位和 \$s1 的后16位, 或者 \$s1, 前16位不变]}

## ⑨ MIPS Arithmetic

1° instruction details about signed/unsigned adding & extension

add, addu, addi 都是直接加, signed 和 unsigned 的加法没差, 差别只是 signed 的要检查 overflow 而 unsigned 的不用

addiu 和 addiu 不说明 instruction 中的 register/immediate 中的值是 unsigned 的, 说明被先加法之后再检查是否 overflow

addiu 是 sign extension, 因为是 positive value

{slt 有: 表示 unsigned comparison (也就是跳溢出检测的值)  
无: 表示 signed comparison (也就是补码的检测值)}

{sltu 一通通: 表示都是 sign extension sltu 是先做 sign extension, 再 unsignedly 比较)

{sltiu 一通通: 表示都是 sign extension 再跟字节加

{lb \$rt, imm(\$rs) 抓一个 byte 出来, 再 sign extension 存进 rt 里}

{lbu \$rt, imm(\$rs) sign extension 抓一个 byte 出来, 再 zero extension 存进 rt 里}

load half 同理, 看看 lw, 没有 lwu

andi \$rt, \$rs, imme zero extension

2° ALU: arithmetic logic unit

hardware in charge of arithmetic operations

hardware in charge of logical operations

1-bit full adder 用 truth table 写出 Cout . sum



1-bit ALU (and, or, addition)

control signal (应该是 opcode 变成的)

multiplexer (4 to 1) 4 to 1 不够用

所以(加法或减法) or operation 上面是做了 +, and 然后选择 or

假想实现 subtract 功能的话可以在里面放一个 multiplexer (4 to 1) 然后把 carryin 设为 1 下面有

(符号先左移一位) 最大位 (opcode: 11) 把 sign bit 的值放到最高位

SLT: 假设实现 subtract 功能的话可以在里面放一个 multiplexer (4 to 1) 然后把 carryin 设为 1 下面有

log: basic idea 是做 subtraction, 用两个 register 中的值相减看是否为 0

操作 -b negate - control signal 表现 (4 bit)

AND 00 0 0 0

OR 01 0 0 0

ADD 10 0 0 0

SUB 10 1 1 1

BELT 10 1 1 1

SLT 11 1 1 0

优点: 32 bit 的 data 只用了 32 个 1 bit ALU, 省硬件

缺点: 下一个 bit 需等上一个 bit 算完, 单操作, 慢

这是 SLT 的图? 那不做 SLT 的时候

怎么办? OR 撤掉?

Carry look ahead (Optional)

实际的 ALU: 每四位之间用接续的

方法, 但是四 bit 和四 bit 之间用 Carry look

ahead (8 7 6 5)

define:  $G_1 = G_0 - b_1 \cdot C_1$

$C_{11} = G_1 + P_1 \cdot C_1$

(要么进位, 要么其中某个 bit, 而且  $C_{11}=1$ ,  $C_{10}$  才是 1)

这个办法用硬件提高效率

Universal representation

[32 bit] ALU Control Lines

2 bit (控制) multiplexor

1 bit (控制 b negative)

这玩意是啥?

Unsigned multiplication

问题: 把 multiplier 从多 bit 转为 1 bit?

解决: 把 multiplier 从多 bit 转为 1 bit?

⇒ partial product = partial product + multiplier (shifted)

Procedure a(m bit) b(n bit)

Product = 0

Step 1: \$r0 = \$r0 + \$r1 \* \$r2

Step 2: \$r0 = \$r0 + \$r1 \* \$r2

Step 3: \$r0 = \$r0 + \$r1 \* \$r2

Step 4: \$r0 = \$r0 + \$r1 \* \$r2

Step 5: \$r0 = \$r0 + \$r1 \* \$r2

Step 6: \$r0 = \$r0 + \$r1 \* \$r2

Step 7: \$r0 = \$r0 + \$r1 \* \$r2

Step 8: \$r0 = \$r0 + \$r1 \* \$r2

Step 9: \$r0 = \$r0 + \$r1 \* \$r2

Step 10: \$r0 = \$r0 + \$r1 \* \$r2

Step 11: \$r0 = \$r0 + \$r1 \* \$r2

Step 12: \$r0 = \$r0 + \$r1 \* \$r2

Step 13: \$r0 = \$r0 + \$r1 \* \$r2

Step 14: \$r0 = \$r0 + \$r1 \* \$r2

Step 15: \$r0 = \$r0 + \$r1 \* \$r2

Step 16: \$r0 = \$r0 + \$r1 \* \$r2

Step 17: \$r0 = \$r0 + \$r1 \* \$r2

Step 18: \$r0 = \$r0 + \$r1 \* \$r2

Step 19: \$r0 = \$r0 + \$r1 \* \$r2

Step 20: \$r0 = \$r0 + \$r1 \* \$r2

Step 21: \$r0 = \$r0 + \$r1 \* \$r2

Step 22: \$r0 = \$r0 + \$r1 \* \$r2

Step 23: \$r0 = \$r0 + \$r1 \* \$r2

Step 24: \$r0 = \$r0 + \$r1 \* \$r2

Step 25: \$r0 = \$r0 + \$r1 \* \$r2

Step 26: \$r0 = \$r0 + \$r1 \* \$r2

Step 27: \$r0 = \$r0 + \$r1 \* \$r2

Step 28: \$r0 = \$r0 + \$r1 \* \$r2

Step 29: \$r0 = \$r0 + \$r1 \* \$r2

Step 30: \$r0 = \$r0 + \$r1 \* \$r2

Step 31: \$r0 = \$r0 + \$r1 \* \$r2

Step 32: \$r0 = \$r0 + \$r1 \* \$r2

Step 33: \$r0 = \$r0 + \$r1 \* \$r2

Step 34: \$r0 = \$r0 + \$r1 \* \$r2

Step 35: \$r0 = \$r0 + \$r1 \* \$r2

Step 36: \$r0 = \$r0 + \$r1 \* \$r2

Step 37: \$r0 = \$r0 + \$r1 \* \$r2

Step 38: \$r0 = \$r0 + \$r1 \* \$r2

Step 39: \$r0 = \$r0 + \$r1 \* \$r2

Step 40: \$r0 = \$r0 + \$r1 \* \$r2

Step 41: \$r0 = \$r0 + \$r1 \* \$r2

Step 42: \$r0 = \$r0 + \$r1 \* \$r2

Step 43: \$r0 = \$r0 + \$r1 \* \$r2

Step 44: \$r0 = \$r0 + \$r1 \* \$r2

Step 45: \$r0 = \$r0 + \$r1 \* \$r2

Step 46: \$r0 = \$r0 + \$r1 \* \$r2

Step 47: \$r0 = \$r0 + \$r1 \* \$r2

Step 48: \$r0 = \$r0 + \$r1 \* \$r2

Step 49: \$r0 = \$r0 + \$r1 \* \$r2

Step 50: \$r0 = \$r0 + \$r1 \* \$r2

Step 51: \$r0 = \$r0 + \$r1 \* \$r2

Step 52: \$r0 = \$r0 + \$r1 \* \$r2

Step 53: \$r0 = \$r0 + \$r1 \* \$r2

Step 54: \$r0 = \$r0 + \$r1 \* \$r2

Step 55: \$r0 = \$r0 + \$r1 \* \$r2

Step 56: \$r0 = \$r0 + \$r1 \* \$r2

Step 57: \$r0 = \$r0 + \$r1 \* \$r2

Step 58: \$r0 = \$r0 + \$r1 \* \$r2

Step 59: \$r0 = \$r0 + \$r1 \* \$r2

Step 60: \$r0 = \$r0 + \$r1 \* \$r2

Step 61: \$r0 = \$r0 + \$r1 \* \$r2



**Multiplication in MIPS**

Hi, Lo is used to store the product  
乘积的高位和低位  
(mult \$rs, \$rt, \$rd [R-format])  
乘积可能超过32bit → 用两个特殊register存着

Fetch Results  
{ mthi (move from lo) mtho \$rs  
| mthi (move from hi) mthi \$rt  
Overflow testing (看能不能把整个result装在一个32bit register里)

① Unsigned overflow  
mult \$s0, \$s1  
mthi \$s1  
mflo \$s2  
mflo \$s2  
bneq \$s2, \$s1, 31  
bneq \$s1, \$t0, no\_overflow  
j overflow  
no\_overflow:

② Signed overflow  
mult \$s0, \$s1  
mfhi \$s1  
mflo \$s2  
mflo \$s2  
bneq \$s2, \$s1, 31  
bneq \$s1, \$t0, no\_overflow  
j overflow  
no\_overflow:  
SRA  
Shift Right Logical  
before C after  
Shift Right Arithmetic  
SRA  
在左边补上 sign bit  
(元sla)

### MIPS Processor Overview



### Example

Division of a 4-bit unsigned number (0111) by another one (0011)

| Iteration | Divisor (D) | Remainder (R) | Remark                            |
|-----------|-------------|---------------|-----------------------------------|
| 0         |             | 0000 0111     | Initial state                     |
| 1         | 0011        | 0000 1110     | $R = K < D$                       |
| 2         |             | 0000 1110     | Undo                              |
| 3         | 0011        | 0000 1110     | $R = K < 1, R = 0$                |
| 4         |             | 0000 1100     | $(Len(R) < Len(D)) \rightarrow D$ |
| 5         | 0011        | 0000 1100     | $R = K < 1, R = 0$                |
| 6         |             | 0000 1000     | $(Len(R) < Len(D)) \rightarrow D$ |
| 7         | 0011        | 0000 1000     | $R = K < 1, R = 0$                |
| 8         |             | 0000 0100     | $(Len(R) < Len(D)) \rightarrow D$ |
| 9         | 0011        | 0000 0100     | $R = K < 1, R = 0$                |
| 10        |             | 0000 0000     | $(Len(R) < Len(D)) \rightarrow D$ |

initial state: 在左边空位, 右半边是 dividend

第一次: 起手先 shift left - 1格!!

左半边减 divisor, 观察相减后的 sign bit  
1: undo 相减操作 + 左移 - 1格, 右边填 0  
0: 左移 - 1格, 右边填 1  
i = divisor 的 bit 数

结束时 shift right - 1格!!

### Hardware Optimized Version (cont.)

■ 32-bit ALU

| Two registers:                                                                                                      |
|---------------------------------------------------------------------------------------------------------------------|
| □ Remainder register 32 bits                                                                                        |
| □ Quotient register 64 bits                                                                                         |
| □ (right half also used for storing quotient)                                                                       |
| Operations:                                                                                                         |
| □ 32-bit divisor is always subtracted from the left half of remainder register                                      |
| □ The result is written back to the left half of the remainder register                                             |
| □ The right half of the remainder register is initialized with the dividend                                         |
| □ The quotient register is updated by shifting left                                                                 |
| □ The new order of the operations in the loop is that the remainder register will be shifted left one time too many |
| □ In the final correction step, just right shift back only the remainder in the left half of the remainder register |



Ideal pipeline analogy ( $M$  jobs,  $N$  stage, 1 time/stage)

$$\text{sequential} \rightarrow \text{pipeline : speedup} = \frac{T_{\text{seq}}}{T_{\text{pip}}} = \frac{MNT}{M(N-1)T}$$

$$= N$$

unbalanced pipeline  
cannot align perfectly (should wait, add stall time)

Each step is independent of each other and takes different datapath

