



## *EE141-Spring 2010 Digital Integrated Circuits*

Lecture 25  
Memory

EECS141

Lecture #25

1

## *Administrativia*

- Hw 8 Posted
  - Last one to be graded
  - Due Friday April 30
- Hw 6 and 7 Graded and available
- Project Phase 2 Graded
- Project Phase 3 Launch Today

EECS141

Lecture #25

2

## Project Phase 2



Performance Histogram

EECS141

Lecture #25

3

## Energy-Delay Plot



Performance - pruned



EECS141

Lecture #25

4

## Phase 2- Results

- Max: 98
- Avg: 78.9
- StDev: 12.9
- Median: 81



EECS141

Lecture #25

5

## Project Phase 3

- Design implementation of 256-bit code
- Create floorplan to estimate wirelength
- Optimize circuit such that energy is minimized for max delay of 10 nsec
- Determine also  $T_{d,\min}$  and  $E_{\min}$  for that design
- Present results in poster



EECS141

Lecture #25

6

## Code Design

Template:



Group A:



Group B: Same but period is 4

Group C: Same but period is 8

Group D: Same but period is 16

## Possible Floorplans



Example #1



Example #2

## Estimating Wire Length



EECS141

Lecture #25

9

## Optimization Criteria



Optimization target



The extremes

EECS141

Lecture #25

10

## Class Material

- ❑ Last lecture
  - Multipliers
- ❑ Today's lecture
  - Memory cells
- ❑ Reading (Ch 12)



## Semiconductor Memory

## Semiconductor Memory Classification

| Read-Write Memory |                               | Non-Volatile Read-Write Memory | Read-Only Memory                       |
|-------------------|-------------------------------|--------------------------------|----------------------------------------|
| Random Access     | Non-Random Access             |                                |                                        |
| SRAM              | FIFO                          | EPROM<br>E <sup>2</sup> PROM   | Mask-Programmed<br>Programmable (PROM) |
| DRAM              | LIFO<br>Shift Register<br>CAM | FLASH                          |                                        |

EECS141

Lecture #25

13

## Random Access Memories (RAM)

### STATIC (SRAM)

**Data stored as long as supply is applied**  
**Larger (6 transistors/cell)**  
**Fast**  
**Differential (usually)**

### DYNAMIC (DRAM)

**Periodic refresh required**  
**Smaller (1-3 transistors/cell)**  
**Slower**  
**Single Ended**

EECS141

Lecture #25

14

## Random Access Chip Architecture

- Conceptual: linear array
  - Each box holds some data
  - But this does not lead to a nice layout shape
  - Too long and skinny
  
- Create a 2-D array
  - Decode Row and Column address to get data



EECS141

Lecture #25

15

## Memory Architecture: Decoders



Intuitive architecture for  $N \times M$  memory  
Too many select signals:  
 $N$  words ==  $N$  select signals

Decoder reduces the number of select signals  
 $K = \log_2 N$

EECS141

Lecture #25

16

## Memory Architecture

### CORE:

- keep square within a 2:1 ratio
- rows are **word lines**
- columns are **bit lines**
- data in and out on columns

### DECODERS:

- needed to reduce total number of pins;  $N+M$  address lines for  $2^{N+M}$  bits of storage
- Ex: if  $N+M=20 \rightarrow 2^{20} = 1\text{Mb}$

### MULTIPLEXING:

- used to select one or more columns for input or output of data



## Basic Static Memory Element



- If  $D$  is high,  $D_b$  will be driven low
  - Which makes  $D$  stay high
  - Positive feedback

## Positive Feedback: Bi-Stability



## Writing into a Cross-Coupled Pair



Access transistor must be able to overpower the feedback

## Writing a “1”

EECS141

Lecture #25

21

## Memory Cell



Complementary data values are written (read) from two sides

EECS141

Lecture #25

22

## 6-Transistor CMOS SRAM Cell



## SRAM Operation - Read

*Read*



- $Q_b$  will get pulled up when WL first goes high
  - Reading the cell should not destroy the stored value

## CMOS SRAM Analysis (Read)



$$k_{n,M5} \left( (V_{DD} - \Delta V - V_{Tn}) V_{DSATn} - \frac{V_{DSATn}^2}{2} \right) = k_{n,M1} \left( (V_{DD} - V_{Tn}) \Delta V - \frac{\Delta V^2}{2} \right)$$

$$\Delta V = \frac{V_{DSATn} + CR(V_{DD} - V_{Tn}) - \sqrt{V_{DSATn}^2(1 + CR) + CR^2(V_{DD} - V_{Tn})^2}}{CR}$$

## CMOS SRAM Analysis (Read)



## CMOS SRAM Analysis (Write)



$$k_{n,M6} \left( (V_{DD} - V_{Tn}) V_Q - \frac{V_Q^2}{2} \right) = k_{p,M4} \left( (V_{DD} - |V_{Tp}|) V_{DSATp} - \frac{V_{DSATp}^2}{2} \right)$$

$$V_Q = V_{DD} - V_{Tn} - \sqrt{(V_{DD} - V_{Tn})^2 - 2 \frac{\mu_p}{\mu_n} PR \left( (V_{DD} - |V_{Tp}|) V_{DSATp} - \frac{V_{DSATp}^2}{2} \right)},$$

## CMOS SRAM Analysis (Write)



## Read Static Noise Margin



EECS141

Lecture #25

29

## 6T-SRAM — Layout



Compact cell

Bitlines: M2

Wordline: bootstrapped in M3

EECS141

Lecture #25

30

## 65nm SRAM

□ ST/Philips/Motorola

*Access Transistor*



EECS141

Lecture #25

31

## SRAM Column



EECS141

Lecture #25

32

## SRAM Array Layout



EECS141

Lecture #25

33

## Resistive Pull-Ups



Static power dissipation -- Want  $R_L$  large  
 Bit lines precharged to  $V_{DD}$  to address  $t_p$  problem

EECS141

Lecture #25

34

## 3-Transistor DRAM Cell



No constraints on device ratios  
 Reads are non-destructive  
 Value stored at node X when writing a "1" =  $V_{WWL} - V_{Tn}$

## 3T DRAM Layout



## 1-Transistor DRAM Cell



Write:  $C_S$  is charged or discharged by asserting WL and BL.  
Read: Charge redistribution takes place between bit line and storage capacitance

$$\Delta V = V_{BL} - V_{PRE} = V_{BIT} - V_{PRE} \frac{C_S}{C_S + C_{BL}}$$

Voltage swing is small; typically around 250 mV.

## 1T DRAM Cell



Cross-section



Layout

Uses Polysilicon-Diffusion Capacitance

Expensive in Area

## Micrograph of 1T DRAM



EECS141

Lecture #25

39

## Advanced 1T DRAM Cells



Trench Cell



Stacked-capacitor Cell

EECS141

Lecture #25

40

## Read-Only Memory



Diode ROM



MOS ROM 1



MOS ROM 2

## MOS NOR ROM



## MOS NOR ROM Layout



EECS141

Lecture #25

43

## MOS NOR ROM Layout



EECS141

Lecture #25

44

## MOS NAND ROM



All word lines high by default with exception of selected row

## MOS NAND ROM Layout





## Floating-Gate Transistor Programming



EECS141

Lecture #25

49

## FLOTOX EEPROM



FLOTOX transistor

Fowler-Nordheim  
 $I-V$  characteristic

EECS141

Lecture #25

50

## EEPROM Cell



Absolute threshold control  
is hard  
Unprogrammed transistor  
might be depletion  
⇒ 2 transistor cell

## Cross Sections of NVM Cells



Flash

EPROM

*Courtesy Intel*

## Periphery

- Decoders
- Sense Amplifiers
- Input/Output Buffers
- Control / Timing Circuitry

## Row Decoders

Collection of  $2^M$  complex logic gates  
 Organized in regular and dense fashion

### (N)AND Decoder

$$WL_0 = A_0 \bar{A}_1 \bar{A}_2 \bar{A}_3 \bar{A}_4 \bar{A}_5 \bar{A}_6 \bar{A}_7 \bar{A}_8 \bar{A}_9$$

$$WL_{511} = \overline{\bar{A}_0 \bar{A}_1 \bar{A}_2 \bar{A}_3 \bar{A}_4 \bar{A}_5 \bar{A}_6 \bar{A}_7 \bar{A}_8 \bar{A}_9}$$

### NOR Decoder

$$WL_0 = \overline{\bar{A}_0 + \bar{A}_1 + \bar{A}_2 + \bar{A}_3 + \bar{A}_4 + \bar{A}_5 + \bar{A}_6 + \bar{A}_7 + \bar{A}_8 + \bar{A}_9}$$

$$WL_{511} = \overline{\bar{A}_0 + \bar{A}_1 + \bar{A}_2 + \bar{A}_3 + \bar{A}_4 + \bar{A}_5 + \bar{A}_6 + \bar{A}_7 + \bar{A}_8 + \bar{A}_9}$$

## Decoder Design Example



- Look at decoder for 256x256 memory block (8KBytes)

EECS141

Lecture #25

55

## Problem Setup

- Goal: Build fastest possible decoder with static CMOS logic

- What we know

- Basically need 256 AND gates, each one of them drives one word line



EECS141

Lecture #25

56

## Problem Setup (1)

- ❑ Each word line has 256 cells connected to it
- ❑ Total output load is  $256 * C_{cell} + C_{wire}$
- ❑ Assume that decoder input capacitance is  
 $C_{address} = 4 * C_{cell}$
- ❑ Each address drives  $2^8 / 2$  AND gates
  - A0 drives  $\frac{1}{2}$  of the gates, A0\_b the other  $\frac{1}{2}$  of the gates
- ❑ Neglecting  $C_{wire}$ , the fan-out on each one of the 16 address wires is:

$$BF = \frac{C_{load}}{C_{in}} = \frac{(2^8 / 2)(2^8 C_{cell})}{4C_{cell}} = 2^{13}$$

## Decoder Fan-Out

- ❑ FB of at least  $2^{13}$  means that we will want to use more than  $\log_4(2^{13}) = 6.5$  stages to implement the AND8
- ❑ Need many stages anyways
  - So what is the best way to implement the AND gate?
  - Will see next that it's the one with the most stages and least complicated gates

## 8-Input AND



$$LE = 10/3 \quad 1$$

$$\Pi LE = 10/3$$

$$P = 8 + 1$$

$$LE = 2 \quad 5/3$$

$$\Pi LE = 10/3$$

$$P = 4 + 2$$

$$LE = 4/3 \quad 5/3$$

$$\Pi LE = 80/27$$

$$P = 2 + 2 + 2 + 1$$

## 8-Input AND



- ❑ Using 2-input NAND gates

- 8-input gate takes 6 stages

- ❑ Total LE is  $(4/3)^3 \approx 2.4$

- ❑ So PE is  $2.4 * 2^{13} - \text{optimal } N \text{ of } \sim 7.1$

## Decoder So Far

- 256 8-input AND gates
  - Each built out of tree of NAND gates and inverters
- Issue:
  - Every address line has to drive 128 gates (and wire) right away
  - Can't build gates small enough - Forces us to add buffers just to drive address inputs



## Look Inside Each AND8 Gate



## Predecoders

- ❑ Use a single gate for each of the shared terms
  - E.g., from  $A_0$ ,  $\bar{A}_0$ ,  $A_1$ , and  $\bar{A}_1$ , generate four signals:  $A_0A_1$ ,  $\bar{A}_0A_1$ ,  $A_0\bar{A}_1$ ,  $\bar{A}_0\bar{A}_1$
- ❑ In other words, we are decoding smaller groups of address bits first
  - And using the “predecoded” outputs to do the rest of the decoding

## Predecoder and Decoder



## Hierarchical Decoders

*Multi-stage implementation improves performance*



## Dynamic Decoders



### 4-input pass-transistor based column decoder



Advantages: speed ( $t_{pd}$  does not add to overall memory access time)

Only one extra transistor in signal path

Disadvantage: Large transistor count

### 4-to-1 tree based column decoder



Number of devices drastically reduced

Delay increases quadratically with # of sections; prohibitive for large decoders

Solutions:  
buffers  
progressive sizing  
combination of tree and pass transistor approaches

# Sense Amplifiers

## Idea: Use Sense Amplifier



# *Differential Sense Amplifier*



Directly applicable to  
SRAMs

## Differential Sensing – SRAM



(a) SRAM sensing scheme



(b) two stage differential amplifier

## Latch-Based Sense Amplifier (DRAM)



Initialized in its meta-stable point with  $\overline{EQ}$

Once adequate voltage gap created, sense amp enabled with  $SE$   
Positive feedback quickly forces output to a stable operating point.