

ECE 6473

# Course Project: Memory System Design

Assistant Professor Shaloo Rakheja  
Electrical and Computer Engineering, NYU

# Project components

| Project Part                             | Due Date |
|------------------------------------------|----------|
| Part-1 Address Decoder                   | Nov. 10  |
| Part-2 Peripherals                       | Nov. 20  |
| Part-3 SRAM cell + layout                | Dec. 04  |
| Part-4 SRAM array + layout = full report | Dec. 15  |

**Project deliverables are distributed in 4 parts with specific deadlines.**

**In addition to these deliverables, there are TWO more HWs and ONE pop quiz.**

| November 2015                                       |                     |     |       |                     |     |     |
|-----------------------------------------------------|---------------------|-----|-------|---------------------|-----|-----|
| Mon                                                 | Tues                | Wed | Thurs | Fri                 | Sat | Sun |
| 26                                                  | 27                  | 28  | 29    | 30                  | 31  | 1   |
| 2<br><b>Mid-term Exam</b>                           | 3                   | 4   | 5     | 6                   | 7   | 8   |
| 9<br><b>(SRAM)</b>                                  | 10<br><b>Part-1</b> | 11  | 12    | 13                  | 14  | 15  |
| 16<br><b>(SRAM + PD)</b>                            | 17                  | 18  | 19    | 20<br><b>Part-2</b> | 21  | 22  |
| 23<br><b>(DRAM + Clocking)</b>                      | 24                  | 25  | 26    | 27                  | 28  | 29  |
| 30<br><b>(Int. + I/O)<br/>HW-5<br/>(individual)</b> | 1                   | 2   | 3     | 4                   | 5   | 6   |

| December 2015                               |                           |     |       |             |     |     |
|---------------------------------------------|---------------------------|-----|-------|-------------|-----|-----|
| Mon                                         | Tues                      | Wed | Thurs | Fri         | Sat | Sun |
| 30                                          | 1                         | 2   | 3     | 4<br>Part-3 | 5   | 6   |
| <b>7 No class<br/>HW-6<br/>(individual)</b> | 8                         | 9   | 10    | 11          | 12  | 13  |
| 14<br>Revision                              | <b>15<br/>Full report</b> | 16  | 17    | 18          | 19  | 20  |
| 21<br><b>Final exam<br/>Comprehensive</b>   | 22                        | 23  | 24    | 25          | 26  | 27  |

**No class on Dec. 7<sup>th</sup>**, prepare for the final project report + exam  
 Office hours on Dec. 11<sup>th</sup>, 15<sup>th</sup>, 16<sup>th</sup> (timing announced later)

# Memory (SRAM) System



# Configurations

- Address: 4 bit
  - Stored in 4-bit register (4 Master-Slave Flip Flops)
- Data: 4 bit
  - Stored in 4-bit register (4 Master-Slave Flip Flops)
- SRAM array: 64 bit
  - 16 rows, 1 column (data width = 4 bits)
  - 4 bit row decoding
- Column circuits
  - Read and write circuits

# Specifications

- Technology: FreePDK45 CMOS
  - Min. length = 50 nm, min. width = 90 nm
- Nominal supply voltage: 1V
- Target clock frequency: 1 GHz
- SRAM cell area < 1  $\mu\text{m}^2$

# Functionality



- Functional testing: we will keep $A1 = A2$ and check whether the design gives $D1 = D2$
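The functional test can be sketched with a purely behavioral model. This is a minimal sketch, assuming the test means "write data D1 at address A1, then read back with A2 = A1 and expect D2 = D1"; the class `SramModel`, the function `functional_test`, and the test pattern are hypothetical names and values chosen for illustration, not part of the project specification.

```python
class SramModel:
    """Behavioral stand-in for the 16-word x 4-bit array (not a circuit model)."""

    ROWS = 16         # 4-bit address
    DATA_MASK = 0xF   # 4-bit data word

    def __init__(self):
        self.words = [0] * self.ROWS

    def write(self, address, data):
        # Keep only the 4 address bits and 4 data bits the system provides.
        self.words[address & (self.ROWS - 1)] = data & self.DATA_MASK

    def read(self, address):
        return self.words[address & (self.ROWS - 1)]


def functional_test(memory):
    """Write D1 at A1, read back at A2 = A1, and check D2 == D1 for every row."""
    for a1 in range(SramModel.ROWS):
        d1 = (5 * a1 + 3) & SramModel.DATA_MASK   # arbitrary test pattern (assumption)
        memory.write(a1, d1)
        a2 = a1
        d2 = memory.read(a2)
        if d2 != d1:
            return False
    return True
```

Calling `functional_test(SramModel())` exercises all 16 addresses; the same write-then-read sequence is what the transistor-level simulation has to reproduce at 1 GHz.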

# Timing



- We will test whether the array functions correctly (i.e. you write and read data correctly) at a frequency of 1 GHz
- What is the maximum frequency at which you can run the design? **Extra points**

# Project Plan

## Part-I:

- Decoder Logic and WL generation
  - Schematic level design by **Nov. 10**

## Part-II:

- Peripherals (address & data latches, R/W circuitry)
  - Schematic level design **Nov. 20**

## Part-III:

- Design and layout of the SRAM cell
  - Schematic level design and cell layout **Dec. 04**

## Part-IV:

- Layout of the array (DRC and LVS clean)+ all Cadence files **(Dec. 15)**
- Project report **(Dec. 15)**
- Presentations **(Dec. 14 during regular class hours)**

# Project Grading (100% + **20% bonus**)

- Written report – 20%
- Presentation – 10%
- Schematic – 25%
  - **If schematic + results for any component is missing you will not get ANY POINT in this part**
- Layout – 20%
  - **If layout for the SRAM cell and array is missing you will not get ANY POINT in this part**
- Timely progress (meeting deadlines) – 15% (**submitting HWs in time**)
- Design quality
  - Meeting specifications – 10%
  - Can you run it faster? – **10% (bonus)**
  - Can you lower the supply and still run it at 1 GHz? – **10% (bonus)**

# “Good Design Practices”

- You will get more points if you follow “good design practices” in your schematic and layout
- Bottom-line: another designer should be able to understand your design by looking at the schematic and layout
- **Schematic: Some Tips**
  - Hierarchical : Transistors -> gates -> blocks -> system
  - No transistors or basic gates at the top level
  - Do not use ‘long wires’ to connect nodes in schematic
    - If you want to connect point A to B: draw a short wire at A, label it (say, X), then draw another short wire at B and label it X as well
  - Label everything (nodes, wires, instances)
  - Write a one-line description of the circuit in the schematic view
  - Your schematic should ‘look good’ and be easy to follow
- **Layout: Some Tips**
  - Hierarchical : Transistors -> gates -> blocks -> system
  - Above M1 draw one layer of metal in one direction only (M2 in vertical, M3 in horizontal, etc. )
  - Label every node
  - Do everything you can to make it more readable
  - Layout should be easy to follow

**PART-I: DECODER LOGIC AND WORD-LINE GENERATION**

**DUE: NOV. 10**

# Decoder logic

- **At the rising edge of the CLK, the address will be assigned to the Decoders.**
- **Row decoder:**
  - Decode 4 bits ( $A_0, \dots, A_3$ ).
  - All the WL will be initially low.
  - The WL corresponding to the decoded address will make a low-to-high transition only after the decoding is complete.
  - The selected WL will make a high-to-low transition:
    - either at the falling edge of CLK (i.e. CLK high-time = WL turn on time + decoder delay)
    - or after a time interval (measured from the rising edge of WL) equal to 50% of the clock period (i.e. CLK high-time = WL turn on time)

# Timing Details: Option (1)



- You may decide to synchronize the falling edge of the WL with falling edge of the CLK
  - Pros: Simple logic for WL generation
  - Cons: Tighter timing for SRAM cell

# Timing Details: Option (2)



- You may decide to delay the falling edge of the WL so that the WL turn on time is the same as the CLK high time
  - Pros: Less stringent timing for SRAM cell
  - Cons: Slightly more complex logic for WL generation

You are free to choose either option.
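The difference between the two options can be illustrated numerically. This is a rough sketch, assuming a 50% duty-cycle clock and ignoring the delay of the WL-generation logic itself; the function name `wl_window_ps` and the 150 ps decoder delay are illustrative assumptions, not values from the project.

```python
def wl_window_ps(option, clk_period_ps=1000.0, decoder_delay_ps=150.0):
    """Return (wl_rise, wl_fall) in ps, measured from the rising CLK edge.

    Assumes a 50% duty cycle and an ideal (zero-delay) WL-generation stage.
    """
    half_period_ps = 0.5 * clk_period_ps
    wl_rise = decoder_delay_ps               # WL rises once decoding is complete
    if option == 1:
        wl_fall = half_period_ps             # WL falls with the falling CLK edge
    elif option == 2:
        wl_fall = wl_rise + half_period_ps   # WL stays high for half a period
    else:
        raise ValueError("option must be 1 or 2")
    return wl_rise, wl_fall


for option in (1, 2):
    rise, fall = wl_window_ps(option)
    print(f"Option {option}: WL high from {rise:.0f} ps to {fall:.0f} ps "
          f"({fall - rise:.0f} ps on-time)")
```

With these assumed numbers, option (1) leaves the cell a 350 ps WL pulse while option (2) leaves the full 500 ps, which is the "tighter timing" trade-off listed in the pros and cons.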

# Other Timing Details



- Pre-charge (PRE) is synchronized with WL.
- Remember there are 16 WL signals but only one pre-charge signal.
- You need to generate pre-charge signal from CLK and synchronize with all WL signals.
- SAE also needs to be generated from CLK and synchronized with PRE.
- If you use option (1), you can synchronize SAE and PRE with CLK, because the falling edge of the WL is synchronized with CLK.

# Tasks

- **Design the row decoder**
  - Row decoder delay should be less than 25%-35% of the total CLK high time
- **Ensure the logical functionality and timing of the WL signal for the chosen option.**

# Tasks



- Since we have not done the layout we do not know the exact capacitance of the WLs.
- For this part we will assume a 10 fF capacitive load (possibly an overestimate) at each WL.

# Decoder



| <b>S0</b> | <b>S1</b> | <b>I0</b> | <b>I1</b> | <b>I2</b> | <b>I3</b> |
|-----------|-----------|-----------|-----------|-----------|-----------|
| 0         | 0         | 1         | 0         | 0         | 0         |
| 0         | 1         | 0         | 1         | 0         | 0         |
| 1         | 0         | 0         | 0         | 1         | 0         |
| 1         | 1         | 0         | 0         | 0         | 1         |

- Depending on the input conditions, only one output will be high and the others will be low.
- You can also have only one output low and others high.
- Bottom line: only one output should be at a different state than the others.
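The truth table above can be written as a small behavioral function. This is only a sketch of the logic, not a circuit; the name `decode_2to4` and the `active_high` flag are hypothetical, and S0 is treated as the more significant select bit because that is how the table is ordered.

```python
def decode_2to4(s0, s1, active_high=True):
    """Return [I0, I1, I2, I3] for select bits S0, S1 (S0 is the MSB in the table)."""
    index = (s0 << 1) | s1
    outputs = [1 if i == index else 0 for i in range(4)]
    if not active_high:
        # Same decoder with exactly one output low and the others high.
        outputs = [1 - bit for bit in outputs]
    return outputs


for s0 in (0, 1):
    for s1 in (0, 1):
        print(s0, s1, decode_2to4(s0, s1))
```

In both polarities exactly one output differs from the other three, which is the property the bullets describe.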

# Decoder: Static Logic



# Large Decoders: Static Logic

**Consider an 8-to-256 decoder**

$$I_0 = \overline{S_0 + S_1 + \dots + S_7} \Rightarrow \text{high fan-in } NOR \text{ gate}$$

**=> Large low-to-high delay**

**Hierarchical implementation is helpful**

**Use 2-to-4 decoders to implement a 4-to-16 decoder**

**Use 4-to-16 decoders to implement an 8-to-256 decoder**
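The hierarchical idea can be sketched behaviorally for the 4-to-16 case used in this project: two 2-to-4 predecoders feed sixteen 2-input AND gates, so no gate needs a fan-in of four. The function names and the bit ordering (`a3` as the most significant address bit) are assumptions made for illustration.

```python
def decode_2to4(msb, lsb):
    """One-hot 2-to-4 predecoder."""
    index = (msb << 1) | lsb
    return [int(i == index) for i in range(4)]


def decode_4to16(a3, a2, a1, a0):
    """Two 2-to-4 predecoders followed by sixteen 2-input AND gates."""
    upper = decode_2to4(a3, a2)   # selects a group of four word lines
    lower = decode_2to4(a1, a0)   # selects one word line inside the group
    return [u & l for u in upper for l in lower]


word_lines = decode_4to16(0, 1, 0, 0)
print(word_lines.index(1))   # index of the selected word line
```

Every final gate combines one predecoder output from each group, and the predecoder outputs are shared across word lines, which is where the delay and area savings over a flat high fan-in implementation come from.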

# What to submit with homework on **Nov. 10**

- A printout of the top-level schematic of the decoder from Cadence
  - Clearly write your group number on the Cadence schematic view.
- A printout of the schematic which implements the WL logic
- A printout of the waveform of operation
  - Consider two successive CLK cycles (1 ns to 2ns ) and (2 ns to 3 ns). Assume CLK is zero from 0 – 1 ns
  - Assume address bits A0, A1, A2, A3 = 0 for 0 – 1ns
  - Make A2 = 1 at 1ns and turn it back to 0 at 1.9 ns
  - Show waveform of CLK, A2
  - Show waveform of WLs corresponding to a) A0 A1 A2 A3 = [0 0 0 0] and b) A0 A1 A2 A3 = [0 0 1 0]
  - Waveform should show the correct operation for the option you have chosen
- Report:
  - Delay from rising edge of the CLK to the rising edge of WL
  - Delay from falling edge of the CLK to the falling edge of the WL
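The two delays to report can be extracted from exported waveform samples. This is a minimal sketch, assuming delays are measured between 50% VDD crossings (a common convention, not stated in the project text) and that the samples are already restricted to one clock cycle; `crossing_time` and `edge_delay` are hypothetical helper names.

```python
def crossing_time(times, values, threshold, rising=True):
    """First time the sampled waveform crosses `threshold` (linear interpolation)."""
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        crossed = (v0 < threshold <= v1) if rising else (v0 > threshold >= v1)
        if crossed:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None


def edge_delay(times, clk, wl, vdd=1.0, rising=True):
    """Delay from the CLK edge to the WL edge, both taken at 50% of VDD."""
    t_clk = crossing_time(times, clk, 0.5 * vdd, rising)
    t_wl = crossing_time(times, wl, 0.5 * vdd, rising)
    if t_clk is None or t_wl is None:
        return None
    return t_wl - t_clk
```

Passing `rising=False` gives the falling-edge delay; because the helper returns the first crossing of each signal, the sample lists should be windowed to the cycle of interest before calling it.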

# What to include in final report for this section?

- **Explain the design goal stating**
  - (a) target functionality (draw the cartoon of the waveform showing the intended functionality of WL)
  - (b) Target speed
- **Explain the design concept:**
  - (a) How did you implement the decoder
  - (b) Which logic style you have used for decoder
  - (c) If you have used logical effort (you don't have to) to size the gates in the decoder to meet the delay target, specify it.
  - (d) Explain the logical implementation of the WL signal – show hand drawn schematics
- **Cadence printouts (same as previous slide)**
- **Report**
  - Delay from rising edge of CLK to rising edge of WL
  - Delay from falling edge of CLK to falling edge of WL
  - Average power dissipation in the decoder

## **PART-II: REGISTERS AND COLUMN CIRCUITS**

**DUE: NOV. 20**

## Part-II components

| Part-II   | Nov. 20                    |
|-----------|----------------------------|
| Problem-1 | Address and data registers |
| Problem-2 | Write circuits             |
| Problem-3 | Read circuits              |
| Problem-4 | Complete Peripheral        |

*Break the tasks between the four project partners. Interact and LEARN.*

# **PROBLEM 1: ADDRESS AND DATA REGISTERS**

# **Problem-1: Address and Data Registers**

- 4 bit address register
- 4 bit data register
  - one for input data
  - one for output data
- Registers
  - Positive edge-triggered
  - You can use Dynamic or Static Master-Slave Flip-Flops

# Tasks

- Design the 4 bit address and 4 bit data registers (you need to design a single flip/flop and arrange them in parallel) satisfying the following targets
  - Target CLK period = 1 ns
  - Setup time < 15% of CLK period
  - CLK-Q delay < 10% of CLK period
  - Do not unnecessarily upsize the devices in the latch => it will increase the power dissipation
- Connect the address register to the row decoder to complete the addressing circuit.
- *You have already designed the row decoder in Part-I (due on Nov. 10)*

# What to submit with homework on Nov. 20

- **Cadence schematic of the single flip/flop**
- **Describe the design style you have used**
- **Show waveform of operation for a single flip/flop**
  - Apply data to D and show it changes state in Q
  - Point out the CLK-Q delay in the waveform
  - Show two cases: one just satisfying the setup requirement and one just violating it
  - Report the setup time in your waveform and in your written solution
  - What is the hold time for your flip flop design?

# What to submit with homework on Nov. 20

- **Show the combined waveform of operation of the address latch and decoders**
  - Repeat the address assignment experiment that you did in Part-I but now instead of applying the addresses A0, A1, A2, A3 to the decoder directly, apply them to the registers.
  - Show that the address changes propagate first to the flip-flop outputs
  - Next, show that selected WL changes state as the applied address changes
  - Remember that the address needs to appear at the D input of the flip/flops before the CLK edge (i.e. it needs to satisfy the setup time requirement).

# What to include with final report

- **Describe the circuit**
  - Circuit level schematic of the flip/flops
  - Hand-drawn waveform of the circuit operation
  - Report key design features
- **Report the key characteristics (setup time, CLK-Q delay, hold time, power dissipation, etc.)**
- **Waveform of operations for the flip/flop**

## **PROBLEM 2: WRITE CIRCUITS**

## Problem 2: Write circuitry

- Design the Write circuit to apply the data to the bit-lines (**you will need an additional input signal that tells whether a memory access operation is read or write**)
  - Design the data drivers
  - Connect the write circuit to the column and implement the read/write control
  - Ensure that data driver will be connected to the columns if and only if write operation is selected
  - Assume that the bitlines have a capacitive load of 40 fF
- Connect one bit of the data register to the write circuitry

# What to submit with the homework on Nov. 20

- **Cadence schematic**
- **Testing strategy**
  - How will you verify the circuit? Draw a waveform that shows the circuit is functioning correctly.
  - Which delay(s) in the write circuit are important to ensure proper write operations [characteristic delay(s)]?
  - Describe the global synchronization/timing requirement for the entire system to ensure write operation (you can draw simple waveform to explain this)
    - remember you want the write data to arrive at the bitline before the WL signal goes high
    - What should be the expected timing for Pre-charge, Column select arrival, data arrival at bitline, and WL arrival?
- **Show the waveform from Cadence which verifies the operation**
- **Report the delay(s) that characterizes the write circuit**

# What to submit in final report

- **Description of the circuit**
  - Include simple hand-drawn waveforms
- **Schematic**
- **Waveform of operation – Characteristic delay(s)**

# **PROBLEM 3: READ CIRCUITS**

## Problem 3: Read circuitry

- Design the read circuit
  - Sense amplifiers
    - Ensure the sense-amplifier delay < 15-20% of the CLK period (assume 10fF capacitance at the output).
    - Assume 10% worst-case mismatch in the width of sense-amplifier devices and find out the minimum bit-differential that can be correctly sensed
  - Ensure that the write path is disconnected during the read operation
  - Consider the global timing and synchronization requirements

# What to submit with HW on Nov. 20

- **Cadence schematic**
- **Testing strategy**
  - **How will you verify the circuit? Draw a waveform that shows the circuit is functioning correctly.**
  - **Which delay(s) in the sense-amplifier are important to ensure proper read operations?**
  - **Describe the global synchronization/timing requirement for the entire system to ensure read operation (you can draw simple waveform to explain this)**
    - **What happens to the bitlines after WL goes high?**
    - **When does the sense-amplifier get activated?**
    - **When does the sensed data arrive at the output and get latched at the data latch?**
- **Show the simulation waveform (from Cadence/Awaves) which verifies the operation**
- **Report the delay(s) that characterizes the read circuit**

# What to submit in final report

- **Description of the circuit**
  - Include simple hand-drawn waveforms
- **Schematic**
- **Waveform of operation – Characteristic delay(s)**

# **PROBLEM 4: COMPLETE PERIPHERAL**

## Problem 4: Peripheral

- **Complete the row circuits**
  - Connect the 4- bit row address registers to Row decoders (with WL generation logic).
- **Complete the column circuits**
  - Assume 16 bitlines (bl and !bl) each with a cap. of 40fF
  - Connect the output of the bitlines to the read/write circuitry

## Problem 4: Peripheral



## Problem 4: Timing path completed Peripheral



# What to submit with homework on Nov. 20

- **Block level Cadence schematic**
- **Waveform**
  - Operation of Path (1): i.e. show row addr. gets propagated to the WL
  - Show waveforms verifying operation of Timing Path (3)
    - Read : Assume that  $VBL = VDD$  and  $\neg VBL = VDD - \Delta$  for the column and show that if read operation is selected the sense-amp output latches the correct value
    - Write: Assume $d = 1$ and $\neg d = 0$ and show that the bitlines for the column get the proper values

# What to submit with homework on Nov. 20

- **Report total delay for**
  - **Timing path (1): Positive CLK edge to positive WL edge**
  - **Timing path (2) – read: Positive edge at SAE to data arrival at the latch output**
  - **Timing path (2) – write: Positive edge at CLK to the change in the bitline voltages (assume that before the positive edge arrives all the bitlines are pre-charged to VDD)**

# What to submit in final report

- **Block diagram and description of the complete peripheral**
- **Waveform of operations and delays for all the timing paths through peripherals**

**PART-III: SRAM CELL AND ARRAY**  
**DUE: DEC. 04**

| <b>Part-III</b>  | <b>Dec. 04</b>                                                       |
|------------------|----------------------------------------------------------------------|
| <b>Problem-1</b> | <b>SRAM schematic (un-optimized)</b>                                 |
| <b>Problem-2</b> | <b>SRAM cell design to meet the target specification (optimized)</b> |
| <b>Problem-3</b> | <b>SRAM Cell layout</b>                                              |
| <b>Problem-4</b> | <b>Timing simulation (optimized cell)</b>                            |
| <b>Problem-5</b> | <b>Schematic of SRAM array</b>                                       |

*Break the tasks between the four project partners. Interact and LEARN.*

## Problem 1:

- **Create the schematic of SRAM cell with precharge transistors and 40fF bitline capacitances in Cadence**
  - ✓ Assume  $W_p = W_{\min}$ ,  $W_{\text{access}} = 1.5W_p$ ,  $W_{\text{pd}} = 2W_p$ .
- **Simulate the cell to perform read and write operations**
  - ✓ Connect the cell to one WL (with 40fF cap) output of the row decoder circuits
  - ✓ Connect the column circuits to the cell.
  - ✓ Use worst-case  $\Delta_{\text{BIT}}$  obtained previously for sense-amplifier firing. If you did not solve it use  $\Delta_{\text{BIT}} = 100\text{mV}$ .

## Problem 1:

- **SRAM cell design**
- **Do not worry about cell access delay at this point**
  - let it be whatever it is for the assumed sizing.
  - At this point the aim is to set up the schematic and the simulation.
- **Include the waveform showing the operations**

## Problem 2:

- **SRAM cell design: size the different transistors of the SRAM cell to achieve the following cell parameters**
  - ✓ Read margin > 30% of  $V_{DD}$
  - ✓ Write margin > 50% of  $V_{DD}$
  - ✓ Cell access time (assuming 40fF bit line capacitances): the total time from the rising edge of the CLK to a bit-differential of $\Delta_{BIT}$ (i.e. CLK-Q delay of the latch + row decoder delay + cell access time) must be < 500ps.
  - ✓ Cell area < 1  $\mu\text{m}^2$
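The 500 ps requirement is a sum of three delays, so it helps to track how much of the budget is left for the cell once the latch and decoder are characterized. This is a bookkeeping sketch only; the function names and the example delays (90 ps, 160 ps, 200 ps) are hypothetical, not measured values.

```python
def max_cell_access_ps(clk_q_ps, decoder_ps, limit_ps=500.0):
    """Cell access time left over once the latch and decoder delays are known."""
    return limit_ps - clk_q_ps - decoder_ps


def read_path_slack_ps(clk_q_ps, decoder_ps, cell_access_ps, limit_ps=500.0):
    """Slack against the CLK-edge-to-bit-differential limit; negative means a violation."""
    return limit_ps - (clk_q_ps + decoder_ps + cell_access_ps)


# Hypothetical example: 90 ps CLK-Q delay and 160 ps row-decoder delay.
print(max_cell_access_ps(90.0, 160.0))          # budget left for the cell
print(read_path_slack_ps(90.0, 160.0, 200.0))   # slack with a 200 ps access time
```

A negative slack means the cell (or one of the peripheral blocks) has to get faster before the 1 GHz target can be met.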

## Problem 2:

- Use the methods described in lecture notes for **read margin and write margin measurements**.
- For **access time estimation** you can use either a simple read current based method or full transient simulation
- You can develop a simple formula for computing cell area from device widths (assuming  $L = 50$  nm for all devices).
- Show the intermediate read margin, write margin, and cell access time values for different sizes you tried.
  - For read and write margins, give plots with respect to beta ratios (width ratio for you) as discussed in class.
  - For access time, plot the values as a function of device widths directly.
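For the read-current based estimate, a first-order approach treats the cell as a constant current source discharging the bitline, so that $t_{access} \approx C_{BL}\,\Delta_{BIT} / I_{read}$. The sketch below assumes exactly that simplification; the function names and the 20 µA read current are illustrative assumptions, and the real read current has to come from your own simulation.

```python
def access_time_ps(c_bitline_ff, delta_bit_mv, i_read_ua):
    """First-order access time t = C * dV / I.

    With C in fF, dV in mV and I in uA, the result is directly in ps
    (1e-15 F * 1e-3 V / 1e-6 A = 1e-12 s).
    """
    return c_bitline_ff * delta_bit_mv / i_read_ua


def required_read_current_ua(c_bitline_ff, delta_bit_mv, target_ps):
    """Read current needed to develop delta_bit within target_ps (same units)."""
    return c_bitline_ff * delta_bit_mv / target_ps


# 40 fF bitline and 100 mV differential from the project; 20 uA is an assumed current.
print(access_time_ps(40.0, 100.0, 20.0))
print(required_read_current_ua(40.0, 100.0, 250.0))
```

Because the read current actually changes as the bitline voltage drops, this estimate is best used to pick starting device widths and then checked against a full transient simulation.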

## Problem 3 (solve concurrently with Problem 2)

- Draw the layout of the optimized cell from problem 2 in Cadence and estimate the total area.
- Perform DRC and LVS and include your DRC/LVS reports in your submission. Your design must be DRC/LVS clean to receive credit.
  - Remember you need to include the substrate and N-well contact even for the single cell to pass DRC.
  - Include it now, but later when you will create the SRAM array, you will delete them and create contacts at the higher level.

## Problem 4

- Solve Problem-1 again with the optimized cell designed in Problem-2.
- Report the final “schematic-level” timing:
  - (a) CLK to WL
  - (b) cell access time
  - (c) sense-amplifier
  - (d) latch
- Show the maximum frequency at which you can read your array (schematic level).

## Problem 5

- Create the schematic for the 16x4 array and connect it to the full-decoder and column circuits.
- This should complete the entire schematic level design of the entire memory system.
- Include the top level block diagram with your submission.

# What to include in the final report?

- **Schematic and layout of the cell**
- **Report:**
  - Size of different transistors
  - Read margin, write margin, and cell access time
  - Cell area
- **Waveform:**
  - Full system level waveform for read operation
  - Full system level waveform for write operation
- **System level delay:**
  - Report delays of different components of the read path and the total read delay
  - Report the max. frequency at which you can run the system

# **PART-IV: ARRAY LAYOUT**

# Array layout

- Make the layout of the full 16x4 SRAM array
- You need to add the pre-charge transistors to each column
  - Think about where you will place them
- Do not forget to include the substrate and the n-well contact
- Attach the power supply grid to the array
- Make it DRC and LVS clean
  - Remember to run LVS with 16x4 cell, pre-charge devices

# Final Report

- **IEEE Double-Column Format**
- **One report per group**
- **Present a block-diagram of the system and identify the critical components in the delay path**
- **Discuss the key design concept that you have used to design each component and interconnect them**
  - The key concepts only: do not include the width of every transistor
  - If you have used any interesting technique, this is the place to clearly mention it
  - Circuits are better understood from figures (not Cadence printouts), so use figures to explain your technique
- **Present the schematic ( with width of different transistors clearly readable - width numbers in Cadence printouts are normally not readable) and layout of the single SRAM cell**
  - Include the layout as a figure in the word-file

# Final Report

- **Make a Table where you present the value of key properties of different components of your design**
  - Latch -> setup time and CLK-Q delay
  - Decoder -> row-decoder delay and column decoder delay (mention delay before array layout and after)
  - SRAM cell -> Read margin, write margin, area, and cell access time
  - Sense-amplifier -> worst-case delay and offset voltage
  - Write-circuit -> Delay to discharge a bitline from  $V_{dd}$  to 0
- **Show waveform (include only CLK, WL, cell nodes, bitlines, and final read-outs) of read and write operation**
  - Make sure the waveforms are readable
  - You will be given credit if you include the waveform as a figure in your report (not as a separate printout) – use the print-to-file option to create a picture, annotate it properly to indicate different signals

# Final Report

- Report your total read access delay (CLK-edge to read-out signal) and write delay (CLK-edge to cell node)
- Report the maximum clock frequency at which you could run the system
- The maximum length (shorter the better) of the report is

**4 pages**

**Additional pages will not only increase your effort, they will also reduce your marks**

**4 pages is a lot of space if you only report the important stuff - the top design conferences allow only 2 pages to explain a microprocessor design**

## Presentations (Dec. 14)

### Informal

- One person from each group will present for 10 minutes (keep it brief) followed by question-answers
- It is a good chance for you
  - to present the concepts you have used to the class
  - to know what others have done
- Grading – if you present at the assigned time you will get the full-credit for presentation