

---

# **128K × 8 BIT CMOS STATIC RAM DESIGNED FOR LOW POWER**

---

A REPORT ON THE DESIGN AND PERFORMANCE OF A STATIC RAM DESIGNED FOR LOW POWER

*Submitted by:*

ANINDYA PODDAR (2015EEN2655)

MOHIT MISHRA (2014EEN2792)

SAURABH AGRAWAL (2015JVL2553)

VISHNU SHARAN (2015EEN2673)

***Completed under the guidance of***

DR. KAUSHIK SAHA



Department of Electrical Engineering

**INDIAN INSTITUTE OF TECHNOLOGY DELHI  
NEW DELHI, INDIA- 110016**

---

## ABSTRACT

---

During the course of this project, a $128K \times 8$ bit CMOS Static RAM (SRAM) was designed with a focus on minimal power consumption. The target specifications are taken from the TB85 part of the K6F1008T2D family of CMOS SRAMs manufactured by Samsung Electronics. Using UMC 90 nm technology, a 6T SRAM cell was designed for low leakage; the cell area obtained was $2.395\,\mu\text{m}^2$. A Charge Transfer Sense Amplifier (CTSA) with a novel biasing scheme was designed to further lower power during read operations. A novel Address Transition Detector (ATD) was devised that has very low power consumption and area and is highly immune to PVT variations. The chip requires $3.2025\text{ mm}^2$ of area.

## ACRONYMS

---


|      |                                      |
|------|--------------------------------------|
| SRAM | Static Random Access Memory          |
| SNM  | Static Noise Margin                  |
| CR   | Cell Ratio                           |
| PR   | Pull up Ratio                        |
| RSNM | Read Static Noise Margin             |
| HSNM | Hold Static Noise Margin             |
| WSNM | Write Static Noise Margin            |
| TT   | Typical NMOS and Typical PMOS corner |
| SS   | Slow NMOS and Slow PMOS Corner       |
| FF   | Fast NMOS and Fast PMOS Corner       |
| SNFP | Slow NMOS and Fast PMOS Corner       |
| FNSP | Fast NMOS and Slow PMOS Corner       |
| HS   | High Speed Transistor                |
| LL   | Low Leakage Transistor               |
| ATD  | Address Transition Detector          |
| CTSA | Charge Transfer Sense Amplifier      |
| PD   | Power Dissipation                    |
| PDP  | Power Delay Product                  |
| SA   | Sense Amplifier                      |
| LVS  | Layout versus Schematic              |
| DRC  | Design Rule Check                    |
| PEX  | Parasitic Extraction                 |
| PVT  | Process Voltage Temperature          |

## CONTENTS

---

- **Abstract**
- **Acronyms**
- **List of Figures**
- **List of Tables**
- **1. Introduction**
  - 1.1. Design Specifications
- **2. Background Information**
  - 2.1. Process Technology
    - 2.1.1. Device Model
    - 2.1.2. SPICE Control Parameters
  - 2.2. Design Flow Chart
  - 2.3. Timeline and Work Division
  - 2.4. Literature Survey
  - 2.5. Simulation Tools Used
- **3. Architecture**
  - 3.1. Array Partitioning
  - 3.2. RC Extraction
  - 3.3. Metal Planning
  - 3.4. Memory Division
  - 3.5. Address Lines Division
  - 3.6. Floorplan
    - 3.6.1. Address Routing
    - 3.6.2. Data Routing
- **4. Detailed Design**
  - 4.1. Cell and Core Design
    - 4.1.1. Cell Type
    - 4.1.2. Cell Schematic
    - 4.1.3. Cell Schematic Simulations
      - 4.1.3.1. Static Noise Margin
      - 4.1.3.2. Write Noise Margin
      - 4.1.3.3. Read Noise Margin
      - 4.1.3.4. Transient Simulations
    - 4.1.4. Cell Layout (Unoptimized)
    - 4.1.5. Summary of Cell
    - 4.1.6. Core Layout
      - 4.1.6.1. Core Assembly
      - 4.1.6.2. Optimizations Incorporated
  - 4.2. Buffers and Drivers
    - 4.2.1. Write Buffers
      - 4.2.1.1. Logic Expressions and Circuit Schematic
      - 4.2.1.2. Sizing of Gates
      - 4.2.1.3. Layout
    - 4.2.2. Word Line Driver
      - 4.2.2.1. Schematic
      - 4.2.2.2. Layout
  - 4.3. Novel ATD Scheme
    - 4.3.1. ATD Inputs
    - 4.3.2. ATD Outputs
    - 4.3.3. Conventional ATD Circuit
    - 4.3.4. Pulse Stretching Circuit
      - 4.3.4.1. Inverter Based Delay Approach I
      - 4.3.4.2. Inverter Based Delay Approach II
      - 4.3.4.3. RC Based Delay Approach
    - 4.3.5. Novel ATD Circuit
    - 4.3.6. Comparison of Novel and Conventional ATD
      - 4.3.6.1. Robustness w.r.t. Process and Temperature Variations for Conventional ATD
      - 4.3.6.2. Robustness w.r.t. Process and Temperature Variations for Novel ATD
      - 4.3.6.3. Power Comparison
    - 4.3.7. ATD Layout
  - 4.4. Decoding Scheme
    - 4.4.1. Address Decoding
      - 4.4.1.1. Scheme Used
      - 4.4.1.2. Address Decode Circuitry with Schematic
      - 4.4.1.3. Address Decode Circuit Critical Path Timing and Sizing of Gates
      - 4.4.1.4. Address Decode Circuit Schematic Simulation
      - 4.4.1.5. Address Decode Circuit Layout
    - 4.4.2. Row Decoding
      - 4.4.2.1. Scheme Used
      - 4.4.2.2. Row Decode Circuitry with Schematic
      - 4.4.2.3. Row Decoder Circuit Critical Path Timing and Sizing of Gates
      - 4.4.2.4. Row Address Decoder Circuit Layout
    - 4.4.3. Column Decoding
      - 4.4.3.1. Scheme Used
      - 4.4.3.2. Column Decode Circuitry with Schematic
      - 4.4.3.3. Column Decoder Circuit Critical Path Timing and Sizing of Gates
      - 4.4.3.4. Column Address Decoder Circuit Layout
  - 4.5. Core I/O
    - 4.5.1. Precharge Circuitry
      - 4.5.1.1. Scheme Used
      - 4.5.1.2. Inputs and Outputs Used
      - 4.5.1.3. Precharge Circuitry Routing
      - 4.5.1.4. Estimated Load Capacitance on Outputs
      - 4.5.1.5. Different Precharge Circuit Topologies
      - 4.5.1.6. Precharge Circuit Schematic Simulation
      - 4.5.1.7. Precharge Circuit Layout
    - 4.5.2. Sense Amplifier
      - 4.5.2.1. Different Topologies
        - Voltage mode sense amplifier
        - Current mode sense amplifier
        - Charge transfer sense amplifier
      - 4.5.2.2. Inputs Used
      - 4.5.2.3. CTSA Outputs
      - 4.5.2.4. CTSA Circuit Schematic
      - 4.5.2.5. CTSA Schematic Simulation Results
      - 4.5.2.6. CTSA Layout
- **5. Integration**
  - 5.2.1. Summary of Area Distribution
  - 5.2.2. Complete Summary
- **References**

## LIST OF FIGURES

---

- FIGURE 1: DESIGN FLOWCHART
- FIGURE 2: GANTT CHART
- FIGURE 3: METAL PLANNING
- FIGURE 4: PIN DESCRIPTION OF CHIP
- FIGURE 5: FLOORPLAN OF THE CHIP
- FIGURE 6: ADDRESS ROUTING
- FIGURE 7: DATA ROUTING (READ)
- FIGURE 8: DATA ROUTING (WRITE)
- FIGURE 9: CELL SCHEMATIC
- FIGURE 10: SCHEMATIC FOR STATIC NOISE MARGIN
- FIGURE 11: BUTTERFLY CURVE FOR SNM
- FIGURE 12: BUTTERFLY CURVE FOR HNM AT VARIOUS PROCESS CORNERS
- FIGURE 13: BUTTERFLY CURVE FOR WNM
- FIGURE 14: WNM WITH PROCESS VARIATIONS
- FIGURE 15: BUTTERFLY CURVE FOR RNM
- FIGURE 16: RNM AT VARIOUS PROCESSES
- FIGURE 17: WRITE DELAY ANALYSIS
- FIGURE 18: READ DELAY ANALYSIS
- FIGURE 19: LAYOUT OF SRAM CELL
- FIGURE 20: OPTIMIZED 4 BY 4 SRAM CELL
- FIGURE 21: OPTIMIZED CELL LAYOUT
- FIGURE 22: BITLINE TWISTING
- FIGURE 23: SCHEMATIC OF WRITE BUFFER
- FIGURE 24: LAYOUT OF WRITE BUFFER
- FIGURE 25: SCHEMATIC OF WORD LINE DRIVER
- FIGURE 26: LAYOUT OF WORD LINE DRIVER
- FIGURE 27: CONVENTIONAL ATD
- FIGURE 28: PULSE GENERATING CIRCUIT
- FIGURE 29: INVERTER BASED DELAY APPROACH I
- FIGURE 30: INVERTER BASED DELAY APPROACH II
- FIGURE 31: RC BASED DELAY
- FIGURE 32: DUAL EDGE DETECTOR
- FIGURE 33: OR OPERATION OF THE DUAL EDGE DETECTORS
- FIGURE 34: MONOSTABLE MULTIVIBRATOR
- FIGURE 35: ATD LAYOUT
- FIGURE 36: ADDRESS DECODER
- FIGURE 37: GATE LEVEL IMPLEMENTATION
- FIGURE 38: SIMULATION
- FIGURE 39: GATE LEVEL IMPLEMENTATION
- FIGURE 40: ELMORE DELAY MODEL
- FIGURE 41: LAYOUT
- FIGURE 42: GATE LEVEL IMPLEMENTATION
- FIGURE 43: ELMORE DELAY MODEL
- FIGURE 44: LAYOUT OF COLUMN DECODER WITH WRITE DRIVERS
- FIGURE 45: HVT PMOS WITH EQUALIZATION CIRCUIT
- FIGURE 46: HVT PMOS WITH LVT NMOS AND EQUALIZATION CIRCUIT
- FIGURE 47: LAYOUT OF PRECHARGE CIRCUIT
- FIGURE 48: CROSS COUPLED VOLTAGE MODE SENSE AMPLIFIER
- FIGURE 49: CURRENT MODE SENSE AMPLIFIER
- FIGURE 50: CHARGE TRANSFER SENSE AMPLIFIER
- FIGURE 51: PVT SIMULATION OF BIASING CIRCUIT
- FIGURE 52: SCHEMATIC SIMULATION WAVEFORM OF CTSA
- FIGURE 53: LAYOUT OF CTSA
- FIGURE 54: ROW DEC & SENSE AMPLIFIER INTEGRATION LAYOUT
- FIGURE 55: INTEGRATION OF CORE WORD LINE DRIVER
- FIGURE 56: COLUMN DECODER & CORE INTEGRATION LAYOUT
- FIGURE 57: SENSE AMPLIFIER & WRITE BUFFER INTEGRATION
- FIGURE 58: TOP LEVEL CHIP LAYOUT

## LIST OF TABLES

---

- TABLE 1: SUMMARY OF CELL
- TABLE 2: SIZING OF GATES
- TABLE 3: ATD INPUTS
- TABLE 4: ATD OUTPUTS
- TABLE 5: ROBUSTNESS OF CONVENTIONAL ATD
- TABLE 6: ROBUSTNESS OF NOVEL ATD
- TABLE 7: SIZING OF GATES IN ADDRESS DECODER CIRCUIT HAS BEEN DONE BY LOGICAL EFFORT METHODS
- TABLE 8: COMPARISON OF DIFFERENT PRECHARGE TOPOLOGIES
- TABLE 9: COMPARISON
- TABLE 10: INPUTS TO SENSE AMPLIFIER
- TABLE 11: CTSA OUTPUTS
- TABLE 12: PVT READINGS
- TABLE 13: SUMMARY

## 1. INTRODUCTION

---

In today's world of increasing chip density and tightening constraints on power consumption, memories are proving to be the bottleneck of any embedded or on-chip design. The fast access time of SRAM makes asynchronous SRAM appropriate as main memory for the small cache-less embedded processors used in almost every domain. Applications range from switches and routers, IP phones, IC testers and DSLAM cards to automotive electronics. Static Random Access Memory (SRAM) has found its way into almost every IC as an embedded component. Traditionally, an SRAM macro is formed mainly by an array of cells, each consisting of four or six transistors, and a number of peripheral circuits such as the row decoder, column decoder, sense amplifier and write buffer. Accessing information in this macro consumes power in both dynamic and static ways. Dynamic power, which involves the switching of signals, is consumed in operations such as address transition detection, bitline charging and discharging, and sense amplification. Static power is consumed when there is a direct path from VDD to ground during memory access. The main challenge is to reduce these power losses, so that SRAMs can be used in energy-limited applications, while maintaining access speed.

### 1.1. DESIGN SPECIFICATIONS

---

|                         |                        |
|-------------------------|------------------------|
| Memory Type             | SRAM                   |
| Memory Size             | 128K × 8               |
| Data Width              | 8 bits                 |
| Address Width           | 17 bits                |
| Operating Voltage       | 1 V                    |
| Cycle Time              | 85 ns                  |
| Read Access Time        | 7 ns                   |
| Write Access Time       | 8 ns                   |
| Process Technology Node | UMC 90 nm              |
| No. of Metals           | 8                      |
| No. of Polys            | 1                      |
| Cell Size               | 2.395 $\mu\text{m}^2$  |
| Core Efficiency         | 77.95% without I/O Pad |
| Logic Used              | Static                 |
| Sense Amplifier Type    | Charge Transfer        |
| Decoding Style          | Hierarchical           |
| ATD used                | 1 (Novel)              |

## 2. BACKGROUND INFORMATION

---


### 2.1. PROCESS TECHNOLOGY

---

**Technology Information:** The SRAM is designed in the UMC 90 nm technology node. The minimum transistor length for this node is 80 nm and the minimum transistor width is 120 nm. Three types of transistors are available in this technology: HSL (High Speed Logic), LLL (Low Leakage Logic) and SP (Standard Process). In this project the high-VT transistors N\_10\_SPHVT and P\_10\_SPHVT were used to achieve low leakage current and hence low static power dissipation. A high VT also results in a high noise margin.

#### 2.1.1. DEVICE MODEL

---

Threshold voltage of NMOS: 0.325 V

Threshold voltage of PMOS: -0.29 V

Gate capacitance of Minimum sized NMOS: 0.369 fF (approx.)

Gate capacitance of Minimum sized PMOS: 0.210 fF (approx.)

#### 2.1.2. SPICE CONTROL PARAMETERS

---

Relative Tolerance (reltol): 1e-3

Voltage Absolute Tolerance (vabstol): 1e-6 V

Current Absolute Tolerance (iabstol): 1e-6 A

### 2.2. DESIGN FLOW CHART



FIGURE 1: DESIGN FLOWCHART

### 2.3. TIMELINE AND WORK DIVISION



FIGURE 2: GANTT CHART

### 2.4. LITERATURE SURVEY

We carried out an intensive literature survey to gain a basic understanding of SRAM. The 6T SRAM cell was chosen over other variants of SRAM cell design, keeping in mind the advantages of the 6T design discussed in Section 4.1.1. The noise margin analysis and the read and write ability of the cell were studied from [1], [2], and the values of CR and PR were decided accordingly. We went through [3]-[4] for a basic understanding of SRAM design and its practical challenges. A standard latch-based sense amplifier was used in the design. The ATD was used with a latch [8] to suppress variations with respect to different address transitions.

### 2.5. SIMULATION TOOLS USED

|                      |                                         |
|----------------------|-----------------------------------------|
| Schematic            | Cadence Design Toolkit-Schematic Editor |
| Layout               | Cadence Design Toolkit-Virtuoso         |
| Design Rule Check    | Mentor Graphics Calibre DRC             |
| Layout vs Schematic  | Mentor Graphics Calibre LVS             |
| Parasitic Extraction | Mentor Graphics Calibre PEX             |

## 3. ARCHITECTURE

---

### 3.1. ARRAY PARTITIONING

---

Because the I/O pins are on one side (the right) of the package, the bitlines run horizontally and the wordlines run vertically in our design. The entire memory is divided into 8 banks of 128 Kb each. To reduce the bitline capacitance, each bank has 256 rows and 512 columns. The motive behind such a fine division is to reduce power dissipation by keeping all the other memory banks in retention mode, and to enable sharing of the peripherals.

### 3.2. RC EXTRACTION

---

*Bitline Resistance ( $\Omega/\mu\text{m}$ ) = 0.564*

*Poly - Metal Via Resistance ( $\Omega/\mu\text{m}$ ) = 12.37*

*Wordline Resistance ( $\Omega/\mu\text{m}$ ) = 12.395 (including the Via Resistance)*

It is assumed that one via is present in every 2 × 2 block of cells in the core.

*Bitline Capacitance (fF/ $\mu\text{m}$ ) = 0.135*

*Wordline Capacitance (fF/ $\mu\text{m}$ ) = 0.110*

According to the specifications,

Worst case Access Time $\leq 10$ ns

Dividing this delay equally across the 5 major components of the memory, the worst-case delay allowed for each component (during a read or write cycle) is

Worst case delay per component $\leq 2$ ns

Using the Elmore delay model for the bitline, we calculate the maximum number of columns possible in a memory bank:

$$2\,\text{ns} = 0.69 \cdot R \cdot C \cdot \frac{N(N+1)}{2}$$

where $R$ and $C$ are the resistance and capacitance corresponding to the height of one cell, and $N$ is the number of columns.

Solving for $N$ gives

$$N = 1614 \text{ Columns}$$

The procedure is repeated for the maximum number of rows possible:

$$N = 1000 \text{ Rows}$$
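As a rough illustration of the calculation above, the following Python sketch solves the Elmore delay expression for the largest $N$ that fits a given delay budget. The per-cell resistance and capacitance used here are placeholder values chosen only for illustration (they are not the values extracted for this design), and the function names are hypothetical.

```python
import math


def elmore_delay(n, r_cell, c_cell):
    """Elmore delay of an n-segment RC line: 0.69 * R * C * n * (n + 1) / 2."""
    return 0.69 * r_cell * c_cell * n * (n + 1) / 2


def max_segments(t_budget, r_cell, c_cell):
    """Largest integer n whose Elmore delay does not exceed t_budget."""
    # Solve n * (n + 1) <= k for n using the quadratic formula.
    k = 2 * t_budget / (0.69 * r_cell * c_cell)
    n = int((-1 + math.sqrt(1 + 4 * k)) / 2)
    # Correct for floating-point rounding at the boundary.
    while elmore_delay(n + 1, r_cell, c_cell) <= t_budget:
        n += 1
    while n > 0 and elmore_delay(n, r_cell, c_cell) > t_budget:
        n -= 1
    return n


# Placeholder per-cell values (assumptions, not the report's extracted data).
R_CELL = 1.0       # ohms per cell pitch
C_CELL = 0.5e-15   # farads per cell pitch
T_BUDGET = 2e-9    # 2 ns budget per component, as in the text

print(max_segments(T_BUDGET, R_CELL, C_CELL))
```

Because the delay grows roughly with $N^2$, halving the per-cell $RC$ product increases the allowed line length only by about a factor of $\sqrt{2}$.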

**Possible Configurations of the Memory Bank:**

512 Rows x 256 Columns

256 Rows x 512 Columns

128 Rows x 1024 Columns

**The 256 Rows x 512 Columns configuration is chosen for a better aspect ratio, as our package is square.**

**We have 8 banks, arranged in 4 rows and 2 columns, giving $4 \times 256 = 1024$ wordlines and $2 \times 512 = 1024$ bitlines. This allows us to have a square die.**

### 3.3. METAL PLANNING

| Layers        | Used For        | Resistance<br>(per $\mu\text{m}$ ) | Capacitance<br>(per $\mu\text{m}$ ) |
|---------------|-----------------|------------------------------------|-------------------------------------|
| Metal Layer 1 | Local Routing   | 0.579 $\Omega$                     | 93.61 aF                            |
| Metal Layer 2 | Word Line       | 0.565 $\Omega$                     | 77.76 aF                            |
| Metal Layer 3 | Bit Line        | 0.564 $\Omega$                     | 69.88 aF                            |
| Metal Layer 4 | Bit Line (Bar)  | 0.557 $\Omega$                     | 64.85 aF                            |
| Metal Layer 5 | Address Routing | 0.544 $\Omega$                     | 61.24 aF                            |
| Metal Layer 6 | Control Routing | 0.534 $\Omega$                     | 59.11 aF                            |
| Metal Layer 7 | VSS             | 0.224 $\Omega$                     | 63.14 aF                            |
| Metal Layer 8 | VDD             | 0.124 $\Omega$                     | 65.22 aF                            |

FIGURE 3: METAL PLANNING

### 3.4. MEMORY DIVISION

No. of Banks = 8

Bank Size = 256 x 512 = 128 Kb

### 3.5. ADDRESS LINES DIVISION

No. of Row Decoding Bits = 5 (A0 – A4)

No. of Column Decoding Bits = 9 (A5 – A13)

No. of Bits for bank Selection = 3 (A14 – A16)

**Considering temporal and spatial locality, the MSB address lines are allocated to the block decoder, so that memory blocks need not be switched frequently.**

Total Address bits = 5+9+3 = 17
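The address division above can be sketched in Python as follows. The field boundaries (A0 to A4, A5 to A13, A14 to A16) are taken from the text; the function name `split_address` and the tuple layout it returns are hypothetical, chosen only for illustration.

```python
ROW_BITS = 5    # A0 - A4
COL_BITS = 9    # A5 - A13
BANK_BITS = 3   # A14 - A16
ADDR_BITS = ROW_BITS + COL_BITS + BANK_BITS  # 17 bits in total


def split_address(addr):
    """Split a 17-bit address into its (bank, column, row) fields."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address must fit in 17 bits")
    row = addr & ((1 << ROW_BITS) - 1)
    col = (addr >> ROW_BITS) & ((1 << COL_BITS) - 1)
    bank = addr >> (ROW_BITS + COL_BITS)
    return bank, col, row


# Consecutive addresses share the same bank until bit A14 changes.
banks_touched = {split_address(a)[0] for a in range(1 << (ROW_BITS + COL_BITS))}
print(banks_touched)
```

With the bank select on the most significant bits, a run of consecutive addresses stays inside one bank, which is the locality argument made above.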



FIGURE 4: PIN DESCRIPTION OF CHIP

### 3.6. FLOORPLAN



FIGURE 5: FLOORPLAN OF THE CHIP

The figure above shows the initial floor plan of the SRAM. The memory is divided into 16 blocks. These blocks are arranged such that the row decoder, column decoder, sense amplifier and write buffer are shared between two memory blocks.

#### 3.6.1. ADDRESS ROUTING

The address routing is shown below. The address pins are fed to the ATD and the column and row decoders.



FIGURE 6: ADDRESS ROUTING

#### 3.6.2. DATA ROUTING

The data routing is shown below for Read and Write cycles.

During Read, the data is fed into the sense amplifiers and then into the I/O pads.



FIGURE 7: DATA ROUTING (READ)

During Write, the data is fed from the I/O pads to the write buffers and then into the memory cells.



FIGURE 8: DATA ROUTING (WRITE)

## 4. DETAILED DESIGN

---

### 4.1. CELL AND CORE DESIGN

---

#### 4.1.1. CELL TYPE

---

A static 6T cell is chosen for the SRAM cell design. This topology has the following advantages over other SRAM cell topologies such as 4T, 8T or 9T.

- **Lower power dissipation** - The 6T SRAM cell has less leakage current because fewer transistors are used compared to the 7T, 8T and 9T SRAM cell architectures.
- **Good stability** - The 6T SRAM cell has good stability compared to the 4T SRAM cell, which has a lower SNM (Static Noise Margin) in low voltage operation.
- **Relaxed area constraints.**
- **Symmetric structure** - Helps during fabrication of the device.

#### 4.1.2. CELL SCHEMATIC

---

MATLAB simulations were carried out to find the bounds on the pull-up ratio (PR) and cell ratio (CR). Because high-VT transistors are used, the voltage rise at the internal node will not flip the contents of the cell, so the pull-up ratio and cell ratio can, in principle, take any value. Considering acceptable noise margins, the cell ratio was chosen to be 1.5 and the pull-up ratio 1.0. The transistor sizes chosen accordingly are shown in Figure 9. The access transistors were chosen to be of minimum size so as to minimize the bitline capacitance. The transistor sizes, CR/PR and transistor models are given below:

- Optimum CR = 1.5 & PR = 1.0
- Sizing of Transistors : L= 80 nm
  - Pull-up Transistors : W = 120 nm
  - Access Transistors : W = 120 nm
  - Drive Transistors : W = 180 nm
- Transistor Model :
  - SPHVT (Standard Performance High Threshold Voltage)
    - N\_10\_SPHVT
    - P\_10\_SPHVT
- Process Corner : Slow NMOS Slow PMOS (ss)
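As a quick check of the sizing listed above, the short Python sketch below computes the two ratios from the transistor widths. It assumes the conventional definitions, CR = (W/L) of the drive transistor over (W/L) of the access transistor and PR = (W/L) of the pull-up transistor over (W/L) of the access transistor; the function name is hypothetical.

```python
import math


def cell_ratios(w_drive, w_access, w_pullup, length):
    """Return (CR, PR) for a 6T cell whose transistors share one channel length."""
    wl_drive = w_drive / length
    wl_access = w_access / length
    wl_pullup = w_pullup / length
    cr = wl_drive / wl_access    # cell ratio: drive vs. access
    pr = wl_pullup / wl_access   # pull-up ratio: pull-up vs. access
    return cr, pr


# Widths and length in nm, taken from the sizing list above.
cr, pr = cell_ratios(w_drive=180, w_access=120, w_pullup=120, length=80)
print(f"CR = {cr:.2f}, PR = {pr:.2f}")
```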



FIGURE 9: CELL SCHEMATIC

#### 4.1.3. CELL SCHEMATIC SIMULATIONS

##### 4.1.3.1. STATIC NOISE MARGIN



FIGURE 10: SCHEMATIC FOR STATIC NOISE MARGIN

Static noise margin is the maximum noise voltage $V_N$ that can appear simultaneously on both voltage sources, as shown in the figure above, without flipping the contents of the cell. The cell is in hold mode during this time. The inverter characteristic is plotted together with its mirrored copy, and the largest square that fits between the two curves is determined, as shown below:



FIGURE 11: BUTTERFLY CURVE FOR SNM

Static Noise Margin = 548 mV (very high as we are using high VT transistors)

The above simulation is for the typical corner. The SNM simulation results with process variation are shown below.



FIGURE 12: BUTTERFLY CURVE FOR HNM AT VARIOUS PROCESS CORNERS

Static Noise Margin with process variation = 514mV (for worst case).
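The largest-square construction used for the butterfly curves can be sketched numerically as below. This is only an illustration under stated assumptions: the inverter transfer curve is a hypothetical smooth `tanh` stand-in, not the transistor-level characteristic simulated for this cell, so the result is not expected to match the values reported above. The sketch scans 45-degree lines across the butterfly plot, measures the distance between the two curves along each line, and converts the longest diagonal in each lobe into the side of a square.

```python
import numpy as np


def butterfly_snm(vtc, vdd, n=4001):
    """Largest-square SNM for two identical cross-coupled inverters."""
    v = np.linspace(0.0, vdd, n)
    f = vtc(v)
    # Curve 1 is (x, y) = (v, f(v)); curve 2 is its mirror image (f(v), v).
    # Each point lies on a 45-degree line y = x + c, at position
    # s = (x + y) / sqrt(2) along that line.
    c1, s1 = f - v, (v + f) / np.sqrt(2)   # c1 decreases as v increases
    c2, s2 = v - f, (f + v) / np.sqrt(2)   # c2 increases as v increases
    c = np.linspace(-vdd, vdd, n)
    # Signed distance between the curves along each 45-degree line.
    d = np.interp(c, c1[::-1], s1[::-1]) - np.interp(c, c2, s2)
    upper = d[c > 0].max()       # lobe above the line y = x
    lower = (-d[c < 0]).max()    # lobe below the line y = x
    # The smaller lobe limits the margin; diagonal / sqrt(2) = side.
    return min(upper, lower) / np.sqrt(2)


def demo_vtc(v):
    """Hypothetical inverter curve for VDD = 1 V (assumption, not cell data)."""
    return 0.5 * (1.0 - np.tanh(200.0 * (v - 0.5)))


snm = butterfly_snm(demo_vtc, vdd=1.0)
print(f"SNM of the demo curve: {snm * 1e3:.0f} mV")
```

To apply the same routine to simulated data, `vtc` could be replaced by an interpolation over the exported DC sweep points of the inverter characteristic.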

##### 4.1.3.2. WRITE NOISE MARGIN

Write noise margin is the maximum noise voltage $V_N$ that can appear simultaneously on both voltage sources, as shown in the section above, without preventing the write operation from flipping the content of the cell. The cell is put in write mode with the word line enabled and one of the bit lines pulled to ground. The worst case write margin is shown below.



FIGURE 13: BUTTERFLY CURVE FOR WNM

Write Noise Margin = 576mV

With process variation WNM is shown below:



FIGURE 14: WNM WITH PROCESS VARIATIONS

Worst Write Noise Margin = 580mV (with process variation)

##### 4.1.3.3. READ NOISE MARGIN

Read noise margin is the maximum noise voltage $V_N$ that can appear simultaneously on both voltage sources without flipping the content of the cell during a read operation. The cell is put in read mode with the word line enabled and the bit lines tied to $V_{DD}$. The worst case read margin is shown below.



FIGURE 15: BUTTERFLY CURVE FOR RNM

Worst Read Noise Margin = 442mV

With process variation RNM is given below:



FIGURE 16: RNM AT VARIOUS PROCESSES

Worst Read Noise Margin = 310mV (with process variation)

##### 4.1.3.4. TRANSIENT SIMULATIONS

The write delay of our cell is 82.53 ps. The simulation for the write delay analysis is given below:



FIGURE 17: WRITE DELAY ANALYSIS

The read delay of our cell is 146.677 ps. The simulation results of the read delay analysis are shown below:



FIGURE 18: READ DELAY ANALYSIS

#### 4.1.4. CELL LAYOUT (UNOPTIMIZED)



FIGURE 19: LAYOUT OF SRAM CELL

The layout of a single SRAM cell is shown in the figure above. At the top, the VDD contacts are shared with the transistor above and with the transistors of the adjacent column. The ground contacts are also shared between the transistors of adjacent columns. At the bottom, the contacts for BIT and BIT\_B are shared with the transistor below. Metal 2 is used for BIT and BIT\_B. The word line is drawn using poly and Metal 3 stripes running in parallel.

#### 4.1.5. SUMMARY OF CELL

Table 1 summarizes the cell dimensions, area, estimated core area, hold SNM, and read and write margins.

TABLE 1: SUMMARY OF CELL

| Parameter                   | Value                 |
|-----------------------------|-----------------------|
| **Length of the cell**      | 2.035 $\mu\text{m}$   |
| **Width of the cell**       | 1.659 $\mu\text{m}$   |
| **Area of the cell**        | 3.376 $\mu\text{m}^2$ |
| **Estimated Core Area**     | 3.27 $\text{mm}^2$    |
| **Hold SNM in Active Mode** | 514 mV                |
| **Read SNM**                | 310 mV                |
| **Write SNM**               | 580 mV                |

#### 4.1.6. CORE LAYOUT

##### 4.1.6.1. CORE ASSEMBLY

The area of our optimized 4 × 4 cell array is 38.23 $\mu\text{m}^2$.



FIGURE 20: OPTIMIZED 4 BY 4 SRAM CELL

##### 4.1.6.2. OPTIMIZATIONS INCORPORATED

The following layout optimization techniques were implemented:

- The vias and contacts instantiated in the design were flattened and reduced to the minimum possible size.
- Wherever possible, metal contacts were removed from the intermediate sources and drains of the transistors.
- Pickups are placed 20 $\mu\text{m}$ apart, so that every transistor has a pickup within 20 $\mu\text{m}$.

### Optimized Layout:

The optimized cell layout has the following dimensions:

- Length : 1.785  $\mu\text{m}$
- Width : 1.485  $\mu\text{m}$
- Area : 2.650  $\mu\text{m}^2$



FIGURE 21: OPTIMIZED CELL LAYOUT

### Bitline Twisting:

The bit line twisting scheme is as follows:

- The bit line is twisted after every 16 columns (or after every 16<sup>th</sup> Word Line).
- Twisting makes the coupling noise symmetric on BL and BL\_BAR, which reduces its effect on the differential signal.
- There are 512 columns in total, so the bit line is twisted 32 times in each core.
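The benefit of twisting can be illustrated with a toy model. The sketch below assumes a single aggressor line that couples a fixed noise voltage into whichever line of the pair runs next to it; the per-column noise value is hypothetical and the model is not extracted from our layout. It only shows why alternating the positions of BL and BL\_BAR every 16 columns turns the coupled noise into a common-mode term.

```python
from typing import Tuple

TOTAL_COLUMNS = 512        # from the text
TWIST_INTERVAL = 16        # from the text: one twist every 16 columns
NOISE_PER_COLUMN_MV = 0.1  # hypothetical coupling per column (illustrative)


def coupled_noise(twisted: bool) -> Tuple[float, float]:
    """Return the noise (mV) picked up by BL and BL_BAR from one aggressor."""
    noise_bl = 0.0
    noise_blb = 0.0
    bl_is_adjacent = True  # BL starts out next to the aggressor line
    for column in range(TOTAL_COLUMNS):
        if twisted and column > 0 and column % TWIST_INTERVAL == 0:
            bl_is_adjacent = not bl_is_adjacent  # the pair swaps tracks
        if bl_is_adjacent:
            noise_bl += NOISE_PER_COLUMN_MV
        else:
            noise_blb += NOISE_PER_COLUMN_MV
    return noise_bl, noise_blb


segments = TOTAL_COLUMNS // TWIST_INTERVAL  # 512 / 16
straight_bl, straight_blb = coupled_noise(twisted=False)
twisted_bl, twisted_blb = coupled_noise(twisted=True)

print("segments per bit line:", segments)
print("differential noise, untwisted (mV):", straight_bl - straight_blb)
print("differential noise, twisted (mV):", twisted_bl - twisted_blb)
```

In this model the untwisted pair sees all of the noise on one line, while the twisted pair sees equal noise on both lines, so the differential error seen by the sense amplifier cancels.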

The bit line twisting scheme is shown in the figure below:



FIGURE 22: BITLINE TWISTING

## 4.2. BUFFERS AND DRIVERS

### 4.2.1. WRITE BUFFERS

Write buffer discharges the bit-line or the complement as required by the input data to be written to the cell.

#### 4.2.1.1. LOGIC EXPRESSIONS AND CIRCUIT SCHEMATIC:

##### **Case-I:**

When, Data (D) = 0

$$\overline{WE} = 0, \overline{CS}_1 = 0, CS_2 = 0, D = 0$$

$$Y_1 = \overline{\overline{WE} \cdot \overline{CS}_1 \cdot \overline{D} \cdot CS_2}$$

$$= \overline{\overline{WE}} \cdot \overline{D} + \overline{CS_1} \cdot CS_2$$

##### **Case-II:**

When, Data (D) = 1

$$\overline{WE} = 0, \overline{CS_1} = 0, CS_2 = 0, D = 1$$

$$Y_1 = \overline{\overline{WE}} \cdot \overline{CS_1} \cdot D \cdot \overline{CS_2}$$

$$= \overline{\overline{WE}} \cdot D + \overline{CS_1} \cdot CS_2$$



FIGURE 23: SCHEMATIC OF WRITE BUFFER

#### 4.2.1.2. SIZING OF GATES:

The gate sizes obtained from the Elmore delay model are:

TABLE 2: SIZING OF GATES

| S. No. | Gate      | Size          |
|--------|-----------|---------------|
| 1      | NOT gate  | $S_1 = 1$     |
| 2      | NAND gate | $S_2 = 3.76$  |
| 3      | NOR gate  | $S_3 = 11.32$ |

#### 4.2.1.3. LAYOUT:



FIGURE 24: LAYOUT OF WRITE BUFFER

Write Buffer (width =  $9.06\mu\text{m}$ , height =  $5.17\mu\text{m}$ , Area =  $46.84\mu\text{m}^2$ )

#### 4.2.2. WORD LINE DRIVER

An inverter chain is needed to drive the long word line. This word line driver is designed to drive a word line of length  $350\mu\text{m}$ , equivalent to a capacitance of  $64\text{ fF}$ .
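A first-cut sizing of such an inverter chain can be sketched with the standard tapered-buffer rule. The example below is a hedged illustration, not the sizing used in our schematic: the 64 fF load comes from the text, while the input capacitance of the first inverter (`C_IN_FF`) and the target stage effort of 4 are assumptions.

```python
import math

C_LOAD_FF = 64.0           # word line capacitance, from the text
C_IN_FF = 1.0              # hypothetical input capacitance of the first inverter
TARGET_STAGE_EFFORT = 4.0  # assumed rule-of-thumb fan-out per stage


def size_inverter_chain(c_in_ff, c_load_ff, target_effort=TARGET_STAGE_EFFORT):
    """Return (stage count, effort per stage, relative size of each stage)."""
    path_effort = c_load_ff / c_in_ff  # F = C_load / C_in (g = 1 for inverters)
    n_stages = max(1, round(math.log(path_effort) / math.log(target_effort)))
    stage_effort = path_effort ** (1.0 / n_stages)
    sizes = [stage_effort ** i for i in range(n_stages)]
    return n_stages, stage_effort, sizes


n_stages, stage_effort, sizes = size_inverter_chain(C_IN_FF, C_LOAD_FF)
print("stages:", n_stages)
print("effort per stage: %.2f" % stage_effort)
print("relative sizes:", ["%.1f" % s for s in sizes])
```

With these assumed values the chain comes out as three stages, each four times larger than the previous one. An odd stage count inverts the signal, so a real driver may add or drop a stage to restore the required polarity.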

##### 4.2.2.1. SCHEMATIC:



FIGURE 25: SCHEMATIC OF WORD LINE DRIVER

##### 4.2.2.2. LAYOUT:



FIGURE 26: LAYOUT OF WORD LINE DRIVER

## 4.3. NOVEL ATD SCHEME

The ATD (Address Transition Detector) circuit is the heart of the asynchronous SRAM chip and governs its overall operation. A synchronous SRAM design suffers from clock skew and from the need to synchronize with external inputs, whereas an asynchronous SRAM design responds directly to any transition on the address lines. This makes operation very fast, but it also makes the design prone to undesired operation when the address signal changes because of noise. To overcome this problem an ATD circuit is used, which checks for an address transition and generates a stretched pulse during which the internal circuits perform their operation. Until the pulse is generated, the rest of the memory chip can be kept in a non-operating mode. The ATD output is used to generate a tile level control signal, from which the required signals such as the enables for the block decoders, sense amplifiers, and precharge are derived.

A **novel** ATD was designed in this project to minimize power and area, improve immunity to PVT variations, and reduce the delay of pulse generation once an address transition is detected. The principle is based on a dual-edge pulse generator, which generates a pulse whenever an edge (rising or falling) is detected. This pulse is then fed to a monostable circuit to generate the required pulses, with widths that can be set according to our requirement. Since only one RC pair is required, power dissipation is reduced drastically and the circuit is highly immune to PVT variations. The area also reduces significantly.

#### 4.3.1. ATD INPUTS

TABLE 3: ATD INPUTS

| S. No. | Inputs | Sources       |
|--------|--------|---------------|
| 1      | A0-A16 | Input buffers |

#### 4.3.2. ATD OUTPUTS

TABLE 4: ATD OUTPUTS

| S. No. | Outputs | Routed to                  |
|--------|---------|----------------------------|
| 1      | ATD_OUT | Tile Level Control Circuit |

#### 4.3.3. CONVENTIONAL ATD CIRCUIT:

The ATD pulses coming from the different XOR gates are finally OR-ed to generate a single small pulse. This circuit has the drawback that the pulse width changes with different address transitions and is not controllable. There is also significant variation across the process corners, as well as a delay in the pulse generation once an address change occurs. The power and area required are also very large.



FIGURE 27: CONVENTIONAL ATD

#### 4.3.4. PULSE STRETCHING CIRCUIT:

A pulse stretching circuit is used along with a conventional ATD to control the pulse width. The operation of the memory chip depends on the ATD pulse, which in turn depends on the operation of the delay circuit. The delay circuit is therefore critical in our design, and its operation should be PVT invariant to generate the proper ATD pulse width. The RC delay in the following figure can be generated in the following ways:

- Inverter Based Delay Approach I
- Inverter Based Delay Approach II
- RC Based Delay



FIGURE 28: PULSE GENERATING CIRCUIT

#### 4.3.4.1. INVERTER BASED DELAY APPROACH I

In this approach, strong inverters are placed alternately to increase the delay and thus obtain greater pulse widths.

However, there are several problems with this approach:

- Delay decreases with increasing temperature.
- Delay is dependent on PVT.
  - Temp  $\pm 5\%$
  - Supply  $\pm 25\%$
  - Process  $\pm 30\%$
  - Overall variation  $\pm 60\%$



FIGURE 29: INVERTER BASED DELAY APPROACH I

#### 4.3.4.2. INVERTER BASED DELAY APPROACH II

In this approach, cascaded inverters with load are used to increase the delay and thus obtain greater pulse widths. This provides the required width with fewer stages.



FIGURE 30: INVERTER BASED DELAY APPROACH II

Delay is dependent on PVT.

- Temp  $\pm 5\%$
- Supply  $\pm 20\%$
- Process  $\pm 25\%$
- Overall variation  $\pm 50\%$

#### 4.3.4.3. RC BASED DELAY APPROACH

This approach uses strong delays alternately with RC delays.



FIGURE 31: RC BASED DELAY

Delay is dependent on PVT.

- Temp  $\pm 5\%$
- Supply  $\pm 15\%$
- Process  $\pm 15\%$
- Overall variation  $\pm 35\%$
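The overall figures quoted for the three delay approaches are consistent with a simple worst-case sum of the temperature, supply, and process contributions. The short check below uses only the percentages listed above; treating the contributions as additive is our reading of the numbers rather than a stated method.

```python
# Percentage variation of the delay for each approach, from the lists above.
approaches = {
    "Inverter based delay I": {"temp": 5, "supply": 25, "process": 30},
    "Inverter based delay II": {"temp": 5, "supply": 20, "process": 25},
    "RC based delay": {"temp": 5, "supply": 15, "process": 15},
}

# Worst-case overall variation, assuming the three contributions simply add.
overall = {name: sum(parts.values()) for name, parts in approaches.items()}

for name, value in overall.items():
    print("%-24s overall variation: +/-%d%%" % (name, value))
```

The RC based delay has the smallest total because both its supply and process sensitivities are lower than those of the inverter based approaches.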

#### 4.3.5. NOVEL ATD CIRCUIT:

The dual edge detector circuit uses two inverters to push and then pull the output node once a change in the address occurs, resulting in a spike once an address change is detected.



FIGURE 32: DUAL EDGE DETECTOR

These spikes are then OR-ed to get a final spike which acts as a trigger for the monostable multivibrator shown.



FIGURE 33: OR OPERATION OF THE DUAL EDGE DETECTORS



FIGURE 34: MONOSTABLE MULTIVIBRATOR

The capacitor and resistor used in the monostable multivibrator are the only ones in the ATD, thus saving lot of area, power and increasing immunity to PVT variations.
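The signal flow of the novel ATD (dual edge detection on each address line, OR-ing of the spikes, and pulse stretching by one monostable) can be sketched behaviorally. The model below works on discrete time steps and is only an illustration of the logic, not of the transistor-level circuit; the pulse width of 4 steps and the retriggerable behavior of the monostable are assumptions.

```python
ADDRESS_BITS = 17  # A0-A16, from the ATD inputs table
PULSE_WIDTH = 4    # hypothetical monostable pulse width, in time steps


def dual_edge_spikes(prev_addr, addr):
    """One spike bit per address line that toggled (rising or falling edge)."""
    return (prev_addr ^ addr) & ((1 << ADDRESS_BITS) - 1)


def atd_out(address_trace, pulse_width=PULSE_WIDTH):
    """Return the ATD_OUT level for every sample of the address trace."""
    out = []
    remaining = 0
    prev = address_trace[0]
    for addr in address_trace:
        trigger = dual_edge_spikes(prev, addr) != 0  # OR of all spikes
        if trigger:
            remaining = pulse_width  # monostable (re)starts its pulse
        out.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
        prev = addr
    return out


# All lines rise together, then a single line (A0) falls.
trace = [0x00000] * 3 + [0x1FFFF] * 8 + [0x1FFFE] * 6
pulse = atd_out(trace)
print(pulse)
```

Both transitions produce a pulse of the same width, whether all seventeen lines change or only one, which is the property the single monostable is meant to provide.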

#### 4.3.6. COMPARISON OF NOVEL AND CONVENTIONAL ATD

##### 4.3.6.1. ROBUSTNESS W.R.T. PROCESS AND TEMPERATURE VARIATIONS FOR CONVENTIONAL ATD

TABLE 5: ROBUSTNESS OF CONVENTIONAL ATD

| Pulse Width (ns) | -40 °C | 27 °C | 85 °C |
|------------------|--------|-------|-------|
| <b>Typical</b>   | 23.46  | 23.42 | 23.67 |
| <b>Slow</b>      | 31.20  | 30.49 | 30.92 |
| <b>Fast</b>      | 19.72  | 19.74 | 19.92 |

**Overall Variation in pulse width <33%**

**Delay in generation of pulse after address transition > 650 ps**

##### 4.3.6.2. ROBUSTNESS W.R.T. PROCESS AND TEMPERATURE VARIATIONS FOR NOVEL ATD

TABLE 6: ROBUSTNESS OF NOVEL ATD

| Pulse Width (ns) | -40 °C | 27 °C | 85 °C |
|------------------|--------|-------|-------|
| <b>Typical</b>   | 26.17  | 24.54 | 23.44 |
| <b>Slow</b>      | 25.20  | 23.44 | 22.20 |
| <b>Fast</b>      | 26.66  | 25.03 | 23.91 |

**Overall Variation in pulse width <10%**

**Delay in generation of pulse after address transition < 410 ps**
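The variation figures can be cross-checked from the two robustness tables. The sketch below assumes that "overall variation" means the largest deviation of any corner from the typical corner at 27 °C, expressed as a percentage of that nominal value; this definition is our assumption and is not stated explicitly in the tables.

```python
# Pulse widths in ns at (-40, 27, 85) degrees C, copied from the two tables.
conventional = {
    "Typical": (23.46, 23.42, 23.67),
    "Slow": (31.20, 30.49, 30.92),
    "Fast": (19.72, 19.74, 19.92),
}
novel = {
    "Typical": (26.17, 24.54, 23.44),
    "Slow": (25.20, 23.44, 22.20),
    "Fast": (26.66, 25.03, 23.91),
}


def max_deviation_percent(table):
    """Largest deviation from the typical corner at 27 C, in percent."""
    nominal = table["Typical"][1]
    widths = [w for row in table.values() for w in row]
    return max(abs(w - nominal) for w in widths) / nominal * 100.0


conventional_var = max_deviation_percent(conventional)
novel_var = max_deviation_percent(novel)
print("conventional ATD: %.1f%%" % conventional_var)
print("novel ATD:        %.1f%%" % novel_var)
```

Under this definition the conventional ATD varies by roughly 33% and the novel ATD by a little under 10%, in line with the figures quoted for the two circuits.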

#### 4.3.6.3. POWER COMPARISON

##### For Conventional ATD:

- Average Power:  $14.65\mu\text{W}$
- Static Power:  $236\text{nW}$
- Peak Dynamic Power:  $432\mu\text{W}$

##### For novel ATD:

- Average Power:  $1.657\mu\text{W}$
- Static Power:  $33.5\text{nW}$
- Peak Dynamic Power:  $124.7\mu\text{W}$

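Thus, the average power drops by almost a factor of nine (from  $14.65\mu\text{W}$  to  $1.657\mu\text{W}$ ), the static power by about a factor of seven, and the peak dynamic power by about a factor of 3.5.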

#### 4.3.7. ATD LAYOUT



FIGURE 35: ATD LAYOUT

Length =  $144 \mu\text{m}$

Width =  $30 \mu\text{m}$

Area =  $4320 \mu\text{m}^2$

## 4.4. DECODING SCHEME

### 4.4.1. ADDRESS DECODING

#### 4.4.1.1. SCHEME USED

- A 3:8 static CMOS decoder selects the row and column decoders of one of the 8 memory banks.
- The control signal coming from the ATD enables the address decoder; this signal is active high. If the enable signal is low, the address decoder does not select the row and column decoders of any memory bank.
- The address decoder is sized for minimum path delay and adequate driving capability, in keeping with the low power design.
- The MSBs of the address bus are the inputs of the address decoder, along with an enable signal from the ATD.
- Placing EN at the input stage gates has the following benefits compared to placing EN at the last stage:

| EN at last stage gates | EN at input stage gates |
|------------------------|-------------------------|
| <ul style="list-style-type: none"> <li>• A large number of gates is used at the output stage, so the EN generating circuit needs a high fan-out because of the large branching effort.</li> <li>• All stages up to the last one can change value, so the number of transitions is large.</li> </ul> | <ul style="list-style-type: none"> <li>• One of the inputs is zero if EN=0 at the input stage, so the fan-out of the EN generating circuit is reduced because the branching effort is only two.</li> <li>• Only a few stages can change value, so the number of transitions is small.</li> <li>• Power is minimized.</li> </ul> |



FIGURE 36: ADDRESS DECODER

#### 4.4.1.2. ADDRESS DECODE CIRCUITRY WITH SCHEMATIC



FIGURE 37: GATE LEVEL IMPLEMENTATION

#### 4.4.1.3. ADDRESS DECODE CIRCUIT CRITICAL PATH TIMING AND SIZING OF GATES-

TABLE 7: SIZING OF GATES IN ADDRESS DECODER CIRCUIT HAS BEEN DONE BY LOGICAL EFFORT METHODS

| Parameter                                    | Stage 1<br>NOT Gate | Stage 2<br>NAND Gate | Stage 3<br>NOR Gate |
|----------------------------------------------|---------------------|----------------------|---------------------|
| Parasitic Delay ( $p$ )                      | 1                   | 2                    | 2                   |
| Logical Effort ( $g$ )                       | 1                   | 4/3                  | 5/3                 |
| Branching Effort ( $b$ )                     | 1                   | 4                    | 1                   |
| Sizing (with reference to 2:1 size inverter) | 1                   | 1                    | 1.3                 |
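The table entries can be combined into a path-level estimate with the method of logical effort. The sketch below uses the $p$, $g$ and $b$ values from the table; the path electrical effort `ELECTRICAL_EFFORT_H` (load capacitance over input capacitance) is a hypothetical value chosen only for illustration, so the resulting delay is not the one obtained in our simulations.

```python
from fractions import Fraction

# Per-stage values taken from the sizing table above.
stages = [
    {"name": "NOT", "g": Fraction(1), "p": 1, "b": 1},
    {"name": "NAND", "g": Fraction(4, 3), "p": 2, "b": 4},
    {"name": "NOR", "g": Fraction(5, 3), "p": 2, "b": 1},
]
ELECTRICAL_EFFORT_H = 8.0  # hypothetical C_load / C_in of the whole path

G = Fraction(1)  # path logical effort
B = 1            # path branching effort
P = 0            # path parasitic delay
for stage in stages:
    G *= stage["g"]
    B *= stage["b"]
    P += stage["p"]

N = len(stages)
F = float(G * B) * ELECTRICAL_EFFORT_H  # path effort F = G * B * H
f_hat = F ** (1.0 / N)                  # optimum effort per stage
D = N * f_hat + P                       # path delay in units of tau

print("G = %s, B = %d, P = %d" % (G, B, P))
print("optimum stage effort: %.2f" % f_hat)
print("estimated path delay: %.2f tau" % D)
```

Equalizing the effort of every stage to `f_hat` is what minimizes the path delay; the individual gate sizes then follow by working backwards from the load.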



#### 4.4.1.4. ADDRESS DECODE CIRCUIT SCHEMATIC SIMULATION

A0 to A2 are input address lines, E is enable and S0 to S7 are decoded outputs.



FIGURE 38: SIMULATION

#### 4.4.1.5. ADDRESS DECODE CIRCUIT LAYOUT

The DRC-clean layout of the address decoder circuit is shown in the figure.



#### 4.4.2. ROW DECODING

A hierarchical scheme is used for the design of the row decoders. The decoder block is divided into 2 stages.

##### 4.4.2.1. SCHEME USED

- The row decoder is split into a pre-decoder stage, consisting of one 3 to 8 and one 2 to 4 pre-decoder, and an array of 32 AND gates that generate the Word Line Enable signals.
- Between successive Read/Write cycles, the word line must be pulled to ground to avoid crosstalk between SRAM cells and/or to pre-charge the bit lines.
- This cycle is synchronized by the ATD block, which generates an active high pulse to enable the pre-decoders.
- The worst case delay of the row decoder was observed by keeping the EN signal from the ATD high and switching the row address from 00000 to 11111.
- To estimate the power consumption, we decoded 32 contiguous addresses, switching the pre-decoder input from 0 to 31. This gives a fairly accurate estimate of the average power consumption of the circuit.
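The two-stage scheme described above can be modeled behaviorally as follows. This is a minimal sketch: the assignment of the three low-order bits to the 3 to 8 pre-decoder and the two high-order bits to the 2 to 4 pre-decoder is an assumption, as is the function name `row_decode`.

```python
def one_hot(value, width):
    """Return a one-hot list of length `width` with bit `value` set."""
    return [1 if i == value else 0 for i in range(width)]


def row_decode(address, enable=1):
    """Decode a 5-bit row address into 32 word line enable signals."""
    if not 0 <= address < 32:
        raise ValueError("row address must fit in 5 bits")
    low = address & 0b111          # assumed input of the 3 to 8 pre-decoder
    high = (address >> 3) & 0b11   # assumed input of the 2 to 4 pre-decoder
    # The ATD enable gates both pre-decoders (active high).
    pre8 = [bit & enable for bit in one_hot(low, 8)]
    pre4 = [bit & enable for bit in one_hot(high, 4)]
    # Array of 32 two-input AND gates, one per word line.
    return [pre4[h] & pre8[l] for h in range(4) for l in range(8)]


word_lines = row_decode(0b10110)
print("active word line:", word_lines.index(1))
print("lines high with EN low:", sum(row_decode(0b10110, enable=0)))
```

With the enable low, every pre-decoder output is zero, so all word lines stay grounded between cycles as required by the scheme.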

#### 4.4.2.2. ROW DECODE CIRCUITRY WITH SCHEMATIC



FIGURE 39: GATE LEVEL IMPLEMENTATION

#### 4.4.2.3. ROW DECODER CIRCUIT CRITICAL PATH TIMING AND SIZING OF GATES-

The row decoder circuit with a load capacitance of 26 fF is shown in the schematic below. The gate sizes were calculated by the logical effort method.



FIGURE 40: ELMORE DELAY MODEL

#### 4.4.2.4. ROW ADDRESS DECODER CIRCUIT LAYOUT-



FIGURE 41: LAYOUT

Length : 88  $\mu\text{m}$

Width : 11  $\mu\text{m}$

Area : 968  $\mu\text{m}^2$

#### 4.4.3. COLUMN DECODING

A hierarchical scheme is also used for the design of the column decoders. The decoder block is divided into 2 stages.

##### 4.4.3.1. SCHEME USED

- The column decoder is split into a pre-decoder stage, consisting of three 3 to 8 pre-decoders, and an array of 512 AND gates that generate the column select signals.
- The worst case delay of the column decoder was observed by keeping the EN signal from the ATD high and switching the column address from 000000000 to 111111111.

##### 4.4.3.2. COLUMN DECODE CIRCUITRY WITH SCHEMATIC



FIGURE 42: GATE LEVEL IMPLEMENTATION

#### 4.4.3.3. COLUMN DECODER CIRCUIT CRITICAL PATH TIMING AND SIZING OF GATES-

The column decoder circuit with a load capacitance of 32 fF is shown in the schematic below. The gate sizes were calculated by the logical effort method.



FIGURE 43: ELMORE DELAY MODEL

#### 4.4.3.4. COLUMN ADDRESS DECODER CIRCUIT LAYOUT-



FIGURE 44: LAYOUT OF COLUMN DECODER WITH WRITE DRIVERS

Length : 1015  $\mu\text{m}$ (Pitch Matched)

Width : 17.5  $\mu\text{m}$

Area : 0.01776  $\text{mm}^2$

## 4.5. CORE I/O

---

### 4.5.1. PRECHARGE CIRCUITRY

---

#### 4.5.1.1. SCHEME USED

- Precharging is done at the end of every clock cycle.
- Both static and dynamic precharge schemes are used.
- Static precharge transistors keep the bit-lines from floating when the core is not in use. They do not affect the normal operation of the core in any way.
- The dynamic transistors are sized so that they can charge the bit-line capacitance within the allocated precharge budget.
- An equalization transistor is also used to ensure that the BL and BL\_BAR lines are equal at the start of a read operation.

#### 4.5.1.2. INPUTS AND OUTPUTS USED

| Signal | Type | Source input                                               |
|--------|------|------------------------------------------------------------|
| PRE_EN | IN   | From control signal generation circuit followed by buffers |
| BL     | OUT  | Connected to the bit line of the SRAM cell                 |
| BL_BAR | OUT  | Connected to the bit line bar of SRAM cell                 |

#### 4.5.1.3. PRECHARGE CIRCUITRY ROUTING

- Every row of the SRAM core has a precharge circuit at the right / left. There are 32 rows in a bank and each row has 8 bit lines, so one bank contains 256 precharge circuits. There are 8 such banks, all using the same scheme.
- The PMOS transistors in the middle are controlled by the PRE\_EN control signal; these are the dynamic transistors. The corner PMOS transistors (static PMOS), whose gates are grounded, are always ON.
- The drains of these PMOS transistors are connected to BL and BL\_BAR of the SRAM core.
- When PRE\_EN is low, BL and BL\_BAR are pulled high. This is usually done when the SRAM is not in operation, after a Read / Write has been performed in the core.
- The PMOS at the bottom is an equalization transistor, added to equalize BL and BL\_BAR whenever precharge is done.

#### 4.5.1.4. ESTIMATED LOAD CAPACITANCE ON OUTPUTS

- Bit-Line load Capacitance = 200 fF
- Bit-Line resistance (Pex Extracted) = 600 ohms
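As a rough check on the precharge budget, the two extracted values give a lumped RC time constant for the bit line. The sketch below assumes a single lumped RC section and ignores the on-resistance of the precharge transistors, so it is only an order-of-magnitude estimate and not the result of our simulations.

```python
import math

C_BITLINE = 200e-15  # bit-line load capacitance in farads, from the list above
R_BITLINE = 600.0    # extracted bit-line resistance in ohms, from the list above

# Time constant of the assumed lumped RC model.
tau = R_BITLINE * C_BITLINE

# Time for a single-pole RC to reach 90% of its final value: tau * ln(10).
t_90 = tau * math.log(10)

print("tau  = %.1f ps" % (tau * 1e12))
print("t_90 = %.1f ps" % (t_90 * 1e12))
```

The wire alone therefore contributes a delay on the order of a few hundred picoseconds, which sets a floor on how short the precharge pulse can be made regardless of the transistor sizing.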

#### 4.5.1.5. DIFFERENT PRECHARGE CIRCUIT TOPOLOGIES

TABLE 8: COMPARISON OF DIFFERENT PRECHARGE TOPOLOGIES

| Parameters   | HVT PMOS<br>With Equalization circuit      | HVT PMOS<br>With LVT NMOS and Equalization<br>circuit |
|--------------|--------------------------------------------|-------------------------------------------------------|
| <b>Area</b>  | Relatively Less (All PMOS)                 | More (NMOS and PMOS)                                  |
| <b>Power</b> | $C_b \times V_{DD} \times V_{DD} \times f$ | $C_b \times (V_{DD} - V_{Tn}) \times V_{DD} \times f$ |
| <b>Speed</b> | Slow                                       | High Speed                                            |

The table shows that the HVT PMOS with LVT NMOS and Equalization circuit is the better choice for our design because it consumes less power.
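The two power expressions in the table differ only in the voltage swing on the bit line, so the saving can be estimated directly. In the sketch below the bit-line capacitance is the 200 fF quoted in Section 4.5.1.4, while the supply voltage, NMOS threshold voltage, and operating frequency are hypothetical values chosen for illustration.

```python
C_B = 200e-15  # bit-line capacitance in farads (Section 4.5.1.4)
VDD = 1.0      # hypothetical supply voltage in volts
VTN = 0.3      # hypothetical LVT NMOS threshold voltage in volts
FREQ = 10e6    # hypothetical precharge frequency in hertz

# HVT PMOS only: the bit line is charged all the way to VDD.
p_pmos_only = C_B * VDD * VDD * FREQ

# HVT PMOS with LVT NMOS: the bit line swing is limited to VDD - VTn.
p_with_nmos = C_B * (VDD - VTN) * VDD * FREQ

saving = 1.0 - p_with_nmos / p_pmos_only

print("PMOS only:      %.2f uW" % (p_pmos_only * 1e6))
print("PMOS with NMOS: %.2f uW" % (p_with_nmos * 1e6))
print("saving:         %.0f%%" % (saving * 100))
```

The relative saving equals $V_{Tn}/V_{DD}$ and is independent of the capacitance and frequency, so it grows as the supply voltage is scaled down.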



FIGURE 45: HVT PMOS WITH EQUALIZATION CIRCUIT



FIGURE 46: HVT PMOS WITH LVT NMOS AND EQUALIZATION CIRCUIT

#### 4.5.1.6. PRECHARGE CIRCUIT SCHEMATIC SIMULATION

The difference between the above two schemes is shown in the table below:

TABLE 9: COMPARISON

| Parameter                 | HVT & LVT              | HVT                    |
|---------------------------|------------------------|------------------------|
| <b>Static Power(Watt)</b> | $6.44 \times 10^{-11}$ | $2.26 \times 10^{-10}$ |
| <b>Delay</b>              | 0.7 nSec               | 0.6 nSec               |

The following figure shows the schematic with the equivalent resistance and capacitance of the bit line and bit line bar.



#### 4.5.1.7. PRECHARGE CIRCUIT LAYOUT

Since the HVT PMOS with LVT NMOS and Equalization circuit consumes less power than the other scheme, we use this topology in our low power SRAM design. The layout of the precharge circuit is shown in the figure below.



FIGURE 47: LAYOUT OF PRECHARGE CIRCUIT

#### 4.5.2. SENSE AMPLIFIER

In memory design, it is imperative to design robust and highly sensitive sense amplifiers to improve the speed of the memory; they are therefore one of the primary building blocks of a memory. The large bit-line capacitance due to lengthy bit-lines is one of the main bottlenecks to the performance of on-chip caches. The speed of the overall design is limited by the signal delay over long and highly loaded interconnects, whose increased capacitance and resistance also limit the signal swing.

##### 4.5.2.1. DIFFERENT TOPOLOGIES

###### VOLTAGE MODE SENSE AMPLIFIER

The voltage mode sense amplifier (VSA) detects the voltage difference on the bit lines and produces the output. Possible voltage mode sense amplifiers include single ended sense amplifiers, cross-coupled sense amplifiers, and differential amplifiers. Different types of VSA are used depending on the requirements of the design.



FIGURE 48: CROSS COUPLED VOLTAGE MODE SENSE AMPLIFIER

###### CURRENT MODE SENSE AMPLIFIER

In advanced memories, technology scaling increases the number of cells attached to each bit line, which increases the bit line capacitance. In such schemes, a voltage mode SA cannot maintain high performance. To read bit lines with such large capacitance, a faster sensing scheme such as the current mode sense amplifier (CSA) is needed. The CSA reduces the delay because it provides very low common input/output impedances, which result in reduced voltage swing, substrate current, and cross talk.



FIGURE 49: CURRENT MODE SENSE AMPLIFIER

###### CHARGE TRANSFER SENSE AMPLIFIER

The Charge Transfer Sense Amplifier (CTSA) works on the principle of charge redistribution from the higher capacitance bit lines to the lower capacitance sense amplifier output node. This results in high speed operation and lower power consumption due to the low voltage swing on the bit-lines. The basic concept behind charge-transfer amplification is to produce voltage gain by exploiting charge conservation among capacitive devices. For a series connection of two capacitive elements in a system for which charge is conserved, the product of the voltage across the first element and its capacitance must equal the product of the voltage across the second element and its capacitance, as shown in the following equation.

$$C_{small} V_{small} = Q_{system} = C_{large} V_{large}$$

The CTSA overcomes most of the limitations of the voltage and current mode sense amplifiers and is emerging as an efficient sensing technique for low power memories. It offers better performance and lower power consumption than these topologies. Because charge is redistributed from the high capacitance node to the low capacitance node, extremely low bit-line swings can be detected, resulting in higher sensitivity along with lower sense delay.
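The voltage gain implied by the charge conservation relation $C_{small} V_{small} = C_{large} V_{large}$ can be illustrated numerically. In the sketch below the large capacitance is the 200 fF bit-line load quoted in Section 4.5.1.4, while the sense node capacitance and the bit-line swing are hypothetical values; the result is an ideal upper bound that ignores the supply limit and charge lost to parasitics.

```python
C_LARGE = 200e-15  # bit-line capacitance in farads (Section 4.5.1.4)
C_SMALL = 10e-15   # hypothetical sense node capacitance in farads
DV_LARGE = 0.020   # hypothetical bit-line swing in volts

# Charge conservation: C_small * V_small = C_large * V_large.
gain = C_LARGE / C_SMALL
dv_small = gain * DV_LARGE

print("ideal voltage gain: %.0f" % gain)
print("sense node swing:   %.0f mV" % (dv_small * 1e3))
```

With these assumed values a 20 mV bit-line swing appears as an ideal 400 mV swing on the sense node, which is why the bit lines never need to be discharged far during a read.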

#### 4.5.2.2. INPUTS USED

TABLE 10: INPUTS TO SENSE AMPLIFIER

| S. No. | Inputs    | Sources         |
|--------|-----------|-----------------|
| 1      | Sense     | Control Circuit |
| 2      | Y_sel     | Control circuit |
| 3      | Pre_ch    | Control circuit |
| 4      | Bit line  | Memory array    |
| 5      | Bit line' | Memory array    |

#### 4.5.2.3. CTSA OUTPUTS

TABLE 11: CTSA OUTPUTS

| S. No. | Outputs  | Routed to |
|--------|----------|-----------|
| 1      | DATA_OUT | DATA I/O  |

The output of the CTSA can drive load capacitance up to 200 fF.

#### 4.5.2.4. CTSA CIRCUIT SCHEMATIC



FIGURE 50: CHARGE TRANSFER SENSE AMPLIFIER

#### 4.5.2.5. CTSA SCHEMATIC SIMULATION RESULTS



FIGURE 51: PVT SIMULATION OF BIASING CIRCUIT

TABLE 12: PVT READINGS

All entries are Vref in mV.

| Supply Voltage (V) | FF, -40 °C | FF, 85 °C | TT, -40 °C | TT, 85 °C | SS, -40 °C | SS, 85 °C |
|--------------------|------------|-----------|------------|-----------|------------|-----------|
| 0.5                | 303.74     | 322.59    | 284.62     | 302.27    | 265.36     | 282.39    |
| 0.7                | 305.84     | 325.38    | 286.62     | 304.76    | 267.34     | 284.82    |
| 0.9                | 307.69     | 327.5     | 288.4      | 304.98    | 269.1      | 286.97    |
| 1                  | 308.54     | 328.56    | 289.23     | 308       | 269.93     | 287.95    |
| 1.2                | 310.17     | 330.54    | 290.8      | 309.93    | 271.48     | 289.83    |
| 1.5                | 312.39     | 333.2     | 292.97     | 312.55    | 273.62     | 292.37    |



FIGURE 52: SCHEMATIC SIMULATION WAVEFORM OF CTSA

#### 4.5.2.6. CTSA LAYOUT



FIGURE 53: LAYOUT OF CTSA

The dimensions of layout are, Length = 10.55  $\mu\text{m}$  & Width = 9.77  $\mu\text{m}$ . Thus area of layout is 103.07  $\mu\text{m}^2$ .

## 5. INTEGRATION



FIGURE 54: ROW DECODER & SENSE AMPLIFIER INTEGRATION LAYOUT



FIGURE 55: INTEGRATION OF CORE WORD LINE DRIVER



FIGURE 56: COLUMN DECODER & CORE INTEGRATION LAYOUT



FIGURE 57: SENSE AMPLIFIER & WRITE BUFFER INTEGRATION



FIGURE 58: TOP LEVEL CHIP LAYOUT

### 5.2.1 SUMMARY OF AREA DISTRIBUTION

TABLE 13: SUMMARY

| Components              | Area of Single Component | No. of Components in chip | Total Area           |
|-------------------------|--------------------------|---------------------------|----------------------|
| SRAM Core Array         | 312000 $\mu\text{m}^2$   | 8                         | 2.496 $\text{mm}^2$  |
| Pre-charge Circuit      | 2160.34 $\mu\text{m}^2$  | 8                         | 0.173 $\text{mm}^2$  |
| Sense Amplifier         | 103.07 $\mu\text{m}^2$   | 8                         | 0.001 $\text{mm}^2$  |
| Write buffer            | 46.84 $\mu\text{m}^2$    | 8                         | 0.0003 $\text{mm}^2$ |
| Row Decoder             | 911.68 $\mu\text{m}^2$   | 4                         | 0.0046 $\text{mm}^2$ |
| Column Decoder          | 14302.47 $\mu\text{m}^2$ | 4                         | 0.057 $\text{mm}^2$  |
| ATD and Control Circuit | 4320 $\mu\text{m}^2$     | 1                         | 0.004 $\text{mm}^2$  |
| Block Decoder           | 238.041 $\mu\text{m}^2$  | 1                         | 0.0003 $\text{mm}^2$ |
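The die utilization figure reported in the complete summary appears to be the sum of the Total Area column of this table. The short check below adds the entries exactly as they are listed; it is a consistency check on the table, not an independent area estimate.

```python
# Total area of each component in mm^2, copied from the Total Area column.
total_area_mm2 = {
    "SRAM Core Array": 2.496,
    "Pre-charge Circuit": 0.173,
    "Sense Amplifier": 0.001,
    "Write buffer": 0.0003,
    "Row Decoder": 0.0046,
    "Column Decoder": 0.057,
    "ATD and Control Circuit": 0.004,
    "Block Decoder": 0.0003,
}

die_utilization = sum(total_area_mm2.values())
core_share = total_area_mm2["SRAM Core Array"] / die_utilization

print("sum of component areas: %.4f mm^2" % die_utilization)
print("share taken by the core array: %.1f%%" % (core_share * 100))
```

The core array dominates the total, so any further area reduction has to come mainly from the cell layout rather than from the peripheral circuits.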

### 5.2.2. COMPLETE SUMMARY

#### Achieved Specifications

- ❖ **Total Chip Area** : **3.01  $\text{mm}^2$**
- ❖ **Die Utilization** : **2.7362  $\text{mm}^2$**
- ❖ **Core Area** : **2.496  $\text{mm}^2$**
- ❖ **Core Efficiency** : **77.95 % (without I/O Pads)**

#### Estimated Specifications

- ❖ **Total Chip Area** : **5.45  $\text{mm}^2$**
- ❖ **Core Efficiency** :  **$\geq 65 \%$**

## REFERENCES

---

- [1] Binh-Son Le, Thanh-Tri VO, “SRAM Cell for High Noise Margin & Soft Errors Tolerance in Nanoscale Technology”, in *Proc. Intl. Conf. Computing, Management and Telecommunications*, Apr. 2014, pp. 96-100
- [2] Betty Prince, “Semiconductor Memories: A Handbook of Design, Manufacture and Application”, *Wiley*, Aug. 1996
- [3] Rabaey, Jan M., Anantha P. Chandrakasan, and Borivoje Nikolic, “Digital Integrated Circuits. Vol. 2” *Englewood Cliffs: Prentice Hall*, 2002.
- [4] Wakerly, John F. “Digital Design: Principles and Practices” *Prentice-Hall, Inc.*, 2000.