

### Homework 9: SRAM

Due: Friday Nov 14, 5pm

Please include your name, SID, and course (CS150, EE141, or EE241A) at the top of your homework hand-in. Homework must be submitted electronically as a single file in PDF format.

#### Problem 1: 8T SRAM cells

Consider the 8T SRAM cell given below. With this design, there is a Write Word Line (WWL) that is used to write the values of Write Bit Line (WBL) and $\overline{WBL}$ into the cell, and a separate Read Word Line (RWL) that is used to read the content of the cell on the Read Bit Line (RBL).



Figure 1: 8T cell

- Explain what could be the basic reasoning behind this idea.
- Determine which transistors are involved in a Write operation, and comment on their relative sizing.
- For the same cell, determine which transistors are involved in a Read operation, and comment on their relative sizing.

#### Problem 2: Physical array organization

You would like to instantiate an SRAM array that has 1024 entries of 8 bits. Assume an SRAM cell is $0.25\,\mu\text{m}$ high and $1\,\mu\text{m}$ wide (where the wordline is horizontal and the bitline is vertical).

Assume wire capacitance of $0.2\,\text{fF}/\mu\text{m}$.

- What are the dimensions of an array that has 1024 rows and 8 columns? (Assume the cells abut.)
- If you build an array of 1024 rows and 8 columns, how much energy is used to drive the wordline? How much energy is discharged from the bitlines (assuming the wordline is left on for a long time)? Only consider wire cap, and ignore gate cap.
- You can change the physical shape of an array by using 'bit-interleaving.' For a 4 to 1 interleaved design, every 4th bit in a row belongs to a single word: along a single row, the 0th, 4th, 8th, ... bits form entry 0, the 1st, 5th, 9th, ... bits form entry 1, the 2nd, 6th, 10th, ... bits form entry 2, and the 3rd, 7th, 11th, ... bits form entry 3. If the 1024 entry x 8 bit word design is interleaved 4 to 1, how many rows and columns are there? Which address bits are used to select the word and which address bits are used for the column select mux?
- How much energy is used in the wordlines and bitlines for a 4 to 1 interleaved 1024 entry x 8 bit SRAM? How does this compare to the non-interleaved design?
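
The geometry bookkeeping in this problem can be sketched in a few lines of Python. This is only an illustrative helper, not part of the assignment: the function names are hypothetical, the constants are copied from the problem statement, and the column mapping assumes that the words sharing a row are consecutive entries (as in the 4 to 1 example above, where entries 0 to 3 share one row). It stops at capacitance because converting to energy also needs a supply voltage, which the problem text does not state.

```python
# Constants copied from the problem statement.
CELL_HEIGHT_UM = 0.25      # cell height (bitline direction, one row)
CELL_WIDTH_UM = 1.0        # cell width (wordline direction, one column)
WIRE_CAP_FF_PER_UM = 0.2   # wire capacitance per unit length


def array_dimensions_um(rows, cols):
    """Return (width, height) in um of an array of abutting cells."""
    return cols * CELL_WIDTH_UM, rows * CELL_HEIGHT_UM


def wire_caps_ff(rows, cols):
    """Return (wordline_cap, bitline_cap) in fF for ONE wordline and ONE bitline.

    Only wire capacitance is counted; gate capacitance is ignored.
    """
    width, height = array_dimensions_um(rows, cols)
    return width * WIRE_CAP_FF_PER_UM, height * WIRE_CAP_FF_PER_UM


def interleaved_shape(entries, word_bits, interleave):
    """Return (rows, cols) when `interleave` words share each physical row."""
    return entries // interleave, word_bits * interleave


def columns_of_entry(entry, word_bits, interleave):
    """Physical column indices holding the bits of `entry`, bit 0 first.

    Assumption: consecutive entries share a row, so the slot within the
    row is entry % interleave.
    """
    slot = entry % interleave
    return [slot + interleave * bit for bit in range(word_bits)]
```

Calling `interleaved_shape` and then passing its result to `wire_caps_ff` gives the per-wire capacitances of an interleaved organization, which can be compared against the non-interleaved case.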

#### Problem 3: Bit masking

Cache lines are generally large (assume a line is 64 bytes or 512 bits), but accesses to these lines are small (only 8, 16, or 32 bits at a time). The cache line could be built out of narrow SRAM macros (e.g. 8 bits wide), but these are not area efficient because the peripheral circuitry is amortized over fewer bitcells. So other techniques are needed.



Figure 2: 6T cell

Given the 6T cell above, assume that a precharge operation between accesses precharges the bitlines high.

- a) Given the data input to the cell `din[x]`, an active high write enable signal `we`, 4 NMOS transistors and an inverter, draw a transistor-level diagram of the circuitry required to write the cell.

- b) Given an additional active low signal `mask`, which prevents writes when `mask`=0, draw a transistor-level diagram of circuitry that only writes the cell when both `mask` and `we` are high.
- c) Assume 8 cells are arranged along a row to form 8 bit data words. If you only want to write the lowest 2 bits, what would the `mask` signals be for each column?
- d) What happens to the bitlines of the cells that are not written?
- e) Without any circuit techniques, how could you write only 2 of the 8 bits while keeping the other bits unchanged? Assume an SRAM with 4 bit words and 128 entries, and the standard control signals (`clock`, `din`, `dout`, `addr`, `read_enable`, `write_enable`). Draw a timing diagram of the solution that lets you write bits `[1:0] = 2'b11` only (with `[3:2]` unchanged) for an access at address=0x1, including all of the SRAM signals and any additional signals you need to define. *Hint: you'll need 2 clock cycles.*
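
As a purely behavioral illustration of what a masked write is expected to do to the stored word, the following Python sketch may help when checking a diagram. It says nothing about the transistor-level circuit. The function names are hypothetical, and it assumes the polarity given above (a column is written only where its `mask` bit is 1). The second helper models one possible two-step sequence for an SRAM without mask support; treat it as an assumption rather than the intended answer.

```python
def masked_write(old_word, din, mask, width=8):
    """Return the stored word after a write that updates only bits with mask=1."""
    full = (1 << width) - 1
    mask &= full
    # Keep old bits where mask=0, take din bits where mask=1.
    return (old_word & ~mask & full) | (din & mask)


def read_modify_write(memory, addr, din, mask, width=4):
    """Emulate a partial write on a memory that only supports full-word writes.

    `memory` is a dict or list of integers (hypothetical stand-in for the SRAM).
    """
    old_word = memory[addr]                                   # step 1: read the full word
    memory[addr] = masked_write(old_word, din, mask, width)   # step 2: write the merged word
    return memory[addr]
```

For example, `read_modify_write(mem, 0x1, 0b0011, 0b0011)` changes only bits `[1:0]` of the word at address 0x1 and leaves bits `[3:2]` as they were.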

#### Problem 4: Decoder design

- a) Implement a 2 to 4 decoder (based on a 2 bit input address, generate 4 one-hot output bits) using only NOR2 gates and inverters. Draw the complete schematic and label the inputs and outputs. Assume you also have access to the complement of each input.
- b) Show how you can build a 4 to 16 decoder using only 2 to 4 decoders, and AND2 gates. You can abstract the 2 to 4 decoder as a block, but label all of the inputs and outputs.
- c) What predecode blocks would you need for a design with an 8 bit input address? How many output rows would there be?
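
A schematic for this problem can be sanity-checked against a small gate-level truth-table model. The Python sketch below is one possible arrangement, written under assumptions: all function names are hypothetical, bit 0 is taken as the least significant address bit, and output `k` is taken to be the line that goes high for address `k`. It is a checking aid, not a reference solution.

```python
def inv(a):
    """Inverter on a 0/1 value."""
    return int(not a)


def nor2(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))


def decode_2to4(a1, a0):
    """One-hot 2 to 4 decoder using only NOR2 gates and complemented inputs."""
    a1_n, a0_n = inv(a1), inv(a0)
    return [
        nor2(a1, a0),      # high only for address 00
        nor2(a1, a0_n),    # high only for address 01
        nor2(a1_n, a0),    # high only for address 10
        nor2(a1_n, a0_n),  # high only for address 11
    ]


def decode_4to16(addr):
    """4 to 16 decoder: two 2 to 4 predecoders combined with AND2 gates."""
    a3, a2, a1, a0 = [(addr >> k) & 1 for k in (3, 2, 1, 0)]
    hi = decode_2to4(a3, a2)  # predecode the upper address bits
    lo = decode_2to4(a1, a0)  # predecode the lower address bits
    return [hi[i] & lo[j] for i in range(4) for j in range(4)]
```

Looping `addr` over 0 to 15 and checking that exactly one output is high, at index `addr`, confirms the one-hot property.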

#### Problem 5: Latch array

Small memories can be built out of flip-flops. But to improve area, flip-flops can be replaced with latches to create a latch array.

- a) Draw a 4 entry x 2 bit word memory block with asynchronous reads and positive-edge synchronous writes, using simple logic gates, 4 input MUXes, and positive-edge triggered flip-flops. The input signals to your block are `clk`, `A[1:0]` (and the complement  $\bar{A}[1:0]$ ), `din[1:0]`, and the output signal is `dout[1:0]`. *Hint: the D input is shared along a column, the Q output is muxed at each column with an input from each row, and each row shares a row select signal that is the AND of the clock and an address decoder. Do not use tri-state elements or bidirectional bitlines.*
- b) Now replace each positive-edge triggered flip-flop with two level-sensitive latches, and draw the array again (you can abstract the peripheral circuitry with blocks; only draw the data cells and wires). Label the output of the master stage $M[\text{row}][\text{col}]$ and label the output of the slave stage $S[\text{row}][\text{col}]$ for every latch.

- c) Assume every bit in the array holds 0. If `clk` is low, and `din[1:0] = 10`, what are the values of M and S (the outputs of every latch)?
- d) If `clk` goes high, and `A[1:0] = 00`, what values change?
- e) Which latches can be removed to save area?

#### Problem 6 (EE241A Only): Custom register file design

Download and read the paper “A 4R2W Register File for a 2.3GHz Wire-Speed POWER™ Processor with Double-Pumped Write Operation” (Ditlow ISSCC 2011) from bCourses, then answer the following questions.

- a) Complete the table below, assuming wordlines run in the horizontal direction and bitlines run in the vertical direction. Ignore metal tracks for Vdd and ground.

| Topology               | Horiz. metal tracks | Vert. metal tracks | Area | Area/(6T Area) |
|------------------------|---------------------|--------------------|------|----------------|
| 4R2W cell (from paper) |                     |                    |      |                |
| 2R1W cell (from paper) |                     |                    |      |                |
| 1R1W cell (from #1)    |                     |                    |      |                |
| 6T cell (from lecture) |                     |                    |      |                |

- b) In Figure 14.2.4, why are the blocks in “Pre-decode” labeled as “4-16” and “4-9”? If the design had 256 entries, what would the blocks be labeled as?
- c) In Figure 14.2.4, why do the inverters in the “Delayed Keeper” block exist? What would happen if they are removed (assuming the polarity remains correct)?
- d) If you wanted to build a 2 read/1 write (2R1W) array that is 128 entries x 64 bits out of 2R1W cells, how many transistors would be used? How many transistors would be needed to build a 4R2W register file of the same dimensions (144 entries x 78 bits) using flip-flops (assuming the standard D flip-flop from lecture)? Is transistor count a good metric to compare area?
- e) What is the maximum frequency while operating at 0.8V?