

# Final Project for NTHU's 11010EE 32300 Course – ROM macro

108061151 Jun-Hao Chen, EE23, NTHU; 108061229 Pin-Hsuan Shih, EE23, NTHU

**Abstract – A 16 X 64 CMOS ROM macro.** The ROM has a nominal access time of 10 ns and is physically organized as 64 word lines by 16 bit lines. It is implemented in the CIC018 technology, which features  $L_{eff}=0.18\mu m$  and a 1.8 V power supply. The physical area of the macro is  $41.8 \times 121.05 \mu m^2$ .

## I. INTRODUCTION

This is the report for the final project of NTHU's 11010EE 323000 course. In this report, we describe the design of the ROM macro in detail and run both pre-simulation and post-simulation. We then discuss the results and draw a conclusion.

### A. ROM Macro Floorplan

A layout view of the ROM is shown in Fig. 1. The ROM macro is organized as 64 X 16 bits. A functional block diagram is shown in Fig. 2. There are 9 address inputs, a clock control input and 2 data outputs. The array is internally organized as 64 word lines by 16 bit lines. The ROM array is of the NOR type, and the overall ROM macro uses three levels of metal for wiring.



Fig.1 Layout view of ROM macro  
(a) Original Layout (b) Layout with denoted block



Fig.2 Functional Block Diagram of ROM macro  
(a) Functional Block (b) Overall Schematic

### B. ROM Array

In this work, we use a NOR-type ROM. A cell storing Code-1 has no connection between the cell and the bit line, whereas a cell storing Code-0 has a connection. The schematic diagram is shown in Fig. 3 and the layout in Fig. 4.
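As a minimal sketch of this encoding, the Python below marks which cells of one 16-bit word would carry a connection to the bit line. The function name `cell_mask`, the example word, and the mapping of bit 0 to bit line 0 are assumptions made for illustration; they are not taken from the design.

```python
def cell_mask(word, width=16):
    """Return one boolean per bit line: True where the cell connects to the bit line.

    In a NOR-type ROM a stored 0 needs a connection; a stored 1 has none.
    Bit 0 of the word is assumed to map to bit line 0.
    """
    return [((word >> bit) & 1) == 0 for bit in range(width)]


# Example word chosen only for illustration.
mask = cell_mask(0b1010000011110101)
connected_cells = sum(mask)
print(mask)
print("cells with a connection:", connected_cells)
```

Only the cells that store a 0 carry a connection, so the number of connected cells depends on the stored data.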



Fig.3 Schematic diagram of ROM array  
 (a) Schematic of ROM array (b) Detail view of each cell

Source: Lecture note

### C. Row Decoder

The purpose of the decoder is to reduce the number of select signals. We can use 6 address bits to represent the 64 word-line select signals.

$$K = \log_2 N$$

$K$  is the number of address bits after the reduction and  $N$  is the number of original select signals.
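As a worked instance for this macro, assuming the 9 address inputs split into row bits for the 64 word lines and column bits for the two 8 to 1 Muxes (the subscripts $row$ and $col$ are labels introduced here for illustration):

$$K_{row} = \log_2 64 = 6, \qquad K_{col} = \log_2 8 = 3, \qquad K_{row} + K_{col} = 9$$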

The logic to implement this decoder is something like,

$$WL_0 = A_0 \cdot A_1 \cdot A_2 \cdot A_3 \cdot A_4 \cdot A_5$$

The problem with this implementation is its complexity: each word line needs a 6-input AND gate, which requires many transistors and consumes too much power. The alternative is to use pre-decoders to reduce the complexity.

The function of a pre-decoder is to generate pre-decode tags. For instance, we use two pre-decoders to decode the 6-bit address into 2 sets of 8 tags (each pre-decoder decodes 3 bits). Combining one tag from each set of 8 reconstructs the 64 select signals.

In our work, we add an additional function,  $WL\_EN$  (word line enable), to turn the word line signals on or off. To implement it, we add one extra input bit to the decoder; controlling this bit controls the decoder output. For example, with a NAND-based decoder, the NAND output of the selected row can go to logic 0 only while the extra input is at logic 1; when the extra input is at logic 0, every NAND output is forced to logic 1.
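The pre-decoding scheme with $WL\_EN$ can be sketched behaviorally in Python. This is a minimal functional model, not the transistor-level design: the names `predecode` and `row_decode`, the assignment of $A_0$ to $A_2$ and $A_3$ to $A_5$ to the two pre-decoders, and the active-high word lines are assumptions made for illustration.

```python
def predecode(bits):
    """3-to-8 pre-decoder: turn 3 address bits (LSB first) into 8 one-hot tags."""
    value = bits[0] | (bits[1] << 1) | (bits[2] << 2)
    return [1 if tag == value else 0 for tag in range(8)]


def row_decode(address, wl_en):
    """Return the 64 word-line values for a 6-bit address, gated by WL_EN."""
    addr_bits = [(address >> i) & 1 for i in range(6)]
    low_tags = predecode(addr_bits[0:3])   # assumed to take A0..A2
    high_tags = predecode(addr_bits[3:6])  # assumed to take A3..A5
    # Each word line combines one low tag, one high tag, and WL_EN.
    return [low_tags[row % 8] & high_tags[row // 8] & wl_en for row in range(64)]


word_lines = row_decode(address=37, wl_en=1)
disabled = row_decode(address=37, wl_en=0)
print("active row:", word_lines.index(1))
print("rows active when disabled:", sum(disabled))
```

In this model each word line needs only a small gate fed by two tags and the enable bit, instead of a 6-input AND gate per row.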

The overall schematic is shown in Fig.4 and layout in Fig.5.



Fig.4 Schematic of Row decoder



Fig.5 Layout of Row decoder  
 (a) Original Layout (b) Layout with denoted block

### D. 8 to 1 Mux with decoder

The ROM macro has 16 bit lines and 2 data outputs, hence we need a bit selector to choose which bit lines drive the outputs. In this work, we use a three-stage Mux to implement this function. We use three stages because we want to combine the Mux with the decoder. The advantage of this design is a significant reduction in area; the disadvantage is that the latency may increase. The functional block is shown in Fig. 6.



Fig. 6 8 to 1 Mux

The design uses four 2 to 1 Muxes in the first stage, two in the second stage and one in the third stage, which together work as an 8 to 1 Mux. The three stages correspond to the 3 bits of the decoder: driving the select signal of the 2 to 1 Muxes in each stage selects the desired signal.

The overall design combines two 8 to 1 Muxes into a 16 to 2 Mux, since the select signals can be shared to reduce area. The schematic is shown in Fig. 7 and the layout in Fig. 8.
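The three-stage structure can be sketched in Python as follows. This is a behavioral model under stated assumptions: the function names, the mapping of select bits `s0`, `s1`, `s2` to the first, second and third stages, and the grouping of bit lines 0 to 7 and 8 to 15 onto the two outputs are illustrative choices, not details confirmed by the schematic.

```python
def mux2(a, b, sel):
    """2-to-1 multiplexer: return a when sel is 0, b when sel is 1."""
    return b if sel else a


def mux8(inputs, s0, s1, s2):
    """8-to-1 multiplexer built from three stages of 2-to-1 multiplexers."""
    stage1 = [mux2(inputs[2 * i], inputs[2 * i + 1], s0) for i in range(4)]  # 4 muxes
    stage2 = [mux2(stage1[2 * i], stage1[2 * i + 1], s1) for i in range(2)]  # 2 muxes
    return mux2(stage2[0], stage2[1], s2)                                    # 1 mux


def mux16to2(bit_lines, column):
    """Two 8-to-1 multiplexers that share the same three select bits."""
    s0, s1, s2 = column & 1, (column >> 1) & 1, (column >> 2) & 1
    # Assumed grouping: bit lines 0-7 feed output 0, bit lines 8-15 feed output 1.
    return mux8(bit_lines[:8], s0, s1, s2), mux8(bit_lines[8:], s0, s1, s2)


# Use the bit-line index as the value so the selected line is easy to see.
outputs = mux16to2(list(range(16)), column=5)
print(outputs)
```

Because both halves reuse the same select bits, the 16 to 2 selector needs only one set of three select signals.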



Fig.7 Schematic of two 8 to 1 Mux with decoder



Fig.8 Layout of two 8 to 1 Mux with decoder

### E. Precharge circuit

The function of the precharge circuit is to read out the information stored in the ROM. Since we use a NOR-type ROM, we precharge the bit line before the word line is turned on. The word line is then turned on; if the selected ROM cell has a MOS connected to VSS, the bit line voltage is pulled down. We can sense the bit line voltage to determine the information stored in the memory cell.

To design this circuit, we only need to add a PMOS to each bit line and make sure it is large enough to charge the whole bit line. The schematic is shown in Fig.9 and the layout in Fig.10.
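The precharge-then-evaluate sequence can be illustrated with a small logic-level model in Python. It is a sketch only: the function name `read_bit_line` and its arguments are hypothetical, and analog effects such as partial discharge and sensing thresholds are deliberately ignored.

```python
def read_bit_line(cell_connected, word_line_on):
    """Model one precharge-then-evaluate cycle on a single bit line."""
    # Phase 1: the precharge PMOS charges the bit line to logic 1.
    bit_line = 1
    # Phase 2: the word line turns on; a connected cell pulls the bit line to VSS.
    if word_line_on and cell_connected:
        bit_line = 0
    return bit_line


stored_zero = read_bit_line(cell_connected=True, word_line_on=True)
stored_one = read_bit_line(cell_connected=False, word_line_on=True)
unselected = read_bit_line(cell_connected=True, word_line_on=False)
print(stored_zero, stored_one, unselected)
```

A bit line therefore stays high unless a selected cell with a connection discharges it.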



Fig.9 Schematic of precharge circuit



Fig.10 Layout of precharge circuit

### F. D Flip-flop

We use a DFF (D flip-flop) as an input stabilizer to make sure that the captured input cannot be altered after the positive edge of the clock, which increases the stability of the whole circuit.

In this work, we implement a 12T-TSPC D flip-flop to obtain lower power consumption and smaller area. The schematic is shown in Fig. 11 and the layout in Fig. 12.



Fig.11 Schematic of 12T-TSPC DFF



Fig.12 Layout of 12T-TSPC DFF

### G. Timing Control

Because every functional block in the ROM macro has its own latency, we need a timing control unit to coordinate the blocks.

In our work, we design combinational logic to generate the control signals. The desired waveform of the control signals is shown in Fig.13.

To achieve the desired waveform, we use inverter chains as delay elements and NAND gates as decision elements. The functional block is shown in Fig.14, the schematic of this block in Fig.15 and the layout in Fig.16.

While designing the inverter chain, we also found that increasing the channel length of the MOS is a more effective way to obtain a longer delay than adding more inverter stages. With this experience, we can make the timing control unit more compact and save a lot of area.
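A first-order estimate suggests why channel length is the stronger knob. The long-channel RC approximation below is an assumption introduced for illustration, not a simulation result from this work; it ignores velocity saturation and wire and diffusion capacitance, and it takes each inverter to drive an identical inverter. Here $R_{on}$ is the on-resistance, $C_{load}$ the gate capacitance of the next stage, $W$ and $L$ the transistor width and length, and $N$ the number of stages.

$$\begin{aligned} t_{d} &\approx 0.69\, R_{on} C_{load}, \qquad R_{on} \propto \frac{L}{W}, \qquad C_{load} \propto W L \\ \Rightarrow\ t_{d} &\propto L^{2}, \qquad t_{chain} \approx N\, t_{d} \propto N L^{2} \end{aligned}$$

Under these assumptions, doubling $L$ roughly quadruples the chain delay, while doubling $N$ only doubles it, although both roughly double the total gate area $N W L$.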



Fig.13 Waveform of desired control signal



Fig.14 Functional block of desired control signal



Fig.15 Schematic of time control circuit



Fig.16 Layout of the time control circuit



Fig.18 Schematic of sense amplifier



Fig.19 Layout of sense amplifier

To store the output of the sense amplifier, we introduce a latch. If the latch were connected directly to the sense amplifier, it could disturb the operation of the sense amplifier. To solve this problem, we insert two inverters between the output of the sense amplifier and the input of the latch to isolate the two. The whole schematic of this block is shown in Fig.20 and the layout in Fig.21.



Fig.20 Schematic of sense amplifier



Fig.21 Layout of sense amplifier

The overall schematic of this part is shown in Fig.22 and the layout in Fig.23.



Fig.22 Schematic of this block



Fig.23 Layout of this block

(a) Original layout (b) Layout with denoted block

## II. SIMULATION

### A. Pre-Simulation

We pre-simulate the ROM macro at 5 corners, measuring read 0 at (63, 0) and read 1 at (63, 15). Because of the initial condition of our ROM macro, we cannot obtain the access time of read 1 at (0, 0). After discussing this with the TA, we measured read 0 at (63, 0) and read 1 at (63, 15) instead.

The simulation of TT corner is shown in Fig.24.



Fig.24 Simulation at TT corner

| Item                                | Value    |
|-------------------------------------|----------|
| Average access time                 | 0.630ns  |
| Average power consumption per clock | 1177.9uW |

Table.1 Performance table at TT corner

The simulation of SS corner is shown in Fig.25.



Fig.25 Simulation at SS corner

| Item                                | Value      |
|-------------------------------------|------------|
| Average access time                 | 1.54ns     |
| Average power consumption per clock | 705.1663uW |

Table.2 Performance table at SS corner

The simulation of FF corner is shown in Fig.26.



Fig.26 Simulation at FF corner

| Item                                | Value    |
|-------------------------------------|----------|
| Average access time                 | 0.539ns  |
| Average power consumption per clock | 1390.5uW |

Table.3 Performance table at FF corner

The simulation of SF corner is shown in Fig.27.



Fig.27 Simulation at SF corner

| Item                                | Value    |
|-------------------------------------|----------|
| Average access time                 | 1.315ns  |
| Average power consumption per clock | 1251.4uW |

Table.4 Performance table at SF corner

The simulation of FS corner is shown in Fig.28.



Fig.28 Simulation at FS corner

| Item                                | Value    |
|-------------------------------------|----------|
| Average access time                 | 0.637ns  |
| Average power consumption per clock | 1259.4uW |

Table.5 Performance table at FS corner
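To compare the five corners at a glance, the values from Tables 1 to 5 can be collected in a short Python script. The numbers are copied from the tables above; the dictionary layout and variable names are illustrative.

```python
# Pre-simulation results copied from Tables 1-5: (access time in ns, power in uW).
corners = {
    "TT": (0.630, 1177.9),
    "SS": (1.54, 705.1663),
    "FF": (0.539, 1390.5),
    "SF": (1.315, 1251.4),
    "FS": (0.637, 1259.4),
}

slowest = max(corners, key=lambda name: corners[name][0])
fastest = min(corners, key=lambda name: corners[name][0])
highest_power = max(corners, key=lambda name: corners[name][1])
lowest_power = min(corners, key=lambda name: corners[name][1])

print(f"slowest corner: {slowest} ({corners[slowest][0]} ns)")
print(f"fastest corner: {fastest} ({corners[fastest][0]} ns)")
print(f"highest power: {highest_power} ({corners[highest_power][1]} uW)")
print(f"lowest power: {lowest_power} ({corners[lowest_power][1]} uW)")
```

The SS corner gives the longest access time and the lowest power, while the FF corner gives the shortest access time and the highest power.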

### B. Post-Simulation

We post-simulate the ROM macro at the TT corner at 25°C, including read 1 and read 0.

The post-simulation of TT corner is shown in Fig.29.



Fig.29 Post-simulation at TT corner

| Item                                | Value   |
|-------------------------------------|---------|
| Average access time                 | 0.771ns |
| Average power consumption per clock | 1379uW  |

Table.6 Post simulation performance table at TT corner

## III. COMPARISON

We make a simple comparison between pre-simulation and post-simulation at the TT corner at 25°C. For the comparison, the two waveforms are overlaid in Fig. 30.



Fig.30 Waveform of comparison together

| Item                                | Pre-simulation | Post-simulation |
|-------------------------------------|----------------|-----------------|
| Average access time                 | 0.630ns        | 0.771ns         |
| Average power consumption per clock | 1177.9uW       | 1379uW          |

Table.7 Comparison Table

We can see that the post-simulation results do not differ much from the pre-simulation results. In the pre-simulation, the signal transitions are steeper because the parasitic capacitance is smaller, so the subsequent components start working earlier and the access time is shorter. In the post-simulation, the layout introduces additional parasitic capacitance, which lengthens the transition time and therefore the access time.
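The size of the difference can be quantified from the values in Table 7. The short Python sketch below computes the relative increase from pre-simulation to post-simulation; the variable names are illustrative.

```python
# Values copied from Table 7 (TT corner).
pre = {"access_time_ns": 0.630, "power_uW": 1177.9}
post = {"access_time_ns": 0.771, "power_uW": 1379.0}

# Relative increase from pre-simulation to post-simulation, in percent.
increase_pct = {key: 100.0 * (post[key] - pre[key]) / pre[key] for key in pre}

for key, value in increase_pct.items():
    print(f"{key}: +{value:.1f}%")
```

With these numbers, the access time increases by roughly 22% and the power by roughly 17% after layout parasitics are included.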

## IV. CONCLUSION

The physical area of the ROM macro is 41.8 X 121.05 um<sup>2</sup>; the layout is shown in Fig. 31.



Fig.31 ROM macro Layout

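Its figure of merit (FoM), with the post-simulation access time in ns, the power in uW and the area in um<sup>2</sup>, is defined by

$$\begin{aligned} \text{FoM} &= \text{access time}^2 \times \text{power} \times \text{area} \\ &= 0.771^2 \times 1379 \times 5059.89 \\ &= 4147764.573 \end{aligned}$$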
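The arithmetic can be reproduced with a few lines of Python. The values are the post-simulation results and macro dimensions reported in this report; the variable names are illustrative.

```python
access_time_ns = 0.771                   # post-simulation access time at TT
power_uW = 1379.0                        # post-simulation power at TT
width_um, height_um = 41.8, 121.05       # macro dimensions

area_um2 = width_um * height_um
fom = access_time_ns ** 2 * power_uW * area_um2

print(f"area = {area_um2:.2f} um^2")
print(f"FoM  = {fom:.3f}")
```

Since the access time enters squared, it has the strongest influence on the FoM among the three factors.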



Fig. 32 DRC report



Fig. 33 LVS report

In this work, we combined all the homework of this semester into one large project, which gave us a sense of fulfillment. Besides improving our circuit design skills, the hard work also brought the peace of mind that comes from teamwork. When we ran into difficulties in our own parts, we could exchange ideas with each other and overcome different kinds of problems. I think the difference between the Electrical Engineering Department and other departments is that, for a large project like this, we all work together at the workstations, and classmates from other courses even come to visit us. Finally, I would like to thank all the instructors, teaching assistants, and classmates who helped in this course. Have a great winter vacation.

## REFERENCES

- [1] A. Tuminaro, “A 400MHz, 144 Kb CMOS ROM macro for an IBM S/390-class microprocessor,” Proceedings International Conference on Computer Design VLSI in Computers and Processors, 1997, pp. 253-255, doi: 10.1109/ICCD.1997.628876.
- [2] B. Razavi, “TSPC Logic,” IEEE Solid-State Circuits Magazine, Fall 2016, pp. 10-13.
- [3] Prof. Marvin Meng-Fan Chang, Lecture Note, EE Dept., NTHU, Taiwan.