

**EE577a - Final Project**

**Design of a General-Purpose Microprocessor Using Software and Hardware Components**  
**Phase 3**

Team 5

Yulong Ding 8456815156 - Front-end / Back-end Python Scripting / Layout

Samuel Bruner 3198542984 - Register File / System Integration / Layout

Angsagan Abdigazy 2509824628 - ALU / Layout

# Table of Contents

|                                                      |           |
|------------------------------------------------------|-----------|
| <b>Top-Level CPU Schematic</b>                       | <b>2</b>  |
| <b>CPU Layout</b>                                    | <b>4</b>  |
| Top-Level CPU Floorplan                              | 4         |
| Register File Layout                                 | 7         |
| ID/EX Pipeline Register Layout                       | 9         |
| EX-Stage (ALU) Layout                                | 10        |
| ADD/MIN - Programmable Carry Ripple Adder/Subtractor | 11        |
| SFL/SFR                                              | 13        |
| MUL Stage 1                                          | 16        |
| MUL Stage 2                                          | 19        |
| Bitwise AND                                          | 21        |
| Bitwise OR                                           | 23        |
| EX/MEM Pipeline Register Layout                      | 24        |
| MEM-Stage (SRAM) Layout                              | 26        |
| MEM/WB Pipeline Register Layout                      | 28        |
| WB-Stage Layout                                      | 29        |
| <b>Data Path Optimization</b>                        | <b>31</b> |
| <b>Register File Optimization</b>                    | <b>31</b> |
| Phase 2 Optimizations                                | 31        |
| Phase 3 Optimizations                                | 32        |
| <b>Power Optimization</b>                            | <b>32</b> |
| True Single-Phase Clock (TSPC) Flip-Flop             | 33        |
| Static Voltage Scaling                               | 34        |
| <b>Final Python Script</b>                           | <b>33</b> |
| <b>Vector File and Golden Results</b>                | <b>39</b> |
| <b>Simulation Results</b>                            | <b>43</b> |
| <b>Performance Summary</b>                           | <b>43</b> |

- Top-Level CPU Schematic

- Note: refer to Phase 1 Report for the detailed schematic of each block.







- **CPU Layout**

- **Top-Level CPU Floorplan**

The top-level floorplan of the processor shown below contains layouts of major modules including the register file, all ALU submodules in the EX stage, the SRAM in the MEM stage, all stage registers as well as peripheral logic in the WB stage. However, note that the routing between modules is incomplete given the limited time and thus **the top-level layout below does not pass LVS**. Nevertheless, this top-level layout contains all modules of the 5-stage pipelined CPU in place and therefore serves as a floorplan reference providing a fairly accurate area estimate for the processor.





- Register File Layout

The annotated register file layout can be seen below. The design is currently clear of DRC. However, it is slightly incomplete and **unable to pass LVS**.





The progress of the RF layout can be seen by the following annotated schematic. The majority of the layout is completed. However, the write data input from WB stage still needs to be routed to both of the 2:1 IFRF Muxes.



- ID/EX Pipeline Register Layout





- EX-Stage (ALU) Layout

- ADD/MIN - Programmable Carry Ripple Adder/Subtractor

If MIN function is implemented, Cin = 1. The MSB of the final result is used to select the minimum of two inputs. If MSB = 1, then the result is negative and A is smaller than B. If MSB = 0, then the result is positive and B is smaller than A.







- SFL/SFR
  - The SFL/SFR submodule of the ALU performs arithmetic left/right shift on the 16-bit data input by a specified number of bits.







- MUL Stage 1
  - The pipelined 2's complement 5-bit multiplier contains 2 separate stages to reduce the combinational critical path of the circuit. The first stage resides in the EX stage of the processor along with the rest of the ALU submodules before the EX/MEM stage register.





- MUL Stage 2

- The second stage of the pipelined multiplier resides in the MEM stage of the processor along with the SRAM after the EX/MEM stage register.





### - Bitwise AND





- Bitwise OR





## • EX/MEM Pipeline Register Layout





- **MEM-Stage (SRAM) Layout**

- The SRAM design, including the schematic and the complete top-level layout, is adopted from the verified working design of Lab 2.





- MEM/WB Pipeline Register Layout





- **WB-Stage Layout**



- **Data Path Optimization**

The following modification of the immediate value datapath was made in order to reduce area by 32 DFFs. The original design simply forwarded a dedicated 16b path from ID stage through WB stage. After the modification, a 16b 2 to 1 MUX is inserted at the end of the source register 2 datapath before outputting to the ID/EX stage register. The `imm_op` control signal is then able to select between the immediate value or source register 2 value. Since the immediate value is only needed at the input of the SRAM, we are able to share the R2 datapath through the ID/EX and EX/MEM stage registers. The immediate value only then is given a dedicated path as it passes through the MEM/WB stage register.



- **Register File Optimization**

- Phase 2 Optimizations

In the latter portion of the RF datapath, an internally forwarding register file (IFRF) was implemented in order to reduce the minimum number of NOPs following a RAW data hazard from 3 to 2. In the original datapath design, if an ID/EX RAW dependency occurred, the senior instruction would need to completely pass through WB stage before the junior instruction is no longer stalled. After the IFRF implementation, senior instructions stalling junior instructions are able to forward their values from the WB stage to the RF in ID stage before exiting the WB stage. There is a saving of one NOP as a result.



### - Phase 3 Optimizations

When decreasing the clock period below 0.7ns, the register file final output originally yielded incorrect data for some registers. These incorrect values correspond to instructions with data dependencies. The root cause was attributed to delay in the IFRF signal propagation as the internally forwarded data could not reach EX stage before the clock edge. The internally forwarded value was reaching the IF/EX stage register 24ps after the clock edge. The following figure shows the implementation of logical effort on the IFRF signal datapath as well as 4X sizing of each gate to decrease propagation delay. After this optimization, the clock period was able to successfully achieve 0.59ns.



### • Power Optimization

- True Single-Phase Clock (TSPC) Flip-Flop



Given the large number of flip flops required in the pipeline stage registers of the datapath and the control signal transfer path, different flip-flop architectures beyond the standard master-slave latch DFF construction were explored for further power and transistor area reduction without sacrificing performance. The true single-phase clock dynamic structure was ultimately adopted given its compact structure and size. Because there are no transmission gates, the FF uses a single clock signal without needing to invert and thus reduces the effect of clock skew. Compared to the standard master-slave DFF from the gpdk045 library with asynchronous reset, the transistor count is brought down from 28T to 12T per FF and cell area is reduced by approximately 50% per FF.

The most promising aspect of the TSPC FF is its reduced power consumption. With a significant reduction in transistor count, both static leakage power and dynamic switching power is reduced. By adopting TSPC FFs in the construction of the ID/EX and EX/MEM stage registers, a reduction of 150 $\mu$ W in total power from over 900  $\mu$ W was achieved for the processor overall compared to using standard master-slave latch DFFs. An attempt to further reduce power consumption was made by replacing the DFFs of the register file with these TSPC FFs. However, possibly due to the dynamic logic nature of the TSPC construction, functionality issues were encountered after the replacement and thus the register file kept standard DFFs, while the stage registers (consisting of 94 FFs in total) adopted TSPC FFs.

- Static Voltage Scaling

The reference voltage VDD in AND/OR blocks was reduced to 700mV. This voltage reference was directly generated by the ideal external pin. Since the output of the ALU is followed by stage registers (DFFs), the output of the AND/OR blocks is still properly latched and the correct operation of the ALU is preserved. Even though the reduction of VDD brings more delay into the circuit, the delay from AND/OR blocks is still far from the bottleneck. This technique saved us 10uW of total power consumption at 1GHz of clock operation.

- **Final Python Script**

```

1 # Front-end: Instruction Fetch and Decode
2 fileIn = open('sample_instructions_new.txt', 'r')
3 fileOutCPU = open('cpu_new.vec', 'w')
4 fileOutMEM = open('mem_new.vec', 'w')
5 fileOutCPU.write('radix 1 1 1 3 3 3 ')
6 fileOutCPU.write('1 1 1 1 1 1 ')
7 fileOutCPU.write('1 1 1 4444 14\n')
8 fileOutCPU.write('io i i i i i i i i i i i i i i i i\n')
9 fileOutCPU.write('vname clk ~reset reg_write reg_dst<[2:0> reg_src_1<[2:0> reg_src_2<[2:0> ')
10 fileOutCPU.write('and_en or_en add_en mul_en min_en shift_en right_shift ')
11 fileOutCPU.write('imm_op alu_load_sel loadi_sel data_in<[15:0]> addr<[4:0]>\n')
12 fileOutCPU.write('tunit ns\nslope 0.005\nvih 1.0\nvil 0.0\n\n')
13 fileOutMEM.write('radix 1 1 1 1\n')
14 fileOutMEM.write('io i i i i\n')
15 fileOutMEM.write('vname ~pre_en mem_write_en mem_read_en rowdec_en\n')
16 fileOutMEM.write('tunit ns\nslope 0.005\nvih 1.0\nvil 0.0\n\n')
17 destination = ['none', 'none']
18 instructionList = []
19 commandNum = 0
20 for line in fileIn:
21     line = line.replace('$', '')
22     line = line.replace('#', '')
23     line = line.upper()
24     instruction = line.split(' ')
25     if instruction == ['\n']:
26         continue
27     commandNum += 1
28     instruction[-1] = instruction[-1].replace('\n', '')
29     if instruction[0] == 'STOREI' or instruction[0] == 'STORE' or instruction[0] == 'LOAD':
30         for i in range(1, 3):
31             if len(instruction[i]) == 3 and instruction[i][-1] == 'H':
32                 instruction[i] = instruction[i].replace('H', '')
33             elif len(instruction[i]) == 6 and instruction[i][-1] == 'B':
34                 instruction[i] = '%X' % int(instruction[i].replace('B', ''), 2)
35             if len(instruction[i]) == 1:
36                 instruction[i] = '0' + instruction[i]
37     if instruction[0] == 'STOREI':
38         if len(instruction[1]) == 1:
39             addrLSB = instruction[2][-1]
40             if instruction[1] == '1':
41                 instruction.pop(1)
42                 instructionList.append(instruction)
43                 destination.pop(0)
44                 destination.append('none')
45             elif instruction[1] == '2':
46                 if (addrLSB != '0' and addrLSB != '2' and addrLSB != '4' and addrLSB != '6':
47                     and addrLSB != '8' and addrLSB != 'A' and addrLSB != 'C' and addrLSB != 'E'):
48                     print('Error001: Command ' + str(commandNum) + ' is not aligned properly.')
49                     continue
50                 else:
51                     instructionList.append(['STOREI', instruction[2], instruction[3]])
52                     instructionList.append(['STOREI', instruction[2][0] + chr(ord(addrLSB) + 1), instruction[4]])
53                     destination.pop(0)
54                     destination.pop(0)
55                     destination.append('none')
56                     destination.append('none')
57             elif instruction[1] == '4':
58                 if addrLSB != '0' and addrLSB != '4' and addrLSB != '8' and addrLSB != 'C':
59                     print('Error001: Command ' + str(commandNum) + ' is not aligned properly.')
60                     continue
61                 else:
62                     instructionList.append(['STOREI', instruction[2], instruction[3]])
63                     instructionList.append(['STOREI', instruction[2][0] + chr(ord(addrLSB) + 1), instruction[4]])
64                     if addrLSB == '8':
65                         instructionList.append(['STOREI', instruction[2][0] + 'A', instruction[5]])
66                         instructionList.append(['STOREI', instruction[2][0] + 'B', instruction[6]])

```

```

67         instructionList.append(['STOREI', instruction[2][0] + chr(ord(addrLSB) + 2), instruction[5]])
68         instructionList.append(['STOREI', instruction[2][0] + chr(ord(addrLSB) + 3), instruction[6]])
69         destination = ['none', 'none']
70     else:
71         print('Error000: Command ' + str(commandNum) + ' has invalid burst length.')
72         continue
73     else:
74         instructionList.append(instruction)
75         destination.pop(0)
76         destination.append('none')
77     else:
78         if instruction[0] != 'LOADI' and instruction[0] != 'LOAD' and instruction[0] != 'NOP':
79             while instruction[2] in destination or (instruction[0] != 'STORE' and instruction[3] in destination):
80                 instructionList.append(['NOP'])
81                 destination.pop(0)
82                 destination.append('none')
83             instructionList.append(instruction)
84             destination.pop(0)
85             if instruction[0] != 'STORE' and instruction[0] != 'NOP':
86                 destination.append(instruction[1])
87             else:
88                 destination.append('none')
89             clkPeriod = 0.6
90         clk = '1'
91         reset_bar = '0'
92         reg_write = '0'
93         reg_dst = '0'
94         reg_src_1 = '0'
95         reg_src_2 = '0'
96         and_en = '0'
97         or_en = '0'
98         add_en = '0'
99         mul_en = '0'
100        min_en = '0'
101        shift_en = '0'
102        right_shift = '0'
103        imm_op = '0'
104        alu_load_sel = '0'
105        loadi_sel = '0'
106        data_in = '0000'
107        addr = '00'
108        fileOutCPU.write('0      ')
109        fileOutCPU.write(clk + ' ' + reset_bar + ' ' + reg_write + ' ' + reg_dst + ' ' + reg_src_1 + ' ' + reg_src_2 + ' ')
110        fileOutCPU.write(and_en + ' ' + or_en + ' ' + add_en + ' ' + mul_en + ' ' + min_en + ' ' + shift_en + ' ' + right_shift + ' ')
111        fileOutCPU.write(imm_op + ' ' + alu_load_sel + ' ' + loadi_sel + ' ' + data_in + ' ' + addr + '\n')
112        fileOutMEM.write('0 1 0 0 0\n')
113        reset_bar = '1'
114    for i in range(len(instructionList)):
115        op = instructionList[i]
116        if op[0] == 'STOREI' or op[0] == 'STORE' or op[0] == 'NOP':
117            reg_write = '0'
118        else:
119            reg_write = '1'
120            reg_dst = op[1]
121        if op[0] != 'STOREI' and op[0] != 'STORE' and op[0] != 'LOADI' and op[0] != 'LOAD' and op[0] != 'NOP':
122            reg_src_1 = op[2]
123        if op[0] == 'STORE':
124            reg_src_2 = op[2]
125        elif op[0] == 'AND' or op[0] == 'OR' or op[0] == 'ADD' or op[0] == 'MUL' or op[0] == 'MIN':
126            reg_src_2 = op[3]
127        if op[0] == 'ANDI' or op[0] == 'AND':
128            and_en = '1'
129        else:
130            and_en = '0'
131        if op[0] == 'ORI' or op[0] == 'OR':
132

```

```

133     or_en = '1'
134 else:
135     or_en = '0'
136 if op[0] == 'ADDI' or op[0] == 'MINI' or op[0] == 'ADD' or op[0] == 'MIN':
137     add_en = '1'
138 else:
139     add_en = '0'
140 if op[0] == 'MULI' or op[0] == 'MUL':
141     mul_en = '1'
142 else:
143     mul_en = '0'
144 if op[0] == 'MINI' or op[0] == 'MIN':
145     min_en = '1'
146 else:
147     min_en = '0'
148 if op[0] == 'SFL':
149     shift_en = '1'
150     right_shift = '0'
151 elif op[0] == 'SFR':
152     shift_en = '1'
153     right_shift = '1'
154 else:
155     shift_en = '0'
156 if op[0] == 'STORE' or op[0] == 'LOAD' or op[0] == 'AND' or op[0] == 'OR' or op[0] == 'ADD' or op[0] == 'MUL' or op[0] == 'MIN':
157     imm_op = '0'
158 else:
159     imm_op = '1'
160 if op[0] == 'LOAD':
161     alu_load_sel = '1'
162 else:
163     alu_load_sel = '0'
164 if op[0] == 'LOADI':
165     loadi_sel = '1'
166 else:
167     loadi_sel = '0'
168 if op[0] == 'STOREI' or op[0] == 'LOADI':
169     data_in = op[2]
170 elif op[0] == 'ANDI' or op[0] == 'ORI' or op[0] == 'ADDI' or op[0] == 'MINI' or op[0] == 'SFL' or op[0] == 'SFR':
171     data_in = op[3]
172 elif op[0] == 'MULI':
173     data_in = '00' + op[3]
174 if op[0] == 'STOREI' or op[0] == 'STORE':
175     addr = op[1]
176     fileOutMEM.write('\n; ')
177     for argument in op:
178         fileOutMEM.write(argument + ' ')
179     fileOutMEM.write('\n')
180     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod, 3)) + ' ')
181     fileOutMEM.write('0 0 0\n')
182     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod + 0.1, 3)) + ' ')
183     fileOutMEM.write('1 1 0 1\n')
184     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod + clkPeriod * 0.8, 3)) + ' ')
185     fileOutMEM.write('1 0 0 0\n')
186 elif op[0] == 'LOAD':
187     addr = op[2]
188     fileOutMEM.write('\n; ')
189     for argument in op:
190         fileOutMEM.write(argument + ' ')
191     fileOutMEM.write('\n')
192     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod, 3)) + ' ')
193     fileOutMEM.write('0 0 0\n')
194     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod + 0.1, 3)) + ' ')
195     fileOutMEM.write('1 0 0 1\n')
196     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod + clkPeriod * 0.6, 3)) + ' ')
197     fileOutMEM.write('1 0 1 0\n')
198     fileOutMEM.write(str(round(i * clkPeriod + 2 * clkPeriod + clkPeriod * 0.9, 3)) + ' ')

```

```

199     fileOutMEM.write('1 0 0 0\n')
200     fileOutCPU.write('\n; ')
201     for argument in op:
202         fileOutCPU.write(argument + ' ')
203     fileOutCPU.write('\n')
204     clk = '0'
205     fileOutCPU.write(str(round(i * clkPeriod + clkPeriod / 2, 3)) + ' ')
206     fileOutCPU.write(clk + ' ' + reset_bar + ' ' + reg_write + ' ' + reg_dst + ' ' + reg_src_1 + ' ' + reg_src_2 + ' ')
207     fileOutCPU.write(and_en + ' ' + or_en + ' ' + add_en + ' ' + mul_en + ' ' + min_en + ' ' + shift_en + ' ' + right_shift + ' ')
208     fileOutCPU.write(imm_op + ' ' + alu_load_sel + ' ' + loadi_sel + ' ' + data_in + ' ' + addr + '\n')
209     clk = '1'
210     fileOutCPU.write(str(round(i * clkPeriod + clkPeriod, 3)) + ' ')
211     fileOutCPU.write(clk + ' ' + reset_bar + ' ' + reg_write + ' ' + reg_dst + ' ' + reg_src_1 + ' ' + reg_src_2 + ' ')
212     fileOutCPU.write(and_en + ' ' + or_en + ' ' + add_en + ' ' + mul_en + ' ' + min_en + ' ' + shift_en + ' ' + right_shift + ' ')
213     fileOutCPU.write(imm_op + ' ' + alu_load_sel + ' ' + loadi_sel + ' ' + data_in + ' ' + addr + '\n')
214 fileOutCPU.write('\n; END OF INSTRUCTION SEQUENCE')
215 for j in range(1, 6):
216     clk = '0'
217     fileOutCPU.write('\n' + str(round((i + j) * clkPeriod + clkPeriod / 2, 3)) + ' ')
218     fileOutCPU.write(clk + ' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00\n')
219     clk = '1'
220     fileOutCPU.write(str(round((i + j) * clkPeriod + clkPeriod, 3)) + ' ')
221     fileOutCPU.write(clk + ' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00\n')
222 fileIn.close()
223 fileOutCPU.close()
224 fileOutMEM.close()
225
226 # Back-end: Automated Result Verification
227 def signed(value, bits):
228     if value & (1 << (bits - 1)):
229         value = value - (1 << bits)
230     return value
231 def unsigned(value, bits):
232     if value < 0:
233         value = value + (1 << bits)
234     return value
235 fileOutGolden = open('golden_results_new.txt', 'w')
236 SRAM = {}
237 RF = {}
238 for instruction in instructionList:
239     if instruction[0] == 'STOREI':
240         SRAM[instruction[1]] = int(instruction[2], 16)
241     elif instruction[0] == 'LOADI':
242         RF[instruction[1]] = int(instruction[2], 16)
243     elif instruction[0] == 'STORE':
244         if instruction[2] in RF.keys():
245             SRAM[instruction[1]] = RF[instruction[2]]
246     elif instruction[0] == 'LOAD':
247         if instruction[2] in SRAM.keys():
248             RF[instruction[1]] = SRAM[instruction[2]]
249     elif instruction[0] == 'AND':
250         if instruction[2] in RF.keys() and instruction[3] in RF.keys():
251             RF[instruction[1]] = RF[instruction[2]] & RF[instruction[3]]
252     elif instruction[0] == 'ANDI':
253         if instruction[2] in RF.keys():
254             RF[instruction[1]] = RF[instruction[2]] & int(instruction[3], 16)
255     elif instruction[0] == 'OR':
256         if instruction[2] in RF.keys() and instruction[3] in RF.keys():
257             RF[instruction[1]] = RF[instruction[2]] | RF[instruction[3]]
258     elif instruction[0] == 'ORI':
259         if instruction[2] in RF.keys():
260             RF[instruction[1]] = RF[instruction[2]] | int(instruction[3], 16)
261     elif instruction[0] == 'ADD':
262         if instruction[2] in RF.keys() and instruction[3] in RF.keys():
263             result = RF[instruction[2]] + RF[instruction[3]]
264             result_hex = hex(result).replace('0x', '')

```

```

265     if len(result_hex) > 4:
266         result_hex = result_hex[1:]
267         result = int(result_hex, 16)
268         RF[instruction[1]] = result
269     elif instruction[0] == 'ADDI':
270         if instruction[2] in RF.keys():
271             result = RF[instruction[2]] + int(instruction[3], 16)
272             result_hex = hex(result).replace('0x', '')
273             if len(result_hex) > 4:
274                 result_hex = result_hex[1:]
275             result = int(result_hex, 16)
276             RF[instruction[1]] = result
277     elif instruction[0] == 'MUL':
278         if instruction[2] in RF.keys() and instruction[3] in RF.keys():
279             op1 = signed(int(bin(RF[instruction[2]]).replace('0b', '')[-5:], 2), 5)
280             op2 = signed(int(bin(RF[instruction[3]]).replace('0b', '')[-5:], 2), 5)
281             result = op1 * op2
282             RF[instruction[1]] = unsigned(result, 16)
283     elif instruction[0] == 'MULI':
284         if instruction[2] in RF.keys():
285             op1 = signed(int(bin(RF[instruction[2]]).replace('0b', '')[-5:], 2), 5)
286             op2 = signed(int(instruction[3], 16), 5)
287             result = op1 * op2
288             RF[instruction[1]] = unsigned(result, 16)
289     elif instruction[0] == 'MIN':
290         if instruction[2] in RF.keys() and instruction[3] in RF.keys():
291             if signed(RF[instruction[2]], 16) < signed(RF[instruction[3]], 16):
292                 RF[instruction[1]] = RF[instruction[2]]
293             else:
294                 RF[instruction[1]] = RF[instruction[3]]
295     elif instruction[0] == 'MINI':
296         if instruction[2] in RF.keys():
297             if signed(RF[instruction[2]], 16) < signed(int(instruction[3], 16), 16):
298                 RF[instruction[1]] = RF[instruction[2]]
299             else:
300                 RF[instruction[1]] = int(instruction[3], 16)
301     elif instruction[0] == 'SFL':
302         if instruction[2] in RF.keys():
303             result = RF[instruction[2]] << int(instruction[3], 16)
304             result_hex = hex(result).replace('0x', '')
305             if len(result_hex) > 4:
306                 result_hex = result_hex[-4:]
307             result = int(result_hex, 16)
308             RF[instruction[1]] = result
309     elif instruction[0] == 'SFR':
310         if instruction[2] in RF.keys():
311             RF[instruction[1]] = unsigned(signed(RF[instruction[2]], 16) >> int(instruction[3], 16), 16)
312 fileOutGolden.write('Register File - Final Content:\n')
313 for key in sorted(RF):
314     fileOutGolden.write('$' + key + ': ')
315     content_hex = hex(RF[key]).replace('0x', '')
316     while len(content_hex) < 4:
317         content_hex = '0' + content_hex
318     fileOutGolden.write(content_hex.upper() + '\n')
319 fileOutGolden.write('\nSRAM - Final Content:\n')
320 for key in sorted(SRAM):
321     fileOutGolden.write(key + 'H: ')
322     content_hex = hex(SRAM[key]).replace('0x', '')
323     while len(content_hex) < 4:
324         content_hex = '0' + content_hex
325     fileOutGolden.write(content_hex.upper() + '\n')
326 fileOutGolden.close()
327

```

- **Vector File and Golden Results**

- Note that the asynchronous control signals required for precharging, reading from and writing to the SRAM are produced directly by the front-end Python script through a separate vector file named “mem.vec” apart from the main processor input vector file “cpu.vec”.

```

67 ; STORE 00 6
68 8.7 0 1 0 1 7 6 0 0 0 0 0 0 0 0 0 0 0 0EE 00
69 9.0 1 1 0 1 7 6 0 0 0 0 0 0 0 0 0 0 0 0EE 00
70
71 ; SFL 5 3 0005
72 9.3 0 1 1 5 3 6 0 0 0 0 0 1 0 1 0 0 0005 00
73 9.6 1 1 1 5 3 6 0 0 0 0 0 1 0 1 0 0 0005 00
74
75 ; NOP
76 9.9 0 1 0 5 3 6 0 0 0 0 0 0 1 0 0 0 0005 00
77 10.2 1 1 0 5 3 6 0 0 0 0 0 0 1 0 0 0 0005 00
78
79 ; NOP
80 10.5 0 1 0 5 3 6 0 0 0 0 0 0 1 0 0 0 0005 00
81 10.8 1 1 0 5 3 6 0 0 0 0 0 0 1 0 0 0 0005 00
82
83 ; OR 6 5 4
84 11.1 0 1 1 6 5 4 0 1 0 0 0 0 0 0 0 0 0005 00
85 11.4 1 1 1 6 5 4 0 1 0 0 0 0 0 0 0 0 0005 00
86
87 ; NOP
88 11.7 0 1 0 6 5 4 0 0 0 0 0 0 1 0 0 0 0005 00
89 12.0 1 1 0 6 5 4 0 0 0 0 0 0 1 0 0 0 0005 00
90
91 ; NOP
92 12.3 0 1 0 6 5 4 0 0 0 0 0 0 1 0 0 0 0005 00
93 12.6 1 1 0 6 5 4 0 0 0 0 0 0 1 0 0 0 0005 00
94
95 ; ANDI 6 6 00CC
96 12.9 0 1 1 6 6 4 1 0 0 0 0 0 1 0 0 0 00CC 00
97 13.2 1 1 1 6 6 4 1 0 0 0 0 0 1 0 0 0 00CC 00
98
99 ; NOP
100 13.5 0 1 0 6 6 4 0 0 0 0 0 0 1 0 0 0 00CC 00
101 13.8 1 1 0 6 6 4 0 0 0 0 0 0 1 0 0 0 00CC 00
102
103 ; NOP
104 14.1 0 1 0 6 6 4 0 0 0 0 0 0 1 0 0 0 00CC 00
105 14.4 1 1 0 6 6 4 0 0 0 0 0 0 1 0 0 0 00CC 00
106
107 ; ADD 6 6 4
108 14.7 0 1 1 6 6 4 0 0 1 0 0 0 0 0 0 0 00CC 00
109 15.0 1 1 1 6 6 4 0 0 1 0 0 0 0 0 0 0 00CC 00
110
111 ; STORE 09 5
112 15.3 0 1 0 6 6 5 0 0 0 0 0 0 0 0 0 0 00CC 09
113 15.6 1 1 0 6 6 5 0 0 0 0 0 0 0 0 0 0 00CC 09
114
115 ; NOP
116 15.9 0 1 0 6 6 5 0 0 0 0 0 0 1 0 0 0 00CC 09
117 16.2 1 1 0 6 6 5 0 0 0 0 0 0 1 0 0 0 00CC 09
118
119 ; STORE 10 6
120 16.5 0 1 0 6 6 6 0 0 0 0 0 0 0 0 0 0 00CC 10
121 16.8 1 1 0 6 6 6 0 0 0 0 0 0 0 0 0 0 00CC 10
122
123 ; LOAD 1 00
124 17.1 0 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 00
125 17.4 1 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 00
126
127 ; LOAD 1 1F
128 17.7 0 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 1F
129 18.0 1 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 1F
130
131 ; LOAD 1 09
132 18.3 0 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 09
133

```

```

18.6 1 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 09
134
135 ; LOAD 1 10
136 18.9 0 1 1 1 1 6 6 0 0 0 0 0 0 0 0 1 0 00CC 10
137 19.2 1 1 1 1 6 6 0 0 0 0 0 0 0 0 0 1 0 00CC 10
138
139 ; END OF INSTRUCTION SEQUENCE
140 19.5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
141 19.8 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
142
143 20.1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
144 20.4 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
145
146 20.7 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
147 21.0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
148
149 21.3 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
150 21.6 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
151
152 21.9 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
153 22.2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0000 00
154

```

|                                                    |                  |
|----------------------------------------------------|------------------|
| 1 radix 1 1 1 1                                    | 65 ; STORE 09 5  |
| 2 io i i i i                                       | 66 16.2 0 0 0 0  |
| 3 vname ~pre_en mem_write_en mem_read_en rowdec_en | 67 16.3 1 1 0 1  |
| 4 tunit ns                                         | 68 16.68 1 0 0 0 |
| 5 slope 0.005                                      | 69               |
| 6 vih 1.0                                          | 70 ; STORE 10 6  |
| 7 vil 0.0                                          | 71 17.4 0 0 0 0  |
| 8                                                  | 72 17.5 1 1 0 1  |
| 9 0 1 0 0 0                                        | 73 17.88 1 0 0 0 |
| 10                                                 | 74               |
| 11 ; STOREI 0A FF0F                                | 75 ; LOAD 1 00   |
| 12 1.2 0 0 0 0                                     | 76 18.0 0 0 0 0  |
| 13 1.3 1 1 0 1                                     | 77 18.1 1 0 0 1  |
| 14 1.68 1 0 0 0                                    | 78 18.36 1 0 1 0 |
| 15                                                 | 79 18.54 1 0 0 0 |
| 16 ; STOREI 0B 0014                                | 80               |
| 17 1.8 0 0 0 0                                     | 81 ; LOAD 1 1F   |
| 18 1.9 1 1 0 1                                     | 82 18.6 0 0 0 0  |
| 19 2.28 1 0 0 0                                    | 83 18.7 1 0 0 1  |
| 20                                                 | 84 18.96 1 0 1 0 |
| 21 ; STOREI 18 000B                                | 85 19.14 1 0 0 0 |
| 22 2.4 0 0 0 0                                     | 86               |
| 23 2.5 1 1 0 1                                     | 87 ; LOAD 1 09   |
| 24 2.88 1 0 0 0                                    | 88 19.2 0 0 0 0  |
| 25                                                 | 89 19.3 1 0 0 1  |
| 26 ; STOREI 19 00EE                                | 90 19.56 1 0 1 0 |
| 27 3.0 0 0 0 0                                     | 91 19.74 1 0 0 0 |
| 28 3.1 1 1 0 1                                     | 92               |
| 29 3.48 1 0 0 0                                    | 93 ; LOAD 1 10   |
| 30                                                 | 94 19.8 0 0 0 0  |
| 31 ; LOAD 7 0A                                     | 95 19.9 1 0 0 1  |
| 32 3.6 0 0 0 0                                     | 96 20.16 1 0 1 0 |
| 33 3.7 1 0 0 1                                     | 97 20.34 1 0 0 0 |
| 34 3.96 1 0 1 0                                    | 98               |
| 35 4.14 1 0 0 0                                    |                  |
| 36                                                 |                  |
| 37 ; LOAD 2 0B                                     |                  |
| 38 4.2 0 0 0 0                                     |                  |
| 39 4.3 1 0 0 1                                     |                  |
| 40 4.56 1 0 1 0                                    |                  |
| 41 4.74 1 0 0 0                                    |                  |
| 42                                                 |                  |
| 43 ; LOAD 3 19                                     |                  |
| 44 4.8 0 0 0 0                                     |                  |
| 45 4.9 1 0 0 1                                     |                  |
| 46 5.16 1 0 1 0                                    |                  |
| 47 5.34 1 0 0 0                                    |                  |
| 48                                                 |                  |
| 49 ; LOAD 4 18                                     |                  |
| 50 5.4 0 0 0 0                                     |                  |
| 51 5.5 1 0 0 1                                     |                  |
| 52 5.76 1 0 1 0                                    |                  |
| 53 5.94 1 0 0 0                                    |                  |
| 54                                                 |                  |
| 55 ; STORE 1F 5                                    |                  |
| 56 9.0 0 0 0 0                                     |                  |
| 57 9.1 1 1 0 1                                     |                  |
| 58 9.48 1 0 0 0                                    |                  |
| 59                                                 |                  |
| 60 ; STORE 00 6                                    |                  |
| 61 9.6 0 0 0 0                                     |                  |
| 62 9.7 1 1 0 1                                     |                  |
| 63 10.08 1 0 0 0                                   |                  |

- Demo Instructions and Golden Results:

```
STOREI 0AH #ff0f
STOREI 0BH #0014
STOREI 2 18H #000B #00EE
LOAD $7 0AH
LOAD $2 0BH
LOAD $3 19H
LOAD $4 18H
MUL $5 $7 $2
MUL $6 $3 $5
MIN $1 $7 $2
STORE 1FH $5
STORE 00H $6
SFL $5 $3 #0005
OR $6 $5 $4
ANDI $6 $6 #00CC
ADD $6 $6 $4
STORE 09H $5
STORE 10H $6
LOAD $1 00H
LOAD $1 1FH
LOAD $1 09H
LOAD $1 10H
```

**Register File - Final Content:**

\$1: 00D3  
\$2: 0014  
\$3: 00EE  
\$4: 000B  
\$5: 1DC0  
\$6: 00D3  
\$7: FF0F

**SRAM - Final Content:**

00H: 00A8  
09H: 1DC0  
0AH: FF0F  
0BH: 0014  
10H: 00D3  
18H: 000B  
19H: 00EE  
1FH: FF4C

- **Simulation Results**

- The following waveforms show the register file contents from \$0 to \$7 throughout the execution of the demo instruction sequence.



- **Performance Summary**

| Clock Period [ns] | Total Time [ns] | Power [ $\mu\text{W}$ ] | Estimated Area [ $\mu\text{m}^2$ ]                   |
|-------------------|-----------------|-------------------------|------------------------------------------------------|
| 0.59              | 20.85           | 794                     | 5632<br>(60.53 $\mu\text{m}$ x 93.04 $\mu\text{m}$ ) |