

## EE 5323

Fall 2024, Yu (Kevin) Cao

### Project 3

#### Important dates:

**Final report** on the complete IF neuron design: *December 8, Sunday, 11:59pm.*

**Recommended milestone:** schematic, layout, and simulation of one 1-bit adder and one D flip-flop: *November 22, Friday.*

Report submission: *A softcopy to Canvas.*

Design files: *Please keep all the files of your design, such as the schematic and layout, until December 20th. The TA may need to check with you to validate the results.*

**Submission:** Prepare a concise report with important results only. **Please use the posted cover page for the final report.** You don't have to include the entire set of your circuit schematics or layout plots. Submit the report under Assignments on Canvas with the filename: EE5323\_Project\_3\_your name.

#### Design of a 4-to-1 Integrate-and-Fire (IF) Neuron

Machine learning algorithms have made significant progress in recent years, achieving accuracy close to, or even better than, human-level perception in various tasks. However, their computing efficiency and energy consumption are still far behind those of a brain. To bridge this gap in hardware implementation, neuromorphic circuits have been proposed to emulate spiking neurons and synapses in silicon. Examples include IBM's TrueNorth and Intel's Loihi.

In this project, we will design a digital spiking neuron with the essential functions of:

- *Integrate:* The neuron ( $z$ ) first integrates the inputs from 4 other neurons ( $x_0$  to  $x_3$ ). Each input,  $x_i$ , is a 4-bit binary value. Four synapses ( $w_0$  to  $w_3$ ) connect  $x_0$  to  $x_3$  with  $z$ , respectively. Each  $w_i$  is a 1-bit binary value (i.e.,  $w_i$  is either 0 or 1). Figure 1 illustrates the network structure;  $z$  is the weighted sum of the  $x_i$ :

$$z = w_0 \cdot x_0 + w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3$$



Figure 1. The network structure of an integrate-and-fire neuron.

- *Fire:* The output of  $z$ ,  $F$ , is a 1-bit value. If  $z$  is greater than or equal to 16 (i.e., **10000** in binary),  $F = 1$ ; otherwise,  $F = 0$ .
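The integrate and fire functions above can be captured in a short behavioral reference model, which may be handy for checking simulation results by hand. This is only a sketch in Python; the names `if_neuron` and `THRESHOLD` are illustrative and not part of the assignment.

```python
from typing import Sequence, Tuple

THRESHOLD = 16  # fire when z >= 16 (binary 10000), as stated in the spec


def if_neuron(x: Sequence[int], w: Sequence[int]) -> Tuple[int, int]:
    """Return (z, F) for four 4-bit inputs x and four 1-bit weights w."""
    if len(x) != 4 or len(w) != 4:
        raise ValueError("expected exactly 4 inputs and 4 weights")
    if any(not 0 <= xi <= 15 for xi in x):
        raise ValueError("each x_i must be a 4-bit value (0..15)")
    if any(wi not in (0, 1) for wi in w):
        raise ValueError("each w_i must be 0 or 1")
    # Integrate: weighted sum of the inputs
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Fire: compare against the threshold
    f = 1 if z >= THRESHOLD else 0
    return z, f


print(if_neuron([15, 1, 0, 0], [1, 1, 0, 0]))      # (16, 1)
print(if_neuron([15, 15, 15, 15], [1, 0, 0, 0]))   # (15, 0)
```

Since each $x_i$ is at most 15, the largest possible sum is $4 \times 15 = 60$, so $z$ fits in 6 bits.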

**Design goal:** A 4-to-1 integrate-and-fire neuron that **maximizes the speed**.

The objective of this mini-project is to design such a **high-performance 4-to-1 IF neuron**. The building blocks include a 1-bit adder, a D flip-flop, and other logic gates if needed. It is recommended to start from the architecture and logic flow of the design: how to perform the weighted sum of each input, how to integrate all 4 inputs, which operations can run in parallel, how to determine  $F$  from  $z$ , etc.
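As a starting point for thinking about the logic flow, the following bit-level sketch shows one possible arrangement, not necessarily the fastest. It assumes that each 1-bit weight gates its input with AND gates, that the four products are summed by a two-level tree of ripple-carry adders built from 1-bit full adders, and that $F$ is the OR of the two most significant bits of the 6-bit sum. All function names are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits):
    """Add two equal-length LSB-first bit lists; the result is one bit wider."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out


def to_bits(value, width):
    """Convert an integer to an LSB-first list of bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first list of bits back to an integer."""
    return sum(b << i for i, b in enumerate(bits))


def neuron_bits(x, w):
    """Bit-level model: AND gating, two-level adder tree, OR of bits 4 and 5."""
    # Weighted inputs: a 1-bit weight simply gates every bit of x_i
    p = [[bit & wi for bit in to_bits(xi, 4)] for xi, wi in zip(x, w)]
    s01 = ripple_add(p[0], p[1])   # 5-bit partial sum, independent of s23
    s23 = ripple_add(p[2], p[3])   # 5-bit partial sum, can run in parallel
    z = ripple_add(s01, s23)       # 6-bit total (maximum value 60)
    f = z[4] | z[5]                # z >= 16 exactly when bit 4 or bit 5 is set
    return from_bits(z), f


print(neuron_bits([15, 1, 0, 0], [1, 1, 0, 0]))  # (16, 1)
```

Because $z$ never exceeds 60, the comparison $z \ge 16$ reduces to checking bits 4 and 5, so only those two bits of the final sum affect $F$.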

The design needs to be synchronized: all input bits of  $x$  are from D flip-flops, and the output  $F$  should be stored in a D flip-flop. You need to decide how to pipeline the operations from  $x$  to  $F$ , with additional FFs if needed.
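To illustrate what pipelining means for the timing of $F$, here is a minimal cycle-level sketch. It assumes one extra rank of flip-flops between the pairwise sums and the final sum, and it treats the weights as a fixed configuration, which the assignment does not specify. The class `PipelinedNeuron` is hypothetical; a real design may use a different number of stages or different stage boundaries.

```python
class PipelinedNeuron:
    """Input FFs -> pairwise sums -> pipeline FFs -> final sum and threshold -> output FF."""

    def __init__(self, w):
        self.w = list(w)          # assumed static 1-bit weights
        self.in_reg = [0, 0, 0, 0]  # input flip-flops holding x
        self.mid_reg = (0, 0)     # pipeline flip-flops holding the two partial sums
        self.f_reg = 0            # output flip-flop holding F

    def clock(self, x):
        """Model one rising clock edge; returns F as seen after the edge."""
        # Combinational logic evaluated from the current register contents
        xr, w = self.in_reg, self.w
        next_mid = (w[0] * xr[0] + w[1] * xr[1], w[2] * xr[2] + w[3] * xr[3])
        next_f = 1 if sum(self.mid_reg) >= 16 else 0
        # All flip-flops update together on the edge
        self.in_reg = list(x)
        self.mid_reg = next_mid
        self.f_reg = next_f
        return self.f_reg


neuron = PipelinedNeuron([1, 1, 1, 1])
stream = [[15, 1, 0, 0], [1, 1, 1, 1], [4, 4, 4, 4], [0, 0, 0, 0], [0, 0, 0, 0]]
print([neuron.clock(x) for x in stream])  # [0, 0, 1, 0, 1]
```

In this model, the result for an input captured on one clock edge appears in the output flip-flop two edges later, while a new input can still be accepted every cycle.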

**Technology:** TSMC 16nm,  $V_{DD} = 0.8V$ . All input signals, including the clock, have  $T_r = T_f = 20\text{ps}$  (**0 to  $V_{DD}$** ). The load at the final output bit (at the output of the neuron, but before the FF that latches  $F$ ) is INVD4\* from the library.

**Design approaches:** You are free to choose any logic style to implement both the logic and the registers. First, design and implement the 1-bit adder and complete its layout. Then design the FFs and connect all the adders, FFs, and other logic gates according to the pipeline structure. Finally, evaluate the speed of the design from  $x$  to  $F$ .

**Layout style:** Use the concept of standard cells for the layout implementation of the logic and flip-flop circuits. The inverter load of  $F$  is placed in the test bench only; there is no need to add it to the layout. Metal usage is limited to M1, M2, and M3. M2 can only be routed horizontally and M3 can only be routed vertically.

**Performance metrics:** *Delay* is defined as the minimum time needed (i.e., the product of the number of clock cycles and the minimum achievable clock period) to ensure that the circuit works correctly. *Delay* should be simulated from the post-layout schematics, with the load and all FFs.
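The *Delay* definition above is a simple product, sketched below. The sketch assumes the cycle count is the number of clock periods between the input FFs capturing $x$ and the output FF capturing $F$; please confirm that interpretation with the TA. The candidate numbers are made-up placeholders, not simulation results, and only illustrate that a deeper pipeline shortens the clock period but adds cycles.

```python
def total_delay_ps(num_cycles: int, min_clock_period_ps: float) -> float:
    """Delay = number of clock cycles x minimum achievable clock period."""
    if num_cycles < 1:
        raise ValueError("num_cycles must be at least 1")
    if min_clock_period_ps <= 0:
        raise ValueError("min_clock_period_ps must be positive")
    return num_cycles * min_clock_period_ps


# Hypothetical (cycles, minimum period in ps) pairs, for illustration only
candidates = {
    "no extra pipeline stage": (1, 100),
    "one extra pipeline stage": (2, 60),
}

for name, (cycles, period_ps) in candidates.items():
    print(f"{name}: {total_delay_ps(cycles, period_ps)} ps")
```

With real post-layout numbers in place of the placeholders, the same comparison shows whether an added pipeline stage actually reduces the reported *Delay*.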

**Testbench:** You need to check the critical path in your design to obtain the delay number. In addition, a set of input patterns, as the testbench, will be provided by the TA to validate your design.

**Report:** The quality of your report is as important as the quality of your design.

Your report should provide the following items in this order:

1. Results: the values of your clock period, the minimum delay, and the total layout area. Please list them in a table on the cover page of your report.
2. Circuit schematics at the transistor level: the 1-bit adder and one FF.
3. Architecture of the neuron: from  $x_i$  to  $F$ .
4. Layout: show the layout of your 1-bit adder, 1-bit register, and the entire neuron design.

You can also present your overall design strategies and the critical path. Please keep the report within **five pages**, plus the cover page.

**Grading:** Total 50 points:

25 points from the ranking of *Delay*, 15 points from the ranking of layout area, 5 points on design innovation, and 5 points on the quality of your report.

- Please keep all your design files. The TA may schedule a demo with you. During the demo, we will check functional correctness and layout. You will be expected to explain aspects of your design.