



# Tiny Tapeout 03 Datasheet

Project Repository

<https://github.com/TinyTapeout/tinytapeout-03>

April 24, 2023

## Contents

|                                          |    |
|------------------------------------------|----|
| Render of whole chip                     | 5  |
| <b>Projects</b>                          | 6  |
| 0 : Test Inverter Project . . . . .      | 6  |
| 1 : ChipTune . . . . .                   | 7  |
| The ChipTune Project . . . . .           | 8  |
| 2 : 7 Segment Life . . . . .             | 13 |
| 3 : Another Piece of Pi . . . . .        | 15 |
| 4 : Wormy . . . . .                      | 17 |
| 5 : Knight Rider Sensor Lights . . . . . | 21 |
| 6 : Single digit latch . . . . .         | 23 |

|                                                                    |     |
|--------------------------------------------------------------------|-----|
| 7 : 4x4 Memory . . . . .                                           | 24  |
| 8 : KS-Signal . . . . .                                            | 25  |
| 9 : Hovalaa CPU . . . . .                                          | 27  |
| 10 : SKINNY SBOX . . . . .                                         | 28  |
| 11 : Stateful Lock . . . . .                                       | 29  |
| 12 : Ascon's 5-bit S-box . . . . .                                 | 30  |
| 13 : 8bit configurable galois Lfsr . . . . .                       | 31  |
| 14 : Sbox SKINNY 8 Bit . . . . .                                   | 32  |
| 15 : BinaryDoorLock . . . . .                                      | 33  |
| 16 : bad apple . . . . .                                           | 34  |
| 17 : TinyFPGA attempt for TinyTapeout3 . . . . .                   | 35  |
| 18 : 4bit Adder . . . . .                                          | 36  |
| 19 : 12-bit PDP8 . . . . .                                         | 37  |
| 20 : CTF - Catch the fish . . . . .                                | 39  |
| 21 : Dot operation calculator . . . . .                            | 40  |
| 22 : RiscV Scan Chain based CPU – block 1 – clocking . . . . .     | 41  |
| 23 : RiscV Scan Chain based CPU – block 2 – instructions . . . . . | 42  |
| 24 : RiscV Scan Chain based CPU – block 3 – registers . . . . .    | 43  |
| 25 : RiscV Scan Chain based CPU – block 4 – ALU . . . . .          | 44  |
| 26 : 31b-PrimeDetector . . . . .                                   | 45  |
| 27 : XOR Stream Cipher . . . . .                                   | 47  |
| 28 : LED Panel Driver . . . . .                                    | 50  |
| 29 : 6-bit FIFO . . . . .                                          | 52  |
| 30 : ezm_cpu . . . . .                                             | 54  |
| 31 : tiny logic analyzer . . . . .                                 | 55  |
| 32 : 4-bit ALU . . . . .                                           | 57  |
| 33 : Pulse-Density Modulators . . . . .                            | 59  |
| 34 : CRC Decelerator . . . . .                                     | 62  |
| 35 : 7 segment seconds . . . . .                                   | 66  |
| 36 : Binary to DEC and HEX . . . . .                               | 67  |
| 37 : Simon Says . . . . .                                          | 68  |
| 38 : Shift Register Ram . . . . .                                  | 71  |
| 39 : tinysat . . . . .                                             | 72  |
| 41 : POV display . . . . .                                         | 73  |
| 42 : Toy CPU . . . . .                                             | 74  |
| 43 : Base-10 grey counter counts from zero to a trillion . . . . . | 76  |
| 44 : ttFIR filter . . . . .                                        | 80  |
| 45 : QTCore-A1 . . . . .                                           | 81  |
| 46 : MicroTapeout . . . . .                                        | 92  |
| 47 : nipple multiplier . . . . .                                   | 96  |
| 48 : Synthesizable Digital Temperature Sensor . . . . .            | 98  |
| 49 : 4-bits sequential ALU . . . . .                               | 100 |

|                                                           |     |
|-----------------------------------------------------------|-----|
| 50 : Brightness control of LED with PWM . . . . .         | 101 |
| 51 : Microtapeout . . . . .                               | 102 |
| Shift Register . . . . .                                  | 102 |
| ALU . . . . .                                             | 102 |
| Adder . . . . .                                           | 102 |
| Seven Segment . . . . .                                   | 102 |
| 52 : Neptune guitar tuner (fixed window) . . . . .        | 104 |
| 53 : Neptune guitar tuner (proportional window) . . . . . | 106 |
| 54 : M segments . . . . .                                 | 108 |
| 55 : 7 segment seconds . . . . .                          | 109 |
| 56 : 7 segment wokwi counter . . . . .                    | 110 |
| 57 : Straight through test . . . . .                      | 112 |
| 58 : Combo lock . . . . .                                 | 113 |
| 59 : ro-based_tempsense . . . . .                         | 114 |
| 60 : FSM_LAT . . . . .                                    | 115 |
| 61 : 7 segment seconds . . . . .                          | 116 |
| 62 : MMM Finite State Machine (4 States) . . . . .        | 117 |
| States . . . . .                                          | 117 |
| Outputs . . . . .                                         | 117 |
| 63 : FSK modem . . . . .                                  | 119 |
| 64 : ChatGPT-created Benchmarks . . . . .                 | 120 |
| 65 : HiddenCPU . . . . .                                  | 138 |
| 66 : Simple UART interface . . . . .                      | 139 |
| 67 : MSF Clock . . . . .                                  | 141 |
| 68 : Hamming(7,4) encoder decoder" . . . . .              | 143 |
| 69 : I2S reeceiver, data mix and transmitter . . . . .    | 144 |
| 70 : S4GA: Super Slow Serial SRAM FPGA . . . . .          | 145 |
| 71 : 7 Segment Random Walk . . . . .                      | 147 |
| 72 : Tiny binarized neural network . . . . .              | 148 |
| 73 : RTL Locked QTCore-A1 . . . . .                       | 149 |
| 74 : Arbiter Game . . . . .                               | 156 |
| 75 : Frequency Divider . . . . .                          | 158 |
| 76 : FullAdderusing4is1 . . . . .                         | 159 |
| 77 : BCDtoDECIMAL . . . . .                               | 161 |
| 78 : AI Decelerator . . . . .                             | 163 |
| 79 : 3BitParallelAdder . . . . .                          | 164 |
| 80 : Asynchronous 3-Bit Down Counter . . . . .            | 166 |
| 81 : Ring oscillator with skew correction . . . . .       | 168 |
| 82 : 3 bit multiplier . . . . .                           | 169 |
| 83 : Tiny Teeth Toothbrush Timer . . . . .                | 171 |
| 84 : 2's Compliment Subtractor . . . . .                  | 173 |
| 85 : Customizable Padlock . . . . .                       | 174 |

|                                                                   |            |
|-------------------------------------------------------------------|------------|
| 86 : Customizable UART String . . . . .                           | 175        |
| 87 : Customizable UART Character . . . . .                        | 176        |
| 88 : 7-Seg ‘Tiny Tapeout’ Display . . . . .                       | 177        |
| <b>Technical info</b>                                             | <b>178</b> |
| Scan chain . . . . .                                              | 178        |
| Clocking . . . . .                                                | 179        |
| Clock divider . . . . .                                           | 180        |
| Wait states . . . . .                                             | 180        |
| Pinout . . . . .                                                  | 180        |
| Instructions to build GDS . . . . .                               | 181        |
| Changing macro block size . . . . .                               | 182        |
| <b>Verification</b>                                               | <b>183</b> |
| Setup . . . . .                                                   | 183        |
| Simulations . . . . .                                             | 183        |
| Top level tests setup . . . . .                                   | 184        |
| Formal Verification . . . . .                                     | 185        |
| Timing constraints . . . . .                                      | 185        |
| Physical tests . . . . .                                          | 186        |
| <b>Toplevel STA and Spice analysis for TinyTapeout-02 and 03.</b> | <b>187</b> |
| Introduction . . . . .                                            | 187        |
| GitHub action . . . . .                                           | 187        |
| Prerequisites . . . . .                                           | 187        |
| Assumptions . . . . .                                             | 188        |
| Issues . . . . .                                                  | 188        |
| Invoking STA analysis . . . . .                                   | 188        |
| How it works . . . . .                                            | 188        |
| Invoking Spice simulation: . . . . .                              | 189        |
| <b>Sponsored by</b>                                               | <b>191</b> |
| <b>Team</b>                                                       | <b>191</b> |

# Render of whole chip



Figure 1: Full GDS

# Projects

## 0 : Test Inverter Project

- Author: Matt Venn
- Description: Inverts every line. This project is also used to fill any empty design spaces.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Uses 8 inverters to invert every line.

### How to test

Setting the input switch to on should turn the corresponding LED off.

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | a     | segment a |
| 1 | b     | segment b |
| 2 | c     | segment c |
| 3 | d     | segment d |
| 4 | e     | segment e |
| 5 | f     | segment f |
| 6 | g     | segment g |
| 7 | dot   | dot       |

# 1 : ChipTune



Figure 2: picture

- Author: Wallie Everest
- Description: Vintage 8-bit sound generator
- GitHub repository
- HDL project
- Extra docs
- Clock: 9600 Hz
- External hardware: Computer COM port

## How it works

Chiptune implements an 8-bit Programmable Sound Generator. Input is from a serial UART interface. Output is 4-bit DAC audio.

# The ChipTune Project

This is a two-in-one project. First, an audio device replicates the square-wave sound generators of vintage video games. Second, a re-imagined version of the scan controller is implemented to demonstrate faster data acquisition.

## TinyTapeout 3 Configuration

For this third Multi Project Chip (MPC) tapeout, the original scan chain will be configured in ‘external mode’. The inputs and outputs for user projects will be derived externally from scanchain data, not from the chip’s I/O pins. The scanchain control signals will occupy pins designated `io_in` and `io_out`. Expected throughput is better than 10k byte/second. The ChipTune project will configure the shift clock to attain 9600 bytes/sec. This speed provides a 4800 Hz clock and 300 baud communication to the tiny user project.

[Top Level Drawing]

Devices from the eFabless Multi-Project Wafer (MPW) shuttle are delivered in two package options, each with 64 pins. TinyTapeout 2 will be packaged in a QFN, whereas TinyTapout 3 will be packaged as a WCSP. In both delivered configurations, the device is mounted to a daughter board PCB with 10 castellated pins per side. The daughter board provides basic clock, configuration and power management for the Caravel SoC.

## MPRJ\_IO Pin Assignments

| Signal                   | Name             | Dir | WCSP | QFN | PCB   |
|--------------------------|------------------|-----|------|-----|-------|
| <code>mprj_io[0]</code>  | JTAG             | In  | D7   | 31  | J3.14 |
| <code>mprj_io[1]</code>  | SDO              | Out | E9   | 32  | J3.13 |
| <code>mprj_io[2]</code>  | SDI              | In  | F9   | 33  | J3.12 |
| <code>mprj_io[3]</code>  | CSB              | In  | E8   | 34  | J3.11 |
| <code>mprj_io[4]</code>  | SCK              | In  | F8   | 35  | J3.10 |
| <code>mprj_io[5]</code>  | SER_RX           | In  | E7   | 36  | J3.9  |
| <code>mprj_io[6]</code>  | SER_TX           | Out | F7   | 37  | J3.8  |
| <code>mprj_io[7]</code>  | IRQ              | In  | E5   | 41  | J3.7  |
| <code>mprj_io[8]</code>  | DRIVER_SEL[0]    | In  | F5   | 42  | J3.6  |
| <code>mprj_io[9]</code>  | DRIVER_SEL[1]    | In  | E4   | 43  | J3.5  |
| <code>mprj_io[10]</code> | SLOW_CLK         | Out | F4   | 44  | J3.4  |
| <code>mprj_io[11]</code> | SET_CLK_DIV      | In  | E3   | 45  | J3.3  |
| <code>mprj_io[12]</code> | ACTIVE_SELECT[0] | In  | F3   | 46  | J3.2  |
| <code>mprj_io[13]</code> | ACTIVE_SELECT[1] | In  | D3   | 48  | J3.1  |
| <code>mprj_io[14]</code> | ACTIVE_SELECT[2] | In  | E2   | 50  | J2.14 |

| Signal     | Name                         | Dir | WCSP | QFN | PCB   |
|------------|------------------------------|-----|------|-----|-------|
| mpj_io[15] | ACTIVE_SELECT[3]             | In  | F1   | 51  | J2.13 |
| mpj_io[16] | ACTIVE_SELECT[4]             | In  | E1   | 53  | J2.12 |
| mpj_io[17] | ACTIVE_SELECT[5]             | In  | D2   | 54  | J2.11 |
| mpj_io[18] | ACTIVE_SELECT[6]             | In  | D1   | 55  | J2.10 |
| mpj_io[19] | ACTIVE_SELECT[7]             | In  | C10  | 57  | J2.9  |
| mpj_io[20] | ACTIVE_SELECT[8]             | In  | C2   | 58  | J2.8  |
| mpj_io[21] | IO_IN[0] / EXT_SCAN_CLK_OUT  | In  | B1   | 59  | J2.7  |
| mpj_io[22] | IO_IN[1] / EXT_SCAN_DATA_OUT | In  | B2   | 60  | J2.6  |
| mpj_io[23] | IO_IN[2] / EXT_SCAN_SELECT   | In  | A1   | 61  | J2.5  |
| mpj_io[24] | IO_IN[3] / EXT_SCAN_LATCH_EN | In  | C3   | 62  | J2.4  |
| mpj_io[25] | IO_IN[4]                     | In  | A3   | 2   | J2.3  |
| mpj_io[26] | IO_IN[5]                     | In  | B4   | 3   | J2.2  |
| mpj_io[27] | IO_IN[6]                     | In  | A4   | 4   | J2.1  |
| mpj_io[28] | IO_IN[7]                     | In  | B5   | 5   | J1.14 |
| mpj_io[29] | IO_OUT[0] / EXT_SCAN_DATA_IN | Out | A5   | 6   | J1.13 |
| mpj_io[30] | IO_OUT[1] / EXT_SCAN_DATA_IN | Out | B6   | 7   | J1.12 |
| mpj_io[31] | IO_OUT[2]                    | Out | A6   | 8   | J1.11 |
| mpj_io[32] | IO_OUT[3]                    | Out | A7   | 11  | J1.10 |
| mpj_io[33] | IO_OUT[4]                    | Out | C8   | 12  | J1.9  |
| mpj_io[34] | IO_OUT[5]                    | Out | B8   | 13  | J1.4  |
| mpj_io[35] | IO_OUT[6]                    | Out | A8   | 14  | J1.3  |
| mpj_io[36] | IO_OUT[7]                    | Out | B9   | 15  | J1.2  |
| mpj_io[37] | READY                        | Out | A9   | 16  | J1.1  |

## Caravel Connections

Within the Caravel SoC, the TinyTapeout project has configured the user space into 250 sub-projects. Each project is interconnected by a scanchain that serpentine through the chip. Control of the 4-wire chain provides access to each project.

### [Caravel]

The scanchain topology has pros and cons, as would any interconnect scheme. This project presents an alternative topology based on a JTAG implementation. The advantage is a reduction in the length of the register latency.

## Scanchain V1

- Pro: Economical use of resources. Readily hardened in ASIC fabric. Testable on an FPGA platform.
- Con: Very long register chain (2,000) impacts overall acquisition rate.

## **Scanchain V2**

- Pro: Short register chain (10) minimizes latency. Economical use of resources. Testable on an FPGA platform.
- Con: Speed limited by length of multiplexer propagation (250 instances).

## **Multiplexer**

- Pro: Pure combinatorial output. Potential to be the fastest option.
- Con: Requires a large number of long routing resources. Internal tri-state busses not testable on an FPGA platform.

## **Tiny Project Configuration**

This tiny project embeds the revised scanchain (version 2) to investigate its low latency and testability. The penalty for this implementation is 10 clock cycles per transfer. The delay is mitigated by a serial UART expanding the internal registers of the audio generator. Four extra scanchain endpoints are included for demonstration purposes.

### [Project]

A clock generator is used to recover a bit-clock from asynchronous serial data. An external 16x baud clock is required by the design. This planned 4800 Hz project clock will permit a 300 baud serial link.

## **Scanchain Version 2**

The re-imagined scanchain uses a bypass technique commonly seen with JTAG devices. Each ‘tap’ in the chain routes data through a combinatorial multiplexer until the module is activated. Individual taps are assigned a unique 8-bit address during tapeout elaboration. A tap is activated when it receives a matching address message, enabling its’ 8-bit shift register. Unselected taps drive logic 0 into the projects’ inputs to keep them in a quiescent state.

### [Scanchain V2]

Encoding of serial data is compatible with the ubiquitous UART format. The waveform is one-start, eight-data, one-stop (300,8,N,1). Least significant bits are transmitted first. An immediate advantage is the use of a computer COM port to generate and analyze functional data. The serial interface is in addition to decoded parallel data available on the I/O ports.

## **ChipTune Operation**

The audio portion of the project consists of two rectangular pulse generators. Each module is controlled by four 8-bit registers. Configurable parameters are the frequency, duty cycle, sweep, decay, and note duration.

### [Operation]

The frequency range of the project is limited by the legacy scanchain, but mid-range frequencies are acceptable. Additional triangle and noise modules will be added in future work when more bandwidth is available.

## **Design For Test Considerations**

An isolated instance of scanchain version 2 has been tested on an FPGA platform with good results. A shift rate of 3.7 MHz permits communication with the computer at 115,200 baud. Longer scan chains do not affect throughput until multiplexer delays become dominant. For an FPGA, 75% of the timing delays are attributed to routing resources.

### [Test Configuration]

Output of the sub-project is always available at both the parallel output port and the serial data. A ‘Mode’ signal driven by the DTS line controls whether the project’s input is derived from the parallel input port or serial data.

## **Summary**

The next shuttle for TinyTapeout is planning a multiplexer for selecting between the 250 projects. This will alleviate latency in the present design. The revised scanchain offered here is an alternative for other group projects. The scanchain topology still holds merit in many applications, especially for the application of a computer-based logic analyzer via a USB COM port.

## **How to test**

The ChipTune project can be interfaced to a computer COM port at 300 baud. The project’s scanchain is updated at 9600 Hz such that a 4800 Hz UART clock is available at `io_in[4]`. The system clock on `io_in[0]` may also be 4800 Hz. An external bit clock must be provided on `io_in[5]` with a rising edge in the center of the bit period. This clock may be sourced from the bit-aligned reference clock on `io_out[5]`. Serial data is received on `io_in[7]`. Serial data is decoded by the device as an address when `io_in[6] = 0`, and as data when `io_in[6] = 1`. A 4-bit DAC output is available

on `io_out[4:0]`. The mode pin, `io_in[3]` must stay high for this version of the design.

## IO

| # | Input                            | Output                           |
|---|----------------------------------|----------------------------------|
| 0 | <code>io_in[0]</code> (CLOCK)    | <code>io_out[0]</code> (DAC_0)   |
| 1 | <code>io_in[1]</code> (IN_1)     | <code>io_out[1]</code> (DAC_1)   |
| 2 | <code>io_in[2]</code> (IN_2)     | <code>io_out[2]</code> (DAC_2)   |
| 3 | <code>io_in[3]</code> (MODE)     | <code>io_out[3]</code> (DAC_3)   |
| 4 | <code>io_in[4]</code> (BAUD_CLK) | <code>io_out[4]</code> (DAC_4)   |
| 5 | <code>io_in[5]</code> (TCK)      | <code>io_out[5]</code> (REF_CLK) |
| 6 | <code>io_in[6]</code> (TMS)      | <code>io_out[6]</code> (RTCK)    |
| 7 | <code>io_in[7]</code> (TDI)      | <code>io_out[7]</code> (TDO)     |

## 2 : 7 Segment Life

- Author: icegoat9
- Description: Simple 7-segment cellular automaton
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware: None. Could add debounced momentary pushbuttons in parallel with dip switches 1,2,3 to make loading in new patterns and stepping through a run easier.

### How it works

See the Wokwi gate layout and simulation. At a high level...

- Seven flip-flops hold the cellular automaton's internal state, which is also displayed in the seven-segment display.
- Combinatorial logic generates the next state for each segment based on its neighbors, according to the ruleset...
  - Living segments with exactly one living neighbor (another segment that touches it end to end) survive, and all others die.
  - Dead segments with exactly two living neighbors come to life.
- When either the system clock or a user toggling the clock input go high, this new state is latched into the automaton's state.
- There's minor additional support logic to let the user manually shift in an initial condition and handle clock dividing.

### How to test

For full details and a few 'exercises for the reader', see the github README doc link. But at a high level, assuming the IC is mounted on the standard tinytapeout PCB which provides dip switches, clock, and a seven-segment display for output...

- Set all dip switches off and the clock slide switch to the 'manual' clock side.
- Power on the system. An arbitrary state may appear on the 7-segment display.
- Set dip switch 4 on ('run mode').
- Toggle dip switch 1 on and off to advance the automaton to the next state, you should see the 7-segment display update.

If you want to watch it run automatically (which may quickly settle on an empty state or a static pattern, depending on start state)...

- Set the PCB clock divider to the maximum clock division (255). With a system clock of 6.25kHz, the clock input should now be ~24.5Hz.
- Set dip switches 5 and 7 on to add a reasonable additional clock divider (see docs for more details on a higher or lower divider).
- Set dip switch 4 on.
- Switch the clock slide switch to the ‘system clock’ side. The display should advance at roughly 1.5Hz if I’ve done math correctly.

If you want to load an initial state...

- Set dip switch 4 off ('load mode').
- Toggle dip switches 2 and/or 3 on and off seven times total, to shift in 0 and 1 values to the automaton internal state.
- Set dip switch 4 on and run manually or automatically as above.

## IO

| # | Input     | Output     |
|---|-----------|------------|
| 0 | clock     | 7segmentA  |
| 1 | load0     | 7segmentB  |
| 2 | load1     | 7segmentC  |
| 3 | runmode   | 7segmentD  |
| 4 | clockdiv8 | 7segmentE  |
| 5 | clockdiv4 | 7segmentF  |
| 6 | clockdiv2 | 7segmentG  |
| 7 | unused    | 7segmentDP |

### 3 : Another Piece of Pi

- Author: Meinhard Kissich, EAS Group, Graz University of Technology
- Description: This design takes up the idea of James Ross [1], who submitted a circuit to Tiny Tapeout 02 that stores and outputs the first 1024 decimal digits of the number Pi (including the decimal point) to a 7-segment display. In contrast to his approach, a densely packed decimal encoding is used to store the data. With this approach, 1400 digits can be stored and output within the design area of 150um x 170um. However, at 1400 decimals and utilization of 38.99%, the limitation seems to be routing. Like James, I'm also interested to hear about better strategies to fit more information into the design with synthesizable Verilog code. [1] [https://github.com/jar/tt02\\_freespeech](https://github.com/jar/tt02_freespeech)
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware: 7-segment display

#### How it works

The circuit stores each triplet of decimals in a 10-bit vector encoded as densely packed decimals. An index vector selects the current digits to be output to the 7-segment display. It consists of an upper part `index[11:2]` that selects the triplet and a lower part `index[1:0]` that specifies the digit within the triplet. First, the upper part decides on the triplet, which is then decoded into three decimals. Afterwards, the lower part selects one of the three decimals to be decoded into 7-segment display logic and applied to the outputs. The index is incremented at each primary clock edge. However, when the lower part equals three, i.e., `index[1:0]==1'b10`, two is added, as the triplet consists of three (not four) digits.

- `index == 'b0000000000|00: triplet[0], digit 0 within triplet`
- `index == 'b0000000000|01: triplet[0], digit 1 within triplet`
- `index == 'b0000000000|10: triplet[0], digit 2 within triplet`
- `index == 'b0000000001|00: triplet[1], digit 0 within triplet`
- `index == 'b0000000001|01: triplet[1], digit 1 within triplet`
- `index == 'b0000000001|10: triplet[1], digit 2 within triplet`

There is one exception to the rule above: the decimal point. Another multiplexer

at the input of the 7-segment decoder can either forward a digit from the decoded triplet or a constant – the decimal point. Once the lower part of the index counter, i.e., `index[1:0]` reaches 2'b10 for the first time, the multiplexer selects the decimal point and pauses incrementing the index for one clock cycle.

- `index == 'b0000000000|00`: triplet[0], digit 0 within triplet
- `index == 'b0000000000|01`: triplet[0], decimal point
- `index == 'b0000000000|01`: triplet[0], digit 1 within triplet
- `index == 'b0000000000|10`: triplet[0], digit 2 within triplet
- `index == 'b0000000001|00`: triplet[1], digit 0 within triplet

## How to test

For simulation, please use the provided testbench and Makefile. It is important to run the `genmux.py` Python script first, as it generates the test vectors required by the Verilog testbench. For testing the physical chip, release the reset and compare the digits of Pi against a reference.

## IO

| # | Input | Output      |
|---|-------|-------------|
| 0 | clk   | segment a   |
| 1 | reset | segment b   |
| 2 | none  | segment c   |
| 3 | none  | segment d   |
| 4 | none  | segment e   |
| 5 | none  | segment f   |
| 6 | none  | segment g   |
| 7 | none  | decimal LED |

## 4 : Wormy

- Author: nqbit
- Description: MC Wormy Pants squirms like a worm and grows just as fast.
- GitHub repository
- HDL project
- Extra docs
- Clock: 300 Hz
- External hardware:

### How it works

Wormy is a very simple, addictive last person video game. This last person, open-world game takes you down the path of an earthworm. Wormy's world is made up of a 4x4 grid represented by 3x16-bit arrays: Direction[0], Direction[1], and Occupied. The Direction[x] maps keep track of which way a segment of worm moves, if it is on, and Occupied keeps track of if the grid location is occupied.

Example Occupied Grid:

|  |   |   |   |   |  |
|--|---|---|---|---|--|
|  | X | X | X | X |  |
|  |   |   | X |   |  |
|  |   |   | X |   |  |
|  |   |   |   |   |  |

In addition to direction and occupied there are also pointers to the head and the tail. The same grid would look something like the following:

Example Occupied Grid with head(H) and tail(T) highlighted:

|  |   |   |   |   |  |
|--|---|---|---|---|--|
|  | T | X | X | X |  |
|  |   |   | X |   |  |
|  |   |   | H |   |  |
|  |   |   |   |   |  |

BOOM! Wormy shouldn't run into itself. If its head hits any part of its body, that causes a collision. There is a collision if the location of the Wormy's head is occupied by another segment of the Wormy. To determine this, we keep track of the worm location, and specifically the current and future locations of the worm head (H) and tail (T). If the future location of H will occupy a location that will already be occupied, this causes a collision.

This is made a bit trickier with growth (see below), because if the future state of H is set to occupy the current state of T - there is only a collision if growth is set to occur

in the next cycle, so be careful if you plan to have Wormy chase its tail! SPLAT!

NOM NOM! Wormy eats the tasty earth around it and grows. Every so often Wormy, after having eaten all of its lunch, grows a whole segment. During a growth cycle, the state of T simply persists (remains occupied and heading in the same direction as it is).

Last Person Input - see `user_input.v` To move Wormy along, the last player needs to push buttons to help Wormy find more tasty earth: up, right, down or left.

Buttons are nasty little bugs in general. When pushed the button generates an analog signal that might not look exactly like a single rising edge.

It might look something like this:

```
xxxxxxxxxxxxxxxxxxxx  
          xx  
          x  
          x  
          x      x  
          x          x  
          x          x  
          x          x  
          x          x  
          x      xx  x      x  
          x      xxxx      x  
          x      x  xx      x  
          x      x  x  x      x  
          x      x  x  x      x  
          xx      x  x  x      x  
          xxx      x  x  x      x  
          x  xx  x  x  x  x  
          x      x  x  x      x  x  
          x      x  x      xx  x  
          x      xx      x  x  
xxxxxxxxxxxxxxxxxxxx           x
```

In order to help protect our logic from this scary looking signal that might introduce metastability (<- WHAT?), we can filter it with a couple flippy-floppies and keep the metastability at bay. ARGH.

Once we have a clear pushed or not-pushed, we can suggest that Wormy move in a specific direction. If the last player tries button mashing, Wormy won't listen. Once a second Wormy checks what the last button press was and tries really hard to go that way (see BOOM!).

Earthworms don't have eyes - see `multiplexer.v` The game's display is made up of a 4x4

grid of LEDs controlled by a multiplexer. Why multiplexing? With a multiplexed LED setup, we can control more display units (LEDs), with a limited number of outputs (8 on this TinyTapeout project).

To get multiplexing working the network of outputs is mapped to each display unit. This allows us to manipulate assigned outputs to control the state of each display unit, one at a time. We then cycle through each display unit quickly enough to display a persistent image to the last player.

Wires (A1-4, B1-4) map to each location on the game arena (4x4 grid):

|       |    |    |    |    |
|-------|----|----|----|----|
| A1    |    |    |    |    |
| ----- |    |    |    |    |
| A2    |    |    |    |    |
| ----- |    |    |    |    |
| A3    |    |    |    |    |
| ----- |    |    |    |    |
| A4    |    |    |    |    |
| ----- |    |    |    |    |
|       | B1 | B2 | B3 | B4 |

When B's voltage is OFF the LED's state changes to ON if A is also ON. - ON LED state: A(ON) — >| — B(OFF) - OFF LED state: A(ON) — >| — B(ON)

Example: The 3 filled squares below each represent a Wormy segment in the ON state as controlled by the multiplexer. Notice how each is lighter than the last. This is because the multiplexer cycles through each LED to update the state, creating one persistent image even though the LEDS are not on over the entire period of time.

|       |    |    |    |    |
|-------|----|----|----|----|
| A1    |    | 0  |    |    |
| ----- |    |    |    |    |
| A2    |    | o  |    |    |
| ----- |    |    |    |    |
| A3    |    | .  |    |    |
| ----- |    |    |    |    |
| A4    |    |    |    |    |
| ----- |    |    |    |    |
|       | B1 | B2 | B3 | B4 |

Another Example: If you enter a dark cave and point a flashlight straight ahead at one point on the wall you have a very small visual field that is contained within the beam of light. However, you can expand your visual field in the cave by waving the flashlight back and forth across the wall. Despite the fact that the beam is moving over individual points on the wall, the entire wall can be seen at once. This is similar to the concept used in the Wormy display, since the multiplexer changes the state of

the worm occupied locations to ON one at a time, but in a cycle. The result is a solid image, made up of LEDs cycling through ON states to produce a persistent image of Wormy (that beautiful Lumbricina).

## How to test

After reset, you should see a single pixel moving along the display and it should grow every now and then.

## IO

| # | Input   | Output                                                                     |
|---|---------|----------------------------------------------------------------------------|
| 0 | clock   | A0 - Multiplexer channel A to be tied to a an array of 16 multiplexed LEDs |
| 1 | reset   | A1                                                                         |
| 2 | button0 | A2                                                                         |
| 3 | button1 | A3                                                                         |
| 4 | button2 | B0 - Multiplexer channel B to be tied to a an array of 16 multiplexed LEDs |
| 5 | button3 | B1                                                                         |
| 6 | none    | B2                                                                         |
| 7 | none    | B3                                                                         |

## 5 : Knight Rider Sensor Lights



Figure 3: picture

- Author: Kolos Koblasz
- Description: The logic asserts output bits one by one, like KITT's sensor lights in Knight Rider.
- GitHub repository
- HDL project
- Extra docs
- Clock: 6000 Hz
- External hardware: Connect LEDs with ~1K-10K Ohm serial resistors to output pins and connect push button switches to Input[2] and Input[3] which drive the inputs with logic zeros when idle and with logic 1 when pressed. Rising edge on these inputs selects the next settings.

### How it works

Uses several counters, shiftregisters to create a moving light. Input[2] and Input[3] can control speed and brightness respectively. Brightness control is achieved by PWM of the output bits at 50Hz. Simulated with 6KHz clock signal.

## How to test

After reset it starts moving the switched on LED. Input[0] is clk and Input[1] is reset (1=reset on, 0=reset off). By creating rising edges on Input[2] and Input[3] the two config spaces can be discovered. Connect LEDs with ~1K-10K Ohm serial resistors to output pins and connect push button switches to Input[2] and Input[3] which drive the inputs with logic zeros when idle and with logic 1 when pressed. Rising edge on these inputs selects the next settings.

## IO

| # | Input              | Output |
|---|--------------------|--------|
| 0 | clock              | LED 0  |
| 1 | reset              | LED 1  |
| 2 | speed control      | LED 2  |
| 3 | brightness control | LED 3  |
| 4 | none               | LED 4  |
| 5 | none               | LED 5  |
| 6 | none               | LED 6  |
| 7 | none               | LED 7  |

## 6 : Single digit latch

- Author: Dylan Garrett
- Description: Store a single digit 0-9 and display it on a 7-segment display
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works

### How to test

Explain how to test your project

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | clock | segment a |
| 1 | reset | segment b |
| 2 | none  | segment c |
| 3 | none  | segment d |
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | none  | segment g |
| 7 | none  | dot       |

## 7 : 4x4 Memory

- Author: Yannick Reiß
- Description: Store 4x4 bits of memory.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

On write enable, the four data-inputs are saved to the d-flipflops selected by the address inputs. The output is always read from the current selected flipflops.

### How to test

Connect a clock, buttons for reset and write\_enable and switches for address and data like shown below. The output can be read by using 4 LEDs or any other kind of binary output device.

### IO

| # | Input        | Output |
|---|--------------|--------|
| 0 | clock        | data1  |
| 1 | reset        | data2  |
| 2 | write_enable | data3  |
| 3 | addr1        | data4  |
| 4 | addr2        | none   |
| 5 | data1        | none   |
| 6 | data2        | none   |
| 7 | data3        | none   |

## 8 : KS-Signal

- Author: Yannick Reiß
- Description: Set KS-Signal based on track information.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 10000 Hz
- External hardware: Input: Clock, 9 Buttons, Output: 3 yellow LEDs, 3 white LEDs, 1 green LED, 1 red LED, 1 orange LED

### How it works

Simply switches lamps on and off on toggle.

### How to test

Connect the clock, and the buttons as inputs. Connect the LEDs as outputs in the order shown below.

### IO

| # | Input                     | Output                            |
|---|---------------------------|-----------------------------------|
| 0 | track1_free               | Fahrt! (Green)                    |
| 1 | track2_free               | Halt Erwarten! (Orange)           |
| 2 | pre_signal                | Halt! (Red)                       |
| 3 | none                      | Vorsicht! (All yellow)            |
| 4 | none                      | pre signal indicator (white)      |
| 5 | allow shunting            | lower shunting indicator (white)  |
| 6 | Vorsicht!                 | shorter breaking distance (white) |
| 7 | shorter breaking distance | none                              |



Figure 4: picture

## 9 : Hovalaag CPU

- Author: Mike Bell
- Description: Implementation of the CPU from HOVALAAG
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware:

### How it works

HOVALAAG (Hand-Optimizing VLIW Assembly Language as a Game) is a free Zachlike game.

This is an implementation of the VLIW processor from the game. Thank you to @nothings for the fun game, making the assembler public domain, and for permission to create this hardware implementation.

The processor uses 32-bit instructions and has 12-bit I/O. The instruction and data are therefore passed in and out over 10 clocks per processor clock.

More details in the github repo.

### How to test

The assembler can be downloaded to generate programs.

The subcycle counter can be reset independently of the rest of the processor, to ensure you can get to a known state without clearing all registers.

### IO

| # | Input                                                | Output   |
|---|------------------------------------------------------|----------|
| 0 | Clock                                                | Output 0 |
| 1 | Reset disable (resets enabled when low)              | Output 1 |
| 2 | Input 0 or Reset (when high)                         | Output 2 |
| 3 | Input 1 or Reset subcycle count (when high)          | Output 3 |
| 4 | Input 2 or enable ROSC (when high and reset enabled) | Output 4 |
| 5 | Input 3                                              | Output 5 |
| 6 | Input 4                                              | Output 6 |
| 7 | Input 5                                              | Output 7 |

## 10 : SKINNY SBOX

- Author: Niklas Fassbender
- Description: Implementation of a 4-Bit Sbox for SKINNY
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works

### How to test

Explain how to test your project

### IO

| # | Input     | Output    |
|---|-----------|-----------|
| 0 | bit_0_lsb | bit_0_lsb |
| 1 | bit_1     | bit_1     |
| 2 | bit_2     | bit_2     |
| 3 | bit_3_msb | bit_3_msb |
| 4 | none      | none      |
| 5 | none      | none      |
| 6 | none      | none      |
| 7 | none      | none      |

## 11 : Stateful Lock

- Author: Tim Henkes
- Description: A little combination lock which requires three codes in the correct order to unlock
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

A little two-bit state machine decides which code is required to enter the next state. The fourth state equals to the lock being open. A wrong code in any state resets to the first state.

### How to test

To test the project, refer to its Wokwi, which is public.

### IO

| # | Input        | Output      |
|---|--------------|-------------|
| 0 | unused       | lock status |
| 1 | reset        | unused      |
| 2 | enter        | unused      |
| 3 | code digit 0 | unused      |
| 4 | code digit 1 | unused      |
| 5 | code digit 2 | unused      |
| 6 | code digit 3 | unused      |
| 7 | code digit 4 | unused      |

## 12 : Ascon's 5-bit S-box

- Author: Fabio Campos
- Description: Ascon's 5-bit S-box
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

See Figure 4a in <https://eprint.iacr.org/2021/1574.pdf>

### How to test

See Section 3.2 and Table 4 in <https://eprint.iacr.org/2021/1574.pdf>

### IO

| # | Input | Output |
|---|-------|--------|
| 0 | x0    | x0     |
| 1 | x1    | x1     |
| 2 | x2    | x2     |
| 3 | x3    | x3     |
| 4 | x4    | x4     |
| 5 | none  | none   |
| 6 | none  | none   |
| 7 | none  | none   |

## 13 : 8bit configurable galois Ifsr

- Author: Alexander Schönborn
- Description: A 8bit configurable galois Ifsr.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

Uses 2 cycles each to set the shift and xor mask registers. It has 3 operation modes: normal shift,

### How to test

After reset, set the shift register and xor state, after that normal shift mode

### IO

| # | Input                                                           | Output                                       |
|---|-----------------------------------------------------------------|----------------------------------------------|
| 0 | clock                                                           | output[0] lsb normal Ifsr output             |
| 1 | reset                                                           | output[1] other 7 bits to see the full state |
| 2 | mode[0] 00 normal shift mode, 01 set register mode,             | output[2]                                    |
| 3 | mode[1] 10 set mode registers, 11 unused                        | output[3]                                    |
| 4 | data_in[0] is used for both filling register and xor mask state | output[4]                                    |
| 5 | data_in[1] needs 2 cycles to fill all 8 bits                    | output[5]                                    |
| 6 | data_in[2] first cycle is lower 4 bits, 2nd upper 4 bits        | output[6]                                    |
| 7 | data_in[3] see above                                            | output[7]                                    |

## 14 : Sbox SKINNY 8 Bit

- Author: Thorsten Knoll
- Description: A circuit for the substitution of 8 bits. Made for the SKINNY cipher algorithm.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works

### How to test

Explain how to test your project

### IO

| # | Input | Output |
|---|-------|--------|
| 0 | in_0  | out_0  |
| 1 | in_1  | out_1  |
| 2 | in_2  | out_2  |
| 3 | in_3  | out_3  |
| 4 | in_4  | out_4  |
| 5 | in_5  | out_5  |
| 6 | in_6  | out_6  |
| 7 | in_7  | out_7  |

## 15 : BinaryDoorLock

- Author: Marcus Michaely
- Description: Input is 8-Bit and only one combination opens the door
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works

### How to test

Explain how to test your project

### IO

| # | Input | Output |
|---|-------|--------|
| 0 | Bit_0 | Output |
| 1 | Bit_1 | none   |
| 2 | Bit_2 | none   |
| 3 | Bit_3 | none   |
| 4 | Bit_4 | none   |
| 5 | Bit_5 | none   |
| 6 | Bit_6 | none   |
| 7 | Bit_7 | none   |

## 16 : bad apple

- Author: shadow1229
- Description: Plays bad apple over a Piezo Speaker connected across io\_out[1:0]. Based on <https://github.com/meriac/tt02-play-tune>
- GitHub repository
- HDL project
- Extra docs
- Clock: 12000 Hz
- External hardware: Piezo speaker connected across io\_out[1:0]

### How it works

Converts an RTTL ringtone into verilog and plays it back using differential PWM modulation.

### How to test

Provide 12kHz clock on io\_in[0], briefly hit reset io\_in[1] (L->H->L) and io\_out[1:0] will play a differential sound wave over piezo speaker (Bad Apple)

### IO

| # | Input | Output          |
|---|-------|-----------------|
| 0 | clock | piezo_speaker_p |
| 1 | reset | piezo_speaker_n |
| 2 | none  | none            |
| 3 | none  | none            |
| 4 | none  | none            |
| 5 | none  | none            |
| 6 | none  | none            |
| 7 | none  | none            |

## 17 : TinyFPGA attempt for TinyTapeout3

- Author: Emilian Miron
- Description: FPGA attempt
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

TODO

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 18 : 4bit Adder

- Author: Carin Schreiner
- Description: This tiny tape out project takes two four bit numbers and adds them.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

The halfadder uses simple logic gates.

### How to test

Input bits 0-3 are used for the first number and bits 4-7 for the second number. The output bits 0-3 are the resulting number an bit 4 the carry. All numbers should be in little endian format.

### IO

| # | Input | Output |
|---|-------|--------|
| 0 | a0    | r0     |
| 1 | a1    | r1     |
| 2 | a2    | r2     |
| 3 | a3    | r3     |
| 4 | b0    | carry  |
| 5 | b1    | none   |
| 6 | b2    | none   |
| 7 | b3    | none   |

## 19 : 12-bit PDP8

- Author: Paul Campnell
- Description: PDP8 core
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

This is a 12-bit basic PDP8 cpu - it doesn't have the extended arithmetic unit (so no multiply or divide). Included is an assembler (mostly for test). Bus interface is a 5-clock to get 12 bits of address and 12 bits of data though 8-bit interfaces. Address is 2 beats of 6 bits each, data is 3 beats of 4 bits each, I/O cycles have an extra beat

|                 |   |   |   |   |   |   |   |                 |                        |
|-----------------|---|---|---|---|---|---|---|-----------------|------------------------|
| output bits     |   |   |   |   |   |   |   |                 |                        |
| 7               | 6 | 5 | 4 | 3 | 2 | 1 | 0 |                 |                        |
| 1               | 0 | A | A | A | A | A | A |                 | address hi             |
| 1               | 1 | A | A | A | A | A | A |                 | address lo             |
| 0 1 1 I I 4 2 1 |   |   |   |   |   |   |   | I/O cycle intro |                        |
| either          |   |   |   |   |   |   |   |                 |                        |
| 0               | 0 | 0 | 0 | - | - | - | - | -               | read data high nibble  |
| 0               | 0 | 1 | 0 | - | - | - | - | -               | read data med nibble   |
| 0               | 1 | 0 | 0 | - | - | - | - | -               | read data low nibble   |
| or              |   |   |   |   |   |   |   |                 |                        |
| 0               | 0 | 0 | 1 | D | D | D | D |                 | write data high nibble |
| 0               | 0 | 1 | 1 | D | D | D | D |                 | write data med nibble  |
| 0               | 1 | 0 | 1 | D | D | D | D |                 | write data low nibble  |

Input bits are ignored except during read beats, interrupts are sampled during the first address beat

### How to test

code in test-bench, assembler in asm dir

### IO

| # | Input     | Output                        |
|---|-----------|-------------------------------|
| 0 | clock     | address_0_data_mux_0          |
| 1 | reset     | address_1_data_mux_1          |
| 2 | none      | address_2_data_mux_2          |
| 3 | none      | address_3_data_mux_3          |
| 4 | data_in_0 | address_4_rw_mux              |
| 5 | data_in_1 | address_5_phase_lo_select_mux |
| 6 | data_in_2 | phase_hi_select               |
| 7 | data_in_3 | address_data_select           |

## 20 : CTF - Catch the fish

- Author: Carin Schreiner
- Description: Catch the fish is a whac-a-mole game.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

The project consists of \* a fibonacci lfsr of degree 6. \* a simple timer: the timer consists of two flip flops. It slows down the game and further amplifies the pseudo-randomness \* a button press detection for each button \* a simple reward detection \* Output state Handlers \* Output Selection: The last bit of the lsfr is used as the decision whether to give an output or not. The second and third to last bits are used to determine on which pin the output should be given. If both bits are zero, no output is given.

### How to test

To play the game, press the start button and make sure the clock is set to a frequency of one or two. The higher the frequency the more difficult the game gets. To score, you need to press the button while the respectivly numbered feedback output is one. You then get a reward feedback

### IO

| # | Input    | Output          |
|---|----------|-----------------|
| 0 | clock    | feedback 1      |
| 1 | button 1 | feedback 2      |
| 2 | button 2 | feedback 3      |
| 3 | button 3 | reward feedback |
| 4 | none     | none            |
| 5 | none     | none            |
| 6 | none     | none            |
| 7 | none     | none            |

## 21 : Dot operation calculator

- Author: Yannick Reiß
- Description: Can calculate the result for 3 bit multiplication and division.
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware: 7 switches, 6 leds

### How it works

It takes input 1 to determine operator and two 3 bit inputs as operands. The result is put to output 0-5. In multiplication mode it just outputs the number binary encoded. In division mode the output pins 0-2 are the quotient and 3-5 are the remainder.

### How to test

Connect input 1-7 with switches and output 0-6 with leds.

### IO

| # | Input      | Output            |
|---|------------|-------------------|
| 0 | clk_dummy  | product/quotient  |
| 1 | opcode     | product/quotient  |
| 2 | operand1_1 | product/quotient  |
| 3 | operand1_2 | product/remainder |
| 4 | operand1_3 | product/remainder |
| 5 | operand2_1 | product/remainder |
| 6 | operand2_2 | none              |
| 7 | operand2_3 | none              |

## 22 : RiscV Scan Chain based CPU – block 1 – clocking

- Author: Emilian Miron
- Description: RiscV Scan Chain based CPU – block 1 – clocking
- GitHub repository
- HDL project
- Extra docs
- Clock: 20000 Hz
- External hardware:

### How it works

TODO

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 23 : RiscV Scan Chain based CPU – block 2 – instructions

- Author: Emilian Miron
- Description: RiscV Scan Chain based CPU – block 2 – instructions
- GitHub repository
- HDL project
- Extra docs
- Clock: 20000 Hz
- External hardware:

### How it works

TODO

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 24 : RiscV Scan Chain based CPU – block 3 – registers

- Author: Emilian Miron
- Description: RiscV Scan Chain based CPU – block 3 – registers
- GitHub repository
- HDL project
- Extra docs
- Clock: 20000 Hz
- External hardware:

### How it works

TODO

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 25 : RiscV Scan Chain based CPU – block 4 – ALU

- Author: Emilian Miron
- Description: RiscV Scan Chain based CPU – block 4 – ALU
- GitHub repository
- HDL project
- Extra docs
- Clock: 20000 Hz
- External hardware:

### How it works

TODO

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 26 : 31b-PrimeDetector



Figure 5: picture

- Author: Aidan McNay
- Description: Detects whether a 31-bit number is prime or not
- GitHub repository
- HDL project
- Extra docs
- Clock: 50000 (though could probably be faster) Hz
- External hardware: External Clock, 5 buttons for inputs, 2 LEDs for outputs

### How it works

The 31-bit Prime Detector takes in a 31-bit number (shifted in serially). Once the number is obtained, the FSM control logic takes over. It attempts to divide the value by all numbers less than it; if it finds one that divides evenly, the logic stops and declares the number not prime. If it doesn't divide evenly by any of these, the number is declared prime.

Due to space constraints, the design uses an iterative divider and FSM logic to minimize space usage. Further information can be taken from the README.md on the GitHub page

## How to test

This design requires an external clock. Before testing, the design should be reset with the appropriate pin (active high), which resets the stored value to 0. To shift in a value, use the SPI-like interface; when the CS line is enabled (active low), on rising edges of SCLK, the data present at SDI is shifted in. Data is shifted into the LSB, and progressively shifted to more significant bits as new data is received (with the data at MSB being shifted out and disregarded).

Once you have the desired number stored, start the calculations by enabling the ready pin (active high). Note that the stored value cannot change while calculations are ongoing.

Once the calculations are finished, the done pin will be driven high. The result will be shown on the is\_prime pin; a value of 1 indicates that the value inputted is prime.

## IO

| # | Input | Output   |
|---|-------|----------|
| 0 | clock | done     |
| 1 | reset | is_prime |
| 2 | SDI   | waiting  |
| 3 | SCLK  | GND      |
| 4 | CS    | GND      |
| 5 | ready | GND      |
| 6 | NC    | GND      |
| 7 | NC    | GND      |

## 27 : XOR Stream Cipher



Figure 6: picture

- Author: Luke Vassallo
- Description: An two channel XOR stream cipher with fully programmable 32-bit galois LFSRs.
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware: General puporse FPGA with PMOD connector for detailed testing./

### How it works

An XOR (exclusive or) cipher is a type of encryption that uses a bitwise exclusive or operation to combine a plaintext message with a secret key. This process generates a ciphertext that can only be decrypted by someone with the same secret key. XOR ciphers are commonly used in computer security and are popular due to their simplicity and efficiency. They can be implemented in hardware using XOR cipher chips, which typically use a Galois LFSR (linear feedback shift register) to generate a key stream that is XORed with the plaintext to produce the ciphertext. XOR ciphers are considered relatively secure as long as the secret key is kept secret and is not easily guessable.

The system uses a Galois Linear Feedback Shift Register (LFSR) to produce a key stream that is combined with the incoming bitstream through an XORing process to

create the cipher stream that appears at the output. A concurrent channel is additionally provided such that a transmission and reception bitstream can be encrypted and decrypted simultaneously. When the system is reset, a default configuration is applied that will encrypt the bitsream on tx\_p and decrypt the bitsream on rx\_e when te\_en and rx\_en are respectively driven high.

To configure the chip (optional), a 130-bit configuration vector is serial shifted through the pin cfg\_i while the current configuration is simultaneously outputted on pin cfg\_o. This configuration method functions as a lengthy shift register with its input connected to cfg\_i and its output connected to cfg\_o. It only operates synchronously to clk when cfg\_en is asserted. The configuration vector consists of bits 95->64, 31->0 for the tx/rx LFSR state, 127->96, 63->32 bits for the tx/rx LFSR taps. Power-on-reset state has the taps configured for PRBS-31 and LFSR state holding 0x55. Bit 128 is used to internally route the output bitstreams through an XOR that returns the original bitstreams provided at the input albeit placed on debug output pins (disabled by default). Bit 129 selects between the internal or externally provided plaintext generator (default).

## How to test

Reset the module and optionally configure the LFSR taps, state and other options using the serial shift configuration register. When the enable pin (tx\_en/rx\_en) is asserted the corresponding channel is activate and the plain/cipher text bitstream will be encrypted/decrypted. Detailed simulation, FPGA and ASIC design examples are available on GitHub (<https://github.com/LukeVassallo/tt03-xor-cipher>). The GitHub repository is mirrored on my private GitLab instance (<https://gitlab.lukevassallo.com/luke/tt03-xor-cipher>) that incorporates automated builds and pre-built bitstreams.

## IO

---

| # | Input                                                            | Output                                                           |
|---|------------------------------------------------------------------|------------------------------------------------------------------|
| 0 | clk (12.5KHz system clock)                                       | tx_e ( Encrypted bitstream for transmission)                     |
| 1 | rst (Active high synchronous reset)                              | dbg_tx_p (Decrypted transmit bitstream, pin disabled by default) |
| 2 | tx_p (Plaintext bitstream for transmission)                      | dbg_rx_e (Encrypted receive bitstream, pin disabled by default)  |
| 3 | cfg_i (Configuration input to the 130-bit serial shift register) | rx_p (Decrypted bitstream for reception)                         |
| 4 | cfg_en (Active high configuration enable)                        | cfg_o (Configuration output from the 130-bit shift register)     |

| # | Input                                    | Output                                    |
|---|------------------------------------------|-------------------------------------------|
| 5 | tx_en (Transmit channel enable)          | heartbeat[7] (bit from heartbeat counter) |
| 6 | rx_e (Encrypted bitstream for reception) | heartbeat[8] (bit from heartbeat counter) |
| 7 | rx_en (Receive channel enable)           | heartbeat[9] (bit from heartbeat counter) |

## 28 : LED Panel Driver



Figure 7: picture

- Author: Tom Keddie
- Description: Drives a 16x16 P10 LED panel
- GitHub repository
- HDL project
- Extra docs
- Clock: 6000 Hz
- External hardware: led panel, level converter to 5V logic

## How it works

- The circuit updates half of a P10 16x16 LED display module
- It initially displays the string TT03
- It provides a 600baud uart input to
  - paint pixels
  - erase pixels
  - clear the display
  - change the displayed colour
- Functionality is limited by resource availability and clock rate
  - single colour at once
  - no double buffer, updates may have artifacts
  - refresh rate is low, some flicker is observed ( $16*8=128$  pixels at 6kHz is ~46Hz, plus overhead)
- Mode pin to allow for 2 different clocking patterns

## How to test

- Connect the display module as per the outputs
- Connect the uart
- Power on and see the TT03 text
- If the display is swapped by quadrant change the mode pin
- Use the script(s) in the software directory to control the display

## IO

| # | Input | Output |
|---|-------|--------|
| 0 | clock | red    |
| 1 | reset | blue   |
| 2 | uart  | b      |
| 3 | mode  | blank  |
| 4 | none  | green  |
| 5 | none  | a      |
| 6 | none  | clk    |
| 7 | none  | latch  |

## 29 : 6-bit FIFO

- Author: Mike Bell
- Description: Implementation of a FIFO
- GitHub repository
- HDL project
- Extra docs
- Clock: 50000 Hz
- External hardware:

### How it works

The design implements a 52 entry 6-bit FIFO. The oldest 4 entries are accessible by setting the peek address. The first 48 entries in the FIFO are implemented as a chain of latches. This allows for high data density in the limited area. The last 4 entries in the FIFO are implemented by a ring of flip-flops. This allows random access to the last four entries.

Because of the way the chain of latches works, when entries are popped from the FIFO it takes time for data in the latch part of the FIFO to move down the chain. If the FIFO is fairly empty this shouldn't be noticeable, but when the FIFO is quite full this can cause the FIFO to refuse writes even though it is not full.

A ready output indicates whether the latch chain is ready for more data. If the FIFO is completely full and one entry is popped then it takes 48 cycles after the pop for the chain to be ready again. Therefore, if ready stays low when the FIFO is clocked 49 or more times without any data being popped then the FIFO is full.

There are minimal delays on the read side - if any data is in the FIFO then it can always be read, this works because empty latch entries can pass the data through them without needing to be clocked. The only exception to this is that due to input buffering newly written entries take 2 cycles to appear on the output.

### How to test

New entries, taken from inputs 2-7, are written to the FIFO on a rising clock edge when input 1 is high. Data writes are ignored when the Ready output is low.

Because of limited inputs, the write enable is used to determine the mode of inputs 2-5. When not writing, these control reset (active low), pop (active high) and the peek address.

The peek address controls whether the oldest to 4th oldest entry in the FIFO is presented on the data outputs.

Reading the oldest entry when the FIFO is empty always reads 0. However, peeking at previous entries when the FIFO is empty or has fewer occupied entries reads stale data - the values should not be relied upon.

The oldest entry is popped from the FIFO by setting pop. It is valid to set the peek address to a non-zero value when popping, but it is always the oldest entry that is popped, which is not the entry being read if peek address is non-zero. The peek address is considered prior to the pop.

When writing, it is always the last entry in the FIFO that is read (peek address is considered to be zero). The new value written when the FIFO is empty is not presented on the outputs until the following cycle.

## IO

| # | Input               | Output  |
|---|---------------------|---------|
| 0 | Clock               | Ready   |
| 1 | Mode (Write enable) | Empty_n |
| 2 | Reset_n / Data 0    | Data 0  |
| 3 | Pop / Data 1        | Data 1  |
| 4 | Peek A0 / Data 2    | Data 2  |
| 5 | Peek A1 / Data 3    | Data 3  |
| 6 | Unused / Data 4     | Data 4  |
| 7 | Unused / Data 5     | Data 5  |

## 30 : ezm\_cpu

- Author: guianmonezm#4787
- Description: basic 8bit CPU
- GitHub repository
- HDL project
- Extra docs
- Clock: 6000 Hz
- External hardware: FPGA or ram and demultiplexer

### How it works

you connect an external ram and a decoder to it or an FPGA with a rom programed.

### How to test

connect an external ram and a decoder to it.

### IO

| # | Input | Output  |
|---|-------|---------|
| 0 | clock | pc[0]/c |
| 1 | reset | pc[1]/c |
| 2 | in1   | pc[2]/c |
| 3 | in2   | pc[3]/c |
| 4 | in3   | pc[4]/c |
| 5 | in4   | pc[5]/c |
| 6 | in5   | pc[6]/c |
| 7 | in6   | pc[7]/c |

## 31 : tiny logic analyzer

- Author: yubex
- Description: The design samples one data input and shows the current state and edge events using the 7 segment display.
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware: no external HW required

### How it works

The samples of the data input pin are shifted into a shift register first. After that the 2 most significant bits of the shift register are used to detect the current signal state (high, low) and the edge events (rising edge, falling edge). There are 4 different states/events displayed on the 7 segment display.

| state/event  | segments on |
|--------------|-------------|
| high         | a           |
| low          | d           |
| rising edge  | e and f     |
| falling edge | b and c     |

The edge events duration is extended by counters, so the events can actually be seen by the human eye.

### How to test

You can test by using the dip switch connected to io\_in[2] as data input. I plan to connect a wire to the pin on the PCB by PMOD connector or custom soldering to be able to have a “measurement probe”. ;-)

### IO

| # | Input      | Output    |
|---|------------|-----------|
| 0 | clock      | segment a |
| 1 | reset      | segment b |
| 2 | data input | segment c |
| 3 | none       | segment d |

| # | Input | Output    |
|---|-------|-----------|
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | none  | segment g |
| 7 | none  | dot       |

## 32 : 4-bit ALU

- Author: ReJ aka Renaldas Zioma
- Description: Digital design for a 4-bit ALU supporting 8 different operations and built-in 4-bit accumulator register
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware: push-button, debouncer, DIP-switch, 5 LEDs

### How it works

Each clock cycle ALU performs one of the 8 possible operations and stores result in the 4-bit accumulator register.

accumulator [4 bit] = accumulator [4 bit] (operation) operand [4 bit]

Supported operations: lda imm :: imm -> accumulator  
neg imm :: 0x0F - imm -> accumulator  
shr :: accumulator / 2 -> accumulator  
sub imm :: accumulator - imm -> accumulator  
and imm :: accumulator & imm -> accumulator  
xor imm :: accumulator ^ imm -> accumulator  
or imm :: accumulator | imm -> accumulator  
add imm :: accumulator add imm -> accumulator  
+ imm -> accumulator

Matrix mapping of operation opcode to internal control signals muxA muxB muxC  
AtoX negX setC outC invC 000 lda -- 1 0 0 - 0 - 001 neg -- 1 0 1 - 0 - 010 shr -- 1  
1 0 - 0 - 011 sub 1 1 0 0 1 1 1 1 100 and 0 0 0 0 0 - 0 - 101 xor 0 1 0 0 0 - 0 - 110  
or 1 0 0 0 0 - 0 - 111 add 1 1 0 0 0 0 1 0

### How to test

The following diagram shows a simple test setup that can be used to test ALU





To reset ALU set all input pins to 0 which corresponds to lda 0 operation loading Accumulator register with 0.

## IO

| # | Input           | Output                              |
|---|-----------------|-------------------------------------|
| 0 | clock           | accumulator value 0th bit           |
| 1 | opcode 0th bit  | accumulator value 1st bit           |
| 2 | opcode 1st bit  | accumulator value 2nd bit           |
| 3 | opcode 2nd bit  | accumulator value 3rd bit           |
| 4 | operand 0th bit | {‘unused (TODO’: ‘negative flag’)’} |
| 5 | operand 1st bit | {‘unused (TODO’: ‘overflow flag’)’} |
| 6 | operand 2nd bit | {‘unused (TODO’: ‘zero flag’)’}     |
| 7 | operand 3rd bit | carry flag                          |

### 33 : Pulse-Density Modulators



Figure 8: picture

- Author: Marco Merlin
- Description: An implementation of a DDPM, PWM and Sigma-Delta Pulse-Density Modulators with python libraries myHDL and PuEDA.
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware: clock source

#### How it works

This project implements three different architectures of Pulse Density Modulators (PDM), to compare performances and implementation complexity of the different schemes.

PDM modulators are of particular interest because exploiting oversampling, they allow to implement Digital-to-Analog-Conversion (DAC) schemes with an equivalent resolu-

tion of multiple bits, based on a single-bit digital output, by means of a straightforward analogue low-pass filter.

The PDM modulators implemented in this project are the following: 1) Pulse Width Modulation (PWM) 2) Dyadic Digital Pulse Modulation (DDPM) 3) Sigma-Delta (SD)

PWM is arguably the simplest and possibly most widespread PDM technology that is praised for its low complexity.

DDPM [1][2][3] is a type of digital modulation technique in which the pulse width are quantized in a dyadic manner, meaning that they are quantized in powers of two, which allows for efficient implementation using binary arithmetic. As a consequence, DDPM modulators are relatively inexpensive to deploy, with a complexity comparable to that of widespread Pulse-Width Modulation (PWM) modulators.

SD modulators are perhaps the best-performing PDM technology, that is particularly well suited for higher resolution data conversion, and typically find use in audio DACs and fractional PLLs. In spite their good performances, SD modulators are Infinite-Impulse-Response (IIR) closed-loop systems which behavior depends on the input signal, resulting in a number of concerns (most notably stability), that typically require careful modeling of the systems and condition under which they will be required to operate.

This design is separated into two sections: 1) 6 bits resolution instance of PWM, DDPM and SD fed by a static DC value from the input pins 2) 8 bits resolution instance of PWM, DDPM and SD fed sine look-up-table (LUT) that allows to evaluate the spectral content of the modulated signals.

## How to test

Common: The circuit needs to be fed with a clock on pin `io_in[0]`. Reset signal needs to be released by raising to 1 pin `io_in[1]`.

Static DC:

The input `inval` of the first set of DC modulators is fed through pins `io_in[7:2]`. The low-passed dc component of the outputs on pins `io_out[0]` (PWM), `io_out[1]`, and `io_out[2]` is proportional to the decimal value of the input `inval`.

Sinusoidal output: The low-passed outputs on pins `io_out[4]` (PWM), `io_out[5]`, and `io_out[6]` is a sinusoidal wave. When clocking the chip with a clock frequency of 12.5kHz, the frequency of the sine is of 0.76z ( $f_{sin} = f_{clock} / 2^{14}$ ), so that it should be visible at naked eye. The different designs should achieve an ENOB of 8 bits in the band 0-40Hz, with different level of out-of-band emission between each other.

## IO

| # | Input    | Output                |
|---|----------|-----------------------|
| 0 | clock    | pwm / segment a       |
| 1 | resetn   | ddpm / segment b      |
| 2 | inval[0] | sd / segment c        |
| 3 | inval[1] | 0'b1 / segment d      |
| 4 | inval[2] | pwm_sine / segment e  |
| 5 | inval[3] | ddpm_sine / segment f |
| 6 | inval[4] | sd_sine / segment g   |
| 7 | inval[5] | 0'b1 / dot            |

## 34 : CRC Decelerator



Figure 9: picture

- Author: Grant Hernandez (@grant-h)
- Description: A reconfigurable CRC engine
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

*“The world’s slowest CRC!”*

The Cyclic Redundancy Check Decelerator is a reconfigurable CRC block that can be programmed to calculate any CRC up to 32-bits with arbitrary length, streamed input data. Since TinyTapeout 3 (TT03) I/O speeds are low, its unlikely that this CRC engine will be faster than the CPU/microcontroller streaming in data, hence “decelerator”.

This TT03 project is the follow up to my earlier, full-custom VLSI version of a CRC-32 datapath, built in Cadence, and fabricated using the MOSIS service while attending my university. Read more about my original the CRC-32 design and check out the die shots.

[3D-View]

### I/O Interface

The CRC IP has 8 input pins. Two are for the clk (io\_in[1]) and reset (rst / io\_in[1]). Then there is a two-bit command input, cmd (io\_in[3:2]), and the remaining 4-bits are for data input, data\_in (io\_in[7:4]). The design uses

only the positive edge of the clock and has a synchronous, active high, reset line. The data input is limited to passing a single nibble at a time. How this data is used depends on the current command.

Here is a table showing the I/O pins:

| # | Input      | Output      |
|---|------------|-------------|
| 0 | clk        | data_out[0] |
| 1 | rst        | data_out[1] |
| 2 | cmd[0]     | data_out[2] |
| 3 | cmd[1]     | data_out[3] |
| 4 | data_in[0] | data_out[4] |
| 5 | data_in[1] | data_out[5] |
| 6 | data_in[2] | data_out[6] |
| 7 | data_in[3] | data_out[7] |

There are 4 supported commands:

| #     | Name        | Description                                                                                                                                                |
|-------|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2'b00 | CMD_RESET   | Restarts the CRC calculations using the parameters from the last SETUP bitstream                                                                           |
| 2'b01 | CMD_SETUP   | Streams in a CRC-bitwidth dependent bitstream to configure the CRC parameters                                                                              |
| 2'b10 | CMD_MESSAGE | Stream in a message to CRC 4-bits at a time. First cycle is lower 4-bits. Second cycle is upper 4-bits. Wait 8 cycles for byte to be processed.<br>Repeat. |
| 2'b11 | CMD_FINAL   | Continually stream out the final CRC value 8-bits at a time in a loop until deasserted                                                                     |

The overall flow is, a SETUP bitstream containing the CRC bitwidth, reflect in/out parameters, CRC poly, initial value, and XOR out is streamed in. Then the MESSAGE is streamed in 4-bits at a time until the message is complete. Finally, the FINAL is asserted and the final CRC value is streamed out on the output pins. To restart another CRC fresh, send RESET or resume adding additional data to the existing CRC by using MESSAGE.

## CRC Setup Bitstream

The most complex portion of the using this CRC IP is streaming in the configuration bitstream. This bitstream can be from 20-bits in a CRC-4 or less case and up to 104 bits in a CRC-32 case. The bitstream format is roughly: [config\_lo - 4 bits] [config\_hi - 4 bits] [poly - 4N bits] [init - 4N bits] [xor - 4N-bits]. Each nibble is packed with the MSB on data\_in[3]. config\_lo is 4-bits and defines the first 4-bits of the CRC bitwidth bitwidth[3:0]. config\_hi is also 4-bits and it contains the top-2 bits of the bitwidth and the reflect\_out and reflect\_in parameters of the CRC: [bitwidth[5]] [bitwidth[4]] [reflect\_out] [reflect\_in].

This initial configuration is always 8-bits, but the remaining bitstream is variable length dependent on the bitwidth. Following the initial parameters is the poly, init value, and XOR out values. These are equal length and are streamed one nibble at a time from least-significant nibble to most (least significant being bits[3:0]).

This process is best demonstrated using a timing diagram. For the example we'll be using the CRC-16/USB parameters which are width=16 poly=0x8005 init=0xffff refin=true refout=true xorout=0xffff check=0xb4c8:

### CRC-16/USB Setup Bitstream

The fully packed bitstream in hex is f35008fffffffff. The first nibble corresponds to bitwidth[3:0] = 0xf. This represents the value 15, which is one less than the bitwidth. This is because the CRC treats a bitwidth of zero as a CRC-1. You must pass in a bitwidth one less than the desired bitwidth to account for this. The second nibble unpacks as 4'b0011 in binary which makes the top 2-bits of the bitwidth be zero and sets reflect in and out to be true. The remaining nibbles are the configuration parameters least-significant nibble to most. Note that setup\_fsm is internal to the design.

## CRC Message Streaming

Once the CRC's parameters have been initialized, you can switch to CMD\_MESSAGE to stream in data to be CRC'd one nibble at a time. CRCs typically use the check message of 123456789 which is 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 in individual bytes. Here is a timing diagram showing this message being streamed in and the CRC's result register:

### CRC-16/USB 123456789 Check Message

The first break skips the 8 clock cycles for 0x31. The second break skips the middle 6 bytes and the final break skips the last shifting clock cycles. Following CMD\_MESSAGE you would signal CMD\_FINAL on cmd to have the CRC value streamed out on the data

output pins. Note that `crc_state` is internal to the design.

The CRC decelerator is a reconfigurable CRC block that can be programmed to calculate different CRC values up to 64-bits with arbitrary length streamed input data. Since clock speeds are low, its unlikely that this CRC engine will be faster than the CPU streaming in data, hence “decelerator”.

To begin, a SETUP bitstream containing the bitwidth, reflect in/out, CRC poly, init, and XOR out is sent. Then the MESSAGE is streamed in 4-bits at a time until the message is complete. Finally, the FINAL is signaled, leading the final CRC value to be streamed out. To calculate another CRC fresh, send RESET.

## How to test

See documentation.

## IO

| # | Input                   | Output                   |
|---|-------------------------|--------------------------|
| 0 | <code>clk</code>        | <code>data_out[0]</code> |
| 1 | <code>rst</code>        | <code>data_out[1]</code> |
| 2 | <code>cmd[0]</code>     | <code>data_out[2]</code> |
| 3 | <code>cmd[1]</code>     | <code>data_out[3]</code> |
| 4 | <code>data_in[0]</code> | <code>data_out[4]</code> |
| 5 | <code>data_in[1]</code> | <code>data_out[5]</code> |
| 6 | <code>data_in[2]</code> | <code>data_out[6]</code> |
| 7 | <code>data_in[3]</code> | <code>data_out[7]</code> |

## 35 : 7 segment seconds

- Author: Matt Venn
- Description: Count up to 10, one second at a time.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

Uses a set of registers to divide the clock, and then some combinational logic to convert from binary to decimal for the display.

Puts a slower square wave output on output 7.

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 36 : Binary to DEC and HEX

- Author: Norberto Hernandez-Como
- Description: Converts a 4 digit binary number to decimal or to hexadecimal using a 7-segment display
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Converts a 4 digit binary number to decimal or to hexadecimal using a 7-segment display

### How to test

Select binary to decimal with input 6 (named Dec, the dot is displayed) or binary to hexadecimal with input 7 (named Hex). Input a 4 digit binary number using inputs B3, B2, B1, B0 and see in the 7-segment display the converted number in decimal or hexadecimal. All forbidden combinations are not displayed.

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | B0    | segment a |
| 1 | B1    | segment b |
| 2 | B2    | segment c |
| 3 | B3    | segment d |
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | Dec   | segment g |
| 7 | Hex   | dot       |

## 37 : Simon Says



Figure 10: picture

- Author: Uri Shaked
- Description: A simple memory game
- GitHub repository
- HDL project
- Extra docs
- Clock: 4000 Hz
- External hardware: Four push buttons (with pull-down resistors), four LEDs, and optionally a speaker/buzzer

### How it works

Simon says is a simple electronic memory game: the user has to repeat a growing sequence of colors. The sequence is displayed by lighting up the LEDs. Each color also has a corresponding tone.

In each turn, the game will play the sequence, and then wait for the user to repeat the

sequence by pressing the buttons according to the color sequence. If the user repeated the sequence correctly, the game will play a “leveling-up” sound, add a new color at the end of the sequence, and move to the next turn.

The game continues until the user has made a mistake. Then a game over sound is played, and the game restarts.

The game supports four clock speeds, which can be selected using the clk3 and clk1 inputs:

| clk3 | clk1 | Clock Speed |
|------|------|-------------|
| 0    | 0    | 4KHz        |
| 0    | 1    | 6KHz        |
| 1    | 0    | 12KHz       |
| 1    | 1    | 14KHz       |

Setting the clock speed affects the speed of the game and the tone generator.

Check out the online simulation at <https://wokwi.com/projects/352319274216569857> (including wiring diagram).

## How to test

You need four buttons, four LEDs, resistors, and optionally a speaker/buzzer. Ideally, you want to use 4 different colors for the buttons/LEDs (red, green, blue, yellow). 1. Connect the buttons to pins btn1, btn2, btn3, and btn4, and also connect each button to a pull down resistor. 2. Connect the LEDs to pins led1, led2, led3, and led4, matching the colors of the buttons (so led1 and btn1 have the same color, etc.) 3. Connect the speaker to the speaker pin. 4. Select the clock frequency (using the clk3 and clk1 inputs). 5. Reset the game, and then press any button to start it. Enjoy!

## IO

| # | Input | Output  |
|---|-------|---------|
| 0 | clock | led1    |
| 1 | reset | led2    |
| 2 | btn1  | led3    |
| 3 | btn2  | led4    |
| 4 | btn3  | speaker |
| 5 | btn4  | none    |

| # | Input | Output |
|---|-------|--------|
| 6 | clk1  | none   |
| 7 | clk3  | none   |

## 38 : Shift Register Ram

- Author: Dakotath
- Description: The device holds bits in shift registers to remember crap
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 5000 Hz
- External hardware:

### How it works

It's ram. It stores stuff. This device can hold 8 Bytes of stuff. Yes, I know, It's not a lot.

### How to test

Explain how to test your project

### IO

| # | Input         | Output |
|---|---------------|--------|
| 0 | Clock Data    | D0     |
| 1 | Data Input    | D1     |
| 2 | Clock Address | D2     |
| 3 | Address Input | D3     |
| 4 | Output Enable | D4     |
| 5 | None          | D5     |
| 6 | None          | D6     |
| 7 | None          | D7     |

## 39 : tinysat

- Author: Emmanouel Matigakis
- Description: Tiny sat solver.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

uses a counter to iterate over all possible assignments to a simple sat problem.

### How to test

I'm thinking of using an RP2040 to basically implement the same tests as in test.py in the physical thing.

### IO

| # | Input   | Output |
|---|---------|--------|
| 0 | clock   | x[0]   |
| 1 | reset   | x[1]   |
| 2 | run     | x[2]   |
| 3 | load    | x[3]   |
| 4 | data[0] | sol    |
| 5 | data[1] | done   |
| 6 | data[2] | 0      |
| 7 | data[3] | 0      |

## 41 : POV display

- Author: Balint Kovacs
- Description: Small POV display
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware: A line of LEDs, an MCU to load the image data, and some means of timing

### How it works

The image is stored in a 8x32 loop, this can be updated over the SPI bus. Additionally, there is a clock generator that generates 48-128 pulses for every cycle of the hall effect sensor. Finally, a controller passes 32 of those pulses each cycle to the loop memory, and also handles blanking.

Relevant registers are reset by transitions of the hall effect sensor and the CS lines.

### How to test

- Supply a regular clock, up to f\_clk/1024, on hall\_in.
- Load an image in a single, 32 byte SPI transaction.
- Move the device quickly

### IO

| # | Input       | Output |
|---|-------------|--------|
| 0 | clock       | led0   |
| 1 | cs_n        | led1   |
| 2 | sck         | led2   |
| 3 | mosi        | led3   |
| 4 | hall_in     | led4   |
| 5 | hall_invert | led5   |
| 6 | divider[0]  | led6   |
| 7 | divider[1]  | led7   |

## 42 : Toy CPU

- Author: jordan336
- Description: Toy CPU is an 8 bit toy CPU for the Tiny Tapeout project.
- GitHub repository
- HDL project
- Extra docs
- Clock: 10000 Hz
- External hardware:

### How it works

#### ISA

| Opcode | Mnemonic | Name            | Description                                                                        |
|--------|----------|-----------------|------------------------------------------------------------------------------------|
| 000    | DISP     | Display         | $\text{data\_out} = \text{reg}[\text{src\_a}]$                                     |
| 001    | ADD      | Add             | $\text{reg}[\text{dest}] = \text{reg}[\text{src\_a}] + \text{reg}[\text{src\_b}]$  |
| 010    | ADD_I    | Add (immediate) | $\text{reg}[\text{dest}] = \text{reg}[\text{src\_a}] + \text{imm}$                 |
| 011    | AND      | And             | $\text{reg}[\text{dest}] = \text{reg}[\text{src\_a}] \& \text{reg}[\text{src\_b}]$ |
| 100    | AND_I    | And (immediate) | $\text{reg}[\text{dest}] = \text{reg}[\text{src\_a}] \& \text{imm}$                |
| 101    | OR       | Or              | $\text{reg}[\text{dest}] = \text{reg}[\text{src\_a}]   \text{reg}[\text{src\_b}]$  |
| 110    | OR_I     | Or (immediate)  | $\text{reg}[\text{dest}] = \text{reg}[\text{src\_a}]   \text{imm}$                 |
| 111    | STRE     | Store           | $\text{reg}[\text{dest}] = \text{imm}$                                             |

#### Instruction format

Instructions are passed using the upper 6 bits of the inputs. Depending on the opcode, the full instruction with opcode and all arguments is passed using one, two, or three 6 bit instruction words.

| Word | Input [7:5]  | Input [4:2]            | Input [1] | Input [0] |
|------|--------------|------------------------|-----------|-----------|
| 0    | opcode[2:0]  | src_a[2:0]             | rst       | clk       |
| 1    | dest[2:0]    | src_b[2:0] or imm[7:5] | rst       | clk       |
| 2    | {X,imm[4:3]} | imm[2:0]               | rst       | clk       |

| Opcode | Mnemonic | Number of Instruction Words |
|--------|----------|-----------------------------|
| 000    | DISP     | 1                           |
| 001    | ADD      | 2                           |
| 010    | ADD_I    | 3                           |
| 011    | AND      | 2                           |
| 100    | AND_I    | 3                           |

| Opcode | Mnemonic | Number of Instruction Words |
|--------|----------|-----------------------------|
| 101    | OR       | 2                           |
| 110    | OR_I     | 3                           |
| 111    | STRE     | 3                           |

## Start input

After exiting reset, the Toy CPU looks for a start input to begin processing the instruction stream. The start input is all 1s in the 6 bit instruction word (0x3F). After sampling the start sequence, the CPU will interpret the next 6 bit instruction word as the first word in the instruction stream.

## How to test

Drive a clock on input pin 0 and perform a reset using pin 1. Drive the start input on the 6 bit instruction word, then encode your instructions in the above format on the 6 bit instruction word interface.

## IO

| # | Input          | Output      |
|---|----------------|-------------|
| 0 | clock          | data_out[0] |
| 1 | reset          | data_out[1] |
| 2 | instruction[0] | data_out[2] |
| 3 | instruction[1] | data_out[3] |
| 4 | instruction[2] | data_out[4] |
| 5 | instruction[3] | data_out[5] |
| 6 | instruction[4] | data_out[6] |
| 7 | instruction[5] | data_out[7] |

## 43 : Base-10 grey counter counts from zero to a trillion

- Author: Daniel Wisehart
- Description: Change only one output bit per count, but count with decimal digits instead of the usual reverse bit order grey counter.
- GitHub repository
- HDL project
- Extra docs
- Clock: any Hz
- External hardware:

### How it works

Like a standard grey counter, this counter will only change one bit per time, but the bits are grouped into decimal digits. (This also makes for easy bits -> decimal decoding, which is done in the test.py file.)

Each decimal digit uses five bits. As with all grey counters, you have to scan the counter output fast enough that either the output value is the same or it has only increased by 1 between scans. If you scan too slowly and the value changes by 2 or more, then the grey counter can and will give you bad counts.

As an example of how this grey counter works, this is the progression of grey bits when counting from decimal 0 to 12:

| 10's  | 1's   | count |
|-------|-------|-------|
| 10001 | 10001 | 0     |
| 10001 | 00001 | 1     |
| 10001 | 00011 | 2     |
| 10001 | 00010 | 3     |
| 10001 | 00110 | 4     |
| 10001 | 00100 | 5     |
| 10001 | 01100 | 6     |
| 10001 | 01000 | 7     |
| 10001 | 11000 | 8     |
| 10001 | 10000 | 9     |
| 00001 | 10001 | 10    |
| 00001 | 00001 | 11    |
| 00001 | 00011 | 12    |

But wait, you say, at decimal value 10, two bits change at once. This is where your deciphering of the bits has to be smart.

The bits in the 10's digit change 10x more slowly than the 1's digit. So when you decipher the combined value, you look first at the 10's digit. If the 10's digit has

changed, then you can ignore the 1's digit, because you know the combined value is 10 or 20 or ... 90 or back to 00. If the 10's digit has not changed, then you look at both digits to decipher the combined value: 1 or 2 or ... 98 or 99.

Some work has been done to insure that the 10's digit changes before the 1's digit, but if the physical routes between the outputs of this circuit and the inputs to your scanning circuit make the 10's bits arrive at your inputs later than the 1's bits, you can and will receive 1's bits that change from 9 ('b10000) to 0 ('b00000) even though you have not yet seen the 10's bits change. That can be worked around because you can deduce that the 10's digit has rolled over, so you update your internal copy of the 10's bits. It is probably better to keep the input paths the same length or make the 1's bits use a little longer path than the 10's bits.

Note that in this implementation, the TinyTapeout PCB hardware—which only has 8 outputs—outputs a combination of eight bits depending on the input selection. This is the purpose of the input selection: to determine which counter values you can see on the outputs. To see all of the digits at once, you have to run this in the simulator or with cocotb. Here are the hardware output selections that are available. The selection uses input bits 7 to 2, in that order.

| select [5:0] | output [7:0] |           |           |
|--------------|--------------|-----------|-----------|
| 000101       | rollover     | 100B[5:0] | 10B[5:4]  |
| 000110       | 100B[0]      | 10B[5:0]  | 1B[5:4]   |
| 000111       | 10B[0]       | 1B[5:0]   | 100M[5:4] |
| 001001       | 1B[0]        | 100M[5:0] | 10M[5:4]  |
| 001010       | 100M[0]      | 10M[5:0]  | 1M[5:4]   |
| 001011       | 10M[0]       | 1M[5:0]   | 100T[5:4] |
| 010001       | 1M[0]        | 100T[5:0] | 10T[5:4]  |
| 010010       | 100T[0]      | 10T[5:0]  | 1T[5:4]   |
| 010011       | 10T[0]       | 1T[5:0]   | 100[5:4]  |
| 100001       | 1T[0]        | 100[5:0]  | 10[5:4]   |
| 100010       | 100[0]       | 10[5:0]   | 1[5:4]    |
| default      | 10[1:0]      | 1[5:0]    | half_clk  |

Here are the meanings of the abbreviations

| abbrev   | meaning              |
|----------|----------------------|
| 100B     | 100 billions digit   |
| 10B      | 10 billions digit    |
| 1B       | billions digit       |
| 100M     | 100 millions digit   |
| 10M      | 10 millions digit    |
| 1M       | millions digit       |
| 100T     | 100 thousands digit  |
| 10T      | 10 thousands digit   |
| 1T       | thousands digit      |
| 100      | hundreds digit       |
| 10       | tens digit           |
| 1        | singles digit        |
| rollover | one trillion         |
| half_clk | clock divided by two |

## How to test

After reset goes low, the counter should increase by one with each rising edge of the clock. You are encouraged to try different clock rates. If you load a 60 bit value into the init input before you assert reset, that bit pattern will be loaded into the grey counter, as shown in test.py.

If you enter `make` from the `src` directory, a cocotb test will run and report the results. Inside of `test.py` there is a `RANGE` constant at the top of the file which you can use to tell cocotb how many different values to check. Here are some representative test times:

|                    |          |
|--------------------|----------|
| RANGE = 10,000     | 2 sec    |
| RANGE = 100,000    | 19 sec   |
| RANGE = 1,000,000  | 189 sec  |
| RANGE = 10,000,000 | 2193 sec |

## IO

| # | Input     | Output    |
|---|-----------|-----------|
| 0 | clock     | output[7] |
| 1 | reset     | output[6] |
| 2 | select[5] | output[5] |
| 3 | select[4] | output[4] |
| 4 | select[3] | output[3] |
| 5 | select[2] | output[2] |

| # | Input     | Output    |
|---|-----------|-----------|
| 6 | select[1] | output[1] |
| 7 | select[0] | output[0] |

## 44 : ttFIR filter

- Author: Georg Boecherer
- Description: 6 tap Finite Impulse Response (FIR) filter with 6 bit input signal, 6 bit filter coefficients, and 8 bit output signal.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

ttFIR implements a digital 6 tap Finite Impulse Response (FIR) filter. Inputs, coefficients, and outputs are 6bit, 6bit, and 8bit values, respectively, in 2's complement format.

### How to test

- reset high: shift registers for coefficient and output are set to zero.
- reset low:
  - 6 clock cycles: 6bit coefficients in 2's complement format are loaded into registers.
  - input at each clock cycle: 6bit inputs in 2's complement format are loaded into shift register.
  - output at each clock cycle: coefficients and input values in shift register are multiplied, added and output in 8bit 2's complement format.

### IO

| # | Input                                         | Output                             |
|---|-----------------------------------------------|------------------------------------|
| 0 | clock                                         | bit0 LSB of 2's complement output. |
| 1 | reset                                         | bit1                               |
| 2 | bit0 LSB of 2's complement coefficient/input. | bit2                               |
| 3 | bit1                                          | bit3                               |
| 4 | bit2                                          | bit4                               |
| 5 | bit3                                          | bit5                               |
| 6 | bit4                                          | bit6                               |
| 7 | bit5 MSB.                                     | bit7 MSB.                          |

## 45 : QTCore-A1

- Author: Hammond Pearce
- Description: An accumulator-based 8-bit microarchitecture designed via GPT-4 conversations.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: Any microcontroller with an SPI peripheral

### How it works

The QTCore-A1 is a basic accumulator-based 8-bit microarchitecture (with an emphasis on the micro). It is a Von Neumann design (shared data and instruction memory).

Although primarily designed for Tiny Tapeout 3, it is parameterized and may be synthesized to FPGAs (and a project file for CMOD A7 is provided on the project GitHub).

Probably the most interesting thing about this design is that all functional Verilog beyond the Tiny Tapeout wrapper was written by GPT-4, i.e. not a human! The author (Hammond Pearce) developed with GPT-4 first the ISA, then the processor, fully conversationally. Hammond wrote the test-benches to validate the design, and then had the appropriate back-and-forth with GPT-4 to have it fix all bugs. For your interest, we will provide all conversation logs in the project repository.

The architecture defines a processor with the following components:

- Control Unit: 2-cycle FSM for driving the processor (3 bit one-hot encoded state register)
- Program Counter: 5-bit register containing the current address of the program
- Instruction Register: 8-bit register containing the current instruction to execute
- Accumulator: 8-bit register used for data storage, manipulation, and logic
- Memory Bank: 17 8-bit registers which store instructions and data. The 17th register is used for I/O.

In order to interact with the processor, all registers are connected via one large scan chain. As such, you can use an external microcontroller's SPI peripheral to read and write the status of the processor. We use the SPI clock SPI\_SCK as the clock to drive both the QTCore-A1 and the scan chain. You choose which you are using by asserting the appropriate chip select. This makes it quite easy to interact with the processor.

The ISA description follows. For your convenience, we also provide an assembler in Python (also written by GPT-4) which makes it quite easy to write simple programs.

## Immediate Data Manipulation Instructions

- ADDI: Add 4-bit Immediate to Accumulator
  - Opcode (4 bits): 1110
  - Immediate (4 bits): 4-bit Immediate
  - Register Effects:  $ACC \leftarrow ACC + IMM$

## Instructions with Variable-Data Operands

- LDA: Load Accumulator with memory contents
  - Opcode (3 bits): 000
  - Operand (5 bits): Memory Address
  - Register Effects:  $ACC \leftarrow M[Address]$
- STA: Store Accumulator to memory
  - Opcode (3 bits): 001
  - Operand (5 bits): Memory Address
  - Register Effects:  $M[Address] \leftarrow ACC$
- ADD: Add memory contents to Accumulator
  - Opcode (3 bits): 010
  - Operand (5 bits): Memory Address
  - Register Effects:  $ACC \leftarrow ACC + M[Address]$
- SUB: Subtract memory contents from Accumulator
  - Opcode (3 bits): 011
  - Operand (5 bits): Memory Address
  - Register Effects:  $ACC \leftarrow ACC - M[Address]$
- AND: AND memory contents with Accumulator
  - Opcode (3 bits): 100
  - Operand (5 bits): Memory Address
  - Register Effects:  $ACC \leftarrow ACC \& M[Address]$
- OR: OR memory contents with Accumulator
  - Opcode (3 bits): 101
  - Operand (5 bits): Memory Address
  - Register Effects:  $ACC \leftarrow ACC | M[Address]$
- XOR: XOR memory contents with Accumulator
  - Opcode (3 bits): 110
  - Operand (5 bits): Memory Address
  - Register Effects:  $ACC \leftarrow ACC \wedge M[Address]$

## Control and Branching Instructions

- JMP: Jump to memory address
  - Opcode (8 bits): 11110000

- PC Behavior:  $PC \leftarrow ACC$  (Load the PC with the address stored in the accumulator)
- JSR: Jump to Subroutine (save address to ACC)
  - Opcode (8 bits): 11110001
  - PC Behavior:  $ACC \leftarrow PC + 1$ ,  $PC \leftarrow ACC$  (Save the next address in ACC, then jump to the address in ACC)
- BEQ\_FWD: Branch if equal, forward (branch if  $ACC == 0$ )
  - Opcode (8 bits): 11110010
  - PC Behavior: If  $ACC == 0$ , then  $PC \leftarrow PC + 3$  (Jump 2 instructions forward if ACC is zero)
- BEQ\_BWD: Branch if equal, backward (branch if  $ACC == 0$ )
  - Opcode (8 bits): 11110011
  - PC Behavior: If  $ACC == 0$ , then  $PC \leftarrow PC - 2$  (Jump 1 instruction backward if ACC is zero)
- BNE\_FWD: Branch if not equal, forward (branch if  $ACC != 0$ )
  - Opcode (8 bits): 11110100
  - PC Behavior: If  $ACC != 0$ , then  $PC \leftarrow PC + 3$  (Jump 2 instructions forward if ACC is non-zero)
- BNE\_BWD: Branch if not equal, backward (branch if  $ACC != 0$ )
  - Opcode (8 bits): 11110101
  - PC Behavior: If  $ACC != 0$ , then  $PC \leftarrow PC - 2$  (Jump 1 instruction backward if ACC is non-zero)
- HLT: Halt the processor until reset
  - Opcode (8 bits): 11111111
  - PC Behavior: Stop execution (PC does not change until a reset occurs)

## Data Manipulation Instructions

- SHL: Shift Accumulator left
  - Opcode (8 bits): 11110110
  - Register Effects:  $ACC \leftarrow ACC \ll 1$
- SHR: Shift Accumulator right
  - Opcode (8 bits): 11110111
  - Register Effects:  $ACC \leftarrow ACC \gg 1$
- SHL4: Shift Accumulator left by 4 bits
  - Opcode (8 bits): 11111000
  - Register Effects:  $ACC \leftarrow ACC \ll 4$
- ROL: Rotate Accumulator left
  - Opcode (8 bits): 11111001
  - Register Effects:  $ACC \leftarrow (ACC \ll 1) \text{ OR } (ACC \gg 7)$
- ROR: Rotate Accumulator right
  - Opcode (8 bits): 11111010

- Register Effects:  $ACC \leftarrow (ACC >> 1) \text{ OR } (ACC << 7)$
- LDAR: Load Accumulator via indirect memory access (ACC as ptr)
  - Opcode (8 bits): 11111011
  - Register Effects:  $ACC \leftarrow M[ACC]$
- DEC: Decrement Accumulator
  - Opcode (8 bits): 11111100
  - Register Effects:  $ACC \leftarrow ACC - 1$
- CLR: Clear (Zero) Accumulator
  - Opcode (8 bits): 11111101
  - Register Effects:  $ACC \leftarrow 0$
- INV: Invert (NOT) Accumulator
  - Opcode (8 bits): 11111110
  - Register Effects:  $ACC \leftarrow \sim ACC$

## **Example programming using the assembler**

Writing assembly programs for QTCore-A1 is simplified by the assembler produced by GPT-4. First, we define two additional meta-instructions:

### **Meta-instructions:**

- NOP: Do nothing
  - Implemented as ADDI 0
- DATA: Define raw data to be loaded at the current address
  - Operand (8 bits): 8-bit data value

### **Presenting programs to the assembler:**

1. Programs are presented in the format [address]: [mnemonic] [optional operand]
2. There is a special meta-instruction called DATA, which is followed by a number. If this is used, just place that number at that address.
3. Programs cannot exceed the size of the memory (in Tiny Tapeout 3, this is 17 bytes including the IO register).
4. The memory contains both instructions and data.

### **Example program:**

An interesting example program is presented. This assumes a button is connected to the general purpose input, and some LEDs are connected to the LED output.

We assume this file is called test\_btn\_led.asm:

```

; This program tests the btn and LEDs
; It will wait for low->high transitions on the button input.
; After receiving this, it will toggle a set of LEDs.
;
; BTNs and LEDS are at address 17, btn at LSB
0: LDA 17 ; load the btn and LEDS
1: AND 16 ; mask the btn
2: BNE_BWD ; if btn&1 is not zero then branch back to 0
3: LDA 17 ; load the btn and LEDS
4: AND 16 ; mask the btn
5: BEQ_BWD ; if btn&1 is zero then branch back to 3
;
; the button has now done a transition from low to high
;
6: LDA 14 ; load the counter toggle
7: XOR 16 ; toggle the counter using the btn mask
8: STA 14 ; store the counter value
9: ADDI 14 ; get the counter value offset
;           (if 0, will be 14 (which is 0), if 1, 15)
10: LDAR ; load the counter LED pattern
11: STA 17 ; store the LED pattern
12: CLR
13: JMP
;
; data
;
14: DATA 0; toggle and LED pattern 0
15: DATA 24; LED pattern of ON
;           (test for led_out=value 12, since 24>>1 == 12)
16: DATA 1 ; btn and counter mask

```

To compile this program, we would invoke the assembler (provided in the associated repository) as follows:

```
$ ./assembler.py test_btn_led.asm
```

The assembler will generate the following files for us:

- `test_btn_led.bin`: This is a binary representation of the assembly program. It can be used, or we can use one of the helper formats...
- `test_btn_led.memarray.v`: This provides the binary ready for use in a Verilog test bench to directly write to the memory of the processor.
- `test_btn_led.scanchain.v`: This provides the binary ready for use in the provided Verilog test bench which loads and unloads programs via the scan chain.

- `test_btn_led.c`: This provides the binary as an array suitable for use in a C program on an external microcontroller which can load it into the processor via SPI.

## **Processor operation**

The processor executes all instructions via 2 stages (multi-cycle).

The timing is as follows. Note the branch instructions are +2/-3 due to the already-incremented PC in the fetch stage.

### **FETCH cycle (all instructions)**

1.  $IR \leftarrow M[PC]$
2.  $PC \leftarrow PC + 1$

### **EXECUTE cycle**

For **Immediate Data Manipulation Instructions**:

- ADDI:  $ACC \leftarrow ACC + IMM$

For **Instructions with Variable-Data Operands**:

- LDA:  $ACC \leftarrow M[Address]$
- STA:  $M[Address] \leftarrow ACC$
- ADD:  $ACC \leftarrow ACC + M[Address]$
- SUB:  $ACC \leftarrow ACC - M[Address]$
- AND:  $ACC \leftarrow ACC \& M[Address]$
- OR:  $ACC \leftarrow ACC | M[Address]$
- XOR:  $ACC \leftarrow ACC \wedge M[Address]$

For **Control and Branching Instructions**:

- JMP:  $PC \leftarrow ACC$
- JSR:  $ACC \leftarrow PC, PC \leftarrow ACC$
- BEQ\_FWD: If  $ACC == 0$ , then  $PC \leftarrow PC + 2$
- BEQ\_BWD: If  $ACC == 0$ , then  $PC \leftarrow PC - 3$
- BNE\_FWD: If  $ACC != 0$ , then  $PC \leftarrow PC + 2$
- BNE\_BWD: If  $ACC != 0$ , then  $PC \leftarrow PC - 3$
- HLT: (No operation, processor halted)

For **Data Manipulation Instructions**:

- SHL:  $ACC \leftarrow ACC \ll 1$
- SHR:  $ACC \leftarrow ACC \gg 1$
- SHL4:  $ACC \leftarrow ACC \ll 4$
- ROL:  $ACC \leftarrow (ACC \ll 1) \text{ OR } (ACC \gg 7)$

- ROR:  $ACC \leftarrow (ACC >> 1) \text{ OR } (ACC << 7)$
- LDAR:  $ACC \leftarrow M[ACC]$
- DEC:  $ACC \leftarrow ACC - 1$
- CLR:  $ACC \leftarrow 0$
- INV:  $ACC \leftarrow \sim ACC$

## Processor Memory Map

| Address | Description                                         |
|---------|-----------------------------------------------------|
| 0-16    | 17 bytes of general purpose Instruction/Data memory |
| 17      | I/O: $\{OUT[7:1], IN[0]\}$                          |
| 18      | Constant value: 19 (See "7seg" note below)          |
| 19      | Constant value: Bits for 7seg "0"                   |
| 20      | Constant value: Bits for 7seg "1"                   |
| 21      | Constant value: Bits for 7seg "2"                   |
| 22      | Constant value: Bits for 7seg "3"                   |
| 23      | Constant value: Bits for 7seg "4"                   |
| 24      | Constant value: Bits for 7seg "5"                   |
| 25      | Constant value: Bits for 7seg "6"                   |
| 26      | Constant value: Bits for 7seg "7"                   |
| 27      | Constant value: Bits for 7seg "8"                   |
| 28      | Constant value: Bits for 7seg "9"                   |
| 29-31   | Constant value: 1                                   |

## 7seg

Tiny Tapeout 3 has a 7-segment display on the board. To make it useful with the QTCore-A1, the processor helpfully includes the bit patterns stored at addresses 19-28. The easiest way to make use of these is with the LDAR instruction. Here's an example snippet, which assumes a value between 0 and 9 is stored at address 16:

```
0: LDA 16 ; load the value we want to display
1: ADD 18 ; add it to the constant 19 to get the 7 segment pattern offset
2: LDAR    ; load the 7 segment pattern
3: STA 17 ; emit the 7 segment pattern
```

## How to test

The processor will not run unless a program is scanned into it. Fortunately, this is easy using the SPI peripheral of an external microcontroller.

## Wiring the SPI

The QTCore-A1 is designed to be connected to an SPI peripheral along with two chip selects. This is because we use the SPI SCK as the clock for both the scan chain and the processor.

Connect the I/O according to the provided wiring table. Then, set the SPI peripheral to the following settings:

- SPI Mode 0
- 8-bit data
- MSB first
- A very slow clock (I used 1kHz)
- The scan chain chip select is active low
- The processor chip select is active low

## Loading a program

During the scan chain mode (when the scan enable is low), the entire processor acts as a giant shift register, with the bits in the following order:

All elements of the scan chain are presented MSB first.

- `scan_chain[2:0]` - 3-bit state register
- `scan_chain[7:3]` - 5-bit PC
- `scan_chain[15:8]` - 8-bit Instruction Register
- `scan_chain[23:16]` - 8-bit Accumulator
- `scan_chain[31 -: 8]` - Memory[0]
- `scan_chain[39 -: 8]` - Memory[1]
- `scan_chain[47 -: 8]` - Memory[2]
- `scan_chain[55 -: 8]` - Memory[3]
- `scan_chain[63 -: 8]` - Memory[4]
- `scan_chain[71 -: 8]` - Memory[5]
- `scan_chain[79 -: 8]` - Memory[6]
- `scan_chain[87 -: 8]` - Memory[7]
- `scan_chain[95 -: 8]` - Memory[8]
- `scan_chain[103 -: 8]` - Memory[9]
- `scan_chain[111 -: 8]` - Memory[10]
- `scan_chain[119 -: 8]` - Memory[11]
- `scan_chain[127 -: 8]` - Memory[12]
- `scan_chain[135 -: 8]` - Memory[13]
- `scan_chain[143 -: 8]` - Memory[14]
- `scan_chain[151 -: 8]` - Memory[15]
- `scan_chain[159 -: 8]` - Memory[16]

- scan\_chain[167 -: 8] - Memory[17] (the IO register)

C code to load the previously-discussed example program (e.g. via the STM32 HAL) is provided:

```
//The registers are presented in reverse order to
// the table as we load them MSB first.
uint8_t program_led_btn[21] = {
    0b00000000, //IOREG
    0b00000001, //MEM[16]
    0b00011000, //MEM[15]
    0b00000000, //MEM[14]
    0b11110000, //MEM[13]
    0b11111101, //MEM[12]
    0b00110001, //MEM[11]
    0b11111011, //MEM[10]
    0b11101110, //MEM[9]
    0b00101110, //MEM[8]
    0b11010000, //MEM[7]
    0b00001110, //MEM[6]
    0b11110011, //MEM[5]
    0b10010000, //MEM[4]
    0b00010001, //MEM[3]
    0b11110101, //MEM[2]
    0b10010000, //MEM[1]
    0b00010001, //MEM[0]
    0b00000000, //ACC
    0b00000000, //IR
    0b00000001 //PC[5bit], CU[3bit]
};

//... this will go in the SPI peripheral initialization code
hspi1.Init.Mode = SPI_MODE_MASTER;
hspi1.Init.Direction = SPI_DIRECTION_2LINES;
hspi1.Init.DataSize = SPI_DATASIZE_8BIT;
hspi1.Init.CLKPolarity = SPI_POLARITY_LOW;
hspi1.Init.CLKPhase = SPI_PHASE_1EDGE;
hspi1.Init.NSS = SPI_NSS_SOFT;
//...
hspi1.Init.FirstBit = SPI_FIRSTBIT_MSB;
hspi1.Init.TIMode = SPI_TIMODE_DISABLE;
//...
```

```

//... this will go in your main or similar
uint8_t *program = program_led_btn;
//on the first run, scan_out will give us the reset value of the
// processor
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 0);
HAL_Delay(100);
HAL_SPI_TransmitReceive(&hspi1, program, scan_out, 21, HAL_MAX_DELAY);
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 1);
//we can check if the program loaded correctly by immediately scanning
// it back in again, it will be unloaded to scan_out
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 0);
HAL_Delay(100);
HAL_SPI_TransmitReceive(&hspi1, program, scan_out, 21, HAL_MAX_DELAY);
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 1);
//Check if the program loaded correctly
for(int i = 0; i < 21; i++) {
    if(program[i] != scan_out[i]) {
        while(1); //it failed
    }
}

```

## Running a program

Once the program is loaded (using the above code or similar), we can run a program. This is as easy as providing a clock signal to the processor and setting the processor enable line high. We do this using the SPI as follows:

```

uint8_t dummy; //a dummy value
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 0);
//...
//run this in a loop
HAL_SPI_TransmitReceive(&hspi1, &dummy, &dummy, 1, HAL_MAX_DELAY);

```

The processor will ignore any value being shifted in on the MOSI data line during operation. However, it does provide a nice feature in that the processor will emit the current value of the processor\_halt signal on the MISO line. This means that we can improve the code to run this process in a loop and catch when the program reaches a HLT instruction:

```

HAL_Delay(100);
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 0);

```

```

dummy = 0;
while(1) {
    HAL_SPI_TransmitReceive(&hspi1, &dummy, &dummy, 1, HAL_MAX_DELAY);
    if(dummy == 0xFF)
        break;
}
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 1);
HAL_Delay(100);

```

Of course, the example program does not HLT, so we will not reach this point in this code. But, it works for other programs that do contain a HLT instruction.

Once you have the provided example program running, you will be able to press the button and see how the LEDs are toggled.

## IO

---

| # | Input                                                                                          | Output                                                                                           |
|---|------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| 0 | clock - connect to an SPI SCK.                                                                 | general purpose output 0<br>(e.g. LED segment a). This output comes from the I/O register bit 1. |
| 1 | reset (active high)                                                                            | general purpose output 1<br>(e.g. LED segment b). This output comes from the I/O register bit 2. |
| 2 | scan enable (active low) - connect to an SPI chip select.                                      | general purpose output 2<br>(e.g. LED segment c). This output comes from the I/O register bit 3. |
| 3 | processor enable (active low) - connect to an SPI chip select.                                 | general purpose output 3<br>(e.g. LED segment d). This output comes from the I/O register bit 4. |
| 4 | scan data in - connect to an SPI MOSI.                                                         | general purpose output 4<br>(e.g. LED segment e). This output comes from the I/O register bit 5. |
| 5 | general purpose input<br>(e.g. Button). This input will be provided to the I/O register bit 0. | general purpose output 5<br>(e.g. LED segment f). This output comes from the I/O register bit 6. |
| 6 | none                                                                                           | general purpose output 6<br>(e.g. LED segment g). This output comes from the I/O register bit 7. |
| 7 | none                                                                                           | scan data out - connect to an SPI MISO.                                                          |

---

## 46 : MicroTapeout

- Author: htfab
- Description: 395 standard cells with a mux to select between them
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

Digital chip designs are usually written in a hardware description language like RTL Verilog and then synthesized into a set of mask layers suitable for fabrication. In order to make both synthesis and verification robust for huge designs, a modular approach is used where the functionality of the circuit is decomposed into pre-built blocks called *standard cells* with well-known and thoroughly tested behaviour and layout.

This design contains a copy of most standard cells in the sky130\_fd\_sc\_hd library along with a multiplexing mechanism that allows exposing any of them to the input/output pins.

There are 437 standard cells in our library, of which 42 don't produce output or require special power handling. This leaves us with 395 cells. Each cell has up to 6 inputs and up to 2 outputs for a total of 427 outputs. The same 6 inputs are fed into each cell in parallel while the 427 outputs are divided into 54 pages of 8 outputs each with a multiplexer deciding which page is mapped to the output pins.

In order to drive the 6 cell inputs and the 6 bits of input to the mux from a total of 8 input pins we use some registered logic. Input pin 0 is a clock signal while input pin 1 selects *page mode*. On each rising clock edge we save input pins 2 to 7 into a page register if page mode is on and into an input register if page mode is off. Cell inputs are then supplied from the input register and the mux operates on the page register.

Mapping of outputs to pages:

| page   | pin | pin 0/4  | pin 1/5  | pin 2/6   | pin 3/7   |
|--------|-----|----------|----------|-----------|-----------|
| 000000 | 0-3 | conb_1.h | conb_1.l | buf_1     | buf_2     |
|        | 4-7 | buf_4    | buf_6    | buf_8     | buf_12    |
| 000001 | 0-3 | buf_16   | bufbuf_8 | bufbuf_16 | inv_1     |
|        | 4-7 | inv_2    | inv_4    | inv_6     | inv_8     |
| 000010 | 0-3 | inv_12   | inv_16   | bufinv_8  | bufinv_16 |
|        | 4-7 | and2_0   | and2_1   | and2_2    | and2_4    |
| 000011 | 0-3 | and2b_1  | and2b_2  | and2b_4   | and3_1    |

| page   | pin | pin 0/4   | pin 1/5   | pin 2/6   | pin 3/7   |
|--------|-----|-----------|-----------|-----------|-----------|
|        | 4-7 | and3_2    | and3_4    | and3b_1   | and3b_2   |
| 000100 | 0-3 | and3b_4   | and4_1    | and4_2    | and4_4    |
|        | 4-7 | and4b_1   | and4b_2   | and4b_4   | and4bb_1  |
| 000101 | 0-3 | and4bb_2  | and4bb_4  | nand2_1   | nand2_2   |
|        | 4-7 | nand2_4   | nand2_8   | nand2b_1  | nand2b_2  |
| 000110 | 0-3 | nand2b_4  | nand3_1   | nand3_2   | nand3_4   |
|        | 4-7 | nand3b_1  | nand3b_2  | nand3b_4  | nand4_1   |
| 000111 | 0-3 | nand4_2   | nand4_4   | nand4b_1  | nand4b_2  |
|        | 4-7 | nand4b_4  | nand4bb_1 | nand4bb_2 | nand4bb_4 |
| 001000 | 0-3 | or2_0     | or2_1     | or2_2     | or2_4     |
|        | 4-7 | or2b_1    | or2b_2    | or2b_4    | or3_1     |
| 001001 | 0-3 | or3_2     | or3_4     | or3b_1    | or3b_2    |
|        | 4-7 | or3b_4    | or4_1     | or4_2     | or4_4     |
| 001010 | 0-3 | or4b_1    | or4b_2    | or4b_4    | or4bb_1   |
|        | 4-7 | or4bb_2   | or4bb_4   | nor2_1    | nor2_2    |
| 001011 | 0-3 | nor2_4    | nor2_8    | nor2b_1   | nor2b_2   |
|        | 4-7 | nor2b_4   | nor3_1    | nor3_2    | nor3_4    |
| 001100 | 0-3 | nor3b_1   | nor3b_2   | nor3b_4   | nor4_1    |
|        | 4-7 | nor4_2    | nor4_4    | nor4b_1   | nor4b_2   |
| 001101 | 0-3 | nor4b_4   | nor4bb_1  | nor4bb_2  | nor4bb_4  |
|        | 4-7 | xor2_1    | xor2_2    | xor2_4    | xor3_1    |
| 001110 | 0-3 | xor3_2    | xor3_4    | xnor2_1   | xnor2_2   |
|        | 4-7 | xnor2_4   | xnor3_1   | xnor3_2   | xnor3_4   |
| 001111 | 0-3 | a2111o_1  | a2111o_2  | a2111o_4  | a2111oi_0 |
|        | 4-7 | a2111oi_1 | a2111oi_2 | a2111oi_4 | a211o_1   |
| 010000 | 0-3 | a211o_2   | a211o_4   | a211oi_1  | a211oi_2  |
|        | 4-7 | a211oi_4  | a21bo_1   | a21bo_2   | a21bo_4   |
| 010001 | 0-3 | a21boi_0  | a21boi_1  | a21boi_2  | a21boi_4  |
|        | 4-7 | a21o_1    | a21o_2    | a21o_4    | a21oi_1   |
| 010010 | 0-3 | a21oi_2   | a21oi_4   | a221o_1   | a221o_2   |
|        | 4-7 | a221o_4   | a221oi_1  | a221oi_2  | a221oi_4  |
| 010011 | 0-3 | a222oi_1  | a22o_1    | a22o_2    | a22o_4    |
|        | 4-7 | a22oi_1   | a22oi_2   | a22oi_4   | a2bb2o_1  |
| 010100 | 0-3 | a2bb2o_2  | a2bb2o_4  | a2bb2oi_1 | a2bb2oi_2 |
|        | 4-7 | a2bb2oi_4 | a311o_1   | a311o_2   | a311o_4   |
| 010101 | 0-3 | a311oi_1  | a311oi_2  | a311oi_4  | a31o_1    |
|        | 4-7 | a31o_2    | a31o_4    | a31oi_1   | a31oi_2   |
| 010110 | 0-3 | a31oi_4   | a32o_1    | a32o_2    | a32o_4    |
|        | 4-7 | a32oi_1   | a32oi_2   | a32oi_4   | a41o_1    |
| 010111 | 0-3 | a41o_2    | a41o_4    | a41oi_1   | a41oi_2   |

| page   | pin | pin 0/4    | pin 1/5    | pin 2/6    | pin 3/7    |
|--------|-----|------------|------------|------------|------------|
|        | 4-7 | a41oi_4    | o2111a_1   | o2111a_2   | o2111a_4   |
| 011000 | 0-3 | o2111ai_1  | o2111ai_2  | o2111ai_4  | o211a_1    |
|        | 4-7 | o211a_2    | o211a_4    | o211ai_1   | o211ai_2   |
| 011001 | 0-3 | o211ai_4   | o21a_1     | o21a_2     | o21a_4     |
|        | 4-7 | o21ai_0    | o21ai_1    | o21ai_2    | o21ai_4    |
| 011010 | 0-3 | o21ba_1    | o21ba_2    | o21ba_4    | o21bai_1   |
|        | 4-7 | o21bai_2   | o21bai_4   | o221a_1    | o221a_2    |
| 011011 | 0-3 | o221a_4    | o221ai_1   | o221ai_2   | o221ai_4   |
|        | 4-7 | o22a_1     | o22a_2     | o22a_4     | o22ai_1    |
| 011100 | 0-3 | o22ai_2    | o22ai_4    | o2bb2a_1   | o2bb2a_2   |
|        | 4-7 | o2bb2a_4   | o2bb2ai_1  | o2bb2ai_2  | o2bb2ai_4  |
| 011101 | 0-3 | o311a_1    | o311a_2    | o311a_4    | o311ai_0   |
|        | 4-7 | o311ai_1   | o311ai_2   | o311ai_4   | o31a_1     |
| 011110 | 0-3 | o31a_2     | o31a_4     | o31ai_1    | o31ai_2    |
|        | 4-7 | o31ai_4    | o32a_1     | o32a_2     | o32a_4     |
| 011111 | 0-3 | o32ai_1    | o32ai_2    | o32ai_4    | o41a_1     |
|        | 4-7 | o41a_2     | o41a_4     | o41ai_1    | o41ai_2    |
| 100000 | 0-3 | o41ai_4    | maj3_1     | maj3_2     | maj3_4     |
|        | 4-7 | mux2_1     | mux2_2     | mux2_4     | mux2_8     |
| 100001 | 0-3 | mux2i_1    | mux2i_2    | mux2i_4    | mux4_1     |
|        | 4-7 | mux4_2     | mux4_4     | ha_1.c     | ha_1.s     |
| 100010 | 0-3 | ha_2.c     | ha_2.s     | ha_4.c     | ha_4.s     |
|        | 4-7 | fa_1.c     | fa_1.s     | fa_2.c     | fa_2.s     |
| 100011 | 0-3 | fa_4.c     | fa_4.s     | fah_1.c    | fah_1.s    |
|        | 4-7 | fahcin_1.c | fahcin_1.s | fahcon_1.c | fahcon_1.s |
| 100100 | 0-3 | dlxtp_1    | dlxbp_1.q  | dlxbp_1.n  | dlxtn_1    |
|        | 4-7 | dlxtn_2    | dlxtn_4    | dlxrn_1.q  | dlxrn_1.n  |
| 100101 | 0-3 | dlxrn_2.q  | dlxrn_2.n  | drltp_1    | drltp_2    |
|        | 4-7 | drltp_4    | drlbp_1.q  | drlbp_1.n  | drlbp_2.q  |
| 100110 | 0-3 | drlbp_2.n  | drltn_1    | drltn_2    | drltn_4    |
|        | 4-7 | drlbn_1.q  | drlbn_1.n  | drlbn_2.q  | drlbn_2.n  |
| 100111 | 0-3 | dfxtp_1    | dfxtp_2    | dfxtp_4    | dfxbp_1.q  |
|        | 4-7 | dfxbp_1.n  | dfxbp_2.q  | dfxbp_2.n  | dfrtp_1    |
| 101000 | 0-3 | dfrtp_2    | dfrtp_4    | dfrbp_1.q  | dfrbp_1.n  |
|        | 4-7 | dfrbp_2.q  | dfrbp_2.n  | dfrtn_1    | dfstp_1    |
| 101001 | 0-3 | dfstp_2    | dfstp_4    | dfsbp_1.q  | dfsbp_1.n  |
|        | 4-7 | dfsdp_2.q  | dfsdp_2.n  | dfbbp_1.q  | dfbbp_1.n  |
| 101010 | 0-3 | dfbbn_1.q  | dfbbn_1.n  | dfbbn_2.q  | dfbbn_2.n  |
|        | 4-7 | edfxtp_1   | edfxbp_1.q | edfxbp_1.n | sdfxtp_1   |
| 101011 | 0-3 | sdfxtp_2   | sdfxtp_4   | sdfxbp_1.q | sdfxbp_1.n |

| page   | pin | pin 0/4     | pin 1/5       | pin 2/6     | pin 3/7     |
|--------|-----|-------------|---------------|-------------|-------------|
|        | 4-7 | sdfxbp_2.q  | sdfxbp_2.n    | sdfrtp_1    | sdfrtp_2    |
| 101100 | 0-3 | sdfrtp_4    | sdfrbp_1.q    | sdfrbp_1.n  | sdfrbp_2.q  |
|        | 4-7 | sdfrbp_2.n  | sdfrtn_1      | sdfstp_1    | sdfstp_2    |
| 101101 | 0-3 | sdfstp_4    | sdfsdp_1.q    | sdfsdp_1.n  | sdfsdp_2.q  |
|        | 4-7 | sdfsdp_2.n  | sdfbbp_1.q    | sdfbbp_1.n  | sdfbbn_1.q  |
| 101110 | 0-3 | sdfbbn_1.n  | sdfbbn_2.q    | sdfbbn_2.n  | sedfxtp_1   |
|        | 4-7 | sedfxtp_2   | sedfxtp_4     | sedfxbp_1.q | sedfxbp_1.n |
| 101111 | 0-3 | sedfxbp_2.q | sedfxbp_2.n   | ebufn_1/_2  | ebufn_4/_8  |
|        | 4-7 | einvp_1/n_0 | einvp_1/n_1   | einvp_2/n_2 | einvp_4/n_4 |
| 110000 | 0-3 | einvp_8/n_8 | dg~sd1_1      | dg~4sd2_1   | dg~4sd3_1   |
|        | 4-7 | dm~6s2s_1   | dm~6s4s_1     | dm~6s6s_1   | clkbuf_1    |
| 110001 | 0-3 | clkbuf_2    | clkbuf_4      | clkbuf_8    | clkbuf_16   |
|        | 4-7 | clkinv_1    | clkinv_2      | clkinv_4    | clkinv_8    |
| 110010 | 0-3 | clkinv_16   | clkinvlp_2    | clkinvlp_4  | cd~4s15_1   |
|        | 4-7 | cd~4s15_2   | cd~4s18_1     | cd~4s18_2   | cd~4s25_1   |
| 110011 | 0-3 | cd~4s25_2   | cd~4s50_1     | cd~4s50_2   | dlclkp_1    |
|        | 4-7 | dlclkp_2    | dlclkp_4      | sdlclkp_1   | sdlclkp_2   |
| 110100 | 0-3 | sdlclkp_4   | lpfii~0p_1    | lpfii~0n_1  | lpfii~1p_1  |
|        | 4-7 | lpfii~1n_1  | lpfii~latch_1 | lpfibs~_1   | lpfibs~_2   |
| 110101 | 0-3 | lpfibs~_4   | lpfibs~_8     | lpfibs~_16  |             |

where dg~ = dlygate, dm~ = dlymetal, cd~ = clkdklybuf, lpfii~ = lpflow\_inputiso, lpfib~ = lpflow\_isobufsrc.

## How to test

TODO

## IO

| # | Input | Output |
|---|-------|--------|
| 0 | TODO  | TODO   |
| 1 | TODO  | TODO   |
| 2 | TODO  | TODO   |
| 3 | TODO  | TODO   |
| 4 | TODO  | TODO   |
| 5 | TODO  | TODO   |
| 6 | TODO  | TODO   |
| 7 | TODO  | TODO   |

## 47 : nibble multiplier

- Author: 'Mohamed Nasser
- Description: multiply two 8-b numbers in 4 chunks
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

assign the switches with the first nibble of the first number, make valid high, at the next cycle it will be latched. de-assert valid and assign the switches with the second nibble of the first number, make valid high, at the next cycle it will be latched. assign the switches with the first nibble of the second number, make valid high, at the next cycle it will be latched. de-assert valid and assign the switches with the second nibble of the second number, make valid high, at the next cycle it will be latched. the result will be 16-bits however we have only 8-b output so, if the toggle bit is high the first 8-b of the result will be shown at the output. if the toggle bit is low the second 8-b of the result will be shown at the output.

### How to test

assign the switches with the first nibble of the first number, make valid high, at the next cycle it will be latched. de-assert valid and assign the switches with the second nibble of the first number, make valid high, at the next cycle it will be latched. assign the switches with the first nibble of the second number, make valid high, at the next cycle it will be latched. de-assert valid and assign the switches with the second nibble of the second number, make valid high, at the next cycle it will be latched. the result will be 16-bits however we have only 8-b output so, if the toggle bit is high the first 8-b of the result will be shown at the output. if the toggle bit is low the second 8-b of the result will be shown at the output.

### IO

| # | Input | Output |
|---|-------|--------|
| 0 | data0 | dout0  |
| 1 | data1 | dout1  |
| 2 | data2 | dout2  |
| 3 | data3 | dout3  |

| # | Input   | Output |
|---|---------|--------|
| 4 | clk     | dout4  |
| 5 | reset_n | dout5  |
| 6 | valid   | dout6  |
| 7 | toggle  | dout7  |

## 48 : Synthesizable Digital Temperature Sensor

- Author: Harald Pretl
- Description: Measure the on-chip temperature and display on the LED display.
- GitHub repository
- HDL project
- Extra docs
- Clock: 10000 Hz
- External hardware:

### How it works

By creatively twisting the use of a tristate-inverter (EINVP) a voltage DAC is built. This voltage-mode DAC is used in another twisted arrangement of an EINVP to bias an NMOS into subthreshold operation to discharge a pre-charged capacitor (the input capacitor of an inverter). Since the subthreshold current of a MOSFET is a strong function of temperature, the resulting delay time is also a strong function of temperature, thus a digital temperature sensor is built.

The temperature-dependent digital signal is output at the LED display, showing tens (dot off) and ones (dot on).

A calibration engine via a LUT is included, to allow to linearize and calibrate the shown temperature code.

`io_in[0]` is used as a CLK signal, and `io_in[1]` is used a RESET.

`io_in[4:2]` is used to load and enable the calibration engine.

`io_in[7:5]` is used to enable various debug modes, presenting internal state signals to the `io_out`.

### How to test

After reset, one temperature measurement is taken per second and displayed using the LEDs. A code ranging 0...63 is displayed with tens first (dot off) and ones later (dot on).

During normal operation `io_in[7:5]` have to be set to 000. The different debug modes are documented in the Verilog code.

For calibration, the internal LUT can be serially loaded by using `CAL_CLK` (`io_in[2]`) and `CAL_DAT` (`io_in[3]`). Once fully loaded the calibration engine is enabled by setting `CAL_ENA` (`io_in[4]`) to 1 (setting it to 0 displays the raw sensor code).

## IO

| # | Input         | Output                                       |
|---|---------------|----------------------------------------------|
| 0 | clock         | segment a (or debug information)             |
| 1 | reset         | segment b (or debug information)             |
| 2 | cal_clk       | segment c (or debug information)             |
| 3 | cal_dat       | segment d (or debug information)             |
| 4 | cal_ena       | segment e (or debug information)             |
| 5 | debug_mode[0] | segment f (or debug information)             |
| 6 | debug_mode[1] | segment g (or debug information)             |
| 7 | debug_mode[2] | indicate ones or tens (or debug information) |

## 49 : 4-bits sequential ALU

- Author: Diego Satizabal
- Description: A 4-bits sequential ALU that takes operands and opcode sequentially and performs operations and outputs results
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

The ALU takes 4-bits wide Operators and Operation sequentially with every clock cycle if the enabled signal is set to high, it takes an additional clock to output the result

### How to test

Run `make` from the command line in the `src` directory to perform all tests suites, you must have Python, cocotb and Icarus Verilog installed If you have GTK Wave installed you can run the command `make test_gtkwave` to generate the VCD output, then run `gtkwave tb.vcd` to see the waveforms

### IO

| # | Input        | Output     |
|---|--------------|------------|
| 0 | clock        | result_0   |
| 1 | reset        | result_1   |
| 2 | enabled      | result_2   |
| 3 | none         | result_3   |
| 4 | Opx_opcode_0 | done_flag  |
| 5 | Opx_opcode_1 | carry_flag |
| 6 | Opx_opcode_2 | zero_flag  |
| 7 | Opx_opcode_3 | sign_flag  |

## 50 : Brightness control of LED with PWM

- Author: Ioannis G. Intzes
- Description: Increase and Decrease the PWM (Pulse-Width) to dim a LED.
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware:

### How it works

This Verilog code implements a simple PWM (Pulse-Width Modulation) controller module that takes an input clock signal of 12500 Hz and generates a PWM output signal based on the input from two buttons to increase or decrease the PWM duty cycle. The module also includes debouncing functionality for the button inputs to eliminate any bouncing that may occur when the button is pressed. The PWM output is controlled by a duty cycle variable that is updated based on the button inputs and ranges from 0% to 100%. The module also includes output signals that indicate when the maximum and minimum duty cycle values have been reached and a 1 Hz clock signal.

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input       | Output    |
|---|-------------|-----------|
| 0 | clk         | inled     |
| 1 | btn_incrPWM | deled     |
| 2 | btn_decrPWM | led       |
| 3 | none        | clock_1Hz |
| 4 | none        | none      |
| 5 | none        | none      |
| 6 | none        | none      |
| 7 | none        | none      |

## 51 : Microtapeout

- Author: Enno Schnackenberg
- Description: A Shift Register, A seven segment encoder (Hexadecimal), 1 Bit ALU, 3 Bit Adder and a surprise
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

You can select the project over Input 0 and 1 (e.g. 01 for the seven segment encoder) and then have the 6 remaining pins for the project

### How to test

#### Shift Register

Set Input 0 and 1 to LOW. Input 2 is the Data In Pin, Input 3 is the Shift Register Clock and Input 4 the Ooutput Latch

#### ALU

Set Input 0 and 1 to HIGH and then you can use Input 5 to 7 for the multiplexer Inputs 2 to 4 are for the Inputs of the ALU the Output is connected to Output 1

#### Adder

Set the Inputs like for the ALU, but now ise Inputs 2 to 4 for the first number and Inputs 5 to 7 for the second. Outputs 1 to 4 are the Resulting number

#### Seven Segment

Set Input 0 to LOW and Input 1 to HIGH and then use Inputs 3 - 6 for the number input (MSB first)

### IO

| # | Input            | Output    |
|---|------------------|-----------|
| 0 | Project Select 0 | segment a |
| 1 | Project Select 1 | segment b |
| 2 | I1               | segment c |
| 3 | I2               | segment d |
| 4 | I3               | segment e |
| 5 | I4               | segment f |
| 6 | I5               | segment g |
| 7 | I6               | dot       |

## 52 : Neptune guitar tuner (fixed window)



Figure 11: picture

- Author: Pat Deegan
- Description: It's a guitar tuner! and so much more... (TODO)
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: The analog input signal must be massaged into clean digital. See the Neptune project for details <https://github.com/psychogenic/neptune>

### How it works

Uses a set of registers to divide the clock, and then some combinational logic to convert from binary to decimal for the display.

Puts a slower square wave output on output 7.

## How to test

After reset, the counter should increase by one every second.

## IO

| # | Input                           | Output         |
|---|---------------------------------|----------------|
| 0 | clock                           | segment a      |
| 1 | reset                           | segment b      |
| 2 | clock config A                  | segment c      |
| 3 | clock config B                  | segment d      |
| 4 | measured frequency input pulses | segment e      |
| 5 | none                            | segment f      |
| 6 | none                            | segment g      |
| 7 | none                            | display select |

## 53 : Neptune guitar tuner (proportional window)



Figure 12: picture

- Author: Pat Deegan
- Description: It's a guitar tuner! and so much more... (TODO)
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: The analog input signal must be massaged into clean digital. See the Neptune project for details <https://github.com/psychogenic/neptune>

### How it works

Uses a set of registers to divide the clock, and then some combinational logic to convert from binary to decimal for the display.

Puts a slower square wave output on output 7.

## How to test

After reset, the counter should increase by one every second.

## IO

| # | Input                           | Output         |
|---|---------------------------------|----------------|
| 0 | clock                           | segment a      |
| 1 | reset                           | segment b      |
| 2 | clock config A                  | segment c      |
| 3 | clock config B                  | segment d      |
| 4 | measured frequency input pulses | segment e      |
| 5 | none                            | segment f      |
| 6 | none                            | segment g      |
| 7 | none                            | display select |

## 54 : M segments

- Author: Matt Venn
- Description: pressing the first 4 buttons will put an M on the LEDs
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Uses 3 AND gates

### How to test

Press the first 4 buttons and an M will show on the 7 segment display

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | in1   | segment a |
| 1 | in2   | segment b |
| 2 | in3   | segment c |
| 3 | in4   | segment d |
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | none  | segment g |
| 7 | none  | dot       |

## 55 : 7 segment seconds

- Author: Matt Venn
- Description: Count up to 10, one second at a time.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

Uses a set of registers to divide the clock, and then some combinational logic to convert from binary to decimal for the display.

Puts a slower square wave output on output 7.

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 56 : 7 segment wokwi counter



Figure 13: picture

- Author: Matt Venn
- Description: counts up from 0 to 9, incrementing once per second
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 1 Hz
- External hardware:

### How it works

Uses a 4 bit counter to sequence, the combinational logic to drive the 7 segment display.

### How to test

Set the clock to 1Hz.

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | clock | segment a |
| 1 | none  | segment b |
| 2 | none  | segment c |
| 3 | none  | segment d |
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | none  | segment g |
| 7 | none  | dot       |

## 57 : Straight through test

- Author: Matt Venn
- Description: Just connects inputs to outputs
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

8 wires connect the 8 inputs to the 8 outputs

### How to test

Set any DIP switch high and the corresponding LED should light on the 7 segment display

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | a     | segment a |
| 1 | b     | segment b |
| 2 | c     | segment c |
| 3 | d     | segment d |
| 4 | e     | segment e |
| 5 | f     | segment f |
| 6 | g     | segment g |
| 7 | dot   | dot       |

## 58 : Combo lock

- Author: Benjamin Collier
- Description: set, reset, and check for a combo
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

this projects works by taking 4 inputs for number 0-9. The first 4 number inputed will be the new combo (0000 is not an input zero should be 0001). Then to “unlock” press your combo again. to reset and enter a new combo the unlock must be high and reset must be pressed, or master reset must be pressed.

### How to test

Enter 4 single digits in binary into input 0-4, each input must be followed by 0000. Then input the same 4 digits in each followed by 0000 to make unlock high. a valid input would look some think like this: 0001, 0000, 0101, 0000, 1001, 0000, 0011, 0000, then 0001, 0000, 0101, 0000, 1001, 0000, 0011, 0000, will make unlock high.

### IO

| # | Input        | Output |
|---|--------------|--------|
| 0 | input 0      | unlock |
| 1 | input 1      | none   |
| 2 | input 2      | none   |
| 3 | input 3      | none   |
| 4 | reset        | none   |
| 5 | master reset | none   |
| 6 | none         | none   |
| 7 | none         | none   |

## 59 : ro-based\_tempsense

- Author: Jorge Marin, Daniel Arevalos
- Description: Ring oscillator whose frequency depends on temperature.
- GitHub repository
- HDL project
- Extra docs
- Clock: 10000 Hz
- External hardware:

### How it works

Uses counters to determine the number of cycles of the ring oscillator within a clock period, which will then be sent through the UART to determine the temperature vs frequency characteristic.

### How to test

After reset and enable are set, the ring oscillator should start and then when a START code is received by UART, a value is sent back.

### IO

| # | Input           | Output  |
|---|-----------------|---------|
| 0 | clk_internal    | tx      |
| 1 | clk_external    | sum[8]  |
| 2 | clk_sel         | sum[10] |
| 3 | enable_inv_osc  | sum[13] |
| 4 | enable_nand_osc | sum[15] |
| 5 | reset           | sum[17] |
| 6 | rx              | sum[19] |
| 7 | osc_sel         | sum[21] |

## 60 : FSM\_LAT

- Author: Juan Sanchez
- Description: FSM look at table
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware: un fpga que permita generar un relog una salida serial para programar la secuencia de la maquina de estados

### How it works

el proyecto es una maquina de estados a la cual se le puede cambiar la secuencia de los estados por medio de una entrada serial

### How to test

para cargar la secuencia de estados se debe conectar un relog el cual este sincronizado con la secuencia serial para programar la maquina de estados = y para pasar de estado la entrada conectada a cada estado debe estar en 1

### IO

| # | Input    | Output    |
|---|----------|-----------|
| 0 | oi_in[0] | oi_out[1] |
| 1 | oi_in[1] | oi_out[2] |
| 2 | oi_in[2] | oi_out[3] |
| 3 | oi_in[3] | oi_out[4] |
| 4 | oi_in[4] | oi_out[5] |
| 5 | oi_in[5] | oi_out[6] |
| 6 | oi_in[6] | oi_out[7] |
| 7 | oi_in[7] | oi_out[0] |

## 61 : 7 segment seconds

- Author: Matt Venn
- Description: Count up to 10, one second at a time.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware:

### How it works

Uses a set of registers to divide the clock, and then some combinational logic to convert from binary to decimal for the display.

Puts a slower square wave output on output 7.

### How to test

After reset, the counter should increase by one every second.

### IO

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 62 : MMM Finite State Machine (4 States)

- Author: Alexandra Zhang Jiang
- Description: Finite State Machine for the Magnetic Microsystems and Micro-robotics Research Lab at UCI
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 1000 Hz
- External hardware: 3-electrode sensor and motors

### How it works

This chip will be incorporated in the microrobot that is being developed in lab.

### Inputs

The inputs consist of two bits I0 and I1, each one for a sensor that detects contamination in water to the left and right of the robot respectively. The input bus goes to logic gates along with the Current State to calculate the Next State, which is then saved in a register (D-FlipFlop).

### States

There are a total of four states: - 00 Start (default initial state of the robot, both motors are off) - 01 Go Right (which enables Left Motor in order to go right) - 10 Go Left (which enables Right Motor in order to go left) - 11 Straight (both motors are enabled to go straight)

The register will store the Current State (which is given by S0 and S1) at any given time.

### Outputs

This is a Moore Finite State Machine, meaning that the output only depends on the current state. The outputs consist of two bits M0 and M1, each one turns on the motor on the right or left of the robot respectively.

### How to test

The truth table for the FSM can be found below.

| I0 | I1 | S0 | S1 | S0+ | S1+ | M0 | M1 |
|----|----|----|----|-----|-----|----|----|
| 0  | 0  | 0  | 0  | 0   | 0   | 0  | 0  |
| 0  | 0  | 0  | 1  | 1   | 1   | 0  | 1  |
| 0  | 0  | 1  | 0  | 1   | 1   | 1  | 0  |
| 0  | 0  | 1  | 1  | 1   | 1   | 1  | 1  |
| 0  | 1  | 0  | 0  | 0   | 1   | 0  | 0  |
| 0  | 1  | 0  | 1  | 0   | 1   | 0  | 1  |
| 0  | 1  | 1  | 0  | 0   | 1   | 1  | 0  |
| 0  | 1  | 1  | 1  | 0   | 1   | 1  | 1  |
| 1  | 0  | 0  | 0  | 1   | 0   | 0  | 0  |
| 1  | 0  | 0  | 1  | 1   | 0   | 0  | 1  |
| 1  | 0  | 1  | 0  | 1   | 0   | 1  | 0  |
| 1  | 0  | 1  | 1  | 1   | 0   | 1  | 1  |
| 1  | 1  | 0  | 0  | 1   | 1   | 0  | 0  |
| 1  | 1  | 0  | 1  | 1   | 1   | 0  | 1  |
| 1  | 1  | 1  | 0  | 1   | 1   | 1  | 0  |
| 1  | 1  | 1  | 1  | 1   | 1   | 1  | 1  |

## IO

| # | Input              | Output             |
|---|--------------------|--------------------|
| 0 | clock              | Enable Right Motor |
| 1 | reset              | Enable Left Motor  |
| 2 | none               | segment c          |
| 3 | none               | segment d          |
| 4 | none               | segment e          |
| 5 | none               | segment f          |
| 6 | Sensor Left Input  | segment g          |
| 7 | Sensor Right Input | dot                |

## 63 : FSK modem

- Author: Balint Kovacs
- Description: Very simple FSK modem, comparable to Bell 103
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware:

### How it works

On the transmit side, it has two triangle wave synthesizers. The frequency of one can be altered by the data in pin, and the other one can be mixed in for DTMF.

On the receive side, it takes the input, heterodynes it with some square waves, and then filters I and Q over a 31 tap FIR filter. Finally, a phase detector is applied, that output is differentiated, and filtered for glitches. The noise immunity of this approach is questionable.

### How to test

- Set registers over SPI.
- Connect the “incoming” phone line to a comparator, and to samples\_in.
- Connect the “outgoing” phone line to an R-2R DAC, and to samples\_out.
- Connect a serial port to data\_in and data\_out.
- Enjoy squawking 300bps serial!

### IO

| # | Input      | Output         |
|---|------------|----------------|
| 0 | clock      | data_out       |
| 1 | reset      | valid_out      |
| 2 | cs_n       | samples_out[0] |
| 3 | sck        | samples_out[1] |
| 4 | mosi       | samples_out[2] |
| 5 | data_in    | samples_out[3] |
| 6 | none       | samples_out[4] |
| 7 | samples_in | samples_out[5] |

## 64 : ChatGPT-created Benchmarks

- Author: Jason Blocklove
- Description: This project implements 8 different benchmark circuits created 100% with ChatGPT-4.
- GitHub repository
- HDL project
- Extra docs
- Clock: 15000 Hz
- External hardware:

### How it works

This design implements a series of 8 benchmark circuits selectable by 3 bits of the design input, all of which written by ChatGPT-4. A series of prompts for each circuit were created which had ChatGPT design the module itself as well as a Verilog testbench for the design, and the design was considered finalized when there were no errors from simulation or synthesis. As much of the feedback as possible was given by the tools – Icarus Verilog for simulation and the Tiny Tapeout OpenLane/yosys toolchain for synthesis.

As a result of the designs being 100% created by ChatGPT, they all passed their ChatGPT-created testbenches, but several are not functionally correct as the generated testbenches are insufficient or incorrect. The best example of this is the **Dice Roller** benchmark, which has a constant output but passes its testbench fully.

The only design with a human-made testbench is the **Wrapper Module**, whose testbench uses dummy modules just to ensure the multiplexing works as expected. It seemed unrealistic to have ChatGPT create a testbench for every benchmark all in one given the token limits and how it struggled to make some of the standalone testbenches.

The complete transcripts of the ChatGPT conversations can be found at [https://github.com/JBlocklove/tt03-chatgpt-4\\_benchmarks/tree/main/conversations](https://github.com/JBlocklove/tt03-chatgpt-4_benchmarks/tree/main/conversations)

---

### Wrapper Module/Multiplexer

#### ChatGPT Prompt

I am trying to create a Verilog model for a wrapper around several benchmarks, specifically for the Tiny Tapeout project. It must meet the following specifications:

- Inputs:
  - io\_in (8-bits)
- Outputs:
  - io\_out (8-bits)

The design should instantiate the following modules, and use 3 bits of the 8-bit input to select which one will output from the module.

Benchmarks:

- shift\_register:
  - Inputs:
    - clk
    - reset\_n
    - data\_in
    - shift\_enable
  - Outputs:
    - [7:0] data\_out
- sequence\_generator:
  - Inputs:
    - clock
    - reset\_n
    - enable
  - Outputs:
    - [7:0] data
- sequence\_detector:
  - Inputs:
    - clk
    - reset\_n
    - [2:0] data
  - Outputs:
    - sequence\_found
- abro\_state\_machine:
  - Inputs:
    - clk
    - reset\_n
    - A
    - B
  - Outputs:

- 0
- [3:0] state
  
- binary\_to\_bcd:
  - Inputs:
    - [4:0] binary\_input
  - Outputs:
    - [3:0] bcd\_tens
    - [3:0] bcd\_units
  
- lfsr:
  - Inputs:
    - clk
    - reset\_n
  - Outputs:
    - [7:0] data
  
- traffic\_light:
  - Inputs:
    - clk
    - reset\_n
    - enable
  - Outputs:
    - red
    - yellow
    - green
  
- dice\_roller:
  - Inputs:
    - clk
    - reset\_n
    - [1:0] die\_select
    - roll
  - Outputs:
    - rolled\_number

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| io_in[7:5] | Benchmark          |
|------------|--------------------|
| 000        | Shift Register     |
| 001        | Sequence Generator |
| 010        | Sequence Detector  |
| 011        | ABRO               |
| 100        | Binary to BCD      |
| 101        | LFSR               |
| 110        | Traffic Light      |
| 111        | Dice Roller        |

## Expected Functionality

The top level module for the design is a wrapper module/multiplexer which allows the user to select which benchmark is being used for the output of the design. Using `io_in[7:5]`, the user can select which benchmark will output to the `io_out` pins.

This module was created after all of the other designs were finalized so their port mappings could be given

## Actual Functionality

The module functions as intended. This is the only module with a human-written testbench, as it seemed unrealistic to have ChatGPT create a full testbench that confirmed the module instantiations worked given how much it struggled with some of the other testbenches.

---

## Shift Register

### ChatGPT Prompt

I am trying to create a Verilog model for a shift register. It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
  - Data (1 bit)
  - Shift enable
- Outputs:
  - Data (8 bits)

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| # | Input         | Output           |
|---|---------------|------------------|
| 0 | clk           | Shifted data [0] |
| 1 | rst_n (async) | Shifted data [1] |
| 2 | data_in       | Shifted data [2] |
| 3 | shift_enable  | Shifted data [3] |
| 4 | Not used      | Shifted data [4] |
| 5 | Select bit    | Shifted data [5] |
| 6 | Select bit    | Shifted data [6] |
| 7 | Select bit    | Shifted data [7] |

## Expected Functionality

The expected functionality of this shift register module is to shift the `data_in` bit in on the right side of the data vector on any rising `clk` edge where `shift_enable` is high.

## Actual Functionality

The module seems to function as intended.

---

## Sequence Generator

### ChatGPT Prompt

I am trying to create a Verilog model for a sequence generator. It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
  - Enable
- Outputs:
  - Data (8 bits)

While enabled, it should generate an output sequence of the following hexadecimal values and then repeat:

- 0xAF
- 0xBC
- 0xE2
- 0x78
- 0xFF
- 0xE2
- 0x0B
- 0x8D

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| # | Input         | Output              |
|---|---------------|---------------------|
| 0 | clk           | Sequence Output [0] |
| 1 | rst_n (async) | Sequence Output [1] |
| 2 | Not used      | Sequence Output [2] |
| 3 | Not used      | Sequence Output [3] |
| 4 | enable        | Sequence Output [4] |
| 5 | Select bit    | Sequence Output [5] |
| 6 | Select bit    | Sequence Output [6] |
| 7 | Select bit    | Sequence Output [7] |

## Expected Functionality

The expected functionality of this sequence generator is to output the following sequence, moving a step forward whenever the clk has a rising edge and the enable is high. Once the sequence has reached its end it should repeat.

0xAF  
0xBC  
0xE2  
0x78  
0xFF  
0xE2  
0x0B  
0x8D

## Actual Functionality

The module functions as intended.

---

## Sequence Detector

### ChatGPT Prompt

I am trying to create a Verilog model for a sequence detector. It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
  - Data (3 bits)
- Outputs:
  - Sequence found

While enabled, it should detect the following sequence of binary input values:

- 0b001
- 0b101
- 0b110
- 0b000
- 0b110
- 0b110
- 0b011
- 0b101

How would I write a design that meets these specifications?

### Benchmark I/O Mapping

| # | Input         | Output   |
|---|---------------|----------|
| 0 | clk           | Not Used |
| 1 | rst_n (async) | Not Used |
| 2 | data[0]       | Not Used |
| 3 | data[1]       | Not Used |
| 4 | data[2]       | Not Used |
| 5 | Select bit    | Not Used |

| # | Input      | Output         |
|---|------------|----------------|
| 6 | Select bit | Not Used       |
| 7 | Select bit | Sequence Found |

## Expected Functionality

The expected functionality of this sequence detector is to output a 1 if it receives the following sequence of data all on consecutive clock cycles.

```
0b001
0b101
0b110
0b000
0b110
0b110
0b011
0b101
```

## Actual Functionality

The module does not correctly detect the sequence. In trying to set the states to allow the sequence to overlap it instead skips the final value or outputs a 1 if the second to last value and final value are both 0b101.

---

## ABRO State Machine

### ChatGPT Prompt

I am trying to create a Verilog model for an ABRO state machine.  
It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
  - A
  - B
- Outputs:
  - O
  - State

Other than the main output from ABRO machine, it should output the current state of the machine for use in verification.

The states for this state machine should be one-hot encoded.

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| # | Input           | Output    |
|---|-----------------|-----------|
| 0 | clk             | State [0] |
| 1 | reset_n (async) | State [1] |
| 2 | A               | State [2] |
| 3 | B               | State [3] |
| 4 | Not used        | Output    |
| 5 | Select bit      | Not used  |
| 6 | Select bit      | Not used  |
| 7 | Select bit      | Not used  |

## Expected Functionality

The expected functionality of the ABRO (A, B, Reset, Output) state machine is to only reach the output state and output a 1 when both inputs A and B have been given before a reset. The order of the inputs should not matter, so long as both A and B are set.

## Actual Functionality

The module does not function fully as intended. If B is received before A then it works as intended, but if A is received first then it actually requires the sequence A, B, A in order to reach the output state. It also does not handle the case where A and B are set in the same cycle, instead interpreting it as if A was received first.

---

## Binary to BCD Converter

### ChatGPT Prompt

I am trying to create a Verilog model for a binary to

binary-coded-decimal converter. It must meet the following specifications:

- Inputs:
  - Binary input (5-bits)
- Outputs:
  - BCD (8-bits: 4-bits for the 10's place and 4-bits for the 1's place)

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| # | Input           | Output       |
|---|-----------------|--------------|
| 0 | binary_input[0] | BCD Ones [0] |
| 1 | binary_input[1] | BCD Ones [1] |
| 2 | binary_input[2] | BCD Ones [2] |
| 3 | binary_input[3] | BCD Ones [3] |
| 4 | binary_input[4] | BCD Tens [0] |
| 5 | Select bit      | BCD Tens [1] |
| 6 | Select bit      | BCD Tens [2] |
| 7 | Select bit      | BCD Tens [3] |

## Expected Functionality

The expected functionality of this module is to take a 5-bit binary number and produce a binary-coded-decimal output. The 4 most significant bits of the output encode to the tens place of the decimal number, the 4 least significant bits of the output encode the ones place of the decimal number.

## Actual Functionality

The module functions as intended.

---

## Linear Feedback Shift Register (LFSR)

### ChatGPT Prompt

I am trying to create a Verilog model for an LFSR. It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
- Outputs:
  - Data (8-bits)

The initial state should be 10001010, and the taps should be at locations 1, 4, 6, and 7.

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| # | Input         | Output          |
|---|---------------|-----------------|
| 0 | clk           | Data Output [0] |
| 1 | rst_n (async) | Data Output [1] |
| 2 | Not used      | Data Output [2] |
| 3 | Not used      | Data Output [3] |
| 4 | Not used      | Data Output [4] |
| 5 | Select bit    | Data Output [5] |
| 6 | Select bit    | Data Output [6] |
| 7 | Select bit    | Data Output [7] |

## Expected Functionality

The expected functionality of this module is to generate a pseudo-random output value from this LFSR based on the tap locations given in the prompt.

It was unspecified in the prompt, but originally it was expected that the LFSR would shift right as is standard amongst most documentation. It instead shifts to the left, which is not inherently incorrect but warrants mentioning.

## Actual Functionality

This module functions almost as expected, except the taps were placed on indices off-by-one. Rather than being at indices 1, 4, 6, and 7, they are at indices 0, 3, 5, and 6. This is still a valid LFSR, it is just not quite what was requested.

---

## Traffic Light State Machine

## ChatGPT Prompt

I am trying to create a Verilog model for a traffic light state machine. It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
  - Enable
- Outputs:
  - Red
  - Yellow
  - Green

The state machine should reset to a red light, change from red to green after 32 clock cycles, change from green to yellow after 20 clock cycles, and then change from yellow to red after 7 clock cycles.

How would I write a design that meets these specifications?

## Benchmark I/O Mapping

| # | Input         | Output              |
|---|---------------|---------------------|
| 0 | clk           | Sequence Output [0] |
| 1 | rst_n (async) | Sequence Output [1] |
| 2 | Not used      | Sequence Output [2] |
| 3 | enable        | Sequence Output [3] |
| 4 | Not used      | Sequence Output [4] |
| 5 | Select bit    | Green               |
| 6 | Select bit    | Yellow              |
| 7 | Select bit    | Red                 |

## Expected Functionality

The expected functionality of this module is to simulate the function of a timed traffic light. On a reset it outputs a red light, waits 32 clock cycles and then changes to a green light, waits 20 clock cycles and then changes to a yellow light, waits 7 clock cycles and then changes back to red. This should then repeat. If the enable is low, then it should pause the operation entirely and pick up again once the enable is brought high again.

## Actual Functionality

The module functions as intended.

---

## Dice Roller

### ChatGPT Prompt

I am trying to create a Verilog model for a simulated dice roller. It must meet the following specifications:

- Inputs:
  - Clock
  - Active-low reset
  - Die select (2-bits)
  - Roll
- Outputs:
  - Rolled number (up to 8-bits)

The design should simulate rolling either a 4-sided, 6-sided, 8-sided, or 20-sided die, based on the input die select. It should roll when the roll input goes high and output the random number based on the number of sides of the selected die.

How would I write a design that meets these specifications?

### Benchmark I/O Mapping

| # | Input         | Output        |
|---|---------------|---------------|
| 0 | clk           | Dice Roll [0] |
| 1 | rst_n (async) | Dice Roll [1] |
| 2 | die_select[1] | Dice Roll [2] |
| 3 | die_select[0] | Dice Roll [3] |
| 4 | roll          | Dice Roll [4] |
| 5 | Select bit    | Dice Roll [5] |
| 6 | Select bit    | Dice Roll [6] |
| 7 | Select bit    | Dice Roll [7] |

## Expected Functionality

The expected functionality of this dice roller is to allow the user to select which die they would like to simulate rolling based on the following table:

| die_select | Number of sides |
|------------|-----------------|
| 00         | 4               |
| 01         | 6               |
| 10         | 8               |
| 11         | 20              |

When `roll` is high the module should output a new pseudo-random value in the range [1 - Number of sides]

## Actual Functionality

This module outputs 2 for the first two dice rolls and then consistently outputs a 1 regardless of what die is selected.

## How to test

Testing this design has some difficulties as there are several different functionalities that are selected between and not all work as expected.

For sequential designs the input vectors are expected to be given across consecutive clock cycles, as such for those designs `io_in[0]` will not be given. For designs with input bits that are unused, those positions will be shown as `x` to represent don't-cares.

The following tables give input test vectors and their expected outputs:

## Shift Register

| Given Input | Expected Output | Comment    |
|-------------|-----------------|------------|
| 000x000     | 00000000        | Reset      |
| 000x001     | 00000000        | Disabled   |
| 000x011     | 00000000        | Shift in 0 |
| 000x111     | 00000001        | Shift in 1 |
| 000x111     | 00000011        | Shift in 1 |
| 000x101     | 00000110        | Shift in 0 |
| 000x111     | 00001101        | Shift in 1 |
| 000x011     | 00001101        | Disabled   |

| Given Input | Expected Output | Comment |
|-------------|-----------------|---------|
| 000x110     | 00000000        | Reset   |

## Sequence Generator

| Given Input | Expected Output | Comment  |
|-------------|-----------------|----------|
| 0010xx0     | 10101111        | Reset    |
| 0010xx1     | 10101111        | Disabled |
| 0011xx1     | 10101111        | Enabled  |
| 0011xx1     | 10111100        | Enabled  |
| 0011xx1     | 11100010        | Enabled  |
| 0011xx1     | 01111000        | Enabled  |
| 0011xx1     | 11111111        | Enabled  |
| 0011xx1     | 11100010        | Enabled  |
| 0011xx1     | 00001011        | Enabled  |
| 0010xx1     | 10001101        | Disabled |
| 0011xx1     | 10001101        | Enabled  |
| 0011xx0     | 10101111        | Reset    |

## Sequence Detector

This module does not function as intended. These vectors are meant to represent what will be expected from the hardware, not what is expected from the initial design description.

| Given Input | Expected Output | State    |
|-------------|-----------------|----------|
| 0100000     | 00000000        | S0       |
| 0100001     | 00000000        | S0       |
| 0100011     | 00000000        | S0 -> S1 |
| 0101011     | 00000000        | S1 -> S2 |
| 0101101     | 00000000        | S2 -> S3 |
| 0100001     | 00000000        | S3 -> S4 |
| 0101101     | 00000000        | S4 -> S5 |
| 0101101     | 00000000        | S5 -> S6 |
| 0100111     | 10000000        | S6 -> S0 |
| 0101011     | 00000000        | S0       |
| 0100001     | 00000000        | S0       |
| 0100011     | 00000000        | S0 -> S1 |
| 0101011     | 00000000        | S1 -> S2 |

| Given Input | Expected Output | State    |
|-------------|-----------------|----------|
| 0101101     | 00000000        | S2 -> S3 |
| 0100001     | 00000000        | S3 -> S4 |
| 0101101     | 00000000        | S4 -> S5 |
| 0101101     | 00000000        | S5 -> S6 |
| 0101011     | 10000000        | S6 -> S7 |
| 0100111     | 00000000        | S7 -> S2 |

## ABRO State Machine

| Given Input | Expected Output | State     |
|-------------|-----------------|-----------|
| 011x000     | 00000001        | IDLE      |
| 011x001     | 00000001        | IDLE      |
| 011x011     | 00000010        | IDLE -> A |
| 011x101     | 00000100        | A -> B    |
| 011x111     | 00011000        | B -> O    |
| 011x111     | 00000001        | O -> IDLE |
| 011x001     | 00000001        | IDLE      |
| 011x101     | 00000100        | IDLE -> B |
| 011x011     | 00000100        | B -> O    |
| 011x011     | 00000001        | O -> IDLE |

## Binary to BCD Converter

This design is purely combinational, so all 8-bits are included in the input vector. Clock cycles do not matter for this functionality.

| Given Input | Expected Output | Comment |
|-------------|-----------------|---------|
| 10000001    | 00000001        | 1       |
| 10011111    | 00110001        | 31      |
| 10011010    | 00100110        | 26      |
| 10010011    | 00011001        | 19      |

## LFSR

As the LFSR only has a clock and reset input, the table will only give the first vectors for resetting and then setting, but it will show the first several cycles of the LFSR from the initial vector.

| Given Input | Expected Output | Comment |
|-------------|-----------------|---------|
| 0000000     | 10001010        |         |
| 0000001     | 00010101        |         |
| ...         | 00101011        |         |
| ...         | 01010111        |         |
| ...         | 10101110        |         |
| ...         | 01011100        |         |
| ...         | 10111001        |         |

## Traffic Light State Machine

Given the nature of this benchmark being dependent on counting clock cycles, it seems unhelpful to provide a suite of test vectors and their expected outputs. A limited set of vectors is provided with comments on their general functionality, but full testing will be left to the user's discretion.

| Given Input | Expected Output | Comment                                                                |
|-------------|-----------------|------------------------------------------------------------------------|
| 110x0x0     | 10000000        | Reset: Sets the state machine back to the initial (RED) state          |
| 110x0x1     | Depends         | Disabled: Holds the FSM in its current state, pauses clock cycle count |
| 110x1x1     | Depends         | Enabled: Resumes clock cycle counting and allows state transitions     |

## Dice Roller

This module does not function as intended. These vectors are meant to represent what will be expected from the hardware, not what is expected from the initial design description.

| Given Input | Expected Output | Comment |
|-------------|-----------------|---------|
| 1110000     | 00000000        | Reset   |
| 1111001     | 00000010        | d4      |
| 1110001     | 00000010        | d4      |
| 1111101     | 00000001        | d6      |
| 1110101     | 00000001        | d6      |
| 1111011     | 00000001        | d8      |
| 1110011     | 00000001        | d8      |
| 1111111     | 00000001        | d20     |
| 1110111     | 00000001        | d20     |
| 1111001     | 00000001        | d4      |
| 1111001     | 00000001        | d4      |

## IO

| # | Input                | Output               |
|---|----------------------|----------------------|
| 0 | Depends on benchmark | Depends on benchmark |
| 1 | Depends on benchmark | Depends on benchmark |
| 2 | Depends on benchmark | Depends on benchmark |
| 3 | Depends on benchmark | Depends on benchmark |
| 4 | Depends on benchmark | Depends on benchmark |
| 5 | benchmark_select[0]  | Depends on benchmark |
| 6 | benchmark_select[1]  | Depends on benchmark |
| 7 | benchmark_select[2]  | Depends on benchmark |

## 65 : HiddenCPU

- Author: HiddenRoom
- Description: Basic 8 bit CPU.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: No absolute requirements but external memory and hardware for addressing is not a bad idea

### How it works

Each instruction is executed in one clock cycle.

Instructions are inputed on io\_in[2:7].

### How to test

After reset, registers will initialize, in order of ascending address, to 00000000b, 00000001b, 00000010b, 00000011b

### IO

| # | Input                                   | Output                            |
|---|-----------------------------------------|-----------------------------------|
| 0 | clock                                   | bit zero of hardwired output reg  |
| 1 | reset                                   | number at address 11b or pc       |
| 2 | opcode bit zero                         | bit one of hardwired output reg   |
| 3 | opcode bit one                          | number at address 11b or pc       |
| 4 | result reg address bit zero             | bit two of hardwired output reg   |
| 5 | result reg address bit one              | number at address 11b or pc       |
| 6 | non result operand reg address bit zero | bit three of hardwired output reg |
| 7 | non result operand reg address bit one  | number at address 11b or pc       |

## 66 : Simple UART interface

- Author: Aleksandr Zlobin
- Description: UART interface with access to internal registers
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware:

### How it works

Each transaction consists of two bytes. First byte - address and r/w bit. Second byte - data to be written (write transaction), or data read from internal registers (read transaction).

### How to test

After reset LEDs should show 3 horizontal segments.

Read transaction: first byte is send by host, device will reply with one data byte. List of implemented commands:

1. Read 15:18 bits of signature: send 0x00, device will reply with 0xDA byte
2. Read 7:0 bits of signature: send 0x02, device will reply with 0xDE byte
3. Read internal register: send 0x04, device will reply with 0x00 byte (initial value of internal register is 0)
4. Read led register: send 0x06, device will reply with 0x49 byte (initial value of led register is 0x49)
5. Read status of input4-input7: send 0x08, device will reply with byte containing state of the inputs.

Write transaction: host should send two bytes first - command, second data:

1. Write to internal register: send 0x05, send byte you want to write
2. Write to LED register: send 0x07, send byte you want to write to LED register

### IO

| # | Input  | Output    |
|---|--------|-----------|
| 0 | clock  | segment a |
| 1 | reset  | segment b |
| 2 | rx     | segment c |
| 3 | input4 | segment d |
| 4 | input5 | segment e |
| 5 | input6 | segment f |
| 6 | input7 | segment g |
| 7 | input8 | tx        |

## 67 : MSF Clock

- Author: Jamie Wood & Daniel Cannell
- Description: MSF radio clock
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware:

### How it works

The MSF radio clock bitstream is synchronised, decoded, and the time displayed on an LED display driven by a shift-register.

### How to test

- Provide 12.5kHz clock to the clock pin, e.g. using the clock divider set to 0.
- Connect the data input to a WWVB / MSF receiver module.
- Connect the inverted input high if the module outputs a 1 when there is no carrier.
- Connect the shift\_date input to a button. If held, the display will show date instead of time.
- Connect the shift register outputs to a 42-bit shift register chain. Bit 41 is shifted out first.

| Bit   | Segment                     |
|-------|-----------------------------|
| 41-35 | Hours tens G-A (41=G, 35=A) |
| 34-28 | Hours ones G-A as above     |
| 27-21 | Minutes tens G-A as above   |
| 20-14 | Minutes ones G-A as above   |
| 13-7  | Seconds tens G-A as above   |
| 6-0   | Seconds ones G-A as above   |

Time and date are always both shifted out, so an 84-bit shift register will show both simultaneously.

| # | Input      | Output      |
|---|------------|-------------|
| 0 | clock      | shift_clk   |
| 1 | reset      | shift_data  |
| 2 | data       | shift_latch |
| 3 | inverted   | none        |
| 4 | shift_date | none        |
| 5 | none       | none        |
| 6 | none       | none        |
| 7 | none       | none        |

## 68 : Hamming(7,4) encoder decoder”

- Author: Robbert
- Description: Encodes and decodes data using Hamming 7,4 codes.
- GitHub repository
- HDL project
- Extra docs
- Clock: Hz
- External hardware:

### How it works

Encodes and decodes data using Hamming 7,4 codes.

### How to test

See `src/test.py`

### IO

| # | Input                       | Output           |
|---|-----------------------------|------------------|
| 0 | word / codeword             | codeword / word  |
| 1 | word / codeword             | codeword / word  |
| 2 | word / codeword             | codeword / word  |
| 3 | word / codeword             | codeword / word  |
| 4 | none / codeword             | codeword / error |
| 5 | none / codeword             | codeword / none  |
| 6 | none / codeword             | codeword / none  |
| 7 | mode (0: encode, 1: decode) | none             |

## 69 : I2S receiver, data mix and transmitter

- Author: Clemens Nasenberg
- Description: Mix two I2S streams according to selection
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: I2S Controller/Transmitter/Receiver

### How it works

Read in 2 Channels of I2S streams, mix them according to input bits (mute, I2S input 1 or 2 or both) and send them out again. For adding the LSB of the data is truncated. This should support standard I2S transmitter/receiver.

<https://www.sparkfun.com/datasheets/BreakoutBoards/I2SBUS.pdf>

### How to test

You need to connect the word select signal (WS), continuous serial clock (SCK) and serial data for channel 1 and 2 (SD\_CH1 SD\_CH2). Configure channel\_sel (b00 is mute, b01 channel 1, b10 channel 2 and b11 is adding ch1 and ch2), reset, then let and I2S master drive the signals according to the standard and receive data on the output SD\_OUT. Some other IC needs to act as clock master and provide the clocks and signals.

### IO

| # | Input       | Output |
|---|-------------|--------|
| 0 | sck         | wsd    |
| 1 | reset       | wsp    |
| 2 | ws          | sd_out |
| 3 | sd_ch1      | none   |
| 4 | sd_ch2      | none   |
| 5 | channel_sel | none   |
| 6 | none        | none   |
| 7 | none        | none   |

## 70 : S4GA: Super Slow Serial SRAM FPGA

### S4GA: Super Slow Serial SRAM FPGA

Datapath – N=283 K=3,4,5,6,7,8-LUTs I=2 O=7

*Tiny Tapeout 2, 2022-11-22*



Figure 14: picture

- Author: Jan Gray(TT02), Marco Merlin (TT03)
- Description: one fracturable 5-LUT that receives FPGA LUT configuration frames, serially evaluates LUT inputs and LUT outputs
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware: serial SRAM or FLASH

### How it works

The design is a single physical LUT into which an external agent pours a series of 92b LUT configuration frames, four bits per cycle. Every 23 clock cycles it evaluates a 5-input LUT. The last  $N=283$  LUT output values are kept on die to be used as LUT inputs of subsequent LUTs. The design also has 2 FPGA input pins and 7 FPGA output pins.

### How to test

tricky

## IO

| # | Input | Output |
|---|-------|--------|
| 0 | clk   | out[0] |
| 1 | rst   | out[1] |
| 2 | si[0] | out[2] |
| 3 | si[1] | out[3] |
| 4 | si[2] | out[4] |
| 5 | si[3] | out[5] |
| 6 | in[0] | out[6] |
| 7 | in[1] | debug  |

## 71 : 7 Segment Random Walk

- Author: Richard Miller
- Description: Random walk around the 7 segment display
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Segments of the 7seg display are activated in a sequence controlled by a finite state machine. The state is represented by 14 flipflops. Each segment has two flipflops, one for each direction of the walk. Where a segment has two possible successors, a mux selects between them, controlled by the output from a linear feedback shift register acting as a random number generator. The FSM is self initialising so a reset signal isn't required: the number of true flipflops is summed, and if the total isn't exactly 1 the state will be initialised to have a single segment and direction active. A clock divider is used to slow the walk to a visible speed. This is done by gating the flipflop state transitions with a pulse generated every  $2^N$  clock cycles (where N is a 4-bit number set by the DIP switches.) The pulses are obtained by using a tree of muxes controlled by the bits of N to select one carry bit from within a 16-bit counting register.

### How to test

Use switches 5-8 (LSB is switch 8) to set a number N from 0 to 15; the clock will be divided by  $2^N$ . To test the clock, set switch 4 on and the 7seg dot will flash at half the clock speed.

### IO

| # | Input                     | Output    |
|---|---------------------------|-----------|
| 0 | clock                     | segment a |
| 1 | none                      | segment b |
| 2 | none                      | segment c |
| 3 | enable flashing dot       | segment d |
| 4 | clock divisor bit 3 (MSB) | segment e |
| 5 | clock divisor bit 2       | segment f |
| 6 | clock divisor bit 1       | segment g |
| 7 | clock divisor bit 0 (LSB) | dot       |

## 72 : Tiny binarized neural network

- Author: ReJ aka Renaldas Zioma
- Description: 8 neurons
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

TODO

### How to test

TODO

### IO

| # | Input                                 | Output                                             |
|---|---------------------------------------|----------------------------------------------------|
| 0 | clock                                 | output neuron 1                                    |
| 1 | setup                                 | output neuron 2                                    |
| 2 | in0 / in parameter (if setup is high) | output neuron 3                                    |
| 3 | in1                                   | output neuron 4                                    |
| 4 | in2                                   | output neuron 5                                    |
| 5 | in3                                   | output neuron 6                                    |
| 6 | in4                                   | output neuron 7                                    |
| 7 | in5                                   | output neuron 8 / out parameter (if setup is high) |

## 73 : RTL Locked QTCore-A1

- Author: Luca Collini and Hammond Pearce
- Description: A RTL locked accumulator-based 8-bit microarchitecture designed via GPT-4 conversations.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: Any microcontroller with an SPI peripheral

### How it works

The original QTCore-A1 from Hammond Pearce is a basic accumulator-based 8-bit microarchitecture (with an emphasis on the micro). It is a Von Neumann design (shared data and instruction memory). Its original specs can be found here: <https://github.com/kiwihi/tt03-verilog-qtcoreA1>

In order to lock the design, I had to make some space so I cut off 2 (8bit) memory registers (as a 16 bit register was added to store the locking key) and the memory mapped constants. I locked the ALU with 6 bits, locking 1 operation an 3 constants, and the control unit with 10 bits, locking an 8 bit constnt used twice and 2 branches. This locking techniques are the same one first presented in ASSURE: <https://arxiv.org/abs/2010.05344>

I also locked the ISA to ALU opcodes module using a novel RTL locking technique to lock case statements. The case variable is xored with the locking key and all case constants are replaced with the results of the original value xored with the right key. In this way the case statement works as originally intended only with the right key. To implement this locking technique I used ChatGPT with the GPT4 version. It was able to modify the `isa_to_alu_opcode` module by describing the technique, the key value to use, and the original module. The structure was correct from the first response, some additional back and forth was required for it to get the right values for the case constants (apparently gpt4 struggles a bit with xor operations).

The correct key is provided to allow everyone to use this design: 1011 1111 1111 1001  
We store the key in a scan chained register. In order to have the design working, your assembly should end with this two lines:

```
15: DATA 249      ; logic locking unlock key
16: DATA 191      ; logic locking unlock key
```

## 7seg

Tiny Tapeout 3 has a 7-segment display on the board. To make it useful with the QTCore-A1, the processor helpfully includes the bit patterns stored at addresses 16-26. Address 15 contains the Address of the 0 pattern. The easiest way to make use of these is with the LDAR instruction. Here's an example snippet, which assumes a value between 0 and 9 is stored at address 13:

```
0: LDA 13 ; load the value we want to display
1: ADD 15 ; add it to the constant 15 to get
;           the 7 segment pattern offset
2: LDAR    ; load the 7 segment pattern
3: STA 14 ; emit the 7 segment pattern
```

## How to test

The processor will not run unless a program is scanned into it. Fortunately, this is easy using the SPI peripheral of an external microcontroller.

## Wiring the SPI

The QTCore-A1 is designed to be connected to an SPI peripheral along with two chip selects. This is because we use the SPI SCK as the clock for both the scan chain and the processor.

Connect the I/O according to the provided wiring table. Then, set the SPI peripheral to the following settings:

- SPI Mode 0
- 8-bit data
- MSB first
- A very slow clock (I used 1kHz)
- The scan chain chip select is active low
- The processor chip select is active low

## Processor Memory Map

| Address | Description                                         |
|---------|-----------------------------------------------------|
| 0-13    | 14 bytes of general purpose Instruction/Data memory |
| 14      | I/O: {OUT[7:1], IN[0]}                              |
| 15      | Constant value: 16 (See "7seg" note below)          |
| 16      | Constant value: Bits for 7seg "0"                   |

| Address | Description                       |
|---------|-----------------------------------|
| 17      | Constant value: Bits for 7seg "1" |
| 18      | Constant value: Bits for 7seg "2" |
| 19      | Constant value: Bits for 7seg "3" |
| 21      | Constant value: Bits for 7seg "4" |
| 22      | Constant value: Bits for 7seg "5" |
| 23      | Constant value: Bits for 7seg "6" |
| 24      | Constant value: Bits for 7seg "7" |
| 25      | Constant value: Bits for 7seg "8" |
| 26      | Constant value: Bits for 7seg "9" |
| 27-31   | Constant value: 1                 |

## Loading a program

During the scan chain mode (when the scan enable is low), the entire processor acts as a giant shift register, with the bits in the following order:

All elements of the scan chain are presented MSB first.

- scan\_chain[2:0] - 3-bit state register
- scan\_chain[7:3] - 5-bit PC
- scan\_chain[15:8] - 8-bit Instruction Register
- scan\_chain[23:16] - 8-bit Accumulator
- scan\_chain[31 :- 8] - Memory[0]
- scan\_chain[39 :- 8] - Memory[1]
- scan\_chain[47 :- 8] - Memory[2]
- scan\_chain[55 :- 8] - Memory[3]
- scan\_chain[63 :- 8] - Memory[4]
- scan\_chain[71 :- 8] - Memory[5]
- scan\_chain[79 :- 8] - Memory[6]
- scan\_chain[87 :- 8] - Memory[7]
- scan\_chain[95 :- 8] - Memory[8]
- scan\_chain[103 :- 8] - Memory[9]
- scan\_chain[111 :- 8] - Memory[10]
- scan\_chain[119 :- 8] - Memory[11]
- scan\_chain[127 :- 8] - Memory[12]
- scan\_chain[135 :- 8] - Memory[13]
- scan\_chain[143 :- 8] - Memory[14]
- scan\_chain[159 :- 16] - (Locking Key)

Example program:

```
; This program tests the btn and LEDs
```

```

; will switch between 0 and 7 pressing the button
;
; BTNs and LEDS are at address 17, btn at LSB
0: ADDI 14 ; load the btn and LEDS
1: STA 14 ; load the btn and LEDS
2: LDA 14 ; load the btn and LEDS
3: AND 12 ; mask the btn
4: BNE_BWD ; if btn&1 is not zero then branch back to 0
5: LDA 13 ; load the btn and LEDS
6: STA 14 ; load the btn and LEDS
7: LDA 14 ; load the btn and LEDS
8: AND 12 ; mask the btn
9: BEQ_BWD ; if btn&1 is zero then branch back to 3
;
; the button has now done a transition from low to high
;
10: CLR
11: JMP
;
; data
12: DATA 1;
13: DATA 126 ;
15: DATA 249      ; logic locking unlock key
16: DATA 191      ; logic locking unlock key

```

C code to load the previously-discussed example program (e.g. via the STM32 HAL) is provided:

```

//The registers are presented in reverse order to
// the table as we load them MSB first.
uint8_t program_led_btn[21] = {
    0b10111111, //MEM[16]
    0b11111001, //MEM[15]
    0b00000000, // IOREG
    0b01111110, //MEM[13]
    0b00000001, //MEM[12]
    0b11110000, //MEM[11]
    0b11111101, //MEM[10]
    0b11110011, //MEM[9]
    0b10001100, //MEM[8]
    0b00001110, //MEM[7]
    0b00101110, //MEM[6]
    0b00001101, //MEM[5]

```

```

        0b11110101, //MEM[4]
        0b10001100, //MEM[3]
        0b00001110, //MEM[2]
        0b00101110, //MEM[1]
        0b11101110, //MEM[0]
        0b00000000, //ACC
        0b00000000, //IR
        0b00000001 //PC[5bit], CU[3bit]
};

//... this will go in the SPI peripheral initialization code
hspi1.Init.Mode = SPI_MODE_MASTER;
hspi1.Init.Direction = SPI_DIRECTION_2LINES;
hspi1.Init.DataSize = SPI_DATASIZE_8BIT;
hspi1.Init.CLKPolarity = SPI_POLARITY_LOW;
hspi1.Init.CLKPhase = SPI_PHASE_1EDGE;
hspi1.Init.NSS = SPI_NSS_SOFT;
//...
hspi1.Init.FirstBit = SPI_FIRSTBIT_MSB;
hspi1.Init.TIMode = SPI_TIMODE_DISABLE;
//...

//... this will go in your main or similar
uint8_t *program = program_led_btn;
//on the first run, scan_out will give us the reset value of the
// processor
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 0);
HAL_Delay(100);
HAL_SPI_TransmitReceive(&hspi1, program, scan_out, 21, HAL_MAX_DELAY);
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 1);
//we can check if the program loaded correctly by immediately scanning
// it back in again, it will be unloaded to scan_out
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 0);
HAL_Delay(100);
HAL_SPI_TransmitReceive(&hspi1, program, scan_out, 21, HAL_MAX_DELAY);
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 1);
//Check if the program loaded correctly
for(int i = 0; i < 21; i++) {
    if(program[i] != scan_out[i]) {
        while(1); //it failed
}

```

```
    }  
}
```

## Running a program

Once the program is loaded (using the above code or similar), we can run a program. This is as easy as providing a clock signal to the processor and setting the processor enable line high. We do this using the SPI as follows:

```
uint8_t dummy; //a dummy value  
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 0);  
//...  
//run this in a loop  
HAL_SPI_TransmitReceive(&hspi1, &dummy, &dummy, 1, HAL_MAX_DELAY);
```

The processor will ignore any value being shifted in on the MOSI data line during operation. However, it does provide a nice feature in that the processor will emit the current value of the processor\_halt signal on the MISO line. This means that we can improve the code to run this process in a loop and catch when the program reaches a HLT instruction:

```
HAL_Delay(100);  
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 0);  
dummy = 0;  
while(1) {  
    HAL_SPI_TransmitReceive(&hspi1, &dummy, &dummy, 1, HAL_MAX_DELAY);  
    if(dummy == 0xFF)  
        break;  
}  
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 1);  
HAL_Delay(100);
```

Of course, the example program does not HLT, so we will not reach this point in this code. But, it works for other programs that do contain a HLT instruction.

Once you have the provided example program running, you will be able to press the button and see how the LEDs are toggled.

## IO

| # | Input                                                                                          | Output                                                                                           |
|---|------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| 0 | clock - connect to an SPI SCK.                                                                 | general purpose output 0<br>(e.g. LED segment a). This output comes from the I/O register bit 1. |
| 1 | reset (active high)                                                                            | general purpose output 1<br>(e.g. LED segment b). This output comes from the I/O register bit 2. |
| 2 | scan enable (active low) - connect to an SPI chip select.                                      | general purpose output 2<br>(e.g. LED segment c). This output comes from the I/O register bit 3. |
| 3 | processor enable (active low) - connect to an SPI chip select.                                 | general purpose output 3<br>(e.g. LED segment d). This output comes from the I/O register bit 4. |
| 4 | scan data in - connect to an SPI MOSI.                                                         | general purpose output 4<br>(e.g. LED segment e). This output comes from the I/O register bit 5. |
| 5 | general purpose input<br>(e.g. Button). This input will be provided to the I/O register bit 0. | general purpose output 5<br>(e.g. LED segment f). This output comes from the I/O register bit 6. |
| 6 | none                                                                                           | general purpose output 6<br>(e.g. LED segment g). This output comes from the I/O register bit 7. |
| 7 | none                                                                                           | scan data out - connect to an SPI MISO.                                                          |

## 74 : Arbiter Game



Figure 15: picture

- Author: Martin A. Heredia
- Description: This is a game for 2 players. After a countdown, the player who press his/her button first wins. Countdown and winner should be displayed in output leds.
- GitHub repository
- HDL project
- Extra docs
- Clock: 1000 Hz
- External hardware: 2 active low pushbuttons for players + 1 active low pushbutton for reset. 4 LEDs connected to outputs.

### How it works

This is a 2 players game where every player has an active low pushbutton assigned (`req1_in` and `req2_in` inputs to top level). After applying reset to the circuit, a countdown will be displayed at the output ports (ideally connected to some LEDs). Then, when the last LED stops toggling, the players will press the buttons. The first player to press the button wins and this will be displayed in output LEDs.

## How to test

- Connect 2 active low pushbuttons to req1\_in and req2\_in (io\_in[2] and io\_in[3] respectively).
- Connect 4 LEDs to leds\_out[3:0] (io\_out[3:0]).
- Apply 1 KHz clock at clk port (io\_in[0]).
- Apply an active low reset (io\_in[1]) to start the game.

## IO

| # | Input                 | Output                  |
|---|-----------------------|-------------------------|
| 0 | clk                   | leds_out[0] / segment a |
| 1 | rst_in_n (active low) | leds_out[1] / segment b |
| 2 | req1_in (active low)  | leds_out[2] / segment c |
| 3 | req2_in (active low)  | leds_out[3] / segment d |
| 4 | none                  | none                    |
| 5 | none                  | none                    |
| 6 | none                  | none                    |
| 7 | none                  | none                    |

## 75 : Frequency Divider

- Author: Tanish Khanchandani
- Description: Can enter a binary number and the clock is divided by  $2^{\text{number entered}}$
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Enter a binary number using Inputs 2-5 and the clock on output 8 should be divided.

### How to test

Enter a binary number and probe output 8 to see the clock frequency get divided.

### IO

| # | Input          | Output            |
|---|----------------|-------------------|
| 0 | clock          | clock             |
| 1 | Binary Input 1 | none              |
| 2 | Binary Input 2 | none              |
| 3 | Binary Input 3 | none              |
| 4 | Binary Input 4 | none              |
| 5 | none           | none              |
| 6 | none           | none              |
| 7 | none           | Divided Frequency |

## 76 : FullAdderusing4is1

- Author: Marushika Suri , Siya Sharma , Rudakshi Arora
- Description: This project is used to add 3-bit binary numbers
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 1 Hz
- External hardware:

### How it works

Explain how your project works : A full adder is a digital circuit that performs the addition of three binary digits: two inputs and a carry bit from a previous stage. The output of a full adder is the sum of the three inputs and a carry bit for the next stage. A 4:1 multiplexer (mux) is a digital circuit that selects one of four input signals to pass through to the output based on the value of two select inputs. Here, we are implementing a full adder using two 4:1 mux. We have taken 3 inputs:A,B,Cin. The output of first mux(i.e.,Sum function) is on OUT0 Pin and the output of second mux(Carry) is connected to OUT1 Pin, connected with a seven-segment display. The use of multiplexers simplifies the implementation of the full adder circuit.

### How to test

Explain how to test your project : At the input, we have used 4 pins. Pin IN0 is for the clock. Pin IN1, IN2, IN3 for A, B,Cin respectively. Here, we will set the clock at “1”, we’ll provide sum and carry at the OUT0 and OUT1 pins of the output respectively. Once we start the simulation, we can add two 3-bit binary numbers and the output will be displayed at the 7-segment display.

### IO

| # | Input                            | Output                                                     |
|---|----------------------------------|------------------------------------------------------------|
| 0 | in1(Connected to A input)        | segment a(Connected to OUT0 performing the Sum function)   |
| 1 | in2(Connected to Select line S1) | segment b(Connected to OUT1 performing the Carry function) |
| 2 | in3(Connected to Select line S2) | segment c                                                  |
| 3 | none                             | segment d                                                  |
| 4 | none                             | segment e                                                  |
| 5 | none                             | segment f                                                  |

| # | Input | Output    |
|---|-------|-----------|
| 6 | none  | segment g |
| 7 | none  | dot       |

## 77 : BCDtoDECIMAL

- Author: Giresh and Aditya
- Description: The objective of this project is to create a circuit that can convert Binary Coded Decimal (BCD) numbers to their decimal equivalents. This will enable users to easily convert BCD numbers, which are commonly used in electronic devices, into the decimal format that is more commonly understood.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works : A BCD to decimal generator on a seven-segment display works by converting Binary Coded Decimal (BCD) input signals into the corresponding decimal digits that can be displayed on a seven-segment display. The BCD input consists of four binary digits, where each digit corresponds to a decimal number between 0 and 9. These four digits are converted into their corresponding decimal values by using a decoder circuit. The decoder circuit uses logic gates to decode the BCD input signals into the corresponding outputs for each of the seven segments in the seven-segment display. Each segment is either turned on or off, depending on the decimal value of the BCD input.

### How to test

Explain how to test your project : To test this project we have connected the output pins of this circuit to the seven segment pins. To display the decimal digit on the seven-segment display, the corresponding segments are turned on or off, depending on the decimal value of the BCD input. For example, if the BCD input is 0010, the decoder circuit will turn on segment A, segment F, and segment G to display the decimal digit 2 on the seven-segment display.

### IO

| # | Input                     | Output                             |
|---|---------------------------|------------------------------------|
| 0 | {'B': 'connected to IN0'} | {'segment a': 'connected to OUT0'} |
| 1 | {'C': 'connected to IN1'} | {'segment b': 'connected to OUT1'} |
| 2 | {'D': 'connected to IN2'} | {'segment c': 'connected to OUT2'} |
| 3 | {'E': 'connected to IN3'} | {'segment d': 'connected to OUT3'} |

| # | Input | Output                             |
|---|-------|------------------------------------|
| 4 | none  | {'segment e': 'connected to OUT4'} |
| 5 | none  | {'segment f': 'connected to OUT5'} |
| 6 | none  | {'segment g': 'connected to OUT6'} |
| 7 | none  | dot                                |

## 78 : AI Decelerator

- Author: Hunter Scott
- Description: A 2x2 matrix multiplier, guaranteed to slow down your AI model training.
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Just performs arithmetic to do the matrix multiplication and outputs the result.

### How to test

Use the DIP switches to set the value of each position in the two matrices you're multiplying. That means that each position in each input matrix can only contain a zero or one. The output is 4 groups of 2 bits, each group representing one of the positions in the output matrix.

### IO

| # | Input                       | Output                                  |
|---|-----------------------------|-----------------------------------------|
| 0 | 7 is Matrix 1, top left     | 7 is Result matrix, top left, bit 1     |
| 1 | 6 is Matrix 1, top right    | 6 is Result matrix, top left, bit 0     |
| 2 | 5 is Matrix 1, bottom left  | 5 is Result matrix, top right, bit 1    |
| 3 | 4 is Matrix 1, bottom right | 4 is Result matrix, top right, bit 0    |
| 4 | 3 is Matrix 2, top left     | 3 is Result matrix, bottom left, bit 1  |
| 5 | 2 is Matrix 2, top right    | 2 is Result matrix, bottom left, bit 0  |
| 6 | 1 is Matrix 2, bottom left  | 1 is Result matrix, bottom right, bit 1 |
| 7 | 0 is Matrix 2, bottom right | 0 is Result matrix, bottom right, bit 0 |

## 79 : 3BitParallelAdder

- Author: Anish Paul , Pancham Mittal , Ramandeep
- Description: This project is used to Add and Subtract 3-bit binary inputs
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works : A 3-bit binary parallel adder-subtractor is a digital circuit that can perform both addition and subtraction of two 3-bit binary numbers in parallel. The circuit has two main inputs, which are the two 3-bit binary numbers to be added or subtracted. These inputs are represented by A2, A1, A0, and B2, B1, B0. The circuit also has a carry-in (Cin) input, which is used during addition. To understand how the 3-bit binary parallel adder-subtractor works, let's first look at how addition works in binary. In binary addition, each bit in the two binary numbers is added along with the carry from the previous addition. The carry from the current addition is then passed to the next bit addition. In our circuit the inputs IN1,IN2 and IN3 is represented as the inputs A0(Least significant bit),A1 and A2(Most significant bit). IN4,IN5,IN6 are represented as inputs B0(Least significant bit),B1 and B2(Most significant bit). IN7 is given as Cin. To make the circuit work as an adder the Carry in(Cin) is given logic 0. Similarly, the circuit can work as a 2's complement subtractor, to implement this we have connected the Cin to inputs B2,B1 B0 with an XOR gate. When the Carry in input (Cin) is given Logic 1 the XOR gate will give the 2's Complement of B. Then the result of addition between A and B will be the difference between them.

### How to test

Explain how to test your project : For testing the project , we have connected the output pins of the circuit to a seven segment display. For this we have connected the S0 (least significant bit) pin(OUT0) to the seven segment pin A, S1 pin(OUT1) to seven segment pin B, S3 (Most significant bit) pin (OUT2) to seven segment pin C and the Carry Out pin(OUT3) to seven segment pin D . The Seven Segment Display will glow in accordance to the Output on the respective pins.

### IO

| # | Input    | Output           |
|---|----------|------------------|
| 0 | clock    | segment a - S0   |
| 1 | reset    | segment b - S1   |
| 2 | IN1 - A0 | segment c - S2   |
| 3 | IN2 - A1 | segment d - Cout |
| 4 | IN3 - A2 | segment e        |
| 5 | IN4 - B0 | segment f        |
| 6 | IN5 - B1 | segment g        |
| 7 | IN6 - B2 | dot              |

## 80 : Asynchronous 3-Bit Down Counter

- Author: Dikshant, Mohit, Sanidhya
- Description: This Project works as a down counter which counts from 7 to 0
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

See The Wokwi gate and simulation. \* The sequence in which the counter works is as follows : 7: 0111 3: 0011 5: 0101 1: 0001 6: 0110 2: 0010 4: 0100 0: 0000 Here's how the counter works : first, all the three flip flops are set to 1 then for changing the inputs i have used the switch 1 of the Dip switch. \* When the count signal is triggered, the first flip-flop (least significant bit) toggles from 1 to 0, which produces the binary number 0111 (decimal 7). \* The second count signal should not trigger a change in the counter value, as the next count is 3, not 6. \* The third count signal triggers a change in the counter value, where the second flip-flop toggles from 1 to 0, which produces the binary number 0101 (decimal 5). \* The fourth count signal should not trigger a change in the counter value, as the next count is 1, not 4. \* The fifth count signal triggers a change in the counter value, where the first flip-flop toggles from 0 to 1, and the third flip-flop toggles from 0 to 1, which produces the binary number 0110 (decimal 6). \* The sixth count signal should not trigger a change in the counter value, as the next count is 2, not 5. \* The seventh count signal triggers a change in the counter value, where the second flip-flop toggles from 0 to 1, which produces the binary number 0100 (decimal 4). \* The eighth count signal triggers a change in the counter value, where all three flip-flops toggle from 0 to 1, which produces the binary number 0000 (decimal 0). \* The counter stays at 000 until the count signal is triggered again, which causes all three flip-flops to be set back to 1, and the counting process starts again from the beginning of the sequence. In summary, the 3-bit asynchronous down counter counts down in the sequence 7, 3, 5, 1, 6, 2, 4, 0 using three flip-flops and asynchronous inputs to trigger each flip-flop in the sequence.

### How to test

For Testing this project, the user has to use the switch 1 of the dip switch to give different inputs and then get the desired output on the 7-segment display.

### IO

| # | Input      | Output           |
|---|------------|------------------|
| 0 | clock      | segment a - OUT0 |
| 1 | reset      | segment b - OUT1 |
| 2 | I0 - Input | segment c - OUT2 |
| 3 | I1 - None  | segment d - OUT3 |
| 4 | I2 - None  | segment e - OUT4 |
| 5 | I3 - None  | segment f - OUT5 |
| 6 | I4 - None  | segment g - OUT6 |
| 7 | Unused     | dot - None       |

## 81 : Ring oscillator with skew correction

- Author: Daniel Wisehart
- Description: Fixing the problem of skew in a ring oscillator with a differential clock ring.
- GitHub repository
- HDL project
- Extra docs
- Clock: any Hz
- External hardware:

### How it works

Explain how your project works

### How to test

Explain how to test your project

### IO

| # | Input     | Output    |
|---|-----------|-----------|
| 0 | clock     | output[7] |
| 1 | reset     | output[6] |
| 2 | select[5] | output[5] |
| 3 | select[4] | output[4] |
| 4 | select[3] | output[3] |
| 5 | select[2] | output[2] |
| 6 | select[1] | output[1] |
| 7 | select[0] | output[0] |

## 82 : 3 bit multiplier

- Author: Ananya
- Description: 3 bit multiplier
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project works :- A  $3 \times 3$  binary multiplier is one of the combinational logic circuits, which can perform binary multiplication of two binary numbers having a bit size of a maximum of 3 bits. The bit size of the resultant output binary product is 6. Consider the multiplicand  $A_0, A_1, A_2$  and the multiplier  $B_0, B_1, B_2$ , and the final binary product output as  $P_0, P_1, P_2$ . The 3 partial product terms are obtained in the binary multiplication because it is a 3-bit multiplier. This  $3 \times 3$  multiplier can be implemented using a 3-bit full adder and individual single-bit adders. The carry bit is raised when  $A_2B_0$  and  $A_1B_1$  are added together. By the addition of the sum obtained from that, the carry bit is obtained from the addition of  $A_0B_1$  and  $A_0B_2$  to  $A_1B_0$ , which can raise another carry bit. Hence 2 carry bits are obtained and carried over for the addition of  $A_2B_1$  and  $A_1B_2$  and the 2 more carry bits are generated in the same way. The circuit is designed with 3-bit full adders to add the 3 partial product terms. The least significant bit (LSB) of the 1st partial product is not added to the next partial product because it is taken as an LSB of the final binary product output obtained. From the above logical circuit, one 3-bit full adder is used to add the first 2 partial products together and the other 3-bit full adder adds the 3rd partial product with the sum of the first adder.

### How to test

Explain how to test your project :- For testing a 3bit multiplier put 3bit 2 binary numbers at the input and cheak the outputs coming at the 7 segmented display.

### IO

| # | Input                     | Output                             |
|---|---------------------------|------------------------------------|
| 0 | {'B': 'connected to IN1'} | {'segment a': 'connected to OUT0'} |
| 1 | {'C': 'connected to IN2'} | {'segment b': 'connected to OUT1'} |
| 2 | {'D': 'connected to IN3'} | {'segment c': 'connected to OUT2'} |

| # | Input                     | Output                             |
|---|---------------------------|------------------------------------|
| 3 | {'E': 'connected to IN4'} | {'segment d': 'connected to OUT3'} |
| 4 | {'F': 'connected to IN5'} | {'segment e': 'connected to OUT4'} |
| 5 | {'G': 'connected to IN6'} | {'segment f': 'connected to OUT5'} |
| 6 | none                      | {'segment g': 'connected to OUT6'} |
| 7 | none                      | dot                                |

## 83 : Tiny Teeth Toothbrush Timer

- Author: Noah Hoffman
- Description: A simple 2 minute timer that gives visual feedback on the seven-segment display for each 30-second increment that passes. This indicates when to move to the next quarter of the mouth. Each 30 second increment can be blocked by an input switch so the user has to acknowledge they are done with that section before moving forward. At the end of 2 minutes, all segments of the display will flash
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 10000 Hz
- External hardware:

### How it works

This project expects a 10KHz clock. It divides it by  $2^{15}$  to get about 0.305Hz clock with a period of about 3.27 seconds. This time is reflected by the blinking of the decimal on the seven segment dispaly. After 10 clock cycles, the first input goes high to signify the first 30-second increment is done and the top-right segment of the display will turn on. After the next 30-second increment, the top left segment will turn on, then the bottom right, and finally the bottom left. Once they are all lit, the 3 segments in the middle will flash to indicate it is done. Each of the 4 increments can be blocked by an enable switch for each one. If desired, a physical input can be used to ensure the timer does not continue to the next section until the user is done the current task.

### How to test

The timing is dependent on a 10KHz clock. From a hard reset, the first increment may be less than 30 seconds, so it is best to start with at least the “reset first increment” switch off (input 3) or the “enable first increment” switch off (input 7). With the other inputs on (1,2,4,5,6), one should be able to time each segment lighting up after 30 seconds and the 3 middle segments flashing at 2 minutes. If the decimal is not flashing or is flashing at a different period than 3.27 seconds, that indicates either the clock is off (input 1 to connect the clock or 2 to enable the clock dividing) or something other than a 10KHz clock was applied. If one wants to try the gated incremental mode, make sure inputs 4,5,6 start off. As one sees the timer indicate it is time to move on, the user can toggle that section’s switch. Each section is sequentially dependent on the next, so if a user want to go back to increment 2, the timer will start from 2 and then 3 and then 4.

## IO

| # | Input                   | Output                           |
|---|-------------------------|----------------------------------|
| 0 | clock                   | segment a- complete              |
| 1 | reset clock             | segment b- first increment done  |
| 2 | reset first increment   | segment c- third increment done  |
| 3 | enable fourth increment | segment d- complete              |
| 4 | enable third increment  | segment e- fourth increment done |
| 5 | enable second increment | segment f- second increment done |
| 6 | enable first increment  | segment g- complete              |
| 7 | none                    | dot- clock (3.27 seconds)        |

## 84 : 2's Compliment Subtractor

- Author: Naman Garg and Aryan Chaudhary
- Description: Subtracts two 4-bit binary numbers using 2's complement subtraction method
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware:

### How it works

Explain how your project work “It uses 4 full adder circuits made by using XOR and AND gates.”

### How to test

Explain how to test your project “enter two 4 bit binary numbers in the inputs to try it out.”

### IO

| # | Input | Output    |
|---|-------|-----------|
| 0 | A0    | segment a |
| 1 | A1    | segment b |
| 2 | A2    | segment c |
| 3 | A3    | segment d |
| 4 | B0    | segment e |
| 5 | B1    | segment f |
| 6 | B2    | segment g |
| 7 | B3    | dot       |

## 85 : Customizable Padlock

- Author: Tiny Tapeout 02 (J. Rosenthal)
- Description: This design implements a customizable padlock. Set a code for your digital safe!
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 0 Hz
- External hardware: None

### How it works

Switch 2 is used to reset the safe. Switch 8 is used to set your code (on = set, off = locked). Switches 3 to 5 are used to set the code. The push button is used to enter your code.

### How to test

Set your desired code using Switches 3 to 5. Once you've done so, toggle Switch 8 to on then back off—the safe is now set! Turn on Switch 2, and press the push button. The seven segment display should show “L” (for locked). Next turn off Switch 2 to begin entering codes. If you enter a correct code, the seven segment display should show “U” (for unlocked).

### IO

| # | Input    | Output    |
|---|----------|-----------|
| 0 | N/A      | segment a |
| 1 | Reset    | segment b |
| 2 | Code 0   | segment c |
| 3 | Code 1   | segment d |
| 4 | Code 2   | segment e |
| 5 | N/A      | segment f |
| 6 | N/A      | segment g |
| 7 | Set Code | none      |

## 86 : Customizable UART String

- Author: Tiny Tapeout 02 (J. Rosenthal)
- Description: This design Supports sending multiple ASCII characters over UART.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 300 Hz
- External hardware: Arduino, computer with serial monitor connected to the Arduino

### How it works

This circuit implements five shift registers with 21 bits: seven idle bits, one start bit, eight data bits, one stop bit, and four more idle bits. The circuit supports transmitting a string of ASCII characters

### How to test

Connect an Arduino serial RX pin to the eight output pin (Output[7]). In the Arduino code, set the serial baud rate Serial.begin(); in the \*.ino file to 300. Set the PCB clock frequency to 300 Hz as well. Set the slide switch to the clock. Set SW7 to OFF ('Load'). Set SW8 to ON ('Output Enable'). Set SW7 to ON ('TX').

### IO

| # | Input         | Output                    |
|---|---------------|---------------------------|
| 0 | clock         | segment a (Output Enable) |
| 1 | N/A           | segment b (Load/TX)       |
| 2 | N/A           | segment c                 |
| 3 | N/A           | segment d                 |
| 4 | N/A           | segment e                 |
| 5 | N/A           | segment f                 |
| 6 | Load/TX       | segment g                 |
| 7 | Output Enable | UART Serial Out           |

## 87 : Customizable UART Character

- Author: Tiny Tapeout 02 (J. Rosenthal)
- Description: This design implements a single character UART transmitter using registers made from D-flip flops and multiplexers.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 300 Hz
- External hardware: Arduino, computer with serial monitor connected to the Arduino

### How it works

This circuit implements a shift register with 17 bits: four idle bits, one start bit, eight data bits, one stop bit, and three more idle bits. The circuit supports transmitting a user-selected ASCII character from 0x40 (@) to 0x5F (\_), including capital letters from the Latin alphabet.

### How to test

Connect an Arduino serial RX pin to the eight output pin (Output[7]). In the Arduino code, set the serial baud rate Serial.begin(); in the \*.ino file to 300. Set the PCB clock frequency to 300 Hz as well. Set SW7 to OFF (“Load”). Set SW2 to ON and SW3-6 to OFF. Set SW7 to ON (“TX”). Set SW8 to ON (“Output Enable”). Connect the Arduino via USB to your computer and run the serial monitor. If there’s no output from the Arduino serial monitor, try toggling SW7 OFF and ON again. You should see the character ‘A’ appearing repeatedly in the serial monitor.

### IO

| # | Input         | Output                    |
|---|---------------|---------------------------|
| 0 | clock         | segment a (Load/TX)       |
| 1 | Bit 0         | segment b                 |
| 2 | Bit 1         | segment c                 |
| 3 | Bit 2         | segment d                 |
| 4 | Bit 3         | segment e                 |
| 5 | Bit 4         | segment f (Output Enable) |
| 6 | Load/TX       | segment g                 |
| 7 | Output Enable | UART Serial Out           |

## 88 : 7-Seg ‘Tiny Tapeout’ Display

- Author: Tiny Tapeout 02 (J. Rosenthal)
- Description: This circuit will output a string of characters ('tiny tapeout') to the 7-segment display.
- GitHub repository
- Wokwi project
- Extra docs
- Clock: 1 Hz
- External hardware: None

### How it works

The logic to light the characters appears in the bottom half of the simulation window. The top half of the simulation window implements a modulo-11 counter. In other words, the counter increments up to 11 then resets. This counter is used to determine which character we should output to the 7-segment display. The truth table for the design can be found in the Design Spreadsheet: [https://docs.google.com/spreadsheets/d/1-h9pBYtuxv6su2EC8qBc6nX\\_JqHXks6Gx5nmHFQh\\_30/edit](https://docs.google.com/spreadsheets/d/1-h9pBYtuxv6su2EC8qBc6nX_JqHXks6Gx5nmHFQh_30/edit)

### How to test

Simply turn on and watch the characters on the 7-segment display. If needed, flip Input[1] (zero-indexed) ON to reset the state machine counter.

### IO

| # | Input                     | Output    |
|---|---------------------------|-----------|
| 0 | clock                     | segment a |
| 1 | Reset Counter             | segment b |
| 2 | N/A                       | segment c |
| 3 | Clock Disable (Test Mode) | segment d |
| 4 | Test Logic A              | segment e |
| 5 | Test Logic B              | segment f |
| 6 | Test Logic C              | segment g |
| 7 | Test Logic D              | N/A       |

# Technical info

## Scan chain

All 250 designs are joined together in a long chain similar to JTAG. We provide the inputs and outputs of that chain (see pinout below) externally, to the Caravel logic analyser, and to an internal scan chain driver.

The default is to use an external driver, this is in case anything goes wrong with the Caravel logic analyser or the internal driver.

The scan chain is identical for each little project, and you can read it here.



Figure 16: block diagram

## Updating inputs and outputs of a specified design

A good way to see how this works is to read the FSM in the scan controller. You can also run one of the simple tests and check the waveforms. See how in the scan chain

verification doc.

- Signal names are from the perspective of the scan chain driver.
- The desired project shall be called DUT (design under test)

Assuming you want to update DUT at position 2 (0 indexed) with inputs = 0x02 and then fetch the output. This design connects an inverter between each input and output.

- Set scan\_select low so that the data is clocked into the scan flops (rather than from the design)
- For the next 8 clocks, set scan\_data\_out to 0, 0, 0, 0, 0, 0, 1, 0
- Toggle scan\_clk\_out 16 times to deliver the data to the DUT
- Toggle scan\_latch\_en to deliver the data from the scan chain to the DUT
- Set scan\_select high to set the scan flop's input to be from the DUT
- Toggle the scan\_clk\_out to capture the DUT's data into the scan chain
- Toggle the scan\_clk\_out another 8 x number of remaining designs to receive the data at scan\_data\_in



Figure 17: update cycle

#### *Notes on understanding the trace*

- There are large wait times between the latch and scan signals to ensure no hold violations across the whole chain. For the internal scan controller, these can be configured (see section on wait states below).
- The input looks wrong (0x03) because the input is incremented by the test bench as soon as the scan controller captures the data. The input is actually 0x02.
- The output in the trace looks wrong (0xFE) because it's updated after a full refresh, the output is 0xFD.

## Clocking

Assuming:

- 100MHz input clock
- 8 ins & 8 outs
- 2 clock cycles to push one bit through the scan chain (scan clock is half input clock rate)
- 250 designs
- scan controller can do a read/write cycle in one refresh

So the max refresh rate is  $100\text{MHz} / (8 * 2 * 250) = 25000\text{Hz}$ .

## Clock divider

A rising edge on the `set_clk_div` input will capture what is set on the input pins and use this as a divider for an internal slow clock that can be provided to the first input bit.

The slow clock is only enabled if the `set_clk_div` is set, and the resulting clock is connected to `input0` and also output on the `slow_clk` pin.

The slow clock is synced with the scan rate. A divider of 0 mean it toggles the `input0` every scan. Divider of 1 toggles it every 2 cycles. So the resultant slow clock frequency is  $\text{scan\_rate} / (2 * (N+1))$ .

See the `test_clock_div` test in the scan chain verification.

## Wait states

This dictates how many wait cycle we insert in various state of the load process. We have a sane default, but also allow override externally.

To override, set the wait amount on the inputs, set the `driver_sel` inputs both high, and then reset the chip.

See the `test_wait_state` test in the scan chain verification.

## Pinout

| PIN   | NAME                       | DESCRIPTION                                     |
|-------|----------------------------|-------------------------------------------------|
| 20:12 | <code>active_select</code> | 9 bit input to set which design is active       |
| 28:21 | <code>inputs</code>        | 8 inputs                                        |
| 36:29 | <code>outputs</code>       | 8 outputs                                       |
| 37    | <code>ready</code>         | goes high for one cycle everytime the scanchain |
| 10    | <code>slow_clk</code>      | slow clock from internal clock divider          |
| 11    | <code>set_clk_div</code>   | enable clock divider                            |

```

9:8      driver_sel          which scan chain driver: 00 = external, 01 =
21      ext_scan_clk_out    for external driver, clk input
22      ext_scan_data_out   data input
23      ext_scan_select     scan select
24      ext_scan_latch_en   latch
29      ext_scan_clk_in    clk output from end of chain
30      ext_scan_data_in   data output from end of chain

```

## Instructions to build GDS

To run the tool locally or have a fork's GitHub action work, you need the GH\_USERNAME and GH\_TOKEN set in your environment.

GH\_USERNAME should be set to your GitHub username.

To generate your GH\_TOKEN go to <https://github.com/settings/tokens/new>. Set the checkboxes for repo and workflow.

To run locally, make a file like this:

```
export GH_USERNAME=<username>
export GH_TOKEN=<token>
```

And then source it before running the tool.

## Fetch all the projects

This goes through all the projects in project\_urls.py, and fetches the latest artifact zip from GitHub. It takes the verilog, the GL verilog, and the GDS and copies them to the correct place.

```
./configure.py --clone-all --fetch-gds
```

## Configure Caravel

Caravel needs the list of macros, how power is connected, instantiation of all the projects etc. This command builds these configs and also makes the README.md index.

```
./configure.py --update-caravel
```

## Build the GDS

To build the GDS and run the simulations, you will need to install the Sky130 PDK and OpenLane tool. It takes about 5 minutes and needs about 3GB of disk space.

```
export PDK_ROOT=<some dir>/pdk
export OPENLANE_ROOT=<some dir>/openlane
cd <the root of this repo>
make setup
```

Then to create the GDS:

```
make user_project_wrapper
```

## Changing macro block size

After working out what size you want:

- adjust configure.py in CaravelConfig.create\_macro\_config().
- adjust the PDN spacing to match in openlane/user\_project\_wrapper/config.tcl:
  - set ::env(FP\_PDN\_HPITCH)
  - set ::env(FP\_PDN\_HOFFSET)

# Verification

We are not trying to verify every single design. That is up to the person who makes it. What we want is to ensure that every design is accessible, even if some designs are broken.

We can split the verification effort into functional testing (simulation), static tests (formal verification), timing tests (STA) and physical tests (LVS & DRC).

See the sections below for details on each type of verification.

## Setup

You will need the GitHub tokens setup as described in INFO.

The default of 250 projects takes a very long time to simulate, so I advise overriding the configuration:

```
# fetch the test projects
./configure.py --test --clone-all
# rebuild config with only 20 projects
./configure.py --test --update-caravel --limit 20
```

You will also need iVerilog & cocotb. The easiest way to install these are to download and install the oss-cad-suite.

## Simulations

- Simulation of some test projects at RTL and GL level.
- Simulation of the whole chip with scan controller, external controller, logic analyser.
- Check wait state setting.
- Check clock divider setting.

## Scan controller

This test only instantiates user\_project\_wrapper (which contains all the small projects). It doesn't simulate the rest of the ASIC.

```
cd verilog/dv/scan_controller
make test_scan_controller
```

The Gate Level simulation requires scan\_controller and user\_project\_wrapper to be re-hardened to get the correct gate level netlists:

- Edit openlane/scan\_controller/config.tcl and change NUM\_DESIGNS=250 to NUM\_DESIGNS=20.
- Then from the top level directory:  
`make scan_controller make user_project_wrapper`
- Then run the GL test  
`cd verilog/dv/scan_controller make test_scan_controller_gl`

## **single**

Just check one inverter module. Mainly for easy understanding of the traces.

```
make test_single
```

## **custom wait state**

Just check one inverter module. Set a custom wait state value.

```
make test_wait_state
```

## **clock divider**

Test one inverter module with an automatically generated clock on input 0. Sets the clock rate to 1/2 of the scan refresh rate.

```
make test_clock_div
```

## **Top level tests setup**

For all the top level tests, you will also need a RISCV compiler to build the firmware.

You will also need to install the ‘management core’ for the Caravel ASIC submission wrapper. This is done automatically by following the PDK install instructions.

## **Top level test: internal control**

Uses the scan controller, instantiated inside the whole chip.

```
cd verilog/dv/scan_controller_int
make coco_test
```

## **Top level test: external control**

Uses external signals to control the scan chain. Simulates the whole chip.

```
cd verilog/dv/scan_controller_ext  
make coco_test
```

## **Top level test: logic analyser control**

Uses the RISCV co-processor to drive the scanchain with firmware. Simulates the whole chip.

```
cd verilog/dv/scan_controller_la  
make coco_test
```

## **Formal Verification**

- Formal verification that each small project's scan chain is correct.
- Formal verification that the correct signals are passed through for the 3 different scan chain control modes.

## **Scan chain**

Each GL netlist for each small project is proven to be equivalent to the reference scan chain implementation. The verification is done on the GL netlist, so an RTL version of the cells used needed to be created. See here for more info.

## **Scan controller MUX**

In case the internal scan controller doesn't work, we also have ability to control the chain from external pins or the Caravel Logic Analyser. We implement a simple MUX to achieve this and formally prove it is correct.

## **Timing constraints**

Due to limitations in OpenLane - a top level timing analysis is not possible. This would allow us to detect setup and hold violations in the scan chain.

Instead, we design the chain and the timing constraints for each project and the scan controller with this in mind.

- Each small project has a negedge flop flop at the end of the shift register to reclock the data. This gives more hold margin.

- Each small project has SDC timing constraints
- Scan controller uses a shift register clocked with the end of the chain to ensure correct data is captured.
- Scan controller has its own SDC timing constraints
- Scan controller can be configured to wait for a programmable time at latching data into the design and capturing it from the design.
- External pins (by default) control the scan chain.

## Physical tests

- LVS
- DRC
- CVC

### LVS

Each project is built with OpenLane, which will check LVS for each small project. Then when we combine all the projects together we run a top level LVS & DRC for routing, power supply and macro placement.

The extracted netlist from the GDS is what is used in the formal scan chain proof.

### DRC

DRC is checked by OpenLane for each small project, and then again at the top level when we combine all the projects.

### CVC

Mitch Bailey' CVC checker is a device level static verification system for quickly and easily detecting common circuit errors in CDL (Circuit Definition Language) netlists. We ran the test on the final design and found no errors.

- See the paper here.
- Github repo for the tool: <https://github.com/d-m-bailey/cvc>

# Toplevel STA and Spice analysis for TinyTapeout-02 and 03.

Contributed by J. Birch 22 March 2023

## Introduction

TinyTapeout is built using the OpenLane flow and consists of three major components:

- 1) up to 250 user designs with a common interface, each of which has been composed using the OpenLane flow,
- 2) a scanchain block that is instanced for each user design to provide input data and to sink output data,
- 3) a scan chain controller.

The OpenLane flow allows for timing analysis of each of these individual blocks but does not provide for analysis of the assembly of them, which leaves potential holes in timing that could cause failures.

This work allows a single design to be assembled out of the sub-blocks and this can then be submitted to STA for analysis.

In addition, an example critical path (for the clock that is passed along the scanchain) is created using a Python script so that it can be run through SPICE.

## GitHub action

The STA github action automatically runs the STA when the repository is updated.

## Prerequisites

To run STA, the following files need to be available:

- 1) a gate level verilog netlist of the top level design (`verilog/rtl/user_project_wrapper.v`)
- 2) a gate level verilog netlist for each sub-block (`verilog/rtl/`)
- 3) a SPEF format parasitics file for the top level wiring (`spef/`)
- 4) a SPEF format parasitics file for each sub-block (`spef/`)
- 5) the relevant SkyWater libraries for STA analysis - installed with the PDK
- 6) OpenSTA 2.4.0 (or later) needs to be on the PATH. `rundocker.sh` shows how to use STA included in the OpenLane docker
- 7) python3.9 or later
- 8) verilog parser (`pip3 install verilog-parser`)

- 9) a constraints file (sta\_top/top.sdc)

## Assumptions

- all of the Verilog files are in a single directory
- all of the SPEF files are in a single directory

## Issues

The Verilog parser has two bugs: it does not recognise ‘inout’ and it does not cope with escaped names (starting with a ”). Locally this has been fixed but to use the off-the-shelf parser we preprocess the file to change inout to input and remove the escapes and substitute the [nnn] with *nnn* in the names

## Invoking STA analysis

```
sta_top/toplevel_sta.py <path to verilog> <path to spef> <sdc>
```

eg:

```
sta_top/toplevel_sta.py ./verilog/gl/user_project_wrapper.v  
./spef/user_project_wrapper.spef ./sta_top/top.sdc > sta.log
```

Alternatively you can enter interactive mode after analysis:

```
sta_top/toplevel_sta.py ./verilog/gl/user_project_wrapper.v  
./spef/user_project_wrapper.spef ./sta_top/top.sdc -i
```

## How it works

- preprocesses the main Verilog
- parses the main Verilog to find which modules are used
- creates a new merged Verilog containing all the module netlists and the main netlist
- creates a tcl script in the spef directory to load the relevant spefs for each instanced module in the main verilog
- creates all\_sta.tcl in the verilog/gl directory that does the main STA steps (loading design, loading constraints, running analyses) - this mimics what OpenLane does
- creates a script sta.sh in the verilog/gl directory which sets up the environment and invokes STA to run all\_sta.tcl
- runs sta.sh

## **Invoking Spice simulation:**

### **Important notes**

- This uses at least ngspice-34, tested on ngspice-39, which has features to increase speed and reduce memory image when running with the SkyWater spice models.  
In addition the following needs to be set in `~/spice.rc` and/or `~/.spiceinit`:  
`set ngbehavior=hs`
- Note the spelling of `ngbehavior` (no u). If this is not set then the simulation will run out of memory]
- If the critical path changes then the `spice_path.py` script will need to be adapted to follow suit

### **Instructions**

- go to where the included spice files will be found  
`cd $PDK_ROOT/sky130A/libs.tech/ngspice`
- run the script to get the critical path spice circuit - 250 stages of the clock  
`/sta_top/spice_path.py path.spice`
- invoke spice (takes about 2 minutes)  
`ngspice path.spice`
- run the sim (takes about 2 minutes)  
`tran 1n 70n`
- show the results  
`plot i0 i250`

This chart shows the input clock and how it changes after 250 blocks. It was simulated for 200n to capture a clean pair of clock pulses.



Figure 18: clock\_spice.png

## Sponsored by



## Team

Tiny Tapeout would not be possible without a lot of people helping. We would especially like to thank:

- Uri Shaked for wokwi development and lots more
- Sylvain Munaut for help with scan chain improvements
- Mike Thompson for verification expertise
- Jix for formal verification support
- Proppy for help with GitHub actions
- Maximo Balestrini for all the amazing renders and the interactive GDS viewer
- James Rosenthal for coming up with digital design examples
- All the people who took part in TinyTapeout 01 and volunteered time to improve docs and test the flow
- The team at YosysHQ and all the other open source EDA tool makers
- Efabless for running the shuttles and providing OpenLane and sponsorship
- Tim Ansell and Google for supporting the open source silicon movement
- Zero to ASIC course community for all your support
- Jeremy Birch for help with STA
- Aisler for sponsoring PCB development