PrUcess is low-power multi-clock configurable digital processing system that executes commands (unsigned arithmetic operations, logical operations, register file read & write operations) which are received from an external source through UART receiver module and it transmits the commands' results through the UART transmitter module.
This is a full ASIC design project (from RTL to GDS). It goes through the ASIC design flow from frontend to backend:
- System's architecture design.
- Synthesizable Verilog RTL modelling (behavioral modelling, structural modelling, and FSM coding) of all the system blocks from scratch (UART transmitter and receiver, integer clock divider, ALU, register file, parametrized data and bit synchronizers for solving CDC issues, reset synchronizer, and system's main controller).
- Solving CDC issues by using synchronizers.
- Functional verification using self-checking testbenches and automated Python verification environments and running the testbenches using Modelsim.
- Logic synthesis using Synopsys Design Compiler & 130 nm technology node.
- Formal verification post logic synthesis using Synopsys Formality.
- Design for testability (DFT) using Synopsys DFT Compiler.
- Formal verification post DFT using Synopsys Formality.
- Physical design (floor planning, power planning, placement, CTS, routing, timing closure, and chip finishing) using Cadence innovus.
- Formal verification post physical design using Synopsys Formality.
UART is a standard serial communication protocol widely used in many applications. Oversampling is a technique used in UART receivers to improve the accuracy and reliability of the received data. In a UART receiver, data is received as a series of binary bits that are transmitted asynchronously with respect to a clock signal. To correctly interpret the received data, the receiver must sample the incoming signal at the correct time to capture the correct value of each bit. Oversampling involves sampling the incoming signal at a higher frequency than the baud rate of the transmitted data. This means that multiple samples are taken during the transmission of each bit, allowing the receiver to more accurately determine the timing and value of each bit. Oversampling also helps to mitigate the effects of noise and other signal distortions that can cause errors in the received data. By taking multiple samples of each bit, the receiver can detect and correct for these errors, improving the overall reliability of the data transmission.
The system includes two asynchronous clock domains (reference clock domain and UART clock domain). The command is received by the UART receiver, then it is sent to the system controller through a synchronizer (to solve the CDC issues) to decode and execute the command and then it sends the result to the UART transmitter through a synchronizer which will finally transmit it serially.
Parameter | Default Value | Description |
---|---|---|
DATA_WIDTH | 8 | It is the size of: registers, ALU operands, UART transmitter frames, and UART receiver frames. |
REGISTER_FILE_DEPTH | 16 | The number of registers in the register file. |
SYNCHRONIZER_STAGE_COUNT | 2 | The number of stages in a synchronizer. |
oversampling_prescale | the default value after resetting the system is 8 | The ratio between the frequency of the UART receiver clock and the frequency of the UART transmitter clock. |
Clock Domain | Clock Names | Modules | Frequency |
---|---|---|---|
Reference clock domain |
|
|
Reference clock frequency = ALU clock frequency = 40 MHz |
UART clock domain |
|
|
|
Note that the oversampling prescale can have the values (8, 16, or 32) but 32 is used in the simulations and backend flow to ensure that the UART receiver is functioning correctly in the worst case (highest clock frequency).
- UART: It consists of UART receiver which receives the commands and UART transmitter that transmits the commands' results.
- Clock divider: An integer clock divider which can divide the source clock up to division ratio of 32. It is used to divide the UART clock to produce UART transmitter clock with division ratio equal oversampling prescale.
- ALU: It executes unsigend arithmetic operations and logical operations.
- Clock gating cell: It is used to gate the ALU clock because there is significant time in which the ALU is not in operation (because the ALU operates on a very fast clock compared with the UART, so it waits long time to receive a new command).
- Register file.
- System controller: It is the main controller of the system. It consists of UART transmitter controller and UART receiver controller. The UART transmitter controller controls the UART transmitter by sending to it the data to be sent serially after it is ready (ALU result or register file data). The UART receiver controller controls the ALU and register file control signals based on the received frames from the UART receiver.
- Reset synchronizer: It is used to synchronize the global reset to all clock domains.
- Bus synchronizer: This module can be used to synchronize a single bit or a grey encoded bus between two asynchronous clock domains. It is a generic module (setting BUS_WIDTH = 1, means that it is a single bit synchronizer).
- Data synchronizer: It is used to synchronize a bus by using a bit synchronizer and pulse generator to synchronize the bus's data valid signal.
- Addition (+)
- Subtraction (-)
- Multiplication (*)
- Division (/)
- Bit-wise AND (&)
- Bit-wise OR (|)
- Bit-wise NAND (~&)
- Bit-wise NOR (~|)
- Bit-wise XOR (^)
- Bit-wise XNOR (~^)
- Is equal (==)
- Is greater than (>)
- Is less than (<)
- Shift right (>>1)
- Shift left (<<1)
- Register file write command. This command consists of 3 frames as follows:
- Command opcode (0xAA)
- Register file write address
- Register file write data
- Register file read command. This command consists of 2 frames as follows:
- Command opcode (0xBB)
- Register file read address
- ALU operation with operands command. The operands of the ALU are connected to the first two registers of the register file, so to execute this command: the operands are first written to the first two registers in the register file then the result is evaluated. This command consists of 4 frames as follows:
- Command opcode (0xCC)
- Operand A
- Operand B
- ALU function
- ALU operation without operands command. This command executes the ALU operation on the stored values in the first two registers in the register file directly. This command consists of 2 frames as follows:
- Command opcode (0xDD)
- ALU function
In all ALU commands, the UART transmitter sends two consecutive frames (becuase the size of the ALU result is double the size of the frame).
- The parity configuration of UART (parity enable and parity type).
- The oversampling prescale (division ratio) of the UART receiver.
Note that the mentioned configurations are outputs from the register file (reference clock domain) and they are inputs to blocks that operates on UART clock (i.e. Metastability may occur becuase the source and destination domains are asynchronous to one another), however there is no synchronizers used to synchronize those signals because they are Quasi-static signals (they are effectively stable for long periods of time. Such domain crossings do not require synchronizers in the destination domain, because they are held long enough to be captured by even the slowest clock domains without the risk of metastability).
Blue-colored signals are the system's input ports, while red-colored signals are the system's output ports.
Port | Direction | Width | Description |
---|---|---|---|
reference_clk | input | 1 | The main clock of the system. |
UART_clk | input | 1 | UART clock (the clock of the UART receiver). |
reset | input | 1 | Unsynchronized global active low asynchronous reset. |
serial_data_in | input | 1 | The data which is received serially by the UART receiver. |
serial_data_out | output | 1 | The output of the UART transmitter (It is also the output of the mux that select between start, serial data, parity, or stop bits according to the state of the transmission). |
parity_error | output | 1 | A signal to indicate that there is parity mismatch between the received parity bit and the calculated parity bit. |
frame_error | output | 1 | A signal to indicate that the start bit or the stop bit was incorrect. |
The whole system is verified through an automated Python environment which does the following:
- Generates the opcodes of all the given commands in an external file.
- Generates all the expected results that should be transmitted serially through the UART transmitter in an external file.
- Generates the memory file which corresponds to the final values that should be stored in the register file after the execution of all the commands.
- Compares the results of the Verilog testbench (transmitted through UART transmitter) and the generated memory file with the expected results' file and expected memory file.
- Reports any mismatch that occur in the testbench.
- Reports the number of passed and failed testcases.
Sample test cases:
Any arbitrary test case can be written in functional_verification/system_top/test_cases_generator.py
as the following:
Then the script functional_verification/system_top/run.tcl
is used to: run the Python script to generate the expected results, run the testbench to generate the actual results, and compare both the results to report any mismatch in the results.