

Organization of Digital Computer Lab  
EECS112L/CSE 132L

**Final Project  
Pipeline MIPS**

**Team:**  
A552A55

**Student name:**  
James Ortiz-Luis  
Allan Rodriguez  
Mario Ruiz  
Gaurav Venkatesh

**Student ID:**  
32386064  
85395712  
46301389  
28826069

EECS Department  
Henry Samueli School of Engineering  
University of California, Irvine

March 13, 2016

# 1 Description

In this lab, we extended the functionality of our previous single-cycle MIPS processor to support pipelining. In this version, our processor takes in two inputs, clock and reset, and is able to execute all R-type and I-type instructions, jump, jump and link, and branch on equal. The components of our processor include adders, multiplexers, a control unit, an ALU-control unit, an instruction memory, a program counter, sign extenders, ALU, register file, and data memory. For this lab, we designed four new registers to enable the pipeline stages and also a hazard unit to control forwarding. Furthermore, the data paths of the processor were updated in order to implement pipelining and hazard detection.

## 2 Input/Output Port Description

| Port Name | Port Size | Port Type | Description     |
|-----------|-----------|-----------|-----------------|
| ref_clk   | 1         | IN        | processor clock |
| reset     | 1         | IN        | processor reset |

## 3 Design Schematic

### 3.1 Book Schematic



### 3.2 Team Schematic



## 4 Simulation

To test our design, we implemented a testbench using SystemVerilog and simulated the design using Questasim. Since the only ports of the processor are the clock and reset, the testbench only supplies the processor with those inputs. To ensure correct functionality, we wrote our own test program in the *imem.h* file to store it into the instruction memory for the processor to run. Next we ran the simulation and inspected the inputs and outputs of each component and the values in the registers in the simulation waveforms.

### 4.1 Test Program

The program below, which we used to test our processor, implements a sequence of logic and arithmetic operations and then checks the resulting values in the registers to determine whether or not to perform a branch operation. The unconditional branch and load and store operations are also implemented. The instructions are written in hexadecimal, binary, and assembly with comments on the side for readability and comprehensive purposes.

```

2001ffffc 001000 00000 00001 1111111111111100 addi r1, r0, -4      r1 = -4
20220006 001000 00001 00010 0000000000000010 addi r2, r1, 6      r2 = 2
200f0002 001000 00000 01111 00000000000000010 addi r15, r0, 2      r15 = 2
11e20002 000100 01111 00010 00000000000000010 beq r2, r15, [2]    skip to add r3, r1, r2

-----
20100003 001000 00000 10000 0000000000000011 addi r16, r0, -4      r16 = 3  SKIPPED BY BRANCH
2011000a 001000 00000 10001 000000000000001010 addi r17, r0, 10     r17 = 10 SKIPPED BY BRANCH

-----
002211820 000000 00001 00010 00011 00000 100000 add r3, r1, r2      r3 = -2
00422020 000000 00010 00010 00100 00000 100000 add r4, r2, r2      r4 = 4
00442825 000000 00010 00100 00100 00000 100101 or r5, r4, r2      r5 = 6
34a6000a 001101 00101 00110 00000 000000000001010 ori r6, r5, 10     r6 = 14
00c53022 000000 00010 00110 00010 00000 100010 sub r6, r6, r5      r6 = 8
20470009 001000 00010 00110 00000 000000000001001 addi r7, r2, 9      r7 = 11
30a8000a 001100 00101 01000 00000000000001010 andi r8, r5, 10      r8 = 2
80000011 000010 0000000000000000000000010001 j address[17]    skip to and, r9, r3, r4

-----
20100003 001000 00000 10000 00000000000000011 addi r16, r0, -4      r16 = 3  SKIPPED BY JUMP
2011000a 001000 00000 10001 000000000000001010 addi r17, r0, 10     r17 = 10 SKIPPED BY JUMP
2012000f 001000 00000 10010 0000000000000001111 addi r18, r0, 15     r18 = 10 SKIPPED BY JUMP

-----
00644824 000000 00011 00100 01001 00000 100100 and r9, r3, r4      r9 = 4
00a45026 000000 00101 00100 01000 00000 100110 xor r10, r5, r4      r10 = 2
00c71027 000000 00110 00111 00010 00000 100111 nor r2, r6, r7      r2 = -12
0043402a 000000 00010 00011 01000 00000 101010 slt r8, r2, r3      r8 = 1
29240003 001010 01001 00100 00000000000000011 slti r4, r9, 3      r4 = 0
ac070004 101011 00000 00111 000000000000000100 sw r7, 4(0)      4(0) = 11
8c030004 100011 00000 00011 000000000000000100 lw r3, 4(0)      r3 = 11
00620820 000000 00011 00010 00001 00000 100000 add r1, r3, r2      r1 = -1

```

### 4.2 Waveforms













## 5 Synthesis

In this lab, we were able to synthesize our processor. We experienced some difficulties when synthesizing including the synthesizer removing some of our components when optimizing. To solve that issue, we had to specify some options when synthesizing to prevent the optimization.

### 5.1 Area

```
Number of ports: 34
Number of nets: 1374
Number of cells: 38
Number of combinational cells: 2
Number of sequential cells: 0
Number of macros/black boxes: 0
Number of buf/inv: 1
Number of references: 38

Combinational area: 16977.581641
Buf/Inv area: 783.271812
Noncombinational area: 11965.862345
Macro/Black Box area: 0.000000
Net Interconnect area: 11959.287554

Total cell area: 28943.443986
Total area: 40902.731540
```

#### Area of detected synthetic parts

| Module              | Implm.  | Count | Area      | Perc. of cell area |
|---------------------|---------|-------|-----------|--------------------|
| DP_OP_45J4_122_6278 | str     | 1     | 539.2930  | 1.9%               |
| DW01_add            | apparch | 2     | 318.1884  | 1.1%               |
| DW_cmp              | apparch | 2     | 423.4039  | 1.5%               |
| DP_OP Subtotal:     |         | 1     | 539.2930  | 1.9%               |
| Total:              |         | 5     | 1280.8852 | 4.4%               |

#### Estimated area of ungrouped synthetic parts

| Module     | Implm.  | Count | Estimated Area | Perc. of cell area |
|------------|---------|-------|----------------|--------------------|
| DW_cmp     | apparch | 18    | 176.8322       | 0.6%               |
| DW_leftsh  | astr    | 1     | 535.9893       | 1.9%               |
| DW_rightsh | astr    | 1     | 429.7576       | 1.5%               |
| DW_sra     | astr    | 1     | 541.3268       | 1.9%               |
| Total:     |         | 21    | 1683.9059      | 5.8%               |

```
Subtotal of datapath(DP_OP) cell area: 539.2930 1.9% (estimated)
Total synthetic cell area: 2964.7911 10.2% (estimated)
```

## 5.2 Power

| Hierarchy | Switch Power | Int Power | Leak Power | Total Power | %     |
|-----------|--------------|-----------|------------|-------------|-------|
| Processor | 27.369       | 5.74e+03  | 2.20e+10   | 2.78e+04    | 100.0 |

## 5.3 Timing

|                                       |        |
|---------------------------------------|--------|
| data arrival time                     | 1.98   |
| clock clk (rise edge)                 | 2.00   |
| clock network delay (ideal)           | 0.00   |
| EXMEM/alu_out_reg[29]/CLK (DFFX1_LVT) | 0.00   |
| library setup time                    | 2.00 r |
| data required time                    | -0.02  |
|                                       | 1.98   |
| data required time                    | 1.98   |
| data arrival time                     | -1.98  |
| slack (MET)                           | 0.00   |

Since we get zero slack when running our processor with a clock period of 2 ns. The frequency of our processor would be 500 MHz