

Project title: Dual-Issue Cell SPU-Lite Multimedia Processor

Team members: Ahib Alghazali (111705816) Prince Kumar Maurya (114354075)

|                                    |                     |             |                                                                                             |                                                                          |                              |                                                 |                                                 |                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|------------------------------------|---------------------|-------------|---------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|------------------------------|-------------------------------------------------|-------------------------------------------------|----------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                    | SPU-lite Model Code | Parser code | Loading of the instruction memory with instruction code from the file created by the parser | Dual-instruction fetch_decode_RF read_execute (7 stages)_WB (no hazards) | Structural hazard resolution | Data hazard resolution by forwarding (no stall) | Data hazard resolution by stalling & forwarding | Control hazard resolution for branches | 4x4 SP FP matrix multiply using multiply-add instructions (as a reference, look at the code for the 2x2 DP FP matrix multiply for Intel SSE in the lecture on DLP and SIMD)<br>Initialize elements of the first matrix with values like 0.0, 1.0, 2.0, 3.0, and so on, and the second matrix with values of 16.0, 17.0, 18.0, 19.0 and so on. Both matrices need to be in memory. Hint:<br><b>Initialize these matrices at the locations starting from address 0 in the Local Store before the start of simulation without any use of Cell SPU instructions.</b> |
| Page #s for Code & Testing Results | 32                  | 99          | 33                                                                                          | 13<br>(Sample Code 107)                                                  | 11<br>(Sample Code 107)      | 13<br>(Sample Code 107)                         | 12<br>(Sample Code 107)                         | 12<br>(Sample Code 107)                | 14<br>(Sample Code 108)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |

|                         |  |                                                                                                                                                   |                                                                                                                          |                                   |                                  |                                                            |                                                                                                                                                                                                                            |                                   |                                   |                     |
|-------------------------|--|---------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|-----------------------------------|----------------------------------|------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------|-----------------------------------|---------------------|
| Comments by Design Team |  | <p>Written in Python as assembler.py, with instruction formats in following .lst file. Writes to "out" file shown in sample programs section.</p> | <p>tb_Instruction Fetch handles reading from "out" output from parser. PC aligns to each instr as (line number - 1).</p> | <p>Lines 21-30 of input1.asm;</p> | <p>Lines 1-15 of input1.asm;</p> | <p>Lines 21-30 of input1.asm, emphasis on 21 &amp; 30;</p> | <p>If consumer instr is dependent on output of producer instr before prod. can reach (latency - 1), cons. will stall until prod. reaches said cycle, and then issue on (latency) cyc following rules of previous case.</p> | <p>Lines 18-20 of input1.asm;</p> | <p>Lines 18-21 of input1.asm;</p> | <p>input_mp.asm</p> |
| Comments by Instructor  |  |                                                                                                                                                   |                                                                                                                          |                                   |                                  |                                                            |                                                                                                                                                                                                                            |                                   |                                   |                     |

# **Dual-Issue Cell SPU-Lite Multimedia Processor**

**Ahib Alghazali      Prince Kumar Maurya**

**111705816**

**114354075**

**ESE 545: Computer Architecture**

**May 1st, 2022**

## Table of Contents

|                               |            |
|-------------------------------|------------|
| <b>Introduction</b>           | <b>5</b>   |
| <b>Data Flow</b>              | <b>6</b>   |
| <b>Instruction Fetch</b>      | <b>7</b>   |
| <b>Decode</b>                 | <b>7</b>   |
| <b>Processing Unit</b>        | <b>8</b>   |
| <b>Register Table</b>         | <b>8</b>   |
| <b>Data Forwarding</b>        | <b>9</b>   |
| <b>Even Pipe</b>              | <b>9</b>   |
| <b>Odd Pipe</b>               | <b>10</b>  |
| <b>Testing/Results</b>        | <b>11</b>  |
| <b>Instruction Set</b>        | <b>20</b>  |
| <b>SystemVerilog Codebase</b> | <b>32</b>  |
| <b>Assembler Code</b>         | <b>99</b>  |
| <b>Sample Programs</b>        | <b>106</b> |

## Introduction

The goal of this project is to learn about some of the complexities of modern microprocessor architecture, such as multiple-issue, multiple-input-multiple-data, and handling of data, structural, and control hazards. A key feature of the Sony Cell Synergistic Processing Unit (SPU) is that each processing element (SPE) is capable of issuing two instructions in a single cycle, but despite the fact that all instructions don't execute in the same length of time, it still has in-order behavior thanks to a forwarding pipeline that effectively idles either pipeline if its respective instruction has completed. The design of this project is a model-based simulation of the Cell SPE, capable of approximating most of the original unit's functionality and timing without needing to worry about intricacies such as instruction cache or the exact timing of memory accesses.

The design begins with a Python assembler that translates Cell-like assembly code into binary machine code that can be directly read by the unit. This code, from within the instruction memory, is read by the instruction fetch stage in pairs and then sent to the decode stage, which is responsible for sorting out structural and data hazards between these instructions. Compatible pairs of instructions are then dispatched to the register file/forwarding stage, where source values are read from registers or forwarded from later completed steps in the pipeline. These instructions and values are then sent to their respective dual-issue pipe to process a result. The execution, or even, pipe consists of four units focused more on arithmetic operations and the byte data type, while the load, or odd, pipe consists of three units specializing in memory access and program counter manipulation, as well as operating on quadword values. After completion and propagation through all forwarding stages, these results are written back to their respective registers, if necessary. Simplifications to the overall pipeline include fewer cycles through the first three single-pipe stages, reduced branch calculation time and penalty, static predict-not-taken branch prediction, considerably simpler instruction fetch, no use of data caches, and overall simpler memory management.

## Data Flow

*Figure 1: Original SPE Pipeline Diagram*



The original Cell SPE (synergistic processing element) pipeline consisted of 16 stages that performed instructions in 16 cycles, from instruction fetch to register writeback (not accounting for the pipeline behind the instruction fetch itself). This pipeline is demonstrated in Figure 1. This modified Cell SPU-lite processor uses a simplified model of the hardware to perform the same instructions much faster and in a more straightforward manner, though without any concern for synthesis or real hardware operation. The simplified unit has 11 stages in its pipeline, also a single cycle each. The entire instruction fetch pipeline was shortened to a single stage, as were the Decode through Route stages. The register access was shortened to a single stage as well, while the eighth forwarding stage of the processing unit was combined with the register writeback stage.

The datapath of the SPU-lite unit starts by fetching two instructions in the first stage, then sending these instructions to the decode stage to handle decoding, checking for structural hazards (such as two instructions going to the same pipe) and data hazards (such as read-after-write and write-after-write), and issuing the pair of instructions, if possible. The instructions are issued to the register access stage, where the register table is read with the needed input values and data forwarding checks the two pipes for any more recent values ready to be used. Once updated, the values and relevant control signals are sent to the two pipes to be processed and forwarded to the final forwarding stage, which writes the results to their respective destination registers.

## Instruction Fetch

The Instruction Fetch stage of the Cell SPU-lite consists of the instruction memory construct as well as the unit itself that reads the memory's values. The 1KB instruction memory is word-addressed, consisting of 256 ( $2^8 \cdot 1$ ) memory slots with 32-bit data per slot, each occupied by a single instruction. Each cycle, the IF unit reads two instructions from the instruction memory to then be processed by the Decode unit. The addresses used are based on the program counter, which typically increments by 2 every cycle, though it can also be manipulated by the Decode unit in case of a branch or stall. The unit uses static predict-not-taken branch prediction, so all branches will incur a branch miss penalty of about three cycles.

## Decode

The Decode stage of the processor is responsible for decoding the inbound instructions from the Instruction Fetch unit and dealing with a variety of hazards associated with this process. It first will decode the operation, functional unit, pipe, and associated registers/values of the two instructions. It then will check the first instruction for any read-after-write (RAW) data hazards with existing instructions in the pipeline that haven't computed their results yet. Next, the unit will check the second instruction against the first in case they are both destined for the same instruction pipe (structural hazard) or if they would both write to the same destination register resulting in a write-after-write (WAW) data hazard. The same RAW hazard checking would be done for the second instruction while also checking for a RAW hazard against the first instruction (if the second instruction reads from the first instruction's destination register), and then the instructions are issued accordingly if all goes well. If a hazard is detected against only the second instruction, the first instruction will issue as normal with a no-op down the alternate pipe while the second will be stalled for a cycle. If a hazard is detected against the first instruction, both instructions will be held back to preserve in-order operation.

In the case of a control stall, instructions will be queued into a register for issue in the next available cycle, and the program counter will be held back as well. If a branch is detected to be taken in the Branch unit, the new program counter will be sent to the Instruction Fetch unit to replace the existing count, and both the IF and Decode units will stall for a cycle to flush outgoing instructions and prepare for the new instructions to be fetched..

## Processing Unit

For this report, all stages following the Decode unit will be collectively known as the Processing unit. This consists of the shared Register File/Forward stage and the seven execution units of the even and odd pipes. The former stage takes one cycle to complete, whereas the execution units have latencies of up to 7 cycles, with an eighth cycle for writing back to the register table. This is a noteworthy point in the pipeline because from here on out, all pipeline hazards will be handled passively through forwarding as opposed through stalling and manipulating the program counter as the previous units did (the Branch unit does not directly do any of this either, it simply sends signals up for the Decode unit to handle accordingly).

## Register Table

The register table used in this processor holds 128 registers of 128-bit width. This is notable in that it allows a single register to hold 4 32-bit words at once, or 8 16bit halfwords, which is very useful in some simpler computations that operate on those data types. The table has 6 read ports (one for each register ra, rb, and rc of both pipes) and two write ports (one each for the result of both pipes' instructions). After the respective instructions are pushed to the register access stage, their source register addresses are decoded and the contents of these registers are read asynchronously. After being read, the register table checks if there are any inbound writes being performed to these source registers on the same cycle (representing the writeback stage of these older instructions). If so, the register table instead forwards these values to the next stage rather than the register contents, as these writeback values are more recent.

The write operation of the register table is simpler. During the writeback stage of an instruction, three result signals are sent to the register table: the 128-bit result value, the destination register address, and the reg\_write signal letting the unit know if it should be updated with the new result value. Despite having two write ports, two results cannot be written to the same register at the same time as that would create a structural hazard via collision. The Decode stage of the pipeline will not issue a pair instructions in one cycle if they both write to the same destination register, instead opting to only issue the first instruction and stall the latter until the next cycle.

## **Data Forwarding**

The data forwarding unit consists entirely of combinational logic, with no use of registers. It is in the same pipeline stage as register access, but uses signals from the register table after being read. It is used to prevent RAW (read-after-write) hazards by forwarding readied results from the pipe staging registers directly to the input buses of the execution units. For each of the six possible register table read port outputs, it checks every stage of both pipes' forwarding registers for a destination register address matching a source register. If a matching address is found, and that stage's reg\_write signal is asserted, the result from that stage is used to update the source value of the respective source register, allowing the most recently updated value to be used in the new instruction. Lower forwarding stages (with more recently issued instructions) are prioritized, and there is no need to account for both pipes having the same destination address as this structural hazard is prevented by the decode stage, as mentioned in the above section.

## **Even Pipe**

The even pipe, or execution pipe, of the Processing Unit consists of four units focused more on arithmetic operations and the byte data type. They are the Single Precision (FP) unit, the Simple Fixed 2 (FX2) unit, the Byte unit, and the Simple Fixed 1 (FX1) unit.

The Single Precision unit handles two types of instructions: floating point arithmetic and integer multiplication. In the true SPE hardware, floating point operations take six stages to be calculated, while integer operations take one more stage to perform. While calculations in the SPU-lite simulation model all occur in the first execution cycle and propagate internally before reaching the respective staging register, this difference in latency is simulated by having an internal shift register depth of seven stages and adding a control signal called int. Quite simply, if the int control signal is not asserted for a given result, it will be released from the unit after six stages; if int is asserted, it will be released after the seventh stage.

The Simple Fixed 2 unit performs shift and rotate operations. In the original hardware, the unit used three stages to operate, but needed four cycles to complete an instruction before it could be forwarded. To simulate this, the SPU-lite model behaves as if the unit has four delay stages rather than three, but otherwise follows specifications.

The Byte unit performs four operations on the byte data type exclusively. It has the same pipeline behavior as the Simple Fixed 2 unit, with regard to latency and delay staging, resulting in an effective latency of four cycles.

The Simple Fixed 1 unit performs most arithmetic and logical operations, as well as compares. It performs, by far, the most unique instructions of any unit, yet only has a pipeline depth of two stages with the same latency, meaning results are ready after two full cycles of operation.

### Odd Pipe

The odd pipe, or load pipe, of the Processing Unit consists of three units specializing in memory access and program counter manipulation, as well as operating on quadword values. They are the Permute unit, the Local Store (LS) unit, and the Branch unit.

The Permute unit primarily performs quadword operations, meaning it can shift or rotate an entire register's contents as one value, while also doing gather operations. It has the same pipeline behavior as the Simple Fixed 2 unit, with regard to latency and delay staging, resulting in an effective latency of four cycles.

The Local Store unit performs all memory accesses for the SPE. There is a 32KB local memory that the unit directly interfaces with for all of its instructions. This memory is byte-addressed, consisting of 32,767 ( $2^{15}-1$ ) memory slots with 8-bit data per slot. Memory accesses are always to quadwords, which is the full size of the 128-bit registers. Memory access is done via one of three addressing modes: indexed addressing, base/indexed addressing, and direct addressing. All memory accesses actually occur on the first execution stage, with the results from a load instruction being ready for forwarding after six cycles. This means there are no RAW hazards within the Local Store unit as reads always obtain the most recent data.

The Branch unit handles all changes to the program counter, both unconditional and conditional. In other ISAs, unconditional branches may be known as jump instructions, but in Cell SPU, all are referred to as branch instructions. The 1KB instruction memory is word-addressed, consisting of 256 ( $2^8-1$ ) memory slots with 32-bit data per slot, each occupied by a single instruction. Each branch instruction takes only a single cycle to execute, and rather than be forwarded to the odd pipe's delay stages, the new program counter and a `branch_taken` control signal are sent directly to the Instruction Fetch and Decode units, which then handle the next fetch based on the `branch_taken` flag. A signal is also sent to all other functional units to kill the instructions that have been issued after branch to ensure in-order operation is kept and all control hazards are accounted for.

## Testing/Results

**Figure 2: Sample Program for Testing Hazard Handling (Binary, Hex, and Assembly)**

In order to test proper operation of the Cell SPU-lite processor, the program shown in Figure 2 was drafted to highlight a number of operating conditions. These are pairs of instructions fetched for the same pipe, RAW hazards resolved via stalling and forwarding, branch control hazards, optimal dual-issue operation, and RAW hazard resolved via forwarding without stalling.

The first conditions tested were pairs of instructions going to the same functional pipe. Lines 1-7 of Figure 2 show 7 instructions being issued to the Register File/Forwarding unit to go to the even pipe. This can be seen in the pipeline in Figure 3, where the even pipe is shown to be populated for 7 consecutive cycles rather than issuing twice per cycle. The stall signal used to modulate this is also shown, though note that this signal is used to route instructions in the cycle following the pulse. Figure 4 shows something similar, but with instructions 9-15 in the odd pipe.

Figure 3: Even Pipe Populated with 7 Consecutive Lone Instructions



Figure 4: Odd Pipe Populated with 7 Consecutive Lone Instructions



Following the two stalled pipeline sequences in a very typical case that approximates a for loop from high level programming. Line 16 sets register 25 to 3 in the preferred slot, while Line 18 is equivalent to a “dec r25” command. Following that in line 20 is the BRNZ instruction, which sets up the following loop: for (r25 = 3; r25 > 0; r25--). In other words, the register will decrement every cycle, allowing us to perform other tasks in the loop; in this case, that task is simply incrementing r26 to show the number of loops. Note that because the registers are assigned and decremented before the branch conditional is reached, the loop is effectively starting at r25 = 2. The whole loop dynamic is shown in Figure 5 below.

Figure 5: Branch Control Hazard Resolution with RAW Stall



Of course, another thing to note about the loop above is that it demonstrates a RAW stall as well. Line 20 (BRNZ) is an odd-pipe instruction, while line 19 (r26++) is an even-pipe instruction. The reason these two don't issue together in spite of this is because Line 20 has a dependency on line 18 (r25--), so it must wait until the FX2 unit's data is available for reading before it can check for branch taken. FX2 has a latency of two cycles, which would suggest 3 cycles are necessary to read the data from it due to the RF stage adding one, but results are ready to forward to the RF/FWD stage once FX2 hits its final stage. This means that once data is in the first cycle of FX2, Decode is free to issue any depending instructions; by the next cycle they'll reach the FWD stage and read the result of FX2-2. This is shown in Figure 6. Line 18 is the leftmost instruction in the even pipe, while line 20 is the first non-zero instruction in the odd.

Figure 6: Zoomed View of Branch Control Hazard Resolution with RAW Stall



The stall signal is pulsed in alignment with line 19 as that cycle was originally due to issue both the (inc r26) instruction and the BNEZ, but the latter still needed to be delayed one stage in order to safely read the result of FX2 and the (dec r25) instruction.

The next shown condition of the processor's operation is the ideal dual-issue case. Lines 21-30 are pairs of even and odd instructions with no dependencies among one another, except for line 30. As one can see from Figure 7 below, they are able to issue in pairs with no stalling. One can see from looking at the code though that line 30 (rotqbi 15 4 2; 0x3B00820F) has a dependency on line 21 (avgb 2 0 1; 0x1A604002), with the first instruction writing to register 2 and the latter one using register 2 as a data source. Yet as seen in Figure 7, there are no stalls seen or active hazard prevention used. This is because AVGB is a Byte unit instruction, which means there is a latency of four cycles after issue before it can be read from. The cycle immediately before the yellow dividing line is the cycle in which AVGB is in the RF/FWD stage. The following three cycles have no hazards between the instructions, though on that third cycle after the line, AVGB's result is full calculated and will be sent to the Forwarding unit on the next cycle so that instructions in the decode stage (such as ROTQBI seen a few lines above in the waveform) see the data is available. On the next clock edge, Decode has determined that ROTQBI has all necessary values available for issue and dispatches it with its even pairmate rather than stalling it for a cycle, allowing it to issue exactly when it is allowed to without hazard.

Figure 7: Dual-Issue with No Stalls



*Figure 8: Sample Program for Matrix-Multiplication*

```

1  ila 0 0          // store 0 in reg0
2  ila 1 4          // shift=4;
3  ilh 10 -1        // used to create mask FFFFFFFF
4  ilh 30 4          // use to shift by 12byte
5  ilh 31 8          // use to shift by 8 byte
6  ilh 32 12         // use to shift by 8 bytes
7  shlqby 10 10 32   // generate FFFFFFFF ins first word
8  ila 11 4          // i = 4// START OUTER LOOP
9  ai 11 11 -1        // i--
10 ila 12 4          // j = 4 START INNER LOOP
11 ila 42 0          // load_index=0
12 ila 43 16         // rotate=16
13 ila 20 0          // clearing reg 20
14 ai 12 12 -1        // j--;
15 ila 15 0          // clearing reg 15
16 lqd 5 4(42)        // load row(2)= 4+0,+41,4+2,4+3
17 and 15 5 10         // extract the first word of the quadword
18 rotqby 15 15 43    // rotate the word to align in the slot
19 xor 20 15 20        // reg 15 will have elemnt for column 0 matrix2
20 shlqby 5 5 1          // shift row by 4
21 ai 43 43 -4        // rotate-=4
22 stqd 5 4(42)        // store the shifted reg back to memory
23 ai 42 42 1          // load_index+=1
24 brnz 12 -10         // if reg11==0 break else goto ai 12 12 -1
25 stqd 20 10(11)       // store 20 in 13 to 10
26 brnz 11 -17         // if reg11==0 break else goto ai 11 11 -1
27 ila 51 4          // i = 4// START OUTER LOOP
28 ai 51 51 -1        // i--
29 ila 80 0          // reset the quad word
30 lqd 21 0(51)        // load maxtrix 1 rows
31 ila 12 4          // j = 4 START INNER LOOP
32 ila 43 16         // rotate=16s
33 ai 12 12 -1        // j--;
34 lqd 22 10(12)       // load transposed matrix2 rows
35 mpya 23 22 21 23    // multiply slot wise both rows and store qword in 23
36 ila 81 0          // reset the cell
37 ila 45 4          // k=0
38 ai 45 45 -1        // k--
39 and 70 23 10        // using mask to extract the left most word
40 shlqby 23 23 30    // shift left by 4 bytes
41 a 81 81 70         // computing sum of the product cell wise
42 brnz 45 -4          // jump to ai 45 45 -1
43 and 81 81 10        // clearing any other bits
44 rotqby 81 81 43    // rotate reg81 to place word in correct slot of quad
45 ai 43 43 -4          // rotate-=4
46 or 80 80 81         // build the quadword using or with rotated word
47 brnz 12 -14         // jump to ai 12 12 -1
48 stqd 80 20(51)       // store the quad word row for the matrix3
49 brnz 51 -21         //Jump to ai 51 51 -1
50 stop

```

A practical, more complex application of the MIMD format of the Cell SPU-lite processor is the ability to perform multiply-add operations on four 32-bit words of data simultaneously, which simplifies the process of matrix multiplication. Figure 8 shows a program designed to do just that. It extracts preloaded matrices A and B from memory (the values of which are shown

below in Figure 9) and performs masks and masks on matrix B to allow it to be column-aligned, allowing for multiplication with the row-aligned matrix A. The two matrices are then multiplied through consecutive multiply-add operations, and the results are written to memory as matrix C after some more formatting and alignment. Figure 10 shows the results of this operation in memory locations 20-23. The displayed elements, which convert into the decimal values shown in Figure 9, are: [(0x98, 0x9E, 0xA4, 0xAA),

```
[ (0x98, 0x9E, 0xA4, 0xAA),  
  (0x1F8, 0x20E, 0x224, 0x23A),  
  (0x358, 0x37E, 0x3A4, 0x3CA),  
  (0x4B8, 0x4EE, 0x524, 0x55A) ]
```

*Figure 9: Example Values and Results to be Produced in Matrix-Multiply Program*

| Matrix A |                |                |                |                | Matrix B |                |                |                |                |
|----------|----------------|----------------|----------------|----------------|----------|----------------|----------------|----------------|----------------|
|          | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | A <sub>4</sub> |          | C <sub>1</sub> | C <sub>2</sub> | C <sub>3</sub> | C <sub>4</sub> |
| 1        | 0              | 1              | 2              | 3              |          | 152            | 158            | 164            | 170            |
| 2        | 4              | 5              | 6              | 7              |          | 504            | 526            | 548            | 570            |
| 3        | 8              | 9              | 10             | 11             |          | 856            | 894            | 932            | 970            |
| 4        | 12             | 13             | 14             | 15             |          | 1208           | 1262           | 1316           | 1370           |

Figure 10: Results of Matrix-Multiplication in Memory (Hex, Word-Indexed)



## Single Precision (FP)



Instructions tested in above screenshot:

1. mpy: Multiply
2. mpuy: Multiply Unsigned
3. mpyh: Multiply High
4. mpya: Multiply and Add
5. mpyi: Multiply Immediate
6. mpyui: Multiply Unsigned Immediate
7. fa: Floating Add
8. fs: Floating Subtract
9. fm: Floating Multiply
10. fceq: Floating Compare Equal
11. fcgt: Floating Compare Greater Than
12. fma: Floating Multiply and Add
13. fms: Floating Multiply and Subtract
14. cflts: Convert Floating Point to Signed Integer
15. cfltu: Convert Floating Point to Unsigned Integer

## Simple Fixed 2 (FX2)

Instructions tested in above screenshot:

1. shl: Shift Left Word
  2. roth: Rotate Halfword
  3. rot: Rotate Word
  4. rothm: Rotate and Mask Halfword
  5. rotma: Rotate and Mask Algebraic Word
  6. rothi: Rotate Halfword Immediate
  7. rotmai: Rotate and Mask Algebraic Word Immediate
  8. shli: Shift Left Word Immediate

## Byte

Instructions tested in above screenshot:

1. cntb: Count Ones in Byte
  2. avgb: Average Bytes
  3. absdb: Absolute Difference of Bytes
  4. sumb: Sum Bytes into Halfwords

## Simple Fixed 1 (FX1)



Instructions tested in above screenshot:

1. ah: Add Halfword
  2. sfh: Subtract from Halfword
  3. sfhi: Subtract from Halfword Immediate
  4. ai: Add Word Immediate
  5. and: And
  6. or: Or
  7. ceqb: Compare Equal Byte
  8. cgt: Compare Greater Than Word
  9. clgtb: Compare Logical Greater Than Byte
  10. andbi: And Byte Immediate
  11. xorbi: Exclusive Or Byte Immediate
  12. clgti: Compare Logical Greater Than Word Immediate
  13. ilh: Immediate Load Halfword
  14. ila: Immediate Load Address

## Permute



Instructions tested in above screenshot:

1. `shlqbi`: Shift Left Quadword by Bits
  2. `rotqbi`: Rotate Quadword by Bits
  3. `gbb`: Gather Bits from Bytes
  4. `shlqbii`: Shift Left Quadword by Bits Immediate
  5. `rotqbii`: Rotate Quadword by Bytes Immediate

## Local Store



Instructions tested in above screenshot:

1. stqx : Store Quadword (x-form)
  2. lqx : Load Quadword (x-form)
  3. stqd : Store Quadword (d-form)
  4. lqd : Load Quadword (d-form)
  5. stqa : Store Quadword (a-form)
  6. lqa : Load Quadword (a-form)

An example of writing to memory location 48 is shown below:



## Branch



Instructions tested in above screenshot:

1. bi : Branch Indirect
  2. brsl: Branch Relative and Set Link
  3. br: Branch Relative
  4. bra: Branch Absolute
  5. brnz: Branch If Not Zero Word
  6. brz: Branch If Zero Word

## Instruction Set

| Name                           | Mnemonic               | RTL Description                                                                                                                                                                                                                                                                                                                                   | Exec Unit | Exec Pipe | Latency |
|--------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|-----------|---------|
| Multiply                       | mpy rt, ra,<br>rb      | $RT^{0:3} \leftarrow RA^{2:3} * RB^{2:3}$<br>$RT^{4:7} \leftarrow RA^{6:7} * RB^{6:7}$<br>$RT^{8:11} \leftarrow RA^{10:11} * RB^{10:11}$<br>$RT^{12:15} \leftarrow RA^{14:15} * RB^{14:15}$                                                                                                                                                       | FP1       | EVEN      | 7       |
| Multiply Unsigned              | mpyu rt,<br>ra, rb     | $RT^{0:3} \leftarrow RA^{2:3}  *  RB^{2:3}$<br>$RT^{4:7} \leftarrow RA^{6:7}  *  RB^{6:7}$<br>$RT^{8:11} \leftarrow RA^{10:11}  *  RB^{10:11}$<br>$RT^{12:15} \leftarrow RA^{14:15}  *  RB^{14:15}$                                                                                                                                               | FP1       | EVEN      | 7       |
| Multiply Immediate             | mpyi rt, ra,<br>imm10  | $t \leftarrow RepLeftBit(I10,16)$<br>$RT^{0:3} \leftarrow RA^{2:3} * t$<br>$RT^{4:7} \leftarrow RA^{6:7} * t$<br>$RT^{8:11} \leftarrow RA^{10:11} * t$<br>$RT^{12:15} \leftarrow RA^{14:15} * t$                                                                                                                                                  | FP1       | EVEN      | 6       |
| Multiply Unsigned<br>Immediate | mpyui rt,<br>ra, imm10 | $t \leftarrow RepLeftBit(I10,16)$<br>$RT^{0:3} \leftarrow RA^{2:3}  *  t$<br>$RT^{4:7} \leftarrow RA^{6:7}  *  t$<br>$RT^{8:11} \leftarrow RA^{10:11}  *  t$<br>$RT^{12:15} \leftarrow RA^{14:15}  *  t$                                                                                                                                          | FP1       | EVEN      | 6       |
| Multiply and Add               | mpya rt,<br>ra, rb, rc | $t0 \leftarrow RA^{2:3} * RB^{2:3}$<br>$t1 \leftarrow RA^{6:7} * RB^{6:7}$<br>$t2 \leftarrow RA^{10:11} * RB^{10:11}$<br>$t3 \leftarrow RA^{14:15} * RB^{14:15}$<br>$RT^{0:3} \leftarrow t0 + RC^{0:3}$<br>$RT^{4:7} \leftarrow t1 + RC^{4:7}$<br>$RT^{8:11} \leftarrow t2 + RC^{8:11}$<br>$RT^{12:15} \leftarrow t3 + RC^{12:15}$                | FP1       | EVEN      | 6       |
| Multiply High                  | mpyh rt,<br>ra, rb     | $t0 \leftarrow RA^{0:1} * RB^{2:3}$<br>$t1 \leftarrow RA^{4:5} * RB^{6:7}$<br>$t2 \leftarrow RA^{8:9} * RB^{10:11}$<br>$t3 \leftarrow RA^{12:13} * RB^{14:15}$<br>$RT^{0:3} \leftarrow t0^{2:3}    0x0000$<br>$RT^{4:7} \leftarrow t1^{2:3}    0x0000$<br>$RT^{8:11} \leftarrow t2^{2:3}    0x0000$<br>$RT^{12:15} \leftarrow t3^{2:3}    0x0000$ | FP1       | EVEN      | 6       |
| Floating Add                   | fa rt, ra, rb          |                                                                                                                                                                                                                                                                                                                                                   | FP1       | EVEN      | 6       |
| Floating Subtract              | fs rt, ra, rb          |                                                                                                                                                                                                                                                                                                                                                   | FP1       | EVEN      | 6       |
| Floating Multiply              | fm rt, ra, rb          |                                                                                                                                                                                                                                                                                                                                                   | FP1       | EVEN      | 6       |
| Floating Multiply and Add      | fma rt, ra,<br>rb, rc  |                                                                                                                                                                                                                                                                                                                                                   | FP1       | EVEN      | 6       |
| Floating Multiply and          | fms rt, ra,            |                                                                                                                                                                                                                                                                                                                                                   | FP1       | EVEN      | 6       |

|                                      |                    |                                                                                                                                                                                                                                                           |     |      |   |
|--------------------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|------|---|
| Subtract                             | rb, rc             |                                                                                                                                                                                                                                                           |     |      |   |
| Convert Floating to Signed Integer   | cflts rt, ra, imm8 |                                                                                                                                                                                                                                                           | FP1 | EVEN | 4 |
| Convert Floating to Unsigned Integer | cfltu rt, ra, imm8 |                                                                                                                                                                                                                                                           | FP1 | EVEN | 4 |
| Floating Compare Equal               | fceq rt, ra, rb    |                                                                                                                                                                                                                                                           | FP1 | EVEN | 4 |
| Floating Compare Greater Than        | fcgt rt, ra, rb    |                                                                                                                                                                                                                                                           | FP1 | EVEN | 4 |
| Shift Left Halfword                  | shlh rt, ra, rb    | <pre> for j = 0 to 15 by 2   s ← RB<sup>j::2</sup> &amp; 0x001F   t ← RA<sup>j::2</sup>   for b = 0 to 15     if b + s &lt; 16 then r<sub>b</sub> ← t<sub>b+s</sub>     else r<sub>b</sub> ← 0   end   RT<sup>j::2</sup> ← r end </pre>                   | FX2 | EVEN | 2 |
| Shift Left Word                      | shl rt, ra, rb     | <pre> for j = 0 to 15 by 4   s ← RB<sup>j::4</sup> &amp; 0x0000003F   t ← RA<sup>j::4</sup>   for b = 0 to 31     if b + s &lt; 32 then r<sub>b</sub> ← t<sub>b+s</sub>     else r<sub>b</sub> ← 0   end   RT<sup>j::4</sup> ← r end </pre>               | FX2 | EVEN | 2 |
| Shift Left Word Immediate            | shli rt, ra, imm7  | <pre> s ← RepLeftBit(I7,32) &amp; 0x0000003F for j = 0 to 15 by 4   t ← RA<sup>j::4</sup>   for b = 0 to 31     if b + s &lt; 32 then r<sub>b</sub> ← t<sub>b+s</sub>     else r<sub>b</sub> ← 0   end   RT<sup>j::4</sup> ← r end </pre>                 | FX2 | EVEN | 2 |
| Rotate Halfword                      | roth rt, ra, rb    | <pre> for j = 0 to 15 by 2   s ← RB<sup>j::2</sup> &amp; 0x000F   t ← RA<sup>j::2</sup>   for b = 0 to 15     if b + s &lt; 16 then r<sub>b</sub> ← t<sub>b+s</sub>     else r<sub>b</sub> ← t<sub>b+s-16</sub>   end   RT<sup>j::2</sup> ← r; end </pre> | FX2 | EVEN | 2 |
| Rotate Halfword Immediate            | rothi rt, ra, imm7 | <pre> s ← RepLeftBit(I7,16) &amp; 0x000F for j = 0 to 15 by 2 </pre>                                                                                                                                                                                      | FX2 | EVEN | 2 |

|                                    |                   |                                                                                                                                                                                                                                                         |     |      |   |
|------------------------------------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|------|---|
|                                    |                   | $t \leftarrow RA^{::2}$<br>for $b = 0$ to 15<br>if $b + s < 16$ then $r_b \leftarrow t_{b+s}$<br>else $r_b \leftarrow t_{b+s-16}$<br>end<br>$RT^{::2} \leftarrow r$<br>end                                                                              |     |      |   |
| Rotate Word                        | rot rt, ra, rb    | $s \leftarrow RB^{::4} \& 0x00000001F$<br>$t \leftarrow RA^{::4}$<br>for $b = 0$ to 31<br>if $b + s < 32$ then $r_b \leftarrow t_{b+s}$<br>else $r_b \leftarrow t_{b+s-32}$<br>end<br>$RT^{::4} \leftarrow r$<br>end                                    | FX2 | EVEN | 2 |
| Rotate Word Immediate              | roti rt, ra, imm7 | $s \leftarrow RepLeftBit(I7,32) \& 0x00000001F$<br>for $j = 0$ to 15 by 4<br>$t \leftarrow RA^{::4}$<br>for $b = 0$ to 31<br>if $b + s < 32$ then $r_b \leftarrow t_{b+s}$<br>else $r_b \leftarrow t_{b+s-32}$<br>end<br>$RT^{::4} \leftarrow r$<br>end | FX2 | EVEN | 2 |
| Rotate and Mask Halfword           | rothm rt, ra, rb  | $s \leftarrow (0 - RB^{::2}) \& 0x001F$<br>$t \leftarrow RA^{::2}$<br>for $b = 0$ to 15<br>if $b \geq s$ then $r_b \leftarrow t_{b-s}$<br>else $r_b \leftarrow 0$<br>end<br>$RT^{::2} \leftarrow r$<br>end                                              | FX2 | EVEN | 2 |
| Rotate and Mask Word               | rotm rt, ra, rb   | $s \leftarrow (0 - RB^{::4}) \& 0x0000003F$<br>$t \leftarrow RA^{::4}$<br>for $b = 0$ to 32<br>if $b \geq s$ then $r_b \leftarrow t_{b-s}$<br>else $r_b \leftarrow 0$<br>end<br>$RT^{::4} \leftarrow r$<br>end                                          | FX2 | EVEN | 2 |
| Rotate and Mask Algebraic Halfword | rotmah rt, ra, rb | $s \leftarrow (0 - RB^{::2}) \& 0x001F$<br>$t \leftarrow RA^{::2}$                                                                                                                                                                                      | FX2 | EVEN | 2 |

|                                              |                         |                                                                                                                                                                                                                                                                            |      |      |   |
|----------------------------------------------|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|------|---|
|                                              |                         | <pre> for b = 0 to 15     if b ≥ s then r<sub>b</sub> ← t<sub>b-s</sub>     else          r<sub>b</sub> ← t<sub>0</sub> end RT<sup>:2</sup> ← r end </pre>                                                                                                                 |      |      |   |
| Rotate and Mask Algebraic Halfword Immediate | rotmahi rt,<br>ra, imm7 | <pre> s ← (0 - RepLeftBit(I7,16)) &amp; 0x001F for j = 0 to 15 by 2     t ← RA<sup>j:2</sup>     for b = 0 to 15         if b ≥ s then r<sub>b</sub> ← t<sub>b-s</sub>         else          r<sub>b</sub> ← t<sub>0</sub>     end     RT<sup>j:2</sup> ← r end </pre>     | FX2  | EVEN | 2 |
| Rotate and Mask Algebraic Word               | rotma rt,<br>ra, rb     | <pre> for j = 0 to 15 by 4     s ← (0 - RB<sup>j:4</sup>) &amp; 0x001F     t ← RA<sup>j:4</sup>     for b = 0 to 31         if b ≥ s then r<sub>b</sub> ← t<sub>b-s</sub>         else          r<sub>b</sub> ← t<sub>0</sub>     end     RT<sup>j:4</sup> ← r end </pre>  | FX2  | EVEN | 2 |
| Rotate and Mask Algebraic Word Immediate     | rotmai rt,<br>ra, imm7  | <pre> s ← (0 - RepLeftBit(I7,32)) &amp; 0x0000003F for j = 0 to 15 by 4     t ← RA<sup>j:4</sup>     for b = 0 to 31         if b ≥ s then r<sub>b</sub> ← t<sub>b-s</sub>         else          r<sub>b</sub> ← t<sub>0</sub>     end     RT<sup>j:4</sup> ← r end </pre> | FX2  | EVEN | 2 |
| Count Ones in Bytes                          | cntb rt, ra             | <pre> for j = 0 to 15     c = 0     b ← RA<sup>j</sup>     for m = 0 to 7         If b<sub>m</sub> = 1 then c ← c + 1     end     RT<sup>j</sup> ← c end </pre>                                                                                                            | BYTE | EVEN | 7 |
| Average Bytes                                | avgb rt, ra,<br>rb      | <pre> for j = 0 to 15     RT<sup>j</sup> ← ((0x00    RA<sup>j</sup>)+(0x00    RB<sup>j</sup>)+1)<sub>7:14</sub> </pre>                                                                                                                                                     | BYTE | EVEN | 7 |
| Absolute Differences of Bytes                | absdb rt,<br>ra, rb     | <pre> for j = 0 to 15     if (RB<sup>j</sup> &gt;<sup>u</sup> RA<sup>j</sup>) then RT<sup>j</sup> ← RB<sup>j</sup> - RA<sup>j</sup>     else                      RT<sup>j</sup> ← RA<sup>j</sup> - RB<sup>j</sup> end </pre>                                              | BYTE | EVEN | 7 |

|                                  |                    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |      |      |   |
|----------------------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|------|---|
| Sum Bytes into Halfwords         | sumb rt,<br>ra, rb | $RT^{0:1} \leftarrow RB^0 + RB^1 + RB^2 + RB^3$<br>$RT^{2:3} \leftarrow RA^0 + RA^1 + RA^2 + RA^3$<br>$RT^{4:5} \leftarrow RB^4 + RB^5 + RB^6 + RB^7$<br>$RT^{6:7} \leftarrow RA^4 + RA^5 + RA^6 + RA^7$<br>$RT^{8:9} \leftarrow RB^8 + RB^9 + RB^{10} + RB^{11}$<br>$RT^{10:11} \leftarrow RA^8 + RA^9 + RA^{10} + RA^{11}$<br>$RT^{12:13} \leftarrow RB^{12} + RB^{13} + RB^{14} + RB^{15}$<br>$RT^{14:15} \leftarrow RA^{12} + RA^{13} + RA^{14} + RA^{15}$ | BYTE | EVEN | 7 |
| Immediate Load Halfword          | ilh rt,<br>imm16   | $s \leftarrow I16$<br>$RT^{0:1} \leftarrow s$<br>$RT^{2:3} \leftarrow s$<br>$RT^{4:5} \leftarrow s$<br>$RT^{6:7} \leftarrow s$<br>$RT^{8:9} \leftarrow s$<br>$RT^{10:11} \leftarrow s$<br>$RT^{12:13} \leftarrow s$<br>$RT^{14:15} \leftarrow s$                                                                                                                                                                                                               | FX1  | EVEN | 4 |
| Immediate Load Halfword<br>Upper | ilhu rt,<br>imm16  | $t \leftarrow I16    0x0000$<br>$RT^{0:3} \leftarrow t$<br>$RT^{4:7} \leftarrow t$<br>$RT^{8:11} \leftarrow t$<br>$RT^{12:15} \leftarrow t$                                                                                                                                                                                                                                                                                                                    | FX1  | EVEN | 4 |
| Immediate Load Word              | il rt, imm16       | $s \leftarrow RepLeftBit(I16,32)$<br>$RT^{0:3} \leftarrow s$<br>$RT^{4:7} \leftarrow s$<br>$RT^{8:11} \leftarrow s$<br>$RT^{12:15} \leftarrow s$                                                                                                                                                                                                                                                                                                               | FX1  | EVEN | 4 |
| Immediate Load Address           | ila rt,<br>imm18   | $t \leftarrow {}_{14}0    I18$<br>$RT^{0:3} \leftarrow t$<br>$RT^{4:7} \leftarrow t$<br>$RT^{8:11} \leftarrow t$<br>$RT^{12:15} \leftarrow t$                                                                                                                                                                                                                                                                                                                  | FX1  | EVEN | 4 |
| Immediate Or Halfword<br>Lower   | iohl rt,<br>imm16  | $t \leftarrow 0x0000    I16$<br>$RT^{0:3} \leftarrow RT^{0:3}   t$<br>$RT^{4:7} \leftarrow RT^{4:7}   t$<br>$RT^{8:11} \leftarrow RT^{8:11}   t$<br>$RT^{12:15} \leftarrow RT^{12:15}   t$                                                                                                                                                                                                                                                                     | FX1  | EVEN | 4 |
| Add Halfword                     | ah rt, ra, rb      | $RT^{0:1} \leftarrow RA^{0:1} + RB^{0:1}$<br>$RT^{2:3} \leftarrow RA^{2:3} + RB^{2:3}$<br>$RT^{4:5} \leftarrow RA^{4:5} + RB^{4:5}$<br>$RT^{6:7} \leftarrow RA^{6:7} + RB^{6:7}$<br>$RT^{8:9} \leftarrow RA^{8:9} + RB^{8:9}$<br>$RT^{10:11} \leftarrow RA^{10:11} + RB^{10:11}$<br>$RT^{12:13} \leftarrow RA^{12:13} + RB^{12:13}$<br>$RT^{14:15} \leftarrow RA^{14:15} + RB^{14:15}$                                                                         | FX1  | EVEN | 4 |

|                                     |                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |     |      |   |
|-------------------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|------|---|
| Add Halfword Immediate              | ahi rt, ra,<br>imm10  | $s \leftarrow \text{RepLeftBit}(l10,16)$<br>$RT^{0:1} \leftarrow RA^{0:1} + s$<br>$RT^{2:3} \leftarrow RA^{2:3} + s$<br>$RT^{4:5} \leftarrow RA^{4:5} + s$<br>$RT^{6:7} \leftarrow RA^{6:7} + s$<br>$RT^{8:9} \leftarrow RA^{8:9} + s$<br>$RT^{10:11} \leftarrow RA^{10:11} + s$<br>$RT^{12:13} \leftarrow RA^{12:13} + s$<br>$RT^{14:15} \leftarrow RA^{14:15} + s$                                                                                           | FX1 | EVEN | 4 |
| Add Word                            | a rt, ra, rb          | $RT^{0:3} \leftarrow RA^{0:3} + RB^{0:3}$<br>$RT^{4:7} \leftarrow RA^{4:7} + RB^{4:7}$<br>$RT^{8:11} \leftarrow RA^{8:11} + RB^{8:11}$<br>$RT^{12:15} \leftarrow RA^{12:15} + RB^{12:15}$                                                                                                                                                                                                                                                                      | FX1 | EVEN | 4 |
| Add Word Immediate                  | ai rt, ra,<br>imm10   | $t \leftarrow \text{RepLeftBit}(l10,32)$<br>$RT^{0:3} \leftarrow RA^{0:3} + t$<br>$RT^{4:7} \leftarrow RA^{4:7} + t$<br>$RT^{8:11} \leftarrow RA^{8:11} + t$<br>$RT^{12:15} \leftarrow RA^{12:15} + t$                                                                                                                                                                                                                                                         | FX1 | EVEN | 4 |
| Subtract from Halfword              | sfh rt, ra,<br>rb     | $RT^{0:1} \leftarrow RB^{0:1} + \neg RA^{0:1} + 1$<br>$RT^{2:3} \leftarrow RB^{2:3} + \neg RA^{2:3} + 1$<br>$RT^{4:5} \leftarrow RB^{4:5} + \neg RA^{4:5} + 1$<br>$RT^{6:7} \leftarrow RB^{6:7} + \neg RA^{6:7} + 1$<br>$RT^{8:9} \leftarrow RB^{8:9} + \neg RA^{8:9} + 1$<br>$RT^{10:11} \leftarrow RB^{10:11} + \neg RA^{10:11} + 1$<br>$RT^{12:13} \leftarrow RB^{12:13} + \neg RA^{12:13} + 1$<br>$RT^{14:15} \leftarrow RB^{14:15} + \neg RA^{14:15} + 1$ | FX1 | EVEN | 4 |
| Subtract from Halfword<br>Immediate | sfhi rt, ra,<br>imm10 | $t \leftarrow \text{RepLeftBit}(l10,16)$<br>$RT^{0:1} \leftarrow t + \neg RA^{0:1} + 1$<br>$RT^{2:3} \leftarrow t + \neg RA^{2:3} + 1$<br>$RT^{4:5} \leftarrow t + \neg RA^{4:5} + 1$<br>$RT^{6:7} \leftarrow t + \neg RA^{6:7} + 1$<br>$RT^{8:9} \leftarrow t + \neg RA^{8:9} + 1$<br>$RT^{10:11} \leftarrow t + \neg RA^{10:11} + 1$<br>$RT^{12:13} \leftarrow t + \neg RA^{12:13} + 1$<br>$RT^{14:15} \leftarrow t + \neg RA^{14:15} + 1$                   | FX1 | EVEN | 4 |
| Subtract from Word                  | sf rt, ra, rb         | $RT^{0:3} \leftarrow RB^{0:3} + \neg RA^{0:3} + 1$<br>$RT^{4:7} \leftarrow RB^{4:7} + \neg RA^{4:7} + 1$<br>$RT^{8:11} \leftarrow RB^{8:11} + \neg RA^{8:11} + 1$<br>$RT^{12:15} \leftarrow RB^{12:15} + \neg RA^{12:15} + 1$                                                                                                                                                                                                                                  | FX1 | EVEN | 4 |
| Subtract from Word<br>Immediate     | sfi rt, ra,<br>imm10  | $t \leftarrow \text{RepLeftBit}(l10,32)$<br>$RT^A(0:3) \leftarrow t + (\neg RA^A(0:3)) + 1$<br>$RT^A(4:7) \leftarrow t + (\neg RA^A(4:7)) + 1$<br>$RT^A(8:11) \leftarrow t + (\neg RA^A(8:11)) + 1$<br>$RT^A(12:15) \leftarrow t + (\neg RA^A(12:15)) + 1$                                                                                                                                                                                                     | FX1 | EVEN | 4 |
| And                                 | and rt,ra,rb          | $RT \leftarrow RA \& RB$                                                                                                                                                                                                                                                                                                                                                                                                                                       | FX1 | EVEN | 4 |

|                        |                        |                                                                                                                                                                                                                                                                                                                                                                       |     |      |   |
|------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|------|---|
| And Byte Immediate     | andbi rt,<br>ra, imm10 | $b \leftarrow I10 \& 0x00FF$<br>$bbbb \leftarrow b    b    b    b$<br>$RT^{0:3} \leftarrow RA^{0:3} \& bbbb$<br>$RT^{4:7} \leftarrow RA^{4:7} \& bbbb$<br>$RT^{8:11} \leftarrow RA^{8:11} \& bbbb$<br>$RT^{12:15} \leftarrow RA^{12:15} \& bbbb$                                                                                                                      | FX1 | EVEN | 2 |
| And Halfword Immediate | andhi rt,<br>ra, imm10 | $t \leftarrow RepLeftBit(I10,16)$<br>$RT^{0:1} \leftarrow RA^{0:1} \& t$<br>$RT^{2:3} \leftarrow RA^{2:3} \& t$<br>$RT^{4:5} \leftarrow RA^{4:5} \& t$<br>$RT^{6:7} \leftarrow RA^{6:7} \& t$<br>$RT^{8:9} \leftarrow RA^{8:9} \& t$<br>$RT^{10:11} \leftarrow RA^{10:11} \& t$<br>$RT^{12:13} \leftarrow RA^{12:13} \& t$<br>$RT^{14:15} \leftarrow RA^{14:15} \& t$ | FX1 | EVEN | 2 |
| And Word Immediate     | andi rt, ra,<br>imm10  | $t \leftarrow RepLeftBit(I10,32)$<br>$RT^{0:3} \leftarrow RA^{0:3} \& t$<br>$RT^{4:7} \leftarrow RA^{4:7} \& t$<br>$RT^{8:11} \leftarrow RA^{8:11} \& t$<br>$RT^{12:15} \leftarrow RA^{12:15} \& t$                                                                                                                                                                   | FX1 | EVEN | 2 |
| Or                     | or rt, ra, rb          | $RT \leftarrow RA   RB$                                                                                                                                                                                                                                                                                                                                               | FX1 | EVEN | 2 |
| Or Byte Immediate      | orbi rt, ra,<br>imm10  | $b \leftarrow I10 \& 0x00FF$<br>$bbbb \leftarrow b    b    b    b$<br>$RT^{0:3} \leftarrow RA^{0:3}   bbbb$<br>$RT^{4:7} \leftarrow RA^{4:7}   bbbb$<br>$RT^{8:11} \leftarrow RA^{8:11}   bbbb$<br>$RT^{12:15} \leftarrow RA^{12:15}   bbbb$                                                                                                                          | FX1 | EVEN | 2 |
| Or Halfword Immediate  | orhi rt, ra,<br>imm10  | $t \leftarrow RepLeftBit(I10,16)$<br>$RT^{0:1} \leftarrow RA^{0:1}   t$<br>$RT^{2:3} \leftarrow RA^{2:3}   t$<br>$RT^{4:5} \leftarrow RA^{4:5}   t$<br>$RT^{6:7} \leftarrow RA^{6:7}   t$<br>$RT^{8:9} \leftarrow RA^{8:9}   t$<br>$RT^{10:11} \leftarrow RA^{10:11}   t$<br>$RT^{12:13} \leftarrow RA^{12:13}   t$<br>$RT^{14:15} \leftarrow RA^{14:15}   t$         | FX1 | EVEN | 2 |
| Or Word Immediate      | ori rt, ra,<br>imm10   | $t \leftarrow RepLeftBit(I10,32)$<br>$RT^{0:3} \leftarrow RA^{0:3}   t$<br>$RT^{4:7} \leftarrow RA^{4:7}   t$<br>$RT^{8:11} \leftarrow RA^{8:11}   t$<br>$RT^{12:15} \leftarrow RA^{12:15}   t$                                                                                                                                                                       | FX1 | EVEN | 2 |
| Xor                    | xor rt,ra,rb           | $RT \leftarrow RA \oplus RB$                                                                                                                                                                                                                                                                                                                                          | FX1 | EVEN | 2 |
| Xor Byte Immediate     | xorbi rt, ra,<br>imm10 | $b \leftarrow I10 \& 0x00FF$<br>$bbbb \leftarrow b    b    b    b$<br>$RT^{0:3} \leftarrow RA^{0:3} \oplus bbbb$<br>$RT^{4:7} \leftarrow RA^{4:7} \oplus bbbb$                                                                                                                                                                                                        | FX1 | EVEN | 2 |

|                                  |                     |                                                                                                                                                                                                                                                                                                                                                                                                               |     |      |   |
|----------------------------------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|------|---|
|                                  |                     | $RT^{8:11} \leftarrow RA^{8:11} \oplus bbbb$<br>$RT^{12:15} \leftarrow RA^{12:15} \oplus bbbb$                                                                                                                                                                                                                                                                                                                |     |      |   |
| Xor Halfword Immediate           | xorhi rt, ra, imm10 | $t \leftarrow \text{RepLeftBit}(I10, 16)$<br>$RT^{0:1} \leftarrow RA^{0:1} \oplus t$<br>$RT^{2:3} \leftarrow RA^{2:3} \oplus t$<br>$RT^{4:5} \leftarrow RA^{4:5} \oplus t$<br>$RT^{6:7} \leftarrow RA^{6:7} \oplus t$<br>$RT^{8:9} \leftarrow RA^{8:9} \oplus t$<br>$RT^{10:11} \leftarrow RA^{10:11} \oplus t$<br>$RT^{12:13} \leftarrow RA^{12:13} \oplus t$<br>$RT^{14:15} \leftarrow RA^{14:15} \oplus t$ | FX1 | EVEN | 2 |
| Xor Word Immediate               | xori rt, ra, imm10  | $t \leftarrow \text{RepLeftBit}(I10, 32)$<br>$RT^{0:3} \leftarrow RA^{0:3} \oplus t$<br>$RT^{4:7} \leftarrow RA^{4:7} \oplus t$<br>$RT^{8:11} \leftarrow RA^{8:11} \oplus t$<br>$RT^{12:15} \leftarrow RA^{12:15} \oplus t$                                                                                                                                                                                   | FX1 | EVEN | 2 |
| Nand                             | nand rt, ra, rb     | $RT \leftarrow \neg(RA \& RB)$                                                                                                                                                                                                                                                                                                                                                                                | FX1 | EVEN | 2 |
| Compare Equal Byte               | ceqb rt, ra, rb     | for i = 0 to 15<br>If $RA^i = RB^i$ then $RT^i \leftarrow 0xFF$<br>else $RT^i \leftarrow 0x00$<br>end                                                                                                                                                                                                                                                                                                         | FX1 | EVEN | 2 |
| Compare Equal Byte Immediate     | ceqbi rt, ra, imm10 | for i = 0 to 15<br>If $RA^i = I10_{2:9}$ then $RT^i \leftarrow 0xFF$<br>else $RT^i \leftarrow 0x00$<br>end                                                                                                                                                                                                                                                                                                    | FX1 | EVEN | 2 |
| Compare Equal Halfword           | ceqh rt, ra, rb     | for i = 0 to 15 by 2<br>If $RA^{i:2} = RB^{i:2}$ then $RT^{i:2} \leftarrow 0xFFFF$<br>else $RT^{i:2} \leftarrow 0x0000$<br>end                                                                                                                                                                                                                                                                                | FX1 | EVEN | 2 |
| Compare Equal Halfword Immediate | ceqli rt, ra, imm10 | for i = 0 to 15 by 2<br>If $RA^{i:2} = \text{RepLeftBit}(I10, 16)$ then $RT^{i:2} \leftarrow 0xFFFF$<br>else $RT^{i:2} \leftarrow 0x0000$<br>end                                                                                                                                                                                                                                                              | FX1 | EVEN | 2 |
| Compare Equal Word               | ceq rt, ra, rb      | for i = 0 to 15 by 4<br>If $RA^{i:4} = RB^{i:4}$ then $RT^{i:4} \leftarrow 0xFFFFFFFF$<br>else $RT^{i:4} \leftarrow 0x00000000$<br>end                                                                                                                                                                                                                                                                        | FX1 | EVEN | 2 |
| Compare Equal Word Immediate     | ceqj rt, ra, imm10  | for i = 0 to 15 by 4<br>If $RA^{i:4} = \text{RepLeftBit}(I10, 32)$ then $RT^{i:4} \leftarrow 0xFFFFFFFF$<br>else $RT^{i:4} \leftarrow 0x00000000$<br>end                                                                                                                                                                                                                                                      | FX1 | EVEN | 2 |

|                                                 |                          |                                                                                                                                                             |     |      |   |
|-------------------------------------------------|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|------|---|
| Compare Greater Than Byte                       | cgtb rt, ra,<br>rb       | for i = 0 to 15<br>If RA <sup>i</sup> > RB <sup>i</sup> then RT <sup>i</sup> ← 0xFF<br>else RT <sup>i</sup> ← 0x00<br>end                                   | FX1 | EVEN | 2 |
| Compare Greater Than Byte Immediate             | cgtbi rt, ra,<br>imm10   | for i = 0 to 15<br>If RA <sup>i</sup> > I10 <sub>2:9</sub> then RT <sup>i</sup> ← 0xFF<br>else RT <sup>i</sup> ← 0x00<br>end                                | FX1 | EVEN | 2 |
| Compare Greater Than Halfword                   | cgtih rt, ra,<br>rb      | for i = 0 to 15 by 2<br>If RA <sup>i:2</sup> > RB <sup>i:2</sup> then RT <sup>i:2</sup> ← 0xFFFF<br>else RT <sup>i:2</sup> ← 0x0000<br>end                  | FX1 | EVEN | 2 |
| Compare Greater Than Halfword Immediate         | cgtih i rt, ra,<br>imm10 | for i = 0 to 15 by 2<br>If RA <sup>i:2</sup> > RepLeftBit(I10,16) then RT <sup>i:2</sup><br>← 0xFFFF<br>else RT <sup>i:2</sup> ← 0x0000<br>end              | FX1 | EVEN | 2 |
| Compare Greater Than Word                       | cgt rt, ra,<br>rb        | for i = 0 to 15 by 4<br>If RA <sup>i:4</sup> > RB <sup>i:4</sup> then RT <sup>i:4</sup> ←<br>0xFFFFFFFF<br>else RT <sup>i:4</sup> ← 0x00000000<br>end       | FX1 | EVEN | 2 |
| Compare Greater Than Word Immediate             | cgti rt, ra,<br>imm10    | for i = 0 to 15 by 4<br>If RA <sup>i:4</sup> > RepLeftBit(I10,32) then RT <sup>i:4</sup><br>← 0xFFFFFFFF<br>else RT <sup>i:4</sup> ← 0x00000000<br>end      | FX1 | EVEN | 2 |
| Compare Logical Greater Than Byte               | clgtb rt, ra,<br>rb      | for i = 0 to 15<br>If RA <sup>i</sup> > <sup>u</sup> RB <sup>i</sup> then RT <sup>i</sup> ← 0xFF<br>else RT <sup>i</sup> ← 0x00<br>end                      | FX1 | EVEN | 2 |
| Compare Logical Greater Than Byte Immediate     | clgtbi rt, ra,<br>imm10  | for i = 0 to 15<br>If RA <sup>i</sup> > <sup>u</sup> I10 <sub>2:9</sub> then RT <sup>i</sup> ← 0xFF<br>else RT <sup>i</sup> ← 0x00<br>end                   | FX1 | EVEN | 2 |
| Compare Logical Greater Than Halfword           | clgth rt, ra,<br>rb      | for i = 0 to 15 by 2<br>If RA <sup>i:2</sup> > <sup>u</sup> RB <sup>i:2</sup> then RT <sup>i:2</sup> ← 0xFFFF<br>else RT <sup>i:2</sup> ← 0x0000<br>end     | FX1 | EVEN | 2 |
| Compare Logical Greater Than Halfword Immediate | clgthi rt, ra,<br>imm10  | for i = 0 to 15 by 2<br>If RA <sup>i:2</sup> > <sup>u</sup> RepLeftBit(I10,16) then RT <sup>i:2</sup><br>← 0xFFFF<br>else RT <sup>i:2</sup> ← 0x0000<br>end | FX1 | EVEN | 2 |

|                                             |                         |                                                                                                                                                                       |      |      |   |
|---------------------------------------------|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|------|---|
| Compare Logical Greater Than Word           | clgt rt, ra,<br>rb      | for i = 0 to 15 by 4<br>If RA <sup>i:4</sup> > <sup>u</sup> RB <sup>i:4</sup> then RT <sup>i:4</sup> ← 0xFFFFFFFF<br>else RT <sup>i:4</sup> ← 0x00000000<br>end       | FX1  | EVEN | 2 |
| Compare Logical Greater Than Word Immediate | clgti rt, ra,<br>imm10  | for i = 0 to 15 by 4<br>If RA <sup>i:4</sup> > <sup>u</sup> RepLeftBit(I10,32) then RT <sup>i:4</sup> ← 0xFFFFFFFF<br>else RT <sup>i:4</sup> ← 0x00000000<br>end      | FX1  | EVEN | 2 |
| Shift Left Quadword by Bits                 | shlqbi rt,<br>ra, rb    | s ← RB <sub>29:31</sub><br>for b = 0 to 127<br>if b + s < 128 then r <sub>b</sub> ← RA <sub>b+s</sub><br>else r <sub>b</sub> ← 0<br>end<br>RT ← r                     | PERM | ODD  | 6 |
| Shift Left Quadword by Bits Immediate       | shlqbii rt,<br>ra, imm7 | s ← I7 & 0x07<br>for b = 0 to 127<br>if b + s < 128 then r <sub>b</sub> ← RA <sub>b+s</sub><br>else r <sub>b</sub> ← 0<br>end<br>RT ← r                               | PERM | ODD  | 6 |
| Shift Left Quadword by Bytes                | shlqby rt,<br>ra, rb    | s ← RB <sub>27:31</sub><br>for b = 0 to 15<br>if b + s < 16 then r <sup>b</sup> ← RA <sup>b+s</sup><br>else r <sup>b</sup> ← 0<br>end<br>RT ← r                       | PERM | ODD  | 6 |
| Shift Left Quadword by Bytes Immediate      | shlqbyi rt,<br>ra, imm7 | s ← I7 & 0x1F<br>for b = 0 to 15<br>if b + s < 16 then r <sup>b</sup> ← RA <sup>b+s</sup><br>else r <sup>b</sup> ← 0<br>end<br>RT ← r                                 | PERM | ODD  | 6 |
| Rotate Quadword by Bits                     | rotqbi rt,<br>ra, rb    | s ← RB <sub>29:31</sub><br>for b = 0 to 127<br>if b + s < 128 then r <sub>b</sub> ← RA <sub>b+s</sub><br>else r <sub>b</sub> ← RA <sub>b+s-128</sub><br>end<br>RT ← r | PERM | ODD  | 6 |
| Rotate Quadword by Bits Immediate           | rotqbii rt,<br>ra, imm7 | s ← I7 <sub>4:6</sub><br>for b = 0 to 127<br>if b + s < 128 then r <sub>b</sub> ← RA <sub>b+s</sub><br>else r <sub>b</sub> ← RA <sub>b+s-128</sub><br>end<br>RT ← r   | PERM | ODD  | 1 |

|                                    |                      |                                                                                                                                                                                                                                |      |     |   |
|------------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|-----|---|
| Rotate Quadword by Bytes           | rotqby rt,<br>ra, rb | $s \leftarrow RB_{28:31}$<br>for $b = 0$ to 15<br>if $b + s < 16$ then $r^b \leftarrow RA^{b+s}$<br>else $r^b \leftarrow RA^{b+s-16}$<br>end<br>$RT \leftarrow r$                                                              | PERM | ODD | 1 |
| Rotate Quadword by Bytes Immediate | rotqby rt,<br>ra, rb | $s \leftarrow I7 \& 0x1F$<br>for $b = 0$ to 15<br>if $b + s < 16$ then $r^b \leftarrow RA^{b+s}$<br>else $r^b \leftarrow RA^{b+s-16}$<br>end<br>$RT \leftarrow r$                                                              | PERM | ODD | 1 |
| Gather Bits from Bytes             | gbt rt, ra           | $k = 0$<br>$s = 0$<br>for $j = 7$ to 128 by 8<br>$s_k \leftarrow RA_j$<br>$k = k + 1$<br>end<br>$RT^{0:3} \leftarrow 0x0000    s$<br>$RT^{4:7} \leftarrow 0$<br>$RT^{8:11} \leftarrow 0$<br>$RT^{12:15} \leftarrow 0$          | PERM | ODD | 1 |
| Gather Bits from Halfwords         | gbh rt, ra           | $k = 0$<br>$s = 0x00$<br>for $j = 15$ to 128 by 16<br>$s_k \leftarrow RA_j$<br>$k = k + 1$<br>end<br>$RT^{0:3} \leftarrow 0x00000000    s$<br>$RT^{4:7} \leftarrow 0$<br>$RT^{8:11} \leftarrow 0$<br>$RT^{12:15} \leftarrow 0$ | PERM | ODD | 1 |
| Gather Bits from Words             | gb rt, ra            | $k = 0$<br>$s = 0x0$<br>for $j = 31$ to 128 by 132<br>$s_k \leftarrow RA_j$<br>$k = k + 1$<br>end<br>$RT^{0:3} \leftarrow 0x0000000000    s$<br>$RT^{4:15} \leftarrow 0$                                                       | PERM | ODD | 1 |
| Load Quadword (d-form)             | lqd rt,<br>imm10(ra) | $LSA \leftarrow (\text{RepLeftBit}(I10    0b0000,32) +$<br>$RA^{0:3}) \& \text{LSLR} \& 0xFFFFFFFF0$<br>$RT \leftarrow \text{LocStor}(LSA, 16)$                                                                                | LS   | ODD | 4 |
| Load Quadword (x-form)             | lqx rt, ra,<br>rb    | $LSA \leftarrow (RA^{0:3} + RB^{0:3}) \& \text{LSLR} \&$<br>$0xFFFFFFFF0$<br>$RT \leftarrow \text{LocStor}(LSA, 16)$                                                                                                           | LS   | ODD | 4 |

|                              |                       |                                                                                                                                              |     |     |   |
|------------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------|-----|-----|---|
| Load Quadword (a-form)       | lqa rt,<br>imm16      | $LSA \leftarrow RepLeftBit(I16    0b00,32) \& LSLR \& 0xFFFFFFFF0$<br>$RT \leftarrow LocStor(LSA,16)$                                        | LS  | ODD | 4 |
| Store Quadword (d-form)      | stqd rt,<br>imm10(ra) | $LSA \leftarrow (RepLeftBit(I10    0b0000,32) + RA^{0:3}) \& LSLR \& 0xFFFFFFFF0$<br>$LocStor(LSA,16) \leftarrow RT$                         | LS  | ODD | 4 |
| Store Quadword (x-form)      | stqx rt, ra,<br>rb    | $LSA \leftarrow (RA^{0:3} + RB^{0:3}) \& LSLR \& 0xFFFFFFFF0$<br>$LocStor(LSA,16) \leftarrow RT$                                             | LS  | ODD | 4 |
| Store Quadword (a-form)      | stqa rt,<br>imm16     | $LSA \leftarrow RepLeftBit(I16    0b00,32) \& LSLR \& 0xFFFFFFFF0$<br>$LocStor(LSA,16) \leftarrow RT$                                        | LS  | ODD | 6 |
| Branch Relative              | br imm16              | $PC \leftarrow (PC + RepLeftBit(I16    0b00,32)) \& LSLR$                                                                                    | BR1 | ODD | 4 |
| Branch Absolute              | bra imm16             | $PC \leftarrow RepLeftBit(I16    0b00,32) \& LSLR$                                                                                           | BR1 | ODD | 4 |
| Branch Relative and Set Link | brsl rt,<br>imm16     | $RT^{0:3} \leftarrow (PC + 4) \& LSLR$<br>$RT^{4:15} \leftarrow 0$<br>$PC \leftarrow (PC + RepLeftBit(I16    0b00,32)) \& LSLR$              | BR1 | ODD | 4 |
| Branch Indirect              | bi ra                 | $PC \leftarrow RA^{0:3} \& LSLR \& 0xFFFFFFFFC$                                                                                              | BR1 | ODD | 4 |
| Branch If Not Zero Word      | brnz rt,<br>imm16     | If $RT^{0:3} \neq 0$ then<br>$PC \leftarrow (PC + RepLeftBit(I16    0b00)) \& LSLR \& 0xFFFFFFFFC$<br>else<br>$PC \leftarrow (PC+4) \& LSLR$ | BR1 | ODD | 4 |
| Branch if Zero Word          | brz rt,<br>imm16      | If $RT^{0:3} = 0$ then<br>$PC \leftarrow (PC + RepLeftBit(I16    0b00)) \& LSLR \& 0xFFFFFFFFC$<br>else<br>$PC \leftarrow (PC+4) \& LSLR$    | BR1 | ODD | 4 |

## **SystemVerilog Codebase**

```

1  module tb_InstructionFetch();
2      logic          clk, reset;
3      logic[0:31]    ins_mem[0:255];
4      logic[0:31]    instr[0:255];
5      logic          read_enable,stall;
6      logic[7:0]     pc;
7
8
9      IF ins_fetch(clk, reset, instr,pc, read_enable);
10
11
12     initial
13         clk = 0;
14
15     always begin
16         #5 clk = ~clk;
17     end
18
19     initial begin
20         $readmemb("./compiler/out", ins_mem);
21         for(integer i=0;i<256;i++) begin
22             $display("PC %d %b", i, ins_mem[i]);
23         end
24         #1; reset =1;
25         @(posedge clk); #1; reset = 1;
26         @(posedge clk); #1; reset = 0;
27
28
29         #300;$stop;
30     end
31
32     always @(posedge clk) begin
33         if(read_enable==1) begin
34             $display($time,"TB: pc %d ",pc);
35             instr[0:255] = ins_mem[(pc)+:256];
36             for(integer i=0;i<256;i++) begin
37                 $display("PC %d %b", i, ins_mem[i+pc]);
38             end
39         end
40     end
41 endmodule

```

```

1  module IF(clk, reset, ins_cache, pc, read_enable);
2
3      input logic clk;
4      input logic reset;
5
6      input logic[0:31] ins_cache[0:255]; // Instruction line buffer , stores 64B
7      instruction
8
9      logic[0:31] instr_d[0:1];           // 2 instruction sent to DECODE stage
10     logic stall;                     // incase of stall Instruction fetch
11     //should stop fetching new instruction
12
13     logic branch_taken;
14
15     output logic read_enable;        // Used to signal that IF is ready to read next
16     set of 64B instruction
17
18     output logic[7:0] pc;
19
20     logic[7:0] pc_wb;               // pc_wb access as reset PC signal for IF to
21     start new instr read position
22     integer pc_check;             // used for checkpointing, to adjust the pc while
23     reading from ins_cache
24
25     localparam [0:31] NOP = 32'b01000000001000000000000000000000;
26     localparam [0:31] LNOP = 32'b00000000000100000000000000000000;
27
28     Decode decode(clk, reset, instr_d, pc, pc_wb, stall, branch_taken);
29
30     always_comb begin : pc_counter
31         if(reset == 1) begin
32             pc_check = 0;
33             read_enable =1;
34         end
35         else begin
36             read_enable = 0;
37         end
38     end
39
40     always_ff @(posedge clk) begin : fetch_instruction
41
42         $display($time," IF: stall %d pc %d pc_wb %d read_enable %d ",stall, pc,
43         pc_wb, read_enable);
44
45         if(reset == 1 ) begin
46             pc<=0;
47
48             instr_d[0]<=32'h0000;
49             instr_d[1]<=32'h0000;
50
51         end
52         else begin
53
54             // Use of stall is to stop IF fetching new instruction
55             // This can be used in case of dual issue conflicts where decode
56             // inserts no-op instruction
57             // we use pc_wb to continue fetch new stream of instruction then onwards
58
59             if( stall==0 ) begin
60                 // stall<=0;
61                 instr_d[0]<=ins_cache[pc];
62                 instr_d[1]<=ins_cache[pc+1];
63                 $display($time," IF: ins %b ins %b pc %d pc_wb %d read_enable %d
64                 ",ins_cache[pc], ins_cache[pc+1],pc,pc_wb, read_enable);
65                 $display($time," IF: ins %h ins %h pc %d pc_wb %d read_enable %d
66                 ",ins_cache[pc], ins_cache[pc+1],pc,pc_wb, read_enable);

```

```

63
64      $display($time," IF: ins %b ins %b pc %d pc_wb %d read_enable %d
65      ",instr_d[0], instr_d[1],pc,pc_wb, read_enable);
66      $display($time," IF: ins %h ins %h pc %d pc_wb %d read_enable %d
67      ",instr_d[0], instr_d[1],pc,pc_wb, read_enable);

68      if (pc < 254)
69          pc <= pc+2;

70
71  end
72 else begin
73     if (branch_taken == 0) begin
74         $display($time,"IF: pc update to pc_wb %d pc %d" ,pc_wb, pc);
75         instr_d[0]<=ins_cache[pc_wb];
76         instr_d[1]<=ins_cache[pc_wb+1];
77     end
78     else begin
79         pc <= pc_wb & ~1;
80         $display($time,"IF: pc update to pc_wb %d pc %d" ,pc_wb, pc);
81         if (pc_wb & 256'h1) begin // If branching to an odd
82             number instr
83                 instr_d[0]<=ins_cache[pc_wb-2];
84                 instr_d[1]<=0;
85             end
86             else begin
87                 instr_d[0]<=ins_cache[pc_wb-2];
88                 instr_d[1]<=ins_cache[pc_wb-1];
89             end
90         end
91     end
92 end
93
94 endmodule

```

```

1  module Decode(clk, reset, instr, pc, stall_pc, stall, branch_taken_reg);
2    input logic          clk, reset;
3    input logic [0:31]   instr[0:1];
4    logic [0:31]        instr_next[0:1], instr_dec[0:1], instr_next_reg[0:1];
5
6    logic [0:31]        instr_even, instr_odd, instr_odd_issue, instr_even_issue; //Instr
7      from decoder
8
9    //Signals for handling branches
10   logic [7:0]         pc_wb;                                         //New program counter
11   for branch
12     output logic [7:0] stall_pc;
13     output logic       stall;
14     output logic       branch_taken_reg;                                //Was branch taken?
15     logic              branch_taken;                                    //Was branch taken?
16     logic              first_odd, first_odd_out;                      //1 if odd instr is
17     first in pair, 0 else; Used for branch flushing
18     logic              stall_var;
19     logic [7:0]         stall_pc_var;
20
21   input logic[7:0] pc;                                              // PC for the current state
22
23   //Nets from decode logic
24   logic [2:0]         format_even, format_odd;                      //Format of instr
25   logic [0:10]        op_even, op_odd;                                //Opcode of instr (used
26   with format)
27   logic [1:0]         unit_even, unit_odd;                           //Destination unit of
28   instr; Order of: FP, FX2, Byte, FX1 (Even); Perm, LS, Br (Odd)
29   logic [0:6]         rt_addr_even, rt_addr_odd;                     //Destination register
30   addresses
31   logic [0:17]        imm_even, imm_odd;                            //Full possible immediate
32   value (used with format)
33   logic              reg_write_even, reg_write_odd;                  //1 if instr will write to
34   rt, else 0
35
36   logic              first_cyc;                                     //Due to how finished is
37   detected, workaround is needed to prevent flag after reset
38
39   logic [7:0]         pc_dec, pc_pipe;                             //New program
40   counter for stalls
41
42   localparam [0:31] NOP = 32'b01000000001000000000000000000000;
43   localparam [0:31] LNOP = 32'b00000000000010000000000000000000;
44
45   //Internal Signals for Handling RAW Hazards
46   logic [0:6]         rt_addr_delay_even, rt_addr_delay_odd;      //Destination
47   register for rt_wb
48   logic              reg_write_delay_even, reg_write_delay_odd;    //Will rt_wb write
49   to RegTable
50   logic              stall_first_raw, stall_second_raw;           // 1 if respective
51   signal is to be stalled due to RAW hazard
52   logic              stall_first, stall_second;                   // 1 if respective signal
53   is to be stalled due to RAW or structural hazard
54
55   logic [6:0][0:6]    rt_addr_delay_fp1;                         //Destination register for rt_wb
56   logic [6:0]          reg_write_delay_fp1;                      //Will rt_wb write to RegTable
57   logic [6:0]          int_delay_fp1;                           //Will fp1 write an int result
58
59   logic [3:0][0:6]    rt_addr_delay_fx2;                         //Destination register for rt_wb
60   logic [3:0]          reg_write_delay_fx2;                      //Will rt_wb write to RegTable
61
62   logic [3:0][0:6]    rt_addr_delay_b1;                         //Destination register for rt_wb
63   logic [3:0]          reg_write_delay_b1;                      //Will rt_wb write to RegTable
64
65   logic [1:0][0:6]    rt_addr_delay_fx1;                         //Destination register for rt_wb
66   logic [1:0]          reg_write_delay_fx1;                      //Will rt_wb write to RegTable
67
68   logic [3:0][0:6]    rt_addr_delay_p1;                         //Destination register for rt_wb
69   logic [3:0]          reg_write_delay_p1;                      //Will rt_wb write to RegTable

```

```

56
57     logic [5:0][0:6]      rt_addr_delay_ls1;      //Destination register for rt_wb
58     logic [5:0]           reg_write_delay_ls1; //Will rt_wb write to RegTable
59
60     typedef struct {
61         logic [0:31]        instr;
62         logic [0:10]        op;
63         logic               reg_write;
64         logic               even_valid, odd_valid;          //Is even/odd instr a valid
65         instr?;
66         logic [0:17]        imm;
67         logic [0:6]         rt_addr;
68         logic [1:0]         unit;
69         logic [2:0]         format;
70         logic [0:6]         ra_addr, rb_addr, rc_addr;
71         logic               ra_valid, rb_valid, rc_valid; //Is ra/rb/rc read in this instr?
72     } op_code;
73
74     op_code first, second;
75
76     Pipes pipe(.clk(clk), .reset(reset), .pc(pc_pipe),
77                 .instr_even(instr_even), .instr_odd(instr_odd),
78                 .pc_wb(pc_wb), .branch_taken(branch_taken),
79                 .op_even(op_even), .op_odd(op_odd),
80                 .unit_even(unit_even), .unit_odd(unit_odd),
81                 .rt_addr_even(rt_addr_even), .rt_addr_odd(rt_addr_odd),
82                 .format_even(format_even), .format_odd(format_odd),
83                 .imm_even(imm_even), .imm_odd(imm_odd),
84                 .reg_write_even(reg_write_even), .reg_write_odd(reg_write_odd),
85                 .first_odd(first_odd_out),
86                 .rt_addr_delay_even(rt_addr_delay_even),
87                 .reg_write_delay_even(reg_write_delay_even),
88                 .rt_addr_delay_odd(rt_addr_delay_odd), .reg_write_delay_odd(reg_write_delay_odd),
89                 .rt_addr_delay_fp1(rt_addr_delay_fp1),
90                 .reg_write_delay_fp1(reg_write_delay_fp1), .int_delay_fp1(int_delay_fp1),
91                 .rt_addr_delay_fx2(rt_addr_delay_fx2), .reg_write_delay_fx2(reg_write_delay_fx2),
92                 .rt_addr_delay_b1(rt_addr_delay_b1), .reg_write_delay_b1(reg_write_delay_b1),
93                 .rt_addr_delay_fx1(rt_addr_delay_fx1), .reg_write_delay_fx1(reg_write_delay_fx1),
94                 .rt_addr_delay_p1(rt_addr_delay_p1), .reg_write_delay_p1(reg_write_delay_p1),
95                 .rt_addr_delay_ls1(rt_addr_delay_ls1), .reg_write_delay_ls1(reg_write_delay_ls1)
96 );
97
98
99     always_ff @(posedge clk) begin : decode_op
100
101         if (first.even_valid) begin
102             instr_even <= first.instr;
103             op_even <= first.op;
104             reg_write_even <= first.reg_write;
105             imm_even <= first.imm;
106             rt_addr_even <= first.rt_addr;
107             unit_even <= first.unit;
108             format_even <= first.format;
109             first_odd_out <= 0;
110         end
111         else if (second.even_valid) begin
112             instr_even <= second.instr;
113             op_even <= second.op;
114             reg_write_even <= second.reg_write;
115             imm_even <= second.imm;
116             rt_addr_even <= second.rt_addr;
117             unit_even <= second.unit;
118             format_even <= second.format;
119         end
120         else begin
121             instr_even <= 0;
122             op_even <= 0;
123             reg_write_even <= 0;
124             imm_even <= 0;
125             rt_addr_even <= 0;
126         end

```

```

117     unit_even <= 0;
118     format_even <= 0;
119   end
120
121   if (first.odd_valid) begin
122     instr_odd <= first.instr;
123     op_odd <= first.op;
124     reg_write_odd <= first.reg_write;
125     imm_odd <= first.imm;
126     rt_addr_odd <= first.rt_addr;
127     unit_odd <= first.unit;
128     format_odd <= first.format;
129     first_odd_out <= 1;
130   end
131   else if (second.odd_valid) begin
132     instr_odd <= second.instr;
133     op_odd <= second.op;
134     reg_write_odd <= second.reg_write;
135     imm_odd <= second.imm;
136     rt_addr_odd <= second.rt_addr;
137     unit_odd <= second.unit;
138     format_odd <= second.format;
139     first_odd_out <= 0;
140   end
141   else begin
142     instr_odd <= 0;
143     op_odd <= 0;
144     reg_write_odd <= 0;
145     imm_odd <= 0;
146     rt_addr_odd <= 0;
147     unit_odd <= 0;
148     format_odd <= 0;
149   end
150
151   instr_next_reg <= instr_next;
152   stall <= stall_var;
153   stall_pc <= stall_pc_var;
154   branch_taken_reg <= branch_taken;
155   pc_pipe <= pc_dec;
156
157   first_cyc <= reset;      //flag is always high after reset and low otherwise
158 end
159
160
161 //Decode logic
162 always_comb begin
163
164   stall_first = 0;
165   stall_second = 0;
166
167   $display("=====");
168   $display($time, " New Decode: Sensitivity list");
169   $display("pc: %d, reset: %b, branch_taken: %b, finished: %b, stall: %b", pc,
170   reset, branch_taken, 0, stall);
171   $display("instr_next[0]:           %b, instr_next[1]:       %b, ", instr_next[0],
172   instr_next[1]);
173   $display("instr_next_reg[0]:      %b, instr_next_reg[1]:    %b, ",
174   instr_next_reg[0], instr_next_reg[1]);
175   $display("instr[0]:                 %b, instr[1]:           %b", instr[0], instr[1]);
176   $display("=====");
177
178   if (reset == 1) begin
179     stall_var = 0;
180     stall_pc_var = 0;
181     first_odd = 0;
182     first = check_one(0);
183     second = check_one(0);
184     instr_next[0] = 0;
185     instr_next[1] = 0;

```

```

183
184
185     end
186
187 else begin
188     if (branch_taken == 1) begin
189         $display($time," New Decode: Branch taken, jumping to addr: %d", pc_wb);
190         stall_pc_var = pc_wb+2;
191         stall_var = 1;
192         instr_next[0] = 0;
193         instr_next[1] = 0;
194         first = check_one(0);
195         second = check_one(0);
196     end
197 else begin
198     if (stall == 1) begin
199         instr_dec[0] = instr_next_reg[0];
200         instr_dec[1] = instr_next_reg[1];
201         stall_var = 0;
202         pc_dec = stall_pc;
203         $display($time," New Decode: Choosing queued instr");
204     end
205 else begin
206     instr_dec[0] = instr[0];
207     instr_dec[1] = instr[1];
208     pc_dec = pc;
209     $display($time," New Decode: Choosing new instr");
210 end
211
212 if ((instr_dec[0] != 0)) begin
213     first = check_one(instr_dec[0]); //Checking first instr
214
215     if (first.ra_valid) begin
216         for (int i = 0; i <= 4; i++) begin
217             if (((first.ra_addr == rt_addr_delay_fp1[i]) &&
218                 (reg_write_delay_fp1[i] && int_delay_fp1[i]))) begin
219                 stall_first = 1;
220             end
221
222             if ((i < 4) &&
223                 (((first.ra_addr == rt_addr_delay_ls1[i]) &&
224                     reg_write_delay_ls1[i]) ||
225                     ((first.ra_addr == rt_addr_delay_fp1[i]) &&
226                         reg_write_delay_fp1[i]))) begin
227                 stall_first = 1;
228             end
229
230             if ((i < 2) &&
231                 (((first.ra_addr == rt_addr_delay_fx2[i]) &&
232                     reg_write_delay_fx2[i]) ||
233                     ((first.ra_addr == rt_addr_delay_b1[i]) &&
234                         reg_write_delay_b1[i]) ||
235                     ((first.ra_addr == rt_addr_delay_p1[i]) &&
236                         reg_write_delay_p1[i]))) begin
237                 stall_first = 1;
238             end
239
240             if ((first.ra_addr == rt_addr_delay_even) &&
241                 reg_write_delay_even) begin
242                 stall_first = 1;
243             end
244             if ((first.ra_addr == rt_addr_delay_odd) &&
245                 reg_write_delay_odd) begin
246                 stall_first = 1;
247             end
248
249         end
250         if (first.rb_valid) begin
251             for (int i = 0; i <= 4; i++) begin
252                 if (((first.rb_addr == rt_addr_delay_fp1[i]) &&
253                     (reg_write_delay_fp1[i] && int_delay_fp1[i]))) begin
254                     stall_first = 1;
255                 end
256             end
257         end
258     end
259 
```

```

243
244
245         if ((i < 4) &&
246             (((first.rb_addr == rt_addr_delay_ls1[i]) &&
247              reg_write_delay_ls1[i]) ||
248               ((first.rb_addr == rt_addr_delay_fp1[i]) &&
249               reg_write_delay_fp1[i]))) begin
250                 stall_first = 1;
251             end
252
253             if ((i < 2) &&
254                 (((first.rb_addr == rt_addr_delay_fx2[i]) &&
255                  reg_write_delay_fx2[i]) ||
256                   ((first.rb_addr == rt_addr_delay_b1[i]) &&
257                     reg_write_delay_b1[i]) ||
258                     ((first.rb_addr == rt_addr_delay_p1[i]) &&
259                     reg_write_delay_p1[i]))) begin
260                         stall_first = 1;
261                     end
262
263             end
264         if (first.rc_valid) begin
265             for (int i = 0; i <= 4; i++) begin
266                 if ((first.rc_addr == rt_addr_delay_fp1[i]) &&
267                     (reg_write_delay_fp1[i] && int_delay_fp1[i])) begin
268                     stall_first = 1;
269                 end
270
271                 if ((i < 4) &&
272                     (((first.rc_addr == rt_addr_delay_ls1[i]) &&
273                       reg_write_delay_ls1[i]) ||
274                         ((first.rc_addr == rt_addr_delay_fp1[i]) &&
275                           reg_write_delay_fp1[i]))) begin
276                             stall_first = 1;
277                         end
278
279                 if ((i < 2) &&
280                     (((first.rc_addr == rt_addr_delay_fx2[i]) &&
281                       reg_write_delay_fx2[i]) ||
282                         ((first.rc_addr == rt_addr_delay_b1[i]) &&
283                           reg_write_delay_b1[i]) ||
284                           ((first.rc_addr == rt_addr_delay_p1[i]) &&
285                             reg_write_delay_p1[i]))) begin
286                             stall_first = 1;
287                         end
288
289             end
290         end
291     else begin
292         first = check_one(0);
293     end
294
295     if ((instr_dec[1] != 0)) begin
296         second = check_one(instr_dec[1]);

```

```

297
298     if ((second.even_valid && first.even_valid) || (second.odd_valid &&
299     first.odd_valid)) begin      //Same pipe, structural hazard
300         stall_second = 1;
301         $display($time," New Decode: Same pipe hazard found for second
302         instr");
303     end
304     else if (first.reg_write && second.reg_write && (first.rt_addr ==
305     second.rt_addr)) begin          //Same dest, data hazard (WAW)
306         stall_second = 1;
307         $display($time," New Decode: Same destination hazard found for
308         second instr");
309     end
310     else begin
311         if (second.ra_valid) begin
312             for (int i = 0; i <= 4; i++) begin
313                 if ((second.ra_addr == rt_addr_delay_fp1[i]) &&
314                 (reg_write_delay_fp1[i] && int_delay_fp1[i])) begin
315                     stall_second = 1;
316                     $display($time," New Decode: RAW hazard found for
317                     second instr ra");
318                 end
319
320                 if ((i < 4) &&
321                     (((second.ra_addr == rt_addr_delay_ls1[i]) &&
322                     reg_write_delay_ls1[i]) ||
323                     ((second.ra_addr == rt_addr_delay_fp1[i]) &&
324                     reg_write_delay_fp1[i]))) begin
325                     stall_second = 1;
326                     $display($time," New Decode: RAW hazard found for
327                     second instr ra");
328                 end
329                 if ((i < 2) &&
330                     (((second.ra_addr == rt_addr_delay_fx2[i]) &&
331                     reg_write_delay_fx2[i]) ||
332                     ((second.ra_addr == rt_addr_delay_b1[i]) &&
333                     reg_write_delay_b1[i]) ||
334                     ((second.ra_addr == rt_addr_delay_p1[i]) &&
335                     reg_write_delay_p1[i]))) begin
336                     stall_second = 1;
337                     $display($time," New Decode: RAW hazard found for
338                     second instr ra");
339                 end
340                 if ((second.ra_addr == rt_addr_delay_even) &&
341                     reg_write_delay_even) begin
342                     stall_second = 1;
343                     $display($time," New Decode: RAW hazard found for
344                     second instr ra");
345                 end
346             end
347             if (second.rb_valid) begin
348                 for (int i = 0; i <= 4; i++) begin
349                     if ((second.rb_addr == rt_addr_delay_fp1[i]) &&
350                         (reg_write_delay_fp1[i] && int_delay_fp1[i])) begin
351                         stall_second = 1;

```

```

346                               $display($time," New Decode: RAW hazard found for
347                               second instr rb");
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
      if ((i < 4) &&
          (((second.rb_addr == rt_addr_delay_ls1[i]) &&
           reg_write_delay_ls1[i]) ||
           ((second.rb_addr == rt_addr_delay_fp1[i]) &&
            reg_write_delay_fp1[i]))) begin
          stall_second = 1;
          $display($time," New Decode: RAW hazard found for
          second instr rb");
      end
      if ((i < 2) &&
          (((second.rb_addr == rt_addr_delay_fx2[i]) &&
           reg_write_delay_fx2[i]) ||
           ((second.rb_addr == rt_addr_delay_b1[i]) &&
            reg_write_delay_b1[i]) ||
           ((second.rb_addr == rt_addr_delay_p1[i]) &&
            reg_write_delay_p1[i]))) begin
          stall_second = 1;
          $display($time," New Decode: RAW hazard found for
          second instr rb");
      end
      if ((second.rb_addr == rt_addr_delay_even) &&
          reg_write_delay_even) begin
          stall_second = 1;
          $display($time," New Decode: RAW hazard found for
          second instr rb");
      end
      if ((second.rb_addr == rt_addr_delay_odd) &&
          reg_write_delay_odd) begin
          stall_second = 1;
          $display($time," New Decode: RAW hazard found for
          second instr rb");
      end
      if ((second.rb_addr == first.rt_addr) && first.reg_write)
      begin
          stall_second = 1;
          $display($time," New Decode: RAW hazard found for
          second instr rb");
      end
      if (second.rc_valid) begin
          for (int i = 0; i <= 4; i++) begin
              if ((second.rc_addr == rt_addr_delay_fp1[i]) &&
                  (reg_write_delay_fp1[i] && int_delay_fp1[i])) begin
                  stall_second = 1;
                  $display($time," New Decode: RAW hazard found for
                  second instr rc");
              end
              if ((i < 4) &&
                  (((second.rc_addr == rt_addr_delay_ls1[i]) &&
                   reg_write_delay_ls1[i]) ||
                   ((second.rc_addr == rt_addr_delay_fp1[i]) &&
                    reg_write_delay_fp1[i]))) begin
                  stall_second = 1;
                  $display($time," New Decode: RAW hazard found for
                  second instr rc");
              end
              if ((i < 2) &&
                  (((second.rc_addr == rt_addr_delay_fx2[i]) &&
                   reg_write_delay_fx2[i]) ||
                   ((second.rc_addr == rt_addr_delay_b1[i]) &&
                    reg_write_delay_b1[i]) ||

```

```

394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454

```

```

        ((second.rc_addr == rt_addr_delay_p1[i]) &&
         reg_write_delay_p1[i])) begin
        stall_second = 1;
        $display($time," New Decode: RAW hazard found for
second instr rc");
    end
end
if ((second.rc_addr == rt_addr_delay_even) &&
reg_write_delay_even) begin
    stall_second = 1;
    $display($time," New Decode: RAW hazard found for
second instr rc");
end
if ((second.rc_addr == rt_addr_delay_odd) &&
reg_write_delay_odd) begin
    stall_second = 1;
    $display($time," New Decode: RAW hazard found for
second instr rc");
end
if ((second.rc_addr == first.rt_addr) && first.reg_write)
begin
    stall_second = 1;
    $display($time," New Decode: RAW hazard found for
second instr rc");
end
end
end
else begin
    second = check_one(0);
end

if (stall_first) begin
    $display($time," New Decode: RAW hazard found for first instr");
    stall_pc_var = pc_dec;
    stall_var = 1;
    instr_next[0] = instr_dec[0];
    instr_next[1] = instr_dec[1];
    first = check_one(0);
    second = check_one(0);
end
else if (stall_second) begin
    stall_pc_var = pc_dec;
    stall_var = 1;
    instr_next[0] = 0;
    instr_next[1] = instr_dec[1];
    second = check_one(0);
end
else begin
    stall_var = 0;
end
end
$display($time, "finished_var: %b, stall_var: %b", 0, stall_var);
end
end

function op_code check_one (input logic[0:31] instr);
if(reset==1) begin
    check_one.op = 0;
    check_one.even_valid = 1;
    check_one.odd_valid = 1;
end
$display($time," instr %b %b",instr, instr[0:10]); //Even decoding
check_one.reg_write = 1;

```

```

455     check_one.rt_addr = instr[25:31];
456     check_one.ra_addr = instr[18:24];
457     check_one.rb_addr = instr[11:17];
458     check_one.rc_addr = instr[25:31];
459     check_one.instr = instr;
460     check_one.even_valid = 1;
461   if (instr == 0) begin                                //alternate nop
462     check_one.format = 0;
463     check_one.ra_valid = 0;
464     check_one.rb_valid = 0;
465     check_one.rc_valid = 0;
466     check_one.op = 0;
467     check_one.unit = 0;
468     check_one.rt_addr = 0;
469     check_one.imm = 0;
470     check_one.reg_write = 0;
471     check_one.even_valid = 0;
472   end                                              //RRR-type
473   else if (instr[0:3] == 4'b1100) begin           //mpya
474     check_one.format = 1;
475     check_one.ra_valid = 1;
476     check_one.rb_valid = 1;
477     check_one.rc_valid = 1;
478     check_one.op = 4'b1100;
479     check_one.unit = 0;
480     check_one.rt_addr = instr[4:10];
481   end
482   else if (instr[0:3] == 4'b1110) begin           //fma
483     check_one.format = 1;
484     check_one.ra_valid = 1;
485     check_one.rb_valid = 1;
486     check_one.rc_valid = 1;
487     check_one.op = 4'b1110;
488     check_one.unit = 0;
489     check_one.rt_addr = instr[4:10];
490   end
491   else if (instr[0:3] == 4'b1111) begin           //fms
492     check_one.format = 1;
493     check_one.ra_valid = 1;
494     check_one.rb_valid = 1;
495     check_one.rc_valid = 1;
496     check_one.op = 4'b1111;
497     check_one.unit = 0;
498     check_one.rt_addr = instr[4:10];
499   end                                              //RI18-type
500   else if (instr[0:6] == 7'b0100001) begin         //ila
501     check_one.format = 6;
502     check_one.ra_valid = 0;
503     check_one.rb_valid = 0;
504     check_one.rc_valid = 0;
505     check_one.op = 7'b0100001;
506     check_one.unit = 3;
507     check_one.imm = $signed(instr[7:24]);
508   end                                              //RI8-type
509   else if (instr[0:9] == 10'b0111011000) begin      //cflts
510     check_one.format = 3;
511     check_one.ra_valid = 1;
512     check_one.rb_valid = 0;
513     check_one.rc_valid = 0;
514     check_one.op = 10'b0111011000;
515     check_one.unit = 0;
516     check_one.imm = $signed(instr[10:17]);
517   end                                              //cfltu
518   else if (instr[0:9] == 10'b0111011001) begin      //cfltu
519     check_one.format = 3;
520     check_one.ra_valid = 1;
521     check_one.rb_valid = 0;
522     check_one.rc_valid = 0;
523     check_one.op = 10'b0111011001;

```

```

524     check_one.unit = 0;
525     check_one.imm = $signed(instr[10:17]);
526   end
527   else if (instr[0:8] == 9'b010000011) begin //RI16-type
528     check_one.format = 5;
529     check_one.ra_valid = 0;
530     check_one.rb_valid = 0;
531     check_one.rc_valid = 0;
532     check_one.op = 9'b010000011;
533     check_one.unit = 3;
534     check_one.imm = $signed(instr[9:24]);
535   end
536   else if (instr[0:8] == 9'b010000001) begin //ilh
537     check_one.format = 5;
538     check_one.ra_valid = 0;
539     check_one.rb_valid = 0;
540     check_one.rc_valid = 0;
541     check_one.op = 9'b010000001;
542     check_one.unit = 3;
543     check_one.imm = $signed(instr[9:24]);
544   end
545   else if (instr[0:8] == 9'b010000010) begin //ilhu
546     check_one.format = 5;
547     check_one.ra_valid = 0;
548     check_one.rb_valid = 0;
549     check_one.rc_valid = 0;
550     check_one.op = 9'b010000010;
551     check_one.unit = 3;
552     check_one.imm = $signed(instr[9:24]);
553   end
554   else if (instr[0:8] == 9'b011000001) begin //iohl
555     check_one.format = 5;
556     check_one.ra_valid = 0;
557     check_one.rb_valid = 0;
558     check_one.rc_valid = 0;
559     check_one.op = 9'b011000001;
560     check_one.unit = 3;
561     check_one.imm = $signed(instr[9:24]);
562   end
563   else begin
564     check_one.format = 4; //RI10-type
565     check_one.ra_valid = 1;
566     check_one.rb_valid = 0;
567     check_one.rc_valid = 0;
568     check_one.imm = $signed(instr[8:17]);
569     case(instr[0:7])
570       8'b01110100 : begin //mpyi
571         check_one.op = 8'b01110100;
572         check_one.unit = 0;
573       end
574       8'b01110101 : begin //mpyui
575         check_one.op = 8'b01110101;
576         check_one.unit = 0;
577       end
578       8'b000011101 : begin //ahi
579         check_one.op = 8'b000011101;
580         check_one.unit = 3;
581       end
582       8'b000011100 : begin //ai
583         check_one.op = 8'b000011100;
584         check_one.unit = 3;
585       end
586       8'b000001101 : begin //sfhi
587         check_one.op = 8'b000001101;
588         check_one.unit = 3;
589       end
590       8'b000001100 : begin //sfi
591         check_one.op = 8'b000001100;
592         check_one.unit = 3;

```

```

593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
end
8'b00010110 : begin //andbi
    check_one.op = 8'b00010110;
    check_one.unit = 3;
end
8'b00010101 : begin //andhi
    check_one.op = 8'b00010101;
    check_one.unit = 3;
end
8'b00010100 : begin //andi
    check_one.op = 8'b00010100;
    check_one.unit = 3;
end
8'b00000110 : begin //orbi
    check_one.op = 8'b00000110;
    check_one.unit = 3;
end
8'b00000101 : begin //orhi
    check_one.op = 8'b00000101;
    check_one.unit = 3;
end
8'b00000100 : begin //ori
    check_one.op = 8'b00000100;
    check_one.unit = 3;
end
8'b01000110 : begin //xorbi
    check_one.op = 8'b01000110;
    check_one.unit = 3;
end
8'b01000101 : begin //xorhi
    check_one.op = 8'b01000101;
    check_one.unit = 3;
end
8'b01000100 : begin //xori
    check_one.op = 8'b01000100;
    check_one.unit = 3;
end
8'b01111110 : begin //ceqbi
    check_one.op = 8'b01111110;
    check_one.unit = 3;
end
8'b01111101 : begin //ceqli
    check_one.op = 8'b01111101;
    check_one.unit = 3;
end
8'b01111100 : begin //ceqqi
    check_one.op = 8'b01111100;
    check_one.unit = 3;
end
8'b01001110 : begin //cgtbi
    check_one.op = 8'b01001110;
    check_one.unit = 3;
end
8'b01001101 : begin //cgthi
    check_one.op = 8'b01001101;
    check_one.unit = 3;
end
8'b01001100 : begin //cgthi
    check_one.op = 8'b01001100;
    check_one.unit = 3;
end
8'b01011110 : begin //clgtbi
    check_one.op = 8'b01011110;
    check_one.unit = 3;
end
8'b01011101 : begin //clgthi
    check_one.op = 8'b01011101;
    check_one.unit = 3;
end

```

```

662      8'b01011100 : begin           //clgti
663          check_one.op = 8'b01011100;
664          check_one.unit = 3;
665      end
666      default : check_one.format = 7;
667  endcase
668  if (check_one.format == 7) begin
669      check_one.format = 0;                      //RR-type
670      check_one.ra_valid = 1;
671      check_one.rb_valid = 1;
672      check_one.rc_valid = 0;
673      case(instr[0:10])
674          11'b01111000100 : begin           //mpy
675              check_one.op = 11'b01111000100;
676              check_one.unit = 0;
677          end
678          11'b01111001100 : begin           //mpyu
679              check_one.op = 11'b01111001100;
680              check_one.unit = 0;
681          end
682          11'b01111000101 : begin           //mpyh
683              check_one.op = 11'b01111000101;
684              check_one.unit = 0;
685          end
686          11'b01011000100 : begin           //fa
687              check_one.op = 11'b01011000100;
688              check_one.unit = 0;
689          end
690          11'b01011000101 : begin           //fs
691              check_one.op = 11'b01011000101;
692              check_one.unit = 0;
693          end
694          11'b01011000110 : begin           //fm
695              check_one.op = 11'b01011000110;
696              check_one.unit = 0;
697          end
698          11'b01111000010 : begin           //fceq
699              check_one.op = 11'b01111000010;
700              check_one.unit = 0;
701          end
702          11'b01011000010 : begin           //fcgt
703              check_one.op = 11'b01011000010;
704              check_one.unit = 0;
705          end
706          11'b00001011111 : begin           //shlh
707              check_one.op = 11'b00001011111;
708              check_one.unit = 1;
709          end
710          11'b00001011011 : begin           //shl
711              check_one.op = 11'b00001011011;
712              check_one.unit = 1;
713          end
714          11'b00001011100 : begin           //roth
715              check_one.op = 11'b00001011100;
716              check_one.unit = 1;
717          end
718          11'b00001011000 : begin           //rot
719              check_one.op = 11'b00001011000;
720              check_one.unit = 1;
721          end
722          11'b00001011101 : begin           //rothm
723              check_one.op = 11'b00001011101;
724              check_one.unit = 1;
725          end
726          11'b00001011001 : begin           //rotm
727              check_one.op = 11'b00001011001;
728              check_one.unit = 1;
729          end
730          11'b00001011110 : begin           //rotmah

```

```

731           check_one.op = 11'b000001011110;
732           check_one.unit = 1;
733       end
734   11'b000001011010 : begin //rotma
735       check_one.op = 11'b000001011010;
736       check_one.unit = 1;
737   end
738   11'b01010110100 : begin //cntb
739       check_one.op = 11'b01010110100;
740       check_one.unit = 2;
741       check_one.rb_valid = 0;
742   end
743   11'b00011010011 : begin //avgb
744       check_one.op = 11'b00011010011;
745       check_one.unit = 2;
746   end
747   11'b000001010011 : begin //absdb
748       check_one.op = 11'b000001010011;
749       check_one.unit = 2;
750   end
751   11'b01001010011 : begin //sumb
752       $display("found sumb");
753       check_one.op = 11'b01001010011;
754       check_one.unit = 2;
755   end
756   11'b000011001000 : begin //ah
757       check_one.op = 11'b000011001000;
758       check_one.unit = 3;
759   end
760   11'b000011000000 : begin //a
761       check_one.op = 11'b000011000000;
762       check_one.unit = 3;
763   end
764   11'b00001001000 : begin //sfh
765       check_one.op = 11'b00001001000;
766       check_one.unit = 3;
767   end
768   11'b00001000000 : begin //sf
769       check_one.op = 11'b00001000000;
770       check_one.unit = 3;
771   end
772   11'b000011000001 : begin //and
773       check_one.op = 11'b000011000001;
774       check_one.unit = 3;
775   end
776   11'b000001000001 : begin //or
777       check_one.op = 11'b000001000001;
778       check_one.unit = 3;
779   end
780   11'b01001000001 : begin //xor
781       check_one.op = 11'b01001000001;
782       check_one.unit = 3;
783   end
784   11'b000011001001 : begin //nand
785       check_one.op = 11'b000011001001;
786       check_one.unit = 3;
787   end
788   11'b01111010000 : begin //ceqb
789       check_one.op = 11'b01111010000;
790       check_one.unit = 3;
791   end
792   11'b01111001000 : begin //ceqh
793       check_one.op = 11'b01111001000;
794       check_one.unit = 3;
795   end
796   11'b01111000000 : begin //ceq
797       check_one.op = 11'b01111000000;
798       check_one.unit = 3;
799   end

```

```

800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
999

```

```

869           endcase
870       end
871   end
872
873
874   if (check_one.even_valid == 0)
875     begin
876       check_one.rt_addr = instr[25:31];
877       check_one.ra_addr = instr[18:24];
878       check_one.rb_addr = instr[11:17];
879       check_one.rc_addr = instr[25:31];
880       check_one.reg_write = 1;
881       check_one.odd_valid = 1;
882       if (instr == 0) begin //alternate lnop
883         check_one.format = 0;
884         check_one.ra_valid = 0;
885         check_one.rb_valid = 0;
886         check_one.rc_valid = 0;
887         check_one.op = 0;
888         check_one.unit = 0;
889         check_one.rt_addr = 0;
890         check_one.imm = 0;
891         check_one.reg_write = 0;
892         check_one.odd_valid = 0;
893     end //RI10-type
894   else if (instr[0:7] == 8'b00110100) begin //lqd
895     check_one.format = 4;
896     check_one.ra_valid = 1;
897     check_one.rb_valid = 0;
898     check_one.rc_valid = 0;
899     check_one.op = 8'b00110100;
900     check_one.unit = 1;
901     check_one.imm = $signed(instr[8:17]);
902   end //stqd
903   else if (instr[0:7] == 8'b00100100) begin
904     check_one.format = 4;
905     check_one.ra_valid = 1;
906     check_one.rb_valid = 0;
907     check_one.rc_valid = 1;
908     check_one.op = 8'b00100100;
909     check_one.unit = 1;
910     check_one.imm = $signed(instr[8:17]);
911     check_one.reg_write = 0;
912   end //RI16-type
913   else begin
914     check_one.format = 5;
915     check_one.ra_valid = 0;
916     check_one.rb_valid = 0;
917     check_one.rc_valid = 0;
918     check_one.imm = $signed(instr[9:24]);
919     case(instr[0:8])
920       9'b001100001 : begin //lqa
921         check_one.op = 9'b001100001;
922         check_one.unit = 1;
923       end //stqa
924       9'b001000001 : begin
925         check_one.op = 9'b001000001;
926         check_one.unit = 1;
927         check_one.reg_write = 0;
928         check_one.rc_valid = 1;
929       end //br
930       9'b001100100 : begin
931         check_one.op = 9'b001100100;
932         check_one.unit = 2;
933         check_one.reg_write = 0;
934       end //bra
935       9'b001100000 : begin
936         check_one.op = 9'b001100000;
937         check_one.unit = 2;

```

```

937         check_one.reg_write = 0;
938
939 end
940     9'b001100110 : begin           //brs1
941         check_one.op = 9'b001100110;
942         check_one.unit = 2;
943     end
944     9'b001000010 : begin           //brnz
945         check_one.op = 9'b001000010;
946         check_one.unit = 2;
947         check_one.reg_write = 0;
948         check_one.rc_valid = 1;
949     end
950     9'b001000000 : begin           //brz
951         check_one.op = 9'b001000000;
952         check_one.unit = 2;
953         check_one.reg_write = 0;
954         check_one.rc_valid = 1;
955     end
956     default : check_one.format = 7;
957 endcase
958 if (check_one.format == 7) begin
959     check_one.format = 0;           //RR-type
960     check_one.ra_valid = 1;
961     check_one.rb_valid = 1;
962     check_one.rc_valid = 0;
963     //$/display("check: instr[0:10] %b ",instr[0:10]);
964     case(instr[0:10])
965         11'b00111011011 : begin      //shlqbi
966             $display("shlqbi ");
967             check_one.op = 11'b00111011011;
968             check_one.unit = 0;
969         end
970         11'b00111011111 : begin      //shlqby
971             check_one.op = 11'b00111011111;
972             check_one.unit = 0;
973         end
974         11'b00111011000 : begin      //rotqbi
975             check_one.op = 11'b00111011000;
976             check_one.unit = 0;
977         end
978         11'b00111011100 : begin      //rotqby
979             check_one.op = 11'b00111011100;
980             check_one.unit = 0;
981         end
982         11'b00110110010 : begin      //gbb
983             check_one.op = 11'b00110110010;
984             check_one.unit = 0;
985             check_one.rb_valid = 0;
986         end
987         11'b00110110001 : begin      //gbh
988             check_one.op = 11'b00110110001;
989             check_one.unit = 0;
990             check_one.rb_valid = 0;
991         end
992         11'b00110110000 : begin      //gb
993             check_one.op = 11'b00110110000;
994             check_one.unit = 0;
995             check_one.rb_valid = 0;
996         end
997         11'b00111000100 : begin      //lqx
998             check_one.op = 11'b00111000100;
999             check_one.unit = 1;
1000        end
1001        11'b00101000100 : begin      //stqx
1002            check_one.op = 11'b00101000100;
1003            check_one.unit = 1;
1004            check_one.reg_write = 0;
1005            check_one.rc_valid = 1;
1006        end

```

```

1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061

11'b00110101000 : begin //bi
    check_one.op = 11'b00110101000;
    check_one.unit = 2;
    check_one.reg_write = 0;
    check_one.rb_valid = 0;
end
11'b000000000001 : begin //lnop
    check_one.op = 11'b000000000001;
    check_one.unit = 0;
    check_one.reg_write = 0;
end
default : check_one.format = 7;
endcase
if (check_one.format == 7) begin
    check_one.format = 2; //RI7-type
    check_one.ra_valid = 1;
    check_one.rb_valid = 0;
    check_one.rc_valid = 0;
    check_one.imm = $signed(instr[11:17]);
    case(instr[0:10])
        11'b0011111011 : begin //shlqbii
            check_one.op = 11'b0011111011;
            check_one.unit = 0;
        end
        11'b0011111111 : begin //shlqbyi
            check_one.op = 11'b0011111111;
            check_one.unit = 0;
        end
        11'b0011111100 : begin //rotqbii
            check_one.op = 11'b0011111100;
            check_one.unit = 0;
        end
        11'b00111111100 : begin //rotqbyi
            check_one.op = 11'b00111111100;
            check_one.unit = 0;
        end
        default begin
            check_one.format = 0;
            check_one.ra_valid = 0;
            check_one.rb_valid = 0;
            check_one.rc_valid = 0;
            check_one.op = 0;
            check_one.unit = 0;
            check_one.rt_addr = 0;
            check_one.imm = 0;
            check_one.odd_valid = 0;
        end
    endcase
end
else check_one.odd_valid = 0;
endfunction
endmodule

```

```

1  module Pipes(clk, reset, instr_even, instr_odd, pc, pc_wb, branch_taken, op_even,
2    op_odd, unit_even, unit_odd, rt_addr_even, rt_addr_odd, format_even, format_odd,
3    imm_even, imm_odd, reg_write_even, reg_write_odd, first_odd, rt_addr_delay_even,
4    reg_write_delay_even, rt_addr_delay_odd, reg_write_delay_odd,
5    rt_addr_delay_fp1, reg_write_delay_fp1, int_delay_fp1, rt_addr_delay_fx2,
6    reg_write_delay_fx2, rt_addr_delay_b1, reg_write_delay_b1, rt_addr_delay_fx1,
7    reg_write_delay_fx1,
8    rt_addr_delay_p1, reg_write_delay_p1, rt_addr_delay_ls1, reg_write_delay_ls1);
9    input logic clk, reset;
10   input logic[0:31] instr_even, instr_odd; //Instr from decoder
11   input logic [7:0] pc; //Program counter from IF
12   stage
13
14   //Nets from decode logic
15   input logic [2:0] format_even, format_odd; //Format of instr
16   input logic [0:10] op_even, op_odd; //Opcode of instr (used
17   with format)
18   input logic [1:0] unit_even, unit_odd; //Destination unit of
19   instr; Order of: FP, FX2, Byte, FX1 (Even); Perm, LS, Br (Odd)
20   input logic [0:6] rt_addr_even, rt_addr_odd; //Destination register
21   addresses
22   logic [0:127] ra_even, rb_even, rc_even, ra_odd, rb_odd, rt_st_odd; //Register values from RegTable
23   input logic [0:17] imm_even, imm_odd; //Full possible immediate
24   value (used with format)
25   input logic reg_write_even, reg_write_odd; //1 if instr will write to
26   rt, else 0
27   input first_odd; //1 if odd instr is first
28   in pair, 0 else; Used for branch flushing
29
30   //Signals for writing back to RegTable
31   logic [0:127] rt_even_wb, rt_odd_wb; //Values to be written back
32   to RegTable
33   logic [0:6] rt_addr_even_wb, rt_addr_odd_wb; //Destination register
34   addresses
35   logic reg_write_even_wb, reg_write_odd_wb; //1 if instr will write to
36   rt, else 0
37
38   //Signals for forwarding logic
39   logic [6:0][0:127] fw_even_wb, fw_odd_wb; //Pipe shift registers of
40   values ready to be forwarded
41   logic [6:0][0:6] fw_addr_even_wb, fw_addr_odd_wb; //Destinations of values to
42   be forwarded
43   logic [6:0] fw_write_even_wb, fw_write_odd_wb; //Will forwarded values be
44   written to reg?
45   logic [0:127] ra_even_fwd, rb_even_fwd, rc_even_fwd, ra_odd_fwd, rb_odd_fwd,
46   rt_st_odd_fwd; //Updated input values
47
48   //Signals for handling branches
49   output logic [7:0] pc_wb; //New program counter for
50   branch
51   output logic branch_taken; //Was branch taken?
52   logic branch_kill; //If branch is taken and
53   branch instr is first in pair, kill twin instr
54   logic [2:0] format_even_live; //Format of instr, only
55   valid if branch not taken by first instr
56   logic [0:10] op_even_live; //Opcode of instr, only
57   valid if branch not taken by first instr
58
59   //Internal Signals for Handling RAW Hazards
60   output logic [0:6] rt_addr_delay_even, rt_addr_delay_odd; //Destination
61   register for rt_wb
62   output logic reg_write_delay_even, reg_write_delay_odd; //Will rt_wb
63   write to RegTable
64
65   output logic [6:0][0:6] rt_addr_delay_fp1; //Destination register for rt_wb
66   output logic [6:0] reg_write_delay_fp1; //Will rt_wb write to RegTable
67   output logic [6:0] int_delay_fp1; //Will fp1 write an int result

```

```

45   output logic [3:0][0:6] rt_addr_delay_fx2;           //Destination register for rt_wb
46   output logic [3:0]      reg_write_delay_fx2;         //Will rt_wb write to RegTable
47
48   output logic [3:0][0:6] rt_addr_delay_b1;           //Destination register for rt_wb
49   output logic [3:0]      reg_write_delay_b1;         //Will rt_wb write to RegTable
50
51   output logic [1:0][0:6] rt_addr_delay_fx1;           //Destination register for rt_wb
52   output logic [1:0]      reg_write_delay_fx1;         //Will rt_wb write to RegTable
53
54   output logic [3:0][0:6] rt_addr_delay_p1;           //Destination register for rt_wb
55   output logic [3:0]      reg_write_delay_p1;         //Will rt_wb write to RegTable
56
57   output logic [5:0][0:6] rt_addr_delay_ls1;           //Destination register for rt_wb
58   output logic [5:0]      reg_write_delay_ls1;         //Will rt_wb write to RegTable
59
60 RegisterTable rf(.clk(clk), .reset(reset), .instr_even(instr_even),
61   .instr_odd(instr_odd),
62     .ra_even(ra_even), .rb_even(rb_even), .rc_even(rc_even), .ra_odd(ra_odd),
63     .rb_odd(rb_odd), .rt_st_odd(rt_st_odd), .rt_addr_even(rt_addr_even_wb),
64     .rt_addr_odd(rt_addr_odd_wb), .rt_even(rt_even_wb), .rt_odd(rt_odd_wb),
65     .reg_write_even(reg_write_even_wb), .reg_write_odd(reg_write_odd_wb));
66
67 EvenPipe ev(.clk(clk), .reset(reset), .op(op_even_live), .format(format_even_live),
68   .unit(unit_even), .rt_addr(rt_addr_even), .ra(ra_even_fwd), .rb(rb_even_fwd),
69   .rc(rc_even_fwd),
70     .imm(imm_even), .reg_write(reg_write_even), .rt_wb(rt_even_wb),
71     .rt_addr_wb(rt_addr_even_wb), .reg_write_wb(reg_write_even_wb),
72     .branch_taken(branch_taken),
73     .fw_wb(fw_even_wb), .fw_addr_wb(fw_addr_even_wb), .fw_write_wb(fw_write_even_wb),
74     .rt_addr_delay_fp1(rt_addr_delay_fp1),
75     .reg_write_delay_fp1(reg_write_delay_fp1), .int_delay_fp1(int_delay_fp1),
76     .rt_addr_delay_fx2(rt_addr_delay_fx2),
77     .reg_write_delay_fx2(reg_write_delay_fx2), .rt_addr_delay_b1(rt_addr_delay_b1),
78     .reg_write_delay_b1(reg_write_delay_b1), .rt_addr_delay_fx1(rt_addr_delay_fx1),
79     .reg_write_delay_fx1(reg_write_delay_fx1));
80
81 OddPipe od(.clk(clk), .reset(reset), .op(op_odd), .format(format_odd),
82   .unit(unit_odd), .rt_addr(rt_addr_odd), .ra(ra_odd_fwd), .rb(rb_odd_fwd),
83     .rt_st(rt_st_odd_fwd), .imm(imm_odd), .reg_write(reg_write_odd), .pc_in(pc),
84     .rt_wb(rt_odd_wb), .rt_addr_wb(rt_addr_odd_wb), .reg_write_wb(reg_write_odd_wb),
85     .pc_wb(pc_wb), .branch_taken(branch_taken), .fw_wb(fw_odd_wb),
86     .fw_addr_wb(fw_addr_odd_wb), .fw_write_wb(fw_write_odd_wb), .first(first_odd),
87     .branch_kill(branch_kill),
88     .rt_addr_delay_p1(rt_addr_delay_p1), .reg_write_delay_p1(reg_write_delay_p1),
89     .rt_addr_delay_ls1(rt_addr_delay_ls1),
90     .reg_write_delay_ls1(reg_write_delay_ls1));
91
92
93 always_comb
94   if (branch_kill == 0) begin           //If branch is taken and branch is first
95     instr in pair, kill corresponding even instr
96     format_even_live = format_even;
97     op_even_live = op_even;
98   end
99   else begin
100    format_even_live = 0;
101    op_even_live = 0;
102  end

```

```
93     rt_addr_delay_even = rt_addr_even;  
94     reg_write_delay_even = reg_write_even;  
95     rt_addr_delay_odd = rt_addr_odd;  
96     reg_write_delay_odd = reg_write_odd;  
97 end  
98  
99 endmodule
```

```

1  module RegisterTable(clk, reset, instr_even, instr_odd, ra_even, rb_even, rc_even,
2    ra_odd, rb_odd,
3    rt_st_odd, rt_addr_even, rt_addr_odd, rt_even, rt_odd, reg_write_even,
4    reg_write_odd);
5
6    input clk, reset;
7
8    //RF/FWD Stage
9    input [0:31] instr_even, instr_odd; //Instructions to read from
10   decoder;
11   output logic [0:127] ra_even, rb_even, rc_even, ra_odd, rb_odd, rt_st_odd;
12   //Set all possible register values regardless of format
13
14   //WB Stage
15   input [0:6] rt_addr_even, rt_addr_odd; //Destination registers to
16   write to
17   input [0:127] rt_even, rt_odd; //Values to write to
18   destination registers
19   input reg_write_even, reg_write_odd; //1 if instr will write to
20   rt, else 0
21
22   logic [0:127] registers [0:127]; //RegFile
23   logic [7:0] i; //8-bit counter for reset
24
25   loop
26
27   always_comb begin
28     //All source register addresses are in the same location in all instr, so read
29     no matter what
30     rc_even = registers[instr_even[25:31]];
31     ra_even = registers[instr_even[18:24]];
32     rb_even = registers[instr_even[11:17]];
33
34     ra_odd = registers[instr_odd[18:24]];
35     rb_odd = registers[instr_odd[11:17]];
36     rt_st_odd = registers[instr_odd[25:31]];
37
38     //Forwarding in case of WAR hazard
39     if (reg_write_even == 1) begin
40       if (instr_even[25:31] == rt_addr_even)
41         rc_even = rt_even;
42       if (instr_even[18:24] == rt_addr_even)
43         ra_even = rt_even;
44       if (instr_even[11:17] == rt_addr_even)
45         rb_even = rt_even;
46       if (instr_odd[25:31] == rt_addr_even)
47         rt_st_odd = rt_even;
48       if (instr_odd[18:24] == rt_addr_even)
49         ra_odd = rt_even;
50       if (instr_odd[11:17] == rt_addr_even)
51         rb_odd = rt_even;
52     end
53     if (reg_write_odd == 1) begin
54       if (instr_even[25:31] == rt_addr_odd)
55         rc_even = rt_odd;
56       if (instr_even[18:24] == rt_addr_odd)
57         ra_even = rt_odd;
58       if (instr_even[11:17] == rt_addr_odd)
59         rb_even = rt_odd;
60       if (instr_odd[25:31] == rt_addr_odd)
61         rt_st_odd = rt_odd;
62       if (instr_odd[18:24] == rt_addr_odd)
63         ra_odd = rt_odd;
64       if (instr_odd[11:17] == rt_addr_odd)
65         rb_odd = rt_odd;
66     end
67   end
68
69   always_ff @(posedge clk) begin
70     if(reset == 1)begin
71       registers[127] = 0;
72     end
73   end
74
75 endmodule

```

```
61      for (i=0; i<127; i=i+1) begin
62          registers[i] <= 0;
63      end
64
65      else begin
66          if (reg_write_even == 1)
67              registers[rt_addr_even] <= rt_even;
68          if (reg_write_odd == 1)
69              registers[rt_addr_odd] <= rt_odd;
70      end
71
72  end
endmodule
```

```

1  module Forward(clk, reset, instr_even, instr_odd, ra_even, rb_even, rc_even, ra_odd,
2    rb_odd, rt_st_odd, ra_even_fwd, rb_even_fwd, rc_even_fwd, ra_odd_fwd, rb_odd_fwd,
3    rt_st_odd_fwd, fw_even_wb, fw_addr_even_wb, fw_write_even_wb, fw_odd_wb,
4    fw_addr_odd_wb, fw_write_odd_wb);
5
6    input          clk, reset;
7    input [0:31]   instr_even, instr_odd;                                //Instr from decoder
8
9
10   //Nets from decode logic (Note: Will be placed in decode/hazard unit in final
11   //submission)
12   input [0:127]  ra_even, rb_even, rc_even, ra_odd, rb_odd, rt_st_odd; //Register
13   values from RegTable
14
15   //Signals for forwarding logic
16   input [6:0][0:127] fw_even_wb, fw_odd_wb;                         //Pipe shift registers of
17   values ready to be forwarded
18   input [6:0][0:6]   fw_addr_even_wb, fw_addr_odd_wb;                //Destinations of values to
19   be forwarded
20   input [6:0]       fw_write_even_wb, fw_write_odd_wb;               //Will forwarded values be
21   written to reg?
22   output logic [0:127] ra_even_fwd, rb_even_fwd, rc_even_fwd, ra_odd_fwd,
23   rb_odd_fwd, rt_st_odd_fwd; //Updated input values
24
25
26   always_comb begin
27     //Checking inputs for even instr. Compared against FWE0 to FWE6, breaking when
28     //match found
29     //ra
30     if (instr_even[18:24] == fw_addr_even_wb[0] && fw_write_even_wb[0] == 1)
31       ra_even_fwd = fw_even_wb[0];
32     else if (instr_even[18:24] == fw_addr_odd_wb[0] && fw_write_odd_wb[0] == 1)
33       ra_even_fwd = fw_odd_wb[0];
34     else if (instr_even[18:24] == fw_addr_even_wb[1] && fw_write_even_wb[1] == 1)
35       ra_even_fwd = fw_even_wb[1];
36     else if (instr_even[18:24] == fw_addr_odd_wb[1] && fw_write_odd_wb[1] == 1)
37       ra_even_fwd = fw_odd_wb[1];
38     else if (instr_even[18:24] == fw_addr_even_wb[2] && fw_write_even_wb[2] == 1)
39       ra_even_fwd = fw_even_wb[2];
40     else if (instr_even[18:24] == fw_addr_odd_wb[2] && fw_write_odd_wb[2] == 1)
41       ra_even_fwd = fw_odd_wb[2];
42     else if (instr_even[18:24] == fw_addr_even_wb[3] && fw_write_even_wb[3] == 1)
43       ra_even_fwd = fw_even_wb[3];
44     else if (instr_even[18:24] == fw_addr_odd_wb[3] && fw_write_odd_wb[3] == 1)
45       ra_even_fwd = fw_odd_wb[3];
46     else if (instr_even[18:24] == fw_addr_even_wb[4] && fw_write_even_wb[4] == 1)
47       ra_even_fwd = fw_even_wb[4];
48     else if (instr_even[18:24] == fw_addr_odd_wb[4] && fw_write_odd_wb[4] == 1)
49       ra_even_fwd = fw_odd_wb[4];
50     else if (instr_even[18:24] == fw_addr_even_wb[5] && fw_write_even_wb[5] == 1)
51       ra_even_fwd = fw_even_wb[5];
52     else if (instr_even[18:24] == fw_addr_odd_wb[5] && fw_write_odd_wb[5] == 1)
53       ra_even_fwd = fw_odd_wb[5];
54     else if (instr_even[18:24] == fw_addr_even_wb[6] && fw_write_even_wb[6] == 1)
55       ra_even_fwd = fw_even_wb[6];
56     else if (instr_even[18:24] == fw_addr_odd_wb[6] && fw_write_odd_wb[6] == 1)
57       ra_even_fwd = fw_odd_wb[6];
58     else
59       ra_even_fwd = ra_even;
60
61     //rb
62     if (instr_even[11:17] == fw_addr_even_wb[0] && fw_write_even_wb[0] == 1)
63       rb_even_fwd = fw_even_wb[0];
64     else if (instr_even[11:17] == fw_addr_odd_wb[0] && fw_write_odd_wb[0] == 1)
65       rb_even_fwd = fw_odd_wb[0];
66     else if (instr_even[11:17] == fw_addr_even_wb[1] && fw_write_even_wb[1] == 1)
67       rb_even_fwd = fw_even_wb[1];
68     else if (instr_even[11:17] == fw_addr_odd_wb[1] && fw_write_odd_wb[1] == 1)
69       rb_even_fwd = fw_odd_wb[1];
70     else if (instr_even[11:17] == fw_addr_even_wb[2] && fw_write_even_wb[2] == 1)
71       rb_even_fwd = fw_even_wb[2];
72     else if (instr_even[11:17] == fw_addr_odd_wb[2] && fw_write_odd_wb[2] == 1)
73       rb_even_fwd = fw_odd_wb[2];

```

```

60     rb_even_fwd = fw_odd_wb[2];
61     else if (instr_even[11:17] == fw_addr_even_wb[3] && fw_write_even_wb[3] == 1)
62         rb_even_fwd = fw_even_wb[3];
63     else if (instr_even[11:17] == fw_addr_odd_wb[3] && fw_write_odd_wb[3] == 1)
64         rb_even_fwd = fw_odd_wb[3];
65     else if (instr_even[11:17] == fw_addr_even_wb[4] && fw_write_even_wb[4] == 1)
66         rb_even_fwd = fw_even_wb[4];
67     else if (instr_even[11:17] == fw_addr_odd_wb[4] && fw_write_odd_wb[4] == 1)
68         rb_even_fwd = fw_odd_wb[4];
69     else if (instr_even[11:17] == fw_addr_even_wb[5] && fw_write_even_wb[5] == 1)
70         rb_even_fwd = fw_even_wb[5];
71     else if (instr_even[11:17] == fw_addr_odd_wb[5] && fw_write_odd_wb[5] == 1)
72         rb_even_fwd = fw_odd_wb[5];
73     else if (instr_even[11:17] == fw_addr_even_wb[6] && fw_write_even_wb[6] == 1)
74         rb_even_fwd = fw_even_wb[6];
75     else if (instr_even[11:17] == fw_addr_odd_wb[6] && fw_write_odd_wb[6] == 1)
76         rb_even_fwd = fw_odd_wb[6];
77     else
78         rb_even_fwd = rb_even;
79
80 //rc
81 if (instr_even[25:31] == fw_addr_even_wb[0] && fw_write_even_wb[0] == 1)
82     rc_even_fwd = fw_even_wb[0];
83 else if (instr_even[25:31] == fw_addr_odd_wb[0] && fw_write_odd_wb[0] == 1)
84     rc_even_fwd = fw_odd_wb[0];
85 else if (instr_even[25:31] == fw_addr_even_wb[1] && fw_write_even_wb[1] == 1)
86     rc_even_fwd = fw_even_wb[1];
87 else if (instr_even[25:31] == fw_addr_odd_wb[1] && fw_write_odd_wb[1] == 1)
88     rc_even_fwd = fw_odd_wb[1];
89 else if (instr_even[25:31] == fw_addr_even_wb[2] && fw_write_even_wb[2] == 1)
90     rc_even_fwd = fw_even_wb[2];
91 else if (instr_even[25:31] == fw_addr_odd_wb[2] && fw_write_odd_wb[2] == 1)
92     rc_even_fwd = fw_odd_wb[2];
93 else if (instr_even[25:31] == fw_addr_even_wb[3] && fw_write_even_wb[3] == 1)
94     rc_even_fwd = fw_even_wb[3];
95 else if (instr_even[25:31] == fw_addr_odd_wb[3] && fw_write_odd_wb[3] == 1)
96     rc_even_fwd = fw_odd_wb[3];
97 else if (instr_even[25:31] == fw_addr_even_wb[4] && fw_write_even_wb[4] == 1)
98     rc_even_fwd = fw_even_wb[4];
99 else if (instr_even[25:31] == fw_addr_odd_wb[4] && fw_write_odd_wb[4] == 1)
100    rc_even_fwd = fw_odd_wb[4];
101 else if (instr_even[25:31] == fw_addr_even_wb[5] && fw_write_even_wb[5] == 1)
102    rc_even_fwd = fw_even_wb[5];
103 else if (instr_even[25:31] == fw_addr_odd_wb[5] && fw_write_odd_wb[5] == 1)
104    rc_even_fwd = fw_odd_wb[5];
105 else if (instr_even[25:31] == fw_addr_even_wb[6] && fw_write_even_wb[6] == 1)
106    rc_even_fwd = fw_even_wb[6];
107 else if (instr_even[25:31] == fw_addr_odd_wb[6] && fw_write_odd_wb[6] == 1)
108    rc_even_fwd = fw_odd_wb[6];
109 else
110    rc_even_fwd = rc_even;
111
112 //Checking inputs for odd instr. Compared against FW00 to FW06, breaking when
113 //match found
114 //ra
115 if (instr_odd[18:24] == fw_addr_even_wb[0] && fw_write_even_wb[0] == 1)
116     ra_odd_fwd = fw_even_wb[0];
117 else if (instr_odd[18:24] == fw_addr_odd_wb[0] && fw_write_odd_wb[0] == 1)
118     ra_odd_fwd = fw_odd_wb[0];
119 else if (instr_odd[18:24] == fw_addr_even_wb[1] && fw_write_even_wb[1] == 1)
120     ra_odd_fwd = fw_even_wb[1];
121 else if (instr_odd[18:24] == fw_addr_odd_wb[1] && fw_write_odd_wb[1] == 1)
122     ra_odd_fwd = fw_odd_wb[1];
123 else if (instr_odd[18:24] == fw_addr_even_wb[2] && fw_write_even_wb[2] == 1)
124     ra_odd_fwd = fw_even_wb[2];
125 else if (instr_odd[18:24] == fw_addr_odd_wb[2] && fw_write_odd_wb[2] == 1)
126     ra_odd_fwd = fw_odd_wb[2];
127 else if (instr_odd[18:24] == fw_addr_even_wb[3] && fw_write_even_wb[3] == 1)
128     ra_odd_fwd = fw_even_wb[3];

```

```

128     else if (instr_odd[18:24] == fw_addr_odd_wb[3] && fw_write_odd_wb[3] == 1)
129         ra_odd_fwd = fw_odd_wb[3];
130     else if (instr_odd[18:24] == fw_addr_even_wb[4] && fw_write_even_wb[4] == 1)
131         ra_odd_fwd = fw_even_wb[4];
132     else if (instr_odd[18:24] == fw_addr_odd_wb[4] && fw_write_odd_wb[4] == 1)
133         ra_odd_fwd = fw_odd_wb[4];
134     else if (instr_odd[18:24] == fw_addr_even_wb[5] && fw_write_even_wb[5] == 1)
135         ra_odd_fwd = fw_even_wb[5];
136     else if (instr_odd[18:24] == fw_addr_odd_wb[5] && fw_write_odd_wb[5] == 1)
137         ra_odd_fwd = fw_odd_wb[5];
138     else if (instr_odd[18:24] == fw_addr_even_wb[6] && fw_write_even_wb[6] == 1)
139         ra_odd_fwd = fw_even_wb[6];
140     else if (instr_odd[18:24] == fw_addr_odd_wb[6] && fw_write_odd_wb[6] == 1)
141         ra_odd_fwd = fw_odd_wb[6];
142     else
143         ra_odd_fwd = ra_odd;
144
145     //rb
146     if (instr_odd[11:17] == fw_addr_even_wb[0] && fw_write_even_wb[0] == 1)
147         rb_odd_fwd = fw_even_wb[0];
148     else if (instr_odd[11:17] == fw_addr_odd_wb[0] && fw_write_odd_wb[0] == 1)
149         rb_odd_fwd = fw_odd_wb[0];
150     else if (instr_odd[11:17] == fw_addr_even_wb[1] && fw_write_even_wb[1] == 1)
151         rb_odd_fwd = fw_even_wb[1];
152     else if (instr_odd[11:17] == fw_addr_odd_wb[1] && fw_write_odd_wb[1] == 1)
153         rb_odd_fwd = fw_odd_wb[1];
154     else if (instr_odd[11:17] == fw_addr_even_wb[2] && fw_write_even_wb[2] == 1)
155         rb_odd_fwd = fw_even_wb[2];
156     else if (instr_odd[11:17] == fw_addr_odd_wb[2] && fw_write_odd_wb[2] == 1)
157         rb_odd_fwd = fw_odd_wb[2];
158     else if (instr_odd[11:17] == fw_addr_even_wb[3] && fw_write_even_wb[3] == 1)
159         rb_odd_fwd = fw_even_wb[3];
160     else if (instr_odd[11:17] == fw_addr_odd_wb[3] && fw_write_odd_wb[3] == 1)
161         rb_odd_fwd = fw_odd_wb[3];
162     else if (instr_odd[11:17] == fw_addr_even_wb[4] && fw_write_even_wb[4] == 1)
163         rb_odd_fwd = fw_even_wb[4];
164     else if (instr_odd[11:17] == fw_addr_odd_wb[4] && fw_write_odd_wb[4] == 1)
165         rb_odd_fwd = fw_odd_wb[4];
166     else if (instr_odd[11:17] == fw_addr_even_wb[5] && fw_write_even_wb[5] == 1)
167         rb_odd_fwd = fw_even_wb[5];
168     else if (instr_odd[11:17] == fw_addr_odd_wb[5] && fw_write_odd_wb[5] == 1)
169         rb_odd_fwd = fw_odd_wb[5];
170     else if (instr_odd[11:17] == fw_addr_even_wb[6] && fw_write_even_wb[6] == 1)
171         rb_odd_fwd = fw_even_wb[6];
172     else if (instr_odd[11:17] == fw_addr_odd_wb[6] && fw_write_odd_wb[6] == 1)
173         rb_odd_fwd = fw_odd_wb[6];
174     else
175         rb_odd_fwd = rb_odd;
176
177     //rt_st
178     if (instr_odd[25:31] == fw_addr_even_wb[0] && fw_write_even_wb[0] == 1)
179         rt_st_odd_fwd = fw_even_wb[0];
180     else if (instr_odd[25:31] == fw_addr_odd_wb[0] && fw_write_odd_wb[0] == 1)
181         rt_st_odd_fwd = fw_odd_wb[0];
182     else if (instr_odd[25:31] == fw_addr_even_wb[1] && fw_write_even_wb[1] == 1)
183         rt_st_odd_fwd = fw_even_wb[1];
184     else if (instr_odd[25:31] == fw_addr_odd_wb[1] && fw_write_odd_wb[1] == 1)
185         rt_st_odd_fwd = fw_odd_wb[1];
186     else if (instr_odd[25:31] == fw_addr_even_wb[2] && fw_write_even_wb[2] == 1)
187         rt_st_odd_fwd = fw_even_wb[2];
188     else if (instr_odd[25:31] == fw_addr_odd_wb[2] && fw_write_odd_wb[2] == 1)
189         rt_st_odd_fwd = fw_odd_wb[2];
190     else if (instr_odd[25:31] == fw_addr_even_wb[3] && fw_write_even_wb[3] == 1)
191         rt_st_odd_fwd = fw_even_wb[3];
192     else if (instr_odd[25:31] == fw_addr_odd_wb[3] && fw_write_odd_wb[3] == 1)
193         rt_st_odd_fwd = fw_odd_wb[3];
194     else if (instr_odd[25:31] == fw_addr_even_wb[4] && fw_write_even_wb[4] == 1)
195         rt_st_odd_fwd = fw_even_wb[4];
196     else if (instr_odd[25:31] == fw_addr_odd_wb[4] && fw_write_odd_wb[4] == 1)

```

```
197     rt_st_odd_fwd = fw_odd_wb[4];
198 else if (instr_odd[25:31] == fw_addr_even_wb[5] && fw_write_even_wb[5] == 1)
199     rt_st_odd_fwd = fw_even_wb[5];
200 else if (instr_odd[25:31] == fw_addr_odd_wb[5] && fw_write_odd_wb[5] == 1)
201     rt_st_odd_fwd = fw_odd_wb[5];
202 else if (instr_odd[25:31] == fw_addr_even_wb[6] && fw_write_even_wb[6] == 1)
203     rt_st_odd_fwd = fw_even_wb[6];
204 else if (instr_odd[25:31] == fw_addr_odd_wb[6] && fw_write_odd_wb[6] == 1)
205     rt_st_odd_fwd = fw_odd_wb[6];
206 else
207     rt_st_odd_fwd = rt_st_odd;
208 end
209
210 endmodule
```

```

1  module EvenPipe(clk, reset, op, format, unit, rt_addr, ra, rb, rc, imm, reg_write,
2    rt_wb, rt_addr_wb, reg_write_wb, branch_taken, fw_wb, fw_addr_wb, fw_write_wb,
3    rt_addr_delay_fpl, reg_write_delay_fpl, int_delay_fpl, rt_addr_delay_fx2,
4    reg_write_delay_fx2, rt_addr_delay_b1, reg_write_delay_b1, rt_addr_delay_fx1,
5    reg_write_delay_fx1);
6    input          clk, reset;
7
8    //RF/FWD Stage
9    input [0:10]   op;           //Decoded opcode, truncated based on format
10   input [2:0]    format;       //Format of instr, used with op and imm
11   input [1:0]    unit;         //Execution unit of instr (0: FP, 1: FX2, 2: Byte,
12     3: FX1)
13   input [0:6]    rt_addr;      //Destination register address
14   input [0:127]  ra, rb, rc;  //Values of source registers
15   input [0:17]   imm;          //Immediate value, truncated based on format
16   input          reg_write;   //Will current instr write to RegTable
17   input          branch_taken; //Was branch taken?
18
19   //WB Stage
20   output logic [0:127] rt_wb; //Output value of Stage 7
21   output logic [0:6]   rt_addr_wb; //Destination register for rt_wb
22   output logic        reg_write_wb; //Will rt_wb write to RegTable
23
24   //Internal Signals
25   output logic [6:0][0:127] fw_wb; //Staging register for forwarded values
26   output logic [6:0][0:6]   fw_addr_wb; //Destination register for rt_wb
27   output logic [6:0]        fw_write_wb; //Will rt_wb write to RegTable
28
29   logic [0:10]   fp1_op;      //Multiplexed opcode
30   logic [2:0]    fp1_format;  //Multiplexed format
31   logic          fp1_reg_write; //Multiplexed reg_write
32   logic [0:127]  fp1_out;     //Output value of fp1 Stage 6
33   logic [0:6]    fp1_addr_out; //Destination register for rt_wb
34   logic          fp1_write_out; //Will rt_wb write to RegTable
35   logic [0:127]  fp1_int;     //Output value of fp1 Stage 7
36   logic [0:6]    fp1_addr_int; //Destination register for rt_wb
37   logic          fp1_write_int; //Will rt_wb write to RegTable
38
39   logic [0:10]   fx2_op;      //Multiplexed opcode
40   logic [2:0]    fx2_format;  //Multiplexed format
41   logic          fx2_reg_write; //Multiplexed reg_write
42   logic [0:127]  fx2_out;     //Output value of fx2 Stage 4
43   logic [0:6]    fx2_addr_out; //Destination register for rt_wb
44   logic          fx2_write_out; //Will rt_wb write to RegTable
45
46   logic [0:10]   b1_op;      //Multiplexed opcode
47   logic [2:0]    b1_format;  //Multiplexed format
48   logic          b1_reg_write; //Multiplexed reg_write
49   logic [0:127]  b1_out;     //Output value of b1 Stage 4
50   logic [0:6]    b1_addr_out; //Destination register for rt_wb
51   logic          b1_write_out; //Will rt_wb write to RegTable
52
53   logic [0:10]   fx1_op;      //Multiplexed opcode
54   logic [2:0]    fx1_format;  //Multiplexed format
55   logic          fx1_reg_write; //Multiplexed reg_write
56   logic [0:127]  fx1_out;     //Output value of fx1 Stage 2
57   logic [0:6]    fx1_addr_out; //Destination register for rt_wb
58   logic          fx1_write_out; //Will rt_wb write to RegTable
59
60   //Internal Signals for Handling RAW Hazards
61   output logic [6:0][0:6] rt_addr_delay_fpl; //Destination register for rt_wb
62   output logic [6:0]      reg_write_delay_fpl; //Will rt_wb write to RegTable
63   output logic [6:0]      int_delay_fpl; //Will fp1 write an int result
64
65   output logic [3:0][0:6] rt_addr_delay_fx2; //Destination register for rt_wb
66   output logic [3:0]      reg_write_delay_fx2; //Will rt_wb write to RegTable
67
68   output logic [3:0][0:6] rt_addr_delay_b1; //Destination register for rt_wb
69   output logic [3:0]      reg_write_delay_b1; //Will rt_wb write to RegTable

```

```

66
67     output logic [1:0][0:6] rt_addr_delay_fx1;      //Destination register for rt_wb
68     output logic [1:0]      reg_write_delay_fx1;    //Will rt_wb write to RegTable
69
70
71     SinglePrecision fp1(.clk(clk), .reset(reset), .op(fp1_op), .format(fp1_format),
72     .rt_addr(rt_addr), .ra(ra), .rb(rb), .rc(rc), .imm(imm), .reg_write(fp1_reg_write),
73     .rt_wb(fp1_out), .rt_addr_wb(fp1_addr_out), .reg_write_wb(fp1_write_out),
74     .rt_int(fp1_int), .rt_addr_int(fp1_addr_int), .reg_write_int(fp1_write_int),
75     .branch_taken(branch_taken), .rt_addr_delay(rt_addr_delay_fp1),
76     .reg_write_delay(reg_write_delay_fp1), .int_delay(int_delay_fp1));
77
78     SimpleFixed2 fx2(.clk(clk), .reset(reset), .op(fx2_op), .format(fx2_format),
79     .rt_addr(rt_addr), .ra(ra), .rb(rb), .imm(imm), .reg_write(fx2_reg_write),
80     .rt_wb(fx2_out),
81     .rt_addr_wb(fx2_addr_out), .reg_write_wb(fx2_write_out),
82     .branch_taken(branch_taken),
83     .rt_addr_delay(rt_addr_delay_fx2), .reg_write_delay(reg_write_delay_fx2));
84
85     Byte b1(.clk(clk), .reset(reset), .op(b1_op), .format(b1_format),
86     .rt_addr(rt_addr), .ra(ra), .rb(rb), .imm(imm), .reg_write(b1_reg_write),
87     .rt_wb(b1_out),
88     .rt_addr_wb(b1_addr_out), .reg_write_wb(b1_write_out),
89     .branch_taken(branch_taken),
90     .rt_addr_delay(rt_addr_delay_b1), .reg_write_delay(reg_write_delay_b1));
91
92
93     SimpleFixed1 fx1(.clk(clk), .reset(reset), .op(fx1_op), .format(fx1_format),
94     .rt_addr(rt_addr), .ra(ra), .rb(rb), .rt_st(rc), .imm(imm),
95     .reg_write(fx1_reg_write), .rt_wb(fx1_out),
96     .rt_addr_wb(fx1_addr_out), .reg_write_wb(fx1_write_out),
97     .branch_taken(branch_taken),
98     .rt_addr_delay(rt_addr_delay_fx1), .reg_write_delay(reg_write_delay_fx1));
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121

always_comb begin
    fp1_op = 0;
    fp1_format = 0;
    fp1_reg_write = 0;

    fx2_op = 0;
    fx2_format = 0;
    fx2_reg_write = 0;

    b1_op = 0;
    b1_format = 0;
    b1_reg_write = 0;

    fx1_op = 0;
    fx1_format = 0;
    fx1_reg_write = 0;

    case (unit)                                //Mux to determine which unit will
        take the instr                           //Instr going to fp1
    2'b00 : begin
        fp1_op = op;
        fp1_format = format;
        fp1_reg_write = reg_write;
    end
    2'b01 : begin                                //Instr going to fx2
        fx2_op = op;
        fx2_format = format;
        fx2_reg_write = reg_write;
    end
    2'b10 : begin                                //Instr going to b1
        b1_op = op;
        b1_format = format;
        b1_reg_write = reg_write;
    end
end

```

```

122
123         2'b11 : begin                                //Instr going to fx1
124             fx1_op = op;
125             fx1_format = format;
126             fx1_reg_write = reg_write;
127         end
128     endcase
129 end
130
131 always_ff @(posedge clk) begin
132     fw_wb[0] <= 0;                                //fw0 and fw1 don't exist, just use 0
133     fw_addr_wb[0] <= 0;
134     fw_write_wb[0] <= 0;
135
136     if (reset == 1) begin
137         rt_wb <= 0;
138         rt_addr_wb <= 0;
139         reg_write_wb <= 0;
140         for (int i=6; i>0; i=i-1) begin
141             fw_wb [i] <= 0;
142             fw_addr_wb [i] <= 0;
143             fw_write_wb [i] <= 0;
144         end
145     end
146     else begin
147         rt_wb <= fw_wb[6];
148         rt_addr_wb <= fw_addr_wb[6];
149         reg_write_wb <= fw_write_wb[6];
150
151         if (fp1_write_int == 1) begin                //Replace fw6 with fp1 if possible
152             fw_wb[6] <= fp1_int;
153             fw_addr_wb[6] <= fp1_addr_int;
154             fw_write_wb[6] <= fp1_write_int;
155         end
156         else begin
157             fw_wb[6] <= fw_wb[5];
158             fw_addr_wb[6] <= fw_addr_wb[5];
159             fw_write_wb[6] <= fw_write_wb[5];
160         end
161
162         if (fp1_write_out == 1) begin                //Replace fw6 with fp1 if possible
163             fw_wb[5] <= fp1_out;
164             fw_addr_wb[5] <= fp1_addr_out;
165             fw_write_wb[5] <= fp1_write_out;
166         end
167         else begin
168             fw_wb[5] <= fw_wb[4];
169             fw_addr_wb[5] <= fw_addr_wb[4];
170             fw_write_wb[5] <= fw_write_wb[4];
171         end
172
173         fw_wb[4] <= fw_wb[3];
174         fw_addr_wb[4] <= fw_addr_wb[3];
175         fw_write_wb[4] <= fw_write_wb[3];
176
177         if (fx2_write_out == 1) begin                //Replace fw4 with fx2 if possible
178             fw_wb[3] <= fx2_out;
179             fw_addr_wb[3] <= fx2_addr_out;
180             fw_write_wb[3] <= fx2_write_out;
181         end
182         else if (b1_write_out == 1) begin            //Replace fw4 with b1 if possible
183             fw_wb[3] <= b1_out;
184             fw_addr_wb[3] <= b1_addr_out;
185             fw_write_wb[3] <= b1_write_out;
186         end
187         else begin
188             fw_wb[3] <= fw_wb[2];
189             fw_addr_wb[3] <= fw_addr_wb[2];
190             fw_write_wb[3] <= fw_write_wb[2];
191         end

```

```
191
192     fw_wb[2] <= fw_wb[1];
193     fw_addr_wb[2] <= fw_addr_wb[1];
194     fw_write_wb[2] <= fw_write_wb[1];
195
196     if (fx1_write_out == 1) begin //Replace fw2 with fx1 if possible
197         fw_wb[1] <= fx1_out;
198         fw_addr_wb[1] <= fx1_addr_out;
199         fw_write_wb[1] <= fx1_write_out;
200     end
201     else begin
202         fw_wb[1] <= fw_wb[0];
203         fw_addr_wb[1] <= fw_addr_wb[0];
204         fw_write_wb[1] <= fw_write_wb[0];
205     end
206 end
207
208
209 endmodule
```

```

1  module OddPipe(clk, reset, op, format, unit, rt_addr, ra, rb, rt_st, imm, reg_write,
2    pc_in, rt_wb, rt_addr_wb, reg_write_wb, pc_wb, branch_taken,
3    fw_wb, fw_addr_wb, fw_write_wb, first, branch_kill, rt_addr_delay_ls1,
4    reg_write_delay_ls1, rt_addr_delay_p1, reg_write_delay_p1);
5    input          clk, reset;
6
7    //RF/FWD Stage
8    input [0:10]   op;           //Decoded opcode, truncated based on format
9    input [2:0]    format;       //Format of instr, used with op and imm
10   input [1:0]   unit;         //Execution unit of instr (0: Perm, 1: LS, 2: Br,
11   3: Undefined)
12   input [0:6]   rt_addr;      //Destination register address
13   input [0:127]  ra, rb, rt_st; //Values of source registers
14   input [0:17]   imm;          //Immediate value, truncated based on format
15   input          reg_write;   //Will current instr write to RegTable
16   input [7:0]   pc_in;        //Program counter from IF stage
17   input          first;        //1 if first instr in pair; used for determining
18   order of branch
19
20  //WB Stage
21  output logic [0:127] rt_wb;      //Output value of Stage 7
22  output logic [0:6]  rt_addr_wb;   //Destination register for rt_wb
23  output logic          reg_write_wb; //Will rt_wb write to RegTable
24  output logic [7:0]  pc_wb;        //New program counter for branch
25  output logic          branch_taken; //Was branch taken?
26  output logic          branch_kill; //If branch is taken and branch instr is
27  first in pair, kill twin instr
28
29  //Internal Signals
30  output logic [6:0][0:127] fw_wb;    //Staging register for forwarded values
31  output logic [6:0][0:6]  fw_addr_wb; //Destination register for rt_wb
32  output logic [6:0]       fw_write_wb; //Will rt_wb write to RegTable
33
34  logic [0:10]  p1_op;          //Multiplexed opcode
35  logic [2:0]   p1_format;      //Multiplexed format
36  logic          p1_reg_write;  //Multiplexed reg_write
37  logic [0:127] p1_out;         //Output value of p1 Stage 4
38  logic [0:6]   p1_addr_out;   //Destination register for rt_wb
39  logic          p1_write_out;  //Will rt_wb write to RegTable
40
41  logic [0:10]  ls1_op;         //Multiplexed opcode
42  logic [2:0]   ls1_format;     //Multiplexed format
43  logic          ls1_reg_write; //Multiplexed reg_write
44  logic [0:127] ls1_out;        //Output value of ls Stage 6
45  logic [0:6]   ls1_addr_out;  //Destination register for rt_wb
46  logic          ls1_write_out; //Will rt_wb write to RegTable
47
48  logic [0:10]  br1_op;         //Multiplexed opcode
49  logic [2:0]   br1_format;     //Multiplexed format
50  logic          br1_reg_write; //Multiplexed reg_write
51  logic [0:127] br1_out;        //Output value of br1 Stage 1
52  logic [0:6]   br1_addr_out;  //Destination register for rt_wb
53  logic          br1_write_out; //Will rt_wb write to RegTable
54
55  //Internal Signals for Handling RAW Hazards
56  output logic [5:0][0:6] rt_addr_delay_ls1; //Destination register for rt_wb
57  output logic [5:0]      reg_write_delay_ls1; //Will rt_wb write to RegTable
58
59  output logic [3:0][0:6] rt_addr_delay_p1; //Destination register for rt_wb
60  output logic [3:0]      reg_write_delay_p1; //Will rt_wb write to RegTable
61
62  Permute p1(.clk(clk), .reset(reset), .op(p1_op), .format(p1_format),
63  .rt_addr(rt_addr), .ra(ra), .rb(rb), .imm(imm), .reg_write(p1_reg_write),
64  .rt_wb(p1_out),
65  .rt_addr_wb(p1_addr_out), .reg_write_wb(p1_write_out),
66  .branch_taken(branch_taken), .rt_addr_delay(rt_addr_delay_p1),
67  .reg_write_delay(reg_write_delay_p1));
68
69  LocalStore ls1(.clk(clk), .reset(reset), .op(ls1_op), .format(ls1_format),
70  .rt_addr(rt_addr), .ra(ra), .rb(rb), .imm(imm), .reg_write(ls1_reg_write),
71  .rt_wb(ls1_out),
72  .rt_addr_wb(ls1_addr_out), .reg_write_wb(ls1_write_out),
73  .branch_taken(branch_taken), .rt_addr_delay(rt_addr_delay_ls1),
74  .reg_write_delay(reg_write_delay_ls1));

```

```

        .rt_addr(rt_addr), .ra(ra), .rb(rb), .rt_st(rt_st), .imm(imm),
        .reg_write(ls1_reg_write), .rt_wb(ls1_out),
        .rt_addr_wb(ls1_addr_out), .reg_write_wb(ls1_write_out),
        .branch_taken(branch_taken), .rt_addr_delay(rt_addr_delay_ls1),
        .reg_write_delay(reg_write_delay_ls1));

62
63 Branch brl(.clk(clk), .reset(reset), .op(brl_op), .format(brl_format),
64     .rt_addr(rt_addr), .ra(ra), .rb(rb), .rt_st(rt_st), .imm(imm),
65     .reg_write(brl_reg_write), .pc_in(pc_in), .rt_wb(brl_out),
66     .rt_addr_wb(brl_addr_out), .reg_write_wb(brl_write_out), .pc_wb(pc_wb),
67     .branch_taken(branch_taken), .first(first), .branch_kill(branch_kill));
68
69 always_comb begin
70     p1_op = 0;
71     p1_format = 0;
72     p1_reg_write = 0;
73
74     ls1_op = 0;
75     ls1_format = 0;
76     ls1_reg_write = 0;
77
78     brl_op = 0;
79     brl_format = 0;
80     brl_reg_write = 0;
81
82     case (unit)                                //Mux to determine which unit will
83         take the instr
84             2'b00 : begin                      //Instr going to p1
85                 p1_op = op;
86                 p1_format = format;
87                 p1_reg_write = reg_write;
88             end
89             2'b01 : begin                      //Instr going to ls1
90                 ls1_op = op;
91                 ls1_format = format;
92                 ls1_reg_write = reg_write;
93             end
94             2'b10 : begin                      //Instr going to brl
95                 brl_op = op;
96                 brl_format = format;
97                 brl_reg_write = reg_write;
98             end
99             default begin                         //Instr going to p1
100                p1_op = op;
101                p1_format = format;
102                p1_reg_write = reg_write;
103            end
104        endcase
105    end
106
107 always_ff @ (posedge clk) begin
108
109     if (reset == 1) begin
110         rt_wb = 0;
111         rt_addr_wb = 0;
112         reg_write_wb = 0;
113         for (int i=6; i>0; i=i-1) begin
114             fw_wb [i] <= 0;
115             fw_addr_wb [i] <= 0;
116             fw_write_wb [i] <= 0;
117         end
118     end
119     else begin
120         rt_wb <= fw_wb[6];
121         rt_addr_wb <= fw_addr_wb[6];
122         reg_write_wb <= fw_write_wb[6];
123
124         fw_wb[6] <= fw_wb[5];
125         fw_addr_wb[6] <= fw_addr_wb[5];
126     end
```

```

122 fw_write_wb[6] <= fw_write_wb[5];
123
124 if (ls1_write_out == 1) begin //Replace fw6 with ls1 if possible
125   fw_wb[5] <= ls1_out;
126   fw_addr_wb[5] <= ls1_addr_out;
127   fw_write_wb[5] <= ls1_write_out;
128 end
129 else begin
130   fw_wb[5] <= fw_wb[4];
131   fw_addr_wb[5] <= fw_addr_wb[4];
132   fw_write_wb[5] <= fw_write_wb[4];
133 end
134
135 fw_wb[4] <= fw_wb[3];
136 fw_addr_wb[4] <= fw_addr_wb[3];
137 fw_write_wb[4] <= fw_write_wb[3];
138
139 if (p1_write_out == 1) begin //Replace fw4 with p1 if possible
140   fw_wb[3] <= p1_out;
141   fw_addr_wb[3] <= p1_addr_out;
142   fw_write_wb[3] <= p1_write_out;
143 end
144 else begin
145   fw_wb[3] <= fw_wb[2];
146   fw_addr_wb[3] <= fw_addr_wb[2];
147   fw_write_wb[3] <= fw_write_wb[2];
148 end
149
150 fw_wb[2] <= fw_wb[1];
151 fw_addr_wb[2] <= fw_addr_wb[1];
152 fw_write_wb[2] <= fw_write_wb[1];
153
154 fw_wb[1] <= fw_wb[0];
155 fw_addr_wb[1] <= fw_addr_wb[0];
156 fw_write_wb[1] <= fw_write_wb[0];
157
158 if (br1_write_out == 1) begin //Replace fw1 with p1 if possible
159   (??????)
160   fw_wb[0] <= br1_out;
161   fw_addr_wb[0] <= br1_addr_out;
162   fw_write_wb[0] <= br1_write_out;
163 end
164 else begin
165   fw_wb[0] <= 0;
166   fw_addr_wb[0] <= 0;
167   fw_write_wb[0] <= 0;
168 end
169 end
170 endmodule

```

```

1  module SinglePrecision(clk, reset, op, format, rt_addr, ra, rb, rc, imm, reg_write,
2    rt_wb, rt_addr_wb, reg_write_wb, rt_int, rt_addr_int, reg_write_int, branch_taken,
3    rt_addr_delay, reg_write_delay, int_delay);
4    input          clk, reset;
5
6    //RF/FWD Stage
7    input [0:10]   op;           //Decoded opcode, truncated based on format
8    input [2:0]    format;       //Format of instr, used with op and imm
9    input [0:6]    rt_addr;     //Destination register address
10   input [0:127]  ra, rb, rc; //Values of source registers
11   input [0:17]   imm;         //Immediate value, truncated based on format
12   input          reg_write;  //Will current instr write to RegTable
13   input          branch_taken; //Was branch taken?
14
15  //WB Stage
16  output logic [0:127] rt_wb; //Output value of Stage 6
17  output logic [0:6]  rt_addr_wb; //Destination register for rt_wb
18  output logic [0:127] reg_write_wb; //Will rt_wb write to RegTable
19
20  output logic [0:127] rt_int; //Output value of Stage 7
21  output logic [0:6]  rt_addr_int; //Destination register for rt_wb
22  output logic [0:127] reg_write_int; //Will rt_wb write to RegTable
23
24  //Internal Signals
25  logic [6:0][0:127] rt_delay; //Staging register for calculated values
26  output logic [6:0][0:6] rt_addr_delay; //Destination register for rt_wb
27  output logic [6:0]   reg_write_delay; //Will rt_wb write to RegTable
28  output logic [6:0]   int_delay; //1 if int op, 0 if else
29
30  always_comb begin
31    if (int_delay[5] == 1) begin //FP7 writeback (only for int ops)
32      rt_int = rt_delay[5];
33      rt_addr_int = rt_addr_delay[5];
34      reg_write_int = reg_write_delay[5];
35    end
36    else begin
37      rt_int = 0;
38      rt_addr_int = 0;
39      reg_write_int = 0;
40    end
41
42    if (int_delay[4] == 0) begin //FP6 writeback
43      rt_wb = rt_delay[4];
44      rt_addr_wb = rt_addr_delay[4];
45      reg_write_wb = reg_write_delay[4];
46    end
47    else begin
48      rt_wb = 0;
49      rt_addr_wb = 0;
50      reg_write_wb = 0;
51    end
52  end
53
54  always_ff @(posedge clk) begin
55    integer scale;
56    shortreal tempfp;
57    logic [0:15] temp16;
58
59    if (reset == 1) begin
60      rt_delay[6] <= 0;
61      rt_addr_delay[6] <= 0;
62      reg_write_delay[6] <= 0;
63      int_delay[6] <= 0;
64      for (int i=0; i<6; i=i+1) begin
65        rt_delay[i] <= 0;
66        rt_addr_delay[i] <= 0;
67        reg_write_delay[i] <= 0;
68        int_delay[i] <= 0;
69      end
70    end

```

```

69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
end
else begin
    rt_delay[6] <= rt_delay[5];
    rt_addr_delay[6] <= rt_addr_delay[5];
    reg_write_delay[6] <= reg_write_delay[5];
    int_delay[6] <= int_delay[5];
    for (int i=0; i<5; i=i+1) begin
        rt_delay[i+1] <= rt_delay[i];
        rt_addr_delay[i+1] <= rt_addr_delay[i];
        reg_write_delay[i+1] <= reg_write_delay[i];
        int_delay[i+1] <= int_delay[i];
    end
    if (format == 0 && op[0:9] == 0000000000) begin      //nop : No Operation
        (Execute)
        rt_delay[0] <= 0;
        rt_addr_delay[0] <= 0;
        reg_write_delay[0] <= 0;
        int_delay[0] <= 0;
    end
    else begin
        rt_addr_delay[0] <= rt_addr;
        reg_write_delay[0] <= reg_write;
        if (branch_taken) begin
            int_delay[0] <= 0;
            rt_delay[0] <= 0;
            rt_addr_delay[0] <= 0;
            reg_write_delay[0] <= 0;
        end
        else if (format == 0) begin
            case (op)
                11'b01111000100 : begin                  //mpy : Multiply
                    int_delay[0] <= 1;
                    for (int i=0; i<4; i=i+1)
                        rt_delay[0][(i*32) +: 32] <= $signed(ra[(i*32)+16 +:
                            16]) * $signed(rb[(i*32)+16 +: 16]);
                end
                11'b01111001100 : begin                  //mpyu : Multiply
                    Unsigned
                    int_delay[0] <= 1;
                    for (int i=0; i<4; i=i+1)
                        rt_delay[0][(i*32) +: 32] <= $unsigned(ra[(i*32)+16 +:
                            16]) * $unsigned(rb[(i*32)+16 +: 16]);
                end
                11'b01111000101 : begin                  //mpyh : Multiply High
                    int_delay[0] <= 1;
                    for (int i=0; i<4; i=i+1)
                        rt_delay[0][(i*32) +: 32] <= ($signed(ra[(i*32) +: 16])
                            * $signed(rb[(i*32)+16 +: 16])) << 16;
                end
                11'b01011000100 : begin                  //fa : Floating Add
                    int_delay[0] <= 0;
                    for (int i=0; i<4; i=i+1) begin
                        if (($bitstoshortreal(ra[(i*32) +: 32]) +
                            $bitstoshortreal(rb[(i*32) +: 32])) >=
                            $bitstoshortreal(32'h7F7FFFFFF))
                            rt_delay[0][(i*32) +: 32] <= 32'h7F7FFFFFF;
                        else if (($bitstoshortreal(ra[(i*32) +: 32]) +
                            $bitstoshortreal(rb[(i*32) +: 32])) <=
                            $bitstoshortreal(32'hFF7FFFFFF))
                            rt_delay[0][(i*32) +: 32] <= 32'hFF7FFFFFF;
                        else if (($bitstoshortreal(ra[(i*32) +: 32]) +
                            $bitstoshortreal(rb[(i*32) +: 32])) <=
                            $bitstoshortreal(32'h000000001)
                                && ($bitstoshortreal(ra[(i*32) +: 32]) +
                                    $bitstoshortreal(rb[(i*32) +: 32])) > 0)
                            rt_delay[0][(i*32) +: 32] <= 32'h000000001;
                        else if (($bitstoshortreal(ra[(i*32) +: 32]) +
                            $bitstoshortreal(rb[(i*32) +: 32])) >=

```

```

125           $bitstoshortreal(32'h80000001)
126           && ($bitstoshortreal(ra[(i*32) +: 32]) +
127             $bitstoshortreal(rb[(i*32) +: 32])) < 0)
128             rt_delay[0][(i*32) +: 32] <= 32'h80000001;
129       else
130         rt_delay[0][(i*32) +: 32] <=
131           $shortrealtobits($bitstoshortreal(ra[(i*32) +: 32]) +
132             $bitstoshortreal(rb[(i*32) +: 32]));
133       end
134     end
135   11'b01011000101 : begin                         //fs : Floating Subtract
136     int_delay[0] <= 0;
137     for (int i=0; i<4; i=i+1) begin
138       if ((($bitstoshortreal(ra[(i*32) +: 32]) -
139         $bitstoshortreal(rb[(i*32) +: 32])) >=
140           $bitstoshortreal(32'h7F7FFFFFF))
141             rt_delay[0][(i*32) +: 32] <= 32'h7F7FFFFFF;
142       else if ((($bitstoshortreal(ra[(i*32) +: 32]) -
143         $bitstoshortreal(rb[(i*32) +: 32])) <=
144           $bitstoshortreal(32'hFF7FFFFFF))
145             rt_delay[0][(i*32) +: 32] <= 32'hFF7FFFFFF;
146       else if ((($bitstoshortreal(ra[(i*32) +: 32]) -
147         $bitstoshortreal(rb[(i*32) +: 32])) <=
148           $bitstoshortreal(32'h00000001)
149             && ($bitstoshortreal(ra[(i*32) +: 32]) -
150               $bitstoshortreal(rb[(i*32) +: 32])) > 0)
151             rt_delay[0][(i*32) +: 32] <= 32'h00000001;
152       else if ((($bitstoshortreal(ra[(i*32) +: 32]) -
153         $bitstoshortreal(rb[(i*32) +: 32])) >=
154           $bitstoshortreal(32'h80000001)
155             && ($bitstoshortreal(ra[(i*32) +: 32]) -
156               $bitstoshortreal(rb[(i*32) +: 32])) < 0)
157             rt_delay[0][(i*32) +: 32] <= 32'h80000001;
158       else
159         rt_delay[0][(i*32) +: 32] <=
160           $shortrealtobits($bitstoshortreal(ra[(i*32) +: 32]) *
161             $bitstoshortreal(rb[(i*32) +: 32]));
162     end
163   end
164   11'b01111000010 : begin                         //fceq : Floating

```

```

166 Compare Equal
167     int_delay[0] <= 0;
168     for (int i=0; i<4; i=i+1) begin
169         if ($bitstoshortreal(ra[(i*32) +: 32]) ==
170             $bitstoshortreal(rb[(i*32) +: 32]))
171             rt_delay[0][(i*32) +: 32] <= 32'hFFFFFF;
172         else
173             rt_delay[0][(i*32) +: 32] <= 0;
174     end
175     11'b01011000010 : begin //fcgt : Floating
176         Compare Greater Than
177             int_delay[0] <= 0;
178             for (int i=0; i<4; i=i+1) begin
179                 if ($bitstoshortreal(ra[(i*32) +: 32]) >
180                     $bitstoshortreal(rb[(i*32) +: 32]))
181                     rt_delay[0][(i*32) +: 32] <= 32'hFFFFFF;
182                 else
183                     rt_delay[0][(i*32) +: 32] <= 0;
184             end
185         end
186         default begin
187             int_delay[0] <= 0;
188             rt_delay[0] <= 0;
189             rt_addr_delay[0] <= 0;
190             reg_write_delay[0] <= 0;
191         end
192     endcase
193 end
194 else if (format == 1) begin
195     case (op[7:10])
196         4'b1100 : begin //mpya : Multiply and Add
197             int_delay[0] <= 1;
198             for (int i=0; i<4; i=i+1)
199                 rt_delay[0][(i*32) +: 32] <= ($signed(ra[(i*32)+16 +:
200                     16]) * $signed(rb[(i*32)+16 +: 16])) +
201                     $signed(rc[(i*32) +: 32]);
202         end
203         4'b1110 : begin //fma : Floating Multiply and Add
204             int_delay[0] <= 0;
205             for (int i=0; i<4; i=i+1) begin
206                 if ((($bitstoshortreal(ra[(i*32) +: 32]) *
207                     $bitstoshortreal(rb[(i*32) +: 32])) +
208                     $bitstoshortreal(rc[(i*32) +: 32])) >=
209                     $bitstoshortreal(32'h7F7FFFFFF))
210                     rt_delay[0][(i*32) +: 32] <= 32'h7F7FFFFFF;
211                 else if (((($bitstoshortreal(ra[(i*32) +: 32]) *
212                     $bitstoshortreal(rb[(i*32) +: 32])) +
213                     $bitstoshortreal(rc[(i*32) +: 32])) <=
214                     $bitstoshortreal(32'hFF7FFFFFF))
215                     rt_delay[0][(i*32) +: 32] <= 32'hFF7FFFFFF;
216                 else if (((($bitstoshortreal(ra[(i*32) +: 32]) *
217                     $bitstoshortreal(rb[(i*32) +: 32])) +
218                     $bitstoshortreal(rc[(i*32) +: 32])) <=
219                     $bitstoshortreal(32'h00000001))
220                     && (($bitstoshortreal(ra[(i*32) +: 32]) *
221                         $bitstoshortreal(rb[(i*32) +: 32])) +
222                         $bitstoshortreal(rc[(i*32) +: 32])) > 0)
223                     rt_delay[0][(i*32) +: 32] <= 32'h00000001;
224                 else if (((($bitstoshortreal(ra[(i*32) +: 32]) *
225                     $bitstoshortreal(rb[(i*32) +: 32])) +
226                     $bitstoshortreal(rc[(i*32) +: 32])) >=
227                     $bitstoshortreal(32'h80000001))
228                     && (($bitstoshortreal(ra[(i*32) +: 32]) *
229                         $bitstoshortreal(rb[(i*32) +: 32])) +
230                         $bitstoshortreal(rc[(i*32) +: 32])) < 0)
231                     rt_delay[0][(i*32) +: 32] <= 32'h80000001;
232             end
233         end

```

```

$shortrealtobits(($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) +
$bitstoshortreal(rc[(i*32) +: 32]));
213      end
214
215 4'b1111 : begin //fms : Floating Multiply and
Subtract
216      int_delay[0] <= 0;
217      for (int i=0; i<4; i=i+1) begin
218          if (((($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32])) >=
$bitstoshortreal(32'h7F7FFFFFF))
219              rt_delay[0][(i*32) +: 32] <= 32'h7F7FFFFFF;
220          else if (((($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32])) <=
$bitstoshortreal(32'hFF7FFFFFF))
221              rt_delay[0][(i*32) +: 32] <= 32'hFF7FFFFFF;
222          else if (((($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32])) <=
$bitstoshortreal(32'h00000001)
223              && (($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32])) > 0)
224                  rt_delay[0][(i*32) +: 32] <= 32'h00000001;
225          else if (((($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32])) >=
$bitstoshortreal(32'h80000001)
226              && (($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32])) < 0)
227                  rt_delay[0][(i*32) +: 32] <= 32'h80000001;
228          else
229              rt_delay[0][(i*32) +: 32] <=
$shortrealtobits(($bitstoshortreal(ra[(i*32) +: 32]) *
$bitstoshortreal(rb[(i*32) +: 32])) -
$bitstoshortreal(rc[(i*32) +: 32]));
230      end
231  end
232  default begin
233      int_delay[0] <= 0;
234      rt_delay[0] <= 0;
235      rt_addr_delay[0] <= 0;
236      reg_write_delay[0] <= 0;
237  end
238 endcase
239
240 end
241 else if (format == 3) begin
242     case (op[1:10])
243         10'b0111011000 : begin //cflts : Convert
Floating to Signed Integer
244             int_delay[0] <= 0;
245             for (int i=0; i<4; i=i+1) begin
246                 scale = 173 - $unsigned(imm[10:17]);
247                 if (scale > 127)
248                     scale = 127;
249                 else if (scale < 0)
250                     scale = 0;
251                 tempfp = $bitstoshortreal(ra[(i*32) +: 32]) * (2**scale);
252                 if (tempfp > (2**31 - 1))
253                     rt_delay[0][(i*32) +: 32] <= 32'h7FFFFFFF;
254                 else if (tempfp < -(2**31))
255                     rt_delay[0][(i*32) +: 32] <= 32'h80000000;
256                 else
257                     rt_delay[0][(i*32) +: 32] <= int'(tempfp);
258             end

```

```

258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308

      end
      10'b0111011001 : begin                         //cfltu : Convert
      Floating to Unsigned Integer
          int_delay[0] <= 0;
          for (int i=0; i<4; i=i+1) begin
              scale = 173 - $unsigned(imm[10:17]);
              if (scale > 127)
                  scale = 127;
              else if (scale < 0)
                  scale = 0;
              tempfp = $bitstoshortreal(ra[(i*32) +: 32]) *
              (2**scale); // << scale;
              if (tempfp > (2**31 - 1))
                  rt_delay[0][(i*32) +: 32] <= 32'h7FFFFFFF;
              else if (tempfp < 0)
                  rt_delay[0][(i*32) +: 32] <= 0;
              else
                  rt_delay[0][(i*32) +: 32] <= int'(tempfp);
          end
      end
      default begin
          int_delay[0] <= 0;
          rt_delay[0] <= 0;
          rt_addr_delay[0] <= 0;
          reg_write_delay[0] <= 0;
      end
  endcase
end
else if (format == 4) begin
    case (op[3:10])
        8'b01110100 : begin                         //mpyi : Multiply Immediate
            int_delay[0] <= 1;
            for (int i=0; i<4; i=i+1)
                rt_delay[0][(i*32) +: 32] <= $signed(ra[(i*32)+16 +:
                16]) * $signed(imm[8:17]);
        end
        8'b01110101 : begin                         //mpyui : Multiply Unsigned
        Immediate
            int_delay[0] <= 1;
            temp16 = $signed(imm[8:17]);
            for (int i=0; i<4; i=i+1)
                rt_delay[0][(i*32) +: 32] <= $unsigned(ra[(i*32)+16 +:
                16]) * $unsigned(temp16);
        end
        default begin
            int_delay[0] <= 0;
            rt_delay[0] <= 0;
            rt_addr_delay[0] <= 0;
            reg_write_delay[0] <= 0;
        end
    endcase
end
end
end
endmodule

```

```

1  module SimpleFixed2(clk, reset, op, format, rt_addr, ra, rb, imm, reg_write, rt_wb,
2    rt_addr_wb, reg_write_wb, branch_taken, rt_addr_delay, reg_write_delay);
3
4    input          clk, reset;
5
6    //RF/FWD Stage
7    input [0:10]   op;           //Decoded opcode, truncated based on format
8    input [2:0]    format;       //Format of instr, used with op and imm
9    input [0:6]    rt_addr;      //Destination register address
10   input [0:127]  ra, rb;      //Values of source registers
11   input [0:17]   imm;         //Immediate value, truncated based on format
12   input          reg_write;   //Will current instr write to RegTable
13   input          branch_taken; //Was branch taken?
14
15  //WB Stage
16  output logic [0:127] rt_wb;    //Output value of Stage 3
17  output logic [0:6]  rt_addr_wb; //Destination register for rt_wb
18  output logic [0:127] reg_write_wb; //Will rt_wb write to RegTable
19
20  //Internal Signals
21  logic [3:0][0:127] rt_delay;    //Staging register for calculated values
22  output logic [3:0][0:6] rt_addr_delay; //Destination register for rt_wb
23  output logic [3:0]   reg_write_delay; //Will rt_wb write to RegTable
24
25  logic [6:0]          i;          //7-bit counter for loops
26  logic [0:127]        tmp,s;
27
28  always_comb begin
29    rt_wb = rt_delay[2];
30    rt_addr_wb = rt_addr_delay[2];
31    reg_write_wb = reg_write_delay[2];
32  end
33
34  always_ff @(posedge clk) begin
35    if (reset == 1) begin
36      rt_delay[3] <= 0;
37      rt_addr_delay[3] <= 0;
38      reg_write_delay[3] <= 0;
39      for (i=0; i<3; i=i+1) begin
40        rt_delay[i] <= 0;
41        rt_addr_delay[i] <= 0;
42        reg_write_delay[i] <= 0;
43      end
44    end
45    else begin
46      rt_delay[3] <= rt_delay[2];
47      rt_addr_delay[3] <= rt_addr_delay[2];
48      reg_write_delay[3] <= reg_write_delay[2];
49
50      rt_delay[2] <= rt_delay[1];
51      rt_addr_delay[2] <= rt_addr_delay[1];
52      reg_write_delay[2] <= reg_write_delay[1];
53
54      rt_delay[1] <= rt_delay[0];
55      rt_addr_delay[1] <= rt_addr_delay[0];
56      reg_write_delay[1] <= reg_write_delay[0];
57
58      if (format == 0 && op == 0) begin //nop : No Operation
59        (Execute)
60        rt_delay[0] <= 0;
61        rt_addr_delay[0] <= 0;
62        reg_write_delay[0] <= 0;
63      end
64      else begin
65        rt_addr_delay[0] <= rt_addr;
66        reg_write_delay[0] <= reg_write;
67        if (branch_taken) begin
68          rt_delay[0] <= 0;
69          rt_addr_delay[0] <= 0;
70          reg_write_delay[0] <= 0;
71        end
72      end
73    end
74  end

```

```

68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123

end
else if (format == 0) begin
    case (op)
        11'b00001011111 : begin                                //shlh : Shift Left
            Halfword
                for (i=0; i<8; i=i+1) begin
                    if ((rb[(i*16) +: 16] & 16'h001F) < 16) begin
                        rt_delay[0][(i*16) +: 16] <= ra[(i*16) +: 16] <<
                            (rb[(i*16) +: 16] & 16'h001F);
                    end
                    else
                        rt_delay[0][(i*16) +: 16] <= 0;
                end
            end
        11'b00001011011 : begin                                //shl rt, ra, rb :
            Shift Left Word
                for (i=0; i<16; i=i+4) begin
                    if ((rb[(i*8) +: 32] & 32'h0000003F) < 32) begin
                        rt_delay[0][(i*8) +: 32] = ra[(i*8) +: 32] <<
                            (rb[(i*8) +: 32] & 32'h0000003F);
                    end
                    else begin
                        rt_delay[0][(i*8) +: 32] = 0;
                    end
                end
            end
        11'b00001011100 : begin                                //roth rt, ra, rb :
            Rotate Halfword
                for(i = 0;i<=15;i=i+2) begin
                    tmp[0:15] = ra[(i*8) +: 16];
                    for(int b = 0;b<16;b=b+1) begin
                        if( ( b+(rb[(i*8) +: 16] & 16'h000F)) < 16 ) begin
                            rt_delay[0][(i*8)+b] = tmp[b+(rb[(i*8) +: 16] &
                                16'h000F)];
                        end
                        else begin
                            rt_delay[0][(i*8)+b] = tmp[b+(rb[(i*8) +: 16] &
                                16'h000F)-16];
                        end
                    end
                end
            end
        11'b00001011000 : begin                                //rot rt, ra, rb :
            Rotate Word
                for(i = 0;i<=15;i=i+4) begin
                    tmp[0:31] = ra[(i*8) +: 32];
                    for(int b = 0;b<32;b=b+1) begin
                        if( ( b+(rb[(i*8) +: 32] & 32'h0000001F)) < 32 )
                            begin
                                rt_delay[0][(i*8)+b] = tmp[b+(rb[(i*8) +: 32] &
                                    32'h0000001F)];
                            end
                        else begin
                            rt_delay[0][(i*8)+b] = tmp[b+(rb[(i*8) +: 32] &
                                32'h0000001F)-32];
                        end
                    end
                end
            end
        11'b00001011101 : begin                                //rothm rt, ra, rb
            : Rotate and Mask Halfword

                for(i = 0;i<=15;i=i+2) begin
                    tmp[0:15] = ra[(i*8) +: 16];
                    for(int b = 0;b<16;b=b+1) begin
                        if( b > (( 0- rb[(i*8) +: 16] ) & 16'h001F)) begin
                            rt_delay[0][(i*8)+b] = tmp[b-(( 0- rb[(i*8) +:
                                16] ) & 16'h001F)];
                        end
                    end
                end
            end

```

```

124
125
126
127
128
129
130
131      11'b00001011001 : begin //rotm rt, ra, rb :
132          Rotate and Mask Word
133
134
135
136      for(i = 0;i<=15;i=i+4) begin
137          tmp[0:31] = ra[(i*8) +: 32];
138          for(int b = 0;b<32;b=b+1) begin
139              if( b > (( 0- rb[(i*8) +: 32] ) & 32'h0000003F))
140                  begin
141                      rt_delay[0][(i*8)+b] = tmp[b-(( 0- rb[(i*8) +:
142                          32] ) & 32'h0000003F)];
143                  end
144              else begin
145                  rt_delay[0][(i*8)+b] =0;
146              end
147          end
148      end
149      11'b00001011110 : begin //rotmah rt, ra, rb
150          : Rotate and Mask Algebraic Halfword
151
152
153
154
155
156
157
158
159      for(i = 0;i<=15;i=i+2) begin
160          tmp[0:15] = ra[(i*8) +: 16];
161          for(int b = 0;b<16;b=b+1) begin
162              if( b >= (( 0- rb[(i*8) +: 16] ) & 16'h001E)) begin
163                  rt_delay[0][(i*8)+b] = tmp[b-(( 0- rb[(i*8) +:
164                          16] ) & 16'h001E)];
165              end
166              else begin
167                  rt_delay[0][(i*8)+b] =tmp[0];
168              end
169          end
170      end
171
172
173      default begin
174          rt_delay[0] <= 0;
175          rt_addr_delay[0] <= 0;
176          reg_write_delay[0] <= 0;
177      end
178  endcase
179
180  else if (format == 2) begin
181      case (op)
182          11'b00001111011 : begin //shli rt, ra, imm7 : Shift Left Word
183              Immediate
184                  for(int i=0;i<26;i=i+1) begin

```

```

184     s[i] = imm[11];
185   end
186   s[26:31] = imm[11:17];
187   for(i = 0;i<=15;i=i+4) begin
188     tmp[0:31] = ra[(i*8) +: 32];
189     for(int b = 0;b<32;b=b+1) begin
190       if( b+(s[0:31] & 32'h0000003F) < 32 ) begin
191         rt_delay[0][(i*8)+b] = tmp[b+(s[0:31] &
192           32'h0000003F)];
193       end
194       else begin
195         rt_delay[0][(i*8)+b] = 0;
196       end
197     end
198   end
199
200   11'b0000111100: begin // rothi rt, ra, imm7 : Rotate
201   Halfword Immediate
202     for(int i=0;i<9;i=i+1) begin
203       s[i] = imm[11];
204     end
205     s[9:15] = imm[11:17];
206
207     for(i = 0;i<=15;i=i+2) begin
208       tmp[0:15] = ra[(i*8) +: 16];
209       for(int b = 0;b<16;b=b+1) begin
210         if( ( b+(s[0:15] & 16'h000F)) < 16 ) begin
211           rt_delay[0][(i*8)+b] = tmp[b+(s[0:15] &
212             16'h000F)];
213         end
214         else begin
215           rt_delay[0][(i*8)+b] = tmp[b+(s[0:15] &
216             16'h000F)-16];
217         end
218       end
219     end
220   end
221   11'b00001111000 : begin //roti rt, ra, imm7 : Rotate
222   Word Immediate
223     for(int i=0;i<26;i=i+1) begin
224       s[i] = imm[11];
225     end
226     s[26:31] = imm[11:17];
227     for(i = 0;i<=15;i=i+4) begin
228       tmp[0:31] = ra[(i*8) +: 32];
229       for(int b = 0;b<32;b=b+1) begin
230         if( ( b+(s[0:31] & 32'h0000001F)) < 32 ) begin
231           rt_delay[0][(i*8)+b] = tmp[b+(s[0:31] &
232             32'h0000001F)];
233         end
234         else begin
235           rt_delay[0][(i*8)+b] = tmp[b+(s[0:31] &
236             32'h0000001F)-32];
237         end
238       end
239     end
240   end
241   11'b0000111110 : begin //rotmahi rt, ra, imm7 : Rotate and
242   Mask Algebraic Halfword Immediate
243
244     for(int i=0;i<9;i=i+1) begin
245       s[i] = imm[11];
246     end
247     s[9:15] = imm[11:17];
248
249     for(i = 0;i<=15;i=i+2) begin
250       tmp[0:15] = ra[(i*8) +: 16];
251       for(int b = 0;b<16:b=b+1) begin

```

```

245                               if( b >= ((0-s[0:15]) & 16'h001F)) begin
246                                   rt_delay[0][(i*8)+b] = tmp[b-((0-s[0:15]) &
247                                       16'h001F)];
248                               end
249                           else begin
250                               rt_delay[0][(i*8)+b] =tmp[0];
251                           end
252                       end
253                   end
254               11'b000001111010 : begin //rotmai rt, ra, imm7 : Rotate and Mask
Algebraic Word Immediate
255                   for(int i=0;i<26;i=i+1) begin
256                       s[i] = imm[11];
257                   end
258                   s[26:31] = imm[11:17];
259                   for(i = 0;i<=15;i=i+4) begin
260                       tmp[0:31] = ra[(i*8) +: 32];
261                       for(int b = 0;b<32;b=b+1) begin
262                           if( b>=((0-s[0:31]) & 32'h0000003F)) begin
263                               rt_delay[0][(i*8)+b] = tmp[b-((0-s[0:31]) &
264                                               32'h0000003F)];
265                           end
266                           else begin
267                               rt_delay[0][(i*8)+b] = tmp[0];
268                           end
269                       end
270                   end
271                   default begin
272                       rt_delay[0] = 0;
273                       rt_addr_delay[0] = 0;
274                       reg_write_delay[0] = 0;
275                   end
276               endcase
277           end
278       end
279   end
280 end
281
282 endmodule

```

```

1  module Byte(clk, reset, op, format, rt_addr, ra, rb, imm, reg_write, rt_wb, rt_addr_wb,
2   reg_write_wb, branch_taken, rt_addr_delay, reg_write_delay);
3
4   //RF/FWD Stage
5   input [0:10]    op;           //Decoded opcode, truncated based on format
6   input [2:0]     format;       //Format of instr, used with op and imm
7   input [0:6]     rt_addr;      //Destination register address
8   input [0:127]   ra, rb;       //Values of source registers
9   input [0:17]   imm;          //Immediate value, truncated based on format
10  input         reg_write;    //Will current instr write to RegTable
11  input         branch_taken; //Was branch taken?
12
13  //WB Stage
14  output logic [0:127] rt_wb;    //Output value of Stage 3
15  output logic [0:6]  rt_addr_wb; //Destination register for rt_wb
16  output logic      reg_write_wb; //Will rt_wb write to RegTable
17
18  //Internal Signals
19  logic [3:0][0:127] rt_delay;    //Staging register for calculated values
20  output logic [3:0][0:6] rt_addr_delay; //Destination register for rt_wb
21  output logic [3:0]   reg_write_delay; //Will rt_wb write to RegTable
22
23  logic [3:0]        temp;        //Incrementing counter
24
25  always_comb begin
26    rt_wb = rt_delay[2];
27    rt_addr_wb = rt_addr_delay[2];
28    reg_write_wb = reg_write_delay[2];
29  end
30
31  always_ff @(posedge clk) begin
32    if (reset == 1) begin
33
34      rt_delay[3] <= 0;
35      rt_addr_delay[3] <= 0;
36      reg_write_delay[3] <= 0;
37      for (int i=0; i<3; i=i+1) begin
38        rt_delay[i] <= 0;
39        rt_addr_delay[i] <= 0;
40        reg_write_delay[i] <= 0;
41      end
42    end
43    else begin
44      rt_delay[3] <= rt_delay[2];
45      rt_addr_delay[3] <= rt_addr_delay[2];
46      reg_write_delay[3] <= reg_write_delay[2];
47      rt_delay[2] <= rt_delay[1];
48      rt_addr_delay[2] <= rt_addr_delay[1];
49      reg_write_delay[2] <= reg_write_delay[1];
50      rt_delay[1] <= rt_delay[0];
51      rt_addr_delay[1] <= rt_addr_delay[0];
52      reg_write_delay[1] <= reg_write_delay[0];
53
54      if (format == 0 && op == 0) begin //nop : No Operation
55        (Execute)
56        rt_delay[0] <= 0;
57        rt_addr_delay[0] <= 0;
58        reg_write_delay[0] <= 0;
59      end
60      else begin
61        rt_addr_delay[0] <= rt_addr;
62        reg_write_delay[0] <= reg_write;
63        if (branch_taken) begin
64          rt_delay[0] <= 0;
65          rt_addr_delay[0] <= 0;
66          reg_write_delay[0] <= 0;
67        end
68        else if (format == 0) begin

```

```

68
69     case (op)
70         11'b01010110100 : begin //cntb : Count Ones in
71             Bytes
72                 for (int i=0; i<16; i=i+1) begin
73                     temp = 0;
74                     for (int j=0; j<8; j=j+1) begin
75                         if (ra[(i*8)+j] == 1'b1)
76                             temp = temp + 1;
77                     end
78                     rt_delay[0][(i*8) +: 8] <= temp;
79                 end
80             end
81             11'b00011010011 : begin //avgb : Average Bytes
82                 for (int i=0; i<16; i=i+1) begin
83                     rt_delay[0][(i*8) +: 8] <= ({2'b00, ra[(i*8) +: 8]} +
84                     {2'b00, rb[(i*8) +: 8]} + 1) >> 1;
85                 end
86             end
87             11'b00001010011 : begin //absdb : Absolute
88                 Difference of Bytes
89                 for (int i=0; i<16; i=i+1) begin
90                     if ($signed(ra[(i*8) +: 8]) > $signed(rb[(i*8) +: 8]))
91                         rt_delay[0][(i*8) +: 8] <= $signed(ra[(i*8) +: 8])
92                         - $signed(rb[(i*8) +: 8]);
93                     else
94                         rt_delay[0][(i*8) +: 8] <= $signed(rb[(i*8) +: 8])
95                         - $signed(ra[(i*8) +: 8]);
96                 end
97             end
98             11'b01001010011 : begin //sumb : Sum Bytes into
99                 Halfwords
100                 for (int i=0; i<4; i=i+1) begin
101                     rt_delay[0][(i*32) +: 16] <= $signed(rb[(i*32) +: 8]) +
102                     $signed(rb[(i*32)+8 +: 8]) + $signed(rb[(i*32)+16 +:
103                     8]) + $signed(rb[(i*32)+24 +: 8]);
104                     rt_delay[0][(i*32)+16 +: 16] <= $signed(ra[(i*32) +:
105                     8]) + $signed(ra[(i*32)+8 +: 8]) + $signed(ra[(i*32)+16
106                     +: 8]) + $signed(ra[(i*32)+24 +: 8]);
107                 end
108             end
109             default begin
110                 rt_delay[0] <= 0;
111                 rt_addr_delay[0] <= 0;
112                 reg_write_delay[0] <= 0;
113             end
114         endcase
115     end
116     else begin
117         rt_delay[0] <= 0;
118         rt_addr_delay[0] <= 0;
119         reg_write_delay[0] <= 0;
120     end
121 end
122
123 endmodule

```

```

1  module SimpleFixed1(clk, reset, op, format, rt_addr, ra, rb, rt_st, imm, reg_write,
2    rt_wb, rt_addr_wb, reg_write_wb, branch_taken, rt_addr_delay, reg_write_delay);
3
4    //RF/FWD Stage
5    input [0:10]      op;           //Decoded opcode, truncated based on format
6    input [2:0]       format;        //Format of instr, used with op and imm
7    input [0:6]       rt_addr;       //Destination register address
8    input [0:127]     ra, rb, rt_st; //Values of source registers
9    input [0:17]      imm;          //Immediate value, truncated based on format
10   input            reg_write;    //Will current instr write to RegTable
11   input            branch_taken; //Was branch taken?
12
13  //WB Stage
14  output logic [0:127]  rt_wb;      //Output value of Stage 3
15  output logic [0:6]    rt_addr_wb;  //Destination register for rt_wb
16  output logic          reg_write_wb; //Will rt_wb write to RegTable
17
18  //Internal Signals
19  logic [1:0][0:127]  rt_delay;    //Staging register for calculated values
20  output logic [1:0][0:6] rt_addr_delay; //Destination register for rt_wb
21  output logic [1:0]    reg_write_delay; //Will rt_wb write to RegTable
22
23  logic [6:0]         i;           //7-bit counter for loops
24  logic signed [31:0]  max_value_32 = 32'h7FFFFFFF;
25  logic signed [31:0]  min_value_32 = 32'h80000000;
26
27  logic signed [15:0]  max_value_16 = 16'h7FFF;
28  logic signed [15:0]  min_value_16 = 16'h8000;
29  logic signed [0:31]  mask = 1 << 31;
30
31  logic [0:128] tmp;
32
33  always_comb begin
34    rt_wb = rt_delay[0];
35    rt_addr_wb = rt_addr_delay[0];
36    reg_write_wb = reg_write_delay[0];
37  end
38
39  always_ff @ (posedge clk) begin
40    if (reset == 1) begin
41      rt_delay[1] <= 0;
42      rt_addr_delay[1] <= 0;
43      reg_write_delay[1] <= 0;
44      rt_delay[0] <= 0;
45      rt_addr_delay[0] <= 0;
46      reg_write_delay[0] <= 0;
47      tmp = 0;
48    end
49    else begin
50      rt_delay[1] <= rt_delay[0];
51      rt_addr_delay[1] <= rt_addr_delay[0];
52      reg_write_delay[1] <= reg_write_delay[0];
53
54      if (format == 0 && op == 0) begin //nop : No Operation
55        (Execute)
56        rt_delay[0] <= 0;
57        rt_addr_delay[0] <= 0;
58        reg_write_delay[0] <= 0;
59      end
60      else begin
61        rt_addr_delay[0] <= rt_addr;
62        reg_write_delay[0] <= reg_write;
63        if (branch_taken) begin
64          rt_delay[0] = 0;
65          rt_addr_delay[0] = 0;
66          reg_write_delay[0] = 0;
67        end
68        else if (format == 0) begin

```

```

68
69     case (op)
70         11'b000011001000 : begin //ah : Add Halfword
71             for (i=0; i<16; i=i+1) begin
72                 if (($signed(ra[(i * 16) +:16]) + $signed(rb[(i * 16) +:16])) >= max_value_16)
73                     rt_delay[0][(i * 16) +:16] = max_value_16;
74                 else if (($signed(ra[(i * 16) +:16]) + $signed(rb[(i * 16) +:16])) <= min_value_16)
75                     rt_delay[0][(i * 16) +:16] = min_value_16;
76                 else
77                     rt_delay[0][(i * 16) +:16] = $signed(ra[(i * 16) +:16]) + $signed(rb[(i * 16) +:16]);
78             end
79         end
80         11'b000011000000 : begin //ah : Add Word
81             for (i=0; i<32; i=i+1) begin
82                 if (($signed(ra[(i * 32) +:32]) + $signed(rb[(i * 32) +:32])) >= max_value_32)
83                     rt_delay[0][(i * 32) +:32] = max_value_32;
84                 else if (($signed(ra[(i * 32) +:32]) + $signed(rb[(i * 32) +:32])) <= min_value_32)
85                     rt_delay[0][(i * 32) +:32] = min_value_32;
86                 else
87                     rt_delay[0][(i * 32) +:32] = $signed(ra[(i * 32) +:32]) + $signed(rb[(i * 32) +:32]);
88             end
89         end
90         11'b000001001000 : begin //sfh rt, ra, rb : Subtract from Halfword
91             for (i=0; i<16; i=i+1) begin
92                 if (($signed(rb[(i * 16) +:16]) - $signed(ra[(i * 16) +:16])) >= max_value_16)
93                     rt_delay[0][(i * 16) +:16] = max_value_16;
94                 else if ($signed(rb[(i * 16) +:16]) - $signed(ra[(i * 16) +:16]) <= min_value_16)
95                     rt_delay[0][(i * 16) +:16] = min_value_16;
96                 else
97                     rt_delay[0][(i * 16) +:16] = $signed(rb[(i * 16) +:16]) - $signed(ra[(i * 16) +:16]);
98             end
99         end
100        11'b000001000000 : begin //sf rt, ra, rb : Subtract from Word
101            for (i=0; i<32; i=i+1) begin
102                if ($signed(rb[(i * 32) +:32]) - $signed(ra[(i * 32) +:32]) >= max_value_32)
103                    rt_delay[0][(i * 32) +:32] = max_value_32;
104                else if ($signed(rb[(i * 32) +:32]) - $signed(ra[(i * 32) +:32]) <= min_value_32)
105                    rt_delay[0][(i * 32) +:32] = min_value_32;
106                else
107                    rt_delay[0][(i * 32) +:32] = $signed(rb[(i * 32) +:32]) - $signed(ra[(i * 32) +:32]);
108            end
109        end
110        11'b000011000001 : begin // and
111            rt_delay[0] = ra & rb;
112        end
113        11'b000001000001 : begin // or
114            rt_delay[0] = ra | rb;
115        end
116        11'b010001000001 : begin // xor
117            rt_delay[0] = ra ^ rb;
118        end
119        11'b000011001001 : begin // nand
120            rt_delay[0] = ~ (ra & rb);
121        end
122        11'b011110100000 : begin // ceqb rt, ra, rb Compare Equal Byte
123            if(ra==rb) begin
124                rt_delay[0]=128'hFFFFFFFFFFFFFFF;

```

```

124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186

        end
    else begin
        rt_delay[0]=128'h00000000000000000000000000000000;
    end
end
11'b01111001000 : begin // ceqh rt, ra, rb Compare Equal Halfword
for(int i = 0;i<16;i=i+2) begin
    if(ra[(i*8) +: 16]==rb[(i*8) +: 16]) begin
        rt_delay[0][(i*8) +: 16] = 16'hFFFF;
    end
    else begin
        rt_delay[0][(i*8) +: 16] = 16'h0000;
    end
end
11'b01111000000 : begin // ceq rt, ra, rb Compare Equal Word
for(int i = 0;i<16;i=i+4) begin
    if(ra[(i*8) +: 32]==rb[(i*8) +: 32]) begin
        rt_delay[0][(i*8) +: 32] = 32'hFFFFFF;
    end
    else begin
        rt_delay[0][(i*8) +: 32] = 32'h00000000;
    end
end
11'b010001010000 : begin // cgtb rt, ra, rb Compare Greater Than
Byte
for(int i = 0;i<16;i=i+1) begin
    if($signed(ra[(i*8) +: 8])>$signed(rb[(i*8) +: 8])) begin
        rt_delay[0][(i*8) +: 8] = 8'hFF;
    end
    else begin
        rt_delay[0][(i*8) +: 8] = 8'h00;
    end
end
11'b010001001000 : begin // cgth rt, ra, rb Compare Greater Than
Halfword
for(int i = 0;i<16;i=i+2) begin
    if($signed(ra[(i*8) +: 16])>$signed(rb[(i*8) +: 16])) begin
        rt_delay[0][(i*8) +: 16] = 16'hFFFF;
    end
    else begin
        rt_delay[0][(i*8) +: 16] = 16'h0000;
    end
end
11'b010001000000 : begin // cgt rt, ra, rb Compare Greater Than
Word
for(int i = 0;i<16;i=i+4) begin
    if($signed(ra[(i*8) +: 32])>$signed(rb[(i*8) +: 32])) begin
        rt_delay[0][(i*8) +: 32] = 32'hFFFFFF;
    end
    else begin
        rt_delay[0][(i*8) +: 32] = 32'h00000000;
    end
end
11'b01011010000 : begin // clgtb rt, ra, rb Compare Logical
Greater Than Byte
for(int i = 0;i<16;i=i+1) begin
    if(ra[(i*8) +: 8]>rb[(i*8) +: 8]) begin
        rt_delay[0][(i*8) +: 8] = 8'hFF;
    end
    else begin
        rt_delay[0][(i*8) +: 8] = 8'h00;
    end
end

```

```

187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
      end
    end
  end
  11'b01011001000 : begin // clgth rt, ra, rb Compare Logical
  Greater Than Halfword
    for(int i = 0;i<16;i=i+2) begin
      if(ra[(i*8) +: 16]>rb[(i*8) +: 16]) begin
        rt_delay[0][(i*8) +: 16] = 16'hFFFF;
      end
      else begin
        rt_delay[0][(i*8) +: 16] = 16'h0000;
      end
    end
  end
  11'b01011000000 : begin // clgt rt, ra, rb Compare Logical
  Greater Than Word
    for(int i = 0;i<16;i=i+4) begin
      if(ra[(i*8) +: 32]>rb[(i*8) +: 32]) begin
        rt_delay[0][(i*8) +: 32] = 32'hFFFFFF;
      end
      else begin
        rt_delay[0][(i*8) +: 32] = 32'h00000000;
      end
    end
  end
  default begin
    rt_delay[0] <= 0;
    rt_addr_delay[0] <= 0;
    reg_write_delay[0] <= 0;
  end
endcase
end
else if (format == 4) begin //RI10-type
  case (op)
    8'b000011101 : begin
      //ahi rt, ra, imm10 : Add
      Halfword Immediate
      for (i=0; i<16; i=i+1) begin
        if (($signed(ra[(i * 16) +:16]) + $signed(imm[8:17])) >=
          max_value_16)
          rt_delay[0][(i * 16) +:16] = max_value_16;
        else if (($signed(ra[(i * 16) +:16]) +
          $signed(imm[8:17])) <= min_value_16)
          rt_delay[0][(i * 16) +:16] = min_value_16;
        else
          rt_delay[0][(i * 16) +:16] = $signed(ra[(i * 16) +:16]) +
          $signed(imm[8:17]);
      end
    end
    8'b000011100 : begin
      //ai rt, ra, imm10 : Add
      Word Immediate
      for (i=0; i<4; i=i+1) begin
        if (($signed(ra[(i * 32) +:32]) + $signed(imm[8:17])) >=
          max_value_32)
          rt_delay[0][(i * 32) +:32] = max_value_32;
        else if (($signed(ra[(i * 32) +:32]) +
          $signed(imm[8:17])) <= min_value_32)
          rt_delay[0][(i * 32) +:32] = min_value_32;
        else
          rt_delay[0][(i * 32) +:32] = $signed(ra[(i * 32) +:32]) +
          $signed(imm[8:17]);
      end
    end
    8'b000001101 : begin
      //sfhi rt, ra, imm10 : Subtract from
      Halfword Immediate
      for (i=0; i<16; i=i+1) begin

```

```

245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303

        if (($signed(imm[8:17]) - $signed(ra[(i * 16) +:16])) >= max_value_16)
            rt_delay[0][(i * 16) +:16] = max_value_16;
        else if (($signed(imm[8:17]) - $signed(ra[(i * 16) +:16])) <= min_value_16)
            rt_delay[0][(i * 16) +:16] = min_value_16;
        else
            rt_delay[0][(i * 16) +:16] = $signed(imm[8:17]) -
                $signed(ra[(i * 16) +:16]);
    end
end
8'b000001100 : begin      //sfi rt, ra, imm10 : Subtract from
Word Immediate
    for (i=0; i<32; i=i+1) begin
        if (($signed(imm[8:17]) - $signed(ra[(i * 32) +:32])) >= max_value_32)
            rt_delay[0][(i * 32) +:32] = max_value_32;
        else if (($signed(imm[8:17]) - $signed(ra[(i * 32) +:32])) <= min_value_32)
            rt_delay[0][(i * 32) +:32] = min_value_32;
        else
            rt_delay[0][(i * 32) +:32] = $signed(imm[8:17]) -
                $signed(ra[(i * 32) +:32]);
    end
end
8'b000010110 : begin // andbi rt, ra, imm10 And Byte Immediate
    for(int i=0;i<15;i=i+2) begin
        rt_delay[0][(i*8) +: 16] = ra[(i*8) +: 16] &
            ((imm[10:17] & 8'hFF) | (imm[9:17] & 8'hFF) <<16) ;
    end
end

8'b000010101 : begin // andhi rt, ra, imm10 And Halfword Immediate
    for(int i=0;i<6;i=i+1) begin
        tmp[i] = imm[8];
    end
    tmp[6:15] = imm[8:17];
    for(int i=0;i<15;i=i+2) begin
        rt_delay[0][(i*8) +: 16] = ra[(i*8) +: 16] & tmp[0:15];
    end
end
8'b000010100 : begin // andi rt, ra, imm10 And Word Immediate
    for(int i=0;i<21;i=i+1) begin
        tmp[i] = imm[8];
    end
    tmp[21:31] = imm[8:17];
    for(int i=0;i<15;i=i+4) begin
        rt_delay[0][(i*8) +: 32] = ra[(i*8) +: 32] & tmp[0:31];
    end
end

8'b000000110 : begin // orbi rt, ra, imm10 Or Byte Immediate
    for(int i=0;i<15;i=i+4) begin
        rt_delay[0][(i*8) +: 32] = ra[(i*8) +: 32] |
            ({(imm[10:17] & 8'hFF), (imm[10:17] &
            8'hFF), (imm[10:17] & 8'hFF)}) ;
    end
end

8'b000000101 : begin // orhi rt, ra, imm10 Or Halfword Immediate
    for(int i=0;i<6;i=i+1) begin
        tmp[i] = imm[8];
    end
    tmp[6:15] = imm[8:17];
    for(int i=0;i<15;i=i+2) begin
        rt_delay[0][(i*8) +: 16] = ra[(i*8) +: 16] | tmp[0:15];
    end
end
8'b000000100 : begin // ori rt, ra, imm10 Or Word Immediate

```

```

304
305         for(int i=0;i<21;i=i+1) begin
306             tmp[i] = imm[8];
307         end
308         tmp[21:31] = imm[8:17];
309         for(int i=0;i<15;i=i+4) begin
310             rt_delay[0][(i*8) +: 32] = ra[(i*8) +: 32] | tmp[0:31];
311         end
312     end
313
314
315         8'b01000110 : begin // xorbi rt, ra, imm10 Exclusive Or Byte
316             Immediate
317                 for(int i=0;i<15;i=i+4) begin
318                     rt_delay[0][(i*8) +: 32] = ra[(i*8) +: 32] ^
319                         ({(imm[10:17] & 8'hFF), (imm[10:17] &
320                           8'hFF), (imm[10:17] & 8'hFF), (imm[10:17] & 8'hFF)});
321             end
322         end
323
324
325         8'b01000101 : begin // xorhi rt, ra, imm10 Xor Halfword Immediate
326             for(int i=0;i<6;i=i+1) begin
327                 tmp[i] = imm[8];
328             end
329             tmp[6:15] = imm[8:17];
330             for(int i=0;i<15;i=i+2) begin
331                 rt_delay[0][(i*8) +: 16] = ra[(i*8) +: 16] ^ tmp[0:15];
332             end
333         end
334
335         8'b01000100 : begin // xori rt, ra, imm10 Xor Word Immediate
336             for(int i=0;i<21;i=i+1) begin
337                 tmp[i] = imm[8];
338             end
339             tmp[21:31] = imm[8:17];
340             for(int i=0;i<15;i=i+4) begin
341                 rt_delay[0][(i*8) +: 32] = ra[(i*8) +: 32] ^ tmp[0:31];
342             end
343         end
344
345
346         8'b01111110 : begin // ceqbi rt, ra, imm10 Compare Equal Byte
347             Immediate
348                 for(int i=0;i<16;i=i+1) begin
349                     if(ra[(i*8) +: 8] == imm[10:17]) begin
350                         rt_delay[0][(i*8) +: 8] = 8'hFF;
351                     end
352                     else begin
353                         rt_delay[0][(i*8) +: 8] = 8'h00;
354                     end
355                 end
356
357
358         8'b01111101 : begin // ceqhi rt, ra, imm10 Compare Equal
359             Halfword Immediate
360                 for(int i=0;i<6;i=i+1) begin
361                     tmp[i] = imm[8];
362                 end
363                 tmp[6:15] = imm[8:17];
364
365                 for(int i=0;i<16;i=i+2) begin
366                     if(ra[(i*8) +: 16] == tmp[0:15]) begin
367                         rt_delay[0][(i*8) +: 16] = 16'hFFFF;
368                     end
369                     else begin
370                         rt_delay[0][(i*8) +: 16] = 16'h0000;
371                     end
372                 end
373             end
374
375         8'b01111100 : begin // ceqi rt, ra, imm10 Compare Equal Word

```

```

368
369
370
371
372
373
374
375
376
377
378
379
380
381
382     Immediate
383         for(int i=0;i<22;i=i+1) begin
384             tmp[i] = imm[8];
385         end
386         tmp[22:31] = imm[8:17];
387         for(int i=0;i<16;i=i+4) begin
388             if(ra[(i*8) +: 32] == tmp[0:31]) begin
389                 rt_delay[0][(i*8) +: 32] = 32'hFFFFFF;
390             end
391             else begin
392                 rt_delay[0][(i*8) +: 32] = 32'h00000000;
393             end
394         end
395
396
397
398
399
400     8'b01001110 : begin // cgtbi rt, ra, imm10 Compare Greater Than
401     Byte Immediate
402         for(int i=0;i<16;i=i+1) begin
403             if($signed(ra[(i*8) +: 8]) > $signed(imm[10:17])) begin
404                 rt_delay[0][(i*8) +: 8] = 8'hFF;
405             end
406             else begin
407                 rt_delay[0][(i*8) +: 8] = 8'h00;
408             end
409         end
410
411
412
413
414
415     8'b01001101 : begin // cgthi rt, ra, imm10 Compare Greater Than
416     Halfword Immediate
417         for(int i=0;i<6;i=i+1) begin
418             tmp[i] = imm[8];
419         end
420         tmp[6:15] = imm[8:17];
421         for(int i=0;i<16;i=i+2) begin
422             if($signed(ra[(i*8) +: 16]) > $signed(tmp[0:15])) begin
423                 rt_delay[0][(i*8) +: 16] = 16'hFFFF;
424             end
425             else begin
426                 rt_delay[0][(i*8) +: 16] = 16'h0000;
427             end
428         end
429
430
431
432
433
434     8'b01001100 : begin // cgti rt, ra, imm10 Compare Greater Than
435     Word Immediate
436         for(int i=0;i<22;i=i+1) begin
437             tmp[i] = imm[8];
438         end
439         tmp[22:31] = imm[8:17];
440
441
442
443
444
445         for(int i=0;i<16;i=i+4) begin
446             if($signed(ra[(i*8) +: 32]) > $signed(tmp[0:31])) begin
447                 rt_delay[0][(i*8) +: 32] = 32'hFFFFFF;
448             end
449             else begin
450                 rt_delay[0][(i*8) +: 32] = 32'h00000000;
451             end
452         end
453
454
455
456
457
458     8'b01011110 : begin // clgtbi rt, ra, imm10 Compare Logical
459     Greater Than Byte Immediate
460         for(int i=0;i<16;i=i+1) begin
461             if($unsigned(ra[(i*8) +: 8]) > $unsigned(imm[10:17]))
462             begin
463                 rt_delay[0][(i*8) +: 8] = 8'hFF;
464             end

```

```

431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
        else begin
            rt_delay[0][(i*8) +: 8] = 8'h00;
        end
    end
end

8'b01011101 : begin // clgthi rt, ra, imm10 Compare Logical
Greater Than Halfword Immediate
    for(int i=0;i<6;i=i+1) begin
        tmp[i] = imm[8];
    end
    tmp[6:15] = imm[8:17];
    for(int i=0;i<16;i=i+2) begin
        if($unsigned(ra[(i*8) +: 16]) > $unsigned(tmp[0:15]))
        begin
            rt_delay[0][(i*8) +: 16] = 16'hFFFF;
        end
        else begin
            rt_delay[0][(i*8) +: 16] = 16'h0000;
        end
    end
end

8'b01011100 : begin // clgti rt, ra, imm10 Compare Logical
Greater Than Word Immediate

    for(int i=0;i<22;i=i+1) begin
        tmp[i] = imm[8];
    end
    tmp[22:31] = imm[8:17];

    for(int i=0;i<16;i=i+4) begin
        if($unsigned(ra[(i*8) +: 32]) > $unsigned(tmp[0:31]))
        begin
            rt_delay[0][(i*8) +: 32] = 32'hFFFFFFFF;
        end
        else begin
            rt_delay[0][(i*8) +: 32] = 32'h00000000;
        end
    end
end

default begin
    rt_delay[0] = 0;
    rt_addr_delay[0] = 0;
    reg_write_delay[0] = 0;
    tmp=0;
end
endcase
end
else if (format == 5) begin
    case (op)
        9'b010000011: begin // ilh rt, imm16 Immediate Load Halfword
            for(int i=0;i<16;i=i+2) begin
                rt_delay[0][(i*8) +: 16] = imm[2:17];
            end
        end
        9'b010000010: begin // ilhu rt, imm16 Immediate Load Halfword
Upper
            for(int i=0;i<16;i=i+4) begin
                rt_delay[0][(i*8) +: 32] = {imm[2:17],16'h0000};
            end
        end
        9'b010000001: begin // il rt, imm16 Immediate Load Word
            for(int i=0;i<4;i++) begin
                rt_delay[0][(i*32) +: 32] = $signed(imm[2:17]);
            end
        end
    end
end
9'b011000001: begin // ioohl rt, imm16 Immediate Or Halfword Lower

```

```

495         for(int i=0;i<16;i=i+4) begin
496             rt_delay[0][(i*8) +: 32] = rt_st[(i*8) +: 32] |
497                 {16'h0000,imm[2:17]};
498         end
499     end
500
501     default begin
502         rt_delay[0] = 0;
503         rt_addr_delay[0] = 0;
504         reg_write_delay[0] = 0;
505         tmp=0;
506     end
507 endcase
508
509 end
510 else if (format == 6) begin
511     case (op)
512         7'b0100001: begin // ila rt, imm18 Immediate Load Address
513             for(int i=0;i<16;i=i+4) begin
514                 rt_delay[0][(i*8) +: 32] = {14'h0000,imm[0:17]};
515             end
516         end
517
518         default begin
519             rt_delay[0] = 0;
520             rt_addr_delay[0] = 0;
521             reg_write_delay[0] = 0;
522             tmp=0;
523         end
524     endcase
525     end
526   end
527 end
528 endmodule

```

```

1  module Permute(clk, reset, op, format, rt_addr, ra, rb, imm, reg_write, rt_wb,
2    rt_addr_wb, reg_write_wb, branch_taken, rt_addr_delay, reg_write_delay);
3
4    input          clk, reset;
5
6    //RF/FWD Stage
7    input [0:10]   op;           //Decoded opcode, truncated based on format
8    input [2:0]    format;       //Format of instr, used with op and imm
9    input [0:6]    rt_addr;      //Destination register address
10   input [0:127]  ra, rb;      //Values of source registers
11   input [0:17]   imm;         //Immediate value, truncated based on format
12   input          reg_write;   //Will current instr write to RegTable
13   input          branch_taken; //Was branch taken?
14
15  //WB Stage
16  output logic [0:127] rt_wb;    //Output value of Stage 3
17  output logic [0:6]  rt_addr_wb; //Destination register for rt_wb
18  output logic [0:1]  reg_write_wb; //Will rt_wb write to RegTable
19
20  //Internal Signals
21  logic [3:0][0:127] rt_delay;    //Staging register for calculated values
22  output logic [3:0][0:6] rt_addr_delay; //Destination register for rt_wb
23  output logic [3:0]   reg_write_delay; //Will rt_wb write to RegTable
24
25  // Temp variables
26  logic [0:127] tmp;
27
28  always_comb begin
29    rt_wb = rt_delay[2];
30    rt_addr_wb = rt_addr_delay[2];
31    reg_write_wb = reg_write_delay[2];
32  end
33
34  always_ff @ (posedge clk) begin
35    if (reset == 1) begin
36      rt_delay[3] <= 0;
37      rt_addr_delay[3] <= 0;
38      reg_write_delay[3] <= 0;
39      for (i=0; i<3; i=i+1) begin
40        rt_delay[i] <= 0;
41        rt_addr_delay[i] <= 0;
42        reg_write_delay[i] <= 0;
43      end
44      tmp <= 0;
45
46    end
47    else begin
48      rt_delay[3] <= rt_delay[2];
49      rt_addr_delay[3] <= rt_addr_delay[2];
50      reg_write_delay[3] <= reg_write_delay[2];
51
52      rt_delay[2] <= rt_delay[1];
53      rt_addr_delay[2] <= rt_addr_delay[1];
54      reg_write_delay[2] <= reg_write_delay[1];
55
56      rt_delay[1] <= rt_delay[0];
57      rt_addr_delay[1] <= rt_addr_delay[0];
58      reg_write_delay[1] <= reg_write_delay[0];
59
60      if (format == 0 && op == 0) begin                                //nop : No Operation
61        (Load)
62          rt_delay[0] <= 0;
63          rt_addr_delay[0] <= 0;
64          reg_write_delay[0] <= 0;
65      end
66      else begin
67        rt_addr_delay[0] <= rt_addr;
68        reg_write_delay[0] <= reg_write;

```

```

68     if (branch_taken) begin
69         rt_delay[0] <= 0;
70         rt_addr_delay[0] <= 0;
71         reg_write_delay[0] <= 0;
72     end
73     else if (format == 0) begin
74         case (op)
75             11'b00111011011 : begin //shlqbi : Shift Left
76                 Quadword by Bits
77                 rt_delay[0] <= ra << rb[29:31];
78             end
79             11'b00111011111 : begin //shlqby rt, ra, rb :
80                 Shift Left Quadword by Bytes
81                 rt_delay[0] <= ra << (rb[27:31] * 8);
82             end
83             11'b00111011000 : begin //rotqbi rt, ra, rb :
84                 Rotate Quadword by Bits
85                 tmp = rb[29:31];
86                 for(int b=0;b<128;b++) begin
87                     if(b+tmp < 128 ) begin
88                         rt_delay[0][b] = ra[b+tmp];
89                     end
90                     else begin
91                         rt_delay[0][b]=ra[b+tmp-128];
92                     end
93                 end
94             end
95             11'b00111011100 : begin //rotqby rt,ra,rb
96                 Rotate Quadword by Bytes
97                 tmp = rb[28:31];
98
99                 for(int b=0;b<=15;b++) begin
100                     if(b+tmp < 16 ) begin
101                         for(int i = b*8;i<(b*8+8);i++) begin
102                             rt_delay[0][i] = ra[i+tmp*8];
103                         end
104                     end
105                     else begin
106                         for(int i = b*8;i<(b*8+8);i++) begin
107                             rt_delay[0][i] = ra[i+tmp*8-16*8];
108                         end
109                     end
110                 end
111             end
112             11'b00110110010 : begin //gbb rt,ra Gather Bits
113                 from Bytes
114                 tmp = 0;
115                 for(int j=7,k=0;j<128;j=j+8,k++) begin
116                     tmp[k+16] = ra[j];
117                 end
118
119                 for(int i = 32;i<128;i++) begin
120                     rt_delay[0][i]=0;
121                 end
122                 rt_delay[0][0:31] = 16'h0000 | tmp[0:31];
123             end
124             11'b00110110001 : begin //gbb rt,ra Gather Bits
125                 from halfwords
126                 tmp = 0;
127
128                 for(int j=15,k=0;j<128;j=j+16,k++) begin
129                     tmp[k+24] = ra[j];
130                 end
131
132                 for(int i = 0;i<128;i++) begin
133                     rt_delay[0][i]=0;
134                 end

```

```

131           rt_delay[0][24:31] = 8'h00 | tmp[24:31];
132       end
133   11'b00110110000 : begin                         //gbb rt,ra Gather Bits
134       from halfwords
135           tmp = 0;
136
137           for(int j=31,k=0;j<128;j=j+32,k++) begin
138               tmp[k+28] = ra[j];
139           end
140
141           for(int i = 0;i<128;i++) begin
142               rt_delay[0][i]=0;
143           end
144           rt_delay[0][28:31] = 4'h0 | tmp[28:31];
145       end
146       default begin
147           rt_delay[0] <= 0;
148           rt_addr_delay[0] <= 0;
149           reg_write_delay[0] <= 0;
150       end
151   endcase
152 end
153 else if (format == 2) begin
154     case (op)
155         11'b00111111011 : begin                         //shlqbi : Shift Left
156             Quadword by Bits
157             tmp[0:6] = imm & 7'b0000111;
158             for(int b=0;b<128;b++) begin
159                 if(b+tmp[0:6] < 128 ) begin
160                     rt_delay[0][b] = ra[b+tmp[0:6]];
161                 end
162                 else begin
163                     rt_delay[0][b] = 0;
164                 end
165             end
166         11'b00111111111 : begin                         //shlqbyi rt,ra,value
167             Shift Left Quadword by Bytes Immediate
168             tmp[0:6] = imm & 7'b0001111;
169             for(int b=0;b<15;b++) begin
170                 if(b+tmp[0:6] < 16 ) begin
171                     for(int i = b*8;i<(b*8+8);i++) begin
172                         rt_delay[0][i] = ra[i+tmp[0:6]*8];
173                     end
174                 end
175                 else begin
176                     for(int i = b*8;i<(b*8+8);i++) begin
177                         rt_delay[0][i] = 0;
178                     end
179                 end
180             end
181         11'b00111111000 : begin                         //rotqbii rt,ra,value
182             Rotate Quadword by Bits Immediate
183             tmp[0:6] = imm & 7'b0000111;
184
185             for(int b=0;b<128;b++) begin
186                 if(b+tmp[0:6] < 128 ) begin
187                     rt_delay[0][b] = ra[b+tmp[0:6]];
188                 end
189                 else begin
190                     rt_delay[0][b]=ra[b+tmp[0:6]-128];
191                 end
192             end
193         11'b00111111100 : begin                         //rotqbyi rt, ra, imm7
194             Rotate Quadword by Bytes Immediate
195             tmp[0:6] = imm & 7'b0001111;

```

```

195         for(int b=0;b<=15;b++) begin
196             if(b+tmp[0:6] < 16 ) begin
197                 for(int i = b*8;i<(b*8+8);i++) begin
198                     rt_delay[0][i] = ra[i+tmp[0:6]*8];
199                 end
200             end
201             else begin
202                 for(int i = b*8;i<(b*8+8);i++) begin
203                     rt_delay[0][i] = ra[i+tmp[0:6]*8-16*8];
204                 end
205             end
206         end
207         default begin
208             rt_delay[0] = 0;
209             rt_addr_delay[0] = 0;
210             reg_write_delay[0] = 0;
211         end
212     endcase
213   end
214 end
215 end
216
217
218
219 endmodule

```

```

1  module LocalStore(clk, reset, op, format, rt_addr, ra, rb, rt_st, imm, reg_write,
2    rt_wb, rt_addr_wb, reg_write_wb, branch_taken, rt_addr_delay, reg_write_delay);
3
4    //RF/FWD Stage
5    input [0:10]      op;           //Decoded opcode, truncated based on format
6    input [2:0]        format;       //Format of instr, used with op and imm
7    input [0:6]        rt_addr;     //Destination register address
8    input [0:127]     ra, rb, rt_st; //Values of source registers
9    input [0:17]      imm;         //Immediate value, truncated based on format
10   input            reg_write;   //Will current instr write to RegTable
11   input            branch_taken; //Was branch taken?
12
13  //WB Stage
14  output logic [0:127]  rt_wb;      //Output value of Stage 3
15  output logic [0:6]    rt_addr_wb;  //Destination register for rt_wb
16  output logic          reg_write_wb; //Will rt_wb write to RegTable
17
18  //Internal Signals
19  logic [5:0][0:127]  rt_delay;    //Staging register for calculated values
20  output logic [5:0][0:6] rt_addr_delay; //Destination register for rt_wb
21  output logic [5:0]    reg_write_delay; //Will rt_wb write to RegTable
22
23  logic [0:127] mem [0:2047];      //32KB local memory
24
25  always_comb begin
26    rt_wb = rt_delay[4];
27    rt_addr_wb = rt_addr_delay[4];
28    reg_write_wb = reg_write_delay[4];
29  end
30
31  always_ff @(posedge clk) begin
32    if (reset == 1) begin
33      rt_delay[5] <= 0;
34      rt_addr_delay[5] <= 0;
35      reg_write_delay[5] <= 0;
36      for (int i=0; i<5; i=i+1) begin
37        rt_delay[i] <= 0;
38        rt_addr_delay[i] <= 0;
39        reg_write_delay[i] <= 0;
40      end
41      for (logic [0:11] i=0; i<2048; i=i+1) begin
42        //mem[i] <= 0;
43        mem[i] <= {i*4, (i*4 + 1), (i*4 + 2), (i*4 + 3)};
44      end
45    end
46    else begin
47      rt_delay[5] <= rt_delay[4];
48      rt_addr_delay[5] <= rt_addr_delay[4];
49      reg_write_delay[5] <= reg_write_delay[4];
50
51      for (int i=0; i<4; i=i+1) begin
52        rt_delay[i+1] <= rt_delay[i];
53        rt_addr_delay[i+1] <= rt_addr_delay[i];
54        reg_write_delay[i+1] <= reg_write_delay[i];
55      end
56
57      if (format == 0 && op == 0) begin                                //nop : No Operation
58        (Load)
59        rt_delay[0] <= 0;
60        rt_addr_delay[0] <= 0;
61        reg_write_delay[0] <= 0;
62      end
63      else begin
64        rt_addr_delay[0] <= rt_addr;
65        reg_write_delay[0] <= reg_write;
66        if (branch_taken) begin                                     // If branch taken last
67          cyc, cancel last instr
68          rt_delay[0] <= 0;

```

```

67         rt_addr_delay[0] <= 0;
68         reg_write_delay[0] <= 0;
69     end
70     else if (format == 0) begin
71         case (op)
72             11'b00111000100 : begin //lqx : Load Quadword
73                 (x-form)
74                     rt_delay[0] <= mem[$signed(ra[0:31]) + $signed(rb[0:31])];
75             end
76             11'b00101000100 : begin //stqx : Store Quadword
77                 (x-form)
78                     mem[$signed(ra[0:31]) + $signed(rb[0:31])] <= rt_st;
79                     reg_write_delay[0] <= 0;
80             end
81             default begin
82                 rt_delay[0] <= 0;
83                 rt_addr_delay[0] <= 0;
84                 reg_write_delay[0] <= 0;
85             end
86         endcase
87     end
88     else if (format == 4) begin
89         case (op[3:10])
90             8'b00110100 : begin //lqd : Load Quadword
91                 (d-form)
92                     rt_delay[0] <= mem[$signed((ra[0:31]) + $signed(imm[8:17]))];
93             end
94             8'b00100100 : begin //stqd : Store Quadword
95                 (d-form)
96                     mem[$signed(ra[0:31]) + $signed(imm[8:17])] <= rt_st;
97                     reg_write_delay[0] <= 0;
98             end
99             default begin
100                 rt_delay[0] <= 0;
101                 rt_addr_delay[0] <= 0;
102                 reg_write_delay[0] <= 0;
103             end
104         endcase
105     end
106     else if (format == 5) begin
107         case (op[2:10])
108             9'b001100001 : begin //lqa : Load Quadword
109                 (a-form)
110                     rt_delay[0] <= mem[$signed(imm[2:17])];
111             end
112             9'b001000001 : begin //stqa : Store Quadword
113                 (a-form)
114                     mem[$signed(imm[2:17])] <= rt_st;
115                     reg_write_delay[0] <= 0;
116             end
117             default begin
118                 rt_delay[0] <= 0;
119                 rt_addr_delay[0] <= 0;
120                 reg_write_delay[0] <= 0;
121             end
122         endcase
123     end
124     end
125 end
126
127 endmodule

```

```

1  module Branch(clk, reset, op, format, rt_addr, ra, rb, rt_st, imm, reg_write, pc_in,
2    rt_wb, rt_addr_wb, reg_write_wb, pc_wb, branch_taken, first, branch_kill);
3
4    //RF/FWD Stage
5    input [0:10]      op;           //Decoded opcode, truncated based on format
6    input [2:0]       format;        //Format of instr, used with op and imm
7    input [0:6]       rt_addr;       //Destination register address
8    input [0:127]     ra, rb, rt_st; //Values of source registers
9    input [0:17]     imm;          //Immediate value, truncated based on format
10   input            reg_write;    //Will current instr write to RegTable
11   input [7:0]      pc_in;        //Program counter from IF stage
12   input            first;        //1 if first instr in pair; used for determining
13   order of branch
14
15  //WB Stage
16  output logic [0:127]  rt_wb;      //Output value of Stage 3
17  output logic [0:6]    rt_addr_wb;  //Destination register for rt_wb
18  output logic          reg_write_wb; //Will rt_wb write to RegTable
19  output logic [7:0]   pc_wb;       //New program counter for branch
20  output logic          branch_taken; //Was branch taken?
21  output logic          branch_kill; //If branch is taken and branch instr
22  is first in pair, kill twin instr
23
24  //Internal Signals
25  logic [0:127]    rt_delay;     //Staging register for calculated values
26  logic [0:6]       rt_addr_delay; //Destination register for rt_wb
27  logic             reg_write_delay; //Will rt_wb write to RegTable
28  logic [7:0]       pc_delay;     //Staging register for PC
29  logic             branch_delay; //Was branch taken?
30
31  always_ff @(posedge clk) begin
32    if (reset == 1) begin
33      rt_wb <= 0;
34      rt_addr_wb <= 0;
35      reg_write_wb <= 0;
36      pc_wb <= 0;
37      branch_taken <= 0;
38    end
39    else begin
40      rt_wb <= rt_delay;
41      rt_addr_wb <= rt_addr_delay;
42      reg_write_wb <= reg_write_delay;
43      pc_wb <= pc_delay;
44      branch_taken <= branch_delay;
45    end
46  end
47
48  always_comb begin
49    if (format == 0 && op == 0) begin
50      rt_delay = 0;
51      rt_addr_delay = 0;
52      reg_write_delay = 0;
53      pc_delay = 0;
54      branch_delay = 0;
55    end
56    else begin
57      rt_addr_delay = rt_addr;
58      reg_write_delay = reg_write;
59      if (branch_taken) begin
60        rt_delay = 0;
61        rt_addr_delay = 0;
62        reg_write_delay = 0;
63        pc_delay = 0;
64        branch_delay = 0;
65      end
66      else if (format == 0) begin
67        case (op)
68          11'b00110101000 : begin
69            //bi : Branch Indirect

```

```

67     pc_delay = ra[0:31];
68     reg_write_delay = 0;
69     branch_delay = 1;
70   end
71   default begin
72     rt_delay = 0;
73     rt_addr_delay = 0;
74     reg_write_delay = 0;
75     pc_delay = 0;
76     branch_delay = 0;
77   end
78 endcase
79 end
80 else if (format == 5) begin
81   case (op[2:10])
82     9'b0001100110 : begin //brsl: Branch Relative and Set
83       Link
84       rt_delay[0:31] = pc_in + 1;
85       rt_delay[32:127] = 0;
86       pc_delay = imm[2:17] + pc_in;
87       branch_delay = 1;
88     end
89     9'b0001100100 : begin //br: Branch Relative
90       pc_delay = imm[2:17] + pc_in;
91       branch_delay = 1;
92     end
93     9'b0001100000 : begin //bra: Branch Absolute
94       pc_delay = imm[2:17];
95       branch_delay = 1;
96     end
97     9'b0001000010 : begin //brnz: Branch If Not Zero Word
98       if (first)
99         pc_delay = pc_in - 2 + $signed(imm[2:17]);
100      else
101        pc_delay = pc_in - 1 + $signed(imm[2:17]);
102      branch_delay = (rt_st[0:31] == 32'h0000) ? 0 : 1;
103    end
104    9'b0001000000 : begin //brz: Branch If Zero Word
105      if (first)
106        pc_delay = pc_in - 2 + $signed(imm[2:17]);
107      else
108        pc_delay = pc_in - 1 + $signed(imm[2:17]);
109      branch_delay = (rt_st[0:31] == 32'h0000) ? 1 : 0;
110    end
111    default begin
112      rt_delay = 0;
113      rt_addr_delay = 0;
114      reg_write_delay = 0;
115      pc_delay = 0;
116      branch_delay = 0;
117    end
118  endcase
119 else begin
120   rt_delay = 0;
121   rt_addr_delay = 0;
122   reg_write_delay = 0;
123   pc_delay = 0;
124   branch_delay = 0;
125 end
126 end
127
128 if (branch_delay == 1 && first == 1)
129   branch_kill = 1;
130 else
131   branch_kill = 0;
132 end
133 endmodule

```

## **Assembler Code**

```

1  from distutils.log import debug
2  import sys
3  class Assembler:
4      def __init__(self,ins_list_name,input_file,output_file,debug) -> None:
5          self.ins_list_name = ins_list_name
6          self.ins_opcode_mapping = dict()
7          self.opcode_ins_mapping = dict()
8          self.output_file = output_file
9          self.input_file = input_file
10         self.debug_level = debug
11         self.load_mapping()
12
13
14     def load_mapping(self):
15
16         with open(self.ins_list_name,'r') as ins_lst:
17             for line in ins_lst.readlines():
18                 # print(line)
19                 ins,opcode = line.split("\t")
20                 mnemonic = ins.split(" ")[0]
21                 opcode = opcode.strip()
22                 self.ins_opcode_mapping[mnemonic]=opcode
23                 if "rt, ra, rb, rc" in ins:
24                     self.opcode_ins_mapping[opcode]=[1,ins]
25                 elif "rt, ra, rb" in ins:
26                     self.opcode_ins_mapping[opcode]=[0,ins]
27                 elif "rt, ra, imm7" in ins:
28                     self.opcode_ins_mapping[opcode]=[2,ins]
29                 elif "rt, ra, imm8" in ins:
30                     self.opcode_ins_mapping[opcode]=[3,ins]
31                 elif "rt, ra, imm10" in ins or "rt, imm10(ra)" in ins :
32                     self.opcode_ins_mapping[opcode]=[4,ins]
33                 elif "rt, imm16" in ins or "imm16" in ins :
34                     self.opcode_ins_mapping[opcode]=[5,ins]
35                 elif "rt, imm18" in ins:
36                     self.opcode_ins_mapping[opcode]=[6,ins]
37                 else:
38                     if "nop" in ins:
39                         print("nop")
40                         self.opcode_ins_mapping[opcode]=[7,ins]
41                     elif "stop" in ins:
42                         self.opcode_ins_mapping[opcode]=[7,ins]
43
44         # #print(ins,opcode)
45         # #print(self.opcode_ins_mapping)
46         # #print(self.ins_opcode_mapping)
47
48     def parse_line(self,line):
49         ins = line.strip()
50         ins = ins.split("//")
51         ins = ins[0].strip().split(" ")
52         return ins
53
54     def parse_input(self):
55
56         with open(self.input_file,'r') as ins_lst, open(self.output_file,'w') as object:
57             for line in ins_lst.readlines():
58                 ins = self.parse_line(line)
59                 mnemonic = ins[0]
60                 if "stop" in mnemonic:
61                     break
62                     #print(mnemonic)
63                     opcode = self.ins_opcode_mapping[mnemonic]
64                     #print(opcode)
65                     format = self.opcode_ins_mapping[opcode]
66                     # print("format ",format)
67                     binary = self.compute(format,opcode,ins)

```

```

68         object.write(binary+"\n")
69
70     def parse_input_1(self):
71         with open(self.input_file,'r') as ins_lst, open(self.output_file,'w') as
72             object,open("debug.out",'w') as d_object:
73                 for line in ins_lst.readlines():
74
75                     ins = self.parse_line(line)
76                     mnemonic = ins[0]
77                     if "stop" in mnemonic:
78                         break
79                     #print(mnemonic)
80                     opcode = self.ins_opcode_mapping[mnemonic]
81                     print(opcode)
82                     format = self.opcode_ins_mapping[opcode]
83                     # print("format ",format)
84                     binary = self.compute(format,opcode,ins)
85                     object.write(binary+"\n")
86                     if self.debug_level==2:
87                         o_hex = '0x{0:0{1}X}'.format(int(binary,2),8)
88                         d_object.write(str(binary)+"\t"+o_hex+"\t"+line)
89                     else:
90                         d_object.write(str(binary)+"\t"+line)
91
92     def compute(self,format,opcode,ins):
93         ins_binary = "".zfill(32)
94         print(format,ins_binary,len(ins_binary))
95         print("ins {}".format(ins))
96         if format[0] == 0:
97             # opcode rt ra rb
98             # op[0-10]rb[11-17]ra[18-24]rt[25-31]
99             rt = bin(int(ins[1])).replace("0b","");
100            ra = bin(int(ins[2])).replace("0b","");
101            rb = bin(int(ins[3])).replace("0b","");
102            #print(rt,ra,rb)
103            rt = self.fill(rt,7)
104            ra = self.fill(ra,7)
105            rb = self.fill(rb,7)
106            #print(rt,ra,rb)
107            ins_binary = opcode+rb+ra+rt
108
109        elif format[0] == 1:
110            # opcode rt,ra,rb,rc
111            # op[0-3]rt[4-10]rb[11-17]ra[18-24]rc[25-31]
112            rt = bin(int(ins[1])).replace("0b","");
113            ra = bin(int(ins[2])).replace("0b","");
114            rb = bin(int(ins[3])).replace("0b","");
115            rc = bin(int(ins[4])).replace("0b","");
116            #print(rt,ra,rb)
117            rt = self.fill(rt,7)
118            ra = self.fill(ra,7)
119            rb = self.fill(rb,7)
120            rc = self.fill(rc,7)
121            # print(rt,ra,rb,rc)
122            ins_binary = opcode+rt+rb+ra+rc
123        elif format[0] == 2:
124            # opcode rt,ra,imm7
125            # op[0-10]imm7[11-17]ra[18-24]rt[25-31]
126            mask = 0b1111111
127            rt = bin(int(ins[1])).replace("0b","");
128            ra = bin(int(ins[2])).replace("0b","");
129            imm7 = bin(int(ins[3]) & mask).replace("0b","");
130
131            rt = self.fill(rt,7)
132            ra = self.fill(ra,7)
133            imm7 = self.fill(imm7,7)

```

```

134     ins_binary = opcode+imm7+ra+rt
135
136     elif format[0] == 3:
137         # opcode rt,ra,imm8
138         # op[0-9]imm8[10-17]ra[18-24]rt[25-31]
139         mask = 0b11111111
140         rt = bin(int(ins[1])).replace("0b","");
141         ra = bin(int(ins[2])).replace("0b","");
142         imm8 = bin(int(ins[3]) & mask).replace("0b","");
143         rt = self.fill(rt,7)
144         ra = self.fill(ra,7)
145         imm8 = self.fill(imm8,8)
146         ins_binary = opcode+imm8+ra+rt
147
148
149     elif format[0] == 4:
150         # opcode rt,ra,value
151         # op[0-7]imm10[8-17]ra[18-24]rt[25-31]
152         rt = bin(int(ins[1])).replace("0b","");
153         rt = self.fill(rt,7)
154         mask = 0b1111111111
155
156         if '(' in format[1]:
157             # opcode rt, symbol(ra)
158             imm10 = ins[2].split('(')[0]
159             ra = ins[2].split('(')[1][:-1]
160         else:
161             # opcode rt,ra,value
162             imm10 = ins[3]
163             ra = ins[2]
164
165         # print(imm10,ra)
166         imm10 = bin(int(imm10) & mask).replace("0b","");
167         imm10 = self.fill(imm10,10)
168
169         ra = bin(int(ra)).replace("0b","");
170         ra = self.fill(ra,7)
171         ins_binary = opcode+imm10+ra+rt
172     elif format[0] == 5:
173         print(ins)
174         mask = 0b1111111111111111
175         if len(ins)==3:
176             rt = bin(int(ins[1])).replace("0b","");
177             imm16 = bin(int(ins[2]) & mask).replace("0b","");
178         else:
179             rt="0"
180             imm16 = bin(int(ins[1]) & mask).replace("0b","");
181
182         print(rt,imm16)
183         rt = self.fill(rt,7)
184         imm16 = self.fill(imm16,16)
185         print(rt,imm16)
186         ins_binary = opcode+imm16+rt
187     elif format[0] == 6:
188         # opcode rt,value
189         # op[0-8]imm16[9-24]rt[25-31]
190         mask = 0b1111111111111111
191         rt = bin(int(ins[1]) & mask).replace("0b","");
192         rt = self.fill(rt,7)
193
194         imm18 = bin(int(ins[2])).replace("0b","");
195         imm18 = self.fill(imm18,18)
196         ins_binary = opcode+imm18+rt
197     else:
198         ins_binary = opcode + self.fill("",32-len(opcode))
199
200

```

```

201     #print(ins_binary,len(ins_binary))
202     return ins_binary
203
204 def fill(self,seq,width):
205     neg = False
206     if len(seq) > 0 and seq[0]=='-':
207         neg = True
208         seq = seq.replace("-","");
209         wide_seq = seq.zfill(width)
210
211     if neg:
212         wide_seq=str(1)+wide_seq[1:]
213     return wide_seq
214
215
216
217 if len(sys.argv) ==1:
218     print("Please mention the asm file: python assembler <input>.asm -o <out>")
219     sys.exit(0)
220 input_file_name = sys.argv[1]
221 output_file_name = "out"
222 debug = 0
223 print(sys.argv)
224
225 print(sys.argv)
226 for i in range(1,len(sys.argv)):
227     print("test ",sys.argv[i])
228     if sys.argv[i] =='-o':
229         print("afdasd")
230         output_file_name = sys.argv[i+1]
231         i=i+1
232
233     if sys.argv[i] =='-d':
234         debug=int(sys.argv[i+1])
235         i=i+1
236
237 print(" fda ",input_file_name)
238 print("dfd ",output_file_name)
239 asm = Assembler(ins_list_name="instructions.lst", input_file=input_file_name,
240                  output_file=output_file_name,debug=debug)
241
242 print(debug)
243 if debug==0:
244     asm.parse_input()
245 elif debug>=1:
246     print("debug ")
247     asm.parse_input_1()

```

```
1 mpy rt, ra, rb 01111000100
2 mpyu rt, ra, rb 01111001100
3 mpyh rt, ra, rb 01111000101
4 fa rt, ra, rb 01011000100
5 fs rt, ra, rb 01011000101
6 fm rt, ra, rb 01011000110
7 fceq rt, ra, rb 01111000010
8 fcgt rt, ra, rb 01011000010
9 shlh rt, ra, rb 00001011111
10 shl rt, ra, rb 00001011011
11 roth rt, ra, rb 00001011100
12 rot rt, ra, rb 00001011000
13 rothm rt, ra, rb 00001011101
14 rotm rt, ra, rb 00001011001
15 rotmah rt, ra, rb 00001011110
16 rotma rt, ra, rb 00001011010
17 cntb rt, ra 01010110100
18 avgb rt, ra, rb 00011010011
19 absdb rt, ra, rb 00001010011
20 sumb rt, ra, rb 01001010011
21 ah rt, ra, rb 00011001000
22 a rt, ra, rb 00011000000
23 sfh rt, ra, rb 00001001000
24 sf rt, ra, rb 00001000000
25 and rt, ra, rb 00011000001
26 or rt, ra, rb 00001000001
27 xor rt, ra, rb 01001000001
28 nand rt, ra, rb 00011001001
29 ceqb rt, ra, rb 01111010000
30 ceqh rt, ra, rb 01111001000
31 ceq rt, ra, rb 01111000000
32 cgtb rt, ra, rb 01001010000
33 cgth rt, ra, rb 01001001000
34 cgt rt, ra, rb 01001000000
35 clgtb rt, ra, rb 01011010000
36 clgth rt, ra, rb 01011001000
37 clgt rt, ra, rb 01011000000
38 nop 01000000001
39 mpya rt, ra, rb, rc 1100
40 fma rt, ra, rb, rc 1110
41 fms rt, ra, rb, rc 1111
42 shli rt, ra, imm7 00001111011
43 rothi rt, ra, imm7 00001111100
44 roti rt, ra, imm7 00001111000
45 rotmahi rt, ra, imm7 00001111110
46 rotmai rt, ra, imm7 00001111010
47 cflts rt, ra, imm8 01110111000
48 cfltu rt, ra, imm8 01110111001
49 mpyi rt, ra, imm10 01110100
50 mpyui rt, ra, imm10 01110101
51 ahi rt, ra, imm10 00011101
52 ai rt, ra, imm10 00011100
53 sfhi rt, ra, imm10 00001101
54 sfi rt, ra, imm10 00001100
55 andbi rt, ra, imm10 00010110
56 andhi rt, ra, imm10 00010101
57 andi rt, ra, imm10 00010100
58 orbi rt, ra, imm10 00000110
59 orhi rt, ra, imm10 00000101
60 ori rt, ra, imm10 00000100
61 xorbi rt, ra, imm10 01000110
62 xorhi rt, ra, imm10 01000101
63 xorri rt, ra, imm10 01000100
64 ceqbi rt, ra, imm10 01111110
65 ceqhi rt, ra, imm10 01111101
66 ceqi rt, ra, imm10 01111100
67 cgtbi rt, ra, imm10 01001110
68 cgthi rt, ra, imm10 01001101
69 cgti rt, ra, imm10 01001100
```

```
70  clgtbi rt, ra, imm10      01011110
71  clgthi rt, ra, imm10      01011101
72  clgti rt, ra, imm10 01011100
73  ilh rt, imm16  010000011
74  ilhu rt, imm16  010000010
75  il rt, imm16  010000001
76  iohl rt, imm16  011000001
77  ila rt, imm18  01000001
78  shlqbi rt, ra, rb   00111011011
79  shlqby rt, ra, rb   00111011111
80  rotqbi rt, ra, rb   00111011000
81  rotqby rt, ra, rb   00111011100
82  gbb rt, ra  00110110010
83  gbh rt, ra  00110110001
84  gb rt, ra  00110110000
85  lqx rt, ra, rb  00111000100
86  stqx rt, ra, rb 00101000100
87  bi ra  00110101000
88  lnop  0000000000001
89  shlqbii rt, ra, imm7    00111111011
90  shlqbyi rt, ra, imm7    00111111111
91  rotqbii rt, ra, imm7    00111111000
92  rotqbyi rt, ra, imm7    00111111100
93  lqd rt, imm10(ra) 00110100
94  stqd rt, imm10(ra) 00100100
95  lqa rt, imm16  001100001
96  stqa rt, imm16  001000001
97  br imm16  001100100
98  bra imm16  001100000
99  brsl rt, imm16  001100110
100 brnz rt, imm16  001000010
101 brz rt, imm16  001000000
102 stop  0000000000000
```

## **Sample Programs**

```

1 ai 0 0 12      //Even pipe structural hazard
2 ilh 4 -15
3 ai 1 0 1
4 ilh 5 -14
5 ahi 2 2 14
6 ilh 6 -13
7 ahi 3 2 1
8 stqd 1 0(1)   //Odd pipe structural hazard, plus memory ref
9 stqd 1 1(1)
10 stqd 1 2(1)
11 stqd 1 3(1)
12 lqa 4 13
13 lqa 5 14
14 lqa 6 15
15 lqa 7 16
16 ila 25 3      //Set r25 to 3 and decrement
17 ila 26 0
18 ai 25 25 -1   //Keep decrementing until r25==0
19 ai 26 26 1
20 brnz 25 -2    //Branch hazard resolution, odd destination addr; Also RAW hazard, brnz
is stalled 1 cyc until dec r25 finishes
21 avgb 2 0 1     //Dual issue example
22 rotqbi 12 4 5
23 mpy 5 3 4
24 rotqbi 13 4 0
25 mpy 8 6 7
26 rotqbi 14 4 1
27 mpy 11 9 10
28 rotqbi 15 4 3
29 mpy 17 25 26
30 rotqbi 15 4 2 //Forwarding without stalling
31 stop           //Stop instr ignores all following instr and ends program
32 mpy 8 6 7
33 rotqbi 14 4 1
34 mpy 11 9 10
35 rotqbi 15 4 3
36 mpy 14 22 23

```

```

1  ila 0 0          // store 0 in reg0
2  ila 1 4          // shift=4;
3  ilh 10 -1        // used to create mask FFFFFFFF
4  ilh 30 4          // use to shift by 12byte
5  ilh 31 8          // use to shift by 8 byte
6  ilh 32 12         // use to shift by 8 bytes
7  shlqby 10 10 32   // generate FFFFFFFF ins first word
8  ila 11 4          // i = 4// START OUTER LOOP
9  ai 11 11 -1        // i--
10 ila 12 4          // j = 4 START INNER LOOP
11 ila 42 0          // load_index=0
12 ila 43 16         // rotate=16
13 ila 20 0          // clearing reg 20
14 ai 12 12 -1        // j--;
15 ila 15 0          // clearing reg 15
16 lqd 5 4(42)       // load row(2)= 4+0,+41,4+2,4+3
17 and 15 5 10        // extract the first word of the quadword
18 rotqby 15 15 43    // rotate the word to align in the slot
19 xor 20 15 20        // reg 15 will have elemnt for column 0 matrix2
20 shlqby 5 5 1        // shift row by 4
21 ai 43 43 -4        // rotate=-4
22 stqd 5 4(42)       // store the shifted reg back to memory
23 ai 42 42 1          // load_index+=1
24 brnz 12 -10         // if reg11==0 break else goto ai 12 12 -1
25 stqd 20 10(11)      // store 20 in 13 to 10
26 brnz 11 -17         // if reg11==0 break else goto ai 11 11 -1
27 ila 51 4          // i = 4// START OUTER LOOP
28 ai 51 51 -1        // i--
29 ila 80 0          // reset the quad word
30 lqd 21 0(51)       // load maxtrix 1 rows
31 ila 12 4          // j = 4 START INNER LOOP
32 ila 43 16         // rotate=16s
33 ai 12 12 -1        // j--;
34 lqd 22 10(12)      // load transposed matrix2 rows
35 mpya 23 22 21 23    // multiply slot wise both rows and store qword in 23
36 ila 81 0          // reset the cell
37 ila 45 4          // k=0
38 ai 45 45 -1        // k--
39 and 70 23 10        // using mask to extract the left most word
40 shlqby 23 23 30    // shift left by 4 bytes
41 a 81 81 70         // computing sum of the product cell wise
42 brnz 45 -4          // jump to ai 45 45 -1
43 and 81 81 10        // clearing any other bits
44 rotqby 81 81 43    // rotate reg81 to place word in correct slot of quad
45 ai 43 43 -4        // rotate=-4
46 or 80 80 81         // build the quadword using or with rotated word
47 brnz 12 -14         // jump to ai 12 12 -1
48 stqd 80 20(51)      // store the quad word row for the matrix3
49 brnz 51 -21         //Jump to ai 51 51 -1
50 stop
51

```