

### Procedure

This lab had one main task: design and simulate a simple pipelined processor. We added pipelining to our Lab 2 processor and resolved the hazards that come with this performance enhancement (Figure 1). After building the pipelined processor, we loaded a test program to confirm that the system works.



Figure 1: ARM 5 Stage Pipelined Processor


### Datapath

We first added four pipeline registers to separate the datapath into five stages (Figure 2).

Figure 2: Datapaths of the single-cycle and pipelined processors

After that, we corrected the datapath by pipelining the WA3 signal through the Execute, Memory, and Writeback stages so that it arrives together with the result it belongs to (Figure 3).

**Figure 3: Corrected pipelined datapath**

- WA3 must arrive at the same time as Result
- The register file is written on the falling edge of CLK

Then, we optimized the PC logic by eliminating a register and adder (Figure 4).

**Figure 4: Optimized PC logic eliminating a register and adder**

- The adder is removed by using PCPlus4F after PC has been updated to PC+4

### Control

Next, we pipelined the control signals along with the data so that they remain synchronized with their instructions (Figure 5).

**Figure 5: Pipelined processor with control**

- Same control unit as the single-cycle processor
- Control signals are delayed to the proper pipeline stage

Forwarding is necessary when an instruction in the Execute stage has a source register that matches the destination register of an instruction in the Memory or Writeback stage, so we modified the pipelined processor to support forwarding (Figure 6).

**Figure 6: Pipelined processor with forwarding to solve hazards**
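The forwarding rule can be sketched as a small behavioral model. This is only an illustration in Python, not the hardware itself: the function name `forward_select` and its argument names are hypothetical, chosen to mirror the RA1E/RA2E, WA3M, WA3W, and RegWrite signals, and the 10/01/00 encoding follows the ForwardAE values reported in the results.

```python
def forward_select(ra_e, wa3_m, wa3_w, reg_write_m, reg_write_w):
    """Return the 2-bit forwarding select for one Execute-stage source register.

    ra_e        -- source register number of the instruction in Execute
    wa3_m/wa3_w -- destination register numbers in Memory / Writeback
    reg_write_* -- whether the instruction in that stage writes a register
    """
    if ra_e == wa3_m and reg_write_m:
        return 0b10  # newest value: take ALUResult from the Memory stage
    if ra_e == wa3_w and reg_write_w:
        return 0b01  # older value: take Result from the Writeback stage
    return 0b00      # no match: use the value read from the register file


# ADD R1, R0, #1 is in Execute while SUB R0, R15, R15 is in Memory.
print(format(forward_select(0, 0, 15, True, False), "02b"))
```

The Memory-stage check comes first because, if both stages hold a write to the same register, the Memory stage holds the more recent one.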

We also added stalls so that LDR works properly: the pipeline stalls when the instruction in the Execute stage is an LDR and its destination register matches either source operand of the instruction in the Decode stage (Figure 7).

**Figure 7: Pipelined processor with stalls to solve LDR data hazard**
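The stall condition can be modeled in a few lines. This is a hedged sketch that covers only the LDR contribution to the stall and flush signals; the function names are hypothetical, and it assumes the load is identified by its MemtoReg control bit once it is in the Execute stage.

```python
def ldr_stall(ra1_d, ra2_d, wa3_e, mem_to_reg_e):
    """True when the Decode-stage instruction needs the result of a load in Execute."""
    match = (ra1_d == wa3_e) or (ra2_d == wa3_e)
    return match and mem_to_reg_e


def ldr_hazard_controls(ra1_d, ra2_d, wa3_e, mem_to_reg_e):
    """LDR-related part of the hazard outputs (branch and PC-write terms omitted)."""
    stall = ldr_stall(ra1_d, ra2_d, wa3_e, mem_to_reg_e)
    return {
        "StallF": stall,  # hold the PC
        "StallD": stall,  # hold the Fetch/Decode register
        "FlushE": stall,  # insert a bubble into Execute
    }


# AND R2, R3, R2 is in Decode while LDR R3, [R0, #9] is in Execute.
print(ldr_hazard_controls(ra1_d=3, ra2_d=2, wa3_e=3, mem_to_reg_e=True))
```

Flushing the Execute stage while stalling Fetch and Decode inserts a bubble, so the dependent instruction is not executed with a stale operand.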

Finally, we modified the pipelined processor to make the branch decision earlier and to handle control hazards (Figure 8).

**Figure 8: Pipelined processor with early BTA, handling the branch control hazard**
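As a rough illustration of the branch handling, the sketch below models only the branch-related part of the next-PC and flush logic. The function name `resolve_branch` and its arguments are hypothetical; it assumes the branch is resolved in the Execute stage, where the branch target comes from the ALU.

```python
def resolve_branch(pc_plus4_f, branch_target_e, branch_e, cond_ex_e):
    """Model the next PC and the flush signals for a branch resolved in Execute.

    pc_plus4_f      -- sequential next address from the Fetch stage
    branch_target_e -- branch target address computed in Execute
    branch_e        -- the Execute-stage instruction is a branch
    cond_ex_e       -- its condition code passes
    """
    branch_taken = branch_e and cond_ex_e
    next_pc = branch_target_e if branch_taken else pc_plus4_f
    # The two younger instructions (in Fetch and Decode) are on the wrong
    # path when the branch is taken, so both pipeline registers are flushed.
    return {"next_pc": next_pc, "FlushD": branch_taken, "FlushE": branch_taken}


print(resolve_branch(0x1C, 0x20, branch_e=True, cond_ex_e=True))
print(resolve_branch(0x1C, 0x20, branch_e=True, cond_ex_e=False))
```

When the condition fails, nothing is flushed and fetching continues sequentially, so a branch that is not taken costs no extra cycles in this model.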

**Task:**

After completing the design, we used the following program to verify that our system resolves both kinds of potential hazards: data hazards and control hazards.

| Label | Instruction      |
|-------|------------------|
| Main  | SUB R0, R15, R15 |
|       | ADD R1, R0, #1   |
|       | ORR R2, R0, R1   |
|       | ADD R2, R0, #2   |
|       | SUBS R0, R2, #0  |
|       | BEQ TAG1         |
|       | AND R2, R2, R0   |
|       | AND R1, R2, R0   |
| TAG1  | ADD R9, R1, R0   |
|       | STR R9, [R0, #9] |
|       | LDR R3, [R0, #9] |
|       | AND R2, R3, R2   |
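To see which data hazards this program exercises, it can be scanned for read-after-write dependences at distances of one and two instructions. The sketch below is an illustrative static scan in Python, not part of the lab deliverables: the `PROGRAM` encoding and the function `find_data_hazards` are hypothetical, it ignores the implicit use of R15 by the branch, and it walks the program in listing order without modeling whether the branch is taken.

```python
# Each entry: (assembly text, destination register or None, source registers, is_load)
PROGRAM = [
    ("SUB R0, R15, R15", 0, (15, 15), False),
    ("ADD R1, R0, #1", 1, (0,), False),
    ("ORR R2, R0, R1", 2, (0, 1), False),
    ("ADD R2, R0, #2", 2, (0,), False),
    ("SUBS R0, R2, #0", 0, (2,), False),
    ("BEQ TAG1", None, (), False),
    ("AND R2, R2, R0", 2, (2, 0), False),
    ("AND R1, R2, R0", 1, (2, 0), False),
    ("ADD R9, R1, R0", 9, (1, 0), False),
    ("STR R9, [R0, #9]", None, (0, 9), False),
    ("LDR R3, [R0, #9]", 3, (0,), True),
    ("AND R2, R3, R2", 2, (3, 2), False),
]


def find_data_hazards(program):
    """Return (instruction index, register, resolution) for each RAW hazard."""
    hazards = []
    for i, (_, _, sources, _) in enumerate(program):
        for reg in dict.fromkeys(sources):  # unique sources, original order
            for distance in (1, 2):         # the nearest producer wins
                j = i - distance
                if j < 0:
                    break
                _, dest, _, is_load = program[j]
                if dest != reg:
                    continue
                if distance == 1 and is_load:
                    kind = "stall one cycle, then forward from Writeback"
                elif distance == 1:
                    kind = "forward from Memory"
                else:
                    kind = "forward from Writeback"
                hazards.append((i, reg, kind))
                break
    return hazards


for index, reg, kind in find_data_hazards(PROGRAM):
    print(f"{PROGRAM[index][0]:<18} R{reg}: {kind}")
```

A producer three or more instructions ahead needs no special handling in this model, because the register file is written in the first half of the cycle and read in the second.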

**Result:**

#### i. An example of forwarding from the Memory stage to the Execute stage.

The first instruction computes $R15 - R15$ and writes the result to $R0$. The second instruction uses $R0$ as an operand in the Execute stage to perform an addition ( $R0 + \#1$ ). When the second instruction reaches the Execute stage, the first instruction is in the Memory stage and its result has not yet been written to the register file, so the value of $R0$ is forwarded from the Memory stage to the Execute stage.

In the following waveform, in the fifth cycle the ALUResult of the first instruction, destined for $R0$, is in the Memory stage while the second instruction uses $R0$ as a source in the Execute stage. ForwardAE becomes 10 so that source A reads ALUResult directly from the Memory stage.



#### ii. An example of forwarding from the Writeback stage to the Execute stage.

The first instruction computes $R15 - R15$ and writes the result to R0 in the Writeback stage. The third instruction uses R0 as an operand in the Execute stage to perform an ORR (R0, R1). The third instruction read the register file in the Decode stage, before R0 was written, so the value of R0 is forwarded from the Writeback stage to the Execute stage.

In the following waveform, in the sixth cycle ALUResultW of the first instruction is available in the Writeback stage and is about to be written to the register file, while the third instruction uses R0 as source A in the Execute stage. ForwardAE becomes 01 so that source A reads ALUResultW directly from the Writeback stage.
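The stage overlap behind the two forwarding cases can be checked with a simple occupancy model. This sketch assumes no stalls (true for the first three instructions) and assumes the first instruction is fetched in cycle 2; that starting cycle is a guess chosen only because it lines up with the cycle numbers quoted above, and the function name `stage_in_cycle` is hypothetical.

```python
STAGES = ("Fetch", "Decode", "Execute", "Memory", "Writeback")


def stage_in_cycle(instr_index, cycle, first_fetch_cycle=2):
    """Return the stage occupied by instruction `instr_index` (0-based) in `cycle`.

    Assumes an ideal pipeline with no stalls or flushes; returns None when the
    instruction has not been fetched yet or has already left the pipeline.
    """
    offset = cycle - first_fetch_cycle - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


# Cycle 5: SUB (index 0) is in Memory while ADD (index 1) is in Execute.
print(stage_in_cycle(0, 5), stage_in_cycle(1, 5))
# Cycle 6: SUB is in Writeback while ORR (index 2) is in Execute.
print(stage_in_cycle(0, 6), stage_in_cycle(2, 6))
```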



#### iii. An example of stalling for a memory instruction.

The eleventh instruction loads the data at memory location [R0, #9] into R3. The next instruction uses R3 as an operand in the Execute stage to perform an AND (R3 AND R2). However, the loaded data is not available until the LDR has completed its Memory stage, so the dependent instruction has to wait before it can execute. This causes a pipeline stall.

In the following waveform, the stall occurs when $RA1D = WA3E$. $StallD$ and $StallF$ are asserted to force the Decode and Fetch stage registers to hold their old values.



#### iv. An example of flushing for a branch instruction.

The sixth instruction is a branch that checks whether the result of the previous subtraction ( $R2 - \#0$ ) is zero. If it is zero, the program jumps to TAG1. The processor cannot determine whether the branch is taken until the branch reaches the Execute stage, by which point the instructions after it have already entered the pipeline. If the branch is taken, those instructions have to be flushed (i.e., discarded) so that only instructions on the correct path execute.

In the following waveform, when the branch is taken, FlushD and FlushE are asserted to flush the Decode and Execute stage registers.



## Appendix: SystemVerilog code

### 1) arm.sv

```

// Junchao Zhou, Chenhan Dai
// 04/19/2023
// EE469
// Lab #3

/* arm is the spotlight of the show and contains the bulk of the datapath and control logic.
** This module is split into two parts, the datapath and control.
*/

// clk       - system clock
// rst       - system reset
// Instr     - incoming 32-bit instruction from imem; contains opcode, condition, addresses and/or immediates
// ReadData  - data read out of the dmem
// WriteData - data to be written to the dmem
// MemWrite  - write enable that allows WriteData to overwrite an existing dmem word
// PC        - the current program counter value; goes to imem to fetch the instruction
// ALUResult - result of the ALU operation, sent as the address to the dmem

module arm (
    input  logic        clk, rst,
    input  logic [31:0] Instr,
    input  logic [31:0] ReadData,
    output logic [31:0] WriteData,
    output logic [31:0] PC, ALUResult,
    output logic        MemWrite
);

    // datapath buses and signals
    logic [31:0] PCPrime, PCPlus4, PCImm, PCPlus8E;  // PC signals
    logic [3:0]  RA1D, RA2D, RA1E, RA2E;             // regfile read addresses in the Decode and Execute stages
    logic [31:0] RD1D, RD2D, RD1E, RD2E, RDE1;       // raw regfile outputs in different stages
    logic [3:0]  ALUFlags;                           // ALU combinational flag outputs
    logic [3:0]  StatusFlag, FlagsE;                 // flag values in different stages
    logic [3:0]  CondE;                              // condition field in the Execute stage
    logic [31:0] ExtImmD, ExtImmE, SrcA, SrcB;       // computed or fetched values in different stages
    logic [31:0] Result;                             // final result in the Writeback stage
    logic [31:0] InstrD;                             // instruction in the Decode stage
    logic [31:0] ALUResultE, ALUResultW;             // ALU result in Execute and Writeback (ALUResult is the Memory-stage value)
    logic [3:0]  WA3D, WA3E, WA3M, WA3W;             // destination register address in different stages
    logic [31:0] WriteData1, WriteDataE, ReadDataW;  // data holders in different stages

    // control signals in different stages
    logic PCSrcD, MemtoRegD, BranchD, ALUSrcD, RegWriteD, FlagWriteD, MemWriteD;
    logic PCSrcE, MemtoRegE, BranchE, ALUSrcE, RegWriteE, FlagWriteE, MemWriteE;
    logic PCSrcM, MemtoRegM, RegWriteM;  // the Memory-stage write enable is the MemWrite port
    logic PCSrcW, MemtoRegW, RegWriteW;
    logic [1:0] RegSrc, ImmSrc, ALUControlD, ALUControlE;
    logic CondEx;

    // hazard unit
    logic StallF, StallD;
    logic PCwPendingF;
    logic ldrStallD;
    logic FlushD, FlushE;
    logic [1:0] ForwardAE, ForwardBE;

    // build the hazard control unit signals
    assign PCwPendingF = PCSrcD | PCSrcE | PCSrcM;
    // the load is in the Execute stage, so MemtoRegE qualifies the register match
    assign ldrStallD   = ((RA1D == WA3E) | (RA2D == WA3E)) & MemtoRegE;
    assign StallF      = ldrStallD | PCwPendingF;
    assign FlushD      = (BranchE & CondEx) | PCwPendingF | PCSrcW;
    assign FlushE      = ldrStallD | (BranchE & CondEx);
    assign StallD      = ldrStallD;
    assign ForwardAE   = ((RA1E == WA3M) & RegWriteM) ? 2'b10 :
                         ((RA1E == WA3W) & RegWriteW) ? 2'b01 :
                                                        2'b00;
    assign ForwardBE   = ((RA2E == WA3M) & RegWriteM) ? 2'b10 :
                         ((RA2E == WA3W) & RegWriteW) ? 2'b01 :
                                                        2'b00;

    /* The datapath consists of a PC as well as a series of muxes to make decisions about which data words to pass forward and operate on. It is
    ** noticeably missing the register file and alu, which you will fill in using the modules made in lab 1. To correctly match up signals to the
    ** ports of the register file and alu take some time to study and understand the logic and flow of the datapath.
    */

    //----- DATAPATH -----

    assign PCPrime = (BranchE & CondEx) ? ALUResultE : PCImm;  // mux, use either the default or the branch target computed by the ALU
    assign PCPlus4 = PC + 'd4;                                 // default value to access the next instruction
    assign PCImm   = PCSrcW ? Result : PCPlus4;                // mux, use either the default or the value from the Writeback stage

    // update the PC, at rst initialize to 0
    always_ff @(posedge clk) begin
        if (rst) begin
            PC <= '0;
        end
        else if (~StallF) PC <= PCPrime;
    end
```

```

    // Pipeline register between the Fetch stage and the Decode stage
    // Takes rst and clk to synchronize the stage
    // StallD and FlushD control the update of the register
    // InstrF is the instruction input from instruction memory
    // InstrD is the output instruction in the Decode stage

    FetchReg fetch_Reg (
        .rst    (rst),
        .clk    (clk),
        .StallD (StallD),
        .FlushD (FlushD),
        .InstrF (Instr),
        .InstrD (InstrD)
    );

    // determining the register addresses based on control signals
    // RegSrc[0] is set if doing a branch instruction
    // RegSrc[1] is set when doing memory instructions
    assign RA1D = RegSrc[0] ? 4'd15 : InstrD[19:16];
    assign RA2D = RegSrc[1] ? WA3D  : InstrD[3:0];
    assign WA3D = InstrD[15:12];

    // Register file with 16 registers
    // Clocked on the inverted clock so that data is written half a cycle early
    // Write enable, data and address come from the Writeback stage (RegWriteW, Result, WA3W)
    // Two output values (RD1D, RD2D) based on the corresponding addresses (RA1D, RA2D)
    // Addresses are 4 bits, data are 32 bits

    reg_file u_reg_file (
        .clk        (~clk),
        .wr_en      (RegWriteW),
        .write_data (Result),
        .wr_addr    (WA3W),
        .read_addr1 (RA1D),
        .read_addr2 (RA2D),
        .read_data1 (RD1D),
        .read_data2 (RD2D)
    );

    // Flag register
    // Takes ALUFlags as input
    // Updates the output StatusFlag when FlagWriteE and CondEx are both true
    FlagReg u_flags_Reg (
        .clk        (clk),
        .Flagwrite  (FlagWriteE & CondEx),
        .write_data (ALUFlags),
        .read_data  (StatusFlag)
    );

    // two muxes, put together into an always_comb for clarity
    // determines which set of instruction bits are used for the immediate
    always_comb begin
        if      (ImmSrc == 'b00) ExtImmD = {{24{InstrD[7]}}, InstrD[7:0]};          // 8 bit immediate - reg operations
        else if (ImmSrc == 'b01) ExtImmD = {20'b0, InstrD[11:0]};                   // 12 bit immediate - mem operations
        else                     ExtImmD = {{6{InstrD[23]}}, InstrD[23:0], 2'b00};  // 24 bit immediate - branch operation
    end

    // Pipeline register between the Decode stage and the Execute stage
    // Control signals are cleared when FlushE or rst is true
    // Inputs: control signals PCSrcD, RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUSrcD,
    //         FlagWriteD and the 2-bit ALUControlD,
    //         the condition field of the instruction (CondD) and the current flags,
    //         the updated PC value PCPlus8D (equal to PCPlus4 of the Fetch stage),
    //         register addresses and values RA1D, RA2D, RD1D, RD2D,
    //         the writeback address WA3D and the extended immediate ExtImmD
    // Outputs the corresponding signals and values in the Execute stage, updated on the clock
    // Port names follow the DecodeReg module definition

    DecodeReg decode_Reg (
        .clk         (clk),
        .rst         (rst),
        .FlushE      (FlushE),
        .PCsrcD      (PCSrcD),
        .RegwriteD   (RegWriteD),
        .MemtoRegD   (MemtoRegD),
        .MemwriteD   (MemWriteD),
        .BranchD     (BranchD),
        .ALUsrcD     (ALUSrcD),
        .FlagwriteD  (FlagWriteD),
        .PCPlus8D    (PCPlus4),
        .CondD       (InstrD[31:28]),
        .FlagsD      (StatusFlag),
        .ALUcontrolD (ALUControlD),
        .RD1D        (RD1D),
        .RD2D        (RD2D),
        .RA1D        (RA1D),
        .RA2D        (RA2D),
        .WA3D        (WA3D),
        .ExtImmD     (ExtImmD),
        .PCsrcE      (PCSrcE),
        .RegwriteE   (RegWriteE),
        .MemtoRegE   (MemtoRegE),
        .MemwriteE   (MemWriteE),
        .BranchE     (BranchE),
        .ALUsrcE     (ALUSrcE),
        .FlagwriteE  (FlagWriteE),
        .PCPlus8E    (PCPlus8E),
        .CondE       (CondE),
        .FlagsE      (FlagsE),
        .ALUcontrolE (ALUControlE),
        .RD1E        (RD1E),
        .RD2E        (RD2E),
        .RA1E        (RA1E),
        .RA2E        (RA2E),
        .WA3E        (WA3E),
        .ExtImmE     (ExtImmE)
    );

```

```

    // Register operands in the Execute stage: R15 reads are replaced by PC + 8, then the
    // forwarding muxes select between the regfile value (00), Result from the Writeback
    // stage (01) and ALUResult from the Memory stage (10)
    assign WriteData1 = (RA2E == 'd15) ? PCPlus8E : RD2E;  // substitute the 15th regfile register for PC
    assign WriteDataE = (ForwardBE == 2'b00) ? WriteData1 :
                        (ForwardBE == 2'b01) ? Result :
                                               ALUResult;
    assign RDE1       = (RA1E == 'd15) ? PCPlus8E : RD1E;  // substitute the 15th regfile register for PC
    assign SrcA       = (ForwardAE == 2'b00) ? RDE1 :
                        (ForwardAE == 2'b01) ? Result :
                                               ALUResult;
    assign SrcB       = ALUSrcE ? ExtImmE : WriteDataE;    // determine alu operand to be either from reg file or from immediate

    // ALU
    // with two input sources A and B
    // Controlled by the [1:0] ALUControl signal
    // 00 for ADD, 01 for SUB, 10 for AND, 11 for OR
    // Returns the computed result and flags
    alu u_alu (
        .a          (SrcA),
        .b          (SrcB),
        .ALUControl (ALUControlE),
        .Result     (ALUResultE),
        .ALUFlags   (ALUFlags)
    );

    // Pipeline register between the Execute stage and the Memory stage
    // Takes clk and rst
    // Inputs the Execute-stage control signals PCSrc, RegWrite and MemWrite, each ANDed with CondEx,
    // and MemtoReg from the Execute stage
    // Inputs the 32-bit ALU result and write data, and the 4-bit writeback address
    // Outputs the corresponding signals and values in the Memory stage on the clock
    ExcReg execute_Reg (
        .clk        (clk),
        .rst        (rst),
        .PCSrc      (PCSrcE & CondEx),
        .Regwrite   (RegWriteE & CondEx),
        .MemtoReg   (MemtoRegE),
        .Memwrite   (MemWriteE & CondEx),
        .ALUResultE (ALUResultE),
        .WriteDataE (WriteDataE),
        .WA3E       (WA3E),
        .PCSrcM     (PCSrcM),
        .RegwriteM  (RegWriteM),
        .MemtoRegM  (MemtoRegM),
        .MemwriteM  (MemWrite),
        .ALUResultM (ALUResult),
        .WriteDataM (WriteData),
        .WA3M       (WA3M)
    );

    // Pipeline register between the Memory stage and the Writeback stage
    // Takes clk and rst
    // Inputs the Memory-stage control signals PCSrc, RegWrite and MemtoReg
    // Inputs the 32-bit values ReadData and ALUResult from the Memory stage
    // Inputs the 4-bit writeback address
    // Outputs the corresponding signals and values in the Writeback stage
    MemReg memory_Reg (
        .clk        (clk),
        .rst        (rst),
        .PCSrcM     (PCSrcM),
        .RegwriteM  (RegWriteM),
        .MemtoRegM  (MemtoRegM),
        .Readdata   (ReadData),
        .ALUResultM (ALUResult),
        .WA3M       (WA3M),
        .PCSrcW     (PCSrcW),
        .RegwriteW  (RegWriteW),
        .MemtoRegW  (MemtoRegW),
        .ReaddataW  (ReadDataW),
        .ALUResultW (ALUResultW),
        .WA3W       (WA3W)
    );

    // determine the result to run back to PC or the register file based on whether we used a memory instruction
    assign Result = MemtoRegW ? ReadDataW : ALUResultW;  // determine whether final writeback result is from dmemory or alu

    /* The control consists of a large decoder, which evaluates the top bits of the instruction and produces the control bits
    ** which become the select bits and write enables of the system. The write enables (RegWrite, MemWrite and PCSrc) are
    ** especially important because they are representative of your processor's current state.
    */

    //----- CONTROL -----

    always_comb begin

        // Decoder for CondEx
        // Result is based on the condition field of the instruction in the Execute stage
        case (CondE)

            // EQ: Equal
            4'b0000: CondEx = StatusFlag[2];

            // NE: Not equal
            4'b0001: CondEx = ~StatusFlag[2];

            // GE: Greater or equal
            4'b1010: CondEx = StatusFlag[3] ~^ StatusFlag[0];

            // LT: Less
            4'b1011: CondEx = StatusFlag[3] ^ StatusFlag[0];

            // GT: Greater
            4'b1100: CondEx = ~StatusFlag[2] & (StatusFlag[3] ~^ StatusFlag[0]);

            // LE: Less or equal
            4'b1101: CondEx = StatusFlag[2] | (StatusFlag[3] ^ StatusFlag[0]);

            // Unconditional
            4'b1110: CondEx = 1;  // always execute

            default: CondEx = 0;
        endcase

```


```

        // Decoder for the control signals of the instruction in the Decode stage
        casez (InstrD[27:20])
            // ADD (Imm or Reg)
            8'b00?_0100_0 : begin  // note that we use wildcard "?" in bit 25. That bit decides whether we use immediate or reg, but regardless we add
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 0;
                MemWriteD   = 0;
                ALUSrcD     = InstrD[25];  // may use immediate
                FlagWriteD  = 0;
                RegWriteD   = 1;
                RegSrc      = 'b00;
                ImmSrc      = 'b00;
                ALUControlD = 'b00;
            end
            // SUB/CMP (Imm or Reg)
            8'b00?_0010_? : begin  // note that we use wildcard "?" in bit 25. That bit decides whether we use immediate or reg, but regardless we sub
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 0;
                MemWriteD   = 0;
                ALUSrcD     = InstrD[25];  // may use immediate
                FlagWriteD  = InstrD[20];  // may write flags
                RegWriteD   = 1;
                RegSrc      = 'b00;
                ImmSrc      = 'b00;
                ALUControlD = 'b01;
            end
            // AND
            8'b000_0000_0 : begin
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 0;
                MemWriteD   = 0;
                ALUSrcD     = 0;
                FlagWriteD  = 0;
                RegWriteD   = 1;
                RegSrc      = 'b00;
                ImmSrc      = 'b00;  // doesn't matter
                ALUControlD = 'b10;
            end
            // ORR
            8'b000_1100_0 : begin
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 0;
                MemWriteD   = 0;
                ALUSrcD     = 0;
                FlagWriteD  = 0;
                RegWriteD   = 1;
                RegSrc      = 'b00;
                ImmSrc      = 'b00;  // doesn't matter
                ALUControlD = 'b11;
            end
            // LDR
            8'b010_1100_1 : begin
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 1;
                MemWriteD   = 0;
                ALUSrcD     = 1;
                FlagWriteD  = 0;
                RegWriteD   = 1;
                RegSrc      = 'b10;  // msb doesn't matter
                ImmSrc      = 'b01;
                ALUControlD = 'b00;  // do an add
            end
            // STR
            8'b010_1100_0 : begin
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 0;  // doesn't matter
                MemWriteD   = 1;
                ALUSrcD     = 1;
                FlagWriteD  = 0;
                RegWriteD   = 0;
                RegSrc      = 'b10;  // msb doesn't matter
                ImmSrc      = 'b01;
                ALUControlD = 'b00;  // do an add
            end
            // B/BXX
            8'b1010_???? : begin
                PCSrcD      = 1;  // depends on CondEx
                BranchD     = 1;
                MemtoRegD   = 0;
                MemWriteD   = 0;
                ALUSrcD     = 1;
                FlagWriteD  = 0;
                RegWriteD   = 0;
                RegSrc      = 'b01;
                ImmSrc      = 'b10;
                ALUControlD = 'b00;  // do an add
            end
            default: begin
                PCSrcD      = 0;
                BranchD     = 0;
                MemtoRegD   = 0;  // doesn't matter
                MemWriteD   = 0;
                ALUSrcD     = 0;
                FlagWriteD  = 0;
                RegWriteD   = 0;
                RegSrc      = 'b00;
                ImmSrc      = 'b00;
                ALUControlD = 'b00;  // do an add
            end
        endcase
    end

endmodule

```

### 2) FetchReg.sv

```

// Junchao Zhou, Chenhan Dai
// 05/05/2023
// EE469
// Lab #3

/* FetchReg is the pipeline register between the Fetch stage and the Decode stage
** of the pipelined processor. It reads the instruction from instruction memory and
** passes it to the Decode stage. The stall and flush signals from the hazard unit
** are applied to this register to prevent hazards.
*/

// Inputs:
// clk (1-bit): A clock signal that controls the timing of the module.
// rst (1-bit): A reset signal that initializes the module to a known state.
// StallD (1-bit): A signal that stalls the Decode stage (D) of the processor.
// FlushD (1-bit): A signal that flushes the Decode stage (D) of the processor.
// InstrF (32-bit): A signal that contains the instruction fetched from the instruction memory.

// Outputs:
// InstrD (32-bit): A signal that contains the instruction to be decoded in the Decode stage (D) of the processor.
module FetchReg(input  logic        clk,
                input  logic        rst,
                input  logic        StallD,
                input  logic        FlushD,
                input  logic [31:0] InstrF,
                output logic [31:0] InstrD);

    // storage for the registered instruction
    logic [31:0] memory;

    // write port: flush and reset clear the register, a stall holds its value
    always_ff @(posedge clk) begin
        if (FlushD | rst)
            memory <= '0;
        else if (~StallD)
            memory <= InstrF;
    end

    // asynchronous read
    assign InstrD = memory;

endmodule

```

### 3) DecodeReg.sv

```

// Junchao Zhou, Chenhan Dai
// 05/05/2023
// EE469
// Lab #3

// This is the pipeline register between the Decode stage and the Execute stage.
// It takes the control signals and datapath values of the Decode stage as inputs
// and outputs the same values in the Execute stage.
// The Execute-stage flush is applied to the register to prevent potential hazards.

// The inputs to the module include:
// clk - a clock signal used for synchronization.
// rst - a reset signal used to reset the state of the module.
// FlushE - a signal used to flush the Execute stage.
// PCsrcD - a signal that determines the source of the next program counter value.
// RegwriteD - a signal that enables writing to the register file.
// MemtoRegD - a signal that determines whether data from memory is written to the register file.
// MemwriteD - a signal that enables writing to memory.
// BranchD - a signal that determines whether a branch instruction is executed.
// ALUsrcD - a signal that determines the source of the second operand to the ALU.
// FlagwriteD - a signal that enables writing to the flag register.
// PCPlus8D - the value of the program counter plus 8 in 32 bits.
// CondD - the condition for the branch instruction in 4 bits.
// FlagsD - the current value of the flag register in 4 bits.
// ALUcontrolD - the control signal for the ALU operation in 2 bits.
// RD1D, RD2D - the values of the first and second operands from the register file in 32 bits.
// RA1D, RA2D, WA3D - the register addresses for the first and second operands and the write-back address, in 4 bits.
// ExtImmD - the value of the immediate operand, extended to 32 bits.

// The outputs of the module include:
// PCsrcE - a signal that determines the source of the next program counter value in the Execute stage.
// RegwriteE - a signal that enables writing to the register file in the Execute stage.
// MemtoRegE - a signal that determines whether data from memory or the ALU is written to the register file in the Execute stage.
// MemwriteE - a signal that enables writing to memory in the Execute stage.
// BranchE - a signal that determines whether a branch instruction is executed in the Execute stage.
// ALUsrcE - a signal that determines the source of the second operand to the ALU in the Execute stage.
// FlagwriteE - a signal that enables writing to the flag register in the Execute stage.
// PCPlus8E - the value of the program counter plus 8 in the Execute stage in 32 bits.
// CondE - the condition for the branch instruction in the Execute stage in 4 bits.
// FlagsE - the current value of the flag register in 4 bits.
// ALUcontrolE - the control signal for the ALU operation in the Execute stage in 2 bits.
// RD1E, RD2E - the values of the first and second operands from the register file in the Execute stage in 32 bits.
// RA1E, RA2E, WA3E - the register addresses for the first and second operands and the write-back address in the Execute stage, in 4 bits.
// ExtImmE - the value of the immediate operand, extended to 32 bits, in the Execute stage.

module DecodeReg(input  logic        clk,
                 input  logic        rst,
                 input  logic        FlushE,
                 input  logic        PCsrcD,
                 input  logic        RegwriteD,
                 input  logic        MemtoRegD,
                 input  logic        MemwriteD,
                 input  logic        BranchD,
                 input  logic        ALUsrcD,
                 input  logic        FlagwriteD,
                 input  logic [31:0] PCPlus8D,
                 input  logic [3:0]  CondD,
                 input  logic [3:0]  FlagsD,
                 input  logic [1:0]  ALUcontrolD,
                 input  logic [31:0] RD1D, RD2D,
                 input  logic [3:0]  RA1D, RA2D, WA3D,
                 input  logic [31:0] ExtImmD,
                 output logic        PCsrcE,
                 output logic        RegwriteE,
                 output logic        MemtoRegE,
                 output logic        MemwriteE,
                 output logic        BranchE,
                 output logic        ALUsrcE,
                 output logic        FlagwriteE,
                 output logic [31:0] PCPlus8E,
                 output logic [3:0]  CondE,
                 output logic [3:0]  FlagsE,
                 output logic [1:0]  ALUcontrolE,
                 output logic [31:0] RD1E, RD2E,
                 output logic [3:0]  RA1E, RA2E, WA3E,
                 output logic [31:0] ExtImmE);

    // storage for the registered values
    logic [6:0]       ctrlsgMem;
    logic [1:0]       ALUcontrolMem;
    logic [2:0][31:0] dataMem;
    logic [3:0]       condMem;
    logic [3:0]       FlagMem;
    logic [2:0][3:0]  WAMem;
    logic [31:0]      PCMem;

    // write port: a flush or reset clears the control signals
    always_ff @(posedge clk) begin
        if (FlushE | rst) begin
            ctrlsgMem     <= '0;
            ALUcontrolMem <= '0;
        end
        else begin
            ctrlsgMem[0]  <= PCsrcD;
            ctrlsgMem[1]  <= RegwriteD;
            ctrlsgMem[2]  <= MemtoRegD;
            ctrlsgMem[3]  <= MemwriteD;
            ctrlsgMem[4]  <= BranchD;
            ctrlsgMem[5]  <= ALUsrcD;
            ctrlsgMem[6]  <= FlagwriteD;
            FlagMem       <= FlagsD;
            ALUcontrolMem <= ALUcontrolD;

            dataMem[0]    <= RD1D;
            dataMem[1]    <= RD2D;
            dataMem[2]    <= ExtImmD;
            WAMem[0]      <= RA1D;
            WAMem[1]      <= RA2D;
            WAMem[2]      <= WA3D;
            condMem       <= CondD;
            PCMem         <= PCPlus8D;
        end
    end

    // asynchronous read
    assign PCsrcE      = ctrlsgMem[0];
    assign RegwriteE   = ctrlsgMem[1];
    assign MemtoRegE   = ctrlsgMem[2];
    assign MemwriteE   = ctrlsgMem[3];
    assign BranchE     = ctrlsgMem[4];
    assign ALUsrcE     = ctrlsgMem[5];
    assign FlagwriteE  = ctrlsgMem[6];
    assign FlagsE      = FlagMem;
    assign CondE       = condMem;
    assign ALUcontrolE = ALUcontrolMem;
    assign RA1E        = WAMem[0];
    assign RA2E        = WAMem[1];
    assign WA3E        = WAMem[2];
    assign RD1E        = dataMem[0];
    assign RD2E        = dataMem[1];
    assign ExtImmE     = dataMem[2];
    assign PCPlus8E    = PCMem;

endmodule

```

### 4) ExcReg.sv

```

// Junchao Zhou, Chenhan Dai
// 05/05/2023
// EE481
// Lab #3

// ExcReg is the pipeline register between the Execute stage and the Memory stage.
// It updates each output with the corresponding input value, synchronized to the clock.

// Input:
// clk - a clock signal used for synchronization.
// rst - a reset signal used to reset the state of the module.
// PCSrc - a signal that determines the source of the next program counter value.
// Regwrite - a signal that enables writing to the register file.
// MemtoReg - a signal that determines whether the data from memory is written to the register file.
// Memwrite - a signal that enables writing to memory.
// ALUResultE - 32-bit result from the ALU
// WriteDataE - 32-bit register value that is the first source of SrcB
// WA3E - 4-bit write-back address in the Execute stage

// Output:
// PCSrcM - a signal that determines the source of the next program counter value in the Memory stage.
// RegwriteM - a signal that enables writing to the register file in the Memory stage.
// MemtoRegM - a signal that determines whether the data from memory is written to the register file in the Memory stage.
// MemwriteM - a signal that enables writing to memory in the Memory stage.
// ALUResultM - 32-bit result from the ALU in the Memory stage
// WriteDataM - 32-bit register value that is the first source of SrcB, in the Memory stage
// WA3M - 4-bit write-back address in the Memory stage

module ExcReg(input  logic        clk,
              input  logic        rst,
              input  logic        PCSrc,
              input  logic        Regwrite,
              input  logic        MemtoReg,
              input  logic        Memwrite,
              input  logic [31:0] ALUResultE,
              input  logic [31:0] WriteDataE,
              input  logic [3:0]  WA3E,
              output logic        PCSrcM,
              output logic        RegwriteM,
              output logic        MemtoRegM,
              output logic        MemwriteM,
              output logic [31:0] ALUResultM,
              output logic [31:0] WriteDataM,
              output logic [3:0]  WA3M);

    // storage for the registered values
    logic [3:0]  memory;
    logic [31:0] ALUMem;
    logic [31:0] WriteDataMem;
    logic [3:0]  WAMem;

    // write port: reset clears the control signals
    always_ff @(posedge clk) begin
        if (rst) begin
            memory <= '0;
        end
        else begin
            memory[0] <= PCSrc;
            memory[1] <= Regwrite;
            memory[2] <= MemtoReg;
            memory[3] <= Memwrite;

            ALUMem       <= ALUResultE;
            WriteDataMem <= WriteDataE;
            WAMem        <= WA3E;
        end
    end

    // asynchronous read
    assign PCSrcM     = memory[0];
    assign RegwriteM  = memory[1];
    assign MemtoRegM  = memory[2];
    assign MemwriteM  = memory[3];
    assign ALUResultM = ALUMem;
    assign WriteDataM = WriteDataMem;
    assign WA3M       = WAMem;

endmodule

```

## 5) MemReg.sv

```

//Junchao Zhou, Chenhan Dai
//05/05/2023
//EE469
//Lab #3

// Pipeline register between the Memory stage and the Writeback stage

// Input:
// clk - a clock signal used for synchronization.
// rst - a reset signal used to reset the state of the module.
// PCSrcM - a signal that determines the source of the next program counter value.
// RegwriteM - a signal that enables writing to the register file.
// MemtoRegM - a signal that determines whether the data from memory is written to the register file.
// ReaddataM - 32-bit data read from memory
// ALUResultM - 32-bit ALU result in the Memory stage
// WA3M - 4-bit write-back address in the Memory stage

// Output:
// PCSrcW - a signal that determines the source of the next program counter value.
// RegwriteW - a signal that enables writing to the register file.
// MemtoRegW - a signal that determines whether the data from memory is written to the register file.
// ReaddataW - 32-bit data read from memory in the Writeback stage
// ALUResultW - 32-bit ALU result in the Writeback stage
// WA3W - 4-bit write-back address in the Writeback stage

module MemReg(input logic clk,
              input logic rst,
              input logic PCSrcM,
              input logic RegwriteM,
              input logic MemtoRegM,
              input logic [31:0] ReaddataM,
              input logic [31:0] ALUResultM,
              input logic [3:0] WA3M,
              output logic PCSrcW,
              output logic RegwriteW,
              output logic MemtoRegW,
              output logic [31:0] ReaddataW,
              output logic [31:0] ALUResultW,
              output logic [3:0] WA3W);

    // memory:
    logic [2:0] memory;          // registered control signals
    logic [1:0][31:0] dataMem;
    logic [3:0] WAMem;

    // write port
    always_ff @(posedge clk) begin
        if (rst) begin
            // synchronous reset clears only the control signals
            memory <= 'b0;
        end
        else begin
            memory[0] <= PCSrcM;
            memory[1] <= RegwriteM;
            memory[2] <= MemtoRegM;
        end
        // data and write address are registered on every clock edge
        dataMem[0] <= ReaddataM;
        dataMem[1] <= ALUResultM;
        WAMem <= WA3M;
    end


    // asynchronous read
    assign PCSrcW = memory[0];
    assign RegwriteW = memory[1];
    assign MemtoRegW = memory[2];
    assign ReaddataW = dataMem[0];
    assign ALUResultW = dataMem[1];
    assign WA3W = WAMem;

endmodule

```

## 6) memfile3.dat

```

// ADD R - 1110_000_0100_0_AAAA_DDDD_0000_0000_BBBB
// ADD I - 1110_001_0100_0_AAAA_DDDD_0000_IIII_IIII
// SUB R - 1110_000_0010_0_AAAA_DDDD_0000_0000_BBBB
// SUB I - 1110_001_0010_0_AAAA_DDDD_0000_IIII_IIII
// CMP R - 1110_000_0010_1_AAAA_DDDD_0000_0000_BBBB
// CMP I - 1110_001_0010_1_AAAA_DDDD_0000_IIII_IIII
// AND - 1110_000_0000_0_AAAA_DDDD_0000_0000_BBBB
// ORR - 1110_000_1100_0_AAAA_DDDD_0000_0000_BBBB
// LDR - 1110_010_1100_1_AAAA_DDDD_IIII_IIII_IIII
// STR - 1110_010_1100_0_AAAA_DDDD_IIII_IIII_IIII
// B     - COND_1010_IIII_IIII_IIII_IIII_IIII_IIII

// Equal            - COND = 0000
// Not Equal        - COND = 0001
// Greater or Equal - COND = 1010
// Greater          - COND = 1100
// Less or Equal    - COND = 1101
// Less             - COND = 1011

```
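As a cross-check on the encoding key above, the field layouts can be assembled mechanically. The following is a minimal Python sketch, not part of the lab's SystemVerilog sources; the helper names `encode_dp`, `encode_mem`, and `encode_branch` are hypothetical, and the rotate field of immediates is assumed to stay at `0000` as in the key.

```python
# Hypothetical helper, not part of the lab sources: assembles 32-bit words
# using the field layouts listed in the memfile3.dat comment key.

COND_AL = 0b1110  # "always" condition used by every non-branch entry in the key
CMD = {"AND": 0b0000, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}


def _check(value, bits, name):
    """Reject field values that do not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_dp(op, rd, rn, src2, imm=False, set_flags=False):
    """Data-processing word: 1110_00I_cmd_S_Rn_Rd_Src2 (rotate field left at 0)."""
    _check(rd, 4, "rd")
    _check(rn, 4, "rn")
    _check(src2, 8 if imm else 4, "src2")  # 8-bit immediate or 4-bit Rm
    word = (COND_AL << 28) | (int(imm) << 25) | (CMD[op] << 21)
    word |= (int(set_flags) << 20) | (rn << 16) | (rd << 12) | src2
    return format(word, "032b")


def encode_mem(load, rd, rn, imm12):
    """LDR/STR word: 1110_010_1100_L_Rn_Rd_imm12 (L=1 for LDR, L=0 for STR)."""
    _check(rd, 4, "rd")
    _check(rn, 4, "rn")
    _check(imm12, 12, "imm12")
    word = (COND_AL << 28) | (0b010 << 25) | (0b1100 << 21) | (int(load) << 20)
    word |= (rn << 16) | (rd << 12) | imm12
    return format(word, "032b")


def encode_branch(cond, imm24):
    """Branch word: COND_1010_imm24."""
    _check(cond, 4, "cond")
    _check(imm24, 24, "imm24")
    word = (cond << 28) | (0b1010 << 24) | imm24
    return format(word, "032b")


if __name__ == "__main__":
    print(encode_dp("ADD", rd=1, rn=0, src2=1, imm=True))  # ADD R1 R0 #1
    print(encode_mem(load=False, rd=9, rn=0, imm12=9))     # STR R9 [R0, #9]
    print(encode_mem(load=True, rd=3, rn=0, imm12=9))      # LDR R3 [R0, #9]
    print(encode_branch(cond=0b0000, imm24=1))             # BEQ, offset field 1
```

For `ADD R1 R0 #1` the sketch returns `11100010100000000001000000000001`. Every returned string is exactly 32 characters long, so a word of any other length in the program listing points to a transcription error.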

| Instruction word (binary)          | Label   | Assembly        |
|------------------------------------|---------|-----------------|
| 11100000010011110000000000001111   | // MAIN | SUB R0 R15 R15  |
| 11100010100000000001000000000001   | //      | ADD R1 R0 #1    |
| 11100001100000000001000000000001   | //      | ORR R2 R0 R1    |
| 11100010100000000010000000000010   | //      | ADD R2 R0 #2    |
| 11100010010100100000000000000000   | //      | SUBS R0 R2 #0   |
| 00001010000000000000000000000001   | //      | BEQ TAG1        |
| 11100000000000100010000000000000   | //      | AND R2 R2 R0    |
| 11100000000000010000100000000000   | //      | AND R1 R2 R0    |
| 11100000100000011001000000000000   | // TAG1 | ADD R9 R1 R0    |
| 11100101100000001001000000001001   | //      | STR R9 [R0, #9] |
| 11100101100100000011000000001001   | //      | LDR R3 [R0, #9] |
| 11100000000000011001000000000010   | //      | AND R2 R3 R2    |