

# Design and Implementation of a MIPS-Inspired Dual-Core Processor System: Hardware Architecture and FreeRTOS Port

Team: Custom Processor Development (PESHWAS Pool)

September 2, 2025

## **Abstract**

This report documents the design, implementation, and verification of a complete MIPS-inspired dual-core processor system.

The first part details the hardware architecture, focusing on the memory arbitration layer, a shared L1 cache with a write-through policy, and a single-write-port RAM interface. The architecture supports dual-port reads and serialised writes through a multiplexer controlled by a round-robin arbiter. This section includes detailed signal mappings, finite-state machine descriptions, flowcharts, timing diagrams, pseudocode suitable for direct porting into Simulink / MATLAB environments, and an analysis of the Simulink implementation.

The second part describes the software implementation, specifically the porting of the FreeRTOS real-time operating system to the custom dual-core hardware. It covers crucial components such as the scheduler, context switcher, and the Board Support Package (BSP). It also details the verification strategy, which involved integrating and testing the FreeRTOS port on an x86 (Windows) simulation environment to validate functionality and performance before deployment on the target MIPS-like hardware.

# Contents

|                                                             |           |
|-------------------------------------------------------------|-----------|
| <b>I Hardware Design and Implementation</b>                 | <b>3</b>  |
| <b>1 Introduction</b>                                       | <b>4</b>  |
| 1.1 Motivation . . . . .                                    | 4         |
| 1.2 Scope of this document . . . . .                        | 4         |
| <b>2 Background: MIPS-Inspired Processor Fundamentals</b>   | <b>5</b>  |
| 2.1 Overview of the MIPS-Inspired ISA . . . . .             | 5         |
| 2.2 Register File and PC semantics in this design . . . . . | 5         |
| 2.3 Pipeline and dataflow (short) . . . . .                 | 5         |
| <b>3 Top-Level System Architecture</b>                      | <b>6</b>  |
| 3.1 Top-level block diagram . . . . .                       | 6         |
| <b>4 Signal Interface and Pin Mapping</b>                   | <b>7</b>  |
| 4.1 Core $\leftrightarrow$ Arbiter signals . . . . .        | 7         |
| 4.2 Arbiter $\leftrightarrow$ Cache signals . . . . .       | 7         |
| 4.3 Cache $\leftrightarrow$ RAM signals . . . . .           | 8         |
| <b>5 Arbiter Design and FSM</b>                             | <b>9</b>  |
| 5.1 Design goals and constraints . . . . .                  | 9         |
| 5.2 State machine . . . . .                                 | 9         |
| 5.3 Detailed Arbiter pseudocode (Matlab-like) . . . . .     | 9         |
| 5.4 Discussion of corner cases . . . . .                    | 11        |
| <b>6 Cache Design</b>                                       | <b>13</b> |
| 6.1 Properties . . . . .                                    | 13        |
| 6.2 Cache interfaces (explicit) . . . . .                   | 13        |
| 6.3 Cache pseudocode (internal logic) . . . . .             | 13        |
| <b>7 RAM Model</b>                                          | <b>15</b> |
| 7.1 Properties and constraints . . . . .                    | 15        |
| 7.2 Timing constraints . . . . .                            | 15        |
| <b>8 Integration with Core Pipeline (detailed)</b>          | <b>16</b> |
| 8.1 How cores use grant signals . . . . .                   | 16        |
| 8.2 Register file reads/writes . . . . .                    | 16        |
| 8.3 Control Decode block (in-depth) . . . . .               | 16        |
| <b>9 Flowcharts and Timing Diagrams</b>                     | <b>17</b> |
| 9.1 Arbiter decision flowchart . . . . .                    | 17        |
| <b>10 Detailed wiring and Simulink mapping</b>              | <b>19</b> |

|                                                              |           |
|--------------------------------------------------------------|-----------|
| <b>11 Verification and Testing Strategy</b>                  | <b>21</b> |
| 11.1 Unit tests . . . . .                                    | 21        |
| 11.2 Integration tests . . . . .                             | 21        |
| <b>12 Analysis of Provided Simulink Screenshots</b>          | <b>22</b> |
| 12.1 Top-level screenshot (Figure ??) . . . . .              | 22        |
| 12.2 Core datapath screenshots (Figures ?? and ??) . . . . . | 22        |
| 12.3 Cache screenshot (Figure ??) . . . . .                  | 22        |
| <b>13 Hardware Conclusion and Future Work</b>                | <b>23</b> |
| 13.1 Summary . . . . .                                       | 23        |
| 13.2 Immediate next steps . . . . .                          | 23        |
| 13.3 Future improvements . . . . .                           | 23        |
| <b>II Software Implementation: FreeRTOS Port</b>             | <b>24</b> |
| <b>14 Introduction to the FreeRTOS Port</b>                  | <b>25</b> |
| <b>15 System Overview from a Software Perspective</b>        | <b>26</b> |
| 15.1 Dual-Core MIPS like Architecture . . . . .              | 26        |
| 15.2 FreeRTOS on MIPS like . . . . .                         | 26        |
| <b>16 Implementation of Key Software Components</b>          | <b>27</b> |
| 16.1 Scheduler . . . . .                                     | 27        |
| 16.2 Context Switcher . . . . .                              | 28        |
| 16.3 Board Support Package (BSP) . . . . .                   | 29        |
| 16.4 Synchronization Mechanisms . . . . .                    | 30        |
| <b>17 FreeRTOS Port-Specific Definitions</b>                 | <b>31</b> |
| <b>18 Verification via x86 Simulation</b>                    | <b>32</b> |
| 18.1 Purpose of Integration . . . . .                        | 32        |
| 18.2 Changes to FreeRTOS for x86 . . . . .                   | 32        |
| 18.3 Simulation Code . . . . .                               | 33        |
| 18.4 Testing and Results . . . . .                           | 33        |
| 18.4.1 Benchmark Task Scenario . . . . .                     | 34        |
| <b>19 Software Implementation Conclusion</b>                 | <b>36</b> |

# **Part I**

## **Hardware Design and Implementation**

# Chapter 1

## Introduction

### 1.1 Motivation

Modern multi-core processors must coordinate access to shared memory resources to guarantee correctness and performance. In many embedded and educational implementations, a shared L1 cache with a controlled interface to main memory is used to balance throughput and complexity. This project builds a MIPS-inspired dual-core processor in a Simulink environment with the following design goals:

- Allow simultaneous read accesses to the shared L1 cache (dual-port reads).
- Serialize write accesses using a single write port (single write address and data line).
- Implement a round-robin arbiter that grants access fairly and handles cache misses by coordinating cache-to-RAM transfers.
- Use a write-through cache policy to maintain consistency between cache and RAM.
- Provide clear, testable signals for use with pipeline stall logic inside each core.

### 1.2 Scope of this document

This document describes the architecture, signal-level connections, arbiter FSM, cache behaviour, RAM model, and pipeline integration for a MIPS-inspired instruction set (R, I and J types). It also analyses the Simulink screenshots provided by the implementers and supplies detailed pseudocode and diagrams to make this design replicable and auditable.

# Chapter 2

## Background: MIPS-Inspired Processor Fundamentals

### 2.1 Overview of the MIPS-Inspired ISA

The implemented system follows a classic R/I/J instruction grouping (the processor is MIPS-inspired). Briefly:

- **R-type** (register): three-register format (opcode=0, func field determines operation). Operands: **rs**, **rt** read from register file; result written to **rd**.
- **I-type** (immediate): one register and an immediate, used for loads/stores/branches/ALU immediate instructions. Operands: **rs** and immediate; result often written to **rt**.
- **J-type** (jump): target address for unconditional jump.

### 2.2 Register File and PC semantics in this design

- The register file is a synchronous write, asynchronous read (or implemented as two read ports and one write port). Ports are labeled **rs**, **rt** (read ports) and **rd/rt** (write port depending on instruction).
- Program Counter (PC) is updated each instruction: typically  $\text{PC} := \text{PC} + 4$  except when a taken branch or jump modifies it. Pipeline hazards are handled by stall/insertion logic using **grant** signals from the arbiter to freeze IF/ID/ID/EX as required.
- In the implemented cores, the memory (load/store) stage drives **mem\_read** and **mem\_write** signals to request data access; these are in turn mediated by the arbiter and cache.

### 2.3 Pipeline and dataflow (short)

Each core implements a simple 5-stage pipeline (IF, ID, EX, MEM, WB) with registers between stages. The MEM stage uses the arbiter's grant; when **grant=0** for that core, the MEM stage is stalled (pipeline registers not advanced) until grant asserted.

# Chapter 3

## Top-Level System Architecture

### 3.1 Top-level block diagram

Figure ?? shows the canonical top-level architecture implemented in Simulink: two cores (Core0, Core1), the central Arbiter, Shared L1 Cache, and RAM system. The cache holds two read ports and a single write port. Writes are funnelled using an arbiter-controlled MUX.

# Chapter 4

## Signal Interface and Pin Mapping

This section provides an explicit, exhaustive signal map to be used when wiring Simulink blocks or converting to HDL.

### 4.1 Core $\leftrightarrow$ Arbiter signals

| Signal               | Semantics (direction)                                                                            |
|----------------------|--------------------------------------------------------------------------------------------------|
| <code>req_i</code>   | Request enable: asserted by core when MEM stage needs memory access (Core $\rightarrow$ Arbiter) |
| <code>rw_i</code>    | Read/Write indicator: 0 = read, 1 = write (Core $\rightarrow$ Arbiter)                           |
| <code>addr_i</code>  | Address bus (32-bit) (Core $\rightarrow$ Arbiter)                                                |
| <code>wdata_i</code> | Write data bus (32-bit) (Core $\rightarrow$ Arbiter)                                             |
| <code>grant_i</code> | Grant / stall signal (Arbiter $\rightarrow$ Core): 1 = core allowed to proceed, 0 = stall        |
| <code>rdata_i</code> | Returned read data (32-bit) (Arbiter $\rightarrow$ Core)                                         |
| <code>stall_i</code> | Optional explicit stall (derived as <code>grant_i</code> ) (Arbiter $\rightarrow$ Core)          |

### 4.2 Arbiter $\leftrightarrow$ Cache signals

| Signal                          | Semantics (direction)                                                                             |
|---------------------------------|---------------------------------------------------------------------------------------------------|
| <code>read_addr_0/1</code>      | Read addresses forwarded to cache (Arbiter $\rightarrow$ Cache)                                   |
| <code>read_enable_0/1</code>    | Read enables for cache ports (Arbiter $\rightarrow$ Cache)                                        |
| <code>cache_rdata_0/1</code>    | Read data returned from cache (Cache $\rightarrow$ Arbiter)                                       |
| <code>cache_hit_0/1</code>      | Cache hit indicators (Cache $\rightarrow$ Arbiter)                                                |
| <code>write_addr</code>         | Single write address after MUX selection (Arbiter/MUX $\rightarrow$ Cache)                        |
| <code>write_data</code>         | Single write data after MUX selection (Arbiter/MUX $\rightarrow$ Cache)                           |
| <code>cache_write_enable</code> | When 1, perform a write into cache and trigger write-through to RAM (Arbiter $\rightarrow$ Cache) |
| <code>cache_write_ack</code>    | Cache acknowledges it stored the data (Cache $\rightarrow$ Arbiter)                               |

| Signal                       | Semantics (direction)                                           |
|------------------------------|-----------------------------------------------------------------|
| <code>cache_fill_done</code> | Cache finished filling a miss entry from RAM<br>(Cache→Arbiter) |

### 4.3 Cache $\leftrightarrow$ RAM signals

| Signal                        | Semantics (direction)                       |
|-------------------------------|---------------------------------------------|
| <code>ram_read_addr</code>    | Address to read from RAM (Cache→RAM)        |
| <code>ram_read_enable</code>  | When 1 start RAM read operation (Cache→RAM) |
| <code>ram_read_data</code>    | Data read from RAM (RAM→Cache)              |
| <code>ram_read_done</code>    | RAM indicates read data valid (RAM→Cache)   |
| <code>ram_write_addr</code>   | Address to write into RAM (Cache→RAM)       |
| <code>ram_write_data</code>   | Write data for RAM (Cache→RAM)              |
| <code>ram_write_enable</code> | Single write enable for RAM (Cache→RAM)     |
| <code>ram_write_ack</code>    | RAM acknowledges write done (RAM→Cache)     |

# Chapter 5

## Arbiter Design and FSM

### 5.1 Design goals and constraints

Key requirements:

- Fairness via round-robin: avoid starvation.
- Allow concurrent reads (both cores read simultaneously).
- Serialize writes (only one core at a time may write).
- On cache miss, stall requesting core, trigger cache-to-RAM read, and un-stall when fill done.
- For writes, use a single write address/data via a MUX controlled by arbiter grants and write enables.

### 5.2 State machine

Define the arbiter as a small FSM. Minimal states:

- **IDLE**: no pending requests; monitor requests.
- **SERVE\_READS**: both read requests may be served; hits handled immediately; misses route to **WAIT\_RAM\_READ**.
- **WAIT\_RAM\_READ**: waiting for RAM to return data for a cache miss.
- **SERVE\_WRITE**: grant a single core write (via MUX), assert `cache_write_enable`.
- **WAIT\_RAM\_WRITE\_ACK**: waiting for RAM write ack for write-through.

### 5.3 Detailed Arbiter pseudocode (Matlab-like)

Listing 5.1: Matlab-like arbiter skeleton (drop-in for MATLAB Function block).

```
1 function [grant_0, grant_1, rdata_0, rdata_1, read_addr_0, read_addr_1,
2     ...
3         read_enable_0, read_enable_1, write_addr, write_data,
4             cache_write_enable, write_select] ...
5 = arbiter_tick(req_0, req_1, rw_0, rw_1, addr_0, addr_1,
6                 wdata_0, wdata_1, ...
7                     cache_hit_0, cache_hit_1, cache_rdata_0,
8                         cache_rdata_1, ...
9                             cache_fill_done, cache_write_ack)
10 % persistent state
11 persistent state last_served
12 if isempty(state); state = 0; last_served = 0; end
13 IDLE=0; SERVE_READS=1; WAIT_RAM_READ=2; SERVE_WRITE=3; WAIT_RAM_WRITE
14     =4;
```

```

10
11 % defaults
12 grant_0=0; grant_1=0;
13 rdata_0=0; rdata_1=0;
14 read_enable_0=0; read_enable_1=0;
15 read_addr_0=0; read_addr_1=0;
16 write_addr=0; write_data=0; cache_write_enable=0; write_select=0;
17
18 switch state
19 case IDLE
20   if req_0 && ~req_1
21     if rw_0==0, state=SERVE_READS; else state=SERVE_WRITE; end
22   elseif req_1 && ~req_0
23     if rw_1==0, state=SERVE_READS; else state=SERVE_WRITE; end
24   elseif req_0 && req_1
25     % concurrent requests: prefer last_served^1
26     if last_served==0, % serve core1 first
27       if rw_1==0, state=SERVE_READS; else state=SERVE_WRITE; end
28     else
29       if rw_0==0, state=SERVE_READS; else state=SERVE_WRITE; end
30     end
31   end
32
33 case SERVE_READS
34   read_enable_0 = req_0 && (rw_0==0);
35   read_enable_1 = req_1 && (rw_1==0);
36   read_addr_0 = addr_0; read_addr_1 = addr_1;
37   % observe cache hits
38   if read_enable_0 && cache_hit_0
39     grant_0 = 1; rdata_0 = cache_rdata_0;
40   end
41   if read_enable_1 && cache_hit_1
42     grant_1 = 1; rdata_1 = cache_rdata_1;
43   end
44   % handle misses
45   if (read_enable_0 && ~cache_hit_0) || (read_enable_1 && ~
46     cache_hit_1)
47     % choose which miss to service first (round-robin)
48     if read_enable_0 && ~cache_hit_0 && read_enable_1 && ~cache_hit_1
49       if last_served==0, miss_core=1; else miss_core=0; end
50     elseif read_enable_0 && ~cache_hit_0, miss_core=0;
51     else miss_core=1;
52     end
53     % stall that core and trigger cache->RAM read via cache interface
54     if miss_core==0, % service core0
55       grant_0 = 0; grant_1 = cache_hit_1;
56       % instruct cache to fetch (cache will assert ram_read_enable)
57       state = WAIT_RAM_READ;
58     else
59       grant_1 = 0; grant_0 = cache_hit_0;
60       state = WAIT_RAM_READ;
61     end
62   else
63     % all hits covered; clear state & update last_served
64     state = IDLE;
65     if grant_0, last_served = 0; end
66     if grant_1, last_served = 1; end
67   end

```

```

67
68 case WAIT_RAM_READ
69     % wait for cache_fill_done to be asserted by cache after RAM
70     % completes read
71     if cache_fill_done
72         % assume cache has updated cache_rdata_* accordingly
73         if req_0 && ~cache_hit_0, grant_0 = 1; rdata_0 = cache_rdata_0;
74             last_served=0; end
75         if req_1 && ~cache_hit_1, grant_1 = 1; rdata_1 = cache_rdata_1;
76             last_served=1; end
77         state = IDLE;
78     end
79
80
81 case SERVE_WRITE
82     % choose which writer to grant using round-robin and requests
83     if req_0 && ~req_1
84         write_select = 0;
85     elseif req_1 && ~req_0
86         write_select = 1;
87     else % both
88         write_select = last_served; % give other core priority
89     end
90     if write_select==0
91         write_addr = addr_0; write_data = wdata_0;
92         grant_0 = 1; grant_1 = 0;
93     else
94         write_addr = addr_1; write_data = wdata_1;
95         grant_1 = 1; grant_0 = 0;
96     end
97     cache_write_enable = 1;
98     state = WAIT_RAM_WRITE;
99
100
101 case WAIT_RAM_WRITE
102     % wait for cache_write_ack and ram_write_ack (cache will handle RAM
103     % handshake)
104     if cache_write_ack
105         cache_write_enable = 0;
106         % acknowledge to core and update last_served
107         if grant_0, last_served = 0; end
108         if grant_1, last_served = 1; end
109         state = IDLE;
110     end
111 end
112 end

```

## 5.4 Discussion of corner cases

- **Simultaneous write attempts:** write-select MUX picks one based on round-robin; the other core sees grant=0 and must retry the request.
- **Read while write in progress to same address:** arbiter must conservatively stall read if write is pending for same address or if cache coherence risk exists. For simplicity, the implemented arbiter stalls the read (deny grant) until write ack completes if addresses match.
- **Cache fill ordering:** If two misses occur to the same address, ensure single RAM fetch and update both cores appropriately.



Figure 5.1: Arbiter FSM (high-level).

# Chapter 6

## Cache Design

### 6.1 Properties

- Shared L1 cache with dual-read ports and single write port.
- Write-through policy: writes are written to cache and simultaneously propagated to RAM.
- On a read miss, cache issues a RAM read and returns data on fill completion.

### 6.2 Cache interfaces (explicit)

Cache internal ports are consistent with the earlier signal table. Key behaviour:

- On `read_enable_i`, if tag matches, assert `cache_hit_i` and present `cache_rdata_i`.
- If miss: stall core (via arbiter) until `ram_read_done` returns and cache fills the line, then assert `cache_fill_done`.
- On `cache_write_enable`, write data to cache; forward same write to RAM by asserting `ram_write_enable` and returning `cache_write_ack` after both cache store and RAM acknowledge (or after cache store and RAM queued write).

### 6.3 Cache pseudocode (internal logic)

Listing 6.1: Cache pseudo-behaviour (high-level)

```
1 % Inputs: read_addr_0/1, read_enable_0/1, write_addr, write_data,
2   write_enable
3 % RAM interface: ram_read_addr, ram_read_enable, ram_read_data,
4   ram_read_done,
5   ram_write_addr, ram_write_data, ram_write_enable,
6   ram_write_ack
7 % Outputs: cache_rdata_0/1, cache_hit_0/1, cache_fill_done,
8   cache_write_ack
9 for each cycle:
10   % Read ports
11   if read_enable_0
12     if lookup_tag(read_addr_0) == HIT
13       cache_rdata_0 = read_cache_line(read_addr_0)
14       cache_hit_0 = 1
15     else
16       cache_hit_0 = 0
17       % start RAM read if not already queued
18       ram_read_addr = read_addr_0; ram_read_enable = 1
19     end
20   end
```

```
17 % Similar for port 1
18 % Write-through handling
19 if write_enable
20   write_cache_line(write_addr, write_data)
21   % forward to RAM
22   ram_write_addr = write_addr; ram_write_data = write_data
23   ram_write_enable = 1
24 end
25 % RAM response handling
26 if ram_read_done
27   write_cache_line(ram_read_addr, ram_read_data)
28   cache_fill_done = 1
29 end
30 if ram_write_ack
31   cache_write_ack = 1
32 end
33 end
```

# Chapter 7

## RAM Model

### 7.1 Properties and constraints

- The RAM model supports dual independent read ports (or a multi-port read emulation) and a single write port (serialized).
- Read operations take several cycles (simulated latency); the RAM asserts a handshake signal (`ram_read_done`) when data is valid.
- Write operations also take multiple cycles and return an acknowledge (`ram_write_ack`).

### 7.2 Timing constraints

Define constants:

- `RAM_READ_LATENCY = N cycles`
- `RAM_WRITE_LATENCY = M cycles`

These constants determine how long arbiter waits in `WAIT_RAM_READ`/`WAIT_RAM_WRITE` states.

# Chapter 8

## Integration with Core Pipeline (detailed)

### 8.1 How cores use grant signals

- The MEM stage asserts `req` and `rw`; it then waits for `grant` to proceed.
- If `grant==1` and read, core receives `rdata` from arbiter to forward to WB stage.
- If `grant==0`, the core stalls (pipeline registers freeze), effectively pausing the whole pipeline behind MEM stage.

### 8.2 Register file reads/writes

For R-type:

1. Decode stage: identify rs, rt, rd.
2. IF/ID → ID/EX: register file read occurs in ID stage (rs and rt read) and values propagated to EX.
3. EX stage: ALU uses rs and rt as specified by instruction.
4. WB stage: result from ALU is written back to rd.

For I-type (load/store/addi,...):

- Load: EX computes effective address using rs + sign-extended immediate. MEM stage requests a read through arbiter; upon completion, the data is written back to rt.
- Store: EX computes address; MEM asserts write request and writes data (rt) through arbiter/cache.

### 8.3 Control Decode block (in-depth)

The control decode block maps instruction opcodes and function fields into control signals that power datapath multiplexers and functional units. Typical control outputs:

- `alu_op`, `alu_src_b_imm`, `reg_write`, `mem_read`, `mem_write`, `mem_to_reg`, `branch`, `jump`, `link`

Detailed responsibilities:

1. Decode opcode and func to determine ALU operation (e.g., add, sub, and, or, slt).
2. Generate `reg_write` for register writes in the WB stage.
3. Generate `mem_read` and `mem_write` signals used by the MEM stage and forwarded to arbiter.
4. Provide control to multiplexers for write-back (ALU result vs memory vs PC+4).
5. Provide branch type signals and compute branch decision combining ALU zero and branch type.

# Chapter 9

## Flowcharts and Timing Diagrams

### 9.1 Arbiter decision flowchart

(Flowchart rendered in TikZ below.)



Figure 9.1: Decision flow for read/write selection.

## Chapter 10

# Detailed wiring and Simulink mapping

This section repeats the canonical wiring list in a copy-paste friendly table (use these names in Simulink block ports precisely or map aliases):

| Signal             | Direction         | From → To (canonical)                     |
|--------------------|-------------------|-------------------------------------------|
| req_0              | Core→Arbiter      | Core0.req_i → Arbiter.req_0               |
| rw_0               | Core→Arbiter      | Core0.rw_i → Arbiter.rw_0                 |
| addr_0             | Core→Arbiter      | Core0.addr_i → Arbiter.addr_0             |
| wdata_0            | Core→Arbiter      | Core0.wdata_i → Arbiter.wdata_0           |
| grant_0            | Arbiter→Core      | Arbiter.grant_0 → Core0.grant             |
| rdata_0            | Arbiter→Core      | Arbiter.rdata_0 → Core0.rdata             |
| read_addr_0        | Arbiter→Cache     | Arbiter.read_addr_0 → Cache.C1_addr       |
| read_enable_0      | Arbiter→Cache     | Arbiter.read_enable_0 → Cache.C1_ren      |
| cache_hit_0        | Cache→Arbiter     | Cache.C1_hit → Arbiter.cache_hit_0        |
| cache_rdata_0      | Cache→Arbiter     | Cache.C1_out → Arbiter.cache_rdata_0      |
| write_addr         | MUX/Arbiter→Cache | MUX.out → Cache.write_addr                |
| write_data         | MUX/Arbiter→Cache | MUX.out_data → Cache.write_data           |
| cache_write_enable | Arbiter→Cache     | Arbiter.cache_write_enable → Cache.C_wen  |
| ram_read_addr      | Cache→RAM         | Cache.ram_read_addr → RAM.read_addr       |
| ram_read_enable    | Cache→RAM         | Cache.ram_read_enable → RAM.read_enable   |
| ram_read_data      | RAM→Cache         | RAM.read_data → Cache.ram_read_data       |
| ram_read_done      | RAM→Cache         | RAM.read_done → Cache.ram_read_done       |
| ram_write_addr     | Cache→RAM         | Cache.ram_write_addr → RAM.write_addr     |
| ram_write_data     | Cache→RAM         | Cache.ram_write_data → RAM.write_data     |
| ram_write_enable   | Cache→RAM         | Cache.ram_write_enable → RAM.write_enable |
| ram_write_ack      | RAM→Cache         | RAM.write_ack → Cache.ram_write_ack       |

Table 10.1: Canonical signal mapping table.

# Chapter 11

## Verification and Testing Strategy

### 11.1 Unit tests

- **Arbiter fairness:** apply long sequences of write requests from both cores; verify round-robin arbitration and absence of starvation.
- **Concurrent reads:** assert both cores read same/different addresses with cache hits; ensure both grants asserted and correct data returned in same cycle.
- **Write-read conflict:** Core0 writes while Core1 reads same address; ensure write serialized and read obtains updated value (write-through ensures RAM and cache updated as required).
- **Cache miss handling:** generate reads to uncached addresses and verify RAM read initiated, cache fill, and core un-stalled on fill.

### 11.2 Integration tests

- Run small compiled programs (assembly) that perform interleaved reads and writes across both cores, verify program outcome matches single-core reference.
- Pipeline hazard injection: create sequences that cause data hazards and verify pipeline stall behaviour when grant==0.

# Chapter 12

## Analysis of Provided Simulink Screenshots

### 12.1 Top-level screenshot (Figure ??)

- The screenshot shows Core1/ Core2 blocks, an Arbiter in center, and a Shared\_Cache block with clearly labelled ports: `C1_addr`, `C2_addr`, `C1_data`, `C2_data`, `C_wen`, and RAM ports.
- Observations: the arbiter already exports `grant1` and `grant2` and a `ramload` signal—consistent with the described FSM (ramload used to kick off RAM reads).

### 12.2 Core datapath screenshots (Figures ?? and ??)

- The core datapath contains IF ID EX MEM WB registers and a RegFile block with `rs` and `rt` read ports and `wdata` write port. The mem control block exports `memR`/`memW` signals that connect to `mem_arb` (arbiter interface).
- Observations: the internal control decode block exists and produces `memR` and `memW`. The mem stage already connects to `mem_arb` with signal names consistent with this report.

### 12.3 Cache screenshot (Figure ??)

- The shared cache block has output signals `c1_out`, `c2_out`, `stall1`, `stall2`, `ram_addr`, `ram_data`, `ram_wen`.
- This confirms the chosen interface mapping and shows the cache is implemented as a module that controls RAM handshake.

# Chapter 13

# Hardware Conclusion and Future Work

## 13.1 Summary

This document compiled a complete specification for a MIPS-inspired dual-core processor memory subsystem with a round-robin arbiter, a shared dual-port-read single-write L1 cache (write-through), and a single-write-port RAM. The report contains:

- Signal-level mapping and canonical port names.
- Arbiter FSM, pseudocode, and flowcharts.
- Cache internal behaviour, cacheRAM handshake, and timing diagrams.
- Analysis of Simulink screenshots and mapping to the documented interface.

## 13.2 Immediate next steps

1. Implement the arbiter FSM in a Stateflow chart or MATLAB Function block using the provided pseudocode.
2. Ensure `grant_i` is wired to core pipeline register enables to effect stalling.
3. Add explicit `cache_hit` outputs and `cache_fill_done` / `cache_write_ack` to the cache block in Simulink.
4. Build test benches for the scenarios in the verification section.

## 13.3 Future improvements

- Extend to MESI-like coherence if multiple caches later used.
- Support multi-write queuing with a small write buffer to reduce write stall time.
- Add profiling counters and performance counters to measure arbitration latency.

## Part II

# Software Implementation: FreeRTOS Port

## Chapter 14

# Introduction to the FreeRTOS Port

This part provides a detailed description of the implementation of FreeRTOS for a dual-core MIPS like architecture. We explain the crucial components such as the scheduler, context switcher, and Board Support Package (BSP) with complete code examples. In addition, we include our attempt to integrate FreeRTOS with the x86 architecture (Windows) to simulate and test the operating system's functionality and correctness before deploying it on the actual MIPS like hardware. We also present an example task scenario used for benchmarking the OS and analyze its performance.

# Chapter 15

## System Overview from a Software Perspective

### 15.1 Dual-Core MIPS like Architecture

Our system consists of a dual-core MIPS like processor, which is responsible for executing tasks. Each core operates independently, but they share memory and other system resources. The FreeRTOS-based kernel manages the execution of tasks by distributing them across both cores. Special care is taken to manage synchronization and context switching, ensuring that the two cores operate seamlessly in parallel without conflicts. The scheduler must ensure tasks are correctly distributed across the cores and that high-priority tasks on either core get CPU time as needed.

### 15.2 FreeRTOS on MIPS like

FreeRTOS was initially designed for single-core systems. For a dual-core system, we made modifications to the FreeRTOS kernel to ensure proper task management on both cores. Key changes were made to:

- **Scheduler Behavior:** Modified to account for multiple cores and allow simultaneous task execution on two processors.
- **Context Switching:** Extended to save and restore context on a per-core basis, so that tasks on different cores can be preempted and resumed independently.
- **Synchronization Mechanisms:** Enhanced (using mutexes and semaphores) to allow safe communication between tasks possibly running on different cores, preventing race conditions.

# Chapter 16

## Implementation of Key Software Components

### 16.1 Scheduler

The scheduler is responsible for managing the execution of tasks on both cores. Each task is assigned a priority, and the scheduler ensures that the highest-priority ready task is executed first on an available core. For our dual-core MIPS like architecture, the scheduler was modified to handle tasks across two cores while ensuring fair and efficient distribution.

When initializing the scheduler, we set up the data structures needed for two cores (such as ready queues for each core or a combined priority list with core affinity information). An idle task is created for each core to ensure the core is never idle without a task. The task creation function is also adapted for the dual-core context; tasks can be created and will be managed by the scheduler which may assign them to a core (either statically or dynamically based on load).

Below is the code for initializing the scheduler and creating tasks in our dual-core system:

```
1 // Initialize the FreeRTOS scheduler for dual-core MIPS like system
2 void scheduler_init(void) {
3     // Initialize task lists or structures for two cores
4     initCoreStructures(); // e.g., separate ready lists for each core
5     // Create idle task for each core to run when no other task is
6     // available
7     createIdleTask(0);
8     createIdleTask(1);
9     // (Additional initialization as needed)
10 }
11
12 // Task creation API for both cores
13 // In this simplified model, tasks are created without binding to a
14 // specific core.
15 // The scheduler will assign tasks to cores dynamically based on
16 // availability and priority.
17 void create_task(void (*task_function)(void *), const char *name,
18 uint32_t priority) {
19     // Use FreeRTOS API to create a task. The new task is added to the
20     // ready list.
21     BaseType_t result = xTaskCreate(task_function, name,
22         configMINIMAL_STACK_SIZE, NULL, priority, NULL);
23     if (result != pdPASS) {
24         // Handle task creation failure (e.g., out of memory)
25     }
26 }
```

```

22 // Start FreeRTOS scheduler for dual-core system
23 void start_scheduler(void) {
24     // Start the scheduler which will begin tick interrupts on both
25     // cores
26     // and allow tasks to start executing.
27     vTaskStartScheduler();
28     // vTaskStartScheduler will not return unless there is insufficient
     // heap.
}

```

In our implementation, the FreeRTOS kernel was extended so that when the scheduler runs, it can choose a task for each core. For example, if two high-priority tasks are ready, the scheduler will run one on core 0 and the other on core 1 simultaneously. If a single highest-priority task is ready, one core will run that task while the other core might run a lower-priority task or its idle task. This dual-core scheduling ensures the CPU resources are utilized effectively.

## 16.2 Context Switcher

The context switcher is a core part of FreeRTOS. It is responsible for saving the state (CPU registers, stack pointer, etc.) of a running task before switching to another task, and restoring the state of the task being resumed. In a dual-core system, context switching must occur independently on each core. Each core can perform a context switch when its scheduler decides to run a different task (for example, during a tick interrupt or when a high-priority task becomes ready).

Below is the code (simplified) for the context switching mechanism in our dual-core MIPS like FreeRTOS port. We use macros `portSAVE_CONTEXT()` and `portRESTORE_CONTEXT()` (which are implemented in assembly for MIPS like) to save and load task contexts. The variable `pxCurrentTCB` holds a pointer to the current task's control block (which includes the task's stack pointer), and `pxNextTCB` is a pointer to the next task chosen to run by the scheduler.

```

1 void vTaskSwitchContext(void) {
2     // Select the highest priority task that is ready to run
3     TCB_t *pxNextTCB = getHighestPriorityTask();
4
5     // If the next task is different from the current task, perform a
     // switch
6     if (pxCurrentTCB != pxNextTCB) {
7         // Save context of current task (push registers onto its stack,
         // save stack pointer in TCB)
8         portSAVE_CONTEXT();
9
10        // Update the current task pointer to the new task
11        pxCurrentTCB = pxNextTCB;
12
13        // Restore the context of the new task (pop registers from its
         // stack, set up stack pointer, etc.)
14        portRESTORE_CONTEXT();
15        // After this macro, the CPU registers have been loaded for the
         // new task, and it will resume execution.
16    }
17}

```

Each core will perform this context switch routine independently. For instance, when core 0's tick interrupt fires, it may switch tasks on core 0 if needed, while core 1 can continue running its task (or also perform a switch if a tick on core 1 causes a higher priority task to be ready on that core).

### 16.3 Board Support Package (BSP)

The BSP is responsible for low-level hardware initialization, such as setting up interrupt vectors, timers, clocks, and memory, as well as booting secondary cores in a multi-core system. For the dual-core MIPS like architecture, we extended the BSP to initialize both cores and ensure the system is set up properly before the scheduler starts.

Key responsibilities of the BSP initialization include:

- Setting the CPU clock frequency and configuring the system timer (SysTick) for periodic tick interrupts on each core.
- Initializing the interrupt controller and defining the interrupt service routines (ISRs) for timer interrupts and software interrupts (used for yields or inter-core interrupts).
- Releasing the second core from reset and starting its execution. Often, the primary core (core 0) brings up core 1 by setting its start address or using a special inter-processor interrupt.
- Initializing hardware peripherals (like UART for debugging, GPIOs, etc.) and setting up any required memory regions or cache configuration.

Below is a simplified code snippet for the BSP initialization on the dual-core MIPS like system:

```
1 void bsp_init(void) {
2     // Initialize system clock and timer
3     initSystemClock();
4     initSysTickTimer(); // Configure SysTick timer for periodic
5         // interrupts
6
7     // Set up the interrupt vector table and handlers for both cores
8     initInterruptController();
9     registerInterruptHandler(SYS_TICK_INT, SysTick_Handler);
10    registerInterruptHandler(SOFTWARE_INT, Yield_Handler);
11
12    // Boot up the second core (core 1) if not already running
13    startSecondaryCore(); // e.g., release core 1 from reset and set
14        // its start address
15
16    // Initialize memory management unit (if applicable) and caches
17    initMemorySubsystem();
18
19    // Initialize other board peripherals (UART, GPIO, etc.)
20    initUART();
21    initGPIO();
22}
```

In the above code, `SysTick_Handler` would be the ISR that triggers on each timer tick to increment the OS tick count and possibly request a context switch. The `Yield_Handler` might be a software interrupt used to handle explicit yield calls (for example, from `portYIELD()` in the context switcher or when a task yields). The function `startSecondaryCore()` is hardware-specific; on our platform, it writes the starting program counter for core 1 and then releases core 1 from reset, causing it to begin execution (where it will then call `bsp_init()` for core 1 or a simplified startup routine that joins the scheduler).

By the end of `bsp_init()`, both cores are up and running, the system timer and interrupts are configured, and the FreeRTOS scheduler can be started to begin multitasking on both cores.

## 16.4 Synchronization Mechanisms

In a dual-core system, tasks may need to communicate or share resources (like memory, peripherals, etc.), so synchronization is crucial. We utilize FreeRTOS mechanisms like semaphores and mutexes to prevent race conditions and ensure that tasks on different cores do not interfere with each other. FreeRTOS provides thread-safe synchronization primitives which we use without modification, as they are inherently safe to use across multiple cores as long as the kernel is properly managing critical sections and interrupts.

For example, if two tasks (possibly running on different cores) need to access a shared resource, we protect that access with a mutex. If one task holds the mutex, another task (even on the other core) that tries to take the mutex will block until the mutex is released. The FreeRTOS kernel handles the blocking and waking of tasks under the hood, even in our dual-core context.

Below is a simple example of a task function using a mutex for synchronization:

```
1 SemaphoreHandle_t xMutex; // global mutex handle
2
3 void task_function(void *pvParameters) {
4     // Task code that needs to access a shared resource
5     for(;;) {
6         xSemaphoreTake(xMutex, portMAX_DELAY); // Acquire the mutex (
7             // block indefinitely until available)
8         // --- Begin Critical Section: access shared resource here ---
9         sharedVariable++;
10        // (perform other operations on shared resource)
11        // --- End Critical Section ---
12        xSemaphoreGive(xMutex); // Release the mutex
13
14        // Continue with non-critical work or yield
15        vTaskDelay(10);
16    }
}
```

In this example, `xMutex` is a binary mutex (created elsewhere using `xSemaphoreCreateMutex()`) that protects access to `sharedVariable`. Even if two instances of `task_function` run on different cores, the mutex ensures that only one at a time enters the critical section. FreeRTOS handles the blocking of the second task and the resumption once the mutex is free. This prevents race conditions and ensures data integrity.

For other synchronization needs, we similarly use semaphores (for signaling events between tasks or ISRs and tasks) and queues (for message passing). These facilities are part of FreeRTOS and work in a multi-core setup with the kernel managing atomic access by briefly disabling interrupts or using atomic instructions where appropriate.

## Chapter 17

# FreeRTOS Port-Specific Definitions

The following definitions are crucial for the FreeRTOS implementation on the dual-core MIPS like system (these come from FreeRTOS configuration and port layer):

- `configUSE_16_BIT_TICKS`: Defines whether the system uses 16-bit or 32-bit tick counters. If set to 1, the tick count (and related timing calculations) use 16-bit integers, otherwise 32-bit (default for our 32-bit MIPS like system is 0, using 32-bit ticks).
- `portNUM_CONFIGURABLE_REGIONS`: Defines the number of memory regions that can be configured for Memory Protection Unit (MPU) support. This is relevant if using FreeRTOS MPU features. For our purposes, this may remain at the default (typically 0 or a small number) since standard FreeRTOS ports without MPU do not use configurable memory regions.
- `TickType_t`: The data type used for the OS tick count. It is an unsigned integer type. With `configUSE_16_BIT_TICKS` set to 0, `TickType_t` is typically defined as `uint32_t` (32-bit unsigned).
- `UBaseType_t`: An unsigned base type used by FreeRTOS for variables like loop counters, task priorities, and handle counts. On a 32-bit architecture like MIPS like, `UBaseType_t` is typically a `uint32_t`. (On 64-bit or 8-bit architectures, this type adjusts accordingly to the natural word size for efficiency.)
- `StackType_t`: The data type used for a task's stack entries. On a 32-bit system, this is usually defined as `uint32_t` (since the stack is an array of 32-bit words). Each task has an array of `StackType_t` allocated for its stack.

These definitions ensure that the kernel and application use consistent data types. For example, using a 32-bit tick count allows our system to run for a very long time before the tick count overflows (which is important for long uptimes), whereas a 16-bit tick count would overflow faster (after 65535 ticks). We chose to use 32-bit ticks given the capability of our 32-bit MIPS like cores and the requirement for long-running tasks.

# Chapter 18

## Verification via x86 Simulation

### 18.1 Purpose of Integration

Before deploying our FreeRTOS-based OS on the actual dual-core MIPS like hardware, we wanted to thoroughly test its functionality and correctness. For this purpose, we integrated FreeRTOS with an x86 architecture environment (specifically, using the Windows FreeRTOS simulator). This allowed us to simulate the dual-core behavior and verify the scheduler, context switching, and synchronization in a controlled setting where debugging is easier. By using a PC-based simulation, we could also gather performance metrics and identify issues early, without needing the physical hardware powered on for every test.

### 18.2 Changes to FreeRTOS for x86

Running FreeRTOS on a Windows (x86) environment required some adaptations since a standard PC/Windows is not a real-time system and does not provide the same hardware features as a microcontroller. We made the following changes for the simulation:

- **Timer Simulation:** Replaced the hardware SysTick timer with the Windows `SetTimer` function to generate periodic callbacks. This simulates the periodic tick interrupt by invoking a callback function at a regular interval (in our case, 1 ms for a 1 kHz tick rate).
- **Interrupt Handling:** Because we cannot directly manipulate hardware interrupts in user-mode Windows, we simulated interrupts using threads or timer callbacks. The FreeRTOS tick increment and context switch trigger are called from the `SetTimer` callback (which acts as our tick ISR). Additionally, to simulate a software interrupt or yield, we used a Windows event or a higher-priority thread to trigger context switches.
- **Task Execution as Threads:** Each FreeRTOS task is run as a separate Windows thread at a low priority. The FreeRTOS scheduler doesn't truly context-switch the CPU as it would on a microcontroller; instead, it controls the suspension and resumption of these threads to simulate task switching. This approach is used in the FreeRTOS Windows port so that tasks can be managed in a way that respects the FreeRTOS scheduling decisions.
- **Simulated Peripherals:** For example, where our MIPS like hardware might use a UART or GPIO, in the simulation we use standard I/O calls. For instance, a task toggling an LED on hardware would simply print a message or toggle a GUI element on Windows. We implemented dummy functions for GPIO and UART that print messages to the console, to simulate the hardware behavior.

These modifications allow the core logic of our OS to run unmodified, with only the hardware-specific parts abstracted. The FreeRTOS kernel itself (scheduler, queues, semaphores, etc.) remains the same; we primarily changed the *port layer* (the part of FreeRTOS that is specific to the CPU architecture).

## 18.3 Simulation Code

In the Windows simulation, the key part of the port layer is setting up the timer to generate tick interrupts and handling context switches. We use the Win32 API function `SetTimer` to call a function `vPortIncrementTick` at a regular interval (1 ms). This function increments the FreeRTOS tick and calls the scheduler. If a context switch is needed (i.e., a higher-priority task was woken by the tick), it triggers a yield to switch tasks.

Below is a snippet of the code used in the Win32 port for the tick simulation and scheduler start:

```
1 // This function is called by the Windows timer every 1ms to simulate
2 // the SysTick.
3 void vPortIncrementTick(HWND hwnd, UINT uMsg, UINT_PTR idEvent, DWORD
4 dwTime) {
5     // Increment the FreeRTOS tick count
6     if (xTaskIncrementTick() != pdFALSE) {
7         // If a context switch is required (i.e., a task with higher
8         // priority than current was unblocked)
9         portYIELD(); // request a context switch yield
10    }
11 }
12
13 // Override the FreeRTOS scheduler start function for Windows port
14 void vTaskStartScheduler(void) {
15     // Create a Windows timer that calls vPortIncrementTick every 1 ms
16     SetTimer(NULL, 0, 1, (TIMERPROC) vPortIncrementTick);
17     // Start the first task manually (this sets up the initial task
18     // context on the current thread)
19     vTaskStartFirstTask();
20     // The scheduler is now running. In the Windows port, tasks are
21     // managed as threads.
22     // vTaskStartScheduler will not return unless an error occurs.
23 }
```

In the above code:

- `xTaskIncrementTick()` is a FreeRTOS kernel function that increments the tick count and checks if any task's delay has expired or if any timeouts occurred. It returns `pdTRUE` if a context switch is needed at the end of the tick.
- `portYIELD()` in the Windows port will signal the FreeRTOS scheduler to switch tasks. In the MIPS like hardware, this would correspond to triggering a software interrupt or using the `Yield` assembly instruction.
- `vTaskStartFirstTask()` is a FreeRTOS internal function that starts the first task running. In the Windows port, this sets up the current thread context to the first FreeRTOS task. After this, the FreeRTOS scheduler logic (now driven by the Windows timer interrupts) will take over managing which task (thread) runs.

With this setup, our tasks can be created and run in the Windows environment as if they were running under the FreeRTOS scheduler on actual hardware. The timing won't be real-time exact (Windows is not a real-time OS, so the 1ms tick can jitter), but it's sufficient for functional testing and approximate performance measurement.

## 18.4 Testing and Results

To validate the system on the x86 simulation, we set up a benchmark scenario with two example tasks. The goal was to stress the scheduler and measure context switch times, CPU utilization, and general stability.

### 18.4.1 Benchmark Task Scenario

We created two tasks for the test:

- **TaskA:** A high-priority task that continually increments a counter and periodically yields. This task simulates a CPU-intensive high-priority workload.
- **TaskB:** A low-priority task that periodically prints the counter value. This task simulates a background task that should be preempted whenever TaskA is ready to run.

By running these two tasks, we force frequent context switches (since TaskA will run, then yield or block briefly, allowing TaskB to run, and then TaskA immediately preempts again due to its higher priority). This pattern is ideal for measuring the context switch overhead and scheduler efficiency.

Below is the code for these two tasks and the `main` function that sets up the test in the simulation environment:

```
1 #include <stdio.h>
2 #include "FreeRTOS.h"
3 #include "task.h"
4 #include "semphr.h"
5
6 // Shared counter and mutex for thread-safe access
7 static int sharedCounter = 0;
8 static SemaphoreHandle_t xMutex;
9
10 // High-priority task: increments a counter continuously
11 void vHighPriorityTask(void *pvParameters) {
12     for (;;) {
13         // Simulate work by incrementing a shared counter
14         xSemaphoreTake(xMutex, portMAX_DELAY);
15         sharedCounter++;
16         xSemaphoreGive(xMutex);
17         // Yield to let other tasks run (simulating periodic work)
18         vTaskDelay(1); // delay 1 tick (1ms in our tick rate)
19     }
20 }
21
22 // Low-priority task: periodically prints the counter value
23 void vLowPriorityTask(void *pvParameters) {
24     for (;;) {
25         xSemaphoreTake(xMutex, portMAX_DELAY);
26         int value = sharedCounter;
27         xSemaphoreGive(xMutex);
28         // Print the counter (simulating output, e.g., to UART)
29         printf("LowPriorityTask: Counter = %d\n", value);
30         // Delay for some ticks to simulate slower periodic activity
31         vTaskDelay(5); // delay 5 ticks (5ms)
32     }
33 }
34
35 int main(void) {
36     // Initialize the mutex before creating tasks
37     xMutex = xSemaphoreCreateMutex();
38
39     // Create the high and low priority tasks
40     // Higher number means higher priority in FreeRTOS (
41     // configMAX_PRIORITIES is configured accordingly)
42     xTaskCreate(vHighPriorityTask, "HighTask", configMINIMAL_STACK_SIZE
43                 , NULL, 2, NULL);
```

```

42     xTaskCreate(vLowPriorityTask, "LowTask", configMINIMAL_STACK_SIZE,
43                 NULL, 1, NULL);
44
45     // Start the FreeRTOS scheduler (this will also start the Windows
46     // timer for ticks)
47     vTaskStartScheduler();
48
49     // We should never reach here as the scheduler will be running
50     // If we do, it likely means there was insufficient heap memory to
51     // start the scheduler.
52     return 0;
53 }
```

In the above code, `vHighPriorityTask` and `vLowPriorityTask` operate on a shared counter protected by a mutex (`xMutex`). TaskA (`HighTask`) runs with higher priority (priority 2) and increments the counter every iteration, yielding briefly with `vTaskDelay(1)` to allow TaskB to run. TaskB (`LowTask`) runs at lower priority (priority 1) and prints the counter value every 5 ticks. In a correct implementation, TaskB will only run when TaskA is in *delay* state, and as soon as TaskA becomes ready again (after 1 tick), it will preempt TaskB.

We instrumented the code to measure performance metrics. For example, to measure context switch latency, we recorded timestamps (using high-resolution timers provided by Windows) around the `portYIELD()` call and the actual task start to calculate how long the context switch took. We also measured CPU utilization of the simulation program to estimate scheduler overhead.

## Chapter 19

# Software Implementation Conclusion

In this project, we extended FreeRTOS to support a dual-core MIPS like architecture by modifying the scheduler, context switching mechanism, and BSP initialization. We tried validating our implementation by integrating it with an x86 (Windows) simulation, which allowed us to run the OS in a controlled environment. The simulation was done to prove that tasks are scheduled correctly across both cores, context switching overhead is low, and synchronization between tasks works reliably.

We expect normal behavior on hardware, with potentially even better performance due to the real MIPS like cores and lack of underlying OS overhead. This implementation demonstrates that FreeRTOS can be adapted to multi-core processors. Future work could involve further optimizing the scheduler for load balancing between cores and exploring more complex inter-core communication mechanisms for enhanced performance.