This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

teivah/ettore

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⚠️ This project is no longer maintained. Instead, I made an enriched version in Go: teivah/majorana.

Majorana

Majorana is a RISC-V virtual machine, written in Rust.

Majorana Virtual Machine (MVM)

MVM-1

MVM-1 is the first version of the RISC-V virtual machine. It does not implement any of the classic CPU optimizations: pipelining, out-of-order execution, multiple execution units, and so on.

Here is the architecture, divided into 4 classic stages:

  • Fetch: fetch an instruction from the main memory
  • Decode: decode the instruction
  • Execute: execute the RISC-V instruction
  • Write: write-back the result to a register or the main memory
   +-------+
   | Fetch |
   +---+---+
       |
       |
   +---v----+
   | Decode |
   +---+----+
       |
       |
+------v--------+
|     ALU       |
|  +---------+  |
|  | Execute |  |
|  +---------+  |
+------+--------+
       |
       |
   +---v---+
   | Write |
   +-------+

MVM-2

Compared to MVM-1, we add a cache for instructions called L1I (Level 1 Instructions) with a size of 64 KB. The caching policy is straightforward: as soon as we meet an instruction that is not present in L1I, we fetch a 64 KB cache line of instructions from the main memory, and we cache it into L1I.

+-----+     +-------+
| L1I <-----+ Fetch |
+-----+     +---+---+
                |
                |
            +---v----+
            | Decode |
            +---+----+
                |
                |
         +------v--------+
         |     ALU       |
         |  +---------+  |
         |  | Execute |  |
         |  +---------+  |
         +------+--------+
                |
                |
            +---v---+
            | Write |
            +-------+

MVM-3

MVM-3 keeps the same architecture as MVM-2, with 4 stages and L1I. Yet, this version implements pipelining.

In a nutshell, pipelining allows keeping every stage as busy as possible. For example, as soon as the fetch unit has fetched an instruction, it will not wait for the instruction to be decoded, executed and written. It will fetch another instruction straight away during the next cycle(s).

This way, the first instruction completes in 4 cycles (assuming the fetch is done from L1I), whereas each subsequent instruction completes only 1 cycle after the previous one.
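The resulting timing follows a simple formula: with a 4-stage pipeline and no stalls, n instructions take 4 + (n - 1) cycles instead of 4 * n. A quick sketch (the helper function is hypothetical, not from the codebase):

```rust
// Ideal pipeline timing: the first instruction retires after `stages`
// cycles, and each subsequent one retires 1 cycle later.
fn pipelined_cycles(stages: u64, instructions: u64) -> u64 {
    if instructions == 0 {
        return 0;
    }
    stages + (instructions - 1)
}

fn main() {
    let n = 100;
    println!("sequential: {} cycles", 4 * n); // prints "sequential: 400 cycles"
    println!("pipelined: {} cycles", pipelined_cycles(4, n)); // prints "pipelined: 103 cycles"
}
```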

One of the complexities with pipelining is handling conditional branches. What if we fetch a bge instruction, for example? The next instruction fetched is not necessarily the one we should have fetched/decoded/executed/written. As a solution, we implemented a first version of branch prediction, handled by the Branch Unit.

The Branch Unit assumes that a conditional branch will not be taken. Hence, after fetching an instruction, regardless of whether it is a conditional branch, we fetch the next instruction right after it. If the prediction was wrong, we need to flush the pipeline, set the program counter to the destination of the conditional branch instruction, and resume execution from there.

Of course, pipeline flushing has an immediate performance impact. Modern CPUs have branch prediction mechanisms that are far more evolved than MVM-3's.
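The "not taken" policy can be sketched like this: fetch always continues at pc + 1, and when a branch actually resolves as taken, the wrongly fetched instructions are flushed and fetch restarts at the target. The branch representation here is hypothetical:

```rust
struct Branch {
    taken: bool,   // actual outcome, known only at execute time
    target: usize, // destination if taken
}

// Resolve a branch at `pc`, counting how many pipeline flushes occur.
fn next_pc(pc: usize, branch: &Branch, flushes: &mut usize) -> usize {
    // The prediction already fetched sequentially from pc + 1.
    if branch.taken {
        // Misprediction: flush the instructions fetched behind the
        // branch and redirect the program counter to the target.
        *flushes += 1;
        branch.target
    } else {
        pc + 1 // prediction was correct, nothing to undo
    }
}

fn main() {
    let mut flushes = 0;
    // A taken branch at pc = 10 costs a flush and jumps to 42.
    let pc = next_pc(10, &Branch { taken: true, target: 42 }, &mut flushes);
    assert_eq!(pc, 42);
    // A not-taken branch just falls through to pc + 1, no flush.
    let pc = next_pc(pc, &Branch { taken: false, target: 0 }, &mut flushes);
    assert_eq!(pc, 43);
    println!("flushes={}", flushes); // prints "flushes=1"
}
```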

There is another problem with pipelining. We might face what we call a data hazard. For example:

addi t1, zero, 2
div t1, t0, t1

The processor must wait for ADDI to be executed and its result written to T1 before executing DIV (as DIV depends on T1). In this case, we implement what we call a pipeline interlock by delaying the execution of DIV.
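The interlock check itself boils down to a read-after-write test: if the instruction about to execute reads a register that an earlier, still in-flight instruction will write, the pipeline stalls until that write-back completes. Register numbering follows RISC-V (t0 = x5, t1 = x6); the structs are illustrative, not the actual types:

```rust
struct Decoded {
    reads: Vec<usize>,     // source registers
    writes: Option<usize>, // destination register, if any
}

// Returns true if `next` depends on a result that an in-flight
// instruction has not written back yet (a read-after-write hazard).
fn must_stall(next: &Decoded, in_flight: &[Decoded]) -> bool {
    in_flight.iter().any(|earlier| {
        earlier
            .writes
            .map_or(false, |rd| next.reads.contains(&rd))
    })
}

fn main() {
    // addi t1, zero, 2  -> reads x0, writes x6
    let addi = Decoded { reads: vec![0], writes: Some(6) };
    // div t1, t0, t1    -> reads x5 and x6, writes x6
    let div =Ecoded_or_div();
    // While addi has not written back, div must wait.
    println!("stall={}", must_stall(&div, &[addi])); // prints "stall=true"
}

// Small constructor kept separate just for readability of main().
fn Ecoded_or_div() -> Decoded {
    Decoded { reads: vec![5, 6], writes: Some(6) }
}
```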

+-----+     +-------+
| L1I <-----+ Fetch +------------+
+-----+     +---+---+            |
                |         +------v------+
                |         | Branch Unit |
                |         +-------------+
            +---v----+           |
            | Decode <-----------+
            +---+----+
                |
                |
         +------v--------+
         |     ALU       |
         |  +---------+  |
         |  | Execute |  |
         |  +---------+  |
         +------+--------+
                |
                |
            +---v---+
            | Write |
            +-------+

Benchmarks

All the benchmarks are executed at a fixed CPU clock frequency: 2.3 GHz.

Meanwhile, we executed the same benchmark on an Intel i5-7360U (at the same CPU clock frequency). This benchmark ran on a different architecture, with a different ISA, etc., so it is hardly comparable with the MVM benchmarks. Yet, it gives us a rough reference for how good (or bad :) the MVM implementations are.

Is Prime Number

RISC-V source: prime-number.asm

Machine    n=1109
i5-7360U   253 ns
MVM-1      64100 ns, ~253 times slower
MVM-2      4939 ns, ~19 times slower
MVM-3      2260 ns, ~9 times slower
