This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

teivah/ettore

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⚠️ This project is no longer maintained. Instead, I made an enriched version in Go: teivah/majorana.

Majorana

Majorana is a RISC-V virtual machine, written in Rust.

Majorana Virtual Machine (MVM)

MVM-1

MVM-1 is the first version of the RISC-V virtual machine. It does not implement any of the classic CPU optimizations: pipelining, out-of-order execution, multiple execution units, and so on.

Here is the architecture, divided into 4 classic stages:

  • Fetch: fetch an instruction from the main memory
  • Decode: decode the instruction
  • Execute: execute the RISC-V instruction
  • Write: write-back the result to a register or the main memory
   +-------+
   | Fetch |
   +---+---+
       |
       |
   +---v----+
   | Decode |
   +---+----+
       |
       |
+------v--------+
|     ALU       |
|  +---------+  |
|  | Execute |  |
|  +---------+  |
+------+--------+
       |
       |
   +---v---+
   | Write |
   +-------+

MVM-2

Compared to MVM-1, we add a cache for instructions called L1I (Level 1 Instructions) with a size of 64 KB. The caching policy is straightforward: as soon as we meet an instruction that is not present in L1I, we fetch a 64 KB cache line of instructions from the main memory, and we cache it into L1I.

+-----+     +-------+
| L1I <-----+ Fetch |
+-----+     +---+---+
                |
                |
            +---v----+
            | Decode |
            +---+----+
                |
                |
         +------v--------+
         |     ALU       |
         |  +---------+  |
         |  | Execute |  |
         |  +---------+  |
         +------+--------+
                |
                |
            +---v---+
            | Write |
            +-------+

MVM-3

MVM-3 keeps the same architecture as MVM-2, with 4 stages and L1I. Yet, this version implements pipelining.

In a nutshell, pipelining allows keeping every stage as busy as possible. For example, as soon as the fetch unit has fetched an instruction, it will not wait for the instruction to be decoded, executed and written. It will fetch another instruction straight away during the next cycle(s).

This way, the first instruction completes in 4 cycles (assuming the fetch is done from L1I), whereas each subsequent instruction completes only 1 cycle after the previous one.
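The resulting timing follows a simple formula: with a 4-stage pipeline and no stalls, n instructions take 4 + (n - 1) cycles instead of 4 * n. A quick sketch (the helper function is hypothetical, not from the codebase):

```rust
// Ideal pipeline timing: the first instruction retires after `stages`
// cycles, and each subsequent one retires 1 cycle later.
fn pipelined_cycles(stages: u64, instructions: u64) -> u64 {
    if instructions == 0 {
        return 0;
    }
    stages + (instructions - 1)
}

fn main() {
    let n = 100;
    println!("sequential: {} cycles", 4 * n); // prints "sequential: 400 cycles"
    println!("pipelined: {} cycles", pipelined_cycles(4, n)); // prints "pipelined: 103 cycles"
}
```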

One of the complexities with pipelining is handling conditional branches. What if we fetch a bge instruction, for example? The next instruction fetched is not necessarily the one we should have fetched/decoded/executed/written. As a solution, we implemented a first version of branch prediction, handled by the Branch Unit.

The Branch Unit assumes that a conditional branch will not be taken. Hence, after fetching an instruction, regardless of whether it is a conditional branch, we fetch the next instruction right after it. If the prediction was wrong, we need to flush the pipeline, set the program counter to the destination of the conditional branch instruction, and resume execution from there.

Of course, pipeline flushing has an immediate performance impact. Modern CPUs have branch prediction mechanisms that are far more evolved than MVM-3's.
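The "not taken" policy can be sketched like this: fetch always continues at pc + 1, and when a branch actually resolves as taken, the wrongly fetched instructions are flushed and fetch restarts at the target. The branch representation here is hypothetical:

```rust
struct Branch {
    taken: bool,   // actual outcome, known only at execute time
    target: usize, // destination if taken
}

// Resolve a branch at `pc`, counting how many pipeline flushes occur.
fn next_pc(pc: usize, branch: &Branch, flushes: &mut usize) -> usize {
    // The prediction already fetched sequentially from pc + 1.
    if branch.taken {
        // Misprediction: flush the instructions fetched behind the
        // branch and redirect the program counter to the target.
        *flushes += 1;
        branch.target
    } else {
        pc + 1 // prediction was correct, nothing to undo
    }
}

fn main() {
    let mut flushes = 0;
    // A taken branch at pc = 10 costs a flush and jumps to 42.
    let pc = next_pc(10, &Branch { taken: true, target: 42 }, &mut flushes);
    assert_eq!(pc, 42);
    // A not-taken branch just falls through to pc + 1, no flush.
    let pc = next_pc(pc, &Branch { taken: false, target: 0 }, &mut flushes);
    assert_eq!(pc, 43);
    println!("flushes={}", flushes); // prints "flushes=1"
}
```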

There is another problem with pipelining. We might face what we call a data hazard. For example:

addi t1, zero, 2
div t1, t0, t1

The processor must wait for ADDI to be executed and its result written to T1 before executing DIV (as DIV depends on T1). In this case, we implement what we call a pipeline interlock by delaying the execution of DIV.
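The interlock check itself boils down to a read-after-write test: if the instruction about to execute reads a register that an earlier, still in-flight instruction will write, the pipeline stalls until that write-back completes. Register numbering follows RISC-V (t0 = x5, t1 = x6); the structs are illustrative, not the actual types:

```rust
struct Decoded {
    reads: Vec<usize>,     // source registers
    writes: Option<usize>, // destination register, if any
}

// Returns true if `next` depends on a result that an in-flight
// instruction has not written back yet (a read-after-write hazard).
fn must_stall(next: &Decoded, in_flight: &[Decoded]) -> bool {
    in_flight.iter().any(|earlier| {
        earlier
            .writes
            .map_or(false, |rd| next.reads.contains(&rd))
    })
}

fn main() {
    // addi t1, zero, 2  -> reads x0, writes x6
    let addi = Decoded { reads: vec![0], writes: Some(6) };
    // div t1, t0, t1    -> reads x5 and x6, writes x6
    let div =Ecoded_or_div();
    // While addi has not written back, div must wait.
    println!("stall={}", must_stall(&div, &[addi])); // prints "stall=true"
}

// Small constructor kept separate just for readability of main().
fn Ecoded_or_div() -> Decoded {
    Decoded { reads: vec![5, 6], writes: Some(6) }
}
```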

+-----+     +-------+
| L1I <-----+ Fetch +------------+
+-----+     +---+---+            |
                |         +------v------+
                |         | Branch Unit |
                |         +-------------+
            +---v----+           |
            | Decode <-----------+
            +---+----+
                |
                |
         +------v--------+
         |     ALU       |
         |  +---------+  |
         |  | Execute |  |
         |  +---------+  |
         +------+--------+
                |
                |
            +---v---+
            | Write |
            +-------+

Benchmarks

All the benchmarks are executed at a fixed CPU clock frequency: 2.3 GHz.

Meanwhile, we executed the same benchmark on an Intel i5-7360U (at the same CPU clock frequency). This benchmark ran on a different architecture, with a different ISA, etc., so it is hardly comparable with the MVM benchmarks. Yet, it gives us a rough reference for how good (or bad :) the MVM implementations are.

Is Prime Number

RISC-V source: prime-number.asm

Machine    n=1109
i5-7360U   253 ns
MVM-1      64100 ns, ~253 times slower
MVM-2      4939 ns, ~19 times slower
MVM-3      2260 ns, ~9 times slower
