Refactoring RISC-V emulation APIs for easier adoption and porting #310

ChinYikMing · 2023-12-25T18:09:43Z

First trial of refactoring, the wasm branch's latest commit is the result.

Since state_t is user-provided data, all runtime-defined values (which change often) should be stored there, for instance the emulated target program's argc and argv and the emulator's parameters. The following have been adjusted to reflect the changes:

state_t *state_new(void) -----> state_t *state_new(uint32_t mem_size, int argc, char **argv, bool allow_misalign, bool quiet_output)
mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for that. The rest of the parameters are runtime-defined values.
riscv_t *rv_create(const riscv_io_t *io, riscv_user_t userdata, int argc, char **args, bool output_exit_code) -----> riscv_t *rv_create(riscv_user_t userdata)
Much cleaner function signature.
void rv_reset(riscv_t *rv, riscv_word_t pc, int argc, char **args) -----> void rv_reset(riscv_t *rv, riscv_word_t pc)
We can use rv->userdata to get the required argc and argv.
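The signature changes above can be sketched as follows. This is a minimal illustration, not the code on the wasm branch: the reduced state_t and riscv_t layouts and the guest_argc/guest_argv fields are assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t riscv_word_t;
typedef void *riscv_user_t;

/* Hypothetical, reduced state_t: only the runtime-defined values. */
typedef struct {
    uint32_t mem_size;
    int argc;
    char **argv;
    bool allow_misalign;
    bool quiet_output;
} state_t;

/* Hypothetical, reduced riscv_t. */
typedef struct {
    riscv_user_t userdata;
    riscv_word_t pc;
    int guest_argc; /* illustrative: taken from userdata on reset */
    char **guest_argv;
} riscv_t;

state_t *state_new(uint32_t mem_size, int argc, char **argv,
                   bool allow_misalign, bool quiet_output)
{
    state_t *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->mem_size = mem_size;
    s->argc = argc;
    s->argv = argv;
    s->allow_misalign = allow_misalign;
    s->quiet_output = quiet_output;
    return s;
}

riscv_t *rv_create(riscv_user_t userdata)
{
    assert(userdata);
    riscv_t *rv = calloc(1, sizeof(*rv));
    if (rv)
        rv->userdata = userdata;
    return rv;
}

/* argc/argv are no longer parameters; they come from rv->userdata. */
void rv_reset(riscv_t *rv, riscv_word_t pc)
{
    assert(rv && rv->userdata);
    const state_t *s = rv->userdata;
    rv->pc = pc;
    rv->guest_argc = s->argc;
    rv->guest_argv = s->argv;
}
```

The point of the sketch is only that rv_create and rv_reset need no argc/argv parameters once state_t carries them.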
Since memory I/O handlers are rarely changed, it makes less sense to define them during runtime (makes porting difficult). Instead, I believe it is preferable to link them during build time. If someone really wants to change the implementations, writing a new C file and linking it during build time might be a better choice.
To do this, some changes are made:
- define memory I/O handlers in rv_create and link during build time
- to make the memory write interfaces compatible with the function pointers, the MEM_WRITE_IMPL macro has to be changed:
- src/io.c

// old
#define MEM_WRITE_IMPL(size, type)                                 \
    void memory_write_##size(uint32_t addr, const uint8_t *src)    \
    {                                                              \
        *(type *) (data_memory_base + addr) = *(const type *) src; \
    }

// new
#define MEM_WRITE_IMPL(size, type)                          \
    void memory_write_##size(uint32_t addr, const type src) \
    {                                                       \
        *(type *) (data_memory_base + addr) = src;          \
    }

The call to memory_write_w in "src/syscall.c" shall be changed accordingly:
src/syscall.c

// old
memory_write_w(tv + 0, (const uint8_t *) &tv_s.tv_sec);

// new
memory_write_w(tv + 0, *((const uint32_t *) &tv_s.tv_sec));
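To illustrate why passing the value directly lets the write handlers match a function-pointer type, here is a small standalone sketch. The 64-byte demo_memory buffer, the mem_write_w_fn typedef, and memory_read_w are assumptions for the example, and memcpy is used in place of the pointer cast from the macro above to sidestep alignment concerns in the demo.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the emulator's data memory (illustrative size). */
static uint8_t demo_memory[64];
static uint8_t *data_memory_base = demo_memory;

/* Hypothetical callback type that takes the value, not a pointer to it. */
typedef void (*mem_write_w_fn)(uint32_t addr, const uint32_t data);

#define MEM_WRITE_IMPL(size, type)                           \
    void memory_write_##size(uint32_t addr, const type src)  \
    {                                                        \
        assert(addr + sizeof(type) <= sizeof(demo_memory));  \
        memcpy(data_memory_base + addr, &src, sizeof(type)); \
    }

MEM_WRITE_IMPL(w, uint32_t)
MEM_WRITE_IMPL(s, uint16_t)
MEM_WRITE_IMPL(b, uint8_t)

/* Read helper used only to check the result of a write. */
uint32_t memory_read_w(uint32_t addr)
{
    uint32_t v;
    assert(addr + sizeof(v) <= sizeof(demo_memory));
    memcpy(&v, data_memory_base + addr, sizeof(v));
    return v;
}
```

With this shape, memory_write_w can be assigned to a mem_write_w_fn without any cast at the call site.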

Notably, the "pre.js" of the wasm branch no longer defines I/O on its own, compared to the first attempt (more abstraction).

For consistency, change all uint32_t, uint16_t, and uint8_t in the function signatures of riscv.[ch] to riscv_word_t, riscv_half_t, and riscv_byte_t respectively.
bool elf_load(elf_t *e, riscv_t *rv, memory_t *mem); -----> bool elf_load(elf_t *e, riscv_t *rv);
The memory instance required by elf_load can be accessed via rv's userdata.

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

Related to: #75


jserv · 2023-12-25T18:26:15Z

I would like to invite @RinHizakura, @qwe661234, and @visitorckw to join the discussion and contribute to the refinement of the API.

jserv · 2023-12-25T18:29:30Z

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

ChinYikMing · 2023-12-26T05:11:18Z

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

So, it could be useful to provide an abstract way to define the desired stdin, stdout, stderr, or more than just these three. stdin, stdout, and stderr can be the defaults if no specification is given.

RinHizakura · 2023-12-26T15:47:50Z

Since memory I/O handlers are rarely changed, it makes less sense to define them during runtime (makes porting difficult). Instead, I believe it is preferable to link them during build time.

I think the distinction between modules is a little unclear in the current design of rv32emu. Suppose we regard riscv.c as part of the library and main.c as part of the application using the library. Although rv_create() seems to allow the application to customize memory operations through the io function pointers, operations on simulated memory must actually be bound to the instance created by state_new(), which belongs to the library side. This limits how io can be customized. For example, what if you want to use a backup file to simulate memory? This design seems to make the io function pointers for memory operations redundant.

I believe it is preferable to link them during build time.

So, if providing user-specific memory operations does not matter, providing them at build time for the library would also be a great solution.

RinHizakura · 2023-12-26T15:54:11Z

mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for that.

I am not quite sure whether changing the memory size directly is safe. As I remember, some implementations of rv32emu intentionally rely on the memory size being 2^32 - 1 for some tricks. Or maybe I am mixing it up with some other project. I am looking for others to answer the question.

ChinYikMing · 2023-12-26T16:33:06Z

mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for that.

I am not quite sure whether changing the memory size directly is safe. As I remember, some implementations of rv32emu intentionally rely on the memory size being 2^32 - 1 for some tricks. Or maybe I am mixing it up with some other project. I am looking for others to answer the question.

The built-in ELF programs do not seem to need a lot of memory, so I think 2GB - 4GB is a safe region. Changing the memory size dynamically for different runtimes might be needed. For example, multiples of 64KiB should be used in WebAssembly. MEM_SIZE was originally set to 2^32 in #151 as the size for preallocating memory, to avoid extra checks when manipulating the memory region. Then, MEM_SIZE was set to 2^32 - 1 in #221 to be compatible with emcc, whose default build target is wasm32 (memory must be < 4GB).
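As a rough illustration of the 64KiB-multiple constraint, a runtime could normalize a requested size before calling memory_new. The helper name wasm_round_mem_size and the choice to cap at the largest page multiple below 4GB are assumptions for this sketch, not existing rv32emu code.

```c
#include <assert.h>
#include <stdint.h>

#define WASM_PAGE_SIZE UINT64_C(65536)

/* Round a requested size up to a whole number of 64 KiB pages and keep the
 * result strictly below 4 GiB (hypothetical helper). */
uint64_t wasm_round_mem_size(uint64_t requested)
{
    /* Largest multiple of the page size that is still below 4 GiB. */
    const uint64_t limit = (UINT64_C(1) << 32) - WASM_PAGE_SIZE;

    assert(requested <= UINT64_MAX - WASM_PAGE_SIZE); /* no overflow below */
    uint64_t rounded =
        (requested + WASM_PAGE_SIZE - 1) & ~(WASM_PAGE_SIZE - 1);
    if (rounded == 0)
        rounded = WASM_PAGE_SIZE; /* always allocate at least one page */
    return rounded > limit ? limit : rounded;
}
```

For example, a request of 2^32 - 1 bytes would be reduced to 0xFFFF0000 under these assumptions.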

ChinYikMing · 2023-12-27T05:31:15Z

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.

Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata
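A minimal sketch of the one-parameter rv_step, assuming a reduced state_t and riscv_t (the cycle and halt fields are placeholders for illustration, and the loop body stands in for executing one instruction):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced structures for illustration only. */
typedef struct {
    uint32_t cycles_per_step;
} state_t;

typedef struct {
    void *userdata;
    uint64_t cycle;
    bool halt;
} riscv_t;

/* The step budget is read from rv->userdata instead of a parameter. */
void rv_step(riscv_t *rv)
{
    assert(rv && rv->userdata);
    const state_t *s = rv->userdata;
    for (uint32_t i = 0; i < s->cycles_per_step && !rv->halt; i++)
        rv->cycle++; /* placeholder for executing one instruction */
}
```

A web front end could then change cycles_per_step between calls to move faster through the program.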

ChinYikMing · 2023-12-27T06:18:40Z

rv_enables_to_output_exit_code could be renamed as something like rv_get_xxx. Same rules might be applied to other fields of state_t to improve consistency. rv_set_xxx can be the setter.

For example:

rv_get_userdata / rv_set_userdata
rv_get_pc / rv_set_pc
rv_get_reg / rv_set_reg
rv_get_halt_status / rv_set_halt_status
rv_get_cycle_per_step / rv_set_cycle_per_step
rv_get_output_exit_code_flag / rv_set_output_exit_code_flag
rv_get_allow_misalign_flag / rv_set_allow_misalign_flag
...
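The naming scheme could look like the sketch below. The reduced riscv_t layout is an assumption, and the decision to ignore writes to x0 follows the RISC-V rule that x0 is hard-wired to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t riscv_word_t;

enum { N_RV_REGS = 32 };

/* Hypothetical, reduced riscv_t; the real structure is opaque to users. */
typedef struct {
    riscv_word_t pc;
    riscv_word_t X[N_RV_REGS];
    bool halt;
} riscv_t;

riscv_word_t rv_get_pc(const riscv_t *rv)
{
    assert(rv);
    return rv->pc;
}

void rv_set_pc(riscv_t *rv, riscv_word_t pc)
{
    assert(rv);
    rv->pc = pc;
}

riscv_word_t rv_get_reg(const riscv_t *rv, uint32_t reg)
{
    assert(rv && reg < N_RV_REGS);
    return rv->X[reg];
}

void rv_set_reg(riscv_t *rv, uint32_t reg, riscv_word_t val)
{
    assert(rv && reg < N_RV_REGS);
    if (reg != 0) /* x0 always reads as zero */
        rv->X[reg] = val;
}

bool rv_get_halt_status(const riscv_t *rv)
{
    assert(rv);
    return rv->halt;
}

void rv_set_halt_status(riscv_t *rv, bool halt)
{
    assert(rv);
    rv->halt = halt;
}
```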

jserv · 2023-12-30T02:13:40Z

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.

Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

I agree. By the way, state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

jserv · 2024-01-26T19:17:05Z

The repository mnurzia/rv serves as an additional reference for API refinement. It features three main APIs:

Memory Access Callback: This function processes data as input/output and returns RV_BAD in case of a fault. It's defined as:
typedef rv_res (*rv_bus_cb)(void *user, rv_u32 addr, rv_u8 *data, rv_u32 is_store, rv_u32 width);
CPU Initialization: This function initializes the CPU and can be called again on the cpu object to reset it. The function signature is:
void rv_init(rv *cpu, void *user, rv_bus_cb bus_cb);
CPU Single-Step: This function advances the CPU by one step and returns RV_E* in case of an exception. Its definition is:
rv_u32 rv_step(rv *cpu);

These APIs collectively provide a structure for memory access, CPU initialization, and step-wise execution in the CPU simulation.
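To make the callback shape concrete, here is a sketch of a bus callback backed by a flat RAM region. The typedefs are local stand-ins so the example compiles without the library; RV_OK, RAM_BASE, RAM_SIZE, and machine_t are assumptions for illustration and are not taken from the discussion above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the library's types (assumed definitions). */
typedef uint8_t rv_u8;
typedef uint32_t rv_u32;
typedef enum { RV_OK = 0, RV_BAD = 1 } rv_res;

#define RAM_BASE 0x80000000u
#define RAM_SIZE 0x1000u

/* Hypothetical user context passed through the void *user argument. */
typedef struct {
    rv_u8 ram[RAM_SIZE];
} machine_t;

/* Same parameter list as rv_bus_cb: copy `width` bytes to or from RAM. */
rv_res bus_cb(void *user, rv_u32 addr, rv_u8 *data, rv_u32 is_store,
              rv_u32 width)
{
    machine_t *m = user;
    assert(m && data);

    /* Reject accesses that fall outside the RAM window. */
    if (addr < RAM_BASE || width > RAM_SIZE ||
        addr - RAM_BASE > RAM_SIZE - width)
        return RV_BAD;

    rv_u8 *p = m->ram + (addr - RAM_BASE);
    if (is_store)
        memcpy(p, data, width);
    else
        memcpy(data, p, width);
    return RV_OK;
}
```

A single callback like this handles every width and direction, which is the main contrast with the per-width handlers in riscv_io_t.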

ChinYikMing · 2024-01-27T06:43:05Z

As previously suggested, the maximum memory (MEM_SIZE) of a virtual machine (VM) shall be determined by the application. If these modifications are made, the Makefile-defined default stack size shall also be adjusted.

Makefile:

# Set the default stack pointer
...
CFLAGS += -D DEFAULT_STACK_ADDR=0xFFFFE000
# Set the default args starting address
CFLAGS += -D DEFAULT_ARGS_ADDR=0xFFFFF000
...

Thus, adjusting the stack size should be part of the public API.
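One way to keep the two addresses consistent with a configurable memory size is to derive them from it. The sketch below assumes the layout implied by the Makefile values (the args page is the last 4KiB-aligned page, and the stack pointer sits one page below it); vm_layout_t and vm_default_layout are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define VM_PAGE 0x1000u

/* Hypothetical result type for the derived addresses. */
typedef struct {
    uint32_t args_addr;
    uint32_t stack_addr;
} vm_layout_t;

/* Derive the default args and stack addresses from the memory size. */
vm_layout_t vm_default_layout(uint32_t mem_size)
{
    assert(mem_size >= 3 * VM_PAGE); /* room for an args page and a stack */

    vm_layout_t l;
    /* Start of the last page that lies inside the memory region. */
    l.args_addr = (mem_size - 1) & ~(VM_PAGE - 1);
    /* Initial stack pointer one page below the args page. */
    l.stack_addr = l.args_addr - VM_PAGE;
    return l;
}
```

With mem_size = 2^32 - 1 this reproduces 0xFFFFF000 and 0xFFFFE000, matching the current Makefile defaults.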

ChinYikMing · 2024-01-28T14:02:52Z

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.
Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

I agree. By the way, state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

I think vm_attr_t can be the candidate for the name ( inspired by pthread_attr_t ).

Currently, vm_attr_t should consist of the following:

vm RAM size (if previous concern is OK)
vm STACK size (if vm RAM size changes)
vm-specific argc, argv
error code (to represent the exit state of vm)
enable_outout_exit_code
logging level
union of target ELF program and target vm

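typedef struct rv_struct {
    char *elf_program;
} rv_struct_t;

/* Member types are not decided yet; char * paths are an assumption. */
typedef struct vm_struct {
    char *kernel_img;
    char *dtb;
    char *rootfs_img;
} vm_struct_t;

/* anonymous union member inside vm_attr_t */
union {
    rv_struct_t rv_struct;
    vm_struct_t vm_struct;
};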

cycle_per_step
enable_misaligned

I would like to introduce the sixth attribute of vm_attr_t, which allows the user to select how the vm should log, similar to the printk log levels of the Linux kernel. The logging level will register a corresponding handler during rv_init initialization. This feature gives the user more flexibility to observe the vm state or error reports. The seventh attribute of vm_attr_t differentiates RISC-V program emulation from RISC-V system emulation; rv_create then returns a corresponding internal structure (riscv_internal or vm_internal), both of which are forward-declared structures.
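A printk-like log level could be sketched as below. This version uses a simple threshold filter rather than registering a handler per level, and vm_log_level_t, the log_stream field, and vm_log are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, ordered from most to least severe. */
typedef enum {
    VM_LOG_ERROR = 0,
    VM_LOG_WARN,
    VM_LOG_INFO,
    VM_LOG_DEBUG,
} vm_log_level_t;

/* Reduced vm_attr_t showing only the logging-related fields. */
typedef struct {
    vm_log_level_t log_level; /* messages above this level are dropped */
    FILE *log_stream;         /* NULL means stderr */
} vm_attr_t;

/* Returns 1 when the message was emitted, 0 when it was filtered out. */
int vm_log(const vm_attr_t *attr, vm_log_level_t level, const char *fmt, ...)
{
    assert(attr && fmt);
    if (level > attr->log_level)
        return 0;

    FILE *out = attr->log_stream ? attr->log_stream : stderr;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    return 1;
}
```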

Prefix of all vm-related functions should be consistent ( more discussion ).

jserv · 2024-01-29T00:30:00Z

state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

I think vm_attr_t can be the candidate for the name ( inspired by pthread_attr_t ).

It sounds promising. Please send pull request(s) to refine APIs.

Currently, vm_attr_t should consist of the following:

vm RAM size (if previous concern is OK)

vm STACK size (if vm RAM size changes)

vm-specific argc, argv

How about envp?

error code (to represent the exit state of vm)

enable_outout_exit_code

enable_outout_exit_code looks hacky. Can you show something detailed?

ChinYikMing · 2024-01-29T10:06:21Z

vm RAM size (if previous concern is OK)

vm STACK size (if vm RAM size changes)

vm-specific argc, argv

How about envp?

Since envp is not accessible for now, placing a TODO in vm_attr_t might be decent.

enable_outout_exit_code

enable_outout_exit_code looks hacky. Can you show something detailed?

It is used by syscall_exit to determine whether to output the exit code. I think always outputting the exit code is not a bad thing, so maybe this flag is redundant. Or, it can be determined on top of the logging feature.

jserv · 2024-01-30T09:35:28Z

I think always outputting the exit code is not a bad thing, so maybe this flag is redundant. Or, it can be determined on top of the logging feature.

After streamlining the API, we can control the exit code by storing it in a specific structure, instead of displaying it directly in the console.

ChinYikMing · 2024-01-31T07:30:07Z

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

So, it could be useful to provide an abstract way to define the desired stdin, stdout, stderr, or more than just these three. Standard stdin, stdout, and stderr can be the defaults if no specification is given.

To abstract files or file descriptors, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which selects the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-default fd (e.g., a regular file) before emulation.

RinHizakura · 2024-01-31T16:08:33Z

To abstract files or file descriptors, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which selects the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-default fd (e.g., a regular file) before emulation.

Since we would also have to maintain the file descriptors that vm_register_stdxxx() redirects, how about just adding three file descriptors to vm_attr_t:

typedef struct {
    ...
    int stdin;
    int stdout;
    int stderr;
} vm_attr_t;

Note: The variable's name should be refined. I just can't come up with a suitable and concise name now

These file descriptors are assigned 0, 1, and 2 by default, and are modified when vm_register_stdxxx() is called. So we don't need a redundant variable bool use_default_stdin_stdout_stderr for this feature.
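The idea can be sketched as follows. The field names fd_stdin/fd_stdout/fd_stderr are placeholders (plain stdin, stdout, and stderr would clash with the macros from stdio.h), and vm_attr_init and vm_register_std are hypothetical functions standing in for the vm_register_stdxxx family.

```c
#include <assert.h>

/* Reduced vm_attr_t showing only the three descriptors. */
typedef struct {
    int fd_stdin;
    int fd_stdout;
    int fd_stderr;
} vm_attr_t;

typedef enum { VM_STDIN, VM_STDOUT, VM_STDERR } vm_stream_t;

/* Defaults are the conventional descriptors 0, 1, and 2. */
void vm_attr_init(vm_attr_t *attr)
{
    assert(attr);
    attr->fd_stdin = 0;
    attr->fd_stdout = 1;
    attr->fd_stderr = 2;
}

/* Redirect one stream to host_fd; returns 0 on success, -1 on bad input. */
int vm_register_std(vm_attr_t *attr, vm_stream_t which, int host_fd)
{
    assert(attr);
    if (host_fd < 0)
        return -1;
    switch (which) {
    case VM_STDIN:
        attr->fd_stdin = host_fd;
        return 0;
    case VM_STDOUT:
        attr->fd_stdout = host_fd;
        return 0;
    case VM_STDERR:
        attr->fd_stderr = host_fd;
        return 0;
    }
    return -1;
}
```

Because the defaults are ordinary values, no separate boolean is needed to tell whether a stream was redirected.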

ChinYikMing · 2024-01-31T16:22:30Z

To abstract files or file descriptors, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which selects the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-default fd (e.g., a regular file) before emulation.

Since we would also have to maintain the file descriptors that vm_register_stdxxx() redirects, how about just adding three file descriptors to vm_attr_t:
typedef struct {
    ...
    int stdin;
    int stdout;
    int stderr;
} vm_attr_t;
Note: The variable's name should be refined. I just can't come up with a suitable and concise name now

These file descriptors are assigned 0, 1, and 2 by default, and are modified when vm_register_stdxxx() is called. So we don't need a redundant variable bool use_default_stdin_stdout_stderr for this feature.

Thanks for the tips! I think the boolean is really redundant since vm_register_stdxxx() could overwrite them.

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core Other components, including as memory, file systems, program data, etc., should be abstracted from the user; as a result, setting a configuration value (vm_attr_t) is sufficient. The user should concern about memory (state_t) and elf stuff before this PR. The user may just construct a core, run it, and shut it down after this PR, so they won't need to worry about them anymore. The vm_attr_t has multiple fields and they are commented clearly in the code. Note that logging feature and system emulator integration are not implemented yet. related: sysprog21#310

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core Other components, including as memory, file systems, program data, etc., should be abstracted from the user; as a result, setting a configuration value (vm_attr_t) is sufficient. For stdio redirection, rv_register_stdio function is introduced. The user should concern about memory (state_t) and elf stuff before this PR. The user may just construct a core, run it, and shut it down after this PR, so they won't need to worry about them anymore. The vm_attr_t has multiple fields and they are commented clearly in the code. Note that logging feature and system emulator integration are not implemented yet. related: sysprog21#310

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core Other components, including as memory, file systems, program data, etc., should be abstracted from the user, as a result, setting a configuration value (vm_attr_t) is sufficient. The user should manage about memory (state_t) and elf stuff before this PR. The user may just construct a core, run it, and shut it down after this PR, so they won't need to worry about them anymore. For stdio remapping, rv_remap_stdstream function is introduced. The vm_attr_t has multiple fields and they are commented clearly in the code. elf_t is reopened in run_and_trace and dump_test_signature because elf_t is allocated inside rv_create and they cannot access them. It is acceptable to reopen elf_t since they are only for testing and debugging. PRINT_EXIT_CODE build macro is introduced to enable syscall_exit to print exit code to console only during testing since the actual usage of exit code is really depending on applications. The io interface is not changed in this PR because it could maybe reused with semu in some way, still need to be investigated. Also, Logging feature and system emulator integration are not implemented yet. related: sysprog21#310

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core Other components, including as memory, file systems, program data, etc., should be abstracted from the user, as a result, setting a configuration value (vm_attr_t) is sufficient. The user should manage about memory (state_t) and elf stuff before this PR. The user may just construct a core, run it, and shut it down after this PR, so they won't need to worry about them anymore. For stdio remapping, rv_remap_stdstream function is introduced. The vm_attr_t has multiple fields and they are commented clearly in the code. elf is reopened in run_and_trace and dump_test_signature because elf is allocated inside rv_create and they cannot access them. It is acceptable to reopen elf since they are only for testing and debugging. Print inferior exit code to console inside main instead of syscall_exit because the actual usage of exit code depends on applications of using riscv public API. The io interface is not changed in this PR because it could maybe reused with semu in some way, still need to be investigated. Also, Logging feature and system emulator integration are not implemented yet. related: sysprog21#310

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core Other components, including as memory, file systems, program data, etc., should be abstracted from the user, as a result, setting a configuration value (vm_attr_t) is sufficient. The user should manage about memory (state_t) and elf stuff before this PR. The user may just construct a core, run it, and shut it down after this PR, so they won't need to worry about them anymore. For stdio remapping, rv_remap_stdstream function is introduced. The vm_attr_t has multiple fields and they are commented clearly in the code. elf is reopened in dump_test_signature because elf is allocated during rv_create. It is acceptable to reopen elf since it is only for testing. Print inferior exit code to console inside main instead of syscall_exit because the actual usage of exit code depends on applications of using riscv public API. The io interface is not changed in this PR because it could maybe reused with semu in some way, still need to be investigated. Also, Logging feature and system emulator integration are not implemented yet. related: sysprog21#310

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core Other components, including as memory, file systems, program data, etc., should be abstracted from the user, as a result, setting a configuration value (vm_attr_t) is sufficient. The user should manage about memory (state_t) and elf stuff before this PR. The user may just construct a core, run it, and shut it down after this PR, so they won't need to worry about them anymore. The vm_attr_t has multiple fields and they are commented clearly in the code. As you can see in "main", there are various mode to run the emulator such as "run_and_trace", "gdbstub", and "profiling". Thus, a field call "run_flag" is introduced in vm_attr_t. For standard stream remapping, rv_remap_stdstream function is introduced. The emulator can remap default standard stream to required streams after creating the emulator by calling the rv_remap_stdstream function. elf is reopened in dump_test_signature because elf is allocated during rv_create. It is acceptable to reopen elf since it is only for testing. Print inferior exit code to console inside main instead of syscall_exit because the actual usage of exit code depends on applications of using riscv public API. The io interface is not changed in this PR because it could maybe reused with semu in some way, still need to be investigated. Also, Logging feature and system emulator integration are not implemented yet. A validator for validating the user-defined vm_attr_t might need to be introduced. related: sysprog21#310

The following should be included in an emulator's simple and clear public API: 1. create/init core 2. run emulation 3. delete/destroy core. Other components, including memory, file systems, program data, etc., should be abstracted from the user; as a result, setting a configuration value (vm_attr_t) is sufficient. Before this PR, the user had to manage memory (state_t) and ELF handling. After this PR, the user may just construct a core, run it, and shut it down, without worrying about them anymore. The vm_attr_t has multiple fields and they are commented clearly in the code. As you can see in "main", there are various modes to run the emulator, such as "run_and_trace", "gdbstub", and "profiling". Thus, a field called "run_flag" is introduced in vm_attr_t. For standard stream remapping, the rv_remap_stdstream function is introduced. The emulator can remap the default standard streams to the required streams after creation by calling rv_remap_stdstream. rv_userdata has been dropped since the PRIV macro is sufficient for the internal implementation, and applications do not need to access it directly. elf is reopened in dump_test_signature because elf is allocated during rv_create. It is acceptable to reopen elf since it is only for testing. The inferior's exit code is printed to the console inside main instead of syscall_exit because the actual usage of the exit code depends on the application using the riscv public API. The io interface is not changed in this PR because it could perhaps be reused with semu in some way, which still needs to be investigated. Also, the logging feature and system emulator integration are not implemented yet. A validator for the user-defined vm_attr_t might need to be introduced. related: sysprog21#310
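The three-step usage described in the commit message might look like this from the application's side. The bodies are stubs, the reduced vm_attr_t fields are assumptions, and rv_run, rv_delete, and run_program are illustrative names rather than a statement of the final API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced, hypothetical vm_attr_t: the only thing the user fills in. */
typedef struct {
    const char *elf_program;
    uint32_t mem_size;
    int argc;
    char **argv;
    int exit_code; /* written by the emulator, read by the application */
} vm_attr_t;

typedef struct riscv_internal {
    vm_attr_t *attr;
    bool halted;
} riscv_t;

/* 1. create/init core (stub: a real version would load the ELF here). */
riscv_t *rv_create(vm_attr_t *attr)
{
    if (!attr || !attr->elf_program)
        return NULL;
    riscv_t *rv = calloc(1, sizeof(*rv));
    if (rv)
        rv->attr = attr;
    return rv;
}

/* 2. run emulation (stub: pretends the program exited with status 0). */
void rv_run(riscv_t *rv)
{
    assert(rv);
    rv->attr->exit_code = 0;
    rv->halted = true;
}

/* 3. delete/destroy core. */
void rv_delete(riscv_t *rv)
{
    free(rv);
}

/* Application-side flow: configure, create, run, delete. */
int run_program(const char *path)
{
    vm_attr_t attr = {.elf_program = path, .mem_size = 1u << 20};
    riscv_t *rv = rv_create(&attr);
    if (!rv)
        return -1;
    rv_run(rv);
    rv_delete(rv);
    return attr.exit_code;
}
```

The application decides what to do with exit_code, which is why printing it was moved out of syscall_exit.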

It is not necessary to give an application the opportunity to bind I/O handlers because I/O handlers are rarely altered when creating an emulator. With this commit, the application can build an emulator much more easily by only taking the emulator's attributes (vm_attr_t) into consideration. To facilitate further integration with the RISC-V system emulator (semu), I have included a TODO inside the I/O interface. Related: sysprog21#310

jserv · 2024-03-12T04:08:10Z

mmu_fetch signature of semu is compatible with riscv_mem_ifetch by removing the vm and value parameter. The I/O interface is embedded inside riscv_t so vm parameter is no longer needed. The fetched value is returned

mmu_load signature of semu is compatible with riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b by removing the vm, width, value and reserved parameter. The I/O interface is embedded inside riscv_t so vm param is no longer needed. The width parameter is not necessary since there are width related handlers(riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b). The loaded value is returned. The registration of the 'reservation set' can be done in corresponding RVOP()(some fields might be added to riscv_t, e.g., reservation) so reserved parameter is no longer needed

mmu_store is similar to mmu_load

The proposal sounds great. I wonder how mmu_{fetch,load,store} can be interoperated with existing structure. Can you show more about function prototypes?

ChinYikMing · 2024-05-02T09:04:33Z

I wonder how mmu_{fetch,load,store} can be interoperated with existing structure. Can you show more about function prototypes?

Sure.

We have to emulate components such as the MMU, UART, and PLIC as the minimum requirements to boot Linux.

First of all, we should support an MMU to enable more resource-management techniques in the kernel, for example memory sharing or copy-on-write (COW), so that user-space programs can call the fork system call. To support the MMU, we can reuse the riscv_io_t interface for I/O operations. The new function pointers for MMU_{fetch, load, store} might look like this:

typedef struct {
    /* memory read interface */
    riscv_mem_ifetch mem_ifetch;
    riscv_mem_read_w mem_read_w;
    riscv_mem_read_s mem_read_s;
    riscv_mem_read_b mem_read_b;

    /* memory write interface */
    riscv_mem_write_w mem_write_w;
    riscv_mem_write_s mem_write_s;
    riscv_mem_write_b mem_write_b;

    /* TODO: add peripheral I/O interfaces */

+   /* MMU memory helper interface */
+   riscv_mmu_mem_walk mmu_mem_walk;

+   /* MMU memory read interface */
+   riscv_mem_ifetch mmu_mem_ifetch;
+   riscv_mem_read_w mmu_mem_read_w;
+   riscv_mem_read_s mmu_mem_read_s;
+   riscv_mem_read_b mmu_mem_read_b;

+   /* MMU memory write interface */
+   riscv_mem_write_w mmu_mem_write_w;
+   riscv_mem_write_s mmu_mem_write_s;
+   riscv_mem_write_b mmu_mem_write_b;

    /* system */
    riscv_on_ecall on_ecall;
    riscv_on_ebreak on_ebreak;
    riscv_on_memset on_memset;
    riscv_on_memcpy on_memcpy;
} riscv_io_t;

We can decide which function pointer to call during instruction decoding stage since we will know the data width at that time.

mmu_mem_walk is the helper function that walks the 2-level page table (Sv32) for a virtual address and returns the corresponding PTE. Its riscv_mmu_mem_walk interface might look like this:

typedef riscv_word_t *(*riscv_mmu_mem_walk)(riscv_word_t addr);
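For reference, a walk over Sv32 tables (two levels, 10-bit VPN fields, 4-byte PTEs) can be sketched as below. The tiny phys_mem array and the extra root_ppn parameter (normally taken from satp) are assumptions so that the example is self-contained; superpage alignment checks and permission checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t riscv_word_t;

#define PTE_V 0x1u
#define PTE_R 0x2u
#define PTE_W 0x4u
#define PTE_X 0x8u

#define PAGE_SIZE 4096u
#define PHYS_PAGES 4u
#define WORDS_PER_PAGE (PAGE_SIZE / 4u)

/* Illustrative physical memory: four pages, addressed as 32-bit words. */
riscv_word_t phys_mem[PHYS_PAGES * WORDS_PER_PAGE];

/* Return a pointer to the leaf PTE for vaddr, or NULL on a page fault. */
riscv_word_t *mmu_walk(riscv_word_t root_ppn, riscv_word_t vaddr)
{
    assert(root_ppn < PHYS_PAGES); /* caller supplies a valid root table */

    const riscv_word_t vpn[2] = {(vaddr >> 12) & 0x3ffu,
                                 (vaddr >> 22) & 0x3ffu};
    riscv_word_t ppn = root_ppn;

    for (int level = 1; level >= 0; level--) {
        if (ppn >= PHYS_PAGES)
            return NULL; /* table lies outside the simulated memory */
        riscv_word_t *pte = &phys_mem[ppn * WORDS_PER_PAGE + vpn[level]];

        if (!(*pte & PTE_V))
            return NULL; /* invalid entry */
        if ((*pte & (PTE_R | PTE_W)) == PTE_W)
            return NULL; /* writable but not readable is reserved */
        if (*pte & (PTE_R | PTE_X))
            return pte; /* leaf entry found */

        ppn = *pte >> 10; /* pointer to the next-level table */
    }
    return NULL; /* non-leaf entry at the last level */
}
```

The caller would combine the PPN from the returned PTE with the page offset of vaddr to form the physical address.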

You might notice that the non-MMU {fetch, read, write} and MMU {fetch, read, write} callbacks are duplicated after these changes. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could use a union to wrap them up to save memory. For simplicity, I would prefer not to use a union at first.

jserv · 2024-05-02T09:21:02Z

You might notice that the non-MMU {fetch, read, write} and MMU {fetch, read, write} callbacks are duplicated after these changes. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could use a union to wrap them up to save memory. For simplicity, I would prefer not to use a union at first.

The proposed mmu_mem_{read,write}_[wsb] are confusing since we already have the ones prefixed with mem_. Can you avoid such inconsistency?

ChinYikMing · 2024-05-02T09:31:37Z

You might notice that the non-MMU {fetch, read, write} and MMU {fetch, read, write} callbacks are duplicated after these changes. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could use a union to wrap them up to save memory. For simplicity, I would prefer not to use a union at first.

The proposed mmu_mem_{read,write}_[wsb] are confusing since we already have the ones prefixed with mem_. Can you avoid such inconsistency?

What about removing mem_? If so, the proposal would become:

typedef struct {
    ...

    /* TODO: add peripheral I/O interfaces */

+   /* MMU memory helper interface */
+   riscv_mmu_mem_walk mmu_walk;

+   /* MMU memory read interface */
+   riscv_mem_ifetch mmu_ifetch;
+   riscv_mem_read_w mmu_read_w;
+   riscv_mem_read_s mmu_read_s;
+   riscv_mem_read_b mmu_read_b;

+   /* MMU memory write interface */
+   riscv_mem_write_w mmu_write_w;
+   riscv_mem_write_s mmu_write_s;
+   riscv_mem_write_b mmu_write_b;

    ...
} riscv_io_t;

jserv · 2024-05-02T10:00:21Z

What about removing mem_?

Yes, I anticipate removing the legacy memory callback functions prefixed with mem_ in favor of the newly-added MMU counterparts. Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

ChinYikMing · 2024-05-02T14:46:26Z

Originally, I did not intend to make mmu_{load,store} registrable; I only wanted to reduce the number of parameters passed to an mmu_{load,store} function. However, if we want to eliminate registration of mmu_{load,store} from riscv_io_t, we can declare and define them as static functions at file scope in "emulate.c", since all instruction implementations are expanded there by the RVOP macro. In this way, the function prototypes for mmu_{load,store} and the helper function might look like this:
load:

static riscv_word_t mmu_ifetch(riscv_t *rv, riscv_word_t addr);
static riscv_word_t mmu_read_w(riscv_t *rv, riscv_word_t addr);
static riscv_half_t mmu_read_s(riscv_t *rv, riscv_word_t addr);
static riscv_byte_t mmu_read_b(riscv_t *rv, riscv_word_t addr);

store:

static void mmu_write_w(riscv_t *rv, riscv_word_t addr, riscv_word_t data);
static void mmu_write_s(riscv_t *rv, riscv_word_t addr, riscv_half_t data);
static void mmu_write_b(riscv_t *rv, riscv_word_t addr, riscv_byte_t data);

MMU helper function:

static riscv_word_t *mmu_walk(riscv_t *rv, riscv_word_t addr);

Obviously, we could pass a variable to indicate the data width and reduce the number of MMU-related functions, but I believe having each function do one thing well is the better approach. They would also be more consistent with the existing riscv_io_t callback functions.
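To make the trade-off concrete, here is a minimal sketch of width-specialized static inline accessors generated by a macro, in the spirit of MEM_WRITE_IMPL. The reduced riscv_t, the identity mmu_translate stand-in, and the macro names MMU_READ_IMPL / MMU_WRITE_IMPL are assumptions for illustration only, not the actual rv32emu code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t riscv_word_t;
typedef uint16_t riscv_half_t;
typedef uint8_t riscv_byte_t;

/* Hypothetical, stripped-down emulator state: flat guest memory only. */
typedef struct {
    uint8_t *mem;
    uint32_t mem_size;
} riscv_t;

/* Stand-in for the page table walk: identity mapping in this sketch. */
static inline riscv_word_t mmu_translate(riscv_t *rv, riscv_word_t addr)
{
    (void) rv;
    return addr;
}

/* Generate one width-specialized reader per data type. */
#define MMU_READ_IMPL(suffix, type)                                      \
    static inline type mmu_read_##suffix(riscv_t *rv, riscv_word_t addr) \
    {                                                                    \
        riscv_word_t pa = mmu_translate(rv, addr);                       \
        type data;                                                       \
        assert((uint64_t) pa + sizeof(type) <= rv->mem_size);            \
        memcpy(&data, rv->mem + pa, sizeof(type));                       \
        return data;                                                     \
    }

/* Generate one width-specialized writer per data type. */
#define MMU_WRITE_IMPL(suffix, type)                               \
    static inline void mmu_write_##suffix(riscv_t *rv,             \
                                          riscv_word_t addr,       \
                                          type data)               \
    {                                                              \
        riscv_word_t pa = mmu_translate(rv, addr);                 \
        assert((uint64_t) pa + sizeof(type) <= rv->mem_size);      \
        memcpy(rv->mem + pa, &data, sizeof(type));                 \
    }

MMU_READ_IMPL(w, riscv_word_t)
MMU_READ_IMPL(s, riscv_half_t)
MMU_READ_IMPL(b, riscv_byte_t)

MMU_WRITE_IMPL(w, riscv_word_t)
MMU_WRITE_IMPL(s, riscv_half_t)
MMU_WRITE_IMPL(b, riscv_byte_t)
```

Because each accessor has a fixed width known at compile time, the compiler can typically reduce the memcpy to a single load or store, which a runtime width parameter would make harder.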

ChinYikMing · 2024-05-03T04:41:18Z

Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

One thing to note: since commit 8355777, the I/O interfaces are bound during initialization, so users have no opportunity to register their own. The same applies to the mmu_{load, store} callback functions.

jserv · 2024-05-03T07:50:00Z

Obviously, we could pass a variable to indicate the data width and reduce the number of MMU-related functions, but I believe having each function do one thing well is the better approach. They would also be more consistent with the existing riscv_io_t callback functions.

Agreed. Prior to the refinement of memory operations, I was considering Duff's device to unify these functions across the various widths. However, we can benefit from compiler optimizations by using specialized functions that are not exposed and would only be hooked during initialization.

ChinYikMing · 2024-05-04T02:50:00Z

However, we can benefit from compiler optimizations by using specialized functions that are not exposed and would only be hooked during initialization.

Yes, declaring the MMU-related functions with the static storage-class specifier and the inline function specifier keeps them unexposed and lets the compiler inline them. Is hooking them at initialization still necessary in that case?

jserv · 2024-05-04T14:50:45Z

Is hooking them at initialization still necessary in that case?

Not necessary. Let's proceed.

ChinYikMing · 2024-05-11T02:46:23Z

Since we have both an ISA emulator and a system emulator, there should be a way to turn MMU support on or off. There are two ways to do this:

1. For every memory access, check whether a variable rv->mmu_on is set. If so, treat the address as a virtual address.
2. Pre-select the I/O handlers/callbacks and bind them to the riscv_io_t interface during the initialization of the RISC-V instance.

Obviously, option 2 has lower overhead than option 1. With option 2, the existing riscv_io_t interface remains unchanged; only the handlers/callbacks bound to it differ.
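Option 2 could be sketched roughly as follows. The single-callback riscv_io_t, the va_offset field (a stand-in for a real page table walk), and the names bare_read_w, mmu_read_w, and rv_bind_io are all hypothetical and only illustrate the binding step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t riscv_word_t;
typedef struct riscv riscv_t;

/* Callback type taking the RISC-V instance as its first parameter. */
typedef riscv_word_t (*riscv_mem_read_w)(riscv_t *rv, riscv_word_t addr);

typedef struct {
    riscv_mem_read_w mem_read_w;
} riscv_io_t;

struct riscv {
    riscv_io_t io;
    uint8_t *mem;
    riscv_word_t va_offset; /* hypothetical stand-in for a page table walk */
};

/* ISA emulation: the address is used as-is. */
static riscv_word_t bare_read_w(riscv_t *rv, riscv_word_t addr)
{
    riscv_word_t data;
    memcpy(&data, rv->mem + addr, sizeof(data));
    return data;
}

/* System emulation: the address is translated before the access. */
static riscv_word_t mmu_read_w(riscv_t *rv, riscv_word_t addr)
{
    return bare_read_w(rv, addr + rv->va_offset);
}

/* Select the handler once, so no per-access flag check is needed. */
static void rv_bind_io(riscv_t *rv, bool mmu_enabled)
{
    assert(rv);
    rv->io.mem_read_w = mmu_enabled ? mmu_read_w : bare_read_w;
}
```

Callers keep invoking rv->io.mem_read_w(rv, addr) in both configurations, which is why the interface itself does not need to change.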

jserv · 2024-05-11T14:29:42Z

Pre-select the I/O handlers/callbacks and bind them to the riscv_io_t interface during the initialization of the RISC-V instance.
Obviously, option 2 has lower overhead than option 1. With option 2, the existing riscv_io_t interface remains unchanged; only the handlers/callbacks bound to it differ.

I prefer option 2. When the MMU is not set by Linux during early boot, which set of memory handlers would be used?

ChinYikMing · 2024-05-11T16:26:05Z

According to the Sv32 description in section 4.3 of the RISC-V privileged specification (version 20211203), the MODE field of the satp CSR determines whether address translation is on or off. During early boot, the kernel function setup_vm sets up some temporary kernel mappings while MODE is still off (Bare mode). For further detail, refer to the comment in the source code of setup_vm, which states that setup_vm is called in MMU-off mode.

In summary, rv32emu can check MODE and decide whether to translate the address. In particular, when MODE is off, we can skip translation and read and write data directly at the given address using the basic I/O functions defined in io.[ch].
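A minimal sketch of that check, assuming the RV32 satp layout (MODE in bit 31, PPN in bits 21:0). The macro and function names are hypothetical and not taken from rv32emu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SATP_MODE_SV32 (1u << 31) /* satp[31]: 0 = Bare, 1 = Sv32 */
#define SATP_PPN_MASK 0x003FFFFFu /* satp[21:0]: root page table PPN */
#define PAGE_SHIFT 12

/* True when satp selects Sv32, i.e. addresses must be translated. */
static inline bool satp_translation_on(uint32_t satp)
{
    return (satp & SATP_MODE_SV32) != 0;
}

/* Physical address of the root page table; Sv32 physical addresses are
 * 34 bits wide, hence the 64-bit return type. */
static inline uint64_t satp_root_table(uint32_t satp)
{
    assert(satp_translation_on(satp));
    return (uint64_t) (satp & SATP_PPN_MASK) << PAGE_SHIFT;
}
```

A fuller check would probably also consider the current privilege mode, since M-mode accesses are normally not translated; that is left out of this sketch.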

The purpose of this commit is to boot 32-bit RISC-V Linux in the future. The virtual memory scheme to support is Sv32.

There is one change to the original code base to accommodate the MMU: the prototypes of the riscv_io_t interface need to change. Specifically, a RISC-V instance (riscv_t) is added as the first parameter. MMU-related callbacks need to access the satp CSR to perform a page table walk during virtual memory translation, but the satp CSR is stored in the RISC-V instance (riscv_t), so the callbacks need a way to reach it. The straightforward solution is to add the RISC-V instance (riscv_t) to the prototypes of the riscv_io_t interface. After this change, riscv_io_t can be reused for system emulation.

The rest of the changes implement the Sv32 virtual memory scheme. Every memory access has to walk the page table to get the corresponding PTE. Depending on the PTE retrieved, several page faults may need to be handled, so three exception handlers are introduced: insn_pgfault, load_pgfault, and store_pgfault. They are used in MMU_CHECK_FAULT. In this commit, access faults are not fully handled, since they relate to PMA and PMP and might not be required to boot 32-bit RISC-V Linux (tested on semu).

Some S-mode CSRs are added to riscv_internal to support S-mode. PTE, S-mode, and M-mode CSR helper macros are introduced as well.

Related: sysprog21#310
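For readers unfamiliar with Sv32, the two-level walk mentioned in the commit message can be sketched as below. This is not the code from the commit: the reduced riscv_t, read_pte, and sv32_walk are hypothetical, permission and A/D-bit checks are omitted, and a failed walk simply returns false where the real code would raise one of the page fault handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)

/* Hypothetical reduced state: flat guest physical memory plus satp. */
typedef struct {
    uint8_t *mem;
    uint32_t mem_size;
    uint32_t satp;
} riscv_t;

/* Fetch a 4-byte PTE; assumes a little-endian host like the guest. */
static bool read_pte(const riscv_t *rv, uint64_t pa, uint32_t *pte)
{
    if (pa + sizeof(*pte) > rv->mem_size)
        return false;
    memcpy(pte, rv->mem + pa, sizeof(*pte));
    return true;
}

/* Translate va to a physical address; false means "page fault". */
static bool sv32_walk(const riscv_t *rv, uint32_t va, uint64_t *pa)
{
    assert(pa);
    const uint32_t vpn[2] = {(va >> 12) & 0x3FF, (va >> 22) & 0x3FF};
    uint64_t table = (uint64_t) (rv->satp & 0x3FFFFF) << PAGE_SHIFT;

    for (int level = 1; level >= 0; level--) {
        uint32_t pte;
        if (!read_pte(rv, table + vpn[level] * 4u, &pte))
            return false;
        /* Invalid entry, or the reserved W-without-R encoding. */
        if (!(pte & PTE_V) || (!(pte & PTE_R) && (pte & PTE_W)))
            return false;
        if (pte & (PTE_R | PTE_X)) { /* leaf PTE */
            uint64_t ppn = pte >> 10;
            if (level == 1) {
                if (ppn & 0x3FF) /* misaligned 4 MiB superpage */
                    return false;
                ppn |= vpn[0]; /* superpage: VPN[0] passes through */
            }
            *pa = (ppn << PAGE_SHIFT) | (va & 0xFFF);
            return true;
        }
        /* Non-leaf PTE: descend to the next-level table. */
        table = (uint64_t) (pte >> 10) << PAGE_SHIFT;
    }
    return false; /* level-0 entry was not a leaf */
}
```

The walk needs satp, which is why the commit adds riscv_t as the first parameter of the riscv_io_t callbacks.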

SBI acts as a communication layer between S-mode software and M-mode hardware. To boot the Linux kernel, a minimal set of SBI extensions has to be implemented:

1. Base extension (EID=0x10)
2. Timer extension (EID=0x54494D45)

The SRST extension (EID=0x53525354) is optional, so only shutdown is implemented.

Related: sysprog21#310
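As a rough illustration of how such a minimal SBI could be dispatched on ecall, the sketch below follows the SBI calling convention (EID in a7, FID in a6, arguments from a0, error/value returned in a0/a1). The reduced riscv_t, the halted flag, and the function names are assumptions; only two Base functions are shown and timer programming is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SBI_EID_BASE 0x10u
#define SBI_EID_TIMER 0x54494D45u /* "TIME" */
#define SBI_EID_SRST 0x53525354u  /* "SRST" */

#define SBI_SUCCESS 0
#define SBI_ERR_NOT_SUPPORTED (-2)

/* Register indices of a0, a1, a6, and a7 in the integer register file. */
enum { A0 = 10, A1 = 11, A6 = 16, A7 = 17 };

/* Hypothetical reduced emulator state. */
typedef struct {
    uint32_t X[32];
    bool halted;
} riscv_t;

static uint32_t sbi_ext_supported(uint32_t eid)
{
    return eid == SBI_EID_BASE || eid == SBI_EID_TIMER || eid == SBI_EID_SRST;
}

/* Handle an ecall from S-mode; results are written back to a0/a1. */
static void sbi_ecall(riscv_t *rv)
{
    assert(rv);
    const uint32_t eid = rv->X[A7], fid = rv->X[A6], arg0 = rv->X[A0];
    int32_t error = SBI_ERR_NOT_SUPPORTED;
    uint32_t value = 0;

    if (eid == SBI_EID_BASE && fid == 0) { /* get_spec_version */
        error = SBI_SUCCESS;
        value = 3; /* major 0, minor 3 */
    } else if (eid == SBI_EID_BASE && fid == 3) { /* probe_extension */
        error = SBI_SUCCESS;
        value = sbi_ext_supported(arg0);
    } else if (eid == SBI_EID_TIMER && fid == 0) { /* set_timer */
        error = SBI_SUCCESS; /* programming the timer is elided here */
    } else if (eid == SBI_EID_SRST && fid == 0 && arg0 == 0) {
        rv->halted = true; /* system_reset with reset type 0 (shutdown) */
        error = SBI_SUCCESS;
    }

    rv->X[A0] = (uint32_t) error;
    rv->X[A1] = value;
}
```

Any other EID/FID combination falls through to SBI_ERR_NOT_SUPPORTED, which is how a caller probing for optional extensions would learn that they are absent.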

jserv changed the title from "Refactor riscv APIs to simplify porting" to "Refactoring RISC-V emulation APIs for easier adoption and porting" on Dec 25, 2023

jserv added the enhancement New feature or request label Dec 26, 2023

ChinYikMing mentioned this issue Jan 31, 2024

Refine the API in the public header #340

Merged

ChinYikMing mentioned this issue Feb 25, 2024

Bind I/O handlers during emulator initialization #356

Closed

ChinYikMing mentioned this issue Feb 25, 2024

Bind I/O handlers during emulator initialization #357

Merged

ChinYikMing mentioned this issue May 12, 2024

Preliminary support for MMU emulation #438

Open

ChinYikMing mentioned this issue May 13, 2024

Implement minimal SBI v0.3 #439

Merged
