Refactoring RISC-V emulation APIs for easier adoption and porting #310

Open
2 tasks
ChinYikMing opened this issue Dec 25, 2023 · 33 comments
Labels: enhancement (New feature or request)

@ChinYikMing
Collaborator

ChinYikMing commented Dec 25, 2023

This is a first attempt at the refactoring; the result is the latest commit on the wasm branch.

Since state_t is user-provided data, all values defined at runtime (which often change) shall be stored there, for instance the emulated target program's argc and argv and the emulator's parameters. The following have been adjusted to reflect the change:

  1. state_t *state_new(void) -----> state_t *state_new(uint32_t mem_size, int argc, char **argv, bool allow_misalign, bool quiet_output)
    mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for all of them. The remaining parameters are values defined at runtime.

  2. riscv_t *rv_create(const riscv_io_t *io, riscv_user_t userdata, int argc, char **args, bool output_exit_code) -----> riscv_t *rv_create(riscv_user_t userdata)
    Much cleaner function signature.

  3. void rv_reset(riscv_t *rv, riscv_word_t pc, int argc, char **args) -----> void rv_reset(riscv_t *rv, riscv_word_t pc)
    We can use rv->userdata to get the required argc and argv.

  4. Since memory I/O handlers are rarely changed, it makes less sense to define them during runtime (makes porting difficult). Instead, I believe it is preferable to link them during build time. If the implementations really need to change, writing a new C file and linking it at build time might be a better choice.
    To do this, some changes are made:

    • define the memory I/O handlers in rv_create and link them at build time
    • to make the memory write interfaces compatible with the function pointers, the MEM_WRITE_IMPL macro has to be changed:
    • src/io.c
// old
#define MEM_WRITE_IMPL(size, type)                                 \
    void memory_write_##size(uint32_t addr, const uint8_t *src)    \
    {                                                              \
        *(type *) (data_memory_base + addr) = *(const type *) src; \
    }

// new
#define MEM_WRITE_IMPL(size, type)                          \
    void memory_write_##size(uint32_t addr, const type src) \
    {                                                       \
        *(type *) (data_memory_base + addr) = src;          \
    }
  • the call to memory_write_w in "src/syscall.c" shall be changed accordingly:
  • src/syscall.c
// old
memory_write_w(tv + 0, (const uint8_t *) &tv_s.tv_sec);

// new
memory_write_w(tv + 0, *((const uint32_t *) &tv_s.tv_sec));
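
A minimal sketch of how the by-value write handlers could be bound to a function-pointer table once at build time. The demo_io_t table, its field names, and the demo_* helpers are hypothetical and only serve the illustration; data_memory_base stands in for the emulator's real guest memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the emulator's guest memory base pointer. */
static uint8_t *data_memory_base;

/* By-value form of the macro, as proposed in the thread. */
#define MEM_WRITE_IMPL(size, type)                          \
    void memory_write_##size(uint32_t addr, const type src) \
    {                                                       \
        *(type *) (data_memory_base + addr) = src;          \
    }

MEM_WRITE_IMPL(w, uint32_t)
MEM_WRITE_IMPL(s, uint16_t)
MEM_WRITE_IMPL(b, uint8_t)

/* Assumed handler table; the field names are illustrative only. */
typedef struct {
    void (*mem_write_w)(uint32_t addr, uint32_t data);
    void (*mem_write_s)(uint32_t addr, uint16_t data);
    void (*mem_write_b)(uint32_t addr, uint8_t data);
} demo_io_t;

/* Bound once at build time instead of being passed in at runtime. */
static const demo_io_t demo_io = {
    .mem_write_w = memory_write_w,
    .mem_write_s = memory_write_s,
    .mem_write_b = memory_write_b,
};

/* Allocate zeroed guest memory for the demo. */
bool demo_memory_init(size_t size)
{
    data_memory_base = calloc(1, size);
    return data_memory_base != NULL;
}

/* Read helpers used only to observe the effect of the writes. */
uint32_t demo_read_w(uint32_t addr)
{
    uint32_t value;
    memcpy(&value, data_memory_base + addr, sizeof(value));
    return value;
}

uint8_t demo_read_b(uint32_t addr)
{
    return data_memory_base[addr];
}
```

Because the handlers now take the value directly, a call site such as demo_io.mem_write_w(8, 0xdeadbeef) needs no temporary buffer or pointer cast. Swapping the implementation means linking a different C file that defines the same memory_write_* symbols.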

Notably, the "pre.js" of the wasm branch no longer defines I/O on its own, unlike in the first attempt (more abstraction).

  1. For consistency, change all uint32_t, uint16_t, and uint8_t in the function signatures of riscv.[ch] to riscv_word_t, riscv_half_t, and riscv_byte_t respectively.

  2. bool elf_load(elf_t *e, riscv_t *rv, memory_t *mem); -----> bool elf_load(elf_t *e, riscv_t *rv);
    The memory instance required by elf_load can be accessed via rv's userdata.

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without this abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

Related to: #75

@jserv jserv changed the title from "Refactor riscv APIs to simplify porting" to "Refactoring RISC-V emulation APIs for easier adoption and porting" on Dec 25, 2023
@jserv
Contributor

jserv commented Dec 25, 2023

I would like to invite @RinHizakura, @qwe661234, and @visitorckw to join the discussion and contribute to the refinement of the API.

@jserv
Contributor

jserv commented Dec 25, 2023

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without this abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

@ChinYikMing
Collaborator Author

ChinYikMing commented Dec 26, 2023

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without this abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

So, it could be useful to provide an abstract way of defining the desired stdin, stdout, and stderr, or more than just these three. stdin, stdout, and stderr can be the defaults if no specification is given.

@RinHizakura
Collaborator

RinHizakura commented Dec 26, 2023

Since memory I/O handlers are rarely changed, it makes less sense to define them during runtime (makes porting difficult). Instead, I believe it is preferable to link them during build time.

I think the distinction between modules is a little unclear in the current design of rv32emu. Suppose we regard riscv.c as part of the library and main.c as part of an application using the library. Although rv_create() seems to let the application customize memory operations through the io function pointers, operations on the simulated memory must actually be bound to the instance created by state_new(), which belongs to the library side. This limits how io can be customized. For example, what if you want to use a backup file to simulate memory? This design seems to make the io function pointers for memory operations redundant.

I believe it is preferable to link them during build time.

So, if providing user-specific memory operations does not matter, providing them at build time for the library is also a great solution.

@RinHizakura
Collaborator

RinHizakura commented Dec 26, 2023

mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for all of them.

I am not quite sure whether changing the memory size directly is safe. As I remember, some implementations in rv32emu intentionally rely on the memory size being 2^32 - 1 for some tricks. Or maybe I am mixing it up with another project. I am looking for others to answer the question.

@ChinYikMing
Collaborator Author

ChinYikMing commented Dec 26, 2023

mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for all of them.

I am not quite sure whether changing the memory size directly is safe. As I remember, some implementations in rv32emu intentionally rely on the memory size being 2^32 - 1 for some tricks. Or maybe I am mixing it up with another project. I am looking for others to answer the question.

The built-in ELF programs do not seem to need a lot of memory, so I think 2GB - 4GB is a safe range. Changing the memory size dynamically for different runtimes might be needed. For example, multiples of 64KiB should be used in WebAssembly. MEM_SIZE was originally set to 2^32 in #151 so that memory is preallocated and no extra checks are needed when manipulating the memory region. Then, MEM_SIZE was set to 2^32 - 1 in #221 to be compatible with emcc, whose default build target is wasm32 (memory shall be < 4GB).
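
As a rough illustration of the 64KiB constraint, a requested memory size could be rounded up to whole WebAssembly pages and capped below 4GB. The function name and the cap of 2^32 - 64KiB are assumptions made for this sketch and are not part of rv32emu.

```c
#include <stdint.h>

/* WebAssembly linear memory is sized in 64KiB pages. */
#define WASM_PAGE_SIZE UINT64_C(65536)

/* Assumed cap: the largest multiple of 64KiB strictly below 4GB. */
#define WASM32_MEM_CAP ((UINT64_C(1) << 32) - WASM_PAGE_SIZE)

/* Hypothetical helper: round a requested size up to whole pages,
 * clamping to the cap so the result always stays below 4GB. */
uint64_t mem_size_round_to_pages(uint64_t requested)
{
    if (requested >= WASM32_MEM_CAP)
        return WASM32_MEM_CAP;
    return (requested + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE * WASM_PAGE_SIZE;
}
```

With this rule, the current default of 2^32 - 1 would be clamped to 0xFFFF0000, while small requests grow to the next page boundary.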

@jserv jserv added the enhancement (New feature or request) label on Dec 26, 2023
@ChinYikMing
Collaborator Author

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.

Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

@ChinYikMing
Collaborator Author

ChinYikMing commented Dec 27, 2023

rv_enables_to_output_exit_code could be renamed as something like rv_get_xxx. Same rules might be applied to other fields of state_t to improve consistency. rv_set_xxx can be the setter.

For example:

  • rv_get_userdata / rv_set_userdata
  • rv_get_pc / rv_set_pc
  • rv_get_reg / rv_set_reg
  • rv_get_halt_status / rv_set_halt_status
  • rv_get_cycle_per_step / rv_set_cycle_per_step
  • rv_get_output_exit_code_flag / rv_set_output_exit_code_flag
  • rv_get_allow_misalign_flag / rv_set_allow_misalign_flag
  • ...
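
A small sketch of what such getter/setter pairs and the single-parameter rv_step could look like. The riscv_t layout here is a trimmed-down stand-in, cycle_per_step is stored directly in it instead of in rv->userdata to keep the example short, and the loop body in rv_step is a placeholder for real instruction execution.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t riscv_word_t;

enum { N_RV_REGS = 32 };

/* Hypothetical, trimmed-down core state for illustration only. */
typedef struct riscv {
    riscv_word_t X[N_RV_REGS];
    riscv_word_t PC;
    uint32_t cycle_per_step;
} riscv_t;

riscv_word_t rv_get_pc(const riscv_t *rv)
{
    return rv->PC;
}

void rv_set_pc(riscv_t *rv, riscv_word_t pc)
{
    rv->PC = pc;
}

/* Out-of-range register indices read as zero in this sketch. */
riscv_word_t rv_get_reg(const riscv_t *rv, uint32_t reg)
{
    return reg < N_RV_REGS ? rv->X[reg] : 0;
}

/* Returns false for an out-of-range index; writes to x0 are ignored
 * because x0 is hard-wired to zero in RISC-V. */
bool rv_set_reg(riscv_t *rv, uint32_t reg, riscv_word_t value)
{
    if (reg >= N_RV_REGS)
        return false;
    if (reg != 0)
        rv->X[reg] = value;
    return true;
}

uint32_t rv_get_cycle_per_step(const riscv_t *rv)
{
    return rv->cycle_per_step;
}

void rv_set_cycle_per_step(riscv_t *rv, uint32_t cycles)
{
    rv->cycle_per_step = cycles;
}

/* Single-parameter step: the cycle budget comes from the core itself.
 * Advancing PC by 4 per cycle is a placeholder, not real execution. */
void rv_step(riscv_t *rv)
{
    for (uint32_t i = 0; i < rv->cycle_per_step; i++)
        rv->PC += 4;
}
```

Keeping every field behind a getter/setter pair lets riscv_t stay opaque in the public header, so the layout can change without breaking applications.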

@jserv
Contributor

jserv commented Dec 30, 2023

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.

Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

I agree. By the way, state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

@jserv
Contributor

jserv commented Jan 26, 2024

The repository mnurzia/rv serves as an additional reference for API refinement. It features three main APIs:

  • Memory Access Callback: This function processes data as input/output and returns RV_BAD in case of a fault. It's defined as:
    typedef rv_res (*rv_bus_cb)(void *user, rv_u32 addr, rv_u8 *data, rv_u32 is_store, rv_u32 width);
  • CPU Initialization: This function initializes the CPU and can be called again on the cpu object to reset it. The function signature is:
    void rv_init(rv *cpu, void *user, rv_bus_cb bus_cb);
  • CPU Single-Step: This function advances the CPU by one step and returns an RV_E* code in case of an exception. Its definition is:
    rv_u32 rv_step(rv *cpu);

These APIs collectively provide a structure for memory access, CPU initialization, and step-wise execution in the CPU simulation.
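
To make the callback shape concrete, here is a sketch of a flat-RAM bus callback that follows the rv_bus_cb signature quoted above. The typedefs are local stand-ins so the example compiles on its own; the enumerator values of rv_res, the demo_ram_t type, and the zero-based address mapping are assumptions and are not taken from the mnurzia/rv headers.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring the names quoted above (values assumed). */
typedef uint8_t rv_u8;
typedef uint32_t rv_u32;
typedef enum { RV_OK = 0, RV_BAD = 1 } rv_res;

typedef rv_res (*rv_bus_cb)(void *user,
                            rv_u32 addr,
                            rv_u8 *data,
                            rv_u32 is_store,
                            rv_u32 width);

/* Hypothetical user context: a flat RAM region starting at address 0. */
typedef struct {
    rv_u8 *base;
    rv_u32 size;
} demo_ram_t;

/* Copy `width` bytes between the caller's buffer and RAM, reporting
 * RV_BAD when the access falls outside the region. */
rv_res demo_bus_cb(void *user,
                   rv_u32 addr,
                   rv_u8 *data,
                   rv_u32 is_store,
                   rv_u32 width)
{
    demo_ram_t *ram = user;
    if (width > ram->size || addr > ram->size - width)
        return RV_BAD;
    if (is_store)
        memcpy(ram->base + addr, data, width);
    else
        memcpy(data, ram->base + addr, width);
    return RV_OK;
}
```

A single callback with an is_store flag and a width keeps the bus interface to one function pointer, in contrast to the per-width handlers (read_w, read_s, read_b, and so on) discussed earlier in this thread.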

@ChinYikMing
Collaborator Author

ChinYikMing commented Jan 27, 2024

As previously suggested, the maximum memory (MEM_SIZE) of a virtual machine (VM) shall be determined by the application. If these modifications are made, the Makefile-defined default stack size shall also be adjusted.

Makefile:

# Set the default stack pointer
...
CFLAGS += -D DEFAULT_STACK_ADDR=0xFFFFE000
# Set the default args starting address
CFLAGS += -D DEFAULT_ARGS_ADDR=0xFFFFF000
...

Thus, adjusting the stack size should be part of the public API.
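
One possible way to derive both addresses from the memory size instead of hard-coding them is sketched below. The layout rule (arguments in the last 4KiB page, stack pointer one page below) is an assumption chosen because it reproduces the two Makefile defaults for MEM_SIZE = 2^32 - 1; vm_layout_t and vm_layout_from_mem_size are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_PAGE_SIZE UINT64_C(0x1000)

/* Hypothetical result type for the derived addresses. */
typedef struct {
    uint32_t stack_addr; /* initial stack pointer */
    uint32_t args_addr;  /* start of the argument area */
} vm_layout_t;

/* Assumed rule: args live in the page holding the highest valid
 * address, and the stack starts one page below that. Fails when the
 * memory is too small or does not fit a 32-bit address space. */
bool vm_layout_from_mem_size(uint64_t mem_size, vm_layout_t *out)
{
    if (mem_size < 3 * VM_PAGE_SIZE || mem_size > (UINT64_C(1) << 32))
        return false;

    uint64_t last_addr = mem_size - 1;
    uint64_t args = last_addr & ~(VM_PAGE_SIZE - 1);

    out->args_addr = (uint32_t) args;
    out->stack_addr = (uint32_t) (args - VM_PAGE_SIZE);
    return true;
}
```

For a 1GB memory this rule yields a stack pointer of 0x3FFFE000 and an argument area at 0x3FFFF000, so shrinking the memory no longer leaves the defaults pointing outside it.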

@ChinYikMing
Collaborator Author

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.
Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

I agree. By the way, state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

I think vm_attr_t can be the candidate for the name ( inspired by pthread_attr_t ).

Currently, vm_attr_t should consist of the following:

  1. vm RAM size (if previous concern is OK)
  2. vm STACK size (if vm RAM size changes)
  3. vm-specific argc, argv
  4. error code (to represent the exit state of vm)
  5. enable_output_exit_code
  6. logging level
  7. union of target ELF program and target vm
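typedef struct rv_struct {
    char *elf_program;
} rv_struct_t;

/* Member types are not given in the proposal; char * is an assumption. */
typedef struct vm_struct {
    char *kernel_img;
    char *dtb;
    char *rootfs_img;
} vm_struct_t;

/* member of vm_attr_t */
union {
    rv_struct_t rv_struct;
    vm_struct_t vm_struct;
};
  8. cycle_per_step
  9. enable_misaligned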
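
Putting the listed attributes together, vm_attr_t might look roughly like the sketch below. The field types, the vm_target_t tag that says which union member is valid, the helper vm_attr_default_for_elf, and all default values other than the memory size are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag telling which union member is in use. */
typedef enum { VM_TARGET_ELF, VM_TARGET_SYSTEM } vm_target_t;

typedef struct rv_struct {
    char *elf_program;
} rv_struct_t;

typedef struct vm_struct {
    char *kernel_img; /* assumed to be file paths */
    char *dtb;
    char *rootfs_img;
} vm_struct_t;

typedef struct {
    uint32_t mem_size;        /* 1. vm RAM size */
    uint32_t stack_size;      /* 2. vm stack size */
    int argc;                 /* 3. vm-specific argc, argv */
    char **argv;
    int error_code;           /* 4. exit state of the vm */
    bool output_exit_code;    /* 5. whether to print the exit code */
    int log_level;            /* 6. logging level */
    vm_target_t target;       /* 7. selects the union member below */
    union {
        rv_struct_t rv_struct;
        vm_struct_t vm_struct;
    } data;
    uint32_t cycle_per_step;  /* 8. cycles executed per step */
    bool allow_misalign;      /* 9. permit misaligned accesses */
} vm_attr_t;

/* Fill in placeholder defaults for running a user-space ELF program. */
vm_attr_t vm_attr_default_for_elf(char *elf_program)
{
    vm_attr_t attr = {
        .mem_size = UINT32_MAX, /* 2^32 - 1, the current MEM_SIZE */
        .stack_size = 4096,     /* placeholder value */
        .cycle_per_step = 100,  /* placeholder value */
        .target = VM_TARGET_ELF,
        .data.rv_struct.elf_program = elf_program,
    };
    return attr;
}
```

An explicit tag next to the union lets rv_create decide between program emulation and system emulation without inspecting the member contents.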

I would like to introduce the sixth attribute of vm_attr_t, which lets the user select how the vm should log, much like the printk log levels of the Linux kernel. The logging level registers the corresponding handler during rv_init initialization. This feature gives the user more flexibility in observing the vm state and error reports. The seventh attribute of vm_attr_t differentiates between RISC-V program emulation and RISC-V system emulation, so rv_create returns the corresponding internal structure (riscv_internal or vm_internal); these are, of course, forward-declared structures.

The prefix of all vm-related functions should be consistent (needs more discussion).

@jserv
Contributor

jserv commented Jan 29, 2024

state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

I think vm_attr_t can be the candidate for the name ( inspired by pthread_attr_t ).

It sounds promising. Please send pull request(s) to refine APIs.

Currently, vm_attr_t should consist of the following:

  1. vm RAM size (if previous concern is OK)
  2. vm STACK size (if vm RAM size changes)
  3. vm-specific argc, argv

How about envp?

  4. error code (to represent the exit state of vm)
  5. enable_output_exit_code

enable_output_exit_code looks hacky. Can you show something detailed?

@ChinYikMing
Collaborator Author

ChinYikMing commented Jan 29, 2024

  1. vm RAM size (if previous concern is OK)
  2. vm STACK size (if vm RAM size changes)
  3. vm-specific argc, argv

How about envp?

Since envp is not accessible for now, placing a TODO in vm_attr_t might be decent.

  5. enable_output_exit_code

enable_output_exit_code looks hacky. Can you show something detailed?

It is used by syscall_exit to determine whether to output the exit code. I think always outputting the exit code is not a bad thing, so maybe this is redundant. Or, it can be determined on top of the logging feature.

@jserv
Contributor

jserv commented Jan 30, 2024

I think always outputting the exit code is not a bad thing, so maybe this is redundant. Or, it can be determined on top of the logging feature.

After streamlining the API, we can control the exit code by storing it in a specific structure, instead of displaying it directly in the console.

@ChinYikMing
Collaborator Author

I am wondering whether we should abstract the FILE defined in state_new into a parameter of state_new. Without this abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

So, it could be useful to provide an abstract way of defining the desired stdin, stdout, and stderr, or more than just these three. The standard stdin, stdout, and stderr can be the defaults if no specification is given.

For abstracting a file or file descriptor, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which would select the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-common fd (e.g., a regular file) before emulation.

@RinHizakura
Collaborator

RinHizakura commented Jan 31, 2024

For abstracting a file or file descriptor, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which would select the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-common fd (e.g., a regular file) before emulation.

Since we would also have to maintain the file descriptors if vm_register_stdxxx() is used for redirection, how about just adding three file descriptors to vm_attr_t:

typedef struct {
    ...
    int stdin;
    int stdout;
    int stderr;
} vm_attr_t;

Note: The variable names should be refined. I just can't come up with suitable and concise names right now.

These file descriptors are assigned 0, 1, and 2 by default, and are modified when vm_register_stdxxx() is called. So we don't need a redundant variable bool use_default_stdin_stdout_stderr for this feature.
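
A sketch of the three-descriptor idea follows. The field names fd_stdin, fd_stdout, and fd_stderr are placeholders (plain stdin, stdout, and stderr would clash with the macros of the same names from <stdio.h>), and vm_stream_t, vm_attr_init_stdio, and vm_register_stdstream are hypothetical names standing in for vm_register_stdxxx.

```c
#include <stdbool.h>

/* Hypothetical selector for the stream being remapped. */
typedef enum {
    VM_STREAM_STDIN,
    VM_STREAM_STDOUT,
    VM_STREAM_STDERR
} vm_stream_t;

/* Only the stdio-related part of vm_attr_t is shown here. */
typedef struct {
    int fd_stdin;
    int fd_stdout;
    int fd_stderr;
} vm_attr_t;

/* Defaults are the conventional descriptors 0, 1, and 2, so no
 * separate "use default" flag is needed. */
void vm_attr_init_stdio(vm_attr_t *attr)
{
    attr->fd_stdin = 0;
    attr->fd_stdout = 1;
    attr->fd_stderr = 2;
}

/* Overwrite one descriptor; negative descriptors are rejected. */
bool vm_register_stdstream(vm_attr_t *attr, vm_stream_t stream, int fd)
{
    if (fd < 0)
        return false;

    switch (stream) {
    case VM_STREAM_STDIN:
        attr->fd_stdin = fd;
        return true;
    case VM_STREAM_STDOUT:
        attr->fd_stdout = fd;
        return true;
    case VM_STREAM_STDERR:
        attr->fd_stderr = fd;
        return true;
    }
    return false;
}
```

An application would typically open() a regular file and pass the resulting descriptor to the register function before starting emulation; streams that are never registered keep their defaults.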

@ChinYikMing
Collaborator Author

For abstracting a file or file descriptor, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which would select the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-common fd (e.g., a regular file) before emulation.

Since we would also have to maintain the file descriptors if vm_register_stdxxx() is used for redirection, how about just adding three file descriptors to vm_attr_t:

typedef struct {
    ...
    int stdin;
    int stdout;
    int stderr;
} vm_attr_t;

Note: The variable names should be refined. I just can't come up with suitable and concise names right now.

These file descriptors are assigned 0, 1, and 2 by default, and are modified when vm_register_stdxxx() is called. So we don't need a redundant variable bool use_default_stdin_stdout_stderr for this feature.

Thanks for the tips! I think the boolean is really redundant, since vm_register_stdxxx() can overwrite them.

ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Jan 31, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. Before this PR, the
user had to manage memory (state_t) and ELF handling. After this PR,
the user may just construct a core, run it, and shut it down, so
they won't need to worry about them anymore.

The vm_attr_t has multiple fields and they are commented clearly
in the code.

Note that logging feature and system emulator integration are not
implemented yet.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Jan 31, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. For stdio redirection,
rv_register_stdio function is introduced.

Before this PR, the user had to manage memory (state_t) and ELF
handling. After this PR, the user may just construct a core, run it,
and shut it down, so they won't need to worry about them anymore.

The vm_attr_t has multiple fields and they are commented clearly
in the code.

Note that logging feature and system emulator integration are not
implemented yet.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Feb 1, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. Before this PR, the
user had to manage memory (state_t) and ELF handling. After this PR,
the user may just construct a core, run it, and shut it down, so
they won't need to worry about them anymore.

For stdio remapping, rv_remap_stdstream function is introduced.

The vm_attr_t has multiple fields and they are commented clearly
in the code.

elf_t is reopened in run_and_trace and dump_test_signature because
elf_t is allocated inside rv_create and they cannot access them.
It is acceptable to reopen elf_t since they are only for testing and
debugging.

PRINT_EXIT_CODE build macro is introduced to enable syscall_exit
to print exit code to console only during testing since the actual usage
of exit code is really depending on applications.

The io interface is not changed in this PR because it might be reused
with semu in some way, which still needs to be investigated. Also, the
logging feature and system emulator integration are not implemented yet.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Feb 1, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. Before this PR, the
user had to manage memory (state_t) and ELF handling. After this PR,
the user may just construct a core, run it, and shut it down, so
they won't need to worry about them anymore.

For stdio remapping, rv_remap_stdstream function is introduced.

The vm_attr_t has multiple fields and they are commented clearly
in the code.

elf is reopened in run_and_trace and dump_test_signature because
elf is allocated inside rv_create and they cannot access them.
It is acceptable to reopen elf since they are only for testing and
debugging. Print inferior exit code to console inside main instead of
syscall_exit because the actual usage of exit code depends on
applications of using riscv public API.

The io interface is not changed in this PR because it might be
reused with semu in some way, which still needs to be investigated.
Also, the logging feature and system emulator integration are not
implemented yet.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Feb 1, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. Before this PR, the
user had to manage memory (state_t) and ELF handling. After this PR,
the user may just construct a core, run it, and shut it down, so
they won't need to worry about them anymore.

For stdio remapping, rv_remap_stdstream function is introduced.

The vm_attr_t has multiple fields and they are commented clearly
in the code.

elf is reopened in dump_test_signature because elf is allocated
during rv_create. It is acceptable to reopen elf since it is only
for testing. Print inferior exit code to console inside main
instead of syscall_exit because the actual usage of exit code
depends on applications of using riscv public API.

The io interface is not changed in this PR because it might be
reused with semu in some way, which still needs to be investigated.
Also, the logging feature and system emulator integration are not
implemented yet.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Feb 2, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. Before this PR, the
user had to manage memory (state_t) and ELF handling. After this PR,
the user may just construct a core, run it, and shut it down, so
they won't need to worry about them anymore.

The vm_attr_t has multiple fields and they are commented clearly
in the code. As you can see in "main", there are various modes to run
the emulator such as "run_and_trace", "gdbstub", and "profiling". Thus,
a field called "run_flag" is introduced in vm_attr_t.

For standard stream remapping, rv_remap_stdstream function is
introduced. The emulator can remap default standard stream to required
streams after creating the emulator by calling the rv_remap_stdstream
function.

elf is reopened in dump_test_signature because elf is allocated during
rv_create. It is acceptable to reopen elf since it is only for testing.

Print inferior exit code to console inside main instead of syscall_exit
because the actual usage of exit code depends on applications of using
riscv public API.

The io interface is not changed in this PR because it might be
reused with semu in some way, which still needs to be investigated.
Also, the logging feature and system emulator integration are not
implemented yet. A validator for the user-defined vm_attr_t might
need to be introduced.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Feb 2, 2024
The following should be included in an emulator's simple and clear
public API:
1. create/init core
2. run emulation
3. delete/destroy core

Other components, including memory, file systems, program data,
etc., should be abstracted from the user; as a result, setting a
configuration value (vm_attr_t) is sufficient. Before this PR, the
user had to manage memory (state_t) and ELF handling. After this PR,
the user may just construct a core, run it, and shut it down, so
they won't need to worry about them anymore.

The vm_attr_t has multiple fields and they are commented clearly
in the code. As you can see in "main", there are various modes to run
the emulator such as "run_and_trace", "gdbstub", and "profiling". Thus,
a field called "run_flag" is introduced in vm_attr_t.

For standard stream remapping, rv_remap_stdstream function is
introduced. The emulator can remap default standard stream to required
streams after creating the emulator by calling the rv_remap_stdstream
function.

rv_userdata has been dropped since the PRIV macro is sufficient for
the internal implementation. Also, applications will not need to
access it directly.

elf is reopened in dump_test_signature because elf is allocated during
rv_create. It is acceptable to reopen elf since it is only for testing.

Print inferior exit code to console inside main instead of syscall_exit
because the actual usage of exit code depends on applications of using
riscv public API.

The io interface is not changed in this PR because it might be
reused with semu in some way, which still needs to be investigated.
Also, the logging feature and system emulator integration are not
implemented yet. A validator for the user-defined vm_attr_t might
need to be introduced.

related: sysprog21#310
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue Feb 25, 2024
It is not required to give an application the opportunity to bind I/O
handlers because I/O handlers are rarely altered during the
creation of an emulator.

With this commit, the application can now build a emulator much more
easier by only taking the emulator's attribute (vm_attr_t)
into consideration.

In order to facilitate further integration with the RISC-V system
emulator (semu), I have included a TODO inside the I/O interface.

Related: sysprog21#310
@jserv
Contributor

jserv commented Mar 12, 2024

  • mmu_fetch signature of semu is compatible with riscv_mem_ifetch by removing the vm and value parameter. The I/O interface is embedded inside riscv_t so vm parameter is no longer needed. The fetched value is returned
  • mmu_load signature of semu is compatible with riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b by removing the vm, width, value and reserved parameter. The I/O interface is embedded inside riscv_t so vm param is no longer needed. The width parameter is not necessary since there are width related handlers(riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b). The loaded value is returned. The registration of the 'reservation set' can be done in corresponding RVOP()(some fields might be added to riscv_t, e.g., reservation) so reserved parameter is no longer needed
  • mmu_store is similar to mmu_load

The proposal sounds great. I wonder how mmu_{fetch,load,store} can interoperate with the existing structure. Can you show more about the function prototypes?

@ChinYikMing
Collaborator Author

ChinYikMing commented May 2, 2024

I wonder how mmu_{fetch,load,store} can be interoperated with existing structure. Can you show more about function prototypes?

Sure.

To boot Linux, we have to emulate at least the MMU and peripherals such as the UART and PLIC.

First of all, we should support the MMU so that the kernel can use more of its resource-management techniques, for example memory sharing or copy-on-write (COW), which allow user-space programs to call the fork system call. To support the MMU, we can reuse the riscv_io_t interface for I/O operations. The new function pointers for mmu_{fetch,load,store} might look like this:

typedef struct {
    /* memory read interface */
    riscv_mem_ifetch mem_ifetch;
    riscv_mem_read_w mem_read_w;
    riscv_mem_read_s mem_read_s;
    riscv_mem_read_b mem_read_b;

    /* memory write interface */
    riscv_mem_write_w mem_write_w;
    riscv_mem_write_s mem_write_s;
    riscv_mem_write_b mem_write_b;

    /* TODO: add peripheral I/O interfaces */

+   /* MMU memory helper interface */
+   riscv_mmu_mem_walk mmu_mem_walk;

+   /* MMU memory read interface */
+   riscv_mem_ifetch mmu_mem_ifetch;
+   riscv_mem_read_w mmu_mem_read_w;
+   riscv_mem_read_s mmu_mem_read_s;
+   riscv_mem_read_b mmu_mem_read_b;

+   /* MMU memory write interface */
+   riscv_mem_write_w mmu_mem_write_w;
+   riscv_mem_write_s mmu_mem_write_s;
+   riscv_mem_write_b mmu_mem_write_b;

    /* system */
    riscv_on_ecall on_ecall;
    riscv_on_ebreak on_ebreak;
    riscv_on_memset on_memset;
    riscv_on_memcpy on_memcpy;
} riscv_io_t;

We can decide which function pointer to call during the instruction decoding stage, since the data width is known at that time.

mmu_mem_walk is the helper function that walks the two-level page table (Sv32) for a virtual address and returns the corresponding PTE. Its riscv_mmu_mem_walk interface might look like this:

typedef riscv_word_t *(*riscv_mmu_mem_walk)(riscv_word_t addr);
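
To make the walk concrete, here is a minimal sketch of an Sv32 lookup. Sv32 splits a virtual address into VPN[1], VPN[0] and a 12-bit page offset, so the walk visits at most two levels. Everything prefixed with sketch_ is hypothetical: guest memory is modelled as a flat word array, the root page-table PPN is passed in a struct instead of being read from satp, and the superpage alignment, U/SUM/MXR and A/D checks are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_V (1u << 0) /* valid */
#define PTE_R (1u << 1) /* readable */
#define PTE_W (1u << 2) /* writable */
#define PTE_X (1u << 3) /* executable */

#define PAGE_SHIFT 12
#define SV32_LEVELS 2

/* Hypothetical guest state for the sketch. */
typedef struct {
    uint32_t *mem;      /* guest physical memory as 32-bit words */
    uint32_t mem_words; /* number of words in mem */
    uint32_t root_ppn;  /* PPN of the root page table (satp.PPN) */
} sketch_vm_t;

/* Walk the Sv32 page table for vaddr. Returns a pointer to the leaf PTE
 * and stores its level (1 = 4 MiB superpage, 0 = 4 KiB page) in *level,
 * or returns NULL when the walk hits an invalid or malformed entry. */
static uint32_t *sketch_mmu_walk(sketch_vm_t *vm, uint32_t vaddr, int *level)
{
    uint32_t ppn = vm->root_ppn;

    for (int i = SV32_LEVELS - 1; i >= 0; i--) {
        /* VPN[i] is a 10-bit index into the current page table. */
        uint32_t vpn = (vaddr >> (PAGE_SHIFT + 10 * i)) & 0x3ff;
        uint64_t idx = (((uint64_t) ppn << PAGE_SHIFT) >> 2) + vpn;
        if (idx >= vm->mem_words)
            return NULL;

        uint32_t *pte = &vm->mem[idx];
        if (!(*pte & PTE_V))
            return NULL; /* invalid entry */
        if (!(*pte & PTE_R) && (*pte & PTE_W))
            return NULL; /* W without R is a reserved encoding */
        if (*pte & (PTE_R | PTE_X)) {
            *level = i; /* leaf PTE found */
            return pte;
        }
        ppn = *pte >> 10; /* pointer to the next-level table */
    }
    return NULL; /* last level was still a pointer */
}
```

A caller would turn a NULL result into the page fault that matches the access type and otherwise combine the PPN of the returned PTE with the page offset of the virtual address.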

You might notice that the non-MMU {fetch, read, write} and MMU {fetch, read, write} handlers are duplicated after this change. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could wrap them in a union to save memory. For simplicity, I would rather not use a union at first.

@jserv
Contributor

jserv commented May 2, 2024

You might notice that non-mmu {fetch, read, write} and mmu {fetch, read, write} are duplicated after this changes. To preserve the mnemonic of the function pointers, we might want to separate them although we could use union to wrap them up to save memory. But, for simplicity, I would like not to use union first.

The proposed mmu_mem_{read,write}_[wsb] are confusing since we already have the ones prefixed with mem_. Can you avoid such inconsistency?

@ChinYikMing
Collaborator Author

ChinYikMing commented May 2, 2024

You might notice that non-mmu {fetch, read, write} and mmu {fetch, read, write} are duplicated after this changes. To preserve the mnemonic of the function pointers, we might want to separate them although we could use union to wrap them up to save memory. But, for simplicity, I would like not to use union first.

The proposed mmu_mem_{read,write}_[wsb] are confusing since we already have the ones prefixing with mem_. Can you avoid such inconsistency?

What about removing mem_? If so, the proposal would become:

typedef struct {
    ...

    /* TODO: add peripheral I/O interfaces */

+   /* MMU memory helper interface */
+   riscv_mmu_mem_walk mmu_walk;

+   /* MMU memory read interface */
+   riscv_mem_ifetch mmu_ifetch;
+   riscv_mem_read_w mmu_read_w;
+   riscv_mem_read_s mmu_read_s;
+   riscv_mem_read_b mmu_read_b;

+   /* MMU memory write interface */
+   riscv_mem_write_w mmu_write_w;
+   riscv_mem_write_s mmu_write_s;
+   riscv_mem_write_b mmu_write_b;

    ...
} riscv_io_t;

@jserv
Contributor

jserv commented May 2, 2024

What about remove mem_?

Yes, I anticipate removing the legacy memory callback functions prefixed with mem_ in favor of the newly-added MMU counterparts. Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

@ChinYikMing
Collaborator Author

ChinYikMing commented May 2, 2024

What about remove mem_?

Yes, I anticipate removing the legacy memory callback functions prefixed with mem_ in favor of the newly-added MMU counterparts. Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

Originally, I did not intend to make mmu_{load,store} registrable, but rather to reduce the number of parameters passed to the mmu_{load,store} functions. However, if we want to eliminate the registration of mmu_{load,store} from riscv_io_t, we can declare and define them as static functions with file scope inside "emulate.c", since all instruction implementations are expanded by the RVOP macro. In this way, the function prototypes for mmu_{load,store} and the helper function might look like this:
load:

static riscv_word_t mmu_ifetch(riscv_t *rv, riscv_word_t addr);
static riscv_word_t mmu_read_w(riscv_t *rv, riscv_word_t addr);
static riscv_half_t mmu_read_s(riscv_t *rv, riscv_word_t addr);
static riscv_byte_t mmu_read_b(riscv_t *rv, riscv_word_t addr);

store:

static void mmu_write_w(riscv_t *rv, riscv_word_t addr, riscv_word_t data);
static void mmu_write_s(riscv_t *rv, riscv_word_t addr, riscv_half_t data);
static void mmu_write_b(riscv_t *rv, riscv_word_t addr, riscv_byte_t data);

MMU helper function:

static riscv_word_t *mmu_walk(riscv_t *rv, riscv_word_t addr);

Obviously, we could pass a variable to indicate the data width and reduce the number of MMU-related functions, but I believe that having each function do one thing well is the better approach. It is also more consistent with the existing riscv_io_t callback functions.
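
As an illustration of the one-function-per-width style, here is a minimal sketch over a flat little-endian byte array. The sketch_ names and the sketch_rv_t type are hypothetical, and address translation is deliberately left out so that only the width specialization is visible.

```c
#include <stdint.h>

/* Hypothetical emulator state: just a flat guest memory buffer. */
typedef struct {
    uint8_t *mem;
} sketch_rv_t;

/* Each accessor handles exactly one width, so the compiler can inline it
 * without a runtime switch on the width. */
static inline uint8_t sketch_read_b(const sketch_rv_t *rv, uint32_t addr)
{
    return rv->mem[addr];
}

static inline uint16_t sketch_read_s(const sketch_rv_t *rv, uint32_t addr)
{
    const uint8_t *p = rv->mem + addr;
    return (uint16_t) (p[0] | (p[1] << 8));
}

static inline uint32_t sketch_read_w(const sketch_rv_t *rv, uint32_t addr)
{
    const uint8_t *p = rv->mem + addr;
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static inline void sketch_write_w(sketch_rv_t *rv, uint32_t addr, uint32_t data)
{
    uint8_t *p = rv->mem + addr;
    p[0] = (uint8_t) data;
    p[1] = (uint8_t) (data >> 8);
    p[2] = (uint8_t) (data >> 16);
    p[3] = (uint8_t) (data >> 24);
}
```

A single function taking a width argument would need a branch or switch on every access; the specialized variants trade a few more declarations for straight-line code.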

@ChinYikMing
Collaborator Author

Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

One thing to notice: after commit 8355777, the I/O interface is bound during initialization, so no opportunity is given for user registration. The same applies to the mmu_{load,store} callback functions.

@jserv
Contributor

jserv commented May 3, 2024

Obviously, we can pass a variable to indicate the width of the data and reduce the number of MMU related functions but I believe one function does one thing well might be a better adoption. Also, they might show more consistency upon the existing riscv_io_t callback functions.

Agreed. Prior to the refinement of memory operations, I was thinking of Duff's device to unify these functions with various widths. However, we can benefit from compiler optimizations by using specialized functions which are not exposed and would only be hooked during initialization.

@ChinYikMing
Collaborator Author

Obviously, we can pass a variable to indicate the width of the data and reduce the number of MMU related functions but I believe one function does one thing well might be a better adoption. Also, they might show more consistency upon the existing riscv_io_t callback functions.

However, we can benefit from compiler optimizations by using specialized functions which are not exposed and would be only hooks during initialization.

Yes, declaring the MMU-related functions with the static storage-class specifier and the inline function specifier gives the compiler the chance to inline them, and it keeps them unexposed. Is hooking them at initialization still necessary in this case?

@jserv
Contributor

jserv commented May 4, 2024

Does hooking them at initialization still necessary in this way?

Not necessary. Let's proceed.

@ChinYikMing
Collaborator Author

Since we have both an ISA emulator and a system emulator, there should be a way to turn MMU support on or off. There are two ways to do this:

  1. For every memory access, check whether a variable rv->mmu_on is set. If so, treat the address as a virtual address.
  2. Pre-select the I/O handlers/callbacks and bind them to the riscv_io_t interface during the initialization of the RISC-V instance.

Obviously, option 2 has lower overhead than option 1. If option 2 is used, the existing riscv_io_t interface remains unchanged; only the handlers/callbacks bound to it differ.
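
A minimal sketch of option 2 follows. The types and functions prefixed with sketch_ are hypothetical stand-ins, not the rv32emu API, and the "translation" is reduced to adding a fixed offset so that the effect of selecting a different handler is observable.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct sketch_rv sketch_rv_t;
typedef uint32_t (*sketch_read_w_fn)(sketch_rv_t *rv, uint32_t addr);

/* Hypothetical I/O interface with a single handler for brevity. */
typedef struct {
    sketch_read_w_fn mem_read_w;
} sketch_io_t;

struct sketch_rv {
    sketch_io_t io;
    uint32_t mem[16];
    uint32_t offset; /* stand-in for address translation */
};

/* Handler used when the MMU is not emulated: the address is physical. */
static uint32_t sketch_bare_read_w(sketch_rv_t *rv, uint32_t addr)
{
    return rv->mem[(addr >> 2) & 15];
}

/* Handler used for system emulation: translate first, then read. */
static uint32_t sketch_mmu_read_w(sketch_rv_t *rv, uint32_t addr)
{
    return sketch_bare_read_w(rv, addr + rv->offset);
}

/* Bind the handler once, at initialization, instead of testing a flag on
 * every memory access. */
static void sketch_rv_init(sketch_rv_t *rv, bool system_mode)
{
    rv->io.mem_read_w = system_mode ? sketch_mmu_read_w : sketch_bare_read_w;
}
```

The interface struct keeps the same shape in both configurations; callers always go through rv->io.mem_read_w and never see which handler was chosen.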

@jserv
Contributor

jserv commented May 11, 2024

  2. Pre-select the I/O handlers/callbacks and bind to riscv_io_t interface during the initilization of RISC-V instance.

Obviously, option 2 has lower overhead than option 1. If option 2 is used, the existing riscv_io_t interface remain unchanged but only different handlers/callbacks.

I prefer option 2. When the MMU is not set by Linux during early boot, which set of memory handlers would be used?

@ChinYikMing
Collaborator Author

  2. Pre-select the I/O handlers/callbacks and bind to riscv_io_t interface during the initilization of RISC-V instance.

Obviously, option 2 has lower overhead than option 1. If option 2 is used, the existing riscv_io_t interface remain unchanged but only different handlers/callbacks.

I prefer option 2. When the MMU is not set by Linux during early boot, which set of memory handlers would be used?

According to the Sv32 description in section 4.3 of the RISC-V privileged specification (version 20211203), the MODE field of the satp CSR determines whether the MMU is on or off. During early boot, while the kernel function setup_vm sets up some temporary kernel mappings, MODE should be off (Bare mode). For further detail, refer to the comment in the source code of setup_vm, which states that setup_vm is called in MMU-off mode.

In summary, rv32emu can check MODE and decide whether to translate the address. In particular, when MODE is off, we can simply disable translation and read and write data directly at the given address using the basic I/O functions defined in io.[ch].
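
The MODE check can be sketched as follows. On RV32, satp holds MODE in bit 31 (0 = Bare, 1 = Sv32), the ASID in bits 30:22, and the root page-table PPN in bits 21:0. The helper names are hypothetical, and the sketch ignores the current privilege mode, which a real implementation would also have to consider.

```c
#include <stdbool.h>
#include <stdint.h>

#define SATP_MODE_MASK 0x80000000u /* bit 31: 0 = Bare, 1 = Sv32 */
#define SATP_PPN_MASK 0x003fffffu  /* bits 21:0: root page-table PPN */

/* True when satp selects Sv32 translation rather than Bare mode. */
static inline bool sketch_satp_sv32_enabled(uint32_t satp)
{
    return (satp & SATP_MODE_MASK) != 0;
}

/* Physical address of the root page table; it can exceed 32 bits because
 * Sv32 physical addresses are 34 bits wide. */
static inline uint64_t sketch_satp_root_pa(uint32_t satp)
{
    return (uint64_t) (satp & SATP_PPN_MASK) << 12;
}
```

With satp equal to zero, as during early boot, sketch_satp_sv32_enabled returns false and the access would go straight to the basic I/O functions.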

ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue May 12, 2024
The purpose of this commit is to boot 32-bit RISC-V Linux in the future.
The virtual memory scheme to support is Sv32. There is one change to the
original code base to adapt to the MMU:
   The prototype of the riscv_io_t interface needs to be changed.
   Particularly, add a RISC-V instance (riscv_t) as the first parameter.
   MMU-related callbacks need to access the satp CSR to perform a
   page table walk during virtual memory translation, but the satp CSR
   is stored in the RISC-V instance (riscv_t), so there must be a way
   to access it. The trivial solution is adding the RISC-V
   instance (riscv_t) to the prototype of the riscv_io_t interface.
After this change, we can reuse riscv_io_t for system emulation
afterward.

The rest of the changes implement the Sv32 virtual memory scheme. Every
memory access has to walk the page table to get the corresponding PTE.
Depending on the retrieved PTE, several page faults may have to be
handled, so three exception handlers have been introduced:
insn_pgfault, load_pgfault, and store_pgfault. They are used in
MMU_CHECK_FAULT. In this commit, access faults are not handled well
since they are related to PMA and PMP and might not be required to boot
32-bit RISC-V Linux (tested on semu). Some S-mode CSRs are added to
riscv_internal to support S-mode. PTE, S-mode, and M-mode CSR helper
macros are introduced as well.

Related: sysprog21#310
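
The mapping from a retrieved PTE to one of the three page faults can be
sketched as below. The exception codes 12, 13, and 15 are the RISC-V
instruction, load, and store/AMO page-fault causes; the sketch_ names are
hypothetical and the check ignores the U, A, and D bits as well as
SUM/MXR, so it is not the MMU_CHECK_FAULT logic of the commit.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)

typedef enum {
    SKETCH_ACCESS_FETCH,
    SKETCH_ACCESS_LOAD,
    SKETCH_ACCESS_STORE,
} sketch_access_t;

typedef enum {
    SKETCH_FAULT_NONE = 0,
    SKETCH_FAULT_INSN_PAGE = 12,  /* instruction page fault */
    SKETCH_FAULT_LOAD_PAGE = 13,  /* load page fault */
    SKETCH_FAULT_STORE_PAGE = 15, /* store/AMO page fault */
} sketch_fault_t;

/* Decide which page fault, if any, an access raises. pte is the result of
 * the page-table walk and is NULL when the walk itself failed. */
static sketch_fault_t sketch_check_fault(const uint32_t *pte,
                                         sketch_access_t access)
{
    static const uint32_t need[] = {
        [SKETCH_ACCESS_FETCH] = PTE_X,
        [SKETCH_ACCESS_LOAD] = PTE_R,
        [SKETCH_ACCESS_STORE] = PTE_W,
    };
    static const sketch_fault_t fault[] = {
        [SKETCH_ACCESS_FETCH] = SKETCH_FAULT_INSN_PAGE,
        [SKETCH_ACCESS_LOAD] = SKETCH_FAULT_LOAD_PAGE,
        [SKETCH_ACCESS_STORE] = SKETCH_FAULT_STORE_PAGE,
    };

    /* Missing PTE, invalid PTE, or missing permission bit: raise the
     * fault that corresponds to the access type. */
    if (!pte || !(*pte & PTE_V) || !(*pte & need[access]))
        return fault[access];
    return SKETCH_FAULT_NONE;
}
```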
ChinYikMing added a commit to ChinYikMing/rv32emu that referenced this issue May 13, 2024
SBI acts as a communication layer between S-mode software and M-mode
hardware. To boot the Linux kernel, a minimal set of SBI extensions has
to be implemented:
1. Base extension (EID=0x10)
2. Timer extension (EID=0x54494D45)

The SRST extension (EID=0x53525354) is optional, so only the shutdown
reason is implemented.

Related: sysprog21#310
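
How an ecall could be routed to these extensions is sketched below. In the
SBI calling convention the extension ID arrives in a7 and the function ID
in a6, and each call returns an error/value pair. The sketch_ types and
the handler are hypothetical, and only one function per extension is
shown (probe_extension, set_timer, system_reset).

```c
#include <stdint.h>

#define SBI_EID_BASE 0x10
#define SBI_EID_TIMER 0x54494D45
#define SBI_EID_SRST 0x53525354

#define SBI_SUCCESS 0
#define SBI_ERR_NOT_SUPPORTED (-2)

/* Error/value pair returned to the guest in a0/a1. */
typedef struct {
    int32_t error;
    uint32_t value;
} sketch_sbiret_t;

/* Hypothetical hart state touched by the handlers. */
typedef struct {
    uint64_t timer_deadline;
    int halted;
} sketch_hart_t;

static sketch_sbiret_t sketch_sbi_ecall(sketch_hart_t *hart,
                                        uint32_t eid, /* a7 */
                                        uint32_t fid, /* a6 */
                                        uint32_t a0,
                                        uint32_t a1)
{
    switch (eid) {
    case SBI_EID_BASE:
        if (fid == 3) { /* probe_extension: is extension a0 available? */
            uint32_t found = a0 == SBI_EID_BASE || a0 == SBI_EID_TIMER ||
                             a0 == SBI_EID_SRST;
            return (sketch_sbiret_t){SBI_SUCCESS, found};
        }
        break;
    case SBI_EID_TIMER:
        if (fid == 0) { /* set_timer: 64-bit deadline split over a0/a1 */
            hart->timer_deadline = ((uint64_t) a1 << 32) | a0;
            return (sketch_sbiret_t){SBI_SUCCESS, 0};
        }
        break;
    case SBI_EID_SRST:
        if (fid == 0) { /* system_reset: the sketch simply halts */
            hart->halted = 1;
            return (sketch_sbiret_t){SBI_SUCCESS, 0};
        }
        break;
    }
    return (sketch_sbiret_t){SBI_ERR_NOT_SUPPORTED, 0};
}
```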