The checkpoint feature may not have successfully saved and restored some data. #802

Open
despotx opened this issue Jan 24, 2024 · 4 comments

@despotx

despotx commented Jan 24, 2024

Describe the bug
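Checkpoints do not work correctly: after restoring from a checkpoint, a Keystone enclave program that runs successfully when gem5 boots from tick 0 fails with repeated PMP access faults.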

Affects version
Commit bae3487 (HEAD -> stable, tag: v23.1.0.0, origin/stable, origin/HEAD)

gem5 Modifications
None

To Reproduce

  • Gem5 build
git clone https://github.com/gem5/gem5
sudo apt install build-essential git m4 scons zlib1g zlib1g-dev \
    libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev \
    python3-dev libboost-all-dev pkg-config python3-tk
# Modify the configuration to make PROTOCOL="MESI_Three_Level" take effect.
scons build/RISCV/gem5.{variant} -j {cpus}
  • Keystone deployment
sudo apt install autoconf automake autotools-dev bc \
bison build-essential curl expat jq libexpat1-dev flex gawk gcc git \
gperf libgmp-dev libmpc-dev libmpfr-dev libtool texinfo tmux \
patchutils zlib1g-dev wget bzip2 patch vim-common lbzip2 python3 \
pkg-config libglib2.0-dev libpixman-1-dev libssl-dev screen \
device-tree-compiler expect makeself unzip cpio rsync cmake ninja-build p7zip-full
git clone --recurse-submodules https://github.com/keystone-enclave/keystone.git
make buildroot-configure   # Modify the bootloader configuration, check the option to include the kernel as the payload.
make -j`nproc`
  • Directory structure
.
├── checkpoints
├── gem5
├── keystone
├── m5out
├── riscv-fs.py
  • Configuration script
import m5
from m5.objects import Root

from gem5.components.boards.riscv_board import RiscvBoard
from gem5.components.memory import DualChannelDDR4_2400
from gem5.components.processors.cpu_types import CPUTypes
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.isas import ISA
from gem5.resources.resource import (
    BootloaderResource,
    CheckpointResource,
    DiskImageResource,
    KernelResource,
)
from gem5.simulate.simulator import Simulator
from gem5.utils.requires import requires

requires(isa_required=ISA.RISCV)

from gem5.components.cachehierarchies.ruby.mesi_three_level_cache_hierarchy import (
    MESIThreeLevelCacheHierarchy,
)

cache_hierarchy = MESIThreeLevelCacheHierarchy(
    l1d_size="32kB",
    l1d_assoc=8,
    l1i_size="32kB", 
    l1i_assoc=8,
    l2_size="256kB",
    l2_assoc=4,
    l3_size="16MB",
    l3_assoc=16,
    num_l3_banks=1
)

# Memory: Dual Channel DDR4 2400 DRAM device.

memory = DualChannelDDR4_2400(size="3GB")

# Here we setup the processor. We use a simple processor.
processor = SimpleProcessor(
    cpu_type=CPUTypes.ATOMIC, isa=ISA.RISCV, num_cores=2
)

# Here we setup the board. The RiscvBoard allows for Full-System RISCV
# simulations.
board = RiscvBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)



# Set the Full System workload.
# Use the object file on your machine to specify the arg local_path
board.set_kernel_disk_workload(
    kernel=KernelResource(local_path="./keystone/build-generic64/buildroot.build/images/fw_payload.elf"),
    disk_image=DiskImageResource(local_path="./keystone/build-generic64/buildroot.build/images/rootfs.ext2"),
    # checkpoint=CheckpointResource(local_path="./checkpoints/last_end"), # not at first boot
)

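simulator = Simulator(
    board=board,
    full_system=True,
)
simulator.run()

# Reached once the run is interrupted (Ctrl + C in the Run step below);
# writes the checkpoint that a later run restores from.
simulator.save_checkpoint("./checkpoints/last_end")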
  • Run
./gem5/build/RISCV/gem5.fast ./riscv-fs.py 
m5term localhost 3456         # Run this in a new terminal. Wait until the login prompt appears, then log in with username 'root' and password 'sifive'.
modprobe keystone-driver 
/usr/share/keystone/examples/hello.ke # Executes successfully when gem5 is started from tick 0.
# Press Ctrl + C to terminate gem5 and wait for the checkpoint to be written.
# Restart gem5, restoring from the last checkpoint.
/usr/share/keystone/examples/hello.ke # Run the command again; this time it fails to execute.
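
The restore step above amounts to re-enabling the commented-out `checkpoint=` argument in the configuration script. The following is a minimal sketch of making that conditional, so the same script serves both the first boot and later restores. It assumes that a gem5 checkpoint is a directory containing a file named `m5.cpt`. The helper name `find_checkpoint` is hypothetical and not part of gem5.

```python
from pathlib import Path
from typing import Optional


def find_checkpoint(path: str) -> Optional[Path]:
    """Return the checkpoint directory if it looks usable, else None.

    Assumption: a gem5 checkpoint is a directory holding an `m5.cpt` file.
    """
    cpt_dir = Path(path)
    if (cpt_dir / "m5.cpt").is_file():
        return cpt_dir
    return None


# Hypothetical use inside riscv-fs.py (needs gem5, so shown as comments):
#
#   workload = dict(kernel=..., disk_image=...)
#   cpt = find_checkpoint("./checkpoints/last_end")
#   if cpt is not None:
#       workload["checkpoint"] = CheckpointResource(local_path=str(cpt))
#   board.set_kernel_disk_workload(**workload)
```

With this in place, the first run boots from tick 0 because no checkpoint exists yet, and later runs pick up `./checkpoints/last_end` without editing the script.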

Terminal Output

src/arch/riscv/pmp.cc:136: warn: pmp access fault.
src/arch/riscv/pmp.cc:136: warn: pmp access fault.
src/arch/riscv/pmp.cc:136: warn: pmp access fault.

Expected behavior
After restoring from a checkpoint, executing the secure program should not result in a loop of PMP access faults. The checkpoint may be missing some crucial data.

Host Operating System
Ubuntu 22.04

Host ISA
X86


@despotx despotx added the bug label Jan 24, 2024
@powerjg
Contributor

powerjg commented Jan 25, 2024

Sounds like PMP register state isn't saved at the checkpoint. Thanks for the detailed bug report!

Feel free to contribute a change to fix this! I can't give an estimate for when someone on my team will be able to take a look, but it will be at least a week or more.

@despotx
Author

despotx commented Jan 27, 2024

I have added serialization and deserialization code for PMP. After restoring from a checkpoint, I did not encounter PMP access errors during the subsequent execution of the secure program. However, a new issue emerged, and here is the terminal output.
[Image: screenshot of the terminal output]
At this point, the TTY has become unresponsive, and saving a checkpoint and restoring from it still leaves the terminal unable to proceed. How can I locate and resolve this issue? More specifically, how can I get an overview of what data is stored in the checkpoint, and which data can be omitted to reduce overhead?

@powerjg
Contributor

powerjg commented Mar 8, 2024

You can look at the .cpt file to see all of the data that's stored.
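
As a rough way to get that overview, the sketch below lists each section of a checkpoint file with the number of keys stored in it. It assumes the `.cpt` file is INI-style text with one `[section]` per serialized object, which Python's standard `configparser` can read. The function name `summarize_checkpoint` is made up for illustration.

```python
import configparser
from typing import Dict


def summarize_checkpoint(cpt_path: str) -> Dict[str, int]:
    """Map each section name in the checkpoint file to its number of keys.

    Assumption: the file is INI-style, one [section] per serialized object.
    """
    parser = configparser.ConfigParser(
        interpolation=None,  # values may contain '%'
        strict=False,        # tolerate repeated sections or keys
        delimiters=("=",),   # ':' may appear inside values
    )
    parser.optionxform = str  # keep key names case-sensitive
    with open(cpt_path, "r") as handle:
        parser.read_file(handle)
    return {name: len(parser.options(name)) for name in parser.sections()}


# Example use (the path is an assumption based on the script above):
#
#   summary = summarize_checkpoint("./checkpoints/last_end/m5.cpt")
#   for section, n_keys in sorted(summary.items()):
#       print(f"{n_keys:6d}  {section}")
```

Running the same summary on a checkpoint taken before and after adding PMP serialization would show whether a new section or new keys actually appear.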

To debug this, here's what I'd do:

  • Enable the Exec debug flag for the next few thousand instructions after you take the checkpoint, then do the same after restoring from the checkpoint. Compare the two traces and look for anything that seems "off".
  • My first guess is that there's something missing in the takeOverFrom function somewhere in the RISC-V ISA.
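
The trace comparison suggested above can be partly automated. The sketch below reports the first line at which two trace files differ. It assumes each trace was written to a file (for example with gem5's `--debug-flags=Exec` together with `--debug-start` and `--debug-file`) and that each line starts with a `<tick>:` field, which is ignored during comparison. The function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def strip_tick(line: str) -> str:
    """Drop a leading '<tick>:' field so tick values are not compared."""
    head, sep, rest = line.partition(":")
    if sep and head.strip().isdigit():
        return rest.strip()
    return line.strip()


def first_divergence(
    reference: Iterable[str], restored: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (1-based line number, reference line, restored line) at the
    first mismatch, or None if the traces agree. A trace that ends early
    is reported with None in place of its missing line."""
    for number, (ref, res) in enumerate(zip_longest(reference, restored), 1):
        if ref is None or res is None or strip_tick(ref) != strip_tick(res):
            return number, ref, res
    return None


# Example use (file names are placeholders):
#
#   with open("trace_original.txt") as a, open("trace_restored.txt") as b:
#       print(first_divergence(a, b))
```

Ignoring the tick field is a design choice that keeps the comparison focused on the instruction stream. If the ticks themselves are expected to match, comparing raw lines instead is a one-line change.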

@despotx
Author

despotx commented Mar 9, 2024

I think resolving this issue might be a major undertaking. I found the following in the checkpoints section of the gem5 documentation, shown in the figure below.

[Image: excerpt from the checkpoints section of the gem5 documentation]

Unfortunately, my configuration uses the Ruby memory model with the MESI protocol. I don't quite understand the relationship between checkpoints and cache protocols.

I see that the serialization and deserialization methods are defined in the base class Serializable, and that SimObject is a subclass of Serializable. Almost all components (including storage components, temporary ones, and those that are commit targets) inherit from SimObject. Not all storage components implement the serialization and deserialization methods, right? So why would different cache protocols affect checkpoints? These questions might seem silly, but I am a beginner with gem5, and I am already frazzled by the extremely slow simulation speed.
