Investigate deterministic QEMU behavior #307

Open
rafalcieslak opened this issue May 13, 2017 · 7 comments

@rafalcieslak
Contributor

rafalcieslak commented May 13, 2017

I've stumbled upon:

http://wiki.qemu.org/index.php/Features/record-replay

Summary:

  • Deterministically replays whole system execution and all contents of
    the memory, state of the hardware devices, clocks, and screen of the VM.
  • Writes execution log into the file for later replaying for multiple times
    on different machines.
  • Performs deterministic replay of all operations with keyboard and mouse input devices.

This feature is very interesting for us! Chances are it could help us drop OVPsim entirely. It does need a closer look first. Some questions that need answering:

  1. Does it support MIPS? Some sources say it does, some say it does not.
  2. How well does it work with GDB? Do we get cycle-exact behavior on each replay?
  3. How large are the replay files in practice? How can we download such a replay file from Travis to investigate a test failure? Is it okay to replay with a different qemu version than the one used for recording?
  4. What would be a convenient way of integrating replays with our workflow?
  5. Is record-replay everything we need for testing and debugging, or do we use OVPsim's determinism in some other ways this feature would not provide?

Additional documentation can be found here:

https://github.com/qemu/qemu/blob/master/docs/replay.txt

@cahirwpz: Suppose the answer to all questions above is "Perfect for our needs". Would you then consider dropping support for OVPsim as a viable option? If so, then this task is probably of a high priority.

EDIT: Also:

6. Do we even need recording? Using -icount shift=N might be enough to enable deterministic behavior.
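
For reference, here is a minimal sketch of how the `-icount` argument could be assembled for the three cases discussed above: plain deterministic execution, recording, and replaying. The `rr=` and `rrfile=` suboptions are the ones described in QEMU's record/replay documentation. The function name, the default file name, and the base command in the usage note are assumptions made for illustration only.

```python
from typing import List, Optional


def qemu_icount_args(shift: int = 7,
                     mode: Optional[str] = None,
                     rrfile: str = "replay.bin",
                     sleep_off: bool = False) -> List[str]:
    """Build the -icount argument pair for a QEMU command line.

    mode=None      -> icount only (question 6: no recording at all)
    mode="record"  -> write an execution log to rrfile
    mode="replay"  -> replay the execution log stored in rrfile
    """
    if mode not in (None, "record", "replay"):
        raise ValueError("mode must be None, 'record' or 'replay'")
    opts = [f"shift={shift}"]
    if sleep_off:
        # Combining sleep=off with rr= has not been checked here.
        opts.append("sleep=off")
    if mode is not None:
        opts += [f"rr={mode}", f"rrfile={rrfile}"]
    return ["-icount", ",".join(opts)]
```

A full invocation would prepend the emulator binary and the usual machine options, e.g. `["qemu-system-mipsel", "-kernel", "mimiker.elf"] + qemu_icount_args(mode="record")`, where the binary and kernel file names are placeholders rather than the project's actual launch command.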

@rafalcieslak
Contributor Author

  3. How large are the replay files in practice? How can we download such a replay file from Travis to investigate a test failure? Is it okay to replay with a different qemu version than the one used for recording?

Well, the replay files aren't small. Running test=all with -icount shift=7 produces a 14MB file; it then grows by about 5MB/second. Running test=all repeat=5 produces a 40MB file.
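
As a rough back-of-the-envelope sketch, the two measurements above can be turned into a linear size estimate. This assumes that the plain test=all run corresponds to repeat=1 and that the file grows linearly with the repeat count; neither assumption has been verified, and two data points are not enough to confirm a trend.

```python
from typing import Tuple


def fit_replay_size(p1: Tuple[float, float],
                    p2: Tuple[float, float]) -> Tuple[float, float]:
    """Fit size = fixed + per_repeat * repeat through two (repeat, MB) points."""
    (r1, s1), (r2, s2) = p1, p2
    per_repeat = (s2 - s1) / (r2 - r1)
    fixed = s1 - per_repeat * r1
    return fixed, per_repeat


# Measurements from the comment above: 14MB (assumed repeat=1), 40MB at repeat=5.
fixed_mb, per_repeat_mb = fit_replay_size((1, 14.0), (5, 40.0))


def estimate_replay_mb(repeat: int) -> float:
    """Extrapolate the replay file size in MB for a given repeat count."""
    return fixed_mb + per_repeat_mb * repeat
```

Under these assumptions each extra repetition costs about 6.5MB on top of a fixed 7.5MB, so repeat=10 would land somewhere around 72.5MB.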

The replays are qemu version sensitive, so if Travis provided us with a recording of a test failure, we would need to debug it using the exact same qemu version as the one installed on Travis. That's not a big problem though: we can install multiple qemu versions locally, or deploy a particular new qemu version onto Travis just as we do with the toolchain.

The problem might be with getting these files back from Travis. 10MB is far too much to push via raw output. Travis supports uploading result files to S3, but we would need to pay for AWS storage. Maybe it would be possible to have Travis upload results to the mimiker server, but we'd need to figure out a way of doing it securely so that nobody else can push junk onto the server.

@rafalcieslak
Contributor Author

  1. Does it support MIPS? Some sources say it does, some say it does not.

It seems to.... Kinda. With -icount shift=7,sleep=off I seem to receive timer interrupts at the exact same instruction every time. I'll need to test this in more detail (e.g. prepare a non-deterministically failing test and see whether the ktest seed is enough to reproduce it 100%), but initial observations make me very hopeful!
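
One way to test this in more detail would be to capture a trace from two runs and look for the first point where they differ. The sketch below is only the comparison step. It assumes each run has been dumped to a text log with one event per line (for example via QEMU's `-d int -D <file>` logging) and that the log carries enough detail to tell two runs apart. The function name is made up for illustration.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_divergence(trace_a: Iterable[str],
                     trace_b: Iterable[str]
                     ) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (line_number, line_a, line_b) for the first differing line.

    Returns None when both traces are identical. If one trace is shorter,
    the missing side is reported as None.
    """
    pairs = zip_longest(trace_a, trace_b)
    for lineno, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return lineno, a, b
    return None
```

With two log files this could be called as `first_divergence(open("run1.log"), open("run2.log"))`; a result of `None` across several runs would support the observation that interrupts arrive at the same point every time.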

There is a problem with replaying though. It seems to be supported, but something's off and the kernel gets stuck in initrd_build_tree_and_names, looping forever. If that's a bug on our side, then it should also emerge during a recording. But maybe qemu provides the initrd somehow differently for a replay run?

@cahirwpz
Owner

That's an awesome finding you've made!

@cahirwpz: Suppose the answer to all questions above is "Perfect for our needs". Would you then consider dropping support for OVPsim as a viable option? If so, then this task is probably of a high priority.

Knowing how deeply device emulation is broken in OVPsim - more than happily ;-)

@cahirwpz
Owner

For the record, the decision to drop OVPsim will automatically render issues #293 and #286 obsolete.

@rafalcieslak
Contributor Author

While debugging #328 I ran into a lot of random synchronization bugs that were unreproducible. Out of curiosity, I added -icount shift=7 to the QEMU options, and now each test seed causes the kernel to crash in an identical manner! Running ./launch with the -d option allows me to immediately debug a repro case just by passing the seed.
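
The "same seed, same crash" property can be checked mechanically. Below is a minimal sketch, assuming a hypothetical `run_once(seed)` callable that launches the kernel under QEMU with the given seed and returns some hashable summary of the outcome (for instance the exit status plus the last line of output). How `run_once` invokes ./launch is deliberately left out, since the exact arguments are not spelled out in this thread.

```python
from typing import Callable, Hashable


def is_reproducible(run_once: Callable[[int], Hashable],
                    seed: int,
                    attempts: int = 3) -> bool:
    """Run the same seed several times and report whether every outcome matched."""
    if attempts < 2:
        raise ValueError("need at least two attempts to compare outcomes")
    # Collapse all outcomes into a set: one element means every run agreed.
    outcomes = {run_once(seed) for _ in range(attempts)}
    return len(outcomes) == 1
```

A seed for which this returns True with icount enabled is a good candidate for a debugging session, because the failure should show up again on the next run.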

I'd say we can safely enable -icount shift=7 in master (even if we don't want to support recording/playback ATM), and see if/how it helps us. It has no noticeable performance penalty, and there is a very high chance it will help us reproduce Travis problems. Then, after some time, we'll examine whether this setup is right for us and how to proceed with OVPsim. What do you think?

@cahirwpz
Owner

I'm in favour of enabling the flag in master. Please do so!

@rafalcieslak
Contributor Author

There is one detail that worries me, though. Making QEMU deterministic makes it 1000x less likely to trigger synchronization bugs. See #328: as soon as I enabled icount in a recent commit, the Travis build stopped failing!

This is related to why we had such a hard time finding bugs on OVPsim: it's because of its cycle-exact nature.

I don't know whether the reduced chance of sync problems is a bad thing (it will cause us to detect fewer problems), or actually a benefit (QEMU's default chaotic execution is unrepresentative of hardware, and it triggers buggy scenarios that would never happen on a real machine). I don't want to diminish Travis' capability of bug-finding by enabling icount, but (maybe?) it actually filters out stuff we shouldn't pay attention to?

Another observation I have is that Travis uses QEMU 2.0. When testing #328 locally with the same version of QEMU I observed no problems. However, after switching to a recent QEMU 2.8.1, the run_tests.py script was able to find (100% reproducible!) problems. Therefore, if we decide to enable icount, we'll need to find a way to get a newer QEMU on Travis.
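
Since behavior differs between QEMU 2.0 and 2.8.1, a test script could refuse to run (or at least warn) when the installed QEMU is too old. Here is a minimal sketch of the version parsing, assuming the banner printed by `qemu-system-* --version` contains the text `version X.Y.Z`. The function name and the sample banners in the usage note are illustrative and were not copied from a real Travis log.

```python
import re
from typing import Tuple


def parse_qemu_version(banner: str) -> Tuple[int, ...]:
    """Extract the numeric version tuple from a QEMU --version banner."""
    match = re.search(r"version (\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError(f"no version number found in: {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(banner: str, minimum: Tuple[int, ...]) -> bool:
    """Check whether the QEMU described by banner meets a minimum version."""
    return parse_qemu_version(banner) >= minimum
```

For example, `is_at_least("QEMU emulator version 2.0.0", (2, 8, 1))` is False, while a 2.8.1 banner passes. The same parser could be used to verify that a replay file is being opened with exactly the QEMU version that recorded it.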

@cahirwpz cahirwpz added this to the Summer 2017 milestone May 31, 2017
@cahirwpz cahirwpz removed this from the Summer 2017 milestone Dec 20, 2018