Latency with MappedMemory #581
Comments
Hi @slosar, thanks for asking the question. Renode simulations operate on a fully controlled virtual time flow, which lets us obtain reproducible results and recreate scenarios where the timing of events matters from the perspective of the simulated application. That said, at the CPU level we use a simplified model which assumes that, on average, the execution of each instruction takes the same amount of time. The performance of the CPU itself (the number of executed instructions per virtual second) can be controlled with the `PerformanceInMips` property. For simulation performance reasons, accesses to `MappedMemory` are handled directly by the translation layer and do not introduce any additional latency. To assess the performance of your system while taking memory accesses into account, you might use post-mortem analysis of execution traces. Please take a look at https://antmicro.com/blog/2023/07/risc-v-co-design-using-trace-based-simulation-with-renode-and-tbm/ and https://antmicro.com/blog/2022/09/execution-tracing-in-renode/. Of course, extending the current CPU model with the notion of long-running instructions is technically possible, but it would require some design and implementation work.
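The post-mortem approach mentioned above can be sketched as follows. This is not Renode code: the trace format, region bounds, and latency numbers below are all invented for illustration; a real analysis would parse Renode's actual execution-trace output.

```python
# Hypothetical post-mortem sketch: replay a trace and charge extra cycles
# to accesses that fall into an assumed "slow" DDR window.
SLOW_REGION = range(0x8000_0000, 0x9000_0000)  # assumed DDR address window
EXTRA_READ_CYCLES = 12                         # assumed DDR read latency

# Each entry: (kind, address) where kind is "exec", "read" or "write".
trace = [
    ("exec", 0x0800_0000),
    ("read", 0x8000_1000),   # slow DDR read
    ("exec", 0x0800_0004),
    ("read", 0x2000_0000),   # fast SRAM read
]

base_cycles = len(trace)  # simplistic baseline: one cycle per traced event
extra = sum(EXTRA_READ_CYCLES
            for kind, addr in trace
            if kind == "read" and addr in SLOW_REGION)

print(base_cycles, extra, base_cycles + extra)  # -> 4 12 16
```

The point is that latency modelling can be layered on top of a deterministic trace without slowing down the simulation itself.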
@mateusz-holenko - that's interesting. In an attempt to understand the assumption behind the average execution time, I simulated an STM32F4 and ran 1M loops of 100 `nop` instructions. It turns out that the HW board performs only marginally better than the Renode simulation (613.1 ms on HW vs 625 ms in Renode). This seems to indicate that the average execution time Renode assumes for a Cortex-M4 is only very slightly (~2%) higher than one cycle per instruction. Is that an expected result? Can I tweak that number somewhere?
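The arithmetic behind these numbers can be checked quickly. Note the 168 MHz clock below is an assumption (a typical STM32F4 frequency, not stated in the comment) and the loop-branch overhead is ignored:

```python
# Back-of-the-envelope check of the ~2% figure.
instructions = 1_000_000 * 100            # 1M loops of 100 nops

renode_mips = instructions / 0.625 / 1e6  # 625 ms in Renode
hw_cpi = 168e6 * 0.6131 / instructions    # 613.1 ms on HW, assumed 168 MHz

print(renode_mips)        # 160.0 -> effective MIPS Renode used here
print(round(hw_cpi, 3))   # ~1.03 cycles per instruction on hardware
```

So under these assumptions the run corresponds to an effective simulated rate of about 160 MIPS, which is the kind of number the `PerformanceInMips` property mentioned above controls.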
A somewhat related question: would it be possible to implement something like that as a Python peripheral? Can a Python peripheral force "wait" cycles on the CPU? In practice, no modern system takes constant time to fetch the contents of a memory location, but the effects of caches etc. should still be quite straightforward to simulate deterministically.
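To illustrate the "deterministic cache effects" point: since a cache with fixed parameters is a pure function of the access sequence, its latency contribution is fully reproducible. A minimal sketch (plain Python, not a Renode peripheral; all parameters are assumed):

```python
# Deterministic direct-mapped cache latency model (illustrative only).
LINE = 32          # line size in bytes (assumed)
LINES = 256        # number of cache lines (assumed)
HIT, MISS = 1, 20  # latencies in cycles (assumed)

tags = [None] * LINES

def access(addr: int) -> int:
    """Return the deterministic latency of one read at `addr`."""
    line = addr // LINE
    idx = line % LINES
    if tags[idx] == line:
        return HIT
    tags[idx] = line       # fill the line on a miss
    return MISS

seq = [0x1000, 0x1004, 0x1020, 0x1000]
lat = [access(a) for a in seq]
print(lat)  # [20, 1, 20, 1] -- miss, hit in same line, miss, hit again
```

Given the same access sequence, this always yields the same latencies, so it would fit Renode's reproducible virtual-time model.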
I think any NOP takes exactly one cycle to execute. What you are seeing is almost certainly time "quantization". Have you tried running 10M loops instead of 1M and seeing whether the discrepancy changes?
There are means of pushing virtual time forward without executing any instructions in Renode, and we sometimes use them to improve the performance of simulating busy-waiting sleep implementations.
This adds a Python hook that is executed each time the hooked location is reached. You can apply the same approach in your scenario.
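The original snippet was not preserved in this thread; a rough sketch of attaching such a hook from the Renode monitor might look like the following (the symbol name is hypothetical, and the exact way of advancing virtual time from the hook body depends on your setup and Renode version):

```
# hypothetical sketch -- attach a Python hook at the address of an
# assumed busy-wait routine; the hook body runs on every hit
sysbus.cpu AddHook `sysbus GetSymbolAddress "busy_wait"` "print('busy_wait reached')"
```

Check the Renode documentation for the hook APIs available in your version before relying on this shape.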
Another way of delaying events happening in a peripheral is to schedule them on the machine's virtual-time clock. Again, please note that this is currently not directly applicable to executable memory accesses. Support for this scenario is also technically possible, but it would require engineering work on the CPU model side.
Description
We are simulating a system that has some internal on-fabric memory and a larger external DDR memory. The external memory is "slower", i.e. there is latency associated with reads (writes might be pipelined). Can this be at least approximately simulated by inserting wait states into `MappedMemory`?
Usage example
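The original example did not survive in this copy of the thread; a hypothetical platform-description (`.repl`) fragment for the proposed feature might look like this. Note that `readLatency` and `writeLatency` do not exist in current Renode and are purely illustrative of the requested API:

```
ddr: Memory.MappedMemory @ sysbus 0x80000000
    size: 0x10000000
    readLatency: 12     // hypothetical parameter
    writeLatency: 2     // hypothetical parameter
```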
where latencies are expressed in clock cycles.
Additional information
I'm surprised this is not implemented yet, but a look at the code seems to show that indeed it is not; perhaps I should be using a different peripheral for this.
Do you plan to address this issue and file a PR?
Perhaps, if the issue becomes burning enough.