Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Toplevel ref cells interfere with static promotion #1956

Open
ayakayorihiro opened this issue Mar 4, 2024 · 4 comments
Open

Toplevel ref cells interfere with static promotion #1956

ayakayorihiro opened this issue Mar 4, 2024 · 4 comments
Labels
C: Calyx Extension or change to the Calyx IL Type: Bug Bug in the implementation

Comments

@ayakayorihiro
Copy link
Contributor

ayakayorihiro commented Mar 4, 2024

Branching off of #1603 - one blocker for deprecating @external and adopting ref's synthesis mode everywhere is that currently the version of Verilog generated by synthesis mode (externalizing all memories) incurs a performance cost.

Below is a comparison between (1) Calyx's current simulation mode (emitting readmemh/writememh - Calyx) and (2) Using ref to externalize memories (CalyxRef). Time [ms] is the average Verilator simulation time in milliseconds, and Cycles is the number of cycles taken by the simulation. The cycle counts for CalyxRef are about 1.5-2x more than those for Calyx, and that affects the simulation time as well.
image

This performance drop is most likely because of additional FSMs generated by the Calyx compiler to keep track of memory updates.

"Reproduction"/Trying out CalyxRef

You can run Calyx programs in CalyxRef by using fud2 from the fud2-refmem branch:

cd <CALYX_REPO>
git pull
git checkout fud2-refmem
cargo build
( cd fud2 ; cargo build)

Once you have fud2 set up, you can run CalyxRef by adding --through calyx-to-verilog-refmem to your fud2 simulation command:
ex)

fud2 examples/tutorial/language-tutorial-iterate.futil --to dat --through calyx-to-verilog-refmem -s sim.data=`pwd`/examples/tutorial/data.json

should give you

{
  "cycles": 76,
  "memories": {
    "mem": [
      42
    ]
  }
}

whereas running Calyx without externalized memories (Calyx):

fud2 examples/tutorial/language-tutorial-iterate.futil --to dat --through calyx-to-verilog -s sim.data=`pwd`/examples/tutorial/data.json

gives you

{
  "cycles": 37,
  "memories": {
    "mem": [
      42
    ]
  }
}

.

@rachitnigam
Copy link
Contributor

Thanks for the explanation! I'm not sure if there is anything to do about this. The readmemh and writememh functions hide the latency to read and write IO memories and ref is just making it explicit. If you want to report only the compute part, you can subtract out the amount of time to read and write things.

Anyways, this goes to show that real accelerators need to optimize more than just compute; memory movement often dominates!

@sampsyo
Copy link
Contributor

sampsyo commented Mar 9, 2024

Indeed; thanks, @ayakayorihiro, for the detailed reproduction instructions! I could even reproduce the effect for the simpler language-tutorial-compute.futil: 3 cycles vs. 5 cycles. I admit I got a little nerdsniped by this. I don't think it should be slower, and I wanted to know why it was. Relevantly, re. @rachitnigam's comment:

The readmemh and writememh functions hide the latency to read and write IO memories and ref is just making it explicit.

I don't think this is the case here. readmemh and writememh exist in both treatments. And the ref shouldn't add any cycles, I think: we're either accessing a comb_mem_d1 locally or by reference, in the testbench. This makes me think something is wrong with the way we're compiling the ref in general, and it might not have to do with the testbench approach.

Here are two VCDs I dumped from Icarus simulation of the two under the two treatments
normal.vcd.txt refmem.vcd.txt

The thing to notice in the waveforms is that the ref version has two FSM registers, fsm and fsm0, while the "normal" one just gets 1:
normal-waveform
ref-waveform

I dug into the way these two are compiled (going pass-by-pass through the compilation pipeline). Critically, it looks like static promotion failed for the ref version. Its control after promotion is:

seq {
  static<2> seq  {
    read0;
    upd0;
  }
  write;
}

Whereas the "normal" version yields:

static<3> seq  {
  read0;
  upd0;
  write0;
}

(To reproduce this, try:

cargo run -- -p infer-data-path -p compile-invoke -p static-inference -p static-promotion examples/tutorial/language-tutorial-compute.futil

and add -p external-to-ref at the beginning to see the difference.)

In the end, the problem is that the program's write group is not inferred static when the underlying memory is a ref. That group looks like this, FWIW:

group write {
  mem.addr0 = 1'b0;
  mem.write_en = 1'b1;
  mem.write_data = val.out;
  write[done] = mem.done;
}

(The read group is always inferred static, probably because it's reading combinationally from a comb_mem_d1.)

Anyway, it seems pretty clear what's going on: top-level ref memories, unlike top-level @external memories, are interfering with static inference. That seems like the bug to fix here.

@sampsyo sampsyo changed the title Cycle count blowup and performance drop when using ref-based testbench Cycle count blowup with ref-based testbench caused by static inference limitations in the presence of ref Mar 9, 2024
@ayakayorihiro
Copy link
Contributor Author

Wow, thanks a lot for looking into this @sampsyo ! It's good to know that the problem was from a bug rather than something fundamental about externalizing memories. If we can get this fixed then there seems to be no big obstacles around always externalizing memories via ref?

@sampsyo
Copy link
Contributor

sampsyo commented Mar 12, 2024

Indeed, that would be awesome! Then we would have only one testbench where the readmemh/writememhs live, and we could remove a lot of intelligence from the compiler itself that currently deals with emitting this…

@rachitnigam rachitnigam changed the title Cycle count blowup with ref-based testbench caused by static inference limitations in the presence of ref Toplevel ref cells interfere with static promotion Apr 23, 2024
@rachitnigam rachitnigam added Type: Bug Bug in the implementation C: Calyx Extension or change to the Calyx IL labels Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: Calyx Extension or change to the Calyx IL Type: Bug Bug in the implementation
Projects
None yet
Development

No branches or pull requests

3 participants