-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calyx wrapper for Berkeley HardFloat Verilog library #1928
base: calyx-float
Are you sure you want to change the base?
Conversation
Woo! Awesome! This is exactly what I had in mind as well! Couple of things:
|
Just opened up
Agree. Yes
Sounds good! So we start with first extending the existing parser? |
Hi @rachitnigam ! Just want to share an update that the current wrapper can perform floating add/sub and multiplication! So I think I can move on to extending the user input data type to conduct more thorough tests. Any thoughts or tips on initiating and streamlining this process? Thanks! |
User inputs seem like the right next step. Do you want to create a plan to scope out the work in this PR? For example one primitive plus fud extension should be enough |
Sure! What does "fud extension" mean here, fud numeric type extension for floating-point values? And do you agree that Calyx users will only use normal floating-point values, and at most, they will use IEEE format representation, and that they barely want to interact with HardFloat specific formats like "recoded" formats and "raw deconstructions"? If that's the case, then I think ideally, I'd start with Now I only have
If we indeed want an |
Hi @rachitnigam , I think I have some idea about how to proceed! Can you check my plan to see if it makes sense?
|
Hi @jiahanxie353, I'm traveling a lot this month so I'll recommend that you go ahead and try things out instead of waiting on my responses. Folks in the Calyx zulip and slack should also be able to help. |
Sure thing, sounds good! |
Hi @rachitnigam , just finished a milestone that we can support passing floating point values in json and perform two floating-point numbers addition when using fud! |
primitives/float/addFN.futil
Outdated
@@ -0,0 +1,16 @@ | |||
extern "addFN.sv" { | |||
primitive addFN[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this implemented combinationally? If so, we should mark it as a comb primitive
instead of primitive
. Also, we should probably not implement this as a combinational primitive at all since that is unlikely to meet timing?
Looks cool! Couple of notes:
|
Yeah we definitely want floating point units to be pipelined when used in real designs. I think berkeley hardfloat is all combinational? If that's the case we can maybe add registers at the beginning and/or end and hope that retiming will do a decent job. In Vitis HLS all the floating point units I've seen are between 3 and 5 stage pipelines. I think this is definitely another case where allowing the latency to be parameterized would be very helpful. |
@jiahanxie353 I think the move is to pipeline the output value of the adder by 4 cycles for now. Once we have a set of primitives and some designs, we can push them through the FPGA flow and see if designs are correctly being mapped onto DSPs on the FPGA (which are hardened blocks for efficiently performing operations likes add and mult). Once this change is done, I think you're good to merge! Next step would be to change |
Excellent. I'm late to the party, but I agree with where everything has ended up: namely, @jiahanxie353, your plan enumerated above sounds right to me, and @rachitnigam's suggested next steps (test inputs from data files, and a fixed 4-cycle latency) are also the right things to do here. To expand in a possibly-unnecessary way about the 4-cycle latency: basically, the idea in Berkeley HardFloat is that all the operations are provided as combinational logic, but realistic designs do not want combinational FP ops. So its intended use case is that people just stick "useless" registers in front of or behind the HardFloat operation, and the EDA toolchain does its best to perform retiming (magically transforming the combinational-plus-registers description into a proper, balanced pipeline). So one option would be to expose these as |
One extra category of logistical discussion:
|
66504c2
to
dd3ca25
Compare
Thanks for all the advice folks! Got a 4-cycle adder by using dummy registers. As per
I believe you are right. That's why I kept
in the source files like here for attribution. Is there anything else I need to do for copyright?
It'll probably be annoying to adapt to upstream changes in the future if we just use a static snapshot. The issue is that HardFloat is implemented in Chisel in their repo. Implementing a translator/transpiler from Chisel to Verilog could (?) solve the concern but I doubt it's worth the effort. And since there's no direct Verilog implementation, I believe people just use the Verilog files obtained from a zip file in this page. |
This all sounds good! One extremely tangential note: "transpiler" is not a meaningful term: https://rachit.pl/post/transpiler/ Chisel already has a compiler to Verilog. |
Got it; thanks, @jiahanxie353! Here's my suggestion about how to deal with including external source code like this. I think we have two options:
Anyway, I think either could work! Just wanted to lay out the options. |
Got it, I'll take the second fetch-and-unzip approach! While working on it, got a decision to make regarding (1) changing the HardFloat source code after fetching it; (2) modifying
Do you have any preference over 2 options? |
I see! Good question… do you think you could elaborate a tiny bit more on the problem that each of these changes would solve? Here is my guess, but I am low-confidence: The way (most? all?) Calyx primitives currently work is that the Verilog code for each primitive gets inlined into the generated Verilog code. The result is an entirely self-contained Verilog program, consisting of both primitive code and Calyx-compiled code, that we can compile all by itself. HardFloat presents another level of logistical complexity because, of course, we need to include a bunch of external files. The current solution is that our calyx/primitives/float/addFN.sv Lines 4 to 6 in ebd2069
This in turn requires the actual HardFloat source files to use the same path prefix to
Obviously, the "pristine" HardFloat source files don't know about the path
…and that doesn't necessary work by default to include the file relative to the containing source file, for some reason. I don't understand Verilog!!! It seems like it should just work! It is annoying that it doesn't!!!!!!!
Is that a correct summary of the situation? If so, just adding the flag seems way easier to me… clearly, there is more we should do around the problem of Calyx now not emitting self-contained Verilog (it now has |
Yes, that's pretty much the situation and Firstly, the "pristine" HardFloat has this wacky kind of import: Even if they intended to mean relative import, they should at least write:
But anyhow, this is not difficult to solve. And my proposed solution is to append Line 109 in 4bc3007
A trickier thing seems unsolvable by using
This line is manually inserted by me, and there's no such import in the "pristine" HardFloat. And in fact, there's no single file that has As a workaround, |
I haven't kept track of this thread that well but one thing that would be good to have is being able to have a default flow that generates exactly one Verilog file like we do today. We can emit some information information the user that they've requested that we link with HardFloat and therefore are implicitly agreeing to its LICENSE. We also should be carefully when we provide a new release on crates.io if we place Hardfloat in |
* Make length an argument * FIFO need not flash ans * PIFO need not flash ans * Shared logic for peek, pop * Change to queue length of 16 * Stashing current Calyx output and stats thereof * New numbers * Nix stray file * Nix more stray files * Update fifo.py * Queue length factor in FIFO * Take max FIFO length factor as arg * Make PIFOs use the new style of FIFOs
* Rename a few methods * Catch up the user of those methods
* Tidy one file * Some easy cleanup in gen-systolic * Massage away one CompInst * Massage away all use of py_ast * A litte cleanup * No manual imports * Return type annotation * Revert change to unassociated file * Another return type
* cli options * static control tests * fsm implementation type * FSM allocation * extended instatiation of staticfsm object with one hot encoding * one hot * clippy * clippy * does this make clippy stop complainng * one-hot initialize to 00001 * cleaner code * loose ends * code refactoring * hopefully github shows fewer changes * clippy * rewrite tests * documentation, code cleaning * better documentation * higher one-hot cutoff --------- Co-authored-by: Parth Sarkar <ps679@cornell.edu>
* Neaten mul_group * Neaten another helper * A little more neatening * Less use of Cell * Reduce manual imports * Catch up to main * Stray doctring issue
* partial checkpoint * partial checkpoint * external tool checkpoint * reorder the main construction for cider * minor tweaks * update the main method to not error out * simple tidying * another partial checkpoint * Update the construction of the main method and serialize the memory data dumps * make it possible to load in values to memory
* Neaten mul_group * Neaten another helper * A little more neatening * Less use of Cell * Reduce manual imports * Catch up to main * Stray doctring issue * No cells when defining component * More thoroughly remove the power to init a component with list of cells
* Tidy some cell generation * Take a port by handle * Use new lsh_use * Use lsh_use more. * More tidying using builder helpers * No names for registers * Tidy imports
* Basic scaffolding for metadeta * Implemented new_metadata.pset and new_parser.rs * Added new_parser and test cases * Enabled debugger to debug any .futil file * Clippy complaints * Adjusted code to PR suggestions and added some comments
* keep @one_hot attribute * progress * fsm-opt attrs * undo change * clippy * rewrite test * attrs fn * remove extraneous file * feedback
* Refactor program counter to avoid `std::mem::take` * add a helper method * Refactor code to use `lookup_string` method for retrieving names in the environment * Refactor code to use `lookup_string` method for retrieving names in the environment * Minor tweak * stab at fud2 integration * bump MSRVs to make clippy stop bugging me * my life is a never ending litany of suffering * snapshotsssss * more snappéd shots
* Attempt at CombComponent in ast * Attempt to add comb comp to builder * Test comb groups with tuplify * Testing for comb comp * Stray assert * Fix slice helper, use correctly * Use HI instead of 1 * Clean up assertion message * Use cat instead of manually padding * Document cat
* Hazarding a guess at seq_mem docs * Hella periods * Griffin comments * Undefinedness comment from @sampsyo * Flesh out for d2-d4
calyxir#2040) * add new option * Barebones hack version created... Now I want to write to a file * Fixing clippy errors * Small scripts for first pass VCD parsing * very hacky way to only get group names * Bare bones python script to output group to number of cycles * Fix clippy errors * Output JSON to file * remove unused import * Remove hack for obtaining group to states * Small wrapper script to run TDCC, simulation, and output cycle counts * Fix clippy errors * Small changes to scripts * Get Enables to directly produce FSMInfo * First pass code for collecting FSMInfos across multiple TDCC groups * Fix clippy errors * remove debugging prints * Clean things up before making PR * Clean up more comments * Allow passing Calyx arguments from command line * Update fud2 tests to have empty args variable * Replace script to get no-opt vcd with fud2 command * README for scripts in tools/vcd-parsing * Documentation comments for new structs * Use Id instead of String for JSON structs
Concretely, this improves behavior in a few ways: * better error messages when Ninja fails * don't attempt to print stuff to stdout when Ninja fails * clean up .fud2 temp directory even when Ninja fails
* Correct command to list passes * pass-help instead of list-passes
* Require group names for comb_mem loads and stores * Require group names for seq mem loads and stores * Rename for consistency * Unify two helpers * Unify two helpers for seq mem reads * Unify two more helpers * Use the argument label more often * Some backticks
* add the continuous assigns * rearrange some stuff * add a flush * with stuff! * cleanup plus error message * add the with assigns to the assignment collection
This solves a confusing problem where using fud2 in "stdout mode" (i.e., without `-o`) would suppress errors from the underlying commands by default (i.e., unless you *also* remember to use `-v`). We don't want Ninja printing stuff to stdout, but we also don't like hiding the error messages, so a sensible solution is to just unconditionally make Ninja's progress messages go to stderr. Seems reasonable to me, semantically: all of fud2's output other than the actual content of a "stdout mode" output should go to stderr. I also tried "buffering up" the stderr and printing it only when there's an error. This leads to marginally "cleaner" output in "stdout mode" because we don't even see Ninja's progress messages. But it also means buffering and all the various conditions get quite messy... and we also lose colors in the errors because they don't think they are talking to a terminal anymore! I think this solution is a bit simpler and better overall.
* committing old version of testbench generator * [WIP] reading json * remove extraneous files * Parse json from YXI file * small fixes * Fix for comb_mem being interpreted as Sequential * Fixed YXI test using combinational memory * [WIP] Midway incorporating custom testbench generator to fud2 * inputs to FIRRTL should contain ref, not @external * [WIP] First pass at very convoluted way to get FIRRTL and testbench generated. * Fix pipeline through firrtl-noverify. Need to make path through firrtl (for verilator) * Add refmem build for verilator * Supporting FIRRTL w FIRRTL primitives * Fix small bugs * Remove testbench rsrcing * Some script documentation * Remove duplicate comb_mem_d1 * Fixed MissingConfig issue. Now need to double check and update snaps * Change test snapshots and rename setups * Fix tests according to current fud2 behavior * Update documentation comment * Replace dummy script with dummy command * Fix tests according to previous change * Added memories.sv hack to Icarus as well * Change test snapshots for Icarus (produce and use memories.sv) * Create new rules that either do or don't involve memories.sv instead of creating a blank file * Fix documentation comment
In calyxir#2058, I made fud2 clean up the `.fud2` temporary directory even when Ninja exited with an erorr. Unfortunately, that doesn't cover other possible errors, such as when launching the Ninja process fails altogether: calyxir#2058 (comment) Rather than patching one error at a time, this uses `Drop` to be sure we do this when any error occurs. It also means it will get deleted even when we panic, which is *definitely* overkill, but I guess it's not a big deal.
…t part select out of order
Implement a wrapper in Calyx for Berkeley HardFloat
fNToRecFN
(IEEE standard format to HardFloat recoded format) Verilog module (adopted based on PyMTL's corresponding file).And include a corresponding test case to convert from standard format to recoded format when the input floating-point number is a normal number.
I want to first create a draft PR to make sure that I'm on the right track. And I will include more test cases, such as subnormals, NaNs; and work on other HardFloat modules.