-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate Out FSM in IR when during TDCC and compile-static
#2020
Comments
I would not allow for the I also think we should have something like: query: HashMap<QueryType, ir::Port> // maps query to wire.out that generates it This allows us to memoize and reuse wires from the generated port. |
BTW, I should mention that this is super exciting overall! Thanks for writing up the issue. I think there is a lot of room for cool stuff. I'll add two pointers:
|
I would go ahead and implement a prototype if you have a good sense of what to do @calebmkim and we can discuss it in a future Calyx meeting. |
Yeah, both points make sense.
Yeah, I agree that this is a really interesting/exciting direction!
Sounds good; I can start poking around TDCC and playing with the above^ sketch. |
I'm looking at the TDCC code, and I think it already does something pretty close to this! It has the following
There are a few interesting things we could try to do.
|
Thanks for this writeup and looping me in!! This is all very cool :) Just out of curiosity, can you explain a bit about what the queries are for? |
I think it's mainly to make it easier to reason about optimizations when we are instantiating the FSM. For example, if there is an assignment: I'm not 100% sure if this makes a difference, but it may make sense to have a single wire for each query. For example:
to
If we have Also, if we see there are a lot of queries (i.e., a lot of assignments read from the FSM's current state), then it means the logic for the FSM is getting very complicated, so we might want to split the FSM into 2 separate registers when we actually instantiate it. This increases register usage, but (if the FSM is on the critical path of the design) then it will shorten the critical path since now each FSM has to do less work. Obviously, we don't need |
I see! Thanks a lot for explaining this, it makes a lot of sense to keep the information around for optimization purposes. On a somewhat related note, one of my goals for the week is to get TDCC to emit a JSON file denoting the FSM info (the |
Whoa, that's really cool! That would be really awesome and may be a great addition to the performance dashboard. Not sure if we even want that kind of information on the dashboard, but I feel like it could be helpful to see how optimizations affect output FSMs. When you get to that point, would you be able to let us know? Thanks!! |
Sure thing! |
Just going to write down the state of the current FSM optimizations that we have been thinking of. Static FSMs (I'm working on these currently)
Dynamic FSMs (I think @parthsarkar17 is working on these).
Querying
Using attributes for each optimizationsIn general, it might be helpful to introduce attributes for each of these optimizations. I can imagine one pass inserting these attributes on the control and then the actual compilation passes blindly following whatever the attributes say: this makes things simpler, since it simplifies the logic of making the decision of which optimization to implement with the actual implementation. It also allows users of Calyx to control the FSMs generated themselves if they want. |
Super exciting stuff!
I'm not sure what this would look like? Are you saying that instead of having a constant (
This is an interesting idea but once again, the trade-off is that attributes add implicit dependencies between passes. The decoupling between the optimization and the decision to optimization is nice but do we want passes earlier in the pipeline to be able to affect the optimization of FSMs as well? |
I'm not even sure if this is even a good idea but ...
To compile this, we have a one-bit register
Now, we don't have to check any ranges (i.e., we don't need a This may not actually be a good idea, though. Just thought it was interesting (and useful to write down before I forget about it). (And agreed about the attributes: I think it may be worth talking synchronously about the tradeoffs of what this decision would be at some point). |
Yep! @calebmkim hit everything we discussed, but I also wanted to give some more details on what we discussed w.r.t.
the goal (if we even want to split for this value of
so we can build smaller FSMs for each of the sub-
Any thoughts would be appreciated! Thanks!! |
Interesting stuff @parthsarkar17! As always, I think it is going to be hard to judge the impact of some of these optimizations without implementing and trying them out with the benchmark suite. I recommend starting with a simple one (like the |
In dynamic Calyx, while blocks are not automatically given their own FSM. So I think something like
|
tl;dr: I think the Yesterday in AMC-CalyxOpt we discussed the For example, suppose we have
We were discussing how to implement
The parent can either a) keep counting while the child is executing or b) just stay put while the child is executing (and wait for the "done" signal, i.e., wait until the child FSM finishes). a) is easier to implement and more composable but b) saves FSM states (this becomes especially important if we want to do OHE, where one state = 1 bit). However, in the static-calyx meeting today we just realized that option b) is very similar to the partitioning idea we have already talked about:
vs.
The only difference is that option b) hands control back over to register 1 rather than passing control to a new register (register 3). So I think it might be worth thinking about |
Oh, this is an awesome insight! We can generalize and maybe hide away |
During the AMC/Calyx-Opt meetings, we realized that Vitis was doing a one-hot encoding for their FSMs, which probably explains their lower LUT usage. This got us thinking about a new way to represent FSMs in the IR of the Calyx Compiler (not in the actual Calyx language) that will let us better exploit different sorts of FSM optimizations. (There's an interesting old lecture discussing this here.) Going to tag @parthsarkar17 who I wrote this up with.
Idea
The idea is to have a separate struct used in the TDCC (and the
compile-static
) pass that is meant to specifically represent FSMs. Something like this (obviously just a sketch an not a well thought out implementation detail):Then, separately, we will instantiate the FSM (i.e., actually build the registers and assignments necessary for this FSM). When we perform the instantiation, this will make it easer for us to know what optimizations to perform, based on the FSM characteristics (i.e.., num_states, queries, and transitions). For example, do we want to split the FSM (or even just copy it to reduce the fanout on some of them)? Should we use a one-hot encoding? etc.
Keep in mind that this all occurs within the TDCC, so there aren't actually any changes to the language.
Also looping in @ayakayorihiro since Adrian mentioned that you were interested in something that sounds related, but for profiling purposes rather than optimizations.
Also @rachitnigam, @andrewb1999, please feel free to add on if there is any thing we missed or got wrong.
The text was updated successfully, but these errors were encountered: