Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parameterization algorithm for SV modules/classes (#4497) #4629

Draft
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

donlon
Copy link
Contributor

@donlon donlon commented Oct 26, 2023

Fixes #4497
Fixes #3858

Requires: #4581, #4582
Probably fixes: #3875, #4013

Backgrounds

#4497 fails because uvm_sequencer#(uvm_sequence_item) and uvm_sequencer#(uvm_sequence_item, uvm_sequence_item) are treated as different parameterization of class uvm_sequencer, but they're actually the same thing. There're some issues about the previous algorithm, for example

  • It only processes the overridden part of the parameters, other parameters (may be an expression depends on other parameters) are not evaluated.
    So references with different number of parameter pins can't be matched to the same parameterization instance.

  • It didn't re-evaluated overridden values under the context of the target module, even if the values are already constant.
    Sometime values should be truncated/expended after assigned to module, based on the variable type. E.g. Cls#(1024) and Cls#(0) should refer to the same parameterization, where Cls is

class Cls #(bit [7:0] A);
endclass
  • It didn't propagate parameters/parameter types through parameter list when evaluating the values. The types could be different in different instances and affect constant folding. E.g.
typedef struct packed {int a;} S1;
typedef struct packed {logic [3:0] b;} S2;
class Cls #(type T, T VAL);
endclass
// and
Cls #(logic[3:0], 100) obj1;
Cls #(S1, '{a: 10}) obj2;
Cls #(S2, '{b: 20}) obj3;

The new algorithm

To solve problems mentioned above and handle the general case, we have to substitute the overridden values into a copy of the original module, and then evaluate all of the parameters one by one (using V3Const::constifyParamsEdit). Then we can collect all of the evaluated constants (into a parameter set) and lookup the parameterization instance previously created with the same parameter set, then it's possible to match an exact instance for the reference.

New algorithm searches parameterization instances in a hash map, with the calculated parameter set. Instead of using longer string name that represents different parameterization before, it's expected to have higher efficiency.

However, deep cloning a module is expensive. To accelerate the procedure, it's possible to cache the instance by adding mapping from only the overridden part of the parameter set. If later reference has the same parameter overrides as before, it can find the instance with only the overridden part of the parameter set, and avoid another deep clone (See ModInfo::insertOverriddenParamSet and ModInfo::findNodeWithOverriddenParamSet in V3Param.cpp).

For hierarchical blocks, searching the library wrapper is now using the same routine as normal modules, so it should be more clear.

And I'm not sure whether constant folding can be performed without cloning a clean module (using something like V3Simulate?), if it's possible maybe I can optimize it in the future.

TODOs/Not yet solved issues

It's almost done for the new deparam algorithm! But there's still some problems need to be fixed.

  • Add more tests covering all cases mentioned above

  • Fix top-level interface port used in lint-only mode #4582

  • Testcase t_hier_block_trace* fails. It generates fst/vcd files with different structure than before.
    It seem that previously, cells without parameter pins failed to match the correct (the library version of) parameterized hierarchical blocks, instead it match the original one. Probably it's needed to be fixed in current codebase first.
    Fixed by Fix linking parameterized hierarchical blocks and recursive hierarchical blocks #4654

  • During constant folding, some expressions result in numbers with wrong signedness. E.g. the value in parameter int A = int'(10.00) is converted into an unsigned number, and sometimes it causes instances mismatch. Currently, only the V3Number field of AstConst is compared when searching an instance (the signedness is ignored when comparing/hashing V3Numbers(?)), as a workaround. (link)

  • During constant folding, V3Width may move some datatypes in the tree to the global type table, and the nodes may contain reference to a class removed during parameterization, which causes 'Broken link' error later. It's very similar to Internal error Broken link in node #3875. I think the best way to handle such problems is that keep the unused classes but mark them as 'dead', and garbage-collect all unused types and classes/modules in next V3Dead. I tried this (using function here) but it causes more problems (e.g. child nodes of AstBracketArrayDType are removed, and the algorithm is not perfect, likes AstClassRefDTypes inside dead module does contribute to reference count of the class). Probably we have to improve the algorithm (instead of using reference counters, we can perform 'connectivity check' through the tree, and then remove the 'disconnected' nodes). I think I need more advice on this.

Performance and memory usage

It's a little bit better than before. For t_uvm_todo:

New:

Stage, Elapsed time (sec), 008_param         0.048460
Stage, Memory (MB), 008_param               95.632812

Old: (cf6e362)

Stage, Elapsed time (sec), 008_param         0.105867
Stage, Memory (MB), 008_param              140.031250

@donlon donlon marked this pull request as draft October 26, 2023 18:05
@@ -1,5 +1,4 @@
%Error: t/t_class_extends_aliased_real_bad.v:14:10: Attempting to extend using non-class
: ... note: In instance 't'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some instance hints disappeared because the node of pointed type has been moved to the type table during constant folding, but the instance name is from the containing module back-searched. If the type is inside type table, the module 't' can not be found.
I think it can be improved by providing the processing state to the error builder by some other means, so the accurate error context can be determined.

src/V3Param.cpp Show resolved Hide resolved
src/V3Param.cpp Outdated Show resolved Hide resolved
void V3Dead::deadifyModuleDTypes(AstNetlist* nodep) {
UINFO(2, __FUNCTION__ << ": " << endl);
{
DeadVisitor{nodep, false, true, false, false, !v3Global.opt.topIfacesSupported()};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failed to garbage-collect all unnecessary types and modules together :( Maybe can improve the algorithm first? Probably related to #2227

@wsnyder
Copy link
Member

wsnyder commented Oct 27, 2023

To accelerate the procedure, it's possible to cache the instance by adding mapping from only the overridden part of the parameter set. If later reference has the same parameter overrides as before, it can find the instance with only the overridden part of the parameter set, and avoid another deep clone

Note it's critical for C-compiler and later performance that we have multiple cells with same parameter set share a cloned module. For example some FPGAs use parameters to customize a LUT and will have 10K parameterized cells with identical final parameters.

Copy link
Member

@wsnyder wsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, I agree this seems the right track for these fixes.

src/V3AstNodes.cpp Show resolved Hide resolved
src/V3Broken.cpp Outdated Show resolved Hide resolved
src/Verilator.cpp Show resolved Hide resolved
test_regress/t/t_hier_block_sc_trace_fst.pl Outdated Show resolved Hide resolved
test_regress/t/t_interface_localparam.v Outdated Show resolved Hide resolved
test_regress/t/t_xml_first.out Outdated Show resolved Hide resolved
…go-wip-staging

Conflicts:
	src/V3Param.cpp
	test_regress/t/t_hier_block_sc_trace_fst.out
	test_regress/t/t_hier_block_sc_trace_vcd.out
	test_regress/t/t_hier_block_trace_fst.out
	test_regress/t/t_hier_block_trace_vcd.out
@donlon
Copy link
Contributor Author

donlon commented Nov 8, 2023

Hi, I just saw that you mentioned V3Dead somewhere else (#4646 (comment)) so I wonder if you have started coding?
I was also digging into it and conceived some immature ideas, perhaps I can open an issue for discussion?

@wsnyder
Copy link
Member

wsnyder commented Nov 9, 2023

I have a new V3Dead all coded, just in debug phase which I underestimated!

@donlon
Copy link
Contributor Author

donlon commented Nov 9, 2023

I have a new V3Dead all coded, just in debug phase

Thank you! Looking forward to the improvements.

which I underestimated!

My humble understanding about its complexity is that you need to handle references from every pointer fields, and the dependency relationship between nodes is also complex (nodes can depend on one node or one node and all its children, while all its parents should be kept). Especially if we need to provide options to remove specific types of the nodes (e.g. module, dtype, scope, ...).

@em2machine
Copy link

This looks like would help #4391 and similar.

@donlon
Copy link
Contributor Author

donlon commented Dec 1, 2023

This looks like would help #4391 and similar.

I think #4659 is more likely to fix it. If we process modules in the order of source code, the xxx_class#(Cfg0) can be parameterized before using cmd_tag_t (in #4391) so the field can be resolved correctly. But #4659 indeed depends on this pull.

Hopping that it can also fix #4391 :)

@wsnyder
Copy link
Member

wsnyder commented Dec 1, 2023

If we process modules in the order of source code

Unfortunately source code order is not sufficient, and changing to that will break other cases. We need to implement following IEEE 1800 3.12 and 23.10.4 (and perhaps others) to do better.

@donlon
Copy link
Contributor Author

donlon commented Dec 2, 2023

Unfortunately source code order is not sufficient, and changing to that will break other cases

Are they cases containing hierarchical names?

So I think actually we do need to support performing elaboration out of order, as hierarchical names (and $root) can reference objects after itself (however merely an identifier without any dots can not, so for this source code order is good).

Maybe the description for #4659 is not accurate, as the 'elaboration' process defined in the LRM is split into multiple stages in Verilator, but #4659 actually affects only the order of ParamProcessor::nodeDeparam called in V3Param, and other stages are not changed, so it won't break cases related to hierarchical names.

For hierarchical names and, moreover, upwards referencing, they do add difficulties and it still need further improvements to make the whole stages perfect (especially when used during parameter resolution). But I think #4659 is almost sufficient for #4391.

Here's the new algorithm I implemented (https://github.com/donlon/verilator/commits/wip/elab-order-1), and the key changes are here (donlon@b92d2cc). Luckily I find it doesn't break anything (the CI failures are due to this pull, which it depends on).

For 23.10.4 it looks like a way to support arbitrary defparams? (which we probably tend not to provide good support for?) And it's also difficult to be supported with current algorithm...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants