Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential issue with codegen for struct types #318

Open
mtmk opened this issue Jan 10, 2024 · 5 comments
Open

Potential issue with codegen for struct types #318

mtmk opened this issue Jan 10, 2024 · 5 comments
Labels

Comments

@mtmk
Copy link
Collaborator

mtmk commented Jan 10, 2024

Have mixed feelings about this, would need to check what else has changed.

Main reason being, it looks like we've lowered a generic call further down (i.e. this is now <T> rather than ReadOnlySequence<byte>) and that may have implications around codegen for struct types.

Originally posted by @to11mtm in #303 (comment)

@neon-sunset
Copy link

I have been browsing issues and found this one. What kind of codegen implications are you worried about regarding structs? Perhaps I can help.

Usually, the corresponding method bodies get their own monomorphized instantiations with optimal code and all generic-related checks get optimized away - CoreLib relies on this mechanic quite heavily to write zero-cost abstraction code which suspiciously looks like C++ templates.

@to11mtm
Copy link
Contributor

to11mtm commented Jan 14, 2024

I have been browsing issues and found this one. What kind of codegen implications are you worried about regarding structs? Perhaps I can help.

Usually, the corresponding method bodies get their own monomorphized instantiations with optimal code and all generic-related checks get optimized away - CoreLib relies on this mechanic quite heavily to write zero-cost abstraction code which suspiciously looks like C++ templates.

The concern to be had here is that the lower monomorphized calls exist, the more unique code branches exist... the lesser impact is that type args may have to get passed around more, but more important concern is that in scenarios where consumers are primarily using (a variety of) different struct types in their app, the number of monomorphized instantiations may be detrimental to code caches/etc.

This may be a scenario worth it's own benchmarks (it could very well be not worth worrying about,) That will give us more data and potentially see whether there are other potential optimizations to be had in the call. One option that sometimes helps is taking the 'nongeneric' sections of a method and moving those into calls (This way, the monomorphized code can be smaller and common bits stay in cache.)

@neon-sunset
Copy link

neon-sunset commented Jan 14, 2024

It seems there is quite a bit going on so let me address these one by one 😄

  • struct generic instantiations do not create more branches (unless you mean manually written code which does unbox + switch on type)
  • the number of generic instantiations is something that even CoreLib does not care about in 95% of situations
  • CPU cache does not care for the number of generic instantiations either, it is populated along the execution path guided by branch predictor and is either way so large that it's not a concern in case of .NET
  • optimizations in dotnet/runtime care about code size specifically because it usually means reducing the amount of work done, which is exactly what struct generics do since they eliminate generic dispatch cost that exists for objects (if methods are called on them)

What actually does matter however is large method bodies that may cause JIT to regress due to running into either its inliner or locals tracked budget which causes the codegen quality to degrate significantly. For these, it is optimal to simply have high level throughput/latency benchmarks with spot checks of codegen against hot paths detected by profiler (which you probably know better than I do :) )

@to11mtm
Copy link
Contributor

to11mtm commented Jan 15, 2024

struct generic instantiations do not create more branches (unless you mean manually written code which does unbox + switch on type)

poor wording here, but what I mean is that, to the best of my understanding, (could be wrong here) but .WritePublish<StructSizeOfInt32One>() and .WritePublish<StructSizeOfInt32Two> would each get their own monomorphized implementations. My understanding (again could be wrong here) is that the old code was writing to buffer, such that each serialized write to command queue was monomorphized, however the actual protocol writer was -not-, there was only the implementation for the enclosed ReadOnlyMemory (or other wrapping construct).

What actually does matter however is large method bodies that may cause JIT to regress due to running into either its inliner or locals tracked budget which causes the codegen quality to degrate significantly. For these, it is optimal to simply have high level throughput/latency benchmarks

Per above this is where my concern lies and where benchmarks would help. As it stands the method in question is ~100 lines (including spacing) so there is a question of if anything can be done to help.

with spot checks of codegen against hot paths detected by profiler (which you probably know better than I do :) )

More than happy to defer to your judgement if this is a nothingburger;
will happily admit if I can't compare the ASM in SharpLab (and shamefully admit my dotTrace workflow is atrocious) I just try to write some fair benchmarks and draw the line there.

@neon-sunset
Copy link

poor wording here, but what I mean is that, to the best of my understanding, (could be wrong here) but .WritePublish() and .WritePublish would each get their own monomorphized implementations. My understanding (again could be wrong here) is that the old code was writing to buffer, such that each serialized write to command queue was monomorphized, however the actual protocol writer was -not-, there was only the implementation for the enclosed ReadOnlyMemory (or other wrapping construct).

If the access is synchronized, it's no different as two separate methods calling it. Perhaps another mental abstraction you could apply to this scenario is struct generics working the same way as object generics except being a zero-cost abstraction? They do not introduce program behavior differences in such way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants