New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve sequence building performance #3913
base: refactor/parsing
Are you sure you want to change the base?
Conversation
Some wcf remoting examples don't compile because of http in System.Net when targetting .net 4.8. |
@@ -6,14 +6,13 @@ namespace LinqToDB.Linq.Builder | |||
using Reflection; | |||
using LinqToDB.Expressions; | |||
|
|||
[BuildsMethodCall("Distinct", nameof(LinqExtensions.SelectDistinct))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I like the inconsistency of strings and nameof
. I'd say we (royal collective we) should pick one and stick with it, and this PR is a good chance to pick and update all to thath. I'd vote for nameof
myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this first draft I tried to stick to the style of existing code, but that's totally doable. 👍
At least for our own methods (e.g. LinqExtensions
) I like the strong references.
It is worth noting:
- Some references are to .net fx methods;
- Sometimes a single name refers to multiple methods in different classes.
I see PR is based on parser refactoring feature, so I assign it to v6 milestone |
@MaceWindu yes, this is on top of the parser refactoring. |
Do you have an example that shows the problem? I would like to look at in performance profiler. |
@igor-tkachev if you want to see the worst query compilations, you basically need to use the builders that are close to the end of the list. A good example would be some complex That said, be careful after the first query compilation because linq2db attempts to improve performance by sorting the list of potential builders, which means your query will now be vastly more performant on the 2nd run. So if you're doing benchmarking, maybe execute a bunch of In fact, another performance issue that is a bit unlikely would be when 2 different builders fight for the top spot. Then the list would be constantly updated, which is another perf hit. Finally, the current code uses locks, so if you compile many queries in parallel, contention might impact perf. I think that's unlikely to show up in simple benchmarks. |
It looks like we need some mechanism to collect metrics/statistics of our code and create a report at the end of the test run.. So, we will be able to compare different solutions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite remember (it's been a while...) but I think it was necessary, maybe to avoid picking up properties from a parent directory.
Then again... maybe I added it to override some props, then removed them and kept the file.
If it makes no difference to remove it, we can. But it needs a quick test.
} | ||
} | ||
|
||
void GenerateCode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for evaluation: would it be better to do this as written, or to use a templating system like Scriban? pros and cons to each, and no right answer for all situations, so worth evaluating here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not want to add external dependencies, even for a private tool, as I know this is usually avoided in linq2db.
That said, I use Scriban myself on other projects and I agree it's a nice templating syntax to generate source code. 👍
We can make the change if the rest of the team feels the same way.
Obviously, there's a small learning curve to get familiar with the syntax and using a plugin for syntax highlighting (I use VS Code, not sure if avail. in VS) is highly recommended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public dependencies are usually avoided. Build dependencies I think are fine. The point is to minimize what dependencies we require consumers to assume.
No syntax coloring in VS, unfortunately. I've not found that to be a significant issue for me.
|
||
<!-- <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles> --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debugging code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of removing, does it make sense to emit them to a place where they can be added to source control, so that changes can be reviewed as part of future PRs? This is what I do in SuperLinq to ensure that any changes to the SG or it's output are able to evaluated as part of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it on purpose, because it's a very useful property to know about when debugging the code generator and I doubt we all know its exact name.
But we can remove it if you prefer, obviously it's just a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other projects I also like to commit generated code somewhere (if not as the main compilation input), because as you said it makes it easy to diff changes when updating the generator.
Let's see what the team think about this.
Co-authored-by: Stuart Turner <stuart@turner-isageek.com>
@viceroypenguin Thanks for reviewing, it's a big changeset. 🙏 |
According to metrics this is not an issue and old strategy performance is fine. |
What's your objection to merging this? |
Limitations of current design
Today
ExpressionBuilder
linearly goes through a long list (~70) ofISequenceBuilder
to find which builder can handle anExpression
. Each builder is probed by a call toCanBuild
, in sequence.This is horribly not performant. If your query uses one of the last builders -- or simply the expression has no builder, which is sometimes a normal case -- you have to go through 70+ checks!! They are virtual calls, many builders repeatedly perform the same casts or checks, and not all builders perform quick, cheap tests. Of course this is only gonna get worse as we add new builders.
The current design attempts at improving performance by dynamically sorting the list to put the most used builders at the front.
This is a good idea but it has two drawbacks.
So once the most used builder is found, nothing changes anymore, even if a builder that becomes commonly used is still way down the list at the moment.
As I said before, "not finding a builder" is sometimes an expected use-case and this always go through the whole list.
New strategy
Finding a builder now starts with a big
switch
statement onNodeType
(an int). C# optimizes this depending on the number and distribution of items, it could be a hard-coded binary search (ifs) or a single indirect jump based on a table.Then a
CanBuild
method is called, but notice:CanBuild
is now a static method. Short ones will be inlined, which is very likely as most have been simplified thanks to the fact that we know what node to expect whenCanBuild
is called. Most of them are as simple as=> true
,=> call.IsQueryable()
or=> call.IsAsyncExtension()
One notable case is
ExpressionType.Call
because we have many (most) builders for this node type.This is handled by a second
switch
, this time onMethodCallExpression.Method.Name
, which C# will compile into a static hashtable. This is then handled as other node types. In 90% of cases there's only a single candidate left.There are other benefits, such as clarity. The book-keeping and re-sorting code is gone.
Before it was difficult to see which builder handles what, which is particularly important as they must all be mutually exclusive!
Now you just need to peek at the switch tables to find what is handled, by what.
Join
method?CountBuilder
(methodsCount
andLongCount
are actually handled byAggregationBuilder
).Source generators
I'm not crazy and I didn't write the big switches by hand, that would be too error-prone.
Instead, I added a C# Source Generator to the project, that looks for attributes like
BuildsExpression
orBuildsMethodCall
on builders and implementsFindBuilderImpl
automatically.As generated code is not committed to git, I uploaded the source generated by this PR here for your convenience:
https://gist.github.com/jods4/6b145327f3c7563836c357984b723caf
Code patterns
Special cases
A few builders may handle any method call. You can do this simply by using
[BuildsExpression(ExpressionType.Call)]
, they will be probed after any method-specific builder[BuildsMethodCall]
.Today this inculdes
MethodChainBuilder
,QueryExtensionBuilder
,TableBuilder
. Those builders tend to look for attributes on methods, e.g.[Extension]
or[Table]
.Some builders take advantage of having different, simpler checks based on what method or expression type matched.
You can have multiple attributes on a single builder, and use the named property
CanBuildName
to direct the check to a different method for some cases. Many sync/async builders do that, for exampleContainsBuilder
:ContextRefBuilder
is unique: it seems that it could potentially handle any expression ifBuilderInfo.Parent
is null. For this case I added[BuildsAny]
. Builders with this attribute will be probed after everything else failed.The signature of the static method is
CanBuild(BuildInfo info, ExpressionBuilder builder)
.Risks and breaking changes
Everything is internal: there's no public API change, so no breaking change.
A lot of files were touched because there is 70+ builders but 95% of changes are very mechanical. I think there's still room for improvements but I wanted to keep this PR straightforward, as it's fairly large.
If all tests pass, I'd say the risk of regression is low.