New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move IN optimizations to SQL Optimizer #3991
base: master
Are you sure you want to change the base?
Conversation
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
Disable whitespaces changes before review. GH diff goes crazy otherwise |
I was just noticing the title of this PR and wondering if there’s a chance to add optimizations for IN with lists of certain types. Specifically, number types and guids. By using STRING_SPLIT the query within SQL Server can be cached and speed should not suffer (since essentially it is string-split already). It also allows to provide lists that would otherwise exceed the maximum SQL query length. I write my own code to implement this within my repositories but I think it would be a great feature to add to linq2db by default. Also wondering if this is something that would fit in principle with the project. I also have similar optimizations for lists of strings but I use a TVP which is not available without schema changes to the database. |
var idx = 0; | ||
var hasNull = false; | ||
|
||
foreach (var value in items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty obscure loop for something that, if I understand correctly, boils down to (I agree the impure Where is unusual):
items
.Where(x =>
{
bool isNull = x == null;
hasNull |= isNull;
return !isNull;
})
.Select(x => new SqlValue(x))
.ToList();
// Then change the check (newValues == null) to (newValues.Count == 0)
Alternatively:
var newValues = new List<SqlValue>(items.TryGetNonEnumeratedCount(out int len) ? len : 8);
foreach (var value in items)
{
if (value == null) hasNull = true;
else newValues.Add(new SqlValue(value));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it saves list instance creation for edge cases when list contains only nulls, so I would prefer to leave it as it.
(And in any case I don't plan to spend much time now on this PR as I don't want it to block release)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know but an IN that contains only nulls is a very exceptional case (I guess), so not worth optimizing for too much. Esp. if we're only talking about saving 1 List instance, which is not much considering how many allocations any query does overall.
The loop I proposed is much simpler to understand (which IMHO is quite important) and has a much simpler body which is likely to execute faster for the common case of non-empty IN (so we optimize for the right case).
(And in any case I don't plan to spend much time now on this PR as I don't want it to block release)
Maybe it's best to ship the SqlBuilder fix in 5.0.0 and postpone this change to later (5.1 or 6.0), then.
It's non-trivial and I wouldn't want to rush and introduce new bugs, esp. considering my other comment below.
(I'll look at the other comment more deeply and list differences between this and SqlBuilder later.)
@Shane32 if we go this route, note that Oracle accepts an array-typed parameter for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Test baselines changed by this PR. Don't forget to merge/close baselines PR after this pr merged/closed. |
56c75d2
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
List<ISqlExpression>? newValues = null; | ||
var idx = 0; | ||
var hasNull = false; | ||
return ConvertInListNulls(p, items.Cast<object?>(), context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why introduce an extra .Cast<object?>
?
ConvertInListNulls
only consumes this with foreach
, so the non-generic IEnumerable
is enough.
{ | ||
newValues = new List<ISqlExpression>(); | ||
var i = 0; | ||
foreach (var item in items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're enumerating items
twice, and in a nested fashion.
This should be ok practically always (e.g. with arrays, lists...) but some users might pass enumerables that are costly and don't expect us to enumerate twice.
We might debate if that's just bad usage of linq2db, but given this is easy to avoid (see my suggested alternatives for this part) I would say we should do so and be less surprising for users.
Fix #3986
also includes fix from #3990