Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move IN optimizations to SQL Optimizer #3991

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Move IN optimizations to SQL Optimizer #3991

wants to merge 2 commits into from

Conversation

MaceWindu
Copy link
Contributor

Fix #3986

also includes fix from #3990

@MaceWindu
Copy link
Contributor Author

/azp run test-all

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu
Copy link
Contributor Author

Disable whitespaces changes before review. GH diff goes crazy otherwise

@Shane32
Copy link
Contributor

Shane32 commented Feb 22, 2023

I was just noticing the title of this PR and wondering if there’s a chance to add optimizations for IN with lists of certain types. Specifically, number types and guids. By using STRING_SPLIT the query within SQL Server can be cached and speed should not suffer (since essentially it is string-split already). It also allows to provide lists that would otherwise exceed the maximum SQL query length. I write my own code to implement this within my repositories but I think it would be a great feature to add to linq2db by default. Also wondering if this is something that would fit in principle with the project.

I also have similar optimizations for lists of strings but I use a TVP which is not available without schema changes to the database.

var idx = 0;
var hasNull = false;

foreach (var value in items)
Copy link
Contributor

@jods4 jods4 Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty obscure loop for something that, if I understand correctly, boils down to (I agree the impure Where is unusual):

items
  .Where(x => 
  { 
    bool isNull = x == null;
    hasNull |= isNull;
    return !isNull;
  })
  .Select(x => new SqlValue(x))
  .ToList();
  
  // Then change the check (newValues == null) to (newValues.Count == 0)

Alternatively:

var newValues = new List<SqlValue>(items.TryGetNonEnumeratedCount(out int len) ? len : 8);
foreach (var value in items)
{
  if (value == null) hasNull = true;
  else newValues.Add(new SqlValue(value));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it saves list instance creation for edge cases when list contains only nulls, so I would prefer to leave it as it.

(And in any case I don't plan to spend much time now on this PR as I don't want it to block release)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know but an IN that contains only nulls is a very exceptional case (I guess), so not worth optimizing for too much. Esp. if we're only talking about saving 1 List instance, which is not much considering how many allocations any query does overall.

The loop I proposed is much simpler to understand (which IMHO is quite important) and has a much simpler body which is likely to execute faster for the common case of non-empty IN (so we optimize for the right case).

(And in any case I don't plan to spend much time now on this PR as I don't want it to block release)

Maybe it's best to ship the SqlBuilder fix in 5.0.0 and postpone this change to later (5.1 or 6.0), then.
It's non-trivial and I wouldn't want to rush and introduce new bugs, esp. considering my other comment below.
(I'll look at the other comment more deeply and list differences between this and SqlBuilder later.)

@jods4
Copy link
Contributor

jods4 commented Feb 22, 2023

@Shane32 if we go this route, note that Oracle accepts an array-typed parameter for x IN :param, which would be a similar optimisation.

viceroypenguin
viceroypenguin previously approved these changes Feb 22, 2023
Copy link
Contributor

@viceroypenguin viceroypenguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Source/LinqToDB/SqlProvider/BasicSqlOptimizer.cs Outdated Show resolved Hide resolved
@linq2dbot
Copy link

Test baselines changed by this PR. Don't forget to merge/close baselines PR after this pr merged/closed.

igor-tkachev
igor-tkachev previously approved these changes Feb 22, 2023
sdanyliv
sdanyliv previously approved these changes Feb 23, 2023
@MaceWindu
Copy link
Contributor Author

/azp run test-all

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu MaceWindu modified the milestones: 5.0.0, 5.? Feb 23, 2023
@MaceWindu MaceWindu marked this pull request as draft February 23, 2023 11:39
List<ISqlExpression>? newValues = null;
var idx = 0;
var hasNull = false;
return ConvertInListNulls(p, items.Cast<object?>(), context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why introduce an extra .Cast<object?> ?
ConvertInListNulls only consumes this with foreach, so the non-generic IEnumerable is enough.

{
newValues = new List<ISqlExpression>();
var i = 0;
foreach (var item in items)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're enumerating items twice, and in a nested fashion.
This should be ok practically always (e.g. with arrays, lists...) but some users might pass enumerables that are costly and don't expect us to enumerate twice.

We might debate if that's just bad usage of linq2db, but given this is easy to avoid (see my suggested alternatives for this part) I would say we should do so and be less surprising for users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect SQL generated over 'Contains' in array having null element
7 participants