Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concatenate binary data using || #2200

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Concatenate binary data using || #2200

wants to merge 1 commit into from

Conversation

virzak
Copy link

@virzak virzak commented Jan 3, 2022

No description provided.

@roji
Copy link
Member

roji commented Jan 3, 2022

@virzak what's the C# LINQ code which this would enable? There's no addition operator defined over BitArray in C#.

@virzak
Copy link
Author

virzak commented Jan 3, 2022

Hi @roji,

Ultimately, the goal is to have an easy way to implement the last non null puzzle. lastval is the one that needs to be generated.

id     col1     lastval
----   ------   -----------
2      NULL     NULL
3      10       10
5      -1       -1
7      NULL     -1
11     NULL     -1
13     -12      -12
17     NULL     -12
19     NULL     -12
23     1759     1759

This SQL is the most efficient for SQL server.

SELECT id, col1,
  CAST(
    SUBSTRING(
  MAX( CAST(id AS BINARY(4)) + CAST(col1 AS BINARY(4)) )
    OVER( ORDER BY id
      ROWS UNBOUNDED PRECEDING ),
  5, 4)
    AS INT) AS lastval
FROM dbo.T1;

The walkthrough is here (unfortunately behind a registration gate) and also mentioned here

So I'm writing an extension that handles:

  • Window functions
  • Binary manipulations

I just got it to work with SQL Server, but not with SQLite or PostgreSQL (never used it before).

The LINQ looks as follows:

var query = dbContext.LastNonNulls
.Select(r => new
{
    LastNonNull =
    EF.Functions.ToInt32(
    EF.Functions.Substring(
        EF.Functions.MaxOver(
            EF.Functions.Concatenate(
                EF.Functions.GetBytes(r.Id), EF.Functions.GetBytes(r.Col1)
            ), EF.Functions.CreateOrdering(r.Id))
        , 5, 4
    )
    )
});

The innermost sql for posgres shoud (I think) look as follows:

SELECT l."Id"::bit(32) || l."Col1"::bit(32)
FROM "LastNonNulls" AS l

One way to generate that would be to override FindBaseMapping and invoke this code:

if (clrType == typeof(BitArray))
{
mapping = mappingInfo.IsFixedLength ?? false ? _bit : _varbit;
return mapping.Clone($"{mapping.StoreType}({mappingInfo.Size})", mappingInfo.Size);
}

That's how the concatenation operands end up with BitArray type.

The current workaround is this:

public class BinaryNpgsqlQuerySqlGenerator : NpgsqlQuerySqlGenerator
{
    public BinaryNpgsqlQuerySqlGenerator(QuerySqlGeneratorDependencies dependencies, bool reverseNullOrderingEnabled, Version postgresVersion) : base(dependencies, reverseNullOrderingEnabled, postgresVersion)
    {
    }

    protected override string GetOperator(SqlBinaryExpression e)
        => e.OperatorType switch
        {
            ExpressionType.Add when
                e.Type == typeof(BitArray) || e.Left.TypeMapping?.ClrType == typeof(BitArray) || e.Right.TypeMapping?.ClrType == typeof(BitArray)
                => " || ",
            _ => base.GetOperator(e)
        };
}

I'll be publishing the entire source code within the next few days.

@roji
Copy link
Member

roji commented Jan 3, 2022

@virzak OK, thanks for the context. But if your intention is to actually propose these extensions/translations go into the provider, I'd recommend dealing with each addition separately and discussing the design before coding. For example, an API to allow users to express window functions is tracked at the EF Core level by dotnet/efcore#12747 - this likely wouldn't be a PostgreSQL-specific thing (and the API shape requires a lot of thought). Something like EF.Functions.Substring given that .NET's string.Substring is already translated (similar goes for EF.Functions.Int32).

So assuming you have a particular SQL you'd like to generate with EF Core, I'd breaking it down into the various constructs not yet supported, examine each one separately and start a conversation about it. Of course, if you're doing this for fun then go ahead - but actually making this go into the provider requires a bit more thought and planning, etc.

@virzak
Copy link
Author

virzak commented Jan 3, 2022

@roji, the library I'm building is to eliminate raw SQL from a client project.

The database contains features of the road (section boundary, signs, turns, etc...) of a road. Last non null puzzle is used to determine which road section a feature belongs to - every feature is assigned a section id which just preceded it by distance.

There a few other scenarios where last non null puzzle is used and a lack of window functions forced us to generate sql by hand, which resulted in a bunch of bugs.

dotnet/efcore#12747, but it is in the backlog, so it isn't planned for EF Core 7, correct?

EF.Functions.Substring takes byte[] as the argument, so string.Substring didn't work.

It currently makes more sense for the client project to use LINQ queries, even if the supporting library is limited or doesn't have the best design. Also with one user project initially, improvements with breaking changes aren't a problem.

@roji
Copy link
Member

roji commented Jan 12, 2022

dotnet/efcore#12747, but it is in the backlog, so it isn't planned for EF Core 7, correct?

That's right - relatively few users have shown interest in an doing window function explicitly, and a good API design here requires quite a bit of work. I do hope we'l get to it though.

Just to make sure you're aware, you can always call any arbitrary SQL function yourself, without specific support from an EF Core provider, by using user-defined function mapping. We don't generally aim to provide ready-to-use translations for each and every possible function that exists out there - that wouldn't be feasible.

EF.Functions.Substring takes byte[] as the argument, so string.Substring didn't work.

Make sense - EF.Functions.Concat may indeed be a good idea. If may even be a good idea to make it receive objects rather than byte[], so that it's usable with any type where PG supports the concatenation operator ||. However, rather than introducing an EF.Functions.ToInt32, .NET already has Convert.ToInt32 which IIRC should already work. We generally try to avoid introducing stuff on EF.Functions where a built-in .NET alternative can be translated instead.

@virzak
Copy link
Author

virzak commented Jan 24, 2022

@roji,

I just published the extension library. The lines relative to this PR are here.

I ended up replacing ToInt32 with ToValue<T> as well as introducing an OverClause which can chain Partition and OrderBy clauses.

Right now the only window functions are min/max, but will add many more.

If you have any further thoughts on improvements, it will certainly be appreciated.

@roji
Copy link
Member

roji commented Jan 24, 2022

@virzak as I wrote above, I don't think it makes sense to merge the PR as-is, adding support only in NpgsqlQuerySqlGenerator without any way to actually express the concatenation (without your extension). But if you'd like to propose the full support for binary concatenation as part of this PR - by introducing the appropriate user-facing APIs and their translation - I'd be happy to merge that into the provider.

Note that there's Enumerable.Concat which can be used at least for byte[] - although that returns an IEnumerable, which wouldn't allow further composition of bytea-specific logic (so something on EF.Functions could make sense). Note also that you're concentrating on the PostgreSQL bytea type, but we also support PostgreSQL arrays, which also support concatenation, in case you're interestedi n that.

(finally, binary concatenation doesn't really have anything specific to do with window functions, so it's a bit odd for an extension dealing with window functions to deal with them)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants