Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for table partitioning #1035

Open
mrjrt opened this issue Sep 22, 2019 · 69 comments · May be fixed by #1789
Open

Support for table partitioning #1035

mrjrt opened this issue Sep 22, 2019 · 69 comments · May be fixed by #1789
Labels
enhancement New feature or request
Milestone

Comments

@mrjrt
Copy link

mrjrt commented Sep 22, 2019

I'm a member of a project that is going to be using code-first EFC with Postgres and need to utilise table partitioning. It would be great if the data connector supported this natively so we don't have to resort to hacking SQL into the migrations manually or switching to schema-first.

I've taken a quick gander at your codebase and have some ideas, but I suspect quite strongly I'll not get anywhere so thought I'd ask politely here as well as with the improvements in PG12 (i.e. foreign key support) more people may decide to utilise partitioning.

Thanks!

@YohDeadfall YohDeadfall transferred this issue from npgsql/npgsql Sep 22, 2019
@roji roji added the enhancement New feature or request label Sep 23, 2019
@roji roji changed the title FR: Partitioned table support for code-first Support for table partitioning Sep 23, 2019
@roji roji modified the milestones: 5.0.0, 3.1.0 Sep 23, 2019
@roji
Copy link
Member

roji commented Sep 23, 2019

@roji
Copy link
Member

roji commented Sep 23, 2019

@mrjrt this is definitely a good feature to have in the provider. If you want to give this a stab I can provide some guidance and pointers.

@roji roji modified the milestones: 3.1.0, 5.0.0 Nov 4, 2019
@jonmill
Copy link

jonmill commented Dec 9, 2019

@roji - We're looking to use Partitions for efficient deletes using EF Core. Do you have any guidance on either

  1. How this could be done using EF Core today
  2. If it cannot be, how we could help contribute to help make it possible?

@roji
Copy link
Member

roji commented Dec 9, 2019

@jonmill using partitions today could be done by editing the generated migrations files for table creation, and replacing the appropriate lines with raw SQL that would create the partitions as desired (according to https://www.postgresql.org/docs/current/ddl-partitioning.html). I'm not familiar enough with PostgreSQL partitions to foresee particular problems or details, but in principle partitions should only ever affect table (and index) creation, and not querying or inserting. That means it's purely a migrations thing.

Contributing to this would mean designing model building extensions on EntityType, which would allow to model partitioning as supported by PostgreSQL. Users would be able to call C# extension methods when configuring their model, to define partitions for each entity etc. The Npgsql provider already exposes quite a few such PostgreSQL-specific extensions, although partitioning does seem to be slightly richer than the average case.

@jonmill
Copy link

jonmill commented Dec 11, 2019

@roji - Hmm manually editing the migrations file seems sketchy. I did some research and it doesn't look like you can alter a table to generate partitioning after it's created, so it looks like updating the EF extensions might be the way to go.

Any guidance on where to start looking in the source for ways to begin adding support for partitions?

@roji
Copy link
Member

roji commented Dec 11, 2019

Hmm manually editing the migrations file seems sketchy.

In general, there's no problem at all with editing migration files and adding raw SQL (or replacing existing directives) - the EF Core team explicitly encourages this where it is necessary. A good example is database triggers; EF Core doesn't provide any way to model triggers and produce migrations for them, and there wouldn't be much value in it. So if you need them you can always edit migrations and add them there.

Adding support for partitions isn't going to be trivial. But if you want to go into it, take a look at the existing entity builder extensions, which are PostgreSQL model configuration extensions. These add metadata to entities (in the form of annotations), which are later picked up by the migrations SQL generator when it generates the CREATE TABLE DDL, for example. As I wrote above, the description of how partitions are specified/configured isn't trivial, so we'd have to think about how to represent everything in metadata etc.

@jonmill
Copy link

jonmill commented Dec 11, 2019

In general, there's no problem at all with editing migration files and adding raw SQL (or replacing existing directives) - the EF Core team explicitly encourages this where it is necessary. A good example is database triggers; EF Core doesn't provide any way to model triggers and produce migrations for them, and there wouldn't be much value in it. So if you need them you can always edit migrations and add them there.

Interesting, I didn't know that...thanks for clarifying that. So if we were to go this route since the addition of partitions to the code base sounds like it would be non-trivial and time-consuming, especially for someone not intimately familiar with the code base, would this involve completely replacing the CREATE TABLE call for the partitioned table? Or is there a way we can update the existing CREATE TABLE call to add our partitioning scheme?

@jonmill
Copy link

jonmill commented Jan 17, 2020

@roji - I found a potentially better way that's simpler; I use table inheritance to create the master table using EF Core, then create inherited tables as necessary using raw sql with a Rule to redirect inserts into the inherited table. The one problem I've found is inserting into the master table; EF Core throws a concurrency error since rows weren't actually inserted into the master table.

I'm guessing this is due to how postgres rules change the actual query...for instance

CREATE TABLE Inherited (
    CHECK (ColA = 'Foo')
) INHERITS (Master);

CREATE RULE Inherited_Insert AS
ON INSERT TO Master WHERE
(ColA = 'Foo')
DO INSTEAD
INSERT INTO Inhertied VALUES (NEW.*);

Inserting data into Master causes a concurrency exception since EF tries to get values back for tracking but the insert rule necessary for inherited tables doesn't return one.

Do you know of any way around this outside of writing manual SQL to insert?

@roji
Copy link
Member

roji commented Jan 17, 2020

Hmm... If you examine the INSERTs sent by EF Core, you should see a RETURNING clause which instructs PostgreSQL to send back the generated columns in the new row. As long as you replicate the same clause in your INSTEAD rule, things should be transparent to EF Core and everything should work.

Hope I've understood your issue, note that there's an issue open on having EF Core support PostgreSQL table inheritance, although it's unlikely we'll get around to doing anything soon.

@jonmill
Copy link

jonmill commented Jan 17, 2020

I don't think a DO INSTEAD rule with a WHERE clause can have a RETURNING statement...I tried that and Postgres complained. Is there a way to ignore the RETURNING clause on insert?

@roji
Copy link
Member

roji commented Jan 20, 2020

@jonmill no, EF Core really does require those keys in order to work correctly (in at least some scenarios). For example, if you add a new entity and then update it, EF Core needs to have the (generated) key from the earlier INSERT statement to properly generate the later UPDATE statement.

Note this paragraph in the PG docs which seems to indicate that RETURNING is supported in INSERT rules, but apparently on unconditional ones (e.g. no WHERE clause)? I'm guessing this is the problem you're running into?

In a rule for INSERT, UPDATE, or DELETE on a view, you can add a RETURNING clause that emits the view's columns. This clause will be used to compute the outputs if the rule is triggered by an INSERT RETURNING, UPDATE RETURNING, or DELETE RETURNING command respectively. When the rule is triggered by a command without RETURNING, the rule's RETURNING clause will be ignored. The current implementation allows only unconditional INSTEAD rules to contain RETURNING; furthermore there can be at most one RETURNING clause among all the rules for the same event. (This ensures that there is only one candidate RETURNING clause to be used to compute the results.) RETURNING queries on the view will be rejected if there is no RETURNING clause in any available rule.

@jonmill
Copy link

jonmill commented Jan 21, 2020

@roji - gotcha; yes, that is the issue I'm running into. I got around it by just doing a manual INSERT statement instead of using EF. The items I'm inserting won't be tracked or (immediately) used after insertion, so the RETURNING clause is extra work that's unnecessary for me.

It'd be great if this was possible without writing a manual INSERT clause, but I couldn't find a way

@roji
Copy link
Member

roji commented Jan 23, 2020

The items I'm inserting won't be tracked or (immediately) used after insertion, so the RETURNING clause is extra work that's unnecessary for me.

Unfortunately EF Core doesn't know that you don't need tracking (and so don't need the RETURNING) - what you're asking for is dotnet/efcore#9118. Right now I don't think there's a solution for what you're trying to do.

@jonmill
Copy link

jonmill commented Jan 23, 2020

Gotcha; thanks for all your help!

@agabidullin
Copy link

Hi all!
Have found some workaround for table partitioning using ef core and npgsql.

  1. Create shadow property e.g. "Partitioned" for entity
  2. Create BEFORE INSERT TRIGGER function that create partition (if not exists) and inserts row in this partition.
    And returns NEW row (it's important for ef core, otherwise there will be exception)
  3. But this returned row will be inserted in master table too.
  4. In BEFORE INSERT TRIGGER I've made next steps:
  • BEFORE inserting in partition set Partitioned column to true

NEW."Partitioned" = true;

  • BEFORE returning row set set Partitioned column to false

NEW."Partitioned" = false;

  1. Create AFTER INSERT TRIGGER function that deletes from master table rows by condition "Partitioned = false"

Hope it will help!

@mathavanmani
Copy link

Hi @roji, we got all the tables migration happening using EF Core. For enabling the partition by range, I'm not sure how I can change the Migration File? ( to achieve something like below )

CREATE TABLE people (
id int not null,
birth_date date not null,
country_code character(2) not null,
name text
) PARTITION BY RANGE (birth_date);

reg child partition table, I can add by migrationBuilder.Sql(sql);. but not sure how I can create the master table with the partition definition using the EF Core Migration.

@roji
Copy link
Member

roji commented Aug 5, 2020

@mathavanmani EF Core (and the PostgreSQL provider) doesn't currently support table partitioning. However, it's perfectly fine to generate migrations, and then edit the generated files and introduce raw SQL. So you can use EF Core migrations to generate a plain vanilla CreateTable operation, then edit the migration and replace that with a raw SQL operation that contains your PARTITION BY clauses etc. See these docs for more info on raw SQL in migrations.

@roji roji modified the milestones: 5.0.0, 6.0.0 Aug 29, 2020
@freerider7777
Copy link

freerider7777 commented Sep 12, 2020

"returning xmin" does not work in partitioned tables in Postgres 12 (works ok in 11), so we can't use npgsql EF Core with partitioned tables...
There is a discussion but I don't know if they are going to fix it:
https://www.postgresql.org/message-id/16446-2011a4b103fc5fd1@postgresql.org

@roji roji added this to the 8.0.0 milestone Oct 15, 2022
@malix0
Copy link

malix0 commented Jun 6, 2023

Hi, I'm interested on this feature. What I didn't understand is if I can somehow alter the table creation sql statement and append the partitioning rule to it or am I forced to completely rewrite the statement by hand.
Another thing that I don't understand is this. I saw that more than 2 years ago this PR #1789 was proposed but the author received no feedback.

@roji
Copy link
Member

roji commented Jun 6, 2023

You need to replace the scaffolded migration code with SQL - this is usually pretty easy to do by generating the actual SQL, and copying-and-pasting the generated CREATE TABLE into the scaffolded migration (replacing the CreateTable call) and adding the bits you need.

Hi, I'm interested on this feature. What I didn't understand is if I can somehow alter the table creation sql statement and append the partitioning rule to it or am I forced to completely rewrite the statement by hand.
Another thing that I don't understand is this. I saw that more than 2 years ago this PR #1789 was proposed but the author received no feedback.

I unfortunately didn't have time to devote to this. It's not great, but sometimes reviewing a PR takes a lot of time and effort, and I simply haven't gotten around to it.

@mmisztal1980
Copy link

@roji I take it, this feature is still not available?

@roji
Copy link
Member

roji commented Nov 10, 2023

Note #2934 which is about recognizing partitions in scaffolding, possibly before actual support is implemented.

@roji roji modified the milestones: 8.0.0, 9.0.0 Nov 13, 2023
@freerider7777
Copy link

wow... this issue is still alive :))

@KrzysztofBranicki
Copy link

Hi @roji , I think that recommendation to just manually edit EF generated migrations is very far from ideal for us. Our use case is that we have a system with over 50 microservices all written in .NET and all sharing some common infrastructural libraries. In those libraries among many other things we provide message forwarding to the message bus using transactional outbox pattern. Therefore definition of transactional outbox table and it's mappings is defined in library (nuget). Till now whenever we were modifying any of those infrastructure owned tables (yes we have more of them than just the outbox) we were just doing the changes in the nuget and then stating in release notes that it requires generating EF migration. It wasn't a problem because migration was generated automatically and developer accepted whatever code was generated. Now if we want to introduce partitioning we are forcing developers in over 50 microservices to manually edit migrations that they don't even understand.

Long story short I think that recommendation to edit migrations manually is not ideal but acceptable if you are writing whole application but it is not acceptable for library authors (think about something like ASP Identity team adding partitioning support to user entity forcing every developer using this feature to manually edit migrations).

Is it realistic to expect work for supporting this feature will happen for 9.0 or we need to assume it will be moved from version to version (like it was so far) and therefore we need to figure out some other solution?

@dashiell-zhang
Copy link

dashiell-zhang commented Dec 13, 2023

@freerider7777 I finally understood what exactly you're proposing - thanks for your patience.

However, after also reading https://www.postgresql-archive.org/posgres-12-bug-partitioned-table-td6139957.html, I don't think the EF Core provider should be (implicitly) working around this PG problem. It should be fairly easy to write a database interceptor which rewrites xmin into if that's what you want to do.txid_current()

image

image

Hello, @roji I used an interceptor to rewrite the SQL, but the SaveChanges data was successfully saved to the table, but an exception in the npgsql driver was triggered, and an error was still reported at the code level.

image

It seems that npgsql defaults to a bigint after receiving txid_current() as xmin, and an exception occurs when converting to the rowversion uint type of the ef model.

@dashiell-zhang
Copy link

image
After adding oid conversion to the sql interceptor, it can be used correctly.

@freerider7777
Copy link

Yes, we did such interceptor as I remember :)) Now we're on spring boot projects - other technology other problems :))

@roji
Copy link
Member

roji commented Dec 16, 2023

@KrzysztofBranicki you make a good point about libraries vs. applications; however, it simply isn't realistic to expect EF to cover the entire range of DDL possibilities across all supported databases (after all, the point you make has nothing to do with PostgreSQL table partitioning specifically). There's a huge space of options there, and there are bound to be users out there who want each one of them; modeling them in EF is non-trivial work and requires careful planning, and generally the return on that effort isn't great, given that users can do whatever they want by editing their migrations.

Having said that, there definitely does seem to be lots of interest in table partitioning specifically. I can't commit on this getting done for 9.0 - but give the number of votes this is definitely one of the highest-value features, and I promise to do my best.

@pantonis

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@pantonis

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@pantonis

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@pantonis

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@pantonis

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@dashiell-zhang
Copy link

But I don't have an option here. postgresql partition by range requires the partition column to be included in the primary key or unique constraints

@pantonis The primary key can use the Snowflake ID algorithm to generate a long type ID. The Snowflake ID algorithm has a built-in concept of time.

@n0099
Copy link

n0099 commented May 18, 2024

#1035 (comment)

@freerider7777 I finally understood what exactly you're proposing - thanks for your patience.

However, after also reading postgresql-archive.org/posgres-12-bug-partitioned-table-td6139957.html, I don't think the EF Core provider should be (implicitly) working around this PG problem. It should be fairly easy to write a database interceptor which rewrites xmin into txid_current() if that's what you want to do.

https://dba.stackexchange.com/questions/123145/how-to-view-tuples-changed-in-a-postgresql-transaction/123183#123183
https://stackoverflow.com/questions/49214219/what-is-the-meaning-of-epoch-in-txid-current-in-postgresql
https://www.postgresql.org/message-id/flat/141051591267657%40mail.yandex.ru

/// <see>https://www.postgresql.org/message-id/flat/141051591267657%40mail.yandex.ru</see>
/// <see>https://dba.stackexchange.com/questions/123145/how-to-view-tuples-changed-in-a-postgresql-transaction/123183#123183</see>
/// <see>https://stackoverflow.com/questions/49214219/what-is-the-meaning-of-epoch-in-txid-current-in-postgresql</see>
/// <see>https://github.com/npgsql/efcore.pg/issues/1035#issuecomment-2118584744</see>
protected class UseCurrentXactIdAsConcurrencyTokenCommandInterceptor : DbCommandInterceptor
{
    public static UseCurrentXactIdAsConcurrencyTokenCommandInterceptor Instance => new();

    public override InterceptionResult<DbDataReader> ReaderExecuting(
        DbCommand command,
        CommandEventData eventData,
        InterceptionResult<DbDataReader> result)
    {
        ManipulateCommand(command);
        return result;
    }

    public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
        DbCommand command,
        CommandEventData eventData,
        InterceptionResult<DbDataReader> result,
        CancellationToken cancellationToken = default)
    {
        ManipulateCommand(command);
        return new(result);
    }

    private static void ManipulateCommand(DbCommand command) =>
        command.CommandText = command.CommandText.Replace(
            "RETURNING xmin",
            "RETURNING pg_current_xact_id()::xid");
}
protected override void OnConfiguring(DbContextOptionsBuilder options)
{
+    options.AddInterceptors(UseCurrentXactIdAsConcurrencyTokenCommandInterceptor.Instance)

Also it's recommend to disable SAVEPOINT: dotnet/efcore#23269 (comment) as subxid of subtransaction is always larger than the parent xid: https://www.postgresql.org/docs/current/subxacts.html

n0099 added a commit to n0099/open-tbm that referenced this issue May 18, 2024
… `cannot retrieve a system column in this context` with partitioned table: npgsql/efcore.pg#1035 (comment) @ TbmDbContext.cs

$ mv Db/Entit{ies,y}WithImageId.cs
@ c#/shared
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.