Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusing DB context issue with large collections #438

Open
ctrlaltdan opened this issue May 18, 2023 · 4 comments
Open

Reusing DB context issue with large collections #438

ctrlaltdan opened this issue May 18, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@ctrlaltdan
Copy link

Hi all,

I've been using the EfCoreSave operator with SaveMode.EntityFrameworkCore option enabled as I'm planning to write my data to Postgres.

I've noticed when dealing with large streams, this operator creates a bottleneck for two reasons:

  1. A single TPT is used to write the data
  2. The DbContext is never disposed of, therefore creating a large quantity of unused entities in the change tracker (albeit in with "detached" state).

I've changed a local copy to look as follows:

var ret = args.SourceStream.Observable
    .Chunk(args.BatchSize)
    .Map(i => i.Select(j => (Input: j, Entity: args.GetEntity(j))).ToList())
    .Do(i =>
    {
        var dbContextFactory = args.KeyedConnection == null
            ? this.ExecutionContext.DependencyResolver.Resolve<IDbContextFactory<TMyContext>>()
            : this.ExecutionContext.DependencyResolver.Resolve<IDbContextFactory<TMyContext>>(args.KeyedConnection);
        using (var dbContext = dbContextFactory.CreateDbContext())
        {
            ProcessBatch(i, dbContext, args.BulkLoadMode);
        }
})

MS Docs on DbContextFactory

This allows multiple streams to write to the database at the same time, and crucially, disposes the DbContext after each operation (considerably improving performance/memory overhead in the change tracker).

On my sample of 500k+ entries the import is now silky smooth.

Was there a deliberate design decision behind this?

@paillave
Copy link
Owner

Hello,
It is a very good suggestion indeed, even if the implementation you bring here has a flaw (in case of several operators working against EF core, InvokeInDedicatedThreadAsync ensures no call is done in parallel since it is not supported).
I will work on a amendment on this regard. Thanks a lot for this! 😃

@paillave paillave self-assigned this May 19, 2023
@paillave paillave added the enhancement New feature or request label May 19, 2023
@ctrlaltdan
Copy link
Author

Great to hear!

I eagerly removed the dbcontext instance from the thread lock as I assumed the node instance was threadsafe, and the dbcontext instance was local to the batch anyway.

My thinking that parallel queries are supported, just not within a single dbcontext. Similar to how the .net request pipeline doesn't enforce singleton access to a dbcontext.

Thanks for picking this up. I'm enjoying learning about your framework.

@paillave
Copy link
Owner

paillave commented May 19, 2023

Operators are meant to work in parallel, (that's the purpose). But EF core raises an exception when a query is already being run on the same DbContext in a concurrent thread (this is for this very purpose I had to develop the method InvokeInDedicatedThreadAsync).

Just as an extra information, in the current implementation, the fact the dbContext is not disposed it is absolutely normal. It is not up to an ETL operator to decide that it must be trashed as it has been created in a much higher level context.

Of course, if you issue a different DbContext at each use, everything that is mentioned above is irrelevant, but in most of situations a DbContext is directly given to the ProcessRunner. So what you suggest will be another way to work: if a IDbContextFactory it will be used instead.

About proper release of entities, I will make a quick win: execute a dbContext.ChangeTracker.Clear() at every batch unless the option KeepChangeTracker is activated. This will permit entities to be released from memory if they are not used anywhere else.

paillave pushed a commit that referenced this issue May 19, 2023
paillave added a commit that referenced this issue May 19, 2023
@paillave
Copy link
Owner

a new version 2.1.6-beta is available. I didn't test it too well to speak franckly. You can give it a try. I should work with IDbContextFactory out of the box. If you want to specify the type of the DbContext, you must call .WithContextType<your-dbContext-type>() on the options builder.
Let me know if it suits you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants