
BulkInsertOrUpdateAsync performance questions #1485

Open
bkulcsar opened this issue May 10, 2024 · 3 comments

@bkulcsar

Hi Guys!

I need some help regarding the BulkInsertOrUpdateAsync operation. I could not find any performance benchmarks for BulkInsertOrUpdateAsync, but I think it is not as fast as I would expect from a SQL MERGE operation. I am using Azure SQL Server with 200 DTU, and for 250k records BulkInsertOrUpdateAsync takes around 6 minutes on the first run (when basically all records are new, so it only inserts); when we load the same records again with some changes, it takes 2 minutes. Of course, increasing the database's DTU would help performance, but first I want to optimize the BulkInsertOrUpdateAsync call as much as possible.

So my question is: is 6 minutes normal to load 250k records with BulkInsertOrUpdateAsync on a 200 DTU Azure SQL database? Can I improve its performance with configuration? I cannot find anything in BulkConfig that might help.

Code for the merge operation:

public virtual async Task BulkMergeAsync(
    IEnumerable<T> entities,
    BulkConfig? bulkConfig = null,
    CancellationToken cancellationToken = default)
{
    try
    {
        bulkConfig ??= new BulkConfig();

        bulkConfig.CalculateStats = true;
        bulkConfig.BulkCopyTimeout = 0; // no timeout for large batches

        await _dbContext.BulkInsertOrUpdateAsync(entities, bulkConfig, cancellationToken: cancellationToken);

        if (bulkConfig.StatsInfo != null)
        {
            _logger.LogInformation(
                "Records inserted: {Inserted}, records updated: {Updated}",
                bulkConfig.StatsInfo.StatsNumberInserted,
                bulkConfig.StatsInfo.StatsNumberUpdated);
        }
    }
    catch (Exception ex)
    {
        // Pass the exception to the logger so the stack trace is preserved.
        _logger.LogError(ex, "Error during BulkInsertOrUpdate");
        throw;
    }
}
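For anyone tuning this, here is a sketch of BulkConfig options that sometimes help on SQL Server. The property names below are taken from EFCore.BulkExtensions' BulkConfig as I understand it; verify them against the installed version, since the library's API has changed between releases, and measure the effect on your own workload before adopting any of them.

```csharp
using EFCore.BulkExtensions;

// Sketch only: each option's effect is workload-dependent.
var tunedConfig = new BulkConfig
{
    BatchSize = 10_000,        // fewer round trips per bulk-copy batch
    EnableStreaming = true,    // stream rows instead of buffering them in memory
    UseTempDB = true,          // stage rows in tempdb rather than the user database
    CalculateStats = false,    // skip the extra OUTPUT bookkeeping when counts aren't needed
    BulkCopyTimeout = 0
};

await _dbContext.BulkInsertOrUpdateAsync(entities, tunedConfig);
```

In particular, if I read the library correctly, CalculateStats = true adds an OUTPUT clause to the generated MERGE, so disabling it when the insert/update counts are not required may shave time off the merge step.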
@bkulcsar
Author

I was able to find out that it is not the MERGE operation that is slow, but the INSERT into the temporarily created table.
Is there a way to omit the COLLATE in the insert statement?

@borisdj
Owner

borisdj commented May 14, 2024

Both Insert and Update should be much faster; as you can check from the ReadMe stats, even 1 million records take under a minute.
But that was measured on a local SQL Server; not sure about Azure, but it still should not take over 1 minute.
Where did you find the COLLATE statement in the code?

@bkulcsar
Author

I've changed the pricing model from DTU-based to vCore-based and it became much faster, but it is still not fast enough.
I've tested the operations with 2 million records:

80 vCores - 55 seconds - insert
80 vCores - 190 seconds - merge
16 vCores - 58 seconds - insert
16 vCores - 206 seconds - merge
8 vCores - 100 seconds - insert
8 vCores - 300 seconds - merge

I've profiled the SQL statements generated by the library, and I saw there is a COLLATE clause for every column:

insert bulk <table> (<column1> NVarChar(40) COLLATE SQL_Latin1_General_CP1_CI_AS, <column2> NVarChar(100) COLLATE SQL_Latin1_General_CP1_CI_AS ....)

This may be causing the performance issues.
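As far as I can tell, that COLLATE comes from the TDS metadata that SqlBulkCopy itself emits in its `insert bulk` statement (the library stages rows via SqlBulkCopy), so it may not be removable through configuration. One way to check whether the staging insert is really the slow part is to time a plain SqlBulkCopy into the same table and compare; a rough sketch (table and connection details are placeholders):

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

// Baseline: raw SqlBulkCopy into the target table. If this is equally slow,
// the per-column COLLATE metadata is unlikely to be the cause.
static async Task<TimeSpan> TimeBulkCopyAsync(string connectionString, DataTable rows)
{
    var sw = Stopwatch.StartNew();

    await using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();

    using var bulkCopy = new SqlBulkCopy(connection)
    {
        DestinationTableName = rows.TableName, // placeholder: set to the real table
        BatchSize = 10_000,
        BulkCopyTimeout = 0,
        EnableStreaming = true
    };
    await bulkCopy.WriteToServerAsync(rows);

    return sw.Elapsed;
}
```

If the raw copy matches your insert timings, the cost is in the bulk-copy path itself (network, log I/O, indexes on the target), not in the COLLATE clause.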
