Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BulkSave on entities with virtual properties *and* a transaction: massive performance hit and OOM crashes #14394

Open
2 tasks done
hardcodet opened this issue Feb 29, 2024 · 6 comments
Milestone

Comments

@hardcodet
Copy link

hardcodet commented Feb 29, 2024

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Mongoose version

8.2.0

Node.js version

20.x

MongoDB server version

Atlas 7.x

Typescript version (if applicable)

5.2.2

Description

SOLVED - problem due to calls to toObject on complex hydrated documents, not the transactions.

We noticed massive performance issues in PROD on rather trivial bulk updates. I ran some local tests and narrowed it down to virtual properties in combination with transactions/sessions).

This is the bulkSave we perform to update our entities.

await this.model.bulkSave(batch, { session: transaction?.session });

Our entities have 4 virtual properties, e.g.:

CredentialSchema.virtual("parentAuthorization")
    .get(function() {
        return (this as any)._parentAuthorization;
    })
    .set(function(v) {
        (this as any)._parentAuthorization = v;
    });

For my tests, I ran the bulkSave

  1. with the virtual properties declared in the schema, and populated (the problem)
  2. with the virtual properties declared in the schema, but set the field values to undefined before running the update
  3. without virtual properties in the schema, and setting field values to undefined before running the update.

2 and 3 resulted in the same performance, so I'll just list measures for 1 and 2 going forward.

Test Results

Here's some measurements with a clear outlier:

  • Updating 100 entities with populated virtual properties and no transaction: < 150 ms
  • Updating 100 entities with populated virtual properties within a transaction: ~ 4 seconds
  • Updating 100 entities without populated virtual properties and no transaction: < 150 ms
  • Updating 100 entities without populated virtual properties within a transaction: < 150 ms

With bigger arrays, the performance degradation is absolutely fatal, resulting in timeouts etc.
A test with 10K entities crashed my process because the JS heap ran out of memory (V8::FatalProcessOutOfMemory+662))

This looks like a bug to me, as the virtual properties shouldn't even be touched. My guess is that mongoose iterates through those virtual properties in order to detect changes rather than skipping them, which seems to introduce a massive overhead. That wouldn't explain why it only occurs with a transaction/session in the mix.

Right now, this really is a show stopper for us. Is there a better short-time solution than setting those properties to undefined before every update, and then reassigning them?

Thanks!

Steps to Reproduce

  • Create a schema with some virtual properties and create 100 documents.
  • Modify your entites by changing a flag
  • Then create a session and run a bulkSave in order to persist the changes.

Note in case it matters: some the virtual properties in our case are related mongoose entities.

Expected Behavior

No performance degradation at all from virtual properties. Also, virtual properties and transactions should not affect each other.

@hardcodet hardcodet changed the title Bulk operations entities with virtual properties *and* a transaction: massive performance hit and OOM crashes Bulk operations on entities with virtual properties *and* a transaction: massive performance hit and OOM crashes Feb 29, 2024
@hardcodet hardcodet changed the title Bulk operations on entities with virtual properties *and* a transaction: massive performance hit and OOM crashes BulkSave on entities with virtual properties *and* a transaction: massive performance hit and OOM crashes Feb 29, 2024
@vkarpov15 vkarpov15 added this to the 8.2.2 milestone Mar 5, 2024
@vkarpov15 vkarpov15 added the needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue label Mar 5, 2024
@vkarpov15
Copy link
Collaborator

I'm unable to repro, the following script shows that bulkSave() only takes 18ms:

const mongoose = require('mongoose');

void async function main() {
  await mongoose.connect('mongodb://127.0.0.1:27017,127.0.0.1:27018/mongoose_test');

  const ChildSchema = new mongoose.Schema({ name: String, parentId: 'ObjectId' });
  const ChildModel = mongoose.model('Child', ChildSchema);
  await ChildModel.deleteMany({});

  const ParentSchema = new mongoose.Schema({
    name: String
  });
  ParentSchema.virtual('child1', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child2', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child3', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child4', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  const ParentModel = mongoose.model('Parent', ParentSchema);
  await ParentModel.deleteMany({});

  const parents = [];
  const children = [];
  for (let i = 0; i < 100; ++i) {
    const parent = await ParentModel.create({ name: 'test parent ' + i });
    const child = await ChildModel.create({ name: 'test child ' + i, parentId: parent._id });
    parents.push(parent);
    children.push(child);
  }

  await mongoose.connection.transaction(async session => {
    const docs = await ParentModel.find().session(session).populate(['child1', 'child2', 'child3', 'child4']);
    for (const doc of docs) {
      doc.name = 'test parent';
    }
    const start = Date.now();
    await ParentModel.bulkSave(docs, { session });
    console.log('BulkSave ms:', Date.now() - start);
  });

  console.log('Done');
}();

Can you try modifying the above script to demonstrate the issue you're seeing?

@vkarpov15 vkarpov15 removed this from the 8.2.2 milestone Mar 7, 2024
@vkarpov15 vkarpov15 added can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Mar 7, 2024
@hardcodet
Copy link
Author

Hi Vitali

TL;DR;

After realizing it was on us, heading over to Open Collective and making a donation for your lost time was the least I could do.
Thanks for your support!

Long version:

Thanks to your script, I could narrow down the issue, and ... it's us :(

We have a until function that creates a snapshot of our entities in case of a transaction using toObject(). This proves to be fatal in combination with the fact that we configured our schemas to serialize virtuals (which were hydrated, so traversing is slow).

Removing the toObject() call below immediately gave us a performance boost of factor 40 🚀. I knew that traversing hydrated documents is slow, but I was still surprised that invoking toObject() on 100 hydrated docs took nearly 4 seconds

image

@vkarpov15
Copy link
Collaborator

Thanks for looking into this, I'll check this out and see if I can repro. That does sound extremely slow.

@vkarpov15 vkarpov15 added needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue and removed can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. labels Mar 14, 2024
@vkarpov15 vkarpov15 modified the milestones: 8.2.2, 8.2.3, 8.2.4 Mar 14, 2024
@vkarpov15 vkarpov15 modified the milestones: 8.2.4, 8.2.5 Mar 28, 2024
@vkarpov15
Copy link
Collaborator

@hardcodet do your populated subdocs have other virtuals or other populated properties? Because I'm still unable to repro with the following, running in about 15ms on my laptop, no indication of anything that would cause the toObject() call to take minutes:

'use strict';

const mongoose = require('mongoose');

void async function main() {
  await mongoose.connect('mongodb://127.0.0.1:27017,127.0.0.1:27018/mongoose_test');

  const ChildSchema = new mongoose.Schema({ name: String, parentId: 'ObjectId' });
  const ChildModel = mongoose.model('Child', ChildSchema);
  await ChildModel.deleteMany({});

  const ParentSchema = new mongoose.Schema({
    name: String
  });
  ParentSchema.virtual('child1', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child2', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child3', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child4', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  const ParentModel = mongoose.model('Parent', ParentSchema);
  await ParentModel.deleteMany({});

  const parents = [];
  const children = [];
  for (let i = 0; i < 100; ++i) {
    const parent = await ParentModel.create({ name: 'test parent ' + i });
    const child = await ChildModel.create({ name: 'test child ' + i, parentId: parent._id });
    parents.push(parent);
    children.push(child);
  }


  await mongoose.connection.transaction(async session => {
    const docs = await ParentModel.find().session(session).populate(['child1', 'child2', 'child3', 'child4']);
    for (const doc of docs) {
      doc.name = 'test parent';
    }
    const start = Date.now();
    docs.forEach(doc => doc.toObject({ virtuals: true }));
    console.log('BulkSave and toObject ms:', Date.now() - start);
  });

  console.log('Done');
}();

I'd love to try to repro this slowdown if possible

@vkarpov15 vkarpov15 removed this from the 8.3.1 milestone Apr 8, 2024
@vkarpov15 vkarpov15 added can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Apr 8, 2024
@hardcodet
Copy link
Author

It's only one entity type that has virtuals, but those are a few:

  • a reference to a hydrated parent entity
  • a reference to a field of that parent entity
  • an array of child entities (potentially hundreds, also hydrated)
  • an array of temporary objects which in turn reference hydrated child entities

Also, we assign the virtuals in our code, so there's no refs or anything. Basically, after querying the entities, we build them up. So it looks something like this:

for (const foo of foos) {
    foo.parent = parentSetMap.get(foo.parentId);
    foo.parentBar = foo.parent.bar;

    foo.childs = [ ... ]
    foo.childAbstractions = foo.childs.map(c => { child: c, xxx: ... } )
}

We already noticed in other places (not related to toObject), that iterating through arrays of hydrated entities is much slower than iterating through regular arrays. We had to work around it in multiple places, e.g. by building maps and continue to work with those to prevent more than 1 iteration through an array, which gave us massive performance improvements. That might also contribute, since those virtuals are arrays in parts, and they also contain arrays themselves. If the code recurses into them, that might explain everything.

If it helps, I could send you the actual schemas (would have to be in a DM though)

@vkarpov15
Copy link
Collaborator

You're right, with many children this gets very slow. The following takes 2.5sec:

'use strict';

const mongoose = require('mongoose');

void async function main() {
  await mongoose.connect('mongodb://127.0.0.1:27017,127.0.0.1:27018/mongoose_test');

  const ChildSchema = new mongoose.Schema({ name: String, parentId: 'ObjectId' });
  const ChildModel = mongoose.model('Child', ChildSchema);
  await ChildModel.deleteMany({});

  const ParentSchema = new mongoose.Schema({
    name: String
  });
  ParentSchema.virtual('child1', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child2', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child3', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  ParentSchema.virtual('child4', { ref: 'Child', localField: '_id', foreignField: 'parentId' });
  const ParentModel = mongoose.model('Parent', ParentSchema);
  await ParentModel.deleteMany({});

  const parents = [];
  for (let i = 0; i < 200; ++i) {
    const parent = await ParentModel.create({ name: 'test parent ' + i });
    const children = [];
    console.log(`${i} / 200`);
    for (let j = 0; j < 200; ++j) {
      children.push({ name: 'test child ' + i + '_' + j, parentId: parent._id });
    }
    await ChildModel.create(children);
    parents.push(parent);
  }

  await mongoose.connection.transaction(async session => {
    const docs = await ParentModel.find().session(session).populate(['child1', 'child2', 'child3', 'child4']);
    for (const doc of docs) {
      doc.name = 'test parent';
    }
    const start = Date.now();
    docs.forEach(doc => doc.toObject({ virtuals: true }));
    console.log('BulkSave and toObject ms:', Date.now() - start);
  });

  console.log('Done');
}();

I'll take a look and see if we can fix this slowdown

@vkarpov15 vkarpov15 added performance and removed can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. labels Apr 15, 2024
@vkarpov15 vkarpov15 added this to the 8.3.3 milestone Apr 15, 2024
vkarpov15 added a commit that referenced this issue Apr 29, 2024
perf(document): add fast path for applying non-nested virtuals to JSON
@vkarpov15 vkarpov15 modified the milestones: 8.3.3, 8.3.4 Apr 29, 2024
vkarpov15 added a commit that referenced this issue May 6, 2024
@vkarpov15 vkarpov15 modified the milestones: 8.3.4, 8.3.5 May 6, 2024
vkarpov15 added a commit that referenced this issue May 6, 2024
perf(document): avoid cloning options using spread operator for perf reasons
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants