NodeJS generator seems to generate clients with memory leak #823

Closed
olavloite opened this issue Feb 11, 2021 · 23 comments
Assignees
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@olavloite

The NodeJS gapic clients seem to have a memory leak that can be triggered using the following steps:

  1. Open a gapic client.
  2. Execute one simple request.
  3. Close the gapic client.
  4. Repeat.

This issue was first reported for the Spanner hand-written client, but seems to be more of a generic problem with the generated gapic clients.

The following simple application shows the behavior for both PubSub and Spanner. The memory consumption of the application increases by approximately 100KB-200KB with each iteration.

index.js

import {InstanceAdminClient} from "@google-cloud/spanner/build/src/v1/instance_admin_client.js";
import {PublisherClient} from "@google-cloud/pubsub/build/src/v1/publisher_client.js"

await main();

async function main() {
    const REPETITIONS = 50;
    console.log(`Running ${REPETITIONS} times`);
    for (let i = 0; i < REPETITIONS; i++) {
        await usePubSubGapic();
        global.gc();
        const used = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(
            `${i} Current mem usage ${Math.round(used * 100) / 100} MB`
        );
    }
    for (let i = 0; i < REPETITIONS; i++) {
        await useSpannerGapic();
        global.gc();
        const used = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(
            `${i} Current mem usage ${Math.round(used * 100) / 100} MB`
        );
    }
}

async function useSpannerGapic() {
    const client = new InstanceAdminClient({
        projectId: 'my-project',
        keyFile: '/path/to/key.json',
    });
    await client.listInstanceConfigs({
        parent: 'projects/my-project',
    });
    await client.close();
}

async function usePubSubGapic() {
    const client = new PublisherClient({
        projectId: 'my-project',
        keyFile: '/path/to/key.json',
    });
    await client.listTopics({
        project: 'projects/my-project',
    });
    await client.close();
}

package.json

{
  "type": "module",
  "name": "test-gapic-client",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "@google-cloud/pubsub": "^2.8.0",
    "@google-cloud/spanner": "^5.4.0"
  }
}

Output for 50 iterations. Running this script with more iterations and/or with tighter memory settings will cause an out-of-memory error.

Running 50 times
0 Current mem usage 11.7 MB
1 Current mem usage 11.89 MB
2 Current mem usage 12.02 MB
3 Current mem usage 12.21 MB
...
41 Current mem usage 15.84 MB
42 Current mem usage 15.94 MB
43 Current mem usage 16.01 MB
44 Current mem usage 16.09 MB
45 Current mem usage 16.2 MB
46 Current mem usage 16.27 MB
47 Current mem usage 16.37 MB
48 Current mem usage 16.46 MB
49 Current mem usage 16.57 MB

0 Current mem usage 17.01 MB
1 Current mem usage 17.19 MB
2 Current mem usage 17.39 MB
3 Current mem usage 17.53 MB
...
40 Current mem usage 22.58 MB
41 Current mem usage 22.68 MB
42 Current mem usage 22.8 MB
43 Current mem usage 22.93 MB
44 Current mem usage 23.06 MB
45 Current mem usage 23.19 MB
46 Current mem usage 23.34 MB
47 Current mem usage 23.48 MB
48 Current mem usage 23.57 MB
49 Current mem usage 23.85 MB
@olavloite olavloite changed the title NodeJS generator seem to generate clients with memory leak NodeJS generator seems to generate clients with memory leak Feb 11, 2021
@miraleung
Contributor

miraleung commented Feb 18, 2021

Is this issue for the monolith or gapic-generator-typescript?

@alexander-fenster
Contributor

By looking at the paths (build/src/...) I understand it's for gapic-generator-typescript. I'll move the issue over there.

@alexander-fenster alexander-fenster transferred this issue from googleapis/gapic-generator Feb 18, 2021
@alexander-fenster
Contributor

Thank you for the report, I'll investigate. The leak may be at any of several layers: the GAPIC client itself, google-gax, protobuf.js, or gRPC. With your test, I believe I will be able to figure out at which layer the memory is not reclaimed properly, and hopefully fix it.

Having said that, I need to point out that client creation is a heavy operation that involves synchronously reading a big file from the filesystem, and we never actually tested that creating and closing a client has zero memory footprint, simply because this is not something that should happen often in a process. What is the use case that requires creating more and more clients and closing them?
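
For context, the pattern implied above is to create a client once and reuse it for many requests, closing it only on shutdown. A minimal sketch of that pattern, reusing the placeholder project and key file from the repro:

// Sketch: reuse a single client for many requests instead of creating and
// closing one per request. Project and key file values are placeholders.
import {InstanceAdminClient} from '@google-cloud/spanner/build/src/v1/instance_admin_client.js';

const client = new InstanceAdminClient({
    projectId: 'my-project',
    keyFile: '/path/to/key.json',
});

async function listConfigs() {
    // The same client instance is reused for every call.
    return client.listInstanceConfigs({parent: 'projects/my-project'});
}

await listConfigs();
await listConfigs();
// Only close the client when the process is shutting down.
await client.close();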

@olavloite
Author

Having said that, I need to point out that client creation is a heavy operation that involves synchronously reading a big file from the filesystem, and we never actually tested that creating and closing a client has zero memory footprint, simply because this is not something that should happen often in a process. What is the use case that requires creating more and more clients and closing them?

The issue was originally reported for the Spanner hand-written client. The use case, as I understand it, is that a single application opens several different connections to Spanner using specific credentials for different users / clients / customers.
@tomdee Could you confirm my above assumption, or otherwise elaborate a little bit on the use case?

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 18, 2021
@tomdee

tomdee commented Feb 18, 2021

@alexander-fenster Thanks for taking a look at this.

Just to elaborate on the use case some more: I work for a security/compliance company, and we scan the GCP accounts of our customers for compliance violations. Hence the need to create thousands of these clients with different credentials.

Anecdotally (and I don't mean for this to sound like a dig), we run a similar scanning process for many other cloud services (including Azure and AWS), and this is the only Node client library where we're seeing these leaks.
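
The scanning pattern described above amounts to one short-lived client per customer credential, which is essentially the repro above in a loop. A sketch with hypothetical customer data:

// Sketch of the scanning use case: one short-lived client per customer, each
// with its own credentials. The customer list and key files are hypothetical.
import {InstanceAdminClient} from '@google-cloud/spanner/build/src/v1/instance_admin_client.js';

const customers = [
    {projectId: 'customer-project-1', keyFile: '/path/to/customer-1-key.json'},
    {projectId: 'customer-project-2', keyFile: '/path/to/customer-2-key.json'},
];

async function scanCustomer(customer) {
    const client = new InstanceAdminClient({
        projectId: customer.projectId,
        keyFile: customer.keyFile,
    });
    try {
        await client.listInstanceConfigs({parent: `projects/${customer.projectId}`});
    } finally {
        // If close() does not release every resource, each cycle leaks a little.
        await client.close();
    }
}

// Thousands of customers means thousands of create/close cycles.
for (const customer of customers) {
    await scanCustomer(customer);
}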

@alexander-fenster alexander-fenster added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 18, 2021
@alexander-fenster
Contributor

Thanks @tomdee for your explanation! It totally makes sense (even though, as you could guess, this is not the common use case). I'll update the bug when I figure out where exactly the problem happens.

@Utsav2

Utsav2 commented Mar 2, 2021

BTW, it seems like the long-running operations client never gets closed.

https://github.com/googleapis/gapic-generator-typescript/blob/master/templates/typescript_gapic/src/%24version/%24service_client.ts.njk#L75

That seems to contribute to a large part of the problem, but there are still some other leaks.
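
In other words, the proposal is for the generated close() to also shut down the long-running operations client that the constructor creates. A rough sketch of that shape, with hypothetical names rather than the actual template output:

// Rough sketch (hypothetical names, not the actual generated code): close()
// should also release the long-running operations client created for LRO support.
class SampleServiceClient {
    constructor(gaxModule, gaxGrpc, auth, opts) {
        // Created in the constructor, but never closed before the fix:
        this.operationsClient = gaxModule
            .lro({auth, grpc: gaxGrpc.grpc})
            .operationsClient(opts);
        this.serviceStub = gaxGrpc.createStub(/* service definition, opts */);
    }

    async close() {
        const stub = await this.serviceStub;
        stub.close();
        // The missing piece: also release the operations client and its channel.
        await this.operationsClient.close();
    }
}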

@AlanGasperini AlanGasperini added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Mar 4, 2021
@AlanGasperini

This bug is blocking a p1, so it's a p1 as well.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Mar 5, 2021
@Utsav2

Utsav2 commented Apr 30, 2021

@alexander-fenster do you have any updates on this issue? We'd really like to mitigate some memory leaks in production and this would be hugely beneficial to us.

@alexander-fenster
Contributor

@Utsav2 One thing I did that could improve the situation was introducing proto caching in google-gax, which has been available since google-gax v2.11.0, so if your npm ls google-gax shows something older than that, please upgrade.

And thanks for pinging me here, we'll update this issue when we have more information.

@Utsav2

Utsav2 commented May 3, 2021

Thanks @alexander-fenster! I can share what context I have on this leak to help with your investigation.

The operations client here doesn't have a close method, which I think we could add: https://github.com/googleapis/gax-nodejs/blob/889730c1548a6dcc0b082a24c59a9278dd2296f6/src/operationsClient.ts#L63

Then we need to bump the dependency pin in this repository and close the LRO client when this client is closed. https://github.com/googleapis/gapic-generator-typescript/blob/master/templates/typescript_gapic/src/%24version/%24service_client.ts.njk#L75
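
A rough sketch of what such a close() method on the gax operations client could look like, assuming it holds a promise to its gRPC stub the way the generated service clients do (illustrative only, not the actual google-gax code):

// Illustrative sketch only, not the actual google-gax implementation.
class OperationsClient {
    // (the existing constructor creates this.operationsStub, a Promise of a gRPC stub)

    /**
     * Terminates the gRPC channel so the client's resources can be reclaimed.
     */
    close() {
        return this.operationsStub.then(stub => {
            stub.close();
        });
    }
}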

@olavloite
Author

@alexander-fenster Thanks for the update.

I tried my reproduction case (only the Spanner client, not the PubSub client) with google-gax v2.12.0, and it does make a difference, but it does not solve the entire problem.

Memory Usage with v2.10.3:

Running 50 times
0 Current mem usage 11.32 MB
1 Current mem usage 11.58 MB
2 Current mem usage 11.71 MB
3 Current mem usage 11.78 MB
4 Current mem usage 11.83 MB
5 Current mem usage 11.89 MB
6 Current mem usage 12.05 MB
7 Current mem usage 12.05 MB
8 Current mem usage 12.11 MB
9 Current mem usage 12.28 MB
10 Current mem usage 12.13 MB
11 Current mem usage 12.19 MB
12 Current mem usage 12.18 MB
13 Current mem usage 12.2 MB
14 Current mem usage 12.24 MB
15 Current mem usage 12.26 MB
16 Current mem usage 12.32 MB
17 Current mem usage 12.33 MB
18 Current mem usage 12.34 MB
19 Current mem usage 12.5 MB
20 Current mem usage 12.39 MB
21 Current mem usage 12.46 MB
22 Current mem usage 12.44 MB
23 Current mem usage 12.46 MB
24 Current mem usage 12.48 MB
25 Current mem usage 12.51 MB
26 Current mem usage 12.77 MB
27 Current mem usage 12.57 MB
28 Current mem usage 12.57 MB
29 Current mem usage 12.61 MB
30 Current mem usage 12.62 MB
31 Current mem usage 12.64 MB
32 Current mem usage 12.67 MB
33 Current mem usage 12.77 MB
34 Current mem usage 12.74 MB
35 Current mem usage 12.78 MB
36 Current mem usage 12.98 MB
37 Current mem usage 12.8 MB
38 Current mem usage 12.86 MB
39 Current mem usage 12.87 MB
40 Current mem usage 12.89 MB
41 Current mem usage 12.91 MB
42 Current mem usage 12.93 MB
43 Current mem usage 12.99 MB
44 Current mem usage 13 MB
45 Current mem usage 13.01 MB
46 Current mem usage 13.06 MB
47 Current mem usage 13.04 MB
48 Current mem usage 13.11 MB
49 Current mem usage 13.08 MB

Memory Usage with v2.12.0

Running 50 times
0 Current mem usage 11.29 MB
1 Current mem usage 11.33 MB
2 Current mem usage 11.42 MB
3 Current mem usage 11.45 MB
4 Current mem usage 11.56 MB
5 Current mem usage 11.6 MB
6 Current mem usage 11.67 MB
7 Current mem usage 11.69 MB
8 Current mem usage 11.73 MB
9 Current mem usage 11.74 MB
10 Current mem usage 11.77 MB
11 Current mem usage 11.79 MB
12 Current mem usage 11.83 MB
13 Current mem usage 11.85 MB
14 Current mem usage 11.88 MB
15 Current mem usage 11.9 MB
16 Current mem usage 11.91 MB
17 Current mem usage 11.93 MB
18 Current mem usage 11.95 MB
19 Current mem usage 11.96 MB
20 Current mem usage 11.99 MB
21 Current mem usage 12 MB
22 Current mem usage 12.02 MB
23 Current mem usage 12.05 MB
24 Current mem usage 12.09 MB
25 Current mem usage 12.11 MB
26 Current mem usage 12.12 MB
27 Current mem usage 12.15 MB
28 Current mem usage 12.17 MB
29 Current mem usage 12.2 MB
30 Current mem usage 12.21 MB
31 Current mem usage 12.23 MB
32 Current mem usage 12.27 MB
33 Current mem usage 12.28 MB
34 Current mem usage 12.3 MB
35 Current mem usage 12.34 MB
36 Current mem usage 12.35 MB
37 Current mem usage 12.37 MB
38 Current mem usage 12.39 MB
39 Current mem usage 12.41 MB
40 Current mem usage 12.44 MB
41 Current mem usage 12.46 MB
42 Current mem usage 12.49 MB
43 Current mem usage 12.51 MB
44 Current mem usage 12.76 MB
45 Current mem usage 12.55 MB
46 Current mem usage 12.57 MB
47 Current mem usage 12.59 MB
48 Current mem usage 12.59 MB
49 Current mem usage 12.62 MB

Memory Usage with Closing Operations Client

I also created a local build of google-gax and added a close method to the operationsClient, as suggested by @Utsav2. That makes an additional difference, although it still does not solve the entire problem. The memory usage with a custom-built v2.12.0 that supports closing the operations client is as follows:

Running 50 times
0 Current mem usage 11.43 MB
1 Current mem usage 11.31 MB
2 Current mem usage 11.39 MB
3 Current mem usage 11.42 MB
4 Current mem usage 11.53 MB
5 Current mem usage 11.56 MB
6 Current mem usage 11.6 MB
7 Current mem usage 11.62 MB
8 Current mem usage 11.64 MB
9 Current mem usage 11.67 MB
10 Current mem usage 11.68 MB
11 Current mem usage 11.7 MB
12 Current mem usage 11.72 MB
13 Current mem usage 11.74 MB
14 Current mem usage 11.74 MB
15 Current mem usage 11.76 MB
16 Current mem usage 11.77 MB
17 Current mem usage 11.79 MB
18 Current mem usage 11.8 MB
19 Current mem usage 11.81 MB
20 Current mem usage 11.82 MB
21 Current mem usage 11.84 MB
22 Current mem usage 11.85 MB
23 Current mem usage 11.87 MB
24 Current mem usage 11.9 MB
25 Current mem usage 11.91 MB
26 Current mem usage 11.92 MB
27 Current mem usage 11.93 MB
28 Current mem usage 11.94 MB
29 Current mem usage 11.97 MB
30 Current mem usage 11.97 MB
31 Current mem usage 11.99 MB
32 Current mem usage 12 MB
33 Current mem usage 12.16 MB
34 Current mem usage 12.04 MB
35 Current mem usage 12.07 MB
36 Current mem usage 12.08 MB
37 Current mem usage 12.08 MB
38 Current mem usage 12.09 MB
39 Current mem usage 12.1 MB
40 Current mem usage 12.12 MB
41 Current mem usage 12.13 MB
42 Current mem usage 12.15 MB
43 Current mem usage 12.17 MB
44 Current mem usage 12.18 MB
45 Current mem usage 12.2 MB
46 Current mem usage 12.2 MB
47 Current mem usage 12.21 MB
48 Current mem usage 12.23 MB
49 Current mem usage 12.24 MB

So each of the above steps seems to help a little, but there also seems to be at least one more issue hidden somewhere.

@olavloite
Author

@alexander-fenster Is there any update on this?

@summer-ji-eng
Collaborator

@olavloite This issue is triaged, and I will investigate it in the late-July sprint. I will post further updates here. Thanks for your patience.

@hkdevandla hkdevandla removed the 🚨 This issue needs some love. label Aug 2, 2021
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Aug 3, 2021
@alicejli
Contributor

alicejli commented Aug 16, 2021

Hello! Quick update here: I've made a fix to the generator that will close the mixin clients (i.e. IAMPolicy, LongrunningOperations, and Locations) when calling the close() function on the main service client.
Basically, this fix, googleapis/gax-nodejs#1047, added the ability to close the LongrunningOperations client, and this fix, #953, will actually close the client (the mixin clients are generated underneath the service clients to enable additional functionality).

Once nodejs-spanner is regenerated with the latest version of the generator, we should see a significant reduction in memory leakage when creating multiple clients. It does seem like the listInstanceConfigs method is causing most of the memory leak after this fix (as opposed to the creation of the client itself). Using the async version of the method, listInstanceConfigsAsync, reduces memory leakage by ()%, so in the short term I'd recommend using that if possible.
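
For reference, the short-term mitigation would look roughly like this, assuming the generated async-iterator variant of the paginated method (project ID and key file are placeholders):

// Sketch of the suggested mitigation: iterate the results with the async
// variant of the paginated call instead of listInstanceConfigs.
import {v1} from '@google-cloud/spanner';

const client = new v1.InstanceAdminClient({
    projectId: '<sample_project_id>',
    keyFile: '<sample_key_file>',
});
for await (const config of client.listInstanceConfigsAsync({
    parent: 'projects/<sample_project_id>',
})) {
    console.log(config.name);
}
await client.close();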

I did some more digging to see if I could reduce the memory leakage further, but so far I haven't found an additional solution. Interestingly enough, if I add the ability to take a heap snapshot in the program, it seems to reduce the memory leakage slightly, which maybe suggests something funky going on with garbage collection (in addition to something else causing the leaks).

Sample code (with the close() fix mentioned above):

'use strict';
 
import {v1} from '@google-cloud/spanner';
import * as fs from 'fs';
import * as v8 from 'v8';

function createHeapSnapshot() {
    const snapshotStream = v8.getHeapSnapshot();
    // It's important that the filename end with `.heapsnapshot`,
    // otherwise Chrome DevTools won't open it.
    const fileName = `${Date.now()}.heapsnapshot`;
    const fileStream = fs.createWriteStream(fileName);
    snapshotStream.pipe(fileStream);
}

await main();

async function main() {
    const REPETITIONS = 1000;
    let total = 0;
    for (let i = 0; i < REPETITIONS; i++) {
        await useSpannerGapic();
        global.gc();
        const used = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(
            `${i} Current mem usage ${Math.round(used * 100) / 100} MB`
        );
        // if(i % 100 == 0){
        //     createHeapSnapshot();
        // }
    }
}

async function useSpannerGapic() {
    let client = new v1.InstanceAdminClient({
        projectId: '<sample_project_id>',
        keyFile: '<sample_key_file>',
    });
    let c = await client.listInstanceConfigs({
        parent: 'projects/<sample_project_id>',
    });
    await client.close();
}

Output without heap snapshot (over 1000 iterations, memory leakage of ~1.4MB):

0 Current mem usage 17.2 MB
1 Current mem usage 17.27 MB
2 Current mem usage 17.34 MB
3 Current mem usage 17.44 MB
4 Current mem usage 17.47 MB
...
995 Current mem usage 18.56 MB
996 Current mem usage 18.55 MB
997 Current mem usage 18.56 MB
998 Current mem usage 18.56 MB
999 Current mem usage 18.56 MB

Output with heap snapshot (over 1000 iterations, memory leakage of ~0.9MB):

0 Current mem usage 17.2 MB
1 Current mem usage 17.05 MB
2 Current mem usage 17.09 MB
3 Current mem usage 17.1 MB
4 Current mem usage 17.11 MB
5 Current mem usage 17.13 MB
...
995 Current mem usage 18.13 MB
996 Current mem usage 18.13 MB
997 Current mem usage 18.12 MB
998 Current mem usage 18.12 MB
999 Current mem usage 18.12 MB

@olavloite
Author

@alicejli Thanks for the fix and the elaborate explanation. Just to be sure: there are no additional changes that we need to make in the Spanner client to invoke any of the close methods that are added in the generated client, other than what has already been added in googleapis/nodejs-spanner#1416. Is that correct?

@alicejli
Contributor

Ah interesting, I didn't see that PR. I am not totally sure how the index.ts file corresponds to the various client.js files, but the generator fix would be to add this.operationsClient.close() after line 1928 here: https://github.com/googleapis/nodejs-spanner/blob/master/src/v1/instance_admin_client.ts (and any other clients that use long-running operations).

When I test this change locally, it reduces the memory leakage by ~50% compared to having the change just in index.ts, so my sense is that somehow the close() function in the index.ts file is not the one being referenced?


Some additional investigation: it seems that for the instance_admin_client, the biggest source of the memory leak when instantiating the client is the part of the constructor that instantiates the operations client (line 216 of https://github.com/googleapis/nodejs-spanner/blob/master/src/v1/instance_admin_client.ts):

// This API contains "long-running operations", which return an
// Operation object that allows for tracking of the operation,
// rather than holding a request open.
this.operationsClient = this._gaxModule
    .lro({
        auth: this.auth,
        grpc: 'grpc' in this._gaxGrpc ? this._gaxGrpc.grpc : undefined,
    })
    .operationsClient(opts);

I identified this by adding a for loop that runs 10 times around every part of the constructor, and this was the only part of the constructor that generated a significant change in memory usage. It seems that even adding this.operationsClient.close() to the close() function does not quite clear out everything in the old operationsClient.
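
The measurement approach described above can be captured in a small helper like the following sketch (run node with --expose-gc so that global.gc() is available); the commented usage line is hypothetical:

// Sketch of the diagnostic described above: repeat a suspect step N times and
// report how much the heap grew afterwards.
function heapGrowthMB(step, repetitions = 10) {
    global.gc();
    const before = process.memoryUsage().heapUsed;
    for (let i = 0; i < repetitions; i++) {
        step();
    }
    global.gc();
    const after = process.memoryUsage().heapUsed;
    return Math.round(((after - before) / 1024 / 1024) * 100) / 100;
}

// Hypothetical usage: wrap only the operations-client instantiation.
// console.log(heapGrowthMB(() => gaxModule.lro({auth, grpc}).operationsClient(opts)));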

@olavloite
Author

Ah interesting, I didn't see that PR. I am not totally sure how the index.ts file corresponds to the various client.js files ...

The index.ts file is the main entry point for the hand-written Spanner client. A user will create a Spanner instance that encapsulates the underlying generated clients. When close() is called on the hand-written Spanner client, it will also call close() on all the underlying generated clients (including any operations client it can find). But if that close() function did not correctly clean up all resources prior to #953, that could certainly explain large parts of the remaining memory leak that we were seeing with only the fix in googleapis/nodejs-spanner#1416.

TLDR version: Once the Spanner hand-written client is using the newest generated client that includes the fix in #953, I think we have solved most of the problem. I'll test that when the client has been re-generated.
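
For reference, the delegation described above looks roughly like this simplified sketch; it is not the actual nodejs-spanner index.ts, and the names are illustrative:

// Simplified sketch of the hand-written wrapper pattern: closing the wrapper
// closes every cached generated client.
import {v1} from '@google-cloud/spanner';

class HandWrittenSpannerClient {
    constructor(options) {
        this.options_ = options;
        // The hand-written client lazily creates and caches generated gapic clients.
        this.clients_ = new Map();
    }

    getInstanceAdminClient() {
        if (!this.clients_.has('InstanceAdminClient')) {
            this.clients_.set('InstanceAdminClient', new v1.InstanceAdminClient(this.options_));
        }
        return this.clients_.get('InstanceAdminClient');
    }

    async close() {
        // After the fix in #953 each generated close() also closes its operations client.
        await Promise.all([...this.clients_.values()].map(client => client.close()));
    }
}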

@alicejli
Contributor

Thanks for explaining! It looks like the PR for the newly generated client: googleapis/nodejs-spanner#1455 was just merged; let me know how your testing goes.

@hkdevandla
Member

@alicejli, is this good to be closed, assuming this is done?

@alicejli
Contributor

@olavloite I'm going to close this with the assumption the memory leak issue with client generation is fixed. Please feel free to re-open if not. Thanks!

@olavloite
Author

@olavloite I'm going to close this with the assumption the memory leak issue with client generation is fixed. Please feel free to re-open if not. Thanks!

@alicejli Sorry, I should have posted my findings here as well (copy-paste from googleapis/nodejs-spanner#1306 (comment)):

With the latest changes in the generated clients, the memory leak when repeatedly opening and closing a new client has been significantly reduced, but not completely eliminated. Running a simple script that opens a generated client, executes one request with it, and then closes the client will now leak approximately 2MB after 100 iterations. That is roughly 1/10th of the original problem.

Test script

const {Spanner, v1} = require('@google-cloud/spanner');

const REPETITIONS = 100;
console.log(`Running ${REPETITIONS} times`);

function main() {
    async function test() {
        for (let i = 0; i < REPETITIONS; i++) {
            await useGapic();
            global.gc();
            const used = process.memoryUsage().heapUsed / 1024 / 1024;
            console.log(`${i}: Current mem usage ${Math.round(used * 100) / 100} MB`);
        }
    }

    async function useGapic() {
        const client = new v1.InstanceAdminClient({
            projectId: 'my-project',
            keyFile: '/path/to/my-key.json',
        });
        await client.listInstanceConfigs({
            parent: 'projects/my-project',
        });
        await client.close();
    }
    test();
}

process.on('unhandledRejection', err => {
    console.error(err.message);
    process.exitCode = 1;
});
main();

Results for 100 iterations

Running 100 times
0: Current mem usage 17.28 MB
1: Current mem usage 17.34 MB
2: Current mem usage 17.43 MB
3: Current mem usage 17.53 MB
4: Current mem usage 17.57 MB
5: Current mem usage 17.6 MB
6: Current mem usage 17.63 MB
7: Current mem usage 17.66 MB
8: Current mem usage 17.69 MB
9: Current mem usage 17.71 MB
...
90: Current mem usage 19.01 MB
91: Current mem usage 19.03 MB
92: Current mem usage 19.04 MB
93: Current mem usage 19.06 MB
94: Current mem usage 19.07 MB
95: Current mem usage 19.08 MB
96: Current mem usage 19.1 MB
97: Current mem usage 19.1 MB
98: Current mem usage 19.11 MB
99: Current mem usage 19.13 MB

I've been trying out a number of different things, but I've not been able to exactly pinpoint the problem, although the following is clear:

  1. Opening and closing a generated gapic client without executing any requests will not leak any memory.
  2. Opening a gapic client, executing a single simple request, and then closing the client will leak a bit of memory. It does not seem to make any real difference which request is being executed.

Opening and closing clients like this is not a very common use case, so maybe this is something that we just have to live with, but I at least wanted to inform you.

@surbhigarg92

Hi team, we have a customer of the NodeJS Spanner library who has reached out to us about the memory leak issue.

Our findings indicate that the leak is still happening. Using the sample code mentioned in Knut's comment, we are still able to reproduce the memory leak.

Could you please look into this issue?
