Issues retrieving code coverage for a large codebase #5554

Closed
amtrack opened this issue Apr 24, 2024 · 11 comments

@amtrack

amtrack commented Apr 24, 2024

Summary

Retrieving the code coverage for a large codebase is very slow and uses a lot of RAM.

In our case, the unit tests only take 20 minutes to run, but retrieving the code coverage takes 30 minutes and even fails with a JavaScript heap out of memory error in some environments.

Steps To Reproduce

Create a Scratch Org and deploy some generated Apex Classes (see this repository).

sf org create scratch -f config/project-scratch-def.json -a cc -d
sf project deploy start

Run the unit tests:

sf apex run test -l RunLocalTests --output-dir test-results --wait 60 --code-coverage

Expected result

After the unit tests have finished, I expect it will not take more than 5 minutes to retrieve the code coverage.

Actual result

Retrieving the code coverage either takes very long (30 minutes) or fails with a JavaScript heap out of memory error in some environments.

System Information

{
  "architecture": "darwin-arm64",
  "cliVersion": "@salesforce/cli/2.37.4",
  "nodeVersion": "node-v20.10.0",
  "osVersion": "Darwin 23.4.0",
  "rootPath": "/Users/john.doe/.config/yarn/global/node_modules/@salesforce/cli",
  "shell": "zsh",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.0.13 (core)",
    "@oclif/plugin-commands 3.2.2 (core)",
    "@oclif/plugin-help 6.0.20 (core)",
    "@oclif/plugin-not-found 3.1.2 (core)",
    "@oclif/plugin-plugins 5.0.7 (core)",
    "@oclif/plugin-search 1.0.20 (core)",
    "@oclif/plugin-update 4.2.4 (core)",
    "@oclif/plugin-version 2.0.16 (core)",
    "@oclif/plugin-warn-if-update-available 3.0.15 (core)",
    "@oclif/plugin-which 3.1.7 (core)",
    "@salesforce/cli 2.37.4 (core)",
    "apex 3.1.4 (core)",
    "auth 3.5.5 (core)",
    "data 3.2.5 (core)",
    "deploy-retrieve 3.5.6 (core)",
    "info 3.1.3 (core)",
    "limits 3.2.1 (core)",
    "marketplace 1.1.1 (core)",
    "org 4.0.4 (core)",
    "packaging 2.3.0 (core)",
    "schema 3.2.1 (core)",
    "settings 2.1.2 (core)",
    "sobject 1.2.2 (core)",
    "source 3.2.3 (core)",
    "telemetry 3.1.18 (core)",
    "templates 56.1.1 (core)",
    "trust 3.5.4 (core)",
    "user 3.4.3 (core)"
  ]
}

Additional information

This is probably a consequence of the fix for an issue with incomplete code coverage reports, which was released in @salesforce/cli v2.22.7.
We hadn't noticed the incomplete coverage reports, but I can now reproduce this with an old version.

  1. Debug logs from a Mac when retrieving the code coverage succeeds but takes 30 minutes:

    $ sf apex test get --code-coverage --output-dir test-results --test-run-id 7076s000023eTl5 --dev-debug
    ...
    [18:02:55.648] DEBUG (sf:elapsedTime): CodeCoverage.fetchResults - enter
    [18:03:17.554] DEBUG (sf:connection): request
        method: "GET"
        url: "https://efficiency-velocity-8823-dev-ed.scratch.my.salesforce.com/services/data/v60.0/tooling/query/0r8xx2sNrVvXQ7WAUW-2000"
        headers: {
        "content-type": "application/json",
        "user-agent": "sfdx toolbelt:"
        }
    [18:03:38.407] DEBUG (sf:connection): request
        method: "GET"
        url: "https://efficiency-velocity-8823-dev-ed.scratch.my.salesforce.com/services/data/v60.0/tooling/query/0r8xx2sNrVokQDMAU2-4000"
        headers: {
        "content-type": "application/json",
        "user-agent": "sfdx toolbelt:"
        }
    [18:03:39.872] DEBUG (sf:connection): request
        method: "GET"
        url: "https://efficiency-velocity-8823-dev-ed.scratch.my.salesforce.com/services/data/v60.0/tooling/query/0r8xx2sNrVvXQ7WAUW-4000"
        headers: {
        "content-type": "application/json",
        "user-agent": "sfdx toolbelt:"
        }
    ...
    [18:31:50.203] DEBUG (sf:connection): request
        method: "GET"
        url: "https://efficiency-velocity-8823-dev-ed.scratch.my.salesforce.com/services/data/v60.0/tooling/query/0r8xx2sNrVokQDMAU2-158000"
        headers: {
        "content-type": "application/json",
        "user-agent": "sfdx toolbelt:"
        }
    [18:32:06.048] DEBUG (sf:elapsedTime): CodeCoverage.fetchResults - exit
        elapsedTime: 1750444.311667
    ...
    Error (1): Invalid string length
    
    *** Internal Diagnostic ***
    
    RangeError: Invalid string length
        at JSON.stringify (<anonymous>)
        at stringify (/Users/john.doe/.config/yarn/global/node_modules/@salesforce/plugin-apex/node_modules/@salesforce/apex-node/lib/src/tests/utils.js:31:17)
        at TestService.writeResultFiles (/Users/john.doe/.config/yarn/global/node_modules/@salesforce/plugin-apex/node_modules/@salesforce/apex-node/lib/src/tests/testService.js:211:48)
        at descriptor.value (/Users/john.doe/.config/yarn/global/node_modules/@salesforce/plugin-apex/node_modules/@salesforce/apex-node/lib/src/utils/elapsedTime.js:57:48)
        at TestReporter.report (file:///Users/john.doe/.config/yarn/global/node_modules/@salesforce/plugin-apex/lib/reporters/testReporter.js:34:31)
        at Test.run (file:///Users/john.doe/.config/yarn/global/node_modules/@salesforce/plugin-apex/lib/commands/apex/get/test.js:64:29)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async Test._run (/Users/john.doe/.config/yarn/global/node_modules/@oclif/core/lib/command.js:311:22)
        at async Config.runCommand (/Users/john.doe/.config/yarn/global/node_modules/@salesforce/cli/node_modules/@oclif/core/lib/config/config.js:433:25)
        at async run (/Users/john.doe/.config/yarn/global/node_modules/@salesforce/cli/node_modules/@oclif/core/lib/main.js:92:16)
    ...
    
  2. JavaScript heap out of memory error on a CI machine with 8GB RAM and heap size increased from 2GB default to 4GB using NODE_OPTIONS=--max_old_space_size=4096.

    [2024-04-19T08:54:08.222Z] + npm run --silent test
    [2024-04-19T09:45:53.046Z]
    [2024-04-19T09:45:53.046Z] <--- Last few GCs --->
    [2024-04-19T09:45:53.046Z]
    [2024-04-19T09:45:53.046Z] [30942:0x5853b10]  3076338 ms: Mark-Compact 4019.7 (4133.0) -> 4004.6 (4133.5) MB, 6141.71 / 0.00 ms  (average mu = 0.113, current mu = 0.025) allocation failure; scavenge might not succeed
    [2024-04-19T09:45:53.046Z] [30942:0x5853b10]  3082724 ms: Mark-Compact 4020.4 (4133.5) -> 4005.3 (4134.3) MB, 6250.92 / 0.00 ms  (average mu = 0.070, current mu = 0.021) allocation failure; scavenge might not succeed
    [2024-04-19T09:45:53.046Z]
    [2024-04-19T09:45:53.046Z]
    [2024-04-19T09:45:53.046Z] <--- JS stacktrace --->
    [2024-04-19T09:45:53.046Z]
    [2024-04-19T09:45:53.046Z] FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
    [2024-04-19T09:45:53.046Z] ----- Native stack trace -----
    [2024-04-19T09:45:53.046Z]
    [2024-04-19T09:45:53.046Z]  1: 0xb84bd6 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
    [2024-04-19T09:45:53.046Z]  2: 0xefec30 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
    [2024-04-19T09:45:53.046Z]  3: 0xefef17 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
    [2024-04-19T09:45:53.046Z]  4: 0x1110925  [node]
    [2024-04-19T09:45:53.046Z]  5: 0x1110eb4 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
    [2024-04-19T09:45:53.046Z]  6: 0x1127da4 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
    [2024-04-19T09:45:53.046Z]  7: 0x11285bc v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
    [2024-04-19T09:45:53.046Z]  8: 0x10fe8c1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
    [2024-04-19T09:45:53.046Z]  9: 0x10ffa55 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
    [2024-04-19T09:45:53.046Z] 10: 0x10dd0a6 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
    [2024-04-19T09:45:53.046Z] 11: 0x1537e36 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
    [2024-04-19T09:45:53.046Z] 12: 0x7fdaabed9ef6
    [2024-04-19T09:45:53.046Z] test.sh: line 17: 30942 Aborted                 (core dumped) sf apex run test -l RunLocalTests --output-dir test-results --wait 60 --code-coverage
    

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

@mshanemc mshanemc transferred this issue from forcedotcom/cli Apr 24, 2024
@mshanemc
Contributor

Moved this since it's in the VS Code team's Apex library.

@peternhale
Contributor

@amtrack we are aware of the issue around volume and additional time to process code coverage. I see that you have already tried bumping max heap space to no avail. I also see that the machine being used for CI/CD is not very robust.

Given the limited memory on the CI/CD machine, our recommendation would be to break up the test runs into smaller groups. The CLI supports running Apex Test Suites, as well as taking a list of Apex test classes to run.
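
For illustration only (the suite and class names below are hypothetical), smaller runs could look something like this:

sf apex run test --suite-names SmallSuiteA --code-coverage --output-dir test-results --wait 60
sf apex run test --class-names AccountServiceTest,OrderServiceTest --code-coverage --output-dir test-results --wait 60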

Thank you for the example project that challenges both the time and space dimensions of this issue. We also use similar configurations for our testing. The largest of those produces almost a million code coverage records and requires 32GB of heap space.

We are looking into alternatives to help minimize the memory consumption.

@amtrack
Author

amtrack commented Apr 25, 2024

@peternhale Thanks for your detailed response!
Good to know that you are already exploring this topic.

I'm wondering if you at Salesforce could eventually make the ApexCodeCoverage object available via the Bulk API / Bulk API 2.0. That could probably solve both the time and space challenges (at least on the client).

Apparently this is possible for the MetadataComponentDependency Tooling API object:

Using Bulk API 2.0, you can query the MetadataComponentDependency Tooling API object and retrieve up to 100,000 records in a single query.
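
For reference, a Bulk API 2.0 query job is created with a plain REST call roughly like the one below (instance URL and token are placeholders; whether the endpoint accepts a Tooling object such as MetadataComponentDependency is exactly what's unclear to me):

curl -X POST "$INSTANCE_URL/services/data/v60.0/jobs/query" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"operation": "query", "query": "SELECT MetadataComponentName, RefMetadataComponentName FROM MetadataComponentDependency"}'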

@peternhale
Contributor

@amtrack It would be nice to have the ApexCodeCoverage object available in Bulk API 2.0. I don't believe either of the bulk endpoints supports Tooling objects.

As for improving fetch time, the Coverage field is a blob of sorts that stores two arrays: covered and uncovered lines. When it is included in a query, fetch times go up dramatically due to the field's complexity.
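
To give a sense of the volume (line numbers below are made up), each ApexCodeCoverage record carries both arrays in its Coverage value, so a query like the following returns that structure for every test-method/class pair:

sf data query --use-tooling-api --json -q "SELECT ApexClassOrTrigger.Name, TestMethodName, Coverage FROM ApexCodeCoverage LIMIT 1"

"Coverage": { "coveredLines": [1, 2, 3, 5, 8], "uncoveredLines": [4, 6, 7] }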

@amtrack
Author

amtrack commented Apr 25, 2024

I don't believe either of the bulk endpoints supports Tooling objects.

I honestly believe the same, but the docs say that Bulk API 2.0 does support the MetadataComponentDependency Tooling API object.

I wanted to test it myself but quickly failed:

$ sf data query -q "SELECT Id FROM MetadataComponentDependency" --use-tooling-api --bulk
Error (2): The following errors occurred:
  --bulk=true cannot also be provided when using --use-tooling-api
  --use-tooling-api=true cannot also be provided when using --bulk
See more help with --help

Looking at the code in the plugin-data repo, I found that it isn't as easy as just removing the flag constraint.
I've given up for now but am wondering why the docs make such a claim.

@peternhale
Contributor

The errors you are seeing come from the CLI query command rather than from the Bulk API. As of now, bulk and tooling don't mix.

@peternhale
Contributor

@amtrack have you made any progress in getting the tests w/coverage to run on your CI/CD machine?

@amtrack
Author

amtrack commented May 6, 2024

@peternhale thanks for asking!
Unfortunately, 8GB of RAM was not enough, and scaling up the paid cloud infrastructure even further just because of this is not an option for us.

After trying lots of things and reading https://developer.salesforce.com/blogs/developer-relations/2012/11/how-code-coverage-works we're probably going to switch to Aggregated Code Coverage.
This works fine in terms of speed and memory.
I think there is a small bug in sf apex run test when Aggregated Code Coverage is enabled: the coverage per class isn't queried or printed.
I'll create an issue when I find time.
But I was able to query ApexCodeCoverageAggregate for the few hundred classes with a simple and fast Tooling API query, and that seems to give us the info we need.
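
Something along these lines (the exact field list is illustrative):

sf data query --use-tooling-api -q "SELECT ApexClassOrTrigger.Name, NumLinesCovered, NumLinesUncovered FROM ApexCodeCoverageAggregate"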

@peternhale
Contributor

@amtrack Thanks for getting back to me. We do have work on the backlog to better handle large test runs. Please file the issue you mentioned once you have gathered the details.

I would like to close this issue if it is ok with you.

@amtrack
Author

amtrack commented May 7, 2024

@peternhale OK, I'm looking forward to testing it again once better handling of large test runs is implemented.

@amtrack amtrack closed this as completed May 7, 2024