Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory accounting for multi-phase execution #15968

Merged
merged 2 commits into from May 14, 2024

Conversation

mkleen
Copy link
Contributor

@mkleen mkleen commented May 7, 2024

Summary of the changes / Why this improves CrateDB

Multi-Phase execution can produce large intermediate results which can cause to run the node out-of-memory. This adds memory accounting to prevent this.

Follow-up for #15664

Checklist

  • Added an entry in the latest docs/appendices/release-notes/<x.y.0>.rst for user facing changes
  • Updated documentation & sql_features table for user facing changes
  • Touched code is covered by tests
  • CLA is signed
  • This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in the latest docs/appendices/release-notes/<x.y.0>.rst
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch from af054aa to 4be60d8 Compare May 8, 2024 08:41
@mkleen mkleen marked this pull request as ready for review May 8, 2024 08:50
@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch 6 times, most recently from f819681 to b0e7963 Compare May 8, 2024 09:42
@BaurzhanSakhariev
Copy link
Contributor

I think we need a port to 5.7 and a change entry since OutOfMemory is a bug in general?

Even if only new change exposed it, there can be another scenarios causing it.

@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch from b0e7963 to fde5452 Compare May 8, 2024 10:52
Copy link
Contributor

@BaurzhanSakhariev BaurzhanSakhariev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, see also comment about treating it as a fix

Comment on lines +53 to +60
var ramAccounting = ConcurrentRamAccounting.forCircuitBreaker(
"multi-phase",
executor.circuitBreaker(HierarchyCircuitBreakerService.QUERY),
plannerContext.transactionContext().sessionSettings().memoryLimitInBytes()
);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth checking performance impact of this - and if it's worth wrapping into BlockBasedRamAccounting (but per consumer, further below) to reduce the amount of synchronization.

@mfussenegger
Copy link
Member

mfussenegger commented May 8, 2024

I think we need a port to 5.7 and a change entry since OutOfMemory is a bug in general?

+1
This is not exclusive to lookup join, but a general problem with where xs in (select ..) queries.

@mkleen
Copy link
Contributor Author

mkleen commented May 8, 2024

@mfussenegger Performance-wise there is no difference but allocation seems less with ram accounting #15971

Q: select t1."sourceIP" from uservisits_large t1 inner join uservisits_small t2 on t1."sourceIP" = t2."sourceIP" limit 1000
C: 5
| Version |         Mean ±    Stdev |        Min |     Median |         Q3 |        Max |
|   V1    |    25394.059 ± 1524.774 |  22558.672 |  25072.870 |  25472.729 |  28150.404 |
|   V2    |    24274.561 ± 3739.156 |  17015.885 |  24462.213 |  25827.852 |  30273.521 |
├---------┴-------------------------┴------------┴------------┴------------┴------------┘
|               -   4.51%                           -   2.47%
There is a 77.74% probability that the observed difference is not random, and the best estimate of that difference is 4.51%
The test has no statistical significance


System/JVM Metrics (durations in ms, byte-values in MB)
    |    YOUNG GC            |       OLD GC           |      HEAP         |     ALLOC
    |  cnt      avg      max |  cnt      avg      max |  initial     used |     rate      total
 V1 |  200    26.05    19.98 |   67   791.01   718.79 |     4295     2281 |  1034.70     106516
 V2 |  211    24.46    23.54 |   73   930.11  1080.94 |     4295     2909 |   994.45      99299

@mkleen
Copy link
Contributor Author

mkleen commented May 8, 2024

Is it worth also trying out the BlockBasedRamAccounting ?

@mfussenegger
Copy link
Member

Is it worth also trying out the BlockBasedRamAccounting ?

On the spec/in_subquery.toml there's a ~30% perf regression for me comparing latest-nightly with this version, so I'd say yes.

@mkleen
Copy link
Contributor Author

mkleen commented May 8, 2024

Is it worth also trying out the BlockBasedRamAccounting ?

On the spec/in_subquery.toml there's a ~30% perf regression for me comparing latest-nightly with this version, so I'd say yes.

Ok, thanks!

@mkleen
Copy link
Contributor Author

mkleen commented May 8, 2024

The BlockBasedRamAccounting makes a clear difference. I will continue in this direction.

@mkleen mkleen changed the title Add memory accounting for multiphase execution Add memory accounting for multi-phase execution May 8, 2024
@mkleen
Copy link
Contributor Author

mkleen commented May 9, 2024

With block-ram-accounting against master:

Q: select * from articles where id not in (select id from colors where coolness > 0) limit 10000
C: 1
| Version |         Mean ±    Stdev |        Min |     Median |         Q3 |        Max |
|   V1    |      127.809 ±   39.882 |    119.867 |    121.695 |    122.614 |    449.163 |
|   V2    |      115.855 ±   39.757 |    107.544 |    110.058 |    110.408 |    432.368 |
├---------┴-------------------------┴------------┴------------┴------------┴------------┘
|               -   9.81%                           -  10.04%
There is a 99.71% probability that the observed difference is not random, and the best estimate of that difference is 9.81%
The test has statistical significance

Note: I don't expect that the block-ram-accounting does a speed against no accounting, this is probably a measurement failure, but it def. does not slows it down.

@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch 2 times, most recently from 0851460 to f6695b9 Compare May 9, 2024 10:41
@mkleen mkleen added the v/5.7 label May 9, 2024
@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch 3 times, most recently from 01cd279 to ab6ac29 Compare May 9, 2024 11:52
@mkleen mkleen requested a review from mfussenegger May 9, 2024 12:52
@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch 3 times, most recently from f4aa81f to fae232c Compare May 9, 2024 17:04
@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch from fae232c to efd8fd2 Compare May 14, 2024 19:07
Multi-Phase execution can produce large intermediate results which can
cause out-of-memoryi errors. This adds memory accounting to prevent this.
@mkleen mkleen force-pushed the mkleen/multiphase-memory-accounting branch from efd8fd2 to da11f75 Compare May 14, 2024 19:08
@mkleen mkleen added the ready-to-merge Let Mergify merge the PR once approved and checks pass label May 14, 2024
@mergify mergify bot merged commit 4fd6e41 into master May 14, 2024
15 checks passed
@mergify mergify bot deleted the mkleen/multiphase-memory-accounting branch May 14, 2024 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Let Mergify merge the PR once approved and checks pass v/5.7
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants