[VL] Use conf to control C2R occupied memory #5799

XinShuoWang · 2024-05-17T13:59:47Z

What changes were proposed in this pull request?

In the current design, the Column2Row (C2R) operation converts an entire batch in one pass, which consumes a lot of memory and can cause the program to OOM. This commit splits the C2R conversion into multiple smaller operations, which greatly reduces its peak memory. Reusing the same memory across those operations should also bring some performance benefit.
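To illustrate the idea of splitting one large conversion into several bounded ones, here is a minimal sketch. It is not the code in this PR: `splitByMemory` and its parameters are hypothetical names, and it assumes the serialized size of every row is already known. It greedily groups rows into half-open ranges whose total size stays within a byte budget.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not Gluten's actual API: split rows into half-open
// ranges [start, end) so that each range's total serialized size stays
// within memThreshold bytes. A single row larger than the threshold still
// gets a range of its own, so the loop always makes progress.
std::vector<std::pair<size_t, size_t>> splitByMemory(
    const std::vector<int64_t>& rowSizes,
    int64_t memThreshold) {
  assert(memThreshold > 0);
  std::vector<std::pair<size_t, size_t>> ranges;
  size_t start = 0;
  int64_t used = 0;
  for (size_t i = 0; i < rowSizes.size(); ++i) {
    // Close the current range before adding a row that would overflow it.
    if (i > start && used + rowSizes[i] > memThreshold) {
      ranges.emplace_back(start, i);
      start = i;
      used = 0;
    }
    used += rowSizes[i];
  }
  // Emit the trailing range, if any rows remain.
  if (start < rowSizes.size()) {
    ranges.emplace_back(start, rowSizes.size());
  }
  return ranges;
}
```

Each returned range can then be converted on its own, so peak memory is bounded by the threshold (or by the largest single row) rather than by the whole batch.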

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

github-actions · 2024-05-17T14:00:03Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2024-05-17T14:00:20Z

Run Gluten Clickhouse CI

FelixYBW · 2024-05-18T04:50:57Z

@jinchengchenghh

github-actions · 2024-05-18T13:11:47Z

Run Gluten Clickhouse CI

github-actions · 2024-05-19T09:16:55Z

Run Gluten Clickhouse CI

XinShuoWang · 2024-05-19T09:19:15Z

@jinchengchenghh @zhztheplayer Can you review this pr?

cpp/core/jni/JniWrapper.cc

github-actions · 2024-05-19T12:26:19Z

Run Gluten Clickhouse CI

xumingming

Looks good overall, some comments.

cpp/core/operators/c2r/ColumnarToRow.h

cpp/velox/operators/serializer/VeloxColumnarToRowConverter.cc

shims/common/src/main/scala/org/apache/gluten/GlutenConfig.scala

backends-velox/src/main/scala/org/apache/gluten/execution/VeloxColumnarToRowExec.scala

jinchengchenghh · 2024-05-20T00:31:34Z

cpp/core/jni/JniWrapper.cc

  auto columnarToRowConverter = ctx->objectStore()->retrieve<ColumnarToRowConverter>(c2rHandle);
  auto cb = ctx->objectStore()->retrieve<ColumnarBatch>(batchHandle);
-  columnarToRowConverter->convert(cb);
+
+  int64_t column2RowMemThreshold = 256 * 1024 * 1024;


Could you move this config computation into GlutenConfig.cc?

cpp/velox/operators/serializer/VeloxColumnarToRowConverter.cc

github-actions · 2024-05-20T03:36:16Z

Run Gluten Clickhouse CI

FelixYBW · 2024-05-20T04:46:17Z

I ran into a similar issue recently, reported as an ArrowContext OOM. Is the root cause that the RowVector is too large, or that the data in each row is too large? By default Gluten configures Velox with a 4K batch size, but it looks like many Velox operators can exceed the 4K limit.

Does the PR hold on to the batch and output part of it each time?

FelixYBW · 2024-05-20T04:47:26Z

We may need a couple of UTs to cover this.

XinShuoWang · 2024-05-20T05:44:54Z

> I ran into a similar issue recently, reported as an ArrowContext OOM. Is the root cause that the RowVector is too large, or that the data in each row is too large? By default Gluten configures Velox with a 4K batch size, but it looks like many Velox operators can exceed the 4K limit.
>
> Does the PR hold on to the batch and output part of it each time?

In my test case, each row is only 20KB, but the RowVector as a whole is very large, which eventually leads to OOM.
In my test case, the number of output rows exceeds the 4K limit because of the Generate operator.
This PR aims to hold on to the batch and output part of it each time.
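The "hold on to the batch and output part of it each time" behaviour can be sketched roughly as follows. This is a toy model under stated assumptions, not the real `ColumnarToRowConverter`: `ToyBatch` and `ChunkedRowWriter` are hypothetical types, and rows are rendered as strings purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a columnar batch: two equally long columns.
struct ToyBatch {
  std::vector<int> ids;
  std::vector<std::string> names;
  size_t numRows() const { return ids.size(); }
};

// Hypothetical converter (not the real ColumnarToRowConverter): it keeps a
// reference to the whole batch and materializes at most maxRows rows per
// call, reusing one output buffer between calls.
class ChunkedRowWriter {
 public:
  ChunkedRowWriter(const ToyBatch& batch, size_t maxRows)
      : batch_(batch), maxRows_(maxRows) {
    assert(maxRows_ > 0);
  }

  bool hasNext() const { return nextRow_ < batch_.numRows(); }

  // Converts the next slice and returns it. The buffer is overwritten by
  // the following call, so the caller must consume it first.
  const std::vector<std::string>& next() {
    assert(hasNext());
    const size_t end = std::min(nextRow_ + maxRows_, batch_.numRows());
    buffer_.clear(); // the vector keeps its capacity across calls
    for (size_t i = nextRow_; i < end; ++i) {
      buffer_.push_back(std::to_string(batch_.ids[i]) + "," + batch_.names[i]);
    }
    nextRow_ = end;
    return buffer_;
  }

 private:
  const ToyBatch& batch_;
  size_t maxRows_;
  size_t nextRow_ = 0;
  std::vector<std::string> buffer_;
};
```

A caller would loop with `while (w.hasNext()) { consume(w.next()); }`, so only one slice of rows is materialized at any time while the columnar batch stays alive.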

ulysses-you · 2024-05-20T06:01:55Z

cpp/core/jni/JniWrapper.cc

@@ -580,17 +583,28 @@ Java_org_apache_gluten_vectorized_NativeColumnarToRowJniWrapper_nativeColumnarTo
    JNIEnv* env,
    jobject wrapper,
    jlong batchHandle,
-    jlong c2rHandle) {
+    jlong c2rHandle,
+    jlong rowId) {


Can we pass a range [startRowId, endRowId) to native? I think it would be clearer to apply the config on the Java side.
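A rough sketch of what a range-based interface could look like, with the chunking policy owned by the caller. It is written in C++ only to match the diff; the suggestion above is that this loop would live on the Java side. `convertInRanges` and the `convertRange` callback are hypothetical names standing in for the native entry point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical driver loop: the caller decides how many rows to convert per
// call and hands the converter explicit half-open ranges [start, end).
// convertRange stands in for the native conversion call.
// Returns the number of calls made.
int64_t convertInRanges(
    int64_t numRows,
    int64_t rowsPerCall,
    const std::function<void(int64_t, int64_t)>& convertRange) {
  assert(numRows >= 0 && rowsPerCall > 0);
  int64_t calls = 0;
  for (int64_t start = 0; start < numRows; start += rowsPerCall) {
    // The last range is clipped to the end of the batch.
    const int64_t end = std::min(start + rowsPerCall, numRows);
    convertRange(start, end);
    ++calls;
  }
  return calls;
}
```

With an explicit end bound, the native side does not need to know about any configuration; it only converts the rows it is asked for.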

FelixYBW · 2024-05-20T06:09:34Z

> > I ran into a similar issue recently, reported as an ArrowContext OOM. Is the root cause that the RowVector is too large, or that the data in each row is too large? By default Gluten configures Velox with a 4K batch size, but it looks like many Velox operators can exceed the 4K limit.
> > Does the PR hold on to the batch and output part of it each time?
>
> In my test case, each row is only 20KB, but the RowVector as a whole is very large, which eventually leads to OOM.
>
> In my test case, the number of output rows exceeds the 4K limit because of the Generate operator.
>
> This PR aims to hold on to the batch and output part of it each time.

Can we use the batch size (4K rows) as the default? We can still add the threshold, but if the user doesn't configure it, let's convert one batch size of rows (4K by default) at a time instead of 256M of memory.

We can still reuse the memory allocated for the rows, but I don't expect much perf gain here. The buffer is too large for the L1/L2 cache, and is very likely to be evicted from L3 as well by the time Spark processes the rows.
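One possible reading of this default, as a hedged sketch: `rowsPerConversion` is a hypothetical helper, a non-positive `memThreshold` is used here to mean "not configured", and capping by the smaller of the two limits when a threshold is set is an assumption, not something decided in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical policy: with no configured threshold (memThreshold <= 0),
// convert one default-sized batch of rows per call. With a threshold, also
// cap the row count by memThreshold / avgRowBytes, but never go below 1 row.
int64_t rowsPerConversion(
    int64_t batchSize,
    int64_t memThreshold,
    int64_t avgRowBytes) {
  assert(batchSize > 0 && avgRowBytes > 0);
  if (memThreshold <= 0) {
    return batchSize;
  }
  const int64_t byMemory = std::max<int64_t>(1, memThreshold / avgRowBytes);
  return std::min(batchSize, byMemory);
}
```

For scale, with the 20KB rows from the test case above, a 4096-row slice comes to roughly 80MB of row data per conversion.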

github-actions · 2024-05-20T08:47:19Z

Run Gluten Clickhouse CI

FelixYBW · 2024-05-28T17:44:59Z

Why was the PR closed?

XinShuoWang · 2024-06-02T09:03:51Z

> Why was the PR closed?

Sorry, I accidentally force-pushed the main branch, which caused the PR to be closed. The new PR is here: #5952.

XinShuoWang marked this pull request as ready for review May 19, 2024 09:17

XinShuoWang changed the title ~~Use conf to control C2R occupied memory~~ [VL] Use conf to control C2R occupied memory May 19, 2024

xumingming reviewed May 19, 2024

cpp/core/jni/JniWrapper.cc

xumingming suggested changes May 19, 2024

jinchengchenghh reviewed May 20, 2024

cpp/velox/operators/serializer/VeloxColumnarToRowConverter.cc

jinchengchenghh reviewed May 20, 2024

cpp/velox/operators/serializer/VeloxColumnarToRowConverter.cc

ulysses-you reviewed May 20, 2024

FelixYBW mentioned this pull request May 21, 2024

[VL] arrowcontext not support spill #5718

Open

XinShuoWang closed this May 28, 2024

XinShuoWang force-pushed the main branch from 2b1c25c to 12fdb70 Compare May 28, 2024 03:36
