Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]:IOError: ZSTD decompression failed: Unknown frame descriptor #5736

Open
boneanxs opened this issue May 14, 2024 · 1 comment
Labels
bug Something isn't working triage

Comments

@boneanxs
Copy link
Contributor

Backend

VL (Velox)

Bug description

Meet this error when running with velox backend:

Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: IOError: ZSTD decompression failed: Unknown frame descriptor
	at org.apache.gluten.vectorized.GeneralOutIterator.next(GeneralOutIterator.java:48)
	at org.apache.gluten.vectorized.ColumnarBatchSerializerInstance$TaskDeserializationStream.liftedTree1$1(ColumnarBatchSerializer.scala:176)
	at org.apache.gluten.vectorized.ColumnarBatchSerializerInstance$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:175)
	at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
	at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
	at org.apache.gluten.vectorized.GeneralInIterator.hasNext(GeneralInIterator.java:31)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:65)
	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.utils.IteratorCompleter.hasNext(Iterators.scala:69)
	at org.apache.gluten.utils.PayloadCloser.hasNext(Iterators.scala:35)
	at org.apache.gluten.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:98)
	at scala.collection.Iterator.isEmpty(Iterator.scala:387)
	at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
	at org.apache.gluten.utils.PipelineTimeAccumulator.isEmpty(Iterators.scala:88)
	at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:119)
	at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:83)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.gluten.execution.ColumnarInputRDDsWrapper.$anonfun$getIterators$1(WholeStageTransformer.scala:441)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	at org.apache.gluten.execution.ColumnarInputRDDsWrapper.getIterators(WholeStageTransformer.scala:432)
	at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.$anonfun$compute$1(WholeStageZippedPartitionsRDD.scala:48)
	at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
	at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
	at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.compute(WholeStageZippedPartitionsRDD.scala:46)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1532)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: IOError: ZSTD decompression failed: Unknown frame descriptor
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nextInternal(ColumnarBatchOutIterator.java:70)
	at org.apache.gluten.vectorized.GeneralOutIterator.next(GeneralOutIterator.java:46)
	... 57 more

Retriable: False
Function: runInternal
File: /velox/velox/exec/Driver.cpp
Line: 579
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE.cold
# 4  _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 5  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 6  _ZN6gluten24WholeStageResultIterator4nextEv
# 7  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 8  0x00007f0145165e70

	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:65)
	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
	... 42 more

The related DAG graph is:
Screenshot 2024-05-14 at 11 29 46

Spark version

Spark-3.2.x

Spark configurations

No response

System information

No response

Relevant logs

No response

@boneanxs boneanxs added bug Something isn't working triage labels May 14, 2024
@zhouyuan
Copy link
Contributor

@zhztheplayer is this related with some bugs on c2r case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

2 participants