Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test TestHiveTransactionalTable.testLargePartitionedUpdate: Hive TEZ failure: Vertex did not succeed due to OWN_TASK_FAILURE #21906

Open
findepi opened this issue May 10, 2024 · 0 comments

Comments

@findepi
Copy link
Member

findepi commented May 10, 2024

https://github.com/trinodb/trino/actions/runs/8890555083/job/24411611083?pr=21758

tests               | 2024-04-30 13:38:26 INFO: SUCCESS     /    io.trino.tests.product.hive.TestHiveTransactionalTable.testLargePartitionedDelete (Groups: profile_specific_tests, hive_transactional) took 21.8 seconds
tests               | 2024-04-30 13:38:26 INFO: [117 of 117] io.trino.tests.product.hive.TestHiveTransactionalTable.testLargePartitionedUpdate (Groups: profile_specific_tests, hive_transactional)
presto-master       | 2024-04-30T13:38:27.107+0545	INFO	dispatcher-query-9	io.trino.event.QueryMonitor	TIMELINE: Query 20240430_075326_00628_xtn6i :: FINISHED :: elapsed 554ms :: planning 27ms :: waiting 6ms :: scheduling 18ms :: running 5ms :: finishing 504ms :: begin 2024-04-30T13:38:26.552+05:45 :: end 2024-04-30T13:38:27.106+05:45
presto-master       | 2024-04-30T13:38:27.603+0545	INFO	dispatcher-query-9	io.trino.event.QueryMonitor	TIMELINE: Query 20240430_075327_00629_xtn6i :: FINISHED :: elapsed 492ms :: planning 15ms :: waiting 4ms :: scheduling 10ms :: running 6ms :: finishing 461ms :: begin 2024-04-30T13:38:27.110+05:45 :: end 2024-04-30T13:38:27.602+05:45
presto-master       | 2024-04-30T13:38:27.998+0545	INFO	dispatcher-query-7	io.trino.event.QueryMonitor	TIMELINE: Query 20240430_075327_00630_xtn6i :: FINISHED :: elapsed 391ms :: planning 35ms :: waiting 22ms :: scheduling 29ms :: running 4ms :: finishing 323ms :: begin 2024-04-30T13:38:27.606+05:45 :: end 2024-04-30T13:38:27.997+05:45
presto-master       | 2024-04-30T13:38:28.061+0545	INFO	dispatcher-query-5	io.trino.event.QueryMonitor	TIMELINE: Query 20240430_075328_00631_xtn6i :: FINISHED :: elapsed 59ms :: planning 18ms :: waiting 7ms :: scheduling 32ms :: running 4ms :: finishing 5ms :: begin 2024-04-30T13:38:28.001+05:45 :: end 2024-04-30T13:38:28.060+05:45
2024-04-30T07:53:34.714Z	INFO	container-stats-0	io.trino.tests.product.launcher.env.EnvironmentListener	Container stats:
+---------------+--------+---------+---------+--------+------+----------+----------+
|     container |    cpu |     mem | max mem |  mem % | pids |   net in |  net out |
+---------------+--------+---------+---------+--------+------+----------+----------+
| hadoop-master | 14.91% | 15.61GB |  2.56GB | 16.40% | 1034 | 355.01MB | 119.15MB |
| presto-master |  1.49% | 15.61GB |  3.69GB | 23.62% |  358 | 119.46MB | 357.73MB |
|         tests |  0.02% | 15.61GB |  0.55GB |  3.50% |   22 |   5.04MB |   2.64MB |
+---------------+--------+---------+---------+--------+------+----------+----------+
presto-master       | 2024-04-30T13:38:40.973+0545	ERROR	transaction-finishing-0	io.trino.metadata.CatalogMetadata	Connector threw exception on abort
presto-master       | java.lang.IllegalStateException: Tried to rollback buffered metastore operations after transaction has been committed/aborted
presto-master       | 	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.rollback(SemiTransactionalHiveMetastore.java:1201)
presto-master       | 	at io.trino.plugin.hive.HiveMetadata.rollback(HiveMetadata.java:3877)
presto-master       | 	at io.trino.plugin.hive.HiveTransactionManager.lambda$rollback$1(HiveTransactionManager.java:69)
presto-master       | 	at java.base/java.util.Optional.ifPresent(Optional.java:178)
presto-master       | 	at io.trino.plugin.hive.HiveTransactionManager.rollback(HiveTransactionManager.java:67)
presto-master       | 	at io.trino.plugin.hive.HiveConnector.rollback(HiveConnector.java:233)
presto-master       | 	at io.trino.metadata.CatalogTransaction.abort(CatalogTransaction.java:93)
presto-master       | 	at io.trino.metadata.CatalogMetadata.safeAbort(CatalogMetadata.java:163)
presto-master       | 	at io.trino.metadata.CatalogMetadata.abort(CatalogMetadata.java:157)
presto-master       | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
presto-master       | 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
presto-master       | 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
presto-master       | 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
presto-master       | 	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
presto-master       | 	at io.trino.$gen.Trino_445_252_gb00c88a____20240430_074244_2.run(Unknown Source)
presto-master       | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
presto-master       | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
presto-master       | 	at java.base/java.lang.Thread.run(Thread.java:1570)
presto-master       | 
presto-master       | 
presto-master       | 2024-04-30T13:38:40.979+0545	INFO	dispatcher-query-10	io.trino.event.QueryMonitor	TIMELINE: Query 20240430_075337_00632_xtn6i :: FAILED (HIVE_WRITER_DATA_ERROR) :: elapsed 3091ms :: planning 30ms :: waiting 12ms :: scheduling 43ms :: running 1259ms :: finishing 1759ms :: begin 2024-04-30T13:38:37.879+05:45 :: end 2024-04-30T13:38:40.970+05:45
tests               | 2024-04-30 13:38:41 WARNING: Query failed on attempt 1, will retry.
tests               | io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20240430_075337_00632_xtn6i): Failed to write temporary file: hdfs://hadoop-master:9000/user/hive/warehouse/test_large_update_new_true_none_8dorovyimr614dd3u3l8/d=20220101/delete_delta_0000001_0000001_0000/.tmp-sort.bucket_00000.2
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:54)
tests               | 	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
tests               | 	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
tests               | 	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
tests               | 	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
tests               | 	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
tests               | 	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$1.executeQuery(QueryExecutors.java:54)
tests               | 	at io.trino.tests.product.hive.TestHiveTransactionalTable.lambda$testLargePartitionedUpdate$87(TestHiveTransactionalTable.java:2040)
tests               | 	at io.trino.tests.product.hive.TestHiveTransactionalTable.withTemporaryTable(TestHiveTransactionalTable.java:1794)
tests               | 	at io.trino.tests.product.hive.TestHiveTransactionalTable.lambda$testLargePartitionedUpdate$88(TestHiveTransactionalTable.java:2034)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:1570)
tests               | Caused by: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1714462954618_0029_2_00, diagnostics=[Task failed, taskId=task_1714462954618_0029_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1714462954618_0029_2_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
tests               | 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
tests               | 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
tests               | 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
tests               | 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
tests               | 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
tests               | 	at java.security.AccessController.doPrivileged(Native Method)
tests               | 	at javax.security.auth.Subject.doAs(Subject.java:422)
tests               | 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
tests               | 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
tests               | 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
tests               | 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
tests               | 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
tests               | 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
tests               | 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
tests               | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
tests               | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
tests               | 	at java.lang.Thread.run(Thread.java:748)
tests               | Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
tests               | 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
tests               | 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
tests               | 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
tests               | 	at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
tests               | 	at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
tests               | 	at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
tests               | 	at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
tests               | 	at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
tests               | 	at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
tests               | 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
tests               | 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
tests               | 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
tests               | 	... 16 more
tests               | Caused by: java.io.IOException: java.lang.NullPointerException
tests               | 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
tests               | 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
tests               | 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:421)
tests               | 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
tests               | 	... 27 more
tests               | Caused by: java.lang.NullPointerException
tests               | 	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1126)
tests               | 	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:224)
tests               | 	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:121)
tests               | 	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:116)
tests               | 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1984)
tests               | 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:418)
tests               | 	... 28 more
tests               | ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1714462954618_0029_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1714462954618_0029_2_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1714462954618_0029_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
tests               | 	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:128)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
tests               | 	... 21 more
tests               | 	Suppressed: java.lang.Exception: Query: SELECT count(1) FROM test_large_update_new_true_NONE_8dorovyimr614dd3u3l8 WHERE a = 0
tests               | 		at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
tests               | 		... 22 more
tests               | 
tests               | 2024-04-30 13:38:51 INFO: [118 of 117] io.trino.tests.product.hive.TestHiveTransactionalTable.testMultiColumnDelete [HIVE, TRINO] (Groups: profile_specific_tests, hive_transactional)
presto-master       | 2024-04-30T13:38:51.864+0545	INFO
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

1 participant