You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderMetrics#testTimelineReaderMetrics does not perform a source unregistration after test execution, so the TimelineReaderMetrics.getInstance() call in repeated runs will throw an error since the metrics source TimelineReaderMetrics already exists.
Error message in the 2nd run:
org.apache.hadoop.yarn.server.federation.store.metrics.TestFederationStateStoreClientMetrics#testSuccessfulCalls retrieves the historical number of successful calls, but does not retrieve the historical average latency of those calls. For example, it asserts FederationStateStoreClientMetrics.getLatencySucceededCalls() is 100 after the goodStateStore.registerSubCluster(100); call. However, in the second execution of the test, 2 historical calls from the first execution (with latency 100 and 200 respectively) has already been recorded, so FederationStateStoreClientMetrics.getLatencySucceededCalls() will be 133.33... (mean of 100, 200 and 100)
Error message in the 2nd run:
java.lang.AssertionError: expected:<100.0> but was:<133.33333333333334>
Fix: Retrieve existing latency data and use them for calculation.
All tests in TestTaskRunner
All tests in org.apache.hadoop.yarn.sls.scheduler.TestTaskRunner are not idempotent and fails upon repeated execution within the same JVM instance due to self-induced state pollution. Specifically, the test runs made changes to the static fields (e.g. PreStartTask.first in the task classes without restoring them. Therefore, repeated runs throw assertion errors.
Sample error message of TestTaskRunner#testPreStartQueueing in repeated test run:
java.lang.AssertionError:
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.apache.hadoop.yarn.sls.scheduler.TestTaskRunner.testPreStartQueueing(TestTaskRunner.java:244)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
The fix is done by explicitly setting (resetting) the static variables (countdown latches and booleans) at the start of each test, so that each test runs on a fresh state.
How was this patch tested?
After the patch, rerunning the tests in the same JVM does not produce any exceptions.
@kaiyaok2 Thank you for your contribution! but it does indeed introduce some noise (I have already seen at least 4 similar pull requests). Can we find all the relevant issues and fix them together?
@kaiyaok2 Thank you for your contribution! but it does indeed introduce some noise (I have already seen at least 4 similar pull requests). Can we find all the relevant issues and fix them together?
@slfan1989 Thanks for reviewing! I merged changes in #6794 to this PR (#6793). Now all fixes to the MapReduce module are in #6785 , and all fixes to the Yarn module are in this PR (#6793).
@slfan1989 I fixed the 2 "line too long" issues.
The remaining 5 are in fact existing issues (as I changed public static CountDownLatch latch = new CountDownLatch(1); to public static CountDownLatch latch;). I don't think it's necessary to turn latch private and add getter methods, since these classes are only used in testing.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of PR
This PR fixes non-idempotent unit tests detected in the Yarn module. These tests pass in the first run but fails in the second run in the same JVM.
TestTimelineReaderMetrics.testTimelineReaderMetrics()
org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderMetrics#testTimelineReaderMetrics
does not perform a source unregistration after test execution, so theTimelineReaderMetrics.getInstance()
call in repeated runs will throw an error since the metrics sourceTimelineReaderMetrics
already exists.Error message in the 2nd run:
Fix: Unregister
"TimelineReaderMetrics"
before the test.TestFederationStateStoreClientMetrics.testSuccessfulCalls()
org.apache.hadoop.yarn.server.federation.store.metrics.TestFederationStateStoreClientMetrics#testSuccessfulCalls
retrieves the historical number of successful calls, but does not retrieve the historical average latency of those calls. For example, it assertsFederationStateStoreClientMetrics.getLatencySucceededCalls()
is 100 after thegoodStateStore.registerSubCluster(100);
call. However, in the second execution of the test, 2 historical calls from the first execution (with latency 100 and 200 respectively) has already been recorded, soFederationStateStoreClientMetrics.getLatencySucceededCalls()
will be 133.33... (mean of 100, 200 and 100)Error message in the 2nd run:
Fix: Retrieve existing latency data and use them for calculation.
All tests in TestTaskRunner
All tests in
org.apache.hadoop.yarn.sls.scheduler.TestTaskRunner
are not idempotent and fails upon repeated execution within the same JVM instance due to self-induced state pollution. Specifically, the test runs made changes to the static fields (e.g.PreStartTask.first
in the task classes without restoring them. Therefore, repeated runs throw assertion errors.Sample error message of
TestTaskRunner#testPreStartQueueing
in repeated test run:The fix is done by explicitly setting (resetting) the static variables (countdown latches and booleans) at the start of each test, so that each test runs on a fresh state.
How was this patch tested?
After the patch, rerunning the tests in the same JVM does not produce any exceptions.