[Feature Request] Log the actual exception for InternalTestCluster shardLock failure #13628

Open
Hailong-am opened this issue May 11, 2024 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request flaky-test Random test failure that succeeds on second run Other :test Adding or fixing a test

Comments

Hailong-am commented May 11, 2024

Is your feature request related to a problem? Please describe

While troubleshooting a flaky test in opensearch-project/ml-commons#2436, I hit the test failure shown below.

The error message says that shard [.plugins-ml-config][0] is still locked, but no reason or details are logged, so it is hard to find out why the shard is still locked.

java.lang.AssertionError: Shard [.plugins-ml-config][0] is still locked after 5 sec waiting
        at __randomizedtesting.SeedInfo.seed([CC4F8C51F54E62C2:CF7AF5C5193A2F89]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2773)

for (ShardId id : shardIds) {
    try {
        env.shardLock(id, "InternalTestCluster assert after test", TimeUnit.SECONDS.toMillis(5)).close();
    } catch (ShardLockObtainFailedException ex) {
        fail("Shard " + id + " is still locked after 5 sec waiting");
    }
}

Describe the solution you'd like

Since a shard lock records the details (the reason) for which the shard was locked, it would be better to log the exception, which contains the details of the existing lock, and then fail the test case.

/**
 * Tries to lock the given shards ID. A shard lock is required to perform any kind of
 * write operation on a shards data directory like deleting files, creating a new index writer
 * or recover from a different shard instance into it. If the shard lock can not be acquired
 * a {@link ShardLockObtainFailedException} is thrown.
 * <p>
 * Note: this method will return immediately if the lock can't be acquired.
 *
 * @param id the shard ID to lock
 * @param details information about why the shard is being locked
 * @return the shard lock. Call {@link ShardLock#close()} to release the lock
 */
public ShardLock shardLock(ShardId id, final String details) throws ShardLockObtainFailedException {
    return shardLock(id, details, 0);
}
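
To illustrate why the exception can name the current holder, here is a simplified, self-contained model of a lock that remembers the `details` string of whoever acquired it. This is only a sketch and not the actual `NodeEnvironment` code: the class `DetailedLock`, the exception `LockObtainFailed`, and the exact message format are stand-ins made up for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a shard lock that remembers why it was taken.
final class DetailedLock {

    // Hypothetical stand-in for ShardLockObtainFailedException.
    static final class LockObtainFailed extends RuntimeException {
        LockObtainFailed(String message) {
            super(message);
        }
    }

    private final Semaphore mutex = new Semaphore(1);
    private volatile String holderDetails = "";
    private volatile long acquiredAtNanos;

    // Tries to take the lock within timeoutMillis.
    // A timeout of 0 fails immediately if the lock is already held.
    void acquire(String details, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = mutex.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new LockObtainFailed("interrupted while obtaining lock for [" + details + "]");
        }
        if (!acquired) {
            // Report who holds the lock and for how long, so the caller can see the reason.
            long ageMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - acquiredAtNanos);
            throw new LockObtainFailed(
                "obtaining lock for [" + details + "] timed out after [" + timeoutMillis
                    + "ms], lock already held for [" + holderDetails + "] with age [" + ageMillis + "ms]"
            );
        }
        // Remember the reason and the time of acquisition for later failure messages.
        holderDetails = details;
        acquiredAtNanos = System.nanoTime();
    }

    void release() {
        mutex.release();
    }
}
```

In this model, a second `acquire` call while the lock is held fails with a message that names the first caller's `details`, which is the information that gets lost when the exception is swallowed.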

try {
    env.shardLock(id, "InternalTestCluster assert after test", TimeUnit.SECONDS.toMillis(5)).close();
} catch (ShardLockObtainFailedException ex) {
    // Proposed: log the exception here, since it carries the details that help with troubleshooting
    // logger.error(ex);
    fail("Shard " + id + " is still locked after 5 sec waiting");
}

The logged output should look similar to the one below. The part "lock already held for [starting shard]" is the reason why the shard is still locked.

org.opensearch.env.ShardLockObtainFailedException: [.plugins-ml-config][0]: obtaining shard lock for [InternalTestCluster assert after test] timed out after [5000ms], lock already held for [starting shard] with age [20365ms]
	at org.opensearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:877) ~[opensearch-2.15.0-SNAPSHOT.jar:2.15.0-SNAPSHOT]
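
A possible alternative to a logger call is to attach the exception as the cause of the `AssertionError`, so that the lock details appear in the stack trace of the test report itself. Below is a minimal, self-contained sketch of that pattern. The names `ShardLockAssertion`, `LockFailure`, and `assertNotLocked` are hypothetical stand-ins and not OpenSearch APIs; only the `AssertionError(String, Throwable)` constructor is standard Java.

```java
// Hypothetical helper showing how to keep the lock failure as the cause of the test failure.
final class ShardLockAssertion {

    // Hypothetical stand-in for ShardLockObtainFailedException.
    static final class LockFailure extends RuntimeException {
        LockFailure(String message) {
            super(message);
        }
    }

    // Runs the lock attempt and converts a failure into an AssertionError
    // that keeps the original exception as its cause.
    static void assertNotLocked(String shardId, Runnable lockAttempt) {
        try {
            lockAttempt.run();
        } catch (LockFailure ex) {
            throw new AssertionError("Shard " + shardId + " is still locked after 5 sec waiting", ex);
        }
    }
}
```

With this approach the failure message stays the same, and the "Caused by:" section of the stack trace would carry the details of the existing lock.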

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response

@Hailong-am Hailong-am added enhancement Enhancement or improvement to existing feature or request untriaged labels May 11, 2024
@github-actions github-actions bot added the Other label May 11, 2024
@peternied peternied added :test Adding or fixing a test flaky-test Random test failure that succeeds on second run and removed untriaged labels May 15, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@Hailong-am Thanks for creating this issue, could you create a pull request for this issue?
