CI Failure (timeout on partition move) in NodesDecommissioningTest.test_decommissioning_rebalancing_node #13522

Closed
VladLazar opened this issue Sep 18, 2023 · 4 comments · Fixed by #18426
Labels: area/replication, ci-failure, ci-rca/test (CI Root Cause Analysis - Test Issue), kind/bug (Something isn't working), team/replication (helper for jira sync)

Comments

VladLazar commented Sep 18, 2023

https://buildkite.com/redpanda/redpanda/builds/37054

Module: rptest.tests.nodes_decommissioning_test
Class: NodesDecommissioningTest
Method: test_decommissioning_rebalancing_node
Arguments: {
    "shutdown_decommissioned": true
}
test_id:    NodesDecommissioningTest.test_decommissioning_rebalancing_node
status:     FAIL
run time:   67.334 seconds

TimeoutError('')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/nodes_decommissioning_test.py", line 629, in test_decommissioning_rebalancing_node
    wait_until(lambda: self._partitions_moving(node=first_node),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError
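
The empty `TimeoutError('')` above is consistent with the polling helper giving up without a custom error message. The following is a minimal sketch of that polling pattern, not ducktape's actual implementation: `WaitTimeoutError` and `partitions_moving` are hypothetical stand-ins, and only the `err_msg() if callable(err_msg) else err_msg` expression is taken from the traceback.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical stand-in for ducktape.errors.TimeoutError."""


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it is truthy or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    # err_msg may be a string or a zero-argument callable, as in the traceback.
    raise WaitTimeoutError(err_msg() if callable(err_msg) else err_msg)


def partitions_moving():
    # Hypothetical condition: the rebalance has already finished,
    # so no partition is ever observed moving.
    return False


try:
    wait_until(partitions_moving, timeout_sec=0.3, backoff_sec=0.05)
except WaitTimeoutError as e:
    caught = repr(e)
    print(caught)
```

If the condition never becomes true and no `err_msg` is supplied, the exception carries an empty string, which matches the shape of the failure reported here.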

JIRA Link: CORE-2263

@mmaslankaprv mmaslankaprv self-assigned this Sep 22, 2023
@mmaslankaprv

Fixed with: #13616

@michael-redpanda michael-redpanda added the team/replication helper for jira sync label Apr 26, 2024
@mmaslankaprv mmaslankaprv added the ci-rca/test CI Root Cause Analysis - Test Issue label May 13, 2024
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue May 13, 2024
The `test_decommissioning_rebalancing_node` test case checks whether a
node that has joined the cluster, and has had partitions assigned to it
while data is being rebalanced, can be successfully decommissioned.

The test was flaky because all partition rebalance actions sometimes
finished before the test validated that the rebalance had started.

Added a condition that waits for more data before adding the node to
the cluster, so that the rebalance lasts long enough for the
decommission to interrupt it.

Fixes: redpanda-data#13522

Signed-off-by: Michał Maślanka <michal@redpanda.com>
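
The race described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the test's actual code: `moves_in_progress`, `enough_data`, the tick-based clock, and every number below are hypothetical and chosen only to show why more data keeps the rebalance observable.

```python
import math


def moves_in_progress(partition_bytes, partitions_to_move, bytes_per_tick, tick):
    """Toy model: partition moves still unfinished after `tick` time units."""
    total = partition_bytes * partitions_to_move
    remaining = max(0, total - bytes_per_tick * tick)
    return math.ceil(remaining / partition_bytes)


def enough_data(partition_sizes, min_bytes):
    """Hypothetical guard: every partition holds at least `min_bytes`."""
    return all(size >= min_bytes for size in partition_sizes)


CHECK_TICK = 5         # when the test first looks for moving partitions
BYTES_PER_TICK = 4096  # assumed recovery throughput

# Flaky case: partitions are tiny, so every move is done before the check.
flaky = moves_in_progress(1024, 10, BYTES_PER_TICK, CHECK_TICK)

# Fixed case: the node is added only once partitions are large enough.
sizes = [102_400] * 10
assert enough_data(sizes, min_bytes=100_000)
fixed = moves_in_progress(102_400, 10, BYTES_PER_TICK, CHECK_TICK)

print(flaky, fixed)
```

In the flaky case the check observes no moving partitions and would eventually time out; in the fixed case the moves are still in flight when the check runs, which gives the decommission something to interrupt.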
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue May 14, 2024
(cherry picked from commit eb26708)