Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: mergeable check redesign #64093

Merged
merged 6 commits into from
May 21, 2024
Merged

CI: mergeable check redesign #64093

merged 6 commits into from
May 21, 2024

Conversation

maxknv
Copy link
Member

@maxknv maxknv commented May 18, 2024

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Set green Mergeable Check status only after all required checks are passed with success
  • All non-required checks are started at stage Test_3 when all required checks are passed in Test_1/2

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

Modify your CI run

NOTE: If your merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Include tests (required builds will be added automatically):

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Unit tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with Analyzer
  • All with Azure
  • Add your option here

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • Add your option here

Extra options:

  • do not test (only style check)
  • disable merge-commit (no merge from master before tests)
  • disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • 1
  • 2
  • 3
  • 4

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-not-for-changelog This PR should not be mentioned in the changelog label May 18, 2024
@robot-ch-test-poll3
Copy link
Contributor

robot-ch-test-poll3 commented May 18, 2024

This is an automated comment for commit a735ab7 with description of existing statuses. It's updated for the latest CI running

✅ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
PR CheckThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success

@maxknv maxknv force-pushed the ci_mergeable_check_redesign branch from 582981c to e3329a1 Compare May 18, 2024 16:26
@maxknv maxknv self-assigned this May 18, 2024
@maxknv maxknv force-pushed the ci_mergeable_check_redesign branch from ffc1670 to d5eac97 Compare May 18, 2024 17:53
@maxknv maxknv force-pushed the ci_mergeable_check_redesign branch 2 times, most recently from fbbc8fa to 3f5e367 Compare May 19, 2024 15:43
@maxknv maxknv force-pushed the ci_mergeable_check_redesign branch from 3f5e367 to 5698ef6 Compare May 19, 2024 16:02
@maxknv maxknv added this pull request to the merge queue May 21, 2024
Merged via the queue into master with commit f00f551 May 21, 2024
35 checks passed
@maxknv maxknv deleted the ci_mergeable_check_redesign branch May 21, 2024 08:23
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 21, 2024
@azat
Copy link
Collaborator

azat commented Jun 5, 2024

@maxknv do I understand correctly that after this PR if something had failed on a previous stage, the next one will not be run? For instance here #61973, because Stateless tests (release) failed it will not run any subsequent checks?

image

@maxknv
Copy link
Member Author

maxknv commented Jun 6, 2024

@azat yes, correct.

@azat
Copy link
Collaborator

azat commented Jun 6, 2024

Was that the intention?

Right now there are tons of flaky tests that fail on the previous stage and because of this you have to rebase, and almost zero developers, I guess, would fix some tests, wait until it will be merged, and only after this rebase the PR where this flaky test failed. So in other words to me it looks like increased CI resources usage out of from nowhere

@azat azat mentioned this pull request Jun 6, 2024
23 tasks
@maxknv
Copy link
Member Author

maxknv commented Jun 6, 2024

Intention was quite opposite to your current impression of this change. It's aimed to decrease CI load and make CI more effective with more digestible test results at the same time. Though, I understand some ideas can be hard to adopt in our reality.

First stage of testing supposed to be not-flaky. I'm aware there are some flaky tests. They must be fixed or somehow moved to the next test stage. With this approach we know faster that change is not good and before we run huge amount of CI workload. (if there are tons of flaky tests on this stage as you said, it's really bad).

BTW, if some test is failed you don't need to rebase to restart CI. you can just restart it via web interface and successful jobs
will be skipped.

@nikitamikhaylov you wanted to participate in this discussion

@azat
Copy link
Collaborator

azat commented Jun 6, 2024

I do understand the the intention was the opposite, and likely to reduce the load, but the reality is different right now.

Consider for instance this two tiny PRs:

In both of them some tests had been failed, and while they by themselves fixes some tests, they cannot be merged until all 2xx checks will pass, but after this change they will not even started due to some test failures on the previous stage.

BTW, if some test is failed you don't need to rebase to restart CI. you can just restart it via web interface and successful jobs
will be skipped.

I wish I had this button (I do hate rebases without a real purpose), but only those who has write access has it.

@maxknv
Copy link
Member Author

maxknv commented Jun 6, 2024

@nikitamikhaylov I really want to hear other dev' voices here.

@azat I completely understand your point

Both PRs have failed due to the same, presumable flaky, test test_mask_sensitive_info/test.py::test_create_table)
and both PRs are not mergeable after CI run just because of this test. Why we cannot fix it? it's not tons of tests.

NOTE: before my change these 2 PRs would be "un-meargable" as well, we would just run more tests for them. I mean the change does not make more PRs not-mergeable, it only change test execution flow.

It's a shame you don't have restart button, maybe pushing empty (--amend) commit would do the same for you.

@azat
Copy link
Collaborator

azat commented Jun 6, 2024

I do like this change actually, since it does indeed decrease useless CI resources usage (I was always upset about this, even though it has zero relation to myself)

Both PRs have failed due to the same, presumable flaky, test test_mask_sensitive_info/test.py::test_create_table)

Not both, there was another one (#59547)

Why we cannot fix it? it's not tons of tests.

It is not tons (https://github.com/ClickHouse/ClickHouse/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%22flaky+test%22 - right now there is 49, but I'm 100% sure that it is not complete, and I guess something maybe obsolete), but if there will be at least one, at some point, this will be enough.

I can say that this is not the first time this happens, I have couple of other examples, and this is only during this week.
Thought couple of them was due to one more merge with failed jobs (that time it was due to broken clang-tidy build)
Maybe I'm just overreacting and that was just bad timings...

@maxknv
Copy link
Member Author

maxknv commented Jun 6, 2024

Not at all. You raised absolutely valid issue and I appreciate your input here.

Now it's a transition phase in CI and you are not the only one disappointed, tbh. I'm trying to figure out how to deal with current struggles, I wonder if we should aiming at fixing all flakiness in Test_1 CI stage or maybe I need to implement some means to quickly move out flaky test cases to another non-blocking CI stage or rollback everything... @alesapin @nikitamikhaylov

@azat
Copy link
Collaborator

azat commented Jun 6, 2024

I wonder if we should aiming at fixing all flakiness in Test_1 CI stage or maybe I need to implement some means to quickly move out flaky test cases to another non-blocking CI stage or rollback everything

I know that it is not perfect, but maybe it will be OK to add couple of retries (2-3), and if the test passed eventually mark it as FLAKY (there was such a status before, but it has different meaning), and do not fail the whole pipeline if there are flaky tests. And also new tests should not be retried.

And answer to the "should we run the next stage?" should ignore FLAKY tests, but the status of check itself could be marked as FAILED I guess (though not sure that it easy to do right now).

Thoughts?

@azat
Copy link
Collaborator

azat commented Jun 6, 2024

And one more question, @maxknv is it possible to run all the stages with "CI settings" somehow?

@maxknv
Copy link
Member Author

maxknv commented Jun 7, 2024

Thoughts?

Could be an option too. We'll discuss it in the team

is it possible to run all the stages with "CI settings" somehow?

I think it's doable to add this feature. (All Test jobs will run in the single stage and this will be an equivalent of old behaviour)
What you can do now is selecting "Allow: All NOT Required check" and this will skip Fast Tests and whole Test_1 stage and next stage will run. Or you can select specific job(s) "Allow: Performance test"

@@ -447,9 +447,7 @@ def set_mergeable_check(
)


def update_mergeable_check(
commit: Commit, pr_info: PRInfo, check_name: str
) -> Optional[CommitStatus]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine the long-projected refactoring to make the whole pipeline mergeable check -> a sync with accurate propagating proper descriptions was removed so easily.

It's just a the whole reverting https://github.com/ClickHouse/ClickHouse/pull/61464/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants