reproducible comparison jobs for other platforms are taking up Linux/x64 nodes #906

Closed
sxa opened this issue Jan 31, 2024 · 4 comments · Fixed by #910
Labels: docker, linux, mac, windows, x64

Comments

@sxa
Member

sxa commented Jan 31, 2024

What are you trying to do? Use the dockerBuild linux executors (or in today's case, mark the host offline for temporary maintenance)

Expected behaviour: Executors are available

Observed behaviour: Executors are tied up with reproduce_compare jobs for other platforms as per the following screenshot:
image
It appears that we require a node with label dockerBuild&&linux&&x64 regardless of which reproducible build platform we are running.
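
To illustrate why that label ties up the xLinux machines, here is a minimal sketch of how an `&&` label expression selects nodes. The node names and their label sets are hypothetical; only the `dockerBuild&&linux&&x64` expression comes from this issue. Real Jenkins label expressions also support other operators, which this sketch ignores.

```python
def node_matches(label_expr, node_labels):
    """Return True if the node carries every label in an '&&'-joined expression."""
    required = {part.strip() for part in label_expr.split("&&") if part.strip()}
    return required <= set(node_labels)


def eligible_nodes(label_expr, inventory):
    """List the names of nodes whose labels satisfy the expression."""
    return sorted(
        name for name, labels in inventory.items()
        if node_matches(label_expr, labels)
    )


# Hypothetical inventory, for illustration only.
NODES = {
    "docker-x64-1": {"dockerBuild", "linux", "x64"},
    "mac-aarch64-1": {"mac", "aarch64"},
    "s390x-1": {"linux", "s390x"},
}

# Label expression quoted in this issue.
PREPARE_LABEL = "dockerBuild&&linux&&x64"

if __name__ == "__main__":
    # The compared platform never enters the lookup, so every platform's
    # job resolves to the same Linux/x64 node.
    for platform in ("mac", "windows", "linux-s390x"):
        print(platform, "->", eligible_nodes(PREPARE_LABEL, NODES))
```

Because the expression is fixed, the set of eligible nodes is the same whichever platform is being compared.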

Any other comments:
While #905 covers a potential deadlock in the reproduce_compare jobs for xLinux, this is a separate issue to discuss the use of xLinux executors for other platforms, which places unnecessary load on those systems at a time when we are reducing capacity as a result of the Equinix removal.

Also, in the current situation the reproducible compare jobs continue even after their parent pipeline is complete, so mac compare 96 from the screenshot above is still waiting although its parent pipeline has ended. A similar situation is happening for Windows, where the windows compare job was initiated by the aborted parent #904. Similarly, for Linux/s390x the compare job was initiated by a parent job which has completed. In all three cases another subsequent pipeline is running, which blocks the compare job from running until that new one is complete.

Note that in each of these situations the parent job was in a different state (Aborted for Windows, Failed for mac, warnings for Linux/s390x), so the parent's state does not seem to affect the behaviour.

github-actions bot added the docker, linux, mac, windows and x64 labels on Jan 31, 2024
@sophia-guo
Contributor

Workflow of the reproduce_compare jobs :

  • Start: triggered by the parent build job (it does not block or hold the parent job, so the reproduce_compare job might only just be starting when the parent job completes).
  • Prepare: grab the parent job's artifacts, trigger a second build job, wait until the triggered job finishes, and archive the artifacts. (This stage runs on a node with the label dockerBuild&&linux&&x64. It does not need to run on the reproducible build platform. Any agent works for that, so we can also use the 'worker' node. The reason for using dockerBuild&&linux&&x64 might be that the label is used for jobs that are neither build nor test jobs, as both code-tool and installer use it. Perhaps dynamic azure docker was being used to reduce the load on build agents at that time, and the situation is different now?)
  • Compare: compare the JDKs, which requires platform-dependent agents.
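
The stage split above could be sketched roughly as follows. This is an illustration only, not the actual pipeline code: the `Stage` type, the `label_for` helper, the `worker` label string and the example platform label are all assumptions. Only the stage names and the idea that Prepare is platform-independent while Compare is not come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    needs_target_platform: bool  # True if the stage must run on the compared platform


# Prepare only fetches artifacts, triggers a build and waits, so any agent will do.
PREPARE = Stage("Prepare", needs_target_platform=False)
# Compare inspects the built JDK, so it needs a platform-dependent agent.
COMPARE = Stage("Compare", needs_target_platform=True)

# Assumed label for a generic node with many executors (hypothetical value).
GENERIC_LABEL = "worker"


def label_for(stage, platform_label, generic_label=GENERIC_LABEL):
    """Pick the node label a stage should request."""
    return platform_label if stage.needs_target_platform else generic_label


if __name__ == "__main__":
    mac_label = "mac&&aarch64"  # hypothetical platform label
    for stage in (PREPARE, COMPARE):
        print(stage.name, "->", label_for(stage, mac_label))
```

With this split, only the Compare stage competes for platform-specific executors, and the long wait in Prepare is held on a generic node.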

@sophia-guo
Contributor

That being said, the Prepare stage can run on any node. I thought linux_x64 nodes tended to be a more abundant resource than those of other platforms.

@smlambert
Contributor

Re: #906 (comment): that used to be true, but it is no longer the case (we no longer have access to the Equinix machines; related: adoptium/infrastructure#3352).

@sxa
Member Author

sxa commented Jan 31, 2024

Workflow of the reproduce_compare jobs :

Thanks - that really helps :-)

Yep - if it's just running pipeline code (as opposed to calling too much stuff in shell) then it probably doesn't matter what architecture of machine it is, so I suspect just linux would be good enough. As Shelley says, the Linux/x64 ones are likely to be in short supply. (In fact at this moment we only have one active, so the Linux/x64 compare builds will probably block trying to find a different build node tonight, but I will get that fixed later.)

Any agent works for that, so we can also use the 'worker' node.

Yes I expect we can probably run it on the https://ci.adoptium.net/computer/jenkins%2Dworker/ node by selecting one of the labels on there, which I think we do elsewhere in the build pipelines. That machine has a large number of executors on it so is ideal for this sort of thing and won't risk blocking any of the machines that are used for other build work.
