
General Retrospective for July 2022 Releases #155

Open
1 of 11 tasks
sxa opened this issue Jul 19, 2022 · 39 comments
Comments

@sxa
Member

sxa commented Jul 19, 2022

Summary

A retrospective for all efforts surrounding the titular releases.

All community members are welcome to contribute to the agenda via comments below.

This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.

On the day of the meeting the agenda items will be iterated over, with a list of actions added in a comment at/near the end.

Invited: Everyone.

Time, Date, and URL

Time:
Date:

URL:
Dial-in:
PIN:

Details

Retrospective Owner Tasks (in order):

  • Post this issue's URL on the #release Slack channel around the start of the new release.
  • Copy actions from the previous retrospective into this issue, while ignoring actions that are ticked or have issue links (i.e. clearly-completed actions).
  • Wait until 90% of all builds have been released, with no clear signs of additional respins.
  • Announce the retrospective Slack call's date + time on #release at least one full week in advance, and send out meeting invites.
  • Host the Slack call for the retrospective, including:
    • Iterating over the actions from the previous retrospective issue, ticking off completed items.
    • Iterating over the agenda, ensuring everything gets debated.
    • Creating a clear list of actions at the end of the retrospective, including volunteer names.
  • Create a new retrospective issue for the next release.
  • Set yourself a calendar reminder so that you remember to commence step 1 (in the new issue) just before the next release.
  • Close this issue.
@sxa
Member Author

sxa commented Jul 19, 2022

Add item into "week before" or "on release week" checklist to explicitly list the aqaReference to be used for the release cycle so it can easily be copied into place when starting the pipelines.

@smlambert
Contributor

re: #155 (comment): noting it will ALWAYS be the latest stable release (https://github.com/adoptium/aqa-tests/releases), as per the instructions in the Overview section of https://adoptium.net/docs/aqavit-verification/: "To verify binaries, testers clone a specified release of the aqa-tests github repository (the latest stable release of the aqa-tests repository)."

Noting that the release notes at the top of the release at https://github.com/adoptium/aqa-tests/releases also indicate which OpenJDK release it maps to:

Release branch for July 2022 CPU [2022-07-19 CPU 18.0.2, 17.0.4, 11.0.16, 8u341] see java.com/releases

@smlambert
Contributor

Because the build pipeline only copies JDK binaries / artifacts to their 'final archival spot' at the end of the pipeline, we lose time on launching TCKs. Example: from the launch of the jdk17 pipeline to the time it completes all platforms and copies artifacts is several hours. Meanwhile some of the child jobs within the platform complete quickly, and testing could proceed without waiting for all platforms to complete.

One way to mitigate this could be to run the 'fast to build' platforms together and separate the 'slow to build' platforms into a second run. I think we decided last retrospective that we should do this, but we launched the jdk17 and jdk11 pipelines with all platforms together.

@zdtsw
Contributor

zdtsw commented Jul 20, 2022

On the day we expect the release, either:
  • run the "DeleteJCKMaterials" job on all machines with all JCK versions, so we have the correct JCK material when we start the Jenkins pipeline and the interactive tests; or
  • log in to all TCK machines and delete /home/jenkins/jck_root, after which the pipeline should re-extract the correct JCK material.

SL/Aug10: referenced in TCK retro issue: https://github.com/temurin-compliance/temurin-compliance/issues/201#issuecomment-1190004838
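The manual fallback described above could be sketched roughly as follows; the helper function name and the per-machine invocation are illustrative assumptions, not the actual "DeleteJCKMaterials" Jenkins job:

```shell
#!/bin/sh
# Hypothetical sketch of the manual cleanup option: remove stale JCK
# material so the pipeline re-extracts the correct version. The function
# name is illustrative; the real tooling is the DeleteJCKMaterials job.
JCK_ROOT="${JCK_ROOT:-/home/jenkins/jck_root}"

clean_jck_root() {
    # Remove the JCK material directory on this machine, if present.
    dir="$1"
    if [ -d "$dir" ]; then
        rm -rf "$dir"
        echo "removed $dir"
    else
        echo "nothing to do: $dir not present"
    fi
}

# On each TCK machine this would be run (e.g. over ssh) as:
#   clean_jck_root "$JCK_ROOT"
```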

@sxa
Member Author

sxa commented Jul 20, 2022

One way to mitigate this could be by running the 'fast to build' platforms together and separating the 'slow to build' platforms into a second run.

We had certainly discussed tiers, but more in priority order. The main issue is that Windows is high priority and slow, so it's hard to justify doing it in a second run. It should also be stated that the artifacts CAN be pulled from the build job earlier, while they are being tested - they just can't be pulled from the top-level openjdkXX-pipeline. That is likely a better option if required (and is what I did for Windows/jdk17).

@smlambert
Contributor

smlambert commented Jul 20, 2022

Put win32 in a second pipeline, since it ALWAYS runs before win64, so it takes all of the build and test resources away from a primary platform.

[ SXA 2022/10/08: Raised an issue for analysis at https://github.com/adoptium/temurin-build/issues/3065 ]

@tellison
Contributor

tellison commented Jul 22, 2022

The release tool needs tests to ensure that the expected set of artefacts are present after a release is made.
In this round the publish of Windows JRE 11 and 17 resulted in no JRE MSI in the releases repository, which was only caught downstream by the container updater checks failing. This should be caught much earlier.

(SXA 2022/10/08 Yellow status for release tool job raised at adoptium/temurin-build#3064)

@sxa
Member Author

sxa commented Jul 23, 2022

Can we lock the artifacts of the individual jdkXX- build jobs when RELEASE=true for the duration of the release cycle, in case installer/signing re-runs are required?

@sxa
Member Author

sxa commented Jul 23, 2022

For the "pre-release" test runs, look at adding a step in the process to delete the jobs afterwards so they are not preserved, since keeping them takes space on the server and potentially causes confusion as to which are the release pipelines (they are likely named differently, but best to avoid any potential confusion).

@smlambert
Contributor

Update the release template to include jdk8 alpine-linux (see #153 (comment)).

@sxa
Member Author

sxa commented Jul 25, 2022

Encourage those involved in the releases to get any absences listed in the status document under the "Planned absences during the release cycle" section

@zdtsw
Contributor

zdtsw commented Jul 26, 2022

Revisit jobs' timeout settings. I saw one extended job set to "Timeout set to expire in 2 days 2 hr", which does not sound reasonable even for an extended job on arm Linux. After a 34-hour run it just hung there until I manually aborted it; by then 36 hours had passed on that node.

@steelhead31
Contributor

For Windows natives, can we install two versions, i.e. both 64-bit and 32-bit, since removing and re-installing them on a single machine takes time? Alternatively, devote some machines to 64-bit and some to 32-bit?

@chadlwilson

The x64 Mac JDK 17 TCK seems to be relatively slow as far as primary platforms go? As a naive community member, I wonder if it is expected and/or worth reflecting on if not?

  • JDK+JRE 17 x64 Linux tarballs appeared ~5 days ago
  • JDK+JRE 17 x64 Windows zips ~4 days ago
  • JDK+JRE 11 x64 Mac tarballs appeared 3 days ago.
  • Still no sign of JDK+JRE 17 x64 Mac tarballs at time of writing, TCK says ⏳. 😢

Waiting for them to appear here, basically.

@smlambert
Contributor

thanks @chadlwilson - re: #155 (comment) it is very much worth reflecting on it. I created this issue (with timeline charts) to assess during the retrospective for this release.

The project goal for delivery of the primary platforms is completion within 2 days of the build completing. Any platform that took longer than that (the ones you called out) needs a critical eye on the various things that can be done differently next time (to be discussed in the public retrospective and the private temurin-compliance retrospective).

@jiekang

jiekang commented Jul 26, 2022

Reminder comment to post-mortem the Windows 64-bit vscode update issue that results in:

java -version
Error: loading: C:\openjdk-8\jre\bin\msvcp140.dll
Error: loading: C:\openjdk-8\jre\bin\server\jvm.dll

@sophia-guo

@sxa suggests: "standard" stuff (updating installer JDK versions, disabling nightly testing) may not need to require a comment or Slack message with "approval to ship during lockdown".

@sophia-guo

aqa-tests testenv.properties needs to take care of jdk8 arm separately.

@sophia-guo

Some of the AQA test build jobs haven't been updated for a long time. When doing the rebuild there are a few issues:

  1. no USE_TESTENV_PROPERTIES parameter
  2. default of SDK_SOURCE is upstream or nightly
  3. CUSTOMIZED_SDK_URL_CREDENTIAL_ID is not set

Those parameters are not consistent, so we have to pay more attention when rebuilding.

@sophia-guo

When the GA is available, and before triggering the pipeline job, update the corresponding lines in the testenv.properties file of the aqa-tests release branch.
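For anyone following this step, the shape of the change is roughly as below; the key names and values are illustrative assumptions, not the actual aqa-tests keys or the July 2022 tags:

```properties
# Hypothetical example - once the OpenJDK GA tag is published, update the
# branch line for each JDK version (values here are illustrative only).
JDK11_OPENJDK_REPO=https://github.com/adoptium/jdk11u
JDK11_BRANCH=jdk-11.0.16+8
```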

@llxia

llxia commented Jul 28, 2022

re #155 (comment), to avoid this problem, I would suggest regenerating the test jobs using aqaAutoGen. It can be done prior to release or with the release.


@smlambert
Contributor

re: #155 (comment) could be done the weekend before, when we do a dry run of the release.

re: #155 (comment) - we have a clunky process and I am not sure how to make it smoother... our release branch & testenv.properties should be created/updated well ahead of release week, but cannot contain the actual release tags for OpenJDK GA until they are present. The moment they are present we can update them; alternatively, we can compare the tag we have with the GA tag and make sure there are no diffs.

@sxa
Member Author

sxa commented Aug 1, 2022

Perhaps adjust order of columns in the status doc to indicate the "natural" order that things would be completed i.e. AQA-TCK-Publish-Installers-Containers instead of AQA-TCK-Installers-Containers-publish as it is currently.

@sxa
Member Author

sxa commented Aug 1, 2022

Ensure that if someone is working on a platform and has a planned absence, someone else is in a position to take over and has been given an appropriate hand-over, or we agree to defer until their return (i.e. we shouldn't have to block progress).

@sxa
Member Author

sxa commented Aug 2, 2022

Understand the following:

@sxa
Member Author

sxa commented Aug 4, 2022

Clarify some of the field descriptions on the create_installer_windows (and maybe other) jobs. It says JDK should be "openj9" or "hotspot", whereas the likely option now is "temurin". Also ensure that examples are provided in both JDK8 and JDK11+ format. Ditto for sign_installer's "FULL_VERSION" parameter. Also, sign_installer cannot directly pick up from create_installer_windows as it uses a hard-coded path which does not include wix/ReleaseDir, which is what comes out of the installer job.

@zdtsw
Contributor

zdtsw commented Aug 4, 2022

Understand how "jfrog" is working, i.e. when we cannot download/install rpm/deb from it:
  • who to contact
  • how to escalate this

@sxa
Member Author

sxa commented Aug 8, 2022

As per #153 (comment) possibly rename the "Containers" heading on the status table to avoid any ambiguity

@sxa
Member Author

sxa commented Aug 10, 2022

The release tool needs tests to ensure that the expected set of artefacts are present after a release is made. In this round the publish of Windows JRE 11 and 17 resulted in no JRE MSI in the releases repository, which was only caught downstream by the container updater checks failing. This should be caught much earlier.

releasCheck.sh should handle this and is now included in the output when performing a release, so if anything has the wrong number of artifacts it will be caught and visible in the log of the release job. (At the beginning of the release cycle it was counting wrongly, as the GPG signature files were not included.) We should probably make the release job show a yellow warning message if the counts are incorrect so it is more visible.
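The kind of count check described could be sketched as below; the function, glob pattern, and expected counts are illustrative assumptions, not what releasCheck.sh actually does:

```shell
#!/bin/sh
# Hypothetical sketch of an artifact-count sanity check. A non-fatal
# warning keeps the release job running but makes the mismatch visible
# (e.g. as a yellow/UNSTABLE status in Jenkins).
check_artifact_count() {
    dir="$1"       # directory containing the published artifacts
    pattern="$2"   # glob, e.g. 'OpenJDK17U-*'
    expected="$3"  # how many files we expect, including .sig files
    actual=$(find "$dir" -maxdepth 1 -name "$pattern" | wc -l)
    if [ "$actual" -ne "$expected" ]; then
        echo "WARNING: expected $expected artifacts matching $pattern, found $actual"
        return 1
    fi
    echo "OK: $actual artifacts matching $pattern"
}
```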

@sxa
Member Author

sxa commented Aug 10, 2022

Noting that the release notes at the top of the release at https://github.com/adoptium/aqa-tests/releases also indicates which OpenJDK release it maps to:

Just had a look at that - it doesn't seem to include v0.9.4-release yet, which is the latest one, but in general that definitely sounds like a good link to add into RELEASING.md. For the purposes of anyone being able to verify or re-run against an old release - for example, if someone does an emergency fix and wants to re-run - I think it would be useful to have the version used explicitly logged in the checklist for each version (for example, the two JDK8u releases in this cycle require different ones). This also fits in with Sophia's comment, which is something else that should go into the release process now:

When ga is available, before trigger the pipeline job update corresponding lines in file testenv.properties of aqa-tests release branch

@sxa
Member Author

sxa commented Aug 10, 2022

Saw one extended job set to "Timeout set to expire in 2 days 2 hr", which does not sound reasonable even for an extended job on arm Linux; after a 34-hour run it just hung there until manually aborted, by which time 36 hours had passed on that node.

If we force those jobs to the Equinix machine it won't be a problem; however, we have the same issue for Solaris/SPARC JDK8, where a full extended run takes around 80 hours. This has also been flagged by Eclipse :-). We need to confirm whether any jobs will fail to run on the other machine, which is still useful for running the interactive tests.

@andrew-m-leonard
Contributor

andrew-m-leonard commented Aug 10, 2022

Need more reliable "release publish" job, to ensure all the correct files get published
adoptium/github-release-scripts#85

@andrew-m-leonard
Contributor

Nightly build smoke tests could be improved to validate the build archives, e.g. do they all exist? Verify the .sig files?

@smlambert
Contributor

The release job can be made 'easier' to use (this was raised in past retros; there are still improvements to be made), even if it's just to add examples for all 3 primary platforms, or all 13 possible platforms, for the regex. The main thing is to remove the chance of manual error.

@smlambert
Contributor

Revisit the release checklist. Many of the listed items had notes that they could be automated. Some of them should be re-thought completely; for example, having to create a PR to disable nightly testing, then another to re-enable it, seems like a way to clog the release workflow with silly little steps. Should we not just click the Jenkins button to disable the entire pipeline (including build)?

@sxa
Member Author

sxa commented Aug 10, 2022

For windows natives, can we install 2 versions, ie both 64 bit and 32 bit, as removing and putting them on a single machine, takes time ?, alternatively devote some machines to 64 bit / some to 32bit?

I agree that would be incredibly useful. We'd need to have both JDKs available for the job that is performing the extraction, or potentially have the ability to run the job in both 32- and 64-bit versions and have the natives written to a different directory for each (which would require equivalent changes so the test jobs know about the different directories).

@smlambert
Contributor

Consider making the release pipeline jobs produce and print a suitable prepopulated release-job URL (with the right tag, upstream job name and ID, and the correct regex) to reduce human error and make it more pleasant for those running the release job.
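The suggestion could look something like this sketch; the job path, parameter names, prefill endpoint, and base URL are all assumptions, not the actual release job's interface:

```shell
#!/bin/sh
# Hypothetical sketch: compose a prepopulated release-job URL from values
# the pipeline already knows. The base URL, endpoint, and parameter names
# are illustrative. NOTE: no URL-encoding is performed here; a real
# implementation would need to encode characters such as '+'.
build_release_url() {
    tag="$1"          # e.g. jdk-17.0.4+8
    upstream_job="$2" # e.g. build-scripts/openjdk17-pipeline
    upstream_id="$3"  # build number of the upstream pipeline
    regex="$4"        # artifact regex for the release job
    base="https://ci.example.org/job/release-openjdk"
    echo "${base}/parambuild?TAG=${tag}&UPSTREAM_JOB_NAME=${upstream_job}&UPSTREAM_JOB_NUMBER=${upstream_id}&ARTIFACTS_TO_COPY=${regex}"
}
```

Printing such a URL at the end of each release pipeline would let the person running the release click through with the fields already filled in, rather than retyping the tag and regex by hand.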

@smlambert
Contributor

Scorecard for interest: #157

@adamfarley
Contributor

All actions in this issue have been copied to #155

@sxa - Please close when you have a moment, thx.
