General Retrospective for July 2022 Releases #155
Comments
Add item into "week before" or "on release week" checklist to explicitly list the aqaReference to be used for the release cycle so it can easily be copied into place when starting the pipelines. |
re: #155 (comment): note that it will ALWAYS be the latest stable release (https://github.com/adoptium/aqa-tests/releases), as per the instructions at https://adoptium.net/docs/aqavit-verification/ in the Overview section: "To verify binaries, testers clone a specified release of the aqa-tests github repository (the latest stable release of the aqa-tests repository)." The release notes at the top of each release at https://github.com/adoptium/aqa-tests/releases also indicate which OpenJDK release it maps to:
|
Because the build pipeline only copies JDK binaries / artifacts to their 'final archival spot' at the end of the pipeline, we lose time launching TCKs. Example: from the launch of the jdk17 pipeline to the point where it has completed all platforms and copied the artifacts takes several hours. Meanwhile, some of the child jobs within the platform complete quickly, and testing could proceed without waiting for all platforms to complete. One way to mitigate this could be to run the 'fast to build' platforms together and separate the 'slow to build' platforms into a second run. I think we decided at the last retrospective that we should do this, but we launched jdk17 and jdk11 with all platforms together. |
On the day when we expect release, SL/Aug10: referenced in TCK retro issue: https://github.com/temurin-compliance/temurin-compliance/issues/201#issuecomment-1190004838 |
We had certainly discussed tiers, but it was more in priority order. The main issue is that Windows is high priority and slow, so it's hard to justify doing it in a second run. It should also be stated that the artifacts CAN be pulled from the build job earlier, while they are being tested; they just can't be pulled from the top level openjdkXX-pipeline. That is likely a better option if required (and is what I did for windows/jdk17). |
Put win32 in a second pipeline, since it ALWAYS runs before win64 and so takes all of the build and test resources away from a primary platform. [ SXA 2022/10/08: Raised an issue for analysis at https://github.com/adoptium/temurin-build/issues/3065 ] |
The release tool needs tests to ensure that the expected set of artefacts is present after a release is made. (SXA 2022/10/08 Yellow status for release tool job raised at adoptium/temurin-build#3064) |
Can we lock the artifacts of the individual jdkXX- build jobs when RELEASE=true for the duration of the release cycle, in case installer/signing re-runs are required? |
For the "pre-release" test runs, look at adding a step in the process to delete the jobs afterwards so they are not preserved. Keeping them takes space on the server and potentially causes confusion as to which are the release pipelines (they are likely named differently, but it is best to avoid any potential confusion). |
Update the release template to include jdk8 alpine-linux (see #153 (comment)). |
Encourage those involved in the releases to get any absences listed in the status document under the "Planned absences during the release cycle" section |
Revisit the jobs' timeout settings. |
For Windows natives, can we install 2 versions, i.e. both 64-bit and 32-bit, since removing and reinstalling them on a single machine takes time? Alternatively, could we devote some machines to 64-bit and some to 32-bit? |
The x64 Mac JDK 17 TCK seems to be relatively slow as far as primary platforms go? As a naive community member, I wonder if it is expected and/or worth reflecting on if not?
Waiting for them to appear here, basically. |
Thanks @chadlwilson - re: #155 (comment): it is very much worth reflecting on. I created this issue (with timeline charts) to assess during the retrospective for this release. The project goal for delivery of the primary platforms is completion within 2 days of the build completing. Any platform that took longer than that (including the ones you called out) needs a critical eye on what can be done differently next time (to be discussed in the public retrospective and the private temurin-compliance retrospective). |
Reminder comment to post-mortem Windows 64 bit vscode update issue that results in
|
@xsa suggest: "standard" stuff (updating installer jdk versions, disabling nightly testing) may not need the comment or Slack message with "approval to ship during lockdown". |
Aqa-tests testenv.properties need to take care of jdk8 arm separately |
Some of the aqa test build jobs haven't been updated for a long time. When doing the rebuild there are a few issues.
Those parameters are not consistent, so we have to pay more attention when rebuilding. |
When the GA is available, and before triggering the pipeline job, update the corresponding lines in the testenv.properties file of the aqa-tests release branch. |
re #155 (comment), to avoid this problem, I would suggest regenerating the test jobs using aqaAutoGen. It can be done prior to release or with the release. |
re: #155 (comment) could be done the weekend before, when we do a dry run of the release. re: #155 (comment) - we have a clunky process and I am not sure how to make it smoother. Our release branch & testenv.properties should be created/updated well ahead of release week, but cannot contain the actual release tags for openjdk GA until they are present. Once they are present, we can update them; alternatively, we can compare the tag we have with the GA tag and make sure there are no diffs. |
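The testenv.properties update discussed above could be scripted along these lines. This is only a sketch: it assumes the file is plain `KEY=value` lines, and the key name and tag used in the usage note (`JDK17_BRANCH`, `example-ga-tag`) are hypothetical placeholders, not the real contents of the aqa-tests file.

```python
import re


def update_properties(text, updates):
    """Return `text` with the value replaced for every KEY=value line whose key is in `updates`.

    Comments, blank lines, and other keys are kept unchanged.
    Raises KeyError if a requested key is not present, so a typo is not silently ignored.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z0-9_.]+)=(.*)$", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"keys not found: {sorted(missing)}")
    return "\n".join(out) + "\n"
```

Usage would be something like `update_properties(Path("testenv.properties").read_text(), {"JDK17_BRANCH": "example-ga-tag"})`, writing the result back once the GA tag exists. Failing on a missing key is a deliberate choice here, so that a mistyped key is caught before the pipelines are triggered.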
Perhaps adjust order of columns in the status doc to indicate the "natural" order that things would be completed i.e. AQA-TCK-Publish-Installers-Containers instead of AQA-TCK-Installers-Containers-publish as it is currently. |
Ensure that if someone is working on a platform and has a planned absence that someone else is in a position to take over and given appropriate hand-over, or we agree to defer until their return (i.e. we shouldn't have to block progress) |
Understand the following:
|
Clarify some of the field descriptions on the create_installer_windows (maybe others) jobs. It says JDK should be "openj9" or "hotspot" whereas the likely option now is "temurin". Also ensure that examples are provided in JDK8 and JDK11+ format. Ditto for sign_installer's "FULL_VERSION" parameter. Also sign_installer cannot directly pick up from create_installer_windows as it uses a hard-coded path which does not include
Understand how "jfrog" is working, i.e: when we cannot download/install rpm/deb from it |
As per #153 (comment) possibly rename the "Containers" heading on the status table to avoid any ambiguity |
releasCheck.sh should handle this and is now included in the output when performing a release, so if anything has the wrong number of artifacts it will be caught and visible in the log of the release job. (At the beginning of the release cycle it was counting wrongly, as the GPG signature files were not included.) We should probably make the release job show a yellow warning message if the counts are incorrect, so it is more visible. |
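As an illustration of the count check with a visible warning, here is a minimal sketch. It is not the existing script's logic: the function name, the `_platform_` filename matching, and the expected counts are all assumptions made for the example.

```python
def check_artifact_counts(filenames, expected):
    """Compare the number of files per platform against the expected count.

    `expected` maps a platform string (hypothetical, e.g. "x64_linux") to the
    number of files that should exist for it, signature files included.
    Returns ("OK", []) or ("WARNING", [list of mismatch descriptions]).
    """
    counts = {platform: 0 for platform in expected}
    for name in filenames:
        for platform in expected:
            # Assumes the platform appears in the filename delimited by underscores.
            if f"_{platform}_" in name:
                counts[platform] += 1
                break
    problems = [
        f"{platform}: expected {expected[platform]}, found {counts[platform]}"
        for platform in expected
        if counts[platform] != expected[platform]
    ]
    return ("WARNING" if problems else "OK"), problems
```

A release job could map the "WARNING" result to an unstable (yellow) build result rather than only writing the mismatch to the log.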
Just had a look at that - it doesn't seem to include
|
If we force those jobs to the equinix machine it won't be a problem; however, we have the same issue for Solaris/SPARC JDK8, where a full extended run takes around 80 hours. This has also been flagged by Eclipse :-). Need to confirm if any jobs will fail to run on the other machine, which is still useful for running the interactive tests. |
Need more reliable "release publish" job, to ensure all the correct files get published |
Nightly build smoke tests could be improved to validate build archives, e.g. do they all exist? Can the .sig be verified? |
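A smoke check of this kind might look like the sketch below. It only checks that each archive is non-empty and has a companion `.sig` file next to it; it does not perform real GPG verification, and the archive suffixes and directory layout are assumptions.

```python
from pathlib import Path

# Assumed archive types; adjust to whatever the build actually produces.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip")


def find_archive_problems(directory):
    """Return a list of human-readable problems found in `directory`.

    An empty list means every archive is non-empty and has a matching
    `<archive>.sig` file alongside it.
    """
    directory = Path(directory)
    archives = sorted(
        p for p in directory.iterdir() if p.name.endswith(ARCHIVE_SUFFIXES)
    )
    problems = []
    if not archives:
        problems.append("no archives found")
    for archive in archives:
        if archive.stat().st_size == 0:
            problems.append(f"{archive.name}: empty file")
        signature = archive.with_name(archive.name + ".sig")
        if not signature.is_file():
            problems.append(f"{archive.name}: missing .sig")
    return problems
```

Actually verifying the signature would need the signing public key and a GPG call, which is left out here.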
The release job can be made 'easier' to use (this was raised in past retros, and there are still improvements to be made), even if it's only to add examples for all 3 primary platforms, or all 13 possible platforms, for the regex. The main thing is to remove the chance of manual error. |
Revisit the release checklist. Many of the listed items had notes that they could be automated. Some of them should be rethought completely. For example, having to create a PR to disable nightly testing, then another to re-enable nightly testing, seems like a way to clog the release workflow with silly little steps. Should we not just click the Jenkins button to disable the entire pipeline (including build)? |
I agree that would be incredibly useful. We'd need to have both JDKs available for the job that is performing the extraction, or potentially have the ability to run the job in both 32 and 64-bit version and have the natives write to a different directory for each (which would require equivalent changes to the test jobs to know about the different directories) |
Consider making the release pipeline jobs produce and print a suitable prepopulated release job URL (with the right tag, upstream job name and ID, and correct regex) to reduce human error and make it more pleasant for those running the release job. |
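A rough sketch of how such a URL could be generated follows. Everything specific in it is an assumption: the parameter names (`TAG`, `UPSTREAM_JOB_NAME`, `UPSTREAM_JOB_NUMBER`, `ARTIFACTS_TO_COPY`) are illustrative, and the `parambuild` path assumes the Jenkins instance has a form endpoint that accepts query parameters (for example via the Build With Parameters plugin).

```python
from urllib.parse import urlencode


def release_job_url(jenkins_base, job_path, tag, upstream_job_name,
                    upstream_job_number, artifacts_regex):
    """Build a URL that opens the release job form with its fields prefilled.

    Parameter names below are hypothetical; substitute the real names used
    by the release job.
    """
    params = {
        "TAG": tag,
        "UPSTREAM_JOB_NAME": upstream_job_name,
        "UPSTREAM_JOB_NUMBER": str(upstream_job_number),
        "ARTIFACTS_TO_COPY": artifacts_regex,
    }
    # urlencode percent-escapes characters such as "+", "/" and "*" in the values.
    query = urlencode(params)
    return f"{jenkins_base.rstrip('/')}/{job_path.strip('/')}/parambuild?{query}"
```

The pipeline could print the result at the end of each platform build, so the person running the release only has to review the prefilled values instead of typing them.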
Scorecard for interest: #157 |
Summary
A retrospective for all efforts surrounding the titular releases.
All community members are welcome to contribute to the agenda via comments below.
This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.
On the day of the meeting the agenda items will be iterated over, with a list of actions added in a comment at/near the end.
Invited: Everyone.
Time, Date, and URL
Time:
Date:
URL:
Dial-in:
PIN:
Details
Retrospective Owner Tasks (in order):