General Retrospective for July 2023 Releases #240

Closed
11 tasks done
adamfarley opened this issue Jul 14, 2023 · 28 comments

@adamfarley
Contributor

adamfarley commented Jul 14, 2023

Summary

A retrospective for all efforts surrounding the titular releases.

All community members are welcome to contribute to the agenda via comments below.

This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.

On the day of the meeting the agenda items will be iterated over, with a list of actions added in a comment at/near the end.

Invited: Everyone.

Time, Date, and URL

Time: 13:30 GMT, 08:30 EDT
Date: 2023/08/02

URL: https://eclipse.zoom.us/j/81872484707?pwd=dGl1QzcrTllWUkNWRUVGNzdYVUx4dz09

Details

Retrospective Owner Tasks (in order):

  • Post this issue's URL on the #Release Slack channel around the start of the new release.
  • Copy actions from the previous retrospective into this issue, while ignoring actions that are ticked or have issue links (i.e. clearly-completed actions).
  • Wait until 90% of all builds have been released, with no clear signs of additional respins.
  • Announce the retrospective Slack call's date + time on #Release at least one full week in advance, and send out meeting invites.
  • Host the Slack call for the retrospective, including:
    • Iterating over the actions from the previous retrospective issue, ticking off completed items.
    • Iterating over the agenda, ensuring everything gets debated.
    • Creating a clear list of actions at the end of the retrospective, including volunteer names.
  • Create a new retrospective issue for the next release.
  • Set yourself a calendar reminder so that you remember to commence step 1 (in the new issue) just before the next release.
  • Close this issue.

Actions
Actions from previous retrospective
Actions from this retrospective

@adamfarley
Contributor Author

I noticed the new JDK21 ea-tag-triggered pipelines do not run all of the tests that used to be present in the JDK21 weekly pipelines.

The missing test targets are:

  • special.functional
  • extended.openjdk
  • extended.perf

After the release, can we add these tests to the JDK21 ea-tag-triggered pipelines?

@adamfarley
Contributor Author

ci.adoptium.net logs me out several times a day. Is it possible to extend the login period? 12 hours or more would be great, but any increase would help.

@adamfarley
Contributor Author

adamfarley commented Jul 17, 2023

The release checklist template needs to be updated to consolidate the following checks, as the same action handles both.

  • Disable standard & evaluation nightly builds completely
  • Disable standard & evaluation weekend builds completely

Update: It appears that these checklist entries are factored incorrectly. See this table:

standard nightly | evaluation nightly
standard weekly  | evaluation weekly

Right now we divide the tasks by row and use the same job link for both, when we should divide by column, with a different job link for each.

@adamfarley
Contributor Author

The build-pipeline-generator jenkins job currently disables pipeline scheduling by setting the various nightly/weekly pipelines to only build on February 31st.

I don't see why we don't leave the values as-is and simply disable the tick-box.
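
For anyone unfamiliar with the trick, here is a minimal sketch of why a February 31st schedule never fires. The cron string is an assumption about what the generator writes, not copied from the job, and the helper only understands plain numeric day and month fields.

```python
import calendar

# Hypothetical "never run" schedule: minute hour day-of-month month day-of-week.
# The exact string used by the generator job is an assumption.
DISABLED_SCHEDULE = "0 0 31 2 *"


def schedule_can_fire(cron: str, years=range(2000, 2101)) -> bool:
    """Return True if the day-of-month/month pair exists in any of the years."""
    _minute, _hour, day, month, _weekday = cron.split()
    day, month = int(day), int(month)
    # monthrange returns (weekday of the 1st, number of days in the month).
    return any(calendar.monthrange(year, month)[1] >= day for year in years)


print(schedule_can_fire(DISABLED_SCHEDULE))  # False: February never has 31 days
print(schedule_can_fire("0 0 29 2 *"))       # True: February 29th exists in leap years
```

The schedule is effectively off, but the original nightly/weekly value is overwritten, which is the information the disable tick-box would preserve.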

@adamfarley
Contributor Author

adamfarley commented Jul 17, 2023

The job "release-build-pipeline-generator" has been renamed to "release-pipeline-generator".

We should update the release checklist template accordingly.

Also, I recommend we add a note to all the jobs we use near releases, saying "If anyone changes these, please update the release checklist and any associated docs". Plus link.

@adamfarley
Contributor Author

adamfarley commented Jul 17, 2023

The release checklist's "Double check the relevent aqa-tests branch version" check could be clearer. Note to refine.

Note: this may be clearer if the sub-checks were moved under the main check in "Prepare For Release".

Another note: This entire section needs to be updated to reflect the current state of the release-pipeline-generator job, and I think the main check sentence could be re-worked for clarity.

@adamfarley
Contributor Author

adamfarley commented Jul 19, 2023

I think we'd be faster off the mark during releases if we had a clear, visible, @channel or @here message in the #release channel when the release builds are being built.

Perhaps we could automate this as part of the release jobs?

NOTE: We were working on this point when we concluded Part 1 of the Retrospective.

@sxa
Member

sxa commented Jul 19, 2023

Dry run builds prevented the JDK8 mirror from working properly so we didn't get _adopt tags. Unclear why - we should try to understand for next time.

To github.com:adoptium/jdk8u.git
 ! [rejected]              jdk8u382-dryrun-ga -> jdk8u382-dryrun-ga (already exists)
error: failed to push some refs to 'git@github.com:adoptium/jdk8u.git'
hint: Updates were rejected because the tag already exists in the remote.

@sxa
Member

sxa commented Jul 21, 2023

Look at recommendations for expressions for filenames in RELEASING.md vs the examples shown in the job (Particularly in terms of the sources tarball and inclusion of testimage files)

@adamfarley
Contributor Author

adamfarley commented Jul 21, 2023

> Look at recommendations for expressions for filenames in RELEASING.md vs the examples shown in the job (Particularly in terms of the sources tarball and inclusion of testimage files)

Given that it's a long string that's easy to get wrong, I'd consider automating the strings. Maybe internalising the patterns and replacing that field with a drop-down menu to pick the platform?
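
As a rough sketch of the drop-down idea, the job could keep the patterns internally and build the regex from the selected platform. The pattern shape, the function name, and the platform values below are assumptions for illustration only; the real expressions are the ones in RELEASING.md and in the job.

```python
import re


def artifact_pattern(arch: str, os_name: str) -> re.Pattern:
    """Build a filename regex for one platform (hypothetical pattern shape)."""
    # re.escape guards against regex metacharacters in the selected values.
    return re.compile(
        rf"^OpenJDK[0-9A-Za-z]+-[A-Za-z]+_{re.escape(arch)}_{re.escape(os_name)}_hotspot_.+$"
    )


# The two values would come from the drop-down rather than being typed by hand.
pattern = artifact_pattern("arm", "linux")
print(bool(pattern.match("OpenJDK8U-jdk_arm_linux_hotspot_example.tar.gz")))  # True
print(bool(pattern.match("OpenJDK8U-jdk_x64_linux_hotspot_example.tar.gz")))  # False
```

The point is only that the person running the job picks from a fixed list and never edits the long string directly.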

@adamfarley
Contributor Author

adamfarley commented Jul 25, 2023

Releasing JDK8 on Aarch32 had a number of challenges.

  • testenv_arm32.properties was not updated prior to the release.
  • The following aarch32-jdk8u tags caused confusion. Do we really need to keep all of these?
    • jdk8u382-ga
    • jdk8u382-b05_adopt
    • jdk8u382-b05-aarch32-20230719_adopt
    • jdk8u382-ga-aarch32-20230719
    • jdk8u382-b05-aarch32-20230719
    • jdk8u382-b05
  • When testenv had the JDK8 branch set to jdk8u382-b05-aarch32-20230719 and the scmReference for the release job was set to jdk8u382-b05_adopt, the release job did not allow the release to proceed as they did not "match". Example.
    • Note that the problem isn't the _adopt bit. Changing the scmReference to jdk8u382-b05-aarch32-20230719_adopt worked just fine. Example.
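
To make the two outcomes above concrete, here is a guess at a rule that would produce them: strip a trailing _adopt from the scmReference, then require an exact match against a branch listed in the properties file. This is not the release job's actual code, and the JDK8_BRANCH key name and file contents are assumptions.

```python
ADOPT_SUFFIX = "_adopt"

# Hypothetical contents of testenv_arm32.properties; the key name is an assumption.
TESTENV_ARM32 = """\
JDK8_BRANCH=jdk8u382-b05-aarch32-20230719
"""


def parse_properties(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def scm_reference_matches(scm_reference: str, properties_text: str) -> bool:
    """Strip a trailing _adopt, then require an exact match with a listed branch."""
    tag = scm_reference
    if tag.endswith(ADOPT_SUFFIX):
        tag = tag[: -len(ADOPT_SUFFIX)]
    branches = {
        value
        for key, value in parse_properties(properties_text).items()
        if key.endswith("_BRANCH")
    }
    return tag in branches


print(scm_reference_matches("jdk8u382-b05_adopt", TESTENV_ARM32))  # False
print(scm_reference_matches("jdk8u382-b05-aarch32-20230719_adopt", TESTENV_ARM32))  # True
```

If the real check works anything like this, the docs should say that the scmReference must be the testenv branch name plus _adopt, not the shorter upstream tag.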

@adamfarley
Contributor Author

The error message here has a typo in "branch".

[ERROR] scmReference does not match with any JDK brnach in testenv_arm32.properties in aqa-tests release branch. Please update aqa-tests v0.9.8-release release branch. Set the current build result to FAILURE!

@smlambert
Contributor

Update https://adoptium.net/release_notes.html link in checklist as that page does not seem to exist (or need to find it on the website).

@adamfarley
Contributor Author

adamfarley commented Jul 26, 2023

Does anyone know why the SHA256 for the tarballs is stored in its own file, when the exact same sha can be found in the json?

We have a lot of artifacts with every build, and I'm curious why we add 50% more just to duplicate data that's already available.

Add this to the retrospective because, if there is no reason, perhaps we can strip it out and make the artifacts list easier to read.
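
If we did want to confirm the duplication before removing anything, a check along these lines would do it. This is a minimal sketch: the "sha256" JSON field name and the "<hash>  <filename>" layout of the checksum file are assumptions, and the data is generated in memory rather than read from real artifacts.

```python
import hashlib
import json


def checksums_agree(sha_file_text: str, metadata_json: str, key: str = "sha256") -> bool:
    """Compare the hash in a checksum file with the one in the JSON metadata."""
    # Assumed layout: "<hex digest>  <filename>", as produced by sha256sum.
    file_hash = sha_file_text.split()[0].lower()
    json_hash = json.loads(metadata_json)[key].lower()
    return file_hash == json_hash


# Stand-in for a tarball and its two companion files.
tarball = b"pretend this is a tarball"
digest = hashlib.sha256(tarball).hexdigest()
sha_file = f"{digest}  example.tar.gz\n"
metadata = json.dumps({"sha256": digest})

print(checksums_agree(sha_file, metadata))  # True
```

A possible reason to keep the separate file is that it can be fed straight to a checksum tool, whereas the JSON needs parsing first; that is worth confirming at the retrospective.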

@adamfarley
Contributor Author

Looks like an installer Git check (possibly more than 1, looking now) was broken 2 months ago. It could never have compiled in its current state, and I think we should find these things out prior to release week.

Advising that we run all the Installer repo's Git checks regularly, or at least as part of the pre-release work.

@adamfarley
Contributor Author

adamfarley commented Jul 26, 2023

The Slack messages produced by using the "/merge" command (used to request approval to merge PRs into release-frozen repos) can quickly fill the #release channel due to the bulky previews.

Can we suppress the issue preview?

Note: If we do, we may want to include the issue title in the text message, as the link doesn't allow at-a-glance issue recognition without the abstract.
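
A minimal sketch of what the message could look like, assuming the "/merge" integration posts through Slack's chat.postMessage (or an equivalent that honours the same fields). How the command is actually implemented is not known here, and the function name and example URL are hypothetical.

```python
def build_merge_request_message(issue_title: str, issue_url: str) -> dict:
    """Build a Slack message body with link previews switched off."""
    return {
        # <url|label> is Slack's link syntax; the title gives at-a-glance context.
        "text": f"Merge approval requested: <{issue_url}|{issue_title}>",
        # Suppress the bulky link and media previews.
        "unfurl_links": False,
        "unfurl_media": False,
    }


message = build_merge_request_message(
    "Fix installer check", "https://example.com/pull/1"
)
print(message["text"])
```

With the title embedded in the link text, the preview is no longer needed to recognise the issue.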

@adamfarley
Contributor Author

The Installer repo appears to be slow, which can be inconvenient. Not sure if all repos are slow right now, and I'm just more aware of it in Installer because I'm writing PRs for that repo, or if it's Installer-specific.

Example:
0 Mins: I push a second commit to an existing PR.
4 Mins: Ctrl-F5 refreshes the PR's web page, but the page does not show the second commit yet.
5 Mins: The PR now shows the second commit.
6 Mins: The branch's checks finish running on my branch as it was 9 mins ago, telling me that it failed due to the error my 2nd commit fixed.

@adamfarley
Contributor Author

The CI server's job history limits seem too restrictive: output has been lost and files have been deleted before either was needed. Would it be possible (for the limited text output, not the files) to:

  • Purchase more space
  • Increase the job history limit and compensate by storing all files somewhere with cheaper storage. No more jenkins-stored artifacts.
  • Something else?

@adamfarley
Contributor Author

adamfarley commented Jul 31, 2023

Can we call arm32 arm32 everywhere? It seems confusing when different places refer to arm32 as:

  • arm32
  • aarch32
  • armv7l
  • armhf
  • armv7hl

and ditto for aarch64:

  • aarch64
  • arm64

P.S. Biggest offender seems to be the installer job/docs, which need an update in general.
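
One low-effort way to get there is a single alias table that every script consults. The sketch below uses the names listed above; choosing "arm32" and "aarch64" as the canonical spellings is an assumption, as is the function name.

```python
# Alias -> canonical name. The canonical choices are an assumption.
CANONICAL_ARCH = {
    "arm32": "arm32",
    "aarch32": "arm32",
    "armv7l": "arm32",
    "armhf": "arm32",
    "armv7hl": "arm32",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def canonical_arch(name: str) -> str:
    """Map any known alias to its canonical architecture name."""
    key = name.strip().lower()
    if key not in CANONICAL_ARCH:
        # Fail loudly so a new spelling gets added to the table, not guessed at.
        raise ValueError(f"Unknown architecture name: {name!r}")
    return CANONICAL_ARCH[key]


print(canonical_arch("armhf"))  # arm32
print(canonical_arch("ARM64"))  # aarch64
```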

@adamfarley
Contributor Author

adamfarley commented Aug 2, 2023

Actions from previous retrospectives

April 2023:

  • JDK20, we still launched the builds for the 3 platforms we knew were not in plan for the release (AIX, s390x, arm32) and I caught myself triaging one, before I remembered that I did not need to put effort there. Issue raised.
  • Reorganize the template for the release status issue to put the 2 additional primary platforms (aarch64 Linux, aarch64 Mac) at the top of the table and in bold. Issue raised.
  • Extended.openjdk on AIX is being run in parallel mode across 3 nodes, but there is only a single node online with AIX 72 label, so it should not be launched in parallel and hits 15 hr time_limit without completion. Issue raised.
  • Improve/add an installers-specific checklist/matrix element, to ensure all combinations of version (8, 11, 17, etc.), type (JDK/JRE), architectures & distros are covered.
  • Forgot to specify overridePublishName parameter, for jdk8 arm32 build, so job artifacts named wrong:
    OpenJDK8U-debugimage_arm_linux_hotspot_8u372b07-aarch32-20230426.tar.gz
    @sxa had to rename the jenkins job artifacts on the file system.
  • Improve installers & dockerhub release "checklist", maybe add issue template to each repo, and link for main checklist. Issue for checklists.

March 2023:

July 2022:

The items listed here will not be covered as there's a bajillion of them and it's over a year ago. Any issues still occurring can be raised at a future retrospective.

- [ ] Add item into "week before" or "on release week" [checklist](https://github.com/adoptium/adoptium/blob/main/.github/ISSUE_TEMPLATE/release-checklist.md) to explicitly list the aqaReference to be used for the release cycle so it can easily be copied into place when starting the pipelines. [Reply.](https://github.com//issues/155#issuecomment-1189533870) [Reply to reply](https://github.com//issues/155#issuecomment-1210451196)
- [ ] \[We should run\] the 'fast to build' platforms together and separating the 'slow to build' platforms into a second run. [Reply.](https://github.com//issues/155#issuecomment-1190227099)
- [ ] \[We should refresh JCK materials on each node before each release.\]
- [ ] Put win32 in a second pipeline, since it ALWAYS runs before win64, so it takes all of the build and test resources away from a primary platform. [Issue raised.](https://github.com/https://github.com/adoptium/temurin-build/issues/3065)
- [ ] The release tool needs tests to ensure that the expected set of artefacts are present after a release is made. In this round the [publish of Windows JRE 11 and 17](https://adoptium.slack.com/archives/CLCFNV2JG/p1658500701043129) resulted in no JRE MSI in the releases repository, which was only caught downstream by the https://github.com/adoptium/containers/pull/235. This should be caught much earlier.
  (SXA 2022/10/08 Yellow status for release tool job raised at https://github.com/adoptium/temurin-build/issues/3064) [Reply](https://github.com//issues/155#issuecomment-1210446166)
- [ ] Can we lock the artifacts of the individual jdkXX- build jobs when RELEASE=true for the duration of the release cycle in case installer/signing re-runs are required?
- [ ] For the "pre-release" test runs, look at a step in the process to delete the jobs afterwards so they are not preserved, which takes space on the server and potentially causes confusion as to which are the release pipelines (likely named differently, but best to avoid any potential confusion).
- [ ] Update the release template to include jdk8 alpine-linux (see https://github.com//issues/153#issuecomment-1193926916).
- [ ] Encourage those involved in the releases to get any absences listed in the status document under the "Planned absences during the release cycle" section.
- [ ] Revisit jobs' timeout setting. One [extended job](https://ci.eclipse.org/temurin-compliance/job/Test_openjdk18_hs_extended.jck_arm_linux/10/consoleFull) is set to "Timeout set to expire in 2 days 2 hr". [Reply.](https://github.com//issues/155#issuecomment-1210456991)
- [ ] For windows natives, can we install 2 versions, i.e. both 64 bit and 32 bit, as removing and putting them on a single machine takes time? Alternatively, devote some machines to 64 bit and some to 32 bit? [Reply.](https://github.com//issues/155#issuecomment-1210720306)
- [ ] The x64 Mac JDK 17 TCK seems to be relatively slow as far as primary platforms go. [Reply](https://github.com//issues/155#issuecomment-1195604239)
- [ ] Reminder comment to post-mortem Windows 64 bit vscode update issue that results in java -version error loading "msvcp140.dll" and "jvm.dll".
- [ ] @xsa suggest: "standard" stuff (updating installer jdk versions, disable nightly testing) may not need to require the comment or slack message with "approval to ship during lockdown".
- [ ] Aqa-tests testenv.properties need to take care of jdk8 arm separately.
- [ ] Some of aqa test build jobs haven't been updated for a long time. When doing the rebuild, \[these parameters are not consistent: USE_TESTENV_PROPERTIES, SDK_SOURCE, and CUSTOMIZED_SDK_URL_CREDENTIAL_ID\]. [Reply.](https://github.com//issues/155#issuecomment-1198153235) [Reply to reply.](https://github.com//issues/155#issuecomment-1198165554)
- [ ] When ga is available, before triggering the pipeline job, update the corresponding lines in the testenv.properties file of the aqa-tests release branch. [Reply.](https://github.com//issues/155#issuecomment-1198165554)
- [ ] Perhaps adjust order of columns in the status doc to indicate the "natural" order that things would be completed i.e. AQA-TCK-Publish-Installers-Containers instead of AQA-TCK-Installers-Containers-publish as it is currently.
- [ ] Ensure that if someone is working on a platform and has a planned absence that someone else is in a position to take over and given appropriate hand-over, or we agree to defer until their return (i.e. we shouldn't have to block progress).
- [ ] Understand how to handle "point releases" versioning given that the update may only affect some platforms in terms of installers and container image creation.
- [ ] Clarify process for "installer respins" that don't require rebuild (so are not point releases). [Comment link.](https://github.com//issues/155#issuecomment-1202250097)
- [ ] Clarify some of the field descriptions on the create_installer_windows (maybe others) jobs. [Full comment.](https://github.com//issues/155#issuecomment-1204956422)
- [ ] Understand how "jfrog" is working, i.e. when we cannot download/install rpm/deb from it: who to contact and how to escalate.
- [ ] Possibly rename the "Containers" heading on the status table to avoid any ambiguity.
- [ ] Need more reliable "release publish" job, to ensure all the correct files get published https://github.com/adoptium/github-release-scripts/issues/85
- [ ] Nightly build smoke tests could be improved to validate build archives, e.g. do they all exist? Verify the .sig?
- [ ] Release job can be made 'easier' to use (this was raised in past retros, still improvements to be made), even if it's to add examples for all 3 primary platforms, or all 13 possible platforms for the regex. Main thing is to remove the chance of manual error.
- [ ] Revisit the release checklist. [Full comment.](https://github.com//issues/155#issuecomment-1210578771)
- [ ] Consider making the release pipeline jobs produce and print a suitable prepopulated release job URL (with the right tag, upstream job name and ID, and correct regex) to reduce human error and make it more pleasant for those running the release job.
- [ ] Scorecard for interest: https://github.com//issues/157

@adamfarley
Contributor Author

adamfarley commented Aug 2, 2023

Actions from this retrospective

  • Person: To do a thing.

    • Relevant links when action is complete.
  • Adam: Remind anyone raising a retrospective issue to make sure a named individual will add a list of actions at the end of the meeting. Otherwise we have to copy all the comments to make sure we miss nothing, which results in a lot of text. Examples can be found here.

  • Adam: Raise a PR to add the missing weekly tests to the ea-tag-triggered tests.

    • Blocked on the ea builds being run nightly, as adding weekly tests to nightly builds runs the risk of flooding our machines with test jobs. Mentioned here.
    • Unblocked now. PR raised.
  • Adam: Create Release Checklist PR to consolidate the nightly/weekly points, and then separate them by standard/evaluation. Plus a URL change in the evaluation one.

  • Adam: The job "release-build-pipeline-generator" has been renamed to "release-pipeline-generator". Update the release checklist template accordingly.

  • Adam: Refine the release checklist's "Double check the relevent aqa-tests branch version" to be clearer.

  • Andrew: create an issue for the dry run builds preventing the JDK8 mirror from working correctly.

    • Relevant links when action is complete.
  • Stewart: To make an issue for reviewing filename expression in RELEASING.md. Details

    • Relevant links when action is complete.
  • Adam: Add comments to Stewart's issue in the action above re my reply comment.

  • Adam: To raise issue to ensure docs cover the _adopt aarch32 things mentioned here.

  • Adam: Replace the two release notes links mentioned in these comments.

  • Scott: To raise issue for regular installer testing (build installer, test it, upload it, download it, install it, java -version).

    • Relevant links when action is complete.
  • Adam: To raise issue to extend installer build (and binary pushing refactor job) job history limit.

    • Limit is already substantial. Can increase further if needed in the future.
  • Adam: When copying actions between retros, include user name. Add this point to retro template.

  • Stewart: To discuss future plans for build scheduling at the next community retrospective (ea builds, testing, etc).

  • Shelley: Please review the unticked items here and tick them off if you feel they have been resolved.

@smlambert
Contributor

Creating a workflow in the installers repo to automate the creation of the PR that updates the spec file, and to watch for when tarballs show up in the API, as a step towards fully automating the publishing of Linux installers.

@adamfarley
Contributor Author

A few notes on the JDK21 nightly builds during the release:

  • These builds continued during the release. We should make sure they are cancelled during the next release, so as to not use up valuable machine time.
  • These builds appear to be running nightly. This is wrong, as they should only be triggered when a new "ea" tag is detected. Perhaps the ea-tag-detection code is having issues? @sxa - What do you think?

P.S. I'm holding off on adding the weekly tests (mentioned in my action here) until these ea builds are not running every day.

@andrew-m-leonard
Contributor

@sxa The releaseTrigger_21ea job was building and publishing the same ea tag on every trigger (i.e. every day).
I've updated the releaseTrigger_21ea configuration, which will hopefully fix it:

  • Changed: rm -f $WORKSPACE/properties to rm -f $WORKSPACE/properties.*
  • There were two refactor_openjdk_release_tool triggers, one at the start and another at the end, so I removed one.
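
To illustrate the first change: an exact name only removes a file called "properties", while the "properties.*" glob also catches suffixed files. The sketch below reproduces that in a temporary directory; the suffixed file names are hypothetical, since the names the trigger job actually writes are not stated here.

```python
import glob
import os
import tempfile


def remove_matching(workspace: str, pattern: str) -> list:
    """Delete files in workspace matching the glob pattern; return their names."""
    removed = []
    for path in glob.glob(os.path.join(workspace, pattern)):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)


with tempfile.TemporaryDirectory() as workspace:
    # Hypothetical leftover files from a previous trigger run.
    for name in ("properties.jdk", "properties.jre"):
        open(os.path.join(workspace, name), "w").close()

    first = remove_matching(workspace, "properties")     # like: rm -f properties
    second = remove_matching(workspace, "properties.*")  # like: rm -f properties.*

print(first)   # []
print(second)  # ['properties.jdk', 'properties.jre']
```

If stale suffixed files were what caused the same tag to be picked up again, the glob version clears them before each run.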

@adamfarley
Contributor Author

To avoid bottlenecking solutions on lengthy retrospective debate, is it worth:

  • Raising issues instead of adding retrospective comments.
  • Modifying the retrospective template to advise people of the above, and encourage them not to leave comments.
  • Using a retrospective “tag” so that any unclaimed issues can find owners during the retrospective (with people free to claim issues at any point).

@sxa
Member

sxa commented Aug 8, 2023

Consider creating the status issues in temurin-build instead of the adoptium repository, to make it easier for all Temurin committers who may be involved in releases to update them.

@adamfarley
Contributor Author

New general retrospective issue raised here.
