Retrospectives 2021 02 01 SecureDrop 1.7.0 and 1.7.1

Erik Moeller edited this page Feb 2, 2021 · 1 revision

SecureDrop 1.7.0/1.7.1 retrospective - 2021-02-01

What worked well

  • (Kushal): The team identified the issue fast, and the patch and the whole release process happened much faster than any previous time I remember. +1
  • Extremely collaborative response; the team responded like a well-oiled machine. Blameless analysis and detailed documentation made it easier for others to get involved and participate
  • [1.7.0] Allie did an awesome job closing gaps in the release process docs during the 1.7.0 release period +2
  • The QA process of actual testing went smoothly, which suggests we did better QA at PR merge time.
  • Informed decisions to revise process in the moment: we skipped a few CI runs to keep the time-till-release low.
  • good external comms while the issue was in flight (mostly done ad-hoc) +1
  • Using a dedicated chatroom for all release-related comms worked well: it gave us a shared record of all release-related actions +1+1+1

What can be improved

  • identifying who will be monitoring socials/forum/icingy/paying attention to status reports/etc. after a release +2
    • PROPOSAL: Update RM docs to monitor upgrade status (already done! - but perhaps need clearer responsibilities/handoff)
    • PROPOSAL: Use canary instance(s) to detect issues before nightly update run
  • discipline around PR focus: the breakage fell out of functional changes that could have been made separately from the type annotations the PR was about. +1
  • Testing config changes on long-running SD instances remains a challenge (n.b. even a clean install on Focal in 2021-04 won't resolve this, since we'll be restoring a backup)
  • Consider migration to static configuration file e.g. for 2.0.0 to reduce likelihood of breakage similar to what we saw in 1.7.0 +2
  • the config module makes tests trickier, which is probably why there were none. a static file would be easier. +1

near term

  • PROPOSAL: survey possible historical configs
  • PROPOSAL: Spec the configuration: file vs. database vs. code

longer term

  • PROPOSAL: test different (historical) configs
  • PROPOSAL: Add logic to the restore script to "update" the configuration to the latest values; longer term, migrate to static config
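
As a rough illustration of the restore-script proposal, a restore step could fill in settings that are missing from an older configuration with current defaults, while leaving the values already present (such as instance secrets) untouched. This is only a sketch under stated assumptions: the key names, the default values, and the `upgrade_config` helper are hypothetical and do not reflect SecureDrop's actual config schema or restore code.

```python
import copy

# Hypothetical defaults for generic settings; names and values are illustrative only.
CURRENT_DEFAULTS = {
    "session_timeout_minutes": 120,
    "supported_locales": ["en_US"],
}


def upgrade_config(restored, defaults=CURRENT_DEFAULTS):
    """Return a new config: defaults first, then the restored values on top.

    Anything present in the restored config (for example secrets) wins over
    the defaults; only missing keys are filled in.
    """
    upgraded = copy.deepcopy(defaults)
    upgraded.update(copy.deepcopy(restored))
    return upgraded


# A "historical" config that predates the hypothetical supported_locales setting.
old_config = {"secret_key": "instance-specific", "session_timeout_minutes": 60}
new_config = upgrade_config(old_config)
```

The same function could be run over a collection of surveyed historical configs in a test, which would cover the "test different (historical) configs" proposal as well.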

Maybe proposal: separate configuration:

  • secrets/instance specific stuff (for example, salt & secret values)
  • other, generic default config (for example session timeout)
  • Worth thinking about migrating/updating config.py as part of backups? +1+1
  • formalizing a hotfix workflow to minimize superfluous CI +2
    • yes, we could have a section in the docs
    • PROPOSAL: Merge "hotfixes" (i.e. point releases on a short timeline) directly into release branch, backport later into develop. Consideration: CI targets on release branches may be less reliable than on develop.
    • Task: Define in what situations we need to do a hotfix and document the hotfix process as described above [ACTION: Erik will file issue]
  • Coordinating support/redmine messaging, calling it out explicitly in the IR plan +1
  • translations are done on release day, which makes it a little risky to get the release out on time +2. we discussed "continuous translation" as an alternative
    • or, until we can make that switch, just adjusting the translation period so it ends before release day.
  • the translation timeline is problematic for both us and translators. we're hearing that it would be better to have a longer translation freeze, either by simply lengthening it or by reducing the source string feedback period.
    • PROPOSAL: short term: Merge translations at least the day before the release or the final round of QA, long term: continuous translations
  • there's a bit of overhead around QA'ing across multiple release candidates. once you get up to a certain number of them, our coverage starts to stretch across multiple release candidates rather than one entire rc
    • we could try to push ourselves more to only make development changes if there are critical fixes so that we can get QA done days ahead of time and focus more on long-standing instance tests/testing thoroughly
    • PROPOSAL: clarify what is a release blocker and what is not. Opportunistic fixes to non-release blockers could be merged to develop, but not backported to the release branch
    • release candidate contains only release-blocker fixes, and other fixes go to develop
  • PROPOSAL: timeboxed investigation to improve time/parallelization of CI:
    • Improving CI speed and reliability
    • selectively running tests on PRs
      • i18n-*
      • update-builder-*
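
One way to picture selective test runs, assuming the `i18n-*` and `update-builder-*` entries above are branch-name globs, is a small lookup that maps a branch to the CI jobs worth running and falls back to the full suite otherwise. The job names and the `jobs_for_branch` helper below are hypothetical, and this is a sketch of the idea rather than the project's CI configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from branch-name globs to a reduced set of CI jobs.
SELECTIVE_JOBS = {
    "i18n-*": ["translation-tests"],
    "update-builder-*": ["builder-tests"],
}

# Hypothetical full suite, used for every branch that matches no pattern.
FULL_SUITE = ["lint", "app-tests", "translation-tests", "builder-tests"]


def jobs_for_branch(branch):
    """Return the jobs to run for a branch, defaulting to the full suite."""
    for pattern, jobs in SELECTIVE_JOBS.items():
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatchcase(branch, pattern):
            return list(jobs)
    return list(FULL_SUITE)
```

Defaulting to the full suite keeps the selective paths opt-in, so a branch that is misnamed still gets complete coverage.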