
[SSDF Issue] PW2.1: Review the security architecture #144

Open
zdtsw opened this issue May 5, 2022 · 13 comments

@zdtsw
Contributor

zdtsw commented May 5, 2022

Ref: [SSDF Epic] PW: Produce well secured software

Recording work has been done for PW2.1:

  • Task: Have 1) a qualified person (or people) who were not involved with the design and/or 2) automated processes instantiated in the toolchain review the
    software design to confirm and enforce that it meets all of the security requirements and satisfactorily addresses the identified risk information.
  • Examples:
    Example 1: Review the software design to confirm that it addresses applicable security requirements.
    Example 2: Review the risk models created during software design to determine if they appear to adequately identify the risks.
    Example 3: Review the software design to confirm that it satisfactorily addresses the risks identified by the risk models.
    Example 4: Have the software’s designer correct failures to meet the requirements.
    Example 5: Change the design and/or the risk response strategy if the security requirements cannot be met.
    Example 6: Record the findings of design reviews to serve as artifacts (e.g., in the software specification, in the issue tracking system, in the threat model).

For details, see https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-218.pdf, page 21.

@zdtsw
Contributor Author

zdtsw commented May 6, 2022

I am working on it; self-assigning.

@zdtsw
Contributor Author

zdtsw commented May 10, 2022

PW.2.1

  • Task and examples: as quoted in the issue description above.

  • Work:

    • a list of diagrams in different systems under project "adoptium"
    • Findings/questions on Build and Release:
      • No risk models exist when temurin work was done (confirmed)
      • The systems we use and develop are complex => more risk of problems in the release chain:
        • Jenkins and GitHub Actions are both used for building and testing => does GitHub Actions have the same ability to build something and push it out to users?
        • JFrog, DockerHub and GitHub are all used as storage for releases.
        • Both the old openjdk and the new temurin/adoptium need to be supported (mainly for docker images).
        • Old release versions (e.g. jdk15, jdk16) are still left in the build system even though they are not actively in use.
      • The project practices CI/CD but lacks traceability:
        • No versioning of the source code: no tags on the git repos under the projects (adoptium, temurin-compliance, etc.) per release.
        • No commit SHA can be traced back for an old release. Release notes are needed internally (what bugs we fixed, what test cases are skipped, etc.) and externally (what has been fixed upstream, what we have added on top of that, what is exempted, etc.). [confirmed]: the old release-notes feature was lost when migrating to website2; there is a ticket on this.
        • The release tags made in the temurin*-binaries repos are not useful.
        • Jenkins build results and AQA test results come back through different channels (Slack notifications, TRSS) but do not get enough attention => real issues are only spotted at a later stage.
      • Most of the agents are deployed/updated with Ansible, but software versions are not locked down (mostly latest dependencies), making it hard to reproduce exactly the same old build.
      • How do we know the docker images we use have no vulnerabilities, e.g. the base image plus the "update all" done in "docker build"?
      • Tests on changes made to the adoptium repos need to be reviewed.
      • We do not have a "staging" environment; everything runs in "production", e.g. ci.adoptopenjdk.net:
        • jobs created from source code in git repos: ci-pipeline-jenkins, aqa-test
        • jobs manually created for testing purposes
        • jobs that are no longer needed but are still kept there
      • What is the backup plan? [SXA: ThinBackup plugin used nightly, along with backups taken onto another server]
        • Is geo-redundancy enabled? No.
        • Are storage snapshots taken regularly? Yes: nightly on the adoptium Jenkins controller; No: Jenkins agents, which are re-created by Ansible.
        • Do we have any build dependencies on the environment (e.g. build artifacts relying on previous builds, URLs, etc.)?
        • JFrog Artifactory: any SLA support? Need to confirm.
        • website2: if it is down, how can users download tarballs? Need to confirm [SXA: Only by going to GitHub or pulling from Homebrew or JFrog for rpms/debs. The API is also separate and does not require the website (although the website requires the API)]
        • If we store all tarballs in GitHub, including nightly and official releases, do we have an SLA with GitHub? Need to confirm.
        • DockerHub: any SLA support? Is a vulnerability scan done automatically by DockerHub? Need to confirm.
    • Findings on AQA(test related system):
      • TODO
    • Findings on the Adoptium part (git repos + build + infra):
      • The different levels of access are not clear:
        • How to add/modify/remove people from the project, git repos, Jenkins access, and JCK machines; [SXA: public Jenkins access is done via git groups - currently in the AdoptOpenJDK space, which the PMC controls. JCK machine access is controlled by Eclipse when people join or leave as committers to the TC project]
        • What is needed for the above access to be granted?
        • What is the process to get a vote/approval? [SXA: Committer status on adoptium is required for those repositories, and that is a vote that goes out to existing committers via a standard Eclipse process]
      • For temurin, most of the security-related activities are handled by various access controls:
        1. Access to write permission in the GitHub repos (https://github.com/adoptium/), including source code, running GitHub Actions, and issues (self-assign, review, merge, etc.)
        2. Access to Jenkins (https://ci.adoptopenjdk.net/), divided into 4 groups:
          • public (read-only) for some jobs
          • execution (build/rebuild) for some jobs [SXA: This is generally the AdoptOpenJDK*build and AdoptOpenJDK*build-triage groups, who can run the build jobs]
          • execution for release-related jobs: users are manually added to the job; the config is not under source control [SXA: Currently the Jenkins admins and named others (Haroon+Sophia), although that should be replaced with https://github.com/orgs/AdoptOpenJDK/teams/build-release in the short term]
          • admin for everything
      • For infrastructure, the process is not clear:
        1. Jenkins controller (https://ci.adoptopenjdk.net)
          • Day-0 operation not clear: who, how, and what; apart from us, who else has permission to log in to these servers? [SXA: A subset of the PMC members - SXA/MV/GA/TE - and some others from IBM who had worked with us (PS/GJ/MW/DG) currently have admin access to the server]
          • Day-1 operation: assumed to be done by 2 PMC members: deploying changes on the VM to set up the baseline(?)
          • Day-2 operation: not clear who does it and how often the OS and applications, including the Jenkins core (currently 2.263.3), are patched. Need more documentation to confirm. [SXA: The intention is to start looking at it on a weekly basis, but this has not been done for the last year]
          • Is it HA? Need to confirm [SXA: No]
        2. Jenkins agents:
          • Day-0 operation not clear: who, how, and what kind of security has been applied; who else can log in to such agents? Some sponsors? Public cloud providers? [SXA: In some cases we've left the cloud providers on, e.g. Marist by request, but usually they are removed and only the infrastructure team can log in to the agents]
          • Day-1 operation: assumed to be done by Ansible playbooks [SXA: Generally yes, although the dockerHost systems are not strictly in accordance with them, and the dynamic agents we have at Azure/AWS are probably not configured this way]
          • Day-2 operation: mostly by Ansible, plus some manual work done on the machines (e.g. docker host volumes), which can be hard to trace or reproduce on a new machine.
          • Should we publish all agent information from https://github.com/adoptium/infrastructure/blob/master/ansible/inventory.yml? [All machines that are configured and maintained by Ansible are in there, as this is used as the source of information for AWX. Machines deployed through the DockerStatic role are not in there - they are a bit more fluid in their creation and do not have the playbooks applied. Also, the Linux x64 and aarch64 machines are started on demand using docker images on DockerHub created from the dockerfiles in https://github.com/adoptium/infrastructure/tree/master/ansible/docker]
          • Does every type of agent work as a single node or a node pool? Single point of failure, need to confirm [SXA: No jobs should be running against a specific host - there should be redundancy everywhere to cover outages. Where possible this is across more than one cloud provider. temurin-build#1044 ("Ensure all builds can run on multiple machines") covers some of the remaining operations that do not have redundancy]
  • Findings on external systems:
    1. Monitoring system:
    - Nagios (need permission to access it): does anyone use it? [SXA: Needs work - we can discuss on Wednesday :-)]
    - Who is keeping an eye on whether agents (VMs and containers) are dead? [SXA: Pretty much just me, as jobs get queued up...]
    2. Logging system:
    - Do we have a system to log our build results (console)? Who accesses our system? [SXA: Nothing outside jenkins]
    - What is the log rotation? If a wrong build was made months ago, can we find out who did the build and with which commits, etc.? [SXA: Only via the metadata that is uploaded to GitHub for nightlies (they are from a timer, so no user is associated with them). Full console logs from release builds will generally be kept by Jenkins, as they are locked]
    3. Automation system:
    - AWX: is postgres backed up? Does it share the same access control as Nagios? [SXA: All such services have separate ACLs. AWX should be accessible to anyone in the infrastructure GitHub team]
    4. Backup plan: https://github.com/adoptium/infrastructure/blob/master/README.md#backups needs corrections and updates [SXA: See also infrastructure#1295 ("Collate and document backup strategy for our infrastructure machines") - not perfect, but we have a plan for each service]
    5. Are there any other systems not under source control in the Adoptium repos, apart from the compliance part?

@jiekang

jiekang commented May 30, 2022

The JFrog account is a "sponsored enterprise" account.

@zdtsw
Contributor Author

zdtsw commented May 31, 2022

What is the release process for the services we provide, and who is responsible for these services?

@zdtsw
Contributor Author

zdtsw commented May 31, 2022

Some AQA-related questions:
- Has the new release of run-aqa been announced? v2 was from last December, but our temurin-build still uses v1 => how do we make sure our internal systems get updated? Need input.
- Are PerfNext or SmartMedia running somewhere and accessible by the public, as a service we provide? Need to confirm.
- How is TRSS released when new commits land in the source code? => when do we run Ansible to deploy new changes in TRSS?
- TRSS runs on AWS, deployed by the "infrastructure" Ansible as trss.adoptopenjdk.net.
- Is any system used to monitor the TRSS service? Need to confirm.
@smlambert could you help answer these parts?

@sxa
Member

sxa commented Jun 6, 2022

What is the release process for the services we provide, and who is responsible for these services:
* Is https://blog.adoptium.net/ still updated? => running on Netlify

Updated via PRs to https://github.com/adoptium/blog.adoptium.net

* Should https://github.com/adoptium/adoptium.net be archived?

Now that the new version is live and seems to work, we do need to do that.

  • Is any system used to monitor the TRSS service? Need to confirm.

Once Nagios is back in a reasonable state we should add TRSS to it to monitor its health, as with any other systems not already covered :-)

@zdtsw
Contributor Author

zdtsw commented Jun 16, 2022

@gdams @karianna could you give input on the items below:

  • "No check/scan on docker base image and packages we use to build docker image"
    Has this been done in any way, or is there a plan to implement it in the future (existing issue ticket)?

  • "process of release installer for ca certification image"
    If this is a manual process, is there any documentation describing the steps?

@gdams
Member

gdams commented Jun 16, 2022

"No check/scan on docker base image and packages we use to build docker image"
Has this been done in any way, or is there a plan to implement it in the future (existing issue ticket)?

The docker images have a Snyk security scan run before they are published (this is done by the DockerHub folks rather than us).

@zdtsw
Contributor Author

zdtsw commented Jun 16, 2022

Here is the SSDF security review doc for public access: https://docs.google.com/document/d/1w3znf2X4y0yoiK2w1cNxSwu8ok7ibWCbD3t1oEs2wwU/edit#heading=h.vkwl1qx18vjg
Please review it and let me know if anything is missing

@smlambert
Contributor

re: #144 (comment)

  • Has the new release of run-aqa been announced? v2 was from last December, but our temurin-build still uses v1 => how do we make sure our internal systems get updated? Need input.
  • Are PerfNext or SmartMedia running somewhere and accessible by the public, as a service we provide? Need to confirm.
  • How is TRSS released when new commits land in the source code? => when do we run Ansible to deploy new changes in TRSS?
  • TRSS runs on AWS, deployed by the "infrastructure" Ansible as trss.adoptopenjdk.net.
  • Is any system used to monitor the TRSS service? Need to confirm.

Sorry, somehow missed this note earlier:

  • I was not aware that temurin-build uses run-aqa in any shape or form; where/how does it use it? I presume a bot is now monitoring all workflow .yml files for updates, which would include run-aqa updates (and switching to an immutable SHA instead of a tag).
  • No public instances of PerfNext or SmartMedia are running anywhere.
  • TRSS is updated via a Jenkins job, https://ci.adoptium.net/view/Test_grinder/job/TRSS_Code_Sync/, which runs weekly on Friday; no Ansible playbook is required to deploy the latest changes to TRSS. (In the future we hope to move the public TRSS instance into a container, which should make it even easier to manage/deploy/recover, but that is not currently in the 2Q plan.) This was working very well, but recent changes, possibly to user ids and permissions or from the recent Jenkins server upgrade, seem to have caused the most recent run to fail.
  • Scott Fryer has added TRSS to Nagios for monitoring.

@sxa
Member

sxa commented Jul 12, 2023

Noting that we intend to have a third party perform an audit covering parts of this, so that process will address some of these items.

@sxa
Member

sxa commented Oct 17, 2023

Note that we have volunteered as an Eclipse project to have an external party perform an analysis of our project security; this will be commencing soon.

@sxa
Member

sxa commented Dec 13, 2023

The audit described above started last week (Monday 15th December) and is progressing.
