Trigger downstream liboqs-python CI is failing #1789

dstebila · 2024-05-10T00:10:10Z

Describe the bug
In recent liboqs CI builds on CircleCI, the "Trigger liboqs-python CI" step is failing.

To Reproduce
See https://app.circleci.com/pipelines/github/open-quantum-safe/liboqs/3710/workflows/34731b55-1e34-4510-bb20-bfdd484fa5d6/jobs/29103

dstebila · 2024-05-10T00:11:41Z

I'm guessing it's somehow related to the changes involving oqs-bot and things not being configured correctly in https://github.com/open-quantum-safe/liboqs/blob/main/.circleci/config.yml#L264.

@ryjones Do you have any ideas about this?

Possibly it would be easier if we switched this job (to trigger downstream CI) over to GitHub Actions...?

ryjones · 2024-05-10T15:53:15Z

The issue is enterprises don't allow PATs to work like they used to. You have to create a GitHub app with a webhook. I'm looking into how to get this done.

dstebila · 2024-05-10T16:00:39Z

Thanks Ry!

ryjones · 2024-05-10T18:35:05Z

Also, if circle-ci doesn't offer anything over GitHub actions, it would make life easier if you moved it over.

baentsch · 2024-05-11T06:10:30Z

> Also, if circle-ci doesn't offer anything over GitHub actions, it would make life easier if you moved it over.

Agreed: We have a long-standing issue on this that no-one found time to work on (particularly, getting us ARM runners that were the sole reason why we didn't move off CCI): open-quantum-safe/oqs-provider#248 (oqs-provider typically leads liboqs in infrastructure updates which is why I create such issues first in that sub project as a "proving ground"). If you'd have time to work on this, we'd surely be happy. In that case, please also take a look at #1780 and all dependents.

ryjones · 2024-05-11T12:30:22Z

OQS doesn't (yet) have access to ARM64. I don't have authorization to spend money on large runners, so I will need to huddle with Naomi and Hart to figure out what is authorized for this.

baentsch · 2024-05-12T05:53:13Z

> I don't have authorization to spend money

Thanks for the clear statement of limitation, @ryjones.

@dstebila Please help to have the PQCA-powers-that-be authorize this before the 0.11.0 release (created open-quantum-safe/tsc#25 to track): This stops the project from streamlining to GH actions (as recommended by an LF employee and long desired by OQS to be more efficient), otherwise requiring unnecessary work:

In order to deliver the 0.11.0 milestone, #1780 will need to support ARM64 CI as per PLATFORMS.md. Given the missing authorization above, the only way to facilitate that seems to be again investing in bespoke ARM64 CCI code.

Unpaid volunteers could consider it unfair or unsavory to do such inefficient or "throw-away work" to save money to an alliance funded by multi-billion-dollar-profit companies.

I personally found it OK to do such "work-around code" while OQS was a pure research project carried by voluntary contributors, but am unwilling to put in such effort to retain a mirage of a well-funded professional alliance, particularly as I'm personally annoyed seeing LF/PQCA processes forced onto OQS without any immediately visible offsetting benefits such as suitable CI funding authorizations: I'd really be happy if PQCA were willing to spend a healthy portion of its funding on supporting development and not most on lawyers, marketing and executive travel.

FWIW, I did complete open-quantum-safe/ci-containers#84 to lay the foundation for hitting the 0.11.0 goal but for the reasons above will not write further CCI code going forward (beyond the one in the PR above to test the Dockerfile).

ryjones · 2024-05-12T13:42:49Z

To be clear, access to the ARM64 runners is blocked by two things: money and approval from GitHub. I will push on the GitHub angle.

planetf1 · 2024-05-13T11:46:51Z

My understanding is that with our current pqca structure we could raise the request for funding of ARM runners at the PQCA TAC, then potentially they could raise a request with the governing board for funding?

I don't know exactly what the scope is here, but there should be some budget? For our projects, access to supported arm64 runners would seem to be very beneficial in reducing workload, and I wouldn't imagine the usage is too intense.

Is it worth figuring out how much resource we think we might need so that we could provide some kind of cost estimate based on github's published figures?

Given we have arm code in pq-code-package too it could be useful there (currently using QEMU) - I can float the idea there.

Is using a regular runner with QEMU a viable fallback? (can be very slow...)

baentsch · 2024-05-13T13:10:34Z

> Given we have arm code in pq-code-package too it could be useful there (currently using QEMU) - I can float the idea there.

We also have the goal to not destroy earth's resources uselessly. Using QEMU is a clear case of that: Why run CPUs for hours if you can do the same thing in seconds on "proper" CPUs?

For the purposes of showing that all would work on GH, I already implemented this as "proof of concept", e.g., see test run in action here -- but with a very bad ecological conscience as per the above.

> For our projects, access to supported arm64 runners would seem to be very beneficial in reducing workload, and I wouldn't imagine the usage is too intense.

Completely agree. Should be a no-brainer. (The promise for) Getting this (access to such resources) was also one of the reasons why I withdrew my objections to the LF take-over of OQS.

ryjones · 2024-05-13T13:20:09Z

I have requested that pqcp and oqs get access to the ARM runners. The issue is they enter public beta in a few weeks, so they have been slow to approve new access requests.
Here is a copy of the request I raised yesterday.

Please add two orgs to the beta; please add three users to support them

Please add these orgs of which I am an owner:
https://github.com/pq-code-package
https://github.com/open-quantum-safe

Please add these users to the beta org:
baentsch
bhess
SWilson4
planetf1

bhess · 2024-05-14T08:54:09Z

The same issue appears when triggering oqs-provider downstream tests (using Github Actions):
https://github.com/open-quantum-safe/liboqs/actions/runs/9076079554/job/24938031071

planetf1 · 2024-05-14T09:13:49Z

@ryjones thanks for requesting access again. I had assumed there will still be fees for using the arm runners once public. Maybe that concern is misplaced and some usage will be supported on the free tier. Do we know any more yet?

baentsch · 2024-05-14T10:02:08Z

> some usage will be supported on the free tier

As I wrote above, "some usage" may already be working for non-commercial projects. It's just taking ages to complete: 10 min for x64 and 100 min for aarch64 as per the log I referenced. Possibly it is using the QEMU setup I added to be safe should the ARM64 runners not, well, run. But conceptually the "test GH job" I have created for that purpose should use real HW (unless I did something really wrong -- please check).

ryjones · 2024-05-14T12:00:35Z

In a stroke of good fortune, the PQCA board call is right after the PQCA TAC call next week. Given the data @baentsch has provided, I should be able to have a reasonable request to make.

For example, at Hyperledger, we spend about $2000 a month (more or less) on GitHub large runners, including arm. I imagine PQCA as a whole will be less than that for at least a year or two.

ryjones · 2024-05-14T13:00:26Z

Having looked at all available CircleCI data, OQS would have spent ~$82 since June of 2023 on ARM64 runners, had they been available. All of the other usage seems to fall in the free tier for GitHub.

baentsch · 2024-05-14T15:40:00Z

Thanks for this assessment @ryjones -- but please note that OQS has been skipping constant time testing on ARM64. This is a very debatable limitation that IMO should be improved on given ARM64 is now a formally supported tier 1 platform and --unlike Hyperledger-- OQS conceptually is a security software library that should have such (time-intensive) testing, particularly as/if people should begin to trust it in real world applications also on that platform. In addition, OQS is currently not doing a lot of other time-intensive testing that it should (fuzzing, etc.).

All told, I hope you can put (substantially) more than $82 into your annual budget for this: It would save (at least myself) quite a bit of effort to continue to work around this limitation. Also please do not (have LF/PQCA) consider offsetting my work at 0-cost given I am "0-cost"/a volunteer....

ryjones · 2024-05-14T21:07:44Z

I plan to ask for $2000 a month, to cover workload expansion. With the exception of the ARM64 jobs, I think GitHub's current free runners should be able to do substantially all of the CI work; you could move them over at your leisure.

ryjones · 2024-05-15T21:03:39Z

Even if we don't get into the beta, one option would be to sign up for BuildJet, which Hyperledger used for a while.

planetf1 · 2024-05-17T12:57:22Z

My interpretation of the sequence leading up to the CI failure (github): (cc: @ryjones )

The test that fails is triggered by

liboqs/.github/workflows/release-test.yml

Line 15 in a5ec23c

oqs-provider-release-test:

(well, in main).

this then seems to generate an event on the liboqs repo

https://github.com/open-quantum-safe/liboqs/blob/a5ec23cf19763d36a558b8358345823ae45d57e5/scripts/provider-test-trigger.sh

This is a manual ‘dispatches’ event, but against the oqs-provider repo — so it’s effectively triggering tests there

The workflow https://github.com/search?q=repo%3Aopen-quantum-safe%2Foqs-provider%20liboqs-release&type=code is then run

which then runs the tests in https://github.com/open-quantum-safe/oqs-provider/blob/main/scripts/release-test-ci.sh

SWilson4 · 2024-05-17T14:17:00Z

> My interpretation of the sequence leading up to the CI failure (github): (cc: @ryjones )
>
> The test that fails is triggered by
>
> liboqs/.github/workflows/release-test.yml
>
> Line 15 in a5ec23c
>
> oqs-provider-release-test:
>
> (well, in main).
>
> this then seems to generate an event on the liboqs repo

@planetf1 Apologies if I'm misinterpreting what you wrote, but just to clarify: the downstream tests are not failing. The failures are due to permissions issues with the token that we use to trigger the downstream tests. Even if the downstream tests were failing, it would not cause the upstream workflows to "go red": the upstream workflow checks the GitHub API response code, which only indicates whether the downstream workflow was triggered successfully, not whether it completed successfully.

The infrastructure that's currently failing is mostly my work (#1507, open-quantum-safe/liboqs-python#65, open-quantum-safe/oqs-provider#345, #1654). My understanding is that it broke when the OQS GitHub account was upgraded to "Enterprise", which changed what we can and can't do with personal access tokens. @ryjones Please let me know if there's anything I can do (within the permissions I have) to help with getting this to work again. I think I have a pretty good understanding of the moving parts involved with the different workflows.
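
For readers following along, the response-code check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual trigger script (that lives in scripts/provider-test-trigger.sh and is not reproduced here). It assumes the trigger uses GitHub's "create a repository dispatch event" REST endpoint, which answers 204 No Content once the event is accepted. The event type `liboqs-release`, the token value, and all function names are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner, repo, event_type, token, payload=None):
    """Build (but do not send) a repository_dispatch request.

    The event_type and token are placeholders supplied by the caller;
    nothing here is taken from the real liboqs trigger script.
    """
    body = {"event_type": event_type, "client_payload": payload or {}}
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def trigger_accepted(status_code):
    """204 means the event was accepted, not that the downstream run passed."""
    return status_code == 204


def trigger_downstream(request):
    """Send the request and report only whether the trigger was accepted."""
    try:
        with urllib.request.urlopen(request) as response:
            return trigger_accepted(response.status)
    except urllib.error.HTTPError as err:
        # A token without sufficient permissions ends up here with an
        # error status, which is what turns the upstream job red.
        print(f"dispatch rejected with HTTP {err.code}")
        return False
```

Under these assumptions, a rejected token fails the upstream job immediately, while a failing downstream test suite never feeds back into the upstream result, which matches the behaviour described in this comment.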

ryjones · 2024-05-17T20:54:11Z

@bhess @dstebila would it be OK if I forked the two repos within the oqs org so I can test out some actions? they would have different names, and be deleted after I'm done with them

dstebila · 2024-05-17T22:28:31Z

Go for it!

ryjones · 2024-05-18T12:18:04Z

@SWilson4 please join me on https://github.com/open-quantum-safe/oqs-provider-ry and https://github.com/open-quantum-safe/liboqs-ry. I am working on branch GHA to use this action

SWilson4 · 2024-05-23T18:07:33Z

The CI failures were occurring because oqs-bot didn't have sufficient permissions. (I'm guessing its permissions were lowered silently during the move to Enterprise or some other recent change.)

After https://github.com/open-quantum-safe/tsc/pull/30/files, liboqs main CI is green and the oqs-provider release test trigger works.

dstebila changed the title ~~Trigger downstreadm liboqs-python CI is failing~~ Trigger downstream liboqs-python CI is failing May 10, 2024

dstebila added the bug Something isn't working; high priority to fix label May 10, 2024

ryjones self-assigned this May 10, 2024

baentsch mentioned this issue May 12, 2024

Obtain funding for ARM CI GH runners open-quantum-safe/tsc#25

Open

baentsch mentioned this issue May 12, 2024

Update Ubuntu support to more current LTS version(s) #1780

Open

2 tasks

planetf1 mentioned this issue May 13, 2024

Access to ARM github runners pq-code-package/tsc#55

Open

ryjones mentioned this issue May 13, 2024

Spending money on GitHub actions: what would be the scale? PQCA/TAC#20

Open

bhess mentioned this issue May 14, 2024

Add MAYO signature scheme from NIST onramp #1707

Open

10 tasks

baentsch mentioned this issue May 15, 2024

Move from CCI to GH CI #1795

Open

SWilson4 closed this as completed May 23, 2024
