-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[checkbox-ce-oem] Modify dbus warm cold boot job (BugFix) #1038
base: main
Are you sure you want to change the base?
[checkbox-ce-oem] Modify dbus warm cold boot job (BugFix) #1038
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1038 +/- ##
==========================================
- Coverage 43.21% 40.52% -2.69%
==========================================
Files 356 335 -21
Lines 38662 37365 -1297
Branches 6561 6348 -213
==========================================
- Hits 16706 15143 -1563
- Misses 21293 21585 +292
+ Partials 663 637 -26
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition with the background process scheduling found.
...rib/checkbox-provider-ce-oem/units/strict-confinement/powermanagement-strict-confinement.pxu
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@rickwu666666 , I think you can reference this PR |
9324e7c
to
017f551
Compare
I've pushed a fix for the race condition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In test plan, we are going to compare the logs between the reboot test, so we need to add a job to collect logs before reboot test.
017f551
to
e9bb247
Compare
@stanley31huang , Yes you are correct. I've pushed the fix. Thanks for the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going through an extra snap seems unnecessary. It:
- obfuscates what is being done
- reduces clarity and makes the whole thing less readable and debuggable
- creates unnecessary dependencies
Also there's a serious race condition with Checkbox being created here that may lead to breakage of other jobs that may follow the one modified here.
imports: | ||
from com.canonical.certification import snap | ||
from com.canonical.certification import lsb | ||
requires: | ||
lsb.distributor_id == 'Ubuntu Core' | ||
snap.name == 'test-strict-confinement' | ||
depends: com.canonical.certification::init-boot-loop-data | ||
command: | ||
test-strict-confinement.warm-boot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is going through additional snap needed here?
As far as I understand it, this just reboots the machine, why introduce extra steps, that involve more ways the job can go wrong (calling a snap from a snap is problematic, also debugging becomes more difficult).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kissiel,
Introducing extra snap is because we want to test those commands in a strict confinement environment. This requirement is from the project Shiner and it is due to a snap issue. The Shiner team requires us to run some test cases that they are concerned about in a strict confinement environment, and that means a snap that is not installed in devmode and also connected necessary interfaces.
And this is kind of a POC and also a short-term solution for the shiner team.
depends: com.canonical.certification::init-boot-loop-data | ||
command: | ||
set -e | ||
rtcwake -d "${RTC_DEVICE_FILE:-rtc0}" -v -m no -s "${STRESS_BOOT_WAKEUP_DELAY:-120}" && test-strict-confinement.cold-boot || exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two problems with this line:
- This command creates a race against checkbox. As soon as the dbus call is made to Poweroff, Checkbox proceeds to the next job (and writes progress to the disk, which leads to other IO problems, btw.), and if Checkbox starts the next job before the actual poweroff happens, the next job will be interrupted by the system.
|| exit 1
is redundant, if the first command failed then the error code will be preserved (also if the second one fails).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kissiel,
Does this race condition happen because the dbus command is called by another SNAP which is outside of the checkbox instead of calling dbus itself? If so, do you have any recommendations for ways to avoid this kind of issue?
Since we change the command name in test-strict-confinement snap. Fix the wrong job name in test-plan
Make the job easier to read
Due to -m on is for debugging purpose
Since we need initial status of the system
e9bb247
to
d3e2ae4
Compare
Since we change the command name in the test-strict-confinement snap.
Make the job easier to read
Description
We modify the command in the test-strict-confinement snap. Therefore, we modify the command that is called in the job.
Also, fix the wrong job ID that is included in the test plan.
Resolved issues
N/A
Documentation
N/A
Tests
Warm-boot
https://certification.canonical.com/hardware/202304-31485/submission/359313/
Cold-boot
https://certification.canonical.com/hardware/202304-31485/submission/359311/