Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[checkbox-ce-oem] Modify dbus warm cold boot job (BugFix) #1038

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

rickwu666666
Copy link
Contributor

@rickwu666666 rickwu666666 commented Mar 6, 2024

  • Modify the command called in the job
    Since we change the command name in the test-strict-confinement snap.
  • Fix the wrong job name in test-plan
  • Change job structure
    Make the job easier to read

Description

We modify the command in the test-strict-confinement snap. Therefore, we modify the command that is called in the job.
Also, fix the wrong job ID that is included in the test plan.

Resolved issues

N/A

Documentation

N/A

Tests

Warm-boot
https://certification.canonical.com/hardware/202304-31485/submission/359313/

Cold-boot
https://certification.canonical.com/hardware/202304-31485/submission/359311/

@rickwu666666 rickwu666666 requested a review from a team as a code owner March 6, 2024 01:59
@rickwu666666 rickwu666666 changed the title Modify dbus warm cold boot job [checkbox-ce-oem] Modify dbus warm cold boot job(Bug Fix) Mar 6, 2024
Copy link

codecov bot commented Mar 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 40.52%. Comparing base (84878c2) to head (9324e7c).
Report is 3 commits behind head on main.

❗ Current head 9324e7c differs from pull request most recent head d3e2ae4. Consider uploading reports for the commit d3e2ae4 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1038      +/-   ##
==========================================
- Coverage   43.21%   40.52%   -2.69%     
==========================================
  Files         356      335      -21     
  Lines       38662    37365    -1297     
  Branches     6561     6348     -213     
==========================================
- Hits        16706    15143    -1563     
- Misses      21293    21585     +292     
+ Partials      663      637      -26     
Flag Coverage Δ
contrib-provider-ce-oem 29.59% <ø> (-0.89%) ⬇️
provider-genio ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rickwu666666 rickwu666666 changed the title [checkbox-ce-oem] Modify dbus warm cold boot job(Bug Fix) [checkbox-ce-oem] Modify dbus warm cold boot job (Bug Fix) Mar 6, 2024
@rickwu666666 rickwu666666 changed the title [checkbox-ce-oem] Modify dbus warm cold boot job (Bug Fix) [checkbox-ce-oem] Modify dbus warm cold boot job (BugFix) Mar 6, 2024
Copy link
Contributor

@kissiel kissiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition with the background process scheduling found.

LiaoU3
LiaoU3 previously approved these changes Mar 14, 2024
Copy link
Contributor

@LiaoU3 LiaoU3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@baconYao
Copy link
Contributor

Race condition with the background process scheduling found.

@rickwu666666 , I think you can reference this PR
#979

@rickwu666666
Copy link
Contributor Author

I've pushed a fix for the race condition.

Copy link
Collaborator

@stanley31huang stanley31huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test plan, we are going to compare the logs between the reboot test, so we need to add a job to collect logs before reboot test.

@rickwu666666 rickwu666666 force-pushed the modify_dbus_warm_cold_boot_job branch from 017f551 to e9bb247 Compare March 15, 2024 01:48
@rickwu666666
Copy link
Contributor Author

@stanley31huang , Yes you are correct. I've pushed the fix. Thanks for the review.

Copy link
Contributor

@kissiel kissiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through an extra snap seems unnecessary. It:

  • obfuscates what is being done
  • reduces clarity and makes the whole thing less readable and debuggable
  • creates unnecessary dependencies

Also there's a serious race condition with Checkbox being created here that may lead to breakage of other jobs that may follow the one modified here.

imports:
from com.canonical.certification import snap
from com.canonical.certification import lsb
requires:
lsb.distributor_id == 'Ubuntu Core'
snap.name == 'test-strict-confinement'
depends: com.canonical.certification::init-boot-loop-data
command:
test-strict-confinement.warm-boot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is going through additional snap needed here?
As far as I understand it, this just reboots the machine, why introduce extra steps, that involve more ways the job can go wrong (calling a snap from a snap is problematic, also debugging becomes more difficult).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kissiel,
Introducing extra snap is because we want to test those commands in a strict confinement environment. This requirement is from the project Shiner and it is due to a snap issue. The Shiner team requires us to run some test cases that they are concerned about in a strict confinement environment, and that means a snap that is not installed in devmode and also connected necessary interfaces.
And this is kind of a POC and also a short-term solution for the shiner team.

depends: com.canonical.certification::init-boot-loop-data
command:
set -e
rtcwake -d "${RTC_DEVICE_FILE:-rtc0}" -v -m no -s "${STRESS_BOOT_WAKEUP_DELAY:-120}" && test-strict-confinement.cold-boot || exit 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two problems with this line:

  • This command creates a race against checkbox. As soon as the dbus call is made to Poweroff, Checkbox proceeds to the next job (and writes progress to the disk, which leads to other IO problems, btw.), and if Checkbox starts the next job before the actual poweroff happens, the next job will be interrupted by the system.

|| exit 1 is redundant, if the first command failed then the error code will be preserved (also if the second one fails).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kissiel,
Does this race condition happen because the dbus command is called by another SNAP which is outside of the checkbox instead of calling dbus itself? If so, do you have any recommendations for ways to avoid this kind of issue?

 Since we change the command name in test-strict-confinement snap.

Fix the wrong job name in test-plan
Make the job easier to read
Due to -m on is for debugging purpose
Since we need initial status of the system
@rickwu666666 rickwu666666 force-pushed the modify_dbus_warm_cold_boot_job branch from e9bb247 to d3e2ae4 Compare April 23, 2024 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants