Utilize a watchdog inside the firmware #221

Closed
miczyg1 opened this issue Oct 11, 2022 · 7 comments
@miczyg1
Contributor

miczyg1 commented Oct 11, 2022

The problem you're addressing (if any)
Can we use one of the system watchdogs to automatically reset the system if it gets really wedged? As I mentioned, we regularly see the PCIe root complex get really wedged, so I'm not sure a software watchdog will do the trick. Can we expose the PCH watchdog (i.e. program it in firmware and have a driver in the OS that pets it every 30s or so)? (Maybe coreboot already supports this out of the box; I haven't looked into it, just writing down ideas.)

Describe the solution you'd like
Use the PCH watchdog to reset the platform when it hangs. Implement an OS driver that keeps reloading the watchdog.
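
As an illustration of the "keep reloading the watchdog" idea, here is a minimal sketch in Python. It assumes the watchdog ends up exposed through the standard Linux watchdog character device (e.g. `/dev/watchdog`), where any write counts as a keepalive; the thread does not say the PCH watchdog would be exposed this way. `pet_watchdog` and `disarm` are hypothetical helper names, and the 30-second interval is only the figure floated above.

```python
import io
import time


def pet_watchdog(dev, interval_s=30.0, pets=None, sleep=time.sleep):
    """Write one keepalive byte to `dev`, then wait `interval_s`, repeatedly.

    `dev` is any writable binary file object (assumption: on a real system
    this would be the Linux watchdog device, opened unbuffered).
    `pets` limits the number of keepalives; None means run forever.
    Returns the number of keepalives written.
    """
    done = 0
    while pets is None or done < pets:
        dev.write(b"\0")   # any write to the watchdog device reloads the timer
        dev.flush()
        done += 1
        sleep(interval_s)
    return done


def disarm(dev):
    """Send the 'magic close' character before closing the device.

    Whether this actually stops the timer depends on the driver
    (it must support magic close and not be built with "nowayout").
    """
    dev.write(b"V")
    dev.flush()


# Hypothetical usage on a real machine (needs root and a watchdog driver):
#     with open("/dev/watchdog", "wb", buffering=0) as dev:
#         pet_watchdog(dev, interval_s=30)
```

If the process that pets the device stops (because the system is wedged), the writes stop and the hardware timer is left to expire on its own, which is the behaviour the request is after.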

Where is the value to a user, and who might that user be?
#219 (comment)

Describe alternatives you've considered
None

Additional context
None

@miczyg1 miczyg1 added the enhancement New feature or request label Oct 11, 2022
@macpijan macpijan added this to To Do in Nlnet October 2022 Oct 14, 2022
@macpijan macpijan assigned macpijan and miczyg1 and unassigned macpijan Oct 14, 2022
@miczyg1 miczyg1 moved this from To Do to In progress in Nlnet October 2022 Oct 15, 2022
@miczyg1
Contributor Author

miczyg1 commented Oct 15, 2022

I researched the watchdogs available on Intel platforms in more depth. The results:

  1. TCO watchdog timer. It can reboot the platform after two consecutive timeouts. However, a power-on strap can disable this reboot functionality, and once the strap disables it, it cannot be re-enabled (AFAIK most board designs disable it). This basically disqualifies the TCO watchdog timer.
  2. OC watchdog. It appears to be present since Skylake/Kaby Lake and can cause a global reset when it expires. It has a convenient 1-second granularity and a maximum timeout of around 17 minutes. A public library for this watchdog is available in edk2-platforms: https://github.com/tianocore/edk2-platforms/tree/master/Silicon/Intel/KabylakeSiliconPkg/Pch/Library/PeiOcWdtLib

So the only option is the OC watchdog timer.
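
For a sense of where "around 17 minutes" could come from: a 1-second granularity combined with a 10-bit timeout field gives 1024 seconds, i.e. about 17.1 minutes. The 10-bit width is an assumption inferred from the figures above, not something stated in this thread, so check the datasheet before relying on it. The helper names below are hypothetical.

```python
# Assumption: the timeout field is 10 bits wide; inferred from the
# "1 second granularity, max around 17 minutes" figures, not from a datasheet.
TIMEOUT_FIELD_BITS = 10
GRANULARITY_S = 1


def max_timeout_s(bits=TIMEOUT_FIELD_BITS, granularity_s=GRANULARITY_S):
    """Largest timeout representable under the assumed field width."""
    return (1 << bits) * granularity_s


def clamp_timeout_s(requested_s):
    """Clamp a requested timeout to the range the timer could represent."""
    return max(GRANULARITY_S, min(int(requested_s), max_timeout_s()))


if __name__ == "__main__":
    print(max_timeout_s(), "s =", round(max_timeout_s() / 60, 1), "min")
    print(clamp_timeout_s(120))   # a 2-minute timeout fits comfortably
```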

@miczyg1
Contributor Author

miczyg1 commented Oct 15, 2022

A few considerations:

  1. How long should the timeout be? I would prefer to set it once to e.g. 1-2 minutes, reload it when leaving coreboot and again when BDS starts in the UEFI payload, and then disable the watchdog on ExitBootServices.
  2. Do we expect people to spend more than 2 minutes in the setup UI? 😄
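
To make the trade-off in these two points concrete, here is a small simulation of the proposed lifecycle: arm once in coreboot, reload when leaving coreboot, reload when BDS starts, disable on ExitBootServices. It is a toy model with no hardware access; `SimulatedWatchdog`, `boot`, the stage names, and the durations are all hypothetical.

```python
class SimulatedWatchdog:
    """Toy countdown watchdog; models the timer only, touches no hardware."""

    def __init__(self):
        self.timeout_s = 0
        self.remaining_s = 0
        self.enabled = False

    def arm(self, timeout_s):
        self.timeout_s = timeout_s
        self.remaining_s = timeout_s
        self.enabled = True

    def reload(self):
        if self.enabled:
            self.remaining_s = self.timeout_s

    def disable(self):
        self.enabled = False

    def elapse(self, seconds):
        """Advance time; return True if the timer expired (platform reset)."""
        if not self.enabled:
            return False
        self.remaining_s -= seconds
        return self.remaining_s <= 0


def boot(durations_s, timeout_s=120):
    """Return the stage in which the watchdog fired, or None if it never did."""
    wdt = SimulatedWatchdog()
    wdt.arm(timeout_s)                 # set once, early in coreboot
    steps = [
        ("coreboot", wdt.reload),      # reload when leaving coreboot
        ("pre_bds", wdt.reload),       # reload when BDS starts in the payload
        ("bds", wdt.disable),          # disable on ExitBootServices
        ("os", lambda: None),          # watchdog is off from here on
    ]
    for stage, on_exit in steps:
        if wdt.elapse(durations_s[stage]):
            return stage
        on_exit()
    return None
```

With a 120-second timeout, a stage that hangs is caught, but so is a user who stays in the setup UI (part of the `bds` stage in this model) for 3 minutes, which is exactly the concern in point 2.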

@pietrushnic

@miczyg1 I would say that every time we enable a feature, we should expose it in the menu so the customer can enable or disable it and modify its basic parameters.

@miczyg1
Contributor Author

miczyg1 commented Oct 28, 2022

@miczyg1
Contributor Author

miczyg1 commented Oct 28, 2022

And the PR with integration to our fork: Dasharo/coreboot#254

@rafkoch

rafkoch commented Nov 29, 2022

@miczyg1 this task has been open for more than a month. What are we waiting for before we move it to the CLOSED status?

@miczyg1
Contributor Author

miczyg1 commented Nov 29, 2022

We can move to closed.

@miczyg1 miczyg1 closed this as completed Nov 29, 2022
@miczyg1 miczyg1 moved this from In progress to Done in Nlnet October 2022 Nov 29, 2022