Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(hid): Eliminate data race in USB pathway causing dropped keys [ALTERNATIVE] #2267

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

khoek
Copy link
Contributor

@khoek khoek commented Apr 11, 2024

Alternative to #2257, solving the same problem which happens there, perhaps more elegantly. We simply move the existing mutex to protect calls to hid_int_ep_write() as well as the passed buffer, causing zmk_usb_hid_send_report() to become synchronous on first invocation. (Subsequent invocations of zmk_usb_hid_send_report() while the first USB TX was pending also blocked previously.)

The cost of the introduced synchronous delay upon first invocation can be mitigated (and latency I imagine actually slightly improved in some applications) by wrapping zmk_usb_hid_send_report() in a work queue, but I doubt this is worth the memory + task switching penalty. After all, the whole reason the data race wasn't discovered for so long was because the USB TX transactions were completing so blazingly fast in the first place.

Tested working on real hardware (Adv360 Pro).

Closes #2253, closes #2257.

@khoek khoek requested a review from a team as a code owner April 11, 2024 21:46
@caksoylar caksoylar added core Core functionality/behavior of ZMK usb labels Apr 25, 2024
@petejohanson
Copy link
Contributor

I'm not sure I love this version as much. In particular, since we call the API here from the system work queue, we'll be completely blocking all processing while waiting for the USB TX to complete (unless I'm misreading this, but that seems to be the net impact), where as the current code (which admittedly still has the reported bug) avoids this.

I'm tempted to go with the solution in #2257, but ask for that to put the "copy into temp buffer" behavior behind an optional Kconfig flag, so we can have the fix in place only when using a platform/driver that requires it.

Thoughts?

@petejohanson petejohanson self-assigned this May 13, 2024
@khoek
Copy link
Contributor Author

khoek commented May 17, 2024

@petejohanson I agree with you in all respects. Though, I think it's worth pointing out that (at least I believed when I last read the code), in the current implementation the main thread still gets hung up when processing various behaviours, e.g. if a "tap" from a hold-tap is emitted then the code will immediately emit a press then a release USB message, and the main thread will still temporarily hang on sending the second message until sending the first one is complete (for the same reason as your describe above).

All of this class of problems could be alleviated by moving the USB messages to a separate work queue with a simple handler just processing (and waiting on) each queued message item one by one. If something like that was implemented at some point down the track, would you potentially have an interest in merging it?

(I might add the Kconfig guard to the other pull tomorrow, if that's okay with you.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core functionality/behavior of ZMK usb
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NKRO breaks layer-tap with "extended" keycodes
3 participants