Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XWindowsEventQueueBuffer: Fix delays when waiting for new events #679

Merged
merged 1 commit into from May 30, 2020

Conversation

p12tic
Copy link
Member

@p12tic p12tic commented May 20, 2020

Fixes #617, #298, #58, symless#6487.

This PR fixes intermittent waits on Linux client. These were caused by the wait loop in XWindowsEventQueueBuffer where we wait for new events. The problem was that we use QLength() to check if there are new events coming from the X11 server. This is insufficient, because it checks only for messages that the XLib already knows about.

Consider the situation when we have no pending events at all and we enter the wait loop. An event arrives in the socket, we are woken up and go on to check QLength(). It will return 0, because the socket itself was not read by XLib. If we call XPending() instead, the socket would be read as we expect, the event would become visible to XLib, XPending() would return nonzero value and we would exit the event loop.

Tested for a couple of days, did not see problems on my local workstation. The fix need further testing, maybe there are other related issues that I have not noticed.

QLength() may return 0 even if there are events pending because they
need to be read from the display socket in order to become visible. We
must use XPending() which will poll the socket if QLength() == 0.
@p12tic
Copy link
Member Author

p12tic commented May 20, 2020

The upstream PR symless#6685 is incorrect because the input timeout is in seconds and we're operating on milliseconds within the function, so a 1000 multiplier is expected. What the upstream PR did was effectively disabled the waiting for events which just burns the CPU. Though it's not full busy-wait, so the impact may not be large.

@p12tic
Copy link
Member Author

p12tic commented May 20, 2020

cc @Jnewbon. I think you should consider something similar to the approach in this PR, instead of what's in symless#6685.

@shymega
Copy link

shymega commented May 20, 2020

Makes sense. I'd like to give it a bit more time to be tested before merging.

I'm preparing for a v2.3.3 release, but I'm keen to get this patch in the release.

@axtux
Copy link

axtux commented May 22, 2020

Hello,

Thanks for the fix. As I understand, it is a fix for the client. I tried your fix client side and it does not solve my problem #617 (comment)

@p12tic
Copy link
Member Author

p12tic commented May 22, 2020

@axtux Interesting. Does the Barrier "unblock" itself if you move a mouse on the client or do any other input event that does not go through Barrier?

I had similar symptoms caused by poor internet connection, so maybe the root cause is different.

@Jnewbon
Copy link

Jnewbon commented May 22, 2020

@p12tic Thanks for the ping, you make total sense with the fix, ill have a look at pulling this in to a test branch as i can reliably reproduce the bug.

@axtux
Copy link

axtux commented May 22, 2020

@p12tic Last time I tried, generating an input event did not unblock anything. Though at that time, the lag was shorter and it was quite difficult to test. I should try again.

Right now, I'm trying the symless#6685 fix to see if that solves my issue.

@axtux
Copy link

axtux commented May 22, 2020

Also, I can confirm the fix does not solve #207 (comment)

@p12tic
Copy link
Member Author

p12tic commented May 22, 2020

@axtux: Alright, so it seems that you could have an unreliable network connection between the client and the server. What happens is that corrupted packets will cause the TCP connection between the client and the server to break and then it takes time to re-establish the connection again, which looks like a delay. The issue that I was fixing was that the client sometimes blocks waiting for X11 events even when there are some. If you generate some events by moving mouse on the client, then the client will unblock. This is how we can check whether the issue is caused by network connection problems or by waiting for already arrived events.

I removed #207 from the list of issues I think this bug fixes, because after rereading it also seems to be caused by connection problems. Dropping the connection when a shift or another modifier key is pressed will cause Barrier to lose this information and then it gets stuck.

@axtux
Copy link

axtux commented May 22, 2020

@p12tic As stated in this comment #617 (comment), both computers are connected together via Ethernet cable, ping running during freeze is never higher than 1ms so this excludes eventual network issue

Also, I confirm this fix symless#6685 fixes the freeze issue on my side.

As per #207 (comment), I'll update my comment to state that this happens every time I press the shift key. What are the chances that a network issue happens every time I press this key?

@p12tic
Copy link
Member Author

p12tic commented May 22, 2020

@axtux:

Could you modify getPendingCountLocked to this and test again?

int XWindowsEventQueueBuffer::getPendingCountLocked()
{
    Lock lock(&m_mutex);
    // work around a bug in libx11 which causes the first XPending not to read events sometimes
    // https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/1
    XPending(m_display); 
    return XPending(m_display);
}

This could be the missing piece.

@linaori
Copy link

linaori commented May 25, 2020

In regards of the keys getting stuck as I opened an issue about this as well... After updating to Ubuntu 20.04 my ethernet started working, so I'm wired now. This is a direct connection between both my PC and laptop via the same switch.

While I've had a lot less disconnects (oddly enough still happening a few times a day), the issue of keys/mouse cursor getting stuck persists. I can't say it's not the network, just hoping that this info will help.

@axtux
Copy link

axtux commented May 25, 2020

@p12tic I tried again this PR and the original code fixes the issue 😕

Now I'm wondering if I did anything wrong in my first test. I'll do more testing during the week.

@p12tic p12tic force-pushed the x11-fix-event-wait-race-condition branch from 7d2d395 to 722b7d6 Compare May 29, 2020 09:59
@p12tic
Copy link
Member Author

p12tic commented May 29, 2020

@shymega Could this PR be merged? It fixes the problem for me and at least one other user I'm in contact with, as well as for @axtux. The problem and the bugfix are well understood and there's little chance for a regression.

@shymega
Copy link

shymega commented May 30, 2020

Merging now. Thanks!

@shymega shymega merged commit b373d8e into debauchee:master May 30, 2020
@p12tic p12tic deleted the x11-fix-event-wait-race-condition branch June 1, 2020 18:12
@richwalker
Copy link

Apologies, I've commented and attached debug logs here:
b373d8e#commitcomment-40347586

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Input freezes or gets stuck, losing control for a few seconds
6 participants