Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syncthing/lib/api tests unreliable, failing ~50% of the time on Fedora Linux #9455

Open
decathorpe opened this issue Mar 7, 2024 · 4 comments
Labels
bug A problem with current functionality, as opposed to missing functionality (enhancement) needs-triage New issues needed to be validated

Comments

@decathorpe
Copy link

What happened?

I'm the package maintainer for syncthing on Fedora Linux (and for EPEL 9 and 8). We're running tests during package builds, and one of the tests (for the github.com/syncthing/syncthing/lib/api module) has been very unstable and unreliable. This started happening a few releases ago, but it is much worse with v1.27.4 now.

The errors I see most often happen on Linux x86_64. I have seen then occur with both Go 1.22 (Fedora Rawhide and Fedora 40) and Go 1.21 (Fedora 39). With v1.27.4, about half of my attempted builds fail with the log output included below (or output very similar to it).

I am building from the official "source" distribution that includes your vendored dependencies (i.e. not against Go packages that are packaged by the distribution), so there should be no mismatch between the sources and dependencies you use and the ones used for my builds.

Syncthing version

v1.27.4

Platform & operating system

Linux x86_64

Browser version

N/A

Relevant log output

$ go test -buildmode pie -compiler gc -ldflags '  -extldflags '\''-Wl,-z,relro -Wl,--as-needed  -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld-errors -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes  '\''' github.com/syncthing/syncthing/lib/api
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:35601
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:45031
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:34027
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on [::]:45301
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http:///
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:44831
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:36359
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:43877
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:37173
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:39525
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:46203
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:46383
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 WARNING: Connection priority number for QUIC over WAN must be worse (higher) than QUIC over LAN. Correcting.
2024/03/07 13:01:39 WARNING: Connection priority number for TCP over WAN must be worse (higher) than TCP over LAN. Correcting.
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:37241
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:41315
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 WARNING: Decoding posted config: folder "foo": folder has duplicate ID
2024/03/07 13:01:39 INFO: GUI and API listening on 127.0.0.1:34627
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:39 INFO: GUI and API listening on [::]:33697
2024/03/07 13:01:39 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:40 INFO: GUI and API listening on [::1]:43443
2024/03/07 13:01:40 INFO: Access the GUI via the following URL: http://[::1]:0/
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:34780
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:34784
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:51566
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:51614
2024/03/07 13:01:40 INFO: GUI and API listening on 127.0.0.1:39313
2024/03/07 13:01:40 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:40874
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:40872
2024/03/07 13:01:40 INFO: GUI and API listening on 127.0.0.1:41619
2024/03/07 13:01:40 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:34506
2024/03/07 13:01:40 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:34470
2024/03/07 13:01:41 INFO: GUI and API listening on 127.0.0.1:33699
2024/03/07 13:01:41 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:41 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:42548
2024/03/07 13:01:41 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:42564
--- FAIL: TestAPIServiceRequests (0.01s)
    --- FAIL: TestAPIServiceRequests//rest/db/file?folder=default&file=something (1.00s)
        api_test.go:464: Unexpected error requesting /rest/db/file?folder=default&file=something: Get "http://127.0.0.1:44831/rest/db/file?folder=default&file=something": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    --- FAIL: TestAPIServiceRequests//rest/db/need?folder=default (1.00s)
        api_test.go:464: Unexpected error requesting /rest/db/need?folder=default: Get "http://127.0.0.1:44831/rest/db/need?folder=default": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    --- FAIL: TestAPIServiceRequests//rest/db/ignores?folder=default (1.00s)
        api_test.go:464: Unexpected error requesting /rest/db/ignores?folder=default: Get "http://127.0.0.1:44831/rest/db/ignores?folder=default": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    --- FAIL: TestAPIServiceRequests//rest/system/log?since=0 (1.00s)
        api_test.go:464: Unexpected error requesting /rest/system/log?since=0: Get "http://127.0.0.1:44831/rest/system/log?since=0": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024/03/07 13:01:41 INFO: GUI and API listening on 127.0.0.1:46033
2024/03/07 13:01:41 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:41 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:50216
2024/03/07 13:01:41 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:50174
2024/03/07 13:01:41 INFO: GUI and API listening on 127.0.0.1:36115
2024/03/07 13:01:41 INFO: Access the GUI via the following URL: http://127.0.0.1:0/
2024/03/07 13:01:41 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:33410
2024/03/07 13:01:41 INFO: Wrong credentials supplied during API authorization from 127.0.0.1:33422
FAIL
FAIL	github.com/syncthing/syncthing/lib/api	2.284s
FAIL
@decathorpe decathorpe added bug A problem with current functionality, as opposed to missing functionality (enhancement) needs-triage New issues needed to be validated labels Mar 7, 2024
@imsodin
Copy link
Member

imsodin commented Mar 10, 2024

These tests started to be executed in parallel in 1.23.7, that would match your experience of these timeouts occuring more frequently. It becoming worse in 1.27.4 is a bit weird, as there was a change to make it less flaky:
ad81ac8

Anyhow, as I can't repro myself, could you please increase the default test timeout here and report back if that fixes the issue for you, and if so with what timeout - thanks.
https://github.com/syncthing/syncthing/blob/main/lib/api/api_test.go#L447

@er-pa
Copy link
Member

er-pa commented Mar 11, 2024

Mhm, when running this test-case enough times it eventually gets triggered. So I assume that it is not unrealistic to expect it to happen more often on different machines. With go test -v -run TestAPIServiceRequests --count=50 it is almost guaranteed to get triggered here by random cases.

I increased the default timeout to 5s and added a logger in case it took longer than 1s (the actual default). The results;

...
=== NAME  TestAPIServiceRequests//rest/svc/deviceid?id=7777777-777777N-7777777-777777N-7777777-777777N-7777777-77777Q4
    api_test.go:446: exceeded 1s 1001 ms
=== NAME  TestAPIServiceRequests//rest/db/file?folder=default&file=something
    api_test.go:446: exceeded 1s 1001 ms
...
=== NAME  TestAPIServiceRequests//rest/system/ping
    api_test.go:446: exceeded 1s 2003 ms
=== NAME  TestAPIServiceRequests//rest/config/devices
    api_test.go:446: exceeded 1s 2002 ms

Feels like it gets executed in 1s batches where for any given reason it can get pushed to a next one. As the average execute-time is like 1-3 ms.

P.s. unless my timer is faulty;)

@decathorpe
Copy link
Author

Ah, that might indeed explain what's going on. Our build machines are pretty big - one of the failures I see was on a system with 2 x 12 CPUs (48 threads) available ... I will try to increase the timeout duration.

@decathorpe
Copy link
Author

As suggested, I raised the timeout here (to 60 seconds, just to be on the safe side - 5 seconds would probably be fine too), and I can no longer reproduce the failure for the lib/api tests. So it looks like the current 1 second timeout is definitely too tight when running these tests in parallel on many CPU cores.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A problem with current functionality, as opposed to missing functionality (enhancement) needs-triage New issues needed to be validated
Projects
None yet
Development

No branches or pull requests

3 participants