bombardier locks up on large core count ARM64 machines #72

Open
RobertHenry6bev opened this issue Mar 29, 2021 · 0 comments

I'm using the latest release of bombardier, downloaded as a binary from this GitHub repository.

I'm running on Ubuntu 20.10 and 21.04 on large core count hardware, both ARM64 and x86_64. I'm using bombardier to drive a webserver implemented in .NET (C#), communicating through localhost (i.e., bombardier and the webserver are on the same machine).

Sometimes bombardier will lock up on ARM64.

Running the "same" stack on x86_64 I have never seen a lock up.

I see the lockup with both low connection count and high connection count.

Attaching strace (as root) to the locked-up process shows no syscall activity; the only syscall reported as in progress is futex.

I compiled bombardier from source with -race (the Go race detector, which is built on ThreadSanitizer), and the Go runtime didn't complain, although, of course, the dynamic behavior of the entire system changed, since the race detector is slow.

I enabled the appropriate Go runtime environment variable to dump stacks on fault. For this test I had requested 479 connections. The population count of the reported state for each goroutine is:

475 IO wait
  3 IO wait, 1 minutes
  1 syscall, 1 minutes
  1 goroutine running on other thread; stack unavailable
  1 sleep, 1 minutes
  1 sleep
  1 semacquire, 1 minutes
  1 select
  1 running
  1 idle
  1 chan receive, 1 minutes
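For anyone who wants to produce a similar tally from their own dump, here is a minimal sketch in Go. It assumes the dump uses the usual `goroutine N [state]:` header lines; the names `countStates`, `headerRE`, and `sampleDump` are made up for this example, and the sample text is hand-written, not taken from the real dump. It only counts header states, so it would not reproduce the "goroutine running on other thread; stack unavailable" row above, which comes from a line in the goroutine body.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"sort"
	"strings"
)

// headerRE matches goroutine header lines such as
// "goroutine 42 [IO wait, 1 minutes]:" and captures the bracketed state.
var headerRE = regexp.MustCompile(`^goroutine \d+ \[([^\]]+)\]:`)

// countStates tallies the bracketed state of every goroutine header in dump.
func countStates(dump string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(dump, "\n") {
		if m := headerRE.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

// sampleDump is a hand-written, hypothetical fragment used only when no
// file is given on the command line.
const sampleDump = `goroutine 1 [chan receive, 1 minutes]:
main.main()

goroutine 18 [IO wait]:
net.(*netFD).Read(0x0)

goroutine 19 [IO wait]:
net.(*netFD).Read(0x0)
`

func main() {
	dump := sampleDump
	if len(os.Args) > 1 {
		// Optional argument: path to a file holding a saved stack dump.
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		dump = string(data)
	}

	counts := countStates(dump)
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	// Most frequent state first; ties broken alphabetically.
	sort.Slice(states, func(i, j int) bool {
		if counts[states[i]] != counts[states[j]] {
			return counts[states[i]] > counts[states[j]]
		}
		return states[i] < states[j]
	})
	for _, s := range states {
		fmt.Printf("%3d %s\n", counts[s], s)
	}
}
```

The whole bracketed text is used as the key, so "IO wait" and "IO wait, 1 minutes" stay separate rows, as in the list above.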

478 of these goroutine stack dumps have net.(*netFD).Read on their call stack. (This relationship, ask for N connections and see N-1 goroutines in netFD.Read, is borne out for different values of N.)
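A count like this can be checked mechanically. The sketch below is one possible way to do it in Go, assuming goroutines in the dump are separated by blank lines and each starts with a `goroutine ` header. The function name `countWithFrame` and the sample text are hypothetical, written only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// countWithFrame splits dump into per-goroutine blocks (assumed to be
// separated by blank lines) and reports how many blocks contain frame,
// along with the total number of goroutine blocks seen.
func countWithFrame(dump, frame string) (matching, total int) {
	for _, block := range strings.Split(dump, "\n\n") {
		block = strings.TrimSpace(block)
		if !strings.HasPrefix(block, "goroutine ") {
			continue // not a goroutine block (e.g. a leading signal banner)
		}
		total++
		if strings.Contains(block, frame) {
			matching++
		}
	}
	return matching, total
}

// sampleDump is a hand-written, hypothetical fragment, not real output.
const sampleDump = `goroutine 1 [chan receive, 1 minutes]:
main.main()

goroutine 18 [IO wait]:
net.(*netFD).Read(0x0)

goroutine 19 [IO wait]:
net.(*netFD).Read(0x0)
`

func main() {
	// The trailing "(" keeps the match from also hitting other methods
	// whose names merely start with Read, such as ReadFrom.
	matching, total := countWithFrame(sampleDump, "net.(*netFD).Read(")
	fmt.Printf("%d of %d goroutines have net.(*netFD).Read on their stack\n", matching, total)
}
```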

My experience debugging Go is now about 5 years old :)

Have other users of bombardier, or other users of Go, seen similar situations on ARM64?

It is possible that the webserver itself locks up momentarily, which causes bombardier to lock up. I tend to discount this, since I can start up an independent parallel execution of bombardier, which runs just fine talking to the same webserver.
