
Re-think CpuBoundWork implementation and usage #9988

Open
Al2Klimov opened this issue Feb 2, 2024 · 1 comment · May be fixed by #9990
Assignees: Al2Klimov
Labels: area/api (REST API), area/distributed (Distributed monitoring: master, satellites, clients), core/quality (Improve code, libraries, algorithms, inline docs), ref/IP

Comments

@Al2Klimov (Member)

Is your feature request related to a problem? Please describe.

With the new network stack we start CORES x 2 I/O threads for accept(2), SSL handshakes, ... and for the actual message processing. Especially the latter may consume quite some CPU time. So we've scoped such code with CpuBoundWork "semaphores", so that it runs in at most (CORES x 2) x 0.75 threads in parallel. This way some threads always remain available for... "I/O stuff".

CpuBoundWork uses RAII, just like std::unique_lock. On construction, as long as there's no available slot, it yields the current coroutine so that I/O work can happen (kind of an async spin). Once a slot becomes available, it occupies it and continues. On destruction it frees the slot.
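To make the acquisition pattern concrete, here is a hedged, simplified sketch (not the actual Icinga 2 implementation): an RAII guard that spin-acquires one of a fixed number of CPU slots. std::this_thread::yield() stands in for yielding the Boost.Asio coroutine (yc) in the real code, and the slot count is an illustrative constant.

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch only: RAII guard over a fixed pool of CPU slots.
class CpuBoundWorkSketch {
public:
	static constexpr int kSlots = 6; // e.g. (CORES x 2) x 0.75 with 4 cores

	CpuBoundWorkSketch() {
		for (;;) {
			int free = s_FreeSlots.load();

			// Try to grab a slot; retry (the "async spin") if none is
			// free or another thread won the race.
			if (free > 0 && s_FreeSlots.compare_exchange_weak(free, free - 1)) {
				break;
			}

			std::this_thread::yield(); // real code yields the coroutine here
		}
	}

	~CpuBoundWorkSketch() {
		s_FreeSlots.fetch_add(1); // release the slot
	}

	static int FreeSlots() { return s_FreeSlots.load(); }

private:
	static std::atomic<int> s_FreeSlots;
};

std::atomic<int> CpuBoundWorkSketch::s_FreeSlots{CpuBoundWorkSketch::kSlots};
```

The "spin" is visible in the retry loop: a coroutine that loses the race keeps yielding and re-trying, with no ordering guarantee among waiters, which is exactly the unfairness criticized below.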

That "async spin" is neither optimal (Hello unfair scheduling!), nor necessary.

Describe the solution you'd like

As CpuBoundWork is scoped anyway, let's put the work to do into a callback and run it in the regular thread pool. It's their turn when it's their turn. Once that work has finished, the calling coroutine can be woken via the existing AsioConditionVariable.
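A rough sketch of that idea, under stated assumptions: the pool, the class and method names are all hypothetical, and a plain std::condition_variable stands in for the AsioConditionVariable that would wake the calling coroutine in the real design.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative sketch: post CPU-bound work to a worker pool and block the
// caller until it has run, instead of spin-acquiring a slot.
class WorkerPoolSketch {
public:
	explicit WorkerPoolSketch(unsigned threads) {
		for (unsigned i = 0; i < threads; ++i) {
			m_Threads.emplace_back([this] { Run(); });
		}
	}

	~WorkerPoolSketch() {
		{
			std::lock_guard<std::mutex> lock(m_Mutex);
			m_Done = true;
		}
		m_CV.notify_all();
		for (auto& t : m_Threads) t.join();
	}

	// Post the callback and wait for completion, mimicking "wake the
	// calling coroutine via AsioConditionVariable".
	void RunCpuBound(std::function<void()> work) {
		std::mutex doneMutex;
		std::condition_variable doneCV; // stand-in for AsioConditionVariable
		bool finished = false;

		{
			std::lock_guard<std::mutex> lock(m_Mutex);
			m_Queue.push([&] {
				work();

				std::lock_guard<std::mutex> dl(doneMutex);
				finished = true;
				doneCV.notify_one();
			});
		}
		m_CV.notify_one();

		std::unique_lock<std::mutex> lock(doneMutex);
		doneCV.wait(lock, [&] { return finished; });
	}

private:
	void Run() {
		for (;;) {
			std::function<void()> task;
			{
				std::unique_lock<std::mutex> lock(m_Mutex);
				m_CV.wait(lock, [this] { return m_Done || !m_Queue.empty(); });
				if (m_Queue.empty()) return;
				task = std::move(m_Queue.front());
				m_Queue.pop();
			}
			task();
		}
	}

	std::mutex m_Mutex;
	std::condition_variable m_CV;
	std::queue<std::function<void()>> m_Queue;
	bool m_Done = false;
	std::vector<std::thread> m_Threads;
};
```

The key property is that nothing spins: the pool's queue orders the pending work, and the caller just sleeps until its callback has been executed.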

Describe alternatives you've considered

Alternatively, if no slots are available, such an AsioConditionVariable can be put into a queue. This way nothing spins, and no callbacks need to be transferred across thread pools.
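The queue-of-waiters alternative could look roughly like this hedged sketch (again with std::condition_variable standing in for AsioConditionVariable, and all names illustrative): a release hands its slot directly to the longest-waiting acquirer, so waiters are served FIFO and nothing spins.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Illustrative sketch: slot manager whose waiters park a condvar in a
// FIFO queue instead of spinning.
class SlotQueueSketch {
public:
	explicit SlotQueueSketch(int slots) : m_FreeSlots(slots) {}

	void Acquire() {
		std::unique_lock<std::mutex> lock(m_Mutex);

		if (m_FreeSlots > 0) {
			--m_FreeSlots;
			return;
		}

		// No slot free: enqueue our condvar and sleep until a releaser
		// hands us a slot. No spinning, strict FIFO fairness.
		Waiter w;
		m_Waiters.push_back(&w);
		w.cv.wait(lock, [&] { return w.granted; });
	}

	void Release() {
		std::lock_guard<std::mutex> lock(m_Mutex);

		if (!m_Waiters.empty()) {
			// Hand the slot directly to the oldest waiter.
			Waiter* w = m_Waiters.front();
			m_Waiters.pop_front();
			w->granted = true;
			w->cv.notify_one();
		} else {
			++m_FreeSlots;
		}
	}

	int FreeSlots() {
		std::lock_guard<std::mutex> lock(m_Mutex);
		return m_FreeSlots;
	}

private:
	struct Waiter {
		std::condition_variable cv;
		bool granted = false;
	};

	std::mutex m_Mutex;
	int m_FreeSlots;
	std::deque<Waiter*> m_Waiters;
};
```

Compared to the callback variant, the work stays on the coroutine's own I/O thread; only the wake-up signal crosses threads.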

Additional context

Also, @yhabteab said I've used CpuBoundWork a bit more than necessary across lib/remote.

ref/IP/48306

@Al2Klimov Al2Klimov added area/distributed Distributed monitoring (master, satellites, clients) area/api REST API core/quality Improve code, libraries, algorithms, inline docs ref/IP labels Feb 2, 2024
@Al2Klimov Al2Klimov self-assigned this Feb 6, 2024
@yhabteab (Member) commented Feb 8, 2024

There is absolutely no need to acquire a CPU slot here, as there's no lock involved that would block the current thread:

CpuBoundWork fetchingAuthenticatedUser (yc);
authenticatedUser = ApiUser::GetByAuthHeader(std::string(request[http::field::authorization]));

CpuBoundWork allowOriginHeader (yc);
auto allowedOrigins (headerAllowOrigin->ToSet<String>());

This only wastes CPU slots and slows down the message handling unnecessarily; the stats could be updated within the try/catch above it, which already holds a CPU slot for other (better) things.

CpuBoundWork taskStats (yc);
l_TaskStats.InsertValue(Utility::GetTime(), 1);

Although there is locking involved here, I don't think it should take long for the thread to actually acquire it, since there aren't that many threads dealing with endpoint clients. It's just pointless to spend time obtaining a CPU slot first.

CpuBoundWork removeClient (yc);

if (m_Endpoint) {
	m_Endpoint->RemoveClient(this);
} else {
	ApiListener::GetInstance()->RemoveAnonymousClient(this);
}
