Use an RCU lock for the method store #24344

Draft

wants to merge 11 commits into base: master
Conversation

mattcaswell
Member

The function property_read_lock() obtains a read lock on the method store. This is a very hot path and is hit every time an algorithm is fetched. However, the method store is typically updated infrequently while reads are very common, so we refactor the method store to use an RCU lock instead.

We also convert the various instances of taking the "property read lock" to use RCU.

The ossl_method_store_remove() function was never called by anything outside of the test suite, so we remove it.

We convert the ossl_method_store_remove_all_provided() function from using the old lock style to the new RCU lock.

We change the ossl_method_store_cache_flush_all() function to use the new RCU lock.
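
For illustration, a minimal sketch of the read-side pattern this switch enables, assuming the internal RCU API declared in include/internal/rcu.h (the store layout and function names other than the ossl_rcu_* calls are hypothetical, and exact signatures may differ between branches):

```c
/*
 * Sketch only: the read-side pattern with the internal RCU API
 * (include/internal/rcu.h). Everything except the ossl_rcu_* calls is
 * hypothetical, and exact signatures may differ between branches.
 */
#include "internal/rcu.h"

typedef struct demo_store_st {
    CRYPTO_RCU_LOCK *lock;    /* replaces the old CRYPTO_RWLOCK */
    void *methods;            /* the data that readers traverse */
} DEMO_STORE;

static void *demo_fetch(DEMO_STORE *store)
{
    void *methods, *result = NULL;

    /* Readers do not contend on a mutex and never block the writer */
    ossl_rcu_read_lock(store->lock);
    methods = ossl_rcu_deref(&store->methods);
    if (methods != NULL)
        result = methods;     /* a real store would search this structure */
    ossl_rcu_read_unlock(store->lock);

    return result;
}
```

The point of the change is that concurrent fetches only bracket their traversal with ossl_rcu_read_lock()/ossl_rcu_read_unlock() rather than contending on a shared read/write lock.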
@mattcaswell added the branch: master (Merge to master branch) and tests: exempted (The PR is exempt from requirements for testing) labels on May 8, 2024
@mattcaswell mattcaswell marked this pull request as draft May 8, 2024 07:53
@mattcaswell
Member Author

Currently set to draft so I can see what happens with CI

@github-actions bot added the severity: fips change (The pull request changes FIPS provider sources) label on May 8, 2024
We need to ensure that we hold an RCU read lock while iterating over the store.
TSan does not understand RCU locks, so we suppress the false positives it reports for code using these locks.
The current OPENSSL_thread_stop() mechanism uses thread-local storage to store cleanup handlers for the various threads. When a thread exits, the destructor for the thread-local storage gets called and we can call the various handlers. The handlers assume that they can still access the thread-local storage for that particular part of the code, but this relies on the OPENSSL_thread_stop() thread-local storage being destroyed first. If the destruction happens in a different order, then glibc seems to NULL out thread-local storage that hasn't been explicitly destroyed, causing a memory leak.

glibc seems to call the destructors in the same order that the keys were created, so we work around this by always ensuring that the OPENSSL_thread_stop() key is created first. This is only a workaround because it assumes a particular implementation detail of glibc. We need a better solution.

We also revert an earlier change in test/threadstest.h. This was only
necessary because of this problem and masked the real issue.
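
As a toy illustration of the ordering dependency described above (this is not code from the PR; the key names are invented), a "stop" key created before a data key will, on glibc, have its destructor run first and can therefore still read the data key's value:

```c
/*
 * Toy illustration only: glibc appears to run pthread key destructors in
 * key-creation order, so the key whose destructor still wants to read the
 * other keys' values must be created first.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static pthread_key_t stop_key;   /* stands in for the OPENSSL_thread_stop key */
static pthread_key_t data_key;   /* stands in for some module's thread-local data */

static void stop_destructor(void *arg)
{
    /* The stop handlers assume the other thread-local data is still there */
    const char *data = pthread_getspecific(data_key);

    (void)arg;
    printf("stop handler sees: %s\n", data == NULL ? "(already gone)" : data);
}

static void data_destructor(void *arg)
{
    free(arg);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_setspecific(stop_key, (void *)1);
    pthread_setspecific(data_key, strdup("per-thread state"));
    return NULL;   /* destructors run here, in creation order on glibc */
}

int main(void)
{
    pthread_t t;

    /* Creating stop_key first is the workaround: its destructor runs first */
    pthread_key_create(&stop_key, stop_destructor);
    pthread_key_create(&data_key, data_destructor);
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}
```

If the keys are created in the opposite order, data_key's value is freed and cleared before the stop destructor runs, which matches the behaviour described in the commit message.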
@mattcaswell
Member Author

The remaining CI failure is not relevant (the strange tls13messages failure that we have occasionally been seeing).

I tried out the new evp fetch test from openssl/tools#193 on this PR.

Unfortunately I am not seeing the expected performance benefit. In fact things are quite a lot slower. It is unclear why at the moment.

The numbers below represent the average time per fetch call for varying numbers of threads (1, 10, 100, 500, 1000). Smaller numbers are better. For each reading I ran the performance test 5 times and averaged the result.

| Threads | Before this PR | After this PR | % change |
| --- | --- | --- | --- |
| 1 | 0.141784 | 0.195926 | +38.19% |
| 10 | 1.82386 | 2.20086 | +20.67% |
| 100 | 13.250134 | 31.748288 | +139.61% |
| 500 | 68.427636 | 238.226994 | +248.14% |
| 1000 | 182.90734 | 411.09355 | +124.76% |

@paulidale
Contributor

I do wonder why we're trying to optimise fetch time; the original 3.0 design worked under the assumption that fetches would be uncommon. Not great for legacy apps but okay for new/updated ones.

The way the fetch test is written, it will be exercising the fetch cache rather than the fetching. I'd expect a lot of conflicts at the beginning while populating things and then a pure read phase. Because of the way RCU works, a lot of threads will be copying a fair bit of data around during the populating phase. Each running thread will attempt to populate missing algorithms rather than blocking while other threads do their work for them. Still, only a theory. Profiling will test it.

@t8m
Member

t8m commented May 10, 2024

> I do wonder why we're trying to optimise fetch time; the original 3.0 design worked under the assumption that fetches would be uncommon. Not great for legacy apps but okay for new/updated ones.

I would say this assumption is wrong even for non-legacy apps. There are many cases where the application cannot cache the fetched algorithms, and even where it can, doing proper caching can be fairly complicated in many apps.

@t8m
Member

t8m commented May 10, 2024

However that does not necessarily mean we should focus on optimizing the first uncached fetch time. Perhaps it would be better to optimize the caching inside libcrypto.

My idea would be to make the fetched algorithm cache per-thread-per-libctx and thus it would avoid locking altogether.

The question is also whether we should support some concurrent calls which actually do not make much sense. I.e., what are the semantics of concurrently loading/unloading providers into a single libctx? I assume we need to support concurrent loading because of the lazy loads, but unloading should IMO never be done concurrently.

@mattcaswell
Member Author

> My idea would be to make the fetched algorithm cache per-thread-per-libctx and thus it would avoid locking altogether.

But presumably there still needs to be some synchronisation even with this idea because we would need to flush the cache in some circumstances.

@mattcaswell
Member Author

> The question is also whether we should support some concurrent calls which actually do not make much sense. I.e., what are the semantics of concurrently loading/unloading providers into a single libctx? I assume we need to support concurrent loading because of the lazy loads, but unloading should IMO never be done concurrently.

We only get a benefit if we stop concurrent unloading and loading. If we only stop concurrent unloading then we still need all the plumbing to handle concurrent changes to the method store. Either way this is probably a 4.0 thing since it would be a significant API break.

```diff
@@ -374,6 +374,9 @@ static int default_context_inited = 0;

 DEFINE_RUN_ONCE_STATIC(default_context_do_init)
 {
     if (!ossl_init_thread())
```
Contributor


It looks like ossl_init_thread() does a CRYPTO_THREAD_init_local(), so shouldn't there be a corresponding cleanup call (ossl_cleanup_thread()?) in the error return if either of the two subsequent initialisations (lines 380 and 383) fails?
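
A sketch of the error handling being suggested, purely for illustration: second_init() and third_init() stand in for the two subsequent initialisations, and ossl_cleanup_thread() is assumed here to be the matching teardown for ossl_init_thread():

```c
/*
 * Illustration only: second_init() and third_init() are stand-ins for the
 * two later initialisations, and ossl_cleanup_thread() is the assumed
 * counterpart of ossl_init_thread().
 */
DEFINE_RUN_ONCE_STATIC(default_context_do_init)
{
    if (!ossl_init_thread())
        return 0;

    if (!second_init() || !third_init()) {
        /* undo the CRYPTO_THREAD_init_local() done inside ossl_init_thread() */
        ossl_cleanup_thread();
        return 0;
    }

    return 1;
}
```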

@t8m
Member

t8m commented May 10, 2024

> But presumably there still needs to be some synchronisation even with this idea because we would need to flush the cache in some circumstances.

That synchronization could be a single atomic variable read, i.e. checking whether the cache is still valid or whether there was some asynchronous provider load/unload.

@paulidale
Contributor

> That synchronization could be a single atomic variable read, i.e. checking whether the cache is still valid or whether there was some asynchronous provider load/unload.

It would have to be a counter or timestamp; a simple global (per-libctx) yes/no flag isn't going to be enough.

Keep in mind this gem:

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

@t8m
Member

t8m commented May 13, 2024

> It would have to be a counter or timestamp; a simple global (per-libctx) yes/no flag isn't going to be enough.

Yes, sure. But as long as it is atomic, it should work fine.
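
A minimal sketch of the scheme being discussed (hypothetical names, not code from this PR): each thread keeps its own cached fetch result together with the generation number it last saw, while provider load/unload atomically bumps a shared per-libctx counter:

```c
/*
 * Hypothetical sketch: a per-thread, per-libctx fetch cache guarded by an
 * atomic generation counter. None of these names exist in OpenSSL.
 */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_uint_fast64_t generation;    /* bumped on provider load/unload */
} libctx_state;

typedef struct {
    uint64_t seen_generation;           /* generation this thread last observed */
    void *cached_method;                /* cached result of the last fetch */
} thread_cache;

/* Writer side: any provider load/unload invalidates every thread's cache */
static void invalidate_caches(libctx_state *ctx)
{
    atomic_fetch_add_explicit(&ctx->generation, 1, memory_order_release);
}

/* Reader side: a single atomic load decides whether the cache is still usable */
static void *cached_fetch(libctx_state *ctx, thread_cache *tc,
                          void *(*slow_fetch)(void))
{
    uint64_t g = atomic_load_explicit(&ctx->generation, memory_order_acquire);

    if (tc->cached_method == NULL || tc->seen_generation != g) {
        tc->cached_method = slow_fetch();   /* the full, locked lookup */
        tc->seen_generation = g;
    }
    return tc->cached_method;
}
```

Using a monotonically increasing counter rather than a yes/no flag avoids the problem of two threads racing to clear and re-set a flag while a third has stale data.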

If we have successfully made updates, then on RCU write unlock we need to call ossl_synchronize_rcu().
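
For illustration, a sketch of the write-side sequence this implies, again assuming the internal RCU API and the hypothetical DEMO_STORE layout from the earlier sketch (exact signatures may differ; free_old_methods() is a stand-in for whatever frees the replaced data):

```c
/*
 * Sketch of the writer-side sequence only; uses the internal RCU API and the
 * hypothetical DEMO_STORE from the earlier sketch.
 */
static int demo_update(DEMO_STORE *store, void *new_methods)
{
    void *old_methods;

    ossl_rcu_write_lock(store->lock);
    old_methods = ossl_rcu_deref(&store->methods);
    ossl_rcu_assign_ptr(&store->methods, &new_methods);
    ossl_rcu_write_unlock(store->lock);

    /* Only needed when an update was actually published */
    ossl_synchronize_rcu(store->lock);    /* wait for in-flight readers */
    free_old_methods(old_methods);        /* hypothetical cleanup of the old copy */
    return 1;
}
```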
Labels
branch: master (Merge to master branch), severity: fips change (The pull request changes FIPS provider sources), tests: exempted (The PR is exempt from requirements for testing)
Development

Successfully merging this pull request may close these issues.

convert property_read_lock to use rcu if suitable
4 participants