Use an RCU lock for the method store #24344
base: master
Conversation
We also convert the various places that take the "property read lock" to use RCU too.
The ossl_method_store_remove() function was never called by anything outside of the test suite. So we remove it.
We convert the function ossl_method_store_remove_all_provided() from using the old lock style to the new RCU lock.
We change the ossl_method_store_cache_flush_all() function to use the new RCU lock.
Currently set to draft so I can see what happens with CI
We need to ensure that we hold a read RCU lock while iterating over the store.
TSan does not understand RCU locks. We suppress false positives from code using these locks.
The current OPENSSL_thread_stop() mechanism uses thread local storage to store cleanup handlers for the various threads. When a thread exits, the destructor for the thread local storage gets called and we can call the various handlers. The handlers assume that they can still access the thread local storage for that particular part of the code - but this relies on the OPENSSL_thread_stop() thread local storage being destroyed first. If this happens in a different order then glibc seems to NULL out thread local storage that hasn't been explicitly destroyed, causing a memory leak. glibc seems to call the destructors in the same order that the keys were created, so we work around this by ensuring that the OPENSSL_thread_stop() key is always created first. This is only a workaround because it assumes a particular implementation in glibc. We need a better solution. We also revert an earlier change in test/threadstest.h. This was only necessary because of this problem and masked the real issue.
The remaining CI failure is not relevant (the strange tls13messages failure that we occasionally have been seeing). I tried out the new evp fetch test from openssl/tools#193 on this PR. Unfortunately I am not seeing the expected performance benefit. In fact things are quite a lot slower. It is unclear why at the moment. The numbers below represent the average time per fetch call for varying numbers of threads (1, 10, 100, 500, 1000). Smaller numbers are better. For each reading I ran the performance test 5 times and averaged the result.
I do wonder why we're trying to optimise fetch time; the original 3.0 design worked under the assumption that fetches would be uncommon. Not great for legacy apps but okay for new/updated ones. The way the fetch test is written, it will be exercising the fetch cache rather than the fetching. I'd expect a lot of conflicts at the beginning while populating things and then a pure read phase. Because of the way RCU works, a lot of threads will be copying a fair bit of data around during the populating phase. Each running thread will attempt to populate missing algorithms rather than blocking while other threads do their work for them. Still, only a theory. Profiling will test it.
I would say this assumption is wrong even for non-legacy apps. There are many cases where the application cannot cache the fetched algorithms, and even where it can, doing proper caching can be fairly complicated in many apps.
However that does not necessarily mean we should focus on optimizing the first uncached fetch time. Perhaps it would be better to optimize the caching inside libcrypto. My idea would be to make the fetched algorithm cache per-thread-per-libctx, which would avoid locking altogether. The question is also whether we should support some concurrent calls which actually do not make much sense. I.e., what are the semantics of concurrently loading/unloading providers into a single libctx? I assume we need to support concurrent loading because of the lazy loads, but unloading should IMO never be done concurrently.
But presumably there still needs to be some synchronisation even with this idea, because we would need to flush the cache in some circumstances.
We only get a benefit if we stop concurrent unloading and loading. If we only stop concurrent unloading then we still need all the plumbing to handle concurrent changes to the method store. Either way this is probably a 4.0 thing since it would be a significant API break.
@@ -374,6 +374,9 @@ static int default_context_inited = 0;

DEFINE_RUN_ONCE_STATIC(default_context_do_init)
{
    if (!ossl_init_thread())
It looks like ossl_init_thread() does a CRYPTO_THREAD_init_local(), so shouldn't there be a corresponding cleanup call (ossl_cleanup_thread()?) in the error return if either of the two subsequent initialisations (line 380 and 383) fail?
That synchronization could be a single atomic variable read, i.e., checking whether the cache is still valid or was invalidated by some asynchronous provider load/unload.
It would have to be a counter or timestamp, a simple global (per libctx) yes/no flag isn't going to be enough. Keep in mind this gem:
Yes, sure. But as long as it is atomic, it should work fine.
If we have successfully made updates, then on RCU write unlock we need to call ossl_synchronize_rcu().
The function property_read_lock() obtains a read lock on the method store. This is a very hot path and is hit every time an algorithm is fetched. However, the method store is typically updated infrequently and read often. We refactor the method store to use an RCU lock instead.