Use an RCU lock for the method store #24344

Draft

wants to merge 11 commits into base: master
Conversation

mattcaswell
Member

The function property_read_lock() obtains a read lock on the method store. This is a very hot path and is hit every time an algorithm is fetched. However, the method store is typically updated infrequently while reads are very common, so we refactor the method store to use an RCU lock instead.

We also convert the various instances of taking the "property read lock" to use RCU.

The ossl_method_store_remove() function was never called by anything outside of the test suite, so we remove it.

We convert the ossl_method_store_remove_all_provided() function from using the old lock style to the new RCU lock.

We change the ossl_method_store_cache_flush_all() function to use the new RCU lock.
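
For illustration, a minimal sketch of the read-side pattern this switch enables, assuming the internal RCU API declared in include/internal/rcu.h (the store layout and function names other than the ossl_rcu_* calls are hypothetical, and exact signatures may differ between branches):

```c
/*
 * Sketch only: the read-side pattern with the internal RCU API
 * (include/internal/rcu.h). Everything except the ossl_rcu_* calls is
 * hypothetical, and exact signatures may differ between branches.
 */
#include "internal/rcu.h"

typedef struct demo_store_st {
    CRYPTO_RCU_LOCK *lock;    /* replaces the old CRYPTO_RWLOCK */
    void *methods;            /* the data that readers traverse */
} DEMO_STORE;

static void *demo_fetch(DEMO_STORE *store)
{
    void *methods, *result = NULL;

    /* Readers do not contend on a mutex and never block the writer */
    ossl_rcu_read_lock(store->lock);
    methods = ossl_rcu_deref(&store->methods);
    if (methods != NULL)
        result = methods;     /* a real store would search this structure */
    ossl_rcu_read_unlock(store->lock);

    return result;
}
```

The point of the change is that concurrent fetches only bracket their traversal with ossl_rcu_read_lock()/ossl_rcu_read_unlock() rather than contending on a shared read/write lock.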
@mattcaswell added the branch: master (Merge to master branch) and tests: exempted (The PR is exempt from requirements for testing) labels on May 8, 2024
@mattcaswell mattcaswell marked this pull request as draft May 8, 2024 07:53
@mattcaswell
Member Author

Currently set to draft so I can see what happens with CI

@github-actions bot added the severity: fips change (The pull request changes FIPS provider sources) label on May 8, 2024
We need to ensure that we hold an RCU read lock while iterating over the store.
TSan does not understand RCU locks, so we suppress the false positives it reports for code using these locks.
The current OPENSSL_thread_stop() mechanism uses thread-local storage to store cleanup handlers for the various threads. When a thread exits, the destructor for the thread-local storage gets called and we can call the various handlers. The handlers assume that they can still access the thread-local storage for that particular part of the code, but this relies on the OPENSSL_thread_stop() thread-local storage being destroyed first. If the destruction happens in a different order, then glibc seems to NULL out thread-local storage that hasn't been explicitly destroyed, causing a memory leak.

glibc seems to call the destructors in the same order that the keys were created, so we work around this by always ensuring that the OPENSSL_thread_stop() key is created first. This is only a workaround because it assumes a particular implementation detail of glibc. We need a better solution.

We also revert an earlier change in test/threadstest.h. This was only
necessary because of this problem and masked the real issue.
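
As a toy illustration of the ordering dependency described above (this is not code from the PR; the key names are invented), a "stop" key created before a data key will, on glibc, have its destructor run first and can therefore still read the data key's value:

```c
/*
 * Toy illustration only: glibc appears to run pthread key destructors in
 * key-creation order, so the key whose destructor still wants to read the
 * other keys' values must be created first.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static pthread_key_t stop_key;   /* stands in for the OPENSSL_thread_stop key */
static pthread_key_t data_key;   /* stands in for some module's thread-local data */

static void stop_destructor(void *arg)
{
    /* The stop handlers assume the other thread-local data is still there */
    const char *data = pthread_getspecific(data_key);

    (void)arg;
    printf("stop handler sees: %s\n", data == NULL ? "(already gone)" : data);
}

static void data_destructor(void *arg)
{
    free(arg);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_setspecific(stop_key, (void *)1);
    pthread_setspecific(data_key, strdup("per-thread state"));
    return NULL;   /* destructors run here, in creation order on glibc */
}

int main(void)
{
    pthread_t t;

    /* Creating stop_key first is the workaround: its destructor runs first */
    pthread_key_create(&stop_key, stop_destructor);
    pthread_key_create(&data_key, data_destructor);
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}
```

If the keys are created in the opposite order, data_key's value is freed and cleared before the stop destructor runs, which matches the behaviour described in the commit message.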
@mattcaswell
Member Author

The remaining CI failure is not relevant (the strange tls13messages failure that we have occasionally been seeing).

I tried out the new evp fetch test from openssl/tools#193 on this PR.

Unfortunately I am not seeing the expected performance benefit. In fact things are quite a lot slower. It is unclear why at the moment.

The numbers below represent the average time per fetch call for varying numbers of threads (1, 10, 100, 500, 1000). Smaller numbers are better. For each reading I ran the performance test 5 times and averaged the result.

| Threads | Before this PR | After this PR | % change |
| --- | --- | --- | --- |
| 1 | 0.141784 | 0.195926 | +38.19% |
| 10 | 1.82386 | 2.20086 | +20.67% |
| 100 | 13.250134 | 31.748288 | +139.61% |
| 500 | 68.427636 | 238.226994 | +248.14% |
| 1000 | 182.90734 | 411.09355 | +124.76% |

@paulidale
Contributor

I do wonder why we're trying to optimise fetch time; the original 3.0 design worked under the assumption that fetches would be uncommon. Not great for legacy apps but okay for new/updated ones.

The way the fetch test is written, it will be exercising the fetch cache rather than the fetching. I'd expect a lot of conflicts at the beginning while populating things and then a pure read phase. Because of the way RCU works, a lot of threads will be copying a fair bit of data around during the populating phase. Each running thread will attempt to populate missing algorithms rather than blocking while other threads do their work for them. Still, only a theory. Profiling will test it.

@t8m
Member

t8m commented May 10, 2024

> I do wonder why we're trying to optimise fetch time; the original 3.0 design worked under the assumption that fetches would be uncommon. Not great for legacy apps but okay for new/updated ones.

I would say this assumption is wrong even for non-legacy apps. There are many cases where the application cannot cache the fetched algorithms, and even where it can, doing proper caching can be fairly complicated in many apps.

@t8m
Member

t8m commented May 10, 2024

However that does not necessarily mean we should focus on optimizing the first uncached fetch time. Perhaps it would be better to optimize the caching inside libcrypto.

My idea would be to make the fetched algorithm cache per-thread-per-libctx and thus it would avoid locking altogether.

The question is also whether we should support some concurrent calls which actually do not make much sense. I.e., what are the semantics of concurrently loading/unloading providers into a single libctx? I assume we need to support concurrent loading because of the lazy loads, but unloading should IMO never be done concurrently.

@mattcaswell
Member Author

> My idea would be to make the fetched algorithm cache per-thread-per-libctx and thus it would avoid locking altogether.

But presumably there still needs to be some synchronisation even with this idea because we would need to flush the cache in some circumstances.

@mattcaswell
Member Author

> The question is also whether we should support some concurrent calls which actually do not make much sense. I.e., what are the semantics of concurrently loading/unloading providers into a single libctx? I assume we need to support concurrent loading because of the lazy loads, but unloading should IMO never be done concurrently.

We only get a benefit if we stop concurrent unloading and loading. If we only stop concurrent unloading then we still need all the plumbing to handle concurrent changes to the method store. Either way this is probably a 4.0 thing since it would be a significant API break.

```diff
@@ -374,6 +374,9 @@ static int default_context_inited = 0;

 DEFINE_RUN_ONCE_STATIC(default_context_do_init)
 {
     if (!ossl_init_thread())
```
Contributor


It looks like ossl_init_thread() does a CRYPTO_THREAD_init_local(), so shouldn't there be a corresponding cleanup call (ossl_cleanup_thread()?) in the error return if either of the two subsequent initialisations (lines 380 and 383) fails?
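
A sketch of the error handling being suggested, purely for illustration: second_init() and third_init() stand in for the two subsequent initialisations, and ossl_cleanup_thread() is assumed here to be the matching teardown for ossl_init_thread():

```c
/*
 * Illustration only: second_init() and third_init() are stand-ins for the
 * two later initialisations, and ossl_cleanup_thread() is the assumed
 * counterpart of ossl_init_thread().
 */
DEFINE_RUN_ONCE_STATIC(default_context_do_init)
{
    if (!ossl_init_thread())
        return 0;

    if (!second_init() || !third_init()) {
        /* undo the CRYPTO_THREAD_init_local() done inside ossl_init_thread() */
        ossl_cleanup_thread();
        return 0;
    }

    return 1;
}
```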

@t8m
Member

t8m commented May 10, 2024

> But presumably there still needs to be some synchronisation even with this idea because we would need to flush the cache in some circumstances.

That synchronization could be a single atomic variable read, i.e. checking whether the cache is still valid or whether there was some asynchronous provider load/unload.

@paulidale
Contributor

> That synchronization could be a single atomic variable read, i.e. checking whether the cache is still valid or whether there was some asynchronous provider load/unload.

It would have to be a counter or timestamp; a simple global (per-libctx) yes/no flag isn't going to be enough.

Keep in mind this gem:

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

@t8m
Member

t8m commented May 13, 2024

> It would have to be a counter or timestamp; a simple global (per-libctx) yes/no flag isn't going to be enough.

Yes, sure. But as long as it is atomic, it should work fine.
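
A minimal sketch of the scheme being discussed (hypothetical names, not code from this PR): each thread keeps its own cached fetch result together with the generation number it last saw, while provider load/unload atomically bumps a shared per-libctx counter:

```c
/*
 * Hypothetical sketch: a per-thread, per-libctx fetch cache guarded by an
 * atomic generation counter. None of these names exist in OpenSSL.
 */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_uint_fast64_t generation;    /* bumped on provider load/unload */
} libctx_state;

typedef struct {
    uint64_t seen_generation;           /* generation this thread last observed */
    void *cached_method;                /* cached result of the last fetch */
} thread_cache;

/* Writer side: any provider load/unload invalidates every thread's cache */
static void invalidate_caches(libctx_state *ctx)
{
    atomic_fetch_add_explicit(&ctx->generation, 1, memory_order_release);
}

/* Reader side: a single atomic load decides whether the cache is still usable */
static void *cached_fetch(libctx_state *ctx, thread_cache *tc,
                          void *(*slow_fetch)(void))
{
    uint64_t g = atomic_load_explicit(&ctx->generation, memory_order_acquire);

    if (tc->cached_method == NULL || tc->seen_generation != g) {
        tc->cached_method = slow_fetch();   /* the full, locked lookup */
        tc->seen_generation = g;
    }
    return tc->cached_method;
}
```

Using a monotonically increasing counter rather than a yes/no flag avoids the problem of two threads racing to clear and re-set a flag while a third has stale data.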

If we have successfully made updates, then on RCU write unlock we need to call ossl_synchronize_rcu().
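
For illustration, a sketch of the write-side sequence this implies, again assuming the internal RCU API and the hypothetical DEMO_STORE layout from the earlier sketch (exact signatures may differ; free_old_methods() is a stand-in for whatever frees the replaced data):

```c
/*
 * Sketch of the writer-side sequence only; uses the internal RCU API and the
 * hypothetical DEMO_STORE from the earlier sketch.
 */
static int demo_update(DEMO_STORE *store, void *new_methods)
{
    void *old_methods;

    ossl_rcu_write_lock(store->lock);
    old_methods = ossl_rcu_deref(&store->methods);
    ossl_rcu_assign_ptr(&store->methods, &new_methods);
    ossl_rcu_write_unlock(store->lock);

    /* Only needed when an update was actually published */
    ossl_synchronize_rcu(store->lock);    /* wait for in-flight readers */
    free_old_methods(old_methods);        /* hypothetical cleanup of the old copy */
    return 1;
}
```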
Labels
branch: master (Merge to master branch), severity: fips change (The pull request changes FIPS provider sources), tests: exempted (The PR is exempt from requirements for testing)
Development

Successfully merging this pull request may close these issues.

convert property_read_lock to use rcu if suitable
4 participants