New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Reproducible Error Injection (rebased) #21668
base: master
Are you sure you want to change the base?
Add Reproducible Error Injection (rebased) #21668
Conversation
This adds reproducible memory error and test-data error injection, to the fuzzy-test framework. This feature can be enabled with ./config -DERROR_INJECT and additionally to enable call stacks -DERROR_CALLSTACK If enable-asan is used, the callstack is printed by the sanitizer, otherwise please set a breakpoint at the function "break_here", which is executed each time a memory allocation error is injected. If called with the environment variable ERROR_INJECT defined to the empty string the initialization value is printed, and can be used to reproduce the failure later, by passing the value to the ERROR_INJECT variable. There is a search script that can be used to look for errors, and print the command to reproduce the bug: ./testrun.sh This runs in endless mode until an error is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work !
Note: this an enhanced version of #18355
The first callstack is where the memory allocation error was injected, |
This turns a lot of the asserts into just returning an error. I wonder if we want to keep some of them to trigger a problem when fuzzing. |
Yes, currently lots of asserts are triggered by simple out-of-memory errors. |
This PR is dependent on fixes being submitted for the issue that is detected. |
Now after the init-deadlock is fixed by #21683 there is another bug detected:
|
Found via the reproducible error injection in openssl#21668
Fix for the above bug in #21723. |
bn_wexpand can fail as the result of a memory allocation failure. We should not be calling ossl_assert() on its result because it can fail in normal operation. Found via the reproducible error injection in openssl#21668
Another fix found by this in #21725 |
@@ -62,10 +65,13 @@ int FuzzerTestOneInput(const uint8_t *buf, size_t len) | |||
s3 = buf[0] & 4; | |||
++buf; | |||
} | |||
OPENSSL_assert(BN_bin2bn(buf, l1, b1) == b1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except in the case of a malloc failure this really should succeed. So by removing the assert you remove the possibility that the normal fuzzer (i.e. where we're not injecting malloc errors) could find an input that produces an unexpected failure. After this change if it has an input that produces an unexpected failure, the fuzzer will just ignore it.
We should really only avoid this assert in the case that ERROR_INJECT is defined.
The same comment applies to many of the other places where you are removing asserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean, that's valid point.
Although I doubt that BN_bin2bn has any other failure mode than out-of-memory,
the other BN_mod_exp below might be a different story.
I made them go to the assertion and guard that in ifndef ERROR_INJECT as suggested in several places.
|
||
static uint64_t my_seed = 88172645463325252LL; | ||
|
||
static void my_srand(uint32_t seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my_* doesn't seem like a great name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it this way and I see no problem with it as it is just a static symbol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a better name would indicate to future maintainers what the purpose of this replacement rand/srand function is - whereas "my" is really quite meaningless in this context.
# endif | ||
|
||
static void* my_malloc(size_t s | ||
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These conditional compilation statements based on the OpenSSL version don't seem appropriate for a specific branch where the version number is well known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this is the only portable way to writs such callbacks.
And it documents how that this API has evolved over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this is the only portable way to writs such callbacks.
This code does not need to be portable. It will only ever be used in this branch, so these checks are not relevant here. We don't do that anywhere else and it seems wrong to me to start doing so now.
And it documents how that this API has evolved over time.
I don't really see that as adding value here. If we really wanted to document such a thing then it should be in the caller of the callbacks, not in the callbacks themselves.
#ifdef ERROR_INJECT | ||
if (s > 0) | ||
while (my_rand() % 3 <= 1) | ||
buf[my_rand() % s] = (unsigned char)my_rand(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we do this? It seems this deliberately corrupts the input file into the fuzzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is also error injection. We have a 33% chance of using the test vector unmodified.
And 22% of one byte error, 14% of two byte errors, and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems quite counter-intuitive to me. We are basing the error injection approach on the various corpora files because we hope that the complete set hits as many paths through the codebase as possible. So we increase the likelihood of hitting a codepath with a memory allocation which we can then cause to fail or not.
By deliberately corrupting the corpora files its seems we are reducing the effectiveness of the approach. A corruption in the files seems to me more likely to cause an early bail-out which means we don't then hit the codepaths that the files would otherwise reach. So these corruptions seem like they would reduce the number of codepaths that we are going to hit in total in any one run through of the complete set of files.
In other words, this corruption seems more likely to decrease the chances of finding memory errors than increase them.
While testing this out I hit an assertion failure on this line in fuzz/pem.c:
My temporary solution was to remove it and check the return value instead...but see my comment about assert above. |
After I pushed my last update the next error was detected in the fuzzy-testrun (looks like a new/independent issue),
|
Here is another error detection, see: https://github.com/openssl/openssl/actions/runs/5841857184
I was able to reproduce this and the previous error in current master revision with all known fixes applied. |
Here is another interesting new error that happened on my local machine:
|
bn_wexpand can fail as the result of a memory allocation failure. We should not be calling ossl_assert() on its result because it can fail in normal operation. Found via the reproducible error injection in #21668 Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Kurt Roeckx <kurt@roeckx.be> (Merged from #21725)
Fix for that in #22536 |
#22536 was merged... |
Free functions are expected to be tolerant of a NULL pointer being passed. Fixes the problem in #21668 (comment) Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #22536)
Free functions are expected to be tolerant of a NULL pointer being passed. Fixes the problem in #21668 (comment) Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #22536) (cherry picked from commit 8d13d9e)
So the log at https://github.com/openssl/openssl/actions/runs/6705970584/job/18221487067?pr=21668 found at least:
|
Fix in #22572 |
We need to ensure we clear any references to the SSL_CTX and crypto_ex data. Addresses the issue found in: openssl#21668 (comment)
New error in https://github.com/openssl/openssl/actions/runs/6734599786/job/18305859563?pr=21668
|
Possible fix for that in #22600 |
Free functions are expected to be tolerant of a NULL pointer being passed. Fixes the problem in openssl/openssl#21668 (comment) Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl/openssl#22536) Signed-off-by: fly2x <fly2x@hitls.org>
Now we get:
|
That's still the same as the previous one? |
This PR is in a state where it requires action by @openssl/otc but the last update was 30 days ago |
There are still some outstanding comments from my earlier review. It would be nice to get them resolved so we can merge this. Any update? |
This PR is in a state where it requires action by @openssl/otc but the last update was 30 days ago |
This PR is in a state where it requires action by @openssl/otc but the last update was 61 days ago |
This PR is in a state where it requires action by @openssl/otc but the last update was 92 days ago |
This PR is in a state where it requires action by @openssl/otc but the last update was 123 days ago |
This PR is in a state where it requires action by @openssl/otc but the last update was 154 days ago |
This adds reproducible memory error and test-data
error injection, to the fuzzy-test framework.
This feature can be enabled with ./config -DERROR_INJECT and additionally to enable call stacks -DERROR_CALLSTACK
If enable-asan is used, the callstack is printed by the sanitizer, otherwise please set a breakpoint
at the function "break_here", which is executed each time a memory allocation error is injected.
If called with the environment variable ERROR_INJECT defined to the empty string the initialization value is printed, and can be used to reproduce the failure later, by passing the value to the ERROR_INJECT variable.
There is a search script that can be used to look
for errors, and print the command to reproduce the bug:
./testrun.sh
This runs in endless mode until an error is found.
Checklist