Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI regression from the new hashtable support #24272

Closed
t8m opened this issue Apr 26, 2024 · 7 comments
Closed

CI regression from the new hashtable support #24272

t8m opened this issue Apr 26, 2024 · 7 comments
Assignees
Labels
branch: master Merge to master branch severity: regression The issue/pr is a regression from previous released version severity: urgent Fixes an urgent issue (exempt from 24h grace period) triaged: bug The issue/pr is/fixes a bug

Comments

@t8m
Copy link
Member

t8m commented Apr 26, 2024

2 CI on-push jobs have regressed.

run-checker (enable-asan enable-ubsan no-shared no-asm -DOPENSSL_SMALL_FOOTPRINT)

# ../../fuzz/corpora/hashtable/750489b928c4012a6c4352fdefd92c33d5ca600d
fuzz/hashtable.c:145:14: runtime error: load of misaligned address 0x602000000111 for type 'uint16_t' (aka 'unsigned short'), which requires 2 byte alignment
0x602000000111: note: pointer points here
 00 00 00  ce ce ce ce ce ce c4 ce  3f 3f 28 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior fuzz/hashtable.c:145:14 in 
../../util/wrap.pl ../../fuzz/hashtable-test ../../fuzz/corpora/hashtable => 1
not ok 2 - Fuzzing hashtable
# ------------------------------------------------------------------------------
#   Failed test 'Fuzzing hashtable'
#   at ../../test/recipes/fuzz.pl line 22.

cross-compilation (hppa-linux-gnu, libc6-dev-hppa-cross, -static linux-generic32, no, -test_inclu...

../../util/wrap.pl ../../test/lhash_test => 139
not ok 1 - running lhash_test
# ------------------------------------------------------------------------------
#   Failed test 'running lhash_test'
#   at /home/runner/work/openssl/openssl/util/perl/OpenSSL/Test/Simple.pm line 77.
# Looks like you failed 1 test of 1.02-test_lhash.t ......................... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests 
@t8m t8m added branch: master Merge to master branch triaged: bug The issue/pr is/fixes a bug severity: urgent Fixes an urgent issue (exempt from 24h grace period) severity: regression The issue/pr is a regression from previous released version labels Apr 26, 2024
nhorman added a commit to nhorman/openssl that referenced this issue Apr 26, 2024
we extract several values (uint16_t and uint64_t from the fuzzer buff
passed in, but they weren't aligned on 2 and 8 byte boundaries.  Adjust
the fuzzer to use the proper offsets

Fixes openssl#24272
nhorman added a commit to nhorman/openssl that referenced this issue Apr 26, 2024
we extract several values (uint16_t and uint64_t from the fuzzer buff
passed in, but they weren't aligned on 2 and 8 byte boundaries.  Adjust
the fuzzer to memcpy data to the target variables to avoid unalignment
issues

Fixes openssl#24272
@t8m t8m reopened this Apr 30, 2024
@t8m
Copy link
Member Author

t8m commented Apr 30, 2024

@nhorman Unfortunately the hppa cross-compile job is still not fixed:

https://github.com/openssl/openssl/actions/runs/8893580037/job/24420072833

@nhorman
Copy link
Contributor

nhorman commented Apr 30, 2024

will look at it, need to get a local reproducer setup

@nhorman
Copy link
Contributor

nhorman commented Apr 30, 2024

Notes:
So....this is troubling
while looking for the root cause of this issue, I found that it was occuring in the test_hashtable_multithread test. Its really odd, because it appears we're getting random memory corruption during the test

Whats more odd is that hppa is built without thread support, so its all single threaded, which eliminates race conditions as a possible source of the problem.

Whats more odd is that, in this single threaded environment

Whats more odd is that if I reduce the optimization level to -O1, the problem abates

further reading shows that pa-risc has no hardware support for atomics, all atomic operations are by default implemented with the sync library, or optionally by libatomic. When using the latter several build errors are produced, so for now we appear stuck with the former

I'm starting to wonder if perhaps we're seeing a issue with atomic operations on an old arch that isn't as well maintained anymore

@t8m
Copy link
Member Author

t8m commented May 1, 2024

Hmm... I am wondering if we should just have much more simple fallback for atomics on no-thread builds? Of course another option is to just declare this platform unsupported and remove it from CI.

@nhorman
Copy link
Contributor

nhorman commented May 1, 2024

Possibly. I'd like to better understand the issue before we make a decision, but yeah, I think both of those options are on the table. It's also feasible that this is a compiler issue. One of the options for hppas compiler is -mnoindex-loads, which eliminates the use of indexed addressing mode in loads/stores to which this problem seems related to. Will let you know

@nhorman
Copy link
Contributor

nhorman commented May 1, 2024

I'm becoming increasingly convinced that this is a compiler bug.

My observations so far:

  • Crashes happen in multiple places randomly (sometimes in do_mt_hash_work, sometimes in CRYPTO_atomic_add, sometimes in various other locations), but always as a subordonate of do_mt_hash_work
  • The crashes happen normally because a register that holds the value needed for an indirect load is garbage
  • Inspecting with gdb shows those registers always hold the result of an indexed load
  • setting optimization for the function do_mt_hash_work to O1 makes the problem abate

Its also notable that this isn't a race condition as hppa build disables threading

@t8m
Copy link
Member Author

t8m commented May 2, 2024

Then add -O1 or something else that fixes the issue here as workaround:

target: -static linux-generic32,

nhorman added a commit to nhorman/openssl that referenced this issue May 2, 2024
We're getting some odd errors in the lhash test on hppa.  Analysis shows
that the crash is happening randomly in various places, but always
occurs during an indexed load of register r11 or r23.  Root cause hasn't
been completely determined, but given that:

1) hppa is an unadopted platform
2) asan/ubsan/threadsan shows no issues with the affected code elsewhere
3) The hppa build does not have threading enabled
4) reducing the optimization level to 01 quashes the problem

The belief is that this is either a bug in gcc optimization, or an issue
in the qemu emulator we use to test.

Since this is causing CI failures, I'm proposing that we just lower the
optimization level of the build to -01 to avoid the problem, and address
it more throughly should an actual platform user encounter an error

Fixes openssl#24272
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch: master Merge to master branch severity: regression The issue/pr is a regression from previous released version severity: urgent Fixes an urgent issue (exempt from 24h grace period) triaged: bug The issue/pr is/fixes a bug
Projects
Status: Done
2 participants