Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: test 53/55 failures/errors on aarch64 #210

Closed
pcmoore opened this issue Mar 4, 2020 · 23 comments
Closed

BUG: test 53/55 failures/errors on aarch64 #210

pcmoore opened this issue Mar 4, 2020 · 23 comments
Assignees
Milestone

Comments

@pcmoore
Copy link
Member

pcmoore commented Mar 4, 2020

It looks like we have a number test failures and errors on aarch64 with the current master branch; HEAD is 38f04da ("tests: add tests for the binary tree").

Test FAILUREs:

 test mode:  c
 test type:  bpf-valgrind
Test 53-sim-binary_tree%%331-00001 result:   FAILURE 53-sim-binary_tree rc=14
 test mode:  c
 test type:  basic
Test 55-basic-pfc_binary_tree%%001-00001 result:   FAILURE 55-basic-pfc_binary_tree.sh rc=1

Test ERRORs:

 test mode:  c
 test type:  bpf-sim
Test 53-sim-binary_tree%%001-00001 result:   ERROR 53-sim-binary_tree rc=14
...
<every 53-sim-binary_tree test in "c" mode ERRORs in the same manner>
 test mode:  python
 test type:  bpf-sim
Test 53-sim-binary_tree%%001-00001 result:   ERROR 53-sim-binary_tree rc=1
...
<every 53-sim-binary_tree test in "python" mode ERRORs in the same manner>
@pcmoore pcmoore added this to the v2.5 milestone Mar 4, 2020
@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

@drakenclimber I'm not sure if you have access to an aarch64 system (maybe a Raspberry Pi running a 64-bit build?), but I'm going to assign this one to you for obvious reasons. If you don't have access to an aarch64 system let me know and I'll look into this.

@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

EDIT: sorry, that is error code 14 NOT error 16 (CORRECTED BELOW)

I did a little digging just now and it appears that test 53-sim-binary_tree isn't generating any input and is returning with error code 14/EFAULT when run standalone.

@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

I added some quick instrumentation (below):

diff --git a/tests/53-sim-binary_tree.c b/tests/53-sim-binary_tree.c
index 2c7890e..c154f67 100644
--- a/tests/53-sim-binary_tree.c
+++ b/tests/53-sim-binary_tree.c
@@ -81,6 +81,7 @@ int main(int argc, char *argv[])
                                              SCMP_A0(SCMP_CMP_EQ, i));
                else
                        rc = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(i), i, 0);
+               printf("PMD: i=%d, rc=%d\n", i, rc);
                if (rc < 0)
                        goto out;
        }

... which resulted in the following output:

# ./53-sim-binary_tree -p
PMD: i=0, rc=0
PMD: i=1, rc=0
PMD: i=2, rc=0
PMD: i=3, rc=0
...
PMD: i=161, rc=0
PMD: i=162, rc=0
PMD: i=163, rc=-14

@drakenclimber
Copy link
Member

Interesting. My Raspberry Pi died a couple months ago, but this seems like a good excuse to order a new one. The new 2GB version looks really tempting.

Until it arrives, I should be able to find an ARM machine within Oracle. I'll start looking...

@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

Sounds good. I'll be curious to hear how the new RPi 4 does, I've only got a 3B.

Regardless, if you can't find a system let me know.

@drakenclimber
Copy link
Member

drakenclimber commented Mar 4, 2020

aarch64's syscall table is sparse in several places, and thus aarch64_syscall_resolve_num() returns NULL if the syscall number can't be found in the aarch64_syscall_table.

Here's the exact list of syscall numbers that currently do not exist in aarch64:

                if (i == 163 || i == 164 || i == 244 || i == 245 ||
                    i == 246 || i == 247 || i == 248 || i == 249 ||
                    i == 250 || i == 251 || i == 252 || i == 253 ||
                    i == 254 || i == 255 || i == 256 || i == 257 ||
                    i == 258 || i == 259 || i >= 295)
                        /* aarch64 doesn't support these syscall numbers.
                         * remove them
                         */
                        continue;

I'm not really sure of the best way to proceed. Should we limit test 53 and 55 only to x86_64? I really want to ensure we're testing large, unbalanced binary trees. It seems like in a case like this we can skip these tests. Or perhaps we should make a smaller, simpler test for these architectures.

Thoughts?

@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

Bummer. Although feeding in a bogus syscall number and expecting a valid result isn't quite fair ;)

One possibility would be to first make a call to seccomp_syscall_resolve_num_arch(SCMP_ARCH_NATIVE, i) to see if that syscall number exists; if it doesn't, you simply skip that non-existent syscall.

I think it is also worth noting that this is purely a test problem and not a general aarch64 problem. In the real world only valid syscall numbers would be fed into the API with the expectation of proper results.

@drakenclimber
Copy link
Member

Bummer. Although feeding in a bogus syscall number and expecting a valid result isn't quite fair ;)

I knew I was playing with fire when I did it :). This is a tricky feature to test because the end seccomp behavior shouldn't change when using a binary tree. Only the way we traverse the bpf (or pfc) should change.

One possibility would be to first make a call to seccomp_syscall_resolve_num_arch(SCMP_ARCH_NATIVE, i) to see if that syscall number exists; if it doesn't, you simply skip that non-existent syscall.

I thought about that, but that makes it really difficult to compare against a pre-populated *.pfc file like we're doing in test 55.

I think it is also worth noting that this is purely a test problem and not a general aarch64 problem. In the real world only valid syscall numbers would be fed into the API with the expectation of proper results.

Agreed. I need to step back and think about this for a bit. Perhaps there's a way to use some of the bpf tools you have made to help in the testing.

@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

I thought about that, but that makes it really difficult to compare against a pre-populated *.pfc file like we're doing in test 55.

Perhaps the answer is that for test 55 we use a smaller subset of syscalls (one dozen, two dozen?) such that we can do the number resolution from the syscall name to ensure they exist.

@drakenclimber
Copy link
Member

Perhaps the answer is that for test 55 we use a smaller subset of syscalls (one dozen, two dozen?) such that we can do the number resolution from the syscall name to ensure they exist.

Yeah, I think that's likely the best and safest answer. So obviously aarch64's syscall table has no holes up until 163. Do you know what the other architecture's tables look like?

And do we feel confident that a binary tree of 24 or so syscalls fully tests the code? (Probably more of a question for me than you.) My gut feeling is yes, but I would like to look through it and run it through coveralls as well.

@pcmoore
Copy link
Member Author

pcmoore commented Mar 4, 2020

Yeah, I think that's likely the best and safest answer. So obviously aarch64's syscall table has no holes up until 163. Do you know what the other architecture's tables look like?

If only we had a pre-built table of syscall architectures ... oh wait, we do! ;)

Example using aarch64, sorted by syscall number:

# ./src/arch-syscall-dump -a aarch64 | sort -k2 -n

And do we feel confident that a binary tree of 24 or so syscalls fully tests the code? (Probably more of a question for me than you.) My gut feeling is yes, but I would like to look through it and run it through coveralls as well.

A test that is unreliable, regardless of it's level of comprehensiveness isn't useful. I would rather have a "good enough" test that is always consistent, everywhere it is run, every time it is run that we can later build upon.

@drakenclimber
Copy link
Member

If only we had a pre-built table of syscall architectures ... oh wait, we do! ;)

I had a feeling there was a tool for that. Thanks ;)

A test that is unreliable, regardless of it's level of comprehensiveness isn't useful.

Agreed. I got greedy and thought I could create the home run of tests. Oh well, a "good enough" test in this case should be sufficient. I'll get to work on it today.

@pcmoore
Copy link
Member Author

pcmoore commented Mar 5, 2020

I got greedy and thought I could create the home run of tests. Oh well, a "good enough" test in this case should be sufficient. I'll get to work on it today.

Hey, no worries, we've all fallen into that trap from time to time. It's still a nice improvement and we've got at least a basic test; this could be much worse :)

@piso77
Copy link

piso77 commented Mar 5, 2020

FWIW, it fails on ppc64 too:

batch name: 53-sim-binary_tree
test mode: c
test type: bpf-sim
Test 53-sim-binary_tree%%001-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%002-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%003-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%004-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%005-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%006-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%007-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%008-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%009-00001 result: ERROR 53-sim-binary_tree rc=14
Test 53-sim-binary_tree%%010-00001 result: ERROR 53-sim-binary_tree rc=14
...
test mode: c
test type: bpf-valgrind
Test 53-sim-binary_tree%%331-00001 result: FAILURE 53-sim-binary_tree rc=14

@piso77
Copy link

piso77 commented Mar 5, 2020

And 55-basic-pfc_binary_tree too:

batch name: 55-basic-pfc_binary_tree
test mode: c
test type: basic
Test 55-basic-pfc_binary_tree%%001-00001 result: FAILURE 55-basic-pfc_binary_tree.sh rc=1

@pcmoore
Copy link
Member Author

pcmoore commented Mar 5, 2020

Hi @piso77, thanks for the report. Once @drakenclimber has a fix would you be willing to test it?

Unfortunately, I imagine there may be a few other ABIs that have similar failures; it is simply that aarch64 is the only non-x86 ABI that I have easy access to for testing purposes. The fix we have been discussing in this thread should address all these failures (I think).

@drakenclimber
Copy link
Member

I wrote a quick python script to find the first hole in each architecture.

Here are the results:

$ ./find-first-hole-in-each-arch.py 
aarch64's first syscall number hole is 163
arm's first syscall number hole is 7
mips64's first syscall number hole is 1
mips64n32's first syscall number hole is 1
mips's first syscall number hole is 1
parisc64's first syscall number hole is 102
ppc64's first syscall number hole is 192
ppc's first syscall number hole is 224
riscv64's first syscall number hole is 38
s390's first syscall number hole is 17
s390x's first syscall number hole is 13
x32's first syscall number hole is 1
x86_64's first syscall number hole is 335

Based upon these results, I think we should lower tests 53 and 55 to limit out at syscall number 12. Note that there are a few architectures (mips, etc.) that will fail regardless of how low of a number we pick.

@piso77
Copy link

piso77 commented Mar 5, 2020

Hi @piso77, thanks for the report. Once @drakenclimber has a fix would you be willing to test it?

Sure, ping me when there's a patch available!

@pcmoore
Copy link
Member Author

pcmoore commented Mar 5, 2020

Note that there are a few architectures (mips, etc.) that will fail regardless of how low of a number we pick.

How do you plan to address that?

@drakenclimber
Copy link
Member

Note that there are a few architectures (mips, etc.) that will fail regardless of how low of a number we pick.

How do you plan to address that?

I was thinking that I would check the native architecture, and if it matched arm, mips*, or x32, then we would skip the tests. Not pretty, but I don't see an easy way to test the binary tree on those systems.

@pcmoore
Copy link
Member Author

pcmoore commented Mar 5, 2020

I was thinking that I would check the native architecture, and if it matched arm, mips*, or x32, then we would skip the tests. Not pretty, but I don't see an easy way to test the binary tree on those systems.

If we are going to be skipping ABIs based on this, we might as well limit it to just x86_64. Which is pretty bad. I'm not sure if we have a precedence with any of the other tests, but I'm beginning to think the right way to fix this is to use a list of syscall names and not numbers. It's a little more work in the test, but it should avoid this probably completely.

NOTE: the syscalls don't have to be valid for that particular ABI so long as they are valid syscall names in our internal syscall table.

@drakenclimber
Copy link
Member

I'm not sure if we have a precedence with any of the other tests

I believe you are correct. This is a special situation.

I'm beginning to think the right way to fix this is to use a list of syscall names and not numbers. It's a little more work in the test, but it should avoid this probably completely.

I'm good with this choice. I'm not yet sure how I can make it work for the pfc case (test 55), but I'll give it a shot!

@pcmoore
Copy link
Member Author

pcmoore commented Mar 11, 2020

This was fixed in PR #211 and a follow-up fix that I hacked together (no PR) via dc2831e.

@pcmoore pcmoore closed this as completed Mar 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants