C99 VLAs #7

Open
woodard opened this issue Aug 26, 2021 · 1 comment

Comments

@woodard
Collaborator

woodard commented Aug 26, 2021

There is the C ABI enforced by the toolchain, and there is the psABI, the ABI for a particular platform, which defines things like the calling convention. Then there is a third ABI, a kind of function compatibility ABI, which is ill-defined and not written down anywhere; this issue is part of it. One could argue that this should be part of the C ABI, but C is old and poorly specified, and there isn’t much interest in it and therefore few resources dedicated to it, so it isn’t likely to move forward. The general sense is that C VLAs are a bad part of the language: they are probably underspecified, and people really shouldn’t use them. This is why C99 VLAs were never adopted into C++; there are better ways to do this kind of thing. That being said, there may be some effort to actually sort this out in C23, but the RH guys are not really hopeful: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0009r10.html

Now, so long as we understand that we are not talking about “the C ABI” or the psABI, only about the “function compatibility ABI”, we can define two rules that determine whether two functions are compatible:

  1. For VLAs the array bounds need to be the same, in both type and value. In DWARF there are two ways to specify this. One is to supply only the upper bound, in which case the lower bound is assumed based on the language; since this is C, you can assume 0. The other explicitly sets both the upper and lower bounds. IIRC LLVM does this in some cases.

This is important because there may be internal checks within the function that are specified or optimized out based upon the array bounds. One good example of this is statically defined VLAs: https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html

  2. When the value of an array bound is not a constant but is stored in a variable, the variable that defines the array bound needs to be in the same location. You can derive this from the DWARF, but the information is not in the same DWARF info section; it is in the location lists, which are pointed to by the DWARF information.
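As a small illustration of the static array bounds mentioned under the first rule, here is a minimal sketch in C99. The function names `sum_first4` and `sum_n` are made up for this example and do not come from the issue. The `static` keyword inside the brackets is a promise from the caller that at least that many elements exist, which is the kind of information a compiler may use to drop or keep internal checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: `static 4` promises the callee at least four
   valid elements, so the compiler may assume `a` is non-null. */
int sum_first4(const int a[static 4])
{
    return a[0] + a[1] + a[2] + a[3];
}

/* Hypothetical example: the same promise with a bound that is only
   known at run time, taken from the earlier parameter `n`. */
int sum_n(size_t n, const int a[static n])
{
    int total = 0;

    assert(n > 0);  /* a zero bound would make the promise meaningless */
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Two builds of such a function that disagree on the bound would make different assumptions about the same caller, which is why the bound value has to match.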

You can tell the difference between:

int get(int m, int n, int a[m][n], int x, int y);
int get(int m, int n, int a[n][m], int x, int y);

but you have to look carefully at the debuginfo. The real difference is in the location lists.

 [    b9]    pointer_type         abbrev: 10
             byte_size            (data1) 8
             type                 (ref4) [    9f]

 [    9f]    array_type           abbrev: 7
             type                 (ref4) [    98]
             sibling              (ref4) [    b2]
 [    a8]      subrange_type        abbrev: 8
               type                 (ref4) [    b2]
               upper_bound          (ref4) [    8a]
 [    b2]    base_type            abbrev: 9
             byte_size            (data1) 8
             encoding             (data1) unsigned (7)
             name                 (strp) "long unsigned int"

The fact that this is a variable means that we have to dig into the location lists and make sure that the variable that defines the array bound is in the same location as the function expects it to be.

 [    8a]      variable             abbrev: 5
               type                 (ref4) [    b2]
               artificial           (flag_present) yes
               location             (sec_offset) location list [    22]
               GNU_locviews         (sec_offset) location list [    1e]

Thus we get this difference when we dig through the location lists.

--- foo 2021-08-25 19:01:28.300955535 -0700
+++ bar 2021-08-25 19:01:40.487129297 -0700
@@ -34,7 +34,7 @@
     offset_pair 0, 3
       .text+000000000000000000 <get>..
       .text+0x0000000000000002 <get+0x2>
-        [ 0] breg4 0
+        [ 0] breg5 0
         [ 2] const1u 32
         [ 4] shl
         [ 5] const1u 32
@@ -45,7 +45,7 @@
     offset_pair 3, 16
       .text+0x0000000000000003 <get+0x3>..
       .text+0x0000000000000015 <get+0x15>
-        [ 0] breg4 -1
+        [ 0] breg5 -1
         [ 2] stack_value
     end_of_list
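To see why this one-register difference matters at run time, here is a minimal sketch with assumed bodies for the two prototypes. The names `get_mn` and `get_nm` and the function bodies are hypothetical; the issue only shows the declarations. On x86-64, DWARF registers 4 and 5 are rsi and rdi, the second and first integer arguments, so the diff is saying that the inner bound comes from `n` in one build and from `m` in the other.

```c
#include <assert.h>

/* Hypothetical body for: int get(int m, int n, int a[m][n], int x, int y);
   renamed so that both variants fit in one file. */
int get_mn(int m, int n, int a[m][n], int x, int y)
{
    assert(x >= 0 && x < m && y >= 0 && y < n);
    return a[x][y];  /* each row is n ints wide */
}

/* Hypothetical body for: int get(int m, int n, int a[n][m], int x, int y); */
int get_nm(int m, int n, int a[n][m], int x, int y)
{
    assert(x >= 0 && x < n && y >= 0 && y < m);
    return a[x][y];  /* each row is m ints wide */
}
```

Called with the same buffer and the same `m`, `n`, `x`, `y`, the two variants compute different row strides and can return different elements, even though the argument types and registers at the call site look identical.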

This may crop up in other languages that support different constructs. Luckily C++ is not subject to this problem.

Does this all make sense to everybody? It boils down to: “array bounds types and values must be the same” and “when array bounds are specified in variables their locations must be the same”. We can then agree that it would be nice to have this in the C ABI, but it isn’t likely to end up there. VLAs are bad and not perfectly specified, but it is 20 years too late to change them. Use C++ with vectors and arrays instead. If that is not good enough, use std::span or mdspan: https://github.com/kokkos/mdspan
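The two rules could be sketched as a comparison like the one below. This is only an illustration under stated assumptions: `struct bound_info` and `bounds_compatible` are hypothetical and are not part of libdw, libabigail, or any other tool, and a real location list has one expression per address range, whereas this sketch compares a single expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one array bound taken from debuginfo. */
struct bound_info {
    const char *type_name;     /* bound type, e.g. "long unsigned int" */
    bool is_constant;          /* constant bound, or held in a variable */
    long value;                /* used when is_constant is true */
    const unsigned char *loc;  /* location expression bytes otherwise */
    size_t loc_len;
};

/* Rule 1: bound types and constant values must match.
   Rule 2: variable bounds must live in the same location. */
bool bounds_compatible(const struct bound_info *a, const struct bound_info *b)
{
    assert(a != NULL && b != NULL);
    if (strcmp(a->type_name, b->type_name) != 0)
        return false;
    if (a->is_constant != b->is_constant)
        return false;
    if (a->is_constant)
        return a->value == b->value;
    return a->loc_len == b->loc_len &&
           memcmp(a->loc, b->loc, a->loc_len) == 0;
}
```

With the first location entry from the diff encoded as bytes (`0x74 0x00` for `breg4 0` and `0x75 0x00` for `breg5 0`), this check would report the two versions of `get` as incompatible.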

@woodard
Collaborator Author

woodard commented Aug 26, 2021

Note: libabigail doesn't notice this problem
