Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64 adrp instruction disassembly incorrect #22676

Open
JuliusNmn opened this issue Mar 12, 2024 · 13 comments
Open

Arm64 adrp instruction disassembly incorrect #22676

JuliusNmn opened this issue Mar 12, 2024 · 13 comments
Labels

Comments

@JuliusNmn
Copy link

Environment

Tue 12 Mar 2024 03:58:30 PM CET
radare2 5.8.8 1 @ linux-x86-32
birth: git.5.8.8 2023-06-08__14:01:02
commit: ea7f0356519884715cf1d5fba16042bac72b2df5
options: gpl -O1 cs:5 cl:0 make
Linux x86_64

Description

disassembly of adrp instructions does not make sense. take the following code, which initializes a struct:

static union U2  func_9(int64_t  p_10, const int32_t * p_11, union U2  p_12, uint64_t  p_13, int32_t * p_14)
{ /* block id: 196 */
    union U2 l_673 = {0x5A8A51BE3E7C0A4FLL};
...

here's radare2's disassembly:

> aa
> sf sym.func_9
> pdf 
...
│           0x08000828      011d00f9       str x1, [x8, 0x38]          ; arg2
│           0x0800082c      031900f9       str x3, [x8, 0x30]          ; arg4
│           0x08000830      041500f9       str x4, [x8, 0x28]          ; arg5
│           0x08000834      08000090       adrp x8, loc.imp.strcmp     ; loc.imp.printf ; 0x8000000
│           0x08000838      08010091       add x8, x8, 0               ; 0x8000000 ; loc.imp.printf
│           0x0800083c      0001c03d       ldr q0, [x8]                ; 0xe2 ; 226
│           0x08000840      2001803d       str q0, [x9]

addrp takes an offset, the resulting address is calculated relative to the current instruction. i assume that r2 interprets the offset as an absolute address, resulting in the disassembly showing "loc.imp.strcmp".
Is this expected behavior? If not, can someone point me to where I can fix this?
Also, is there a way to make radare2 give me the raw operand value instead of symbols?

ghidra's output makes more sense:

              001007e8 01 1d 00 f9                    str              x1,[x8, #0x38]
              001007ec 03 19 00 f9                    str              x3,[x8, #0x30]
              001007f0 04 15 00 f9                    str              x4,[x8, #0x28]
              001007f4 28 00 00 b0                    adrp             x8,0x105000
              001007f8 08 41 2f 91                    add              x8,x8,#0xbd0
              001007fc 00 01 c0 3d                    ldr              q0,[x8]
              00100800 20 01 80 3d                    str              q0,[x9]

I don't know why the instruction bytes differ for that specific instruction, but I have verified that radare2's output corresponds to the actual file while ghidra's have changed, probably due to a preformed analysis.

Here are the files I used (source and binary), in case anyone is interested.
random_39855.zip

@trufae
Copy link
Collaborator

trufae commented Mar 12, 2024

Try r2 from git

@JuliusNmn
Copy link
Author

JuliusNmn commented Mar 12, 2024

Same behavior on current dev:

./radare2/binr/radare2/radare2 -v                         
radare2 5.8.9 31767 @ linux-x86-64
birth: git.5.8.8-1179-g49ff46d0a1 2024-03-12__16:47:26
commit: 49ff46d0a1717ea7ca19eaa674a2cbb9981ecdba
options: gpl -O? cs:5 cl:2 make

I was able to work around the issue by making radare2 output to json, which gives me the opcode without disassembly.

{
         "offset":134219828,
         "ptr":134217728,
         "esil":"134217728,x8,=",
         "refptr":0,
         "fcn_addr":134219776,
         "fcn_last":134220400,
         "size":4,
         "opcode":"adrp x8, 0x8000000",
         "disasm":"adrp x8, loc.imp.strcmp",
         "bytes":"08000090",
         "family":"cpu",
         "type":"lea",
         "reloc":false,
         "type_num":33,
         "type2_num":0,
         "refs":[
            {
               "addr":134217728, <- 0x80000000
               "type":"DATA",
               "perm":"r--"
            }
         ]
      },

the "refs" attribute is also very revealing here. I believe this is an issue with radare2.

@trufae
Copy link
Collaborator

trufae commented Mar 13, 2024

yes this is "correct" and the expected behaviour, the address in 0x80000 is exactly the same address as loc.imp.strcmp. so its expected because the disassembler filters that string, the emulation shows the final computation of the value which happens in the add instruction after that adrp. and actually the 0x8000 is shown as a comment in the disassembly. SO i dont see the problem here. because the code can be doing adrp without any add and then you wont see which symbol is referencing.

And yes. the value shown by the disassembler is computed by providing the actual instruction position which shows that value. So let me clarify:

  • you want an option to make adrp instructions not be filtered
  • you want to see the absolute value instead of the relative

@JuliusNmn
Copy link
Author

Either way, it can't be a reference to strcmp. adrp is used here to initialize structs with constant values.

The instructinos are from the following function, which doesn't reference strcmp at all:

static union U2  func_9(int64_t  p_10, const int32_t * p_11, union U2  p_12, uint64_t  p_13, int32_t * p_14)
{ /* block id: 196 */
    union U2 l_673 = {0x5A8A51BE3E7C0A4FLL}; <- this is where the adrp happens.
    for (p_12.f4.f0 = 0; (p_12.f4.f0 != 0); p_12.f4.f0 = safe_add_func_int8_t_s_s(p_12.f4.f0, 4))
    { /* block id: 199 */
        uint32_t l_657 = 0xBF1DF1F3L;
        struct S0 l_664 = {0xEF594491L,0x00BDF289L,1L,0x07L};
        struct S0 *l_663 = &l_664;
        struct S0 ** const l_662 = &l_663;
        struct S0 ** const *l_661 = &l_662;
        for (p_10 = 0; (p_10 <= (-27)); --p_10)
        { /* block id: 202 */
            struct S0 **l_666 = &l_663;
            struct S0 ***l_665[2][2][2] = {{{&l_666,&l_666},{&l_666,&l_666}},{{&l_666,&l_666},{&l_666,&l_666}}};
            uint32_t l_670 = 0x85187951L;
            uint32_t *l_669 = &l_670;
            uint8_t l_671 = 0xF5L;
            int32_t *l_672 = &l_664.f0;
            int i, j, k;
            (*l_672) = (l_657 == (~(((safe_mul_func_int8_t_s_s(((void*)0 == &p_12), (l_661 == (l_664.f1 , l_665[1][1][0])))) <= ((safe_rshift_func_int8_t_s_s((p_13 <= (((void*)0 != l_669) > l_671)), p_13)) > p_10)) || p_13)));
        }
    }
    return l_673;
}

Here's how Ghidra disassembles the instruction, in case it's of any help:

Operand x8 0x105000
Labeled x8 0x105000
Type REG SCAL
Scalar   0x105000
Address    
Register x8  
Op-Objects x8 const:0x105000
Operand Mask 00011111 00000000 00000000 00000000 11100000 11111111 11111111 01100000
Masked Value 00001000 00000000 00000000 00000000 00100000 00000000 00000000 00100000

Ghidra's "Basic Constant Reference Analyzer" correctly determines the constant values, and points me to the correct initial values in data.

@trufae
Copy link
Collaborator

trufae commented Mar 13, 2024

for reference, this is the output from objdump:

     7e8:       f9001d01        str     x1, [x8, #56]
     7ec:       f9001903        str     x3, [x8, #48]
     7f0:       f9001504        str     x4, [x8, #40]
     7f4:       90000008        adrp    x8, 0 <main>
     7f8:       91000108        add     x8, x8, #0x0
     7fc:       3dc00100        ldr     q0, [x8]
     800:       3d800120        str     q0, [x9]
     804:       f9400908        ldr     x8, [x8, #16]

which is referencing "main".. what i see, is that those addresses are patched by relocs. and you can solve this if you compile an executable or a library instead of an object file. yes, r2, objdump, and many other tools have issues when processing object files, because relocs and elf objects are a big mess. the same output from objdump of the compiled executable is this:

     fb8:       f9002100        str     x0, [x8, #64]
     fbc:       f9001d01        str     x1, [x8, #56]
     fc0:       f9001903        str     x3, [x8, #48]
     fc4:       f9001504        str     x4, [x8, #40]
     fc8:       b0000028        adrp    x8, 5000 <safe_mul_func_int16_t_s_s+0xc>
     fcc:       913ee108        add     x8, x8, #0xfb8
     fd0:       3dc00100        ldr     q0, [x8]
     fd4:       3d800120        str     q0, [x9]
     fd8:       f9400908        ldr     x8, [x8, #16]

which in r2 looks like this:

  [0x00000f94]> pd 20
            ;-- func_9:
            0x00000f94      ff8304d1       sub sp, sp, 0x120
            0x00000f98      fd7b10a9       stp x29, x30, [sp, 0x100]
            0x00000f9c      fc8b00f9       str x28, [sp, 0x110]
            0x00000fa0      fd030491       add x29, sp, 0x100
            0x00000fa4      a92301d1       sub x9, x29, 0x48
            0x00000fa8      e91300f9       str x9, [sp, 0x20]
            0x00000fac      e90308aa       mov x9, x8
            0x00000fb0      e81340f9       ldr x8, [sp, 0x20]
            0x00000fb4      e21700f9       str x2, [sp, 0x28]
            0x00000fb8      002100f9       str x0, [x8, 0x40]
            0x00000fbc      011d00f9       str x1, [x8, 0x38]
            0x00000fc0      031900f9       str x3, [x8, 0x30]
            0x00000fc4      041500f9       str x4, [x8, 0x28]
            0x00000fc8      280000b0       adrp x8, 0x5000
            0x00000fcc      08e13e91       add x8, x8, 0xfb8           ; str.O_n_
            0x00000fd0      0001c03d       ldr q0, [x8]                ; 0xe2 ; str.O_n_
            0x00000fd4      2001803d       str q0, [x9]
            0x00000fd8      080940f9       ldr x8, [x8, 0x10]          ; 0xe2
            0x00000fdc      280900f9       str x8, [x9, 0x10]
            0x00000fe0      5f0000b9       str wzr, [x2]
[0x00000f94]> s str.O_n_
[0x00005fb8]> pxq 8
0x00005fb8  0x5a8a51be3e7c0a4f                       O.|>.Q.Z
[0x00005fb8]>

so the constant is there, and the root issue is how object files and relocs are handled. there's nothing wrong with adrp in here imho

@trufae
Copy link
Collaborator

trufae commented Mar 13, 2024

a.out.zip

@trufae
Copy link
Collaborator

trufae commented Mar 14, 2024

looking further in this object file i've spotted a bunch of other issues:

will be good to extend the #22690 PR to add support for the remaining arm64 relocs, would you like to give it a try? otherwise as a workaround i would suggest you to compile the executable or library and use it instead of the object file.

thanks for the sample files, it is ok for you if i push them into the testsuite?

@trufae
Copy link
Collaborator

trufae commented Mar 15, 2024

found another issue in the elf imports, all of them are located in the same address because the plt section is not yet available. gonna fix that in another pr too :)
Screenshot 2024-03-15 at 09 50 24

fixed here #22692

@JuliusNmn
Copy link
Author

thanks for the sample files, it is ok for you if i push them into the testsuite?

Sure.
I will see what I can do about #22690

@trufae
Copy link
Collaborator

trufae commented Mar 15, 2024

another improvement here would be to ignore flags pointing to the base address of the loaded module. there's no point on showing a section name if that is not set. but still the right fix to do here would be to support the needed relocs properly. Let me know about adding this binary into the testsuite.

@trufae
Copy link
Collaborator

trufae commented Mar 15, 2024

I will add your sample and add some tests now. Thanks! Let me need if you need some guiding about the relocs. The code is a little broken into different parts that patch, enumerate.. and they are actually stored 3 times. I may find some time after the release to fix that across the 90 bin plugins. But for your needs none of this should affect you and it’s only elf related

@trufae
Copy link
Collaborator

trufae commented Mar 18, 2024

added tests here 8471613 let me know if you need some help for implementing the relocs. feel free to join the discord or telegram channels for more interactions

@trufae
Copy link
Collaborator

trufae commented Mar 31, 2024

ping?

@trufae trufae added the relocs label Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants