Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structs with same name in different compilation units #4421

Open
rockrid3r opened this issue Apr 6, 2024 · 6 comments
Open

Structs with same name in different compilation units #4421

rockrid3r opened this issue Apr 6, 2024 · 6 comments

Comments

@rockrid3r
Copy link
Contributor

Work environment

Questions Answers
OS/arch/bits (mandatory) Debian amd64
File format of the file you reverse (mandatory) ELF
Architecture/bits of the file (mandatory) amd64
rizin -v full output, not truncated (mandatory) rizin 0.8.0 @ linux-x86-64 commit: 9a27c4f

Consider this example:

/* main.c */

struct A {
    int a;
    int b;
};

struct A a;

int main() {
    return 0;
}

and

/* other.c */
struct A {
    int a;
};
struct A b;

Now consider compiling it with

gcc main.c other.c -g

and opening it in rizin:

rizin a.out

Then the tp command (tp prints the type)
results in

[0x00001040]> tp A
 a : 0x00001040 = 2303323441
 b : 0x00001044 = 2303221457

so clearly it uses the definition of struct A from main.c without complaining/warning that it is also defined in other.c.

Steps to reproduce the behavior

  • You can compile it yourself or use the provided binary

Additional Logs, screenshots, source code, configuration dump, ...

a.out.tar.gz

@rockrid3r
Copy link
Contributor Author

There is a another scenario though:

/* header.h */
struct A {
    int a; 
    int b;
}
#include "header.h"
/* main.c */
struct A a;
int main() {
    return 0;
}
#include "header.h"
/* other.c */
struct A b;

so both compilation units main.c and other.c has the same (included) definition of struct A.

This example shows we can't just name those structs differently or create a local storage within a compilation unit or name the respectively to compilation units like main.c.A and other.c.B

@XVilka XVilka added this to the 0.8.0 milestone Apr 6, 2024
@XVilka XVilka added the RzType label Apr 6, 2024
@rockrid3r
Copy link
Contributor Author

rockrid3r commented Apr 6, 2024

100% bug

If you use the avgp function (which prints the type of variable) on b you get this:

[0x00001040]> avgp b
 a : 0x00004020 = 0
 b : 0x00004024 = 0

rizin is using incorrect type for variable b.

So this is clearly a bug and can not be skipped.

@rockrid3r
Copy link
Contributor Author

rockrid3r commented Apr 9, 2024

Okay so here is my main point:

  • There is an efficient way to verify if 2 structs with the same name are equal or not.

There are 2 different causes of the same struct name in 2 different compilation units:

  1. including the same header file (which contains the struct definition) -- structs are the same
  2. just name collision -- structs are different (most probably)

The dwarf DOES contain the file info from which the struct came from (where struct was declared).

So in 1st case those structs will have DW_AT_decl_file pointing to the same source file (the header file).
In 2nd case the DW_AT_decl_file will point to different files.

Probably it is the solution. I will release the PR soon.

@DMaroo
Copy link
Member

DMaroo commented Apr 11, 2024

Hey @rockrid3r, I also had a look at it in the meantime. I would suggest using the compile unit offset and name, defined here:

typedef struct rz_bin_dwarf_comp_unit_t {
ut64 offset;
RzBinDwarfCompUnitHdr hdr;
RzVector /*<RzBinDwarfDie>*/ dies;
const char *name;
const char *comp_dir;
const char *producer;
const char *dwo_name;
DW_LANG language;
ut64 low_pc;
ut64 high_pc;
ut64 stmt_list;
ut64 str_offsets_base;
ut64 addr_base;
ut64 loclists_base;
ut64 rnglists_base;
} RzBinDwarfCompUnit;

This is because of the One Definition Rule (a.k. ODR). There can only be one definition of a struct/union/enum type inside a single compilation/translation unit. So it makes more sense to use that to classify types. Also, DW_AT_decl_file is a DWARF attribute, whereas the compilation units are a part of the DWARF tree. Hence, someone could theoretically strip out all the file name attributes, but they cannot remove the compile units from the tree without invalidating the DWARF info.

If I were to go about it, I would create a new type as follows.

typedef struct rz_base_type_with_metadata_t {
	RzBaseType *base_type;
	ut64 cu_offset;
	char *cu_name;
} RzBaseTypeWithMetadata;

Here cu_name and cu_offset are the name and offset of the compilation unit. This way, you can modify the existing API and IPI to return a vector (preferably a RzPVector<const RzBaseTypeWithMetadata *>) of such base types with metadata whenever finding the type with the name. However, the issue with this is that it would be a very vast change. Just looking up the places where rz_type_db_get_base_type (the same function which gets called when running the tp command) is referenced/called, we'd find that many other files in Rizin rely on this function and it's current behavior (which is returning a single RzBaseType). So now we have 2 solutions:

  1. Add a new function which returns RzPVector<const RzBaseTypeWithMetadata *>. Call this new function (by replacing calls to rz_type_db_get_base_type) only where it makes sense (like tp), and leave other calls unchanged. This will obviously lead to inconsistent behavior and there will be two redundant functions, but is the easy/simple way out (like a quickfix).
  2. Modify rz_type_db_get_base_type to return RzPVector<const RzBaseTypeWithMetadata *> and modify all the callers to handle the returned vector appropriately. This is a more general solution, however you may end up touching a lot more files than in the case of solution 1.

I would advise you to go ahead with solution 2 and open a draft PR, so that the maintainers can track the changes. Since solution 2 will be a big overhaul, it'd be nice to be on the same page since the starting changes itself.

@rockrid3r
Copy link
Contributor Author

@DMaroo respect. Good research.

Most probably will proceed with 2.
Will open draft PR soon.

@rockrid3r
Copy link
Contributor Author

@DMaroo it looks like RzBinDwarfCompUnit info is not enough to define type scope. The DW_AT_decl_file is necessary since the struct may be included from header file. Header file won't appear as RzBinDwarfCompUnit because it is not a compilation unit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants