Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hungarian Notation] Variables with different names but same labels are renamed to the same variable. #317

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

github-actions[bot]
Copy link
Contributor

closes #309

@github-actions github-actions bot added bug Something isn't working priority-high High priority issue labels Aug 15, 2023
@NeoQuix
Copy link
Collaborator

NeoQuix commented Aug 15, 2023

To decide:

  1. Same name for every variable or custom ones:
  • e.G. var, tmp and every other identifier get the same name ==> Var
  • or custom ones: var => Var, tmp => Tmp, ...
    If for custom names:
  • is every custom name seperated by a _ or how can i reliably extract the name?
  1. Same counter for every variable or one counter for every "type":
  • e.G. var_0 => Var0, tmp_0 => Var1 ...
  • or var_0 => Var0, tmp_0 => Tmp0
  1. Prefix for invalid stuff
  • invalid stuff will sometimes get nothing as a prefix. Maybe change it, to a "default" value like unk?

Spartak Ehrlich added 2 commits August 15, 2023 10:29
@NeoQuix
Copy link
Collaborator

NeoQuix commented Aug 15, 2023

Also: Substitute variables in a batch (dictionary) instead of iterating over AST every time

@NeoQuix
Copy link
Collaborator

NeoQuix commented Aug 23, 2023

Will only go once through the whole tree, but will still call n times replace_variable for all n replacement pairs in the nodes.
==> a bit faster, but not really

If the nodes would know the variables in their instructions, it could be faster.

@NeoQuix NeoQuix marked this pull request as ready for review August 23, 2023 09:12
for var in self._variables:
names[var] = var.copy(self.getVariableName(var))

self._ast.replace_variables_in_subtree(self._ast.root, names)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to use the substitute here and not the replace function Hendrik is using?

My point is the following:

The variables var1 = Variable("var1", int, None, False, Variable("rax", int, 5), "some tag") and variable var2 = Variable("var1", int, None, False, Variable("rax", int, 7), "some other tag") have the hash and are equivalent, although some values differ.

Now, if you find variable var1 before variable var2, then each occurrence of var2 is also replaced by variable Variable("iVar3", int, None, False, Variable("rax", int, 5), "some tag").

There are two options, depending on what we want resp. need:

  1. We say from this point on, we do not need the tags and old SSA-Variable names. Then we can ignore this information when creating the replacement variable, and both would be replaced by the same variable anyway.
  2. We really only change the name, but then each variable need to be replaced by its unique new variable, i.e., we really check that we have the same object.

In general, I would prefer the second, because we do not know whether we need it later, and it is more general. However, then I think your approach is not the most efficient one, because as soon as we find a variable, it would make sense to replace it in its expression.
However, when we decide that we do not need this information after this stage, you only have to update line 56, i.e., remove the unnecessary information when copying the variable.

Copy link
Collaborator

@NeoQuix NeoQuix Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your point.
The old algorithm, which simply renamed every variable (var.name), did it like in the second approach.
But my problem was, how messy variables are handled.

I have no grantee if a variable is a real copy of some other variable, or not.
For example: Let's say we collected three times var_0, where the first and third are the same object and the second is a real copy.

If I would iterate over every variable and simply assign the .name field, the third variable would change as I'm changing the first one.
Afterwards, as the third one is being processed, if will again be renamed, destroying the name completely.
Using a set, does not help, because as you said, their hashes are the same, which results in missing collected variables.

The only other thing i could do, is to keep a list of ids of variables which I already renamed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we are back to the point that pseudo should be immutable.

Okay, but I think Hendrik did exactly this, considering the ids when replacing. So this is probably the cleanest way. Collecting variables according to their ids, assigning the new names and replace them with this new visitor.

Seems very complicated for such an easy thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into his code/ask him today and probably block this request until his code is in main.

And YES, I HATE IT.
Someone would think that renaming variables would be an easy task, but nope.
Hopefully with the id's everything works.

assert node.instructions[0].destination.name == "dLoopbreak0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether dLoopBreak0 would be easier to read. Perhaps whenever we have a separation, the first letter should again be capitalized, but I am not sure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm i get your point. We can talk about shortly tomorrow. Will make it like requested (but it's a bit ugly)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not saying that this is better, it was a more open question. If this is the simplest and cleanest way, I am ok with it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was a bit unclear, I like you idea better. dLoopBreak0 is way easier to read then dLoopbreak0.

My problem was rather: "How do i cleanly capitalize the char after a removed one?" (with numbers/other corner cases included). Don't know if there is a better way, maybe with RE or something.

Spartak Ehrlich added 2 commits August 30, 2023 11:16
…_different_names_but_same_labels_are_renamed_to_the_same_variable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority-high High priority issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Hungarian Notation] Variables with different names but same labels are renamed to the same variable.
2 participants