Provide a way for users to specify asm/decompiled source intermingled #1161

Open
uxmal opened this issue Mar 1, 2022 · 2 comments
Labels
code-generation (This issue is about Reko's output code), enhancement (This is a feature request)

Comments

@uxmal
Owner

uxmal commented Mar 1, 2022

Several users have requested being able to intermingle assembler language and decompiled output, like this:

    eax_1 = foo(ecx);
;   push ecx
;   call foo
;   add esp,4
    eax_3 = eax_1->dw0004;
;   mov eax,[eax+4]

Consider supporting this with both a command line switch and a Reko project setting.
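As a rough illustration of the requested layout, the sketch below interleaves each decompiled statement with the instructions it came from. The `Statement` record, the `render_intermingled` function and the `show_asm` flag are hypothetical names chosen for this example; they are not part of Reko, and how Reko would actually track the statement-to-instruction mapping is not specified in this issue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """Hypothetical record: one decompiled statement plus its source instructions."""
    text: str
    asm: List[str] = field(default_factory=list)


def render_intermingled(statements, show_asm=True):
    """Emit each statement, optionally followed by its instructions as ';' comments."""
    lines = []
    for stmt in statements:
        lines.append("    " + stmt.text)
        if show_asm:  # stands in for the proposed switch / project setting
            for insn in stmt.asm:
                lines.append(";   " + insn)
    return "\n".join(lines)


# The example from this issue, expressed in the hypothetical model.
example = [
    Statement("eax_1 = foo(ecx);", ["push ecx", "call foo", "add esp,4"]),
    Statement("eax_3 = eax_1->dw0004;", ["mov eax,[eax+4]"]),
]

print(render_intermingled(example))
```

With `show_asm=False` the same call prints only the two decompiled statements, which is roughly what a switch that disables the feature would produce.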

@DeptOfMeteors

The Rolls-Royce of solutions would involve providing some linkage between these two panes. Right now they are side-by-side, but it's difficult to see the connection between them. Failing that, I guess an objdump-like solution wouldn't be the end of the world. It makes for difficult reading, but after a while it becomes comprehensible.
In my case, these are the problems I'm wrestling with:

  1. Expressions like Mem1001[ds:(di_1415 + 0x01) *s 0x11 + si_1416 + (fp - 0x0138):byte] make some important code indecipherable. Seeing the assembly behind them would help.
  2. I find a lot of SLICE(), CONVERT() and SEQ() calls, but these aren't defined anywhere in the output.
  3. Then there's code that looks OK but is definitely wrong. The executable I'm using produces BMP files, which means something has to write 0x42 and 0x4D ("BM") to a file, but those bytes aren't mentioned anywhere in the generated source code.
    The executable I'm trying to decompile is a 16-bit MS-DOS program.
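One way to narrow down where the "BM" signature comes from is to search the raw executable for the byte pair 0x42 0x4D. This is only a sketch: the `find_all` helper and the synthetic `image` bytes are made up for illustration. Because x86 is little-endian, the same search matches both a literal "BM" string in data and a 16-bit immediate 0x4D42 inside an instruction. It will find nothing if the program writes the two bytes separately or if the executable is packed.

```python
def find_all(data: bytes, pattern: bytes):
    """Return every offset at which pattern occurs in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


# Synthetic bytes for illustration only (not taken from any real program):
# B8 42 4D encodes `mov ax, 0x4D42`, followed later by a literal "BM" string.
image = bytes([0x90, 0xB8, 0x42, 0x4D, 0x90]) + b"xxBMxx"

print(find_all(image, b"BM"))  # offsets 2 (immediate operand) and 7 (string)
```

For a real file, read it with `open(path, "rb").read()` and compare the reported offsets against the addresses in the disassembly view to find the code to inspect.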

@Elthial

Elthial commented Jul 10, 2022

@DeptOfMeteors I'm also working on a 16-bit MS-DOS application and can probably provide some hints.

1/ The Mem1001 expression is broken at the (fp - 0x0138) part, which makes it pointless to decipher in the high-level (HL) output. Information is missing, so you'd have to find the matching [bp - <value>] access in the ASM and backtrack up the code to recover the missing value.

The (fp - <value>) expressions often mean that Reko has lost track of a variable further up in the code. I've opened request #1188 to get these issues fixed.

Be aware that when fp - <value> turns up, it is because there is a mistake further up in the code. In my experience this is usually bad loop code that assigns a pointer offset to the loop start variable instead of starting at 0x00, but other causes are possible.

2/ These are functions that Reko uses, and you're probably really familiar with what they do from your 16-bit MS-DOS code, because we jump between 32-bit, 16-bit and 8-bit values like a pogo stick.

A lot of these SLICE() and CONVERT() calls appear because one of the variables in the code has the wrong size, and Reko is trying to resize it to fit into another variable.

SEQ will be how Reko handles far pointers being assembled.
You'll note that the first value is ds or another segment value.
The second value will be the offset into that segment.
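For readers less used to segmented addressing, here is a minimal sketch of what a segment:offset pair means on real-mode x86, where the linear address is the segment shifted left by four bits plus the offset. The function name `real_mode_linear` is invented for this example, and the code models the CPU's address arithmetic, not anything inside Reko.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of a real-mode far pointer: segment * 16 + offset."""
    # Both halves are 16-bit values; mask them so stray high bits are ignored.
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


# 1234:0010 refers to linear address 0x12350.
print(hex(real_mode_linear(0x1234, 0x0010)))
```

Note that many different segment:offset pairs refer to the same linear address, which is one reason far pointers are awkward to compare.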

You can search the Reko source code for the definitions:

SLICE:

/// <summary>
/// A slice is an improper subsequence of bits. Commonly used to isolate
/// a byte register from a wider word- or dword-width register.
/// </summary>
Slice(DataType dt, Expression i, int bitOffset)

CONVERT:

/// <summary>
/// Makes an instance of the <see cref="Conversion"/> class.
/// </summary>
/// <param name="exp">Expression to convert.</param>
/// <param name="dtFrom">Data type converting from.</param>
/// <param name="dtTo">Data type converting to.</param>
Conversion(Expression exp, DataType dtFrom, DataType dtTo)

SEQ:

/// <summary>
/// Generate a concatenated sequence of values. Use this to express
/// values that are too long to fit in a machine register, or to model
/// segmented pointers on architectures like the x86.
/// </summary>
/// <remarks>
/// This method is used for the very common case of a two-element
/// sequence, especially in contexts where x86-style segment:offset
/// pairs exist.</remarks>
/// <param name="head">Most significant part of value.</param>
/// <param name="tail">Least significant part of value.</param>
/// <returns>A value sequence.</returns>
Seq(Expression head, Expression tail)
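To make the three operations concrete, the sketch below models them on plain Python integers. The helper names `slice_bits`, `seq` and `convert` and their parameters are assumptions made for this illustration; Reko's real classes operate on expression trees and data types, not on integers, so treat this only as a reading aid for the generated code.

```python
def slice_bits(value: int, bit_offset: int, width: int) -> int:
    """SLICE-like: extract `width` bits starting at `bit_offset`."""
    return (value >> bit_offset) & ((1 << width) - 1)


def seq(head: int, tail: int, tail_bits: int) -> int:
    """SEQ-like: concatenate head (most significant) and tail (least significant)."""
    return (head << tail_bits) | (tail & ((1 << tail_bits) - 1))


def convert(value: int, from_bits: int, to_bits: int, signed: bool = False) -> int:
    """CONVERT-like: resize an integer, sign- or zero-extending when widening."""
    value &= (1 << from_bits) - 1
    if signed and value >> (from_bits - 1):
        value -= 1 << from_bits          # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)  # truncate or extend to the new width


ax = 0x1234
print(hex(slice_bits(ax, 0, 8)))         # low byte, like al
print(hex(slice_bits(ax, 8, 8)))         # high byte, like ah
print(hex(seq(0x1234, 0x5678, 16)))      # 32-bit value from two 16-bit halves, like dx:ax
print(hex(convert(0xFF, 8, 16, signed=True)))  # sign-extended byte
```

Read this way, a SLICE picks a narrower register out of a wider one, a SEQ glues two registers into one wider value, and a CONVERT changes width with an explicit choice of signedness.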

3/ Are you compiling Reko directly from source or using the latest release?

Sometimes Reko misses code or merges it together, resulting in missing variables.
One example I noticed was a position procedure with PosX, PosX instead of the expected PosX, PosY.
The entire PosY variable and its code disappeared or was merged into PosX.

There was a recent bug fix for 16-bit MS-DOS that improved my code quality and resolved the above issue.
However, the fix is not yet in any of the releases; it's only available if you build from source.

In the end, you might need to read through the code and mark the suspicious parts.
Then come back and review the ASM for those sections to see if it matches the HL.

If you find consistent errors in Reko's HL output, report them so we can improve it.
