Add Disassembler #89

Merged: 17 commits merged into micropython:master from the disassembler branch on Jul 12, 2023

wnienhaus
Collaborator

This PR adds a disassembler for ESP32 ULP binaries. (See docs/disassembler.rst for details)

It can disassemble ULP binaries as well as snippets of hex bytes (e.g. from an xxd output) into ULP instructions.

This tool was built primarily for making debugging of the assembler easier but may be useful for other use cases.

Note that instructions printed by the disassembler show the values encoded in the binary instruction, not the values originally specified during assembly. For example, JUMP instructions take an offset in bytes during assembly, whereas the binary instruction contains the offset as a number of words (bytes divided by 4). The disassembler shows the number of words, not the number of bytes, for JUMP instructions.
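
A minimal sketch of the byte/word relationship described above, assuming 4-byte instruction words as stated. The function names are hypothetical and are not taken from this PR.

```python
WORD_SIZE = 4  # bytes per ULP instruction word (from the description above)


def byte_offset_to_words(offset_bytes):
    """Byte offset as written in assembly source -> word count stored in the binary."""
    if offset_bytes % WORD_SIZE:
        raise ValueError("offset must be a multiple of %d bytes" % WORD_SIZE)
    return offset_bytes // WORD_SIZE


def words_to_byte_offset(offset_words):
    """Word count as shown by the disassembler -> equivalent byte offset."""
    return offset_words * WORD_SIZE
```

So a JUMP assembled with a byte offset of 16 would be shown by the disassembler with a value of 4.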

The workhorse code of this disassembler has existed for some time (I used it when implementing #50); this PR cleans it up and turns it into a usable tool.

I am already using this to help with implementing S2 support (#85).

Pass bytes from a hexdump in as command line arguments, e.g.:

micropython -m tools.disassemble 401f 0040

(If the byte sequence is not quoted, all args are joined together
into a single byte sequence. Spaces are allowed and will be ignored.)

In this approach, each opcode has its own decoding (using the correct
struct for each opcode). Each opcode (or opcode+subopcode) also has
its own rendering function.

The lookup table is hierarchical so the same structure used for opcodes
is also used within opcodes for looking up subopcodes.
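
A rough sketch of what such a hierarchical lookup table could look like. This is an assumption-laden illustration, not the code from this PR: the table layout, the helper names, the numeric keys and the bit positions are all placeholders, and plain integer masking stands in for the per-opcode structs.

```python
def render_halt(word):
    return "HALT"


def render_wake(word):
    return "WAKE"


def render_sleep(word):
    return "SLEEP %d" % (word & 0xF)


# Illustrative table only: keys and bit fields are placeholders,
# not the real ULP encodings.
LOOKUP = {
    0xB: {"name": "HALT", "render": render_halt},
    0x9: {
        "name": "END",
        # how to extract the subopcode from the instruction word
        "sub": lambda word: (word >> 25) & 0x7,
        # same entry structure, one level down
        "subopcodes": {
            0: {"name": "WAKE", "render": render_wake},
            1: {"name": "SLEEP", "render": render_sleep},
        },
    },
}


def lookup(word, table=LOOKUP, key=None):
    """Walk the table: opcode first, then subopcode if the entry has one."""
    if key is None:
        key = word >> 28  # assumed: opcode in the top 4 bits
    entry = table.get(key)
    if entry is None:
        return None
    if "subopcodes" in entry:
        return lookup(word, entry["subopcodes"], entry["sub"](word))
    return entry


def disassemble_word(word):
    entry = lookup(word)
    return entry["render"](word) if entry else "<unknown>"
```

Because subopcode tables reuse the entry structure of the top-level table, one recursive lookup function can serve both levels.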
Useful for running just one unit test file instead of all.

Now one can pass the name of a unit test (or a list of names)
to the 00_unit_tests.sh script.

Example:
  cd tests
  ./00_unit_tests.sh disassemble  # run only disassemble.py

The default (if nothing is passed to the script) is still to run
all tests as before.

These are likely memory left empty for storing data.

The original "manual disassembling" now requires the "-m" option,
followed by the sequence of hex digits representing the instructions.

The sequence of hex digits does not need to be quoted. All parameters
after -m will be joined together into a sequence of hex digits.
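
A minimal sketch of the argument joining described above, assuming the arguments arrive as a list of strings. The function name is hypothetical, and the actual tool may parse its arguments differently.

```python
from binascii import unhexlify


def args_to_bytes(args):
    """Join all arguments into one hex string, drop spaces, decode to bytes."""
    hex_digits = "".join(args).replace(" ", "")
    return unhexlify(hex_digits)
```

With this approach, `401f 0040` passed as two arguments and `"401f 0040"` passed as one quoted argument decode to the same four bytes.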
Now the instruction (hex) and disassembled code will appear on one line
next to each other and the bytes are no longer printed with Python
specific formatting (not wrapped in b''). This results in a much cleaner
looking output.

Example output:

40008072  MOVE r0, 4
010000d0  LD r1, r0, 0
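
A small sketch of how one such output line could be produced, assuming the raw instruction bytes and the already rendered instruction text are available. The function name is hypothetical.

```python
from binascii import hexlify


def format_line(instruction_bytes, text):
    """Render raw bytes as plain hex (no b'' wrapper) next to the instruction text."""
    return "%s  %s" % (hexlify(instruction_bytes).decode(), text)
```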
Offsets are in number of bytes (matches how 'GNU as' outputs listings)

If the magic bytes in the header are not 'ulp\0' then the file
is not a ULP binary or is otherwise corrupt.
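
A minimal sketch of such a check, assuming the magic occupies the first four bytes of the file. The names are hypothetical, and the rest of the header is not examined here.

```python
ULP_MAGIC = b"ulp\x00"  # the 'ulp\0' magic mentioned above


def is_ulp_binary(data):
    """Return True if data starts with the ULP magic bytes."""
    return len(data) >= 4 and bytes(data[:4]) == ULP_MAGIC
```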
Some values are easier to read as hex values than as decimal.
For example peripheral register addresses like 0x123 where the
first digit (1) indicates which peripheral register to address,
while the remaining 2 digits (0x23) are the offset within that
register in number of 32-bit words.
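
The split described above can be sketched as follows, assuming the address fits in three hex digits as in the 0x123 example. The function name is hypothetical.

```python
def split_periph_address(addr):
    """Split an address like 0x123 into (peripheral selector, offset in 32-bit words)."""
    periph = addr >> 8         # first hex digit: which peripheral
    offset_words = addr & 0xFF  # remaining two hex digits: word offset
    return periph, offset_words
```

For 0x123 this gives peripheral 1 and word offset 0x23, which is much harder to see in the decimal form 291.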

Also absolute JUMP addresses are easier to find via the hex value
given that the disassembler includes the byte offset of each
instruction in hex format.

@wnienhaus
Collaborator Author

@ThomasWaldmann If you have a bit of time, I'd value your feedback. You have a very critical eye (in a positive sense). But I know you're not spending any time on this project anymore, so no hard feelings if you pass.

Btw, ESP32-S2 support is basically done and I'm busy cleaning that up, so soon there will be something more useful to look at.

@wnienhaus wnienhaus force-pushed the disassembler branch 2 times, most recently from e9ad923 to eac277b Compare July 2, 2023 20:07

Test both disassembling a file (assembled from source for the test),
and disassembling a byte sequence provided on the command line.

Source code to be assembled and expected disassembler listings are
provided in the tests/fixtures directory.
@@ -0,0 +1,320 @@
from uctypes import struct, addressof, LITTLE_ENDIAN, UINT16, UINT32
Member

It might be worth adding an MIT license and copyright to this file.

Collaborator Author

Thanks for that comment. Would you suggest generally adding this to all files in the repo? Or is there something about the disassembler specifically that makes it better to add here?

We currently have a LICENSE file in the repository root stating this is licensed under MIT and copyright by those listed in the AUTHORS file.

I see MicroPython itself has the MIT licence at the beginning of all its files. I guess that is better, in case a file is distributed on its own somehow. Perhaps I'll make a commit separate from this PR to add the licence and copyright to all files.

Member

> Would you suggest generally adding this to all files in the repo?

Yes. That makes it very clear for anyone who copies the file what the license/copyright is.

You can add a short header using the SPDX-License-Identifier format. Or a long one like in the micropython repo.

> Perhaps I'll make a commit separate to the PR to add the licence and copyright to all files.

That sounds good!

Collaborator

I'd rather not have the same (long) license text as a header in each file.

If there is a header in a file, I'd rather expect it to roughly tell what's inside that file, and after that there could also be a short notice identifying the license.

Collaborator Author

Ok, so perhaps we'll go the SPDX-License-Identifier route? I'll look at how that works, and how to add the copyright notice.

@wnienhaus
Collaborator Author

I will now merge this PR. I have moved the License header topic to a new issue (#90) where we can discuss this further.

@wnienhaus wnienhaus merged commit 5c4d016 into micropython:master Jul 12, 2023
1 check passed
@wnienhaus wnienhaus deleted the disassembler branch August 5, 2023 14:53

3 participants