-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial auto-sync LoongArch support #2349
base: next
Are you sure you want to change the base?
Conversation
Not finished yet, but hope to see some suggestions from @Rot127. The pr can already disassemble some instructions: $ test_loongarch
****************
Platform: loongarch32
Code:0x0c 0x00 0x08 0x14 0x8c 0xfd 0xbf 0x02
Disasm:
0x1000: lu12i.w $t0, 0x4000
op_count: 2
operands[0].type: REG = t0
operands[1].type: IMM = 0x4000
0x1004: addi.w $t0, $t0, -1
op_count: 3
operands[0].type: REG = t0
operands[1].type: REG = t0
operands[2].type: IMM = 0xffffffffffffffff
0x1008:
****************
Platform: loongarch64
Code:0x80 0x80 0x00 0x40 0x63 0x80 0xff 0x02 0x78 0x20 0xc0 0x29 0x00 0x84 0x00 0x01 0x00 0xa4 0x14 0x01
Disasm:
0x1000: beqz $a0, 0x80
op_count: 2
operands[0].type: REG = a0
operands[1].type: IMM = 0x80
0x1004: addi.d $sp, $sp, -0x20
op_count: 3
operands[0].type: REG = sp
operands[1].type: REG = sp
operands[2].type: IMM = 0xffffffffffffffe0
0x1008: st.d $s1, $sp, 8
op_count: 3
operands[0].type: REG = (null)
operands[1].type: REG = sp
operands[2].type: IMM = 0x8
0x100c: fadd.s $fa0, $fa0, $fa1
op_count: 3
operands[0].type: REG = fa0
operands[1].type: REG = fa0
operands[2].type: REG = fa1
0x1010: movgr2fr.w $fa0, $zero
op_count: 2
operands[0].type: REG = fa0
operands[1].type: REG = zero
0x1014: The memory operand need to be handled, and more tests are required. |
f4f66e2
to
6cd0735
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing job! Thanks a lot to you and @FurryAcetylCoA.
An advice if you are uncertain how to do some of the things below. You can look at the ARM or PPC module. Don't check AArch64 (too complex), Alpha, TriCore (does not always follow the auto-sync design for good reasons) or the other modules (they are heavily outdated).
Now, the biggest things missing are:
- The details should be filled in by an
add_cs_detail()
function inLoongArchMapping.c
(see comments).- The access information is still missing. Again, you can take a look at the ARM or PPC module, how it is done there.
- Instruction groups are still missing as well if I didn't overlooked it.
- The generated enums in the header file (register and instructions) should not be copied by hand, but via the
HeaderPatcher.py
. Just place the comments into the header file and run the script on it. - Delete unused
inc
files. - Regarding testing:
- Disassembler tests are taken from
llvm/test/MC/
andllvm/test/MC/Disassembler/
, mostly copied by hand into the capstone repo (suite/MC/
). I write a script currently to extract them automatically. I'll notify you when it's done. - Testing the details (luckily not difficult for LoongArch), can be done in your case by adding test cases to
suite/cstest/issues.cs
. Just ensure that every possible execution path for register, immediate and memory operand details are taken.
- Disassembler tests are taken from
Once these things are done we are almost there. But we should do a second round of review then.
suite/auto-sync/src/autosync/cpptranslator/patches/QualifiedIdentifier.py
Outdated
Show resolved
Hide resolved
suite/auto-sync/src/autosync/cpptranslator/patches/TemplateParamDecl.py
Outdated
Show resolved
Hide resolved
c02fcbd
to
d17db4f
Compare
Regarding the CI:
|
ce9dd99
to
1d37d04
Compare
Features added:
However, in LoongArch assembly, memory operands are not special i.e. they are normal register/immediate operands. I am unsure how to create MEM operands. |
Indeed a little tricky. I checked the Here is the code where we generate it for PPC. In your case you can do it just like for PPC there. All LoongArch instructions seem to be derived from the class In the end it should look something like this: In typedef struct {
loongarch_insn_form form;
loongarch_mem_access maccess; // LOONGARCH_MEM_LOAD, LOONGARCH_MEM_STORE, LOONGARCH_MEM_NONE
} loongarch_suppl_info; // add this to the union in Mapping.h A generated entry in {
/* <mnemonic> */
LOONGARCH_LOAD.... /* 337 */, LOONGARCH_INS_LOAD...,
#ifndef CAPSTONE_DIET
{ 0 }, { 0 }, { 0 }, 0, 0, {{ LOONGARCH_INSN_FORM_2RI12, LOONGARCH_MEM_LOAD }}
#endif
}, Now, when you add the details about the operands in Would this work? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update also the README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice so far!
I think the only thing left are the details of the memory operands and the instruction groups?
Afterwards we only need to run clang-format and fuzz it.
Thanks! I have added code for memory operands fixup and instruction group detection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. It looks really good.
I would propose the following:
- Add LoongArch to the
.github/labeler.yml
- Please write a summary of the PR into the description. What capabilities the module has (supported operands, ISA version used and a link to it and on what version of LLVM it is on, and whatever details you would like). We use them for the squash merge later.
- Last actual thing is fuzzing. We need to be sure the code doesn't segfault as it is. Follow point 3 from the comment here: Add HPPA(PA-RISC) architecture #2265 (review)
- Lastly please run
clang-format
on all LoongArch files, if you haven't done yet. There is a.clang-format
file in the repos root dir you can use.
Then I would consider it done. I will merge the llvm-capstone update after AArch64 is ready. Then we can rebase this one onto the newest next. Afterwards the Auto-Sync CI job should also succeed.
If everything is green, we can merge.
@XVilka @kabeor You might want to review this as well. Since everything is implemented which we need.
@jiegec @FuzzySecurity Before I forget. Do you have any feedback working with Auto-Sync? I would appreciate any comments and feedback on it. Especially, what to make better, what was difficult to work with and where to improve. |
Due to capstone-engine/llvm-capstone@ee2e109 please generate the Disassembler tables again or just add the one line by hand. |
@jiegec Did you already find time to run the fuzzing? The PR is almost done. |
Thanks, I have run the following fuzzing tests without crashes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the typo. Besides that, I approve it.
Thank you! Awesome job!
@kabeor Please review. I would finish the AArch64 PR, update llvm-capstone, retrigger the CI here to ensure everything is green and then we can merge this one.
- Accompanied llvm changes: capstone-engine/llvm-capstone#45 - MC Tests are generated from llvm - Instruction groups are implemented - Register accesses are implemented - Memory operands are handled for memory instructions - Code are formatted using clang-format of LLVM 17 Co-authored-by: CoA <1109673069@qq.com>
// generated content <LoongArchGenCSInsnFormatsEnum.inc> begin | ||
// clang-format off | ||
|
||
LoongArch_INSN_FORM_PSEUDO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please indent this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we need capitalized letters too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we need capitalized letters too.
The name is concatenated like ${ARCH}_INSN_FORM_${FORM}
, where ARCH
comes from LLVM which uses LoongArch
instead of all-capitalized letters of LOONGARCH
. @Rot127 What's your opinion on this? We will need to change the capstone printer in llvm if required.
|
||
/// Operand type for instruction's operands | ||
typedef enum loongarch_op_type { | ||
LoongArch_OP_INVALID = CS_OP_INVALID, ///< Invalid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all other architectures have the same naming convention, that is to use capitalized letters for enums.
can you change this to follow that?
static csh handle; | ||
|
||
static void print_string_hex(const char *comment, unsigned char *str, | ||
size_t len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please have the closing bracket {
on the new line.
#include <capstone/platform.h> | ||
|
||
struct platform { | ||
cs_arch arch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use tabs for indentation, not spaces
excellent work, thank you for your effort! just few comments on coding convention, please see my comments. |
Your checklist for this pull request
Detailed description
Add loongarch support to auto-sync. Co-authored by @FurryAcetylCoA.
See capstone-engine/llvm-capstone#47 for llvm changes. The generated code have small changes after generated via:
Test plan
...
Closing issues
...