Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong assumption while parsing rich header #394

Open
plusvic opened this issue Mar 4, 2024 · 1 comment
Open

Wrong assumption while parsing rich header #394

plusvic opened this issue Mar 4, 2024 · 1 comment

Comments

@plusvic
Copy link

plusvic commented Mar 4, 2024

pefile assumes that the 12 bytes following the DanS tag are actually three copies of the 32-bits XOR key used for encrypting the rich header, and it does some validation based in this assumption:

pefile/pefile.py

Lines 3429 to 3430 in d86b96e

if data[0] ^ checksum != DANS or data[2] != checksum or data[3] != checksum:
return None

The origin of this assumption seems to be this article: http://www.ntcore.com/files/richsign.htm, and YARA had this same issue. However, while rewriting the parsing logic in YARA-X I came across this other article that suggests that those 12-bytes are actually padding: https://bytepointer.com/articles/the_microsoft_rich_header.htm.

In YARA-X I treated those bytes as padding, and now, while comparing YARA and YARA-X results side-by-side, I found that file 043066108b68b30fc2c475eae8edfafc080be7d451600eaa283d2c750bddbceb proves that the padding theory is correct. The padding in that file was not initially filled with zeroes, therefore the 12 bytes after the DanS tag don't contain copies of the XOR key.

I had to fix this same issue in YARA: VirusTotal/yara@4793b49

@j-t-1
Copy link
Contributor

j-t-1 commented Jun 5, 2024

def parse_rich_header(self):
    """Parses the rich header
    see http://www.ntcore.com/files/richsign.htm for more information
    
    Structure:
    00 DanS ^ checksum, checksum, checksum, checksum
    10 Symbol RVA ^ checksum, Symbol size ^ checksum...
    ...
    XX Rich, checksum, 0, 0,...
    """

All bytes between "DanS" (inclusive) and "Rich" (exclusive) are XORed with the checksum, but nothing else is, in particular the padding after the checksum is not.

The checksum should probably be obtained as the four bytes after "Rich"; then the following better check could be done:

if data[0] ^ checksum != DANS or data[1] != checksum or data[2] != checksum or data[3] != checksum:
    return None

The Rich header is not documented, but Microsoft being the originator does use the above format, so it is useful. I have done a PR #402 to make it a warning.

Also, is the file you reference WIMA_SFX.EXE?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants