Wrong assumption while parsing rich header #394

plusvic · 2024-03-04T17:49:54Z

pefile assumes that the 12 bytes following the DanS tag are actually three copies of the 32-bits XOR key used for encrypting the rich header, and it does some validation based in this assumption:

pefile/pefile.py

Lines 3429 to 3430 in d86b96e

    
           if data[0] ^ checksum != DANS or data[2] != checksum or data[3] != checksum: 
        
               return None

The origin of this assumption seems to be this article: http://www.ntcore.com/files/richsign.htm, and YARA had this same issue. However, while rewriting the parsing logic in YARA-X I came across this other article that suggests that those 12-bytes are actually padding: https://bytepointer.com/articles/the_microsoft_rich_header.htm.

In YARA-X I treated those bytes as padding, and now, while comparing YARA and YARA-X results side-by-side, I found that file 043066108b68b30fc2c475eae8edfafc080be7d451600eaa283d2c750bddbceb proves that the padding theory is correct. The padding in that file was not initially filled with zeroes, therefore the 12 bytes after the DanS tag don't contain copies of the XOR key.

I had to fix this same issue in YARA: VirusTotal/yara@4793b49

The text was updated successfully, but these errors were encountered:

j-t-1 · 2024-06-05T14:21:16Z

def parse_rich_header(self):
    """Parses the rich header
    see http://www.ntcore.com/files/richsign.htm for more information
    
    Structure:
    00 DanS ^ checksum, checksum, checksum, checksum
    10 Symbol RVA ^ checksum, Symbol size ^ checksum...
    ...
    XX Rich, checksum, 0, 0,...
    """

All bytes between "DanS" (inclusive) and "Rich" (exclusive) are XORed with the checksum, but nothing else is, in particular the padding after the checksum is not.

The checksum should probably be obtained as the four bytes after "Rich"; then the following better check could be done:

if data[0] ^ checksum != DANS or data[1] != checksum or data[2] != checksum or data[3] != checksum:
    return None

The Rich header is not documented, but Microsoft being the originator does use the above format, so it is useful. I have done a PR #402 to make it a warning.

Also, is the file you reference WIMA_SFX.EXE?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong assumption while parsing rich header #394

Wrong assumption while parsing rich header #394

plusvic commented Mar 4, 2024

j-t-1 commented Jun 5, 2024 •

edited

Wrong assumption while parsing rich header #394

Wrong assumption while parsing rich header #394

Comments

plusvic commented Mar 4, 2024

j-t-1 commented Jun 5, 2024 • edited

j-t-1 commented Jun 5, 2024 •

edited