7z: failed SFX extraction due to sfx_min_addr and sfx_max_addr #2075

Open
mehrabiworkmail opened this issue Feb 26, 2024 · 4 comments · May be fixed by #2088

Comments

@mehrabiworkmail

mehrabiworkmail commented Feb 26, 2024

When attempting to use libarchive to extract a corpus of malicious SFX files, I found that libarchive fails to extract many SFX files that the 7z tool can extract just fine. I investigated the issue and found the following to be one of the causes.

In archive_read_support_format_7zip.c, two macros, SFX_MIN_ADDR and SFX_MAX_ADDR, limit the range of file offsets at which libarchive looks for the compressed data in 7z SFX files.

[Image: code in archive_read_support_format_7zip.c that uses SFX_MIN_ADDR and SFX_MAX_ADDR]

It is understandable that these range checks were added to limit the performance cost of the header search. However, they have a drawback: libarchive may fail to process 7z SFX files whose compressed data lies at offsets outside the hard-coded range. Unfortunately, these limits do not seem to be well documented.
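As a rough illustration of what the two macros do (a simplified sketch, not libarchive's actual code), the bid effectively scans only a fixed window of the file for the 6-byte 7z signature `'7' 'z' 0xBC 0xAF 0x27 0x1C`. Here `min_addr` and `max_addr` stand in for `SFX_MIN_ADDR` and `SFX_MAX_ADDR`, the whole file is assumed to be in memory, and `find_7z_signature_in_range` is a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* The 6-byte signature that starts every 7z archive. */
static const unsigned char k7zSignature[6] = {'7', 'z', 0xBC, 0xAF, 0x27, 0x1C};

/*
 * Sketch: scan buf[min_addr .. max_addr) for the 7z signature.
 * min_addr/max_addr play the role of SFX_MIN_ADDR/SFX_MAX_ADDR.
 * Only the signature bytes are matched; no further validation is done.
 * Returns the offset of the first match, or -1 if none lies in the window.
 */
static long find_7z_signature_in_range(const unsigned char *buf, size_t len,
                                       size_t min_addr, size_t max_addr)
{
    size_t end = max_addr < len ? max_addr : len;  /* never read past buf */
    size_t pos;

    for (pos = min_addr; pos + sizeof(k7zSignature) <= end; pos++) {
        if (memcmp(buf + pos, k7zSignature, sizeof(k7zSignature)) == 0)
            return (long)pos;
    }
    return -1;
}
```

Any signature that does not lie entirely inside the window is never seen, no matter how valid the archive behind it is.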

One suggestion is to add an optional compile-time flag that lets users disable the range checks if they so choose, with the understanding that this may come with a performance hit.
[Image: proposed patch]

In this patch, I chose values that cover essentially the entire possible address range: from the second 4k page (typically the start of the first PE section) up to the maximum possible RVA in a PE file.

Please let me know if you have other suggestions.

@kientzle
Contributor

In #2048, @dzwdz proposed a way to improve our recognition and handling of SFX archives. Would that address the issue here?

Setting SFX_MAX_ADDR to 0xffffffff as you suggest here would allow a malicious archive to force libarchive to allocate 4GB in the __archive_read_ahead call in the snippet above. That's problematic, as libarchive is being used in programs that are subject to malicious attack. The better approach is to change the search you showed above to work incrementally, reading 64k or so at a time, then consuming that and reading the next 64k until it finds the relevant header. That would allow you to eliminate this limit without requiring the ability to allocate huge amounts of memory.
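A minimal sketch of the incremental search described above, assuming a plain `FILE *` in place of libarchive's internal read-ahead machinery (`find_7z_signature_incremental` is a hypothetical name). Only a fixed 64 KiB buffer, plus a few bytes of overlap, is ever held in memory, so `max_offset` can be very large without a matching allocation:

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)   /* bytes read per step */
#define SIG_LEN 6

static const unsigned char k7zSig[SIG_LEN] = {'7', 'z', 0xBC, 0xAF, 0x27, 0x1C};

/*
 * Sketch: search a stream for the 7z signature using one fixed-size buffer.
 * The buffer is CHUNK_SIZE + SIG_LEN - 1 bytes regardless of how far into
 * the file the signature is. The last SIG_LEN - 1 bytes of each chunk are
 * carried over, so a signature split across two reads is still found.
 * Returns the absolute file offset of the match, or -1.
 */
static long long find_7z_signature_incremental(FILE *fp, long long max_offset)
{
    unsigned char buf[CHUNK_SIZE + SIG_LEN - 1];
    size_t carry = 0;       /* bytes kept from the previous chunk */
    long long base = 0;     /* file offset of buf[0] */

    while (base < max_offset) {
        size_t got = fread(buf + carry, 1, CHUNK_SIZE, fp);
        size_t avail = carry + got;
        size_t i;

        for (i = 0; i + SIG_LEN <= avail; i++) {
            if (base + (long long)i >= max_offset)
                return -1;  /* signature would start past the limit */
            if (memcmp(buf + i, k7zSig, SIG_LEN) == 0)
                return base + (long long)i;
        }
        if (got == 0)
            return -1;      /* EOF or read error: not found */
        /* Consume the chunk, keeping a tail that may begin a split signature. */
        carry = avail < SIG_LEN - 1 ? avail : SIG_LEN - 1;
        memmove(buf, buf + avail - carry, carry);
        base += (long long)(avail - carry);
    }
    return -1;
}
```

The carried-over tail is the one subtle part: without it, a signature that straddles two reads would be missed.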

@mehrabiworkmail
Author

Thanks for the feedback, @kientzle. I took a look at @dzwdz's solution and from what I can see it uses the following approaches to find the Rar! header:
1- Checking the bytes at a fixed offset (0x17888) in the SFX file. This works because the decompressor is a fixed piece of code prepended to the compressed data (so the compressed data is stored as the PE overlay).
2- Checking what comes after the rsrc section in the PE. This works because rsrc is often the last PE section, so whatever follows it is the PE overlay.

I believe neither of these solutions is robust enough to be fully future-proof: (1) the size of the decompressor code might change in future updates to the legitimate decompressor, or worse, a malicious actor might modify the decompressor code and thereby change the offset; and (2) nothing in the PE spec guarantees that rsrc is the last section; it is just a common convention that malicious files may ignore. The same issues apply to 7z SFX files as well. I actually find the current solution based on SFX_MIN_ADDR and SFX_MAX_ADDR more robust, because it makes fewer assumptions about the structure of the SFX file (e.g., that rsrc is the last section). The problem is that the chosen values are too conservative, which predictably leads to some misses.

As for memory allocation, I might be wrong, but I think it depends on the type of filter used. In my testing, I provided custom reader and seeker functions to libarchive, so it never actually consumed more than one window (64k) of memory at a time. My understanding is that SFX_MAX_ADDR was added just to limit the search range and keep libarchive from spending a long time fruitlessly processing a huge PE file, not to limit memory usage (I would love more clarity on this from the code author).

So, what we need is to find a way to pick better values for SFX_MIN_ADDR and SFX_MAX_ADDR. I suggest the following solution:
1- Remove SFX_MIN_ADDR. Instead, with a few lines of code, parse the PE section table to find where the overlay begins, and always start the search at the beginning of the PE overlay rather than at a fixed offset like SFX_MIN_ADDR.
2- Make SFX_MAX_ADDR an offset from the start of the overlay instead of from the start of the PE file.
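One way step 1 could look, sketched under these assumptions: the PE headers are already in memory, and `pe_overlay_start` is a hypothetical helper, not an existing libarchive function. The overlay begins where the furthest section's raw data ends, i.e. at the maximum of `PointerToRawData + SizeOfRawData` over the section table:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PE headers are always little-endian. */
static uint32_t pe_le16(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t pe_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Sketch: return the file offset where the PE overlay starts, i.e. the end
 * of the raw data of the section that extends furthest into the file.
 * `buf` holds the first `len` bytes of the file (at least the headers).
 * Returns 0 if the headers look malformed or do not fit in `buf`.
 */
static uint64_t pe_overlay_start(const unsigned char *buf, size_t len)
{
    uint32_t pe_off, nsec, opt_size, i;
    uint64_t sec_table, end = 0;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    pe_off = pe_le32(buf + 0x3C);                  /* e_lfanew */
    if ((uint64_t)pe_off + 24 > len || memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return 0;
    nsec = pe_le16(buf + pe_off + 6);              /* NumberOfSections */
    opt_size = pe_le16(buf + pe_off + 20);         /* SizeOfOptionalHeader */
    sec_table = (uint64_t)pe_off + 24 + opt_size;  /* first 40-byte section header */
    if (sec_table + (uint64_t)nsec * 40 > len)
        return 0;
    for (i = 0; i < nsec; i++) {
        const unsigned char *sh = buf + sec_table + (uint64_t)i * 40;
        /* PointerToRawData (offset 20) + SizeOfRawData (offset 16) */
        uint64_t raw_end = (uint64_t)pe_le32(sh + 20) + pe_le32(sh + 16);
        if (raw_end > end)
            end = raw_end;
    }
    return end;
}
```

Under step 2, the search window would then be `[overlay, overlay + limit)` for some limit measured from the overlay rather than from the start of the file.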

These changes will let libarchive locate the Rar/7z header faster (because they narrow the search range) and will reduce the number of misses.

Would love to know what you (and other contributors) think. I'll spend some time implementing this solution to test and see how well it works.

@kientzle
Contributor

kientzle commented Mar 6, 2024

Those sound like very promising approaches. Please let us know how it works out!

Note: The Zip reader has two different bid functions that use different strategies. It sounds like that approach could be useful here as well.

@mehrabiworkmail
Author

mehrabiworkmail commented Mar 8, 2024

Hello again @kientzle. I created a pull request to fix this issue: #2088

For PE files, I implemented the approach I described above. I ended up implementing a similar solution for ELF as well. My tests show the solution works well.
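A comparable sketch for ELF, assuming a 64-bit little-endian file whose program and section header tables are in `buf` (for example, the whole file mapped into memory) and ignoring extended section numbering. `elf64_overlay_start` is a hypothetical name, and the approach actually taken in #2088 may differ. Because ELF has no single field giving the end of the image, this takes the furthest end of the segment data, the section data (skipping `SHT_NOBITS` sections such as `.bss`, which occupy no file space), and the two header tables:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_SHT_NOBITS 8   /* section occupies no bytes in the file (e.g. .bss) */

/* Read an n-byte little-endian unsigned integer. */
static uint64_t elf_rd(const unsigned char *p, int n)
{
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

/* a + b, clamped to UINT64_MAX so hostile offsets cannot wrap around. */
static uint64_t sat_add(uint64_t a, uint64_t b)
{
    return a + b < a ? UINT64_MAX : a + b;
}

/*
 * Sketch for ELF64 little-endian only. Returns the offset just past the
 * furthest of: the ELF header, both header tables, every segment's file
 * data and every section's file data. Returns 0 on an unsupported or
 * malformed header, or if a header table does not fit in `buf`.
 */
static uint64_t elf64_overlay_start(const unsigned char *buf, size_t len)
{
    uint64_t phoff, shoff, phsize, shsize, end = 64;  /* 64 = ELF64 header size */
    uint32_t phentsize, phnum, shentsize, shnum, i;

    if (len < 64 || memcmp(buf, "\x7f" "ELF", 4) != 0 || buf[4] != 2 || buf[5] != 1)
        return 0;                                     /* need ELFCLASS64 + ELFDATA2LSB */
    phoff = elf_rd(buf + 32, 8);
    shoff = elf_rd(buf + 40, 8);
    phentsize = (uint32_t)elf_rd(buf + 54, 2);
    phnum = (uint32_t)elf_rd(buf + 56, 2);
    shentsize = (uint32_t)elf_rd(buf + 58, 2);
    shnum = (uint32_t)elf_rd(buf + 60, 2);
    if ((phnum && phentsize < 56) || (shnum && shentsize < 64))
        return 0;
    phsize = (uint64_t)phnum * phentsize;
    shsize = (uint64_t)shnum * shentsize;
    if (phoff > len || phsize > len - phoff || shoff > len || shsize > len - shoff)
        return 0;

    if (phoff + phsize > end) end = phoff + phsize;
    if (shoff + shsize > end) end = shoff + shsize;
    for (i = 0; i < phnum; i++) {
        const unsigned char *ph = buf + phoff + (uint64_t)i * phentsize;
        uint64_t v = sat_add(elf_rd(ph + 8, 8), elf_rd(ph + 32, 8)); /* p_offset + p_filesz */
        if (v > end) end = v;
    }
    for (i = 0; i < shnum; i++) {
        const unsigned char *sh = buf + shoff + (uint64_t)i * shentsize;
        uint64_t v;
        if (elf_rd(sh + 4, 4) == ELF_SHT_NOBITS)              /* sh_type */
            continue;
        v = sat_add(elf_rd(sh + 24, 8), elf_rd(sh + 32, 8));  /* sh_offset + sh_size */
        if (v > end) end = v;
    }
    return end;
}
```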

I added 3 test cases to demonstrate the problem and show the effectiveness of the solution (actually, libarchive had no 7z SFX test cases before). The tests are:
1- Unmodified 7z SFX PE created using 7z tool on Windows: libarchive succeeds in extracting it
2- Modified 7z SFX PE: libarchive fails because the 7z signature falls outside the hard-coded search range
3- Unmodified 7z SFX ELF created using 7z tool on Ubuntu 22.04: libarchive fails to extract because the 7z signature falls outside the hard-coded ranges

The 7z tool extracts all 3 files without any problem, and after applying my solution, so does libarchive.

Once there is agreement amongst the contributors about the approach, we can use it in rar, rar5, and zip SFX extraction as well.

Please let me know of any feedback. Thanks.

@mehrabiworkmail linked a pull request on Mar 8, 2024 that will close this issue

2 participants