Wrong file type identification - Python as INI #213

Open
kam193 opened this issue Apr 12, 2024 · 4 comments
Labels: assess (we still haven't decided if this will be worked on or not), bug (something isn't working), identify

Comments

kam193 commented Apr 12, 2024

Describe the bug
I've come across a misidentification of Python code. It's identified as an INI script, by both AL and the mimetype. My suspicion is that the UTF characters used to evade static analyzers tricked libmagic as well.

To Reproduce

  1. Upload the following file: fc84d9137b1f8a4fea8a6729325333b776049d88782554bfd0d03d9dbda5d881.zip (pass: zippy)
  2. Observe identification as in the picture:

[image: screenshot of the identification result]

Expected behavior
Identification should be set as code/python.

Screenshots
As above.

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.5.0.10, Extractor (privilege mode): 4.5.0.8

kam193 added the assess and bug labels on Apr 12, 2024

gdesmar commented May 6, 2024

I took a look at it and am impressed by that utf-8 trick.
On the AL side of things, we are trusting libmagic's ini detection.
Have you often seen files being wrongly identified as text/ini?
If I wanted to fix it purely on the Assemblyline side, I would need to:

  • Not trust application/x-wine-extension-ini from libmagic.
  • Add application/x-wine-extension-ini to the possible mime types for python identification.
  • Add another strong identifier since requests.get( is not found because of the encoding.

I was thinking of adding import requests as an identifier, but after testing it, it could also be utf-encoded or use a more complex import like import time,requests as L,uuid,platform as Z and we would be back to square one.
The other downsides are that we won't have text/ini anymore (those files will be unknown), and every application/x-wine-extension-ini file is going to be sent to yara for identification, which can slow down your system if you have many of those. On the upside, I don't think we have any official modules that rely on the filetype text/ini.
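
As a rough illustration of why a fixed string such as import requests is fragile, here is a minimal sketch that parses the source with Python's standard ast module instead of searching for text. This is not something Assemblyline's Identify does (per this thread it relies on libmagic and yara rules); the function name imported_modules is made up for the example.

```python
import ast
from typing import Set


def imported_modules(source: str) -> Set[str]:
    """Return the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Handles "import a, b as c" by looking at each alias separately.
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from pkg.sub import name" counts as importing "pkg".
            modules.add(node.module.split(".")[0])
    return modules


# The import line quoted in the comment above.
sample = "import time,requests as L,uuid,platform as Z\n"

print("import requests" in sample)      # False: the substring check misses it
print(sorted(imported_modules(sample)))  # ['platform', 'requests', 'time', 'uuid']
```

Parsing is far more expensive than a string or yara match and only works on syntactically valid Python, so this is meant to show the gap rather than to suggest a drop-in identifier.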

Since we are also trusting libmagic's python detection, the (probably better) approach is to see if they could improve libmagic to detect that case too. If I'm right, this would be libmagic's bug tracker. I'll be glad to create a ticket for you, if you do not want to create an account. 🙂

In the meantime, you could apply those three steps to your own instance, but that would usually stop new improvements to our Identify from reaching your instance. You would need to reset your changes and reapply them to get the latest changes.

kam193 commented May 6, 2024

Hey, I think the best way would be to ask libmagic if they could look at it. May I take you up on your offer to create a ticket?

I think I saw some other code detected as ini, but it doesn't happen frequently. I think a change in libmagic may also improve detecting such a trick overall ;)

> I took a look at it and am impressed by that utf-8 trick.

It's an old Python feature... which I haven't seen used outside of malicious cases yet :D If you're interested, I wrote briefly about the package it comes from on a blog
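
For readers unfamiliar with this kind of trick, here is a minimal sketch. It assumes the technique is the one where identifiers are written with Unicode compatibility characters, which Python normalizes to NFKC while parsing (PEP 3131); the thread itself does not spell out the exact mechanism. The helper to_math_bold and the URL are illustrative and not taken from the sample.

```python
import unicodedata


def to_math_bold(text: str) -> str:
    """Replace ASCII a-z with MATHEMATICAL BOLD SMALL letters (U+1D41A to U+1D433)."""
    return "".join(
        chr(0x1D41A + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


IDENTIFIER = "requests.get("

# Illustrative line, not taken from the sample in this issue.
obfuscated = f"{to_math_bold('requests')}.{to_math_bold('get')}('https://example.invalid')"

# A plain substring search misses the styled text, but matches after NFKC.
naive_hit = IDENTIFIER in obfuscated
normalized_hit = IDENTIFIER in unicodedata.normalize("NFKC", obfuscated)
print(naive_hit, normalized_hit)  # False True

# Python itself resolves the styled identifier to the plain ASCII name.
namespace = {}
exec(f"{to_math_bold('answer')} = 42", namespace)
print(namespace["answer"])  # 42
```

Under that assumption, a string-based rule would need to normalize the text first, or match the styled forms explicitly, to see the same names the interpreter sees.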

gdesmar commented May 6, 2024

After a few more tests, it turns out that the utf-8 is well handled by libmagic.
If I convert the dos newlines to unix using dos2unix, the sample is identified as code/python with a magic of Python script, UTF-8 Unicode text executable and a mime of text/x-script.python.
If the first (empty) line is deleted, it would be text/plain and our custom yara rules are failing us because of the requests.get( identifier that is encoded.
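
For anyone who wants to repeat this kind of comparison locally, the following is a minimal sketch, assuming the file command line tool (which uses libmagic) is installed. The helper names and the stand-in script are made up for the example, the real sample is not reproduced here, and the results depend on the installed libmagic version, so no expected output is shown.

```python
import shutil
import subprocess
from typing import Optional


def dos2unix(data: bytes) -> bytes:
    """Convert DOS (CRLF) line endings to Unix (LF), roughly what dos2unix does."""
    return data.replace(b"\r\n", b"\n")


def file_mime(data: bytes) -> Optional[str]:
    """Ask the `file` CLI for a MIME type; return None if `file` is not installed."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "-"],  # "-" makes file read stdin
        input=data,
        capture_output=True,
        check=False,
    )
    return result.stdout.decode(errors="replace").strip() or None


# Stand-in content with a leading empty line and CRLF endings (not the real sample).
crlf_script = b"\r\nimport requests\r\nprint(requests.get('https://example.invalid'))\r\n"

for label, data in (("dos", crlf_script), ("unix", dos2unix(crlf_script))):
    print(label, file_mime(data))
```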
I opened ticket 522 and hopefully the libmagic maintainers will be able to find a tweak to the ini/python identification. 🙂

kam193 commented May 7, 2024

Oh, nice! Thanks for digging in, it may then be a little easier to fix
