Identity: Python obfuscated code identified as text/plain #222

Closed
kam193 opened this issue May 5, 2024 · 4 comments
Labels: assess (we still haven't decided if this will be worked on or not), bug (something isn't working), identify

Comments

kam193 commented May 5, 2024

Describe the bug
I have already come across a few similar files that Assemblyline isn't able to identify as Python code. What they have in common is that most of the file is a base64-encoded variable, and there are just a few function calls.

I assume those cases can be difficult to identify properly, but in case you have an idea, here are two example files (the password is zippy; be careful: all of them try to do something more or less bad, so please don't run them).

main.py.zip (Type: text/plain Mimetype: text/plain Magic: ASCII text, with very long lines (65515), with CRLF line terminators)
__decompiled_source.py.zip (Type: text/plain Mimetype: text/plain Magic: ASCII text, with very long lines (65515))
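
To make the file shape concrete, here is a minimal sketch of one possible heuristic: measure how much of a file consists of long base64-looking runs. It is only an illustration under stated assumptions, not how Assemblyline's Identify works; the names `BASE64_RUN`, `base64_ratio` and `make_benign_sample`, as well as the 200-character threshold, are hypothetical. The sample is harmless and is only ever handled as text, never executed.

```python
import base64
import re

# Hypothetical pattern: a run of at least 200 base64 alphabet characters,
# optionally followed by padding. The threshold is an arbitrary assumption.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")


def base64_ratio(text: str) -> float:
    """Return the fraction of characters that sit inside long base64-like runs."""
    if not text:
        return 0.0
    covered = sum(len(match.group(0)) for match in BASE64_RUN.finditer(text))
    return covered / len(text)


def make_benign_sample() -> str:
    """Build a harmless text with the same shape as the reported files."""
    payload = base64.b64encode(b"print('hello')" * 100).decode("ascii")
    # The result is treated purely as a string; nothing here is run.
    return f"import base64\nblob = '{payload}'\nexec(base64.b64decode(blob))\n"
```

On a sample like this the ratio is close to 1, while an ordinary script scores 0. A real identifier would probably combine such a signal with a check for Python-specific calls, since a high ratio alone does not say which language the wrapper is written in.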

To Reproduce
Steps to reproduce the behavior:

  1. Submit one of the example files to AL.
  2. Observe the filetype set by AL.

Expected behavior
Files should be identified as code/python

Screenshots

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.5.19

Additional context

kam193 added the assess and bug labels on May 5, 2024

gdesmar commented May 6, 2024

I added the new executor to our current list. It is obviously a very flimsy approach, as a single change to the exec line would stop our identification. If we start amassing enough executors, we'll want to generalize them with a better regex.
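
To illustrate why literal matching is flimsy and what a generalized regex could look like, here is a rough sketch. It is not the actual Identify code: `LITERAL_EXECUTORS`, `EXECUTOR_RE` and both helper functions are hypothetical names, and the single literal entry is only an illustrative example of an executor string.

```python
import re

# Illustrative literal "executor" string; not the real Assemblyline list.
LITERAL_EXECUTORS = ["exec(base64.b64decode("]


def matches_literal(text: str) -> bool:
    """Exact substring matching: any change to the exec line defeats it."""
    return any(executor in text for executor in LITERAL_EXECUTORS)


# Hypothetical generalization: allow exec or eval, optional whitespace,
# and any (or no) module prefix in front of b64decode.
EXECUTOR_RE = re.compile(
    r"\b(?:exec|eval)\s*\(\s*(?:\w+\s*\.\s*)?b64decode\s*\("
)


def matches_regex(text: str) -> bool:
    """Pattern matching that tolerates spacing and import-alias changes."""
    return EXECUTOR_RE.search(text) is not None
```

With these definitions, `exec(base64.b64decode(blob))` is caught by both functions, while a trivially edited line such as `exec( b64.b64decode (blob))` is only caught by the regex version. The regex is still easy to evade (for example through `getattr` or string concatenation), so it only raises the bar slightly.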


kam193 commented May 8, 2024

There is a nice collection of executors from Datadog: https://github.com/DataDog/guarddog/blob/main/guarddog/analyzer/sourcecode/exec-base64.yml However, I don't have any real examples, so I can't say how well the type recognition handles them.

However, I'd suggest adding another one to the list:
pickle.loads(zlib.decompress(
An example file: text.zip (password: zippy; as always, be careful, because it comes from a real case, although I think it doesn't work).
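
For nested calls like this one, a structural check is one alternative to string matching. The sketch below uses Python's standard `ast` module, which only parses the source and never runs it. It is an assumption-laden illustration, not part of Assemblyline: `dotted_name` and `has_nested_call` are hypothetical helpers, and the approach only works when the file is syntactically valid for the Python version doing the parsing.

```python
import ast


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for a Name/Attribute chain, or '' for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def has_nested_call(source: str, outer: str, inner: str) -> bool:
    """Return True if the source contains a call of the form outer(inner(...))."""
    try:
        tree = ast.parse(source)  # parsing only; the code is never executed
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and dotted_name(node.func) == outer:
            for arg in node.args:
                if isinstance(arg, ast.Call) and dotted_name(arg.func) == inner:
                    return True
    return False
```

Calling `has_nested_call(text, "pickle.loads", "zlib.decompress")` would flag the pattern regardless of whitespace or line breaks inside the call, but it misses aliased imports such as `import pickle as p`.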


gdesmar commented May 22, 2024

The PR was merged. The updated Identify code should be part of the next release! Just make sure to back up your local changes before reverting to get the latest version at that point. 🙂
Thank you for the help!
EDIT: And I added a lot of the items from the Datadog link, plus the pickle one, so those should be handled as well!

gdesmar closed this as completed on May 22, 2024

kam193 commented May 22, 2024

Thank you!
