Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False Positive in Rule WEBSHELL_PHP_Dynamic_Big #317

Open
gotmls opened this issue Apr 10, 2024 · 3 comments
Open

False Positive in Rule WEBSHELL_PHP_Dynamic_Big #317

gotmls opened this issue Apr 10, 2024 · 3 comments

Comments

@gotmls
Copy link

gotmls commented Apr 10, 2024

This looks for a php variable function call like $my_function($something); using the following regex:
/$[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff[]'"]{0,20}\s{0,20}($/

The main problem is that it will find regular variables inside double-quoted string that are followed by open-parentheses, like this:
die("$gt ( $fa ) $lt/span$gt\n$lt/body$gt$lt/html$gt");
... and these are not executing variable functions at all, but simply outputting variable text in a string that also contains parentheses.

This similar regex:
/$[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff[]'"]{0,20}\s{0,20}("/
also finds other examples of this safe and normal usage in my own code that you label as malware:
$auto .= "File MODIFIED: $file (".md5($GLOBALS["file_contents"])."O".strlen($GLOBALS["file_contents"])." => $md5)";
return "\n Debugging $function (".round(microtime(true), 4).")\n";

I am also not sure why you would use {0,20} as your quantifier in these expressions because there could easily be malicious code that would not be found by your regex simply by having a variable name longer than 21 characters or by putting more than 20 spaces between the variable function and the open-parentheses, or even just by putting a single space after the open-parentheses, like this:
$run_this_malicious_function_with_long_name ( $this_bad_code_will_not_be_found );

@ruppde
Copy link
Contributor

ruppde commented Apr 29, 2024

hi,

the problem are webshells like these:

<?php 
    $a = substr('1a',1).'s'.'s'.'e'.'r'.'t';
    $a($_POST['x']);
?>

unfortunately there's no perfect solution to find them without false positives (with the biggest problem, that php allows such coding in the first place. I think it's removed in never versions). but WEBSHELL_PHP_Dynamic_Big should only match, if there are further suspicious strings so could you post the matched file or at least the output of "yara -s" ?

regards
arnim

@gotmls
Copy link
Author

gotmls commented May 1, 2024

I understand the problem, and I know it is hard to these malicious scripts without occasionally reporting a False Positive. My main job is fighting malware and I spend a lot of time writing and improving my own regex definitions for known threats like the one you posted here. I work very hard to not generate False Positive reports in my Anti-Malware plugin because it can cause irreparable harm to both the subject's reputation and my own. The current situation with your signatures is causing hosting providers who use your software to inform their clients who use my software that my code is malicious, when it is not and it matches your signatures for the wrong reasons (which I explained above). I'm not asking for perfection, however, this Rule of yours is faulty in a few ways that I have already mentioned and is effectively slandering my plugin for containing innocuous string that are not at all what you are claiming them to be (WEBSHELL).

Since you seem reluctant to address the faults that I have outlined with your approach I am taking steps to avoid false detection by your signature, but I do not feel that this is a solution, this is only my own workaround for your poor implementation. The sad think (which I also mentioned in my first post) is that any hacker could also avoid detection from your signature with the simplest and potentially incidental alterations of that malicious script you posted above:

<?php 
    $a = substr('1a',1).'ssert';
    $a( $_POST['x']);
?>

or

<?php 
    $afc0586aca6e42cffade83252446d0613 = substr('1a',1).'ssert';
    $afc0586aca6e42cffade83252446d0613($_POST['x']);
?>

Neither of these would be detected by this signature of yours but they do the exact same thing as your first example, and my first alternate here just has a single SPACE before the $_POST and the other just uses a variable name that is too long to match your regex (I also took out the string obfuscation so it didn't match your other signature but it's all still bad code). Here is a simplified version of one of my own regex definitions that finds variations of this threat:
/<\?(?:php)?\s*+(?<C>\/\*(?:[^\*]++|\*++[^\/])*+\*++\/\s*+|(?:\#|\/\/)[^\n]++\s++)*+(?:(?:(?&C)|if[\s\(]++isset[\s\(]++[\$\{'"_]++[^\{]++[\{\s]*+|\$\w++\s*+(?<A>\[[^\]]*+\]++\s*+)*+)++=[^;]*+;\s*+)++(?:(?:\$\w++\s*+(?&A)*+=\s*+)?\$[\$\{]*+\w++[\}\s]*+(?:(?&C)|(?&A))*+\(.*;[\}\s]*+)++($|\?>)/i

Here are two simple examples of completely reasonable code that look enough like that WEBSHELL you posted to be detected by your signature but are not at all malicious or dangerous in any way:

<?php 
    $a = 'a string value';
    print("$a (".strlen($a).")");
?>

This just prints the string followed by the length of the string in parentheses.

<?php 
    $a = 'a string value';
    print($a[strlen($a)-1]);
?>

This just prints the last character in the string. I have seen a lot of PHP code, both malicious and otherwise, and I know that these are both examples that illustrate common uses of parentheses in ways that your signature is confusing for function calls. This is something I feel that you need to change so that you do not report so many False Positive which cause the support professional, who trust and rely on your work, to contact their clients with false and slanderous claims that my software (and other legitimate and safe software by other professional developers like yourself) is malicious when it is clearly not.

Since you asked, here is some of the output from the yara -s command that detected this False Positive in one of my plugin files:

WEBSHELL_PHP_Dynamic_Big .../wp-content/plugins/gotmls/images/index.php
0x0:$new_php2: <?php
0x1c501:$new_php2: <?php
0x0:$php_short: <?
0x13abc:$dynamic1: $whitelist[md5($
0xe1ee:$gen_much_sus25: Exploit
0xe4f4:$gen_much_sus33: hacker
0xe637:$gen_much_sus33: hacker
0x2907:$gen_much_sus100: rot13
0x29d4:$gen_much_sus100: rot13

As you said, "WEBSHELL_PHP_Dynamic_Big should only match, if there are further suspicious strings", so you can see that I also have 5 gen_much_sus matches (amoung others), for which I cannot really do anything about, and it only takes 3 of these matches in your signature to flag my code as suspicious. As I mentioned, this code is from my Anti-Malware plugin, so I do have strings in my code that list "Exploits" and mentions how my plugin can protect my users from a "hacker", Now the real issue, your primary complaint which classifies my code as a WEBSHELL is this supposed dynamic function execution "$whitelist[md5($", but you can clearly see from just the 16 characters that were matched that this is not at all a dynamic function call. It is clearly just a call to the static function md5 which return result to be used as the key in an array variable (nothing malicious or even suspicious there). If you look at the whole line of code you will see how my code is completely safe and reasonable without being anything close to a WEBSHELL as you are claiming.
if (isset($whitelist[md5($GLOBALS["GOTMLS"]["tmp"]["file_contents"]).'O'.$filesize]))
If you notice that in my own regex I also look for the potential use of square brackets after the variable name, but I make sure there is a closing bracket before I look for the open-parentheses that would indicate a function is being called.

In another of my files your signature finds 2 similar False Positives:

WEBSHELL_PHP_Dynamic_Big .../wp-content/plugins/gotmls/index.php
0x0:$new_php2: <?php
0x0:$php_short: <?
0xe42d:$dynamic2: $file ("
0x19bd3:$dynamic2: $function ("
0x8dc2:$gen_much_sus24: exploit
0x12c56:$gen_much_sus32: hacked
0x8db7:$gen_much_sus33: hacker
0xafe0:$gen_much_sus33: hacker
0x16622:$gen_much_sus33: hacker
0xd224:$gen_much_sus89: backdoor
0xd26a:$gen_much_sus89: backdoor

Again, only 3 mentions of exploit, hackers, and backdoors would be enough to mark my code as "suspicious", yet I have 7, and I have every right and good cause to mention these words in my own Anti-Malware software! Worse yet, there are two matches here that you are calling "dynamic functions", and they do look like it from just the small snippets of code you are isolating $file (" and $function (" ... but again, if you look at the whole line of code you will see that these are both just innocuous output strings for which you are confusing the closing quote with an opening quote by not looking at the bigger picture:
$autoUpJS .= "<li>Core File MODIFIED: $file (".md5($GLOBALS["GOTMLS"]["tmp"]["file_contents"])."O".strlen($GLOBALS["GOTMLS"]["tmp"]["file_contents"])." => $md5)</li>";
and
return "\n<!-- Debugging $function (".round(microtime(true)-$GLOBALS["GOTMLS"]["MT"], 4).") -->\n";

You are effectively saying that return "\n<!-- Debugging $function (" is malicious when it is clearly only a string and not even a function call at all (dynamic or otherwise).

@Neo23x0
Copy link
Owner

Neo23x0 commented May 2, 2024

and I have every right and good cause to mention these words in my own Anti-Malware software

Yes. Correct.

is causing hosting providers who use your software to inform their clients who use my software that my code is malicious

You should discuss this with the hosting provider and not with us.
The "score" of that rule is set to 50 of 100. It's a threat hunting rule. They're using it in the wrong way.
https://github.com/Neo23x0/YARA-Style-Guide?tab=readme-ov-file#score

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants