using yara regex rule to scan chinese character, error #1952

Open
hanggao481 opened this issue Aug 17, 2023 · 3 comments

@hanggao481

How can I use a YARA regex rule to scan for Chinese characters? What is the reason for the incorrect matches shown below?

Describe the bug
My YARA rule:
rule AsianCharacter : general
{
strings:
$chinese = /[\u8fd9]/
condition:
$chinese
}

Match result:
0x1cd:$chinese: u
0x1d2:$chinese: f
0x1dd:$chinese: 8

Expected behavior
Expected match result:
0x1cd:$chinese: 这

Note:
The Unicode code point of "这" is U+8FD9, written as \u8fd9 in the rule.

@hanggao481 hanggao481 added the bug label Aug 17, 2023
@hanggao481
Author

Another example: I want to scan for Chinese characters with a regex YARA rule like the one below:
rule AsianCharacter : general
{
strings:
$chinese = /[\u4e00-\u9fa5]/
condition:
$chinese
}
Problem:
It does not match any Chinese characters.

@vthib
Contributor

vthib commented Aug 20, 2023

YARA does not have Unicode handling in strings, and the \u syntax does not exist. What you wrote is actually equivalent to [u8fd9], so it matches any one of those five ASCII bytes.
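
To illustrate why the reported matches were u, f and 8, here is a minimal Python sketch. It uses Python's `re` module on bytes as a stand-in for YARA's regex engine, which is an assumption, and the sample buffer is hypothetical.

```python
import re

# The class the rule effectively contains: the five ASCII characters u, 8, f, d, 9.
effective_class = re.compile(rb"[u8fd9]")

# Hypothetical sample: some ASCII text followed by the UTF-8 bytes of "这".
sample = "buffer 8 ".encode("ascii") + "这".encode("utf-8")

# Collect (offset, matched byte) pairs, similar to YARA's match listing.
matches = [(m.start(), m.group()) for m in effective_class.finditer(sample)]
print(matches)
```

Only ASCII bytes from the class are reported; the three bytes that encode "这" are never matched.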

If you want to search for a non-ASCII character, you need to search for the bytes that make up its encoding in the files you scan. For UTF-8 files, that would mean something like this:

rule AsianCharacter : general
{
  strings:
    $chinese = /\xe8\xbf\x99/
  condition:
    $chinese
}

For UTF-16 encoding, I guess something like /\x8f\xd9/ (that is the big-endian byte order; little-endian would be /\xd9\x8f/).
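
The byte sequences for a given character can be derived rather than guessed. The following is a small Python sketch; the helper name `yara_escape` is made up for illustration and is not part of YARA or any library.

```python
def yara_escape(char: str, encoding: str) -> str:
    """Return the bytes of `char` in `encoding` as \\xNN escapes (hypothetical helper)."""
    return "".join(f"\\x{b:02x}" for b in char.encode(encoding))

# Print a candidate regex body for each encoding the scanned files might use.
for enc in ("utf-8", "utf-16-be", "utf-16-le"):
    print(enc, "/" + yara_escape("这", enc) + "/")
```

Which of the three patterns is the right one depends on how the scanned files are actually encoded, so a rule may need one string per encoding.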

Note that because the pattern has to be written as bytes in one specific encoding, you cannot use code point ranges like the one in your second example.
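
A code point range cannot be written directly, but for one fixed encoding it can be expanded by hand into byte-level alternatives. The sketch below does this for U+4E00 to U+9FA5 in UTF-8 only and checks the expansion with Python's `re` on bytes. It assumes YARA's regex engine accepts \xNN escapes inside character classes and alternation in the same way; the names `CJK_UTF8` and `in_range` are illustrative.

```python
import re

# UTF-8 encodes U+4E00 as E4 B8 80 and U+9FA5 as E9 BE A5, so the range
# splits into four byte-level alternatives.
CJK_UTF8 = (
    r"\xe4[\xb8-\xbf][\x80-\xbf]"          # U+4E00 .. U+4FFF
    r"|[\xe5-\xe8][\x80-\xbf][\x80-\xbf]"  # U+5000 .. U+8FFF
    r"|\xe9[\x80-\xbd][\x80-\xbf]"         # U+9000 .. U+9F7F
    r"|\xe9\xbe[\x80-\xa5]"                # U+9F80 .. U+9FA5
)

range_re = re.compile(CJK_UTF8.encode("ascii"))

def in_range(char: str) -> bool:
    """True if the UTF-8 bytes of `char` match the byte-level pattern."""
    return range_re.fullmatch(char.encode("utf-8")) is not None

# Candidate YARA string definition built from the same pattern text.
print("$chinese = /" + CJK_UTF8 + "/")
```

A pattern like this works on raw bytes, so it can also match byte sequences in binary or differently encoded data that happen to fall in the same ranges; UTF-16 files would need a separate expansion.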

@gaohang

gaohang commented Sep 4, 2023

Thanks. Is there any way to use YARA to match Chinese characters in general? That is, can a range of Unicode code points be written in a YARA regex the way it can in ordinary regex engines, e.g. [\u4e00-\u9fa5]?
