
Backward incompatible changes in YARA 4.0 API

Victor M. Alvarez edited this page Jan 8, 2021 · 9 revisions

YARA 4.0.0 introduces some backward-incompatible changes in its C API that developers must be aware of. Backward-incompatible changes are always a nuisance and put a maintenance burden on software that depends on libyara, but they are sometimes necessary to pay off technical debt and keep the project moving forward in good shape. This document explains those changes, how they affect existing users, and the reasons behind them.

The YR_RULES structure

YR_RULES is a cornerstone structure in YARA's API. It represents a set of YARA rules that has been compiled from its textual form. Instances of YR_RULES are created by calling the yr_compiler_get_rules function, and here comes the first change: in YARA 2.x and 3.x this function returned a new instance of YR_RULES on every call, whereas in YARA 4.0.0 the structure is a singleton and every call to yr_compiler_get_rules returns the same YR_RULES structure.

In previous versions, having multiple instances of YR_RULES was necessary because each instance could be shared by at most 32 threads (the limit was initially 16, but it was raised to 32 in later versions). If you wanted to use more than 32 scanning threads, you needed additional instances of YR_RULES. In YARA 4.0.0 this limit no longer exists: the YR_RULES structure can be shared by as many threads as you want, so a single instance is enough.

Having a single instance of YR_RULES greatly reduces your program's memory footprint, especially when you are compiling thousands of rules. Individual rules within the YR_RULES structure also take less space now, which saves further memory. At VirusTotal we reduced the size of compiled rules from more than 4GB to less than 1GB.

If your program creates multiple instances of YR_RULES for the same compiled rules, and calls yr_rules_define_XXX_variable on each instance to assign a different value to some variable X, it will no longer work as it used to. In YARA 3.x each instance of YR_RULES holds its own set of variables, but in YARA 4.0 there is a single YR_RULES per set of compiled rules. If you want to scan your data with the same compiled rules while changing only the values of some variables, create multiple YR_SCANNER structures for your YR_RULES with yr_scanner_create, and then call yr_scanner_define_XXX_variable on each of them to assign the desired values. Each YR_SCANNER can have a different value for variable X.

Compiler callback

The yr_compiler_set_callback function accepts a pointer to a callback function that YARA calls to notify you about errors that occur while compiling your rules. In YARA 3.x the callback's definition was:

void callback_function(
    int error_level,
    const char* file_name,
    int line_number,
    const char* message,
    void* user_data)

In YARA 4.0 a new argument const YR_RULE* rule has been added:

void callback_function(
    int error_level,
    const char* file_name,
    int line_number,
    const YR_RULE* rule,
    const char* message,
    void* user_data)

This new argument is a pointer to the rule containing the error, but it can be NULL if the error wasn't found within a rule definition.

Scanning callback

Programs receive information about matches found in the scanned data via a callback function. The program provides the callback function and YARA calls it whenever it finds a matching (or not matching) rule. This callback has changed its signature from:

int callback_function(
    int message,
    void* message_data,
    void* user_data);

To:

int callback_function(
    YR_SCAN_CONTEXT* context,
    int message,
    void* message_data,
    void* user_data);

Notice that the callback function now receives an additional argument, YR_SCAN_CONTEXT* context. This structure is opaque to the program, so you shouldn't rely on the fields it contains, but the context is necessary for iterating over the matches of a given string, as will be shown below.

yr_string_matches_foreach

The yr_string_matches_foreach macro now receives an additional argument, the scan context mentioned in the section above. This macro is used for iterating over the matches found for a given string in one of your rules. In YARA 3.x the implementation of your callback function looked similar to:

int callback_function(
    int message,
    void* message_data,
    void* user_data)
{
    if (message == CALLBACK_MSG_RULE_MATCHING)
    {
        // If message is CALLBACK_MSG_RULE_MATCHING message_data is a pointer
        // to the matching rule.
        YR_RULE* rule = (YR_RULE*) message_data;
        YR_STRING* string;
        YR_MATCH* match;

        // Iterate the rule's strings
        yr_rule_strings_foreach(rule, string)
        {
            // Iterate the matches for the current string.
            yr_string_matches_foreach(string, match)
            {
                // ... do something with match
            }
        }
    }

    // Tell YARA to keep scanning.
    return CALLBACK_CONTINUE;
}

In YARA 4.0 it will look like:

int callback_function(
    YR_SCAN_CONTEXT* context,
    int message,
    void* message_data,
    void* user_data)
{
    if (message == CALLBACK_MSG_RULE_MATCHING)
    {
        // If message is CALLBACK_MSG_RULE_MATCHING message_data is a pointer
        // to the matching rule.
        YR_RULE* rule = (YR_RULE*) message_data;
        YR_STRING* string;
        YR_MATCH* match;

        // Iterate the rule's strings
        yr_rule_strings_foreach(rule, string)
        {
            // Iterate the matches for the current string.
            yr_string_matches_foreach(context, string, match)
            {
                // ... do something with match
            }
        }
    }

    // Tell YARA to keep scanning.
    return CALLBACK_CONTINUE;
}

Notice the extra argument context in the callback definition and how it is used with yr_string_matches_foreach.