Update create-parser.md #2455

Open
wants to merge 2 commits into
base: main

jenworthington
Collaborator

First editing pass

## Why create a parser?
Splunk Connect for Syslog can offload Splunk Indexers by performing operations that normally would have been done during index time, including
linebreaking, source/sourcetype setting, and timestamping. Creating a parser also reduces the need of using corresponding add-ons on indexers.
When you create parsers, SC4S can offload Splunk indexers by performing operations that would normally be performed during index time, including
Contributor

Maybe let's get rid of all of this:

Splunk Connect for Syslog can offload Splunk Indexers by performing operations that normally would have been done during index time, including
linebreaking, source/sourcetype setting, and timestamping. Creating a parser also reduces the need of using corresponding add-ons on indexers.

and write something like:

SC4S parsers perform operations that would normally be performed during index time, including linebreaking, source and sourcetype setting, and timestamping. You can write your own parser if the parsers available in the SC4S package do not meet your needs.

* Prepare your [environment](../developing/index.md).
* Create a new branch in the repository where you will apply your changes.

## Procure a raw log message
If you already have a raw log message, you can skip this step. Otherwise, you need to extract one to have something to work with. You can do this in multiple ways; this section describes two methods.
Contributor

three methods:

  1. tcpdump
  2. Wireshark
  3. Raw log message in Splunk


### Save your raw log message in Splunk or an archive

Once you get your stream of messages, copy one of them. Note that in UDP there are not usually any message separators.
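
To illustrate the note about UDP, here is a minimal sketch, assuming a loopback socket and made-up message text (neither comes from this PR). It shows that each datagram arrives as one whole message with no separator between messages, which is why you copy a single datagram's payload from the capture.

```python
import socket


def send_and_receive(messages, timeout=2.0):
    """Send each message as its own UDP datagram over loopback and return
    what the receiver reads, one recvfrom() call per datagram."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)     # fail instead of hanging forever
        address = receiver.getsockname()
        received = []
        for message in messages:
            sender.sendto(message.encode("utf-8"), address)
            # One recvfrom() returns exactly one datagram, i.e. one message.
            data, _ = receiver.recvfrom(65535)
            received.append(data.decode("utf-8"))
        return received
    finally:
        sender.close()
        receiver.close()
```

No newline or other delimiter is added to the payloads; the datagram boundary is the only thing separating one message from the next.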
Contributor

Please move it to ### Procure a raw log message using Wireshark


### Save raw log message in Splunk or archive

### Save your raw log message in Splunk or an archive
Contributor

Suggested change
### Save your raw log message in Splunk or an archive
### Procure a raw log message by saving it in Splunk

The naming convention is `test_vendor_product.py`
Afterwards, you need to make sure that your log is being parsed correctly by creating a test case.
To create a unit test, use the existing test case that is most similar to your use case. The naming convention is `test_vendor_product.py`.
1. Make sure that your log is being parsed correctly by creating a test case.
Contributor

* Extract and replace values with field names in the test string.

Here you can see proper test case for Vmware Carbonblack Protect device:
then do the following:
Contributor

Could it instead be:

Assuming you have a raw message like this:

`<14>1 2022-03-30T11:17:11.900862-04:00 host - - - - Carbon Black App Control event:  text="File 'c:\program files\azure advanced threat protection sensor\2.175.15073.51407\winpcap\x86\packet.dll' [c4e671bf409076a6bf0897e8a11e6f1366d4b21bf742c5e5e116059c9b571363] would have blocked if the rule was not in Report Only mode." type="Policy Enforcement" subtype="Execution block (unapproved file)" hostname="CORP\USER" username="NT AUTHORITY\SYSTEM" date="3/30/2022 3:16:40 PM" ip_address="10.0.0.3" process="c:\program files\azure advanced threat protection sensor\2.175.15073.51407\microsoft.tri.sensor.updater.exe" file_path="c:\program files\azure advanced threat protection sensor\2.175.15073.51407\winpcap\x86\packet.dll" file_name="packet.dll" file_hash="c4e671bf409076a6bf0897e8a11e6f1366d4b21bf742c5e5e116059c9b571363" policy="High Enforcement - Domain Controllers" rule_name="Report read-only memory map operations on unapproved executables by .NET applications" process_key="00000433-0000-23d8-01d8-44491b26f203" server_version="8.5.4.3" file_trust="-2" file_threat="-2" process_trust="-2" process_threat="-2" prevalence="50"`

1. Make sure that the message is a valid Python string, with escape characters placed correctly.
2. Anonymize the data.
3. Rename functions.
4. Update index and sourcetype fields.
5. Extract and replace values with field names in the test string.
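
As a rough illustration of steps 1 and 5, and not the project's actual test case, the sketch below uses a shortened version of the message above. It relies only on the standard library's `string.Template` as a stand-in for whatever templating the real tests use; `BODY`, `TEMPLATE`, and `render_message` are hypothetical names.

```python
from string import Template

# Shortened body of the Carbon Black sample above. Backslashes are doubled so
# the Windows-style values stay literal in a normal Python string. Writing
# "CORP\USER" without doubling would be a syntax error, because "\U" starts
# a Unicode escape.
BODY = (
    'Carbon Black App Control event:  type="Policy Enforcement" '
    'hostname="CORP\\USER" username="NT AUTHORITY\\SYSTEM" '
    'ip_address="10.0.0.3" file_name="packet.dll"'
)

# Step 5: the priority mark, timestamp, and host are replaced with field
# names so that a test can inject fresh values on every run.
TEMPLATE = Template("${mark}1 ${iso} ${host} - - - - " + BODY)


def render_message(mark, iso, host):
    """Fill the placeholders and return the message to send."""
    return TEMPLATE.substitute(mark=mark, iso=iso, host=host)
```

A raw string (`r'hostname="CORP\USER"'`) is an equivalent way to satisfy step 1 if you prefer not to double every backslash.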

This example shows a test case for a VMware Carbonblack Protect device:

6. Now run the test:
...
7. The parsed log should appear in Splunk:

![parsed_log](../resources/images/parser_dev_splunk_first_run.png)
As you can see, at this moment, the message is being parsed as a generic *nix:syslog sourcetype.
To assign it to the proper index and sourcetype you will need an actual parser. So far we have ensured that the fields in the messages are properly recognized.
In this example the message is being parsed as a generic *nix:syslog sourcetype and we have ensured that the fields in the messages are properly recognized.
Contributor

Suggested change
In this example the message is being parsed as a generic *nix:syslog sourcetype and we have ensured that the fields in the messages are properly recognized.
In this example the message is being parsed as a generic `nix:syslog` sourcetype. This means that the message format complied with RFC standards, and SC4S could correctly identify the format fields in the message.
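
To make "format fields" concrete, here is a minimal sketch of splitting an RFC 5424 header with a regular expression. It is not how SC4S does it (SC4S relies on syslog-ng for parsing); the pattern is deliberately simplified, in particular for structured data, and `parse_header` is a hypothetical helper.

```python
import re

# Simplified RFC 5424 layout:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
HEADER = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<structured_data>-|\[.*\])"
    r"(?: (?P<msg>.*))?$",
    re.DOTALL,
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # PRI encodes facility * 8 + severity.
    pri = int(fields.pop("pri"))
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields
```

For the sample message, which starts with `<14>1`, this yields facility 1 and severity 6, with `host` as the hostname and `-` for the unused header fields.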

Here is an example:
To assign your messages to the proper index and sourcetype you will need to create a parser. Your parser must be declared in `package/etc/conf.d/conflib`. The naming convention is `app-type-vendor_product.conf`.
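
The two naming conventions mentioned in this PR can be sketched as small helpers. The function names and the example vendor, product, and type values are hypothetical, and whether the file sits directly in `conflib` or in a subdirectory is not stated here, so the path below is an assumption.

```python
from pathlib import PurePosixPath

# Directory named in the text above.
CONFLIB_DIR = PurePosixPath("package/etc/conf.d/conflib")


def parser_filename(parser_type, vendor, product):
    """Build a name following app-type-vendor_product.conf."""
    return f"app-{parser_type}-{vendor}_{product}.conf"


def unit_test_filename(vendor, product):
    """Build a name following test_vendor_product.py."""
    return f"test_{vendor}_{product}.py"


def parser_path(parser_type, vendor, product):
    """Assumed location: directly inside the conflib directory."""
    return CONFLIB_DIR / parser_filename(parser_type, vendor, product)
```

For a made-up vendor `acme` with product `firewall` and type `syslog`, the helpers produce `app-syslog-acme_firewall.conf` and `test_acme_firewall.py`.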

1. If you already have a similar parser, you can use it as a reference. In the parser, make sure you assign the proper sourcetype, index, vendor, product, and template. The template shows how your message should be parsed before sending them to Splunk.
Contributor

Maybe

Suggested change
1. If you already have a similar parser, you can use it as a reference. In the parser, make sure you assign the proper sourcetype, index, vendor, product, and template. The template shows how your message should be parsed before sending them to Splunk.
1. If you find a similar parser in SC4S, you can use it as a reference. In the parser, make sure you assign the proper sourcetype, index, vendor, product, and template. The template shows how your messages should be parsed before they are sent to Splunk.

@mstopa-splunk
Contributor

@jenworthington ready for the next pass
