Data integrity lost when data is sent to SDK External Commands #336

Open
malvidin opened this issue Jul 25, 2020 · 2 comments

Comments

@malvidin

malvidin commented Jul 25, 2020

If data is sent to an external command, it can be modified unexpectedly because leading spaces are dropped when the data is sent back to Splunk from the SDK. This issue may need to be addressed in the receiving Splunk CSV reader rather than in the SDK CSV writer, but that is outside the scope of the Splunk Python SDK.

Issue was confirmed with:
- SDK 1.6.13
- Python 2 and Python 3
- Splunk 8.0.4.1 (Docker)

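This streaming command adds a field named six_spaces containing six spaces, and wraps the value of a specified field in one leading and one trailing space (two spaces in total). If the result equals six_spaces, it also sets fields_equal to "true".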

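```python
import sys
from splunklib.searchcommands import dispatch, StreamingCommand, Configuration, Option

@Configuration()
class MyCommand(StreamingCommand):
    # Name of the field to wrap in one leading and one trailing space.
    field = Option(name='field', require=True)

    def stream(self, records):
        for record in records:
            # Every record gets a field containing exactly six spaces.
            record['six_spaces'] = '      '
            if self.field not in record:
                yield record
                continue
            # A four-space value becomes a six-space value.
            record[self.field] = ' {} '.format(record[self.field])
            # Confirms the whitespace was intact when the command read it.
            if record[self.field] == record['six_spaces']:
                record['fields_equal'] = "true"
            yield record

dispatch(MyCommand, sys.argv, sys.stdin, sys.stdout, __name__)
```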

If my_command is run like the following:

```
| makeresults
| eval two_spaces = "  ", four_spaces = "    "
| eval two_space_len = len(two_spaces)
| eval four_space_len = len(four_spaces)
| my_command field=four_spaces
| eval two_space_len_after = coalesce(len(two_spaces), "field modified without reference in streaming command")
| eval four_space_len_after = coalesce(len(four_spaces), "field modified with reference in streaming command")
| eval six_space_len_after = coalesce(len(six_spaces), "field modified without reference in streaming command")
| eval spaces_read_during_command = coalesce(fields_equal, "false")
```

I would expect every field to contain spaces, a number, or the string "true". However, none of the space fields contain any spaces after the streaming command runs, including fields the command does not reference.
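To make the expectation concrete, the transformation performed by stream() can be replayed in plain Python on the single record the search builds. This is a minimal sketch: `my_command_stream` and `expected_results` are hypothetical names, and the code models only the command's own logic, not the CSV serialization between Splunk and the command.

```python
def my_command_stream(records, field):
    """Hypothetical plain-Python stand-in for MyCommand.stream() (no splunklib needed)."""
    for record in records:
        record = dict(record)  # copy so the caller's dict is untouched
        record["six_spaces"] = " " * 6
        if field in record:
            # Wrap the target value in one leading and one trailing space.
            record[field] = " {} ".format(record[field])
            if record[field] == record["six_spaces"]:
                record["fields_equal"] = "true"
        yield record


def expected_results(field="four_spaces"):
    """Compute what the search's final eval fields should hold if no whitespace is lost."""
    record = {"two_spaces": "  ", "four_spaces": "    "}
    record["two_space_len"] = len(record["two_spaces"])
    record["four_space_len"] = len(record["four_spaces"])
    (out,) = list(my_command_stream([record], field))
    return {
        "two_space_len_after": len(out["two_spaces"]),
        "four_space_len_after": len(out["four_spaces"]),
        "six_space_len_after": len(out["six_spaces"]),
        "spaces_read_during_command": out.get("fields_equal", "false"),
    }
```

Under that logic, two_space_len_after should be 2, four_space_len_after and six_space_len_after should both be 6, and spaces_read_during_command should be "true".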

A workaround is to change line 364 of internals.py from `quoting = csv.QUOTE_MINIMAL` to `quoting = csv.QUOTE_ALL`.
Because the data is read in correctly and is available while the streaming command executes, it appears that the CSV data sent back to Splunk is being modified somewhere along the way. The cause could be a CSV reader misconfiguration, a call to string.strip(), or something else.
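The difference between the two quoting modes can be reproduced with Python's standard csv module alone. This is a minimal sketch: Splunk's receiving CSV reader is not part of the SDK and its behavior is not shown in this issue, so `skipinitialspace=True` below is only a hypothetical stand-in for a reader that drops leading whitespace. `write_row` and `read_row` are illustrative helper names.

```python
import csv
import io


def write_row(values, quoting):
    """Serialize one row with the stdlib csv writer and the given quoting mode."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting).writerow(values)
    return buffer.getvalue()


def read_row(text, skipinitialspace=True):
    """Parse one row; skipinitialspace=True stands in for a whitespace-dropping reader."""
    return next(csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace))


row = ["  ", "    ", "      "]

minimal = write_row(row, csv.QUOTE_MINIMAL)  # whitespace-only values are written unquoted
quoted = write_row(row, csv.QUOTE_ALL)       # every value is wrapped in quotes

print(repr(minimal), read_row(minimal))
print(repr(quoted), read_row(quoted))
```

With QUOTE_MINIMAL the whitespace-only values go out unquoted, so a reader that skips leading spaces turns each of them into an empty field; with QUOTE_ALL the spaces sit inside quotes and survive the same reader.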

@fantavlik
Contributor

Thanks for reporting this, we will investigate and attempt to provide a fix - the thorough info and resolution steps are a huge help!

@fantavlik
Contributor

Hi @malvidin, we have evaluated the suggested change to `quoting = csv.QUOTE_ALL`, but found that it generated many empty/invalid fields that were not present before the change. Given this undesirable behavior, we can't adopt that solution; however, I will leave this issue open because the problem remains.
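The thread does not say why QUOTE_ALL produced the extra empty fields. One relevant detail of Python's csv module is that QUOTE_ALL also quotes missing (None) and empty values, writing them as `""`, which a reader may treat as an explicitly empty field rather than an absent one; whether that explains what the maintainers saw is an assumption. A hypothetical middle ground, not something the SDK provides, is to quote only the values QUOTE_MINIMAL would quote plus non-empty values with leading or trailing whitespace. `quote_field`, `format_row`, and `quote_all_row` are illustrative names.

```python
import csv
import io


def quote_field(value, delimiter=",", quotechar='"'):
    """Hypothetical rule: QUOTE_MINIMAL, plus quoting for leading/trailing whitespace."""
    if value is None:
        return ""  # missing values stay unquoted, as with QUOTE_MINIMAL
    text = str(value)
    needs_quotes = (
        any(ch in text for ch in (delimiter, quotechar, "\r", "\n"))
        # Non-empty values with whitespace at either edge, e.g. "    ".
        or (text != "" and text != text.strip())
    )
    if not needs_quotes:
        return text
    # Double embedded quote characters, as the csv module does with doublequote=True.
    return quotechar + text.replace(quotechar, quotechar * 2) + quotechar


def format_row(values, delimiter=","):
    """Join one row using quote_field and a CRLF line terminator."""
    return delimiter.join(quote_field(v, delimiter) for v in values) + "\r\n"


def quote_all_row(values):
    """For comparison: the stdlib writer with QUOTE_ALL."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(values)
    return buffer.getvalue()
```

For the row `["a", "    ", None, ""]`, format_row quotes only the four-space value, while quote_all_row also emits `""` for the None and empty-string entries.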
