
Bug Report: Error Handling Large CSV Fields in Splunk SDK #561

Open
Catsofsuffering opened this issue Mar 6, 2024 · 2 comments

Comments

@Catsofsuffering

The bug I found and how to repair it
When developing a threat hunting application, I encountered a bug located at line 948 of splunklib\searchcommands\search_command.py. The relevant code snippet is as follows:

def _read_csv_records(self, ifile):
    reader = csv.reader(ifile, dialect=CsvDialect)

    try:
        fieldnames = next(reader)
    except StopIteration:
        return

The bug arises because the Python csv module's reader enforces a default field size limit. When a field in the incoming records exceeds that limit, processing large amounts of data fails with:

Error: field larger than field limit (131072)
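
For context, 131072 characters is the csv module's default field size limit, not a Splunk setting. A minimal sketch, independent of the SDK and using only the Python standard library with made-up data, that triggers the same error:

import csv
import io

# Build one CSV field that is just over the current limit (131072 characters by default).
big_field = "x" * (csv.field_size_limit() + 1)
buffer = io.StringIO("name,value\r\nexample," + big_field + "\r\n")

try:
    for row in csv.reader(buffer):
        pass
except csv.Error as error:
    print(error)  # field larger than field limit (131072)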

A Google search led me to a solution on Stack Overflow (https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072). Adding the following line resolved the issue:

csv.field_size_limit(sys.maxsize)
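
Note that this line also requires import csv and import sys. One caveat with this approach: on some platforms (for example 64-bit Windows, where a C long is 32 bits), csv.field_size_limit(sys.maxsize) can raise OverflowError because the limit must fit in a C long. A commonly used variant from the same Stack Overflow thread halves the value until the csv module accepts it; a hedged sketch (the helper name below is made up, not part of the SDK):

import csv
import sys

def raise_csv_field_size_limit():
    # Start from sys.maxsize and halve until csv.field_size_limit() accepts the
    # value, since it rejects values that do not fit in a C long.
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 2

raise_csv_field_size_limit()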

Splunk:

  • Version: 9.1.2
  • OS: Redhat 8.5.0-20
  • Deployment: Cluster

SDK:

  • Version: 1.7.4
  • Language Runtime Version: Python 3.7
  • OS: Redhat 8.5.0-20

Additional context
Are there any risks or issues associated with my approach?

@ashah-splunk
Contributor

@Catsofsuffering Can you please provide the steps to reproduce this issue?

@Catsofsuffering
Author


Due to our company's data security policy, I am unable to directly provide screenshots or logs to you. However, I can briefly describe the background and cause of this bug:

As mentioned earlier, I developed an app that matches IOC threat intelligence. It sends a large amount of URL data to the corresponding API and then imports the returned results into Splunk. During this process, the error Error: field larger than field limit (131072) occurred because the returned data (close to 1 billion events) exceeded csv.reader's default field size limit.

Although only about 40,000 records remain after filtering on average, the issue still occurs. I'm not sure whether this is a bug or an area that needs optimization, so I would like to consult your team. Thank you.
