JSONResultsReader exception when record contains invalid UTF-8 characters #540

Open
ericatdropzone opened this issue Aug 25, 2023 · 1 comment

ericatdropzone commented Aug 25, 2023

I'm using the botsv3 dataset and running code similar to this:

from splunklib.client import connect
from splunklib.results import JSONResultsReader
from time import sleep

spl_query = "search index=botsv3 sourcetype=stream:udp earliest=0"
connection = connect(host=host, port=port, username=user, password=password, autologin=True)
job = connection.jobs.create(spl_query)

# Wait for the job to complete
sleep(5)

reader = JSONResultsReader(job.results(output_mode="json", earliest_time=earliest_time, count=max_results))
for result in reader: # This throws an exception
    ...

I'm seeing this exception:

Traceback (most recent call last):
  File "/app/splunk_scanner/splunk_connection.py", line 54, in query
    for result in reader:
  File "/usr/local/lib/python3.11/site-packages/splunklib/results.py", line 352, in next
    return next(self._gen)
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/splunklib/results.py", line 361, in _parse_results
    parsed_line = json_loads(strip_line)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 89348: invalid start byte

This looks similar to this issue, but running version 1.7.4 didn't fix this instance of the problem. I also noticed that this pull request appears to fix the issue, but I'm not sure if that's the approach you'd want to take.
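For reference, the failure in the traceback can be reproduced with the standard library alone, since `json.loads` decodes byte input strictly. The following is a minimal sketch, not the SDK's fix: `loads_lenient` is a hypothetical helper, and the sample payload is made up to contain the same `0x8e` byte. It decodes with `errors="replace"` first, so invalid bytes become U+FFFD instead of raising.

```python
import json


def loads_lenient(raw: bytes):
    """Parse JSON bytes, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration; not part of splunklib.
    """
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# Made-up record containing 0x8e, which is not a valid UTF-8 start byte.
raw = b'{"_raw": "payload \x8e end"}'

# Strict path: json.loads decodes bytes itself and raises on the bad byte.
try:
    json.loads(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient path: the bad byte is replaced, and the record still parses.
record = loads_lenient(raw)
print(strict_failed, repr(record["_raw"]))
```

The trade-off is that the original bytes are lost wherever a replacement character is substituted, which may or may not be acceptable for binary payloads such as `stream:udp` events.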

Splunk (please complete the following information):

  • splunk version: 9.1.0.1
  • OS: MacOS 13.5
  • Deployment: single, local instance

SDK (please complete the following information):

  • Version: 1.7.4
  • Language Runtime Version: Python 3.11.4
  • OS: Linux (in a docker container) 5.15.49
ericatdropzone changed the title JSONResultsReader when record contains UTF-8 characters JSONResultsReader when record contains invalid UTF-8 characters Aug 26, 2023
ericatdropzone changed the title JSONResultsReader when record contains invalid UTF-8 characters JSONResultsReader exception when record contains invalid UTF-8 characters Aug 26, 2023

kleptog commented Sep 7, 2023

Well, this is disappointing. We've had to work around Splunk returning incorrectly encoded XML in the case of binary data, and I was hoping that in the switch to JSON they would have fixed that. So now we can look ahead to working around this in JSON as well (yay!).