encoding issue UnicodeDecodeError #86

Open
eredi93 opened this issue Jun 11, 2020 · 2 comments

eredi93 commented Jun 11, 2020

Something within the logging pipeline is breaking encoding, but only for some characters.
I'm having a hard time reproducing this issue and I cannot pinpoint what is causing it, but it seems to happen at the fluentd level, in either the logger or fluentd itself.

I deployed fluentd in production and am sending events from the Rails app using this logger. The logger is configured to send events to fluentd, which writes them to S3 as gzip files.
I then have a processing pipeline that consumes these files, and this is where I started seeing the issues.

client config

client = Fluent::Logger::FluentLogger.new(
  nil,
  host: "localhost",
  port: 24224,
  use_nonblock: true,
  wait_writeable: false
)
client.post("foo", event)

fluentd config

<match foo.**>
  @type s3
  @id   S3_output

  s3_bucket my-bucket
  s3_region us-east-1

  acl bucket-owner-full-control
  store_as gzip_command

  path preprocessed_logs/year=%Y/month=%-m/day=%-d/hour=%-H
  s3_object_key_format "%{path}/#{Socket.gethostname}_%{hex_random}_%{index}.%{file_extension}"

  <buffer time>
    timekey 300
    timekey_use_utc true
    timekey_wait 30
    @type file
    path /var/log/td-agent/buffer/foo
  </buffer>

  <format>
    @type json
  </format>
</match>

It seems that some characters are badly encoded.
Here is an example with a user agent:

'Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Clube da Fluência'

was logged as:

'Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Clube da Flu\xeancia'

ê got changed to \xea, which breaks decoding.

Do you think this might have something to do with how the logger is sending data to fluentd?

To add more context, I'm using this logger in a Rails app and what I log is request information. I have checked the Rails side of things and the string passed to the logger is UTF-8 encoded.
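
For reference, the single byte 0xEA is how ISO-8859-1 (Latin-1) encodes ê, while UTF-8 encodes it as the two bytes 0xC3 0xAA, so the logged value looks like Latin-1 bytes being read as UTF-8. The Ruby sketch below shows one way to detect and repair such a value on the client before calling `client.post`. It is only an illustration: `to_valid_utf8` is a hypothetical helper and not part of fluent-logger-ruby, and the Latin-1 fallback is an assumption about where the bad bytes come from.

```ruby
# Hypothetical helper (not part of fluent-logger-ruby).
# Returns a copy of +value+ that is valid UTF-8.
def to_valid_utf8(value, fallback: Encoding::ISO_8859_1)
  # Work on a copy so the caller's string (possibly frozen) is left alone.
  str = value.to_s.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?

  # Assumption: bytes that are not valid UTF-8 were produced as Latin-1.
  # Relabel them as Latin-1, then transcode to real UTF-8.
  str.force_encoding(fallback).encode(Encoding::UTF_8)
end

good   = "Clube da Flu\u00EAncia"   # \u00EA is "e with circumflex" in UTF-8
broken = "Clube da Flu\xEAncia".b   # same text, but with the raw Latin-1 byte

p "\u00EA".bytes                                # => [195, 170]
p "\u00EA".encode(Encoding::ISO_8859_1).bytes   # => [234]

p broken.dup.force_encoding(Encoding::UTF_8).valid_encoding?   # => false
p to_valid_utf8(broken) == good                                # => true
```

Running every string field of the event through a check like `valid_encoding?` just before `client.post("foo", event)` would also show whether the bad byte already exists on the Rails side or only appears further down the pipeline.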

@repeatedly
Member

Fluentd treats data as binary by default.
If you hit an encoding problem, one way to fix it is to convert the encoding using record_modifier or a similar plugin.

https://docs.fluentd.org/quickstart/faq#i-got-encoding-error-inside-plugin-how-to-fix-it
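
A minimal sketch of what the record_modifier suggestion could look like for the `foo.**` tag used in this issue. It assumes the fluent-plugin-record-modifier plugin is installed and that the filter is placed before the S3 `<match>` block; it has not been verified against this setup.

```
<filter foo.**>
  @type record_modifier

  # Label string values as UTF-8. This only sets the encoding
  # information; the bytes themselves are not changed.
  char_encoding utf-8

  # Alternative, assuming the offending values really are Latin-1:
  # transcode from ISO-8859-1 to UTF-8 (format is from:to).
  # char_encoding iso-8859-1:utf-8
</filter>
```

The `from:to` form converts every string in the record, so it would corrupt values that are already valid UTF-8. It only fits if all string fields on that tag share the source encoding.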


eredi93 commented Jul 7, 2020

@repeatedly thanks, let me try this next week.
