UTF-8 characters from marketo are improperly decoded #74

Open
abrittis opened this issue Jul 29, 2021 · 0 comments

Steps to reproduce

  • Extract records from Marketo that contain non-ASCII (multi-byte UTF-8) characters
  • View results in an editor that supports UTF-8

Expected Results

  • UTF-8 characters are displayed properly

Actual Results

  • UTF-8 characters are corrupted

Proposed Solution

  • IMO this is actually a bug with the Marketo API call https://.mktorest.com/bulk/v1//export/<job_id>/file.json
  • The above call returns a response with the encoding set to ISO-8859-1, even if the response contains UTF-8 characters.
  • This causes Python's requests.models.Response.iter_content(decode_unicode=True) to use the wrong decoder. (Because the response says the encoding is ISO-8859-1, requests decodes the UTF-8 bytes as ISO-8859-1.)
  • My proposed fix is to set the response encoding to 'utf-8' before calling iter_content. The change would go in tap-marketo.sync.py, right after the call "resp = client.stream_export(stream_type, export_id)".
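
The mismatch described above can be reproduced without Marketo. The following is a minimal sketch using only the standard library. `decode_chunks` is a hypothetical helper, not part of tap-marketo or requests; it stands in for a streaming client that decodes byte chunks with whatever encoding the response declares. The sample payload is made up.

```python
import codecs


def decode_chunks(chunks, encoding):
    """Incrementally decode byte chunks with the given encoding (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush anything still buffered inside the decoder.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Made-up sample row, split in the middle of the two-byte "ë" to mimic
# an arbitrary chunk boundary in a streamed download.
payload = "Zoë,München\r\n".encode("utf-8")
chunks = [payload[:3], payload[3:]]

# Same bytes, decoded with the declared (wrong) and the actual encoding.
as_latin1 = "".join(decode_chunks(chunks, "iso-8859-1"))
as_utf8 = "".join(decode_chunks(chunks, "utf-8"))

print(repr(as_latin1))
print(repr(as_utf8))
```

The ISO-8859-1 pass turns each two-byte UTF-8 sequence into two unrelated characters, while the UTF-8 pass returns the original text even though a chunk boundary falls inside a character.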

tap-marketo.sync.py

def stream_rows(client, stream_type, export_id):
    with tempfile.NamedTemporaryFile(mode="w+", encoding="utf8", delete=False) as csv_file:
        singer.log_info("Download starting.")
        resp = client.stream_export(stream_type, export_id)
        # Force response encoding to 'utf-8' since Marketo doesn't set this properly
        resp.encoding = 'utf-8'
        for chunk in resp.iter_content(chunk_size=CHUNK_SIZE_BYTES, decode_unicode=True):
            if chunk:
                # Strip carriage returns before writing
                chunk = chunk.replace('\r', '')
                csv_file.write(chunk)

@abrittis changed the title from "UTF-8 characters from marketo characters are not handled properly" to "UTF-8 characters from marketo are improperly decoded" on Jul 29, 2021
abrittis pushed a commit to abrittis/tap-marketo that referenced this issue Jul 29, 2021