Fix UnicodeDecodeError #265

Open
wants to merge 2 commits into
base: master

@tmsanrinsha tmsanrinsha commented Mar 11, 2018

The original code does not check whether scriptencoding appears inside a comment,
so a UnicodeDecodeError occurs for a line such as:

" scriptencoding とは
Traceback (most recent call last):
  File "/Users/tmsanrinsha/python/bin/vint", line 11, in <module>
    sys.exit(main())
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/__init__.py", line 11, in main
    init_cli()
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/bootstrap.py", line 22, in init_cli
    cli.start()
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/linting/cli.py", line 27, in start
    violations = self._lint_all(env, config_dict)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/linting/cli.py", line 120, in _lint_all
    violations += linter.lint_file(file_path)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/linting/linter.py", line 107, in lint_file
    root_ast = self._parser.parse_file(path)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/ast/parsing.py", line 37, in parse_file
    decoded = decoder.read(file_path)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/encodings/decoder.py", line 30, in read
    string = self.strategy.decode(hunk, debug_hint=debug_hint_for_the_loc)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/encodings/decoding_strategy.py", line 45, in decode
    string_candidate = strategy.decode(bytes_seq, debug_hint)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/encodings/decoding_strategy.py", line 77, in decode
    return bytes_seq.decode(encoding=encoding_part.decode(encoding='ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
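
The failing call in the last frame can be reproduced in isolation. The sketch below is a minimal stand-in, not vint's actual code: the way encoding_part is sliced out of the line is an assumption made for illustration. Only the final decode call mirrors the traceback.

```python
# Bytes of the commented-out line, as stored in a UTF-8 encoded Vim script.
line = '" scriptencoding とは\n'.encode('utf-8')

# Assumption: everything after the keyword is treated as the encoding name.
# This slicing is a hypothetical stand-in for vint's extraction step.
encoding_part = line.split(b'scriptencoding', 1)[1].strip()

try:
    # Same call shape as decoding_strategy.py line 77 in the traceback.
    encoding_part.decode(encoding='ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The first byte after the keyword is 0xe3, the lead byte of the UTF-8 encoded comment text, which is why the error is reported at position 0.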

This PR fixes the problem.

Sample script demonstrating the fixed splitting:

#!/usr/bin/env python
import re

def _split_by_scriptencoding(bytes_seq):
    # type: (bytes) -> [(str, bytes)]
    max_end_index = len(bytes_seq)
    start_index = 0
    bytes_seq_and_loc_list = []

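    # Raw bytes literal so that \s is not treated as an invalid string escape.
    # Only whitespace may precede the keyword, so commented lines do not match.
    for m in re.finditer(rb'^\s*(scriptencoding)', bytes_seq, re.MULTILINE):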
        end_index = m.start(1)

        if end_index == 0:
            continue

        bytes_seq_and_loc_list.append((
            "{start_index}:{end_index}".format(start_index=start_index, end_index=end_index),
            bytes_seq[start_index:end_index]
        ))
        start_index = end_index

    bytes_seq_and_loc_list.append((
        "{start_index}:{end_index}".format(start_index=start_index, end_index=max_end_index),
        bytes_seq[start_index:max_end_index]
    ))

    return bytes_seq_and_loc_list


# Named sample_source to avoid shadowing the built-in str.
sample_source = '''scriptencoding utf-8
" scriptencoding あ
echo 'scriptencoding い'
 scriptencoding utf-8
'''

print(_split_by_scriptencoding(sample_source.encode()))

Output:

[('0:69', b'scriptencoding utf-8\n" scriptencoding \xe3\x81\x82\necho \'scriptencoding \xe3\x81\x84\'\n '), ('69:90', b'scriptencoding utf-8\n')]

@Kuniwak
Member

Kuniwak commented Jun 18, 2018

Sorry for the very late reply.

We should also support the following unusual case if we can:

:::::
    \scriptencoding utf8

How do you feel about it?
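
One possible way to cover this case is sketched below. It is only an illustration under stated assumptions: EXTENDED_PATTERN is a hypothetical pattern, not part of this PR or of vint, and it assumes that leading colons and line-continuation backslashes are the only extra prefixes that need to be tolerated.

```python
import re

# Pattern used in this PR: only whitespace may precede the keyword.
PR_PATTERN = re.compile(rb'^\s*(scriptencoding)', re.MULTILINE)

# Hypothetical extension (an assumption, not part of this PR): also allow
# leading colons and line-continuation backslashes before the keyword.
EXTENDED_PATTERN = re.compile(
    rb'^[ \t:]*(?:\r?\n[ \t]*\\[ \t]*)*(scriptencoding)',
    re.MULTILINE,
)

abnormal = b':::::\n    \\scriptencoding utf8\n'
commented = '" scriptencoding とは\n'.encode('utf-8')

# The PR pattern does not find the keyword behind the continuation backslash.
print(PR_PATTERN.search(abnormal))

# The extended pattern finds it, and still ignores the commented line.
print(EXTENDED_PATTERN.search(abnormal))
print(EXTENDED_PATTERN.search(commented))
```

Whether a regex is the right tool here, or whether line continuations should be joined before the search, is a design question this sketch does not settle.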

@blueyed
Member

blueyed commented Nov 29, 2018

@tmsanrinsha
Please reply to the last comment from @Kuniwak / provide an update.

@blueyed
Member

blueyed commented Nov 29, 2018

Also a test would be needed.

@blueyed blueyed added the bug label Nov 29, 2018
@blueyed
Member

blueyed commented Apr 11, 2019

@tmsanrinsha
Ping.
I'd like to do a new release soonish, and it would be great to have this included.


3 participants