Support Specifying Encoding When Opening a File #482

Open
Light-Towers opened this issue Nov 30, 2023 · 1 comment
Labels
enhancement New feature or request


@Light-Towers

Describe the bug

  • All of my SQL files are encoded in UTF-8. Could sqllineage support reading UTF-8 or other encodings? Thanks.

SQL

insert into analyze select * from foo;

To Reproduce

sqllineage -f D:\code\warehouse\hive\hsql\dim\test.sql
Traceback (most recent call last):
  File "D:\Program Files\miniconda3\envs\py310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Program Files\miniconda3\envs\py310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Program Files\miniconda3\envs\py310\Scripts\sqllineage.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "D:\Program Files\miniconda3\envs\py310\lib\site-packages\sqllineage\cli.py", line 93, in main
    sql = extract_sql_from_args(args)
  File "D:\Program Files\miniconda3\envs\py310\lib\site-packages\sqllineage\utils\helpers.py", line 17, in extract_sql_from_args
    sql = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8e in position 5: illegal multibyte sequence
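The traceback can be reproduced without sqllineage. On a machine whose locale encoding is gbk, open() called without an encoding argument decodes the file's UTF-8 bytes with the gbk codec. The sketch below performs the same decode in memory. The Chinese comment in SQL_TEXT is a hypothetical stand-in, because the issue shows only the INSERT statement and not the bytes that actually triggered the error.

```python
from typing import Optional

# Hypothetical file content: the issue only shows the INSERT statement, so
# this Chinese comment is an assumed stand-in for the non-ASCII bytes that
# actually triggered the error.
SQL_TEXT = "-- 维度表\ninsert into analyze select * from foo;\n"

# The bytes on disk when the file is saved as UTF-8.
RAW = SQL_TEXT.encode("utf-8")


def decode_as(data: bytes, encoding: str) -> Optional[str]:
    """Strictly decode data; return None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding}: {exc}")
        return None


if __name__ == "__main__":
    # Same codec as the writer: exact round trip.
    print(decode_as(RAW, "utf-8") == SQL_TEXT)
    # gbk (the default open() used on the reporter's machine) either raises
    # UnicodeDecodeError or produces mojibake; it never recovers the text.
    print(decode_as(RAW, "gbk") == SQL_TEXT)
```

Whether gbk fails outright or silently returns garbled characters depends on which byte pairs it meets. In either case, only the codec the file was written with gives back the original SQL.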

Python version (available via python --version)

  • 3.10.11

SQLLineage version (available via sqllineage --version):

  • 1.4.8
Light-Towers added the bug (Something isn't working) label Nov 30, 2023
@reata
Owner

reata commented Dec 3, 2023

The default encoding used by Python's built-in open() function is platform dependent. You can check it with the following command:

python -c "import locale; print(locale.getpreferredencoding(False))"

On your machine it should return 'gbk'. Reading a UTF-8 encoded file with the gbk codec is what caused the exception.

Right now we don't provide an option that lets users choose the encoding. As a short-term solution, you can enable Python UTF-8 Mode by setting the environment variable PYTHONUTF8=1.
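To confirm that the workaround takes effect, the sketch below starts a fresh interpreter once without PYTHONUTF8 and once with PYTHONUTF8=1. Each time it reports sys.flags.utf8_mode and the encoding open() would fall back to. The child here is plain Python, so the check does not need sqllineage to be installed. The probe and helper names are illustrative only.

```python
import os
import subprocess
import sys
from typing import List, Optional

# Child-side probe: is UTF-8 Mode on, and which encoding would open()
# use when no encoding= argument is passed?
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode); "
    "print(locale.getpreferredencoding(False))"
)


def probe(pythonutf8: Optional[str]) -> List[str]:
    """Run PROBE in a new interpreter with PYTHONUTF8 set to the given value, or unset."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)
    if pythonutf8 is not None:
        env["PYTHONUTF8"] = pythonutf8
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Two whitespace-separated tokens: the utf8_mode flag and the encoding name.
    return result.stdout.split()


if __name__ == "__main__":
    print("without PYTHONUTF8:", probe(None))
    print("with PYTHONUTF8=1: ", probe("1"))
```

The unset case depends on the platform locale, and some POSIX "C" locales enable UTF-8 Mode automatically. With PYTHONUTF8=1, the child should always report UTF-8 Mode on and a UTF-8 preferred encoding. The same variable set in the shell before running sqllineage applies to that process as well.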

reata changed the title from Support "UTF-8" or other to Support Specifying Encoding When Opening a File Dec 3, 2023
reata added the enhancement (New feature or request) label and removed the bug (Something isn't working) label Dec 3, 2023
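The retitled enhancement could take roughly this shape: a command-line option whose value is passed straight through to open(). This is a hypothetical sketch, not sqllineage's code. Version 1.4.8 has no --encoding flag, and build_parser, read_sql_file and main are illustrative stand-ins for the project's cli.py and extract_sql_from_args. Only -f and the proposed --encoding are modelled.

```python
import argparse
import tempfile
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: only -f and the proposed --encoding option."""
    parser = argparse.ArgumentParser(prog="sqllineage")
    parser.add_argument("-f", dest="f", help="SQL file to parse")
    parser.add_argument(
        "--encoding",
        default=None,
        help="encoding used to read the SQL file; omit to keep the platform default",
    )
    return parser


def read_sql_file(path: str, encoding: Optional[str] = None) -> str:
    # encoding=None reproduces today's behaviour: open() falls back to the
    # locale encoding (gbk on the reporter's machine).
    with open(path, encoding=encoding) as f:
        return f.read()


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return read_sql_file(args.f, encoding=args.encoding)


if __name__ == "__main__":
    # Usage: write a UTF-8 file, then read it back with the hypothetical option.
    with tempfile.TemporaryDirectory() as tmp:
        sql_path = Path(tmp) / "test.sql"
        sql_path.write_text(
            "-- 维度表\ninsert into analyze select * from foo;\n", encoding="utf-8"
        )
        sql = main(["-f", str(sql_path), "--encoding", "utf-8"])
        print(sql.splitlines()[-1])  # the ASCII INSERT statement
```

Defaulting --encoding to None keeps the current locale-dependent behaviour for anyone who relies on it. Passing --encoding utf-8 would then read the reporter's file correctly. Defaulting to UTF-8 instead is the other reasonable choice, at the cost of changing behaviour for files saved in the local code page.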