Exceptions when using cz from ISO8859-1 Terminal #956

Open
keenonkites opened this issue Jan 9, 2024 · 4 comments

Comments

@keenonkites

Description

When running 'cz info' from an ISO8859-1 encoded terminal, I get the following exception:

[xx@XXXXXX:~/tmp/test-git-repo (master +)] $ cz info
Traceback (most recent call last):
  File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 570, in main
    args.func(conf, arguments)()
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/info.py", line 13, in __call__
    out.write(self.cz.info())
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 13, in write
    print(value, *args)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 1068: ordinal not in range(256)

I ran into this problem on macOS (Ventura 13.6.3) and AlmaLinux 8.9.
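For context, the characters named in the tracebacks (U+2019 here, U+1F680 in the 'cz init' case) lie outside the 256 code points that Latin-1 can represent, which is why print() fails once stdout uses that codec. A minimal sketch of the check; `can_encode` is a hypothetical helper name, not part of commitizen:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The two characters from the tracebacks, checked against both encodings.
for char in ("\u2019", "\U0001f680"):
    for encoding in ("latin-1", "utf-8"):
        print(f"U+{ord(char):04X} in {encoding}: {can_encode(char, encoding)}")
```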

The same exception also occurs with 'cz init', near the end of the process:

[xx@xxxxxxxx:~/tmp/test-git-repo (master +)] $ cz init
Welcome to commitizen!

Answer the questions to configure your project.
For further configuration visit:

https://commitizen-tools.github.io/commitizen/config/

? Please choose a supported config file:  .cz.toml
? Please choose a cz (commit rule): (default: cz_conventional_commits) cz_conventional_commits
? Choose the source of the version: commitizen: Fetch and set version in commitizen config (default)
No Existing Tag. Set tag to v0.0.1
? Choose version scheme:  semver
? Please enter the correct version format: (default: "$version")
? Create changelog automatically on bump Yes
? Keep major version zero (0.x) during breaking changes Yes
? What types of pre-commit hook you want to install? (Leave blank if you don't want to install) done

You can bump the version running:

	cz bump

Traceback (most recent call last):
  File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 570, in main
    args.func(conf, arguments)()
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/init.py", line 150, in __call__
    out.success("Configuration complete \U0001f680")
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 28, in success
    line(message)
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 18, in line
    print(value, *args, **kwargs)
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001f680' in position 28: ordinal not in range(256)

This probably also happens with other non-UTF-8 terminals.

We have quite a few machines that still run with ISO8859-1 encoding because of the applications running on them.

Steps to reproduce

  1. Start new Terminal
  2. Set LANG to 'de_CH.ISO8859-1' with 'export LANG=de_CH.ISO8859-1'
  3. Run 'cz info'
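If the de_CH.ISO8859-1 locale is not installed on a test machine, the same failure mode can be approximated without cz. The sketch below is a stand-in under stated assumptions, not the actual cz code path: it forces Latin-1 stdio in a child interpreter through CPython's PYTHONIOENCODING environment variable and prints the offending character. `run_with_stdio_encoding` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_with_stdio_encoding(code: str, encoding: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter whose stdio uses `encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
    )


# The child prints the same character that 'cz info' trips over.
result = run_with_stdio_encoding(r"print('commit\u2019s')", "latin-1")
print("exit code:", result.returncode)
print(result.stderr.decode("latin-1", errors="replace"))
```

A non-zero exit code together with a UnicodeEncodeError in the captured stderr indicates the same class of failure as in the tracebacks above.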

Current behavior

Exception / Crash

Desired behavior

Optimum:
Proper output, with re-encoding, in terminal encodings other than UTF-8. (We have to use an ISO8859-1 terminal for development and therefore also use git and cz inside these terminals.)

Acceptable but not desirable:
Prevent cz from being used in a non-UTF-8 terminal: warn and exit at start instead of throwing an exception in the middle of the work.
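The "warn and exit" fallback could look roughly like the following sketch. It is only an illustration of the idea: `is_utf8` and `require_utf8_stdout` are hypothetical names, and nothing here reflects how commitizen is actually structured.

```python
import codecs
import sys


def is_utf8(encoding) -> bool:
    """Return True if the encoding name refers to UTF-8 (any alias)."""
    if not encoding:
        return False
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        # Unknown codec name: treat it as not UTF-8.
        return False


def require_utf8_stdout() -> None:
    """Exit with a readable message if stdout is not UTF-8 encoded."""
    encoding = getattr(sys.stdout, "encoding", None)
    if not is_utf8(encoding):
        sys.exit(
            f"Unsupported terminal encoding {encoding!r}; "
            "please switch to a UTF-8 locale."
        )
```

Calling such a check once at program start would turn the mid-run traceback into a single clear message.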

Screenshots

No response

Environment

cz version --report
Commitizen Version: 3.13.0
Python Version: 3.11.6 (main, Oct 3 2023, 17:06:54) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
Operating System: Linux

@Lee-W
Member

Lee-W commented May 20, 2024

Hi @keenonkites, thanks for filing this issue. I just tested with the following commands without encountering issues.

export LANG=de_CH.ISO8859-1
cz info

I also tried installing 3.13.0 and changing the terminal encoding. Could you please check whether it still happens? If so, could you please provide another way to reproduce it? Thanks!

@keenonkites
Author

I've just tested it on my Mac (where cz is installed via brew) as well as on an AlmaLinux VM (where cz is installed via pip) with the newest version, 3.26.0, and it still happens on both systems.

Facts worth mentioning:

  • both systems have ISO8859-1 as the system setting
  • it does throw an exception when I start a terminal with LANG=de_CH.ISO8859-1
  • it works if I start a terminal with LANG=de_CH.UTF-8
  • it does throw an exception when I start a UTF-8 terminal and run export LANG=de_CH.ISO8859-1 before issuing cz info

Below is the output for the last case (UTF-8 terminal, cz info, change LANG, cz info):

[xx@yyyyy:~] $ echo $LANG
de_CH.UTF-8
[xx@yyyyy:~] $ cz info
The commit contains the following structural elements, to communicate
intent to the consumers of your library:

fix: a commit of the type fix patches a bug in your codebase
(this correlates with PATCH in semantic versioning).

feat: a commit of the type feat introduces a new feature to the codebase
(this correlates with MINOR in semantic versioning).

BREAKING CHANGE: a commit that has the text BREAKING CHANGE: at the beginning of
its optional body or footer section introduces a breaking API change
(correlating with MAJOR in semantic versioning).
A BREAKING CHANGE can be part of commits of any type.

Others: commit types other than fix: and feat: are allowed,
like chore:, docs:, style:, refactor:, perf:, test:, and others.

We also recommend improvement for commits that improve a current
implementation without adding a new feature or fixing a bug.

Notice these types are not mandated by the conventional commits specification,
and have no implicit effect in semantic versioning (unless they include a BREAKING CHANGE).

A scope may be provided to a commit’s type, to provide additional contextual
information and is contained within parenthesis, e.g., feat(parser): add ability to parse arrays.

<type>[optional scope]: <description>

[optional body]

[optional footer]

[xx@yyyyy:~] $ export LANG=de_CH.ISO8859-1
[xx@yyyyy:~] $ cz info
Traceback (most recent call last):
  File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 607, in main
    args.func(conf, arguments)()
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/info.py", line 13, in __call__
    out.write(self.cz.info())
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 13, in write
    print(value, *args)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 1068: ordinal not in range(256)
[xx@yyyyy:~] $

I can't think of another way to reproduce it, since for me it's more a matter of avoiding the problem. But one question from my end:
Do you have de_CH.ISO8859-1 installed on your test system? If the locale is not installed, you probably fall back automatically to UTF-8 anyway.
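One way to answer that question from Python is to ask the C library whether it accepts the locale name. This is a minimal sketch; `locale_is_installed` is a hypothetical helper, and it only tests the LC_CTYPE category.

```python
import locale


def locale_is_installed(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, previous)
    return True


print(locale_is_installed("de_CH.ISO8859-1"))
```

If this prints False on the test system, the export LANG=... step has no real effect there, which would explain why the crash does not reproduce.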

@keenonkites
Author

For 'cz info', the crash is caused by the typographic apostrophe (U+2019) in "commit’s" on line 24 of the file commitizen/cz/conventional_commits/conventional_commits_info.txt.

Removing it from the source prevents 'cz info' from crashing in non-UTF-8 terminals. But other sections contain further non-ASCII characters that crash cz in other commands (cz init, for example, as mentioned in the original post).
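To locate such characters systematically, a text can be scanned for anything the target encoding cannot represent. The sketch below assumes Latin-1 as the target; `find_unencodable` is a hypothetical helper and the sample string is made up for illustration.

```python
def find_unencodable(text: str, encoding: str = "latin-1"):
    """Yield (line_number, column, character) for characters `encoding` cannot represent."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            try:
                char.encode(encoding)
            except UnicodeEncodeError:
                yield line_number, column, char


# Illustrative sample: the accented e fits in Latin-1, the apostrophe does not.
sample = "caf\u00e9 and commit\u2019s\nplain ASCII line\n"
for line_number, column, char in find_unencodable(sample):
    print(f"line {line_number}, column {column}: U+{ord(char):04X}")
```

Running the same function over the contents of a message file or a module's string literals would list every spot that can trigger the crash.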

To make the application safe for non-UTF-8 terminals, I think the output functions in commitizen/out.py have to be written in a way that allows masking or rewriting characters that cannot be represented in the current encoding, or that at least prevents crashes and prints proper error messages.
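A rough sketch of that masking idea, assuming the output functions are thin wrappers around print() as the tracebacks suggest. `to_encodable` and `safe_write` are hypothetical names and not the actual commitizen API:

```python
import sys


def to_encodable(value: str, encoding: str, errors: str = "replace") -> str:
    """Round-trip value through encoding so unencodable characters are substituted."""
    return value.encode(encoding, errors=errors).decode(encoding)


def safe_write(value: str, stream=None) -> None:
    """Print value without raising UnicodeEncodeError on a non-UTF-8 stream."""
    stream = stream if stream is not None else sys.stdout
    # Fall back to UTF-8 when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    print(to_encodable(value, encoding), file=stream)
```

With the default "replace" handler, unencodable characters come out as "?"; passing "backslashreplace" keeps them visible as escape sequences instead. On Python 3.7 and later, calling sys.stdout.reconfigure(errors="replace") once at startup may achieve a similar effect without touching each output function.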

@Lee-W
Member

Lee-W commented May 21, 2024

Thanks for updating! Will take a deeper look after I come back.

@Lee-W Lee-W self-assigned this May 21, 2024