Use latin1 for text encoding? #10

Open
martonmiklos opened this issue Feb 7, 2019 · 2 comments

@martonmiklos

martonmiklos commented Feb 7, 2019

Hi folks!

First of all, thanks for all the effort put into this project!

I have some schematics whose texts contain accented characters, and rendering them raises an exception:

Traceback (most recent call last):
  File "altium.py", line 1615, in <module>
    main()
  File "altium.py", line 420, in main
    render(args.file, renderer.Renderer)
  File "altium.py", line 590, in __init__
    self.handle_children([objects])
  File "altium.py", line 627, in handle_children
    handler(self, owners, obj)
  File "altium.py", line 996, in handle_text_frame
    text=obj["TEXT"].decode("utf-8").replace("~1", "\n"),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 5: invalid start byte

The problematic text was the following:

b'1x5 t\xfcskesor~190\xb0, 1,27mm'
Which corresponds to:

1x5 tüskesor\n90°, 1,27mm

I will do some experiments to map all the accented and special characters, but I am under the impression that Altium uses Latin-1 character encoding rather than plain ASCII.
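As a quick check of that impression, the minimal sketch below decodes the exact bytes from the report as Latin-1 and applies the same "~1" replacement that line 996 of the traceback does. The helper name decode_text is hypothetical and is not part of altium.py.

```python
# The raw TEXT property value quoted in the report.
raw = b'1x5 t\xfcskesor~190\xb0, 1,27mm'


def decode_text(raw, encoding="latin-1"):
    """Decode a TEXT property and expand the "~1" line-break marker.

    Hypothetical helper: it mirrors the expression in the traceback,
    with the codec swapped from UTF-8 to Latin-1.
    """
    return raw.decode(encoding).replace("~1", "\n")


text = decode_text(raw)
print(text)

# The original UTF-8 decode fails on the same bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Latin-1 maps every byte value to a code point, so this decode cannot raise; whether the result is always the intended character is a separate question.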

@vadmium
Owner

vadmium commented Feb 8, 2019

I expect it uses something like Latin-1 or Windows-1252. I am happy to change line 996 to decode with Latin-1. However, I noted under https://github.com/vadmium/python-altium/blob/master/format.md#pin that I saw the byte 0x8E representing a broken bar (U+00A6, ¦), so the full story might not be so simple.
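To illustrate why the 0x8E observation complicates things, here is a small sketch comparing how Python's built-in latin-1 and cp1252 codecs decode the bytes mentioned in this thread. The function name compare is made up for the example.

```python
def compare(byte):
    """Return (Latin-1 result, Windows-1252 result) for a single byte."""
    return byte.decode("latin-1"), byte.decode("cp1252")


BROKEN_BAR = "\u00a6"  # the character reported for byte 0x8E

for byte in (b"\xfc", b"\xb0", b"\x8e"):
    latin1, cp1252 = compare(byte)
    print(f"{byte!r}: latin-1={latin1!r} cp1252={cp1252!r}")

# 0xFC and 0xB0 decode identically in both codecs, so the sample text
# from the report cannot tell them apart. 0x8E differs between the two,
# and neither result is the broken bar.
latin1_8e, cp1252_8e = compare(b"\x8e")
```

Under these two codecs, 0x8E becomes a C1 control character (U+008E) or "Ž" (U+017D) respectively, so the broken-bar case would need either a different code page or special handling.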

I have come across parallel UTF-8 properties: for instance, alongside a property named TEXT there can be one named %UTF8%TEXT. Do you know whether your text frame object has a UTF-8 version of the text?
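One way the parallel properties could be used is sketched below: prefer the %UTF8% variant when it exists and fall back to a legacy codec otherwise. This assumes the object behaves like a dict mapping property names to bytes, as obj["TEXT"] does in the traceback; the helper get_text and the Latin-1 fallback are assumptions, not existing altium.py behaviour.

```python
def get_text(obj, name="TEXT", fallback="latin-1"):
    """Return a property as str, preferring its %UTF8% twin if present.

    Hypothetical helper. `obj` is assumed to map property names to bytes.
    """
    utf8_value = obj.get("%UTF8%" + name)
    if utf8_value is not None:
        return utf8_value.decode("utf-8")
    # No UTF-8 twin: fall back to the assumed legacy encoding.
    return obj[name].decode(fallback)


# Object with only the legacy property, as in the reported file.
legacy_only = {"TEXT": b"t\xfcskesor"}
# Object carrying both variants (made-up example data).
with_utf8 = {"TEXT": b"t?skesor", "%UTF8%TEXT": "t\u00fcskesor".encode("utf-8")}

print(get_text(legacy_only))
print(get_text(with_utf8))
```

The "~1" line-break replacement is left out here and would still be applied to the returned string.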

@martonmiklos
Author

Hi @vadmium

I have not found any occurrence of the "UTF" string in the file.

I think I will create a text containing as many accented and special characters as possible, save it, and inspect the stored bytes to reach a more solid conclusion about the encoding.
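Such an experiment could be checked with something like the sketch below, which takes the stored bytes and the text that was typed, and reports which candidate codecs reproduce it. The function matching_codecs and the candidate list are illustrative choices; only the sample bytes come from this issue.

```python
def matching_codecs(raw, expected, candidates):
    """Return the codecs from `candidates` that decode `raw` to `expected`."""
    matches = []
    for name in candidates:
        try:
            decoded = raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot decode the bytes at all
        if decoded == expected:
            matches.append(name)
    return matches


# Stored bytes from the report and the text they should represent
# (the "~1" marker is left unexpanded for the comparison).
raw = b'1x5 t\xfcskesor~190\xb0, 1,27mm'
expected = "1x5 t\u00fcskesor~190\u00b0, 1,27mm"

candidates = ["utf-8", "latin-1", "cp1252", "cp1250", "cp437"]
result = matching_codecs(raw, expected, candidates)
print(result)
```

With only "ü" and "°" in the sample, more than one codec matches, which is why a test string covering many more special characters is needed to narrow it down.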
