[bug] Encoding.LATIN1 returns wrong text with polish letters #558

Open
superpawko opened this issue Apr 28, 2022 · 3 comments

@superpawko

Maybe I'm wrong, I'm not very good at coding. Could you help me with this?
After running:
tags = ID3(mp3, v2_version=3)
print(tags.getall("TIT2"))
I got this:
[TIT2(encoding=<Encoding.LATIN1: 0>, text=['Uciekaj¹ca ska³a'])]

In the Mp3tag program everything looks fine (I see the Polish characters: Uciekająca skała).
The tag is ID3v2.3 (ID3v1 ID3v2.3).

'TPE1': TPE1(encoding=<Encoding.LATIN1: 0>, text=['Roman Felczyñski']) is also broken. I have many files like this and no clue how to fix them. Thank you for your help.

Full tags object:
{'TIT2': TIT2(encoding=<Encoding.LATIN1: 0>, text=['Uciekaj¹ca ska³a']), 'PRIV:WM/MediaClassPrimaryID:¼}Ñ#ãâK\x86¡H¤*(D\x1e': PRIV(owner='WM/MediaClassPrimaryID', data=b'\xbc}\xd1#\xe3\xe2K\x86\xa1H\xa4*(D\x1e'), 'PRIV:WM/MediaClassSecondaryID:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00': PRIV(owner='WM/MediaClassSecondaryID', data=b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'), 'TCON': TCON(encoding=<Encoding.LATIN1: 0>, text=['Przygodowy']), 'POPM:Windows Media Player 9 Series': POPM(email='Windows Media Player 9 Series', rating=255), 'TPE1': TPE1(encoding=<Encoding.LATIN1: 0>, text=['Roman Felczyñski'])}

@phw
Collaborator

phw commented Apr 28, 2022

"ą" and "ł" actually cannot be encoded in latin-1 / ISO-8859-1, see https://en.wikipedia.org/wiki/ISO/IEC_8859-1 . I don't know what encoding MP3Tag is using there, I could not reproduce the exact outcome. Logical choices with regards to Polish letters would be ISO-8859-2 or on Windows maybe Windows-1250. But these would give:

>>> s = "Uciekająca skała"
>>> s.encode('iso-8859-2').decode('latin-1')
'Uciekaj±ca ska³a'
>>> s.encode('windows-1250').decode('latin-1')
'Uciekaj¹ca ska³a'

So a bit different result from yours. But either way, neither of them is latin-1.
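The point about latin-1 can be checked directly. The following is a minimal sketch using only Python's built-in codecs; the helper name `can_encode` is made up for illustration. It shows that "ą" and "ł" have no latin-1 representation, and which bytes the two candidate code pages assign to them.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


letters = "ął"
print(can_encode(letters, "latin-1"))  # False

for codec in ("iso-8859-2", "windows-1250"):
    raw = letters.encode(codec)
    # Reading those bytes back as latin-1 is what produces the mojibake.
    print(codec, raw, raw.decode("latin-1"))
```

The two code pages agree on the byte for "ł" (0xB3) but differ on "ą" (0xB1 versus 0xB9), which is why the two outputs above differ in exactly one character.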

Is there any specific reason you can't use a Unicode encoding for the files?

@superpawko
Author

superpawko commented Apr 28, 2022

I tried decoding and encoding it myself before posting, and I got some errors. I don't know how to load it correctly or fix it. I have a database of a few TB and a lot of the files have this problem. Mp3tag reads them correctly. I thought maybe something goes wrong during tags = ID3(mp3, v2_version=3). Or can I fix it somehow afterwards?

The Windows details view for the mp3 also shows the proper title and album name with Polish letters.

edit: is this the same problem : #354 ?

I found that this is not Latin-1 but Windows-1250.
I'm able to fix it with this code:
utitle = tags["TIT2"][0].encode('utf-8').decode('windows-1250').replace(u"Â", "")
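A more direct variant of this repair, offered as a sketch rather than a confirmed fix: since the text was read as latin-1, encoding it back as latin-1 recovers the original bytes exactly, and those can then be decoded as Windows-1250. Going through UTF-8 instead turns each non-ASCII character into two bytes, which is where the stray "Â" comes from. The helper name `redecode` and its `actual_codec` parameter are made up for illustration, and Windows-1250 is an assumption that will not hold for every file.

```python
def redecode(text, actual_codec="windows-1250"):
    """Undo a latin-1 misread of bytes that were really in actual_codec.

    Assumption: text came from decoding the raw tag bytes as latin-1,
    so encoding it as latin-1 gives those bytes back unchanged.
    """
    return text.encode("latin-1").decode(actual_codec)


print(redecode("Uciekaj¹ca ska³a"))  # Uciekająca skała
print(redecode("Roman Felczyñski"))  # Roman Felczyński
```

Either step can raise a `UnicodeError` subclass when the assumption does not hold (characters outside latin-1, or bytes that are undefined in the target code page), which is a useful signal to leave that tag alone.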

But I have no clue how to detect it for the rest of the files, because it only affects ID3v1 files with latin-1 encoding. How can I check whether TIT2 is encoded as Latin-1?

edit2: maybe this code:
str(tags.getall("TIT2")).find("encoding=<Encoding.LATIN1")

But I still think Mutagen could do this better, mp3tag does.

@lazka
Member

lazka commented May 3, 2022

> edit: is this the same problem : #354 ?

It's not. ID3v2 stores the encoding in the file, and that stored value is likely wrong in your case.

> edit2: maybe this code:
> str(tags.getall("TIT2")).find("encoding=<Encoding.LATIN1")

tags["TIT2"].encoding == id3.Encoding.LATIN1 should work

> But I still think Mutagen could do this better, mp3tag does.

mutagen currently doesn't second-guess encodings.

We could add something to the docs for starters with some examples though.
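As a possible starting point for such an example, here is a minimal sketch of the encoding check applied to a whole set of frames. It picks out frames that are flagged latin-1 and also contain non-ASCII text, since plain ASCII values such as 'Przygodowy' need no repair. To stay self-contained it does not import mutagen: `StubTextFrame`, `suspicious_frames` and the `LATIN1` constant are hypothetical names, the constant mirrors the `<Encoding.LATIN1: 0>` repr shown in the report, and the sketch assumes real frames compare equal to that integer and expose `encoding` and `text` the way the printed frames do.

```python
from dataclasses import dataclass, field

LATIN1 = 0  # assumption: matches the repr above, <Encoding.LATIN1: 0>


@dataclass
class StubTextFrame:
    """Hypothetical stand-in for a text frame (not mutagen's own class)."""
    encoding: int
    text: list = field(default_factory=list)


def suspicious_frames(frames):
    """Return keys of frames flagged latin-1 that contain non-ASCII text."""
    flagged = []
    for key, frame in frames.items():
        # Frames such as PRIV or POPM carry no text encoding; skip them.
        if getattr(frame, "encoding", None) != LATIN1:
            continue
        texts = getattr(frame, "text", [])
        if any(ord(ch) > 127 for item in texts for ch in str(item)):
            flagged.append(key)
    return flagged


frames = {
    "TIT2": StubTextFrame(LATIN1, ["Uciekaj¹ca ska³a"]),
    "TCON": StubTextFrame(LATIN1, ["Przygodowy"]),
    "TPE1": StubTextFrame(LATIN1, ["Roman Felczyñski"]),
    "POPM": object(),  # stand-in for a frame without an encoding attribute
}
print(suspicious_frames(frames))  # ['TIT2', 'TPE1']
```

A flagged frame is only a candidate: genuine latin-1 text such as French or German names would be flagged too, so the check narrows down which files to inspect rather than deciding the correct encoding.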
