id3 0.80 does not correctly handle non-English characters. #6

Open
WonderRat opened this issue Mar 4, 2016 · 3 comments

@WonderRat

Environment: Windows XP SP3.

The Russian text in the ID3v1 tag is encoded in CP1251, but id3 prints nonsense (I expected output in CP866, the Russian OEM codepage):

>id3 -q "%t\n%a\n%l\n%c" russian1.mp3
AAAAAA?CEEEEIIII
?NOOOOO?OUUUUY??
aaaaaa?ceeeeiiii
?nooooo?ouuuuy?y??

I suspect the problem is in charconv.cpp, in "template<> conv<>::data conv::decode(const char* s, size_t len)".
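
The ID3v1 output above is consistent with the CP1251 bytes being decoded as ISO-8859-1 and the resulting accented Latin letters then being reduced to their closest ASCII look-alikes. A minimal Python sketch of that hypothesis follows. `approx_best_fit` is a hypothetical helper, not part of id3, and it only approximates whatever lossy mapping the real conversion applies, so it is shown against the first output row only.

```python
import unicodedata

def approx_best_fit(text):
    """Hypothetical stand-in for a lossy 'closest ASCII letter' conversion."""
    out = []
    for ch in text:
        # NFKD splits an accented letter into base letter + combining marks;
        # keep only the base letter, and give up with '?' if it is not ASCII.
        base = unicodedata.normalize("NFKD", ch)[0]
        out.append(base if ord(base) < 128 else "?")
    return "".join(out)

# Bytes as a CP1251-writing tagger would store them in the ID3v1 title field.
raw = "АБВГДЕЖЗИЙКЛМНОП".encode("cp1251")

# What a reader sees when it assumes the field is ISO-8859-1.
misread = raw.decode("latin-1")

print(misread)
print(approx_best_fit(misread))
```

If this reading is right, the Cyrillic information is already lost at the decoding step, which is why no console codepage setting can recover it afterwards.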


Strings from ID3v2 (Russian text in Unicode) are printed in the wrong codepage:

E:\>id3 -q "%t\n%a\n%l\n%c" russian2.mp3
└┴┬├─┼╞╟╚╔╩╦╠═╬╧
╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀
рстуфхцчшщъыьэюя
ЁёЄєЇїЎў°∙·√№¤■ и╕

This is CP1251 output displayed as CP866.
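
The diagnosis can be checked outside the console. The sketch below uses Python's built-in `cp1251` and `cp866` codecs to reinterpret CP1251 bytes the way a CP866 console would; `shown_in_console` is a name made up for this illustration.

```python
def shown_in_console(text, produced="cp1251", console="cp866"):
    """Return what a console using `console` displays for bytes written in `produced`."""
    return text.encode(produced).decode(console)

# Uppercase Cyrillic lands on the box-drawing range of CP866.
print(shown_in_console("АБВГДЕЖЗИЙКЛМНОП"))

# Lowercase а..п lands on CP866's р..я, matching the third output line.
print(shown_in_console("абвгдежзийклмноп"))
```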

If I change the console codepage to 1251 and recode the output from CP1251 to CP866, the text is correct:

>chcp 1251
>id3 -q "%t\n%a\n%l\n%c" russian2.mp3 | iconv -f CP1251 -t CP866

http://i.imgur.com/6Fe7LO2.png

АБВГДЕЖЗИЙКЛМНОП
РСТУФХЦЧШЩЪЫЬЭЮЯ
абвгдежзийклмноп
рстуфхцчшщъыьэюяЁё
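
The `iconv -f CP1251 -t CP866` step amounts to decoding the bytes with one codepage and re-encoding them with the other. A rough Python equivalent, assuming the input really is CP1251 (`recode` is a hypothetical helper, not an id3 feature):

```python
def recode(data: bytes, src: str = "cp1251", dst: str = "cp866") -> bytes:
    """Re-encode bytes from one single-byte codepage to another."""
    # Characters that do not exist in the target codepage become '?'.
    return data.decode(src).encode(dst, errors="replace")

ansi_bytes = "АБВ Ёё".encode("cp1251")
oem_bytes = recode(ansi_bytes)

print(ansi_bytes.hex(" "))
print(oem_bytes.hex(" "))
print(oem_bytes.decode("cp866"))
```

The round trip is lossless for Russian letters because both codepages contain the full Russian alphabet, only at different byte positions.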

samples.zip

russian2_1251.txt, russian2_correct_866.txt - redirected output

@squell
Owner

squell commented Mar 4, 2016

  • Information contained in ID3v1 is treated by id3 as encoded in ISO-8859-1 (a subset of cp1252). If you really want I can give you a patch that changes that behaviour.
  • The current version of id3.exe will output everything in the ANSI codepage, not the OEM codepage.

Have you tried switching your terminal to a Truetype font (e.g. Lucida Console, or Consolas)? That should correct the output in the ID3v2 case.

@WonderRat
Author

Information contained in ID3v1 is treated by id3 as encoded in ISO-8859-1 (a subset of cp1252).

ISO-8859-1 and cp1252 have no Cyrillic letters, so Russian strings in ID3v1 are written in cp1251 (yes, these are old MP3s, but they still turn up). Why not treat ID3v1 as ANSI? English users will not suffer from that, since their ANSI code page will be 1252. WinAPI has the alias CP_ACP (0x0) for that; the real code page depends on the locale settings.
https://msdn.microsoft.com/en-us/library/dd374130%28v=vs.85%29.aspx
My players and tag editors treat them as ANSI.
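
The request boils down to making the codepage used for ID3v1 fields a parameter instead of a fixed ISO-8859-1. A minimal sketch of that idea in Python, assuming the usual fixed-width, NUL- or space-padded ID3v1 field layout; `decode_id3v1_field` is a hypothetical function and says nothing about how id3 itself is structured:

```python
def decode_id3v1_field(raw: bytes, encoding: str = "latin-1") -> str:
    """Decode one fixed-width ID3v1 text field with a caller-chosen codepage."""
    # The text ends at the first NUL byte; older taggers pad with spaces instead.
    text = raw.split(b"\x00", 1)[0].rstrip(b" ")
    return text.decode(encoding, errors="replace")

# A 30-byte title field as a CP1251-writing tagger would store it.
field = "Привет".encode("cp1251").ljust(30, b"\x00")

print(decode_id3v1_field(field))             # ISO-8859-1 default: mojibake
print(decode_id3v1_field(field, "cp1251"))   # codepage chosen by the user
```

On Windows, passing Python's `mbcs` codec as the encoding selects the system ANSI code page, which is the behaviour the CP_ACP alias provides in WinAPI.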

Have you tried switching your terminal to a Truetype font

It works, but I like my raster font (a modified 8x16, not the one in the screenshot).
I don't like Lucida Console or Consolas as a console font, and I don't need to display all Unicode symbols in the console.
I thought Windows console programs should use the OEM code page in the first place (because it is the default), like DOS programs.
Maybe a recoding option on the command line?

squell self-assigned this Mar 8, 2016

@squell
Owner

squell commented Mar 9, 2016

  1. I do recognize that a switch making id3 bug-compatible with other software's ID3v1 handling might be useful, so I'll consider adding it, but probably only for reading/converting tags.
  2. I think the default nowadays is to write a "Unicode" application, which I'm working on (as soon as I have time available again), but those again require a Truetype-font.

In the end, I want id3 to work on Windows as it does on Linux/BSD:

C:\> id3 file.mp3
File: file.mp3
Metadata: ID3v2.3
Title: Something from Japan
Artist: 日本語

Until then, using the ANSI codepage makes more sense to me: if I redirect the output of id3 to a file, I expect to be able to read it using notepad. Commandline arguments are encoded in the ANSI codepage, as is the filesystem, etc. The OEM codepage to me is a relic from the Win3.x/Win9x days (which relied on DOS for its console); AFAICT it is only really necessary if you use the console full-screen.
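
To see which codepages are in play on a given machine, the ANSI, OEM, and current console output codepages can be queried from kernel32. The sketch below does this through `ctypes`; `console_codepages` is a hypothetical helper, and it returns `None` on non-Windows systems where these calls do not exist.

```python
import sys

def console_codepages():
    """Return the ANSI, OEM and console output codepage numbers on Windows."""
    if sys.platform != "win32":
        return None  # GetACP and friends are Windows-only
    import ctypes
    kernel32 = ctypes.windll.kernel32
    return {
        "ansi": kernel32.GetACP(),                      # used for files, arguments
        "oem": kernel32.GetOEMCP(),                     # DOS-era console default
        "console_output": kernel32.GetConsoleOutputCP() # what chcp reports
    }

print(console_codepages())
```

On a Russian Windows installation one would expect the ANSI value to be 1251 and the OEM value to be 866, which is the mismatch discussed in this thread.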

So, I am going to finish the Unicode-build first; then we'll see how that functions in a console with a non-Truetype font. But supporting that is really low on my priority list.

squell added the win32 label Jun 5, 2017