
Encoding issues / Umlaut is not decoded correctly #117

Open
TomRauchenwald38 opened this issue Jan 2, 2018 · 9 comments · May be fixed by #248

@TomRauchenwald38

I have trouble decoding the QR code from this PDF (on page 27).
It seems the Umlaut in the last line is not decoded correctly. Screenshot from the live demo:
[image: screenshot of the live demo output]
The last line should read ..."für Gartenarbeit und Entsorgung"...

I can decode the QR Code just fine in Java using ZXing.
If I set the CHARACTER_SET decoding hint to "ISO-8859-1", the decoded result is exactly the same as pictured in the screenshot, so I suspect that ISO-8859-1 is assumed somewhere in InstaScan.
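
To illustrate the suspected cause, here is a minimal sketch of what happens when UTF-8 bytes are read one byte per character, which is effectively ISO-8859-1. The byte array is a hand-written example, not data taken from the QR code above.

```javascript
// UTF-8 bytes for "für": "ü" (U+00FC) is encoded as the two bytes 0xC3 0xBC.
const utf8Bytes = [0x66, 0xc3, 0xbc, 0x72];

// Treating every byte as its own character (effectively ISO-8859-1)
// splits "ü" into two unrelated characters.
const asLatin1 = String.fromCharCode.apply(null, utf8Bytes);

// Decoding the same bytes as UTF-8 restores the original text.
const asUtf8 = new TextDecoder("utf-8").decode(new Uint8Array(utf8Bytes));

console.log(asLatin1); // "fÃ¼r"
console.log(asUtf8);   // "für"
```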

Here's the QR Code I used for easier copy/pasting:
[image: qr_sample_1]

Is there a way to specify the encoding to use, or is this a bug?

@dieperie

In PHP, use utf8_decode.
This converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1.

@dieperie

In JavaScript, the following does the same:

// In the scan handler, decode the content before storing it:
var decoded_content = self.utf8_decode(content);
self.scans.unshift({ date: +(Date.now()), content: decoded_content });

// Method added to the same object:
utf8_decode: function (str_data) {
	// Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1.
	// Assumes well-formed UTF-8 with sequences of at most three bytes.
	var string = "", i = 0, c = 0, c2 = 0, c3 = 0;

	while (i < str_data.length) {
		c = str_data.charCodeAt(i);
		if (c < 128) {
			// Single-byte (ASCII) character
			string += String.fromCharCode(c);
			i++;
		} else if ((c > 191) && (c < 224)) {
			// Two-byte sequence
			c2 = str_data.charCodeAt(i + 1);
			string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
			i += 2;
		} else {
			// Three-byte sequence
			c2 = str_data.charCodeAt(i + 1);
			c3 = str_data.charCodeAt(i + 2);
			string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
			i += 3;
		}
	}
	return string;
}

@yamnikov-oleg

Having the same issue. Cyrillics are decoded into gibberish:

�анн�й к�пон �гене�и�ован

@fariskas

fariskas commented Sep 6, 2019

Having the same issue with Korean text.

@alekciy

alekciy commented May 9, 2020

> Having the same issue. Cyrillics are decoded into gibberish:
>
> �анн�й к�пон �гене�и�ован

The problem is in this piece of code:

let str = String.fromCharCode.apply(null, result);

but I haven't figured out how to fix it yet.
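
One possible way to replace that line, as a minimal sketch: it assumes `result` is an array of raw byte values holding UTF-8 text and that the runtime provides `TextDecoder`. The helper name `bytesToString` is hypothetical and not part of the library.

```javascript
// Hypothetical helper: `bytes` stands in for the `result` array of byte values.
function bytesToString(bytes) {
  // Decode the bytes as UTF-8 instead of mapping each byte to one character.
  return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

// UTF-8 bytes for the Cyrillic word "купон" (two bytes per letter).
const result = [0xd0, 0xba, 0xd1, 0x83, 0xd0, 0xbf, 0xd0, 0xbe, 0xd0, 0xbd];

const broken = String.fromCharCode.apply(null, result); // one character per byte
const fixed = bytesToString(result);                    // one character per letter

console.log(broken.length); // 10
console.log(fixed);         // "купон"
```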

yamnikov-oleg linked a pull request on May 9, 2020 that will close this issue
@yamnikov-oleg

@alekciy Thank you for the tip. I have added a UTF-8 decoder on that line and it worked.

@yamnikov-oleg

Though this might not get merged. In case somebody needs this fix, you can clone the repo, apply the fix yourself and rebuild the package with:

npm install
./node_modules/.bin/gulp release

The instascan.min.js file will appear in the dist directory.

@alekciy

alekciy commented May 10, 2020

> @alekciy Thank you for the tip, I have added utf8 decoder in that line and it worked.

And what about cp1251? For example, payment slips per GOST R 56042-2014, format ST00011. Ideally, an encoding detector would be added.

@yamnikov-oleg

@alekciy I don't think there is a reliable way to detect text encoding, especially for the CP (code page) encodings. It would probably be better to add an encoding parameter to the Scanner class.
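
A rough sketch of what such a parameter could look like, assuming the decoded payload is available as an array of byte values. The function `decodePayload` and its `encoding` argument are hypothetical; the Scanner class has no such option today. Which legacy labels (such as "windows-1251") are supported depends on the runtime's `TextDecoder`.

```javascript
// Hypothetical helper: the caller states the encoding instead of it being guessed.
function decodePayload(bytes, encoding = "utf-8") {
  // { fatal: true } makes invalid byte sequences throw instead of
  // silently producing U+FFFD replacement characters.
  const decoder = new TextDecoder(encoding, { fatal: true });
  return decoder.decode(Uint8Array.from(bytes));
}

// "Привет" encoded as cp1251 (windows-1251), one byte per letter.
const cp1251Bytes = [0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2];

console.log(decodePayload(cp1251Bytes, "windows-1251"));

// With the default UTF-8 the same bytes are invalid, so the call throws
// rather than returning gibberish.
let utf8Failed = false;
try {
  decodePayload(cp1251Bytes);
} catch (err) {
  utf8Failed = true;
}
console.log(utf8Failed);
```

Throwing on invalid input is a design choice in this sketch: it lets the caller notice a wrong encoding setting instead of storing corrupted text.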
