IE9 encoding issue #5

Open
Cody-SDLGov opened this issue Aug 6, 2014 · 4 comments

@Cody-SDLGov

The following HTML code demonstrates a problem with IE9:

<html>
<head>
<meta charset="utf-8">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<script src="jquery.js"></script>
</head>
<body>  
<script>
function setFrame() {
    alert("Setting srcDoc for testFrame");
    srcDoc.set(document.getElementById("testFrame"),
     '<html><head><meta charset=\'UTF-8\'></head><body>' + 
       String.fromCharCode(1601) + 
     '</body></html>');
}     
</script>
  <button onclick="setFrame()">Click me</button>
  Frame 1:<br>
 <iframe src="about:blank" id="testFrame">
 </iframe>
 <script src="polyfill.js"></script>
</body>
</html>

I expect to see the Arabic character "ف", but instead see mangled text output.

I can inject an alert into polyfill that shows the content it receives is proper UTF-8.

I am serving this from a basic nginx setup on Linux.

The problem does not show up when browsing with Chrome/FF/IE10.

(I forced polyfill to use the 'legacy' path for those to be sure).

The same issue can be seen with what is effectively the result of this:

document.getElementById("testFrame").src = "javascript:{String.fromCharCode(1601);};";

For IE9, this produces the mangled text. For IE10, it is fine.

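The bytes in the content shown when I use "view source" on the iframe appear to be 0x0041 0x0006 in hex, which is the byte-reversed (and padded) form of the desired code point U+0641, i.e. the bytes 0x06 0x41. (Strictly speaking, 0x06 0x41 is the UTF-16 big-endian form; the UTF-8 encoding of U+0641 is 0xD9 0x81.)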
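To make the byte values above easier to check, here is a minimal sketch that prints the UTF-16 and UTF-8 forms of the test character. The `hex` helper and the variable names are my own, and reading the observed "0x0041 0x0006" as little-endian UTF-16 bytes padded to 16 bits each is only an assumption about what IE9 is doing.

```javascript
// The test character from the report: U+0641 (decimal 1601).
var ch = String.fromCharCode(1601);

// Hypothetical helper: format one byte as two-digit hex.
function hex(n) {
  return '0x' + (n < 16 ? '0' : '') + n.toString(16).toUpperCase();
}

var unit = ch.charCodeAt(0); // single UTF-16 code unit, 0x0641

// Big-endian byte order: high byte first.
var utf16be = [hex(unit >> 8), hex(unit & 0xFF)];

// Little-endian byte order: low byte first. This matches the order
// of the values seen in "view source" (0x41, then 0x06).
var utf16le = [hex(unit & 0xFF), hex(unit >> 8)];

// encodeURIComponent percent-encodes the UTF-8 bytes of the character.
var utf8 = encodeURIComponent(ch);

console.log(utf16be.join(' ')); // 0x06 0x41
console.log(utf16le.join(' ')); // 0x41 0x06
console.log(utf8);              // %D9%81
```

If that reading is right, the frame content is being treated as UTF-16 code units rather than UTF-8, which would explain why the `meta charset` in the injected document has no effect.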

I can also inject HTML-escaped characters in hex, e.g. &#x0641;, and this works in IE9. Using the actual character, which is the case in many input files, does not. Therefore, a general solution may need to escape all non-ASCII input characters.
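As a rough illustration of that idea, the sketch below replaces every non-ASCII character with a hexadecimal character reference before the markup is handed to the frame. `escapeNonAscii` is a hypothetical helper written for this comment, not part of the polyfill or of any library, and it leaves ASCII characters (including `<`, `>`, `&` and quotes) untouched so the markup itself still parses.

```javascript
// Hypothetical helper: rewrite non-ASCII characters as &#x...; references.
// ASCII is passed through unchanged so that tags and attributes survive.
function escapeNonAscii(html) {
  var out = '';
  for (var i = 0; i < html.length; i++) {
    var code = html.charCodeAt(i);

    // Combine a surrogate pair into one code point so that characters
    // outside the BMP become a single reference.
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < html.length) {
      var low = html.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        code = (code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
        i++; // skip the low surrogate
      }
    }

    if (code > 0x7F) {
      out += '&#x' + code.toString(16).toUpperCase() + ';';
    } else {
      out += html.charAt(i);
    }
  }
  return out;
}

var escaped = escapeNonAscii(
  '<html><body>' + String.fromCharCode(1601) + '</body></html>'
);
console.log(escaped); // <html><body>&#x641;</body></html>
```

With the test page above, the call would look something like `srcDoc.set(frame, escapeNonAscii(content))`. Unpaired surrogates are not handled specially here; a real implementation would need to decide what to do with them.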

@Cody-SDLGov
Author

I have had some success integrating the approach used in https://github.com/mathiasbynens/jsesc, with two modifications: always emit "&#x0000;"-style character references instead of \u or \x escapes, and skip the quote-related escaping, leaving items like \n alone.

@jugglinmike
Owner

Thanks for the report, @Cody-SDLGov! @mathiasbynens maintains a somewhat related project named "he" that I think would be more appropriate here. If I understand the problem correctly, though, we'll need a new feature. I've submitted a draft implementation for review; let's see how that goes and decide next steps after that.

@jugglinmike
Owner

(Also, you can see my reduced test case on the non-ascii branch.)

@mathiasbynens

The he@0.5.0 release includes @jugglinmike’s patch! 👍
