Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement allowUnsafeSymbols option (which defaults to false) for he.encode #23

Closed

Conversation

jugglinmike
Copy link
Contributor

Hey @mathiasbynens!

I'd love to use this library in my polyfill for the iframe srcdoc attribute, but I think I need an additional feature. The functionality I'm proposing here would help me to resolve an issue that was recently filed against that project.

Commit message:

Although he.escape allows for escaping "unsafe" markup characters,
there is currently no way to escape the inverse set--non-ASCII
characters only. This is useful in contexts where markup is allowed, but
non-ASCII characters might be otherwise mangled (as in the iframe
srcdoc attribute, for example).

Introduce a new option to he.encode which disables the escaping of
unsafe markup characters. Preserve API compatability by disabling this
behavior by default.

@mathiasbynens
Copy link
Owner

Your patch looks good so far. Some remarks:

  1. I’d like to rename markupChars to something that more closely matches the other option names, though – escapeUnsafeSymbols for example. What do you think? Any suggestions?
  2. More tests are needed that show how this new option combines with the other existing options (such as escapeEverything).
  3. The CLI needs to support this new option too.

By the way, this would solve #16.

Note that this new option should not be used in cases where the markup can contain custom elements, since those can contain all kinds of symbols and escaping e.g. <λ-calculus> as <&#x3BB;-calculus> would break stuff. The same goes for non-HTML contexts, such as the contents of a <script> or <style> element (you don’t want to HTML-escape anything in there). Perhaps this should be added to the docs as a warning?

@jugglinmike
Copy link
Contributor Author

@mathiasbynens I've renamed the option, but I have a few questions about your other requests:

  1. This is an option for the encode method and the library already supports an option named encodeEverything, so what about naming this option encodeUnsafeSymbols instead of escapeUnsafeSymbols?
  2. How should the library behave when both encodeEverything and this new option are set to true encodeEverything is set to true and this new option is set to false? There is a little ambiguity here because both options control the same behavior, just on different character sets. I could see it encoding everything except unsafe characters, but I could also see "everything" taking precedence, resulting in everything being encoded (in other words, ignoring escapeUnsafeSymbols). Because of the ambiguity, I think either decision could be interpreted as arbitrary, so I'm inclined to throw an error if both are set. Do you think that's the right approach?
  3. It doesn't look like the project explicitly tests the CLI--could you confirm?

@mathiasbynens
Copy link
Owner

@jugglinmike

  1. Good call! +1 to encodeUnsafeSymbols rather than escape….
  2. IMHO encodeEverything should take precedence, we should update the documentation to reflect this, and not throw an error (i.e. silently do what the documentation says it does). No one is going to use these options without reading the documentation anyway. Perhaps the problem could be avoided by renaming encodeEverything into something like encodeSafeAsciiSymbols or encodeAsciiSymbols?
  3. At the moment the CLI does not have tests, indeed.

Thanks for your work on this so far!

@jugglinmike
Copy link
Contributor Author

@mathiasbynens Thanks for the clarification. I've held off on renaming the existing method because I'm not sure it resolves the ambiguity, and the suggestion seemed more like an afterthought. Of course, I'm happy to change it if you would like me to!

The encodeUnsafeSymbols option is a little unique in that it defaults to true, and reflecting this in the CLI presented a little bit of a problem. I decided that consistency between the flags within the CLI (via --allow-unsafe) is more important than consistency across the CLI and JavaScript API (via, for instance, --unsafe=false). I'm not sure if that matches how you would design it, so I wanted to call it to your attention.

By the way, I expect you'll want to squash all this down when we get to a final draft. I'd be happy to do that too, just let me know.

@@ -106,7 +106,7 @@ he.encode('foo © bar ≠ baz 𝌆 qux', {

#### `encodeEverything`

The default value for the `encodeEverything` option is `false`. This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string.
The default value for the `encodeEverything` option is `false`. This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string. When set to `true`, this option will take precedence over `encodeUnsageSymbols` (setting the latter to `false` will have no effect).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: encodeUnsafeSymbols

@mathiasbynens
Copy link
Owner

@jugglinmike Your arguments convinced me that allowUnsafeSymbols would actually be an even better name (because then the default value is false, like all other options). 👍 (I don’t like the double negative in allowUnsafeSymbols: false but hey.)

@jugglinmike
Copy link
Contributor Author

Thanks for the feedback, @mathiasbynens — I think I've addressed it all.

he.encode('foo © and & ampersand');
// → 'foo &#xA9; and & ampersand'
```

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for he.encode() should be updated too (maybe the first paragraph can just get a link to this section). I can do this after merging your PR, no worries.

@mathiasbynens
Copy link
Owner

This looks great! Could you squash your commits please?

If you don’t want to fix those last few nitpicks I just left as comments, no worries, I’m happy to take care of them after merging your patch. Just let me know.

@jugglinmike
Copy link
Contributor Author

No problem at all, Mathias. I am a bit busy this week, so I don't expect to address these final issues until Saturday. If you'd like to see this merged before then, though, be my guest :)

Although `he.escape` allows for escaping "unsafe" markup characters,
there is currently no way to escape the inverse set--non-ASCII
characters only. This missing functionality is useful in contexts where
markup is allowed, but non-ASCII characters might be otherwise mangled
(as in the iframe `srcdoc` attribute, for example).

Introduce a new option to `he.encode` which disables the escaping of
unsafe markup characters. Preserve API compatability by disabling this
behavior by default.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.07%) when pulling 37f0e29 on jugglinmike:markupchars-option into 78bc2bb on mathiasbynens:master.

@jugglinmike
Copy link
Contributor Author

@mathiasbynens I've fixed the typo, squashed the commits, and updated the commit message. I held off on the README.md change you requested because I'm not sure how to best address it. It will probably save us a few more rounds of review if I leave that to you :)

@mathiasbynens mathiasbynens changed the title Implement markupChars option for he.encode Implement allowUnsafeSymbols option (which defaults to false) for he.encode Aug 24, 2014
mathiasbynens pushed a commit that referenced this pull request Aug 24, 2014
Although `he.escape` allows for escaping ‘unsafe’ markup characters, there is currently no way to escape the inverse set – non-ASCII characters only. This missing functionality is useful in contexts where markup is allowed, but non-ASCII characters might be otherwise mangled (as in the `iframe` `srcdoc` attribute, for example).

Introduce a new option to `he.encode` which disables the escaping of unsafe markup characters. Preserve API compatability by disabling this behavior by default.

Closes #16 and #23.
@mathiasbynens
Copy link
Owner

Thanks, I’ve tweaked your patch and merged it. Will release a new version now.

$ npm install he@0.5.0 --save-dev

mathiasbynens pushed a commit that referenced this pull request Aug 24, 2014
Although `he.escape` allows for escaping ‘unsafe’ markup characters, there is currently no way to escape the inverse set – non-ASCII characters only. This missing functionality is useful in contexts where markup is allowed, but non-ASCII characters might be otherwise mangled (as in the `iframe` `srcdoc` attribute, for example).

Introduce a new option to `he.encode` which disables the escaping of unsafe markup characters. Preserve API compatability by disabling this behavior by default.

Closes #16 and #23.
@jugglinmike
Copy link
Contributor Author

Awesome--thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IE9 encoding issue
3 participants