Comments, RAWTEXT, script data, etc., should not be escaped #90

Closed
aantron opened this issue Mar 8, 2016 · 3 comments

@aantron
Contributor

aantron commented Mar 8, 2016

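See the HTML specification's algorithm for serializing HTML fragments, which writes comment data and the text content of script and style elements without escaping.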

However, this program:

```ocaml
open Html5.M
open Html5.P

let () =
  print_list print_string
    [tot (Xml.comment "foo&");
     script (pcdata "foo&");
     style [pcdata "foo&"]]
```

produces this output:

```html
<!--foo&amp;--><script>foo&amp;</script><style>foo&amp;</style>
```
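For comparison with the output above, here is a minimal sketch of the rule the specification's serializer follows: comment data and the text children of raw-text elements are written verbatim, and only other text is escaped. The `node` type, `raw_text_elements`, `escape_text` and `serialize` are hypothetical names for illustration, not TyXML internals, and the tree is simplified (no attributes, no void elements).

```ocaml
(* Hypothetical, simplified document tree: no attributes, no void
   elements. This is not TyXML's internal representation. *)
type node =
  | Element of string * node list
  | Text of string
  | Comment of string

(* Elements whose text children are serialized verbatim. The spec also
   lists noscript when scripting is enabled; it is left out here. *)
let raw_text_elements =
  [ "style"; "script"; "xmp"; "iframe"; "noembed"; "noframes"; "plaintext" ]

(* Escaping for ordinary text. The spec additionally writes U+00A0 as
   &nbsp;, which this sketch omits. *)
let escape_text s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

let rec serialize ?parent node =
  match node with
  | Comment data -> "<!--" ^ data ^ "-->" (* comment data is never escaped *)
  | Text s -> (
      match parent with
      | Some p when List.mem p raw_text_elements -> s (* raw text: verbatim *)
      | _ -> escape_text s)
  | Element (name, children) ->
      "<" ^ name ^ ">"
      ^ String.concat "" (List.map (fun c -> serialize ~parent:name c) children)
      ^ "</" ^ name ^ ">"

let () =
  [ Comment "foo&";
    Element ("script", [ Text "foo&" ]);
    Element ("style", [ Text "foo&" ]);
    Element ("p", [ Text "foo&" ]) ]
  |> List.map (fun n -> serialize n)
  |> String.concat ""
  |> print_endline
(* prints <!--foo&--><script>foo&</script><style>foo&</style><p>foo&amp;</p> *)
```

Only the `p` element's text gets `&amp;`; this is what the program in this issue would be expected to print for its three fragments.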

In particular, this seems to prevent comments such as:

```html
<!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]-->
```

from being generated by expressions such as:

```ocaml
tot (Xml.comment "[if IE 8]> <html class=\"no-js lt-ie9\" lang=\"en\"> <![endif]")
```

(credit to @sgrove for a query).

For comments, if we "fix" this issue and the user includes --> in the comment text, that text will close the comment early. Perhaps we can escape only that sequence. Similarly, in the RAWTEXT content of a style element, we may have to handle </style>.
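Because comment data is not decoded, `&` never needs escaping there; what breaks a comment is text that would close it early. The specification's syntax rules for comments forbid, among other things, the sequences `-->` and `--!>` in comment text. One possible policy, assumed here only for illustration, is to break each such sequence by inserting a space before its `>`. The function name `comment_data` is hypothetical, and the remaining restrictions (text starting with `>` or `->`, containing `<!--`, or ending with `<!-`) are not handled.

```ocaml
(* true if [sub] occurs in [s] starting at index [i] *)
let starts_at s i sub =
  let m = String.length sub in
  i + m <= String.length s && String.sub s i m = sub

(* Hypothetical policy: emit comment text verbatim (no entity escaping),
   except that "-->" becomes "-- >" and "--!>" becomes "--! >", so the
   comment cannot be closed early. *)
let comment_data s =
  let b = Buffer.create (String.length s + 8) in
  let rec go i =
    if i >= String.length s then ()
    else if starts_at s i "-->" then (Buffer.add_string b "-- >"; go (i + 3))
    else if starts_at s i "--!>" then (Buffer.add_string b "--! >"; go (i + 4))
    else (Buffer.add_char b s.[i]; go (i + 1))
  in
  go 0;
  Buffer.contents b

let () =
  (* The conditional comment from this issue contains neither sequence,
     so it is emitted unchanged. *)
  print_endline
    ("<!--"
    ^ comment_data "[if IE 8]> <html class=\"no-js lt-ie9\" lang=\"en\"> <![endif]"
    ^ "-->")
```

The trade-off of this policy is that it silently alters the user's text in the rare case where `-->` appears; rejecting such input instead would be an equally reasonable design.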

Rather than fixing this immediately in TyXML, I propose that before TyXML 4.0,

  • I split Markup.ml into subpackages, or aliased modules, one of which will contain the HTML writer but no other writers or parsers,
  • I add a polyglot output mode to Markup.ml,
  • Markup.ml (IIRC) already correctly leaves the above fragments unescaped; we decide what to do about --> in comments, etc., and implement that, or support implementing it, in Markup.ml,
  • we use Markup.ml's HTML writer to do the HTML writing.

This will leave us with only one place at which to maintain conformance with HTML5 and XHTML. It will add Markup.ml as a run-time dependency to projects using TyXML (as opposed to the current build-time-only dependency through the PPX). Splitting Markup.ml into packages or aliased submodules should keep unnecessary code, such as the enormous HTML5 parser, from being linked into users' programs.

Drup added this to the 4.0 milestone Mar 29, 2016
@Drup
Member

Drup commented Mar 29, 2016

> It will add Markup.ml as a run-time dependency to projects using TyXML

This may not be true: TyXML is currently split between the functor part and the concrete implementation. I have no problem adding dependencies to the concrete implementation (especially for a printer), since it's not used inside eliom and js_of_ocaml.
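The split described here can be illustrated with a small, hypothetical sketch: the combinators live in a functor over a printer signature, and only the concrete module that applies the functor would carry a printing dependency. A consumer that applies the functor to its own implementation (as eliom and js_of_ocaml do with their own backends) would not link that printer. All module names here (`PRINTER`, `Make`, `Simple_printer`, `Html`) are made up for illustration, and `Simple_printer` is a stdlib-only stand-in, not Markup.ml.

```ocaml
(* Hypothetical printer signature; not TyXML's or Markup.ml's API. *)
module type PRINTER = sig
  val comment : string -> string
  val text : parent:string -> string -> string
end

(* "Functor part": builds markup without depending on any printer
   implementation. Only text children are supported, for brevity. *)
module Make (P : PRINTER) = struct
  let comment data = P.comment data

  let element name texts =
    Printf.sprintf "<%s>%s</%s>" name
      (String.concat "" (List.map (P.text ~parent:name) texts))
      name
end

(* "Concrete implementation": the one place a printer dependency would
   live. This stand-in treats only script and style as raw text. *)
module Simple_printer : PRINTER = struct
  let comment data = "<!--" ^ data ^ "-->"

  let escape s =
    let b = Buffer.create (String.length s) in
    String.iter
      (function
        | '&' -> Buffer.add_string b "&amp;"
        | '<' -> Buffer.add_string b "&lt;"
        | '>' -> Buffer.add_string b "&gt;"
        | c -> Buffer.add_char b c)
      s;
    Buffer.contents b

  let text ~parent s =
    if parent = "script" || parent = "style" then s else escape s
end

module Html = Make (Simple_printer)

let () =
  (* prints <style>foo&</style><p>foo&amp;</p> *)
  print_endline (Html.element "style" [ "foo&" ] ^ Html.element "p" [ "foo&" ])
```

Swapping `Simple_printer` for a module backed by a real writer would change only the last application, not `Make`.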

@Drup
Member

Drup commented Apr 7, 2016

This is now partially solved, for comments. Exporting to Markup.ml's printer is still desirable, but should be addressed in #100.

Unescaped data (for script/style) can be introduced with Unsafe.data.

Drup closed this as completed Apr 7, 2016
@aantron
Contributor Author

aantron commented Apr 8, 2016

Regarding Unsafe.data, I think the issue is a bit more complicated. For example, in a stylesheet, the text </style> is not allowed. So, if it is present, it should be escaped. Everything else is tokenized literally (no entity or character references).
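The rule described here can be sketched as follows. In RAWTEXT content such as a style element, the tokenizer decodes no character references; it only looks for `</` followed by the element's name, compared case-insensitively (and then whitespace, `/`, or `>`). The hypothetical `escape_rawtext` below leaves everything verbatim except that it conservatively breaks every `</style` (any letter case, whatever follows) by writing `<\/style`. That replacement is an assumption borrowed from the common `<\/script>` idiom: inside a CSS string `\/` still denotes `/`, but it is not a general fix for every stylesheet context.

```ocaml
(* true if "</" followed by [tag] (case-insensitive) starts at index [i] *)
let end_tag_at s i tag =
  let m = String.length tag in
  i + 2 + m <= String.length s
  && s.[i] = '<'
  && s.[i + 1] = '/'
  && String.lowercase_ascii (String.sub s (i + 2) m) = tag

(* Hypothetical policy: copy RAWTEXT verbatim (no &amp; etc.), but write
   the '/' of any "</tag" as "\/" so the element cannot be closed early. *)
let escape_rawtext ~tag s =
  let tag = String.lowercase_ascii tag in
  let b = Buffer.create (String.length s + 8) in
  String.iteri
    (fun i c ->
      if c = '/' && i > 0 && end_tag_at s (i - 1) tag then
        Buffer.add_string b "\\/"
      else Buffer.add_char b c)
    s;
  Buffer.contents b

let () =
  (* prints p::after { content: "<\/style>" } *)
  print_endline (escape_rawtext ~tag:"style" "p::after { content: \"</style>\" }")
```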

Also, if Unsafe.data (or some other function) should be used for script or stylesheet data, the PPX should emit that instead of pcdata in the AST.
