Spaces between tags are treated as pcdata #330

Open
Mbodin opened this issue Dec 13, 2023 · 7 comments

@Mbodin
Contributor

Mbodin commented Dec 13, 2023

Consider the following OCaml file:

open Tyxml
let%svg test = {|<g> </g>|} (* Notice the space between the opening and closing of the g tag. *)

The content of the g tag is interpreted as a [> `PCDATA ] Tyxml_svg.elt which is incompatible with [< Svg_types.g_content ] Tyxml_svg.elt, and I get a type error. In other words the space between the opening and closing of the g tag is interpreted as a pcdata. The same goes if I place a newline instead of the space.
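The mismatch can be reproduced without TyXML using a toy model. This is only a sketch: `elt`, `pcdata`, `g` and `circle` below are simplified stand-ins written for illustration, not TyXML's actual definitions, and the set of children allowed in `g` is an assumption.

```ocaml
(* Toy stand-in for Tyxml_svg.elt: the type parameter is a phantom tag
   recording what kind of node the value represents. *)
type 'a elt = Elt of string

let to_string (Elt s) = s

let render_children children =
  String.concat "" (List.map to_string children)

(* Character data is tagged `PCDATA, as in the error message above. *)
let pcdata (s : string) : [> `PCDATA ] elt = Elt s

let circle () : [> `Circle ] elt = Elt "<circle/>"

(* In this toy model a group accepts only groups and circles as children
   (an assumption; the real g_content is much larger). *)
let g (children : [< `G | `Circle ] elt list) : [> `G ] elt =
  Elt ("<g>" ^ render_children children ^ "</g>")

(* Accepted: every child has an allowed tag. *)
let ok () = g [ g []; circle () ]

(* Rejected by the type checker if uncommented, because `PCDATA is not
   among the tags that g allows:

   let bad () = g [ pcdata " " ]
*)
```

If the PPX turns the space in `<g> </g>` into a `pcdata " "` child, the result has the same shape as the commented-out `bad`, which would explain the type error.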

I'm surprised as the SVG specification itself uses a lot of spacing in its examples. For instance the following SVG is provided as a valid example in https://www.w3.org/TR/2003/REC-SVG11-20030114/struct.html#GroupsOverview despite having the same spacing between its g tags.

<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" 
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="4in" height="3in" version="1.1"
     xmlns="http://www.w3.org/2000/svg">
  <desc>Groups can nest
  </desc>
  <g>
     <g>
       <g>
       </g>
     </g>
   </g>
</svg>

I don't know the SVG specification well enough to say what is intended here: should text consisting only of whitespace be ignored instead of being treated as pcdata?

@Drup
Member

Drup commented Dec 13, 2023

We had a similar issue in HTML (I thought it was in tables). Some elements should have a special parsing mode that is whitespace-insensitive. We introduced it for the appropriate HTML elements, so it's a matter of reusing it for the appropriate SVG elements.

@Mbodin
Contributor Author

Mbodin commented Dec 13, 2023

Thanks! I can see the discussion in #225.
I haven't found a place in the SVG specification listing which elements should be whitespace-insensitive and which shouldn't. Would treating every tag that does not accept pcdata as whitespace-insensitive be too much?

@Drup
Member

Drup commented Dec 13, 2023

I'm not quite sure. @aantron, you probably have a bit more expertise there; do you have an opinion?

@aantron
Contributor

aantron commented Dec 14, 2023

I don't have an opinion on this. All I can say is that for let%svg it should probably be parsed or adjusted according to the SVG specification, which I am not familiar with. For let%html it should be parsed according to the HTML specification. SVG tags inside an HTML document are not parsed according to the SVG specification, but according to the HTML rules for foreign elements inside HTML, which specifically includes rules for SVG. So on a syntax level, SVG inside HTML is not SVG proper, but looks like SVG.

@Drup
Member

Drup commented Dec 15, 2023

> So on a syntax level, SVG inside HTML is not SVG proper, but looks like SVG.

<insert appropriate swearing/> 🤦

I think we can probably gloss over that.

@Mbodin I agree with your proposal to ignore all whitespace for elements that don't accept pcdata. It might be worth glancing at the spec to see whether anything is specified in that regard, but otherwise, that will suffice.

@ncitron

ncitron commented Feb 2, 2024

Is there a fix for this?

It seems like pcdata should be allowed, especially for examples that use <text>somestuff</text>.

I have been unable to work with types like this, which is unfortunate.

@Mbodin
Contributor Author

Mbodin commented Feb 5, 2024

@ncitron I think that the above-mentioned branch https://github.com/Mbodin/tyxml/tree/whitespace (PR #331) fixes the issue.

pcdata is definitely allowed. I did not (at least intentionally) change the behaviour of any tag that already accepts pcdata, so the behaviour of <text> should be unchanged. However, all the other tags now accept whitespace-only pcdata, which is ignored.
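As a rough illustration of that rule (not the code from the branch), here is a minimal sketch on a toy tree. The `node` type, `accepts_pcdata`, `is_whitespace_only` and `normalize` are hypothetical names, and the list of elements accepting pcdata is an assumption made for the example.

```ocaml
(* Toy document tree; not TyXML's internal representation. *)
type node =
  | Text of string
  | Element of string * node list

(* Assumed for the sketch: elements whose content may contain character
   data. A real list would come from the SVG content model. *)
let accepts_pcdata = function
  | "text" | "tspan" | "title" | "desc" -> true
  | _ -> false

let is_space = function ' ' | '\t' | '\n' | '\r' -> true | _ -> false

(* True when every character of s is whitespace (including the empty string). *)
let is_whitespace_only s =
  let rec go i = i >= String.length s || (is_space s.[i] && go (i + 1)) in
  go 0

(* Drop whitespace-only text children of elements that do not accept
   pcdata; leave the children of pcdata-accepting elements untouched. *)
let rec normalize = function
  | Text _ as t -> t
  | Element (name, children) ->
    let kept =
      if accepts_pcdata name then children
      else
        List.filter
          (function Text s -> not (is_whitespace_only s) | Element _ -> true)
          children
    in
    Element (name, List.map normalize kept)
```

With this sketch, `normalize (Element ("g", [Text " "]))` yields `Element ("g", [])`, while text inside `text` is preserved as written. Non-whitespace text inside `g` is deliberately kept, so a later type check would still reject it.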


4 participants