Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating XML values for ASCII characters like newline or cr #57

Open
cmarkle opened this issue May 4, 2023 · 5 comments
Open

Generating XML values for ASCII characters like newline or cr #57

cmarkle opened this issue May 4, 2023 · 5 comments

Comments

@cmarkle
Copy link

cmarkle commented May 4, 2023

Amazon s3 key names with ASCII chars like \n or \r are expected to be mapped in XML data to like "& # 13; " or "& # 10 ;" (added spaces to ensure all would be seen here) or the like. Can I motivate XmlBuilder to generate that? When I use XmlBuilder.generate with character data with those special characters in it, it is not happening like this:

iex(17)> {:person, %{id: 12345}, "Josh\n"} |> XmlBuilder.generate
"<person id=\"12345\">Josh\n</person>"

...where I would like it to generate:

"<person id=\"12345\">Josh&#10;</person>"

I looked in the tests to see if there are any examples of this and I did not see any...

Thanks in advance for any help on this.

@joshnuss
Copy link
Owner

joshnuss commented May 6, 2023

Hi @cmarkle,

According to the XML spec:

Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.

Can you explain the use case, what you are trying to accomplish?

@cmarkle
Copy link
Author

cmarkle commented May 6, 2023

@joshnuss Sorry should have been more specific about why I want to do this. We are pulling Amazon S3 bucket data, specifically a list of objects in the bucket, which is returned as XML. If our customer managed to make an object in with a '\r' (or other funky ASCII character) in the object key name (which is do-able, although not recommended, in S3), then we'd like the accurate name (by "accurate" I mean '\r' not altered/mapped to '\n') so that we can turn around and use that object name is something like a delete request. As it is right now we can't delete these funkily-named objects.

Here's a specific example of what I am talking about:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>bucket-name-obfuscated</Name>
  <Prefix></Prefix>
  <NextContinuationToken>1ILUW_obfuscated_YlD9k0Yuf7RxD4ArX1yMUM</NextContinuationToken>
  <KeyCount>1000</KeyCount>
  <MaxKeys>1000</MaxKeys>
  <Delimiter></Delimiter>
  <IsTruncated>true</IsTruncated>
  <Contents>
    <Key>workspaces/51109/packages/ND-7H46rSQ.asp-package/contents/WHALE -&gt; MH/Whale_TTB_DOCUMENTS/Aspera Inboxes/Icon&#13;</Key>
    <LastModified>2022-03-25T23:52:22.122Z</LastModified>
    <ETag>&quot;d41d8cd98f00b204e9800998ecf8427e&quot;</ETag>
    <Size>0</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>

Note the " " ('\r') at the end of the Key.

@joshnuss
Copy link
Owner

joshnuss commented May 6, 2023

Thanks for the clarification,

When your parser parses &#13;, does it convert it to \r?

Because running \r through this library should be preserved (I believe)

@cmarkle
Copy link
Author

cmarkle commented May 7, 2023

@joshnuss Our parser (which right now is SweetXml) does convert &#13; to \r, but then maps the single \r' to newline \n', which is its own problem in my case.

Running \r through XmlBuilder does preserve the \r:

iex(13)> {:person, %{id: 12345}, "Josh\n"} |> XmlBuilder.generate
"<person id=\"12345\">Josh\n</person>"
iex(14)> {:person, %{id: 12345}, "Josh\r"} |> XmlBuilder.generate 
"<person id=\"12345\">Josh\r</person>"

AWS guidance on this is documented in "Creating object key names (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html) - see in particular the section "XML related object key constraints". Duplicating that here:

XML related object key constraints
As specified by the [XML standard on end-of-line handling](https://www.w3.org/TR/REC-xml/#sec-line-ends), 
all XML text is normalized such that single carriage returns (ASCII code 13) and carriage returns 
immediately followed by a line feed (ASCII code 10) are replaced by a single line feed character. To 
ensure the correct parsing of object keys in XML requests, carriage returns and [other special characters 
must be replaced with their equivalent XML entity code](https://www.w3.org/TR/xml/#syntax) when they 
are inserted within XML tags. The following is a list of such special characters and their equivalent entity 
codes:

' as &apos;
” as &quot;
& as &amp;
< as &lt;
> as &gt;
\r as &#13; or &#x0D;
\n as &#10; or &#x0A;

The following example illustrates the use of an XML entity code as a substitution for a carriage return. 
This DeleteObjects request deletes an object with the key parameter: 
/some/prefix/objectwith\rcarriagereturn (where the \r is the carriage return).

<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Object>
    <Key>/some/prefix/objectwith&#13;carriagereturn</Key>
  </Object>
</Delete>

So net-net, if there was a way that XmlBuilder could be motivated to do this for \r and \n like it does for other special characters, that would be great:

iex(15)> {:person, %{id: 12345}, "Josh<>\r"} |> XmlBuilder.generate
"<person id=\"12345\">Josh&lt;&gt;\r</person>"

@joshnuss
Copy link
Owner

joshnuss commented May 7, 2023

We could add it to the list of escaped string patterns, see escape_string/1:

defp escape_string(""), do: ""
defp escape_string(<<"&"::utf8, rest::binary>>), do: escape_entity(rest)
defp escape_string(<<"<"::utf8, rest::binary>>), do: ["&lt;" | escape_string(rest)]
defp escape_string(<<">"::utf8, rest::binary>>), do: ["&gt;" | escape_string(rest)]
defp escape_string(<<"\""::utf8, rest::binary>>), do: ["&quot;" | escape_string(rest)]
defp escape_string(<<"'"::utf8, rest::binary>>), do: ["&apos;" | escape_string(rest)]
defp escape_string(<<c::utf8, rest::binary>>), do: [c | escape_string(rest)]

Do you wanna open a PR?
Also, I'm thinking to wait on merging it until there are more requests about this fix.

Keep in mind, this is a package that many apps are dependent on, so I hesitate to change the output and cause extra working for anyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants