HTML Entity Name munging in XML listings #103

benjwadams · 2023-05-22T17:56:58Z

ERDDAP does some bizarre munging of file names into HTML entities in its XML listings.

For example, in https://gcoos4.tamu.edu/erddap/metadata/iso19115/xml/ there are numerous href values like this:
2004JuvenileSportfishNOAA&#x5f;DATA&#x5f;Mean&#x5f;v0&#x5f;0&#x5f;iso19115&#x2e;xml

Most browsers will decode these entities, but I have had trouble following such links with some Python libraries unless the HTML entities are explicitly unescaped first. It is also a pretty odd way to represent simple characters like periods and underscores, where the literal characters would suffice. Is there any reason not to use the literal characters instead of encoding them as HTML entities?
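As a client-side workaround for the link-following problem described above, the href value can be decoded before it is resolved against the listing URL. The following is a minimal sketch using only the Python standard library; the helper name `resolve_href` is hypothetical, and the encoded sample string is an assumed reconstruction of what the listing contains (underscores and periods written as hexadecimal character references).

```python
import html
from urllib.parse import urljoin

# Directory listing mentioned in the issue.
BASE_URL = "https://gcoos4.tamu.edu/erddap/metadata/iso19115/xml/"


def resolve_href(raw_href, base_url=BASE_URL):
    """Decode HTML character references in an href, then make it absolute.

    Hypothetical helper for illustration; not part of ERDDAP or any library.
    """
    decoded = html.unescape(raw_href)  # "&#x5f;" -> "_", "&#x2e;" -> "."
    return urljoin(base_url, decoded)


# Assumed form of an href as it appears in the raw listing source.
raw = (
    "2004JuvenileSportfishNOAA&#x5f;DATA&#x5f;Mean&#x5f;v0&#x5f;0"
    "&#x5f;iso19115&#x2e;xml"
)
print(resolve_href(raw))
```

A spec-compliant HTML parser, including Python's `html.parser.HTMLParser`, decodes attribute values itself, so this step should only be needed when hrefs are pulled out of the raw text, for example with a regular expression.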

BobSimons · 2023-05-22T20:05:35Z

It is the attributes of HTML and XML tags that must be strongly encoded, for security reasons. The code that does this is in com/cohort/util/XML.java in the method called encodeAsHTMLAttribute. The JavaDoc for that method explains:

 * For security reasons, for text that will be used as an HTML or XML attribute, 
 * this replaces non-alphanumeric characters with HTML Entity &#xHHHH; format.
 * See HTML Attribute Encoding at
 * https://owasp.org/www-pdf-archive/OWASP_Cheatsheets_Book.pdf
 * pg 188, section 25.4 
 * "Encoding Type: HTML Attribute Encoding
 * Encoding Mechanism: 
 * Except for alphanumeric characters, escape all characters with the HTML Entity &#xHH;
 * format, including spaces. (HH = Hex Value)".
 * On the need to escape HTML attributes: http://wonko.com/post/html-escaping

Both of the links there are interesting reading.
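To make the rule in that JavaDoc concrete, here is a rough Python approximation of the encoding it describes. This is only a sketch: it is not ERDDAP's Java implementation, the function name `encode_as_html_attribute` is a hypothetical stand-in, and treating only ASCII letters and digits as "alphanumeric" and emitting lowercase hex digits are both assumptions.

```python
import html


def encode_as_html_attribute(text):
    """Replace every character that is not an ASCII letter or digit with a
    hexadecimal character reference such as "&#x5f;".

    Illustrative approximation only; not the actual ERDDAP code.
    """
    pieces = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            pieces.append(ch)  # safe character, kept as-is
        else:
            pieces.append(f"&#x{ord(ch):x};")  # e.g. "_" -> "&#x5f;"
    return "".join(pieces)


encoded = encode_as_html_attribute("Mean_v0_0_iso19115.xml")
print(encoded)

# Any HTML-aware consumer recovers the original text by decoding.
print(html.unescape(encoded))
```

Because the encoded form contains only letters, digits, `&`, `#`, and `;`, it cannot close the surrounding quotes or start a new attribute, which is the point of the strict rule.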

One might argue that in some circumstances this strict encoding is not necessary. Perhaps. Perhaps not. The problem is that trying to make that determination is very time-consuming (even if we assume the programmer has 100% understanding of the situation) and error-prone. It is vastly simpler and (more importantly) vastly safer to just routinely encode all attributes in the safe and recommended way.
