Is your feature request related to a problem? Please describe.
Often, when consuming HTML or XML files from external sources, it's desirable to normalize the entities. For example, I'm interacting with an API that produces XML where all non-ASCII characters are encoded as numbered entities, making non-Latin-script text completely unreadable. I want to debug and store these files in a format that's human-readable as well as machine-readable, while remaining valid UTF-8 XML.
Describe the solution you'd like
Currently, `html/entities` exports `escape` and `unescape` functions. I suggest exporting a third function (tentatively named `normalize`) that normalizes all entities in a string of HTML or XML to a form that's valid, interoperable, and (mostly) human-readable.
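To make the intended behavior concrete, here is a minimal sketch, not a proposed implementation. The name `normalizeEntities` is hypothetical, and the sketch assumes that only numeric character references are handled; named entities such as `&eacute;` pass through unchanged. References that decode to markup-significant characters (`&`, `<`, `>`, `"`, `'`) stay escaped so the output remains well-formed.

```typescript
// Hypothetical sketch; not part of html/entities.
// Characters that must stay escaped to keep the markup well-formed.
const KEEP_ESCAPED = new Map<string, string>([
  ["&", "&amp;"],
  ["<", "&lt;"],
  [">", "&gt;"],
  ['"', "&quot;"],
  ["'", "&#39;"],
]);

// Matches decimal (&#20004;) and hexadecimal (&#x4E24;) character references.
const NUMERIC_REFERENCE = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;

function normalizeEntities(input: string): string {
  return input.replace(
    NUMERIC_REFERENCE,
    (match: string, hex?: string, dec?: string): string => {
      const codePoint = hex !== undefined
        ? parseInt(hex, 16)
        : parseInt(dec!, 10);
      // Leave the reference untouched if it is a control character,
      // a surrogate, or outside the Unicode range.
      const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
      if (codePoint < 0x20 || codePoint > 0x10FFFF || isSurrogate) {
        return match;
      }
      const char = String.fromCodePoint(codePoint);
      // Re-escape markup-significant characters in one canonical form.
      return KEEP_ESCAPED.get(char) ?? char;
    },
  );
}

console.log(normalizeEntities("&#x4E24;&#x53EA;&#x5C0F;&#x871C;&#x8702;")); // 两只小蜜蜂
console.log(normalizeEntities("a &#38; b &#x3C; c")); // a &amp; b &lt; c
```

Keeping the five markup-significant characters escaped is what lets the result stay valid XML while everything else becomes readable UTF-8.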
Describe alternatives you've considered
It might be worth having multiple normalized forms (which would likely also affect the API surface area of `escape`); for example, a "readability" form that converts the numeric-entity encoding of 两只小蜜蜂 (e.g. `&#x4E24;&#x53EA;&#x5C0F;&#x871C;&#x8702;`) to the literal characters 两只小蜜蜂, vs a "compatibility" form that converts in the opposite direction. I don't currently have a use case for the "compatibility" form, as any XML-consuming APIs I need to interact with either default to UTF-8 or respect UTF-8 where specified, but it might be useful for users needing to interact with legacy or poorly-designed APIs.
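One way the two forms could be exposed is as a single function with a form parameter. The sketch below is only an illustration under that assumption: `convertNonAscii`, `EntityForm`, and the choice of uppercase hexadecimal output are all hypothetical and do not reflect any existing or agreed API.

```typescript
// Hypothetical sketch; the names and the parameter shape are assumptions.
type EntityForm = "readability" | "compatibility";

function convertNonAscii(input: string, form: EntityForm): string {
  if (form === "compatibility") {
    // Replace every non-ASCII code point with a hexadecimal reference.
    // The `u` flag makes the pattern match whole code points rather than
    // individual UTF-16 code units.
    return input.replace(
      /[^\x00-\x7F]/gu,
      (char: string): string =>
        `&#x${char.codePointAt(0)!.toString(16).toUpperCase()};`,
    );
  }
  // "readability": decode numeric references, but only those above ASCII,
  // so references to markup-significant characters stay escaped.
  return input.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
    (match: string, hex?: string, dec?: string): string => {
      const codePoint = hex !== undefined
        ? parseInt(hex, 16)
        : parseInt(dec!, 10);
      const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
      if (codePoint <= 0x7F || codePoint > 0x10FFFF || isSurrogate) {
        return match;
      }
      return String.fromCodePoint(codePoint);
    },
  );
}

console.log(convertNonAscii("两只小蜜蜂", "compatibility"));
// &#x4E24;&#x53EA;&#x5C0F;&#x871C;&#x8702;
console.log(convertNonAscii("&#x4E24;&#x53EA;&#x5C0F;&#x871C;&#x8702;", "readability"));
// 两只小蜜蜂
```

A form parameter keeps the two directions discoverable in one place; separate functions would work equally well and might fit better alongside `escape` and `unescape`.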