Function to sanitize string for output #203

Open
LorenzNickel opened this issue May 10, 2019 · 4 comments

Comments

@LorenzNickel
Contributor

I was wondering if we could implement a function that sanitizes a string by removing (or replacing) all characters that cause an error when they are part of an audio response. For example, as far as I remember, an audio string must not contain any smileys, <, > (these are allowed only for SSML markup), &, and probably a lot more characters.
All skills with dynamic content have this problem. For example, a skill that tells you a person's latest tweet always has to create some sort of 'clean' string from that tweet, without emojis and these symbols. Such a function would probably make it impossible to use valid SSML 'commands', like a speech pause, inside the sanitized text. However, you can sanitize only the part of the string you get from the API (or that is dynamic for some other reason) and still use SSML for everything else, even around the sanitized string (just not within parts of it).
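A minimal sketch of that idea, using only the Java standard library. The class name `SsmlTextSanitizer`, the methods `sanitize` and `toSsml`, and the rule for which characters get dropped are all assumptions made for illustration; they are not part of the SDK, and the set of characters Alexa actually rejects is not documented in this thread. The sketch escapes the five XML-reserved characters and drops code points whose Unicode category suggests they are symbols or pictographs (which covers most emoji), then wraps only the sanitized dynamic text in hand-written SSML.

```java
final class SsmlTextSanitizer {
    private SsmlTextSanitizer() {}

    /** Escapes XML-reserved characters and drops symbols such as emoji (heuristic). */
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (Character.isWhitespace(cp)) {
                        out.append(' ');            // normalize tabs and newlines
                    } else if (!isDropped(cp)) {
                        out.appendCodePoint(cp);
                    }
            }
        }
        // Collapse the gaps left behind by removed characters.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    /** Assumption: these Unicode categories are the ones worth removing. */
    private static boolean isDropped(int cp) {
        int type = Character.getType(cp);
        return type == Character.OTHER_SYMBOL      // most emoji and pictographs
            || type == Character.MODIFIER_SYMBOL   // e.g. emoji skin-tone modifiers
            || type == Character.FORMAT            // e.g. zero-width joiner
            || type == Character.CONTROL
            || type == Character.SURROGATE         // unpaired surrogates
            || type == Character.PRIVATE_USE
            || type == Character.UNASSIGNED;
    }

    /** Static SSML stays hand-written; only the dynamic part is sanitized. */
    static String toSsml(String dynamicText) {
        return "<speak>Latest tweet: <break time=\"300ms\"/>"
            + sanitize(dynamicText) + "</speak>";
    }

    public static void main(String[] args) {
        System.out.println(toSsml("Tom & Jerry <3 \uD83D\uDE00"));
    }
}
```

Because the category check is only a heuristic, it also removes ordinary symbols such as © or ^, so a real implementation would still need the authoritative list of problematic characters.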

I think providing such a function would be really helpful (in the other SDKs too), but I also understand if you don't want to have it since it's not directly related. If wanted, I could probably even create a PR for this repo implementing it.
If you want this feature, we could either just remove the characters or try to replace them with an allowed equivalent (for example, & becomes 'and'). Replacing would make the function language-dependent, though, and I'm not sure whether some symbols would still be a problem in certain contexts, for example ordinal numbers. Removing would definitely be much easier.
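To make the trade-off concrete, here is a rough sketch of the replacing variant, again using only the Java standard library. The class `SpokenSymbolReplacer`, its word tables, and the fallback of removing the symbol for unknown languages are hypothetical choices for illustration only. It shows why the function becomes language-dependent: every supported language needs its own table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SpokenSymbolReplacer {
    private SpokenSymbolReplacer() {}

    // Hypothetical word tables, keyed by ISO 639 language code.
    private static final Map<String, Map<Character, String>> WORDS = new HashMap<>();
    static {
        Map<Character, String> en = new HashMap<>();
        en.put('&', "and");
        en.put('<', "less than");
        en.put('>', "greater than");
        WORDS.put("en", en);

        Map<Character, String> de = new HashMap<>();
        de.put('&', "und");
        de.put('<', "kleiner als");
        de.put('>', "gr\u00F6\u00DFer als");
        WORDS.put("de", de);
    }

    /** Replaces &, < and > with spoken words, or removes them if the language is unknown. */
    static String replace(String input, Locale locale) {
        Map<Character, String> words = WORDS.get(locale.getLanguage());
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '&' || c == '<' || c == '>') {
                String word = (words == null) ? null : words.get(c);
                // Pad with spaces so "Tom&Jerry" does not become "TomandJerry".
                out.append(' ').append(word == null ? "" : word).append(' ');
            } else {
                out.append(c);
            }
        }
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(replace("Tom & Jerry", Locale.ENGLISH));
        System.out.println(replace("Tom&Jerry", Locale.GERMAN));
    }
}
```

This sketch ignores context entirely, so it does nothing for cases like ordinal numbers, and it falls back to plain removal for any language without a table.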

Independently of this decision, I think we should at least provide a list somewhere of characters that are 'forbidden' or known to cause trouble. (I'd need it for a PR too; I already know some, but we'd need a complete list.)

@jbrucej

jbrucej commented May 27, 2019

Ditto. Would be helpful. Or perhaps just give us a list of those that are valid and/or invalid so we can write our own. Something like what's in the XML spec.

@kkocel
Contributor

kkocel commented Mar 7, 2022

I use org.apache.commons.text.StringEscapeUtils.escapeXml11 to sanitize input, and it works very well.

@rahulawl
Contributor

Is this issue/feature request still relevant?
We are working on prioritizing relevant issues and cleaning up the rest. If we don't hear back within 2 weeks, we will assume that the issue is no longer relevant and close it.

@LorenzNickel
Contributor Author

Yes, I think this issue is still relevant. I'm no longer actively writing Alexa skills, but as long as you have not published a specification and/or created such a sanitization function, it's still relevant.


5 participants