Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change AntiSamy to eventually use SAX parser by default, but allow DOM parser to be used for backward compatibility #800

Open
kwwall opened this issue Sep 30, 2023 · 0 comments
Assignees

Comments

@kwwall
Copy link
Contributor

kwwall commented Sep 30, 2023

After a discussion with the AntiSamy team, at some point in the not too distant future, they would like to deprecate their use of the DOM parser and only support the SAX parser. This

For ESAPI, that simple change could be done as simply as:

+++ b/src/main/java/org/owasp/esapi/reference/validation/HTMLValidationRule.java
@@ -239,7 +239,7 @@ public class HTMLValidationRule extends StringValidationRule {
 
         try {
             AntiSamy as = new AntiSamy();
-            CleanResults test = as.scan(canonical, antiSamyPolicy);     // Uses AntiSamy.DOM scanner by default.
+            CleanResults test = as.scan(canonical, antiSamyPolicy, AntiSamy.SAX);     // Use the faster SAX scanner.
 
             List<String> errors = test.getErrorMessages();
             if ( !errors.isEmpty() ) {

Unfortunately, that simple change also causes a few existing ESAPI tests to fail, so while we can "adjust" these tests accordingly, it does seem to point out (perhaps because of bugs in one or both parsers) that using the DOM vs SAX parser is not 100% compatible.

Therefore, what we need is a new property (I propose 'Sanitizer.parser', with possible values of either SAX or DOM and some appropriate comments, so that while we have the ESAPI implementation default to AntiSamy.DOM (the current behavior), but allowing clients to easily switch it to using AntiSamy.DOM by setting

Sanitizer.parser=DOM

We will, of course, have to clearly state all this in our release notes and elsewhere, which no doubt many will ignore, thus causing us to answer the question endlessly of why things break when we eventually flip 'Santizer.parser' to 'SAX', but at least we can give a heads up to those who are paying attention.

@kwwall kwwall self-assigned this Sep 30, 2023
@kwwall kwwall changed the title Change AntiSamy to use SAX parser by default, but allow DOM parser to be used for backward compatibility Change AntiSamy to eventually use SAX parser by default, but allow DOM parser to be used for backward compatibility Sep 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant