DefaultEncoder / getCanonicalizedURI returns mix encoding for HTML special characters #824

Closed
xeno6696 opened this issue Jan 22, 2024 Discussed in #823 · 5 comments

Comments

@xeno6696
Collaborator

Discussed in #823

Originally posted by krog78 January 19, 2024

Hi,

DefaultEncoder / getCanonicalizedURI reports mixed encoding for HTML special characters in the query string (and does not seem to canonicalize the parameter values, despite the code comment saying it does):

//In the case of a uri query, we need to break up and canonicalize the internal parts of the query.

Canonicalization is also applied to the scheme, host, port and UriSegment.SCHEMSPECIFICPART; is that really relevant?

Thanks,
Regards,
Sylvain

xeno6696 added a commit to xeno6696/esapi-java-legacy that referenced this issue Jan 22, 2024
…javadoc to indicate that the method takes into consideration canonicalization of mixed/multi encoded URLs as specified in ESAPI.props 'allowMixed' and 'allowMultiple' accordingly.
@xeno6696
Collaborator Author

xeno6696 commented Jan 22, 2024

Hi,
Sorry for the delayed response.

We do indeed have both parameters set to false:

Encoder.AllowMultipleEncoding=false
Encoder.AllowMixedEncoding=false

The URL we are using looks like /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram. This is a relative URL, but I think the problem also occurs with a full URL. The following warnings are logged:

22-Jan-2024 10:03:28.231 AVERTISSEMENT [http-nio-8080-exec-8] org.owasp.esapi.logging.java.JavaLogLevelHandlers.log [SECURITY FAILURE Anonymous:58505@unknown -> 0:0:0:0:0:0:0:1:8080/eTemptation/Encoder] Mixed encoding (2x) detected in /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram
22-Jan-2024 10:03:52.919 AVERTISSEMENT [http-nio-8080-exec-8] org.owasp.esapi.logging.java.JavaLogLevelHandlers.log [SECURITY FAILURE Anonymous:58505@unknown -> 0:0:0:0:0:0:0:1:8080/eTemptation/Encoder] Mixed encoding (2x) detected in d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram

The warning is produced when seg = SCHEMSPECIFICPART and again when seg = QUERY, because of this line:

esapi-java-legacy/src/main/java/org/owasp/esapi/reference/DefaultEncoder.java

Line 571 in 2136292
String value = canonicalize(parseMap.get(seg), allowMultiple, allowMixed);
(the full line is canonicalized).

Note: the parameters of the canonicalize function are named restrictMultiple and restrictMixed, but we are passing allowMultiple and allowMixed. Is that intended?

The first HTMLEntityCodec decodes the string as:

/webapp/ux/home?d=1705914006565&status=login&ticket=1705914653964_thWhiiFp_VESwCkQ-Rq0TU0LZWVKuRxpSUmOzIMsZNCcUIiYGMXX_Q%3D%3D≠wsess=false&roleid=DP010101/0007∨igin=ourprogram

&or has been interpreted as an HTML special character. Is that normal? I tested the following code in Chrome, Firefox and Edge, and none of them interpreted the special character: Art and Copy.

How should we validate such URLs (containing HTML special characters)?

Thanks,
Regards,
Sylvain

Moved @krog78's comment here.

@xeno6696
Collaborator Author

Quick notes:

Unwrapped URL as-is:

/webapp/ux/home?
d=1705914006565
&status=login
&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D
&newsess=false
&roleid=DP010101/0007
&origin=ourprogram

Percent-decoded:

/webapp/ux/home?
d=1705914006565
&status=login
&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q==
&newsess=false
&roleid=DP010101/0007
&origin=ourprogram

Scanning both unwrapped versions for HTML entities finds nothing.
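The check above can be approximated with the JDK alone. This is only a rough sketch, not ESAPI code: the class and method names are made up for illustration, and "HTML entity" is assumed to mean the strict named form (an ampersand, a name, and a terminating semicolon). Numeric entities are not covered.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ESAPI.
class EntityScan {
    // Assumption: only the strict "&name;" form counts as an entity here.
    private static final Pattern STRICT_NAMED_ENTITY =
            Pattern.compile("&[A-Za-z][A-Za-z0-9]*;");

    // Percent-decodes the input as UTF-8 (requires Java 10 or later).
    public static String percentDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // True if the input contains at least one semicolon-terminated named entity.
    public static boolean hasStrictEntity(String s) {
        return STRICT_NAMED_ENTITY.matcher(s).find();
    }

    public static void main(String[] args) {
        String raw = "/webapp/ux/home?d=1705914006565&status=login"
                + "&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D"
                + "&newsess=false&roleid=DP010101/0007&origin=ourprogram";
        String decoded = percentDecode(raw);
        System.out.println(decoded);
        System.out.println("entity in raw:     " + hasStrictEntity(raw));
        System.out.println("entity in decoded: " + hasStrictEntity(decoded));
    }
}
```

Under that strict definition neither form of the URL contains an entity, since no parameter name is followed by a semicolon.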

@xeno6696
Collaborator Author

xeno6696 commented Jan 22, 2024

Found it. As discussed in #823, the first call, which passes the entire query string to the canonicalize method on line 541, generates the false positive.

Further research is necessary to determine exactly what is being detected because sweeping the input against standard HTML decoding (NOT ESAPI) results in zero change to the output. (There's no collision, so what gives?)

@xeno6696
Collaborator Author

Not sure what to make of this one.

image

HTMLDecode absolutely transforms output here when it's not expected to.

@xeno6696
Collaborator Author

Issue 1: the call to canonicalize on line 541 attempts an early canonicalization of the query. We're not supposed to touch queries until we've split them into key/value pairs. This will be resolved by moving line 541 into the else block of the check for the QUERY segment. THAT will partially mitigate the problem by ensuring the check is done at the correct location.
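A rough illustration of the ordering Issue 1 describes, split first and only then process each part, might look like the following. It is a sketch under assumptions, not the DefaultEncoder code: the class and method names are hypothetical, and java.net.URLDecoder stands in for the per-part canonicalize call.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the ESAPI implementation.
class QuerySplitSketch {
    // Splits the raw query on '&' and '=' first, then decodes each key and
    // value on its own, so the '&' separators never reach a decoder.
    public static Map<String, String> splitThenDecode(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Stand-in for a per-part canonicalize call (an assumption).
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "d=1705914006565&status=login"
                + "&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D"
                + "&newsess=false&roleid=DP010101/0007&origin=ourprogram";
        splitThenDecode(query).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because the split happens before any decoding, sequences such as &newsess or &origin are never presented to an entity decoder as a unit; each decoder only sees newsess, false, origin, and so on.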

Issue 2: Determine why the input /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram results in a transformation to /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D≠wsess=false&roleid=DP010101/0007∨igin=ourprogram with the microscopic view of the text being:

&newsess=false&roleid=DP010101/0007&or
into
≠wsess=false&roleid=DP010101/0007∨

It appears that I solved that by looking at this. The HTMLEntityCodec is translating &ne into ≠, and the &or detection is a legitimate bug that I'm staring at right now. At any rate, combined with the percent-encoded characters in the original input, that's a mixed encoding exception before we even get to the &or.

I'm stumped as to why we're translating that &or however. This is just strange.

The false-positive (FP) issue is an easy fix and can go out with the next point release, but the misdetection on &or... who knows. I think that's its own issue.
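One way a transformation like this can arise is a decoder that accepts a named entity without its terminating semicolon and matches it as a prefix of whatever follows the ampersand. The toy below reproduces the observed output under that assumption. It is not the HTMLEntityCodec source, the thread does not confirm that this is the actual cause, and the class name and two-entry table are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy decoder, not ESAPI's HTMLEntityCodec.
class LenientEntitySketch {
    // Only the two names relevant to this issue; a real codec has the full list.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("ne", "\u2260"); // not-equal sign
        ENTITIES.put("or", "\u2228"); // logical-or sign
    }

    // Replaces "&name" with its character even when no ';' follows.
    public static String lenientDecode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                String matched = null;
                for (String name : ENTITIES.keySet()) {
                    if (input.startsWith(name, i + 1)) {
                        matched = name;
                        break;
                    }
                }
                if (matched != null) {
                    out.append(ENTITIES.get(matched));
                    i += 1 + matched.length();
                    if (i < input.length() && input.charAt(i) == ';') {
                        i++; // consume the optional terminator
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lenientDecode("&newsess=false&roleid=DP010101/0007&or"));
    }
}
```

With this table, &newsess loses its first three characters to U+2260 and the trailing &or becomes U+2228, while &roleid is left alone because no table entry is a prefix of roleid. That matches the before/after pair quoted above.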

xeno6696 added a commit to xeno6696/esapi-java-legacy that referenced this issue Jan 23, 2024
…se block of the check to see whether or not we were dealing with a query segment.
@kwwall kwwall closed this as completed in f45876f May 27, 2024