Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode method doesn't work proper for some strings #814

Open
mukesh4804 opened this issue Nov 24, 2023 · 3 comments
Open

decode method doesn't work proper for some strings #814

mukesh4804 opened this issue Nov 24, 2023 · 3 comments

Comments

@mukesh4804
Copy link

mukesh4804 commented Nov 24, 2023

Describe the bug
decode method doesn't work proper for some strings

Specify what ESAPI version(s) you are experiencing this bug in
2.5.2

To Reproduce
https://github.com/ESAPI/esapi-java-legacy
I am using the test folder of above git repository .

public class HTMLEntityCodecTest {
Codec codec = new HTMLEntityCodec();

@Test
public void testEntityDecoding(){
    assertEquals("#", codec.decode("#"));
    
}

}

Expected behavior
true

decode method should return # instead of νm .

Here '&num' encoded should be returning '#', but instead its returning 'vm' that is 'v' for '&nu' and 'm' for 'm'

https://dev.w3.org/html5/html-author/charref

Screenshots
If applicable, add screenshots to help explain your problem.
[NOTE: Please do NOT just ask general questions here as they will not be answered. Instead, please use the GitHub Discussions and create a new topic under 'Questions and Answers".
Please delete any irrelevant portion of this template before submitting your GitHub issue. Thanks.]

Platform environment (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • JDK version used with ESAPI

Additional context
Add any other context about the problem here.
If known, please select the label corresponding to the affected ESAPI component.

@mukesh4804 mukesh4804 added the bug label Nov 24, 2023
@planetlevel
Copy link

It's not exactly wrong as some parsers will decode HTML entities without the trailing semicolon.

So this test is technically ambiguous. Remember we are trying to encode/decode/canonicalize in a way that makes attacks difficult or impossible. We are not necessarily trying to strictly implement a spec. The real world implementations are more complex than specs.

Still, this issue keeps coming up over the years. Especially in the case where the semicolon is present it seems like we are producing the result least likely to be right.

Simply switching to using the longest (instead of shortest) possible match would fix a lot.

@mukesh4804
Copy link
Author

@planetlevel , Agreed with statement "Simply switching to using the longest (instead of shortest) possible match would fix a lot."
Do we expect any tentative timeline for the fix?

@kwwall
Copy link
Contributor

kwwall commented Nov 25, 2023

@mukesh4804 wrote:

Do we expect any tentative timeline for the fix?

Hard to say, but I can almost guarantee that it will be at least 6 months or longer should we decide to prioritize a fix for this. We tend to prioritize fixes that have security implications and also focus on ones that have the broadest impact to the security community. Not that many ESAPI clients use the decoders in the first place. In fact, the OWASP Java Encoder project does not even offer decoders! They just generally are not needed.

The implementation of the ESAPI output encoders were intentionally written to provide a "safe harbor" to users who used a popular set of badly broken and vulnerable browsers from a certain company whose first goal was definitely not security and who played fast and loose with the W3C specs. It might be argued that the "specs are the only things that (should) matter", but attackers don't care about adherence to specs, but only what the actual behavior is. Thus this "safe harbor", tended in the direction of aggressively encoding things. As a result, I suspect that the decoders were designed to match so that the respective encoder / decoder acted as inverse operations. So this is only speculation but that may be why the decoders seem to be operating on shortest match rather than the longest match.

That said, I think the ESAPI team would address this as a low priority bug. Its hard to see (outside of contrived cases of course) how such decoder bugs not following the W3C specs to the letter is going to cause a security problem. For output encoding, certainly. But most often when I see decoders being used, that raises a red flag in my mind. From a security perspective at least, decoders really shouldn't be necessary. Decoding should be left to the purview of the end user's browsers. Of all the hundreds of security code reviews that I've done, I can never really recall where an ESAPI decoder was actually needed. At best, it was unnecessary and mostly harmless. Hence, to me, this is a Low priority.

And given that it's a low priority, it honestly may never get fixed unless someone creates a PR to address it. So, it this is important to you, I'd say submit a PR for it (see the CONTRIBUTING-TO-ESAPI.txt file). But even then, it's likely that it will still be 6 months or so until the next ESAPI release unless we need to patch some security issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants