Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mojo::DOM doesn't detect end of script elements correctly #2026

Open
mauke opened this issue Jan 23, 2023 · 4 comments
Open

Mojo::DOM doesn't detect end of script elements correctly #2026

mauke opened this issue Jan 23, 2023 · 4 comments

Comments

@mauke
Copy link
Contributor

mauke commented Jan 23, 2023

  • Mojolicious version: 9.31
  • Perl version: v5.36.0
  • Operating system: Linux (Ubuntu 22.04)

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Test::More;
use Mojo::DOM;

for my $fragment (
    '<script> console.log("<!--"); </script>',
    '<script> console.log("<!-- <script> -->"); </script>',
    '<script> console.log("<!-- <script> </script>"); </script>',
    '<script> console.log("<!-- <script> </script> -->"); </script>',
) {
    my $dom = Mojo::DOM->new("<!DOCTYPE html>\n$fragment");

    is_deeply
        $dom->find('script')->map(sub { $_->to_string })->to_array,
        [$fragment],
        "HTML fragment '$fragment' parses correctly";
}

done_testing;

Expected behavior

Test passes.

Actual behavior

ok 1 - HTML fragment '<script> console.log("<!--"); </script>' parses correctly
ok 2 - HTML fragment '<script> console.log("<!-- <script> -->"); </script>' parses correctly
not ok 3 - HTML fragment '<script> console.log("<!-- <script> </script>"); </script>' parses correctly
#   Failed test 'HTML fragment '<script> console.log("<!-- <script> </script>"); </script>' parses correctly'
#   at mojo-dom-bug-6.pl line 16.
#     Structures begin differing at:
#          $got->[0] = '<script> console.log("<!-- <script> </script>'
#     $expected->[0] = '<script> console.log("<!-- <script> </script>"); </script>'
not ok 4 - HTML fragment '<script> console.log("<!-- <script> </script> -->"); </script>' parses correctly
#   Failed test 'HTML fragment '<script> console.log("<!-- <script> </script> -->"); </script>' parses correctly'
#   at mojo-dom-bug-6.pl line 16.
#     Structures begin differing at:
#          $got->[0] = '<script> console.log("<!-- <script> </script>'
#     $expected->[0] = '<script> console.log("<!-- <script> </script> -->"); </script>'
1..4
# Looks like you failed 2 tests of 4.
@kraih kraih added the bug label Jan 29, 2023
@kraih
Copy link
Member

kraih commented Jan 29, 2023

For the future, please link to the relevant spec sections, it saves us a lot of time and you must have looked it up already anyway before opening the issue.

What is going on here is that for legacy reasons, "<!--" and "<script" strings in script elements in HTML need to be balanced in order for the parser to consider closing the block.

The full explanation can be found here.

@kraih
Copy link
Member

kraih commented Jan 29, 2023

Given the possible security implications, i'd say this is a high priority bug.

@kraih kraih added the security label Jan 29, 2023
@poti1
Copy link

poti1 commented Jun 28, 2023

Consider: https://www.rfc-editor.org/rfc/rfc1866

3.2.5. Comments

   To include comments in an HTML document, use a comment declaration. A
   comment declaration consists of `<!' followed by zero or more
   comments followed by `>'. Each comment starts with `--' and includes
   all text up to and including the next occurrence of `--'. In a
   comment declaration, white space is allowed after each comment, but
   not before the first comment.  The entire comment declaration is
   ignored.

      NOTE - Some historical HTML implementations incorrectly consider
      any `>' character to be the termination of a comment.

   For example:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HEAD>
    <TITLE>HTML Comment Example</TITLE>
    <!-- Id: html-sgml.sgm,v 1.5 1995/05/26 21:29:50 connolly Exp  -->
    <!-- another -- -- comment -->
    <!>
    </HEAD>
    <BODY>
    <p> <!- not a comment, just regular old data characters ->

@kraih
Copy link
Member

kraih commented Jun 28, 2023

See here for which specs we follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants