Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mojo::DOM doesn't recognize end of comment (when it should) #2030

Open
mauke opened this issue Jan 29, 2023 · 4 comments
Open

Mojo::DOM doesn't recognize end of comment (when it should) #2030

mauke opened this issue Jan 29, 2023 · 4 comments

Comments

@mauke
Copy link
Contributor

mauke commented Jan 29, 2023

  • Mojolicious version: 9.31
  • Perl version: v5.36.0
  • Operating system: Ubuntu 22.04.1 LTS

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Test::More;
use Mojo::DOM;

for my $fragment (
    '<!--> <p>OK</p> <!-- -->',
    '<!---> <p>OK</p> <!-- -->',
    '<!-- --!> <p>OK</p> <!-- -->',
) {
    my $dom = Mojo::DOM->new("<!DOCTYPE html>\n$fragment");

    is_deeply
        $dom->find('p')->map(sub { $_->to_string })->to_array,
        ['<p>OK</p>'],
        "HTML fragment '$fragment' parses as comment + p + comment";
}

done_testing;

Expected behavior

Test passes.

After <!-- we are in the comment start state. If the very next characters are > or ->, an abrupt-closing-of-empty-comment error occurs. If the parser doesn't abort entirely, it must treat the comment as closed and switch back to the data state.

Otherwise, if we see --! inside a comment, we switch to the comment end bang state. If the very next character is a >, an incorrectly-closed-comment error occurs. If the parser doesn't abort entirely, it must treat the comment as closed and switch back to the data state.

Actual behavior

not ok 1 - HTML fragment '<!--> <p>OK</p> <!-- -->' parses as comment + p + comment
#   Failed test 'HTML fragment '<!--> <p>OK</p> <!-- -->' parses as comment + p + comment'
#   at mojo-dom-bug-8.pl line 15.
#     Structures begin differing at:
#          $got->[0] = Does not exist
#     $expected->[0] = '<p>OK</p>'
not ok 2 - HTML fragment '<!---> <p>OK</p> <!-- -->' parses as comment + p + comment
#   Failed test 'HTML fragment '<!---> <p>OK</p> <!-- -->' parses as comment + p + comment'
#   at mojo-dom-bug-8.pl line 15.
#     Structures begin differing at:
#          $got->[0] = Does not exist
#     $expected->[0] = '<p>OK</p>'
not ok 3 - HTML fragment '<!-- --!> <p>OK</p> <!-- -->' parses as comment + p + comment
#   Failed test 'HTML fragment '<!-- --!> <p>OK</p> <!-- -->' parses as comment + p + comment'
#   at mojo-dom-bug-8.pl line 15.
#     Structures begin differing at:
#          $got->[0] = Does not exist
#     $expected->[0] = '<p>OK</p>'
1..3
# Looks like you failed 3 tests of 3.
@poti1
Copy link

poti1 commented Jun 24, 2023

The entire block/line is treated like a comment:

perl -Mojo -E 'my $x = x "<!-- --!> <p>OK</p> <!-- -->"; say r $x'
bless( do{\(my $o = bless( {
    "tree" => [
      "root",
      [
        "comment",
        " --!> <p>OK</p> <!-- ",
        ${$VAR1}->{"tree"}
      ]
    ]
  }, 'Mojo::DOM::HTML' ))}, 'Mojo::DOM' )

@poti1
Copy link

poti1 commented Jun 24, 2023

In Mojo::DOM::HTML:

In $TOKEN_RE, although the comment portion is not greedy, it still consumes too much:

--(.*?)--\s*

@poti1
Copy link

poti1 commented Jun 24, 2023

Very interesting. Although we are using (.*?), it appears that there is perhaps some optimization when matching the trailing > on the line with:
)>

@poti1
Copy link

poti1 commented Jun 24, 2023

Please take a look at the PR and let me know if that would work for you.

I included the test cases from the description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants