Floki.parse differs when using html5ever #236

Open
andyleclair opened this issue Nov 8, 2019 · 4 comments

@andyleclair

andyleclair commented Nov 8, 2019

Description

Floki produces different output with html5ever than with the default Mochiweb parser. With html5ever, the output of Floki.parse is wrapped in <html><head></head><body>...</body></html>.

To Reproduce

Steps to reproduce the behavior:

  • Using Floki v0.23.0
  • Using html5ever
  • Using Elixir v1.9.3
  • Using Erlang OTP v21.3.8.9
  • With this code:
defmodule TestCases do
  @test_cases [
    {
      ~s[<a href="javascript:alert('XSS');">Click here</a>],
      ~s[<a href="#">Click here</a>]
    },
    {
      ~s[<a href="whatever" onclick="alert('XSS');">Click here</a>],
      ~s[<a href="whatever">Click here</a>],
    },
    {
      ~s[<body onload="alert('XSS')"><p>Hello</p></body>],
      ~s[<body><p>Hello</p></body>],
    },
    {
      ~s[<img src="javascript:alert('XSS');">],
      ~s[<img src="#"/>],
    },
    {
      ~s[<script>alert('XSS');</script>],
      ~s[],
    },
    {
      ~s[<body background="javascript:alert('XSS');"><p>Hello</p></body>],
      ~s[<body background="#"><p>Hello</p></body>],
    },
    {
      ~s[<style>body { background-image: expression('alert("XSS")'); }</style>],
      ~s[<style>body { background-image: removed_by_strip_js('alert("XSS")'); }</style>],
    },
    {
      ~s[<style>body { background-image: url('javascript:alert("XSS")'); }</style>],
      ~s[<style>body { background-image: url('removed_by_strip_js:alert("XSS")'); }</style>],
    },
    {
      ~s[<style><script>alert('XSS')</script></style>],
      ~s[<style><script>alert('XSS')</script></style>],
    },
    {
      ~s[<style> h1 > a { color: red; } </style>],
      ~s[<style> h1 > a { color: red; } </style>],
    },
    {
      ~s[<],
      ~s[&lt;],
    },
    {
      ~s[>],
      ~s[&gt;],
    },
    {
      ~s[],
      ~s[],
    },
  ]

  def test_cases, do: @test_cases
end

TestCases.test_cases |> Enum.map(fn {ins, _outs} -> Floki.parse(ins) end)

[
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [],
        [{"a", [{"href", "javascript:alert('XSS');"}], ["Click here"]}]}
     ]}
  ],
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [],
        [
          {"a", [{"href", "whatever"}, {"onclick", "alert('XSS');"}],
           ["Click here"]}
        ]}
     ]}
  ],
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [{"onload", "alert('XSS')"}], [{"p", [], ["Hello"]}]}
     ]}
  ],
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [], [{"img", [{"src", "javascript:alert('XSS');"}], []}]}
     ]}
  ],
  [
    {"html", [],
     [{"head", [], [{"script", [], ["alert('XSS');"]}]}, {"body", [], []}]}
  ],
...
]

Expected behavior

I'd expect the output to match the output of calling this without the html5ever parser, that is, just the parsed fragments themselves.

@andyleclair andyleclair added the Bug label Nov 8, 2019
@philss
Owner

philss commented Nov 14, 2019

@andyleclair Thank you for opening the issue.

This problem exists because we don't treat parsing a fragment differently from parsing a document, when we should. html5ever parses fragments as full documents because we (Floki) don't make that distinction when calling it.

I'm planning to add a Floki.parse_fragment to distinguish it from the standard Floki.parse, because the HTML spec defines them as different algorithms. With that, we can call the correct functions on html5ever's side.

This should be fixed once I finish the work on the internal parser (#204).

@andyleclair
Author

I see that this issue got closed. Was there any resolution? We are currently handling the specific case of a fragment wrapped in the default wrapper, but I'd love to tear that code out.
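Until fragments are parsed with a separate algorithm, the workaround mentioned above amounts to unwrapping the document wrapper after parsing. The following is a minimal sketch, not part of Floki's API; the module `FragmentUnwrap` and the function `unwrap/1` are hypothetical names. It assumes the html5ever output has exactly the shape shown in the report (a single `html` node containing one `head` and one `body`) and concatenates the children of both, since a bare `<script>` ends up in `<head>`.

```elixir
defmodule FragmentUnwrap do
  @moduledoc """
  Hypothetical workaround: strips the <html><head></head><body>...</body></html>
  wrapper that appears when html5ever parses a fragment as a full document.
  """

  # Matches the wrapper shape from the report: one "html" node whose children
  # are one "head" and one "body". Children of both are kept, in that order,
  # so content hoisted into <head> (e.g. a bare <script>) is not lost.
  def unwrap([
        {"html", _html_attrs,
         [{"head", _head_attrs, head}, {"body", _body_attrs, body}]}
      ]) do
    head ++ body
  end

  # Any other tree is returned unchanged.
  def unwrap(tree), do: tree
end

# Example using the first tree from the report's output:
tree = [
  {"html", [],
   [
     {"head", [], []},
     {"body", [], [{"a", [{"href", "javascript:alert('XSS');"}], ["Click here"]}]}
   ]}
]

FragmentUnwrap.unwrap(tree)
# => [{"a", [{"href", "javascript:alert('XSS');"}], ["Click here"]}]
```

Because the match is purely structural, it cannot tell a wrapper added by the parser from one that was in the input. For `<body onload="alert('XSS')"><p>Hello</p></body>` it returns only `[{"p", [], ["Hello"]}]`, dropping the `body` element and its attributes.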

@philss
Copy link
Owner

philss commented Jun 7, 2020

@andyleclair It was not fixed; it's a known issue. I had kept the issue pinned in the issues list, but I will leave it open too.

@philss philss reopened this Jun 7, 2020
@Matsa59

Matsa59 commented Apr 5, 2023

Is this really a problem in Floki? After reading the code, I'm starting to think it comes from html5ever_elixir.
