Proposal: Add Floki.Doc #457

Open
wojtekmach opened this issue May 19, 2023 · 10 comments

@wojtekmach
Contributor

wojtekmach commented May 19, 2023

Hi!

I maintain a tiny Floki wrapper called EasyHTML which adds a struct around nodes and thus we can implement protocols and behaviours. Here's an example:

Mix.install([:easyhtml])

html = """
<!doctype html>
<html>
<body>
  <p class="headline">Hello, World!</p>
</body>
</html>
"""

doc = EasyHTML.parse!(html)

doc
#=> #EasyHTML[<html><body><p class="headline">Hello, World!</p></body></html>]

doc["p.headline"]
#=> #EasyHTML[<p class="headline">Hello, World!</p>]

doc["#bad"]
#=> nil

to_string(doc)
#=> "Hello, World!"

I'd like to add a Floki.Doc struct and a Floki.Doc.parse!/1 function.
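
To make the idea concrete, here is a rough sketch of what such a struct might look like. It is not a proposed implementation: DocSketch is a hypothetical name, and the Access and String.Chars behaviour is an assumption modelled on the EasyHTML example above. It relies only on Floki.parse_document!/1, Floki.find/2 and Floki.text/1.

```elixir
Mix.install([:floki])

defmodule DocSketch do
  # Hypothetical stand-in for the proposed Floki.Doc; not part of Floki.
  defstruct [:nodes]

  @behaviour Access

  # Wraps the regular Floki AST in a struct.
  def parse!(html) when is_binary(html) do
    %__MODULE__{nodes: Floki.parse_document!(html)}
  end

  # doc["selector"] returns a new struct with the matches, or nil if none.
  @impl Access
  def fetch(%__MODULE__{nodes: nodes}, selector) when is_binary(selector) do
    case Floki.find(nodes, selector) do
      [] -> :error
      found -> {:ok, %__MODULE__{nodes: found}}
    end
  end

  # Updates are out of scope for this sketch.
  @impl Access
  def get_and_update(%__MODULE__{}, _selector, _fun) do
    raise ArgumentError, "updates are not supported in this sketch"
  end

  @impl Access
  def pop(%__MODULE__{}, _selector) do
    raise ArgumentError, "pop is not supported in this sketch"
  end

  # to_string/1 returns the text content, as in the EasyHTML example.
  defimpl String.Chars do
    def to_string(%{nodes: nodes}), do: Floki.text(nodes)
  end
end

doc = DocSketch.parse!(~s(<html><body><p class="headline">Hello, World!</p></body></html>))

to_string(doc["p.headline"])
#=> "Hello, World!"

doc["#bad"]
#=> nil
```

Because the nodes live behind a struct, the internal representation could later change without affecting callers.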

Feedback appreciated!

@wojtekmach
Contributor Author

@philss I remember we talked a little bit about it but I don't remember much. :) I think the main concern was we obviously cannot return this from Floki.parse* functions as it would be a major breaking change. I think we solve this with a separate module.

If we go with the struct, I'm curious whether Floki.attr and Floki.attribute functions would work on it or we should have equivalents on the struct module.

Btw, is the distinction between document and fragment such that the former always contains exactly one root element? If so, the struct could have an attributes field, which would make accessing these super convenient. But then again I'd guess working with fragments is more common. So maybe we have two different structs after all?

Hey maybe I do remember parts of our earlier conversations. :)

@philss
Owner

philss commented May 19, 2023

> I'd like to add a Floki.Doc struct and a Floki.Doc.parse!/1 function.

> I think the main concern was we obviously cannot return this from Floki.parse* functions as it would be a major breaking change.

@wojtekmach yeah, I think it's aligned with what we discussed. We wanted to avoid this breaking change, but I think in the future this "Doc.parse" could be the main API. I'm not sure if we discussed what would be the struct, but I imagine it would be the tree representation, like we have in Floki.HTMLTree. Is this what you are thinking?

> If we go with the struct, I'm curious whether Floki.attr and Floki.attribute functions would work on it or we should have equivalents on the struct module.

We would probably want to add support for the new struct on these functions.
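
As a rough illustration of that option, the sketch below unwraps a struct before delegating to the existing Floki.attribute/2. WrappedNodes and AttributeSketch are hypothetical names used only for this example. In Floki itself this would presumably be an extra function clause rather than a separate module.

```elixir
Mix.install([:floki])

defmodule WrappedNodes do
  # Hypothetical struct standing in for the proposed Floki.Doc.
  defstruct [:nodes]
end

defmodule AttributeSketch do
  # Hypothetical wrapper: accepts either the struct or the plain Floki AST.
  def attribute(%WrappedNodes{nodes: nodes}, name), do: Floki.attribute(nodes, name)
  def attribute(nodes, name), do: Floki.attribute(nodes, name)
end

nodes = Floki.parse_fragment!(~s(<p class="headline">Hello, World!</p>))
wrapped = struct!(WrappedNodes, nodes: nodes)

AttributeSketch.attribute(nodes, "class")
#=> ["headline"]

AttributeSketch.attribute(wrapped, "class")
#=> ["headline"]
```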

> Btw, is the distinction between document and fragment such that the former always contains exactly one root element?

Structurally speaking, yes. But semantically, a document is something whose root element is "<html>", and the specs say that we need a <!doctype html> as well (we are just ignoring this part today). Fragments don't have this restriction, but I'm not sure if we should have another struct for them.
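
If it helps the discussion, here is a minimal sketch of what the two-struct option could look like. DocumentSketch and FragmentSketch are hypothetical names, and the field layout is an assumption. It only uses Floki.parse_document!/1 and Floki.parse_fragment!/1, and it ignores the doctype the same way Floki does today.

```elixir
Mix.install([:floki])

defmodule DocumentSketch do
  # Hypothetical: a document has exactly one root element,
  # so its tag and attributes can be exposed as fields.
  defstruct [:tag, :attributes, :children]

  def parse!(html) do
    nodes = Floki.parse_document!(html)

    # Keep only element nodes; this skips doctype, comments and
    # processing instructions, whose first tuple element is an atom.
    case Enum.filter(nodes, &match?({tag, _, _} when is_binary(tag), &1)) do
      [{tag, attrs, children}] ->
        %__MODULE__{tag: tag, attributes: Map.new(attrs), children: children}

      other ->
        raise ArgumentError, "expected exactly one root element, got #{length(other)}"
    end
  end
end

defmodule FragmentSketch do
  # Hypothetical: a fragment is just a list of top-level nodes.
  defstruct [:nodes]

  def parse!(html), do: %__MODULE__{nodes: Floki.parse_fragment!(html)}
end

doc = DocumentSketch.parse!(~s(<!doctype html><html lang="en"><body><p>Hi</p></body></html>))

doc.attributes["lang"]
#=> "en"

fragment = FragmentSketch.parse!(~s(<p class="p1">foo</p><p class="p2">bar</p>))

length(fragment.nodes)
#=> 2
```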

Something that can help us if we go for two structs is the specs (they are too complex, so we shouldn't worry that much):

> Hey maybe I do remember parts of our earlier conversations. :)

:D

@wojtekmach
Contributor Author

Sorry, I wasn’t aware of the HTMLTree struct. I didn’t really look into internals at all. 😅

@viniciusmuller
Contributor

In case this gets implemented, I would suggest the name Floki.Document instead of Floki.Doc, since I read this issue and thought it was something documentation-related.

@wojtekmach
Contributor Author

If, per #463, we have maps as attributes and we add an ~HTML sigil (as a macro) we'd get these map match semantics for free:

html = ~HTML"""
<p class="p1">foo</p>
<p class="p2">bar</p>
"""

# these two are equivalent
assert ~HTML[<p class="p2">bar</p>] = html[".p2"]
assert ~HTML[<p>bar</p>] = html[".p2"]

assert html[".p2"] == ~HTML[<p class="p2">bar</p>]

which is potentially very interesting for testing.
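
The "for free" part comes from how Elixir matches maps. A map pattern matches any map that contains at least the given keys, while a list pattern has to match the whole list. Here is a small plain-Elixir illustration using hand-written attribute data (no Floki involved, and the extra id attribute is made up for the example):

```elixir
# Attributes in the current list-of-tuples shape, and the same data as a map.
list_attrs = [{"class", "p2"}, {"id", "second"}]
map_attrs = Map.new(list_attrs)

# A map pattern only needs the listed keys to be present with these values.
match?(%{"class" => "p2"}, map_attrs)
#=> true

# The empty map pattern matches any map, so a pattern
# without attributes matches an element with attributes.
match?(%{}, map_attrs)
#=> true

# A list pattern must match every element, so a partial list does not match.
match?([{"class", "p2"}], list_attrs)
#=> false
```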

@mischov
Contributor

mischov commented May 31, 2023

@wojtekmach
Contributor Author

Similar how?

FWIW, EasyHTML, mentioned at the beginning, uses the "floki ast", the one returned from Floki.parse* functions. The querying-optimised one in Meeseeks is very interesting. I guess the point is that if we use a struct, we can consider the AST an implementation detail and pick either!

@mischov
Contributor

mischov commented May 31, 2023

Similar in that it already implements the output of both parsing and selection in terms of structs (and provides a nice toolkit for working with those structs), meaning the building blocks are in place for something like EasyHTML.

@wojtekmach
Contributor Author

Ah, makes sense!

@mischov
Contributor

mischov commented May 31, 2023

It also goes beyond a single Node struct and has a top level Document struct, as well as Comment, Data, Doctype, Element, ProcessingInstruction, and Text structs, which is something else to consider.
