Matching Order in LxmlMicrodataExtractor._extract_property_value #160

Open
kelvinso opened this issue Nov 20, 2020 · 1 comment

Comments

@kelvinso

I noticed that the matching order in _extract_property_value seems to be inconsistent with https://www.w3.org/TR/microdata/#values. That document lists "If the element has a content attribute" as the second matching case. In LxmlMicrodataExtractor._extract_property_value, however, that case is second to last in the matching order.

Should this case

    elif node.get("content"):
        return node.get("content")

in w3cmicrodata.py be moved ahead of the meta tag handling at line 186?

Thanks a lot!
Kelvin

@Gallaecio
Member

Gallaecio commented Feb 21, 2021

Yeah, it looks like the specification has changed since 2013 (that code is from 2014). One of the changes allows content on any node, which back in 2013 was non-standard but already supported by extruct.

We should probably review the changes to the standard in general; there may be more surprises.
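
For reference, the ordering discussed in this thread can be sketched roughly as follows. This is only a minimal illustration, not extruct's actual code. The function name `property_value_spec_order` and the lookup table `_ATTR_BY_TAG` are hypothetical, `xml.etree.ElementTree` stands in for lxml so the snippet has no dependencies, URL resolution is skipped, and the itemscope case (a nested item) is left out. It also assumes that "has a content attribute" means the attribute is present, even if empty, which differs from the truthiness test in the snippet quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: element name -> attribute that carries the value.
# It mirrors the element-specific cases in the W3C Microdata "Values" section.
_ATTR_BY_TAG = {
    **dict.fromkeys(
        ("audio", "embed", "iframe", "img", "source", "track", "video"), "src"
    ),
    **dict.fromkeys(("a", "area", "link"), "href"),
    "object": "data",
    **dict.fromkeys(("data", "meter"), "value"),
}


def property_value_spec_order(node):
    """Return a property value, checking `content` before tag-specific attributes."""
    # The itemscope case (value is a nested item) is omitted from this sketch.
    content = node.get("content")
    if content is not None:
        # Assumption: an empty content attribute still counts as present.
        return content
    attr = _ATTR_BY_TAG.get(node.tag)
    if attr is not None:
        # A missing attribute yields an empty string; URLs are not resolved here.
        return node.get(attr, "")
    if node.tag == "time" and node.get("datetime") is not None:
        return node.get("datetime")
    # Fallback: the element's text content.
    return "".join(node.itertext())


link = ET.fromstring(
    '<a itemprop="url" href="https://example.com/" content="override">text</a>'
)
print(property_value_spec_order(link))
```

With `content` checked first, the `a` element above yields its `content` value. An ordering that tests `href` before `content` would return the link target instead, which is the kind of discrepancy this issue is about.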
