HTML validation - Feature PR #180
Comments
It doesn't "fix" HTML; it parses it in accordance with the spec. This is not a fixing mechanism separate from the rest of parsing, but the normal parsing flow, in which some tags are implicit. Having implicit tags doesn't make HTML invalid according to the HTML5 spec; on the contrary, such documents are still totally valid. |
What does "valid" mean? For example, per spec:
So in this case it's saying So is it just that we are talking about different HTML spec versions? My ask is to add |
Well, any HTML is valid; however, it can be non-conforming, in which case the spec says to report a parse error. I believe that having a validator is a good thing for some use cases, e.g. knowing that HTML is conforming justifies that it is safe for parse-serialize round trips, which in turn makes HTML instrumentation safe as well. We had this discussion before: #55. And it is still blocked on whatwg/html#1339. I'll keep it open as there is demand for the feature; however, I wouldn't expect it to be implemented soon. |
@diervo It would be great! |
I think @inikulin has the right intuition here: it is not about validation but about conformance, and if the parser can provide a report about the conformance of the parsed document, that should be sufficient for a developer to do:
|
Personally, conformance checkers just feel like a thing of the past, from when we had to check our HTML with the online W3C tool to be sure that it would be parsed correctly (or parsed at all) by all the different browsers. Now that they all follow the same spec (apart from temporary bugs), that feels less useful, but I certainly don't oppose it if there are valid use cases. |
@RReverser you're right that cross-browser compatibility is not an issue anymore (kinda), but it might be useful to ensure that the provided markup will be interpreted as intended, because in some cases auto-corrections may screw things up. |
@diervo FYI you can run the nu validator locally using |
Yes, we have been poking around with it; the fundamental problem is integration. The fact that it is in Java adds some complexity to our integration scenarios. We will soon start working on the very first step of adding validation to parse5; hopefully we can do it in incremental steps, given all of the possible parse errors and nuances. |
We have done a lot of work in the past on the HTML spec and added the proper error names and descriptions. However, we never finished porting those to parse5, and @inikulin, I haven't forgotten that we still owe you a bunch of work :) |
@diervo Sure, I'd love to finish this work, I've also started to do some spec work for the tree construction stage on my own. Hit me up by email and we'll try to figure out the best time to finish it. |
It seems this feature may be dead, but I would like to resurrect it to say that I personally have been searching for an HTML parser that can perform basic validation/conformance checking. I teach at a university and have built a tool to auto-grade programming projects. It detects and provides feedback on errors/issues in students' code, and of all things HTML has been the hardest to find a parser for. The ideal would be every node being marked with a Boolean of The closest I have been able to find is htmlparser2, but the drawbacks there are:
|
TL;DR: would the owners of this repo be open to introducing a new API to validate a given HTML page or fragment?
Today the parser internally fixes the tree for you (incorrect self-closing tags, missing tags, etc.), giving you the already fixed tree.
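To make the ask concrete, here is a rough TypeScript sketch of what "report the problem instead of silently fixing it" could look like. It is not parse5's API and it does not follow the spec's parsing algorithm: `findTagIssues`, `TagIssue`, and the two issue codes are names made up for illustration, and the check only balances start and end tags with a regular expression. In particular, it also flags end tags that the spec allows you to omit (such as `</p>` or `</li>`), which a real conformance check would accept.

```typescript
// Illustrative only: hypothetical names, not part of parse5 or the HTML spec.
interface TagIssue {
  code: "missing-end-tag" | "stray-end-tag";
  tag: string;
  offset: number; // index in the input of the offending tag
}

// A subset of the void elements, which never take an end tag.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

function findTagIssues(html: string): TagIssue[] {
  const issues: TagIssue[] = [];
  const open: { tag: string; offset: number }[] = [];
  // Naive matcher: it does not understand comments, scripts,
  // or ">" inside attribute values.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  let match: RegExpExecArray | null;

  while ((match = tagPattern.exec(html)) !== null) {
    const isEndTag = match[1] === "/";
    const tag = match[2].toLowerCase();

    if (!isEndTag) {
      // A trailing "/" on a non-void element is ignored, so <div/> stays open.
      if (!VOID_TAGS.has(tag)) open.push({ tag, offset: match.index });
      continue;
    }

    const idx = open.map((o) => o.tag).lastIndexOf(tag);
    if (idx === -1) {
      issues.push({ code: "stray-end-tag", tag, offset: match.index });
      continue;
    }
    // Everything opened after the matching start tag was never closed.
    for (const unclosed of open.splice(idx).slice(1)) {
      issues.push({ code: "missing-end-tag", tag: unclosed.tag, offset: unclosed.offset });
    }
  }

  // Whatever is still open at the end of the input was never closed.
  for (const unclosed of open) {
    issues.push({ code: "missing-end-tag", tag: unclosed.tag, offset: unclosed.offset });
  }
  return issues;
}
```

With this sketch, `findTagIssues("<div><span></div>")` returns a single `missing-end-tag` issue for `span` at offset 5, whereas a parser that only hands back the fixed tree would give no hint that anything was corrected.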
I've been trying to find a good HTML validator, but the only one that is spec-compliant is the one from W3C, which is written in Java and available only as a service, which is very inconvenient for most uses.
I believe that, given that this is the most used/compliant HTML parser, it should be pretty straightforward to add HTML validation.
Rather than creating a fork, I would gladly do a PR if there is no opposition to this feature.
Thoughts?