xml content lost #21
Comments
Hm, this looks like a bug in the XML parsing library we are using (saxjs). I tried the latest version of saxjs and the problem persists. I will dig deeper later. In any case, this parser is only meant to be used with valid XML. If you want to parse HTML, which is usually not well-formed, you should look at libraries designed specifically for parsing HTML.
Hi, saxjs supports this scenario, and I found that the element tail is actually being parsed in your TreeBuilder. The problem appears to be in the _serialize_xml function, which only handles the element text and not the element tail. After I added the tail, the serialized output looks correct.
The fix is to append the tail text right after each tag.
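To illustrate the text/tail distinction described above, here is a minimal sketch. It does not use the elementtree package or its _serialize_xml function; makeElement and serialize are hypothetical helpers that only assume the ElementTree convention where text is the content before the first child and tail is the content after the element's closing tag. Escaping and attributes are left out to keep it short.

```javascript
// Hypothetical minimal element model following the text/tail convention.
// This is an illustration only, not the elementtree package's own code.
function makeElement(tag, text, tail, children) {
  return {
    tag: tag,
    text: text || '',
    tail: tail || '',
    children: children || []
  };
}

// Serialize an element: opening tag, text, children, closing tag, then tail.
function serialize(el) {
  var out = '<' + el.tag + '>' + el.text;
  el.children.forEach(function (child) {
    out += serialize(child);
  });
  out += '</' + el.tag + '>';
  // Without this line, any text following the closing tag would be dropped.
  out += el.tail;
  return out;
}

// <p>Hello <b>bold</b> world</p>: " world" is stored as the tail of <b>.
var b = makeElement('b', 'bold', ' world');
var p = makeElement('p', 'Hello ', '', [b]);
console.log(serialize(p));
```

If the `out += el.tail` line is removed, the " world" fragment disappears from the result, which matches the kind of content loss reported in this issue.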
XML:
Original XML (actually an HTML snippet).
After processing with elementtree:
var result = et.parse(xml);
console.log(result.write());
Output:
Some of the content was lost.