Yes, spacing is not necessarily preserved in code blocks; this can be improved.
adbar changed the title from "santize func remove all white space \ table even in code block when using txt outpput formating" to "Preserve horizontal space in code blocks" on Apr 19, 2024
Hello,
Thanks for your continuous work on trafilatura.
Recently, while using trafilatura for code-text content extraction, we noticed that the sanitize function removes all whitespace and tabs, even inside code blocks, when using the txt output format.
We think the problem is that preserve_space=False is the default here:
https://github.com/adbar/trafilatura/blob/2c9f20296c1c5ce9a23715a07df5b623f3016b65/trafilatura/xml.py#L315C5-L315C51
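To illustrate the effect, here is a minimal, self-contained sketch. It is not trafilatura's actual code: `normalize_line` and `normalize_text` are hypothetical helpers, and they only assume that a `preserve_space` flag decides whether runs of whitespace get collapsed.

```python
import re


def normalize_line(line: str, preserve_space: bool = False) -> str:
    """Hypothetical stand-in for a text sanitizer (not trafilatura's implementation)."""
    if preserve_space:
        # Keep indentation and inner spacing; only drop trailing whitespace.
        return line.rstrip()
    # Collapse every run of whitespace (spaces, tabs) into one space, then trim both ends.
    return re.sub(r"\s+", " ", line).strip()


def normalize_text(text: str, preserve_space: bool = False) -> str:
    # Process the text line by line so that line breaks always survive.
    return "\n".join(normalize_line(line, preserve_space) for line in text.split("\n"))


# A small code snippet indented with tabs and containing a double space.
snippet = "def f(x):\n\tif x:\n\t\treturn  x"

flattened = normalize_text(snippet)                      # indentation is lost
preserved = normalize_text(snippet, preserve_space=True)  # indentation is kept

print(flattened)
print(preserved)
```

With the flag off, the tab indentation disappears and the snippet is no longer valid Python, which matches what we see in the txt output. With the flag on, the snippet comes back unchanged.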