Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RichTextField / RichTextBlock should strip out HTML tags from search-indexable content #6098

Closed
gasman opened this issue Jun 1, 2020 · 3 comments

Comments

@gasman
Copy link
Collaborator

gasman commented Jun 1, 2020

Issue Summary

Currently, when indexing RichTextFields, or RichTextBlocks within StreamFields, the raw content including HTML tags is passed to the index. Since HTML tag names / attributes are not meaningful searchable content*, it would be better to provide a get_searchable_content method on these that applies Django's striptags filter.

More importantly, if you customise the Elasticsearch backend (or run raw Elasticsearch queries) to take advantage of Elasticsearch's highlighting support (see #5340), you currently end up with either stray HTML tags or spurious formatting in the results, depending on whether you passed 'encoder': 'html' in the query.

(* This isn't totally true - for example, you could make a good case for indexing the alt text on images - so an ideal solution would probably involve being able to specify custom rules at the richtext feature level to generate a plain text representation. However, on balance, I think stripping tags out is better than keeping them in.)

Steps to Reproduce

Search for 'h2' on https://www.rca.ac.uk/ . Observe that the results include many pages that contain an <h2> element but not the text "h2"...

  • I have confirmed that this issue can be reproduced as described on a fresh Wagtail project: no
@acarasimon96
Copy link
Contributor

Is it OK if I work on a PR that resolves this issue? I've already implemented the same possible solution above in my project a long time ago.

@gasman
Copy link
Collaborator Author

gasman commented Jun 1, 2020

@acarasimon96 Absolutely, yes please!

@lb-
Copy link
Member

lb- commented Jun 2, 2020

Resolved via #6099

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants