HTML tags are put into full text search index #364

Open
joachim-n opened this issue Mar 12, 2024 · 3 comments

@joachim-n
joachim-n commented Mar 12, 2024

I don't know if this is a bug in upstream Search API or a configuration problem...

The {search_api_db_localgov_directories_index_default_text} table has lots of entries for words which appear to be from HTML tags and attributes:

  • div
  • field
  • 'node content'

This means that if you search for 'field' you get ALL results. For technical words like div or node that doesn't matter so much, but if a school was called something like 'King George V Field', it's going to be hard to search for.
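To illustrate why this happens, here is a minimal Python sketch. It is not Search API's actual tokenizer; the `naive_tokens` and `search` helpers and the sample markup are hypothetical. It assumes only that the rendered HTML reaches the tokenizer without the tags being stripped first, so tag names and class attributes become indexed words.

```python
import re

def naive_tokens(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

# Hypothetical rendered markup for two directory entries (not real site data).
docs = {
    "school": '<div class="field field--name-body">King George V Field</div>',
    "library": '<div class="field field--name-body">Central Library</div>',
}

def search(word):
    """Return the names of all entries whose token list contains the word."""
    return sorted(
        name for name, markup in docs.items() if word in naive_tokens(markup)
    )

# "field" appears in every entry's class attribute, so it matches everything.
print(search("field"))
# "george" only appears in the visible text of one entry.
print(search("george"))
```

In this sketch the library entry matches 'field' purely because of its class attribute, which is the same effect as seeing ALL results on the real index.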

@finnlewis
Member

Thanks @joachim-n
Very interesting!
I'm just looking at the default settings for the index, as the title field does not seem to get indexed at present, on a default install.

I'm updating that to add the HTML filter, which also adds boosting.

I wonder if adding the HTML filter will ALSO filter out all the HTML tags?

@finnlewis
Member

I think it will. The description of the HTML filter says:

Strips HTML tags from fulltext fields and decodes HTML entities. Use this processor when indexing HTML data – for example, node bodies for certain text formats. The processor also allows to boost (or ignore) the contents of specific elements.
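As a rough picture of what "strips HTML tags and decodes HTML entities" means for the indexed text, here is a small Python sketch using the standard library's `html.parser`. It is only an analogy, not the processor's real implementation; the `TextOnly` class and `strip_html` function are hypothetical names, and boosting is not modelled.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect only the text nodes; tag names and attributes are discarded."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(markup):
    """Return the visible text of the markup with whitespace normalised."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

# Hypothetical markup; only the visible text survives.
print(strip_html('<div class="field field--name-body">Central Library</div>'))
print(strip_html('<div class="field"><p>Fish &amp; Chips</p></div>'))
```

With the markup removed before tokenizing, words like 'div' and 'field' would only be indexed when they appear in the visible text.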

@finnlewis
Member

Indeed. As you mention, @joachim-n, searching for "field" brings back all results on a default install of localgov_directories. You can see this by enabling the localgov_demo module and searching for "field" on the collaborators demo content:

https://demo.localgovdrupal.org/localgov-drupal-collaborators?search_api_fulltext=field

This pull request adds the HTML filter (and also indexes the title field as fulltext), which fixes this behaviour:

#361
