Checkbox for "Search in Content" #2634

Open
1 task done
bitmage opened this issue May 7, 2024 · 2 comments

Comments

bitmage commented May 7, 2024

Kudos on creating a project with maintainable code and great documentation! There's a lot being done right here. 👍

I have a small contribution I would like to make: I would like to add a "Keep" filter that will search the Content, not just the Title. I'm willing to submit a pull request for this, and I'm writing here to give a heads up and gather feedback.

I want to do this with minimal changes. The simplest approach I see is to add a checkbox in the UI for "Search in Content" that would go below the Block Rules and Keep Rules. This causes minimal disruption, since users opt in to the new feature.

I see several open issues with similar requests:

  • Feature request: Block Rules for article body
  • Allow block rules based on website content (wants to potentially use a CSS selector to look at a particular part of the page)
  • Block rules based off of other elements (not just title). (wants to potentially look at the fully loaded page)

The CSS selector and parsing the fully loaded page are a little overkill for my use case, but I'm hoping that a simple checkbox here will expand the usefulness without adding much complexity.

On the code side, it seems that the Entry model already supports the Content field, so the isAllowedEntry function would just need to be updated to take the new checkbox and the Content field into account. The value of the checkbox would need to be stored and retrieved, and the feature would need to be documented. I will write new test(s) for it.
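To make the proposal concrete, here is a rough sketch of how the rule check might look once it takes the checkbox into account. This is not the project's actual code: the `Entry` and `FilterOptions` types, the `SearchContent` field, and the function signature are simplified assumptions of mine for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// Entry is a simplified stand-in for the project's entry model (assumption).
type Entry struct {
	Title   string
	Content string
}

// FilterOptions is a hypothetical container for the per-feed settings.
// SearchContent represents the proposed checkbox.
type FilterOptions struct {
	BlockRule     string
	KeepRule      string
	SearchContent bool
}

// matches reports whether the pattern matches the title, or the content
// when searchContent is enabled. An invalid pattern counts as no match.
func matches(pattern string, entry Entry, searchContent bool) bool {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	if re.MatchString(entry.Title) {
		return true
	}
	return searchContent && re.MatchString(entry.Content)
}

// isAllowedEntry applies the block rule first, then the keep rule.
// With SearchContent false it only ever looks at the title.
func isAllowedEntry(opts FilterOptions, entry Entry) bool {
	if opts.BlockRule != "" && matches(opts.BlockRule, entry, opts.SearchContent) {
		return false
	}
	if opts.KeepRule != "" {
		return matches(opts.KeepRule, entry, opts.SearchContent)
	}
	return true
}

func main() {
	entry := Entry{Title: "Weekly digest", Content: "A post about Golang generics"}
	titleOnly := FilterOptions{KeepRule: "(?i)golang"}
	withContent := FilterOptions{KeepRule: "(?i)golang", SearchContent: true}
	fmt.Println("title only:", isAllowedEntry(titleOnly, entry))
	fmt.Println("with content:", isAllowedEntry(withContent, entry))
}
```

Keeping the checkbox as a single boolean means the default (unchecked) path stays title-only, which is the opt-in behavior described above.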

  • Does anyone know what the status of the Content field is at the time when this function is run?
  • Is Content populated at this point?
  • Does Content just contain the RSS contents, not the fully loaded page contents?
  • Is there anything else I'm missing?

Welcoming any feedback or suggestions.

bitmage commented May 10, 2024

I added code for the new checkboxes which can be seen here. Unit and integration tests pass.

When I build a local docker container and test the UI manually, I notice:

  1. Using the new feature doesn't error, and doesn't seem to interfere with previous functionality...
  2. But it also doesn't properly search Content. I assume this is because the content field isn't loaded at the time processing is happening?

I'll look into it further, but feel free to drop information here if you have any tips.

bitmage commented May 13, 2024

So, while Content wouldn't be loaded as a result of scraping when these filters run, I believe it should be loaded due to the code in rss/adapter.go grabbing it directly from the description field in the RSS feed. I assume this is the adapter that would be run for an RSS feed like Webflow's Discourse.
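As a sanity check on that assumption, here is a minimal standalone sketch of how an RSS item's description can end up in a Content field. It uses only the standard `encoding/xml` package; the struct names, the `entriesFromRSS` helper, and the sample feed are hypothetical and are not taken from rss/adapter.go.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssItem and rssFeed are minimal illustrative structs, not the project's parser types.
type rssItem struct {
	Title       string `xml:"title"`
	Description string `xml:"description"`
}

type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

// Entry is a simplified stand-in for the project's entry model (assumption).
type Entry struct {
	Title   string
	Content string
}

// entriesFromRSS copies each item's description into Entry.Content,
// so Content holds only what the feed ships, not the scraped page.
func entriesFromRSS(data []byte) ([]Entry, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(feed.Items))
	for _, item := range feed.Items {
		entries = append(entries, Entry{Title: item.Title, Content: item.Description})
	}
	return entries, nil
}

// sampleFeed is made-up data for the demonstration.
const sampleFeed = `<rss version="2.0"><channel>
<item><title>First post</title><description><![CDATA[<p>Body from the feed</p>]]></description></item>
</channel></rss>`

func main() {
	entries, err := entriesFromRSS([]byte(sampleFeed))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("title=%q content=%q\n", e.Title, e.Content)
	}
}
```

If the real adapter behaves like this, Content would already be populated from the feed when the filters run, and the bug would have to be elsewhere, for example in how the checkbox value is stored or passed to the filter.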

Still investigating why my code doesn't appear to be searching the Content.
