
GH Action to gather content #1478

Open
jonocarroll opened this issue Oct 21, 2023 · 4 comments

Comments

@jonocarroll
Member

jonocarroll commented Oct 21, 2023

The curation process currently involves:

  1. collecting the last 10 days of RSS entries via `get_rss_posts()`
  2. collecting roughly the last week of new and updated packages from CRANberries via `process_cranberries()`
  3. de-duplicating (within the draft and against the last 20 issues)
  4. adding content found elsewhere
  5. curating posts: filtering out irrelevant or low-quality content and categorising the rest

I believe the first three of these steps can be automated, potentially with a GitHub Action running on a weekly schedule. Getting that content into the draft itself would be a minor addition, but even just collecting it into a committed plaintext file could help editors get to a draft faster.
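To make steps 1 and 3 concrete, here is a minimal Python sketch of the "keep the last 10 days, then de-duplicate within the draft" logic. It is not the project's code (the real tooling is the R functions `get_rss_posts()` and `process_cranberries()`); the `Entry` type, `normalise_url`, and `recent_unique` are hypothetical names, and treating URLs that differ only in host case, trailing slash, or `#fragment` as duplicates is an assumed heuristic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Entry:
    """Hypothetical candidate link gathered from RSS or CRANberries."""
    title: str
    url: str
    published: datetime  # timezone-aware, UTC


def normalise_url(url: str) -> str:
    """Canonicalise a URL so trivial variants compare equal.

    Lower-cases the scheme and host, drops the #fragment, and strips a
    trailing slash from the path; the query string is kept because some
    blogs identify posts by it (e.g. ?p=123).
    """
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def recent_unique(entries: list[Entry], now: datetime, days: int = 10) -> list[Entry]:
    """Keep entries published in the last `days` days, dropping later duplicates.

    Input order is preserved; when two entries share a normalised URL,
    the first one encountered wins.
    """
    cutoff = now - timedelta(days=days)
    seen: set[str] = set()
    kept: list[Entry] = []
    for entry in entries:
        if entry.published < cutoff:
            continue  # outside the collection window
        key = normalise_url(entry.url)
        if key in seen:
            continue  # already collected for this draft
        seen.add(key)
        kept.append(entry)
    return kept


if __name__ == "__main__":
    now = datetime(2023, 10, 21, 9, 0, tzinfo=timezone.utc)
    feed = [
        Entry("Post A", "https://Example.com/post-a/", now - timedelta(days=1)),
        Entry("Post A (syndicated)", "https://example.com/post-a", now - timedelta(days=2)),
        Entry("Old post", "https://example.com/old", now - timedelta(days=30)),
    ]
    print([e.title for e in recent_unique(feed, now)])  # ['Post A']
```

Keeping the first occurrence means whichever source is listed first (e.g. the author's own feed before an aggregator) wins when the same post shows up twice.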

I think I'm able to prototype this myself, but this issue can serve as a place for discussion about improvements or concerns.

@jonocarroll
Member Author

The prototype works! https://github.com/rweekly/rweekly.org/blob/gh-pages/curatinator_latest.md?plain=1 (I forgot to add line breaks, but the concept is sound).

Next I'll add CRANberries collection and de-duplication. The workflow is set to run at 9am UTC every Saturday, but it can also be triggered manually from the Actions tab on GitHub.
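For reference, a weekly run like this is typically declared in a GitHub Actions workflow with an `on.schedule` trigger; 9am UTC every Saturday corresponds to the cron expression `0 9 * * 6` (GitHub evaluates schedules in UTC), and adding a `workflow_dispatch` trigger is what enables manual runs from the Actions tab. The workflow file itself isn't shown in this thread, so that configuration is an assumption. The small Python sketch here (hypothetical helper `next_run`) just makes the schedule concrete by computing the next Saturday 09:00 UTC after a given time.

```python
from datetime import datetime, timedelta, timezone

# Cron numbers Saturday as 6 (Sunday == 0), but Python's datetime.weekday()
# uses Monday == 0 ... Sunday == 6, so Saturday is 5 here.
SATURDAY = 5
RUN_HOUR_UTC = 9


def next_run(now: datetime) -> datetime:
    """Return the next Saturday 09:00 UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first so the
    weekday and hour comparisons match how the schedule is evaluated.
    """
    now = now.astimezone(timezone.utc)
    days_ahead = (SATURDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=RUN_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already past this Saturday's slot: move on to next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    # 21 Oct 2023 (the day this issue was opened) was a Saturday.
    print(next_run(datetime(2023, 10, 21, 8, 0, tzinfo=timezone.utc)))
```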

@jonocarroll
Member Author

I'm quite happy with that! The Action now fetches the RSS feeds and CRANberries, de-duplicates them, and saves the result to `curatinator_latest.md` for copying over to the draft. It still needs de-duplication against past issues, but I wasn't sure how to easily excise those entries.
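One possible approach to the remaining step, sketched in Python under stated assumptions: if the past issues are available as Markdown text with inline `[title](url)` links (where they live in the rweekly.org repo isn't specified here), their link targets can be collected into a set and any candidate whose URL is already in it dropped before writing the file, one entry per line. The helpers `link_key`, `drop_already_published`, and `to_markdown` are hypothetical, and comparing lower-cased URLs without fragments or trailing slashes is a heuristic, not an exact match.

```python
import re
from typing import Iterable

# Captures the target of an inline Markdown link, e.g. [title](https://...),
# including the angle-bracket form [title](<https://...>).
LINK_TARGET = re.compile(r"\]\(\s*<?([^)\s>]+)")


def link_key(url: str) -> str:
    """Heuristic comparison key: drop the #fragment and trailing slash, lower-case."""
    return url.split("#", 1)[0].rstrip("/").lower()


def drop_already_published(
    candidates: list[tuple[str, str]], past_issues: Iterable[str]
) -> list[tuple[str, str]]:
    """Remove (title, url) candidates whose URL appears in any past issue's Markdown."""
    published = {
        link_key(url)
        for text in past_issues
        for url in LINK_TARGET.findall(text)
    }
    return [(title, url) for title, url in candidates if link_key(url) not in published]


def to_markdown(candidates: list[tuple[str, str]]) -> str:
    """Render one '- [title](url)' bullet per line, each ending in a newline."""
    return "".join(f"- [{title}]({url})\n" for title, url in candidates)
```

Emitting an explicit newline per entry also avoids the run-together output mentioned with the first prototype.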

@jonmcalder
Member

This is really nice @jonocarroll! I've felt a sense of discontent each time I've curated since the loss of our infrastructure but lacked the initiative to do something about it, so I'm really appreciative of this step to remove some of the inefficiency in our process.

Doing it via a GH action is also really nice since it keeps it transparent for everyone and facilitates maintenance / collaboration / iteration.

@tonyelhabr
Member

Looks great to me! This will save me some time during my curation weeks.

3 participants