using the github apis #4

Open
klaernie opened this issue Jan 5, 2024 · 3 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed)

Comments

klaernie commented Jan 5, 2024

I have a pretty hacky idea for working with the GitHub API: store that data in a branch of this repo and have a GitHub Actions job run over it. Since GitHub provides a GITHUB_TOKEN during the job run, you can make 1,000 API calls per hour, from what I found at first glance. Then just let the job do the periodic work of refreshing the information, and whenever the job pushes an update to the branch because it found new information, the other jobs can take care of publishing it, all without requiring any infrastructure on your end.

Just a stupid and very rough idea, of course.
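
A minimal sketch of what that refresh script could look like, assuming Python with `requests`; `GITHUB_TOKEN` is the real variable Actions injects into job runs, while the `data/` layout and the repo entry are hypothetical placeholders:

```python
# refresh.py - hypothetical script run by a scheduled GitHub Actions job.
# GITHUB_TOKEN is injected by Actions; file layout and repo list are illustrative.
import json
import os
import pathlib

import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def fetch_repo(owner: str, repo: str) -> dict:
    """One authenticated API call for a repository's metadata."""
    resp = requests.get(f"{API}/repos/{owner}/{repo}", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    out = pathlib.Path("data")  # the data branch checked out by the job
    out.mkdir(exist_ok=True)
    for owner, repo in [("example-owner", "example-module")]:  # placeholder list
        info = fetch_repo(owner, repo)
        keep = {k: info.get(k) for k in
                ("full_name", "description", "stargazers_count", "pushed_at")}
        (out / f"{owner}--{repo}.json").write_text(json.dumps(keep, indent=2))
```

A scheduled workflow could run this and commit the result back to the branch; the 1,000 requests per hour matches GitHub's documented rate limit for GITHUB_TOKEN in Actions.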

KristjanESPERANTO (Owner) commented:
Yes, that would be a really nice step!

At the moment the process is very inefficient, and you're addressing the two fundamental reasons for that. I'll express them here in my own words:

  1. I'm still doing it manually.
    => There's already a PR to change that: Add workflow to deploy to GH pages #2
    => It would be nice to have some rudimentary protection against vandalism. If someone intentionally or unintentionally emptied the module list in the wiki, I would notice that in the current manual process (see the sketch after this list).
  2. All repositories are always cloned and tested, although that only makes sense for new and updated ones.
    => Your suggestion would solve most of this: only the repositories with new commits would have to be covered.
    => In addition, we need a job that monitors the official module list to trigger adding or removing modules.
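
A rudimentary guard could simply refuse to publish when the scraped list shrinks suspiciously. A sketch, assuming Python; the 10% threshold and the function name are made up:

```python
def check_not_vandalized(old_modules: list[dict], new_modules: list[dict]) -> None:
    """Fail the job if the freshly scraped module list shrank suspiciously.

    The 10% threshold is arbitrary: legitimate removals trickle in one at a
    time, while vandalism (an emptied wiki page) removes large chunks at once.
    """
    if not new_modules:
        raise RuntimeError("Scraped module list is empty - refusing to publish.")
    if len(new_modules) < 0.9 * len(old_modules):
        removed = len(old_modules) - len(new_modules)
        raise RuntimeError(
            f"{removed} modules disappeared at once - manual review needed."
        )
```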

I would actually like to tackle the second point first, but it would be nice to have the automation now... I'll take a closer look at this in the next few days. Of course, I'm always happy to receive help 🙂

KristjanESPERANTO added the enhancement and help wanted labels on Jan 5, 2024
klaernie (Author) commented Jan 5, 2024

  1. Technically speaking, if you write a .json for each and every plugin you find into a separate branch, no vandalism on the wiki could affect that data anymore. That would also allow generating the other JSON files during a build of the page, and it would make a 10k JSON file readable (by splitting it into a directory).

  2. I'd probably also store the last commit ID in the JSON, since that allows checking for changes in a single API call, without even cloning the repo, and it can short-circuit regenerating the remaining data (see the sketch below).
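
Checking for new commits in a single call, without cloning, could look like this; the commits endpoint and the `per_page` parameter are GitHub's real API, while the stored-state shape is hypothetical:

```python
import requests

def latest_commit_sha(owner: str, repo: str, headers: dict) -> str:
    """One API call: fetch only the newest commit on the default branch."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/commits",
        headers=headers,
        params={"per_page": 1},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()[0]["sha"]

def needs_refresh(stored: dict, owner: str, repo: str, headers: dict) -> bool:
    """Short-circuit: only re-clone and re-test when the head commit moved."""
    return stored.get("last_commit") != latest_commit_sha(owner, repo, headers)
```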

If you're exhausting the GitHub API rate limits, one way would be to segment the modules into buckets, where each bucket is checked during a specific hour of the day. Alternatively, the refresh process could pay attention to the actual rate-limit usage, as described in the docs, and simply pause until calls are available again.
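
For the second variant, GitHub sends `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers with every API response, so a small wrapper could pause until the window resets; a sketch:

```python
import time

import requests

def get_with_rate_limit(url: str, headers: dict) -> requests.Response:
    """GET that pauses until the rate-limit window resets when exhausted."""
    while True:
        resp = requests.get(url, headers=headers, timeout=30)
        if (resp.status_code == 403
                and resp.headers.get("X-RateLimit-Remaining") == "0"):
            # X-RateLimit-Reset is a Unix timestamp; sleep until then, plus slack.
            reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))
            time.sleep(max(reset_at - time.time(), 0) + 1)
            continue
        return resp
```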

I'd be happy to help, but at the moment I'm refreshing my entire homelab (newer hardware, migrating multiple terabytes to ZFS, getting rid of stupid dependencies like running NFS on a physical server...), so I'm pretty much underwater for another few weeks.

klaernie (Author) commented Jan 5, 2024

But of course, I'm always here to be a rubber duck with experience in scaling systems (I do that at work for a Nagios instance).
