Web Scraping the Issues from Personal Repositories on GitHub and GitHub Enterprise #56350
Greetings, Sir/Madam,

I am writing because I came across an idea: I would like to save the content of all the issues posted on my repositories and build a database from them so I can search through them. Is it mandatory to use the GitHub API to fetch the issue contents, or would it be possible to save the HTML of each page, with a delay between saves so as not to overload the servers with too many requests?

I would also like to know the answer to the same question for GitHub Enterprise.

Thank you for your time and consideration. I look forward to hearing from you.

Yours sincerely,
Hi Adrian-Nicolae, it's recommended to use the GitHub API to retrieve the issues from your repositories. The API gives reliable and efficient access to the data while respecting GitHub's rate limits. Saving the HTML of each page, even with delays between requests, is likely to be less reliable and can lead to data-consistency problems. The same principles apply to GitHub Enterprise, but make sure you have the appropriate access permissions.
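A small sketch of the two points above (rate limits and Enterprise) in Python. It assumes GitHub Enterprise Server, which serves its REST API under `https://HOSTNAME/api/v3`, and relies on GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. The helper names `api_base`, `issues_url`, and `seconds_until_reset` are hypothetical, chosen for illustration.

```python
from urllib.parse import quote


def api_base(hostname=None):
    """Return the REST API root: github.com by default, or a GitHub Enterprise Server host."""
    if hostname is None:
        return "https://api.github.com"
    # Assumption: GitHub Enterprise Server exposes the REST API under /api/v3 on the instance host.
    return f"https://{hostname}/api/v3"


def issues_url(base, owner, repo):
    """Build the URL for the first page of a repository's issues."""
    # state=all includes closed issues; per_page=100 is the largest page size the API allows.
    return f"{base}/repos/{quote(owner)}/{quote(repo)}/issues?state=all&per_page=100"


def seconds_until_reset(headers, now):
    """Return how many seconds to pause before the next request, based on rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left: no need to wait
    reset = int(headers.get("X-RateLimit-Reset", "0"))  # Unix epoch seconds when the quota resets
    return max(0.0, reset - now)
```

For example, `issues_url(api_base("github.example.com"), "octocat", "hello-world")` targets a hypothetical Enterprise Server instance, and `time.sleep(seconds_until_reset(resp.headers, time.time()))` could be called between pages. If your enterprise uses GitHub Enterprise Cloud on github.com, the default `https://api.github.com` applies.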
To retrieve the content of issues from your repositories, the GitHub API is the recommended method. It provides a reliable and efficient way to interact with GitHub repositories programmatically, including access to issue data.
By using the GitHub API, you can make requests to fetch issue information, including their titles, bodies, comments, labels, and other relevant data. This allows you to extract the necessary content and store it in a database for later retrieval and analysis.
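A minimal sketch of that workflow in Python, using only the standard library. It assumes a personal access token with read access to the repository and the REST endpoint `GET /repos/{owner}/{repo}/issues`. It follows the `Link` response header's `rel="next"` URL for pagination and stores a few fields in SQLite. The function names and the table layout are illustrative choices, not an official client.

```python
import json
import re
import sqlite3
import urllib.request

# Matches the URL tagged rel="next" in a Link header such as: <https://...&page=2>; rel="next"
NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def http_get(url, token):
    """Fetch one page of JSON from the GitHub REST API; returns (data, headers)."""
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp), resp.headers  # headers support case-insensitive .get()


def fetch_all_issues(url, token, get=http_get):
    """Follow rel="next" links until every page of issues has been collected."""
    issues = []
    while url:
        page, headers = get(url, token)
        # The issues endpoint also returns pull requests; those carry a "pull_request" key.
        issues.extend(item for item in page if "pull_request" not in item)
        match = NEXT_LINK.search(headers.get("Link", "") or "")
        url = match.group(1) if match else None
    return issues


def save_issues(conn, issues):
    """Insert or update issues in a local SQLite table keyed by issue number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issues ("
        "number INTEGER PRIMARY KEY, title TEXT, state TEXT, "
        "body TEXT, labels TEXT, created_at TEXT)"
    )
    rows = [
        (
            issue["number"],
            issue["title"],
            issue["state"],
            issue.get("body") or "",  # body is null for issues without a description
            json.dumps([label["name"] for label in issue.get("labels", [])]),
            issue["created_at"],
        )
        for issue in issues
    ]
    conn.executemany("INSERT OR REPLACE INTO issues VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Hypothetical usage, with `OWNER`, `REPO`, and `TOKEN` replaced by your own values: `issues = fetch_all_issues("https://api.github.com/repos/OWNER/REPO/issues?state=all&per_page=100", "TOKEN")`, then `save_issues(sqlite3.connect("issues.db"), issues)`. Comments are not included in the issue objects; each issue carries a `comments_url` that can be paged through the same way.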
Using the GitHub API also makes it easier to stay within the rate limits set by GitHub, which exist to stop excessive requests from overloading the servers. The rate limits he…