
A function that extracts GitHub repository search results. Project for Online Data Collection (oDCM) course @ Tilburg University.


thtbui/github-repository-finder


GitHub Repository Finder - with sample dataset

Introduction

This notebook documents the function used to fetch GitHub repositories that match a keyword within a specified timeframe, counted from the day of fetching.

Running instructions

Requirements:

  1. Generate your own GitHub token: Creating a personal access token
  2. Save your token as an environment variable named 'GITHUBTOKEN': Configuring Environment Variables
  3. Make sure the following Python packages are available: requests, math, datetime, dateutil, csv, pandas, json, os, time. Of these, requests, dateutil (distributed as python-dateutil) and pandas must be installed separately; the others are part of the Python standard library. Installation instructions can be found on the Python website
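
Before running GRF, it can help to check that the token from step 2 is visible to Python. The snippet below is a minimal sketch, not part of the GRF source: the helper name `load_github_token` is hypothetical, and only the variable name `GITHUBTOKEN` comes from the requirements above.

```python
import os


def load_github_token(env_var="GITHUBTOKEN"):
    """Return the token stored in the given environment variable.

    Hypothetical helper for illustration; GRF may read the token differently.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set; "
            "see step 2 of the requirements."
        )
    return token
```

Calling `load_github_token()` in a fresh Python session raises a `RuntimeError` if the variable was not exported in that session's environment, which is a common cause of authentication failures.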

Function structure:

GRF collects data in four separate steps, each handled by its own function: find_repo, export_repo_list, save_column, and save_dt. The following diagram illustrates how these functions work together:

Fig. 1: GitHub Repository Finder components
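
To make the search step more concrete, the sketch below shows how a keyword and a timeframe could be turned into a request for the GitHub REST search endpoint. It is an illustration under stated assumptions, not the actual find_repo implementation: the helper name `build_search_request`, its parameters, the interpretation of the timeframe as a number of days, and the use of the `created:>=` qualifier are all assumptions.

```python
from datetime import date, timedelta

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_request(keyword, days_back, token, today=None, page=1):
    """Assemble URL, query parameters and headers for one search page.

    Hypothetical helper: the parameter names and the 'created:>=' qualifier
    are assumptions, not taken from the GRF source.
    """
    today = today or date.today()
    since = today - timedelta(days=days_back)
    params = {
        # Keyword plus a creation-date qualifier, e.g. "python created:>=2021-10-11"
        "q": f"{keyword} created:>={since.isoformat()}",
        "per_page": 100,  # maximum page size the search API allows
        "page": page,
    }
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
    return SEARCH_URL, params, headers
```

The three returned values can be passed to `requests.get(url, params=params, headers=headers)`; the JSON response lists the matching repositories under the `items` key.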

Sample dataset:

A sample dataset was obtained by using the following command:

grf("python", 3, 8)
import pandas as pd
pd.read_csv("data/dt.csv", delimiter=";", nrows=10)
|   | id | name | url | language | created | stars | watch | forks | readme |
|---|----|------|-----|----------|---------|-------|-------|-------|--------|
| 0 | 416977797 | AI_Project | https://github.com/pbl4team/AI_Project | Python | 2021-10-14T03:37:01Z | 0 | 0 | 4 | Project AI Systeam - Computer Vision with pyth... |
| 1 | 416995331 | python | https://github.com/Cam0411/python | Python | 2021-10-14T05:06:00Z | 0 | 0 | 0 | python |
| 2 | 416908634 | Python | https://github.com/psplendid61/Python | NaN | 2021-10-13T21:53:43Z | 0 | 0 | 0 | NaN |
| 3 | 416963376 | python | https://github.com/iAMSe/python | NaN | 2021-10-14T02:28:55Z | 0 | 0 | 0 | python |
| 4 | 416996896 | python | https://github.com/rakeshk67/python | NaN | 2021-10-14T05:13:42Z | 0 | 0 | 0 | python |
| 5 | 416961346 | python | https://github.com/colddie/python | NaN | 2021-10-14T02:19:55Z | 0 | 0 | 0 | NaN |
| 6 | 416990435 | Python | https://github.com/mahdidahmani/Python | Python | 2021-10-14T04:41:11Z | 0 | 0 | 0 | NaN |
| 7 | 416952467 | python | https://github.com/grace-th3/python | Python | 2021-10-14T01:38:42Z | 0 | 0 | 0 | NaN |
| 8 | 416928589 | Python | https://github.com/Cheung-man/Python | NaN | 2021-10-13T23:36:27Z | 0 | 0 | 0 | NaN |
| 9 | 416935834 | python | https://github.com/mygithuang/python | NaN | 2021-10-14T00:15:25Z | 0 | 0 | 0 | python mygithuang |
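
The preview shows that the language and readme columns are often missing (NaN). As a rough sketch of how the semicolon-delimited output can be inspected without pandas, the example below parses a few rows with the standard csv module and counts the repositories that have no detected language. The inline sample is an abbreviated copy of three preview rows, and representing NaN as an empty field is an assumption about how the file is written; the helper name `count_missing` is hypothetical.

```python
import csv
import io

# Abbreviated copy of three rows from the preview above; only some columns
# are kept, and empty fields stand in for the NaN values (an assumption).
SAMPLE = """id;name;language;stars;forks
416977797;AI_Project;Python;0;4
416995331;python;Python;0;0
416908634;Python;;0;0
"""


def count_missing(rows, column):
    """Count rows whose value in `column` is empty."""
    return sum(1 for row in rows if not row[column].strip())


# The delimiter must match the ";" used when the dataset was saved.
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
missing_language = count_missing(rows, "language")
print(f"{missing_language} of {len(rows)} repositories have no detected language")
```

To run the same check on the real file, replace `io.StringIO(SAMPLE)` with an open handle on `data/dt.csv`.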

Function source code:

Source code and detailed function documentation are available at: GRF Source code
