This repository contains the solution to the Engineering Internship Recruitment task: a program that fetches data from the GitHub API using OAuth authentication, normalizes and deduplicates it, and stores it in a Postgres database.
- GitHub OAuth login
- Fetch repositories from the GitHub API
- Normalize and deduplicate the data
- Store the data in a Postgres database
- Generate a CSV file containing the data
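The normalize, deduplicate, and CSV steps listed above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual code: the field selection in `FIELDS` and the function names are assumptions, and the project itself uses pandas, where `pandas.json_normalize` and `DataFrame.drop_duplicates` would play the same roles.

```python
import csv
import io

# Hypothetical subset of fields kept per repository; the real schema may differ.
FIELDS = ["id", "name", "owner_login", "html_url"]


def normalize_repo(raw):
    """Flatten one repository object from the GitHub API into a flat row."""
    return {
        "id": raw["id"],
        "name": raw["name"].strip(),
        # Lower-casing the login is an assumed normalization rule.
        "owner_login": raw["owner"]["login"].lower(),
        "html_url": raw["html_url"],
    }


def deduplicate(rows):
    """Keep the first row seen for each repository id."""
    seen = {}
    for row in rows:
        seen.setdefault(row["id"], row)
    return list(seen.values())


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(deduplicate(normalize_repo(r) for r in repos))` on a list of repository objects yields one CSV line per distinct repository id.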
Language: Python
Web Framework: Flask
ORM: SQLAlchemy
Database: Postgres
API Integration: GitHub API
Data Processing: Pandas
Data Serialization: CSV
Deployment: Local machine / Docker
Before installing the project, ensure that you have the following prerequisites:
- Python 3.7 or higher
- pip package manager
- Docker Desktop
Clone the repository using the following command:
git clone https://github.com/BalkanID-University/balkanid-summer-internship-vit-vellore-2023-rishuyadav
Navigate to the project directory and create a new virtual environment:
cd balkanid-summer-internship-vit-vellore-2023-rishuyadav
python3 -m venv venv
Activate the virtual environment:
source venv/bin/activate
Install the project dependencies using pip:
pip install -r requirements.txt
The app is configured through the following environment variables:
FLASK_SECRET_KEY=<secret_key>
DATABASE_URL=<postgres_database_url>
GITHUB_CLIENT_ID=<github_client_id>
GITHUB_CLIENT_SECRET=<github_client_secret>
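A small startup check can catch a missing variable before the app fails later on. The sketch below is an assumption about how such a check might look, not the project's actual configuration code; `load_config` and `REQUIRED_VARS` are hypothetical names.

```python
import os

# The four variables named above.
REQUIRED_VARS = (
    "FLASK_SECRET_KEY",
    "DATABASE_URL",
    "GITHUB_CLIENT_ID",
    "GITHUB_CLIENT_SECRET",
)


def load_config(environ=os.environ):
    """Return the required settings, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.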
Create a new Postgres database and set the DATABASE_URL environment variable to the database URL. You can use ElephantSQL to create a free Postgres database.
export DATABASE_URL=postgresql://username:password@host:port/database_name
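Some hosting providers hand out URLs that start with `postgres://`, a scheme name that SQLAlchemy 1.4 and later no longer accept. If the app reads such a URL, a small helper can rewrite it before it reaches SQLAlchemy. This is a sketch; `normalize_database_url` is a hypothetical name and is not taken from the project.

```python
def normalize_database_url(url):
    """Rewrite the legacy postgres:// scheme to postgresql://, leaving other URLs alone."""
    legacy_prefix = "postgres://"
    if url.startswith(legacy_prefix):
        return "postgresql://" + url[len(legacy_prefix):]
    return url
```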
To use the GitHub API, you need to set up GitHub OAuth as follows:
- Create a new OAuth application on GitHub by going to https://github.com/settings/applications/new.
- Set the "Homepage URL" to http://localhost:5000/oauth/login and the "Authorization callback URL" to http://localhost:5000/oauth/callback.
- Note down the "Client ID" and "Client Secret" values for the application.
- Set the GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET environment variables to the "Client ID" and "Client Secret" values, respectively:
export GITHUB_CLIENT_ID=your_client_id
export GITHUB_CLIENT_SECRET=your_client_secret
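With these values in place, the login step of the OAuth flow amounts to redirecting the user to GitHub's authorization endpoint. The sketch below shows how that URL might be built; the function name, the `read:user` default scope, and the use of a random `state` value are assumptions for illustration and may not match what the app does.

```python
import secrets
from urllib.parse import urlencode

# GitHub's OAuth authorization endpoint.
AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
# Must match the "Authorization callback URL" registered for the application.
CALLBACK_URL = "http://localhost:5000/oauth/callback"


def build_authorize_url(client_id, state=None, scope="read:user"):
    """Return (url, state); the state should be stored and checked in the callback."""
    state = state or secrets.token_urlsafe(16)
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": CALLBACK_URL,
            "scope": scope,  # assumed scope; adjust to what the app needs
            "state": state,
        }
    )
    return f"{AUTHORIZE_URL}?{query}", state
```

After the user approves, GitHub redirects to the callback URL with a `code` parameter, which the app exchanges for an access token using the client ID and client secret.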
Run the app using the following command:
cd app
python3 app.py
The app should now be accessible at http://localhost:5000.
- Build the Docker image:
docker build -t your-image-name .
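- Run the Docker container, passing the environment variables from your shell through to it:
docker run -p 5000:5000 -e FLASK_SECRET_KEY -e DATABASE_URL -e GITHUB_CLIENT_ID -e GITHUB_CLIENT_SECRET your-image-name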
The app should now be accessible at http://localhost:5000.