Language Trends

A simple Python web application that analyzes language usage on GitHub. See it in action here.

Essentially, it's a training project that I developed while getting familiar with web and asynchronous programming in Python. The main challenge was finding an effective way to scan GitHub through an API with very limited throughput.

Some difficulties I encountered and worked around:

It's not possible to list all the repositories on GitHub using its V4 API (GraphQL). A search result is limited to 1000 items, so I split the search into many sub-searches based on repository creation time.
Depending on the actual data (and probably on something else), API requests may fail. Some requests fail every time; my guess is that they don't fit into GitHub's internal timeouts. There is no way to predict which request will succeed and which will fail (it may depend on, say, the total commit count in the repository history). My solution is to retrieve data in small portions and decrease the portion size when an error occurs.
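
The two workarounds above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `count_repos` and `fetch_page` are hypothetical callables standing in for real API requests, and the starting portion size of 100 is arbitrary.

```python
from datetime import datetime, timedelta

SEARCH_LIMIT = 1000  # a single search returns at most 1000 items


def split_by_creation_time(start, end, count_repos, limit=SEARCH_LIMIT):
    """Yield (start, end) ranges that each match at most `limit` repositories.

    `count_repos(start, end)` is a hypothetical callable returning how many
    repositories were created in the half-open interval [start, end).
    """
    total = count_repos(start, end)
    if total <= limit or end - start <= timedelta(seconds=1):
        # Small enough (or too narrow to split further): use the range as is.
        yield start, end
        return
    middle = start + (end - start) / 2
    yield from split_by_creation_time(start, middle, count_repos, limit)
    yield from split_by_creation_time(middle, end, count_repos, limit)


def fetch_all(fetch_page, total, page_size=100, min_page_size=1):
    """Fetch `total` items in portions, halving the portion size on errors.

    `fetch_page(offset, size)` is a hypothetical callable that returns a list
    of items or raises IOError when the request fails.
    """
    items = []
    while len(items) < total:
        size = min(page_size, total - len(items))
        try:
            page = fetch_page(len(items), size)
        except IOError:
            if page_size <= min_page_size:
                raise  # even the smallest portion fails: give up
            page_size = max(min_page_size, page_size // 2)
            continue
        if not page:
            break  # the source ran out of items earlier than expected
        items.extend(page)
    return items
```

Bisecting the time range keeps the number of counting requests low when repositories are spread unevenly over time, and halving the portion size only after a failure avoids paying for small requests when large ones work.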

Project structure

If you want to dig into the code a little bit, here is a short overview of the project's structure. The root package contains several modules and sub-packages encapsulating particular areas of responsibility.

language_trends.github

Everything related to working with the GitHub API. One of the key points here is the separation of query-forming functions from actual IO.
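
A minimal sketch of what such a separation might look like. The function names, the selected fields, and the `post` callable are assumptions made for illustration; only the `search` field, the `repositoryCount` field, and the `language:`/`created:` search qualifiers come from GitHub's API.

```python
import json

SEARCH_QUERY = """
query {
  search(query: "%(search)s", type: REPOSITORY, first: %(first)d) {
    repositoryCount
    nodes { ... on Repository { nameWithOwner } }
  }
}
"""


def build_search_query(language, created_from, created_to, first=50):
    """Pure function: builds the GraphQL text and performs no IO."""
    search = "language:%s created:%s..%s" % (language, created_from, created_to)
    return SEARCH_QUERY % {"search": search, "first": first}


def run_query(query, post):
    """IO boundary. `post` is a hypothetical callable that sends a request
    body (a JSON string) and returns the response body (a JSON string)."""
    response = json.loads(post(json.dumps({"query": query})))
    if "errors" in response:
        raise RuntimeError(response["errors"])
    return response["data"]
```

With this split, query construction can be unit-tested by comparing strings, and the transport can be replaced by a fake in tests.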
language_trends.data

Data persistence. The data model is pretty simple, so plain SQL queries are used. I implemented a simple migration mechanism because I didn't like the complexity of existing solutions (the use of a separate config, for example).
language_trends.scan

The core of the application: the scanning process. It glues the github and data modules together.
language_trends.ui

A very simple Flask application that, in essence, serves an almost static page. Data for the chart is embedded into the page as a JavaScript variable. JavaScript and HTML in this project are bad, I know :-)
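
Embedding chart data into a page as a JavaScript variable could look roughly like this. The sketch uses only the standard library so it runs without Flask; a Flask view would simply return the resulting string. `PAGE`, `CHART_DATA`, and `render_page` are hypothetical names, not taken from the project.

```python
import json
from string import Template

PAGE = Template("""<!DOCTYPE html>
<html>
  <body>
    <div id="chart"></div>
    <script>var CHART_DATA = $data;</script>
  </body>
</html>
""")


def render_page(chart_data):
    """Return HTML with `chart_data` embedded as a JavaScript literal."""
    # Escape "<" so that a string containing "</script>" cannot end the tag.
    payload = json.dumps(chart_data).replace("<", "\\u003c")
    return PAGE.substitute(data=payload)
```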

Playing around

Prerequisites

Python 3.6
PostgreSQL
GitHub account :-)

Installation

Install requirements:
```
cd <checkout-directory>
python -m venv .
./bin/python -m pip install -r requirements.txt
```
On Windows, you should probably use ./Scripts/python.exe instead of ./bin/python.
Create a database.
Edit CONNECTION_PARAMS in the file language_trends/data/access.py so that it points to the newly created database.
Put your GitHub API access token in the file auth_token.txt. See instructions on how to generate a token. No special grants are required (the app uses only publicly available information), so you can leave all the checkboxes on the token generation page unchecked.
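
For orientation, here is one way a token file like this might be consumed. This is a sketch under assumptions: `read_token` and `auth_headers` are hypothetical helpers, and the only external fact used is that GitHub accepts the token in an `Authorization: bearer` HTTP header.

```python
from pathlib import Path


def read_token(path="auth_token.txt"):
    """Read the token, ignoring surrounding whitespace and newlines."""
    token = Path(path).read_text().strip()
    if not token:
        raise ValueError("%s is empty" % path)
    return token


def auth_headers(token):
    """Build the HTTP header used for token authentication."""
    return {"Authorization": "bearer " + token}
```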

Running

Running a scan process

./bin/python -m language_trends.scan [<languages>]

<languages> is an optional space-separated list of languages, e.g. clojure python. If it isn't specified, all known languages will be scanned in order. The list of known languages is stored in the variable ALL_LANGUAGES in the module language_trends.languages.
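
The argument handling described above can be approximated with `argparse`. The list below is a hypothetical stand-in for the real ALL_LANGUAGES, and `parse_languages` is an illustrative name rather than the project's function.

```python
import argparse

# Hypothetical stand-in for language_trends.languages.ALL_LANGUAGES.
ALL_LANGUAGES = ["clojure", "python", "rust"]


def parse_languages(argv, known=ALL_LANGUAGES):
    """Return the languages to scan: the given ones, or all known ones."""
    parser = argparse.ArgumentParser(prog="language_trends.scan")
    parser.add_argument("languages", nargs="*", metavar="language")
    args = parser.parse_args(argv)
    return args.languages or list(known)
```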

Be aware that a full scan may take many hours or even days. This is mainly caused by GitHub's rate limits: most of the time the scanner is just waiting for a new quota.
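
Waiting for a new quota boils down to computing how long to sleep. A small sketch, assuming the scanner reads the `remaining` and `resetAt` fields of GitHub's `rateLimit` GraphQL object; the function name and the `cost` parameter are illustrative.

```python
from datetime import datetime


def seconds_until_reset(remaining, reset_at, now, cost=1):
    """Return how long to sleep before the next request.

    `remaining` and `reset_at` mirror the `remaining` and `resetAt` fields of
    the `rateLimit` object; `now` is a naive UTC datetime.
    """
    if remaining >= cost:
        return 0.0  # enough quota left, no need to wait
    reset = datetime.strptime(reset_at, "%Y-%m-%dT%H:%M:%SZ")
    return max(0.0, (reset - now).total_seconds())
```

The result can be passed to `time.sleep` or, in asynchronous code, to `asyncio.sleep`.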

Running a development web-server

export FLASK_APP=language_trends.ui
./bin/flask run

For enhanced debugging and automatic reload, you can also run export FLASK_DEBUG=1.

Additional development tasks

Database migrations

Migrate to the latest version:

./bin/python -m language_trends.data migrate

Roll back the last migration:

./bin/python -m language_trends.data rollback

(CAUTION!) Roll back to an empty database:

./bin/python -m language_trends.data rollback all

Migrations are applied automatically when the scanning process or the web server starts, so you don't have to run them manually.
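
To illustrate the idea of a config-free migration mechanism built on plain SQL, here is a self-contained sketch. It is not the project's implementation: it uses the standard library's sqlite3 instead of PostgreSQL so that it runs anywhere, and the table names, the `schema_version` table, and the function names are all assumptions.

```python
import sqlite3

# Hypothetical migrations: (upgrade SQL, downgrade SQL) pairs, oldest first.
MIGRATIONS = [
    ("CREATE TABLE repos (id TEXT PRIMARY KEY, lang TEXT)",
     "DROP TABLE repos"),
    ("CREATE TABLE commits (repo_id TEXT, day TEXT, commit_count INTEGER)",
     "DROP TABLE commits"),
]


def current_version(conn):
    """Return the number of the last applied migration (0 if none)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every migration newer than the stored version."""
    for number in range(current_version(conn) + 1, len(MIGRATIONS) + 1):
        conn.execute(MIGRATIONS[number - 1][0])
        conn.execute("INSERT INTO schema_version VALUES (?)", (number,))
    conn.commit()


def rollback(conn, everything=False):
    """Undo the last migration, or all of them when `everything` is true."""
    version = current_version(conn)
    target = 0 if everything else max(0, version - 1)
    for number in range(version, target, -1):
        conn.execute(MIGRATIONS[number - 1][1])
        conn.execute("DELETE FROM schema_version WHERE version = ?", (number,))
    conn.commit()
```

Because `migrate` is a no-op when the database is already up to date, calling it at every start of the scanner or the web server is cheap.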

Language statistics

./bin/python -m language_trends.stat
