Update README.md
fronchetti committed Aug 10, 2022
1 parent 65ea210 commit 62930cb
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions scripts/README.md
@@ -3,6 +3,7 @@
This is probably the most complex folder in the whole repository, so I will try to be as detailed as possible.

This folder is organized as follows:
- If you want to run the code available in this folder, start by installing all the Python dependencies using [pip](https://pypi.org/project/pip/) and the `requirements.txt` file (in a terminal, the command should look like this: `pip install -r requirements.txt`).
- If you are looking for how we extracted documentation data from GitHub, you should look at the `scraper` folder. The `api_scraper.py` file is the main file of this folder, containing the code that sends requests for custom URLs to the GitHub API. The file `main.py` presents the whole process of extracting a documentation file, `scrapy.py` shows how to make the URL requests through the `api_scraper.py` module, and `validate.py` shows how we checked whether a documentation file was valid for qualitative analysis. If you want to know how we converted the markdown files to spreadsheets, take a look at `export.py` (please note that we use cmark-gfm to convert the markdown content to plaintext and, if you want to run it, you will need to build cmark-gfm on your computer). More information about all these files is given in their docstrings.
- Inside the `classifier` folder you will find how we performed all the classification steps until getting a final model. The subfolders are supposed to be as intuitive as possible. The `data_preparation` folder contains the code showing how we prepared the data for classification, the `model_selection` folder shows how we selected the best estimator for our problem, the `results_report` folder should contain the scripts used to report our final model, and the `classification` folder contains the code used to perform classification. If you want to understand the whole process, I recommend starting with the `main.py` file, where I tried to split the stages of this process into clear methods.
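
As a rough illustration of the markdown-to-plaintext step mentioned above, the following is a minimal sketch of calling cmark-gfm from Python. It is not the code in `export.py`: the function names `build_cmark_command` and `markdown_to_plaintext` are hypothetical, and the sketch assumes the `cmark-gfm` executable you built is reachable on your `PATH`.

```python
import shutil
import subprocess


def build_cmark_command(executable="cmark-gfm"):
    """Return the argument list asking cmark-gfm for plaintext output.

    Hypothetical helper for illustration; `--to plaintext` selects the
    plaintext renderer of the cmark-gfm command-line tool.
    """
    return [executable, "--to", "plaintext"]


def markdown_to_plaintext(markdown_text, executable="cmark-gfm"):
    """Convert Markdown to plaintext by piping it through cmark-gfm.

    Assumes the executable is on PATH; raises FileNotFoundError otherwise.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(
            f"{executable} was not found on PATH; build or install it first"
        )
    completed = subprocess.run(
        build_cmark_command(executable),
        input=markdown_text,   # Markdown is sent on standard input
        capture_output=True,   # plaintext comes back on standard output
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

With cmark-gfm available, `markdown_to_plaintext("# Title\n\nSome *text*.")` would return the rendered plaintext; without it, the function fails early with a clear error instead of a cryptic subprocess failure.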
