[ARBML/Klaam] Refactoring #11

Open
ma7dev opened this issue Apr 24, 2022 · 1 comment

ma7dev commented Apr 24, 2022

To organize the code and introduce testing and continuous integration, it would be beneficial to refactor the entire codebase.

TL;DR

  • Re-organizing the codebase to follow best practices and to introduce testing and continuous integration.
  • Separating the codebase into klaam (the package logic, importable as a module), scripts (the scripts used for training/inference), notebooks (demos and simple scripts written as notebooks), and tests (tests for the package logic).
  • Adding GitHub Actions to test the build, test the package logic, auto-generate docs, and publish the package to PyPI.
  • Moving from the pip and requirements.txt setup to conda for environment management and poetry for package management. This will ease development as the project scales.

Codebase refactoring

Mapping
| file/dir | action | placement |
| --- | --- | --- |
| FastSpeed2/* | moved | klaam/external/FastSpeed2/* |
| dialect_speech_corpus | moved | klaam/speech_corpus/dialect.py |
| egy_speech_corpus | moved | klaam/speech_corpus/egy.py |
| mor_speech_corpus | moved | klaam/speech_corpus/mor.py |
| samples | moved | samples |
| .gitignore | moved | .gitignore |
| LICENSE | moved | LICENSE |
| README.md | moved | README.md |
| audio_utils.py | moved | klaam/utils/audio.py |
| demo.ipynb | moved | notebooks/demo.ipynb |
| demo_with_mic.ipynb | moved | notebooks/demo_with_mix.ipynb |
| inference.ipynb | moved | notebooks/inference.ipynb |
| klaam.py | moved | klaam/run.py |
| klaam_logo.PNG | moved | misc/klaam_logo.png |
| models.py | moved | klaam/models/wav2vec.py |
| processors.py | moved | klaam/processors/custom_wave2vec.py |
| requirements.txt | removed | |
| run.sh | moved | scripts/run.sh |
| run_classifier.py | moved | scripts/run_classifier.py |
| run_common_voice.py | moved | scripts/run_common_voice.py |
| run_mgb3.py | moved | scripts/run_mgb3.py |
| run_mgb5.py | moved | scripts/run_mgb5.py |
| sample_run.sh | moved | scripts/sample_run.sh |
| utils.py | moved | klaam/utils/utils.py |
| | added | docs |
| | added | tests |
| | added | .github |
| | added | output |
| | added | environment.yml |
| | added | install.sh |
| | added | mypy.ini |
| | added | pyproject.toml |
| | added | pytest.ini |
| | added | ckpts |
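
The moves in the mapping could be applied with a small helper along these lines. This is only a sketch: the names MOVES and apply_moves are hypothetical, only a subset of the table is listed, and inside a git checkout `git mv` would be the more natural way to stage the renames.

```python
import shutil
from pathlib import Path

# Subset of the mapping table: old path -> new path, relative to the repo root.
MOVES = {
    "audio_utils.py": "klaam/utils/audio.py",
    "klaam.py": "klaam/run.py",
    "models.py": "klaam/models/wav2vec.py",
    "utils.py": "klaam/utils/utils.py",
    "demo.ipynb": "notebooks/demo.ipynb",
    "run_classifier.py": "scripts/run_classifier.py",
}


def apply_moves(root, moves, dry_run=True):
    """Move files under `root` according to `moves`.

    Returns the (old, new) pairs whose source exists. With dry_run=True
    nothing is touched, so the plan can be reviewed first.
    """
    root = Path(root)
    handled = []
    for old, new in moves.items():
        src, dst = root / old, root / new
        if not src.exists():
            continue  # skip entries that are absent in this checkout
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
        handled.append((old, new))
    return handled
```

Calling apply_moves(".", MOVES) first prints nothing and changes nothing; it only returns the planned pairs, which can be checked before rerunning with dry_run=False.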
Tree Structure
| root | level 1 | level 2 | description |
| --- | --- | --- | --- |
| .github | | | GitHub configuration (e.g. issue templates, GitHub Actions workflows) |
| | workflows | | |
| | | build.yml | tests building the package |
| | | publish.yml | publishes the package to PyPI |
| | | tests.yml | runs the tests |
| | | docs.yml | generates the documentation |
| klaam | | | the logic of the package |
| | utils | | |
| | | audio.py | |
| | | utils.py | |
| | models | | |
| | | wav2vec.py | |
| | processors | | |
| | | wave2vec.py | |
| | external | | |
| | | FastSpeed2/* | |
| | speech_corpus | | |
| | | dialect.py | |
| | | egy.py | |
| | | mor.py | |
| | run.py | | |
| notebooks | | | |
| | demo.ipynb | | |
| | demo_with_mix.ipynb | | |
| | inference.ipynb | | |
| scripts | | | scripts for training/evaluation, or anything outside the logic of the package |
| | run.sh | | |
| | run_classifier.py | | |
| | run_common_voice.py | | |
| | run_mgb3.py | | |
| | run_mgb5.py | | |
| | sample_run.sh | | |
| tests | | | tests for the logic within klaam |
| | test_*.py | | |
| | conftest.py | | |
| misc | | | |
| | klaam_logo.png | | |
| samples | | | |
| | demo.wav | | |
| ckpts | ... | | checkpoints of downloaded pre-trained models |
| docs | ... | | documentation files |
| output | ... | | |
| environment.yml | | | conda environment definition |
| install.sh | | | installation script that sets up the conda environment and installs dependencies using poetry |
| mypy.ini | | | mypy configuration |
| pyproject.toml | | | package definition and list of dependencies to be installed |
| pytest.ini | | | pytest configuration |
| LICENSE | | | |
| README.md | | | |
| .gitignore | | | |
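
One way to confirm that a checkout matches the proposed layout is a short check like the one below. It is a sketch under assumptions: the function name missing_entries and the two lists are hypothetical, and they cover only some of the top-level entries from the tree.

```python
from pathlib import Path

# Top-level entries taken from the tree table (not exhaustive).
EXPECTED_DIRS = ["klaam", "notebooks", "scripts", "tests", "docs", "misc", "samples"]
EXPECTED_FILES = ["pyproject.toml", "environment.yml", "pytest.ini", "README.md", "LICENSE"]


def missing_entries(root):
    """Return the expected directories and files that are absent under `root`."""
    root = Path(root)
    # Directories are reported with a trailing slash to tell them apart from files.
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means every listed entry is present; a check like this could also live in tests as a simple guard against accidental layout changes.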

Environment/dependencies packages

  • conda is used to manage the environment and install essential libraries that are large or core to the package, e.g. TensorFlow, PyTorch, cudatoolkit, etc.
  • poetry is used to manage dependencies and setup the package
  • pytest is used to enable unit/integration testing of the codebase

Commands

  • poetry add PACKAGE - to add a package (this will append to pyproject.toml)
    • If the installation fails and there is no other way to add the package, install it using conda and add it to environment.yml manually (leave a comment next to the line).
    • Check online for the right channels when installing packages using conda.
  • poetry install - to install the package (package_name)
  • pytest tests - to run all tests manually
  • pytest tests/TEST_PATH - to run a specific test file (check pytest documentation for more information)
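
For reference, a test file picked up by `pytest tests` could look roughly like the sketch below. The file name tests/test_audio.py and the helper seconds_to_samples are hypothetical stand-ins; a real test would import the function under test from the klaam package instead of defining it inline.

```python
# tests/test_audio.py (hypothetical file name)


def seconds_to_samples(duration_s, sample_rate):
    """Stand-in helper; a real test would import the code under test from klaam."""
    return int(round(duration_s * sample_rate))


def test_seconds_to_samples_whole_second():
    # One second of audio at 16 kHz is 16000 samples.
    assert seconds_to_samples(1.0, 16000) == 16000


def test_seconds_to_samples_rounds_to_nearest():
    # 0.5 ms at 16 kHz corresponds to 8 samples.
    assert seconds_to_samples(0.0005, 16000) == 8
```

pytest collects functions whose names start with test_ from files matching test_*.py, so no registration is needed beyond placing the file under tests.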

ma7dev commented Apr 24, 2022

@ma7dev ma7dev self-assigned this Apr 24, 2022