Skip to content

General ML project structure which can be modified for different Use Cases

License

Notifications You must be signed in to change notification settings

rajivgupta99/ML-project-structure

Repository files navigation

ML-project-structure

Inspired from (with few changes): Cookiecutter Data Science

ML-project-structure/
├── Dockerfile                   
├── LICENSE     
├── README.md                  
├── Makefile
├── requirements.txt                       
├── setup.py                     
├── startup.sh                                       
├── configs                      
│   ├── model1.yaml
│   ├── config.ini                  
│   ├── config.py
│   └── .env   
├── data                         
│   ├── external                 
│   ├── interim                  
│   ├── processed               
│   └── raw                      
├── docs                         
├── model_weights                
├── notebooks                    
├── references                   
├── reports                                                 
└── src                          
    ├── __init__.py              
    ├── data_preprocessing       
    │   ├── data_preprocessing.py  
    │   ├── build_features.py   
    │   ├── cleaning.py          
    │   ├── ingestion.py         
    │   ├── labeling.py          
    │   ├── splitting.py         
    │   └── validation.py        
    ├── imagegeneration          # Use Case 1             
    │   ├── __init__.py
    │   ├── hyperparameters_tuning.py 
    │   ├── imagegeneration.py
    │   ├── model.py
    │   ├── predict.py
    │   ├── preprocessing.py
    │   └── train.py
    ├── textgeneration            # Use Case 2        
    │   ├── __init__.py
    │   ├── hyperparameters_tuning.py 
    │   ├── textgeneration.py
    │   ├── model.py
    │   ├── predict.py
    │   ├── preprocessing.py
    │   └── train.py
    ├── utils                  
    │   ├── __init__.py
    │   ├── common.py 
    │   ├── data_manipulation.py
    │   ├── data_operations.py
    │   └── misc.py   
    └── visualization        
        └── visualize.py   

The configs folder contains all the configuration files, such as model hyperparameters.

The data folder contains all data files used during model training and tesing.

The model_weights contains pre-trained model weights.

The notebooks contains .ipynb version of our code which is readily-executable.

The reports contains generated results and outputs in image, graphs and pdf formats.

The data_preprocessing folder (the sub-folder of src) includes the following files:

  • ingestion.py: Used to gather the data. If you need to create backups, protect private information, or create a metadata catalog, it’s best to do it here.

  • cleaning.py: Used to clean the data by reducing outliers/noise, handling missing values, etc.

  • labeling.py: Used to label the data if necessary.

  • splitting.py: Used to split the data into test and train sets.

  • validation.py: Used to validate the data (to ensure it is ready for training).

  • build_features.py has been moved to this folder because building features means organizing your dataset into a specific structure.

  • data_preprocessing.py: Used as a single source file where class PreProcessing() can be defined with all the methods such as: ingestion, cleaning, labeling, splitting, validation

In the imagegeneration folder (the sub-folder of src) - Use Case 1:

  • model.py: used to define the model architecture.

  • preprocessing.py: Used to preprocess the data before feeding it to the model.

  • train.py: Used to train the model.

  • hyperparameters_tuning.py: Used to tune the model and/or the training hyperparameters.

  • predict.py: Used to make predictions on random images (not from the dataset).

  • imagegeneration.py: Used as a single source file for image generaton use case where a class is defined ImageGeneration() with all it's methods: train, predict, preprocessing, hyperparameters_tuning.

In the textgeneration folder (the sub-folder of src) - Use Case 2:

  • model.py: used to define the model architecture.

  • preprocessing.py: Used to preprocess the data before feeding it to the model.

  • train.py: Used to train the model.

  • hyperparameters_tuning.py: Used to tune the model and/or the training hyperparameters.

  • predict.py: Used to make predictions on random images (not from the dataset).

  • textgeneration.py: Used as a single source file for image generaton use case where a class is defined TextGeneration() with all it's methods: train, predict, preprocessing, hyperparameters_tuning.

The utils folder (the sub-folder of src) includes the following files:

  • common.py:

  • data_manipulation.py:

  • data_operations.py:

  • misc.py:

The visualizations folder includes the following:

  • visualize.py: this file includes functions used to visualize the data

Dockerfile:

main.py:

requirements.txt:

setup.py:

startup.sh:

LICENSE:

Some preprocessing functions can be used across all models, so it’s possible to create a single preprocessing file and move it to the data folder to avoid duplicating functions. However, it’s recommended to keep the preprocessing file separate to improve the model’s reusability and prevent potential problems in the future.

About

General ML project structure which can be modified for different Use Cases

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages