ML-project-structure

Inspired by Cookiecutter Data Science, with a few changes.

ML-project-structure/
├── Dockerfile                   
├── LICENSE     
├── README.md                  
├── Makefile
├── main.py
├── requirements.txt                       
├── setup.py                     
├── startup.sh                                       
├── configs                      
│   ├── model1.yaml
│   ├── config.ini                  
│   ├── config.py
│   └── .env   
├── data                         
│   ├── external                 
│   ├── interim                  
│   ├── processed               
│   └── raw                      
├── docs                         
├── model_weights                
├── notebooks                    
├── references                   
├── reports                                                 
└── src                          
    ├── __init__.py              
    ├── data_preprocessing       
    │   ├── data_preprocessing.py  
    │   ├── build_features.py   
    │   ├── cleaning.py          
    │   ├── ingestion.py         
    │   ├── labeling.py          
    │   ├── splitting.py         
    │   └── validation.py        
    ├── imagegeneration          # Use Case 1             
    │   ├── __init__.py
    │   ├── hyperparameters_tuning.py 
    │   ├── imagegeneration.py
    │   ├── model.py
    │   ├── predict.py
    │   ├── preprocessing.py
    │   └── train.py
    ├── textgeneration            # Use Case 2        
    │   ├── __init__.py
    │   ├── hyperparameters_tuning.py 
    │   ├── textgeneration.py
    │   ├── model.py
    │   ├── predict.py
    │   ├── preprocessing.py
    │   └── train.py
    ├── utils                  
    │   ├── __init__.py
    │   ├── common.py 
    │   ├── data_manipulation.py
    │   ├── data_operations.py
    │   └── misc.py   
    └── visualization        
        └── visualize.py

The configs folder contains all the configuration files, such as model hyperparameters.
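A minimal sketch of what configs/config.py could contain: helpers that read config.ini with the standard-library configparser and load KEY=VALUE pairs from .env into the environment. The function names, the [training] section and the variable names used below are assumptions, not part of the original structure. model1.yaml would usually be read with PyYAML's yaml.safe_load; it is left out so the sketch needs no third-party packages.

```python
"""configs/config.py: a minimal sketch of centralizing configuration.

Assumptions: function names, the [training] section and the .env variable
names are hypothetical. YAML files (e.g. model1.yaml) would typically be
loaded with PyYAML's yaml.safe_load, omitted here to stay stdlib-only.
"""
import configparser
import os
from pathlib import Path
from typing import Dict


def load_ini(path: Path) -> configparser.ConfigParser:
    """Read an INI file such as configs/config.ini."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    return parser


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def load_env(path: Path) -> Dict[str, str]:
    """Load configs/.env into os.environ without overriding existing variables."""
    values = parse_env(path.read_text())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

For example, `load_ini(Path("configs/config.ini")).getint("training", "batch_size")` would return the batch size as an int, assuming that section and key exist. Keeping secrets such as API keys in .env rather than in config.ini lets that one file be excluded from version control.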

The data folder contains all data files used during model training and testing.

The model_weights folder contains pre-trained model weights.

The notebooks folder contains readily executable .ipynb (Jupyter notebook) versions of the code.

The reports folder contains generated results and outputs such as images, graphs, and PDF files.
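Code in src usually needs to locate these folders without hard-coding absolute paths. The sketch below is one way to do that; the ProjectPaths name and its placement (for example in src/utils/common.py) are assumptions, and only the folder names come from the tree. The data stages follow the Cookiecutter Data Science convention: raw holds the original, immutable data; interim holds intermediate transformed data; processed holds the final datasets used for modeling; external holds data from third-party sources.

```python
"""A minimal sketch of resolving the project folders from code.

Assumption: ProjectPaths and where it lives are hypothetical; the folder
names match the project tree.
"""
from dataclasses import dataclass
from pathlib import Path

DATA_STAGES = ("raw", "interim", "processed", "external")


@dataclass(frozen=True)
class ProjectPaths:
    root: Path  # the repository root (the folder containing src/)

    @property
    def data(self) -> Path:
        return self.root / "data"

    def data_stage(self, stage: str) -> Path:
        # stage must be one of the data/ subfolders
        if stage not in DATA_STAGES:
            raise ValueError(f"unknown data stage: {stage}")
        return self.data / stage

    @property
    def model_weights(self) -> Path:
        return self.root / "model_weights"

    @property
    def reports(self) -> Path:
        return self.root / "reports"

    def ensure(self) -> None:
        # create any missing folders; safe to call repeatedly
        for stage in DATA_STAGES:
            self.data_stage(stage).mkdir(parents=True, exist_ok=True)
        self.model_weights.mkdir(parents=True, exist_ok=True)
        self.reports.mkdir(parents=True, exist_ok=True)
```

From a module at src/utils/common.py, the root could be computed as `Path(__file__).resolve().parents[2]` (common.py → utils → src → root).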

The data_preprocessing folder (a sub-folder of src) includes the following files:

ingestion.py: Used to gather the data. If you need to create backups, protect private information, or create a metadata catalog, it’s best to do it here.
cleaning.py: Used to clean the data by reducing outliers/noise, handling missing values, etc.
labeling.py: Used to label the data if necessary.
splitting.py: Used to split the data into test and train sets.
validation.py: Used to validate the data (to ensure it is ready for training).
build_features.py: Used to build features. It has been moved into this folder (Cookiecutter Data Science keeps it in a separate features folder) because building features means organizing your dataset into a specific structure.
data_preprocessing.py: Used as a single source file in which a PreProcessing class can be defined, with one method per step: ingestion, cleaning, labeling, splitting, and validation.
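The steps above could be combined in data_preprocessing.py roughly as follows. This is a minimal, standard-library-only sketch: records are plain dicts parsed from CSV text, and the method signatures, the "label" column name, the missing-value rule and the seeded split are assumptions rather than part of the original structure. A real project would more likely use pandas and scikit-learn's train_test_split.

```python
"""data_preprocessing.py: a minimal sketch of the PreProcessing class.

Assumptions: records are plain dicts parsed from CSV text; the method
signatures, the "label" column name and the seeded split are hypothetical.
"""
import csv
import io
import random
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


class PreProcessing:
    def __init__(self, label_column: str = "label", seed: int = 42) -> None:
        self.label_column = label_column
        self.seed = seed  # fixed seed keeps the train/test split reproducible

    def ingestion(self, csv_text: str) -> List[Record]:
        # ingestion.py: gather raw data (CSV text here; a file, DB or API in practice)
        return list(csv.DictReader(io.StringIO(csv_text)))

    def cleaning(self, records: List[Record]) -> List[Record]:
        # cleaning.py: drop rows with missing values and trim whitespace
        return [
            {key: value.strip() for key, value in row.items()}
            for row in records
            if all(value is not None and value.strip() for value in row.values())
        ]

    def labeling(self, records: List[Record], rule: Callable[[Record], str]) -> List[Record]:
        # labeling.py: attach a label computed by a caller-supplied rule
        return [{**row, self.label_column: rule(row)} for row in records]

    def splitting(
        self, records: List[Record], test_size: float = 0.2
    ) -> Tuple[List[Record], List[Record]]:
        # splitting.py: shuffle a copy reproducibly, then cut it into train/test
        shuffled = list(records)
        random.Random(self.seed).shuffle(shuffled)
        n_test = round(len(shuffled) * test_size)
        return shuffled[n_test:], shuffled[:n_test]

    def validation(self, train: List[Record], test: List[Record]) -> None:
        # validation.py: fail fast if the data is not ready for training
        if not train or not test:
            raise ValueError("train and test sets must both be non-empty")
        unlabeled = [row for row in train + test if self.label_column not in row]
        if unlabeled:
            raise ValueError(f"{len(unlabeled)} record(s) have no label")
```

Each method returns new lists instead of mutating its input, so intermediate results could be written to data/interim between steps if needed.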

In the imagegeneration folder (a sub-folder of src) for Use Case 1:

model.py: Used to define the model architecture.
preprocessing.py: Used to preprocess the data before feeding it to the model.
train.py: Used to train the model.
hyperparameters_tuning.py: Used to tune the model and/or training hyperparameters.
predict.py: Used to make predictions on new images (not from the dataset).
imagegeneration.py: Used as a single source file for the image generation use case, in which an ImageGeneration class is defined with the methods train, predict, preprocessing, and hyperparameters_tuning.
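A minimal sketch of how imagegeneration.py might expose these steps through one ImageGeneration class. The "model" is a deliberately trivial stand-in (it learns the per-pixel mean of the training images) so the sketch runs without a deep-learning framework; the constructor parameters, the mean-squared-error score and the grid-search tuner are assumptions. In the real layout each method would call into preprocessing.py, model.py, train.py, predict.py and hyperparameters_tuning.py instead of implementing the logic inline.

```python
"""imagegeneration.py: a minimal sketch of the ImageGeneration facade.

Assumption: the "model" is a toy that learns the per-pixel mean of the
training images; a real project would delegate to model.py, train.py, etc.
"""
from itertools import product
from typing import Dict, List, Sequence

Image = List[float]  # a flattened grayscale image with values in [0, 1]


def mean_image(images: Sequence[Image]) -> Image:
    # per-pixel average over a non-empty set of equally sized images
    return [sum(pixels) / len(images) for pixels in zip(*images)]


class ImageGeneration:
    def __init__(self, learning_rate: float = 0.5, epochs: int = 5) -> None:
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights: Image = []

    def preprocessing(self, images: Sequence[Sequence[int]]) -> List[Image]:
        # preprocessing.py: scale 0-255 pixel values into [0, 1]
        return [[p / 255.0 for p in img] for img in images]

    def train(self, images: Sequence[Image]) -> None:
        # model.py + train.py: start from a blank image and, each epoch, move
        # every weight a fraction (learning_rate) of the way to the target
        target = mean_image(images)
        self.weights = [0.0] * len(target)
        for _ in range(self.epochs):
            self.weights = [
                w + self.learning_rate * (t - w)
                for w, t in zip(self.weights, target)
            ]

    def evaluate(self, images: Sequence[Image]) -> float:
        # mean squared error between the learned image and the images' mean
        target = mean_image(images)
        return sum((w - t) ** 2 for w, t in zip(self.weights, target)) / len(target)

    def predict(self) -> Image:
        # predict.py: "generate" an image from the learned weights
        return list(self.weights)

    def hyperparameters_tuning(
        self,
        train_images: Sequence[Image],
        val_images: Sequence[Image],
        grid: Dict[str, List[float]],
    ) -> Dict[str, float]:
        # hyperparameters_tuning.py: exhaustive grid search scored on a
        # validation set; the best values are stored on this instance
        best: Dict[str, float] = {}
        best_loss = float("inf")
        for lr, epochs in product(grid["learning_rate"], grid["epochs"]):
            candidate = ImageGeneration(lr, int(epochs))
            candidate.train(train_images)
            loss = candidate.evaluate(val_images)
            if loss < best_loss:
                best, best_loss = {"learning_rate": lr, "epochs": epochs}, loss
        self.learning_rate = best["learning_rate"]
        self.epochs = int(best["epochs"])
        return best
```

Scoring each candidate on a separate validation set, rather than on the training images, avoids choosing hyperparameters on the same data they were fitted to.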

In the textgeneration folder (a sub-folder of src) for Use Case 2:

model.py: Used to define the model architecture.
preprocessing.py: Used to preprocess the data before feeding it to the model.
train.py: Used to train the model.
hyperparameters_tuning.py: Used to tune the model and/or training hyperparameters.
predict.py: Used to make predictions on new inputs (not from the dataset).
textgeneration.py: Used as a single source file for the text generation use case, in which a TextGeneration class is defined with the methods train, predict, preprocessing, and hyperparameters_tuning.
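A minimal sketch of a TextGeneration class with the preprocessing, train and predict methods. A toy word-level bigram table stands in for a real language model, and the max_words parameter and greedy decoding are assumptions; hyperparameters_tuning is omitted because a bigram table has nothing to tune.

```python
"""textgeneration.py: a minimal sketch of the TextGeneration facade.

Assumption: the "model" is a toy word-level bigram table standing in for a
real language model defined in model.py.
"""
from collections import Counter, defaultdict
from typing import DefaultDict, List


class TextGeneration:
    def __init__(self, max_words: int = 10) -> None:
        self.max_words = max_words  # upper bound on generated words
        # bigrams[w] counts every word observed immediately after w
        self.bigrams: DefaultDict[str, Counter] = defaultdict(Counter)

    def preprocessing(self, corpus: List[str]) -> List[List[str]]:
        # preprocessing.py: lowercase each line and split it into words
        return [line.lower().split() for line in corpus]

    def train(self, sentences: List[List[str]]) -> None:
        # train.py: count word-to-next-word transitions
        for words in sentences:
            for prev, nxt in zip(words, words[1:]):
                self.bigrams[prev][nxt] += 1

    def predict(self, prompt: str) -> str:
        # predict.py: greedy decoding; append the most frequent follower of
        # the last word until none exists or max_words have been added
        words = prompt.lower().split()
        if not words:
            return ""
        for _ in range(self.max_words):
            followers = self.bigrams.get(words[-1])  # .get avoids creating entries
            if not followers:
                break
            words.append(followers.most_common(1)[0][0])
        return " ".join(words)
```

Greedy decoding always picks the single most likely next word, so the output is deterministic; sampling from the counts instead would give varied text.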

The utils folder (a sub-folder of src) includes the following files:

common.py:
data_manipulation.py:
data_operations.py:
misc.py:

The visualization folder (a sub-folder of src) includes the following file:

visualize.py: Contains functions used to visualize the data.

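The root of the repository includes the following files:

Dockerfile: Defines the Docker image used to run the project in a container.
main.py:
requirements.txt: Lists the Python dependencies, installable with pip install -r requirements.txt.
setup.py: Packaging script that makes the src package installable (for example with pip install -e .).
startup.sh:
LICENSE: The license under which the project is distributed.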

Some preprocessing functions are common to all models, so you could create a single shared preprocessing file and place it in the data folder to avoid duplicating code. However, keeping a separate preprocessing.py in each use-case folder is recommended: each model stays self-contained and easier to reuse, and a change made for one model cannot unexpectedly break another.
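One way to get both benefits is to share only small, model-agnostic helpers and keep everything model-specific in each use case's own preprocessing.py. The sketch below assumes a hypothetical shared helper, normalize_whitespace, and hypothetical contents for the two preprocessing.py files; the location of the shared module is also an assumption.

```python
"""A minimal sketch of shared versus per-use-case preprocessing.

Assumptions: normalize_whitespace is a hypothetical helper shared by every
model; preprocess_text and preprocess_caption are hypothetical contents of
src/textgeneration/preprocessing.py and src/imagegeneration/preprocessing.py.
"""
import re
from typing import List


# Shared helper: generic and model-agnostic, so safe to reuse everywhere.
def normalize_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


# src/textgeneration/preprocessing.py: tokenization specific to this model.
def preprocess_text(text: str) -> List[str]:
    return normalize_whitespace(text).lower().split()


# src/imagegeneration/preprocessing.py: e.g. captions truncated to a
# model-specific length (max_chars is a hypothetical parameter).
def preprocess_caption(caption: str, max_chars: int = 64) -> str:
    return normalize_whitespace(caption)[:max_chars]
```

Changing preprocess_text (for example, to stop lowercasing) affects only the text model, while a fix to normalize_whitespace reaches both.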

rajivgupta99/ML-project-structure
