ML implementations in Multi-scale model for lignin biosynthesis in Populus trichocarpa

himasai97/ML_Approaches


In this work, I present a machine learning (ML) algorithm that can be used as part of the multi-scale model developed by Wang et al. and Matthews, which is designed to connect regulation across the different biological layers. Within this model, the ML approach links changes in the metabolic fluxes, caused by transgenic modification of the monolignol transcript abundances, to 25 lignin and associated wood traits in the model tree P. trichocarpa. Through the study of the lignin biosynthesis pathway, we aim to advance the strategic engineering of wood for timber, pulp, and biofuels.

In version 1 of the implementation, I preprocessed the data and trained each of the three chosen algorithms, XGBoost (XGB), support vector regression (SVR), and k-nearest neighbors (kNN), in a pipeline, and obtained the optimal hyperparameters for each model using GridSearchCV. In version 2, a genetic algorithm was added to the pipeline as a feature selection step, which significantly improved the performance of the models. After obtaining R² scores greater than the baseline scores of the original model by Matthews, I ran SHAP analysis on each of the three ML algorithms and analyzed the results for two of the phenotype traits: lignin content and the total carbohydrate to lignin (CL) ratio. Further details of this work can be found in my thesis.
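The feature selection idea from version 2 can be illustrated with a minimal, standard-library-only sketch. It is not the code in this repository: the toy data is synthetic, the function names (`make_toy_data`, `loo_r2`, `ga_select`) are hypothetical, and a small hand-written kNN regressor scored by leave-one-out R² stands in for the tuned XGB, SVR, and kNN pipelines. Each candidate solution is a 0/1 mask over the features, and the genetic algorithm evolves masks toward a higher score.

```python
import random


def make_toy_data(n_samples=40, n_features=6, seed=1):
    """Synthetic data (an assumption): only features 0 and 1 drive the target."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [2.0 * row[0] - row[1] + rng.gauss(0, 0.05) for row in X]
    return X, y


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), t)
        for row, t in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)


def loo_r2(X, y, mask, k=3):
    """Leave-one-out R^2 of the kNN regressor using only features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")  # an empty feature set is never acceptable
    Xs = [[row[j] for j in cols] for row in X]
    preds = [
        knn_predict(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], Xs[i], k)
        for i in range(len(Xs))
    ]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Evolve binary feature masks; return the best evaluated mask and its score."""
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the full feature set plus random masks, so the result is
    # never worse than using every feature.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((loo_r2(X, y, m), m) for m in pop), reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]
        parents = [m for _, m in scored[: pop_size // 2]]  # truncation selection
        children = [best_mask[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return best_mask, best_score


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = ga_select(X, y)
    print("selected mask:", mask)
    print("leave-one-out R^2:", round(score, 3))
```

In a real pipeline the fitness function would be the cross-validated score of the actual estimator, and the selected mask would be applied before hyperparameter tuning. Because the mask is chosen using the same data it is scored on, a separate held-out set is needed for an unbiased performance estimate.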