Role of genomics on regulating rice grain metabolic variability under warmer nights: A statistical and image-based deep learning approach
Preprint: link
It has been argued that metabolites can be used to accelerate crop improvement because metabolic profiles in crops are generally under genetic control. However, evaluating the role of genetics in metabolic variation is a longstanding challenge. Rice, one of the world's most important staple crops, is known to be sensitive to recent increases in nighttime temperatures, and quantifying metabolite levels can help measure rice responses to high nighttime temperature (HNT) stress. The extent of metabolic variation that can be explained by regression on whole-genome molecular markers, however, remains unknown. The current study used primary metabolites profiled from the grains of a rice diversity panel by gas chromatography-mass spectrometry. The metabolites showed low to moderate heritability, and their genomic prediction accuracies were within the expected upper limit set by their genomic heritability estimates. Genomic heritability estimates were slightly higher in the control group than in the HNT group. Genomic correlation estimates for the same metabolites between the control and HNT conditions indicated the presence of genotype by environment interactions. Reproducing kernel Hilbert spaces regression and image-based deep learning, in which markers are treated as images, improved prediction accuracy, suggesting that some metabolites are under non-additive genetic control. Joint analysis of multiple metabolites improved prediction accuracy by exploiting correlations among metabolites. The current study serves as an important first step in evaluating the cumulative effects of the genome in regulating metabolic variation under control and HNT conditions.
- .Rmd file Cleaning metabolite and genotype data.
- .R file Using the sommer package to estimate heritability of the metabolites.
- .Rmd file Drawing heritability plots.
- .R file Running single-trait GBLUP on a cluster.
- .Rmd file Drawing single-trait GBLUP plots.
- .Rmd file Selecting a suitable bandwidth for RKHS.
- .R file Running single-trait RKHS on a cluster.
- .R file Running multi-trait genomic correlation.
- .Rmd file Drawing multi-trait genomic correlation plots.
- .Rmd file Factor analysis to identify underlying latent factors controlling metabolites.
- .R file Running MegaLMM for genomic prediction.
- .R file Running MegaLMM for RKHS.
- .Rmd file Drawing bar plots and density plots for the MegaLMM genomic prediction models.
- .Rmd file Drawing genomic correlation density plot.
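
The bandwidth-selection step for RKHS listed above can be illustrated with a small kernel example. This is a minimal sketch and not the repository's code: it assumes a Gaussian kernel of the form `K[i][j] = exp(-h * d[i][j] / mean(d))`, where `d` is the squared Euclidean distance between marker vectors and `h` is the bandwidth. The function names and the `toy_markers` matrix are hypothetical and exist only for illustration.

```python
import math


def squared_distances(markers):
    """Pairwise squared Euclidean distances between rows of a marker matrix."""
    n = len(markers)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(markers[i], markers[j]))
            dist[i][j] = dist[j][i] = float(d)
    return dist


def gaussian_kernel(markers, bandwidth):
    """Gaussian kernel with distances scaled by their mean off-diagonal value.

    The scaling convention is an assumption of this sketch; other
    conventions (for example dividing by the number of markers) are common.
    """
    dist = squared_distances(markers)
    n = len(dist)
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    scale = sum(off_diag) / len(off_diag)
    return [
        [math.exp(-bandwidth * dist[i][j] / scale) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy genotypes coded 0/1/2 (3 accessions x 4 SNPs).
toy_markers = [[0, 1, 2, 0], [0, 1, 2, 1], [2, 0, 0, 2]]

# A larger bandwidth makes the kernel decay faster with genetic distance.
for h in (0.1, 1.0, 5.0):
    K = gaussian_kernel(toy_markers, h)
    print(h, [round(v, 3) for v in K[0]])
```

A small bandwidth yields a kernel in which all accessions look similar, while a large bandwidth approaches an identity matrix. Comparing model fit or cross-validated accuracy over a grid of `h` values is one common way to pick a suitable bandwidth.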
Figure 8: Percentage difference of gain in prediction accuracy for multi-trait genomic best linear unbiased prediction (MegaLMM-G) and multi-trait reproducing kernel Hilbert spaces regression (MegaLMM-GK) relative to single-trait genomic best linear unbiased prediction (A). Density plots of percentage difference are shown for MegaLMM-G (B) and MegaLMM-GK (C).
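
The percentage difference reported in Figure 8 can be read as a relative gain over the single-trait baseline. The sketch below assumes the usual definition, `100 * (multi - single) / single`. The function name `percent_gain` and the accuracy values are hypothetical and are not results from the study.

```python
def percent_gain(multi_trait_acc, single_trait_acc):
    """Percentage difference in accuracy relative to the single-trait baseline."""
    if single_trait_acc <= 0:
        # A relative gain is not meaningful for a zero or negative baseline.
        raise ValueError("single-trait accuracy must be positive")
    return 100.0 * (multi_trait_acc - single_trait_acc) / single_trait_acc


# Hypothetical accuracies for one metabolite (not values from the study).
single_trait_gblup = 0.40
multi_trait_model = 0.46
print(round(percent_gain(multi_trait_model, single_trait_gblup), 2))
```

A positive value means the multi-trait model predicted that metabolite more accurately than single-trait GBLUP, and a negative value means it did worse.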
- .ipynb Shows examples of how to convert SNP tabular data into SNP images.
- .py file Looping over all chromosomes to convert SNPs into images.
- .py file Convolutional neural network with multiple branches.
- .Rmd file Drawing a bar plot to compare the performance of all deep learning models and RKHS.
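
The SNP-to-image conversion listed above can be sketched as follows. This is an illustrative assumption about the layout, not the repository's implementation: each chromosome's 0/1/2 genotype vector is zero-padded to a square grid and mapped to gray levels, giving one image per chromosome that could feed one branch of a multi-branch network. The function names, the padding rule, and the pixel scaling are all hypothetical.

```python
import math


def snps_to_image(genotypes, pad_value=0):
    """Reshape one accession's 0/1/2 genotype vector into a square pixel grid."""
    side = math.ceil(math.sqrt(len(genotypes)))
    padded = list(genotypes) + [pad_value] * (side * side - len(genotypes))
    # Map allele dosage 0/1/2 to gray levels 0/127/255 (scaling is an assumption).
    pixels = [(g * 255) // 2 for g in padded]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def images_by_chromosome(snps_by_chrom):
    """Convert each chromosome's SNP vector into its own image."""
    return {chrom: snps_to_image(g) for chrom, g in snps_by_chrom.items()}


# Hypothetical genotypes for a single accession on two chromosomes.
toy = {"chr1": [0, 1, 2, 2, 0], "chr2": [2, 2, 1]}
images = images_by_chromosome(toy)
for chrom, img in images.items():
    print(chrom, img)
```

Note that padding with 0 makes padded pixels indistinguishable from homozygous reference genotypes. A dedicated padding value or a separate mask channel would avoid that ambiguity.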