References for anaerobic benchmarking #325

Open
2 of 4 tasks
mperisin-lallemand opened this issue Sep 14, 2022 · 10 comments


@mperisin-lallemand

mperisin-lallemand commented Sep 14, 2022

Description of the issue:

I am looking to benchmark the model under anaerobic, carbon-limited conditions. I noticed that the main README now features a plot of in silico vs. experimental growth rates for combinations of C and N limitation in aerobic and anaerobic conditions (https://github.com/SysBioChalmers/yeast-GEM/blob/main/growth.png). Are reference datasets available for these conditions (both flux tables for the simulations and growth/metabolite measurements for the experiments)? Are there other references you can recommend for benchmarking the model under anaerobic conditions?

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the develop branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue
@edkerk
Member

edkerk commented Sep 14, 2022

The graph has been shown in previous publications (e.g. yeast8 Supplementary Figure 4C), and is generated by this function. It uses this data, which was gathered in Österlund et al. 2013. Under "Methods" - "In silico growth simulations" that paper cites the source of those values, although its Supplementary file 5 does not explicitly name the precise paper.

There might be more recent (post-2013) articles with relevant chemostat data, although anaerobic conditions are not as commonly studied, as you probably have also noticed. We'd be happy to enrich our benchmarking dataset with any additional datapoints though!

@mperisin-lallemand
Author

Thanks for the quick reply. It appears that the plot on the README cuts off the C-limited anaerobic point with an experimental growth rate of 0.369 h^-1, likely because of line 47 in https://github.com/SysBioChalmers/yeast-GEM/blob/main/code/modelTests/growth.m:

lim = max(exp_max,mod_max)+0.05;

Do you have any idea why the model consistently overestimates growth rates for C-limited anaerobic conditions?

@edkerk
Member

edkerk commented Sep 14, 2022

You're right that one point is no longer on the graph; the function is indeed sloppy in how it defines the axes. It worked for yeast8 Supplementary Figure 4C:
[image: yeast8 Supplementary Figure 4C]
but the function hasn't been changed since then, and it is clearly no longer suitable.

We'll change

exp_max = max(exp_data2(:,4));
mod_max = max(mod_data1(:,4));

to the following in the next yeast-GEM release:

exp_max = max(exp_data(:,4));
mod_max = max([mod_data1(:,4);mod_data2(:,4);mod_data3(:,4);mod_data4(:,4)]);

Looking back at the yeast8 paper (figure above), though, it seems the C-limited anaerobic predictions have worsened since then (growth was already overpredicted, but not as drastically). I'm not certain which model version was used to make the figure in the paper; it would be interesting to trace back where the fit worsened. More generally, a worse fit under anaerobic conditions suggests that the growth-associated energy requirement (GAM) differs between aerobic and anaerobic conditions. Perhaps some biosynthetic pathway or macromolecule polymerization uses a non-optimal variant when no oxygen is available? Or it's a modelling artefact; it's hard to know a priori.
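
A minimal, self-contained sketch of the axis-limit issue discussed above. The numbers are made up for illustration (only 0.369 comes from this thread), and the variable names are assumptions loosely modelled on growth.m, not the repository's actual data:

```matlab
% Hypothetical growth rates (1/h); only 0.369 is taken from this thread.
exp_all   = [0.10; 0.20; 0.30; 0.369];   % experimental, all conditions
mod_first = [0.11; 0.21];                % simulated, first condition only
mod_all   = [0.11; 0.21; 0.33; 0.45];    % simulated, all conditions

% Limit computed from a subset of the data: later series can exceed it.
lim_old = max(max(exp_all(1:2)), max(mod_first)) + 0.05;

% Limit computed from every plotted series: nothing is cut off.
lim_new = max(max(exp_all), max(mod_all)) + 0.05;

% Count conditions where either coordinate falls outside the axes.
n_cut_old = sum(exp_all > lim_old | mod_all > lim_old);
n_cut_new = sum(exp_all > lim_new | mod_all > lim_new);
fprintf('points outside axes: old = %d, new = %d\n', n_cut_old, n_cut_new);
```

Taking the maximum over all series, as in the proposed change, keeps the limit independent of which condition happens to be plotted first.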

@mperisin-lallemand
Author

mperisin-lallemand commented Sep 15, 2022

I am attempting to recreate the in silico vs experimental growth rate plot and I am confused by the rescaling of biomass under N-limited conditions in the growth.m script:

if strcmp(mode2,'N')
    model_origin = scaleBioMass(model_origin,'protein',0.289,'',false);
    model_origin = scaleBioMass(model_origin,'lipid',0.048,'',false);
    model_origin = scaleBioMass(model_origin,'RNA',0.077,'carbohydrate',false);
end

After these modifications there is a function call on line 80 for anaerobic conditions, which will re-scale the biomass further and undo the protein modification:

if mode1 == 2
    model_origin = anaerobicModel(model_origin);
end

Is there a reference for these changes?

@mperisin-lallemand
Author

I see that the anaerobicModel.m script references Nissen et al. 1997.

%1st change: Refit GAM and NGAM to exp. data, change biomass composition
GAM = 30.49; %Data from Nissen et al. 1997
P = 0.461; %Data from Nissen et al. 1997
NGAM = 0; %Refit done in Jouthen et al. 2012

However, in the methods for that paper, the authors claim:
"The composition of protein, DNA, RNA and lipids is assumed to be constant under all growth conditions. This assumption is supported by measurements of the amino acid composition of the protein under various growth conditions. Whereas the amino acid composition of the protein was measured (data not shown), we use the composition of nucleotides in RNA and DNA obtained by de Robichon-Szulmajster & Surdin-Kerjan (1971) and the lipid composition described by Rattray (1988). The unsaturated fatty acids and the sterols are assumed to be supplied by respectively the Tween 80 and the ergosterol content of the medium."

Further in the results:
"In this study, the cellular composition was therefore determined at four different dilution rates (see Table. 4). The most important variation in the cellular composition is that the amount of active machinery, i.e. protein and RNA, increases linearly with increasing dilution rate at the expense of carbohydrates. The cellular content of other components is virtually independent of the dilution rate."

@edkerk
Member

edkerk commented Sep 15, 2022

There are two different changes on biomass:

  1. Scaling the ratio of macromolecules, as N-limited conditions have somewhat different levels (most significantly protein: 29% (ref) instead of 46% (ref)).
  2. Removing some metabolites (heme, NAD(P)(H), ergosterol) from the biomass equation, as they cannot be produced in anaerobic conditions (NAD(P)(H) is of course not realistic, the model should really be able to produce it, but we haven't identified where the problem lies). But, in this change, the rest of biomass is not scaled, so it does not overwrite the changes made in 1.

These biomass composition changes are all introduced in the yeast8 paper.

@edkerk
Member

edkerk commented Sep 15, 2022

Regarding your second comment: biomass composition is indeed not scaled as an effect of anaerobiosis, only as an effect of N- vs. C-limitation.

And indeed, biomass composition varies somewhat anyway, depending on growth rate, media, cultivation conditions, phase of the moon. But the most significant changes would be the reduction in protein content during N-limitation (which has to be compensated for by something, and we assume this is carbohydrates) and the complete inability to synthesize certain metabolites (e.g. ergosterol).

@mperisin-lallemand
Author

Thanks again for the lightning-fast responses! I am still confused by line 17 in anaerobicModel.m.

model = scaleBioMass(model,'protein',P,'carbohydrate',false);

Does this call not change the biomass composition for anaerobic conditions?

@edkerk
Member

edkerk commented Sep 15, 2022

Mea culpa, I completely missed that line. You're right, it indeed reverts the changes that were earlier introduced for N-limitation. Quick fix is to just swap these biomass-modifying functions around in growth.m: first address anaerobiosis, then N-limitation.
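
To illustrate why the call order matters, here is a toy sketch. It does not use the real model or scaleBioMass: `setProtein` is a hypothetical stand-in that sets the protein fraction and lets carbohydrate absorb the difference, and the starting fractions are invented. Only the values 0.289 and 0.461 come from the snippets quoted in this thread:

```matlab
% Toy biomass as mass fractions (g/gDW); starting values are hypothetical.
bio = struct('protein', 0.46, 'carbohydrate', 0.40, 'other', 0.14);

% Hypothetical stand-in for scaleBioMass(model,'protein',P,'carbohydrate',false):
% set protein to P and let carbohydrate absorb the difference.
setProtein = @(b, P) struct('protein', P, ...
    'carbohydrate', b.carbohydrate + (b.protein - P), 'other', b.other);

P_Nlim = 0.289;   % N-limited protein fraction used in growth.m
P_anae = 0.461;   % protein fraction set in anaerobicModel.m

% Current order in growth.m: N-limitation first, then anaerobic changes.
current = setProtein(setProtein(bio, P_Nlim), P_anae);
% Swapped order: anaerobic changes first, then N-limitation.
swapped = setProtein(setProtein(bio, P_anae), P_Nlim);

fprintf('protein, current order: %.3f\n', current.protein);
fprintf('protein, swapped order: %.3f\n', swapped.protein);
```

In this toy, whichever call runs last determines the final protein fraction, which is why swapping the two steps would preserve the N-limited value.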

But this also draws my attention to what is likely the cause of, or at least a contributor to, the problem:

GAM = 30.49; %Data from Nissen et al. 1997
P = 0.461; %Data from Nissen et al. 1997
NGAM = 0; %Refit done in Jouthen et al. 2012
model = changeGAM(model,GAM,NGAM);

The GAM is changed! In the yeast-GEM.xml model, the GAM has been fitted at 55.4 since #159 (release 8.3.1), using the Van Hoek 1998 chemostat data. This GAM is used for the "normal" (= aerobic) simulations; for the anaerobic simulations the GAM is changed to 30.49. A lower GAM means a lower energetic cost of growth for the model, and hence overprediction of growth.
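
As a back-of-the-envelope illustration of this point, assume (a strong simplification, not how the FBA model is actually solved) a fixed ATP supply `q_ATP` that is spent only on growth-associated and non-growth-associated maintenance, so that q_ATP = GAM*mu + NGAM. The value of `q_ATP` is hypothetical; the GAM and NGAM values are those quoted in this thread:

```matlab
q_ATP    = 20;      % hypothetical ATP supply (mmol/gDW/h), for illustration only
NGAM     = 0;       % as set in anaerobicModel.m
GAM_aer  = 55.4;    % GAM fitted in yeast-GEM.xml (mmol/gDW)
GAM_anae = 30.49;   % GAM set in anaerobicModel.m (mmol/gDW)

% Growth rate supported by the same ATP supply under each GAM.
mu_aer  = (q_ATP - NGAM) / GAM_aer;
mu_anae = (q_ATP - NGAM) / GAM_anae;

ratio = mu_anae / mu_aer;
fprintf('growth rate ratio (GAM 30.49 vs 55.4): %.2f\n', ratio);
```

With NGAM = 0 the ratio does not depend on `q_ATP` and equals 55.4/30.49, roughly 1.8, under this simplification.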

Meanwhile, I cannot find (after a quick look at the article) the 30.49 number in Nissen et al. 1997, although its Table 3 seems to suggest 54.32 instead? I'm not sure, because it also suggests widely different values for different carbon sources, which is counterintuitive to me.

@mperisin-lallemand
Author

Yes! The GAM modification makes a huge difference. I re-created the experimental vs in silico growth rate plot with yeast 8.6.0 and removed all GAM and biomass composition modifications. Here are plots before and after these changes.
[image: yeast8 6_exp_vs_sim_growth_rates]
[image: yeast8 6-nogammod_exp_vs_sim_growth_rates]
The N-limited aerobic in silico growth rates are different, so perhaps the biomass composition adjustment is necessary for this condition.
