Errors in Chen_MSB2009 benchmark #175

Open
FFroehlich opened this issue Jan 18, 2023 · 4 comments
@FFroehlich
Collaborator

Looking at the Chen_MSB2009 benchmark model, I suspect I may have identified some errors in the measurements table (https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab/blob/master/Benchmark-Models/Chen_MSB2009/measurementData_Chen_MSB2009.tsv).

The original data is available from the supplement of https://doi.org/10.1038/msb.2008.74 (MSB data) and was reused in https://doi.org/10.1371/journal.pcbi.1005331 (PLoSCB data). The issue with the MSB data is that the standard deviations of the measurements are often 0 (see _dataset/Chen et al - Experimental Data/A431_experiment.out in the supplement to https://doi.org/10.1038/msb.2008.74), which makes the data unsuitable for fitting. This is likely why I added 0.1 to the standard deviations in the PLoSCB data (it's been a while ...; see code/project/data/getData.m, lines 756-758, in the supplement to https://doi.org/10.1371/journal.pcbi.1005331).
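To illustrate why zero standard deviations are a problem, here is a minimal sketch, assuming independent normally distributed noise. The function name and all numbers are hypothetical and not taken from either supplement; the point is only that a single zero entry makes the objective undefined, while a 0.1 offset keeps it finite.

```python
import numpy as np

def gaussian_nllh(measured, simulated, sigma):
    """Negative log-likelihood assuming independent normal noise (an assumption)."""
    measured, simulated, sigma = map(np.asarray, (measured, simulated, sigma))
    residuals = (measured - simulated) / sigma
    return 0.5 * np.sum(residuals**2 + np.log(2 * np.pi * sigma**2))

# Hypothetical values, NOT taken from the benchmark or the supplements.
measured = np.array([0.20, 0.55, 0.90])
simulated = np.array([0.25, 0.50, 0.80])
sigma_raw = np.array([0.05, 0.0, 0.10])  # one zero entry, as reported for the MSB file

# With sigma == 0 the residual term diverges and the log term goes to -inf,
# so the sum is not a finite number.
with np.errstate(divide="ignore", invalid="ignore"):
    nllh_raw = gaussian_nllh(measured, simulated, sigma_raw)

# Adding a constant offset of 0.1 keeps every term finite.
nllh_offset = gaussian_nllh(measured, simulated, sigma_raw + 0.1)

print(np.isfinite(nllh_raw), np.isfinite(nllh_offset))
```

Whether an additive offset is the right noise model is a separate question; the sketch only shows that it makes the objective evaluable.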

However, I ran into the following discrepancies:

ERK_PP data for the model1_data3 condition in the benchmark doesn't match the MSB data (Low (1e-11 M) EGF condition) or the PLoSCB data (D(3), lines 687-698). This looks like a copy & paste error in the benchmark data, as the data for model1_data2 and model1_data3 are the same. MSB and PLoSCB data match.

AKT_PP data for model1_data4 condition in benchmark does match MSB data (Low (1e-10 M) HRG condition) but not PLoSCB data (D(4), lines 704-715) (looks like a copy & paste error in PLoSCB data, as data for model1_data3 and model1_data4 are the same. This sucks, but shouldn't affect any of the conclusions in the paper).

This of course raises the question of where the benchmark data came from. As the data in the benchmark example also contains 0.1 values (as in the PLoSCB data) for the standard deviation instead of 0.0 values (as in the MSB data), I believe the measurements file in the benchmark was likely derived from the PLoSCB data (likely fixing the issue with model1_data4, but introducing the issue with model1_data3 😢).
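A check along these lines could flag such copy & paste duplicates automatically. This is only a sketch: it uses the PEtab measurement table column names, but the toy table and the helper duplicated_series are hypothetical, and the numbers are made up rather than read from the benchmark file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with PEtab measurement column names;
# the values are invented and are NOT the benchmark's measurements.
df = pd.DataFrame({
    "observableId": ["ERK_PP"] * 6,
    "simulationConditionId": ["model1_data2"] * 2
                             + ["model1_data3"] * 2
                             + ["model1_data4"] * 2,
    "time": [0, 5, 0, 5, 0, 5],
    "measurement": [0.1, 0.8, 0.1, 0.8, 0.2, 0.5],
})

def duplicated_series(table):
    """Return (observable, condition_a, condition_b) for identical time courses."""
    hits = []
    for obs, group in table.groupby("observableId"):
        # One column per condition, one row per time point.
        wide = group.pivot_table(index="time",
                                 columns="simulationConditionId",
                                 values="measurement")
        conds = list(wide.columns)
        for i, a in enumerate(conds):
            for b in conds[i + 1:]:
                if np.array_equal(wide[a].to_numpy(), wide[b].to_numpy()):
                    hits.append((obs, a, b))
    return hits

print(duplicated_series(df))
```

For the real file one would load the table with pd.read_csv(path, sep="\t") instead of building it by hand. Identical series are not necessarily errors, so any hit still needs to be checked against the original supplement.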

I will refrain from making any remarks regarding how much I loathe data that is not available in easily machine readable formats and data processing pipelines that involve manual steps ...

@FFroehlich FFroehlich changed the title Errors in Chen2009 model Errors in Chen_MSB2009 benchmark Jan 18, 2023
@FFroehlich
Collaborator Author

Ah it looks like the benchmark was exported from the Hass (MATLAB) suite where the same mismatch is present: https://github.com/Benchmarking-Initiative/Benchmark-Models/blob/master/Benchmark-Models/Chen_MSB2009/Data/model1_data4.xlsx

@FFroehlich
Collaborator Author

FFroehlich commented Jan 18, 2023

Overall provenance of this benchmark model is a bit tricky, since both the PLoSCB implementation and the d2d implementation use standard deviations to normalize data, while in the original MSB paper measurements were normalized by the maximum for each observable across time+conditions:

[image: screenshot of the objective function from the MSB paper]
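For reference, normalizing by the maximum of each observable across all time points and conditions could look like the sketch below. The column names follow the PEtab measurement table; the condition names c1 and c2 and all values are hypothetical.

```python
import pandas as pd

# Hypothetical raw readouts; NOT the values from the MSB supplement.
raw = pd.DataFrame({
    "observableId": ["ERK_PP", "ERK_PP", "ERK_PP", "AKT_PP", "AKT_PP", "AKT_PP"],
    "simulationConditionId": ["c1", "c1", "c2", "c1", "c1", "c2"],
    "time": [0, 5, 5, 0, 5, 5],
    "measurement": [10.0, 40.0, 20.0, 2.0, 4.0, 8.0],
})

# Maximum per observable, taken over every time point and every condition.
peak = raw.groupby("observableId")["measurement"].transform("max")
raw["normalized"] = raw["measurement"] / peak

print(raw)
```

After this step each observable peaks at exactly 1 somewhere in the data set, which is a different scale from data divided by per-point standard deviations.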

@FFroehlich
Collaborator Author

ping @elbaraim @dilpath

@dilpath
Collaborator

dilpath commented Feb 5, 2023

Thanks for raising this issue, and for the thorough feedback! I am currently the only maintainer of this repo -- unfortunately, I haven't worked with this model yet.

What I got from this is:

  1. a note should be added, to say that the data used in the PLoS CB paper is different to what we provide
  2. condition model1_data3, observable ERK_PP needs to be changed to match MSB data
  3. although PLoS CB gets fitting working by specifying a standard deviation of 0.1 for some data, we need to reassess how to treat the data with 0 noise
    • since the objective function in your screenshot looks like least squares, I propose normal noise with standard deviation 1
  4. data normalization needs to be handled
    • I propose estimating scaling factor(s)

> I will refrain from making any remarks regarding how much I loathe data that is not available in easily machine readable formats and data processing pipelines that involve manual steps ...

😭
Thanks for the work already done on the current implementation!
