Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: universal DLKcat.tsv for both light and full ecModels #248

Open
Yu-sysbio opened this issue Feb 28, 2023 · 8 comments
Open

feat: universal DLKcat.tsv for both light and full ecModels #248

Yu-sysbio opened this issue Feb 28, 2023 · 8 comments

Comments

@Yu-sysbio
Copy link
Collaborator

kcatList_DLKcat = readDLKcatOutput(ecModel);

I ran this successfully when building a full ecModel, but got the error below when building a light ecModel:

Error using readDLKcatOutput (line 56)
Not all reactions from DLKcat output can be found in model.ec.rxns

@edkerk
Copy link
Member

edkerk commented Feb 28, 2023

Did you use the same DLKcat output file to make both a light and full ecModel?

@ae-tafur
Copy link
Collaborator

ae-tafur commented Feb 28, 2023

Maybe it because here:

if ~all(ismember(rxns,model.ec.rxns))

it compares the rxn identifiers but in light version, the identifiers in model.ec.rxns have a prefix 001_r_001

@edkerk
Copy link
Member

edkerk commented Feb 28, 2023

Indeed, this check is to make sure that the kcatList can later be used by selectKcatValue, which just matches by reaction ID:

[sanityCheck,idxInModel] = ismember(kcatList.rxns,model.ec.rxns);

Short term solution is to have the error suggest to make full/light specific versions of the DLKcat.tsv file. Longer term solution is to have selectKcatValue not just check by the ec.rxns style reaction identifier, but instead removes any suffices/prefixes (so returns to the ecModel.rxns format, and also matches to the enzymes that are annotated. Or something similar, at least not only matching by reaction identifier. This will require some more refactoring and testing.

@Yu-sysbio
Copy link
Collaborator Author

Did you use the same DLKcat output file to make both a light and full ecModel?

Yes

@Yu-sysbio
Copy link
Collaborator Author

I just notice that the DLKcat output files generated by full and light protocols are not interchangeable due to rxn IDs in these two types of ecModels. I do not think that it is very necessary to make them interchangeable by refactoring code. Instead I prefer the short-term solution, and we should clarify this in the protocol.

@edkerk edkerk added enhancement and removed bug labels Feb 28, 2023
@edkerk edkerk changed the title Bug in readDLKcatOutput for light ecModel feat: universal DLKcat.tsv for both light and full ecModels Feb 28, 2023
@edkerk
Copy link
Member

edkerk commented Feb 28, 2023

Mentioned in the protocol. Longer term goal would be to enhance the parsing of the file, to avoid this.

@edkerk edkerk removed the gecko3 label Mar 5, 2023
@mihai-sysbio
Copy link
Member

Instead of mapping via the reaction ID, perhaps we could use another ID as a unique identifier for the sequence, and do the mapping back to the reaction within Matlab. From a modularity point of view, the current DLKcat.tsv mixes the concerns of GECKO and DLKcat. Was there a specific reason for avoiding gene IDs?

@edkerk
Copy link
Member

edkerk commented Jul 3, 2023

No reason to avoid gene ID, that is actually my plan, to map by the core of the reaction ID (so without prefixes and suffixes from light and full ecModels) and the gene ID. And it's a GECKO-only concern, DLKcat doesn't care about identifiers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants