refactor: update general kcat database #157

Open
edkerk opened this issue May 25, 2022 · 6 comments

Comments

@edkerk
Member

edkerk commented May 25, 2022

Description of the new feature:

By repurposing some of the DLKcat code, construct a JSON file similar to the one used in DLKcat. This JSON file contains kcat values from the BRENDA and SABIO-RK databases, and its generation could be rerun every 6-12 months.

In contrast to the current DLKcat code, some filtering steps should be skipped, to (a) keep specific activities; (b) keep entries to which no amino acid sequence could be assigned. This way, the kcat database can be used in GECKO's fuzzy matching approach. The file thereby replaces the max_Kcat and max_SA files that GECKO currently uses.
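
As a rough illustration of the relaxed filtering described above, here is a minimal Python sketch. It is not the DLKcat code: the field names (`ec_number`, `substrate`, `sequence`, `kcat`), the function name `filter_entries`, and the two entries are all hypothetical placeholders, not real measurements.

```python
from typing import Dict, List


def filter_entries(entries: List[Dict], require_sequence: bool) -> List[Dict]:
    """Return the entries to keep in the kcat database.

    With require_sequence=True, entries without an amino acid sequence are
    dropped (assumed to mimic the strict filtering). With
    require_sequence=False they are kept, so that they remain available
    for fuzzy matching.
    """
    kept = []
    for entry in entries:
        if require_sequence and not entry.get("sequence"):
            continue  # strict mode: skip entries lacking a sequence
        kept.append(entry)
    return kept


# Placeholder entries (made up for illustration only)
entries = [
    {"ec_number": "1.1.1.1", "substrate": "ethanol", "sequence": "MSIPE", "kcat": 10.0},
    {"ec_number": "1.1.1.1", "substrate": "ethanol", "sequence": None, "kcat": 5.0},
]

strict = filter_entries(entries, require_sequence=True)
relaxed = filter_entries(entries, require_sequence=False)
```

In this sketch, `relaxed` retains the entry without a sequence, which is the behaviour the general kcat database would need.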

This JSON file should be loaded into MATLAB, to be used in GECKO fuzzy matching (which is refactored in a separate Issue).

However, if this file is only used for GECKO's fuzzy matching, then we might as well stick with the existing max_Kcat and max_SA files? Let's keep this Issue on hold for now.

@mihai-sysbio
Member

I'm not totally sure about the scope of the issue: is it about creating a local kcat file or about how that file is used by GECKO? Perhaps it would be easier to discuss verbally and write up conclusions.

@edkerk
Member Author

edkerk commented Jun 27, 2022

Both about the actual file and how it is used. But it sounds like GotEnzymes will also be able to provide such a file (did not know this when opening this issue), so this issue will most likely be addressed when GECKO and GotEnzymes can work together.

@edkerk
Member Author

edkerk commented Jul 1, 2022

Does not strictly have to be JSON, but it should be flat-text so that it is diff-able.
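
To make the "diff-able" requirement concrete, one option is to write one record per line with sorted keys and a stable record order, so that regenerating the file only changes the lines whose data changed. A minimal sketch, assuming the records are plain dictionaries; the function name `to_diffable_text` is hypothetical:

```python
import json


def to_diffable_text(records):
    """Serialize records as one JSON object per line, in a stable order."""
    # sort_keys fixes the key order within each record; sorting the
    # serialized lines fixes the order of the records themselves.
    lines = sorted(json.dumps(record, sort_keys=True) for record in records)
    return "\n".join(lines) + "\n"


# The same records in a different input order give identical text.
text_a = to_diffable_text([{"b": 1, "a": 2}, {"a": 1}])
text_b = to_diffable_text([{"a": 1}, {"a": 2, "b": 1}])
```

A tab-separated file would serve the same purpose; the point is only that the output is deterministic and line-oriented.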

@edkerk edkerk added the gecko3 label Jul 1, 2022
@edkerk
Member Author

edkerk commented Jul 2, 2022

For reference, this is what GotEnzymes output looks like (showing only the first 28 lines here):

{"enzymes":[{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C05703","kcat_values":1.5564},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00022","kcat_values":0.7785},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C00022","kcat_values":0.7785},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C01962","kcat_values":2.2991},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00283","kcat_values":2.2484},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C00022","kcat_values":0.7785},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C05699","kcat_values":1.3478},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C05689","kcat_values":1.3304},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C05703","kcat_values":1.5564},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C00491","kcat_values":1.0792},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C02291","kcat_values":0.476},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C05335","kcat_values":1.4882},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C00097","kcat_values":0.1704},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00097","kcat_values":0.1704},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C05688","kcat_values":0.7746},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R11399","ec_number":"2.4.1.109","compound":"C03862","kcat_values":5.6405},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R11399","ec_number":"2.4.1.109","compound":"C00110","kcat_values":5.1653},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R04072","ec_number":"2.4.1.109","compound":"C03862","kcat_values":5.6405},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R04072","ec_number":"2.4.1.109","compound":"C00110","kcat_values":5.1653}
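
To show how records in this shape could feed a per-EC lookup similar in spirit to the current max_Kcat file, here is a minimal Python sketch. It uses five records copied from the excerpt above; the closing `]}` is an assumption, since the excerpt is truncated. The function name `max_kcat_per_ec` and the choice of "maximum kcat per EC number" are illustrative assumptions, not GECKO's implementation.

```python
import json

# Five records copied from the excerpt; the closing "]}" is assumed.
raw = """{"enzymes":[
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C02291","kcat_values":0.476},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R11399","ec_number":"2.4.1.109","compound":"C03862","kcat_values":5.6405},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R11399","ec_number":"2.4.1.109","compound":"C00110","kcat_values":5.1653}
]}"""


def max_kcat_per_ec(enzymes):
    """Map each EC number to the highest kcat found among its records."""
    best = {}
    for record in enzymes:
        # ec_number may hold several EC numbers separated by ";"
        for ec in record["ec_number"].split(";"):
            best[ec] = max(best.get(ec, 0.0), record["kcat_values"])
    return best


enzymes = json.loads(raw)["enzymes"]
max_kcat = max_kcat_per_ec(enzymes)
```

Note that a record listing several EC numbers contributes its kcat to each of them, which may or may not be desirable for fuzzy matching.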

@mihai-sysbio
Member

Just a note: the output above was produced without a purpose-built API. I would therefore treat it as a foundation on which to build a more compact and potentially more usable response.
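
As one possible direction for a more compact response, the flat records could be nested so that gene-level fields are stored once. The Python sketch below is only an illustration, not the shape of any actual API response: the function name `compact` and the nested layout are hypothetical, and it assumes that `organism` and `ko` are constant per gene and that `ec_number` is constant per gene and reaction, as they appear to be in the output shown in this issue. The three sample records are copied from that output.

```python
def compact(enzymes):
    """Nest flat records as gene -> reactions -> compound -> kcat."""
    nested = {}
    for record in enzymes:
        # Gene-level fields are written once, on first encounter.
        gene = nested.setdefault(
            record["gene"],
            {"organism": record["organism"], "ko": record["ko"], "reactions": {}},
        )
        # Reaction-level fields: the EC numbers become a list.
        reaction = gene["reactions"].setdefault(
            record["reaction_id"],
            {"ec_number": record["ec_number"].split(";"), "kcat": {}},
        )
        reaction["kcat"][record["compound"]] = record["kcat_values"]
    return nested


sample = [
    {"gene": "YAL012W", "organism": "sce", "domain": "E", "ko": "K01758",
     "reaction_id": "R04770", "ec_number": "4.4.1.1;4.4.1.11",
     "compound": "C00109", "kcat_values": 1.1646},
    {"gene": "YAL012W", "organism": "sce", "domain": "E", "ko": "K01758",
     "reaction_id": "R04770", "ec_number": "4.4.1.1;4.4.1.11",
     "compound": "C00014", "kcat_values": 4.2309},
    {"gene": "YAL023C", "organism": "sce", "domain": "E", "ko": "K00728",
     "reaction_id": "R11399", "ec_number": "2.4.1.109",
     "compound": "C03862", "kcat_values": 5.6405},
]

nested = compact(sample)
```

The `domain` field is dropped here for brevity; it could be kept alongside `organism` in the same way.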

@edkerk edkerk removed the enhancement label Jul 8, 2022
@mihai-sysbio
Member

Here is the link to the promised API: https://metabolicatlas.org/api/v2/#/GotEnzymes

@edkerk edkerk changed the title from "feat: generate JSON general kcat database" to "feat: update general kcat database" Dec 22, 2022
@edkerk edkerk mentioned this issue Dec 22, 2022
2 tasks
@edkerk edkerk changed the title from "feat: update general kcat database" to "refactor: update general kcat database" Mar 5, 2023
@edkerk edkerk removed the gecko3 label Mar 5, 2023