meaning of repeated substrates #173

Open
CamiAgustini opened this issue Apr 22, 2024 · 4 comments

@CamiAgustini

Hi all,

I am analyzing the table that dbCAN produces, and in some entries the substrate is repeated: some rows list the substrate as "chitin" while others list "chitin, chitin". What does the repetition mean?

I also don't understand the number that follows the subfamily. For example, for a GH18 I get "GH18_e428". What does the number after the "e" correspond to?

@Xinpeng021001
Contributor

Hi,
Regarding question 1:
We use eCAMI (k-mer based) to split each CAZy family into subfamilies, and then use the CAZymes that have EC numbers as labels to annotate a substrate for each subfamily (by manual curation). We then build an HMM for every subfamily, whether or not it could be assigned a substrate (some subfamilies could, others could not); together these form the dbCAN-sub HMMs.
A CAZyme can sometimes be assigned to multiple subfamilies, whose substrates may be the same or different (see dbCAN-sub.out). That is why you see multiple substrates listed, including repeats such as "chitin, chitin".
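
If the repeats get in the way of downstream analysis, they can be collapsed after the fact. The following is only a minimal sketch: it assumes the substrate column holds a comma-separated string and that "-" marks a missing value, and the function name `unique_substrates` is hypothetical, not part of dbCAN.

```python
from __future__ import annotations


def unique_substrates(field: str) -> list[str]:
    """Split a comma-separated substrate string and drop repeats,
    keeping the order in which substrates first appear."""
    seen: list[str] = []
    for item in field.split(","):
        name = item.strip()
        # Skip empty entries and "-" (assumed here to mean "no substrate").
        if name and name != "-" and name not in seen:
            seen.append(name)
    return seen


print(unique_substrates("chitin, chitin"))         # ['chitin']
print(unique_substrates("chitin, xylan, chitin"))  # ['chitin', 'xylan']
```

Note that collapsing repeats discards the information that more than one subfamily hit supported the same substrate, so it may be worth keeping the original column as well.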

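Regarding question 2:
The "_eXX" suffix denotes the subfamily. In GH18_e428, for example, GH18 is the CAZy family and the number after the "e" identifies the subfamily within that family.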
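
As a small illustration, a label of this form can be split into its family and subfamily number. This sketch assumes every label looks like `<family>_e<number>`; the helper name `split_subfamily` is hypothetical and not part of dbCAN.

```python
import re

# Assumed label shape: <family>_e<number>, e.g. "GH18_e428".
SUBFAMILY_RE = re.compile(r"^(?P<family>[A-Za-z]+\d+)_e(?P<number>\d+)$")


def split_subfamily(label):
    """Return (family, subfamily_number), or None if the label does not match."""
    match = SUBFAMILY_RE.match(label.strip())
    if match is None:
        return None
    return match.group("family"), int(match.group("number"))


print(split_subfamily("GH18_e428"))  # ('GH18', 428)
print(split_subfamily("GH18"))       # None
```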

For more detail, please see our dbCAN-seq update (https://doi.org/10.1093/nar/gkac1068) and the dbCAN3 paper (https://doi.org/10.1093/nar/gkad328).

Hope this helps.
Please let us know if you have any other questions.

@cmkobel

cmkobel commented Apr 23, 2024

I have another usage question pertaining to the substrates. Why are some of them missing? I would like to know the substrates of all of the CAZymes. I understand that this is a matter of manual curation, but is there a place where I can find the missing ones?

@Xinpeng021001
Contributor

Hi,
The CAZy database contains two types of CAZymes: those with an EC number and those without (the majority). We use the EC number as a label to assign a substrate to a subfamily, because EC-annotated CAZymes can be curated with known substrates from the literature or from databases such as BRENDA. However, as mentioned, a great many CAZymes have no EC number, and some subfamilies contain only such CAZymes. That is why no substrate is listed for those subfamilies.

If you want to find substrates for those subfamilies, I would suggest reviewing the literature or using the supplementary material of the dbCAN3 paper to find one or more substrates at the CAZyme family level rather than the subfamily level.
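
A family-level fallback of this kind could be sketched as follows. Everything here is an assumption for illustration: the function name `substrate_with_fallback` is hypothetical, "-" is assumed to mark a missing substrate, and `family_table` is a placeholder that you would fill in yourself from the dbCAN3 supplements or the literature.

```python
def substrate_with_fallback(subfamily_label, subfamily_substrate, family_table):
    """Return (substrate, level), where level is 'subfamily', 'family', or None."""
    value = (subfamily_substrate or "").strip()
    # Prefer the subfamily-level substrate when one is present.
    if value and value != "-":
        return value, "subfamily"
    # Otherwise fall back to the family part of a label such as "GH18_e428".
    family = subfamily_label.split("_e")[0]
    if family in family_table:
        return family_table[family], "family"
    return None, None


# Illustrative entry only; build the real table from curated sources.
family_table = {"GH18": "chitin"}

print(substrate_with_fallback("GH18_e428", "-", family_table))       # ('chitin', 'family')
print(substrate_with_fallback("GH18_e428", "chitin", family_table))  # ('chitin', 'subfamily')
print(substrate_with_fallback("GT2_e1", "-", family_table))          # (None, None)
```

Returning the level alongside the substrate keeps the less specific family-level guesses distinguishable from curated subfamily-level assignments.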

@yinlabniu
Collaborator

From https://doi.org/10.1093/nar/gkad328: "After the subfamily classification, 3003 CAZyme subfamilies contain experimentally characterized CAZy proteins with EC numbers, and among them only 655 (21.8%) subfamilies have more than one EC numbers (Figure 1B). 23 038 CAZyme subfamilies contain no experimentally characterized CAZy proteins and no EC numbers. Their HMMs will not help substrate prediction but can still be informative with subfamily annotation".
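
To put those counts in proportion, here is a quick calculation using only the numbers in the quote. It assumes the two groups (subfamilies with and without EC-characterized proteins) are disjoint and together make up all subfamilies; the variable names are just for illustration.

```python
# Counts taken from the quoted dbCAN3 passage.
with_ec = 3003      # subfamilies containing characterized proteins with EC numbers
multi_ec = 655      # of those, subfamilies with more than one EC number
without_ec = 23038  # subfamilies with no characterized proteins and no EC numbers

# Assumption: the two groups together cover every subfamily.
total = with_ec + without_ec

print(total)                         # 26041
print(f"{multi_ec / with_ec:.1%}")   # 21.8%
print(f"{with_ec / total:.1%}")      # 11.5%
```

Under that assumption, only about 11.5% of subfamilies have any EC-based evidence from which a substrate could be curated, which is why so many entries have no substrate.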