Hello, I downloaded the entire uniclust30 filtered database from AWS and I see that some clusters have only the `pdb` folder and are missing the `a3m` folder with the MSA. Is there a reason for that?
I counted 677 clusters that have this problem. Here are some of them: `A0A023B4W7`, `A0A023SCZ3`, `A0A044VF87`, `A0A059F1C3`.
Here is a file with all the clusters missing the MSA: missing_msas.txt
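For reference, a check like this can be sketched in a few lines of Python. This is only an illustration, not necessarily how the list above was produced: it assumes every cluster is a directory directly under the download root and that the MSA, when present, sits in an `a3m` subfolder next to `pdb`. The function name `clusters_missing_a3m` is made up for this example.

```python
from pathlib import Path


def clusters_missing_a3m(root):
    """Return the sorted names of cluster directories under `root`
    that do not contain an `a3m` subfolder.

    Assumed layout (hypothetical): <root>/<cluster_id>/{a3m,pdb}/
    """
    root = Path(root)
    missing = []
    for cluster_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # A cluster counts as "missing" when there is no a3m directory at all.
        if not (cluster_dir / "a3m").is_dir():
            missing.append(cluster_dir.name)
    return missing
```

Calling `clusters_missing_a3m("path/to/download")` and writing the result one name per line gives a file in the same shape as `missing_msas.txt`.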
Another question: how do I get the representative sequence for each cluster? Is it the first sequence in the `a3m` file? In some cases, for instance `A0A009FAV8`, the first sequence is called `consensus`. Does that mean that the first sequence is not always the representative?
I also tried looking up the representative sequences in the list of clusters on the Uniclust30-2018_08 website, but the cluster names there do not seem to match the ones in OpenProteinSet.
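To make the first-sequence question concrete, here is a minimal sketch of how the first record of an `a3m` file can be inspected. It relies only on a3m being FASTA-like, with header lines starting with `>`. Both function names are hypothetical, and the test for the word "consensus" in the header is an assumption about how that record is labelled, not something confirmed by the dataset documentation.

```python
def first_a3m_header(a3m_path):
    """Return the first header line of an a3m file without the leading '>',
    or None if the file contains no header line."""
    with open(a3m_path) as handle:
        for line in handle:
            # Skip anything before the first record (e.g. blank or comment lines).
            if line.startswith(">"):
                return line[1:].strip()
    return None


def first_record_is_consensus(a3m_path):
    """Assumption: a consensus record has 'consensus' somewhere in its header."""
    header = first_a3m_header(a3m_path)
    return header is not None and "consensus" in header.lower()
```

Running `first_record_is_consensus` over all `a3m` files would show how many clusters start with a consensus record rather than a named sequence.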