ffindex_order giving many fewer entries in output .ff{data,index} files than in input files #343
Comments
I have the same issue when sorting the a3m and hhm databases of an entire proteome. Please let me know if you found a solution. Thanks!
Hi Idrinnen, I'm still trying to find a robust solution to this; what I can say with some confidence is that sometimes it appears to work and sometimes it doesn't, and the solutions in the GitHub issues pages aren't very reliable. I have to put more effort in this week (by sheer coincidence!) to get it working.
Hi Idrinnen, sorry about the long delay. I've been distracted by other things for a couple of months, but I can now come back to this. What I have found so far is that if I try to add entries to an existing DB then mostly it works but barfs on some entries, so I end up with an incomplete (and inconsistent) DB. The only way I've managed to build full DBs from my particular selection (so far; I am now working on this again) is to rebuild the DB from scratch whenever I have to add new entries. I don't think this is the best solution but it works for me (unfortunately, I have to update my DB once a week to include new entries from the PDB) and at least doesn't fail. My guess is that I'm doing something wrong somewhere...
Finally got round to looking at this. My solution (which works okay for me) is to rebuild the DB from the a3m files every time; I put this into a cron job to run once a week when the PDB is updated. It uses more processing time than just adding entries, but it does seem to be robust. Until hh-suite has more funding so that problems like this can be answered by the hh-suite group, this is what I will be going with.
I'm trying to create a custom database, but find that I am getting many fewer entries in _a3m.ff{data,index} and _hhm.ffindex after running ffindex_order than are present in the input files.
I don't have the mpi versions installed.
My normal DB will have ~70K entries, but this behaviour can be seen with 100 entries in the initial _???.ff{data,index} files:
Running hh-suite/3.3.0 on Intel hardware, CentOS Linux release 7.9.2009 (Core):
cstranslate -f -x 0.3 -c 4 -I a3m -i full_a3m -o full_cs219
sort -k3 -n full_cs219.ffindex | cut -f1 > sorting.dat
sed 's/\.a3m$/.hhm/' sorting.dat > sorting.hhm # need this step because ffindex_order does not run for .hhm files with the extensions in the original sorting.dat file
ffindex_order sorting.hhm full_hhm.ff{data,index} full_hhm_ordered.ff{data,index}
ffindex_order sorting.dat full_a3m.ff{data,index} full_a3m_ordered.ff{data,index}
wc -l full*index
100 full_a3m.ffindex
6 full_a3m_ordered.ffindex
100 full_cs219.ffindex
100 full_hhm.ffindex
6 full_hhm_ordered.ffindex
312 total
My assumption is that I've screwed up somewhere - an indication of where would be most useful!
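One thing worth ruling out here is a mismatch between the names in the sorting file and the entry names in the index being reordered. The sketch below is only a diagnostic, not a fix, and it does not use any hh-suite tools. It assumes the ffindex file is tab-separated with the entry name in column 1 (as the `cut -f1` step above already relies on); `missing_from_index` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print every name in an order file that has no
# matching entry name (column 1) in a tab-separated ffindex file.
missing_from_index() {
    order_file=$1
    index_file=$2
    names=$(mktemp) || return 1
    # Collect the entry names from the index, sorted for comm.
    cut -f1 "$index_file" | LC_ALL=C sort -u > "$names"
    # comm -23 keeps only lines unique to the first input (the order file).
    LC_ALL=C sort -u "$order_file" | LC_ALL=C comm -23 - "$names"
    rm -f "$names"
}
```

Running `missing_from_index sorting.hhm full_hhm.ffindex` and `missing_from_index sorting.dat full_a3m.ffindex` should print nothing if every name in the sorting files exists in the corresponding index. Any names printed would point to a naming or extension mismatch; empty output would suggest the cause lies elsewhere.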