
MAFFT adjust direction #5

Open
EdBiffin opened this issue Jan 29, 2024 · 15 comments

Comments

@EdBiffin

I've noticed that MAFFT is generating alignments with sequences in both forward and reverse orientation. Is it possible to add the MAFFT --adjustdirection flag to the pipeline?
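If you want to try MAFFT's own direction adjustment on an extracted locus outside the pipeline, here is a minimal Python sketch. It only builds the command and then lists the sequences that MAFFT reversed. It assumes MAFFT is on your `PATH`. `--adjustdirection` and `--thread` are real MAFFT options, and MAFFT marks reverse-complemented sequences by prefixing their names with `_R_`. The helper names and the input path `locus.fna` are hypothetical and are not part of Captus.

```python
import shlex


def mafft_adjustdirection_cmd(input_fasta: str, threads: int = 1) -> list[str]:
    """Build (but do not run) a MAFFT command that reverse-complements
    sequences as needed so that they all align in a common direction."""
    return ["mafft", "--adjustdirection", "--thread", str(threads), input_fasta]


def reversed_names(aligned_fasta: str) -> list[str]:
    """Return the names MAFFT marked as reverse-complemented ("_R_" prefix)."""
    prefix = ">_R_"
    return [
        line[len(prefix):].strip()
        for line in aligned_fasta.splitlines()
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    # MAFFT writes the alignment to stdout; redirect it to a file when running.
    print(shlex.join(mafft_adjustdirection_cmd("locus.fna", threads=4)))
```

Any name returned by `reversed_names` points to a sequence whose orientation disagreed with the rest of the locus. This is a quick way to spot the problem, although it does not explain why the sequence was reversed.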

@edgardomortiz
Owner

Hi @EdBiffin!

I didn't use the --adjustdirection flag because, during extraction, all the sequences are put in the same direction as the sequence you used as reference. Could you upload one of those alignments here? I would like to solve the issue (or at least explain it).

Thanks

Edgardo

@EdBiffin
Author

Hi Edgardo, thanks for your quick response. I've attached an example alignment. I'm using a custom reference file, which I've also attached. I look forward to your response.
Ed
captus_refs_nu_combined.fasta.txt
6164.fna.txt

@edgardomortiz
Owner

Thank you Ed,

Could you also tell me the actual command you used? Or, even better, upload the extraction .log file. This is very strange; the sequences shouldn't be reversed...

Edgardo

@EdBiffin
Author

captus-assembly_extract.log
Please find attached and please let me know if you need anything else.

@edgardomortiz
Owner

Thanks for the patience!

Would it be possible to upload the assembly.fasta for 376903_Malleostemon_tuberculatus and 376896_Austrobaeckea_verrucosa? If they are too big, other smaller assemblies that produce locus 6164 in opposite directions would also work. Finally, so I can try to replicate the issue, what captus align command did you use?

@EdBiffin
Copy link
Author

@edgardomortiz
Owner

Sorry, the link for Malleostemon is broken... (I got the other two.)

@EdBiffin
Author

@edgardomortiz
Owner

edgardomortiz commented Jan 29, 2024

By the way, while checking the reference, I noticed that several of your sequences have identical names. Captus will keep only one of them because names must be unique to avoid problems. (In the picture, the duplicates have a 2 after the name. These are just examples; there are many more.)
[Screenshot of the reference file showing duplicated sequence names]
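Before running Captus, you can find such duplicates with a short Python script like the one below. It is a sketch only. It assumes a sequence's name is the first whitespace-delimited word after `>`, which may not match exactly how Captus parses headers. `duplicate_names` is a hypothetical helper, and the example names are made up.

```python
from collections import Counter


def duplicate_names(fasta_text: str) -> list[str]:
    """Return each sequence name that occurs more than once, in order of first
    appearance. Assumes the name is the first word after ">" (hypothetical
    convention; Captus may parse headers differently)."""
    names = [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]
    counts = Counter(names)  # Counter keeps insertion order (Python 3.7+)
    return [name for name in counts if counts[name] > 1]


if __name__ == "__main__":
    example = ">SpeciesA-6164\nATG\n>SpeciesB-6164\nATG\n>SpeciesA-6164 copy\nATG\n"
    print(duplicate_names(example))
```

Renaming or removing the reported entries before extraction means that Captus uses the sequence you intended, rather than silently keeping only one of the duplicates.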

@edgardomortiz
Owner

edgardomortiz commented Jan 29, 2024

I found it! When you provide a reference of nuclear proteins as nucleotides (CDS), Captus needs to translate it first, because Scipio performs a translated search on the assemblies.

Because I can't assume that all sequences are translatable in Frame 1, Captus tries to guess the reading frame of each sequence. It translates the sequence in all six reading frames and selects the frame that produces the fewest stop codons.
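The frame-guessing idea can be sketched in Python as follows. This illustration uses the standard genetic code. It is not the actual code in Captus's `bioformats.py`. The function names and the `+1..+3` / `-1..-3` frame labels are hypothetical.

```python
# Standard genetic code, built from the classic TCAG ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    # Translate complete codons only; codons with unknown bases become "X".
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_stop_counts(cds: str) -> dict[int, int]:
    # Frames +1..+3 read the sequence as given; -1..-3 read its reverse complement.
    cds = cds.upper()
    rc = reverse_complement(cds)
    counts = {}
    for offset in range(3):
        counts[offset + 1] = translate(cds[offset:]).count("*")
        counts[-(offset + 1)] = translate(rc[offset:]).count("*")
    return counts


def best_frame(cds: str) -> int:
    counts = six_frame_stop_counts(cds)
    # min() returns the first frame with the fewest stops, so ties are
    # resolved only by the order in which the frames were inserted.
    return min(counts, key=counts.get)


if __name__ == "__main__":
    print(six_frame_stop_counts("CTAAGTAGCTTA"), best_frame("CTAAGTAGCTTA"))
```

In this toy example, Frame +1 and Reverse Frames -2 and -3 all translate without stop codons. This shows how easily short or well-conserved sequences can tie. In this sketch, such a tie is won by whichever frame happens to be tried first.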

Now, I hadn't anticipated that in some references, like yours, a sequence such as Syzygium_micranthum-6164 can be translated perfectly in both Frame 1 and Reverse Frame 3 (Captus chose the latter in this case). I will modify the code to choose a positive reading frame in tied cases like this. So, basically, the reversed sequences in alignment 6164 followed this "reversed" protein from Syzygium_micranthum-6164.
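The tie-breaking rule described above amounts to ranking frames by a composite key. The minimal sketch below assumes that the per-frame stop counts are already known. `choose_frame` and the `+1..+3` / `-1..-3` frame labels are hypothetical. Preferring a lower frame number among tied forward frames is an added assumption; the thread does not state it.

```python
def choose_frame(stop_counts: dict[int, int]) -> int:
    """Pick a reading frame from {frame: stop_codon_count}.

    Ranking key: fewest stop codons first; among ties, any forward frame
    (+1, +2, +3) beats any reverse frame (-1, -2, -3); remaining ties go to
    the lower frame number (an assumption for this sketch).
    """
    return min(stop_counts, key=lambda f: (stop_counts[f], f < 0, abs(f)))


if __name__ == "__main__":
    # Frame +1 and Reverse Frame -3 both translate without stops: +1 wins.
    print(choose_frame({1: 0, 2: 3, 3: 2, -1: 1, -2: 4, -3: 0}))
```

Because `False < True` in Python, the `f < 0` element of the tuple pushes reverse frames behind forward frames only when the stop counts are equal. A reverse frame with strictly fewer stops still wins.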

Until I post the updated code, the workaround, unfortunately, is to provide the reference as amino acids (or to remove Syzygium_micranthum-6164 and keep providing the reference as nucleotides). Have you noticed other cases with reversed sequences?

Edgardo

@edgardomortiz edgardomortiz reopened this Jan 29, 2024
@edgardomortiz
Owner

Actually, in the same locus, eucgr-6164 can also be translated in Reverse Frame 1 without stop codons, whereas in Frame 1 it has a stop codon at the very end. I guess I will also need to add a rule so that a stop codon at the end is not counted.
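The rule for a terminal stop codon could look like the following minimal Python sketch. `internal_stop_count` is a hypothetical name. It ignores a single `*` at the very end of a translated sequence and counts only the premature stops that remain.

```python
def internal_stop_count(protein: str) -> int:
    """Count stop codons ("*") in a translation, ignoring one terminal stop.

    A single stop at the very end is the expected terminator of a complete
    CDS, so it is dropped before counting; any remaining "*" is premature.
    """
    if protein.endswith("*"):
        protein = protein[:-1]
    return protein.count("*")


if __name__ == "__main__":
    for p in ["MKV*", "M*KV*", "MKV", "M*KV"]:
        print(p, internal_stop_count(p))
```

When frames are ranked by this count, a Frame 1 translation with only a final stop scores 0. It then ties with a stop-free reverse frame rather than losing to it.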

@edgardomortiz
Owner

Hi again,

This fix will come with the next release (v1.0.1). For now, just decompress this attachment and replace your current bioformats.py (in the captus folder inside your Captus installation folder) with this version, which improves the reading-frame prediction. In my tests, locus 6164 is now correctly translated in the reference.
bioformats.py.zip

@EdBiffin
Author

Hi Edgardo, that all makes sense. Thanks again for your help; I look forward to the next release.

@edgardomortiz
Owner

Dear Ed,

In case you didn't patch the previous version: I have published the release on Bioconda, which incorporates many other changes...
Let me know if v1.0.1 works better in this respect.

Edgardo

@EdBiffin
Author

EdBiffin commented Mar 4, 2024 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants