seqkit amplicon only keeps one amplicon #191

Open
mlosilla opened this issue Feb 22, 2021 · 5 comments

@mlosilla

Hi,

In version 0.15.0 seqkit amplicon only keeps one amplicon (the largest) per primer pair. I don't always care about the largest amplicon.

Would it be possible to implement the feature of keeping all valid amplicons? A bed file with the start and end bases (columns 2 and 3) of all valid pairs would be fantastic. That would permit me to inspect the amplicons and keep the one I want (in my case, the smallest amplicon over a certain threshold).

Thanks

@shenwei356
Owner

Right, PCR could produce amplicons from all combinations of the forward and reverse primer sites. We should output them too.

@mlosilla
Author

yeah exactly.

In my case, the positions in the bed file would be enough. I would choose the amplicon I want and use the positions as trimming points for my fastq reads, but the sequences of the PCR products may be useful for other users.

Thank you for considering it!

@shenwei356
Owner

> In my case, the positions in the bed file would be enough.

Oh, you can use seqkit locate first, which outputs BED format.
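
A minimal sketch of that workaround, in case it helps: locate each primer separately, then pair the hits yourself. It assumes `seqkit locate --bed` writes BED6 with the strand in column 6, and that the hits of interest are forward-primer hits on the + strand and reverse-primer hits on the - strand. `pair_amplicons` is a hypothetical helper name, not part of seqkit, and the file names are placeholders. The amplicon length is written to column 4 for convenience.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of seqkit): pair every + strand forward-primer
# hit with every downstream - strand reverse-primer hit on the same sequence,
# and print one line per candidate amplicon: chrom, start, end, length.
# Usage: pair_amplicons FWD.bed REV.bed [MIN_LEN] [MAX_LEN]   (MAX_LEN 0 = no limit)
pair_amplicons() {
  local fwd=$1 rev=$2 min_len=${3:-0} max_len=${4:-0}
  awk -v min_len="$min_len" -v max_len="$max_len" '
    BEGIN { FS = OFS = "\t" }
    # First file: remember the start of each + strand forward-primer hit.
    FNR == NR {
      if ($6 == "+") { n++; chrom[n] = $1; start[n] = $2 }
      next
    }
    # Second file: each - strand reverse-primer hit can close an amplicon.
    $6 == "-" {
      for (i = 1; i <= n; i++) {
        len = $3 - start[i]
        if (chrom[i] == $1 && $2 >= start[i] && len >= min_len &&
            (max_len == 0 || len <= max_len))
          print $1, start[i], $3, len
      }
    }
  ' "$fwd" "$rev" | sort -t $'\t' -k1,1 -k4,4n -k2,2n
}

# Example, assuming the --bed output described above:
#   seqkit locate --bed -p GGTAGG genome.fa > fwd.bed
#   seqkit locate --bed -p ATCAG  genome.fa > rev.bed
#   pair_amplicons fwd.bed rev.bed 300 500
```

Because the output is sorted by length within each sequence, the first line per sequence is the smallest amplicon at or above MIN_LEN, and a MAX_LEN near the expected product size drops very long pairings.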

@zjhzxjm

zjhzxjm commented Nov 3, 2022

Looking forward to this feature being released soon.

@MostafaYA

Hi, I have the same issue here. Outputting only the largest amplicon was misleading in my case, as I was looking for an amplicon within repetitive RNA genes of a eukaryotic genome. Instead of predicting the correct PCR amplicon of around 400 bp, seqkit amplicon produced an amplicon of ~50 kb, which does not make any sense in my case.

As a solution, I think outputting all valid amplicons is the best option. Alternatively, you could add an option such as expected_amplicon_size to give amplicons of that length priority or to set an upper limit for the amplicon prediction.

Here is a toy example:

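# Single copy: one forward and one reverse primer site
echo -ne ">seq\nAGTACCTTGGTAGGAGTTTCCTGCTAATGATAAGAATGATATTGGACTAAGTAATGTTGCAAATATAGAAACTGAT\n" | seqkit amplicon -F GGTAGG -R ATCAG
# Ten tandem copies of the same sequence under a single FASTA header
echo "AGTACCTTGGTAGGAGTTTCCTGCTAATGATAAGAATGATATTGGACTAAGTAATGTTGCAAATATAGAAACTGAT" > seq
echo -ne ">seq\n" > multi_seq.fa
for i in {1..10}; do cat seq >> multi_seq.fa; done
# Many primer pairings are possible here, but only one amplicon is reported
cat multi_seq.fa | seqkit amplicon -F GGTAGG -R ATCAG | seqkit stats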
