
Subseq fails to extract reads using a query name list #169

Open
yangyxt opened this issue Mar 2, 2021 · 1 comment

Comments


yangyxt commented Mar 2, 2021

I am using seqtk/1.3. When I run subseq <in.fq> <name.lst> to extract reads from a FASTQ file, the extraction fails.

in.fq is a simulated file, so the query names carry ascending ID numbers:
[Screenshot: query names in in.fq]

I used awk to confirm that the query names listed in name.lst exist in the FASTQ file. I then tried extracting the first few reads, using a file that stored the names sim_sample_1_chr7-chr7-1, sim_sample_1_chr7-chr7-3 and sim_sample_1_chr7-chr7-5 (one name per line), and that worked. However, when I chose a query name that appears far down in the FASTQ file, seqtk subseq failed to extract it.
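To double-check the awk result independently of seqtk, a plain-Python sketch of the same name-based extraction might look like the following. It assumes strict four-line FASTQ records (no wrapped sequence lines) and matches on the first whitespace-delimited token of the header, which is my understanding of how seqtk compares names. The functions read_names, iter_fastq and subseq_by_name are hypothetical helpers, not part of seqtk.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_names(lines: Iterable[str]) -> Set[str]:
    """Collect the first token of each non-blank line of a name list."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, seq, plus, qual), assuming strict four-line records.

    No validation is done here: a malformed record is passed through as-is.
    """
    it = iter(lines)
    while True:
        header = next(it, "")
        if not header:
            return  # end of input
        seq, plus, qual = [next(it, "").rstrip("\r\n") for _ in range(3)]
        yield header.rstrip("\r\n"), seq, plus, qual


def subseq_by_name(lines: Iterable[str], names: Set[str]) -> Iterator[str]:
    """Yield each FASTQ record whose name (first token after '@') is in names."""
    for header, seq, plus, qual in iter_fastq(lines):
        fields = header[1:].split()
        name = fields[0] if fields else ""
        if name in names:
            yield f"{header}\n{seq}\n{plus}\n{qual}\n"
```

Because open file objects iterate over lines, something like `sys.stdout.writelines(subseq_by_name(open("in.fq"), read_names(open("name.lst"))))` (after `import sys`) should print the far-down reads as well, provided the four-line assumption holds. Since this sketch does not validate records, it keeps going past a malformed one, which is a deliberate difference from a validating parser.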

In my tests, fetching reads up to the query name sim_sample_1_chr7-chr7-343063 works well. No query name that comes after it can be extracted.

Here is an example. First, a screenshot of a test name.lst:
[Screenshot: test name.lst]
(I have confirmed with awk that every query name in this list exists in in.fq.)

Then, a screenshot of the sequences extracted with the command seqtk subseq in.fq name.lst | less -S -
[Screenshot: output of seqtk subseq]

I am confused about why this happens. Does seqtk read only part of the FASTQ file into memory? Please take a look at this issue when convenient. Much appreciated.

Author

yangyxt commented Mar 3, 2021

I just used seqtk seq to view the same FASTQ file and found that its output ends at the query name sim_sample_1_chr7-chr7-343063. Why is there a limit on how much of the data seqtk reads? The manual does not describe this limit or any option to remove it.
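One thing worth checking (an assumption; nothing in this thread confirms the cause) is whether the record right after sim_sample_1_chr7-chr7-343063 is malformed, for example a quality string whose length differs from its sequence, or a record cut short. As far as I know, kseq-based readers like the one seqtk uses stop at such a record instead of skipping it, which would make the output end early without an obvious error. The hypothetical checker below (find_first_bad_record is not a seqtk function) assumes plain four-line FASTQ records without line wrapping and reports the first record that breaks that assumption.

```python
from typing import Iterable, Optional, Tuple


def find_first_bad_record(lines: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (0-based record index, name of last good record, reason) for the
    first malformed record, or None if every record passes.

    Assumes strict four-line FASTQ: header, sequence, '+' line, quality.
    """
    it = iter(lines)
    last_good = ""
    index = 0
    while True:
        header = next(it, None)
        if header is None:
            return None  # clean end of file
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:  # fewer than four lines left
            return index, last_good, "truncated record at end of file"
        header, seq, plus, qual = (s.rstrip("\r\n") for s in (header, seq, plus, qual))
        if not header.startswith("@"):
            return index, last_good, "header does not start with '@'"
        if not plus.startswith("+"):
            return index, last_good, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return index, last_good, f"sequence length {len(seq)} != quality length {len(qual)}"
        fields = header[1:].split()
        last_good = fields[0] if fields else ""
        index += 1
```

Passing the open in.fq file object to this function should either return None or point at the record immediately after the last name that seqtk printed, which would help tell a data problem apart from a limit inside seqtk.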
