Tip: some jq code to get list of "good" contigs #121

Open
kdm9 opened this issue Jun 29, 2022 · 0 comments


kdm9 commented Jun 29, 2022

Hello,

This is mostly a PSA, as the following took me way too long to work out myself. Perhaps the authors could add it to the docs somewhere appropriate.

To filter a set of contigs based on the GC content and coverage (a la the blobplot), one can use the following jq command:

jq -r '.dict_of_blobs[] | select((.covs.bam0 > 10) and (.gc > 0.4)) | .name' \
    < path/to/something.blobDB.json \
    > goodcontigs.txt

Here, I use a coverage threshold of 10 in the first BAM and a minimum GC of 0.4. Obviously, adjust these thresholds to suit your blobplot. Additional BAMs can be handled by adding another condition, e.g. (.covs.bam1 > 23), joined with "and" inside the select() call. The resulting goodcontigs.txt is a plain-text list of contig names compatible with blobtools seqfilter.
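
For anyone without jq to hand, the same filter can be sketched in Python. This is only a rough equivalent, not anything from blobtools itself: the function name good_contigs and the toy example data are made up for illustration, and it assumes only the fields the jq filter above already relies on (dict_of_blobs, covs, gc, name).

```python
import json


def good_contigs(blobdb, min_cov, min_gc):
    """Return names of contigs that pass the coverage and GC thresholds.

    blobdb  : parsed blobDB JSON (must contain "dict_of_blobs")
    min_cov : dict of coverage key -> minimum coverage, e.g. {"bam0": 10}
    min_gc  : minimum GC fraction, e.g. 0.4
    """
    blobs = blobdb["dict_of_blobs"]
    # jq's `.dict_of_blobs[]` iterates over the values of an object or the
    # items of an array; accept either shape here.
    records = blobs.values() if isinstance(blobs, dict) else blobs
    names = []
    for blob in records:
        covs = blob["covs"]
        passes_cov = all(covs[key] > threshold for key, threshold in min_cov.items())
        if passes_cov and blob["gc"] > min_gc:
            names.append(blob["name"])
    return names


# Made-up toy data containing only the fields used by the filter.
example = {
    "dict_of_blobs": {
        "contig_1": {"name": "contig_1", "gc": 0.45, "covs": {"bam0": 30.0, "bam1": 40.0}},
        "contig_2": {"name": "contig_2", "gc": 0.35, "covs": {"bam0": 50.0, "bam1": 60.0}},
        "contig_3": {"name": "contig_3", "gc": 0.50, "covs": {"bam0": 5.0, "bam1": 80.0}},
        "contig_4": {"name": "contig_4", "gc": 0.60, "covs": {"bam0": 20.0, "bam1": 10.0}},
    }
}

# One BAM, same thresholds as the jq command.
print("\n".join(good_contigs(example, {"bam0": 10}, 0.4)))

# With a real file, something along these lines should work:
# with open("path/to/something.blobDB.json") as fh:
#     names = good_contigs(json.load(fh), {"bam0": 10, "bam1": 23}, 0.4)
```

Passing the thresholds as a dict means a second BAM is just another entry, e.g. {"bam0": 10, "bam1": 23}, rather than another hand-written clause.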

Thanks for a great tool,
K
