document how to generate sample list file #5

Open
kdaily opened this issue Aug 22, 2018 · 3 comments
@kdaily
Member

kdaily commented Aug 22, 2018

No description provided.

@kdaily kdaily self-assigned this Sep 12, 2018
@kdaily kdaily added this to the Sprint 1 milestone Sep 12, 2018
@bintriz
Collaborator

bintriz commented Sep 14, 2018

#!/bin/bash

# List all fastq files, drop the header row and the first three output columns,
# then reorder the rest into: group, sample_id, file, synapse_id, assay, processingKit.
# sample_id joins sample_id_biorepository, sample_id_original, experiment_id and grant with "-".
synapse query "select name, id, sample_id_biorepository, sample_id_original, experiment_id, grant, group, assay, processingKit from syn7871084 where fileFormat='fastq'" \
    |tail -n+2 \
    |cut -f4- \
    |awk -F"\t" '{print $7"\t"$3"-"$4"-"$5"-"$6"\t"$1"\t"$2"\t"$8"\t"$9}' \
    |sort > tmp.fastq.txt

# Header row shared by every sample list.
printf "group\tsample_id\tfile\tsynapse_id\tassay\tprocessingKit\n" > tmp.header.txt

# Split the fastq list into one file per data type.
# Lines containing -535- or -797- go to the shallow WGS list and are excluded from regular WGS.
{ cat tmp.header.txt; grep 10X tmp.fastq.txt; } > Samples.10X_WGS_fastq.txt
{ cat tmp.header.txt; grep wholeGenomeSeq tmp.fastq.txt |grep -v -e 10X -e '-535-' -e '-797-'; } > Samples.regular_WGS_fastq.txt
{ cat tmp.header.txt; grep -e '-535-' -e '-797-' tmp.fastq.txt; } > Samples.shallow_WGS_fastq.txt
{ cat tmp.header.txt; grep exomeSeq tmp.fastq.txt; } > Samples.WES_fastq.txt
{ cat tmp.header.txt; grep targetedSeq tmp.fastq.txt; } > Samples.Targeted_fastq.txt

rm tmp.fastq.txt

# Same listing for the Vaccarino group's bam files, excluding 10X lines.
{ cat tmp.header.txt
synapse query "select name, id, sample_id_biorepository, sample_id_original, experiment_id, grant, group, assay, processingKit from syn7871084 where group='Vaccarino' and fileFormat='bam'" \
    |tail -n+2 \
    |cut -f4- \
    |awk -F"\t" '{print $7"\t"$3"-"$4"-"$5"-"$6"\t"$1"\t"$2"\t"$8"\t"$9}' \
    |sort \
    |grep -v 10X
} > Samples.regular_WGS_bam.txt

rm tmp.header.txt

This shell script is what I used to generate the sample lists for the BSMN ref brain data. Of the columns, my pipeline uses only sample_id, file, and synapse_id. The order of the columns doesn't matter. Of course, this is pretty specific to the BSMN ref brain samples.
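
Since only sample_id, file, and synapse_id are used and column order doesn't matter, a consumer of these lists would presumably look columns up by header name rather than by position. The following is a minimal sketch of that idea, not the pipeline's actual code; the function name `select_columns` is hypothetical, and it assumes a tab-separated file with a header row like the ones written by the script.

```shell
#!/bin/bash

# Hypothetical helper (not part of the pipeline): print the sample_id, file and
# synapse_id columns of a tab-separated sample list, whatever order they are in.
select_columns() {
    awk -F"\t" -v OFS="\t" '
        NR == 1 {
            # Map each header name to its column number.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("sample_id" in col) || !("file" in col) || !("synapse_id" in col)) {
                print "missing required column" > "/dev/stderr"
                exit 1
            }
        }
        # Runs for the header row too, so the output keeps a header.
        { print $(col["sample_id"]), $(col["file"]), $(col["synapse_id"]) }
    ' "$1"
}
```

For example, `select_columns Samples.WES_fastq.txt` would print the three columns for the WES list, and would give the same result if the columns in that file were reordered.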

@kdaily
Member Author

kdaily commented Jan 16, 2019

Can you put this in an executable script in this repository, and document it in the README? Then we can close.

@kdaily
Member Author

kdaily commented Jan 16, 2019

@attilagk it would be great if you can verify for @bintriz that this is sufficient.
