Demultiplexing tip for undetermined fastq files #22

Open
bfremin opened this issue Apr 22, 2019 · 3 comments

@bfremin
Contributor

bfremin commented Apr 22, 2019

We have been getting sequencing data back as one giant FASTQ file of undetermined reads (instead of BCL), with the barcode in the read name. Most tools that demultiplex from FASTQ were very slow, could not be parallelized, or failed outright. This is just a tip for a step that comes before preprocessing.

You need two files: a file that lists your barcodes, and a script.

barcodes.txt:
samplenameA GGACTCCT+AGAGGATA
samplenameB TAGGCATG+AGAGGATA
samplenameC CTCTCTAC+AGAGGATA
...all your samples

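demultiplex.sh:
#!/bin/bash
# Usage: demultiplex.sh <samplename> <barcode>
# Replace the {...} placeholders below with the paths to your undetermined FASTQ files.
module load sickle/1.33

# Demultiplex: keep each matching line plus the 3 lines after it (one 4-line FASTQ record).
# -F treats the barcode as a fixed string, so the "+" is always matched literally.
grep -F -A3 --no-group-separator -i "$2" {giant_UndeterminedFile_1.fq} | gzip > "${1}_1.fq.gz" &
grep -F -A3 --no-group-separator -i "$2" {giant_UndeterminedFile_2.fq} | gzip > "${1}_2.fq.gz" &
wait

# Remove reads that do not have a pair (trimming will fail if you do not).
sickle pe -f "${1}_1.fq.gz" -r "${1}_2.fq.gz" -t sanger -o "paired_${1}_1.fq" -p "paired_${1}_2.fq" -s "${1}_single.fq"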
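
A possible variation on the grep step above, not part of the original tip: grep matches the barcode on any line, so a sketch that only inspects the header line of each record can be written with awk. It assumes every record is exactly four lines (no wrapped sequences), and `demux_by_header` is a hypothetical helper name used only for illustration.

```shell
#!/bin/bash
# Sketch: print only the FASTQ records whose header line contains the barcode.
# Assumption: each record is exactly four lines.
# demux_by_header is a hypothetical helper, not part of any existing tool.
demux_by_header() {
    local barcode="$1" fastq="$2"
    awk -v bc="$barcode" '
        # Line 1 of each record (the header) decides whether the record is kept.
        NR % 4 == 1 { keep = (index(toupper($0), toupper(bc)) > 0) }
        # Print every line of a kept record.
        keep
    ' "$fastq"
}

# Example call, using the placeholder file name from the script above:
# demux_by_header "GGACTCCT+AGAGGATA" giant_UndeterminedFile_1.fq | gzip > samplenameA_1.fq.gz
```

`index()` does a plain substring search, so the `+` in the barcode needs no escaping, and `toupper()` keeps the match case-insensitive like `grep -i`.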

Run:
xargs -L 1 bash -c 'sbatch ..... demultiplex.sh "$0" "$1"' < barcodes.txt
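
If the xargs one-liner is hard to read, the same run step can be sketched as a plain loop. This is an illustration only: `submit_all` is a hypothetical helper, and the submit command (including any scheduler options) is passed in by the caller rather than assumed here.

```shell
#!/bin/bash
# Sketch: read "samplename barcode" pairs and run one command per sample.
# submit_all is a hypothetical helper; everything after the barcode file
# is used as the command prefix, e.g. "sbatch" plus your own options.
submit_all() {
    local barcode_file="$1"
    shift
    local sample barcode
    while read -r sample barcode; do
        # Skip blank lines.
        [ -z "$sample" ] && continue
        # stdin is redirected so the command cannot swallow the barcode list.
        "$@" demultiplex.sh "$sample" "$barcode" < /dev/null
    done < "$barcode_file"
}

# Dry run: print the commands instead of submitting them.
# submit_all barcodes.txt echo sbatch
```

Running the dry run first shows exactly which sample and barcode each job would receive before anything is submitted.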

This will save you a lot of time compared with trying existing tools.

@elimoss
Contributor

elimoss commented Apr 22, 2019

It would be extremely useful to incorporate this into the workflow in some automated fashion.

@bfremin
Contributor Author

bfremin commented Apr 23, 2019

Yeah I can try something. It is only 2 commands though.

@elimoss
Contributor

elimoss commented Apr 23, 2019

If you feel like tackling this, by all means do it and submit a pull request. It'll need the dependency taken care of with either conda or a container, and the new input will have to be integrated into the config, workflow, and docs.
