Sufficient information to run the pipeline #1

Laolga · 2024-02-05T10:56:23Z

Dear authors,
Please provide the information needed to run your pipeline:

What is the format of the samplesheet?
How can one know the barcodes before running any analysis?
What is BARCODE_START_CYCLE?
What is rc?
What is ad?

pas2182 · 2024-02-06T01:21:50Z

When you sequence with an Illumina sequencer, there is a standard file called a "sample sheet" that is required for demultiplexing; it is typically created with the Illumina Experiment Manager. It is also required for demultiplexing with Cell Ranger (cellranger mkfastq), which documents the format of this file here: https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-mkfastq.
10x Genomics uses a pre-defined set of barcode sequences (a barcode whitelist) for each kit, so the possible cell barcodes are known before any analysis. For example, for the 10x Multiome kit, you can find the details of the barcode sequences here: https://kb.10xgenomics.com/hc/en-us/articles/4412343032205-Where-can-I-find-the-barcode-whitelist-s-for-Single-Cell-Multiome-ATAC-GEX-product.
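A minimal Python sketch of loading such a whitelist, assuming the file (for example the 737K-arc-v1.txt file used in the commands later in this thread) is plain text with one barcode per line, optionally gzip-compressed. The helper names are hypothetical and not part of the dna10x pipeline.

```python
import gzip

def parse_whitelist(lines):
    """Collect non-empty, whitespace-stripped barcodes into a set."""
    return {line.strip() for line in lines if line.strip()}

def load_whitelist(path):
    """Hypothetical helper: read a 10x barcode whitelist (one barcode per line)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_whitelist(handle)

# Usage (path taken from the example commands in this thread):
# whitelist = load_whitelist("/opt/cellranger-atac-2.0.0/lib/python/atac/barcodes/737K-arc-v1.txt")
# print(len(whitelist))
```

A set is used so that checking each sequenced barcode against hundreds of thousands of entries is a constant-time lookup.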
BARCODE_START_CYCLE is the sequencing cycle, within the read that contains the cell-identifying barcode, at which the first base of that barcode is read.
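To make the definition concrete, here is a minimal Python sketch that pulls the barcode out of a read. It assumes cycles are counted from 1 at the first base of the barcode-containing read and that the barcode is 16 bases long (the length of 10x cell barcodes); `extract_barcode` is a hypothetical name, not the pipeline's code.

```python
def extract_barcode(sequence, start_cycle, length=16):
    """Return the `length` bases starting at 1-based `start_cycle`."""
    if start_cycle < 1:
        raise ValueError("cycles are counted from 1")
    start = start_cycle - 1  # cycle 1 corresponds to index 0 of the read
    end = start + length
    if len(sequence) < end:
        raise ValueError("read is too short to contain the full barcode")
    return sequence[start:end]

# With 8 bases before the barcode, BARCODE_START_CYCLE would be 9:
read = "NNNNNNNN" + "ACGTACGTACGTACGT"
print(extract_barcode(read, 9))  # ACGTACGTACGTACGT
```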
rc = reverse complement. Use this option if your cell-identifying barcode list contains the reverse complement of the barcodes as read by the sequencer.
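A minimal Python sketch of what the rc option implies, assuming barcodes are plain A/C/G/T/N strings; `reverse_complement` and `match_barcode` are hypothetical helpers, not functions from dna10x.py.

```python
# Map each base to its complement (N stays N); case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def match_barcode(observed, whitelist, rc=False):
    """Look up a sequenced barcode in the whitelist.

    With rc=True the observed barcode is reverse-complemented first,
    for whitelists stored in the opposite orientation to the reads.
    Returns the matching whitelist entry, or None.
    """
    candidate = reverse_complement(observed) if rc else observed
    return candidate if candidate in whitelist else None

whitelist = {"CGTT"}
print(match_barcode("AACG", whitelist))           # None
print(match_barcode("AACG", whitelist, rc=True))  # CGTT
```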
ad = adapter. This is the adapter sequence to be trimmed from the 3' end of reads from short fragments, where the read runs past the insert into the adapter.
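The effect of adapter trimming can be sketched in Python. This is a simplification: it only handles exact matches, whereas the conda environment later in this thread installs cutadapt, which tolerates sequencing errors and is presumably what the pipeline relies on. `trim_adapter` is a hypothetical name.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Trim an adapter from the 3' end of a read (exact matches only).

    Cuts at the first full occurrence of the adapter; otherwise removes the
    longest adapter prefix (at least min_overlap bases) that ends the read.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    index = read.find(adapter)
    if index != -1:
        return read[:index]
    # A partial adapter can only appear at the very end of the read.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read

ADAPTER = "CTGTCTCTTATACACATCT"  # value passed to -ad in the examples in this thread
print(trim_adapter("ACGTACGT" + ADAPTER + "GGG", ADAPTER))  # ACGTACGT
print(trim_adapter("ACGTACGTCTGTC", ADAPTER))               # ACGTACGT
```

The minimum overlap avoids trimming a read just because its last base or two happen to match the start of the adapter.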

mariaZig · 2024-05-26T13:36:31Z

Hello,

Thanks for the custom pipelines and for this nice protocol!

On a similar note, I'm having trouble understanding the exact input I should use to run the DNA-based pipeline.

Would it be possible to give me a specific example?

If possible, please provide an example samplesheet.csv file and a specific value for the "--directory" parameter, assuming that my FASTQ files are already available, so I don't need to run cellranger to produce them from the BCL files.

Thanks in advance,
Maria

tro2104 · 2024-05-28T18:21:43Z

Hello Maria,

Here is an example of a few runs and how to set up the software. It assumes the FASTQ files are in the directory that bcl2fastq would create, so if you don't already have that path, you need to create it and put your FASTQs in it.

Create a conda environment:
conda create -n cutadapt -c bioconda -c conda-forge cutadapt python=3.9 bwa pysam samtools numpy
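After activating the environment (conda activate cutadapt), a quick Python check can confirm that the packages from the command above are available. The module and tool lists are read off the conda create line; whether dna10x.py needs anything else is not stated in this thread, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import shutil

# Taken from the conda create command above; extend if the pipeline needs more.
REQUIRED_MODULES = ("pysam", "numpy")
REQUIRED_TOOLS = ("bwa", "samtools", "cutadapt")

def check_environment(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return the Python modules and command-line tools that cannot be found."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return {"modules": missing_modules, "tools": missing_tools}

if __name__ == "__main__":
    print(check_environment())  # both lists should be empty inside the environment
```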
Download the dna10x pipeline from GitHub.
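Create a sample sheet in the directory containing the pipeline, e.g. with vim:
vim ss.csv
(press i to enter insert mode, type the two lines below, then press Esc and type :wq to save and quit)
Lane,Sample,Index
*,PTO035,SI-NA-F1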
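The same three-column file can also be written or checked from Python with the standard csv module. This sketch assumes the simple Lane,Sample,Index layout shown above, as accepted by cellranger mkfastq (where * in the Lane column means all lanes), and assumes dna10x.py expects exactly this header; the helper names are hypothetical.

```python
import csv
import io

HEADER = ["Lane", "Sample", "Index"]

def write_samplesheet(rows, handle):
    """Write (lane, sample, index) tuples under the Lane,Sample,Index header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)

def read_samplesheet(handle):
    """Parse a sample sheet, rejecting files whose header is not Lane,Sample,Index."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)

# Round-trip the example sheet in memory (use open("ss.csv", "w", newline="") for a file).
buffer = io.StringIO()
write_samplesheet([("*", "PTO035", "SI-NA-F1")], buffer)
print(buffer.getvalue(), end="")
buffer.seek(0)
print(read_samplesheet(buffer))
```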

Download a reference genome, or use one of the 10x Cell Ranger references.

Make directories
Within the dna10x directory containing all the associated .py files, create the following path (-p creates the intermediate directories):
mkdir -p PTO035/outs/fastq_path/PTO035/PTO035/
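For scripting several runs, the same layout can be created from Python with pathlib. The pattern mirrors the single example above; because PTO035 is used for every component there, it is an assumption that the first two components are the -d value and the last is the sample name. `fastq_dir` and `make_fastq_dir` are hypothetical helpers.

```python
from pathlib import Path

def fastq_dir(directory, sample, root="."):
    """Build <root>/<directory>/outs/fastq_path/<directory>/<sample>.

    Hypothetical mapping: the example uses PTO035 for every component,
    so which component is the -d value and which is the sample is assumed.
    """
    return Path(root) / directory / "outs" / "fastq_path" / directory / sample

def make_fastq_dir(directory, sample, root="."):
    """Create the directory and any missing parents, like mkdir -p."""
    path = fastq_dir(directory, sample, root)
    path.mkdir(parents=True, exist_ok=True)
    return path

print(fastq_dir("PTO035", "PTO035"))  # on POSIX: PTO035/outs/fastq_path/PTO035/PTO035
```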

Run pipeline
<With existing FASTQs, assumed to be located in dna10x/PTO035/outs/fastq_path/PTO035/PTO035/>
nohup python dna10x.py --samplesheet ss.csv -d PTO035 -b /opt/cellranger-atac-2.0.0/lib/python/atac/barcodes/737K-arc-v1.txt -t 16 -r /opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa -i 1000 -m 0.9 -c -sf -p 1 -rc -ad CTGTCTCTTATACACATCT &

<Starting from BCL files, converted to FASTQ by the pipeline; you may need to install bcl2fastq>
nohup python dna10x.py --bcl ~/230407_NB551203_0654_AH5LM2BGXT --samplesheet ss.csv -d PTO035 -b /opt/cellranger-atac-2.0.0/lib/python/atac/barcodes/737K-arc-v1.txt -t 16 -r /opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa -i 1000 -m 0.9 -p 1 -rc -ad CTGTCTCTTATACACATCT -c &

Tim
