FASTP trimming Smart3-seq data removes all reads (UMI discard read2) #1200

Open
mschubert opened this issue Jan 23, 2024 · 0 comments

Description of the bug

I'm trying to process bulk Smart3-seq data using the pipeline (related Slack discussion here). In my case, the FASTQ read structure is the following:

R1: 6N UMI - GGG - transcript [- polyA - adaptors]
R2: 6N UMI - T

I specify the parameters of UMITools to construct the UMI from R1+R2 and then discard R2 (see below).
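
To make the read structure concrete, here is a minimal Python sketch of how the named groups in the two patterns (taken from the command further down) split a read. It only mimics the documented meaning of the umi_* and discard_* groups using the standard re module; it is not UMI-tools code. The example sequences are made up, the helper name split_read is hypothetical, and joining the two UMIs by plain concatenation is an assumption.

```python
import re

# Patterns copied from the command in this report.
BC_PATTERN_R1 = r"(?P<umi_1>.{6})(?P<discard_1>GGG).*"
BC_PATTERN_R2 = r"(?P<umi_2>.{6})(?P<discard_2>T).*"


def split_read(pattern, seq):
    """Return (umi, remaining_sequence), or None if the read does not match.

    Hypothetical helper: umi_* groups are collected into the UMI,
    discard_* groups are dropped, and everything else is kept.
    """
    match = re.match(pattern, seq)
    if match is None:
        return None
    umi_parts = []
    spans_to_drop = []
    for name in sorted(match.groupdict()):
        if name.startswith("umi_"):
            umi_parts.append(match.group(name))
            spans_to_drop.append(match.span(name))
        elif name.startswith("discard_"):
            spans_to_drop.append(match.span(name))
    remaining = seq
    # Cut from the end first so earlier offsets stay valid.
    for start, end in sorted(spans_to_drop, reverse=True):
        remaining = remaining[:start] + remaining[end:]
    return "".join(umi_parts), remaining


# Made-up reads following the structure described above.
r1 = "ACGTAC" + "GGG" + "TTGACCATGCAAGCTTGGCACTGGCC"
r2 = "TGCATG" + "T"  # assumption: R2 holds nothing beyond UMI + T

umi1, rest1 = split_read(BC_PATTERN_R1, r1)
umi2, rest2 = split_read(BC_PATTERN_R2, r2)
print("combined UMI:", umi1 + umi2)
print("R1 length after extraction:", len(rest1))
print("R2 length after extraction:", len(rest2))
```

If R2 really carries little beyond the UMI and the T (an assumption; real reads may be longer), almost nothing is left of it after extraction, which would be consistent with a paired-end trimmer rejecting every pair as too short.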

However, the trimmer (FASTP) afterwards reports that all reads are low quality or too short.

-[nf-core/rnaseq] Pipeline completed successfully with skipped sampl(es)-
-[nf-core/rnaseq] Please check MultiQC report: 18/18 samples skipped since they failed 10000 trimmed read threshold.-

I believe this is because FASTP is still called with both R1 and R2 rather than with R1 only (see the full log file below), which produces empty .fastp.fastq.gz files:

# all reads removed
fastp --in1 FF230228_13_1.fastq.gz --in2 FF230228_13_2.fastq.gz \
    --out1 FF230228_13_1.fastp.fastq.gz --out2 FF230228_13_2.fastp.fastq.gz

Supporting this, if I manually run FASTP on R1 only, most reads are retained:

# retains most reads
fastp --in1 FF230228_13_1.fastq.gz --out1 FF230228_13_1.fastp.fastq.gz
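
A quick way to check the two outcomes above is to count the records in each trimmed file. The following is a small Python sketch, assuming standard four-line FASTQ records; the function name count_fastq_reads is hypothetical and not part of the pipeline.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzip-compressed FASTQ file.

    Assumes every record spans exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

Calling count_fastq_reads("FF230228_13_1.fastp.fastq.gz") after each of the two fastp runs should give zero for the paired run and a non-zero count for the R1-only run, if the behaviour described here holds.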

A similar issue (#750) was fixed by exposing the --umi_discard_read parameter, but I guess the FASTP trimming step was not covered by that fix.

Workaround: Using TrimGalore (the default) instead of FASTP processes the samples correctly and outputs only one FASTQ per sample after trimming.

Command used and terminal output

nextflow run nf-core/rnaseq -r 3.11.0 \
    --input samples.csv \
    --with_umi \
    --umitools_extract_method regex \
    --umitools_bc_pattern '(?P<umi_1>.{6})(?P<discard_1>GGG).*' \
    --umitools_bc_pattern2 '(?P<umi_2>.{6})(?P<discard_2>T).*' \
    --umi_discard_read 2 \
    --umitools_dedup_stats true \
    --trimmer fastp # defaulting to trimgalore works as expected

Relevant files

FF230228_13.fastp.log
nextflow.log

System information

Nextflow = 22.10.1
Ubuntu Linux = 20.04.6 LTS
nf-core/rnaseq = 3.11.0
local executor

@mschubert mschubert added the bug Something isn't working label Jan 23, 2024
@drpatelh drpatelh added this to the 3.15.0 milestone May 13, 2024