pipeline terminating early #264

Open
nick-youngblut opened this issue Mar 13, 2024 · 2 comments
Labels
bug Something isn't working

Comments


nick-youngblut commented Mar 13, 2024

Description of the bug

I only want to use the pipeline to QC my Nanopore data, but it terminates prematurely after the first step (SAMPLESHEET_CHECK), even though it reports successful completion:

[70/93224f] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SampleSheet.csv) [100%] 1 of 1 ✔
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC                  -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_RENAME                                      -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS                     -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC
-[nf-core/nanoseq] Pipeline completed successfully-

Command used and terminal output

nextflow run main.nf \
  --input SampleSheet.csv \
  --outdir path/to/output/ \
  --protocol cDNA \
  --skip_demultiplexing \
  --skip_vc \
  --skip_sv \
  --skip_alignment \
  --skip_differential_analysis \
  --skip_quantification \
  --skip_modification_analysis \
  --skip_fusion_analysis \
  -profile docker

Relevant files

My SampleSheet.csv file:

group,replicate,barcode,input_file,fasta,gtf
sample1,1,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample1,2,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,
sample2,1,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample2,2,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,

System information

  • Server: Ubuntu 22.04.3 LTS
  • Docker: 24.0.6
  • Nextflow: 23.10.1.5891
@nick-youngblut nick-youngblut added the bug Something isn't working label Mar 13, 2024

nick-youngblut commented Mar 13, 2024

It appears that the issue is due to --skip_demultiplexing. A simple reprex:

nextflow run main.nf   --outdir /home/nickyoungblut/projects/SspArc0008_10x_cDNA_longRead/data/SspArc0008_10x_cDNA_longRead/nanoseq_TEST/   --protocol cDNA   --skip_demultiplexing   -profile docker,test

[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv)      [100%] 1 of 1 ✔
executor >  local (1)
[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv)      [100%] 1 of 1 ✔
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT                             -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC                               -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GET_CHROM_SIZES                               -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GTF2BED                                       -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_INDEX                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_ALIGN                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_VIEW_BAM                    -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_SORT                        -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_INDEX                       -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS    -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS                                  -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC                                                      -
-[nf-core/nanoseq] Pipeline completed successfully-

The QC steps (e.g., NanoPlot) appear to be tied directly to the demultiplexing section of the pipeline, instead of being applied to all demultiplexed files, whether those are provided by the user or produced by the pipeline:

    if (!params.skip_demultiplexing) {

        /*
         * MODULE: Demultipexing using qcat
         */
        QCAT ( ch_input_path )
        ch_fastq = Channel.empty()
        QCAT.out.fastq
            .flatten()
            .map { it -> [ it, it.baseName.substring(0,it.baseName.lastIndexOf('.'))] }
            .join(ch_sample, by: 1) // join on barcode
            .map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] }
            .set { ch_fastq }
        ch_software_versions = ch_software_versions.mix(QCAT.out.versions.ifEmpty(null))
    } else {
        if (!params.skip_alignment) {
            ch_sample
                .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
                .set { ch_fastq }
        } else {
            ch_fastq = Channel.empty()
        }
    }

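If params.skip_demultiplexing and params.skip_alignment are both set, then ch_fastq = Channel.empty(). If only params.skip_demultiplexing is set, the map closure has no else branch, so rows where it[6].toString().endsWith('.gz') is false yield no FASTQ tuple. Either way, there are no FASTQ files to process further in the pipeline.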
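
To make the branching easier to follow, here is a rough plain-Python model of the if/else quoted above (not Nextflow code; the function name `fastq_source` and the returned labels are mine, purely for illustration). It only tracks which branch populates ch_fastq for each flag combination.

```python
def fastq_source(skip_demultiplexing, skip_alignment):
    """Return a label for the branch of the quoted if/else that sets ch_fastq."""
    if not skip_demultiplexing:
        # QCAT.out.fastq joined with ch_sample on barcode
        return "qcat"
    if not skip_alignment:
        # ch_sample rows, kept only when it[6] ends with '.gz'
        return "samplesheet"
    # ch_fastq = Channel.empty()
    return "empty"


# Print the full truth table for the two flags.
for skip_demux in (False, True):
    for skip_align in (False, True):
        print(skip_demux, skip_align, fastq_source(skip_demux, skip_align))
```

In this model, a QC-only run that skips both demultiplexing and alignment always lands in the "empty" branch, which matches what I see.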

It would greatly help if the columns behind each index value were documented (or named) in:

.map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] } 

and:

.map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
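
As a sketch of what I mean by named columns, here is a plain-Python stand-in for the second closure (not Nextflow code). The names `col0` to `col5` are deliberate placeholders, since I do not know what those positions hold; `input_file` for position 6 is my assumption, based only on it being tested for '.gz'.

```python
from typing import NamedTuple, Optional


class SampleRow(NamedTuple):
    # col0..col5 are placeholders: their real meaning is not documented in this thread.
    col0: str
    col1: str
    col2: str
    col3: str
    col4: str
    col5: str
    input_file: str  # position 6; assumed to be the input path (it is tested for '.gz')


def by_index(it) -> Optional[list]:
    # Direct transcription of the quoted closure, using positional indices.
    if str(it[6]).endswith(".gz"):
        return [it[0], it[6], it[2], it[1], it[4], it[5]]
    return None


def by_name(row: SampleRow) -> Optional[list]:
    # Same selection and ordering, but every position is named.
    if row.input_file.endswith(".gz"):
        return [row.col0, row.input_file, row.col2, row.col1, row.col4, row.col5]
    return None


row = SampleRow("a", "b", "c", "d", "e", "f", "reads.fastq.gz")
print(by_index(row) == by_name(row))
```

With real column names in place of the placeholders, the reordering would be readable without cross-referencing the samplesheet parser.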

@nick-youngblut

Changing ch_fastq = Channel.empty() to ch_sample.map { it -> [ it[0], it[6] ] }.set { ch_fastq } allows NANOPLOT and FASTQC to run to completion.

However, the MultiQC report is still not generated, which seems to be due to an unmet input dependency at:

        MULTIQC (
            ch_multiqc_config,
            ch_multiqc_custom_config.collect().ifEmpty([]),
            ch_fastqc_multiqc.ifEmpty([]),
            ch_samtools_multiqc.collect().ifEmpty([]),
            ch_featurecounts_gene_multiqc.ifEmpty([]),
            ch_featurecounts_transcript_multiqc.ifEmpty([]),
            CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
            ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
        )

With my edits (above), ch_fastqc_multiqc is not empty, so I would expect MULTIQC to run.

The following edit works:

        MULTIQC (
            ch_multiqc_config,
            ch_multiqc_custom_config.collect().ifEmpty([]),
            ch_fastqc_multiqc.collect().ifEmpty([])//,
            //ch_samtools_multiqc.collect().ifEmpty([]),
            //ch_featurecounts_gene_multiqc.ifEmpty([]),
            //ch_featurecounts_transcript_multiqc.ifEmpty([]),
            //CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
            //ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
        )

Note: I updated the inputs of process MULTIQC to match the reduced argument list.

Also note: I had to add collect() to ch_fastqc_multiqc.
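
My understanding of why collect() matters, as a rough plain-Python model (not Nextflow code; the helper functions and file names are hypothetical stand-ins): collect() gathers everything a channel emits into a single list, whereas without it each FastQC report is a separate emission. I have not confirmed that this is the whole reason MULTIQC did not run in the unmodified pipeline.

```python
def collect(emissions):
    """Model of collect(): one emission holding every item, or nothing if empty."""
    return [list(emissions)] if emissions else []


def if_empty(emissions, default):
    """Model of ifEmpty(default): emit the default only when nothing was emitted."""
    return list(emissions) if emissions else [default]


# Hypothetical FastQC outputs, one per sample.
fastqc_reports = ["a_fastqc.zip", "b_fastqc.zip", "c_fastqc.zip"]

per_item = if_empty(fastqc_reports, [])           # ch_fastqc_multiqc.ifEmpty([])
gathered = if_empty(collect(fastqc_reports), [])  # ch_fastqc_multiqc.collect().ifEmpty([])
no_input = if_empty(collect([]), [])              # same call on an empty channel

print(len(per_item))  # 3 separate emissions
print(len(gathered))  # 1 emission containing all three reports
print(no_input)       # [[]]: a single empty list, so the input is still satisfied
```

A report that aggregates all samples wants the single gathered emission, which is consistent with the collect().ifEmpty([]) pattern already used for ch_samtools_multiqc.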
