Pipeline unable to recognise samples processed across multiple lanes #351

jma1991 · 2023-10-22T13:16:55Z

Description of the bug

I've identified a potential issue in the recent pipeline release (v2.5.0). It seems the groupTuple operator is called twice while the input channel of FASTQ files is created and branched. As a result, the pipeline cannot recognise samples sequenced across multiple lanes, because of an additional layer of list nesting. See here:

methylseq/workflows/methylseq.nf, lines 98 to 105 (commit 66c6138):

    .groupTuple()
    .map {
        meta, fastq ->
            def meta_clone = meta.clone()
            meta_clone.id = meta_clone.id.split('_')[0..-2].join('_')
            [ meta_clone, fastq ]
    }
    .groupTuple(by: [0])
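The extra nesting can be reproduced outside Nextflow. The following is a rough Python model, not Nextflow itself: `group_tuple` is a hypothetical stand-in for `groupTuple()` on two-element tuples, the `meta` map is reduced to its `id` string, and the file names are placeholders. It compares the channel contents after the two groupTuple calls in the snippet with a single grouping done after the ID is shortened. Whether later steps in the pipeline (e.g. flattening) undo the extra layer is not modelled.

```python
def group_tuple(items):
    """Rough stand-in for Nextflow's groupTuple() on (key, value) pairs:
    group by element 0, collect element 1 into a list, keep first-seen order."""
    groups = {}
    for key, value in items:
        groups.setdefault(key, []).append(value)
    return list(groups.items())


def strip_last_token(sample_id):
    # Python analogue of meta.id.split('_')[0..-2].join('_') from the snippet.
    return "_".join(sample_id.split("_")[:-1])


def depth(x):
    # Nesting depth of lists; a file name (string) has depth 0.
    if not isinstance(x, list):
        return 0
    return 1 + max((depth(item) for item in x), default=0)


# Two lanes of one sample; meta is reduced to its id, file names are placeholders.
rows = [
    ("sample1_REP1", ["fq1.gz", "fq2.gz"]),
    ("sample1_REP2", ["fq11.gz", "fq12.gz"]),
]

# As in the snippet: groupTuple() -> shorten id -> groupTuple(by: [0]).
twice = group_tuple((strip_last_token(k), v) for k, v in group_tuple(rows))

# For comparison: shorten id first, then group once.
once = group_tuple((strip_last_token(k), v) for k, v in rows)

for label, channel in (("twice", twice), ("once", once)):
    sample, fastqs = channel[0]
    print(label, sample, "depth:", depth(fastqs))
```

With two lanes, the single-grouping version holds one `[fastq_1, fastq_2]` pair per lane, while the snippet's version wraps each pair in an extra list; in this model that is the additional layer of nesting described in the report.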

Command used and terminal output

No response

Relevant files

No response

System information

No response

mz448 · 2023-11-14T16:05:01Z

I believe this is an issue with the samplesheet.csv contents.

Discussion reference

See the Slack conversation between @FelixKrueger and @bioinfoMMS.

How to "solve" it:

I was able to run the pipeline with Bismark by adding an underscore and a suffix to each sample name (column 1) in samplesheet.csv, e.g. sample1_rep1 instead of sample1.
Also make sure the header has 4 columns instead of 3, the last one being genome. This isn't very clear because the current documentation at https://nf-co.re/methylseq does not mention it, but Felix points it out in the conversation.

e.g.:

# use this:
sample,fastq_1,fastq_2,genome
sample1_rep1,bla/sample1_R1.fastq.gz,bla/sample1_R2.fastq.gz,
sample2_rep1,bla/sample2_R1.fastq.gz,bla/sample2_R2.fastq.gz,

# instead of:
sample,fastq_1,fastq_2
sample1,bla/sample1_R1.fastq.gz,bla/sample1_R2.fastq.gz
sample2,bla/sample2_R1.fastq.gz,bla/sample2_R2.fastq.gz

To add multiple lanes of the same sample, repeat the sample name on one row per lane; the rows are merged during processing.
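The workaround can be checked before launching the pipeline. Below is a minimal Python sketch, not part of methylseq: it reads a samplesheet with the 4-column header `sample,fastq_1,fastq_2,genome`, rejects sample names that would be shortened to an empty ID, and groups rows by the ID left after dropping the last `_`-separated token, mirroring the `split('_')[0..-2]` step in the snippet from the bug report. The function names, file paths and `_rep1`/`_rep2` suffixes are hypothetical, and the merge rule is inferred from that snippet rather than from the pipeline's documentation.

```python
import csv
import io

# Hypothetical samplesheet: 4-column header, "_<suffix>" on every sample name,
# and two lanes of sample1 that differ only in the last token.
SAMPLESHEET = """sample,fastq_1,fastq_2,genome
sample1_rep1,bla/sample1_L001_R1.fastq.gz,bla/sample1_L001_R2.fastq.gz,
sample1_rep2,bla/sample1_L002_R1.fastq.gz,bla/sample1_L002_R2.fastq.gz,
sample2_rep1,bla/sample2_R1.fastq.gz,bla/sample2_R2.fastq.gz,
"""

EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2", "genome"]


def strip_last_token(sample_id):
    # Python analogue of id.split('_')[0..-2].join('_'); "sample1" -> "".
    return "_".join(sample_id.split("_")[:-1])


def group_lanes(text):
    """Return {shortened_id: [(fastq_1, fastq_2), ...]} in first-seen order."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    merged = {}
    for row in reader:
        key = strip_last_token(row["sample"])
        if not key:
            raise ValueError(
                f"sample {row['sample']!r} needs a '_<suffix>' (e.g. sample1_rep1)"
            )
        merged.setdefault(key, []).append((row["fastq_1"], row["fastq_2"]))
    return merged


if __name__ == "__main__":
    for sample, pairs in group_lanes(SAMPLESHEET).items():
        print(sample, len(pairs), "lane(s)")
```

Running it prints one line per merged sample with its lane count; a sheet using the 3-column header or a bare name such as sample1 raises `ValueError` instead of silently producing an empty ID.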

wkang0 · 2024-01-04T20:43:22Z

This is a bug in the instructions rather than in the code. To combine one sample sequenced across different lanes, the samplesheet should look like this:

sample1_REP1,fq1.gz,fq2.gz
sample1_REP2,fq11.gz,fq12.gz

jma1991 added the bug (Something isn't working) label on Oct 22, 2023.

ewels mentioned this issue on Feb 9, 2024 in #378, "samplesheet input is not detecting entire values in sample column" (open).
