samplesheet input is not detecting entire values in sample column #378

Open
wvictor14 opened this issue Feb 9, 2024 · 4 comments
Labels
bug Something isn't working


wvictor14 commented Feb 9, 2024

Description of the bug

When (a certain number of?) underscores are used in the sample column, sometimes only a substring of the value is read in, rather than the whole value.

E.g. BATCH_DATE_SAMPLE is read in as BATCH_DATE in the following example. The impact is that when the partial value that is read in is not unique, the pipeline will erroneously treat multiple (unique) rows as replicates.

See the screenshots of an example samplesheet and the running pipeline:

image (1)
image

Command used and terminal output

No response

Relevant files

No response

System information

version methylseq 2.6.0

wvictor14 added the bug (Something isn't working) label Feb 9, 2024

ewels (Member) commented Feb 9, 2024

I suspect that this is the offending code:

def meta_clone = meta.clone()
parts = meta_clone.id.split('_')
meta_clone.id = parts.length > 1 ? parts[0..-2].join('_') : meta_clone.id

I'm not 100% sure whether this is a bug or a feature. If it's a feature, then it should have better docs.
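
To make the effect of those three lines concrete, here is a minimal Groovy sketch of the same split-and-rejoin logic applied to plain strings. The function name trimLastField is hypothetical (it is not in the pipeline), and this assumes the quoted snippet really is the code path that produces the shortened IDs.

```groovy
// Hypothetical wrapper around the quoted logic; trimLastField is not a pipeline function.
def trimLastField(String id) {
    def parts = id.split('_')
    // Drop the last underscore-delimited field when there is more than one field.
    return parts.length > 1 ? parts[0..-2].join('_') : id
}

assert trimLastField('BATCH_DATE_SAMPLE') == 'BATCH_DATE' // the case from the report
assert trimLastField('F0_1') == 'F0'                      // a single underscore is enough
assert trimLastField('F0.1') == 'F0.1'                    // no underscore, so unchanged
```

If this is the cause, any sample name containing an underscore loses its final field, which would explain why distinct rows end up sharing one ID.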

ewels (Member) commented Feb 9, 2024

I think that this issue is essentially the inverse of #351 (here it's happening by accident, there it was the desired behaviour).

sateeshperi mentioned this issue Feb 22, 2024
11 tasks
flerpan01 commented

I'm so happy I found this issue; I was pulling my hair out the whole day thinking my code was wrong. Quick fix: I changed the underscore to a dot in my samplesheet.csv (F0_1 -> F0.1).
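
For anyone wanting to check a samplesheet before running, a rough pre-flight sketch in Groovy follows. It assumes the trimming quoted earlier in this thread is what gets applied to the sample column; the closure name trimLastField and the sample names in the list are made up for illustration and are not part of methylseq.

```groovy
// Hypothetical pre-flight check; not part of the pipeline.
def trimLastField = { String id ->
    def parts = id.split('_')
    parts.length > 1 ? parts[0..-2].join('_') : id
}

// Illustrative sample names; replace with the values from your own sample column.
def samples = ['BATCH_DATE_S1', 'BATCH_DATE_S2', 'F0.1', 'F0.2']

// Group the original names by the ID they would collapse to, and keep only the clashes.
def collisions = samples
    .groupBy(trimLastField)
    .findAll { trimmed, names -> names.size() > 1 }

collisions.each { trimmed, names ->
    println "${names.join(', ')} would all be read as '${trimmed}'"
}
```

With the illustrative list, only the two underscore-separated names clash; the dot-separated ones pass through unchanged, which matches the workaround above.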


CathyXD commented Apr 15, 2024

I've encountered the same issue: the sample names were read in incompletely, which caused errors later on. My sequencing was paired-end with 4 lanes per sample, so I may also have the problem mentioned in #381. Could anyone provide an updated, working samplesheet.csv example? I'm really confused now.
