Provide replicate information explicitly in samplesheet #343

Open
JoseEspinosa opened this issue Apr 26, 2023 · 5 comments
Labels
enhancement WIP Work in progress

Comments

@JoseEspinosa
Member

Description of feature

Currently, the pipeline treats samples as biological replicates of one another when they share the same id in the sample column of the samplesheet and differ only in the suffix after an underscore, e.g.:

sample1_r1
sample1_r2
sample2_r1
sample2_r2

The pipeline uses this information in this code line to determine whether multiple groups are present (e.g. sample1 and sample2 in the example above) and whether replicates exist (r1 and r2 in the same example).
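To make the name-based convention concrete, here is a minimal Python sketch of how groups and replicates could be inferred from sample names. It is not the pipeline's actual code: the function name `infer_replicates` and the choice to split on the last underscore are assumptions made for illustration.

```python
from collections import defaultdict


def infer_replicates(sample_names):
    """Group sample names by the text before the last underscore.

    Hypothetical helper: splitting on the LAST underscore is an
    assumption, not necessarily what the pipeline does.
    """
    groups = defaultdict(list)
    for name in sample_names:
        group, sep, suffix = name.rpartition("_")
        if not sep:
            # No underscore at all: the whole name is its own group.
            group, suffix = name, ""
        groups[group].append(suffix)
    return dict(groups)


names = ["sample1_r1", "sample1_r2", "sample2_r1", "sample2_r2"]
groups = infer_replicates(names)

# The two facts the description says are derived from the names.
multiple_groups = len(groups) > 1
replicates_exist = any(len(reps) > 1 for reps in groups.values())

# Names that use the underscore for something else are misread:
# "wt_input" and "wt_chip" look like two replicates of group "wt".
misread = infer_replicates(["wt_input", "wt_chip"])
```

The last call shows why the convention is fragile: any underscore in a sample name is interpreted as a replicate separator, whether or not that was intended.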

However, the problem with this approach is that it relies on the sample names, which can be problematic since it depends on the replicates being named correctly with the underscore, see this issue.

I guess the solution to this problem would be to reintroduce the replicate column in the samplesheet, although this information is currently only used to enable DESEQ2_QC here and MACS2_CONSENSUS here.
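As a rough sketch of what an explicit replicate column could look like when read in Python: the samplesheet below is made up and shows only the `sample` and `replicate` columns (all other columns are omitted), and `read_replicates` is a hypothetical helper, not the code in check_samplesheet.py.

```python
import csv
import io
from collections import defaultdict

# Made-up samplesheet; only the two columns discussed here are shown.
SAMPLESHEET = """\
sample,replicate
WT_H3K27ac,1
WT_H3K27ac,2
KO_H3K27ac,1
KO_H3K27ac,2
"""


def read_replicates(handle):
    """Return {sample: sorted replicate numbers} from a CSV handle."""
    replicates = defaultdict(set)
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        sample = row["sample"].strip()
        rep = row["replicate"].strip()
        if not rep.isdigit() or int(rep) < 1:
            raise ValueError(
                f"Line {line_no}: replicate must be a positive integer, got {rep!r}"
            )
        replicates[sample].add(int(rep))
    return {sample: sorted(reps) for sample, reps in replicates.items()}


replicates = read_replicates(io.StringIO(SAMPLESHEET))

# The same two checks as before, but no longer derived from the names.
multiple_groups = len(replicates) > 1
replicates_exist = any(len(reps) > 1 for reps in replicates.values())
```

With the replicate stated explicitly, underscores in sample names (as in `WT_H3K27ac`) carry no special meaning, and a malformed replicate value fails at samplesheet check time instead of silently changing the grouping.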

I would like to know your opinion here @drpatelh, @bjlang, and anyone else willing to give feedback of course :smi

@JoseEspinosa
Member Author

JoseEspinosa commented Apr 26, 2023

Actually, I just remembered that the replicate information would also be needed for the IDR analysis, in case this feature is implemented, see #235 and #87

@JoseEspinosa JoseEspinosa added this to the 2.1 milestone May 29, 2023
@JoseEspinosa JoseEspinosa self-assigned this Jun 19, 2023
@JoseEspinosa JoseEspinosa added the WIP Work in progress label Jun 19, 2023
@JoseEspinosa JoseEspinosa changed the title from "Provide replicate information explicitely in samplesheet" to "Provide replicate information explicitly in samplesheet" Jun 21, 2023
@cjfields

cjfields commented Jul 6, 2023

I'm not sure if there has been any input on this, but we have several ChIP-Seq projects that have biological reps, so having some way to keep track of these and perform IDR would be great.

@cjfields

In the short term, couldn't replicates be captured when checking the sample sheet, then used downstream? Around this spot:

## Check sample name entries

@JoseEspinosa
Member Author

The check_samplesheet.py script has already been updated to get this information in dev in #349, but the IDR is not yet implemented.

@cjfields

> The check_samplesheet.py script has been already updated to get this information in dev in #349 but the IDR is not yet implemented.

Yep, missed that. I definitely like having the explicit column for this better than the _r1, _r2 convention.
