Adjust get_read_group for multi sample config. #59

Open
wants to merge 1 commit into
base: master

christopher-schroeder
Contributor

I have projects where the same sample has to be used in multiple groups. For example, I have a lot of single-case samples, but the parents are sequenced chunkwise in pools. In that case I write a config that looks like this:

CSDN21	index	CSDN21	ILLUMINA	NA
CSDN47	motherpool	CSDN21	ILLUMINA	NA
CSDN52	fatherpool	CSDN21	ILLUMINA	NA
CSDN22	index	CSDN22	ILLUMINA	NA
CSDN47	motherpool	CSDN22	ILLUMINA	NA
CSDN52	fatherpool	CSDN22	ILLUMINA	NA
CSDN23	index	CSDN23	ILLUMINA	NA
CSDN47	motherpool	CSDN23	ILLUMINA	NA
CSDN52	fatherpool	CSDN23	ILLUMINA	NA

This seems to work just fine for the calling, but for the mapping we have to slightly modify the read_group string generation.
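The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of how a read group lookup could tolerate a sample that appears in several rows. The column names (`sample_name`, `platform`), the function signature, and the `-R '@RG...'` string layout are assumptions for illustration, not the workflow's actual code.

```python
def get_read_group(sample, rows):
    """Build a read group string for `sample` (hypothetical helper).

    `rows` is a list of dicts, one per line of the sample sheet. The same
    sample may occur in several rows, once per group it belongs to.
    """
    # Collect the platform from every row of this sample; duplicates collapse.
    platforms = {row["platform"] for row in rows if row["sample_name"] == sample}
    if len(platforms) != 1:
        raise ValueError(
            f"expected exactly one platform for {sample!r}, got {sorted(platforms)}"
        )
    platform = platforms.pop()
    # Raw f-string: the backslash-t sequences stay literal in the result.
    return rf"-R '@RG\tID:{sample}\tSM:{sample}\tPL:{platform}'"


# Two rows for the same pooled sample, as in the config above.
rows = [
    {"sample_name": "CSDN21", "alias": "index", "group": "CSDN21", "platform": "ILLUMINA"},
    {"sample_name": "CSDN47", "alias": "motherpool", "group": "CSDN21", "platform": "ILLUMINA"},
    {"sample_name": "CSDN47", "alias": "motherpool", "group": "CSDN22", "platform": "ILLUMINA"},
]
read_group = get_read_group("CSDN47", rows)
```

The point of the sketch is that the read group depends only on the sample, so repeated rows are harmless as long as they agree on the platform.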

Contributor

@johanneskoester johanneskoester left a comment

Thinking about it, it seems a bit weird that the sample needs to be repeated just because it occurs in more than one group. Perhaps it would be better to allow the group column to contain a comma-separated list, and adjust the code for getting all samples for a group accordingly.
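A rough sketch of what the comma-separated variant might look like, assuming a `group` column that holds one or more group names and a hypothetical helper `samples_for_group` (neither the helper name nor the exact column names come from the workflow):

```python
def samples_for_group(group, rows):
    """Return the names of all samples whose `group` field lists `group`."""
    return [
        row["sample_name"]
        for row in rows
        # Split the field on commas and ignore surrounding whitespace.
        if group in {name.strip() for name in row["group"].split(",")}
    ]


# Each sample appears once; pooled samples list every group they belong to.
rows = [
    {"sample_name": "CSDN21", "alias": "index", "group": "CSDN21"},
    {"sample_name": "CSDN22", "alias": "index", "group": "CSDN22"},
    {"sample_name": "CSDN47", "alias": "motherpool", "group": "CSDN21,CSDN22,CSDN23"},
    {"sample_name": "CSDN52", "alias": "fatherpool", "group": "CSDN21, CSDN22, CSDN23"},
]
members = samples_for_group("CSDN21", rows)
```

Note that each sample still carries a single alias here, so the alias is the same in every group the sample belongs to.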

@christopher-schroeder
Contributor Author

Yes, I've also thought about a comma-separated list. But a single sample might have a different role in different groups. A comma-separated list would not be enough in that case; you would also need a comma-separated alias list. ... I don't know, I don't know ...

@johanneskoester
Contributor

> Yes, I've also thought about a comma-separated list. But a single sample might have a different role in different groups. A comma-separated list would not be enough in that case; you would also need a comma-separated alias list. ... I don't know, I don't know ...

That's a very good point.

@johanneskoester
Contributor

What if we instead add another file, groups.tsv, for group assignment (while removing the alias and group columns from samples.tsv)?

group	sample_name	alias
CSDN21	CSDN21	index
CSDN21	CSDN47	motherpool
CSDN21	CSDN52	fatherpool
CSDN22	CSDN22	index
CSDN22	CSDN47	motherpool
CSDN22	CSDN52	fatherpool

I think that would better capture the relational nature of such constructs, and maybe also be cleaner, because the tables become less crowded and redundant.
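To illustrate, here is a minimal sketch of reading such a groups.tsv into a per-group mapping from alias to sample. The loader name `load_groups` and the in-memory file are assumptions for the example; only the three column names are taken from the table above.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for groups.tsv, using the rows proposed above.
GROUPS_TSV = (
    "group\tsample_name\talias\n"
    "CSDN21\tCSDN21\tindex\n"
    "CSDN21\tCSDN47\tmotherpool\n"
    "CSDN21\tCSDN52\tfatherpool\n"
    "CSDN22\tCSDN22\tindex\n"
    "CSDN22\tCSDN47\tmotherpool\n"
    "CSDN22\tCSDN52\tfatherpool\n"
)


def load_groups(handle):
    """Map each group to a dict of alias -> sample_name (hypothetical helper)."""
    groups = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["group"]][row["alias"]] = row["sample_name"]
    return dict(groups)


groups = load_groups(io.StringIO(GROUPS_TSV))
```

Because the alias lives on the (group, sample) pair rather than on the sample, the same sample can take a different role in each group without repeating any row in samples.tsv.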


2 participants