Adjust get_read_group for multi sample config. #59

christopher-schroeder · 2020-11-20T13:38:28Z

I have projects where I have to use the same sample in multiple groups. For example, I have a lot of single-case samples, but the parents are sequenced chunkwise in pools. In that case I write a config that looks like this:

CSDN21	index	CSDN21	ILLUMINA	NA
CSDN47	motherpool	CSDN21	ILLUMINA	NA
CSDN52	fatherpool	CSDN21	ILLUMINA	NA
CSDN22	index	CSDN22	ILLUMINA	NA
CSDN47	motherpool	CSDN22	ILLUMINA	NA
CSDN52	fatherpool	CSDN22	ILLUMINA	NA
CSDN23	index	CSDN23	ILLUMINA	NA
CSDN47	motherpool	CSDN23	ILLUMINA	NA
CSDN52	fatherpool	CSDN23	ILLUMINA	NA

This seems to work just fine for the calling, but for the mapping we have to slightly modify the read_group string generation.
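As a rough illustration of the problem (not the actual diff of this PR): once a sample name occupies several rows, a per-sample lookup of the platform no longer yields a single value, so the read group helper has to collapse the duplicates. The sketch below uses plain Python instead of the workflow's real sample table. The column names, the `extra` column, and the exact read group format are assumptions, since the config above has no header line.

```python
# Rows copied from the config above. The column names are assumptions:
# the table in this thread has no header line.
FIELDS = ["sample_name", "alias", "group", "platform", "extra"]
ROWS = [
    ("CSDN21", "index", "CSDN21", "ILLUMINA", "NA"),
    ("CSDN47", "motherpool", "CSDN21", "ILLUMINA", "NA"),
    ("CSDN52", "fatherpool", "CSDN21", "ILLUMINA", "NA"),
    ("CSDN22", "index", "CSDN22", "ILLUMINA", "NA"),
    ("CSDN47", "motherpool", "CSDN22", "ILLUMINA", "NA"),
    ("CSDN52", "fatherpool", "CSDN22", "ILLUMINA", "NA"),
]
samples = [dict(zip(FIELDS, row)) for row in ROWS]


def get_read_group(sample_name):
    """Build a read group string for a sample that may occupy several rows."""
    # Collect the distinct platforms over all rows of this sample.
    platforms = {s["platform"] for s in samples if s["sample_name"] == sample_name}
    if len(platforms) != 1:
        raise ValueError(
            f"expected exactly one platform for {sample_name}, got {sorted(platforms)}"
        )
    platform = platforms.pop()
    # Raw string: the backslash-t sequences stay literal in the returned string.
    return r"-R '@RG\tID:{s}\tSM:{s}\tPL:{p}'".format(s=sample_name, p=platform)


print(get_read_group("CSDN47"))
```

Raising on conflicting platforms is a design choice of this sketch. It makes an inconsistent config fail early instead of silently picking one row.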

johanneskoester

Thinking about it, it seems a bit weird that the sample needs to be repeated just because it occurs in more than one group. Perhaps it would be better to allow the group column to contain a comma-separated list, and adjust the code for getting all samples for a group accordingly.
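A minimal sketch of what such a lookup could look like, assuming the group column holds values such as `CSDN21,CSDN22`. The rows and the helper name `get_group_samples` are hypothetical and not taken from the workflow.

```python
# Hypothetical rows in which the group column holds a comma-separated list.
samples = [
    {"sample_name": "CSDN21", "alias": "index", "group": "CSDN21"},
    {"sample_name": "CSDN22", "alias": "index", "group": "CSDN22"},
    {"sample_name": "CSDN47", "alias": "motherpool", "group": "CSDN21,CSDN22"},
    {"sample_name": "CSDN52", "alias": "fatherpool", "group": "CSDN21,CSDN22"},
]


def get_group_samples(group):
    """Return all sample names whose group list contains the given group."""
    return [
        s["sample_name"]
        for s in samples
        # Split the list and strip whitespace so "A, B" and "A,B" behave alike.
        if group in {g.strip() for g in s["group"].split(",")}
    ]


print(get_group_samples("CSDN21"))
```

Note that each row still carries exactly one alias, so a sample cannot take a different role per group in this layout.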

christopher-schroeder · 2020-11-25T16:06:42Z

Yes, I've also thought about a comma-separated list. But a single sample might have a different role in different groups. A comma-separated list would not be enough in this case; you would also need a comma-separated alias list. ... I don't know, I don't know ...

johanneskoester · 2020-12-15T14:22:39Z

> Yes, I've also thought about a comma-separated list. But a single sample might have a different role in different groups. A comma-separated list would not be enough in this case; you would also need a comma-separated alias list. ... I don't know, I don't know ...

That's a very good point.

johanneskoester · 2020-12-15T14:33:41Z

What if we instead add another file, groups.tsv, for group assignment (while removing the alias and group columns from samples.tsv)?

group	sample_name	alias
CSDN21	CSDN21	index
CSDN21	CSDN47	motherpool
CSDN21	CSDN52	fatherpool
CSDN22	CSDN22	index
CSDN22	CSDN47	motherpool
CSDN22	CSDN52	fatherpool

I think that would better capture the relational nature of such constructs, and maybe also be cleaner, because the tables become less crowded and redundant.
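To make the relational idea concrete, here is a small sketch that reads the proposed groups.tsv with the standard library and answers both directions of the relation. The helper names are hypothetical, and the table is embedded as a string only to keep the example self-contained.

```python
import csv
import io

# Contents of the proposed groups.tsv, taken from the table above.
GROUPS_TSV = (
    "group\tsample_name\talias\n"
    "CSDN21\tCSDN21\tindex\n"
    "CSDN21\tCSDN47\tmotherpool\n"
    "CSDN21\tCSDN52\tfatherpool\n"
    "CSDN22\tCSDN22\tindex\n"
    "CSDN22\tCSDN47\tmotherpool\n"
    "CSDN22\tCSDN52\tfatherpool\n"
)


def load_groups(text):
    """Parse the tab-separated group assignments into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def get_group_aliases(groups, group):
    """Map each alias in a group to the sample that plays that role."""
    return {row["alias"]: row["sample_name"] for row in groups if row["group"] == group}


def get_sample_groups(groups, sample_name):
    """List every group a sample is assigned to, in file order."""
    return [row["group"] for row in groups if row["sample_name"] == sample_name]


groups = load_groups(GROUPS_TSV)
print(get_group_aliases(groups, "CSDN22"))
print(get_sample_groups(groups, "CSDN47"))
```

Because the alias lives on the (group, sample) pair, the same sample can take a different role in each group without repeating its row in samples.tsv.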

Adjust get_read_group for multi sample config.

553e931

johanneskoester approved these changes Nov 25, 2020

johanneskoester requested changes Nov 25, 2020
