Addition of hierarchical group simulation #120

Open. Wants to merge 4 commits into base branch devel.

keshav-motwani

Hi Luke!

First, thanks for the amazing package!

For a project I'm working on, I needed to generate single-cell datasets with a hierarchical cell type (group) structure. To do this, I extended the group structure functionality already present and added a few parameters specific to simulating a hierarchical structure.

The main parameter to specify the hierarchical structure is splits.per.level. This allows one to specify a fairly simple structure, in which each level in the hierarchy is split into an equal number of groups. For example, if splits.per.level = c(4, 2, 2), there would be 4 major cell types, each of those split into 2 subtypes, and each of those split into 2 sub-subtypes.
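As a rough illustration (not code from the PR), the number of groups at each level implied by splits.per.level is the cumulative product of the splits. The variable groups.per.level is a name made up for this sketch.

```r
# Illustrative only: count the groups at each level of the hierarchy.
# `groups.per.level` is a hypothetical name, not a parameter added by the PR.
splits.per.level <- c(4, 2, 2)

# Each level multiplies the number of groups in the level above it.
groups.per.level <- cumprod(splits.per.level)
print(groups.per.level)
```

With splits.per.level = c(4, 2, 2), this corresponds to the 4 major cell types, 8 subtypes, and 16 sub-subtypes described above.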

In addition to this, parameters similar to those in the group simulation are added to control the multiplicative factors that are simulated. However, instead of specifying these per group, they are specified at each level of the hierarchy for simplicity.

The hierarchy is created by generating the multiplicative factors in an iterative manner. First, these factors are sampled for the highest level, then those factors are modified for each subgroup, and so on. This is implemented in the splatSimHierarchicalDE function.
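The iterative idea can be sketched in base R as follows. This is a simplified stand-in, not the splatSimHierarchicalDE implementation: the helper sim_level_factors, the per-level vectors de.prob, fac.loc and fac.scale, and the specific values are all assumptions chosen for illustration, and the factors are drawn from a plain log-normal distribution.

```r
# A minimal sketch of iterative factor generation, assuming one DE probability,
# location and scale per level. All names and values here are hypothetical.
set.seed(1)
n.genes <- 100
splits.per.level <- c(4, 2, 2)
de.prob   <- c(0.3, 0.1, 0.05)  # assumed per-level DE probability
fac.loc   <- c(0.5, 0.2, 0.1)   # assumed per-level log-normal location
fac.scale <- c(0.4, 0.2, 0.1)   # assumed per-level log-normal scale

# Draw one vector of multiplicative factors: 1 for non-DE genes,
# a log-normal value for DE genes.
sim_level_factors <- function(n.genes, prob, loc, scale) {
  is.de <- rbinom(n.genes, 1, prob) == 1
  facs <- rep(1, n.genes)
  facs[is.de] <- rlnorm(sum(is.de), meanlog = loc, sdlog = scale)
  facs
}

# Start from a single root with all factors equal to 1.
factors <- matrix(1, nrow = n.genes, ncol = 1)
for (level in seq_along(splits.per.level)) {
  n.splits <- splits.per.level[level]
  # Copy each parent column once per child group.
  parents <- factors[, rep(seq_len(ncol(factors)), each = n.splits),
                     drop = FALSE]
  # Draw new factors for every child and multiply them onto the parent's.
  child <- vapply(
    seq_len(ncol(parents)),
    function(i) sim_level_factors(n.genes, de.prob[level],
                                  fac.loc[level], fac.scale[level]),
    numeric(n.genes)
  )
  factors <- parents * child
}

dim(factors)  # genes x leaf groups
```

Because each child inherits its parent's factors before being modified, groups that share a parent end up with more similar factors than groups from different branches.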

I've attached a few plots showing the structure of the resulting data (which can be reproduced using the hierarchical_example.R file in the root directory). The first plot shows how the multiplicative factors per group cluster, the second shows how the final data clusters, and the third is a UMAP of the final data.

hierarchical_example.pdf

I'd appreciate any thoughts and suggestions on how to improve this if you think it'd be useful for others!

Thanks,
Keshav

Collaborator

lazappi commented Aug 24, 2021

Hi @keshav-motwani

Thanks for the PR, it's always super exciting to have people want to contribute 🎉!

There is definitely a need to be able to generate more complex designs. This is something I have played around with in the experimental Kersplat simulation and am hoping to port back to Splat soon. That approach will be a bit different from yours though (and much more work). I think I need some time to look at your code more closely and think about what the best solution is. What you have is interesting, so it might be worth doing both. If you don't hear back in the next two weeks, please ping me here to remind me 😸.

If you still feel like working on this while you wait I have turned on the CI which might pick up some issues to look at (no pressure though).

Collaborator

lazappi commented Sep 3, 2021

Hi @keshav-motwani

I have thought about this a bit. I would be happy to add it to Splat but I think there might need to be some adjustments to how the parameters work. Basically I want to avoid duplicating parameters just for the hierarchical mode. For most of them we can probably just make them vectors where each value corresponds to a level but the tricky one will be group.prob. Some kind of hierarchical list might work there but it could take some adjustments to implement it.

Let me know what you think and if you are interested in doing some more work on this.

Author

keshav-motwani commented Sep 3, 2021 via email

Collaborator

lazappi commented Sep 6, 2021

Hmmmmm...this is a good point which I hadn't thought about. Thanks for raising it! Let me have another think and see if I can come up with a design which will let us reuse at least some things.

Collaborator

lazappi commented Sep 10, 2021

What if we added a groups.level parameter instead? What I am imagining is a vector which is the same length as the number of groups, where the values tell you which level of the hierarchy each group belongs to. So assuming there are six groups, groups.level = c(0, 0, 0, 0, 0, 0) would give you six groups on the top level (equivalent to how groups work now). Setting groups.level = c(0, 0, 0, 1, 1, 1) would give you three top level groups that each split into three subgroups, and groups.level = c(0, 1, 1, 2, 2, 2) would give you one top level group that splits into two sub-groups which each split into three sub-sub-groups (hopefully that makes sense).

This would let us re-use the existing parameters, which can already be set per group. The tricky one will be group.prob because the values won't quite correspond to an individual "group" anymore (if groups.level = c(A = 0, B = 0, C = 1) and group.prob = c(A = 0.25, B = 0.25, C = 0.5), the group.prob for C matches two sub-groups, one of A and one of B). Probably the only way to really clear this up is to have some kind of list structure for group.prob, but I think that is feasible.
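To make the list idea a bit more concrete, here is one purely hypothetical shape such a structure could take. Nothing here is an existing Splat parameter format: the prob and sub field names and the flatten_probs helper are invented for this sketch. Each node gives a probability relative to its siblings, and the probability of a leaf group is the product of the probabilities along its path.

```r
# Hypothetical nested specification for group.prob. The field names `prob`
# and `sub` are assumptions made for this sketch.
group.prob <- list(
  A = list(prob = 0.5,
           sub = list(A1 = list(prob = 0.5), A2 = list(prob = 0.5))),
  B = list(prob = 0.5,
           sub = list(B1 = list(prob = 0.25), B2 = list(prob = 0.75)))
)

# Walk the tree and multiply probabilities down to the leaf groups.
flatten_probs <- function(nodes, parent.prob = 1) {
  out <- numeric(0)
  for (name in names(nodes)) {
    node <- nodes[[name]]
    p <- parent.prob * node$prob
    if (is.null(node$sub)) {
      out[name] <- p                              # leaf: record its probability
    } else {
      out <- c(out, flatten_probs(node$sub, p))   # recurse into sub-groups
    }
  }
  out
}

leaf.prob <- flatten_probs(group.prob)
print(leaf.prob)
```

A structure like this keeps each value tied to one specific group, at the cost of a more complicated parameter to validate.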

This would be somewhere between what you have suggested and what I am planning to do in the future, in which each group will have a parent (similar to how paths work now).

What do you think? (There's a fair chance I'm still overlooking something so please let me know 😸).
