Addition of hierarchical group simulation #120

keshav-motwani · 2021-08-19T18:26:27Z

Hi Luke!

First, thanks for the amazing package!

For a project I'm working on, I needed to generate single-cell datasets with a hierarchical cell type (group) structure. To do this, I extended the group structure functionality already present and added a few parameters specific to simulating a hierarchical structure.

The main parameter to specify the hierarchical structure is splits.per.level. This allows one to specify a fairly simple structure, in which every group at a given level of the hierarchy is split into the same number of subgroups. For example, if splits.per.level = c(4, 2, 2), there would be 4 major cell types, each of those split into 2 subtypes, and each of those split into 2 sub-subtypes (16 groups at the finest level).
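To make the balanced structure concrete, here is a minimal, language-neutral sketch in Python (not the R code in this PR; the helper names `nodes_per_level` and `leaf_groups` are hypothetical). It counts the groups at each level and enumerates the finest-level groups implied by splits.per.level:

```python
from itertools import product


def nodes_per_level(splits_per_level):
    """Number of groups at each level of a balanced hierarchy.

    splits_per_level[i] is how many children every group at the level above
    is split into (the root splits into splits_per_level[0] major types).
    """
    counts, total = [], 1
    for k in splits_per_level:
        total *= k
        counts.append(total)
    return counts


def leaf_groups(splits_per_level):
    """Label every finest-level group by its 1-based index at each level."""
    return list(product(*(range(1, k + 1) for k in splits_per_level)))


splits = [4, 2, 2]  # equivalent of splits.per.level = c(4, 2, 2)
print(nodes_per_level(splits))  # [4, 8, 16]
print(leaf_groups(splits)[:3])  # [(1, 1, 1), (1, 1, 2), (1, 2, 1)]
```

Each label reads like a path through the tree, so (1, 2, 1) is the first sub-subtype of the second subtype of major cell type 1.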

In addition to this, parameters similar to those in the group simulation are added to control the multiplicative factors that are simulated. However, instead of specifying these per group, they are specified at each level of the hierarchy for simplicity.

The hierarchy is created by generating the multiplicative factors in an iterative manner. First, these factors are sampled for the highest level, then those factors are modified for each subgroup, and so on. This is implemented in the splatSimHierarchicalDE function.
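The iterative scheme can be sketched as follows. This is a hedged Python approximation, not the PR's splatSimHierarchicalDE: it assumes simplified Splat-style log-normal DE factors (a gene is DE with probability de_prob, its factor is drawn from a log-normal with location fac_loc and scale fac_scale, and the factor is inverted to model down-regulation with probability down_prob), with one value of each parameter per level. The function names are hypothetical.

```python
import numpy as np


def lnorm_factors(rng, n_genes, de_prob, down_prob, fac_loc, fac_scale):
    """Per-gene multiplicative factors: 1 for non-DE genes, log-normal for DE genes."""
    factors = np.ones(n_genes)
    is_de = rng.random(n_genes) < de_prob
    n_de = int(is_de.sum())
    facs = rng.lognormal(mean=fac_loc, sigma=fac_scale, size=n_de)
    # Invert a fraction of the factors to model down-regulation.
    is_down = rng.random(n_de) < down_prob
    facs[is_down] = 1.0 / facs[is_down]
    factors[is_de] = facs
    return factors


def hierarchical_factors(rng, n_genes, splits_per_level,
                         de_prob, down_prob, fac_loc, fac_scale):
    """Return an (n_leaf_groups, n_genes) array of cumulative DE factors.

    Each per-level argument is a list with one value per level. Every child
    starts from its parent's factors and multiplies in its own, so siblings
    share everything inherited from higher levels.
    """
    current = [np.ones(n_genes)]  # the root has no DE
    for level, k in enumerate(splits_per_level):
        children = []
        for parent in current:
            for _ in range(k):
                own = lnorm_factors(rng, n_genes, de_prob[level], down_prob[level],
                                    fac_loc[level], fac_scale[level])
                children.append(parent * own)
        current = children
    return np.vstack(current)


rng = np.random.default_rng(0)
factors = hierarchical_factors(rng, 1000, [4, 2, 2],
                               de_prob=[0.2, 0.1, 0.05], down_prob=[0.5] * 3,
                               fac_loc=[0.5] * 3, fac_scale=[0.4] * 3)
print(factors.shape)  # (16, 1000)
```

Because each child multiplies its own factors into its parent's, groups that share a parent differ only in their lower-level factors, which is what produces the nested clustering.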

I've attached a few plots showing the structure of the resulting data (they can be reproduced using the hierarchical_example.R file in the root directory). The first plot shows how the multiplicative factors per group cluster, the second how the final data cluster, and the third is a UMAP of the final data.

hierarchical_example.pdf

I'd appreciate any thoughts and suggestions on how to improve this if you think it'd be useful for others!

Thanks,
Keshav

lazappi · 2021-08-24T15:09:10Z

Hi @keshav-motwani

Thanks for the PR, it's always super exciting to have people want to contribute 🎉!

There is definitely a need to be able to generate more complex designs. This is something I have played around with in the experimental Kersplat simulation and am hoping to port back to Splat soon. That approach will be a bit different from yours though (and much more work). I think I need some time to look at your code more closely and think about what the best solution is. What you have is interesting so it might be worth doing both. If you don't hear back in the next two weeks please ping me here to remind me 😸.

If you still feel like working on this while you wait, I have turned on CI, which might pick up some issues to look at (no pressure though).

lazappi · 2021-09-03T07:01:54Z

Hi @keshav-motwani

I have thought about this a bit. I would be happy to add it to Splat but I think there might need to be some adjustments to how the parameters work. Basically I want to avoid duplicating parameters just for the hierarchical mode. For most of them we can probably just make them vectors where each value corresponds to a level but the tricky one will be group.prob. Some kind of hierarchical list might work there but it could take some adjustments to implement it.

Let me know what you think and if you are interested in doing some more work on this.

keshav-motwani · 2021-09-03T12:51:48Z

Hi @lazappi,

Thanks for checking it out, and great to hear! Actually, in my first implementation I did not have separate parameters for the hierarchical mode; the reason I switched to separate ones was to help with parameter checking. Here is that initial implementation: https://github.com/keshav-motwani/splatter/tree/7d4c3bd27b32b4510a6dee732e5c38c00068a3a6. In this version, I had to disable all parameter checking for the DE parameters in the setValidity("SplatParams", ...) function. When a SplatParams object is created, the simulation method is not known, so checking de.prob, de.downProb, etc. is hard because we do not even know the correct length: with method = "groups" the length should be the number of groups, while with method = "hierarchical" the length should (at least currently, mostly for simplicity) be the number of levels in the hierarchy.

Another reason I went for new parameters is that adding a new set of parameters seemed similar to what is done for method = "paths". As far as I understand, path-specific parameters are simply ignored when using method = "groups" or method = "single", so I figured the analogous approach would be to have separate parameters for the hierarchical mode.

If we kept common parameters, how would you suggest dealing with parameter checking without knowing the intended simulation method?

Thanks!
Keshav
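One way to frame this problem is to defer the length check until the method is known, for example at the start of the simulation rather than in the object's validity method. A hypothetical Python sketch: the parameter names follow Splat's, but `check_de_lengths`, the dictionary representation, treating "paths" like "groups", and recycling length-1 values are all assumptions for illustration.

```python
DE_PARAMS = ("de.prob", "de.downProb", "de.facLoc", "de.facScale")


def check_de_lengths(params, method):
    """Check DE parameter lengths once the simulation method is known.

    params maps Splat-style parameter names to lists of values.
    """
    if method in ("groups", "paths"):  # assumed: one value per group/path
        expected, unit = len(params["group.prob"]), "groups"
    elif method == "hierarchical":  # one value per level, as in this PR
        expected, unit = len(params["splits.per.level"]), "levels"
    else:  # e.g. "single": DE parameters are not used
        return
    for name in DE_PARAMS:
        n = len(params[name])
        if n not in (1, expected):  # assumed: length 1 is recycled
            raise ValueError(
                f"{name} has length {n}; expected 1 or the number of {unit} ({expected})")


params = {"group.prob": [0.5, 0.5], "splits.per.level": [4, 2, 2],
          "de.prob": [0.1, 0.2], "de.downProb": [0.5],
          "de.facLoc": [0.1], "de.facScale": [0.4]}
check_de_lengths(params, "groups")  # passes: de.prob has one value per group
try:
    check_de_lengths(params, "hierarchical")  # fails: 2 values for 3 levels
except ValueError as err:
    print(err)
```

The cost of this design is that a mis-sized parameter is only reported when the simulation is run, not when the parameter object is built.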


lazappi · 2021-09-06T07:11:38Z

Hmmmmm...this is a good point which I hadn't thought about. Thanks for raising it! Let me have another think and see if I can come up with a design which will let us reuse at least some things.

lazappi · 2021-09-10T07:50:19Z

What if we added a groups.level parameter instead? What I am imagining is a vector which is the same length as the number of groups, where the values tell you which level of the hierarchy each group belongs to. So, assuming there are six groups, groups.level = c(0, 0, 0, 0, 0, 0) would give you six groups on the top level (equivalent to how groups work now). Setting groups.level = c(0, 0, 0, 1, 1, 1) would give you three top-level groups that each split into three subgroups, and groups.level = c(0, 1, 1, 2, 2, 2) would give you one top-level group that splits into two sub-groups, which each split into three sub-sub-groups (hopefully that makes sense). This would let us re-use the existing parameters, which can already be set per group. The tricky one will be group.prob, because the values won't quite correspond to an individual "group" anymore: if groups.level = c(A = 0, B = 0, C = 1) and group.prob = c(A = 0.25, B = 0.25, C = 0.5), the group.prob value for C covers two sub-groups, one under A and one under B. Probably the only way to really clear this up is to have some kind of list structure for group.prob, but I think that is feasible.
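Under one reading of this proposal, every group at one level splits into every group at the next level, so the finest-level groups are all combinations of one group per level, and a hierarchical group.prob could give each group's probability within its level. A hypothetical Python sketch of that reading (`leaf_paths`, `leaf_probs`, and the per-level probability structure are assumptions, not Splat API):

```python
from itertools import product
from math import prod


def leaf_paths(groups_level):
    """All root-to-leaf paths, choosing one group at each level."""
    levels = sorted(set(groups_level.values()))
    by_level = [[g for g, lvl in groups_level.items() if lvl == L] for L in levels]
    return list(product(*by_level))


def leaf_probs(groups_level, prob_within_level):
    """Leaf probability = product of each group's probability within its level."""
    return {path: prod(prob_within_level[g] for g in path)
            for path in leaf_paths(groups_level)}


print(leaf_paths(dict(A=0, B=0, C=1)))  # [('A', 'C'), ('B', 'C')]
print(len(leaf_paths(dict(A=0, B=0, C=0, D=1, E=1, F=1))))  # 9
level = dict(A=0, B=1, C=1, D=2, E=2, F=2)
probs = leaf_probs(level, dict(A=1.0, B=0.5, C=0.5, D=0.2, E=0.3, F=0.5))
print(len(probs))              # 6
print(probs[("A", "B", "F")])  # 0.25
```

The first example shows the ambiguity described above: C appears in two finest-level groups. Because the within-level probabilities are shared by every parent, this reading cannot express subgroup proportions that differ between parents; that would need a nested list structure for group.prob.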

This would be somewhere between what you have suggested and what I am planning to do in the future, where each group will have a parent (similar to how paths work now).
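For comparison, the parent-based design can be sketched with one parent index per group, in the spirit of how path.from gives each path's origin (0 meaning the origin, here the top level). Unlike groups.level, this allows unbalanced trees. A hypothetical Python sketch; `tree_summary` is not part of Splat:

```python
def tree_summary(parent):
    """Depth of each group and the list of leaf groups.

    parent[i] is the 1-based index of group i + 1's parent, or 0 if the
    group sits directly under the root (as 0 marks the origin in path.from).
    """
    n = len(parent)
    depth = []
    for i in range(n):
        d, p = 0, parent[i]
        while p != 0:
            d += 1
            if d > n:
                raise ValueError(f"cycle detected starting at group {i + 1}")
            p = parent[p - 1]
        depth.append(d)
    has_child = {p for p in parent if p != 0}
    leaves = [g for g in range(1, n + 1) if g not in has_child]
    return depth, leaves


print(tree_summary([0, 0, 1, 1, 2, 2]))  # ([0, 0, 1, 1, 1, 1], [3, 4, 5, 6])
print(tree_summary([0, 1, 1, 2]))        # ([0, 1, 1, 2], [3, 4])
```

The trade-off is verbosity: the balanced tree that groups.level = c(0, 1, 1, 2, 2, 2) describes with six entries needs 1 + 2 + 6 = 9 entries here, one per node.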

What do you think? (There's a fair chance I'm still overlooking something so please let me know 😸).

keshav-motwani added 4 commits August 13, 2021 00:18

implement hierarchical group simulation

7d4c3bd

add new parameters specifically for hierarchical group simulation

3f1ac6e

add example

3874479

update example

fa2332d
