Can I get DE gene list for a simulated data set? #57

Closed
tengfei-emory opened this issue Sep 28, 2018 · 4 comments

@tengfei-emory

I am conducting simulations to evaluate DE gene detection performances. Is it possible to know which genes are differentially simulated among groups, so that I can have a gold standard to compare with? I appreciate your support! Thank you.

@lazappi
Collaborator

lazappi commented Oct 2, 2018

For the Splat simulation you need to consider the DEFacGroupX columns in the row metadata. These tell you whether the expression of a particular gene has been changed in a particular group. A DE factor of 1 means the gene was not changed in that group. If you have simulated two groups, a DE factor greater than 1 indicates that a gene is up-regulated in that group (and a factor less than 1 that it is down-regulated). It gets a bit more complicated if you have multiple groups. Other models return different information about intermediate parameters.
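A minimal Python sketch of reading one DEFacGroupX column, assuming the factors have already been pulled out of the simulation's row metadata into a plain dictionary (in R the gene-level metadata lives in `rowData()` of the simulated `SingleCellExperiment`; that export step is not shown). The gene names, factor values and the `classify_de` helper are hypothetical and only for illustration. Non-DE genes are compared against exactly 1 because that is the value they carry when their expression is left unchanged.

```python
# Hypothetical DE factors for a single simulated group, keyed by gene name.
# In a real run these would come from the DEFacGroupX column of the row
# metadata; the values below are made up for illustration.
de_fac_group1 = {"GeneA": 1.0, "GeneB": 1.0, "GeneC": 10.0, "GeneD": 0.5}


def classify_de(de_factors):
    """Label each gene as 'up', 'down' or 'not DE' from its DE factor.

    A factor of exactly 1 means expression was not changed in this group,
    above 1 means up-regulated and below 1 means down-regulated.
    """
    labels = {}
    for gene, fac in de_factors.items():
        if fac > 1:
            labels[gene] = "up"
        elif fac < 1:
            labels[gene] = "down"
        else:
            labels[gene] = "not DE"
    return labels


print(classify_de(de_fac_group1))
```

This only says whether a gene was changed within one group; comparing two groups still needs the ratio of their factors, as worked through further down the thread.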

@tengfei-emory
Author

Thank you for answering! Could you please explain the complication when I simulate multiple groups? When comparing only two groups out of many groups, is it reasonable to ignore the DE factor for other groups and focus only on the two groups of interest? Your advice will be really helpful for our analyses. Thank you again.

@lazappi
Collaborator

lazappi commented Oct 8, 2018

You are correct, you only need to compare the factors for the groups you are interested in. I will try to provide an example that explains how it works.

Let's say we are simulating some data with just two groups and we get these values for the genes:

| Gene  | Mean | DEFacGroup1 | DEFacGroup2 |
|-------|------|-------------|-------------|
| GeneA | 10   | 1           | 1           |
| GeneB | 100  | 1           | 5           |
| GeneC | 1000 | 10          | 5           |

In this case GeneA is not DE, GeneB is up-regulated in Group2 (foldchange = 5), GeneC is up-regulated in Group1 (foldchange = 2).
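The two-group reasoning above amounts to dividing one group's DE factor by the other's. A small Python sketch using the values from the table; the `fold_change_calls` name is hypothetical and not part of any package.

```python
# DE factors from the two-group example: gene -> (DEFacGroup1, DEFacGroup2).
de_factors = {
    "GeneA": (1, 1),
    "GeneB": (1, 5),
    "GeneC": (10, 5),
}


def fold_change_calls(de_factors):
    """Return {gene: (fold_change, call)} for Group1 relative to Group2."""
    results = {}
    for gene, (fac1, fac2) in de_factors.items():
        # Simulated fold change of Group1 over Group2.
        fold_change = fac1 / fac2
        if fold_change > 1:
            call = "up in Group1"
        elif fold_change < 1:
            call = "up in Group2"
        else:
            call = "not DE"
        results[gene] = (fold_change, call)
    return results


for gene, (fc, call) in fold_change_calls(de_factors).items():
    print(gene, fc, call)
```

GeneB comes out at 1 / 5 = 0.2 (Group1 over Group2), which is the same foldchange of 5 in favour of Group2 described above.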

Now let's say we simulate three groups:

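| Gene  | Mean | DEFacGroup1 | DEFacGroup2 | DEFacGroup3 |
|-------|------|-------------|-------------|-------------|
| GeneA | 10   | 1           | 1           | 2           |
| GeneB | 100  | 1           | 5           | 1           |
| GeneC | 1000 | 10          | 5           | 8           |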

We can do a similar thing to calculate the simulated differences in expression (foldchanges) between each of the pairs of groups we are interested in.

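| FirstGroup | SecondGroup | Gene  | FirstDEFac | SecondDEFac | FoldChange |
|------------|-------------|-------|------------|-------------|------------|
| Group1     | Group2      | GeneA | 1          | 1           | 1          |
| Group1     | Group2      | GeneB | 1          | 5           | 1 / 5      |
| Group1     | Group2      | GeneC | 10         | 5           | 2          |
| Group1     | Group3      | GeneA | 1          | 2           | 1 / 2      |
| Group1     | Group3      | GeneB | 1          | 1           | 1          |
| Group1     | Group3      | GeneC | 10         | 8           | 10 / 8     |
| Group2     | Group3      | GeneA | 1          | 2           | 1 / 2      |
| Group2     | Group3      | GeneB | 5          | 1           | 5          |
| Group2     | Group3      | GeneC | 5          | 8           | 5 / 8      |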

The simulated foldchanges we have just calculated can then be used to tell us which genes are DE between two groups (those where FoldChange != 1).
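The pairwise table can be generated mechanically. This Python sketch loops over every pair of groups with `itertools.combinations`, computes FirstDEFac / SecondDEFac for each gene, and collects the genes whose foldchange is not 1. The values are the three-group example; the helper names are hypothetical.

```python
from itertools import combinations

# DE factors from the three-group example: gene -> {group: DE factor}.
de_factors = {
    "GeneA": {"Group1": 1, "Group2": 1, "Group3": 2},
    "GeneB": {"Group1": 1, "Group2": 5, "Group3": 1},
    "GeneC": {"Group1": 10, "Group2": 5, "Group3": 8},
}
groups = ["Group1", "Group2", "Group3"]


def pairwise_fold_changes(de_factors, groups):
    """Return (first, second, gene, fold_change) rows for every group pair."""
    rows = []
    for first, second in combinations(groups, 2):
        for gene, facs in de_factors.items():
            rows.append((first, second, gene, facs[first] / facs[second]))
    return rows


def de_genes(rows):
    """Group genes with a foldchange other than 1 by (first, second) pair."""
    de = {}
    for first, second, gene, fc in rows:
        if fc != 1:
            de.setdefault((first, second), []).append(gene)
    return de


rows = pairwise_fold_changes(de_factors, groups)
for row in rows:
    print(*row)
print(de_genes(rows))
```

The exact `!= 1` test relies on unchanged genes having a factor of exactly 1 in both groups, so their ratio is exactly 1.0.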

If we want to compare one group to multiple other groups it gets a bit more complicated again. We need to look at the DE factors in each of the groups but weighted according to the number of cells in each group.

Let's say we want to compare GeneA between Group1 and (Group2 + Group3). We could calculate the simulated foldchange as something like:

FoldChange = DEFacGroup1 / ((DEFacGroup2 * nCellsGroup2  + DEFacGroup3 * nCellsGroup3) / (nCellsGroup2 + nCellsGroup3))
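The same weighted formula as a small Python function, generalised to any number of "other" groups. The function name and the cell counts in the usage line (100 cells in Group2, 300 in Group3) are hypothetical; like the formula above, this is an approximation of the pooled comparison rather than an exact property of the simulation.

```python
def one_vs_rest_fold_change(de_fac_target, other_de_facs, other_n_cells):
    """Approximate simulated fold change of one group vs. the pooled others.

    The pooled DE factor is the cell-count-weighted mean of the other
    groups' DE factors, as in the FoldChange formula above.
    """
    if len(other_de_facs) != len(other_n_cells):
        raise ValueError("need one cell count per DE factor")
    total_cells = sum(other_n_cells)
    pooled = sum(f * n for f, n in zip(other_de_facs, other_n_cells)) / total_cells
    return de_fac_target / pooled


# GeneA: Group1 factor 1 vs Group2 (factor 1) + Group3 (factor 2),
# with hypothetical group sizes of 100 and 300 cells.
print(one_vs_rest_fold_change(1, [1, 2], [100, 300]))
```

With those sizes the pooled factor is (1 * 100 + 2 * 300) / 400 = 1.75, so GeneA comes out at 1 / 1.75, i.e. down in Group1 relative to the other two groups combined.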

Hope that helps explain things! 😸

@tengfei-emory
Author

The explanation is really thorough and helpful. Thank you again!
