Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calc_diff_abund_deseq2, linear variables and "design"? #351

Open
msolbach opened this issue Aug 9, 2023 · 3 comments
Open

calc_diff_abund_deseq2, linear variables and "design"? #351

msolbach opened this issue Aug 9, 2023 · 3 comments

Comments

@msolbach
Copy link

msolbach commented Aug 9, 2023

Dear Zachary and members of the Grunwald lab,
first of all: Thank you very, very much for the metacoder heat tree package! These heat trees open up so many possibilities!

I have a question regarding the differential abundance analysis using the calc_diff_abund_deseq2 function. I am trying to perform differential abundance analysis and display the results as a heat tree, and that seems to work fine if I use a factor with two levels in the function (groups = metadata$treatment; treatment having two levels). However, my experimental design is a little more complex than that. My experimental design is a split-plot experiment (nested design, every plot was split into the 2 treatments; pseudo-replication!), so I would like to remove the "plot" effect first. In DESeq2 there is the option to specify the "design" to account for exactly that problem:
deseq2Data <- DESeqDataSetFromMatrix(countData=cerco_taxmap$data$tax_counts[-1], colData=metadata, design= ~ plotcode + treatment)
Is there a possibility to do this using calc_diff_abund_deseq2 as well?

Furthermore, on plot level, I have a plant diversity gradient that I would like to test as a linear variable (log-transformed, but that is not important at this point). In other words, I want to know which OTUs/ASVs are more abundant at low (or high, respectively) plant species richness. Again, DESeq2 offers a parameter for this in the "results" function: While "contrast" is used for factors, "name" can be used for continuous variables, e.g.:
deseq2Results <- results(diff_abundant_OTUs, name="log_sowndiv", alpha = 0.05)
But the calc_diff_abund_deseq2 function only accepts factors. The Error message (group is not a factor, see ?results) also refers to the results function of DESeq2. I think the issue is that DESeq2 itself uses the abovementioned two different parameters ("group" or "name") for different variable types, but calc_diff_abund_deseq2 only accepts factors. Or am I overlooking an additional parameter?

I am aware that this function is still in development, so this may not be implemented yet. But can you think of a "workaround" to still get this to work? I am sadly not too familiar with either DESeq2 nor metacoder, so I struggle a lot with that.
And if it is already implemented and I am just overlooking it, then I am very sorry for the inconvenience! But maybe this helps improve the function!

Thanks in advance, and greetings from Germany!
Marcel

@zachary-foster
Copy link
Contributor

Hello Marcel,

I dont think these are possible with the current code as it, but they seem like good functionality to add. I am about to leave for a conference, so I dont have time to look into this too much now, but it is possible that its a minor change. If you have some R experience, I bet you could make the changes since calc_diff_abund_deseq2 is not too complicated. The code is here:

https://github.com/grunwaldlab/metacoder/blob/master/R/calculations--differential_abundance.R

I will try to get back to this soon, but its possible that it will be a few weeks before I have time.

Best,

Zach

@msolbach
Copy link
Author

Hey Zach!
Thanks for your reply! I had a quick look at the function, and indeed, it does not seem to be too complicated. I see that you actually use both functions that I mentioned (DESeqDataSetFromMatrix and results), and automatically parse the input from calc_diff_abund_deseq2. It may be necessary to add an additional parameter for the design, but at least the contrast/name thing for factors/numeric variables can probably be solved with an if statement. I will give it a try myself and let you know!
If I come up with an idea I will let you know, of course!
Enjoy your conference!

@msolbach
Copy link
Author

msolbach commented Aug 17, 2023

Hello again!
I just started diving a bit deeper into DesSeq2 and its respective functions. I recently noticed a strange behavior when I was trying to construct the heat trees with metacoder: I wanted to display 2 versions of the same heat tree, one "large" tree (going down to OTU level) and one "small" tree (stopping at the genus level). So I did the analysis twice, and when building my taxmap dataset using parse_tax_data() I included OTU as the lowest taxonomic level in the first case, and in the latter case, I omitted it. I would expect the results to stay exactly the same, except for the additional values. However, this does not seem to be the case; the results differ (only slightly, but still). Also, I had the weird case once, that my root (e.g. Bacteria) was significant. How can the root be differentially abundant, if all samples contain e.g. 100% bacteria? This should be impossible, so I assume that there is an error somewhere.
I assume that the following issue is occurring: The taxmap object "knows" that there are different taxonomic levels; this can be seen by the taxa and their edges. However, DESeq2 does not seem to understand this; when parsing the count data to DESeq2, it is just a count table, and DESeq2 will treat all taxa as equal. That means DESeq2 will calculate the size factors for all samples using all taxa from all taxonomic levels (the root down to the lowest level like OTU). The size factor estimation includes using the median at some point; however, the median will (slightly) change when more taxa/taxonomic levels are added to the table. Therefore the whole DESeq2 analysis will (slightly) change. But I don't think that this is what is supposed to happen. The differential abundance analysis should be performed within each taxonomic level, but not across them, right?
Maybe I am misunderstanding what is happening, but this would explain my strange observations. I am just bringing this up now because this might also need some modifications in the calc_diff_abund_deseq2 function. But I assume that this can be solved with a loop that runs through the different taxonomic levels, performs the analysis per level, and then stores the outputs in a single table. By doing so, the results should be consistent when dropping a taxonomic level. And most importantly, the results should be "correct" and correspond to the results you would obtain when performing the analysis on OTU level only (as people usually do it).

Sorry for the wall of text, and I hope I am not saying anything stupid here. But I am quite sure that there is something not working quite correctly yet, but I will try my best to understand and maybe even fix the issue!
Thank you and cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants