New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for gene-level estimation and DE, along with documentation #4
Comments
I would like that, too. Until then, could somebody sketch what an analysis at the gene level could look like? |
If you are just estimating gene expression (stage 1) then you can sum up the transcript expression levels in FPKM or TPM units (not counts units - since length normalisation is important). In that case you can also just use our fast VB algorithm rather than MCMC is timing is an issue. If you are doing differential expression (stage 2) you have to do this sum for each MCMC sample in the original BitSeq. |
To recap and as a reference for everybody else (correct me if I'm wrong), I will put this here. For the moment, I am interested in computing logFC to compare to edgeR/limma etc and also qPCR. Differential gene expression might be on the list later, though. Further, I already have done the analysis for transcripts in R, writing (most of?) the files to my working directory (so no temp files). Stage1: Just estimating gene expression, I can work with the results files from getExpression(). I already have transcript estimates from an earlier run (MCMC), but if I was to start from scratch the new VB method would be fine for computing means and logFC. I have a per-replicate .mean file from my analysis, from reading the documentation I assume this is generated by getExpression() and is equal to the $exp bit of the resulting list. Stage2: "Summing each MCMC sample in the original Bitseq" is what getGeneExpression() does, I assume? That gives me a new table with estimates on gene level (~30000 genes x 1000 samples). How do I proceed from there? |
Hi Clemens, just to make sure: have you already used a *.tr file with proper genes-transcripts grouping and then called the getGeneExpression function or not? Regarding you question 'How do I proceed from there?' I understand that you just want to compute the logFC on the gene-level and not to perform DE analysis. In this case, you could run the 'getVariance' function over the replicates of a given condition (each file should have ~30000 genes x 1000 samples as you described). This will produce a file <*.Lmean> containing the averaged expression (in the first column) across all replicates of the specific condition. Then you will have to do this for the replicates of the second condition. Then you can compute the logFC. |
Hello Panagiotis, yes, I have a trInfo DataFrame/File with proper groupings (queried biomaRt for missing gene names) and I also have succesfully called getGeneExpression(), which gives me the mentioned files (without any file extension, though. ~30000 genes x 1000 samples down from ~80000 transcripts). Further questions that came up:
Thanks for your help! |
Hi, Yes, there is a mapping. I think you should be able to run all R methods with pretend=TRUE option, in such case, the only output will be the actual shell commands that can be used to get the same results. If you run the methods with same arguments as before, you should see exactly which command produced which files (and also what were the command line arguments). |
No description provided.
The text was updated successfully, but these errors were encountered: