You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
and then filtering for significant genes using lrt-pvalue < 0.05 the number of significant genes varies between pyseer runs even though none of the input files have any changes.
In total 7 runs with covariates were run. Within these the lowest number of significant genes is 1245, the highest is 1395. Also, each run has a different number of significant genes.
The expectation would be that each run has the same number of significant genes. When filtering for filter-pvalue <0.05 the number of significant genes is constant.
Additionally, the number of significant genes after using covariates is about twice the number of significant genes without covariates (based on lrt-pvalue, however, they are the same when filtering using filter-pvalue).
Could you help me understand whether this behaviour is expected when running pyseer? Thanks in advance.
The text was updated successfully, but these errors were encountered:
That comes a bit of a surprise, and this is not what we see in our unit tests, which return the same results every time. One thing I can think of is some stochasticity introduced when using multiple cores. Do you see the same variability when using a single core?
As an aside, a p-value threshold of 0.05 is likely too high, please refer to the docs for suggestions about setting such threshold.
Thanks for your response. I did run it three times with 1 CPU and I still get variable results.
wicovariates_cpu1_1.tsv: 6268
wicovariates_cpu1_2.tsv: 6345
wicovariates_cpu1_3.tsv: 6357
As for the p-value threshold, this is just for filtering and comparison to see whether I am getting variable results between runs. For my analysis, I correct it for multiple testing before taking any further steps.
After running Pyseer using
pyseer --phenotypes phenotypes.tsv --pres gene_presence_absence.Rtab --similarity phylogeny_similarity.tsv --lmm --covariates covariates.tsv --use-covariates 2 --cpu 8 > $1
and then filtering for significant genes using lrt-pvalue < 0.05 the number of significant genes varies between pyseer runs even though none of the input files have any changes.
In total 7 runs with covariates were run. Within these the lowest number of significant genes is 1245, the highest is 1395. Also, each run has a different number of significant genes.
The expectation would be that each run has the same number of significant genes. When filtering for filter-pvalue <0.05 the number of significant genes is constant.
Additionally, the number of significant genes after using covariates is about twice the number of significant genes without covariates (based on lrt-pvalue, however, they are the same when filtering using filter-pvalue).
Could you help me understand whether this behaviour is expected when running pyseer? Thanks in advance.
The text was updated successfully, but these errors were encountered: