Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disrepancy GSEA results when using by="DOSE" or by="fgsea" #18

Open
guidohooiveld opened this issue Nov 14, 2016 · 3 comments
Open

Disrepancy GSEA results when using by="DOSE" or by="fgsea" #18

guidohooiveld opened this issue Nov 14, 2016 · 3 comments

Comments

@guidohooiveld
Copy link

guidohooiveld commented Nov 14, 2016

Hi, I don't know whether I should ask here, or at the maintainer of fgsea (@assaron), but after analyzing the example DOSE dataset using the GSEA algorithm implemented in either DOSE or fgsea, I noticed the results are IMO quite different. I realize that the results are permutation based, and that the results are therefore expected not to be 100% identical, but i find the observed differences quite large. Could you maybe comment on this whether this is expected? Thanks, Guido

Full code below, but main outcome:
gseDO(geneList,by="DOSE") identified 162 significantly regulated gene sets
gseDO(geneList,by="fgsea") identified 149 significantly regulated gene sets

But overlap is only 138!
Unique for 'DOSE' are 24 gene sets, for 'fgsea' 11.

Also the top regulated Gene Sets are completely different... ??

As said before, I expected some differences, but NOT because 2 different implementations of the same algorithm were used, but rather because of the permutations (that are random).

> packageVersion("DOSE")
[1] ‘3.0.4’
> 
> library(DOSE)
> data(geneList)
> x.DOSE <- gseDO(geneList,by="DOSE")
[1] "preparing geneSet collections..."
[1] "calculating observed enrichment scores..."
[1] "calculating permutation scores..."
[1] "calculating p values..."
[1] "leading edge analysis..."
[1] "done..."
> head(x.DOSE, 5)
                       ID                               Description setSize
DOID:5679       DOID:5679                           retinal disease     299
DOID:0060040 DOID:0060040          pervasive developmental disorder     175
DOID:127         DOID:127                                 leiomyoma      89
DOID:9884       DOID:9884                        muscular dystrophy      90
DOID:0060086 DOID:0060086 female reproductive organ benign neoplasm      42
             enrichmentScore       NES      pvalue   p.adjust    qvalues rank
DOID:5679         -0.3676313 -1.573945 0.001308901 0.03271556 0.02193074 1768
DOID:0060040      -0.3909104 -1.590133 0.001418440 0.03271556 0.02193074 2308
DOID:127          -0.4491341 -1.663010 0.001506024 0.03271556 0.02193074 2237
DOID:9884         -0.4623732 -1.713722 0.001515152 0.03271556 0.02193074 1971
DOID:0060086      -0.5530173 -1.780920 0.001636661 0.03271556 0.02193074 1674
  
DOID:5679    tags=24%, list=14%, signal=21%
DOID:0060040 tags=30%, list=18%, signal=25%
DOID:127     tags=40%, list=18%, signal=33%
DOID:9884    tags=31%, list=16%, signal=26%
DOID:0060086 tags=45%, list=13%, signal=39%
                                                                                                                                                                                                                                                                                                                                                                  core_enrichment
DOID:5679    2878/3791/23247/80184/6750/7450/596/9187/2034/482/948/1490/1280/5737/4314/4881/3426/187/629/6403/6785/2934/5176/7078/5950/727/10516/4311/2247/1295/358/10203/582/10218/57125/585/1675/6310/2202/4313/2944/4254/3075/2099/3480/4653/6387/1471/857/4016/1909/4053/6678/1296/4915/55812/1191/5654/10631/2697/2952/6935/2200/3479/2006/10451/9370/771/652/4693/5346/1524
DOID:0060040                                                                                                 1760/9732/7337/5175/6532/54806/6326/1499/7157/221037/627/3399/2571/3082/3791/27347/596/22829/23426/324/5021/4885/7248/8604/3397/4208/3400/26470/64221/9037/3952/93664/2944/7102/2550/4915/4922/26960/1746/2697/6863/2891/367/4128/7166/6505/5348/18/9370/57502/79083
DOID:127                                                                                                                                                                                             5175/5595/7157/1288/1543/1027/1958/7450/596/947/595/2246/3912/7507/2308/1277/8991/2247/5468/6469/81029/4313/56034/2944/2099/3480/1462/1289/5744/7043/367/1287/4582/3479/5241
DOID:9884                                                                                                                                                                                                                              825/8106/1291/3643/7402/7450/596/1490/8082/2817/284119/54843/1293/23345/3952/4313/10468/6444/9499/2273/3339/1292/3908/81493/3679/3708/4137
DOID:0060086                                                                                                                                                                                                                                                                                1958/7450/596/947/595/3912/1277/8991/3952/4313/2099/3480/1462/1289/7043/367/3479/5241
> x.fgsea <- gseDO(geneList,by="fgsea")
[1] "preparing geneSet collections..."
[1] "GSEA analysis..."
[1] "leading edge analysis..."
[1] "done..."
> head(x.fgsea, 5)
                   ID            Description setSize enrichmentScore       NES
DOID:9884   DOID:9884     muscular dystrophy      90      -0.4623732 -1.709404
DOID:14221 DOID:14221   metabolic syndrome X      22      -0.7194112 -2.018608
DOID:13359 DOID:13359 Ehlers-Danlos syndrome      11      -0.7647954 -1.784188
DOID:3382   DOID:3382            liposarcoma      12       0.7050486  1.825362
DOID:3939   DOID:3939      lipomatous cancer      12       0.7050486  1.825362
                pvalue   p.adjust   qvalues rank                   leading_edge
DOID:9884  0.001494768 0.03715567 0.0247576 1971 tags=31%, list=16%, signal=26%
DOID:14221 0.001626016 0.03715567 0.0247576  894  tags=36%, list=7%, signal=34%
DOID:13359 0.001773050 0.03715567 0.0247576 1023  tags=55%, list=8%, signal=50%
DOID:3382  0.002304147 0.03715567 0.0247576 1022  tags=42%, list=8%, signal=38%
DOID:3939  0.002304147 0.03715567 0.0247576 1022  tags=42%, list=8%, signal=38%
                                                                                                                                      core_enrichment
DOID:9884  825/8106/1291/3643/7402/7450/596/1490/8082/2817/284119/54843/1293/23345/3952/4313/10468/6444/9499/2273/3339/1292/3908/81493/3679/3708/4137
DOID:14221                                                                                                          5468/3952/2152/185/1489/3479/9370
DOID:13359                                                                                                                   1277/1281/1290/1289/1634
DOID:3382                                                                                                                   7153/1029/1019/7545/10644
DOID:3939                                                                                                                   7153/1029/1019/7545/10644
> 

> DOSE.res <- rownames(as.data.frame(x.DOSE))
> fgsea.res <- rownames(as.data.frame(x.fgsea))
> 
> combined.ID <- union(DOSE.res,fgsea.res)
> C <- cbind(combined.ID %in% DOSE.res, combined.ID %in% fgsea.res)
> colnames(C) <- c("DOSE.res","fgsea.res")
> 
> library(limma)
> vennCounts(C)
  DOSE.res fgsea.res Counts
1        0         0      0
2        0         1     11
3        1         0     24
4        1         1    138
attr(,"class")
[1] "VennCounts"
> 
> vennDiagram(vennCounts(C))
> 
>
@assaron
Copy link

assaron commented Nov 14, 2016

I would expect some difference due to randomness of the algorithms. You
should get similar difference between different runs of the same algorithm.
You'd better compare p-values of the same pathway.

The difference should be smaller if you increase number of permutations.
I'd recommend to use at least 10000.

On Mon, Nov 14, 2016, 13:26 Guido Hooiveld notifications@github.com wrote:

Hi, I don't know whether I should ask here, or at the maintainer of fgsea
(@assaron https://github.com/assaron), but after analyzing the example
DOSE dataset using the GSEA algorithm implemented in either DOSE or fgsea,
I noticed the results are IMO quite different. I realize that the results
are permutation based, and that the results are therefore expected not to
be 100% identical, but i find the observed differences quite large. Could
you maybe comment on this whether this is expected? Thanks, Guido

Full code below, but main outcome:
gseDO(geneList,by="DOSE") identified 162 different gene sets
gseDO(geneList,by="fgsea") identified 149 different gene sets

Overlap is only 138!
Unique for 'DOSE' are 24 gene sets, for 'fgsea' 11.

Also the top regulated Gene Sets are completely different... ??

As said before, I expected some differences, but NOT because 2 different
implementations of the same algorithm were used, but rather because of the
permutations (that are random).

packageVersion("DOSE")
[1] ‘3.0.4’

library(DOSE)
data(geneList)
x.DOSE <- gseDO(geneList,by="DOSE")
[1] "preparing geneSet collections..."
[1] "calculating observed enrichment scores..."
[1] "calculating permutation scores..."
[1] "calculating p values..."
[1] "leading edge analysis..."
[1] "done..."
head(x.DOSE, 5)
ID Description setSize
DOID:5679 DOID:5679 retinal disease 299
DOID:0060040 DOID:0060040 pervasive developmental disorder 175
DOID:127 DOID:127 leiomyoma 89
DOID:9884 DOID:9884 muscular dystrophy 90
DOID:0060086 DOID:0060086 female reproductive organ benign neoplasm 42
enrichmentScore NES pvalue p.adjust qvalues rank
DOID:5679 -0.3676313 -1.573945 0.001308901 0.03271556 0.02193074 1768
DOID:0060040 -0.3909104 -1.590133 0.001418440 0.03271556 0.02193074 2308
DOID:127 -0.4491341 -1.663010 0.001506024 0.03271556 0.02193074 2237
DOID:9884 -0.4623732 -1.713722 0.001515152 0.03271556 0.02193074 1971
DOID:0060086 -0.5530173 -1.780920 0.001636661 0.03271556 0.02193074 1674

DOID:5679 tags=24%, list=14%, signal=21%
DOID:0060040 tags=30%, list=18%, signal=25%
DOID:127 tags=40%, list=18%, signal=33%
DOID:9884 tags=31%, list=16%, signal=26%
DOID:0060086 tags=45%, list=13%, signal=39%
core_enrichment
DOID:5679 2878/3791/23247/80184/6750/7450/596/9187/2034/482/948/1490/1280/5737/4314/4881/3426/187/629/6403/6785/2934/5176/7078/5950/727/10516/4311/2247/1295/358/10203/582/10218/57125/585/1675/6310/2202/4313/2944/4254/3075/2099/3480/4653/6387/1471/857/4016/1909/4053/6678/1296/4915/55812/1191/5654/10631/2697/2952/6935/2200/3479/2006/10451/9370/771/652/4693/5346/1524
DOID:0060040 1760/9732/7337/5175/6532/54806/6326/1499/7157/221037/627/3399/2571/3082/3791/27347/596/22829/23426/324/5021/4885/7248/8604/3397/4208/3400/26470/64221/9037/3952/93664/2944/7102/2550/4915/4922/26960/1746/2697/6863/2891/367/4128/7166/6505/5348/18/9370/57502/79083
DOID:127 5175/5595/7157/1288/1543/1027/1958/7450/596/947/595/2246/3912/7507/2308/1277/8991/2247/5468/6469/81029/4313/56034/2944/2099/3480/1462/1289/5744/7043/367/1287/4582/3479/5241
DOID:9884 825/8106/1291/3643/7402/7450/596/1490/8082/2817/284119/54843/1293/23345/3952/4313/10468/6444/9499/2273/3339/1292/3908/81493/3679/3708/4137
DOID:0060086 1958/7450/596/947/595/3912/1277/8991/3952/4313/2099/3480/1462/1289/7043/367/3479/5241

x.fgsea <- gseDO(geneList,by="fgsea")
[1] "preparing geneSet collections..."
[1] "GSEA analysis..."
[1] "leading edge analysis..."
[1] "done..."
head(x.fgsea, 5)
ID Description setSize enrichmentScore NES
DOID:9884 DOID:9884 muscular dystrophy 90 -0.4623732 -1.709404
DOID:14221 DOID:14221 metabolic syndrome X 22 -0.7194112 -2.018608
DOID:13359 DOID:13359 Ehlers-Danlos syndrome 11 -0.7647954 -1.784188
DOID:3382 DOID:3382 liposarcoma 12 0.7050486 1.825362
DOID:3939 DOID:3939 lipomatous cancer 12 0.7050486 1.825362
pvalue p.adjust qvalues rank leading_edge
DOID:9884 0.001494768 0.03715567 0.0247576 1971 tags=31%, list=16%, signal=26%
DOID:14221 0.001626016 0.03715567 0.0247576 894 tags=36%, list=7%, signal=34%
DOID:13359 0.001773050 0.03715567 0.0247576 1023 tags=55%, list=8%, signal=50%
DOID:3382 0.002304147 0.03715567 0.0247576 1022 tags=42%, list=8%, signal=38%
DOID:3939 0.002304147 0.03715567 0.0247576 1022 tags=42%, list=8%, signal=38%
core_enrichment
DOID:9884 825/8106/1291/3643/7402/7450/596/1490/8082/2817/284119/54843/1293/23345/3952/4313/10468/6444/9499/2273/3339/1292/3908/81493/3679/3708/4137
DOID:14221 5468/3952/2152/185/1489/3479/9370
DOID:13359 1277/1281/1290/1289/1634
DOID:3382 7153/1029/1019/7545/10644
DOID:3939 7153/1029/1019/7545/10644

DOSE.res <- rownames(as.data.frame(x.DOSE))
fgsea.res <- rownames(as.data.frame(x.fgsea))

combined.ID <- union(DOSE.res,fgsea.res)
C <- cbind(combined.ID %in% DOSE.res, combined.ID %in% fgsea.res)
colnames(C) <- c("DOSE.res","fgsea.res")

library(limma)
vennCounts(C)
DOSE.res fgsea.res Counts
1 0 0 0
2 0 1 11
3 1 0 24
4 1 1 138
attr(,"class")
[1] "VennCounts"

vennDiagram(vennCounts(C))


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#18, or mute the thread
https://github.com/notifications/unsubscribe-auth/AA_fWaI33fSKVtQvxQSdQuutwCkLocSaks5q-DdSgaJpZM4KxKYS
.

@guidohooiveld
Copy link
Author

guidohooiveld commented Nov 15, 2016

Thanks Alexey for your comments.
If I repeatedly run GSEA either using the DOSE or fgsea algorithmic implementation, I noticed that for each of them the overlap of repeated analyses is quite OK. However, the "between-implementation" variation is (much) larger... I'll do some more studies and report back.
G

@AboubakarAhamada
Copy link

AboubakarAhamada commented Feb 5, 2019

I have a similar problem but for me the gsea results change run by run. The pathways who were on the top of list are not the same when I re-run . What did I wrong ?
Thanks.

GuangchuangYu pushed a commit that referenced this issue Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants