Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All p-values are equal to 1 #690

Open
MyBiscuit opened this issue Apr 22, 2024 · 1 comment
Open

All p-values are equal to 1 #690

MyBiscuit opened this issue Apr 22, 2024 · 1 comment

Comments

@MyBiscuit
Copy link

I was running ClusterProfiler without any problems for several datasets, except for this time. I am using rice genome and follow this protocol
My issue is that all p-values are equal to 1. This is my code:

col_names <- c("chromosome", "start", "end")

NPB <- read.table(file = "/Users/soniamukherjee/Desktop/TADS_project/NPB_TADs.bed", col.names = col_names, header = FALSE)
NPB
genes <- read.table("/Users/soniamukherjee/Desktop/TADS_project/IRGSP_genes.bed", header = FALSE, col.names = c("chromosome", "start", "end", "gene_id"))
gene_names <- genes$gene_id

library(GenomicRanges)

genes_gr <- GRanges(
  seqnames = Rle(genes$chromosome),
  ranges = IRanges(start = genes$start, end = genes$end),
  gene_id = genes$gene_id  
)


all_gr <- GRanges(
  seqnames = Rle(NPB$chromosome),
  ranges = IRanges(start = NPB$start, end = NPB$end)
)

intersecting_genesA <- genes_gr[genes_gr %over% all_gr]

gene_namesA <- mcols(intersecting_genesA)$gene_id
gene_namesA
noT <- setdiff(gene_names, gene_namesA)


library (GO.db)
library (dplyr)
## make GO ID-GO name mapping tables
# extract GO term information from a Bioconductor package, GO. db
go_table <- as.data.frame(GOTERM)
# only take go_id, Term, and Oncology columns and remove duplicated rows
godb <- unique(go_table[, c (1,3,4)])
# separate the GO terms into 3 ontology types
godb_BP <- godb[godb$Ontology =="BP", 1:2]
godb_MF <- godb[godb$Ontology == "MF", 1:2]
godb_CC <- godb[godb$Ontology =="CC", 1:2]
# save results
write.table(data.frame(godb_BP), file="/Users/soniamukherjee/Desktop/godb_BP.txt", sep = "\t", quote = FALSE, row.names = FALSE)
write.table(godb_MF, "/Users/soniamukherjee/Desktop/godb_MF.txt", sep = "\t", quote = FALSE,
            row.names = FALSE)
write.table(godb_CC, "/Users/soniamukherjee/Desktop/godb_CC.txt", sep = "\t", quote = FALSE,
            row.names = FALSE)

goid2gene_all <- read.table("/Users/soniamukherjee/Desktop/TADS_project/cache/combinedU.tsv", header=T,
                            sep="\t", quote="'", stringsAsFactors = F)
colnames(goid2gene_all) <- c("go_id", "gene_id")

# seperate genes based on their GO annotation types
goid2gene_BP <- goid2gene_all %>% dplyr::filter(go_id %in% godb_BP$go_id)
goid2gene_MF <- goid2gene_all %>% dplyr:: filter(go_id %in% godb_MF$go_id)
goid2gene_CC <- goid2gene_all %>% dplyr:: filter(go_id%in% godb_CC$go_id)
# save results
write.table(goid2gene_BP, "/Users/soniamukherjee/Desktop/goid2gene_BP.txt", sep="\t", row.names=FALSE, quote=FALSE)
write.table(goid2gene_MF, "/Users/soniamukherjee/Desktop/goid2gene_MF.txt", sep="\t",row.names=FALSE, quote=FALSE)
write.table(goid2gene_CC, "/Users/soniamukherjee/Desktop/goid2gene_CC.txt", sep="\t",row.names=FALSE, quote=FALSE)

library (clusterProfiler)
library(ggplot2)


term2gene <- read.table("/Users/soniamukherjee/Desktop/goid2gene_BP.txt", sep="\t", header=T, quote="", stringsAsFactors = F)
term2name <- read.table("/Users/soniamukherjee/Desktop/godb_BP.txt", sep="\t", header=T, quote="", stringsAsFactors= F)



go <- enricher(gene = gene_namesA, # a vector of gene id
               universe = noT, # background genes
               TERM2GENE = term2gene, # user input annotation of TERM TO GENE mapping
               TERM2NAME = term2name, # user input of TERM TO NAME mapping
               pvalueCutoff = 0.05, # p-value cutoff (default)
               pAdjustMethod = "BH", # multiple testing correction method to calculate adjusted p-value (default)
               qvalueCutoff = 0.2, # q-value cutoff (default). q value: local FDR corrected p-value.
               minGSSize = 10, # minimal size of genes annotated for testing (default)
               maxGSSize = 500, # maximal size of genes annotated for  testing (default)
)



go_df <- as.data.frame (go)
write.table(go_df, "cache/go_df.txt", sep="\t", row.names=FALSE,quote=FALSE)
p1 <- dotplot(go, showCategory=10,title = "Top 3 most statistically significant enriched GO terms (CC) of genes located within all TAD ",
              font.size = 10)
p1
ggsave (p1,
        filename = "/Users/soniamukherjee/Desktop/TADS_project/2kbCC_dotplot.png",
        height = 12, width = 30, units = "cm" )

The resulting p-values are all 1. What is the issue?

@shihsama
Copy link

It's not your fault. The p-values are showed as "all one" because they are too large and very close to one (the decimal digits can be hundreds). So don't worry a lot. After all, they are not significant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants