Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significance status of signature: NA #9

Open
maxnest opened this issue Sep 23, 2021 · 3 comments
Open

Significance status of signature: NA #9

maxnest opened this issue Sep 23, 2021 · 3 comments

Comments

@maxnest
Copy link

maxnest commented Sep 23, 2021

Dear Dr. Hajk-Georg Drost,
Let me thank you for the awesome and very important myTAI library. In my project, I used myTAI to determine the transcriptome age indices of the life cycle stages of different flatworm species. When considering the complete transcriptomes, no problems arose, and very interesting results were obtained. However we are faced with the NaNs produced warning during PlotSignature when analyzing the “reduced” dataset, including only genes of last common ancestor of the studied species:

Plot signature: ' TAI ' and test statistic: ' FlatLineTest ' running 1000 permutations.
$start.arg
$start.arg$shape
[1] Inf
$start.arg$rate
[1] Inf
$fix.arg
NULL
Significance status of signature: NA
Warning:
1: In dgamma(c(0.000122768450181325, 0.000122768450181325, 0.000122768450181325, :
NaN produced
2: In stats::pgamma(real.var, shape = shape, rate = rate, lower.tail = FALSE) :
NaN produced

Used code:

library(edgeR)
library(myTAI)
library(dplyr)
library(phylotools)

fgig_phylostratr <- read.csv2("../../Phylostratr/Fgigantica/Fgigantica_phylostratr_results.tsv", sep="\t", header = T)
fgig_digenea_ancestor <- get.fasta.name(infile="../../Ancestral_pyHAM/Fgigantica_100aa.Digenea_ancestor.fasta")
fgig_expr <- read.csv2("../../Expr_quant/Fgig_decontaminated_salmon_quant/Fgigantica_100aa_unaveraged_TPMs.tsv", sep="\t", header = T)

fgig_exp_filtered <- subset(fgig_expr, GeneIDs %in% fgig_digenea_ancestor)
fgig_exp_filtered_mut <- mutate_all(fgig_exp_filtered[, 1:length(colnames(fgig_exp_filtered))], function(x) as.numeric(as.character(x)))
fgig_exp_filtered_mut$GeneIDs <- fgig_exp_filtered$GeneIDs
fgig_exp_filtered_mut_mean <- data.frame("GeneIDs"=fgig_exp_filtered_mut$GeneIDs, "Egg"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_egg_rep1", "Fgig_egg_rep2", "Fgig_egg_rep3")]),
"Miracidium"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_miracidium_rep1", "Fgig_miracidium_rep2", "Fgig_miracidium_rep3")]),
"Redia"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_redia_rep1", "Fgig_redia_rep2", "Fgig_redia_rep3")]),
"Cercaria"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_cerc_rep1", "Fgig_cerc_rep2", "Fgig_cerc_rep3")]),
"Metacercaria"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_metacerc_rep1", "Fgig_metacerc_rep2", "Fgig_metacerc_rep3")]),
"Juvenile_42_days"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_juv_42d_rep1", "Fgig_juv_42d_rep2", "Fgig_juv_42d_rep3")]),
"Juvenile_70_days"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_juv_70d_rep1", "Fgig_juv_70d_rep2", "Fgig_juv_70d_rep3")]),
"Adult"=rowMeans(fgig_exp_filtered_mut[, c("Fgig_adult_rep1", "Fgig_adult_rep2", "Fgig_adult_rep3")]))
fgig_phylostratr_filtered <- subset(fgig_phylostratr, qseqid %in% fgig_digenea_ancestor)

colnames(fgig_phylostratr_filtered) <- c("GeneIDs", "MRCA", "Phylostratum", "MRCA_name")
fgig_phylomap <- select(fgig_phylostratr_filtered, "Phylostratum", "GeneIDs")
fgig_phyloexp <- MatchMap(fgig_phylomap, fgig_exp_filtered_mut_mean)

fgig_phyloexp_tf <- tf(fgig_phyloexp, function(x) log2(x+1))
pdf("Fgigantica_logTPMs_TAI.digenea_ancestor.pdf", width = 14)
PlotSignature(ExpressionSet = fgig_phyloexp_tf, measure = "TAI",
TestStatistic = "FlatLineTest", xlab="F.gigantica complex life cycle", ylab="TAI: Digenean ancestor genome model", permutations = 1000)
dev.off()

As a result, on the created plot we see “p_flt = NaN”. I guess is that there is not enough observation count in fgig_phyloexp_tf (3772 observations) to run the tests. Is this possible or is there another explanation or assumption? I would be grateful for any help!
Thanks a lot!
Yours sincerely,
Maksim

@HajkD
Copy link
Member

HajkD commented Sep 26, 2021

Hi Maksim,

I am very glad to hear that you find myTAI useful for your research!

Regarding your question, usually TAI profiles should be computed on the entire transcriptome (including all genes) and not for a subset, since this may result in misleading outcomes or interpretations. On the technical side, however, computing TAI values and performing the test stats on ~4000 genes should not be the limiting step. To me the error message looks more like there is a NA value present in the input data passed internally to the FlatLineTest() function.

Could you try running:

fgig_phyloexp_tf <- na.omit(tf(fgig_phyloexp, function(x) log2(x+1)))

Within your analysis steps to see if this makes a difference?

Also, when you run the other tests: rht etc. Does the same NAN error occur?

I hope this helps!

Cheers,
Hajk

@maxnest
Copy link
Author

maxnest commented Sep 27, 2021

Dear Hajk,
Following your comment, we decided to focus on the analysis of entire transcriptomes. After a series of tests, we assumed that the error was not related to data, but to the environment in RStudio (Windows). During the reanalysis of the data, but already in Linux, we did not find any errors and got excellent results on all our data. Thank you for your help and for developing the extremely useful and very valuable myTAI.
Best wishes,
Maksim

@HajkD
Copy link
Member

HajkD commented Sep 28, 2021

Dear Maksim,

Thank you for your kind feedback and I am glad the issue is resolved. It seems like the fitdistrplus package that myTAI uses to perform parameter inference for fitting distributions in FlatLineTest() does not work on Windows machines. I will try to see if there is a workaround or please feel free to contact the authors of the fitdistrplus R package to see if they plan to support computations on Windows machines or not.

I hope this helps.

Cheers,
Hajk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants