TPM values depend on Combination #282

keksundso · 2023-09-19T13:15:48Z

Using sleuth I can calculate the TPM-Value of a gene in a Sample (sleuth_prep followed by sleuth_to_matrix).

Now if I have 6 samples (6 abundance tables from Kallisto) with two belonging to on of the three conditions P, G and H, I would expect to get one TPM-Value per gene per sample.

However, the TPM-Value for Gene i in sample j is not a fixed value, but i varies depended on the combinations of conditions which go into sleuth_prep.

E.g. gene i in P1 has a different TPM-value when condition P and G goes into sleuth_prep compared to gene i in P1 with the condition P and H. See minimal example below:

getSleuthObj <- function(s2c){
    transcript2gene <- read_delim(transcript2genePath, delim = "\t", col_names=c("target_id","ens_gene","ext_gene"),show_col_types = FALSE )
    sleuth.obj <- sleuth_prep(sample_to_covariates = s2c, 
                              target_mapping = transcript2gene, 
                              extra_bootstrap_summary = TRUE,
                              read_bootstrap_tpm = TRUE, 
                              aggregation_column = 'ens_gene',
                              num_cores = numberOfCores,
                              gene_mode = TRUE
    )
    
    tpms <- sleuth_to_matrix(sleuth.obj, "obs_norm", "tpm")
    
    tpms <- as.data.frame(tpms)
    tpms$ens_gene <- rownames(tpms)
    tpms$ext_gene <- sleuth.obj$target_mapping$ext_gene[match(tpms$ens_gene, sleuth.obj$target_mapping$ens_gene)]
    rownames(tpms) <- NULL
    
    return(tpms)
    }

sleuth_PG <- getSleuthObj(rbind( 
    data.frame(sample = c("P1","P2"),
               condition = "P",
               path = c("/home/keks/app/data/P1","/home/keks/app/data/P2"),
               stringsAsFactors = FALSE)
    ,
    data.frame(sample = c("G5","G6"),
               condition = "G",
               path = c("/home/keks/app/data/G5","/home/keks/app/data/G6"),
               stringsAsFactors = FALSE)
))
sleuth_PH <- getSleuthObj(rbind( 
    data.frame(sample = c("P1","P2"),
               condition = "PC",
               path = c("/home/keks/app/data/P1","/home/keks/app/data/P2"),
               stringsAsFactors = FALSE)
    ,
    data.frame(sample = c("H3","H4"),
               condition = "H",
               path = c("/home/keks/app/data/H3","/home/keks/app/data/H4"),
               stringsAsFactors = FALSE)
))

The text was updated successfully, but these errors were encountered:

mschilli87 · 2023-09-20T08:15:41Z

I didn't take a deep look but my first guess would be that depending on your input, a different set of transcripts passes sleuth's internal filters, resulting in a different total number of reads and features being used as the basis for the TPM normalization. If you have less features, all remaining features will end up with higher numbers because TPM always sum up to 1 M. So maybe you can get the behaviour that you want by manually overwriting the filter settings.

keksundso · 2023-09-20T11:59:41Z

This sounds reasonable, especially since the variation is a constant shift proportional to the tpm values as it would be expected by a different total read number.
I replaced the filter function by a custom one which should not filter anything:

myFilter <- function (row, min_reads = 0, min_prop = 0) 
{
    mean(row >= min_reads) >= min_prop
}

sleuth.obj <- sleuth_prep(sample_to_covariates = s2c, 
                           target_mapping = transcript2gene, 
                           extra_bootstrap_summary = TRUE,
                           read_bootstrap_tpm = TRUE, 
                           aggregation_column = 'ens_gene',
                           num_cores = numberOfCores,
                           gene_mode = TRUE,
                           filter_fun = myFilter

As expected, in both combinations the same number of targets and genes now pass the filter.
However, the difference in the tpm values between both combinations persist.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TPM values depend on Combination #282

TPM values depend on Combination #282

keksundso commented Sep 19, 2023

mschilli87 commented Sep 20, 2023

keksundso commented Sep 20, 2023

TPM values depend on Combination #282

TPM values depend on Combination #282

Comments

keksundso commented Sep 19, 2023

mschilli87 commented Sep 20, 2023

keksundso commented Sep 20, 2023