
Speed up sleuth_fit, custom shrinkage function, reconfigure transform API, misc bug fixes #213

Open

wants to merge 25 commits into devel
Conversation

@warrenmcg (Collaborator) commented Mar 14, 2019

Hi @pimentel,

There are a few different things that have been changed here:

  • a significant speed boost in sleuth_fit by doing the measurement error and beta covariance calculations with the full matrix rather than row by row
  • a significant memory improvement by removing any environment associated with formulas supplied by users
  • added functionality for users to supply their own shrinkage function to sleuth_fit via the shrink_fun argument (see the sketch after this list). The default sleuth shrinkage procedure now lives in a new function called basic_shrink_fun, and I also wired in limma's shrinkage via squeezeVar. This should address the concerns raised in “NA in likelihood ratio test with small number of transcripts” (#173).
  • changed the API for log_transform and added an equivalent for TPMs (tpm_transform): each can now take either a single size factor when normalizing bootstraps, or a vector of size factors equal in length to the number of samples when transforming the full matrix. This allows sleuth-ALR to separate the normalization and transformation steps.
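As a rough usage sketch (shrink_fun and basic_shrink_fun are the names introduced in this PR; the exact call shape is an assumption and may change before merge):

```r
# Hedged sketch: fit with the factored-out default shrinkage procedure, or
# pass any function satisfying the contract described in the commits below.
so <- sleuth_fit(so, ~condition, 'full', shrink_fun = basic_shrink_fun)
```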

I have also fixed various small bugs. I'll update this description when I've merged in more:

To-dos:

…t check for obj so that this function can be used with p-value aggregation mode
+ this is done by using the whole matrix rather than modeling by row (see the sketch after this commit note)
+ an additional small change is to make target_id the first column,
which means that the final summary table in the sleuth_fit object
will have target_id first, instead of in the middle. This improves
readability if users decide to view the summary table directly for
columns not selected in sleuth_results
+ add documentation for me_model
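A toy illustration of the whole-matrix idea (Y, sample_info, and the design here are stand-ins, not the PR's actual code):

```r
# Instead of lapply-ing over transcripts, solve one least-squares problem
# whose response has a column per transcript; lm.fit returns coefficients
# for every column in a single call.
X   <- model.matrix(~condition, data = sample_info)  # samples x coefficients
fit <- lm.fit(X, t(Y))                               # Y is transcripts x samples
betas <- t(fit$coefficients)                         # transcripts x coefficients
```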
…uth_fit API:

+ the models slot for the sleuth_model object is now the one model with the full matrix
+ these functions have retained the old code to remain backwards compatible with older
versions of sleuth
…bjects:

+ speed up the calculation using matrix multiplication rather than lapply
+ calculate beta covariances for later use
+ bonus that this reduces memory footprint of the sleuth_model object further
+ update 'extract_model' and 'sleuth_wt' to use the new matrix format
+ this will allow the option to supply custom shrinkage procedures
+ the default is now 'basic_shrink_fun', which contains the
old sleuth shrinkage procedure done by 'sliding_window_grouping'
and then 'shrink_df'
+ any custom function must take the list produced by me_model, and
then output the mes_df within that list, modified with at least one
new column: "smooth_sigma_sq", the smoothed variances (see the sketch below)
+ add updated documentation to 'sleuth_fit'
+ add documentation for 'basic_shrink_fun'
+ shrink_fun is now an additional slot on the sleuth_model object,
to provide provenance for how the variance estimation was done
+ remove specified extra options needed in sleuth_fit call
+ use 'do.call' instead of directly calling shrink_fun
+ this solution was discussed [here](https://stackoverflow.com/a/7028630)
+ in testing, calling shrink_fun directly was capturing the full sleuth object,
essentially doubling the size of the final sleuth object
+ this now only happens if there is more than one row, indicating a beta beyond the intercept
+ this prevents an error when the intercept-only model is used
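A hedged sketch of that contract (only the input/output shape follows the description above; the loess smoother and the column names used for smoothing are illustrative assumptions):

```r
# A custom shrink_fun receives the list produced by me_model and must return
# that list's mes_df with at least one new column, "smooth_sigma_sq".
my_shrink_fun <- function(me_list) {
  mes_df <- me_list$mes_df
  lo <- loess(sigma_sq ~ mean_obs, data = mes_df)  # illustrative smoother
  mes_df$smooth_sigma_sq <- pmax(predict(lo), 0)   # keep variances non-negative
  mes_df
}
so <- sleuth_fit(so, ~condition, 'full', shrink_fun = my_shrink_fun)
```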
… function

+ process_bootstrap now normalizes TPMs by TPM size factors instead of indirectly by count size factors
+ add 'sf' argument to the default log_transform function, taking a single size factor
when normalizing bootstraps, or a vector of size factors equal in length to the number of samples
when normalizing the observed counts (see the sketch after this commit note)
+ add 'tpm_transform' as an equivalent default function for TPMs.
+ this will facilitate an alternative approach for sleuth-ALR that decouples normalization and transformation steps
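A minimal sketch of the reworked signature, assuming sleuth's historical log(x + 0.5) default; the body here is an assumption, not the PR's code:

```r
# `sf` is either a single size factor (one bootstrap) or a vector with one
# entry per sample (the full matrix); column-wise division covers both cases.
log_transform <- function(mat, sf, offset = 0.5) {
  log(t(t(mat) / sf) + offset)
}
```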
…n the target IDs in the fit

+ this now guarantees that the fit object, the summary table, and the beta_covars table
are all in the same order
+ switch to rlang::eval_tidy (toy illustration below)
+ add rlang to NAMESPACE and import list
+ fixes issue pachterlab#194
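For context, a toy illustration of the lazyeval-to-rlang switch (the data frame and filter are made up):

```r
library(rlang)
# eval_tidy evaluates a quoted expression with a data frame as a data mask,
# replacing the older lazyeval::lazy / lazyeval::interp pattern.
results <- data.frame(target_id = c("tx1", "tx2"), qval = c(0.01, 0.2))
keep <- eval_tidy(quo(qval <= 0.05), data = results)
results[keep, ]
```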
```diff
@@ -84,5 +87,7 @@ importFrom(data.table,fread)
 importFrom(dplyr,"%>%")
 importFrom(lazyeval,interp)
 importFrom(lazyeval,lazy)
+importFrom(limma,squeezeVar)
```
Collaborator (review comment on the importFrom(limma,squeezeVar) line):
is this used anywhere? I don't see it

Collaborator (follow-up):
just kidding, see it now. hm. I'd like to not depend on yet another package, ideally
