
Speed up sleuth_fit, custom shrinkage function, reconfigure transform API, misc bug fixes #213

Open

wants to merge 25 commits into devel
Conversation

@warrenmcg (Collaborator) commented Mar 14, 2019

Hi @pimentel,

There are a few different things that have been changed here:

  • a significant speed boost in sleuth_fit by doing the measurement error and beta covariance calculations with the full matrix rather than row by row
  • a significant memory improvement by removing any environment associated with formulas supplied by users
  • added functionality for users to supply their own shrinkage function to sleuth_fit via the shrink_fun argument (see the sketch after this list). The default sleuth shrinkage procedure now lives in a new function called basic_shrink_fun, and I also wired in limma's shrinkage via squeezeVar. This should address the concerns raised in “NA in likelihood ratio test with small number of transcripts” (#173).
  • changed the API for log_transform and added an equivalent for TPMs (tpm_transform): each can now take either a single size factor when normalizing bootstraps, or a vector of size factors equal in length to the number of samples when transforming the full matrix. This allows sleuth-ALR to separate the normalization and transformation steps.
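As a rough usage sketch (shrink_fun and basic_shrink_fun are the names introduced in this PR; the exact call shape is an assumption and may change before merge):

```r
# Hedged sketch: fit with the factored-out default shrinkage procedure, or
# pass any function satisfying the contract described in the commits below.
so <- sleuth_fit(so, ~condition, 'full', shrink_fun = basic_shrink_fun)
```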

I have also fixed various small bugs. I'll update this description when I've merged in more:

To-dos:

…t check for obj so that this function can be used with p-value aggregation mode
+ this is done by using the whole matrix rather than modeling by row (see the sketch after this commit note)
+ an additional small change is to make target_id the first column,
which means that the final summary table in the sleuth_fit object
will have target_id first, instead of in the middle. This improves
readability if users decide to view the summary table directly for
columns not selected in sleuth_results
+ add documentation for me_model
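A toy illustration of the whole-matrix idea (Y, sample_info, and the design here are stand-ins, not the PR's actual code):

```r
# Instead of lapply-ing over transcripts, solve one least-squares problem
# whose response has a column per transcript; lm.fit returns coefficients
# for every column in a single call.
X   <- model.matrix(~condition, data = sample_info)  # samples x coefficients
fit <- lm.fit(X, t(Y))                               # Y is transcripts x samples
betas <- t(fit$coefficients)                         # transcripts x coefficients
```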
…uth_fit API:

+ the models slot for the sleuth_model object is now the one model with the full matrix
+ these functions have retained the old code to remain backwards compatible with older
versions of sleuth
…bjects:

+ speed up the calculation using matrix multiplication rather than lapply
+ calculate beta covariances for later use
+ bonus that this reduces memory footprint of the sleuth_model object further
+ update 'extract_model' and 'sleuth_wt' to use the new matrix format
+ this will allow the option to supply custom shrinkage procedures
+ the default is now 'basic_shrink_fun', which contains the
old sleuth shrinkage procedure done by 'sliding_window_grouping'
and then 'shrink_df'
+ any custom function must take the list produced by me_model, and
then output the mes_df within that list, modified with at least one
new column: "smooth_sigma_sq", the smoothed variances (see the sketch below)
+ add updated documentation to 'sleuth_fit'
+ add documentation for 'basic_shrink_fun'
+ shrink_fun is now an additional slot on the sleuth_model object,
to provide provenance for how the variance estimation was done
+ remove specified extra options needed in sleuth_fit call
+ use 'do.call' instead of directly calling shrink_fun
+ this solution was discussed [here](https://stackoverflow.com/a/7028630)
+ in testing, calling shrink_fun directly was capturing the full sleuth object,
essentially doubling the size of the final sleuth object
+ this now only happens if there is more than one row, indicating a beta beyond the intercept
+ this prevents an error when the intercept-only model is used
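A hedged sketch of that contract (only the input/output shape follows the description above; the loess smoother and the column names used for smoothing are illustrative assumptions):

```r
# A custom shrink_fun receives the list produced by me_model and must return
# that list's mes_df with at least one new column, "smooth_sigma_sq".
my_shrink_fun <- function(me_list) {
  mes_df <- me_list$mes_df
  lo <- loess(sigma_sq ~ mean_obs, data = mes_df)  # illustrative smoother
  mes_df$smooth_sigma_sq <- pmax(predict(lo), 0)   # keep variances non-negative
  mes_df
}
so <- sleuth_fit(so, ~condition, 'full', shrink_fun = my_shrink_fun)
```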
… function

+ process_bootstrap now normalizes TPMs by TPM size factors instead of indirectly by count size factors
+ add 'sf' argument to the default log_transform function, taking a single size factor
when normalizing bootstraps, or a vector of size factors equal in length to the number of samples
when normalizing the observed counts (see the sketch after this commit note)
+ add 'tpm_transform' as an equivalent default function for TPMs.
+ this will facilitate an alternative approach for sleuth-ALR that decouples normalization and transformation steps
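A minimal sketch of the reworked signature, assuming sleuth's historical log(x + 0.5) default; the body here is an assumption, not the PR's code:

```r
# `sf` is either a single size factor (one bootstrap) or a vector with one
# entry per sample (the full matrix); column-wise division covers both cases.
log_transform <- function(mat, sf, offset = 0.5) {
  log(t(t(mat) / sf) + offset)
}
```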
…n the target IDs in the fit

+ this now guarantees that the fit object, the summary table, and the beta_covars table
are all in the same order
+ switch to rlang::eval_tidy (toy illustration below)
+ add rlang to NAMESPACE and import list
+ fixes issue pachterlab#194
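For context, a toy illustration of the lazyeval-to-rlang switch (the data frame and filter are made up):

```r
library(rlang)
# eval_tidy evaluates a quoted expression with a data frame as a data mask,
# replacing the older lazyeval::lazy / lazyeval::interp pattern.
results <- data.frame(target_id = c("tx1", "tx2"), qval = c(0.01, 0.2))
keep <- eval_tidy(quo(qval <= 0.05), data = results)
results[keep, ]
```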
```diff
@@ -84,5 +87,7 @@ importFrom(data.table,fread)
 importFrom(dplyr,"%>%")
 importFrom(lazyeval,interp)
 importFrom(lazyeval,lazy)
+importFrom(limma,squeezeVar)
```
Collaborator (review comment on the importFrom(limma,squeezeVar) line):
is this used anywhere? I don't see it

Collaborator (follow-up):
just kidding, see it now. hm. I'd like to not depend on yet another package, ideally
