Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tests to use new version of Sailfish #1

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

rob-p
Copy link

@rob-p rob-p commented Sep 4, 2015

Hi guys,

I've made the changes necessary to use the new version of Sailfish (v 0.7.0 and greater) with these benchmarks. There should be substantial improvements in both speed and accuracy, but the interface with the program has changed a bit. Let me know if you have any questions or want any more information.

Best,
Rob

Update Sailfish version
Any line starting with '#' is a header.  The estimated number of reads is now in column 4 instead of 7.
Since sailfish includes the entire FASTA header as the name of the target, we have to process these output files for the rest of the pipeline to work.
Add code to fix the target names
-k argument no longer required for sailfish
@mqbssppe
Copy link
Member

mqbssppe commented Sep 8, 2015

Dear Rob,

we'll try the most recent version of sailfish in our benchmarks. Thank you for letting us know.

Best

Panos

@rob-p
Copy link
Author

rob-p commented Sep 8, 2015

Hi Panos,

Thanks! Please keep me posted on how it performs in your tests; I'll be very interested to know. As you can see, some things about the interface have changed and my pull request should deal with all of them. However, if you run into any issues (as I've not run all of the tests with these changes), please let me know.

Best,
Rob

@mqbssppe
Copy link
Member

mqbssppe commented Oct 1, 2015

Hi Rob,

finally I've found some time to run the new version of sailfish. It seems that your modifications are quite cool, since they are improving sailfish performance. The main difference I observed is that now sailfish exhibits a significantly larger mapping rate (approximately 95%) compared to version 0.6.3 which mapped almost 63% (as we report in the manuscript).

The new results are shown in the attached *pdf files

simulation criteria (corresponds to Fig.2 of the manuscript)
simCriteriaUpdatedwithSailfish0.7.3.pdf

run-times (corresponds to Fig.3 of the manuscript)
sim-spanki-times-updatedWithSailfish0.7.3.pdf

Thank you for letting us know for this updated version.

Best

Panos

@rob-p
Copy link
Author

rob-p commented Oct 2, 2015

Hi Panos,

Thanks so much for testing this and reporting it here! The big change between Sailfish v0.6.3 and versions >= v0.7 is the move from simple k-mer counting to making use of quasi-alignments produced by RapMap. This improves the speed, as quasi-alignments, which make use of a suffix array, get rid of the need to hash all k-mers independently. It also improves the memory usage as well since the suffix array acts as a fairly compact representation of the transcriptome. This has re-invigorated development of Sailfish — we're up to version 0.7.6 now — and led us to back-port some useful features from Salmon (e.g. optional posterior Gibbs sampling --useGSOpt and Variational Bayesian EM optimization --useVBOpt) and even implement some new ones (e.g. bootstrap sampling).

Interestingly, we find that the Variational Bayesian EM seems to perform better in our testing than the "standard" EM algorithm originally adopted by Sailfish and used by Kallisto. Thus, we're very excited by your even better natural gradient-based approach to optimizing the variational objective (since our optimization step is already a minor slice of the overall runtime, we're primarily interested in the improved accuracy)!

Thanks!
Rob

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants