Q: [Comparative RNA-Seq analysis] #1021

Open
bioinfolabmu opened this issue Dec 4, 2023 · 2 comments
Labels
question Further information is requested

Comments

@bioinfolabmu

Question
We are comparing different aligners and quantifiers to see their impact on the same raw RNA-Seq data. A single run of the pipeline takes a long time, of course. My guess is that we do not have to re-run every step in the RNA-seq pipeline, only the few steps that we want to compare.

For example, I want to use a different quantifier (Salmon versus htseq).

What have I tried
I have finished running the pipeline with htseq, and I have not started the downstream differential analysis. Now I want to modify the config.yaml file to use salmon as the quantifier. Both quantifiers' results would be saved in my results folder; they do not conflict with each other, right? The next step would then be to bring back the differential analysis part of the pipeline to produce both the salmon-based and the htseq-based DEG results.

My question is whether all these results can be saved in the same "results" folder, with config.yaml telling the pipeline which quantifier to use for the DEG analysis. Or should I run the entire pipeline again separately?

Thank you for your attention and help.

@bioinfolabmu bioinfolabmu added the question Further information is requested label Dec 4, 2023
@Maarten-vd-Sande
Member

  1. Does seq2science need to be fully rerun? No, seq2science can continue from the last possible point to save compute. There is a minor "problem": seq2science deletes some intermediate files (called temp files) so that it does not store too much unnecessary data. In your case, seq2science removes the trimmed fastqs once it is done with them, because otherwise it would keep both the raw and the trimmed fastqs. That's a waste of space! You can turn off the removal of temp files with --snakemakeOptions notemp=True. Make sure to check that this works as expected with --dryrun, because it might just delete some files you wanted to keep after all...
  2. Are the results stored in the same spot? Yes and no... The results of each quantifier are stored in a folder specific to that quantifier, so they won't overlap. The downstream results, for instance the differential analysis, are stored in the same deseq folder, so those will be overwritten.

So it is indeed possible. If you don't have too many samples, I think it is easiest and least error-prone to just run them in separate folders. However, if you have a lot of samples, or are limited by compute/storage/time, you can reuse some of the seq2science output.
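The separate-folders route can be scripted. What follows is a minimal Python sketch, not part of seq2science: the `make_run_dirs` helper, the `run_<quantifier>` folder names, and the `samples.tsv` file name are assumptions made for illustration; only `config.yaml` is named in this thread.

```python
import shutil
from pathlib import Path


def make_run_dirs(project_dir, quantifiers, files=("config.yaml", "samples.tsv")):
    """Create one run folder per quantifier and copy the shared input files into it.

    The folder naming scheme and the default file names are hypothetical.
    Files that do not exist in project_dir are silently skipped.
    """
    project_dir = Path(project_dir)
    created = []
    for quantifier in quantifiers:
        run_dir = project_dir / f"run_{quantifier}"  # hypothetical naming scheme
        run_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            src = project_dir / name
            if src.exists():
                shutil.copy2(src, run_dir / name)
        created.append(run_dir)
    return created


if __name__ == "__main__":
    # Example: prepare one folder for htseq and one for salmon in the current directory.
    for run_dir in make_run_dirs(".", ["htseq", "salmon"]):
        print(run_dir)
```

Each copied config.yaml would then be edited by hand to select its quantifier before starting seq2science inside that folder, so the two runs never write to the same results directory.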

@siebrenf
Member

siebrenf commented Dec 5, 2023

Adding to Maarten's answer: you can change the counts_dir and/or final_bam_dir in the config. That way, the final output is kept separate. Check out all configurable options.
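To illustrate the counts_dir idea, here is a small Python sketch that prints the config entries one might set for each quantifier. It is an assumption-laden example: the `quantifier` key name and the `counts_<quantifier>` folder names are not confirmed anywhere in this thread, and how seq2science resolves the counts_dir path should be checked against the list of configurable options.

```python
def quantifier_overrides(quantifier):
    """Return config entries that give one quantifier its own counts folder."""
    return {
        "quantifier": quantifier,  # assumed key name, not confirmed in this thread
        "counts_dir": f"counts_{quantifier}",  # hypothetical folder name
    }


def to_yaml_lines(overrides):
    """Format a flat dict of strings as simple 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in overrides.items())


if __name__ == "__main__":
    for q in ("htseq", "salmon"):
        print(to_yaml_lines(quantifier_overrides(q)))
        print()
```

Note that this only separates the counts. As Maarten pointed out, the differential analysis output goes to the same deseq folder, so that output would still be overwritten unless it is moved or the runs are kept in separate folders.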


3 participants