Combining STAR and Salmon #1019

Open
siebrenf opened this issue Nov 29, 2023 · 5 comments

Comments

@siebrenf
Member

When I ran my own data for alignment with STAR, I encountered a bug. I am debugging now to see what happens. I noticed that Salmon, as the quantifier tool, is not affected at all. It generates its own data. This means that Salmon is using its own alignment tool to finish the quantification itself. My next question is: after I fix the bug in running STAR, how can I connect the STAR alignment results to feed Salmon for quantification?

Originally posted by @bioinfolabmu in #1018 (comment)

@siebrenf
Member Author

siebrenf commented Nov 29, 2023

Salmon can accept FASTQ files in mapping-based mode, or BAM files in alignment-based mode. Seq2science has only implemented the mapping-based mode.

If you are using Salmon for gene quantification but also want to produce a trackhub, seq2science uses STAR to generate the required BAM files. In this situation, STAR and Salmon do not interact with each other.
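For reference, the two Salmon modes differ mainly in their inputs. The minimal Python sketch below only builds the argument lists (they could be passed to `subprocess.run`); all file names are hypothetical placeholders, and this is not how seq2science itself invokes Salmon.

```python
from pathlib import Path


def salmon_mapping_cmd(index: Path, r1: Path, r2: Path, outdir: Path, threads: int = 4) -> list[str]:
    """Mapping-based mode: Salmon maps the FASTQ reads itself against a Salmon index."""
    return [
        "salmon", "quant",
        "-i", str(index),   # index built beforehand with `salmon index`
        "-l", "A",          # let Salmon infer the library type
        "-1", str(r1),      # paired-end FASTQ files
        "-2", str(r2),
        "-p", str(threads),
        "-o", str(outdir),
    ]


def salmon_alignment_cmd(transcripts: Path, bam: Path, outdir: Path, threads: int = 4) -> list[str]:
    """Alignment-based mode: Salmon quantifies alignments produced by another tool."""
    return [
        "salmon", "quant",
        "-t", str(transcripts),  # transcript FASTA that the reads were aligned to
        "-l", "A",
        "-a", str(bam),          # BAM with alignments to the transcriptome, not the genome
        "-p", str(threads),
        "-o", str(outdir),
    ]
```

For example, `" ".join(salmon_alignment_cmd(Path("tx.fa"), Path("aln.bam"), Path("out")))` yields a command line that can be pasted into a shell.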

@siebrenf
Member Author

siebrenf commented Nov 29, 2023

If you wish to create BAM files with STAR and then feed them into Salmon, there are some considerations:

  1. Why do you wish to do this in the first place? (Genuine question here 😄 )
  2. Do you want to filter the BAM files using STAR before using Salmon?
  3. Do you want to filter the BAM files using seq2science before using Salmon?

If your answer to 3 is yes, you can run seq2science with STAR as the aligner. Afterward, you can run Salmon in alignment-based mode manually. If you place the Salmon output in the folder structure that seq2science expects ({result_dir}/salmon/{assembly}-{sample}/quant.sf), you can start a new seq2science run, which should use those files as input.
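Salmon writes `quant.sf` into the directory given with `-o`, so pointing `-o` at `{result_dir}/salmon/{assembly}-{sample}` should put the file where seq2science looks for it. The sketch below (hypothetical helper names; assembly and sample names are placeholders) lists which expected `quant.sf` files are still missing before you start the new run. One assumption to verify first: alignment-based Salmon needs reads aligned to the transcriptome, which STAR writes as `Aligned.toTranscriptome.out.bam` when run with `--quantMode TranscriptomeSAM`; whether seq2science's STAR step produces that file is not stated here.

```python
from pathlib import Path


def expected_quant_path(result_dir: Path, assembly: str, sample: str) -> Path:
    # Layout described above: {result_dir}/salmon/{assembly}-{sample}/quant.sf
    return Path(result_dir) / "salmon" / f"{assembly}-{sample}" / "quant.sf"


def missing_quant_files(result_dir: Path, assembly: str, samples: list[str]) -> list[Path]:
    """Return the quant.sf files that are not (yet) at the expected location."""
    paths = [expected_quant_path(result_dir, assembly, s) for s in samples]
    return [p for p in paths if not p.is_file()]
```

For example, `missing_quant_files(Path("results"), "GRCh38", ["sample1", "sample2"])` returns an empty list once every sample has been quantified manually.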

If you want more fine-tuning (e.g. option 2), it might be easier to write your own script from scratch.

@bioinfolabmu

Initially, I wanted to compare different aligners and their impact on RNA-seq analysis results. But I found this too time-consuming, because we have 24 RNA-seq samples.

(1) I have finished one seq2science run using "star" as the aligner and "htseq" as the quantifier. Accordingly, the gene-level and transcript-level quantification results that I obtained are based solely on the STAR alignment results. Right?

(2) Now, I am running a second seq2science run using "star" as the aligner and "salmon" as the quantifier. Based on your explanation above, the aligner I am actually using is "salmon-quant", which seems to be integrated with its own quantification process. In this run of seq2science, "star" is not used in the gene-level and transcript-level analysis at all. Am I right? I just want to double-check.

(3) In your download_fastq pipeline, you provide several aligners, including bwa, bowtie2, gmap, hisat2, star and so on. Have you explored which aligner provides the best or most accurate results in RNA-seq analysis? I know this is a very general, yet challenging question. I ask just in case you did some exploration here...

@bioinfolabmu

OK, I checked the log file. For my question (2), the STAR alignment is not used at all by "salmon-quant", which generates the transcript-level quantification output. That output was then converted into gene-level quantification results using "pytxi_count_matrix".
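The transcript-to-gene step can be pictured with the minimal sketch below. It sums Salmon's `NumReads` column from a `quant.sf` per gene, given a transcript-to-gene dictionary that you would build from your annotation (the function name and mapping are hypothetical). This is only the core idea of gene-level summarization, not pytxi's actual code, which also handles abundances and effective lengths.

```python
import csv
from collections import defaultdict


def gene_counts_from_quant(quant_sf: str, tx2gene: dict[str, str]) -> dict[str, float]:
    """Sum Salmon's estimated transcript counts (NumReads) per gene."""
    counts: dict[str, float] = defaultdict(float)
    with open(quant_sf, newline="") as fh:
        # quant.sf is tab-separated: Name, Length, EffectiveLength, TPM, NumReads
        for row in csv.DictReader(fh, delimiter="\t"):
            gene = tx2gene.get(row["Name"])
            if gene is None:
                continue  # transcript absent from the mapping; skipped in this sketch
            counts[gene] += float(row["NumReads"])
    return dict(counts)
```

Note that the summed values are estimated counts and can be fractional, since Salmon distributes multi-mapping reads across transcripts.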

@siebrenf
Member Author

(1) correct
(2) correct
(3) In my experience the differences are small (check out this benchmark paper) unless you are aligning to a new/complex genome (source)
Personally, I choose STAR if I will be working with gene counts and/or BAM files, and Salmon if I will be working with TPM values.
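As background for the counts-versus-TPM choice: TPM divides each transcript's count by its effective length and then rescales so that all values in a sample sum to one million. The sketch below recomputes TPM from the `NumReads` and `EffectiveLength` columns of a `quant.sf`. Salmon already reports a `TPM` column, so this is illustrative only and the function name is hypothetical.

```python
def tpm(num_reads: list[float], eff_lengths: list[float]) -> list[float]:
    """Transcripts per million from counts and effective lengths."""
    # Reads per base of effective length; zero-length entries contribute nothing.
    rates = [n / l if l > 0 else 0.0 for n, l in zip(num_reads, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Rescale so the sample sums to 1e6.
    return [r / total * 1e6 for r in rates]
```

For example, `tpm([10, 20], [100, 100])` gives roughly 333,333 and 666,667. Because of the length normalization, TPM values are comparable between transcripts within a sample, whereas raw gene counts (as from STAR plus htseq) are what count-based differential expression tools typically expect.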


2 participants