Arriba for plate-based scRNA-seq #206

Open
Silvia-Bio opened this issue Jul 25, 2023 · 2 comments

@Silvia-Bio

Hiya,

Can Arriba be used to detect fusions in single-cell RNA-seq data (e.g. Smart-seq2)? Compared to bulk RNA-seq, I expect to see more false positives and fewer reads supporting each fusion. I'm wondering which parameters should be optimized for scRNA-seq data, and how the list of candidate fusions should be filtered further.

Thank you,

S.

@suhrig
Owner

suhrig commented Aug 5, 2023

Hi Silvia,

I haven't run Arriba on SMART-Seq2 data yet. I will give this a try and report back. Arriba automatically adjusts to the sequencing depth and has quite sensitive filters in general, so you might be able to run it with the default parameters.

One important note upfront: If your library has UMIs, you should use the flag -u (see manual) and perform duplicate marking before running Arriba, because Arriba is not aware of UMIs.
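As a rough illustration of the note above, here is a minimal Python sketch that assembles an Arriba command line and adds `-u` only for UMI libraries. The helper name `arriba_command` and all file paths are hypothetical; the sketch only builds the command and assumes the input BAM has already been duplicate-marked by an upstream tool when UMIs are present.

```python
import shlex


def arriba_command(bam, annotation_gtf, assembly_fa, output_tsv, has_umis):
    """Build (but do not run) an Arriba command line.

    Hypothetical helper: the paths are placeholders supplied by the caller.
    When has_umis is True, the BAM is assumed to carry duplicate flags set
    by a UMI-aware tool, and -u tells Arriba to rely on those flags.
    """
    cmd = [
        "arriba",
        "-x", bam,             # aligned reads
        "-g", annotation_gtf,  # gene annotation
        "-a", assembly_fa,     # genome assembly
        "-o", output_tsv,      # fusion predictions
    ]
    if has_umis:
        cmd.append("-u")
    return cmd


# Example with placeholder paths for a UMI library:
print(shlex.join(arriba_command(
    "cell1.bam", "annotation.gtf", "assembly.fa", "cell1.fusions.tsv",
    has_umis=True,
)))
```

For libraries without UMIs, such as standard Smart-seq2, the same helper would be called with `has_umis=False` and the flag is simply left out.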

Regards,
Sebastian

@Silvia-Bio
Author

Hi Sebastian,

Thanks so much for getting back to me!

I went ahead and tried Arriba on my Smart-seq2 data (no UMIs), using the default settings. As suggested in the Arriba manual, I undertook further filtering to refine the results. Specifically, I:

  • Eliminated fusions with a low number of supporting reads, which were already classified as low confidence.
  • Discarded fusions in which both partners were the same gene (gene 1 = gene 2) and fusions involving ribosomal genes.
  • Removed fusions present in a small number of cells.
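The filtering steps above could be sketched roughly as follows. This is an illustration, not part of Arriba and not necessarily the exact procedure used here: it assumes one Arriba `fusions.tsv` per cell with the standard column names (`#gene1`, `gene2`, `confidence`, `split_reads1`, `split_reads2`, `discordant_mates`). The thresholds (`min_reads`, `min_cells`), the ribosomal gene-name pattern, and the function names are hypothetical choices.

```python
import csv
import re
from collections import defaultdict

# Assumption: ribosomal protein genes are recognised by name prefix
# (RPL*, RPS*, RPLP*, MRPL*, MRPS*). Adjust for your annotation.
RIBOSOMAL = re.compile(r"^M?RP[LS]P?\d")


def supporting_reads(row):
    """Total supporting reads: split reads at both breakpoints plus discordant mates."""
    return (int(row["split_reads1"])
            + int(row["split_reads2"])
            + int(row["discordant_mates"]))


def keep_fusion(row, min_reads=3):
    """Apply the per-fusion filters to one row of a fusions.tsv file."""
    gene1, gene2 = row["#gene1"], row["gene2"]
    if row["confidence"] == "low" or supporting_reads(row) < min_reads:
        return False  # weakly supported
    if gene1 == gene2:
        return False  # both partners are the same gene
    if RIBOSOMAL.match(gene1) or RIBOSOMAL.match(gene2):
        return False  # involves a ribosomal gene
    return True


def recurrent_fusions(tables, min_cells=2, min_reads=3):
    """Return gene pairs that pass the filters in at least min_cells cells.

    tables maps a cell ID to an open file-like object holding that
    cell's fusions.tsv content.
    """
    cells_per_pair = defaultdict(set)
    for cell_id, handle in tables.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            if keep_fusion(row, min_reads):
                cells_per_pair[(row["#gene1"], row["gene2"])].add(cell_id)
    return {pair: sorted(ids)
            for pair, ids in cells_per_pair.items()
            if len(ids) >= min_cells}
```

Note that the sketch treats gene names as plain strings, so rows where Arriba reports several candidate genes for an intergenic breakpoint would need extra handling.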

I also analysed matched bulk RNA-seq data using Arriba. I was quite encouraged to see significant overlap in the fusion findings between the single-cell and bulk RNA-seq datasets. However, two observations caught my attention:

a) There were quite a lot of novel transcripts forming fusion genes in the single-cell data.
b) This one gene, TMSB4X (located on chrX), kept showing up fused with other genes.

Interestingly, these particular fusions involving novel transcripts or TMSB4X were absent from the bulk RNA-seq data. What's more, the TMSB4X fusions are shared across the three different experimental conditions I have. However, since they are absent in the bulk data, and these fusions haven't been reported in the literature or in fusion databases, I'm kind of scratching my head trying to figure out if these fusions are real or artifacts caused by the Smart-seq2 protocol. Do you have any idea why this might have happened? Any insights would be awesome!
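One way to make this pattern explicit is to list the fusions seen only in the single-cell data and count how many distinct partners each gene has among them. The following is a minimal sketch with hypothetical function names and a hypothetical `min_partners` threshold; it assumes fusions are given as (gene 1, gene 2) name pairs and ignores orientation. It only flags candidates for manual review and cannot tell real fusions from artifacts.

```python
from collections import defaultdict


def unordered(pair):
    """Normalise a gene pair so that (A, B) and (B, A) compare equal."""
    return tuple(sorted(pair))


def single_cell_only(sc_pairs, bulk_pairs):
    """Gene pairs found in the single-cell calls but not in the bulk calls."""
    bulk = {unordered(p) for p in bulk_pairs}
    return {unordered(p) for p in sc_pairs} - bulk


def promiscuous_genes(pairs, min_partners=3):
    """Genes fused to at least min_partners distinct partner genes."""
    partners = defaultdict(set)
    for gene1, gene2 in pairs:
        partners[gene1].add(gene2)
        partners[gene2].add(gene1)
    return {gene: sorted(found)
            for gene, found in partners.items()
            if len(found) >= min_partners}
```

A gene that ends up in the output of `promiscuous_genes(single_cell_only(sc, bulk))` has many different partners in the single-cell calls and none of those pairs in bulk, which is the situation described for TMSB4X.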

If you have a chance to run Arriba on another Smart-seq2 dataset, I'd love to hear what you find.

Thanks again,

S.
