Standard workflow for the "Introduction to RNA-seq" episode #16

Open
almeidasilvaf opened this issue Jan 13, 2023 · 4 comments

@almeidasilvaf
Contributor

Hello, everyone.

The "Introduction to RNA-seq" episode is currently empty, so I would like to contribute to it.

As far as I understand, this episode should contain instructions on how to go from raw FASTQ files to a matrix of transcript abundances, including pre-processing steps (e.g., sequence QC and trimming of adapters and low-quality sequences).

However, as there are several software tools to choose from for each step of the pipeline, I think we should first agree on a workflow to use. I think we can build on the Bioc workflow package rnaseqGene. My suggested workflow would be:

  1. QC, trimming and filtering with fastp
  2. Quantification of transcript abundances with salmon
  3. Data import with tximport (maybe this could go at the beginning of the "Importing and annotating quantified data into R" episode?)
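
To make steps 1 and 2 concrete, here is a minimal sketch of the command lines they could translate to, written as small Python helpers that only build the argument lists. The file names, output directories, and thread count are hypothetical placeholders, the reads are assumed to be paired-end, and the library type is left to salmon's automatic detection (`-l A`). Step 3 (import into R) is not shown because it happens in R.

```python
from pathlib import Path


def fastp_cmd(r1, r2, out_dir, threads=4):
    """Build a fastp call for paired-end QC, trimming and filtering."""
    out = Path(out_dir)
    return [
        "fastp",
        "-i", str(r1),                              # read 1 input
        "-I", str(r2),                              # read 2 input
        "-o", str(out / "trimmed_R1.fastq.gz"),     # read 1 output
        "-O", str(out / "trimmed_R2.fastq.gz"),     # read 2 output
        "--json", str(out / "fastp.json"),          # machine-readable QC report
        "--html", str(out / "fastp.html"),          # human-readable QC report
        "--thread", str(threads),
    ]


def salmon_index_cmd(transcripts_fa, index_dir):
    """Build a salmon call that indexes a transcriptome FASTA file."""
    return ["salmon", "index", "-t", str(transcripts_fa), "-i", str(index_dir)]


def salmon_quant_cmd(index_dir, r1, r2, out_dir, threads=4):
    """Build a salmon call that quantifies transcript abundances."""
    return [
        "salmon", "quant",
        "-i", str(index_dir),
        "-l", "A",                                  # infer the library type
        "-1", str(r1),
        "-2", str(r2),
        "-p", str(threads),
        "-o", str(out_dir),
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    qc = fastp_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz", "qc")
    idx = salmon_index_cmd("transcripts.fa", "salmon_index")
    quant = salmon_quant_cmd("salmon_index", qc[6], qc[8], "quant/sample")
    for cmd in (qc, idx, quant):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once fastp and salmon are installed. Building the lists separately from running them keeps the commands easy to inspect before committing to them.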

I'd love to hear what you all think.

Best,
Fabricio

@csoneson
Collaborator

Hi Fabricio,

Thanks a lot, that would be great! I agree with your suggested workflow, although my preference would be to use tximeta directly rather than tximport, as it involves less downstream fiddling to get the object into a suitable shape for further analysis. It would probably also be good to have a short introduction on the biological/technological side, describing what we are actually measuring with RNA-seq.

@almeidasilvaf
Contributor Author

Thank you for your feedback, Charlotte.

Indeed, tximeta would be better. I will see what I can do to avoid having to download huge FASTQ files from ENA to use in the episode. I think there might be some nice FASTQ files on ExperimentHub that I can use.

I will also try to write a short intro on what RNA-seq is.

@jdrnevich
Collaborator

jdrnevich commented Jan 13, 2023 via email

@almeidasilvaf
Contributor Author

almeidasilvaf commented Jan 13, 2023

Thank you for bringing these points up, Jenny.

fastp and salmon can be run on a laptop without problems (I have done it myself on an Ubuntu laptop with 8 GB RAM). The issue here might be compatibility with multiple platforms. I have not tried installing fastp and salmon on Windows and macOS, so I'm not sure whether that would be an issue. I will try that as soon as possible and let you know.
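
A quick way to report whether both tools are visible on a given machine is sketched below. It only checks that executables named `fastp` and `salmon` can be found on the `PATH`, which is an assumption about how they were installed; it does not verify that they actually run correctly on that platform.

```python
import platform
import shutil


def check_tools(tools=("fastp", "salmon")):
    """Map each tool name to its path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    print(f"Operating system: {platform.system()}")
    for tool, path in check_tools().items():
        status = path if path is not None else "not found on PATH"
        print(f"{tool}: {status}")
```

Running this on Windows, macOS, and Linux test machines would give a first, coarse picture of where installation needs extra instructions.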

I saw the discussion on Slack, and I will try to think of solutions to this issue, from using Orchestra to Desktop.
