sRNA spike-ins #116

mschilli87 · 2023-11-09T10:05:19Z

More and more sRNA-seq projects utilize RNA spike-ins as controls.

What is the recommended workflow to quantify these along miRNAs using miRDeep2's quantifier module?
I added the corresponding sequences to the mature miRNA input FASTA but they don't get picked up. I am guessing I need to at least make up some 'hairpins' for them as well (How? Can I just use the mature sequence or do I need to actually craft legit hairpin-structures to bypass some internal filtering? And what do I do with spike-ins that are longer than your typical mature miRNA but well within the read length - say 35nt RNA spike-in @ 51 cycles single-end Illumina sequencing?).
But will I need to make up a fake chromsome as well that hosts those fake hairpins and re-build an new genome bowtie index just to quantify the spike-ins along endogenous miRNAs?

Any advise would be highly appreciated.

Also, would you be up to adding a new optional input FASTA with 'mature' spike-in sequences that are quantied as-is by the quantifier module?

Drmirdeep · 2023-11-09T14:09:28Z

You will just need your spikein sequences in your mature file and precursor file definition (length doesn't matter)

cat mature.fa
>spikein1
CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA

cat pres.fa
>spikein1
CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA

just run it in its most simple form and give the mature.fa as the precursor input as well

perl quantifier.pl -r reads_collapsed.fa -m mature.fa -p mature.fa

so just add your spikins to your mature file and to the precursor file

mschilli87 · 2023-11-09T15:52:03Z

Thank you. I can confirm that copying the 'mature' sequences to the 'precursor' FASTA results in miRDeep2 picking up the RNA spike-ins for quantification.

mschilli87 · 2024-02-06T14:04:47Z

Sorry for re-opening this: I noticed that only reads matching the spike-in exactly are quantified using this approach. I noticed because the results did not add up and by a simple grep 'analysis' I found that most reads in my data that contain an exact match of a specific spike-in contain a final T after pre-processing. I do not know if this could be an actual problem with the spike-ins or is just an artifact from insufficient adapter-removal but I found it weird that the exact 'mature' sequence plus a single (non-templated) nucleotide would not be picked up mir miRDeep2. Now I am wondering if that is due to the processed read being longer than the 'precursor' sequence of that spike-in. Any advice on how to adjust my input or tweak the parameters to pick those up?

edit: Just a quick update: It looks like indeed I missed a leading T in the 3' adapter sequence during pre-processing. I'll try to see if this is enough to fix the issue. Still, I am curious if this is expected as untemplated bases added to miRNAs post-transcriptionally have been shown to be biologically relevant so it would be good to know if miRDeep2 misses those by default or if this is related to the precursor=mature hack I used for the spike-ins.

Drmirdeep · 2024-02-06T14:51:08Z

Then just put a few A‘s in the end of the precursor. Mismatches can be allowed up to 2 if I remember correctly. If the mature is longer than the precursor then bowtie will fail to map I guess

mschilli87 · 2024-02-06T14:53:26Z

That's what I am currently guessing as well. Do you know if N's would be allowed as well? I would prefer not to bias the mapper towards any specific untemplated base. Thank you for the fast response.

mschilli87 · 2024-02-06T15:04:09Z

I can confirm the spike-in counts changed by orders of magnitude while miRNA counts remains almost unchanged. So the extreme sensitivity to non-templated bases appears to be an issue for the spike-ins only, likely due to what was discussed here. I will leave this issue open until as a reminder to update it with my final conclusions in case anyone else searches this tracker looking for advice how to include spike-ins.

mschilli87 · 2024-02-06T16:04:58Z

When padding the 'mature' spike-ins with ten N's on either side for the 'hairpins' I do not get any spike-in quantification. IIRC, miRDeep2 discards sequences with non ACTG(U?) characters (at least by default). ~~@Drmirdeep: Would you expect A's to perform better and if so, do you share my concerns regarding the bias introduced by those?~~

edit: As expected, replacing the N's with A's did yield spike-in results again, with slightly more reads mapping to them as before. I feel like some padding should be done to remove the bias agains spike-ins (compared to endogenous miRNAs) in case of imperfect pre-processing but I am concerned that picking a specific nucleotides introduces another bias. @Drmirdeep: Could you provide your point of view?

Drmirdeep · 2024-02-06T19:31:20Z

If the preprocessing is the problem then this should be solved on the preprocessing side not on a tool that expects the input data to follow certain conventions

mschilli87 · 2024-02-07T07:09:28Z

I guess I was not clear enough: The pre-processing issue was the reason the quantification was off orders of magnitude. That is fixed and not my issue. However, that made me realise that a read that, for whatever reason, has an extra base maps just fine for actual miRNAs but not for spike-ins. The whole point of spike-ins is to serve as controls. So they should be treated as close to identitcal as possible. So in a way the problem is that my pre-processing mistake affected spike-ins but not miRNAs (or at least much less). Padding the spike ins with A's helps, because now they are treated more similar as miRNAs. But as along as I fix that padding, there is still a bias that is spike-in specific. And ideally, nothing should be specific for the thing you normalize to. 😉

mschilli87 added the question label Nov 9, 2023

mschilli87 closed this as completed Nov 9, 2023

mschilli87 added the solved label Nov 9, 2023

mschilli87 reopened this Feb 6, 2024

mschilli87 added help wanted and removed solved labels Feb 6, 2024

mschilli87 removed the help wanted label Feb 6, 2024

mschilli87 self-assigned this Feb 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sRNA spike-ins #116

sRNA spike-ins #116

mschilli87 commented Nov 9, 2023 •

edited

Drmirdeep commented Nov 9, 2023 •

edited

mschilli87 commented Nov 9, 2023

mschilli87 commented Feb 6, 2024 •

edited

Drmirdeep commented Feb 6, 2024

mschilli87 commented Feb 6, 2024 •

edited

mschilli87 commented Feb 6, 2024

mschilli87 commented Feb 6, 2024 •

edited

Drmirdeep commented Feb 6, 2024

mschilli87 commented Feb 7, 2024 •

edited

sRNA spike-ins #116

sRNA spike-ins #116

Comments

mschilli87 commented Nov 9, 2023 • edited

Drmirdeep commented Nov 9, 2023 • edited

mschilli87 commented Nov 9, 2023

mschilli87 commented Feb 6, 2024 • edited

Drmirdeep commented Feb 6, 2024

mschilli87 commented Feb 6, 2024 • edited

mschilli87 commented Feb 6, 2024

mschilli87 commented Feb 6, 2024 • edited

Drmirdeep commented Feb 6, 2024

mschilli87 commented Feb 7, 2024 • edited

mschilli87 commented Nov 9, 2023 •

edited

Drmirdeep commented Nov 9, 2023 •

edited

mschilli87 commented Feb 6, 2024 •

edited

mschilli87 commented Feb 6, 2024 •

edited

mschilli87 commented Feb 6, 2024 •

edited

mschilli87 commented Feb 7, 2024 •

edited