Many reads fail QC during TALON run but meet primary, coverage, and identity filters #98

callumparr · 2022-04-15T10:02:32Z

Using TALON v5, installed with python setup.py install, on an HPC cluster running Debian.

Using python version 3.6.7

talon --f /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome_config_run2.csv --db /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome.db --build hg38 --threads 12 --o /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome_run2

I kept the defaults: 0.9 for the minimum fraction aligned and 0.8 for the minimum identity.

I was rooting through the TALON QC log file because we are seeing many reads filtered out despite using cap-trap and oligo-dT alignment, so I am fairly sure we have good-quality data. I did find a potential issue that may account for many reads having a low fraction aligned: with my library prep, pychopper is not effectively trimming the polyA tails from the FASTQ reads. But I then saw an additional subset of alignments that were filtered out even though they were primary alignments and passed both the fraction-aligned and identity filters.

I attach an UpSet plot of the reasons why each alignment passed to TALON either passes or fails the QC step. You can see in the third column that around 3.5M reads fail for no apparent reason.
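For readers who want to reproduce this kind of breakdown, here is a minimal sketch of tallying the combinations of filter outcomes from a QC log. It assumes the log is tab-separated with the header `dataset, read_ID, passed_QC, primary_mapped, read_length, fraction_aligned, identity` and that 0/1 flags are used; that layout is an assumption and should be checked against your own file. The function names and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter

MIN_COVERAGE = 0.9  # fraction-aligned cutoff quoted in the post
MIN_IDENTITY = 0.8  # identity cutoff quoted in the post


def to_float(value):
    """Return the value as a float, or None if it is missing or not numeric (e.g. 'NA')."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def classify(row):
    """Return (passed_QC, is_primary, coverage_ok, identity_ok) for one log row."""
    cov = to_float(row["fraction_aligned"])
    ident = to_float(row["identity"])
    return (
        row["passed_QC"] == "1",
        row["primary_mapped"] == "1",
        cov is not None and cov >= MIN_COVERAGE,
        ident is not None and ident >= MIN_IDENTITY,
    )


def tally(handle):
    """Count how many reads fall into each combination of outcomes."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(classify(row) for row in reader)


# Made-up rows in the assumed column layout, standing in for a real QC log.
example = io.StringIO(
    "dataset\tread_ID\tpassed_QC\tprimary_mapped\tread_length\tfraction_aligned\tidentity\n"
    "d1\tr1\t1\t1\t1200\t0.98\t0.95\n"
    "d1\tr2\t0\t1\t900\t0.95\t0.93\n"
    "d1\tr3\t0\t0\t800\tNA\tNA\n"
)
counts = tally(example)
for combo, n in counts.most_common():
    print(combo, n)
```

The combination `(False, True, True, True)` corresponds to the puzzling group: reads marked as failing QC although all three logged criteria are met. To use it on a real log, pass an open file handle to `tally` instead of the `io.StringIO` example.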

I also looked through the TALON_label log and saw roughly 0.5M reads with evidence of internal priming, but from what I understand this is not taken into account when generating the TALON database.

Is there some other behind-the-scenes filtering going on during database generation that isn't reported in the QC log?

fairliereese · 2022-04-20T23:40:19Z

Your intuition that internal priming / the reproducibility filter should not be affecting these numbers is correct.

I'm looking into it otherwise. I've checked a log file that I have lying around and have found something similar :/ It does not seem to me that this should be happening. I will update you when I have found anything.

callumparr · 2022-04-21T03:23:08Z

Thank you for the reply and for looking into it. When I have time I will look into the reads that fail in this way and their characteristics.

fairliereese · 2022-04-21T16:17:19Z

If you're also planning to look into it on your end, here's some code that might be useful as a starting point: https://github.com/fairliereese/220421_talon_debug/blob/master/check_talon_log.ipynb
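Independently of the linked notebook (whose contents are not reproduced here), a minimal sketch of pulling out the unexplained failures and summarising one of their characteristics could look like the following. It assumes a tab-separated QC log with the columns `read_ID`, `passed_QC`, `primary_mapped`, `read_length`, `fraction_aligned` and `identity`, with 0/1 flags; the function name and the example rows are hypothetical.

```python
import csv
import io
import statistics


def unexplained_failures(handle, min_coverage=0.9, min_identity=0.8):
    """Yield rows marked as failing QC although every logged metric meets its cutoff."""
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            cov = float(row["fraction_aligned"])
            ident = float(row["identity"])
        except ValueError:
            # Metric not recorded (e.g. 'NA'), so this read cannot be shown to meet the cutoffs.
            continue
        if (
            row["passed_QC"] == "0"
            and row["primary_mapped"] == "1"
            and cov >= min_coverage
            and ident >= min_identity
        ):
            yield row


# Made-up rows in the assumed column layout, standing in for a real QC log.
example = io.StringIO(
    "dataset\tread_ID\tpassed_QC\tprimary_mapped\tread_length\tfraction_aligned\tidentity\n"
    "d1\tr1\t1\t1\t1200\t0.98\t0.95\n"
    "d1\tr2\t0\t1\t900\t0.95\t0.93\n"
    "d1\tr3\t0\t0\t800\tNA\tNA\n"
    "d1\tr4\t0\t1\t300\t0.99\t0.97\n"
    "d1\tr5\t0\t1\t700\t0.50\t0.90\n"
)
rows = list(unexplained_failures(example))
lengths = [int(r["read_length"]) for r in rows]
print("unexplained failures:", len(rows))
print("median read length:", statistics.median(lengths))
```

Comparing the read-length distribution of this group with that of the passing reads is one quick way to see whether the failures share an obvious property.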

callumparr · 2022-05-19T08:02:14Z

I looked into it a bit more and I am still at a loss as to why some reads are failing. This was consistent across multiple samples, although they were all processed the same way, so there is the possibility that I am doing something weird.
