Many reads fail QC during TALON run but meet primary, coverage, and identity filters #98

Open
callumparr opened this issue Apr 15, 2022 · 4 comments

Comments

@callumparr

I'm using TALON v5, installed with python setup.py install, on an HPC running Debian.

Python version: 3.6.7

talon --f /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome_config_run2.csv --db /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome.db --build hg38 --threads 12 --o /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome_run2

I kept the defaults of 0.9 for fraction aligned and 0.8 for identity.

I was looking through the TALON QC log file because many reads are being filtered out, even though we use cap-trap and oligo-dT and so are fairly sure the data are good quality. I found a potential issue that may account for many reads having a low fraction aligned: with my library prep, pychopper does not effectively trim the polyA tails from the FASTQ reads. But I then saw an additional subset of alignments that were filtered out even though they were primary alignments and passed both the fraction-aligned and identity filters.
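To illustrate how an untrimmed polyA tail could push a read below the 0.9 threshold, here is a minimal sketch that computes the fraction of a read that is aligned from a SAM CIGAR string. This is not TALON's implementation: the function name is hypothetical, and counting M, I, = and X as aligned read bases and S and H as unaligned read bases is an assumption made for illustration.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def fraction_aligned(cigar: str) -> float:
    """Return aligned read bases divided by total read bases.

    Assumption (not taken from TALON): M, I, = and X count as aligned
    read bases; S and H count as read bases that are not aligned;
    D, N and P consume no read bases and are ignored.
    """
    aligned = 0
    read_len = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "MI=X":
            aligned += n
            read_len += n
        elif op in "SH":
            read_len += n
    return aligned / read_len if read_len else 0.0


if __name__ == "__main__":
    # 1,000-base read with a 150-base soft-clipped tail.
    print(fraction_aligned("850M150S"))
    # Spliced read with a 100-base soft-clipped tail.
    print(fraction_aligned("400M1000N500M100S"))
```

Under these assumptions, a 1,000-base read whose last 150 bases are soft-clipped has a fraction aligned of 0.85, which is below the 0.9 default.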

I attach an UpSet plot of the reasons each alignment passed to TALON either passed or failed the QC step. The third column shows around 3.5M reads that failed with no apparent reason.

I was looking through the TALON_label log and saw roughly 0.5M reads with evidence of internal priming, but from what I understand this is not taken into account when generating the TALON database.

Is there some other behind-the-scenes filtering going on during database generation that isn't reported in the QC log?

[Attached image: iPSC_rep1_run1_UpSetR]
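The breakdown behind a plot like this can be sketched as a tally of failure-reason combinations per read. The sketch below is an illustration only: the column names (passed_QC, primary_mapped, fraction_aligned, identity), the 1/0 and NA encodings, and the rule that a value strictly below its threshold counts as a failure are all assumptions about the QC log layout, so check them against the header of your own log before relying on it.

```python
import csv
from collections import Counter
from io import StringIO

# Made-up rows in the ASSUMED QC log layout (tab-separated).
DEMO_LOG = (
    "dataset\tread_ID\tpassed_QC\tprimary_mapped\tread_length\t"
    "fraction_aligned\tidentity\n"
    "d1\tr1\t1\t1\t1000\t0.95\t0.90\n"
    "d1\tr2\t0\t1\t1000\t0.85\t0.90\n"
    "d1\tr3\t0\t0\t1000\tNA\tNA\n"
    "d1\tr4\t0\t1\t1000\t0.95\t0.90\n"
)


def to_float(value):
    """Parse a float, returning None for NA or otherwise unparsable values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def tally_qc_reasons(handle, min_coverage=0.9, min_identity=0.8):
    """Count reads by (passed_QC, tuple of failure reasons seen in the log)."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        reasons = []
        if row["primary_mapped"] != "1":
            reasons.append("not_primary")
        coverage = to_float(row["fraction_aligned"])
        if coverage is not None and coverage < min_coverage:
            reasons.append("low_coverage")
        identity = to_float(row["identity"])
        if identity is not None and identity < min_identity:
            reasons.append("low_identity")
        counts[(row["passed_QC"] == "1", tuple(reasons))] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(tally_qc_reasons(StringIO(DEMO_LOG)).items()):
        print(key, n)
```

Reads counted under the key (False, ()) are the ones of interest here: they are marked as failing QC but none of the three logged criteria explains why. To run it on a real log, pass an open file handle instead of the StringIO demo data.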

@fairliereese
Member

Your intuition that internal priming / the reproducibility filter should not be affecting these numbers is correct.

I'm looking into it otherwise. I've checked a log file that I have lying around and have found something similar :/ It does not seem to me that this should be happening. I will update you when I have found anything.

@callumparr
Author

Thank you for the reply and for looking into it. When I have time, I will look into the reads that fail this way and their characteristics.

@fairliereese
Member

If you're also planning to look into it on your end, here's some code that might be useful as a starting point: https://github.com/fairliereese/220421_talon_debug/blob/master/check_talon_log.ipynb

@callumparr
Author

I looked into it a bit more and I am still at a loss as to why some reads are failing. The behavior was consistent across multiple samples, although they were all processed the same way, so it is possible that I am doing something weird.
