Improve/add UMI deduplication metrics #1277

Open
ppericard opened this issue Apr 2, 2024 · 3 comments

@ppericard
Contributor

Description of feature

Hello ^^
I'm having difficulty finding easy-to-understand stats on UMI deduplication in the outputs. There seems to be no section for it in the MultiQC output, not even in the statistics table (where I would expect metrics such as the number of reads before dedup, the number of reads after dedup, and % duplication, from umi-tools dedup on alignments).
In the output directory, I also can't find a log with easy-to-understand metrics from umi-tools dedup. I'm probably missing something.
Thanks in advance. Pierre

@MatthiasZepper
Member

Since people complained about the poor performance, the generation of deduplication statistics is off by default now.

You have to set the parameter --umitools_dedup_stats on the command line, or umitools_dedup_stats: true in a params file, to activate that functionality.
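
For reference, a minimal sketch of the params-file route, assuming a JSON params file (Nextflow's -params-file option accepts JSON as well as YAML). The file name params.json and the helper write_params_file are illustrative only, and a real run would need the pipeline's other UMI-related parameters too.

```python
import json
from pathlib import Path


def write_params_file(path):
    """Write a JSON params file that switches on the dedup statistics."""
    # Only the parameter named in this thread is set; everything else
    # (input sheet, UMI options, ...) is left out of this sketch.
    params = {"umitools_dedup_stats": True}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


if __name__ == "__main__":
    # Hypothetical file name; pass it to Nextflow with -params-file.
    write_params_file("params.json")
```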

@ppericard
Contributor Author

ppericard commented Apr 24, 2024

Hi @MatthiasZepper,
I'm sorry if I wasn't clear enough in my initial message. All my comments apply to the pipeline run with the --umitools_dedup_stats parameter activated.
The *.umi_dedup.transcriptome.filtered.prepare_for_rsem.log files contain no summary of the dedup stats, and the other files are neither very informative nor easy to read: *.umi_dedup.sorted_edit_distance.tsv, *.umi_dedup.sorted_per_umi_per_position.tsv, *.umi_dedup.sorted_per_umi.tsv. There is a real need for an easy-to-read summary of deduplication, such as the one that can be obtained through MultiQC parsing of the UMI-tools logs, for example (MultiQC/MultiQC#1769).
Right now, as a user, I have even less information about deduplication than I would get from the logs just by running the umi-tools dedup command directly.
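
To illustrate the kind of summary meant here, a rough Python sketch that pulls reads in, reads out, and % duplication from the text of a umi-tools dedup log. The two line patterns ("Input Reads: N" and "Number of reads out: N") are assumptions about the log wording and should be checked against a real log; summarize_dedup_log is a hypothetical helper, not part of the pipeline or of UMI-tools.

```python
import re

# Assumed wording of the relevant log lines; verify against an actual
# umi-tools dedup log before relying on these patterns.
PATTERNS = {
    "reads_in": re.compile(r"Input Reads: (\d+)"),
    "reads_out": re.compile(r"Number of reads out: (\d+)"),
}


def summarize_dedup_log(text):
    """Return reads in/out and % duplication found in the log text."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            stats[key] = int(match.group(1))
    # % duplication = share of input reads removed by deduplication.
    if stats.get("reads_in") and "reads_out" in stats:
        removed = stats["reads_in"] - stats["reads_out"]
        stats["percent_duplication"] = round(100.0 * removed / stats["reads_in"], 2)
    return stats


if __name__ == "__main__":
    example = "INFO Reads: Input Reads: 1000\nINFO Number of reads out: 750\n"
    print(summarize_dedup_log(example))
```

Keys whose line is not found are simply left out of the result, so an empty dictionary means the log did not match the assumed wording.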

@MatthiasZepper
Member

Apologies for stonewalling on this issue before. While hunting down the cause for issue #1303, it occurred to me that probably a botched MultiQC config is behind this issue as well. For some reason, we explicitly specify the MultiQC modules to be run and UMI-tools is nowhere to be found.

Since we run MultiQC with a custom config outside the pipeline again, we did not notice.

It should be fixed on this branch, but I struggle with testing at the moment.
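
A quick way to check a MultiQC config for this problem could look like the sketch below. It assumes the config restricts modules through a run_modules list and that the UMI-tools module is registered under the name umitools; both should be confirmed against the MultiQC documentation. The parser is deliberately naive (no YAML library) and only handles a plain block list.

```python
def listed_modules(config_text):
    """Collect the entries of a block-style `run_modules:` list."""
    modules = []
    in_block = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("run_modules:"):
            in_block = True
            continue
        if not in_block:
            continue
        if stripped.startswith("- "):
            # Drop trailing comments and surrounding quotes.
            name = stripped[2:].split("#")[0].strip().strip("'\"")
            modules.append(name)
        elif stripped and not stripped.startswith("#"):
            in_block = False  # the next key ends the list
    return modules


def missing_modules(config_text, expected=("umitools",)):
    """Return the expected module names absent from the list."""
    present = set(listed_modules(config_text))
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Hypothetical config excerpt, not copied from the pipeline.
    config = "run_modules:\n  - custom_content\n  - fastqc\n  - star\n"
    print(missing_modules(config))
```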
