Finding a suitable reference for a set of genomes #26

MostafaYA · 2022-05-02T12:37:43Z

Hello, thanks for this great tool.
Just a question:
I wonder how to select the appropriate reference for a set of (diverse) genomes.
When I run ReferenceSeeker in this case, it gives a different reference for each genome.

oschwengers · 2022-05-02T20:02:12Z

Hi @MostafaYA,
thanks for this excellent question! This is indeed an interesting use case, and we have already started to work on a solution for it. However, this will still take a while. Maybe we can provide a solution at the end of this year.

pvanheus · 2024-01-15T06:07:58Z

@oschwengers any update on that work? I'm wondering what the best approach would be here. Two passes, perhaps: the first finds all candidates for all samples, and the second computes the distance from every sample to each of these candidates and picks the candidate with the lowest average distance?
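
A minimal sketch of that two-pass idea, purely for illustration. It is not ReferenceSeeker code: the function name `pick_shared_reference`, the `top_hits` mapping, and the toy distance values are all hypothetical, and the `distance` callable stands in for whatever distance estimate would actually be computed.

```python
from statistics import mean


def pick_shared_reference(top_hits, distance):
    """Pick one reference for a whole set of samples.

    top_hits: dict mapping sample ID -> list of candidate reference IDs
              (the per-sample results of a hypothetical first pass).
    distance: callable (sample, candidate) -> float, used in the second pass.
    Returns the candidate with the lowest average distance and all averages.
    """
    # Pass 1: pool the candidates found for any sample.
    candidates = sorted({c for hits in top_hits.values() for c in hits})
    samples = sorted(top_hits)
    # Pass 2: average distance from every sample to every pooled candidate.
    avg = {c: mean(distance(s, c) for s in samples) for c in candidates}
    best = min(candidates, key=lambda c: avg[c])
    return best, avg


# Toy, made-up distances (smaller means more similar).
toy_distances = {
    ("s1", "refA"): 0.01, ("s1", "refB"): 0.03, ("s1", "refC"): 0.08,
    ("s2", "refA"): 0.05, ("s2", "refB"): 0.02, ("s2", "refC"): 0.07,
    ("s3", "refA"): 0.06, ("s3", "refB"): 0.03, ("s3", "refC"): 0.01,
}
top_hits = {"s1": ["refA"], "s2": ["refB"], "s3": ["refC"]}

best, averages = pick_shared_reference(
    top_hits, lambda s, c: toy_distances[(s, c)]
)
print(best, averages)
```

With these toy numbers each sample has a different best hit, yet a single candidate still comes out with the lowest average distance across all three.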

oschwengers · 2024-02-01T09:12:03Z

Thanks @pvanheus for bringing this up again. Actually, this just slipped down my priority list. But if there is still a need for and interest in that, I would try to work on this as a side-side project. Unfortunately, I cannot make any reliable commitments to this right now.

Regarding the workflow: right, as you mentioned, first we have to calculate approximate genome distances (for instance with Mash) as a rough estimate to select reference candidates. Then we have to compute ANI between all query genomes and reference candidates, and then rank and select these references. The main task we tried to work on is how best to rank the reference genomes, as ANI can of course differ a lot between a reference and the given query genomes. How to handle harsh outliers, for example? As a simple approach, we played around with classic arithmetic/geometric/harmonic means....
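
To make the ranking question concrete, here is a small sketch showing how the choice of mean can change the winner when one query genome is an outlier. The ANI values and the helper `rank_references` are made up for illustration and are not taken from ReferenceSeeker. The sketch assumes ANI is expressed as a fraction between 0 and 1.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def rank_references(ani, score=fmean):
    """Rank references from best to worst by an aggregate ANI score.

    ani:   dict mapping reference ID -> list of ANI values, one per query genome.
    score: aggregation function applied to each list (higher is better).
    """
    return sorted(ani, key=lambda ref: score(ani[ref]), reverse=True)


# Hypothetical ANI values for four query genomes against two candidates.
ani = {
    "refA": [0.99, 0.99, 0.99, 0.80],  # very close to three queries, one harsh outlier
    "refB": [0.94, 0.94, 0.94, 0.94],  # moderately close to all four queries
}

for name, fn in [
    ("arithmetic", fmean),
    ("geometric", geometric_mean),
    ("harmonic", harmonic_mean),
]:
    print(name, rank_references(ani, fn))
```

In this toy case the arithmetic mean ranks `refA` first, while the geometric and harmonic means rank `refB` first, because they penalize the single low value more strongly. Which behavior is preferable depends on how much a poorly covered query genome should count.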

oschwengers self-assigned this May 2, 2022

oschwengers added the enhancement New feature or request label May 2, 2022
