Mapping hits using a pangenome #183

LeonardosMageiros · 2021-11-22T11:44:06Z

Hi,

I have executed pyseer and I would like to map my unitigs not to the list of my input files but back to just one fasta/gff file that represents my pangenome.

This file contains representative sequences of all the genes in my dataset (produced using roary). So if I understand things correctly using it as a ref in your annotate script should map all the hits in my results back to a gene. Nonetheless I have many unmapped unitigs in my output.

Am I missing something?
Any help would be much appreciated.

Best
Leonardos

johnlees · 2021-11-23T10:35:20Z

For a single reference, you probably want to use this method rather than the annotate script:
https://pyseer.readthedocs.io/en/master/usage.html#mapping-to-references-phandango

There will be many unitigs from assemblies that are not in a roary pangenome - those in intergenic regions, those not annotated correctly in the input etc.
This also sounds like a mapping issue if you're using representatives. Some genes are very diverse and you may need to use different settings for the mapping, or a more sensitive aligner such as bowtie2.

This isn't something we directly support in pyseer however

LeonardosMageiros · 2021-11-23T12:35:40Z

Thank you very much for your quick response.
Things are clearer now.

Just one last question to see if my understanding is correct.

I have a dataset of ~1500 strains and I have performed 2 GWAS runs on two different subsets of isolates.
My goal is to find the overlap of these two runs on the gene level.

If I use the annotate script with my pangenome on top as ref and then all the strains participating in the two GWAS runs below as draft sequences, would the mapping of the unitigs be consistent in the two cases so I can calculate the overlapping genes?

Do you see any reason for the above approach not to work and if yes do you suggest another approach to tackle my problem?

Thank you very much in advance
Leonardos

johnlees · 2021-11-23T17:00:25Z

Annotation is all done through mapping, so is with respect to whichever reference you use. If you use the same sequences to map to, they will have the same co-ordinate system so can be compared directly.

LeonardosMageiros · 2021-12-15T13:30:27Z

Can I also ask if there is a way to map the results of a gene presence absence analysis to the same sequences as I map my unitigs GWAS?

mgalardini self-assigned this Jun 8, 2022

mgalardini added scripts improvements labels Jun 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mapping hits using a pangenome #183

Mapping hits using a pangenome #183

LeonardosMageiros commented Nov 22, 2021

johnlees commented Nov 23, 2021

LeonardosMageiros commented Nov 23, 2021 •

edited

johnlees commented Nov 23, 2021

LeonardosMageiros commented Dec 15, 2021

Mapping hits using a pangenome #183

Mapping hits using a pangenome #183

Comments

LeonardosMageiros commented Nov 22, 2021

johnlees commented Nov 23, 2021

LeonardosMageiros commented Nov 23, 2021 • edited

johnlees commented Nov 23, 2021

LeonardosMageiros commented Dec 15, 2021

LeonardosMageiros commented Nov 23, 2021 •

edited