Use of GFA2 as a pangenome reference #118

Open
oakeley opened this issue May 25, 2023 · 0 comments
oakeley commented May 25, 2023

When we assemble a new genome (for example with hifiasm), we get five useful sequence files:
hap1.p_ctg.gfa
hap2.p_ctg.gfa
p_ctg.gfa
p_utg.gfa
r_utg.gfa

Each captures different elements of the (diploid) genome, so no single file is a complete reference. For a future pangenome reference the problem will be even worse.
I would like to be able to specify a GFA1/2 file as a reference rather than a legacy FASTA file for aligning PacBio reads.

Aligning PacBio IsoSeq reads that show "allele-specific expression" or unexpected heterozygous differences against a graph would avoid the multimapper problem of alignment, because the "path of the read" would be one valid route through the GFA2 graph. An identical duplication at two distinct places in the genome could be represented as a single sequence with edges joining it to either "this chromosome" or "that" one. A unique mapping would then be like a street junction: either we only know that we are at the "junction" (the conserved sequence on all paths), or, if the read is long enough to resolve the route, we know that it turned left or right from the junction.
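To make the junction idea concrete, here is a minimal sketch in Python. It is not taken from any existing aligner: the segment names, the two toy paths, and the helper functions are all hypothetical, and a read is assumed to have already been reduced to the ordered list of segments it traverses.

```python
# Hypothetical toy graph: "dup" is the duplicated sequence stored once,
# with edges to the flanking segments on two different chromosomes.
PATHS = {
    "chrA": ["a_left", "dup", "a_right"],
    "chrB": ["b_left", "dup", "b_right"],
}


def contains_walk(path, walk):
    """Return True if `walk` occurs as a contiguous run of segments in `path`."""
    n = len(walk)
    return any(path[i:i + n] == walk for i in range(len(path) - n + 1))


def compatible_paths(walk, paths=PATHS):
    """Names of all reference paths that the read's segment walk is consistent with."""
    return sorted(name for name, path in paths.items() if contains_walk(path, walk))


# A short read that stays inside the duplication is "at the junction".
print(compatible_paths(["dup"]))             # ['chrA', 'chrB']
# A longer read that reaches a flank has "turned" onto one chromosome.
print(compatible_paths(["dup", "a_right"]))  # ['chrA']
```

In this sketch a read inside the duplication is still placed uniquely on the "dup" segment, and the ambiguity is pushed to the choice of path rather than to the mapping position.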

Some code or process for merging GFA1/2 files via a multiple GFA alignment would facilitate the "all against all search" needed with FASTA views of a genome. For example, it would take the GFA files from hifiasm plus a public FASTA reference such as human T2T and merge all the possible paths into a single GFA2 pangenome in one file, to be used as a possible reference for a long-read aligner. If we then assemble multiple individuals, it would be desirable to merge the old and new GFA2 pangenomes so that the result represents the newly observed haplotype paths.
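As a rough illustration of only the first, trivial step of such a merge, the sketch below takes the union of several GFA1 graphs while prefixing segment names so they cannot collide. This is not a multiple GFA alignment: nothing is aligned or collapsed, and shared sequence stays duplicated. The function names and the `#` separator are assumptions for illustration, only `S` and `L` records are handled, and GFA2 records are not covered.

```python
def namespace_gfa1(lines, prefix):
    """Prefix segment names in GFA1 S and L records with `prefix`.

    Records of any other type (H, P, A, ...) are dropped in this sketch.
    """
    out = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "S":
            f[1] = f"{prefix}#{f[1]}"      # segment name
        elif f[0] == "L":
            f[1] = f"{prefix}#{f[1]}"      # link source segment
            f[3] = f"{prefix}#{f[3]}"      # link target segment
        else:
            continue
        out.append("\t".join(f))
    return out


def union_gfa1(graphs):
    """Disjoint union of several GFA1 graphs given as {prefix: list of lines}."""
    merged = ["H\tVN:Z:1.0"]
    for prefix, lines in graphs.items():
        merged.extend(namespace_gfa1(lines, prefix))
    return merged


# Two tiny hypothetical graphs that reuse the same segment names.
hap1 = ["S\ts1\tACGT", "S\ts2\tGGCA", "L\ts1\t+\ts2\t+\t0M"]
hap2 = ["S\ts1\tACGA", "S\ts2\tGGCA", "L\ts1\t+\ts2\t-\t0M"]
for record in union_gfa1({"hap1": hap1, "hap2": hap2}):
    print(record)
```

A real merge would still need to align the graphs and collapse identical segments (here `hap1#s2` and `hap2#s2` carry the same sequence), which is the part this request is actually about.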
