When we assemble a new genome (for example with hifiasm), we get five useful sequence files:
hap1.p_ctg.gfa
hap2.p_ctg.gfa
p_ctg.gfa
p_utg.gfa
r_utg.gfa
Each captures elements of the (diploid) genome. For a future pangenome reference, the situation is even worse.
I would like to be able to specify a GFA1/2 file as a reference rather than a legacy FASTA file for aligning PacBio reads.
Aligning PacBio Iso-Seq reads that show "allele-specific expression" or unexpected heterozygous differences against a graph would avoid the multimapper problem of alignment, because the "path of the read" would be one valid route through the GFA2 graph. An identical duplication found in two distinct places in the genome could be represented as a single sequence whose edges join it to either "this chromosome" or "that" one. A unique mapping would then work like a street junction: we either know only that we are at the "junction" (the conserved sequence shared by all paths), or, if the read is long enough to resolve the route, that we turned left or right from the junction.
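To make the street-junction idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy sequences, the segment names (`chrA_left`, `dup`, and so on), and the helpers `parse_gfa1`, `walks` and `locate` are hypothetical and not part of any existing aligner. It uses GFA1-style `S` and `L` records, handles only forward-strand links with `0M` overlaps, and uses exact substring matching in place of real alignment.

```python
from collections import defaultdict

# Toy graph (made-up sequences): one duplicated block "dup" shared by two chromosomes.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\tchrA_left\tACGTACGGTA",
    "S\tchrB_left\tTTGACCATGC",
    "S\tdup\tGGGATCCCAAGGTTCC",
    "S\tchrA_right\tCATTGAGACT",
    "S\tchrB_right\tAGTCTTCGGA",
    "L\tchrA_left\t+\tdup\t+\t0M",
    "L\tchrB_left\t+\tdup\t+\t0M",
    "L\tdup\t+\tchrA_right\t+\t0M",
    "L\tdup\t+\tchrB_right\t+\t0M",
])

def parse_gfa1(text):
    """Read S and L records; keep only forward-strand links with 0M overlap."""
    segments, links = {}, defaultdict(list)
    for line in text.splitlines():
        f = line.split("\t")
        if f[0] == "S":
            segments[f[1]] = f[2]
        elif f[0] == "L" and f[2] == "+" and f[4] == "+" and f[5] == "0M":
            links[f[1]].append(f[3])
    return segments, links

def walks(segments, links):
    """Enumerate source-to-sink walks (assumes a small acyclic graph)."""
    targets = {t for ts in links.values() for t in ts}
    stack = [[s] for s in segments if s not in targets]
    while stack:
        walk = stack.pop()
        nexts = links.get(walk[-1], [])
        if not nexts:
            yield walk
        for n in nexts:
            stack.append(walk + [n])

def locate(read, segments, links):
    """Return the distinct segment routes that an exactly matching read covers."""
    hits = set()
    for walk in walks(segments, links):
        seq = "".join(segments[s] for s in walk)
        start = seq.find(read)
        while start != -1:
            end, offset, touched = start + len(read), 0, []
            for name in walk:
                seg_end = offset + len(segments[name])
                if offset < end and start < seg_end:  # read overlaps this segment
                    touched.append(name)
                offset = seg_end
            hits.add(tuple(touched))
            start = seq.find(read, start + 1)
    return hits

segments, links = parse_gfa1(TOY_GFA)

# Flat FASTA-style view: the duplicated block is written out once per chromosome.
linear = {
    "chrA": segments["chrA_left"] + segments["dup"] + segments["chrA_right"],
    "chrB": segments["chrB_left"] + segments["dup"] + segments["chrB_right"],
}

short_read = "ATCCCAAGG"     # lies entirely inside the duplicated block
long_read = "GGTTCCCATTGA"   # runs from the duplicated block into chrA_right

linear_hits = [name for name, seq in linear.items() if short_read in seq]
junction_hit = locate(short_read, segments, links)
resolved_hit = locate(long_read, segments, links)
```

In the flat view the short read matches both `chrA` and `chrB`, which is the multimapper case. In the graph view it has a single location, the `dup` segment (we are "at the junction"), while the longer read resolves to the route `dup` then `chrA_right` (we "turned" toward chromosome A).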
Some code or process for merging GFA1/2 files via a multiple GFA alignment would facilitate the "all against all search" needed with FASTA views of a genome. For example, given the GFA files from hifiasm plus a public FASTA reference such as human T2T, it would merge all the possible paths into a single GFA2 pangenome in one file (to be used as a possible reference for a long-read aligner). If we later assemble multiple individuals, it would be desirable to merge the old and new GFA2 pangenomes so that the result represents the newly observed haplotype paths.
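As a rough illustration of only the first, trivial step of such a merge, the Python sketch below combines several GFA1 texts and a FASTA text into one GFA1 file by prefixing every segment name with its source. It performs no alignment, so shared sequence is not collapsed; that is the hard part this request is about. The function names (`namespace_gfa1`, `fasta_to_segments`, `merge_sources`) and the `#` separator are assumptions chosen for this example, and record types other than `S` and `L` are skipped.

```python
def namespace_gfa1(text, prefix):
    """Prefix segment names in S and L records so sources cannot collide."""
    out = []
    for line in text.splitlines():
        f = line.split("\t")
        if f[0] == "S":
            f[1] = f"{prefix}#{f[1]}"
        elif f[0] == "L":
            f[1] = f"{prefix}#{f[1]}"
            f[3] = f"{prefix}#{f[3]}"
        else:
            continue  # headers and other record types are skipped in this sketch
        out.append("\t".join(f))  # optional tags after the fixed fields are kept
    return out

def fasta_to_segments(text, prefix):
    """Turn each FASTA record into one unlinked S record."""
    out, name, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                out.append(f"S\t{prefix}#{name}\t{''.join(chunks)}")
            name, chunks = line[1:].split()[0], []
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        out.append(f"S\t{prefix}#{name}\t{''.join(chunks)}")
    return out

def merge_sources(gfas, fastas=None):
    """Concatenate namespaced records from all sources under a single header."""
    lines = ["H\tVN:Z:1.0"]
    for prefix, text in gfas.items():
        lines += namespace_gfa1(text, prefix)
    for prefix, text in (fastas or {}).items():
        lines += fasta_to_segments(text, prefix)
    return "\n".join(lines) + "\n"

# Toy inputs standing in for two assembler outputs and a public reference.
toy_gfa = "H\tVN:Z:1.0\nS\tutg1\tACGT\nS\tutg2\tGGCC\nL\tutg1\t+\tutg2\t-\t0M\n"
toy_fasta = ">chr1 toy reference\nAC\nGT\n"

merged = merge_sources({"hap1": toy_gfa, "hap2": toy_gfa}, {"T2T": toy_fasta})
```

The result is a single file holding every source as a disconnected component. A real pangenome merge would still have to align those components against each other and collapse the shared sequence into common segments, keeping each haplotype as a path through the graph.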