Request: multisample VCF and existing alignment #3

Open
leone93 opened this issue Jan 25, 2024 · 1 comment

leone93 commented Jan 25, 2024

Hello, thanks for the tool!
I have two ideas that it would be nice to implement in the software.
The first is related to multisample VCFs. I think it would be interesting if the tool could work with a multisample VCF (for instance, the one produced by SURVIVOR) to give a clear representation of the genotype of each variant in each sample, and also to distinguish where the sample is homozygous from where there is no information about the genotype. This would also be nice in combination with minigraph's multi-alignment capability.
The second point is more related to the old version of SVJedi: I think it would be nice to be able to use existing alignments (maybe in BAM format) instead of computing new ones. To obtain the VCF, you probably already have the alignments it was produced from, so it would be nice to use these alignments to genotype the SVs. In the multisample case, it would be nice to give the software a list of BAM files, matching the samples in the VCF, and genotype all of them.
Let me know what you think,
Thanks
Leo

@clemaitre
Collaborator

Hi Leo,

Thank you for your comment and ideas, and sorry for the late reply.

Concerning the multisample idea, SVJedi-graph can already take a multisample VCF as input. SVJedi-graph uses only the description of the SVs in the VCF file; it does not use the genotype column(s). For the moment, it estimates the genotype of each SV for only one sample, given as one or several sequencing read files (FASTQ). If you want genotypes for several samples, the only way for now is to run SVJedi-graph several times (once per sample) and then combine the genotype columns of all output VCFs into a single VCF file. We are thinking of automating this process in a single SVJedi-graph run, but we have not found the time to do it yet...
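The "combine the genotype columns" step can be sketched in Python. This is a minimal illustration, not part of SVJedi-graph: the helper name `merge_genotype_columns` is hypothetical, and it assumes every per-sample output VCF lists the same SVs in the same order with identical FORMAT fields (QUAL, FILTER and INFO are taken from the first file).

```python
from typing import List

# Columns that must match for two data lines to describe the same SV:
# CHROM, POS, ID, REF, ALT and FORMAT (0-based indices in a VCF data line).
KEY_COLUMNS = (0, 1, 2, 3, 4, 8)


def merge_genotype_columns(vcf_texts: List[str]) -> str:
    """Hypothetical helper: merge single-sample VCF texts into one multisample VCF.

    Assumes all inputs come from genotyping the same SV call set, so their
    data lines describe the same variants in the same order.
    """
    if not vcf_texts:
        raise ValueError("need at least one VCF")
    files = [text.splitlines() for text in vcf_texts]

    # Keep the ## meta-information lines of the first file only.
    meta = [line for line in files[0] if line.startswith("##")]

    # #CHROM header: the 9 fixed columns, then every sample name in input order.
    chrom_lines = [next(l for l in lines if l.startswith("#CHROM")) for lines in files]
    header = chrom_lines[0].split("\t")[:9]
    for chrom_line in chrom_lines:
        header.extend(chrom_line.split("\t")[9:])

    bodies = [[l for l in lines if l and not l.startswith("#")] for lines in files]
    if len({len(body) for body in bodies}) != 1:
        raise ValueError("inputs do not have the same number of variant lines")

    merged = []
    for rows in zip(*bodies):
        fields = [row.split("\t") for row in rows]
        first = fields[0]
        for other in fields[1:]:
            if any(other[i] != first[i] for i in KEY_COLUMNS):
                raise ValueError(f"variant mismatch at {first[0]}:{first[1]}")
        # Fixed columns from the first file, then the sample columns of each file.
        out = first[:9]
        for f in fields:
            out.extend(f[9:])
        merged.append("\t".join(out))

    return "\n".join(meta + ["\t".join(header)] + merged) + "\n"


if __name__ == "__main__":
    sample_a = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\n"
        "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT\t0/1\n"
    )
    sample_b = sample_a.replace("sampleA", "sampleB").replace("0/1", "1/1")
    print(merge_genotype_columns([sample_a, sample_b]), end="")
```

If the per-sample outputs are bgzipped and indexed, `bcftools merge` is a standard alternative for building the multisample VCF.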

Concerning the second point, it is not possible to re-use a BAM file previously obtained by mapping the reads to the reference genome, because SVJedi does not map reads to the reference genome but to a specific set of sequences built from the SV descriptions. It maps reads simultaneously to the reference and alternative allele sequences, to avoid any bias towards the reference alleles. Re-using a BAM with reads mapped only to reference alleles would introduce exactly such a bias.
By the way, we recommend using SVJedi-graph rather than SVJedi, because it obtains much better results, especially when the input SV call set contains close or overlapping SVs.

Best,
Claire
