Output interpretation #55

Open
abeu9727 opened this issue Mar 20, 2017 · 3 comments

Comments

@abeu9727

Thank you for providing this software. Sorry if this is a simple question, but we are hoping you could clarify and explain the output. We would like to use this software for our analysis. We have run the pipeline on a few samples, have seen a few different kinds of output, and would like confirmation that we are interpreting the data correctly. The output below is from the bubble.joint.plain.k31.k61.geno.vcf files.

Our first set of output looks like this. Would this be interpreted as Ck01 and Ck02 having the same base as the reference, whilst Ck03 and Ck04 have the same base as the ALT?

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ck01 Ck02 Ck03 Ck04

NC_020260.1 220 . G C . PASS BUBBLE=41257;K31 GT:K61R:K61A:GQ 1:57:0:. 1:72:0:. 1:0:31:. 1:0:230:.

NC_020260.1 839 . T C . PASS BUBBLE=15255;K31 GT:K61R:K61A:GQ 1:66:0:. 1:57:0:. 1:0:21:. 1:0:181:.

The second kind of output we are getting is this. What does it mean if there are only dots rather than coverage values?

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ck01 Ck02 Ck03 Ck04

NC_020260.1 14366 . T C . PASS BUBBLE=2393;K31 GT:K61R:K61A:GQ .:.:.:. .:.:.:. .:.:.:. .:.:.:.

NC_020260.1 14385 . T G . PASS BUBBLE=2393;K31 GT:K61R:K61A:GQ .:.:.:. .:.:.:. .:.:.:. .:.:.:.

And finally we have some output where the GT is 0. How would this be interpreted? Also why is a GQ value provided when there is one isolate analysed but not when there are multiple isolates?

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ck01

NC_020260.1 1103701 . A C . PASS BRKPNT=1507;K31;AC=1;AN=1 GT:K61R:K61A:GQ 0:51:10:20

NC_020260.1 1152696 . T G . PASS BRKPNT=1323;K31;AC=1;AN=1 GT:K61R:K61A:GQ 0:32:8:15

Would you also be able to explain the difference between the breakpoints and bubble vcf files? We have noticed that some sites occur in one file type but are absent from the other. Why does this occur? Also, is the main difference between breakpoints.joint.plain.k31.k61.geno.vcf and breakpoints.join.plain.k31.k61.vcf that the geno.vcf shows the coverage, while the other shows only the GT values? Does the same apply to the bubble.joint vcf files?

Any help would be greatly appreciated.

Regards,

Alicia

@noporpoise
Member

I'll update the docs when I get a chance. In the meantime, I hope I can answer some of your questions briefly:

.:.:.:. are sites that could not be genotyped (no coverage or too much variation in the region).

The sample genotype information 0:51:10:20 means:

  • 0: the genotype (REF allele) in a haploid sample. It could also take the value 1, meaning ALT. In a diploid, 0/1 means heterozygous REF/ALT, 0/0 = homozygous-REF, 1/1 = homozygous-ALT
  • 51: mean kmer coverage for the REF allele
  • 10: mean kmer coverage for the ALT allele
  • 20: genotype confidence (Phred-scaled)
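
As a rough illustration of the field layout described above, here is a minimal Python sketch that pairs the FORMAT keys with one sample's values. The helper names `parse_sample` and `describe` are made up for this example and are not part of the software; it assumes haploid calls and treats `.` as a missing value.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values; '.' becomes None."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return {k: (None if v == "." else v) for k, v in zip(keys, values)}


def describe(call):
    """Summarise a haploid call (hypothetical helper, haploid GT only)."""
    if call["GT"] is None:
        return "not genotyped"
    allele = "REF" if call["GT"] == "0" else "ALT"
    gq = call["GQ"] if call["GQ"] is not None else "not reported"
    return (f"{allele} (ref kmer cov {call['K61R']}, "
            f"alt kmer cov {call['K61A']}, GQ {gq})")


call = parse_sample("GT:K61R:K61A:GQ", "0:51:10:20")
print(describe(call))
print(describe(parse_sample("GT:K61R:K61A:GQ", ".:.:.:.")))
```

The second call shows how a `.:.:.:.` entry would be reported as not genotyped under these assumptions.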

breakpoints.join.plain.k31.k61.vcf is generated by the breakpoint calling algorithm (it has no genotype information). We run genotyping on it to generate breakpoints.joint.plain.k31.k61.geno.vcf.

bubbles and breakpoints are two different variant calling algorithms we have developed. Which is best depends on the quality of your reference, coverage, number of samples and repeat content of the genome in question.

Simply:

  • bubbles does de novo assembly of your samples and compares them to each other. Differences between samples are then mapped to the reference to make a VCF.
  • breakpoints does de novo assembly of your samples and compares them to the reference to find differences.
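
Because the two callers work differently, the sites they report need not match. A minimal sketch for listing which sites appear in only one of the two VCFs is below; the function name `vcf_sites` is hypothetical, the records are copied from this thread purely for illustration, and the code only compares CHROM, POS, REF and ALT.

```python
def vcf_sites(lines):
    """Collect (CHROM, POS, REF, ALT) for every record, skipping headers."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split()[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


# Records taken from the output quoted earlier in this thread.
bubble_lines = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ck01",
    "NC_020260.1 220 . G C . PASS BUBBLE=41257;K31 GT:K61R:K61A:GQ 1:57:0:.",
    "NC_020260.1 839 . T C . PASS BUBBLE=15255;K31 GT:K61R:K61A:GQ 1:66:0:.",
]
breakpoint_lines = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ck01",
    "NC_020260.1 1103701 . A C . PASS BRKPNT=1507;K31;AC=1;AN=1 GT:K61R:K61A:GQ 0:51:10:20",
]

bubble = vcf_sites(bubble_lines)
brkpnt = vcf_sites(breakpoint_lines)

only_in_bubble = bubble - brkpnt       # called by bubbles only
only_in_breakpoint = brkpnt - bubble   # called by breakpoints only
shared = bubble & brkpnt               # called by both

print(sorted(only_in_bubble))
print(sorted(only_in_breakpoint))
print(sorted(shared))
```

To run this on real files, pass an open file handle to `vcf_sites` instead of a list of strings.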

@abeu9727
Author

This explanation is very helpful. Thank you.

@abeu9727
Author

abeu9727 commented May 9, 2017

Hi Isaac,

We are still having issues with the output file. I have sent you a few emails that include the file output. I have rerun the program after applying your update and it seems to have resolved the issue of genotyping for some samples but not others. Some help with this issue would be greatly appreciated.

Thanks
