VPhaser output interpretation - strand bias p value #789

Open
cromozome opened this issue Feb 9, 2018 · 3 comments

@cromozome

cromozome commented Feb 9, 2018

Hello

I have three questions about the VPhaser output.

  1. How do I interpret the output below?

    Pos   Var  Cons  Strd_bias_pval  Type  Var_perc  SNP_or_LP_Profile
    1425  A    T     0.3718          snp   0.3173    A:3:3 C:3:1 G:1:1 T:1319:560
    1426  A    G     0.756           snp   0.2629    A:2:3 G:1332:561 T:1:3

Specifically, why do I get such different p-values for such similar allele count profiles?
My limited understanding is that a strand-bias p-value of 0.7 is stronger evidence of strand bias and 0.37 is weaker evidence, but that is hard to reconcile with allele counts this close. (I tried a 2x2 Fisher's exact test on the values above but got different p-values; I wonder if that is because VPhaser applies BH correction.)

  2. What does Var_perc in the output mean?

  3. I am assuming file.nofdr.var.txt is the file that has both the strand-bias and FDR corrections. Please correct me if I am wrong.

@tomkinsc
Member

tomkinsc commented Feb 9, 2018

The documentation for V-Phaser II can be found here:
http://software.broadinstitute.org/viral/docs/VPhaserII.pdf
From the docs, the output files are described:

{ReferenceName}.fdr.var.txt – the result of interest, where strand bias test + FDR (false discovery rate) correction were used. {ReferenceName} is the name of the reference in the input BAM file. Each variant entry consists of the following:
- the reference position (coordinate starts at 1)
- predicted variant
- consensus base
- strand-bias p-value
- type of variant (SNP or LP)
- frequency of the variant and the profile, where each entry consists of three values: the base, its count in the forward strand, and its count in the reverse strand, separated by colons.

{ReferenceName}.var.raw.txt – the raw variants without strand bias test.

{ReferenceName}.nofdr.var.txt – strand-bias test but no FDR correction
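
The column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of V-Phaser II: the `Variant` type and both function names are hypothetical, and it assumes whitespace-separated columns as in the rows quoted in this thread. It also recomputes the sixth column from the profile; for both quoted rows the variant's share of all reads at the position, in percent, matches the reported `Var_perc`, which suggests (but does not prove) that this is how the column is defined.

```python
from typing import Dict, NamedTuple, Tuple


class Variant(NamedTuple):
    """One row of a V-Phaser II *.var.txt file (hypothetical container)."""
    pos: int
    var: str
    cons: str
    strand_bias_pval: float
    kind: str
    var_perc: float
    profile: Dict[str, Tuple[int, int]]  # allele -> (forward count, reverse count)


def parse_variant_line(line: str) -> Variant:
    """Split one data row into typed fields; assumes whitespace-separated columns."""
    fields = line.split()
    profile = {}
    for entry in fields[6:]:
        # Each profile entry is allele:forward:reverse.
        allele, fwd, rev = entry.rsplit(":", 2)
        profile[allele] = (int(fwd), int(rev))
    return Variant(int(fields[0]), fields[1], fields[2],
                   float(fields[3]), fields[4], float(fields[5]), profile)


def variant_percent(v: Variant) -> float:
    """Variant reads as a percentage of all reads in the profile (assumed definition)."""
    total = sum(fwd + rev for fwd, rev in v.profile.values())
    fwd, rev = v.profile[v.var]
    return 100.0 * (fwd + rev) / total


rows = [
    "1425 A T 0.3718 snp 0.3173 A:3:3 C:3:1 G:1:1 T:1319:560",
    "1426 A G 0.756 snp 0.2629 A:2:3 G:1332:561 T:1:3",
]
for row in rows:
    v = parse_variant_line(row)
    # Compare the reported Var_perc with the value recomputed from the counts.
    print(v.pos, v.var_perc, round(variant_percent(v), 4))
```

For position 1425 the variant A has 3 + 3 = 6 reads out of 1891, i.e. about 0.3173 percent, so `Var_perc` is a percentage rather than a fraction.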

V-Phaser II was developed before my time at the Broad, so for methodological questions I'll point you to the original paper (10.1186/1471-2164-14-674), but perhaps @dpark01 can chime in. It does apply Benjamini-Hochberg correction, so perhaps that could explain the difference you're seeing in p-value.
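
For reference, the generic Benjamini-Hochberg adjustment looks like the sketch below. This is the textbook procedure only; which set of tests V-Phaser II pools, and what threshold it applies, is not stated in this thread, so the function name and usage here are illustrative assumptions rather than the tool's implementation.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, only to show the call.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An adjusted value is never smaller than the raw p-value it came from, so a BH step can only move p-values upward.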

@cromozome
Author

Thanks a lot Chris!
Just to check that I am interpreting the results correctly, in the lines below:
1425 A T 0.3718 snp 0.3173 A:3:3 C:3:1 G:1:1 T:1319:560
1426 A G 0.756 snp 0.2629 A:2:3 G:1332:561 T:1:3

Am I correct in interpreting the above as a ~37% chance of strand bias in the first row and a ~75% chance of strand bias in the second row?
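
A note on reading these numbers: a p-value is not the probability that strand bias is present. It is the probability of seeing forward/reverse counts at least this unbalanced if there were no strand bias, so smaller values mean stronger evidence of bias. A 2x2 Fisher's exact test on the counts can be sketched with the standard library as below. The table layout (variant forward/reverse against consensus forward/reverse) is an assumption, not confirmed as what V-Phaser II computes, so the results need not match the `Strd_bias_pval` column.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights, kept as exact integers so that
    # the "at least as extreme" comparison has no rounding issues.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    total = sum(weights.values())
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / total


# Rows: variant allele, consensus allele. Columns: forward, reverse.
p_1425 = fisher_exact_two_sided(3, 3, 1319, 560)   # A:3:3 vs T:1319:560
p_1426 = fisher_exact_two_sided(2, 3, 1332, 561)   # A:2:3 vs G:1332:561
print(p_1425, p_1426)
```

With only five or six variant reads, moving a single read between strands changes which tables count as "at least as extreme", so small counts can shift the p-value noticeably.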

@cromozome
Author

Also @tomkinsc, would you recommend that I use intrahost.py in the viral-ngs package instead? (It seems to provide the same functionality.)
