invalid genotypes #1

Open
brettva opened this issue Jun 30, 2023 · 6 comments

@brettva

brettva commented Jun 30, 2023

Thank you for developing this tool, it will be quite handy for us.

In my merges I have been getting invalid genotypes (e.g. 0/-44), in addition to a mixture of phased and unphased sites.

I imputed some publicly available HGDP samples on MIS to demonstrate this issue here

Do you have any advice on how to proceed? Hopefully I am not doing something silly. Thanks
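For anyone who wants to check their own merged output for the same symptoms, here is a minimal sketch (not part of hds-util) that counts invalid GT values and phased versus unphased calls. It assumes uncompressed, tab-delimited VCF text with diploid genotypes; the function name `audit_genotypes` and the example record are hypothetical and made up for illustration.

```python
import re

# Assumption: diploid calls only. A valid GT is two non-negative allele
# indices (or "." for missing) joined by "/" (unphased) or "|" (phased).
GT_PATTERN = re.compile(r"^(\d+|\.)([/|])(\d+|\.)$")


def audit_genotypes(vcf_lines):
    """Count invalid, phased, and unphased GT values in VCF text lines."""
    counts = {"invalid": 0, "phased": 0, "unphased": 0}
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")  # column 9 is FORMAT
        if "GT" not in fmt_keys:
            continue
        gt_index = fmt_keys.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else ""
            match = GT_PATTERN.match(gt)
            if match is None:
                counts["invalid"] += 1
            elif match.group(2) == "|":
                counts["phased"] += 1
            else:
                counts["unphased"] += 1
    return counts


# Made-up record with one phased, one invalid, and one unphased call.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
    "chr20\t100\t.\tA\tG\t.\tPASS\t.\tGT:HDS\t0|1:0.01,0.98\t0/-44:0,0\t0/0:0,0\n",
]
print(audit_genotypes(example))
# prints {'invalid': 1, 'phased': 1, 'unphased': 1}
```

For real files, the same function can be fed an open file handle (for example from `gzip.open(path, "rt")` for a bgzipped VCF), since it only iterates over lines.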

@jonathonl
Contributor

jonathonl commented Jun 30, 2023

Thanks for providing this example. I'm not seeing the same output as you. What operating system are you running on (and what operating system did you compile hds-util on)? I'd like to reproduce your environment.

@brettva
Author

brettva commented Jun 30, 2023

@jonathonl Thank you so much for getting back to me so fast. I tried compiling and running independently on both the csg and armis clusters, and I see the issue in both environments.

I am not sure what details would be most helpful for you, but here are a few:

csg:

lsb_release -a:

LSB Version:    core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

gcc --version:
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

armis:

lsb_release -a:

LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description:    Red Hat Enterprise Linux release 8.6 (Ootpa)
Release:        8.6
Codename:       Ootpa

gcc --version:
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)

Not sure if it matters, but I believe in both cases the version of sav that was available at the time of compiling was:

sav v2.1.0

If you need any other info please let me know

@jonathonl
Contributor

This should now be fixed with 763bb2d. Please rebuild with the latest from the master branch.

@brettva
Author

brettva commented Jun 30, 2023

@jonathonl I really appreciate the time you put into fixing that, especially so fast. It looks better on my end now.

Another quick question, and sorry if I am missing it somewhere. We certainly want MAF and Rsq recomputed in our merged data, but what is the point of recomputing DS, GT, GP from HDS?

Is it just so that these numbers can be recapitulated from the HDS that appears in the VCF? Regardless, is it always recommended to update DS, GT, GP with -f DS, GT, GP when merging? IIUC from this issue, at least Rsq is originally based on more precise HDS than what is seen in the VCF.

@jonathonl
Contributor

It's recomputed for the sake of simpler code. There is a plan for future versions of the imputation server to only export HDS in the output files in order to reduce compute and storage costs. Most people don't need all four FORMAT fields, so hds-util allows you to generate only the fields needed by a user for downstream analysis.
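To illustrate why the other FORMAT fields can be regenerated from HDS alone, here is a minimal sketch for a biallelic, diploid site. It is not hds-util's actual code: the function name `fields_from_hds` is hypothetical, GP is derived by assuming the two haplotypes are independent, and GT is called by thresholding each haplotype dosage at 0.5, which is also an assumption.

```python
def fields_from_hds(h1, h2):
    """Derive DS, GP, and GT from two haplotype dosages (biallelic, diploid).

    Assumptions (not taken from hds-util): haplotypes are independent,
    and a haplotype carries ALT when its dosage is >= 0.5.
    """
    # DS is the expected ALT allele count.
    ds = h1 + h2
    # GP is (P(0/0), P(0/1), P(1/1)) under independence of the haplotypes.
    gp = (
        (1 - h1) * (1 - h2),
        h1 * (1 - h2) + h2 * (1 - h1),
        h1 * h2,
    )
    # GT keeps phase because each haplotype is called separately.
    gt = f"{int(h1 >= 0.5)}|{int(h2 >= 0.5)}"
    return ds, gp, gt


ds, gp, gt = fields_from_hds(0.0, 1.0)
print(ds, gp, gt)
# prints 1.0 (0.0, 1.0, 0.0) 0|1
```

Because every field is a pure function of HDS, storing HDS only and deriving the rest on demand loses nothing beyond what the HDS precision already limits.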

In the latest version of Minimac4, the Rsq is computed after the precision loss. But Imputation Server is still using the older version so that issue would still apply. In any case, the median difference is quite small and I suspect it would have negligible effects on Rsq filtering strategies.
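As a rough way to see how little the precision loss matters, the sketch below compares Rsq computed from full-precision haplotype dosages with Rsq computed after rounding. It assumes the commonly described Minimac-style estimate (variance of the haplotype dosages divided by p(1 - p), where p is the mean dosage) and assumes HDS is written with three decimal places; both should be checked against the Minimac4 source. The function name `estimated_rsq` and the simulated dosages are hypothetical.

```python
import random


def estimated_rsq(haplotype_dosages):
    """Assumed Minimac-style Rsq: var(dosage) / (p * (1 - p))."""
    n = len(haplotype_dosages)
    p = sum(haplotype_dosages) / n  # estimated ALT allele frequency
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic site: Rsq is undefined, report 0
    variance = sum((d - p) ** 2 for d in haplotype_dosages) / n
    return variance / (p * (1 - p))


# Simulated dosages, made up for illustration only.
rng = random.Random(0)
full_precision = [
    rng.uniform(0.8, 1.0) if rng.random() < 0.2 else rng.uniform(0.0, 0.1)
    for _ in range(2000)
]
# Assumption: the VCF stores HDS rounded to three decimal places.
rounded = [round(d, 3) for d in full_precision]

rsq_full = estimated_rsq(full_precision)
rsq_rounded = estimated_rsq(rounded)
print(f"full precision: {rsq_full:.6f}")
print(f"rounded HDS:    {rsq_rounded:.6f}")
print(f"difference:     {abs(rsq_full - rsq_rounded):.2e}")
```

On simulated data like this the two values differ only slightly, which is consistent with the point above that Rsq filtering is unlikely to be affected in practice.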

@brettva
Author

brettva commented Jun 30, 2023

@jonathonl That makes a lot of sense, thanks.
