Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to give ploidy a non missing value in a dosage data set #86

Open
nickychapman opened this issue Oct 9, 2023 · 3 comments
Open

how to give ploidy a non missing value in a dosage data set #86

nickychapman opened this issue Oct 9, 2023 · 3 comments

Comments

@nickychapman
Copy link

nickychapman commented Oct 9, 2023

Background: I've used imputedDosageFile to get my Beagle output in a format I can use with SeqArray. Then I used seqSNP2GDS to make a file for use with SeqVarTools. This object only includes dosage data, no genotypes. I am running into problems using GENESIS because it doesn't like that ploidy is NA. How can I tweak the object so ploidy is non missing? I think I probably need to put some dummy data in the genotype node but I can't figure that out.

Here is what my object looks like.

genodata
Object of class "SeqVarGDSClass"
File::panel_01_chr22.dose.array.gds (267.7K)

  • [ ] *
    |--+ description [ ] *
    |--+ sample.id { Str8 628 LZMA_ra(22.3%), 1.2K } *
    |--+ variant.id { Int32 429 LZMA_ra(24.4%), 425B } *
    |--+ position { Int32 429 LZMA_ra(79.4%), 1.3K } *
    |--+ chromosome { Str8 429 LZMA_ra(7.30%), 101B } *
    |--+ allele { Str8 429 LZMA_ra(22.0%), 385B } *
    |--+ genotype [ ] *
    |--+ phase [ ]
    |--+ annotation [ ]
    | |--+ id { Str8 429 LZMA_ra(24.0%), 393B } *
    | |--+ qual { Float32 429 LZMA_ra(5.71%), 105B } *
    | |--+ filter { Int32,factor 429 LZMA_ra(5.71%), 105B } *
    | |--+ info [ ]
    | --+ format [ ]
    | --+ DS [ ] *
    | --+ data { Float64 628x429 LZMA_ra(12.4%), 260.1K } *
    --+ sample.annotation [ ]
@smgogarten
Copy link
Contributor

Currently SeqVarTools determines ploidy of a GDS file using the first dimension of the genotype node: https://github.com/smgogarten/SeqVarTools/blob/devel/R/AllUtilities.R#L20

Is there an alternate way to get the ploidy if that value is missing?

@nickychapman
Copy link
Author

Here's some background to this predicament, in case that is where the problem lies:

I've been using imputedDosageFile (from GWASTools) and then seqSNP2GDS (from SeqArray) to go from a Beagle output file of dosages to a SeqVarGDS object.

In an older version of R (4.1.2), using GWASTools 1.40.0 and seqArray 1.34.0 this works fine and in the resulting object the ploidy is 2.

In a newer version of R (4.3.1), using GWASTools 1.46.6 and SeqArray 1.40.1, the resulting object has ploidy NA. This trips me up when I go to use assocTestSingle in GENESIS.

For now I am going to use the old version of R and proceed as I was before.

@zhengxwen
Copy link
Owner

Currently SeqVarTools determines ploidy of a GDS file using the first dimension of the genotype node: https://github.com/smgogarten/SeqVarTools/blob/devel/R/AllUtilities.R#L20

Is there an alternate way to get the ploidy if that value is missing?

In seqAlleleFreq(), if ploidy is NA (no genotype node, but a numeric dosage node), it will be set to be

ploidy <- getOption("seqarray.ploidy", 2L)

If the environment variable seqarray.ploidy is not defined, then just use 2 as a default value.

This might help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants