Missing data in ldNe #36

Open
jdalapicolla opened this issue Jun 8, 2020 · 4 comments

@jdalapicolla

Hello Eric,

My name is Jeronymo, and I'd like to estimate Ne using strataG in the same way as NeEstimator. My dataset has 6.3% missing data and represents a single panmictic population. When I used the function ldNe():

```r
Ne <- ldNe(snps_gtypes, maf.threshold = 0, by.strata = TRUE, ci = 0.95, drop.missing = TRUE, num.cores = 4)
```

Ne is estimated excluding missing values, giving the same values as NeEstimator v2.1. When I used "drop.missing = TRUE", NULL is returned:

```
Warning message:
Can't compute ldNe in '1' because loci are missing genotypes and 'drop.missing = FALSE'. NULL returned.
```

I would like your opinion on how I should proceed:

  1. Am I doing something wrong? My script is here: https://github.com/jdalapicolla/Ne_StrataG.R/blob/master/Ne_Estimation.R
  2. Should I impute missing values using the mode or the mean? If so, which functions or packages do you recommend?
  3. Should I use NeEstimator when there are missing values in my dataset?
  4. Is there any way to implement the NeEstimator's solution for missing values in the function ldNe?
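For question 2, here is a minimal R sketch of per-locus mode or mean imputation. It assumes genotypes are stored as a numeric matrix of 0/1/2 allele counts, with individuals in rows, loci in columns, and NA for missing calls. `impute_genotypes()` is a hypothetical helper, not a strataG function, and the sketch takes no position on whether imputation is appropriate for LD-based Ne estimation.

```r
# Hypothetical helper (not part of strataG): fill missing genotypes locus by
# locus. Assumes geno is a numeric matrix (individuals x loci) of 0/1/2
# allele counts, with NA marking missing calls.
impute_genotypes <- function(geno, method = c("mode", "mean")) {
  method <- match.arg(method)
  for (j in seq_len(ncol(geno))) {
    col <- geno[, j]
    miss <- is.na(col)
    # Skip loci with nothing to fill, or with no observed genotypes to use
    if (!any(miss) || all(miss)) next
    fill <- if (method == "mean") {
      mean(col, na.rm = TRUE)
    } else {
      counts <- table(col)  # table() ignores NA by default
      # Most frequent genotype; ties go to the smallest genotype value
      as.numeric(names(counts)[which.max(counts)])
    }
    geno[miss, j] <- fill
  }
  geno
}
```

Mean imputation produces non-integer genotype values (for example 1/3), and both methods fill every missing call at a locus with the same constant, so the filled-in values carry no information about associations with other loci.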

Thank you so much for your time,
Best regards,
Jeronymo Dalapicolla

@EricArcher
Owner

First, is there a typo above? Do you get the warning message and NULL returned when you have 'drop.missing = FALSE'? Your text says that it works when you set it to TRUE, but then says that you get the message when you set it to TRUE.

If it works when 'drop.missing = TRUE', but you get the warning message when 'drop.missing = FALSE', you are not doing anything wrong. Do you get the same values as NeEstimator when you use 'drop.missing = TRUE'? I would hope so as I believe NeEstimator is dropping missing values. I do not think they impute them somehow. The option in the ldNe() function is there to be explicit that missing data is being dropped.

@jdalapicolla
Author

Hi Eric,

Yes, it's a typo, I'm so sorry. The correct version is: when I used "drop.missing = TRUE", Ne is estimated correctly, giving the same values as NeEstimator v2.1 on data without missing values. When I used "drop.missing = FALSE", NULL is returned.

> Do you get the same values as NeEstimator when you use 'drop.missing = TRUE'?

Actually, I got different values with my dataset containing missing data, so I searched for an explanation. The NeEstimator help documentation says that they don't drop missing data but instead correct the LD estimate following Peel et al. (2013):

> "Based on findings from a simulation study (Peel et al. 2013), the software implements a fixed-inverse variance-weighted harmonic mean correction for missing data for the linkage disequilibrium and temporal methods (refer to methodological outlines below).
>
> The new method for correcting for missing data calculates or for each locus or locus pair (using the sample size for that locus) and then computes a weighted harmonic mean effective size across all loci, with weights proportional to the number of independent comparisons. If sample size is identical across loci, this should produce a result identical to previous methods (eg Waples & Do 2008)."
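One way to write down the weighted harmonic mean described in this passage, assuming per-locus-pair estimates $\hat{N}_{e,i}$ and weights $w_i$ equal to the number of independent allele comparisons for pair $i$ (the exact weight definition is an assumption here; the quoted text only says the weights are proportional to the number of independent comparisons):

$$
\hat{N}_e = \frac{\sum_{i} w_i}{\sum_{i} w_i / \hat{N}_{e,i}}, \qquad w_i = (k_{i,1} - 1)(k_{i,2} - 1)
$$

where $k_{i,1}$ and $k_{i,2}$ are the numbers of alleles at the two loci in pair $i$. For biallelic SNPs every $w_i = 1$, and whenever all weights are equal this reduces to the ordinary harmonic mean of the $\hat{N}_{e,i}$.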

So I removed all missing data and reran both ldNe() (with drop.missing = TRUE) and NeEstimator; this time, with no missing data, both results were identical.

> I would hope so as I believe NeEstimator is dropping missing values. I do not think they impute them somehow.

Yes, you're right. They don't impute them; if I understood correctly, they correct for the sample size at each locus.
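A minimal R sketch of combining per-locus-pair Ne estimates with a weighted harmonic mean, in the spirit of the NeEstimator documentation quoted above. The helper names `indep_comparisons()` and `weighted_harmonic_mean()` are hypothetical, the weight (k1 - 1)(k2 - 1) is an assumption, and this is not the code used by NeEstimator or strataG::ldNe().

```r
# Hypothetical helper: assumed number of independent allele comparisons for
# a locus pair with k1 and k2 alleles. Biallelic SNP pairs give 1.
indep_comparisons <- function(k1, k2) {
  (k1 - 1) * (k2 - 1)
}

# Weighted harmonic mean of positive estimates x with positive weights w:
# sum(w) / sum(w / x). Infinite estimates contribute 0 to the denominator;
# negative estimates (possible with LD methods) are not handled here.
weighted_harmonic_mean <- function(x, w) {
  stopifnot(length(x) == length(w), all(x > 0), all(w > 0))
  sum(w) / sum(w / x)
}

# Toy usage with made-up per-pair estimates (not real results)
ne_pairs <- c(100, 200, 400)
w <- indep_comparisons(c(2, 2, 3), c(2, 3, 3))  # weights 1, 2, 4
weighted_harmonic_mean(ne_pairs, w)
```

With equal weights the function returns the ordinary harmonic mean, which is why the weighting only matters when loci differ in their number of alleles.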

> The option in the ldNe() function is there to be explicit that missing data is being dropped.

I see. So you would not recommend using datasets with missing data for Ne estimation?

The reference is:
Peel D, Waples RS, Macbeth GM, Do C, Ovenden JR (2013) Accounting for missing data in the estimation of contemporary genetic effective population size (Ne). Molecular Ecology Resources 13(2):243-253. doi:10.1111/1755-0998.12049

Thank you again,
Jeronymo

@EricArcher
Owner

You're right that strataG::ldNe() does not currently implement this correction from Peel et al. (2013). When I have some time, I could look into it further and see if it makes sense to do so.
If your missing data are missing completely at random (MCAR), I would think that dropping them should produce an unbiased estimate. If you have enough loci, it shouldn't affect the precision of the estimate too much.
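As a rough illustration of why the number of loci matters, here is a minimal R sketch of the crudest form of dropping missing data (complete-case filtering). It assumes a numeric genotype matrix with individuals in rows, loci in columns, and NA for missing calls. All helper names are hypothetical, and this is not how strataG's drop.missing option is implemented.

```r
# Hypothetical helpers (not part of strataG): complete-case filtering of a
# genotype matrix with individuals in rows, loci in columns, NA = missing.

# Keep only loci (columns) with no missing genotypes.
drop_missing_loci <- function(geno) {
  geno[, colSums(is.na(geno)) == 0, drop = FALSE]
}

# Keep only individuals (rows) with no missing genotypes.
drop_missing_individuals <- function(geno) {
  geno[rowSums(is.na(geno)) == 0, , drop = FALSE]
}

# Expected fraction of loci surviving drop_missing_loci() if each genotype is
# missing independently (MCAR) with probability p_missing and n individuals
# are genotyped at every locus.
expected_complete_loci <- function(p_missing, n) {
  (1 - p_missing)^n
}
```

For example, with a 6.3% missing rate and a hypothetical sample of 30 individuals, `expected_complete_loci(0.063, 30)` is about 0.14, so discarding every locus with any missing call would throw away most loci. Approaches that use whatever genotypes are available for each locus or locus pair, like the one described in the NeEstimator documentation, retain considerably more data.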

@jdalapicolla
Author

Thank you so much for your reply!
