Is it necessary to use the --merge-alleles parameter when doing genetic correlation in ldsc? #425

Open
dqq0404 opened this issue Mar 21, 2024 · 8 comments

Comments

@dqq0404

dqq0404 commented Mar 21, 2024

Hi,
I have two sets of summary statistics. Before estimating the genetic correlation, I intersected the SNPs of the two datasets and generated the sumstats.gz files. When I run --rg, it reports this error:

./ldsc.py \
  --ref-ld-chr eur_w_ld_chr/ \
  --out ... \
  --rg ...sumstats.gz,...sumstats.gz \
  --w-ld-chr eur_w_ld_chr/

Reading summary statistics from ...1_sumstats.gz ...
Read summary statistics for 2629833 SNPs.
Reading reference panel LD Score from eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 522341 SNPs remain.
After merging with regression SNP LD, 522341 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from ...2_sumstats.gz ...
Read summary statistics for 2689050 SNPs.
After merging with summary statistics, 522329 SNPs remain.
522328 SNPs with valid alleles.
ERROR computing rg for phenotype 2/2, from file ...2_sumstats.gz.
Traceback (most recent call last):
  File "...sumstats.py", line 409, in estimate_rg
    loop = _read_other_sumstats(args, log, p2, sumstats, ref_ld_cnames)
  File "...sumstats.py", line 441, in _read_other_sumstats
    loop['Z2'] = _align_alleles(loop.Z2, alleles)
  File "...sumstats.py", line 517, in _align_alleles
    raise KeyError(msg)
KeyError: 'Incompatible alleles in .sumstats files: AGAC. Did you forget to use --merge-alleles with munge_sumstats.py?'

When I use --merge-alleles with w_hm3.snplist, it works. I think the error may be caused by the unequal number of SNPs in the two sumstats.gz files. But with w_hm3.snplist, although the total number of SNPs written is the same, the number of SNPs with effect sizes still differs. These are the numbers of SNPs written for the two sumstats.gz files after using w_hm3.snplist:
1: Writing summary statistics for 1217311 SNPs (524794 with nonmissing beta)
2: Writing summary statistics for 1217311 SNPs (532061 with nonmissing beta)

How can this problem be solved? Do I have to use the --merge-alleles parameter?
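
A minimal sketch for locating the offending SNPs before running --rg. The four-letter string in the error (AGAC) appears to be the A1/A2 pair from the first file followed by the A1/A2 pair from the second, so A/G versus A/C cannot be reconciled by swapping or strand-flipping. The helper names, the dict layout, and the compatibility rule below are assumptions for illustration; this is not the check that ldsc itself implements.

```python
# Hypothetical helper: flag SNPs whose allele pairs disagree between two files.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def alleles_compatible(a1, a2, b1, b2):
    """True if (b1, b2) is the same pair as (a1, a2), either directly,
    with the alleles swapped, or on the opposite strand."""
    pair = {a1, a2}
    if {b1, b2} == pair:
        return True
    try:
        flipped = {COMPLEMENT[b1], COMPLEMENT[b2]}
    except KeyError:
        # Any allele outside A/C/G/T (for example an indel) is treated as incompatible.
        return False
    return flipped == pair

def find_incompatible(file1, file2):
    """file1 and file2 map SNP id -> (A1, A2); returns the shared SNP ids that clash."""
    shared = sorted(set(file1) & set(file2))
    return [s for s in shared if not alleles_compatible(*file1[s], *file2[s])]

# Toy data; rs2 reproduces the A/G versus A/C clash from the error message.
first = {"rs1": ("A", "G"), "rs2": ("A", "G"), "rs3": ("C", "T")}
second = {"rs1": ("G", "A"), "rs2": ("A", "C"), "rs3": ("G", "A")}
bad = find_incompatible(first, second)
print(bad)
```

Dropping the SNPs listed in `bad` from both inputs is one way to proceed without --merge-alleles, under the assumptions stated above.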

@aksarkar

@dqq0404 The code does not properly handle indels. The simplest solution is to remove indels from your input data.
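
A minimal sketch of the indel filter suggested above, assuming a tab-separated table with SNP, A1, and A2 columns. The function names and the toy rows are hypothetical, and real input files may use different column names.

```python
import csv
import io

VALID_BASES = {"A", "C", "G", "T"}

def is_snv(a1, a2):
    """True when both alleles are single, distinct nucleotides."""
    a1, a2 = a1.upper(), a2.upper()
    return a1 in VALID_BASES and a2 in VALID_BASES and a1 != a2

def drop_indels(rows):
    """Keep only rows whose A1/A2 describe a single-nucleotide variant."""
    return [r for r in rows if is_snv(r["A1"], r["A2"])]

# Toy input: rs2 and rs4 are indels in two common encodings.
example = (
    "SNP\tA1\tA2\tZ\n"
    "rs1\tA\tG\t1.2\n"
    "rs2\tAT\tA\t-0.4\n"
    "rs3\tC\tT\t0.1\n"
    "rs4\tD\tI\t0.9\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
kept = drop_indels(rows)
print([r["SNP"] for r in kept])  # ['rs1', 'rs3']
```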

@dqq0404
Author

dqq0404 commented Apr 6, 2024

> @dqq0404 The code does not properly handle indels. The simplest solution is to remove indels from your input data.

Thanks! I have solved this problem. Can I use the parameter --ref-ld-chr eur_w_ld_chr/ to calculate the gcov_intercept in order to estimate potential sample overlap? Or do I have to compute my own LD scores as the reference?

@aksarkar

aksarkar commented May 5, 2024

@dqq0404 No, you cannot use ldsc to calculate sample overlap since it assumes there was none.

The simplest way to estimate sample overlap from summary statistics is to compute the correlation between null z-scores (absolute value < 2).
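
A minimal sketch of the suggestion above, assuming the two z-score lists are already aligned to the same SNPs and the same effect allele. The function name is hypothetical; the threshold of 2 comes from the comment, and requiring both z-scores to fall below it is an assumption about how "null" is applied.

```python
import math

def null_z_correlation(z1, z2, threshold=2.0):
    """Pearson correlation of z-scores over SNPs where both |z| < threshold."""
    pairs = [(a, b) for a, b in zip(z1, z2)
             if abs(a) < threshold and abs(b) < threshold]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two null SNPs")
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        raise ValueError("z-scores have zero variance among null SNPs")
    return cov / math.sqrt(var1 * var2)

# Toy data: the fourth and fifth SNPs are excluded because one |z| is >= 2.
z_trait1 = [0.5, -1.0, 1.5, 3.2, -0.2]
z_trait2 = [0.4, -0.8, 1.1, 0.1, -2.5]
r = null_z_correlation(z_trait1, z_trait2)
print(round(r, 3))
```

In practice the correlation would be computed over many thousands of SNPs; the five-SNP input here only shows the filtering and the arithmetic.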

@dqq0404
Author

dqq0404 commented May 6, 2024 via email

@aksarkar

aksarkar commented May 6, 2024

@dqq0404 You can calculate the intercept of the regression using reference LD scores.

Note that LAVA is estimating the variance-covariance matrix of the effect sizes, not the sample overlap.

@dqq0404
Author

dqq0404 commented May 7, 2024 via email

@leoarrow1

If ldsc is applied to non-EUR GWAS summary data, does w_hm3.snplist still work (--merge-alleles w_hm3.snplist)?

@aksarkar

@leoarrow1 Yes, assuming that you are using reference weights computed for the same SNPs in w_hm3.snplist using reference genotypes of individuals of similar genetic ancestry.


3 participants