Paintor Pipeline. #2

Open
theboocock opened this issue Jun 11, 2015 · 10 comments

Comments

@theboocock

Hi,

I have been writing a pipeline to help with PAINTOR analyses. It automates annotation collection, LD calculation, and the other steps that make PAINTOR hard to use for the average biologist, which should give the method a much wider reach.

I don't know if this is of interest to you, but I will post the link to the repo anyway.
https://github.com/smilefreak/fine_mapping_pipeline
Thanks for creating this software.

@gkichaev
Owner

Hi James!

Many thanks for your interest in our framework. I've been meaning to set something similar up but haven't had the time. I really appreciate the pipeline!! Please do link it to the repo.

Regarding the NLopt failure, it has occasionally failed on me as well. What is the size of your locus?

@theboocock
Author

Hi,

I have added a reference to your repo and the citation to my README.md.

NLopt fails at different loci depending on, among other things, how large I make the window. I assume this is because the algorithm cannot converge, or something similar. I will find a specific failure today and upload it to this thread.

In my analysis pipeline I first run each locus independently, and if PAINTOR fails on a locus I delete that locus from input.files. I investigated the exact error by following this thread on NLopt:

http://permalink.gmane.org/gmane.science.analysis.nlopt.general/278

I have yet to determine the exact cause of the failure, but it looks like -10 was returned.

Cheers James.
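
The "run each locus on its own, then drop the failures" step described above could look roughly like the sketch below. It assumes input.files lists one locus name per line. `run_locus` is a hypothetical callable supplied by the caller (for example, a wrapper that launches PAINTOR on a single-locus input and returns True on success); it is not part of PAINTOR or of the linked pipeline.

```python
from typing import Callable, Iterable, List, Tuple


def split_loci_by_outcome(
    loci: Iterable[str],
    run_locus: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Run each locus independently and separate successes from failures.

    run_locus is a hypothetical, caller-supplied function: it should return
    True when the fine-mapping run for that locus finished, and either
    return False or raise when it did not (e.g. an optimizer failure).
    """
    passed: List[str] = []
    failed: List[str] = []
    for locus in loci:
        try:
            ok = bool(run_locus(locus))
        except Exception:
            # Treat any exception from the runner as a failed locus.
            ok = False
        (passed if ok else failed).append(locus)
    return passed, failed


def write_input_files(path: str, loci: Iterable[str]) -> None:
    """Write the surviving locus names, one per line (assumed file layout)."""
    with open(path, "w") as handle:
        for locus in loci:
            handle.write(locus + "\n")
```

Keeping the failed list around, rather than silently deleting those loci, makes it easier to come back later and inspect why they failed.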

@theboocock
Author

I have added two test files to the following Dropbox folder to illustrate the failure.

test1 and test1.LD contain a 200 kb locus. I am using every SNP with MAF > 2% in the European population, which gives 1057 SNPs in the region. Is this a problem? Other regions with more SNPs do not fail. This was run with -c 1; otherwise the runtime is unreasonable.

test2 and test2.LD contain a 100 kb locus with fewer SNPs (601). This locus has very large Z-scores; could such a highly significant locus be driving the problem?

https://www.dropbox.com/sh/quekc6qjn3nttrc/AAB4aqBLBY4IpvfLz1Iqlh1Na?dl=0

Cheers James

@gkichaev
Owner

Hi James,

I noticed you have a lot of zero-valued Z-scores in your file (almost 1/3). Is there a particular reason why you left these in? I would definitely recommend removing them, as (A) they will never be causal and (B) you will see a large boost in speed from reducing the number of SNPs (and probably more numerical stability).
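
A minimal sketch of that filtering, assuming the Z-scores and the LD matrix have already been loaded into plain Python lists in the same SNP order. The function name and the `min_abs_z` parameter are illustrative and not part of PAINTOR. The point it shows is that the LD matrix has to lose the same rows and columns as the Z-score vector, otherwise the two files no longer line up.

```python
from typing import List, Sequence, Tuple


def drop_uninformative_snps(
    z_scores: Sequence[float],
    ld: Sequence[Sequence[float]],
    min_abs_z: float = 0.0,
) -> Tuple[List[int], List[float], List[List[float]]]:
    """Keep only SNPs with |Z| > min_abs_z and subset the LD matrix to match.

    With the default min_abs_z of 0.0 this removes exactly the zero-valued
    Z-scores. Returns the kept indices, the kept Z-scores, and the LD
    submatrix restricted to the kept rows and columns.
    """
    if len(ld) != len(z_scores) or any(len(row) != len(z_scores) for row in ld):
        raise ValueError("LD matrix must be square and match the Z-score length")
    keep = [i for i, z in enumerate(z_scores) if abs(z) > min_abs_z]
    kept_z = [z_scores[i] for i in keep]
    kept_ld = [[ld[i][j] for j in keep] for i in keep]
    return keep, kept_z, kept_ld
```

Raising `min_abs_z` above zero turns the same function into a general "trim very low Z-scores" filter; the appropriate threshold is a judgment call and is not specified in this thread.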

@theboocock
Author

Hi Gleb,

Thanks, yes, I will remove all those zeros.

I have also worked out another problem, which I believe is on my end.

Sometimes the haplotype R value is completely different from the genotype R value I calculate in PLINK, and I think this has created many downstream problems for me with these fine-mapping tools. It is definitely a problem because the Z-scores point in the wrong direction relative to the genotype R value. I am not sure how this affects the results, but it is something I am solving now; it definitely makes a difference in the results of the newly published method CAVIARBF.

Do you know of a tool that will easily calculate the R value from haplotypes instead of genotypes? PLINK is no good here, as far as I know.
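
For illustration only, the signed R value between two SNPs can be computed directly from phased haplotypes as a Pearson correlation of 0/1 allele codes. The sketch below assumes each SNP is given as a list with one 0/1 entry per haplotype, where 1 marks the coded allele; the function name is hypothetical and this is not how any particular tool implements it.

```python
from math import sqrt
from typing import Sequence


def haplotype_r(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Signed correlation between two SNPs from phased 0/1 haplotypes.

    r = (p_ab - p_a * p_b) / sqrt(p_a (1 - p_a) p_b (1 - p_b)),
    where p_a and p_b are the frequencies of the allele coded 1 at each SNP
    and p_ab is the frequency of haplotypes carrying both.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("r is undefined when either SNP is monomorphic")
    return (p_ab - p_a * p_b) / denom
```

Swapping which allele is coded 1 at one of the two SNPs flips the sign of r while leaving its magnitude unchanged, so a sign disagreement between two LD estimates is often a sign of inconsistent allele coding rather than a real difference in LD.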

Will fix these things and get back to you.

Thanks for your great support.
Also, the trans-ethnic fine-mapping you have added looks very interesting.
Thanks James.

@gkichaev
Owner

Hi James,

Yes, I would say the most challenging thing is getting your LD to match up correctly.

If the genotypes and haplotypes are based on the same sample, it is strange that the correlation coefficient you obtain from haplotypes differs from the one you obtain from genotypes; mathematically they should be the same. Maybe you're having an issue with phasing?

Are you trying to calculate LD from a reference panel? If so, you need to make sure that the effect alleles for the Z-score calculation match the effect allele (i.e., the 1 allele) in the reference panel.
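
A minimal sketch of that allele check, assuming each SNP comes with its effect and other allele from the association study and with the coded (1) and other (0) allele from the reference panel. The function and parameter names are illustrative. Strand flips (A/T versus T/A on opposite strands, and so on) are deliberately not handled here.

```python
from typing import Optional


def align_z_to_reference(
    z: float,
    effect_allele: str,
    other_allele: str,
    ref_coded_allele: str,
    ref_other_allele: str,
) -> Optional[float]:
    """Return the Z-score expressed relative to the reference panel's 1 allele.

    - Same coding in both sources: Z is returned unchanged.
    - Alleles swapped: the sign of Z is flipped.
    - Alleles do not match at all: None, so the caller can drop the SNP.
    """
    study = (effect_allele.upper(), other_allele.upper())
    reference = (ref_coded_allele.upper(), ref_other_allele.upper())
    if study == reference:
        return z
    if study == (reference[1], reference[0]):
        return -z
    return None
```

Returning None for mismatched alleles, rather than guessing, keeps ambiguous SNPs out of both the Z-score file and the LD matrix.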

@theboocock
Author

Ahhh, got it: PLINK is flipping the alleles from the reference coding to minor/major, so I need to reset them.

I will just have to script that into my analysis. For anyone using PLINK to calculate the LD, this is likely to be a problem.

Great help.

Thanks

@ghost

ghost commented Oct 11, 2017

@theboocock @gkichaev - hey gentlemen, I am running v2.1 despite it being deprecated for specific reasons ...

I am getting the following error:

terminate called after throwing an instance of 'std::runtime_error'
what(): nlopt failure

This appears to happen when I run v2.1 with annotations, whether I am trying to get the marginal enrichment for each annotation or running the final model with annotated data. When I run it on unannotated variants, the algorithm runs to completion.

Is the best solution here to trim the average locus size (I am submitting ~90 loci) or to reduce the number of loci in the run?

@gkichaev
Owner

@vlaufer. My suggestion would be to trim. Maybe filter out variants that have very low Z-scores (say in the meta analysis).

--Gleb

@ghost

ghost commented Oct 12, 2017

@gkichaev yeah, this makes a lot of sense, particularly in light of your reasoning in the PAINTOR3 manuscript.

I'll give it a try, and thanks.
