Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inquiry on PLACO Method from PLoS Genetics Paper (louchen2004@163.com) #2

Open
nvice111 opened this issue Nov 7, 2023 · 2 comments

Comments

@nvice111
Copy link

nvice111 commented Nov 7, 2023

I have used louchen2004@163.com to send an email to you containing all the questions mentioned earlier. Perhaps you could copy the content of the email response so that it can be visible to everyone on GitHub as well.

I am reaching out to discuss the PLACO technique introduced in your publication: Ray, D., Chatterjee, N. (2020) "A powerful method for pleiotropic analysis under composite null hypothesis identifies novel shared loci between Type 2 Diabetes and Prostate Cancer". PLoS Genetics 16(12): e1009218. I find the technique very useful and have several questions I hope you could clarify:

1.Can PLACO be used to analyze the relationship between a binary trait (such as ovarian cancer with case control) and a continuous variable (such as epigenetic age acceleration) with complete GWAS data, including beta, se, and pval?

2.Can PLACO be used to analyze between one continuous variable (such as accelerated epigenetic aging, with the unit being years) and another continuous variable (such as telomere length, with the unit being standard deviation change in log-transformed telomere length), both having complete GWAS data including beta, standard error (se), and p-value (pval)?

3.After de-correlating the Z-scores, can PLACO be used on datasets with substantial sample overlap, or almost complete overlap, such as two datasets from the UK Biobank?

4.In your GitHub, you mentioned that PLACO should not be used for highly correlated GWAS, stating "PLACO does not work well if the two traits are strongly correlated." Does this conflict with the approach taken in the paper published in JAMA Psychiatry titled "Role of the Gut-Brain Axis in the Shared Genetic Etiology Between Gastrointestinal Tract Diseases and Psychiatric Disorders: A Genome-Wide Pleiotropic Analysis," where PLACO analysis was conducted on trait pairs that exhibited high genetic correlation? Of course, they performed de-correlation of the Z-scores.

5.You mentioned in GitHub, "When samples are related, PLACO can use the summary statistics from EMMAX (or other univariate mixed model frameworks) to appropriately test for genetic associations." However, it seems that this step is not included in the PLACO R code. Is that the case, or is it that this step is not necessary?

6.You mentioned, "Harmonize the same effect allele across the two studies/traits so that Z-scores from the two datasets can be jointly analyzed appropriately using PLACO." For example, if I have traits A and B, and their effect alleles happen to be opposite, which trait's beta value should be flipped to be negative, and does the EAF need to be changed to 1-EAF? Or is it fine either way, just flipping the beta value of one trait? Is there anything else that needs to be flipped?
7.Because the computation time for PLACO is very lengthy, I believe the coding approach can be altered a bit. Since the range of the square of the Z score is from 0 to 80, we only need to first compute the VarZ for this pair of traits, then calculate each Z1Z2 (for calculating p.placo, only the absolute value of Z1Z2 is used, which ranges from 0 to 80, and I've set the interval at 0.001), and determine the corresponding p.placo value for this VarZ. This requires only 80,000 calculations, and the result will be the same as the original code (which requires computing the p.placo for each SNP in the entire GWAS, totaling millions of calculations). If you find any errors, please let me know. I have only changed the order of computation, not the calculation rules of PLACO.

Here is my code:

#First calculate VarZ, which is different for each pair of traits.

VarZ <- var.placo(Z.matrix.decor, P.matrix, p.threshold = 1e-04)
varz<-VarZ

print("Now let's start deriving the Zplus quantity.")

#Redefine the function, the meaning remains unchanged, z[1]*z[2] has been replaced with zplus
.pdfx<-function(x) besselK(x=abs(x),nu=0)/pi
.p.besselchange<-function(zplus, varz, AbsTol=1e-13){

p1<-2as.double(integrate(Vectorize(.pdfx),abs(zplus/sqrt(varz[1])),Inf,abs.tol=AbsTol)$value)
p2<-2
as.double(integrate(Vectorize(.pdfx),abs(zplus/sqrt(varz[2])),Inf,abs.tol=AbsTol)$value)
p0<-2*as.double(integrate(Vectorize(.pdfx), abs(zplus),Inf, abs.tol=AbsTol)$value)
pval.compnull<-p1+p2-p0
return(pval.compnull)
}

zplus_values <- seq(0, 80, by=0.001)

pb <- txtProgressBar(min = 0, max = length(zplus_values), style = 3)

list_results <- lapply(seq_along(zplus_values), function(i) {
  setTxtProgressBar(pb, i)
  p_value <- .p.besselchange(zplus = zplus_values[i], varz = varz)
  c(p.placo = p_value, Zplus = zplus_values[i])
})

close(pb)

results <- do.call(rbind, list_results)

results<-as.data.frame(results)
head(results)
    

Z.matrix.decor <- cbind(Z.matrix.decor, abs(Z.matrix.decor[,1] * Z.matrix.decor[,2]))
colnames(Z.matrix.decor)[3] <- "Zplus"
Z.matrix.decor[, "Zplus"] <- round(Z.matrix.decor[, "Zplus"], 3)
Z.matrix.decordataframe<-as.data.frame(Z.matrix.decor)
head(Z.matrix.decordataframe)
head(t)

combined_df <- cbind(t, Z.matrix.decordataframe)
head(combined_df)

combined_df$Zplus <- as.character(combined_df$Zplus)

#Convert to string type, otherwise there will be errors in matching floating-point numbers.
results$Zplus <- as.character(results$Zplus)

out_2 <- merge(combined_df, results, by = "Zplus", all.x = TRUE)
out_2$T.placo<-out_2$Zplus
head(out_2)

I appreciate your time and assistance in answering these questions and am looking forward to implementing the PLACO method in my research.

@Lloyd-LiuSiyi
Copy link

Lloyd-LiuSiyi commented Nov 27, 2023

6.You mentioned, "Harmonize the same effect allele across the two studies/traits so that Z-scores from the two datasets can be jointly analyzed appropriately using PLACO." For example, if I have traits A and B, and their effect alleles happen to be opposite, which trait's beta value should be flipped to be negative, and does the EAF need to be changed to 1-EAF? Or is it fine either way, just flipping the beta value of one trait? Is there anything else that needs to be flipped?

Hi @nvice111, when harmonizing two summary statistics I personally just flip one of the beta values to its negative, and EAF does not need to be flipped since EAF is not included in the matrix. Here is how I performed harmonization:

dat = merge(trait1, trait2, by=c("SNP", "chr"))
reversed_rows = (dat$A1.x == dat$A2.y) & (dat$A2.x == dat$A1.y)
dat$beta.y[reversed_rows] = -dat$beta.y[reversed_rows]
dat$A1.y[reversed_rows] = dat$A1.x[reversed_rows]
dat$A2.y[reversed_rows] = dat$A2.x[reversed_rows]

discard_rows = !(dat$A1.x %in% c(dat$A1.y, dat$A2.y) & dat$A2.x %in% c(dat$A1.y, dat$A2.y))
dat = dat[!discard_rows, ]

I look forward to your implementation.

@nvice111
Copy link
Author

@Lloyd-LiuSiyi

The EAF parameter does not participate in the calculation. I think your point of view is correct.

The following is the reply sent to me by the author of the PLACO method in an email on November 7, 2023, you can also refer to it:
Hi Lou,

Thanks for your interest in using PLACO. Responses below. Additionally, I’d suggest you read the paper thoroughly, including results from simulation experiments and materials in supplementary.

1.We are currently exploring this in a manuscript, and so I don’t have a definitive answer yet.
2.You can analyze two continuous traits as long as their overall correlation is low (e.g., Pearson correlation r^2 < 0.5). It is discussed in the paper.
3.No. If you have substantial sample overlap between 2 traits, PLACO will give you false positives even after you have decorrelated the Z-scores. This is discussed in the paper as well. Note, we are working on the manuscript to extend PLACO to include these situations.
4.I can’t comment on this since I have not read the paper you referred.
5.PLACO directly takes in summary statistics. It does not generate summary statistics for the user. What software you use to generate summary statistics (e.g. PLINK, EMMAX, trio) is up to the user as long as you get the necessary inputs needed for PLACO. I worked on a paper using PLACO on summary stats obtained from trio-based data on cleft; you may find it helpful (https://doi.org/10.1371/journal.pgen.1009584)
6.Which trait’s beta value you flip depends on which allele you choose as effect allele. Suppose trait 1 has effect allele A with beta=0.64 and trait 2 has effect allele T with beta=-0.12. If you decide to fix the effect allele of trait 2 for all variants as your effect allele for PLACO, then you need to flip the beta of trait 1 and also do 1-EAF for trait 1. You can similarly choose to fix the effect allele of trait 1 as your effect allele, in which case you flip those in trait 2.
7.Sorry I don’t have time to go through individual user’s code. I’d recommend not tinkering with PLACO internal code. Since PLACO is a single variant test implemented in R, it can be slow compared to other single variant methods implemented in say C++. If you have to speed up computation, you can break up chromosomes into chunks; also you can skip using apply() type R functions because they can slow you down for a large number of variants. Note, the parameter estimation part needs to be done on all variants together, so you can do that first; then have a separate code for applying PLACO on single variants from smaller chunks of chromosome parallelly.

Hope this helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants