Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What input should be provided to MetaSTAAR_merge? #5

Closed
Hepit opened this issue Apr 30, 2024 · 8 comments
Closed

What input should be provided to MetaSTAAR_merge? #5

Hepit opened this issue Apr 30, 2024 · 8 comments

Comments

@Hepit
Copy link

Hepit commented Apr 30, 2024

Hello,

I've generated sumstat & cov files for my two cohorts using MetaSTAAR_worker_cov() & MetaSTAAR_worker_sumstat(), but I'm now unsure how to feed those files to MetaSTAAR_merge(), which seems to be the next step in the pipeline. The main point of my confusion is that MetaSTAAR_merge() has no argument where I can specify the name under which I saved the sumstat & cov files; it only has arguments where I can specify the directory. (For the record, I'm using version 0.9.6.3 of the R package.)

I also notice that the scripts that accompany your 2023 article don't seem to make use of MetaSTAAR_merge() at all, and rather use a custom MetaSTAAR_merge_simulation() function instead...

Could you shed some light on how these functions are supposed to be used?

Thanks !

@xihaoli
Copy link
Owner

xihaoli commented May 2, 2024

Hi @Hepit,

Thanks for reaching out. MetaSTAAR_merge_simulation() is specifically designed for performing simulation studies in the MetaSTAAR paper. It has the same core structure as MetaSTAAR_merge() but was simplified so that it could directly take the loaded sumstat.list and cov.list objects as input.

If you are running a meta-analysis from real data, please feel free to contact us via email and we can share more details.

Best,
Xihao

@Hepit
Copy link
Author

Hepit commented May 2, 2024

@xihaoli Thanks for the quick response ! Until I need to share data, I might keep the conversation here, if that is ok with you, just in case other people have the same questions I do.

Understood about MetaSTAAR_merge_simulation(). In that case, let me formulate my question differently : what is the sequence of functions that should be called if one is trying to use this R package to perform meta-analysis?

Currently, I have run MetaSTAAR_worker_cov() & MetaSTAAR_worker_sumstat() on my variantset for each of my two cohorts; I then saved the 4 resulting R objects as .rda files (using save().) My understanding is that the next step would be to pass them to MetaSTAAR_merge(); is this correct? If so, how can I do this, given that there is no filename argument in MetaSTAAR_merge()? If not, what would be the next step?

Thanks a lot for your time and advice !

@xihaoli
Copy link
Owner

xihaoli commented May 4, 2024

Hi @Hepit,

Sure no problem. In this case, let me ask a quick question. For MetaSTAAR_worker_cov(), how did you specify the input parameter of region_midpos? In other words, is the output of the function (sparse weighted LD matrix) a square (symmetric) matrix or a rectangle matrix? Knowing this would be more informative in answering your question.

Best,
Xihao

@Hepit
Copy link
Author

Hepit commented May 5, 2024

Hi @xihaoli,

I wasn't sure exactly what was wanted for that parameter, so the value I used was simply the median position of the variant set - but if there's a better way to go about it, I'm very happy to redo it.

I took a look at the .rda cov file, and the matrix is rectangular.

@xihaoli
Copy link
Owner

xihaoli commented May 5, 2024

Hi @Hepit,

Thanks for letting me know. MetaSTAAR has several general use cases:

(1) generate the cov files for all possible variants across the genome, so that meta-analysis of any variant set can be default after the cov files are shared. In this case, the cov files are rectangular matrices and region_midpos should be the midpoint position of the longer arm of the rectangle;

(2) generate the cov files for pre-defined variant sets, so that meta-analysis is performed for the underlying variant sets. In this case, the cov files are square matrices. Note that this is a special case of (1) by setting region_midpos in MetaSTAAR_worker_cov() to be the position of the last variant in the set.

To clarify, MetaSTAAR_merge() assumes that the cov files are rectangular, generated from the general case (1) above; whereas MetaSTAAR_merge_simulation() assumes that the cov files are square, generated from the case (2) above. If you have already had your variant set defined and genotype extracted, you can use MetaSTAAR_worker_cov() to generate the square cov matrices by setting region_midpos to be the last variant's position (see the file type1_sim_cluster_2kb_arrayjob_binary_1e9_N=50000_meta.R as an example), and then run MetaSTAAR_merge_simulation() for your meta-analysis.

Hope this helps.

Best,
Xihao

@Hepit
Copy link
Author

Hepit commented May 5, 2024

Hi @xihaoli,

Thanks, that does clarify a few things ! Still a few points of confusion though :

In case (1), what does the "rectangle" (and its "longer arm") refer to? If I'm passing an entire chromosome to MetaSTAAR_worker_cov(), is the midpoint simply the position in the middle of the chromosome?

Also regarding case (1) : does this mean I can generate a single cov file for the whole genome, and define my variant sets of interest later? If so, when (in which function) do I define them? And should I still use MetaSTAAR_merge_simulation() if I go this route?

Sorry for all these questions, and thanks a lot for your patience in answering them !

@xihaoli
Copy link
Owner

xihaoli commented May 5, 2024

Hi @Hepit,

The "rectangle" (and its "longer arm") refers to the illustration in Panel b of the Workflow Figure of the MetaSTAAR paper. The segment.size parameter in the MetaSTAAR_worker_cov() function specifies the chunk size. The midpoint would then be the midpoint position of the segment, which can also be illustrated in the workflow figure.

Yes, you can generate a single cov file for the whole genome, but it also depends on how large your dataset is. In practice, you can generate cov matrices with sufficiently large bandwidth (segment.size) so that they are large enough to cover all variant sets afterward. We have been preparing a follow-up package that utilizes the MetaSTAAR package as a building block and automates all the processes from generating summary stat / cov files to defining the variant sets and performing meta-analysis. The package is in the final testing stage, and I can add you to the repository so that you can make use of the full functionality there.

Let me know your plan.

Best,
Xihao

@Hepit
Copy link
Author

Hepit commented May 7, 2024

Hi @xihaoli,

Thanks for the explanation, that clarifies so much !

I'd love to be added to the repo. Thanks again !

@Hepit Hepit closed this as completed May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants