This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Memory usage #4

Open
TaruMuranen opened this issue Apr 25, 2018 · 3 comments

Comments

@TaruMuranen

Hi,

I need to find proxies (r2 > 0.8) for about 8,000 SNPs from about 300 genomic regions (a region is defined so that the distance between consecutive SNPs is less than 1,000,000 bases). Calling get_proxies once per SNP, in a for-loop or with apply, requires an enormous amount of memory (10 GB is reached after about 50 SNPs). Apparently get_proxies calls get_vcf, which downloads huge data files from the web.
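To make the region definition above concrete, here is a minimal sketch in Python (not part of proxysnps, which is an R package). It assumes the SNPs are available as (chromosome, position) pairs; the function name `group_into_regions` and the example positions are hypothetical.

```python
from collections import defaultdict


def group_into_regions(snps, max_gap=1_000_000):
    """Group (chrom, pos) pairs into regions.

    A new region starts whenever the distance to the previous SNP on the
    same chromosome is max_gap bases or more.
    Returns a list of (chrom, start, end, snp_count) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    regions = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev < max_gap:
                count += 1  # still inside the current region
            else:
                regions.append((chrom, start, prev, count))
                start = pos  # open a new region at this SNP
                count = 1
            prev = pos
        regions.append((chrom, start, prev, count))
    return regions


# Made-up positions, for illustration only.
example = [("1", 100), ("1", 500_000), ("1", 2_000_000), ("2", 42)]
print(group_into_regions(example))
```

Each resulting region could then be fetched once, instead of once per SNP.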

Is there any way to free memory after each SNP? Or should I download all required data in advance and store it locally? How would I then run get_proxies?

Or would you suggest a better way of finding the proxies?
SNAP proxy search offers only the 1000 Genomes pilot data.
LDlink does not appear suitable for this many SNPs.
Both restrict the width of the search region.

Best wishes

/tm

@slowkow
Owner

slowkow commented Apr 30, 2018

Since you need proxies for 8000 SNPs, I would not recommend using proxysnps. It will download the same data and recompute the same statistics multiple times without caching any intermediate results.
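To illustrate what caching intermediate results could look like, here is a minimal Python sketch. It is not how proxysnps works: `load_region` is a hypothetical stand-in for an expensive genotype download, and the region coordinates are made up. A bounded cache also means older regions are evicted, which keeps memory use in check.

```python
from functools import lru_cache

download_count = 0  # counts simulated downloads, for illustration only


@lru_cache(maxsize=4)  # keep at most 4 regions in memory at once
def load_region(chrom, start, end):
    """Hypothetical loader standing in for an expensive genotype download."""
    global download_count
    download_count += 1
    return ("genotypes", chrom, start, end)  # placeholder payload


def region_for(chrom, pos, regions):
    """Return the (chrom, start, end) region containing pos, or None."""
    for r_chrom, start, end in regions:
        if r_chrom == chrom and start <= pos <= end:
            return (r_chrom, start, end)
    return None


# Three SNPs in the same made-up region trigger a single download.
regions = [("1", 1_000_000, 3_000_000)]
for pos in (1_200_000, 1_500_000, 2_900_000):
    load_region(*region_for("1", pos, regions))
print(download_count)
```

The design choice here is to key the cache on the region rather than on the SNP, so every SNP in a region reuses the same data.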

As you suggested, I would recommend downloading all of the genotype data and storing it locally. Right now, get_proxies() does not support querying local files, but this feature should be easy to add. If I find the time to add this feature, I'll reply to this issue and let you know.
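Once the data is stored locally, finding proxies comes down to computing r2 between the query SNP and every other SNP in the region. Here is a minimal Python sketch, assuming phased haplotypes coded as 0/1 alleles and already loaded into memory. The names `r_squared` and `find_proxies` and the toy data are hypothetical; this is not the get_proxies() implementation.

```python
def r_squared(hap_a, hap_b):
    """LD r2 between two SNPs from phased haplotypes coded as 0/1 alleles."""
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # freq. of 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def find_proxies(query_id, haplotypes, min_r2=0.8):
    """Return {snp_id: r2} for SNPs with r2 above min_r2 to the query SNP."""
    query = haplotypes[query_id]
    proxies = {}
    for snp_id, hap in haplotypes.items():
        if snp_id == query_id:
            continue
        r2 = r_squared(query, hap)
        if r2 > min_r2:
            proxies[snp_id] = r2
    return proxies


# Toy data: 8 haplotypes per SNP, made up for illustration.
haplotypes = {
    "snp_a": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp_b": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snp_a
    "snp_c": [1, 1, 0, 0, 1, 0, 1, 0],  # alleles flipped relative to snp_a
    "snp_d": [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly correlated with snp_a
}
print(find_proxies("snp_a", haplotypes))
```

With 8,000 query SNPs, the haplotypes for a region would be loaded once and reused for every query SNP in that region.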

For now, here's another approach that you might consider:

https://gist.github.com/slowkow/3d13aa44cf4f65ca9ad2a0570346ba05

@TaruMuranen
Author

Thanks for sharing your code.
I'll try this.

@slowkow
Owner

slowkow commented May 3, 2018

You're very welcome! Please let me know if you run into any issues.
