
graph partitioning #31

Open
sademakn opened this issue May 14, 2021 · 2 comments
sademakn commented May 14, 2021

PyTorch-BigGraph has an option named 'num_partitions' that can reduce peak memory usage. Can Cleora provide that option too? Is it possible in the future?
My situation:
40M nodes
180M edges
more than 20 GB of peak memory usage to train Cleora embeddings!
I have also set --in-memory-embedding-calculation 0

piobab (Contributor) commented May 14, 2021

Hi @sademakn !

It's planned, but we can't promise any deadlines. You're more than welcome to contribute. As described in our whitepaper, you can split the graph into multiple parts and average the resulting embeddings without sacrificing much quality. Also, 20 GB of peak usage is not that much ;) Look into spot instances on Azure/GCP/AWS; you can get 500 GB of RAM for about $1.50/hr.
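The split-and-average workaround described above can be sketched roughly as follows. This is a hypothetical Python illustration, not Cleora's actual Rust implementation: `partition_edges` and `average_embeddings` are made-up helper names, the edge hashing scheme is an assumption, and the per-partition embedding step (running Cleora itself on each chunk) is left out.

```python
import hashlib
from collections import defaultdict

def partition_edges(edges, num_partitions):
    """Split an edge list into buckets by hashing the source node,
    so each bucket can be embedded independently with less peak memory."""
    parts = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        idx = int(hashlib.md5(src.encode()).hexdigest(), 16) % num_partitions
        parts[idx].append((src, dst))
    return parts

def average_embeddings(per_part_embeddings):
    """Merge per-partition embedding dicts {node: vector} by averaging
    the vectors of nodes that appear in more than one partition."""
    sums = {}
    counts = defaultdict(int)
    for emb in per_part_embeddings:
        for node, vec in emb.items():
            if node not in sums:
                sums[node] = list(vec)
            else:
                sums[node] = [a + b for a, b in zip(sums[node], vec)]
            counts[node] += 1
    return {n: [x / counts[n] for x in v] for n, v in sums.items()}
```

In practice, you would run Cleora on each edge bucket separately, load the resulting embedding files, and then average them per node as above; the whitepaper's claim is that this loses little quality relative to embedding the whole graph at once.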

sademakn (Author) commented

Hi,
Thank you for your answer. I'll try to find some spare time to work on partitioning, but I am a beginner in Rust!
