
graph partitioning #31

Open
sademakn opened this issue May 14, 2021 · 2 comments
sademakn commented May 14, 2021

PyTorch-BigGraph has an option named 'num_partitions' that can reduce peak memory usage. Can Cleora provide that option too? Is it possible in the future?
My situation:
40M nodes
180M edges
more than 20 GB of peak memory usage to train Cleora embeddings!
I have also set --in-memory-embedding-calculation 0

piobab (Contributor) commented May 14, 2021

Hi @sademakn !

It's planned, but we can't promise any deadlines. You're more than welcome to contribute. As described in our whitepaper, you can split the graph into multiple parts and average the resulting embeddings without sacrificing much quality. Also, 20 GB of peak usage is not that much ;) Look into spot instances on Azure/GCP/AWS; you can get 500 GB of RAM for about $1.50/hr.
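The split-and-average workaround described above can be sketched roughly as follows. This is a hypothetical Python illustration, not Cleora's actual Rust implementation: `partition_edges` and `average_embeddings` are made-up helper names, the edge hashing scheme is an assumption, and the per-partition embedding step (running Cleora itself on each chunk) is left out.

```python
import hashlib
from collections import defaultdict

def partition_edges(edges, num_partitions):
    """Split an edge list into buckets by hashing the source node,
    so each bucket can be embedded independently with less peak memory."""
    parts = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        idx = int(hashlib.md5(src.encode()).hexdigest(), 16) % num_partitions
        parts[idx].append((src, dst))
    return parts

def average_embeddings(per_part_embeddings):
    """Merge per-partition embedding dicts {node: vector} by averaging
    the vectors of nodes that appear in more than one partition."""
    sums = {}
    counts = defaultdict(int)
    for emb in per_part_embeddings:
        for node, vec in emb.items():
            if node not in sums:
                sums[node] = list(vec)
            else:
                sums[node] = [a + b for a, b in zip(sums[node], vec)]
            counts[node] += 1
    return {n: [x / counts[n] for x in v] for n, v in sums.items()}
```

In practice, you would run Cleora on each edge bucket separately, load the resulting embedding files, and then average them per node as above; the whitepaper's claim is that this loses little quality relative to embedding the whole graph at once.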

sademakn (Author) commented

Hi,
Thank you for your answer. I'll try to find some spare time to work on partitioning, but I am a beginner in Rust!
