Allow loading node embeddings into memory in FP16 format #77
This PR adds a new command-line argument, `--use_fp16`, which lets the user load node embeddings into memory in FP16 format: the memory-mapped node features are read from disk and converted to FP16. This makes it possible to run the training code on large datasets (IGBH-Large and IGBH-Full, for instance, or IGBH-Medium, which requires >128 GB of memory) on devices with less memory. In our runs so far, accuracy is the same when training with FP16 as with FP32; this has been verified on IGBH-Small, IGBH-Medium, and IGBH-Large.
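The loading behavior described above can be sketched as follows. This is a minimal illustration, not the PR's actual code; the function name `load_features` and the raw-binary file layout are assumptions for the example.

```python
import os
import tempfile
import numpy as np

def load_features(path, shape, use_fp16=False):
    """Load FP32 node features stored on disk.

    With use_fp16=True, the features are materialized in RAM as FP16,
    halving the memory footprint of the FP32 originals. Otherwise the
    array stays memory-mapped and pages are read from disk lazily.
    """
    mm = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    if use_fp16:
        return np.asarray(mm, dtype=np.float16)  # full in-memory FP16 copy
    return mm  # remains backed by the file on disk

# Demo with a small synthetic feature file.
path = os.path.join(tempfile.mkdtemp(), "node_feat.bin")
feats = np.random.rand(100, 8).astype(np.float32)
feats.tofile(path)

fp16_feats = load_features(path, (100, 8), use_fp16=True)
print(fp16_feats.dtype, fp16_feats.nbytes)  # float16, half the FP32 bytes
```

For feature values in [0, 1), the FP16 rounding error is below 1e-3, which is consistent with the observation that training accuracy is unchanged.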
Note that with this argument, all node features are loaded into memory by default, regardless of whether `--in_memory` is true.

Additionally, this PR fixes the type of the command-line argument `--learning_rate`, so that the learning rate can actually be controlled from a bash script instead of by editing the .py file.
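The argparse fix amounts to declaring the argument's type explicitly; a minimal sketch (the default value of 0.01 is an assumption, not taken from the PR):

```python
import argparse

parser = argparse.ArgumentParser()
# Without type=float, argparse keeps the value as a string, so a learning
# rate passed from a bash script would not behave as a number downstream.
parser.add_argument("--learning_rate", type=float, default=0.01)
# Flag from this PR: load node features into memory in FP16 when present.
parser.add_argument("--use_fp16", action="store_true")

args = parser.parse_args(["--learning_rate", "0.001", "--use_fp16"])
print(args.learning_rate, args.use_fp16)  # 0.001 True
```

With `type=float`, `bash run.sh --learning_rate 0.001` now takes effect without touching the Python source.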