Investigate alternatives for faster (joint) embeddings lookup #1036

gabrielspmoreira · 2023-03-24T15:55:14Z

@vysarge has run a number of benchmarking and profiling experiments on Merlin Models (MM) using synthetic data. In particular, she compared the MM DLRM with the JoC DLRM TF implementation; the experiment results can be found in this spreadsheet (NVIDIA internal only).

She noticed in particular that the MM implementation uses the TF embedding API functions, while JoC uses a custom joint embedding that fuses the embedding tables together and performs all embedding lookups jointly with one call [code link], which is faster.
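To make the idea concrete, here is a minimal NumPy sketch of a fused lookup. It is not the JoC code (the linked implementation is not reproduced here), and it rests on assumptions: all tables share the same embedding dimension, each feature contributes exactly one id per example, and the names `build_joint_table` and `joint_lookup` are hypothetical. The tables are stacked into one matrix, each feature's ids are shifted by that table's row offset, and a single gather replaces the per-table lookups.

```python
import numpy as np


def build_joint_table(tables):
    """Stack per-feature tables into one matrix and return per-table row offsets.

    Assumption for this sketch: every table has the same embedding dimension.
    """
    dims = {t.shape[1] for t in tables}
    if len(dims) != 1:
        raise ValueError("this sketch assumes all tables share one embedding dim")
    sizes = np.array([t.shape[0] for t in tables])
    # offsets[i] is the first row of table i inside the joint matrix
    offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    return np.concatenate(tables, axis=0), offsets


def joint_lookup(joint_table, offsets, ids):
    """Look up all features at once.

    ids: integer array of shape [batch, num_features], one id per feature.
    Returns an array of shape [batch, num_features, dim].
    """
    # Shift each column into its table's row range, then do one gather.
    return joint_table[ids + offsets]


# Three hypothetical tables with cardinalities 10, 7, 5 and embedding dim 4.
rng = np.random.default_rng(0)
tables = [rng.normal(size=(n, 4)) for n in (10, 7, 5)]
joint_table, offsets = build_joint_table(tables)

ids = np.array([[3, 6, 0], [9, 0, 4]])
joint = joint_lookup(joint_table, offsets, ids)

# Reference: one lookup per table, as a per-feature embedding API would do.
separate = np.stack([t[ids[:, i]] for i, t in enumerate(tables)], axis=1)
assert np.array_equal(joint, separate)
```

In TensorFlow the same pattern would presumably replace several per-table lookups with one gather over the concatenated variable; whether that matches the JoC speedup in Merlin Models is what this issue proposes to investigate.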

gabrielspmoreira mentioned this issue Mar 24, 2023

[RMP] Performance improvements for Merlin Models NVIDIA-Merlin/Merlin#870 (Open, 6 tasks)

gabrielspmoreira transferred this issue from NVIDIA-Merlin/Merlin Mar 24, 2023
