[SIGMOD 2024] CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models (Paper)

Scripts

Our implementation builds upon the DLRM repository: https://github.com/facebookresearch/dlrm

The code supports the Criteo Kaggle Display Advertising Challenge Dataset.
- Please do the following to prepare the dataset for use with this code:
  - Convert each numerical feature value x to log(x+1).
  - Ensure that the feature count for each field is independent.
  - Set the parameters cat_path, dense_path, label_path and count_path in the script.
- The model can be trained using the following script:
```
./bench/criteo_kaggle.sh
```
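The preparation steps can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the repository's preprocessing code: we assume the numerical transform is log(x+1), as in the DLRM code this repository builds on, with negative values clipped to 0 first, and we read "independent feature count" as giving every categorical field its own index space starting at 0. The helper names log_transform and reindex_per_field are hypothetical, and the on-disk formats expected at cat_path, dense_path, label_path and count_path are not covered.

```python
import math


def log_transform(dense_row):
    """Map each numerical value x to log(x + 1).

    Assumption: negative values are clipped to 0 first, since log(x + 1)
    is undefined for x <= -1.
    """
    return [math.log(max(x, 0) + 1.0) for x in dense_row]


def reindex_per_field(rows):
    """Give every categorical field its own index space starting at 0.

    rows: list of samples, each a list of raw tokens (one per field).
    Returns the re-indexed rows and the number of distinct values per field.
    """
    num_fields = len(rows[0])
    vocab = [{} for _ in range(num_fields)]  # one token->index map per field
    reindexed = []
    for row in rows:
        new_row = []
        for field, token in enumerate(row):
            table = vocab[field]
            if token not in table:
                table[token] = len(table)  # next free index in this field only
            new_row.append(table[token])
        reindexed.append(new_row)
    counts = [len(table) for table in vocab]
    return reindexed, counts


print(log_transform([0, 1, -2]))
print(reindex_per_field([["a", "x"], ["b", "x"], ["a", "y"]]))
```

For example, reindex_per_field([["a", "x"], ["b", "x"], ["a", "y"]]) returns ([[0, 0], [1, 0], [0, 1]], [2, 2]); the per-field counts are presumably what a count file would record.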
The code supports the Criteo Terabyte Dataset.
- Please do the following to prepare the dataset for use with this code:
  - Convert each numerical feature value x to log(x+1).
  - Ensure that the feature count for each field is independent.
  - Set the parameters cat_path, dense_path, label_path and count_path in the script.
- The model can be trained using the following script:
```
./bench/criteo_terabyte.sh
```
The code also supports two other datasets, Avazu and KDD12.
- Please do the following to prepare the dataset for use with this code:
  - Ensure that the feature count for each field is independent.
  - Set the parameters cat_path, dense_path, label_path and count_path in the script.
- The models can be trained using the following scripts:
```
./bench/avazu.sh
./bench/kdd12.sh
```
The code provides three models; train each with the corresponding script:
- dlrm:
```
./bench/criteo_terabyte.sh
```
- wdl:
```
./bench/wdl.sh
```
- dcn:
```
./bench/dcn.sh
```

The code provides six methods for generating embedding layers:

Full embedding with the following script
```
./bench/criteo_terabyte.sh
```

Hash embedding with the following script
```
./bench/criteo_terabyte.sh "--hash-flag --compress-rate=0.001"
```
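For intuition, the hashing trick behind hash embedding can be sketched as follows. We assume --compress-rate is the fraction of the full table's rows that is kept (so 0.001 keeps one row per thousand features); the class HashEmbedding and its multiplicative hash are hypothetical illustrations, not the repository's code.

```python
import random


class HashEmbedding:
    """Hashing-trick embedding: every feature ID is mapped into one shared table."""

    def __init__(self, num_features, dim, compress_rate, seed=0):
        # Assumption: compress_rate is the fraction of the full table's rows kept.
        self.rows = max(1, round(num_features * compress_rate))
        rng = random.Random(seed)
        self.table = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
                      for _ in range(self.rows)]

    def bucket(self, feature_id):
        # Hypothetical hash: Knuth-style multiplicative hash, then modulo table size.
        return (feature_id * 2654435761) % 2**32 % self.rows

    def lookup(self, feature_id):
        # Colliding IDs share (and jointly train) the same row.
        return self.table[self.bucket(feature_id)]


emb = HashEmbedding(num_features=1_000_000, dim=16, compress_rate=0.001)
print(emb.rows)
```

With these numbers, 1,000,000 IDs share 1,000 rows, so about 1,000 IDs land in each row on average.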

CAFE with the following script
```
./bench/criteo_terabyte.sh "--sketch-flag --compress-rate=0.001 --hash-rate=0.3"
```
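As we understand the CAFE paper, CAFE tracks feature importance online with a lightweight sketch (HotSketch), gives features whose importance passes a threshold their own embedding rows, and lets the remaining features share a hash embedding. The toy class below illustrates that idea only and is not the repository's implementation: the Space-Saving-style counter is a simplified stand-in for HotSketch, importance is a plain accumulated score (for example a count or a gradient norm), and the correspondence of --threshold to `threshold` and of --hash-rate to the size of the shared table is our assumption. SimplifiedCAFE is a hypothetical name.

```python
import random


class SimplifiedCAFE:
    """Toy hot/cold embedding in the spirit of CAFE (not the repository's code)."""

    def __init__(self, num_hot_slots, num_hash_rows, dim, threshold, seed=0):
        rng = random.Random(seed)
        self.capacity = num_hot_slots  # max features tracked by the sketch
        self.threshold = threshold     # score needed to become "hot"
        self.scores = {}               # feature_id -> estimated importance
        self.hot = {}                  # feature_id -> dedicated embedding row
        self.hash_table = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
                           for _ in range(num_hash_rows)]

    def _hashed(self, feature_id):
        # Cold features share rows; plain modulo keeps the sketch short.
        return self.hash_table[feature_id % len(self.hash_table)]

    def record(self, feature_id, score=1.0):
        """Accumulate an importance score for one feature."""
        if feature_id in self.scores:
            self.scores[feature_id] += score
        elif len(self.scores) < self.capacity:
            self.scores[feature_id] = score
        else:
            # Space-Saving: evict the lowest-scored entry; the newcomer inherits its score.
            victim = min(self.scores, key=self.scores.get)
            inherited = self.scores.pop(victim)
            self.hot.pop(victim, None)  # an evicted hot feature falls back to hashing
            self.scores[feature_id] = inherited + score
        if self.scores[feature_id] >= self.threshold and feature_id not in self.hot:
            # Promote: seed the dedicated row with a copy of its shared row.
            self.hot[feature_id] = list(self._hashed(feature_id))

    def lookup(self, feature_id):
        if feature_id in self.hot:
            return self.hot[feature_id]
        return self._hashed(feature_id)


cafe = SimplifiedCAFE(num_hot_slots=4, num_hash_rows=8, dim=4, threshold=3)
for _ in range(3):
    cafe.record(42)
cafe.record(7)
print(sorted(cafe.hot))
```

Here feature 42 reaches the threshold of 3 and gets its own row, while feature 7 keeps reading a shared hashed row.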

QR embedding with the following script
```
./bench/criteo_terabyte.sh "--qr-flag --qr-collisions=10"
```
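The quotient-remainder (QR) trick can be sketched as below: an ID i is split into (i // m, i % m), each part indexes a small table, and the two rows are combined element-wise (multiplication here). Every ID in [0, n) gets a unique pair of rows, unlike plain hashing where colliding IDs share one row. We assume --qr-collisions sets m, which is how we understand the QR code in the DLRM repository this project builds on; the class QREmbedding is hypothetical.

```python
import math
import random


class QREmbedding:
    """Quotient-remainder embedding built from two small tables."""

    def __init__(self, num_features, dim, collisions, seed=0):
        rng = random.Random(seed)
        self.m = collisions
        q_rows = math.ceil(num_features / collisions)  # one row per quotient value
        self.q_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(q_rows)]
        self.r_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(collisions)]

    def lookup(self, feature_id):
        q_row = self.q_table[feature_id // self.m]
        r_row = self.r_table[feature_id % self.m]
        # Element-wise product of the two partial embeddings.
        return [a * b for a, b in zip(q_row, r_row)]

    def num_rows(self):
        return len(self.q_table) + len(self.r_table)


qr = QREmbedding(num_features=1000, dim=8, collisions=10)
print(qr.num_rows())  # 100 quotient rows + 10 remainder rows
```

With these numbers the two tables hold 110 rows instead of 1,000, roughly a 9x reduction.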

Ada embedding with the following script
```
./bench/criteo_terabyte.sh "--ada-flag --compress-rate=0.1"
```

MD embedding with the following script
```
./bench/criteo_terabyte.sh "--md-flag --compress-rate=0.1"
```
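Mixed-dimension (MD) embeddings give smaller embedding dimensions to fields with more rows and project every field back to a common dimension before feature interaction. The sketch below computes per-field dimensions with a simplified form of the alpha power rule from the MD embedding paper: the field with the fewest rows keeps base_dim, and larger fields shrink as (n_min / n)^alpha. DLRM's md_solver implements a variant of this rule. How this repository maps --compress-rate onto alpha or base_dim is not described here, so md_dims, table_size and their parameters are hypothetical.

```python
def md_dims(cardinalities, base_dim, alpha):
    """Per-field embedding dimensions under a simplified alpha power rule."""
    n_min = min(cardinalities)
    # Larger fields get smaller dimensions; every field keeps at least 1.
    return [max(1, round(base_dim * (n_min / n) ** alpha)) for n in cardinalities]


def table_size(cardinalities, dims):
    """Number of stored floats: sum of rows x dimension over all fields."""
    return sum(n * d for n, d in zip(cardinalities, dims))


fields = [10, 1_000, 100_000]  # hypothetical per-field row counts
dims = md_dims(fields, base_dim=16, alpha=0.5)
print(dims, table_size(fields, dims), table_size(fields, [16] * len(fields)))
```

Here the mixed dimensions [16, 2, 1] store 102,160 floats versus 1,616,160 for a uniform 16-dimensional table (about 6%), before counting the small per-field projection matrices.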

Guidance for Adjustment of CAFE Parameters

Default parameters:
```
./bench/criteo_terabyte.sh "--sketch-flag --compress-rate=0.001 --hash-rate=0.3 --threshold=300"
```

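To get better results, adjust --hash-rate and --threshold together with the compression rate: when compressing more aggressively (a smaller --compress-rate value), lower the memory share of the hash embedding (--hash-rate) and raise --threshold; when compressing less (a larger --compress-rate value), do the opposite. For example, try the following commands for other compression rates: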
```
./bench/criteo_terabyte.sh "--sketch-flag --compress-rate=0.1 --hash-rate=0.7 --threshold=30"
```
```
./bench/criteo_terabyte.sh "--sketch-flag --compress-rate=0.0001 --hash-rate=0.2 --threshold=500"
```
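One way to read this guidance is as a memory-budget split. Assuming --compress-rate is the fraction of the full embedding table's rows that CAFE may use and --hash-rate is the fraction of that budget given to the shared hash table (the rest going to dedicated rows for hot features), the arithmetic for the three example configurations looks as follows. The function cafe_budget, the feature count N, and the omission of sketch overhead are hypothetical simplifications.

```python
def cafe_budget(num_features, compress_rate, hash_rate):
    """Split a row budget into shared hash rows and dedicated hot-feature rows.

    Assumptions: compress_rate = fraction of the full table's rows kept,
    hash_rate = fraction of that budget given to the hash table.
    Sketch metadata overhead is ignored.
    """
    budget_rows = round(num_features * compress_rate)
    hash_rows = round(budget_rows * hash_rate)
    hot_rows = budget_rows - hash_rows
    return budget_rows, hash_rows, hot_rows


N = 10_000_000  # hypothetical number of distinct categorical features
configs = [(0.1, 0.7, 30), (0.001, 0.3, 300), (0.0001, 0.2, 500)]
for compress_rate, hash_rate, threshold in configs:
    total, hashed, hot = cafe_budget(N, compress_rate, hash_rate)
    print(f"compress-rate={compress_rate}: {total} rows = "
          f"{hashed} hashed + {hot} hot (threshold={threshold})")
```

Under these assumptions, tighter compression leaves far fewer dedicated rows, and a lower --hash-rate keeps a larger share of the small budget for them; a higher --threshold then plausibly reserves those scarce rows for only the most important features.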

Citation

If you find this work useful, please cite our paper.
