Large Batch Training for CTR Prediction (CowClip)

LargeBatchCTR trains CTR prediction models with large batch sizes (up to ~128K). The framework is based on DeepCTR. You can run the code on a single V100 GPU to experience the fast training speed.

This repo implements the adaptive Column-wise Clipping (CowClip) method from the paper "CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU".
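In brief, CowClip clips the gradient of each ID's embedding with an adaptive, column-wise threshold controlled by the coefficient r (--clip) and the lower bound ζ (--bound). Below is a minimal NumPy sketch of that idea, assuming a per-ID threshold of max(r·||w_i||, ζ); it is an illustration only, not the repo's actual implementation (see the paper for the exact algorithm).

```python
import numpy as np

def cowclip_sketch(grads, weights, r=1.0, zeta=1e-5):
    """Column-wise adaptive clipping sketch (assumed threshold: max(r*||w_i||, zeta)).

    grads, weights: (num_ids, embed_dim) arrays for one embedding table.
    """
    w_norm = np.linalg.norm(weights, axis=1, keepdims=True)  # ||w_i|| per ID
    g_norm = np.linalg.norm(grads, axis=1, keepdims=True)    # ||g_i|| per ID
    threshold = np.maximum(r * w_norm, zeta)                 # adaptive clip bound
    scale = np.minimum(1.0, threshold / (g_norm + 1e-12))    # shrink only, never amplify
    return grads * scale
```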

Get Started

First, download the dataset into the data folder, then use data_utils.py to preprocess it for training.

```shell
python data_utils.py --dataset criteo_kaggle --split rand
```

Then, use train.py to train the network.

```shell
# Criteo (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM
# Avazu (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM
```

For large batch training with CowClip, run:

```shell
# Criteo (8K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 8192 --l2 8e-05 --lr 22.6274e-4
# Criteo (128K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 131072 --l2 128e-05 --lr 90.5096e-4
# Avazu (64K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-4 --bs 65536 --l2 64e-05 --lr 8e-4
```
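The --lr and --l2 values above are scaled with the batch size; see the scaling tables in the Hyperparameters section below.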

CowClip Quick Look

[Figure: CowClip algorithm quick look]

Dataset List

  • Criteo Kaggle: download train.txt into data/criteo_kaggle/
  • Avazu: download train into data/avazu/

Hyperparameters

The meaning of each command-line hyperparameter is as follows:

| Param | Meaning |
| --- | --- |
| --bs | batch size |
| --lr_embed | learning rate for the embedding layer |
| --lr | learning rate for the dense weights |
| --l2 | L2 regularization weight λ |
| --clip | CowClip coefficient r |
| --bound | CowClip lower bound ζ |
| --warmup | number of warmup epochs for the dense weights |
| --init_stddev | standard deviation for weight initialization |

The hyperparameters that need to be scaled with batch size are listed below. For the Criteo dataset:

| bs | lr | l2 | ζ | DeepFM AUC (%) | Time (min) |
| --- | --- | --- | --- | --- | --- |
| 1K | 8e-4 | 1e-5 | 1e-5 | 80.86 | 768 |
| 2K | 11.31e-4 | 2e-5 | 1e-5 | 80.93 | 390 |
| 4K | 16e-4 | 4e-5 | 1e-5 | 80.97 | 204 |
| 8K | 22.62e-4 | 8e-5 | 1e-5 | 80.97 | 102 |
| 16K | 32e-4 | 16e-5 | 1e-5 | 80.94 | 48 |
| 32K | 45.25e-4 | 32e-5 | 1e-5 | 80.95 | 27 |
| 64K | 64e-4 | 64e-5 | 1e-5 | 80.96 | 15 |
| 128K | 90.50e-4 | 128e-5 | 1e-5 | 80.90 | 9 |

For the Avazu dataset:

| bs | lr | l2 | ζ | DeepFM AUC (%) | Time (min) |
| --- | --- | --- | --- | --- | --- |
| 1K | 1e-4 | 1e-5 | 1e-3 | 78.83 | 210 |
| 2K | 1.41e-4 | 2e-5 | 1e-3 | 78.82 | 108 |
| 4K | 2e-4 | 4e-5 | 1e-4 | 78.90 | 54 |
| 8K | 2.83e-4 | 8e-5 | 1e-4 | 79.06 | 30 |
| 16K | 4e-4 | 16e-5 | 1e-4 | 79.01 | 17 |
| 32K | 5.66e-4 | 32e-5 | 1e-4 | 78.82 | 10 |
| 64K | 8e-4 | 64e-5 | 1e-4 | 78.82 | 6.7 |
| 128K | 16e-4 | 96e-5 | 1e-4 | 78.80 | 4.8 |
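Relative to the 1K rows, lr in these tables grows roughly with the square root of the batch size and l2 grows linearly with it (the 128K Avazu row deviates slightly, presumably hand-tuned). A hypothetical helper illustrating that scaling rule with Criteo's 1K baseline values:

```python
import math

def scaled_hparams(bs, base_bs=1024, base_lr=8e-4, base_l2=1e-5):
    """Scale lr by sqrt(k) and l2 by k, where k = bs / base_bs."""
    k = bs / base_bs
    return base_lr * math.sqrt(k), base_l2 * k

lr, l2 = scaled_hparams(8192)  # ~22.6e-4 and 8e-5, matching the Criteo 8K row
```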

Model List

| Model | Paper |
| --- | --- |
| Wide & Deep | [DLRS 2016] Wide & Deep Learning for Recommender Systems |
| DeepFM | [IJCAI 2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction |
| Deep & Cross Network | [ADKDD 2017] Deep & Cross Network for Ad Click Predictions |
| DCN V2 | [arXiv 2020] DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems |

Requirements

  • TensorFlow 2.4.0
  • TensorFlow Addons

```shell
pip install -r requirements.txt
```

Citation

@article{zheng2022cowclip,
  title={{CowClip}: Reducing {CTR} Prediction Model Training Time from 12 hours to 10 minutes on 1 {GPU}},
  author={Zangwei Zheng and Pengtai Xu and Xuan Zou and Da Tang and Zhen Li and Chenguang Xi and Peng Wu and Leqi Zou and Yijie Zhu and Ming Chen and Xiangzhuo Ding and Fuzhao Xue and Ziheng Qing and Youlong Cheng and Yang You},
  journal={arXiv},
  volume={abs/2204.06240},
  year={2022}
}
