Large Batch Training for CTR Prediction (CowClip)

LargeBatchCTR trains CTR prediction models with large batch sizes (up to ~128K). The framework is based on DeepCTR. You can run the code on a single V100 GPU to experience the fast training speed.

This repo implements the adaptive Column-wise Clipping (CowClip) method from the paper "CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU".
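In brief, CowClip clips the gradient of each ID's embedding with an adaptive, column-wise threshold controlled by the coefficient r (--clip) and the lower bound ζ (--bound). Below is a minimal NumPy sketch of that idea, assuming a per-ID threshold of max(r·||w_i||, ζ); it is an illustration only, not the repo's actual implementation (see the paper for the exact algorithm).

```python
import numpy as np

def cowclip_sketch(grads, weights, r=1.0, zeta=1e-5):
    """Column-wise adaptive clipping sketch (assumed threshold: max(r*||w_i||, zeta)).

    grads, weights: (num_ids, embed_dim) arrays for one embedding table.
    """
    w_norm = np.linalg.norm(weights, axis=1, keepdims=True)  # ||w_i|| per ID
    g_norm = np.linalg.norm(grads, axis=1, keepdims=True)    # ||g_i|| per ID
    threshold = np.maximum(r * w_norm, zeta)                 # adaptive clip bound
    scale = np.minimum(1.0, threshold / (g_norm + 1e-12))    # shrink only, never amplify
    return grads * scale
```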

Get Started

First, download the dataset into the data folder, then use data_utils.py to preprocess it for training.

```shell
python data_utils.py --dataset criteo_kaggle --split rand
```

Then, use train.py to train the network.

```shell
# Criteo (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM
# Avazu (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM
```

For large batch training with CowClip, run:

```shell
# Criteo (8K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 8192 --l2 8e-05 --lr 22.6274e-4
# Criteo (128K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 131072 --l2 128e-05 --lr 90.5096e-4
# Avazu (64K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-4 --bs 65536 --l2 64e-05 --lr 8e-4
```
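The --lr and --l2 values above are scaled with the batch size; see the scaling tables in the Hyperparameters section below.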

CowClip Quick Look

[Figure: CowClip algorithm quick look]

Dataset List

  • Criteo Kaggle: download train.txt into data/criteo_kaggle/
  • Avazu: download train into data/avazu/

Hyperparameters

The meaning of each command-line hyperparameter is as follows:

| Param | Meaning |
| --- | --- |
| --bs | batch size |
| --lr_embed | learning rate for the embedding layer |
| --lr | learning rate for the dense weights |
| --l2 | L2 regularization weight λ |
| --clip | CowClip coefficient r |
| --bound | CowClip lower bound ζ |
| --warmup | number of warmup epochs for the dense weights |
| --init_stddev | standard deviation for weight initialization |

The hyperparameters that need to be scaled with batch size are listed below. For the Criteo dataset:

| bs | lr | l2 | ζ | DeepFM AUC (%) | Time (min) |
| --- | --- | --- | --- | --- | --- |
| 1K | 8e-4 | 1e-5 | 1e-5 | 80.86 | 768 |
| 2K | 11.31e-4 | 2e-5 | 1e-5 | 80.93 | 390 |
| 4K | 16e-4 | 4e-5 | 1e-5 | 80.97 | 204 |
| 8K | 22.62e-4 | 8e-5 | 1e-5 | 80.97 | 102 |
| 16K | 32e-4 | 16e-5 | 1e-5 | 80.94 | 48 |
| 32K | 45.25e-4 | 32e-5 | 1e-5 | 80.95 | 27 |
| 64K | 64e-4 | 64e-5 | 1e-5 | 80.96 | 15 |
| 128K | 90.50e-4 | 128e-5 | 1e-5 | 80.90 | 9 |

For the Avazu dataset:

| bs | lr | l2 | ζ | DeepFM AUC (%) | Time (min) |
| --- | --- | --- | --- | --- | --- |
| 1K | 1e-4 | 1e-5 | 1e-3 | 78.83 | 210 |
| 2K | 1.41e-4 | 2e-5 | 1e-3 | 78.82 | 108 |
| 4K | 2e-4 | 4e-5 | 1e-4 | 78.90 | 54 |
| 8K | 2.83e-4 | 8e-5 | 1e-4 | 79.06 | 30 |
| 16K | 4e-4 | 16e-5 | 1e-4 | 79.01 | 17 |
| 32K | 5.66e-4 | 32e-5 | 1e-4 | 78.82 | 10 |
| 64K | 8e-4 | 64e-5 | 1e-4 | 78.82 | 6.7 |
| 128K | 16e-4 | 96e-5 | 1e-4 | 78.80 | 4.8 |
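Relative to the 1K rows, lr in these tables grows roughly with the square root of the batch size and l2 grows linearly with it (the 128K Avazu row deviates slightly, presumably hand-tuned). A hypothetical helper illustrating that scaling rule with Criteo's 1K baseline values:

```python
import math

def scaled_hparams(bs, base_bs=1024, base_lr=8e-4, base_l2=1e-5):
    """Scale lr by sqrt(k) and l2 by k, where k = bs / base_bs."""
    k = bs / base_bs
    return base_lr * math.sqrt(k), base_l2 * k

lr, l2 = scaled_hparams(8192)  # ~22.6e-4 and 8e-5, matching the Criteo 8K row
```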

Model List

| Model | Paper |
| --- | --- |
| Wide & Deep | [DLRS 2016] Wide & Deep Learning for Recommender Systems |
| DeepFM | [IJCAI 2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction |
| Deep & Cross Network | [ADKDD 2017] Deep & Cross Network for Ad Click Predictions |
| DCN V2 | [arXiv 2020] DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems |

Requirements

  • TensorFlow 2.4.0
  • TensorFlow Addons

```shell
pip install -r requirements.txt
```

Citation

@article{zheng2022cowclip,
  title={{CowClip}: Reducing {CTR} Prediction Model Training Time from 12 hours to 10 minutes on 1 {GPU}},
  author={Zangwei Zheng and Pengtai Xu and Xuan Zou and Da Tang and Zhen Li and Chenguang Xi and Peng Wu and Leqi Zou and Yijie Zhu and Ming Chen and Xiangzhuo Ding and Fuzhao Xue and Ziheng Qing and Youlong Cheng and Yang You},
  journal={arXiv},
  volume={abs/2204.06240},
  year={2022}
}
