Learning to Optimize Differentiable Games

Code for the ICML 2023 paper "Learning to Optimize Differentiable Games".

Overview

Many machine learning problems, such as generative adversarial networks (GANs) and multi-agent reinforcement learning, can be formulated as games and boil down to optimizing nested objectives. Solving these games requires finding their stable fixed points or Nash equilibria. However, existing algorithms for solving games are empirically unstable and therefore demand heavy ad-hoc tuning in practice.
To tackle these challenges, we turn to the emerging scheme of $\textit{Learning to Optimize}$ (L2O), which discovers efficient problem-specific optimization algorithms through data-driven training. Our customized L2O framework for differentiable games, dubbed $\textit{Learning to Play Games}$ (L2PG), seeks a stable fixed-point solution by predicting a fast update direction from the past trajectory, using a novel gradient-stability-aware, sign-based loss function. We further incorporate curriculum learning and self-learning to strengthen the empirical training stability and generalization of L2PG. On test problems including quadratic games and GANs, L2PG substantially accelerates convergence and follows a markedly more stable trajectory.
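
To make the instability mentioned above concrete, here is a minimal sketch that is not taken from this repository. It assumes a toy two-player bilinear game in which player 1 minimizes x*y over x and player 2 minimizes -x*y over y, so the unique equilibrium is (0, 0). It compares plain simultaneous gradient descent with symplectic gradient adjustment (SGA), a standard hand-designed correction from the differentiable-games literature. The function names, step size `lr`, and adjustment weight `lam` are illustrative assumptions; L2PG itself replaces such hand-designed rules with a learned update.

```python
import math


def simultaneous_gradient(x, y):
    """Gradient field of the toy game (an assumption for illustration):
    player 1 minimizes x*y over x, player 2 minimizes -x*y over y."""
    return y, -x


def gd_step(x, y, lr):
    """One step of plain simultaneous gradient descent."""
    gx, gy = simultaneous_gradient(x, y)
    return x - lr * gx, y - lr * gy


def sga_step(x, y, lr, lam):
    """One step of symplectic gradient adjustment on the toy game.

    The Jacobian of the gradient field is J = [[0, 1], [-1, 0]], which is
    purely antisymmetric, so A = (J - J^T) / 2 = J. The adjusted direction
    is g + lam * A^T g, and A^T (gx, gy) = (-gy, gx).
    """
    gx, gy = simultaneous_gradient(x, y)
    ax, ay = -gy, gx
    return x - lr * (gx + lam * ax), y - lr * (gy + lam * ay)


def run(step, steps=100, **kwargs):
    """Iterate `step` from (1, 1) and return the distance to the equilibrium (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, **kwargs)
    return math.hypot(x, y)


gd_dist = run(gd_step, lr=0.1)
sga_dist = run(sga_step, lr=0.1, lam=1.0)
print(f"plain GD distance to equilibrium: {gd_dist:.4f}")
print(f"SGA distance to equilibrium:      {sga_dist:.6f}")
```

On this game, plain gradient descent spirals away from the equilibrium because each step multiplies the distance by sqrt(1 + lr^2), while the adjusted update contracts toward it. Tuning `lam` and `lr` by hand for each game is the kind of ad-hoc effort a learned optimizer aims to remove.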

Experiments

Meta-Testing

We provide checkpoints of the optimizers under the checkpoints folder.

Two-player games (stable)

python sga_l2o_batch.py --eval_game_list stable_game_list_normal.txt --checkpoint checkpoints/two_player.pkl --n_hidden 32 --formula grad,A,S --feat_level o,m0.5,m0.9,m0.99,mt,gt

Four-player games

python sga_l2o_batch_four_player.py --checkpoint checkpoints/four_player.pkl --n_hidden 32 --formula grad,S,A --feat_level o,m0.5,m0.9,m0.99,mt,gt --n_player 4

GANs

Meta-Training

Two-player games

python -u sga_l2o_train.py --n_hidden 32 --game-distribution gaussian --formula grad,S,A --inner-iterations 100 --feat_level o,m0.5,m0.9,m0.99,mt,gt --unroll_length 10 --output-name all_cl_0.1_reg_2_0.95.pkl --eval-game-list stable_game_list_new.txt --init-mode ball --use-slow-optimizer --use-slow-ema --slow-optimizer-freq 5 --epoch 300 --wandb-name all_cl_0.1_reg_2_0.95 --slow-ema 0.95 --slow-optimizer-start 0.1 --cl --reg_2 --data-cl --batch-size 128
