Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient

This repository contains an implementation of the reinforcement learning method described in the paper "Cold-Start Reinforcement Learning with Softmax Policy Gradient" by Nan Ding and Radu Soricut from Google Inc. The method is based on a softmax value function that eliminates the need for warm-start training and sample variance reduction during policy updates.

Method

RNN Encoder Decoder

Requirements

Create a conda environment using the following command:

conda create -n <env_name> python=3.9

After activating the environment with conda activate <env_name>, install the required packages using the following command:

conda install --file requirements.txt

Program issues

If pipeline.py raises the following error, the installed PyTorch build has no CUDA support:

AssertionError: Torch not compiled with CUDA enabled

To run on the CPU, change

z = torch.cat([z, zt_idx.cuda()[None]], dim=0) # (T, B) token id

to

z = torch.cat([z, zt_idx[None]], dim=0) # (T, B) token id
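
Instead of editing the line separately for each machine, the concatenation can be written so that it follows whatever device `z` already lives on. The snippet below is a minimal sketch of that idea, not code from this repository: the helper name `append_token` and the example tensors are hypothetical, and only the shapes (`z` is `(T, B)`, `zt_idx` is `(B,)`) are taken from the comment on the original line.

```python
import torch


def append_token(z, zt_idx):
    """Append a (B,) tensor of token ids as a new row of the (T, B) tensor z.

    Hypothetical helper: moving zt_idx to z.device avoids hard-coding .cuda(),
    so the same line works on CPU-only and CUDA builds of PyTorch.
    """
    return torch.cat([z, zt_idx.to(z.device)[None]], dim=0)


# Pick the GPU only when this PyTorch build can actually use one.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

z = torch.zeros((2, 3), dtype=torch.long, device=device)  # (T=2, B=3)
zt_idx = torch.tensor([4, 5, 6])                          # (B=3,), created on CPU
z = append_token(z, zt_idx)
print(z.shape)  # torch.Size([3, 3])
```

With this pattern the device is chosen once, where the tensors are created, and later operations inherit it.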

Experiment

Summarization Task: Headline Generation

Dataset:

Training: English Gigaword
Testing: DUC 2004

Evaluation: ROUGE-L score
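
For readers unfamiliar with the metric, ROUGE-L scores a candidate against a reference by the length of their longest common subsequence (LCS). The sketch below illustrates the sentence-level F-measure under some assumptions that may not match this repository's reward computation: tokens are obtained by whitespace splitting, a single reference is used, and `beta` defaults to 1 (equal weight on precision and recall). The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # fraction of candidate tokens in the LCS
    recall = lcs / len(ref)      # fraction of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


score = rouge_l("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))  # 0.8333
```

In the example, the LCS is "the cat on the mat" (5 tokens) and both sentences have 6 tokens, so precision and recall are both 5/6.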

Automatic Image-Caption Generation

Dataset:

Training / Validation: Microsoft COCO
Testing: Microsoft COCO

Evaluation: CIDEr score / ROUGE-L score

Results

Model loss

Model reward (ROUGE-L score)

Acknowledgements

We would like to thank Nan Ding and Radu Soricut for their valuable contributions to the field of reinforcement learning, and for making their paper available to the public. We also acknowledge the TensorFlow team for providing a powerful and flexible deep learning framework.

Contributors

Citation

@misc{20230615,
  author = {Chih-Chun Chen and Pin-Yen Liu and Po-Chuan Chen},
  title = {Cold-Start Reinforcement Learning with Softmax Policy Gradient},
  year = {2023},
  month = {06},
  note = {Version 1.0},
  howpublished = {GitHub},
  url = {https://github.com/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient}
}

@misc{ding2017coldstart,
      title={Cold-Start Reinforcement Learning with Softmax Policy Gradient}, 
      author={Nan Ding and Radu Soricut},
      year={2017},
      eprint={1709.09346},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
