The goal of this repo is to experiment with some Reinforcement Learning algorithms on simple, custom games.
A policy gradient algorithm is used here with a neural network policy. The neural network takes an observation as input and outputs the action to execute. More specifically, it outputs a probability for each action in the action space, and the action is sampled from these probabilities. This lets the agent strike a balance between exploring new states and exploiting actions that are known to work well.
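To make the sampling step concrete, here is a minimal sketch in plain Python (no TensorFlow) of how raw network outputs can be turned into action probabilities and then sampled. The names `softmax` and `sample_action`, and the use of a softmax over logits, are assumptions for illustration only; the repo's own graph may compute this differently.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample an action index according to the policy probabilities.

    Hypothetical helper: `logits` stands in for the network output
    for a single observation.
    """
    probs = softmax(logits)
    # Likely actions are picked most often (exploitation), but unlikely
    # actions still get picked sometimes (exploration).
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs
```

Calling `sample_action([1.0, 2.0, 3.0])` returns an action index between 0 and 2 together with the probabilities used to draw it. Always taking the most probable action instead of sampling would remove the exploration described above.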
Now, how can we train the neural network so that the policy makes the best decision at each step?
First, we need to run a policy gradient training.
Command to visualize the graph in TensorBoard: tensorboard --logdir tf_logs/
Command to freeze the trained model:
python froze_model.py --model_dir C:/YourPath/model --output_node_names multinomial/Multinomial,whateverOtherTFNodeName