This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

PolicyOptimization Agents do not log signals to csv #448

Open
crzdg opened this issue May 25, 2020 · 1 comment


crzdg commented May 25, 2020

I encountered a strange behavior.

For ClippedPPO, PPO and ActorCritic I was not able to get the signals defined in their init methods:
Loss, Gradients, Likelihood, KL Divergence, etc.

I'm not sure if it is an issue in my environment implementation, but DQN does log its signals. I also checked the signals dumped by update_log. For the agents mentioned above, self.episode_signals contains duplicate entries for the signals that are not logged: the signals are defined at several levels of the agent class hierarchy, and each definition is appended to self.episode_signals, so only the most recently created one is updated with values in the train method.
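
To illustrate what I mean by the duplicates, here is a minimal sketch (not coach's actual classes, the names are made up for illustration):

    # Toy model of the duplicate-registration pattern (NOT coach code).
    class Signal:
        def __init__(self, name):
            self.name = name
            self.values = []

        def add_sample(self, value):
            self.values.append(value)

    class BaseAgent:
        def __init__(self):
            self.episode_signals = []
            self.loss = self.register_signal("Loss")  # base class registers "Loss"

        def register_signal(self, name):
            signal = Signal(name)
            self.episode_signals.append(signal)
            return signal

    class PolicyOptimizationAgent(BaseAgent):
        def __init__(self):
            super().__init__()
            self.loss = self.register_signal("Loss")  # subclass registers "Loss" again

    agent = PolicyOptimizationAgent()
    agent.loss.add_sample(0.5)  # only the most recently created "Loss" signal gets values
    print([(s.name, s.values) for s in agent.episode_signals])
    # [('Loss', []), ('Loss', [0.5])] -> the first, empty entry can end up being the one logged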

It could also be due to the signals being updated before every episode: since gradients are only available after training, they might get reset after the last training iteration once a new episode starts.

However, I do have experiments with ClippedPPO where those signals were logged, but I can't reproduce that.

Any suggestions?


crzdg commented May 26, 2020

I found the cause.

The following setup reproduces it:

ClippedPPOAgent with
num_consecutive_playing_steps = EnvironmentEpisodes(15)

The CSV dumper is set to
dump_signals_to_csv_every_x_episodes = 5
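
Roughly, the relevant preset lines look like this (just a sketch of the two settings that matter; the rest of the preset and the environment setup are omitted, and the parameter locations are from memory):

    # Sketch of the relevant preset settings (rest of the preset omitted).
    from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
    from rl_coach.base_parameters import VisualizationParameters
    from rl_coach.core_types import EnvironmentEpisodes

    agent_params = ClippedPPOAgentParameters()
    # train only after every 15 collected episodes
    agent_params.algorithm.num_consecutive_playing_steps = EnvironmentEpisodes(15)

    # dump the signal log to CSV every 5 episodes
    vis_params = VisualizationParameters(dump_signals_to_csv_every_x_episodes=5)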

Before the training that runs after 15 episodes, the CSV is dumped because 15 % 5 == 0; the last 5 episodes are written out, including episode 15, which at that point contains no training values (loss, gradients, etc.).

Training then happens and generates the training values, which are stored in the 15th episode's row of the logger's pandas DataFrame. Since that row has already been dumped, it will never be written to the CSV.
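
The behaviour can be modelled with a toy version of the logger (made-up names, not the real coach Logger, just to show why the row never reaches the CSV):

    # Toy model of the dump pointer (made-up names, not the real coach Logger).
    import pandas as pd

    log = pd.DataFrame(index=range(1, 16), columns=["Loss"])  # episodes 1..15
    last_line_idx_written_to_csv = 10                          # rows 1..10 already dumped
    csv_rows = []

    # episode 15 ends: 15 % 5 == 0, so rows 11..15 are dumped BEFORE training runs
    csv_rows.extend(log.loc[last_line_idx_written_to_csv + 1:].itertuples())
    last_line_idx_written_to_csv = 15

    # training runs afterwards and writes its signals into row 15 of the DataFrame...
    log.loc[15, "Loss"] = 0.42

    # ...but the dump pointer is already past row 15, so the value never reaches the CSV
    print(csv_rows[-1])  # Pandas(Index=15, Loss=nan)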

I assume this was introduced by #113.

I updated clipped_ppo_agent.py as follows.
I simply added a decrement of last_line_idx_written_to_csv in the logger after update_log().

    def train(self):
        if self._should_train():
            for network in self.networks.values():
                network.set_is_training(True)

            dataset = self.memory.transitions
            update_internal_state = self.ap.algorithm.update_pre_network_filters_state_on_train
            dataset = self.pre_network_filter.filter(dataset, deep_copy=False,
                                                     update_internal_state=update_internal_state)
            batch = Batch(dataset)

            for training_step in range(self.ap.algorithm.num_consecutive_training_steps):
                self.networks['main'].sync()
                self.fill_advantages(batch)

                # take only the requested number of steps
                if isinstance(self.ap.algorithm.num_consecutive_playing_steps, EnvironmentSteps):
                    dataset = dataset[:self.ap.algorithm.num_consecutive_playing_steps.num_steps]
                shuffle(dataset)
                batch = Batch(dataset)

                self.train_network(batch, self.ap.algorithm.optimization_epochs)

            for network in self.networks.values():
                network.set_is_training(False)

            self.post_training_commands()
            self.training_iteration += 1
            # should be done in order to update the data that has been accumulated * while not playing *
            self.update_log()
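            # move the dump pointer back one row so the last dumped episode row, which
            # now holds the training signals, is written to the CSV again on the next dump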
            self.agent_logger.last_line_idx_written_to_csv -= 1
            return None


crzdg added a commit to crzdg/coach that referenced this issue May 26, 2020