Different kinds of outputs #247

Open
erupturatis opened this issue Aug 4, 2022 · 0 comments

Hey,

The action space of a game, or of an environment more generally, can be represented in many ways (for example one-hot encoded inputs, probabilities, values between 0 and 1, etc.).
This is also somewhat related to #184.
I think we should be able to specify a more detailed setup in the config files for different types of outputs, such as:

[LanderGenome]

...
num_outputs = 20
Custom_outputs = True
Softmaxed_outputs = [3, 2]
Clamped_outputs = [(3,1,2),(2,0,1)]
one_hot_encoded = [5]
normal_outputs = [5]

Custom_outputs would signal NEAT to return the outputs in a form other than the usual one (I was thinking of a dictionary of outputs that depends on the customization).

Softmaxed_outputs = [3, 2] # one group of 3 softmaxed outputs and another group of 2 softmaxed outputs
Clamped_outputs = [(3,1,2),(2,0,1)] # a group of 3 outputs clamped between 1 and 2, and another group of 2 outputs clamped between 0 and 1
one_hot_encoded = [5] # one group of 5 one-hot encoded values
normal_outputs = [5] # the last 5 outputs hold the raw values
The output for what I am describing should look like this:

outputs = {
    "softmax": [[0.2, 0.4, 0.4], [0.45, 0.55]],  # the 2 groups of softmaxed values
    "clamped": [[1, 2, 1], [1, 0]],
    "encoded": [[0, 0, 0, 1, 0]],
    "normal": [12.1, 32.1, 43.1, 1.1, 2.3]
}
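
To make the idea concrete, here is a rough sketch of how a flat list of raw network outputs could be turned into a dictionary of this shape. This is not existing library code: `split_outputs` and its helpers are hypothetical names, the groups are assumed to be laid out in the order softmax, clamped, one-hot, normal, one-hot encoding is assumed to mean "1 at the largest value", and the last key is spelled "normal".

```python
import math


def softmax(values):
    """Numerically stable softmax over one group of raw outputs."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(values):
    """1 at the position of the largest raw output, 0 elsewhere (assumed meaning)."""
    winner = values.index(max(values))
    return [1 if i == winner else 0 for i in range(len(values))]


def clamp(values, low, high):
    """Hard clamp of every value into [low, high]."""
    return [min(max(v, low), high) for v in values]


def split_outputs(raw, softmaxed=(), clamped=(), one_hot_encoded=(), normal=()):
    """Hypothetical post-processing step: slice `raw` into groups and transform each."""
    expected = (sum(softmaxed) + sum(n for n, _lo, _hi in clamped)
                + sum(one_hot_encoded) + sum(normal))
    if expected != len(raw):
        raise ValueError(f"groups cover {expected} outputs, network has {len(raw)}")

    position = 0

    def take(count):
        # Consume the next `count` raw outputs.
        nonlocal position
        chunk = raw[position:position + count]
        position += count
        return chunk

    # Dict entries are evaluated top to bottom, so the groups are consumed in this order.
    return {
        "softmax": [softmax(take(n)) for n in softmaxed],
        "clamped": [clamp(take(n), lo, hi) for n, lo, hi in clamped],
        "encoded": [one_hot(take(n)) for n in one_hot_encoded],
        "normal": [v for n in normal for v in take(n)],
    }
```

With the config above, the call would be something like `split_outputs(raw, softmaxed=[3, 2], clamped=[(3, 1, 2), (2, 0, 1)], one_hot_encoded=[5], normal=[5])`, where `raw` is the list of 20 values returned by the network.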

The sizes of all the custom output groups should add up to num_outputs (in the example above, 3 + 2 + 3 + 2 + 5 + 5 = 20).
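
This constraint could be checked when the config is loaded. The sketch below uses only the Python standard library (`configparser` and `ast.literal_eval`) rather than the library's own config machinery, and `load_output_groups` is a hypothetical helper, so treat it as an illustration of the check and not as a proposed implementation.

```python
import ast
import configparser

# The proposed keys from this issue, with the elided "..." part left out.
CONFIG_TEXT = """
[LanderGenome]
num_outputs = 20
Custom_outputs = True
Softmaxed_outputs = [3, 2]
Clamped_outputs = [(3,1,2),(2,0,1)]
one_hot_encoded = [5]
normal_outputs = [5]
"""


def load_output_groups(text, section="LanderGenome"):
    """Parse the proposed keys and verify that the group sizes sum to num_outputs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser[section]  # option lookups are case-insensitive by default

    groups = {
        "softmaxed": ast.literal_eval(sec["Softmaxed_outputs"]),
        "clamped": ast.literal_eval(sec["Clamped_outputs"]),
        "one_hot": ast.literal_eval(sec["one_hot_encoded"]),
        "normal": ast.literal_eval(sec["normal_outputs"]),
    }

    # Each clamped entry is assumed to be (count, low, high).
    sizes = (list(groups["softmaxed"])
             + [count for count, _low, _high in groups["clamped"]]
             + list(groups["one_hot"])
             + list(groups["normal"]))

    num_outputs = sec.getint("num_outputs")
    if sum(sizes) != num_outputs:
        raise ValueError(
            f"custom output groups cover {sum(sizes)} outputs, "
            f"but num_outputs = {num_outputs}"
        )
    return num_outputs, groups
```

Failing early like this would make a mismatched config an error at startup instead of a silently misaligned output dictionary later.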

Also, I am not sure how this would affect the performance of the networks. For example, if a network's outputs are mostly above 10, you can't directly clamp them between 0 and 1, since they would all become 1. So we would need to normalize some of the special outputs first, such as the clamped ones.
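
A small sketch of the problem and of one possible fix. Min-max rescaling within the group is just my assumption here (a squashing activation on those output nodes would be another option), and `hard_clamp` and `rescale_group` are hypothetical names.

```python
def hard_clamp(values, low, high):
    """Clamp each value into [low, high] without any normalization."""
    return [min(max(v, low), high) for v in values]


def rescale_group(values, low, high):
    """Min-max rescale one group so that it spans [low, high].

    Relative differences inside the group are preserved. If all values are
    equal there is nothing to rescale, so the midpoint of the range is used.
    """
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [(low + high) / 2.0 for _ in values]
    span = largest - smallest
    return [low + (high - low) * (v - smallest) / span for v in values]


raw_group = [12.1, 32.1, 43.1]
clamped = hard_clamp(raw_group, 0, 1)      # every value saturates at the upper bound
rescaled = rescale_group(raw_group, 0, 1)  # values stay distinguishable
```

One downside of rescaling per group is that the result depends on the other values in the group, so the same raw output can map to different clamped values from one step to the next.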

Please let me know what you think.
