This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

tf_models KerasConvLSTM seq_lens tensor #80

Open

sa1g opened this issue Nov 4, 2022 · 0 comments

sa1g commented Nov 4, 2022

Hi, I'm detaching the trainer from RLlib (I need custom functionality that isn't compatible with it), and I'm having problems using the model.forward method: what should I pass as seq_lens? I couldn't find any documentation about it.

Last error:

ValueError: Input 0 of layer "permute_1" is incompatible with the layer: expected ndim=5, found ndim=4. 
Full shape received: (Dimension(1), Dimension(2), Dimension(11), Dimension(11))
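
If I read the error correctly, the permute layer wants a 5-D input (I assume something like (batch, time, channels, height, width)), while the spatial observation tensor only has 4 dimensions, so one leading axis (batch or time) seems to be missing. A minimal sketch of the kind of reshaping I suspect is needed (the axis order is my assumption, not taken from the model code):

import tensorflow as tf

# dummy stand-in for obs['0']['world-idx_map'], which has shape (2, 11, 11)
world_idx_map = tf.zeros((2, 11, 11))
world_idx_map = tf.expand_dims(world_idx_map, axis=0)  # batch axis -> (1, 2, 11, 11), ndim=4 (what the error reports)
world_idx_map = tf.expand_dims(world_idx_map, axis=1)  # time axis  -> (1, 1, 2, 11, 11), ndim=5 (what the layer expects)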

Code context:

from keras_model import build_model
from env_wrapper import RLlibEnvWrapper
from tf_models import KerasConvLSTM, get_flat_obs_size
import tensorflow as tf
from tensorflow.python.framework.ops import enable_eager_execution
# enable_eager_execution()

# Model config and env config as in /tutorials/rllib/phase1/config.yaml
model_config = {
    'custom_model': "keras_conv_lstm",
    'custom_options': {
        'fc_dim': 128,
        'idx_emb_dim': 4,
        'input_emb_vocab': 100,
        'lstm_cell_size': 128,
        'num_conv': 2,
        'num_fc': 2,
    },
    'max_seq_len': 25,

}

env_config = {'env_config_dict': {
    # ===== SCENARIO CLASS =====
    # Which Scenario class to use: the class's name in the Scenario Registry (foundation.scenarios).
    # The environment object will be an instance of the Scenario class.
    'scenario_name': 'layout_from_file/simple_wood_and_stone',

    # ===== COMPONENTS =====
    # Which components to use (specified as list of ("component_name", {component_kwargs}) tuples).
    #   "component_name" refers to the Component class's name in the Component Registry (foundation.components)
    #   {component_kwargs} is a dictionary of kwargs passed to the Component class
    # The order in which components reset, step, and generate obs follows their listed order below.
    'components': [
        # (1) Building houses
        ('Build', {
            'skill_dist':                   'pareto',
            'payment_max_skill_multiplier': 3,
            'build_labor':                  10,
            'payment':                      10
        }),
        # (2) Trading collectible resources
        ('ContinuousDoubleAuction', {
            'max_bid_ask':    10,
            'order_labor':    0.25,
            'max_num_orders': 5,
            'order_duration': 50
        }),
        # (3) Movement and resource collection
        ('Gather', {
            'move_labor':    1,
            'collect_labor': 1,
            'skill_dist':    'pareto'
        }),
        # (4) Planner
        ('PeriodicBracketTax', {
            'period':          100,
            'bracket_spacing': 'us-federal',
            'usd_scaling':     1000,
            'disable_taxes':   False
        })
    ],

    # ===== SCENARIO CLASS ARGUMENTS =====
    # (optional) kwargs that are added by the Scenario class (i.e. not defined in BaseEnvironment)
    'env_layout_file': 'quadrant_25x25_20each_30clump.txt',
    'starting_agent_coin': 10,
    'fixed_four_skill_and_loc': True,

    # ===== STANDARD ARGUMENTS ======
    # kwargs that are used by every Scenario class (i.e. defined in BaseEnvironment)
    'n_agents': 4,          # Number of non-planner agents (must be > 1)
    'world_size': [25, 25],  # [Height, Width] of the env world
    'episode_length': 1000,  # Number of timesteps per episode

    # In multi-action-mode, the policy selects an action for each action subspace (defined in component code).
    # Otherwise, the policy selects only 1 action.
    'multi_action_mode_agents': False,
    'multi_action_mode_planner': True,

    # When flattening observations, concatenate scalar & vector observations before output.
    # Otherwise, return observations with minimal processing.
    'flatten_observations': True,
    # When flattening masks, concatenate each action subspace mask into a single array.
    # Note: flatten_masks = True is required for masking action logits in the code below.
    'flatten_masks': True,

    # How often to save the dense logs
    'dense_log_frequency': 1
}}

env = RLlibEnvWrapper(env_config)
obs = env.reset()

# num_outputs must equal the size of env.action_space (here 50)
model = KerasConvLSTM(env.observation_space,
                      env.action_space, num_outputs=50, model_config=model_config, name=None)
state = model.get_initial_state()
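# My understanding (an assumption, not checked against the model source) is that this
# returns the initial LSTM hidden/cell states, i.e. zero tensors of size lstm_cell_size.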

# probably the issue is here
# rank_1_tensor = tf.constant([(50,),(136,),(1,),(2,11,11),(7,11,11)], shape=(1,5,1))
rank_1_tensor = tf.constant([1])
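# My best guess (unverified, I found no documentation): seq_lens is a rank-1 int32
# tensor with one entry per batch row, each entry giving the unpadded length of that
# row's sequence, so for a single one-step observation it would be something like:
# rank_1_tensor = tf.constant([1], dtype=tf.int32)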

flat_obs_space = get_flat_obs_size(env.observation_space)


def dict_to_tensor_dict(a_dict: dict):
    """
    Convert a single agent's obs dict into a dict of tensors (one tensor per key).
    """
    tensor_dict = {}
    for key, value in a_dict.items():
        tensor_dict[key] = tf.convert_to_tensor(value, name=key)

    return tensor_dict

obs_tensor_dict = dict_to_tensor_dict(obs['0'])

# from /python3.7/site-packages/ray/rllib/models/tf/tf_modelv2.py
# `input_dict` (dict): dictionary of input tensors, including `"obs", "obs_flat", "prev_action", "prev_reward", "is_training"`

input_dict = {
    'obs': obs_tensor_dict,
    # 'obs_flat': ,
    'prev_action': None, 
    'prev_reward': None, 
    'is_training': True
}
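
# 'obs_flat' is left out above; my guess (not verified against the RLlib source) is
# that it should be the flattened and concatenated version of 'obs', roughly:
# input_dict['obs_flat'] = tf.concat(
#     [tf.reshape(tf.cast(v, tf.float32), (1, -1)) for v in obs_tensor_dict.values()],
#     axis=-1,
# )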

output, new_state = model.forward(input_dict, state, rank_1_tensor)

Thank you for the help!
