
feature(zms): add new league middlewares and other models and tools. #458

Open
hiha3456 wants to merge 288 commits into base: main

Conversation

hiha3456 (Collaborator)

Description

  1. Add LeagueCoordinator, LeagueLearnerCommunicator, StepLeagueActor, BattleStepCollector, battle_inferencer, and battle_rolloutor, with their corresponding tests.
  2. Add BattleTransitionList to gather transitions and cut trajectories for the league environment.
  3. Add dataclasses for actor data and the learner model.
  4. Add a return_original_data attribute to EnvSupervisor.
  5. Add BattleContext to context.py.
  6. Add EventEnum and extend event_loop so that a customized string can be attached to chosen events.
  7. Add a sparse_logging utility to log at a sparse frequency.
  8. Add my_pickle_loads inside ding/framework/parallel.py so that a CUDA tensor can be transferred to a pure-CPU node without errors.
  9. Add the old Storage and FileStorage classes from the old dev-league branch into ding/framework/storage.
  10. Make small changes to player.
  11. Add sl_branch inside starcraft_player.
  12. Add the old ding/league/v2/base_league.py used in the old dev-league branch.
  13. Add steve, and change upgo and vtrace in ding/rl_utils.
  14. Add detach_grad and flatten in ding/torch_utils/data_helper.py.
  15. Add l2_distance in ding/torch_utils/metric.py.
  16. Add GLU2, GatedConvResBlock, scatter_connection_v2, AttentionPool, and lstm in ding/torch_utils/network/, and make some changes to existing networks.
  17. Add read_yaml_config in ding/utils/default_helper.py.
  18. Other miscellaneous changes...

Related Issue

TODO

  1. Delete old checkpoints saved by LeagueLearnerCommunicator, because they can consume disk storage very quickly.
  2. When we use multiple envs inside an env_manager (or its variants), in some cases only some of the envs work properly, so for the timesteps returned by step(action) I strongly recommend that all EnvManagers return the timesteps as a dict instead of a list (see the sketch after this list). As far as I know, EnvSupervisor and BaseEnvManagerV2 return a dict, while SubprocessEnvManager and BaseEnvManager return a list.
  3. The BattleCollector currently cannot handle policies with intermediate state; for example, the SC2 policy (as in DI-star) maintains a huge amount of intermediate state. In that case each env should maintain one policy per player, which is not how the current BattleCollector works. The current BattleCollector can only handle this kind of policy when the EnvManager has a single environment.
  4. Add a middleware for the teacher model.
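
A minimal sketch of the dict-style return recommended in item 2, compared with the list style. The BaseEnvTimestep fields and env ids used here are illustrative assumptions, not the exact ding API:

from collections import namedtuple

# Assumed timestep container; the real BaseEnvTimestep in ding may differ.
BaseEnvTimestep = namedtuple('BaseEnvTimestep', ['obs', 'reward', 'done', 'info'])

# list style: positions shift when only a subset of envs steps this round
timesteps_as_list = [BaseEnvTimestep(obs=[0.1], reward=0.0, done=False, info={})]

# dict style: keyed by env_id, so envs that did not step are simply absent
timesteps_as_dict = {
    0: BaseEnvTimestep(obs=[0.1], reward=0.0, done=False, info={}),
    2: BaseEnvTimestep(obs=[0.5], reward=1.0, done=True, info={}),
}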

Check List

  • merge the latest version of the source branch/repo, and resolve all conflicts
  • pass style check
  • pass all the tests

@hiha3456 marked this pull request as draft on August 26, 2022 07:30
Review threads:
  ding/rl_utils/steve.py (outdated)
  ding/torch_utils/data_helper.py
  ding/torch_utils/metric.py
  ding/torch_utils/network/activation.py
  ding/utils/data/collate_fn.py
  ding/envs/env_manager/env_supervisor.py
@@ -75,3 +75,38 @@ def __init__(self, *args, **kwargs) -> None:
self.last_eval_iter = -1

self.keep('train_iter', 'last_eval_iter')


class BattleContext(Context):
Member: add overview comments

@PaParaZz1 added the "enhancement" (New feature or request) label on Aug 27, 2022
Review threads:
  ding/utils/tests/test_sparse_logging.py
  ding/utils/sparse_logging.py (outdated)
@@ -91,3 +93,29 @@ def forward(self, x: torch.Tensor, spatial_size: Tuple[int, int], location: torc
output = output.reshape(N, B, H, W)
output = output.permute(1, 0, 2, 3).contiguous()
return output


def scatter_connection_v2(shape, project_embeddings, entity_location, scatter_dim, scatter_type='add'):
Member: can we merge v2 into the original ScatterConnection?
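
For context, a minimal sketch of the scatter-connection idea that scatter_connection_v2 implements, written against the signature shown above and assuming entity_location holds (y, x) pairs; it is illustrative, not the code in this PR:

import torch

def scatter_connection_sketch(shape, project_embeddings, entity_location, scatter_dim):
    # Scatter per-entity embeddings onto an H x W spatial map at each entity's location,
    # summing embeddings when several entities fall on the same cell ('add' behaviour).
    H, W = shape
    B, M, _ = project_embeddings.shape  # batch, num_entities, scatter_dim
    output = torch.zeros(B, scatter_dim, H, W, device=project_embeddings.device)
    for b in range(B):
        for m in range(M):
            y, x = entity_location[b, m].tolist()
            output[b, :, y, x] += project_embeddings[b, m]
    return output

A real implementation would typically replace the Python loops with an index_add_ or scatter_add_ over a flattened spatial dimension for speed.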

return x


def build_activation2(activation):
Member: we should merge this function into the original version

@PaParaZz1 marked this pull request as ready for review on September 21, 2022 10:50
from ding.framework import Task, Context
from ding.league.v2 import BaseLeague
from ding.league.player import PlayerMeta
from ding.league.v2.base_league import Job
Member: only import from the 2-level directory, e.g. from ding.league.v2 import BaseLeague, Job

sleep(1)
log_every_sec(
logging.INFO, 600, "[Coordinator {}] running jobs {}".format(task.router.node_id, self._running_jobs)
)
Member: maybe we should add state_dict/load_state_dict methods for the coordinator?
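
The log_every_sec call above is the sparse-logging utility from item 7 of the description. A minimal sketch of how such a rate-limited logger could be written, matching only the call signature seen here (level, seconds, message); the internals are assumed:

import logging
import time

_last_emit = {}  # call key -> timestamp of the last emitted record

def log_every_sec(level, seconds, msg):
    # Emit msg at most once every `seconds` seconds for this (level, seconds) key.
    # A real implementation would more likely key on the call site (file and line).
    key = (level, seconds)
    now = time.time()
    if now - _last_emit.get(key, 0.0) >= seconds:
        _last_emit[key] = now
        logging.log(level, msg)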

def get_job_info(self, player_id):
self.get_job_info_cnt += 1
other_players = [i for i in self.active_players_ids if i != player_id]
another_palyer = random.choice(other_players)
Member: typo (another_palyer → another_player)

return unpickler.load()


def my_pickle_loads(msg):
Member: move this function to ding/utils
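
For reference, a common way to build such a loader is to remap CUDA storages to CPU while unpickling; the sketch below is an assumption about the approach, not the code in this PR:

import io
import pickle
import torch

class _CpuUnpickler(pickle.Unpickler):
    # Redirect torch storage deserialization so CUDA tensors are materialized on CPU.
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        return super().find_class(module, name)

def cpu_pickle_loads(msg: bytes):
    # Drop-in replacement for pickle.loads on a node without a GPU.
    return _CpuUnpickler(io.BytesIO(msg)).load()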


@dataclass
class PlayerModelInfo:
get_new_model_time: float
Member: why two time fields here? maybe we can simplify them

# If we don't have Trajectory(t-1), i.e. the length of the whole episode is smaller than unroll_len,
# we fill up the trajectory with the first element of episode.
return_episode = []
i = 0
Member: why init i here

initial_elements.append(trajectory[0])
trajectory = initial_elements + trajectory
if self._last_step_fn:
last_step = deepcopy(trajectory[-1])
Member: why deepcopy here
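
Taken together, the two snippets above pad an episode that is shorter than unroll_len by repeating its first step, as the comment describes. A condensed, hypothetical version of that padding logic:

def pad_trajectory(trajectory, unroll_len):
    # If the episode is shorter than unroll_len, repeat its first step at the front
    # so every returned trajectory has the same length.
    if len(trajectory) < unroll_len:
        initial_elements = [trajectory[0]] * (unroll_len - len(trajectory))
        trajectory = initial_elements + trajectory
    return trajectory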

task.router.node_id
)

ctx.n_episode = self.cfg.policy.collect.n_episode
Member: repeat?

time_begin = time.time()
collector(ctx)

if ctx.job_finish is True:
Member: simplify the condition to "if ctx.job_finish:" (no need for "is True")

)
)

gc.collect()
Member: why call gc here
