DefaultStrategy only receives successful training replies #1086

srcansiz · 2024-04-09T10:18:05Z

The default strategy is expected to fail the training round if any node returns an unsuccessful reply. However, DefaultStrategy.refine, called in Experiment.run_once, only receives successful training replies. This is because Job._get_training_result does not add replies with success = False to the training_replies dictionary. As a result, a training round is treated as successful even when one or more nodes return success=False, as long as at least one node returns a successful training reply.
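The behaviour described above can be reproduced with a minimal, self-contained sketch. It does not use the actual Fed-BioMed code: `collect_replies_dropping_failures`, `strict_refine`, and the reply dictionaries are hypothetical stand-ins for Job._get_training_result, DefaultStrategy.refine, and the real reply objects.

```python
from typing import Dict, List


def collect_replies_dropping_failures(raw_replies: List[dict]) -> Dict[str, dict]:
    """Hypothetical stand-in for the collection step: replies with
    success=False are silently dropped and never reach the strategy."""
    return {r["node_id"]: r for r in raw_replies if r["success"]}


def strict_refine(training_replies: Dict[str, dict]) -> List[str]:
    """Hypothetical stand-in for a strategy that should fail the round
    whenever any reply it receives is unsuccessful."""
    failed = [nid for nid, r in training_replies.items() if not r["success"]]
    if failed:
        raise RuntimeError(f"Training failed on nodes: {failed}")
    return sorted(training_replies)


raw = [
    {"node_id": "node-1", "success": True},
    {"node_id": "node-2", "success": False},  # this failure is lost below
]

# The strategy never sees node-2, so it cannot fail the round.
aggregated_nodes = strict_refine(collect_replies_dropping_failures(raw))
```

Because the failed reply is filtered out before the strategy runs, the check inside the strategy has nothing to detect and the round proceeds with the remaining node only.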

A simple solution would be to add unsuccessful replies to the training replies and let the strategy class do the rest. However, this solution also requires modifying the function _extract_received_optimizer_aux_var_from_round so that it only extracts from successful replies.
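A rough sketch of this proposal, under stated assumptions: the function names, the `optim_aux_var` key, and the reply layout below are hypothetical and only illustrate the idea. They are not the actual Fed-BioMed implementation.

```python
from typing import Any, Dict, List


def collect_all_replies(raw_replies: List[dict]) -> Dict[str, dict]:
    """Keep every reply, successful or not, so the strategy can decide."""
    return {r["node_id"]: r for r in raw_replies}


def extract_aux_var_from_successful(training_replies: Dict[str, dict]) -> Dict[str, Any]:
    """Extract optimizer auxiliary variables from successful replies only.
    The 'optim_aux_var' key is an assumed field name for illustration."""
    return {
        nid: r["optim_aux_var"]
        for nid, r in training_replies.items()
        if r.get("success") and "optim_aux_var" in r
    }


def refine_or_fail(training_replies: Dict[str, dict]) -> List[str]:
    """Strategy-side check: fail the round if any node reported a failure."""
    failed = [nid for nid, r in training_replies.items() if not r.get("success")]
    if failed:
        raise RuntimeError(f"Training failed on nodes: {failed}")
    return sorted(training_replies)


replies = collect_all_replies([
    {"node_id": "node-1", "success": True, "optim_aux_var": {"scaffold": [0.1]}},
    {"node_id": "node-2", "success": False},
])

# Auxiliary variables come from node-1 only; node-2 is skipped, not an error.
aux_vars = extract_aux_var_from_successful(replies)

try:
    refine_or_fail(replies)
    round_failed = False
except RuntimeError:
    round_failed = True
```

Keeping the failure decision inside the strategy means a custom strategy could choose to tolerate failed nodes, while the default one can still stop the round.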


srcansiz added the bug and candidate labels Apr 9, 2024

srcansiz changed the title ~~DefaultStrategy only receives succesfull training replies~~ DefaultStrategy only receives successful training replies Apr 9, 2024
