
How to use gradient accumulate in BytePS torch DDP? #417

Open
wuyujiji opened this issue Nov 2, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@wuyujiji

wuyujiji commented Nov 2, 2021

Do you have a demo of gradient accumulation with BytePS torch DDP? I cannot find one in byteps/torch/example.

@aDecisionTree

I'm also interested in this~

ymjiang added the enhancement (New feature or request) label Nov 3, 2021
@ymjiang
Member

ymjiang commented Nov 3, 2021

bps.DistributedOptimizer supports gradient accumulation with the backward_passes_per_step option.

bps.DistributedDataParallel does not support it for now. We will add this feature.
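For reference, a minimal construction sketch, assuming backward_passes_per_step can also be passed to the constructor (mirroring Horovod's API; the exact signature may differ):

import byteps.torch as bps

# Assumption: the option is accepted at construction time; otherwise use
# optimizer.set_backward_passes_per_step(...) as shown later in this thread.
optimizer = bps.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters(),
                                     backward_passes_per_step=accumulation_steps)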

@wuyujiji
Author

wuyujiji commented Nov 3, 2021

Could you please share a complete gradient accumulation demo for bps.DistributedOptimizer?

@ymjiang
Member

ymjiang commented Nov 3, 2021

Here is a general workflow:

import byteps.torch as bps

optimizer = bps.DistributedOptimizer(optimizer)
optimizer.set_backward_passes_per_step(accumulation_steps)

model.zero_grad()
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)
    loss = loss_function(predictions, labels)
    loss = loss / accumulation_steps               # optional: average over the accumulated steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:          # update only once every accumulation_steps passes
        optimizer.step()
        model.zero_grad()

We will consider adding an example later.

@wuyujiji
Copy link
Author

wuyujiji commented Nov 3, 2021

Thanks for the quick reply! If I want to use torch.cuda.amp with the code above, how should I add it?
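For reference, a sketch of one way to layer torch.cuda.amp onto the workflow above, using the standard autocast/GradScaler pattern; this has not been validated against BytePS's asynchronous push-pull, so treat it as an assumption rather than a confirmed recipe:

from torch.cuda.amp import autocast, GradScaler
import byteps.torch as bps

scaler = GradScaler()
optimizer = bps.DistributedOptimizer(optimizer)
optimizer.set_backward_passes_per_step(accumulation_steps)

model.zero_grad()
for i, (inputs, labels) in enumerate(training_set):
    with autocast():                               # run the forward pass in mixed precision
        predictions = model(inputs)
        loss = loss_function(predictions, labels)
        loss = loss / accumulation_steps           # optional: average over the accumulated steps
    scaler.scale(loss).backward()                  # scale the loss so fp16 gradients do not underflow
    if (i + 1) % accumulation_steps == 0:
        scaler.step(optimizer)                     # unscales gradients, then calls optimizer.step()
        scaler.update()
        model.zero_grad()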
