
[onert] Apply BackPropAccumulator layer #12976

Merged
merged 1 commit into Samsung:master from onert/apply_backprop_accumulator on May 17, 2024

Conversation

ragmani
Contributor

@ragmani ragmani commented May 9, 2024

This commit applies the BackPropAccumulator layer to the train backend.

  • Add registering and planning of disposable tensors for back-propagation
  • Make layers use disposable tensors instead of the original back-prop tensors
  • Apply a BackPropAccumulator layer to each BackPropTensor (see the sketch below)

ONE-DCO-1.0-Signed-off-by: ragmani ragmani0216@gmail.com
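For context, a minimal, hypothetical sketch of the accumulation idea (the function name and signature are illustrative, not the actual train-backend code): when a forward tensor feeds several operations in a branching graph, each consumer writes its gradient into its own disposable tensor, and an accumulator sums those gradients into the shared back-prop tensor.

#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative stand-in for a BackPropAccumulator-style layer: sum the gradients
// produced by every consumer (held in disposable buffers) into the shared
// back-prop buffer of the branching tensor.
void accumulateBackProp(const std::vector<std::vector<float>> &disposable_grads,
                        std::vector<float> &back_prop)
{
  // Start from zero so the result is exactly the sum of the incoming gradients.
  std::fill(back_prop.begin(), back_prop.end(), 0.0f);
  for (const auto &grad : disposable_grads)
    for (std::size_t i = 0; i < back_prop.size(); ++i)
      back_prop[i] += grad[i];
}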

@ragmani
Contributor Author

ragmani commented May 9, 2024

After this change, training of the branching graph becomes possible.

$ ./Product/x86_64-linux.release/out/bin/onert_train mnist_branched.circle --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --loss 1 --loss_reduction_type 1 --optimizer 1 --learning_rate 0.001 --batch_size 1
Model Expected Filename out/train.output.1000.bin
Model Input Filename out/train.input.1000.bin
Model Filename mnist_branched.circle
== training parameter ==
- learning_rate   = 0.001
- batch_size      = 1
- loss_info       = {loss = mean squared error, reduction = sum over batch size}
- optimizer       = sgd
========================
Epoch 1/5 - time: 0.196ms/step - loss: [0] 0.0340
Epoch 2/5 - time: 0.175ms/step - loss: [0] 0.0316
Epoch 3/5 - time: 0.186ms/step - loss: [0] 0.0304
Epoch 4/5 - time: 0.197ms/step - loss: [0] 0.0296
Epoch 5/5 - time: 0.180ms/step - loss: [0] 0.0289
===================================
MODEL_LOAD   takes 0.3950 ms
PREPARE      takes 2.3390 ms
EXECUTE      takes 950.4810 ms
- Epoch 1      takes 195.6460 ms
- Epoch 2      takes 175.2290 ms
- Epoch 3      takes 185.9180 ms
- Epoch 4      takes 196.5610 ms
- Epoch 5      takes 179.5790 ms
===================================

@ragmani ragmani force-pushed the onert/apply_backprop_accumulator branch from e6b2c32 to e54b48d on May 9, 2024 07:45
@ragmani ragmani requested a review from a team May 9, 2024 07:45
@ragmani ragmani added the PR/ready for review label on May 9, 2024
Contributor

@jyoungyun jyoungyun left a comment


LGTM

-      ir::OperandInfo backend_info{obj.shape(), obj.typeInfo(), obj.info().memAllocType(),
-                                   obj.isConstant()};
-      tensor_builder->registerBackwardTensorInfo(ind, backend_info, ir::Layout::NHWC);
+      tensor_builder->registerBackwardTensorInfo(ind, createBackwardTensorInfo(obj),
Contributor


Could you give me an example of BackwardTensor?

In my understanding,

  • DisposableBackwardTensor: a tensor for back-propagation. Once propagation is done, it can be freed.
  • BackwardTensor: also used for back-propagation, but cannot be freed? I couldn't think of a concrete example of this tensor.

Contributor Author


This code just registers the information of tensors that are only used in backwarding. It does not directly affect the memory planning of any tensors. DisposableBackPropTensors will be planned in planDisposableBackPropTensors() below, and the other tensors for backwarding do not have memory planning yet because they don't have def/use information.
If you are curious about how memory planning is possible with the function below, I will be happy to provide additional explanation online or offline.
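For readers unfamiliar with that planning step, here is a simplified, hypothetical sketch of def/use-based planning (planDisposableTensors, claim, and release are illustrative stand-ins; the real logic lives in planDisposableBackPropTensors()): a disposable tensor is claimed at the operation that produces it and released right after its last use in the backward order.

#include <functional>
#include <unordered_map>
#include <vector>

void planDisposableTensors(const std::vector<int> &backward_order,
                           const std::unordered_map<int, std::vector<int>> &defs, // op -> tensors it produces
                           const std::unordered_map<int, std::vector<int>> &uses, // op -> tensors it reads
                           const std::function<void(int)> &claim,
                           const std::function<void(int)> &release)
{
  // Count how many operations still need each tensor.
  std::unordered_map<int, int> remaining_uses;
  for (const auto &[op, tensors] : uses)
    for (int t : tensors)
      ++remaining_uses[t];

  // Walk the backward order: claim memory when a gradient is produced and
  // release it as soon as its last consumer has run.
  for (int op : backward_order)
  {
    if (auto d = defs.find(op); d != defs.end())
      for (int t : d->second)
        claim(t);
    if (auto u = uses.find(op); u != uses.end())
      for (int t : u->second)
        if (--remaining_uses[t] == 0)
          release(t);
  }
}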

Contributor

@zetwhite zetwhite left a comment


LGTM 👍


@@ -103,11 +127,52 @@ backend::train::ITensorRegistry *BackendContext::genTrainingTensors()
tensor_builder->notifyBackwardFirstUse(ind);
});

for (const auto &op_index : tgraph.btopolSortOperations())
Contributor


I have another little question.

Is it possible to replace tgraph.btopolSortOperations() with an operation vector (backward_order) that truncateBackwardOrder has been applied to?

// linearize for backwarding
auto backward_order = lowered_graph->trainable_graph().btopolSortOperations();
// get rid of all nodes not reachable from a node with trainable parameters
backward_order = lowered_graph->trainable_graph().truncateBackwardOrder(backward_order);

I thought that if backwarding is not going to run for an operation, there is no need to allocate the tensors related to it.
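For illustration, the suggestion applied to the loop above would look roughly like this (a sketch only, reusing tgraph and truncateBackwardOrder from the quoted code; not a tested change):

// Sort for backwarding, drop operations that cannot reach a trainable parameter,
// and register/plan backward tensors only for what remains.
auto backward_order = tgraph.btopolSortOperations();
backward_order = tgraph.truncateBackwardOrder(backward_order);
for (const auto &op_index : backward_order)
{
  // same per-operation tensor registration/planning as before
}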

Contributor


BTW, this is a bit out of context, so please proceed without considering this question.

Contributor Author


Thanks for the information. I had missed truncateBackwardOrder.

Contributor Author


@Aeren1564 Are you going to apply truncateBackwardOrder everywhere btopolSortOperations() is used? If so, could you apply it here as well?

@hseok-oh hseok-oh merged commit e561dfc into Samsung:master May 17, 2024
9 checks passed