[DRAFT][onert-micro] PoC_V1: Training for onert-micro #12892
base: master
Conversation
This draft introduces a fully refactored onert-micro.
ONE-DCO-1.0-Signed-off-by: Artem Balyshev <a.balyshev@samsung.com>
reinterpret_cast<float *>(cur_train_target_data),
                          target_size);
#endif
train_interpreter.backward();
@BalyshevArtem Does this approach (backward for each sample in a batch) produce the same result as a backward pass over the mean/sum of the loss across the batch?
I'm not sure I understood the question correctly. In this approach, we calculate the gradient for each data example and accumulate (sum) it across all examples in the current batch. Then, with the updateWeights method, we update the weights using the accumulated gradients, according to the chosen optimization technique (SGD, Adam). This way the gradients calculated over the entire batch_size sample are taken into account.
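To illustrate why this matches the mean/sum-over-batch approach (a minimal sketch, not the onert-micro API; train_one_batch, backward_one_sample, and the final averaging step are assumptions for illustration): summing per-sample gradients and applying one optimizer step is mathematically the same as one backward pass over the summed batch loss, because the gradient of a sum is the sum of the gradients.

#include <cstddef>
#include <functional>
#include <vector>

// Sketch only (hypothetical names): accumulate per-sample gradients over a
// batch, then apply a single SGD step with the batch-averaged gradient.
void train_one_batch(std::vector<float> &weights,
                     const std::function<std::vector<float>(size_t)> &backward_one_sample,
                     size_t batch_size, float learning_rate)
{
  std::vector<float> grad_acc(weights.size(), 0.0f);
  for (size_t s = 0; s < batch_size; ++s)
  {
    const std::vector<float> sample_grad = backward_one_sample(s);
    for (size_t j = 0; j < grad_acc.size(); ++j)
      grad_acc[j] += sample_grad[j]; // same summation as in the snippet below
  }
  // One weight update with the accumulated (here: averaged) gradient
  for (size_t j = 0; j < weights.size(); ++j)
    weights[j] -= learning_rate * grad_acc[j] / static_cast<float>(batch_size);
}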
we calculate the gradient for each data example and accumulate (sum) it across all examples in the current batch.
Oh, I will look into it more. Could you please point to where we sum the loss?
ONE/onert-micro/onert-micro/src/core/train/OMTrainingRuntimeModule.cpp
Lines 448 to 454 in ad3942d
if (_training_storage.getOptimizationStrategy() == SGD)
{
  for (uint32_t j = 0; j < output_size; ++j)
  {
    grad_data_f[j] += calculated_data_f[j];
  }
}
else
ONE/onert-micro/onert-micro/src/core/train/OMTrainingRuntimeModule.cpp
Lines 471 to 473 in ad3942d
void OMTrainingRuntimeModule::updateSGDWeights(uint8_t *dest, uint8_t *src, size_t size)
{
  assert(dest != nullptr); // Check caller
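For context, a plausible body for such an SGD update is sketched below. This is an assumption, not the actual lines after 473: it presumes src holds the accumulated float gradients, dest holds float weights, size is in bytes, and the step is a plain w <- w - lr * g (the real method may also divide by batch size).

#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch only: reinterpret the raw byte buffers as float arrays and apply
// one SGD step with a fixed learning rate.
void updateSGDWeightsSketch(uint8_t *dest, uint8_t *src, size_t size, float learning_rate)
{
  assert(dest != nullptr);
  assert(src != nullptr);
  float *weights = reinterpret_cast<float *>(dest);
  const float *grads = reinterpret_cast<const float *>(src);
  const size_t count = size / sizeof(float);
  for (size_t i = 0; i < count; ++i)
    weights[i] -= learning_rate * grads[i];
}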
printf("MSE_ERROR TRAIN = %f\n", mse_result); | ||
for (uint32_t e = 0; e < training_epochs; ++e) | ||
{ | ||
train_interpreter.set_training_mode(true); |
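For readers following the driver code, the loop shape implied by this excerpt is roughly the following. This is a sketch with a stand-in interpreter type; only set_training_mode and backward are taken from the PR, the rest is summarized in comments.

#include <cstdint>

// Hypothetical stand-in for the interpreter used in TrainingDriver; only the
// calls visible in this PR are mirrored here.
struct TrainInterpreterSketch
{
  void set_training_mode(bool) {}
  void backward() {}
};

// Sketch of the epoch loop shape implied by the excerpt above. Batch feeding
// and the weight update are left as comments because that code is not shown.
void run_training(TrainInterpreterSketch &train_interpreter, uint32_t training_epochs,
                  uint32_t num_batches)
{
  for (uint32_t e = 0; e < training_epochs; ++e)
  {
    train_interpreter.set_training_mode(true);
    for (uint32_t b = 0; b < num_batches; ++b)
    {
      // feed the inputs/targets for batch b; per-sample gradients are
      // accumulated as discussed in the conversation above
      train_interpreter.backward();
    }
    // apply the accumulated gradients via the optimizer-specific update
    // path (SGD / Adam), e.g. updateWeights in OMTrainingRuntimeModule
  }
}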
Does this code support transfer learning?
By transfer learning, do you mean that only certain last layers are trained? If so, then in this draft (which follows proposal_1 from #12873 (comment)) the TrainingConfigure tool is responsible for how many and which specific layers are trained. In this variant we cannot change or choose which layers will be trained from TrainingDriver.
I understood: the backprop graph encodes which layers will be trained, and TrainingDriver cannot change the training-layer information other than turning training on and off. I'm trying to add a trainable property to the circle+ file format, so I was wondering how you handle the transfer learning feature. Thank you for your explanation.
This is a first draft version of training for onert-micro, similar to the #12873 (comment) proposal.
Note: in this draft I added the gradient-related operations to the main circle schema, not to circle+.
This draft also adds the infrastructure needed to create a backpropagation graph: the circle-weight-divider and circle_training_configure tools (note: in the #12873 (comment) proposal these two tools are merged into a single TrainingConfigureTool).