[DRAFT][onert-micro] PoC_V1: Training for onert-micro #12892

Open: wants to merge 21 commits into base: master
Conversation

BalyshevArtem (Contributor):

This is a draft of the first version of training for onert-micro, similar to the proposal in #12873 (comment).
Note: in this draft I added the gradient-related operations into the main circle schema, not into circle+.
This draft also adds the necessary infrastructure to create a back-propagation graph: the circle-weight-divider and circle_training_configure tools (note: in the #12873 (comment) proposal these two tools are merged into one TrainingConfigureTool).

@BalyshevArtem added the PR/NO TEST, PR/NO MERGE, and DRAFT labels on Apr 22, 2024.
    reinterpret_cast<float *>(cur_train_target_data),
    target_size);
#endif
train_interpreter.backward();
Contributor:

@BalyshevArtem Does this approach (backward for each sample in a batch) produce the same result as the backward mean/sum-over-batch approach?

BalyshevArtem (Contributor Author):

I'm not sure I understood the question correctly. In this approach, for each data example we calculate the gradient and accumulate (sum) it over all examples in the current batch. Then, using the updateWeights method, we update the weights from the accumulated gradients according to the chosen optimization technique (SGD, Adam). Thus the gradients calculated over the entire batch_size sample are taken into account.
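For illustration, a minimal self-contained sketch of the accumulate-then-update scheme described above (hypothetical names, not the actual onert-micro API). Since the SGD step is linear in the gradient, summing per-sample gradients and then doing one update is equivalent to one backward pass over the summed batch loss:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone sketch, not the onert-micro API:
// accumulate per-sample gradients, then apply one SGD step per batch.
void trainOneBatch(std::vector<float> &weights,
                   const std::vector<std::vector<float>> &per_sample_grads,
                   float learning_rate)
{
  std::vector<float> grad(weights.size(), 0.0f);

  // "backward" per sample: sum the gradients over the whole batch
  for (const auto &g : per_sample_grads)
    for (std::size_t j = 0; j < grad.size(); ++j)
      grad[j] += g[j];

  // "updateWeights": one SGD step with the accumulated gradient
  // (divide by the batch size here if a mean loss is wanted
  // instead of a sum)
  for (std::size_t j = 0; j < weights.size(); ++j)
    weights[j] -= learning_rate * grad[j];
}
```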

Contributor:

> we calculate the gradient and accumulate (sum) it over all examples in the current batch.

Oh, I will look into it more. Could you please point to where we sum up the loss?

BalyshevArtem (Contributor Author):

if (_training_storage.getOptimizationStrategy() == SGD)
{
  for (uint32_t j = 0; j < output_size; ++j)
  {
    grad_data_f[j] += calculated_data_f[j];
  }
} else

- here the calculated gradients are summed over the batch (SGD).

void OMTrainingRuntimeModule::updateSGDWeights(uint8_t *dest, uint8_t *src, size_t size)
{
  assert(dest != nullptr); // Check caller

- here the weights are updated.
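For readers following along, a minimal sketch of what such an SGD update could look like (an assumption for illustration; the actual onert-micro implementation may scale or reset the buffers differently):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: apply one SGD step, reading the accumulated
// gradients from src and updating the weights in dest in place.
void updateSGDWeightsSketch(uint8_t *dest, uint8_t *src, size_t size,
                            float learning_rate, uint32_t batch_size)
{
  assert(dest != nullptr); // Check caller
  assert(src != nullptr);  // Check caller

  float *weights = reinterpret_cast<float *>(dest);
  float *grads = reinterpret_cast<float *>(src);
  const size_t count = size / sizeof(float);

  for (size_t i = 0; i < count; ++i)
  {
    // Average the summed gradients over the batch, then step.
    weights[i] -= learning_rate * (grads[i] / batch_size);
    grads[i] = 0.0f; // reset the accumulator for the next batch
  }
}
```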

printf("MSE_ERROR TRAIN = %f\n", mse_result);
for (uint32_t e = 0; e < training_epochs; ++e)
{
  train_interpreter.set_training_mode(true);
Contributor:

@BalyshevArtem

Does this code support transfer learning?

BalyshevArtem (Contributor Author):

Do you mean by transfer learning that only certain last layers are trained? If so, then in this draft (proposal_1 from #12873 (comment)) the TrainingConfigure tool is responsible for how many and which specific layers are trained; from TrainingDriver we cannot change or choose which layers will be trained in this variant.
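To illustrate this division of responsibility (hypothetical structures, not the actual tool output): the offline configure step can be thought of as baking a per-operator trainable mask into the model artifact, which the runtime's backward pass then honors without any on-device choice of layers:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-operator metadata emitted offline by the
// configure tooling and stored alongside the backprop graph.
struct OpTrainInfo
{
  uint32_t op_index;
  bool trainable; // false => weights stay frozen (transfer learning)
};

// Hypothetical runtime use: walk the graph backwards, always
// propagating activation gradients, but only accumulating weight
// gradients for operators the offline tool marked as trainable.
void backwardSketch(const std::vector<OpTrainInfo> &plan)
{
  for (auto it = plan.rbegin(); it != plan.rend(); ++it)
  {
    // propagateInputGradients(it->op_index); // needed by upstream ops
    if (it->trainable)
    {
      // accumulateWeightGradients(it->op_index);
    }
  }
}
```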

Contributor:

I understand now that the backprop graph encodes which layers will be trained, and that TrainingDriver cannot change the training-layer information other than turning training on and off. I'm trying to add a trainable property to the circle+ file format, so I was wondering how you handle the transfer learning feature. Thank you for your explanation.
