[onert-micro] Introduce Training #12873

Open
BalyshevArtem opened this issue Apr 16, 2024 · 2 comments
Labels
type/discussion We need discussion. Discussion itself can help. Even without conclusions!

Comments

@BalyshevArtem
Contributor

BalyshevArtem commented Apr 16, 2024

What

Let's discuss how to add training into onert-micro.

Why

We need to add a training feature to onert-micro for some target models.

cc @Torrero, @SlavikMIPT, @chunseoklee, @lemmaa

@BalyshevArtem BalyshevArtem added the type/discussion We need discussion. Discussion itself can help. Even without conclusions! label Apr 16, 2024
@BalyshevArtem
Contributor Author

BalyshevArtem commented Apr 16, 2024

First proposal

The idea is a two-stage process. In the first stage, the model is prepared from the initial model on the developer's host (using one-toolchain): the backpropagation graph is built, optimizations of that graph are applied, and so on. In the second stage, training is performed on the device using onert-micro, based on the initial model and the resulting backpropagation graph.

The main goal of this proposal is to keep onert-micro as simple as possible: without greatly complicating its logic and without greatly increasing the code base (and thus the binary size). This keeps it lightweight during normal (non-training) usage. The two-stage process helps achieve this goal, while leaving room to apply various complex optimizations and hypothesis checks on the host side.

Proposed overall structure of this two-stage system:

(image: diagram of the two-stage system, not reproduced here)

  1. First, we have a circle model. We feed it to the TrainingConfigureTool (the name is temporary; the tool can be shipped separately or as part of one-toolchain). The tool outputs three files: a circle model with the weights that will be trained cut out, a wof (weight only format) file that stores the trainable weights, and a backpropagation graph in the form of a circle model.
  2. All these files, together with the prepared train and test datasets, are fed to onert-micro training. Training runs according to the training parameters set in the application (it is possible to iterate over them to find the best combination).
  3. At the end of the training process, the test data is used to check whether there has been an improvement; if so, the new weights are saved into the wof file (a rough sketch of this loop follows the list).
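To make steps 2 and 3 more concrete, here is a minimal sketch of what the on-device loop could look like. All type and function names (`TrainingSession`, `trainEpoch`, ...) are hypothetical placeholders, not an existing or proposed onert-micro API; they only mirror the flow described above.

```cpp
// Illustrative sketch only: TrainingSession and its methods are hypothetical
// placeholders that mirror the flow above (three input files plus datasets in,
// improved weights out). None of these names are an existing onert-micro API.
#include <cstdint>
#include <vector>

struct TrainingConfig
{
  float learning_rate = 0.001f; // training parameters chosen by the application
  uint32_t batch_size = 32;
  uint32_t epochs = 10;
};

class TrainingSession
{
public:
  TrainingSession(const char *model_circle, const char *weights_wof,
                  const char *backprop_circle, const TrainingConfig &config) {}

  // One pass over the train set: run the forward graph, run the backprop
  // graph to get gradients, apply the optimizer to the wof weights.
  void trainEpoch(const std::vector<float> &data, const std::vector<float> &labels) {}

  // Run the forward graph on the test set and return a quality metric.
  float evaluate(const std::vector<float> &data, const std::vector<float> &labels) { return 0.0f; }

  // Overwrite the buffers in the .wof file with the current weights.
  void saveTrainedWeights(const char *wof_path) {}
};

int main()
{
  TrainingConfig config;
  TrainingSession session("model.circle", "model.wof", "model_backprop.circle", config);

  // Datasets are prepared on the host and shipped to the device with the model files.
  std::vector<float> train_data, train_labels, test_data, test_labels;

  float best_metric = session.evaluate(test_data, test_labels);
  for (uint32_t e = 0; e < config.epochs; ++e)
  {
    session.trainEpoch(train_data, train_labels);
    const float metric = session.evaluate(test_data, test_labels);

    // Step 3: keep the new weights only if the test metric improved.
    if (metric > best_metric)
    {
      best_metric = metric;
      session.saveTrainedWeights("model.wof");
    }
  }
  return 0;
}
```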

Details

The first stage is the TrainingConfigureTool. Its two main tasks are:

  • cutting the trainable weights out of the source file (the layers can be set manually, or in the future selected automatically to achieve better performance under the current resource constraints)
  • creating a backpropagation graph that enables learning; by executing it, onert-micro will be able to calculate the gradients.

As a result, the tool produces three files: a circle model without the trainable weights, a file storing the weights for training (wof - weight only format), and a circle (maybe circle+) model with the backpropagation graph.
In the future, the TrainingConfigureTool will be able to perform additional actions to improve the training process:

  • based on a given memory budget, select only a subset of the layers for training, or even a subset of the weights of a certain layer (so-called sparse backpropagation); a rough selection sketch follows this list
  • apply mixed-precision quantization and search for parts of the network where the rematerialization technique can be applied (intermediate results are not saved for that part of the network, but are recalculated during backpropagation)
  • optimizations on the graph itself
  • and so on
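As a rough illustration of the memory-budget-driven selection in the first bullet, here is a minimal greedy sketch. The `LayerInfo` fields and the benefit heuristic are purely hypothetical; the actual selection policy is an open question.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-layer description used only for this sketch.
struct LayerInfo
{
  uint32_t index;         // layer index in the original circle model
  size_t train_mem_bytes; // extra memory needed to train this layer (weight grads + saved activations)
  float benefit;          // heuristic "usefulness" score, e.g. expected accuracy gain
};

// Greedy selection: take the most "useful" layers first while the extra
// training memory fits into the given budget. Returns indices of layers
// whose weights stay trainable; everything else is frozen.
std::vector<uint32_t> select_trainable_layers(std::vector<LayerInfo> layers, size_t memory_budget_bytes)
{
  std::sort(layers.begin(), layers.end(),
            [](const LayerInfo &a, const LayerInfo &b) { return a.benefit > b.benefit; });

  std::vector<uint32_t> selected;
  size_t used = 0;
  for (const auto &layer : layers)
  {
    if (used + layer.train_mem_bytes <= memory_budget_bytes)
    {
      selected.push_back(layer.index);
      used += layer.train_mem_bytes;
    }
  }
  return selected;
}
```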

The output backpropagation graph will consist of both traditional circle operations and special operations that calculate gradients for the corresponding forward operation (for example, Conv2DWeightGrad calculates the gradient with respect to the weights, and Conv2DInputGrad calculates the gradient with respect to the input tensor). These operations can be added to circle as custom operations, or added as operations specific to circle+. I prefer the second option.
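To make these gradient operations more concrete, below is a minimal sketch of the computation a Conv2DWeightGrad-style kernel would perform. The layout, stride, padding and batch assumptions (NHWC, stride 1, no padding, batch 1) and the function signature are chosen purely for brevity of illustration, not as a proposed implementation.

```cpp
#include <cstdint>

// Naive sketch of the math a Conv2DWeightGrad-style kernel would perform:
//   dL/dW[kh][kw][ic][oc] = sum over output positions of
//                           input[oh + kh][ow + kw][ic] * grad_output[oh][ow][oc]
// Assumptions (for brevity only): NHWC layout, batch size 1, stride 1, no padding.
void conv2d_weight_grad(const float *input,       // [in_h][in_w][in_c]
                        const float *grad_output, // [out_h][out_w][out_c]
                        float *grad_weights,      // [k_h][k_w][in_c][out_c]
                        int in_h, int in_w, int in_c,
                        int out_c, int k_h, int k_w)
{
  const int out_h = in_h - k_h + 1;
  const int out_w = in_w - k_w + 1;

  for (int kh = 0; kh < k_h; ++kh)
    for (int kw = 0; kw < k_w; ++kw)
      for (int ic = 0; ic < in_c; ++ic)
        for (int oc = 0; oc < out_c; ++oc)
        {
          float acc = 0.0f;
          for (int oh = 0; oh < out_h; ++oh)
            for (int ow = 0; ow < out_w; ++ow)
            {
              const float in_val = input[((oh + kh) * in_w + (ow + kw)) * in_c + ic];
              const float go_val = grad_output[(oh * out_w + ow) * out_c + oc];
              acc += in_val * go_val;
            }
          grad_weights[((kh * k_w + kw) * in_c + ic) * out_c + oc] = acc;
        }
}
```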

The second stage is onert-micro training, which will expose various training parameters. To achieve the maximum effect during training, onert-micro training will support different optimizers (SGD, ADAM, RMSProp, maybe some custom ones), a configurable batch size, and optimizer-specific constants (learning rate, the constants for ADAM and RMSProp).
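For illustration, here is a minimal sketch of the per-tensor update step such optimizers would apply to the trainable weights once the backpropagation graph has produced the gradients. These are the standard SGD and ADAM formulas; the function names and signatures are placeholders, not a proposed API.

```cpp
#include <cmath>
#include <cstdint>

// Plain SGD: w <- w - lr * g
void sgd_step(float *weights, const float *grads, uint32_t size, float lr)
{
  for (uint32_t i = 0; i < size; ++i)
    weights[i] -= lr * grads[i];
}

// ADAM: keeps per-weight first/second moment estimates m and v.
// beta1, beta2, epsilon are the optimizer constants the proposal mentions
// as user-configurable training parameters.
void adam_step(float *weights, const float *grads, float *m, float *v, uint32_t size,
               uint32_t step, float lr, float beta1, float beta2, float epsilon)
{
  const float bias1 = 1.0f - std::pow(beta1, static_cast<float>(step));
  const float bias2 = 1.0f - std::pow(beta2, static_cast<float>(step));

  for (uint32_t i = 0; i < size; ++i)
  {
    m[i] = beta1 * m[i] + (1.0f - beta1) * grads[i];
    v[i] = beta2 * v[i] + (1.0f - beta2) * grads[i] * grads[i];

    const float m_hat = m[i] / bias1; // bias-corrected moments
    const float v_hat = v[i] / bias2;

    weights[i] -= lr * m_hat / (std::sqrt(v_hat) + epsilon);
  }
}
```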

@BalyshevArtem
Contributor Author

Proposal for a file structure containing only trainable weights

.wof (weight only format)

This proposal is taken from "A3: Define a separate format for storing a single file of diffs (≈ changed weights)" in the internal repo (proposed by @glistening).

| Offset (bytes) | Contents |
| --- | --- |
| 0 | MAGIC NUMBER |
| 2 | SCHEMA VERSION |
| 4 | RESERVED |
| 8 | n_buffers (=N) |
| 12 | Offset 1 ... Offset N (4 bytes each) |
| 8 + (4 + N * 4) | Buffer 1 Data, Buffer 2 Data, ... |

An offset is set to -1 if there is no data for that buffer.

The buffer indices in this .wof file correspond to the tensor indices in the original circle file; that is, the number of buffers equals the number of tensors in the original file. To find the constant data for tensor number k in the original file, read the data from buffer number k, as in the sketch below.
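For illustration, here is a minimal reader sketch for the lookup described above. The field widths and little-endian layout are assumptions taken from the offset table; the struct and function names are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>

// Sketch of locating buffer k in a .wof file following the layout above.
// Assumptions: little-endian host, 2-byte magic/version, 4-byte reserved,
// 4-byte n_buffers, 4-byte signed offsets (so -1 can mean "no data").
struct WofHeader
{
  uint16_t magic;
  uint16_t schema_version;
  uint32_t reserved;
  uint32_t n_buffers;
};

// Returns the file offset of the data for tensor index k in the original
// circle model, or -1 if this tensor has no trainable data in the .wof file.
int32_t wof_buffer_offset(std::FILE *f, uint32_t k)
{
  WofHeader header{};
  std::fseek(f, 0, SEEK_SET);
  if (std::fread(&header, sizeof(header), 1, f) != 1)
    return -1;
  if (k >= header.n_buffers)
    return -1;

  // Offset table starts right after the header; one 4-byte entry per tensor.
  std::fseek(f, static_cast<long>(sizeof(WofHeader) + k * sizeof(int32_t)), SEEK_SET);
  int32_t offset = -1;
  if (std::fread(&offset, sizeof(offset), 1, f) != 1)
    return -1;
  return offset; // -1 means "no data for this tensor"
}
```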
