[feasibility research] Investigate if we can store the model artifact with layers #55

Open
gaocegege opened this issue Jun 15, 2020 · 2 comments
Labels
kind/design Categorizes issue or PR as related to design. priority/P2 Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@gaocegege
Member

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug
/kind feature

What happened:

Currently, we store the model in a single layer.

What you expected to happen:

We should investigate whether we can store it as multiple layers, so that a cached layer can be reused if it has not changed.

/cc @zw0610

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

@gaocegege gaocegege added kind/design Categorizes issue or PR as related to design. priority/P2 Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jun 15, 2020
@zw0610

zw0610 commented Jun 17, 2020

We can decompose a model along two dimensions, which differ in how models are distinguished from one another.

Op/layer-oriented

This seems straightforward given how we visualize models. Like a double-stranded structure, one strand represents the ops/layers while the other holds the corresponding parameter values. At the finest granularity, a model with N ops will have 2N files, so that the OCI artifacts can trace the differences between models and versions.

However, this proposal does not seem practical at the moment.

  1. Demand: Model files are not yet large enough to call for a storage-efficient registry (this may change in the future)
  2. Dependency: Mainstream DL frameworks do not separate parameter values into different files in their export schemes
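
To make the 2N-file idea concrete, here is a minimal Python sketch. It assumes a model can be represented as a mapping from op name to an op description plus a parameter list. That layout, and the helpers `blob_digest` and `split_model`, are hypothetical and not part of any framework or OCI library. Each file is addressed by its SHA-256 digest, in the same `sha256:<hex>` form OCI uses for blobs, so files that are unchanged between versions end up with identical digests and could be reused from cache.

```python
import hashlib
import json


def blob_digest(data: bytes) -> str:
    """Content address of a blob, in the sha256:<hex> form used by OCI."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def split_model(model: dict) -> dict:
    """Split a model into 2N blobs: one op file and one params file per op.

    `model` maps op name -> {"op": <description>, "params": <list of floats>}.
    This layout is an assumption made for illustration only.
    """
    blobs = {}
    for name, entry in model.items():
        op_bytes = json.dumps(entry["op"], sort_keys=True).encode()
        param_bytes = json.dumps(entry["params"]).encode()
        blobs[f"{name}.op"] = blob_digest(op_bytes)
        blobs[f"{name}.params"] = blob_digest(param_bytes)
    return blobs


v1 = {
    "conv1": {"op": {"type": "conv2d", "kernel": 3}, "params": [0.1, 0.2]},
    "fc": {"op": {"type": "dense", "units": 10}, "params": [0.5, 0.6]},
}
# Version 2 only changes the parameter values of the "fc" op.
v2 = {
    "conv1": v1["conv1"],
    "fc": {"op": v1["fc"]["op"], "params": [0.7, 0.8]},
}

blobs_v1 = split_model(v1)
blobs_v2 = split_model(v2)

# Blobs whose digest differs must be uploaded; the rest can be reused.
changed = sorted(k for k in blobs_v2 if blobs_v1.get(k) != blobs_v2[k])
reusable = sorted(k for k in blobs_v2 if blobs_v1.get(k) == blobs_v2[k])
print("changed:", changed)
print("reusable:", reusable)
```

In this toy case only one of the four files changes between versions. Whether real models split this cleanly depends on the exporting framework, which is the dependency concern raised in the list above.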

Diff-oriented

Another proposal, which conforms to current model export schemes, is to extract the difference between model A_n and model A_{n+1} into a single file (or multiple files) for model version control.

Artifact Layers    Actual Files
Layer 1            [model file] + [diff file (empty)]
Layer 2            (same model file) + [diff file 1 -> 2]
...                ...
Layer N+1          (same model file) + [diff file N -> N+1]

The client can still use the file(s) to distinguish layers, because the diff file always changes when any part of the model changes, whether in parameter values or in structure.

The additional work lies in designing and implementing the diff file. The simplest way is to store modification scripts directly in the diff file, so that applying the diff file to the model file generated from the previous layer yields the model file for this layer.
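
As a rough illustration of the diff chain, the Python sketch below treats a model as a flat mapping from tensor name to values. Instead of an executable modification script, it stores a declarative set/delete record. That substitution is an assumption made to keep the example short. The names `make_diff`, `apply_diff`, and `reconstruct` are hypothetical.

```python
def make_diff(old: dict, new: dict) -> dict:
    """Record what must change to turn `old` into `new`."""
    return {
        # Entries that are new or whose values changed.
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        # Entries that no longer exist (a structural change).
        "delete": sorted(k for k in old if k not in new),
    }


def apply_diff(old: dict, diff: dict) -> dict:
    """Apply a diff to the model produced by the previous layer."""
    result = {k: v for k, v in old.items() if k not in diff["delete"]}
    result.update(diff["set"])
    return result


def reconstruct(base: dict, diffs: list, layer: int) -> dict:
    """Rebuild the model at a 1-indexed layer by replaying diffs in order."""
    model = base
    for diff in diffs[:layer]:
        model = apply_diff(model, diff)
    return model


# Hypothetical model versions, represented as tensor name -> values.
versions = [
    {"conv1.w": [0.1, 0.2], "fc.w": [0.5, 0.6]},
    {"conv1.w": [0.1, 0.2], "fc.w": [0.7, 0.8]},  # parameter change
    {"conv1.w": [0.1, 0.2], "fc.w": [0.7, 0.8], "fc2.w": [0.9]},  # structure change
]

base = versions[0]
diffs = [make_diff(base, base)]  # layer 1 carries an empty diff
diffs += [make_diff(a, b) for a, b in zip(versions, versions[1:])]

for layer in range(1, len(versions) + 1):
    assert reconstruct(base, diffs, layer) == versions[layer - 1]
print(diffs[1])
```

Replaying every diff from layer 1 makes reconstruction cost grow with the number of versions. A real design would probably need to store occasional full snapshots, which this sketch leaves out.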


Phase-oriented

This proposal changes nothing. It is more about how users can make use of a model registry with a version control feature.

From the perspective of end-to-end model training, it often takes several phases to train a model. For example, if we wish to train a CNN for image classification on a relatively small dataset, we can train the convolutional part on ImageNet as Phase 1 and then fine-tune the FC part as Phase 2. This pattern can be extended to a more complicated one:

Phase 1                  Phase 2                  Phase 3                         Phase 4                                 Phase 5                      Phase 6+n
                                              ⊢[Fine-tune FC on Dataset X]--[Fine-tune on User Profile P]--[online training on time A0]-⋯-[online training on time An]
[Raw Model]--[Train on Large Public Dataset]-|
                                              ⊢[Fine-tune FC on Dataset X]--[Fine-tune on User Profile Q]--[online training on time B0]-⋯-[online training on time Bn]

The changes to model files between phases can be captured in this client. Under such a design, users can validate the lineage of a final model and revert easily if any accuracy incident occurs. However, it does not seem very friendly to storage space.
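
The lineage and revert workflow described above could be sketched in Python as follows. This assumes each pushed model version records a reference to the version it was trained from. The `ModelVersion` class, the tag names, and the helpers `lineage` and `revert` are all hypothetical and do not reflect an existing registry API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    tag: str                                  # hypothetical registry tag
    phase: str                                # what this training phase did
    parent: Optional["ModelVersion"] = None   # version it was trained from


def lineage(version: ModelVersion) -> list:
    """Walk parent links back to the raw model, returning tags oldest first."""
    chain = []
    node = version
    while node is not None:
        chain.append(node.tag)
        node = node.parent
    return list(reversed(chain))


def revert(version: ModelVersion, steps: int = 1) -> ModelVersion:
    """Return the ancestor `steps` phases back, stopping at the raw model."""
    node = version
    for _ in range(steps):
        if node.parent is None:
            break
        node = node.parent
    return node


raw = ModelVersion("raw", "raw model")
pretrained = ModelVersion("pretrained", "train on large public dataset", raw)
# Two branches that share the first two phases, as in the diagram above.
ft_p = ModelVersion("ft-x-p", "fine-tune FC on dataset X", pretrained)
profile_p = ModelVersion("profile-p", "fine-tune on user profile P", ft_p)
ft_q = ModelVersion("ft-x-q", "fine-tune FC on dataset X", pretrained)
profile_q = ModelVersion("profile-q", "fine-tune on user profile Q", ft_q)

print(lineage(profile_p))
print(revert(profile_q).tag)
```

Recording only parent references keeps the lineage cheap to store, but it does not reduce the size of the model files themselves.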

@gaocegege
Member Author

#55 (comment)

I think all three of these approaches would need framework support.
