Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add model manager for machine learning models #918

Open
wants to merge 1 commit into
base: v1.x.x
Choose a base branch
from

Conversation

idlir-shkurti-frequenz
Copy link
Contributor

@idlir-shkurti-frequenz idlir-shkurti-frequenz commented Apr 8, 2024

This PR adds a simple model manager module which implements the logic needed to load, update and monitor machine learning models.
Currently there are two model repository structures:

  • SingleModelRepository - This is a structure where we direct our handler to a repository with a single model in it.
  • DirectModelRepository - This repository structure takes in dictionary input with a key which is the name of the model and a value which is the direct path to each specific module.

This PR is heavily based (but slimmed down) on #397

@idlir-shkurti-frequenz idlir-shkurti-frequenz requested a review from a team as a code owner April 8, 2024 09:41
@github-actions github-actions bot added the part:tests Affects the unit, integration and performance (benchmarks) tests label Apr 8, 2024
@github-actions github-actions bot added the part:docs Affects the documentation label Apr 8, 2024
@idlir-shkurti-frequenz idlir-shkurti-frequenz self-assigned this Apr 8, 2024
@llucax
Copy link
Contributor

llucax commented Apr 8, 2024

Maybe this was talked elsewhere, but it might be nice to provide some extra context about what changed between the closing of #397 and now. Was the experiment inside the actor successful and we think this new slimmed down implementation is good/useful enough to go in the SDK?

@idlir-shkurti-frequenz
Copy link
Contributor Author

Maybe this was talked elsewhere, but it might be nice to provide some extra context about what changed between the closing of #397 and now. Was the experiment inside the actor successful and we think this new slimmed down implementation is good/useful enough to go in the SDK?

@llucax We decided that it would be best to start the model loading immediately inside the sdk so that we don't have to migrate extra code later. This slimmed down version should be good enough for the initial usecases. We don't need extra weeday and enumerated repositories but only a way to load and monitor ML models from a pre-defined path.

Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about the get_model() interface because it is too dynamic, it would be good if it could be more static and misuse would be detected by mypy instead of at runtime.

Looking at the bigger picture here, it seems this code is still inheriting a lot of the complexity of the previous PR. Do we really need a special case for a single model repository? Can't we just use the dict-based repo and let user pass a key with the value they like the most? It seems like a lot of added extra complexity just to avoid passing a key.

I would just remove the special case here and also the SingleModelRepository class and force users to pass a key. Once this is done, I don't think it makes sense to keep the abstract base class ModelRepository, and if we remove it, we can just rename DirectModelRepository to ModelRepository. This will make this PR much much simpler.

If we need more abstraction and more types of repositories in the future, we can bring the complexity back, but I wouldn't bet on it just yet.

src/frequenz/sdk/model/__init__.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
@idlir-shkurti-frequenz
Copy link
Contributor Author

Looking at the bigger picture here, it seems this code is still inheriting a lot of the complexity of the previous PR. Do we really need a special case for a single model repository? Can't we just use the dict-based repo and let user pass a key with the value they like the most? It seems like a lot of added extra complexity just to avoid passing a key.

That is actually something I was also considering. @daniel-zullo-frequenz @cwasicki I think I agree with Luca here that we can just have the one repo structure with a dict input and forget about the single model repo structure. If that is ok with you I will make the change.

@cwasicki
Copy link
Collaborator

If that is ok with you I will make the change.

Agree. I think we can also limit it to the manager and don't need the repositories.

@cwasicki
Copy link
Collaborator

I suggest for now to only implement the bare minimum functionality that we currently need.

@llucax
Copy link
Contributor

llucax commented Apr 10, 2024

Agree. I think we can also limit it to the manager and don't need the repositories.

I think having only the "dict repository" is simple enough and gives more flexibility to separate different models if we need to, also the manager and repository are very closely related, so doing this would basically mean throwing everything and start again, right?

IMHO having DirectModelRepository (as ModelRepository) and the ModelManager that can monitor more than one ModelRepository strikes a good balance between simplicity and flexibility. But if you think that is still over-bloated for our current needs, I'm also OK to remove the repositories altogether.

@idlir-shkurti-frequenz
Copy link
Contributor Author

@llucax @cwasicki I made some changes to the PR. I removed the repository structures for simplicity and changed the ModelManager to require a model_config in its initialisation. This model_config is simply a dict[str, Path] pointing to the models.

@llucax I changed the key from Any to str and not as a generic like you proposed. Do you think it is important for now to make them generic? This will simply be the name of the model and I would assume as key is good enough for now. If you think it still makes sense to change them to generic then let me know.

@llucax
Copy link
Contributor

llucax commented Apr 11, 2024

I changed the key from Any to str and not as a generic like you proposed. Do you think it is important for now to make them generic?

TL;DR: I think it is OK. It was a journey to convince myself of that, so I will keep the long version for future reference :D

I was going to say it's not important in this context, but then I was thinking about the case where we want to use an enum for safety, as str is very prone to errors due to typos for example, which can't be detected by linters or mypy, so they will only occur at runtime.

This is why I still liked the separation between the repository and the manager, because then you can have a "weekly" model, where keys are a enum with the week days, and then you can have an arbitrary model with some str keys that you build more dynamically.

I think it would be good to encourage users to design their code with safety in mind, so I'm not sure. I understand this is good enough for the current use cases but I feel it might be a move in the wrong direction.

But I'm not sure if it is worth to provide a model repository that it is just a dict with str keys, because then most users will just use that and will have the same issues. So it probably makes sense to add the safety once we want to provide safer repositories, like a day of the week repo, where keys can only be members of a DayOfTheWeek enum for example, etc., as we discussed originally.

Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a bit lost in the patching and mock files in the tests, so I will have a look at it later.

src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
tests/model/test_model_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/model/_manager.py Outdated Show resolved Hide resolved
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, a few more comments, mostly about how to subclass BackgroundService 😬

Args:
model_paths: A dictionary of model keys and their corresponding file paths.
"""
super().__init__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to forward the name to the BackgroundService.

Suggested change
super().__init__()
super().__init__(name=name)

super().__init__()
self._models: dict[str, _Model[T]] = {}
self.model_paths = model_paths
self._service_name = name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove this.

Suggested change
self._service_name = name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh ok I see. I wasn't sure how this worked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you did and it was still unclear we might need to improve the docs!

src/frequenz/sdk/ml/_model_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/ml/_model_manager.py Outdated Show resolved Hide resolved
src/frequenz/sdk/ml/_model_manager.py Outdated Show resolved Hide resolved
llucax
llucax previously approved these changes Apr 16, 2024
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patience!

@idlir-shkurti-frequenz idlir-shkurti-frequenz added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Apr 30, 2024
@github-actions github-actions bot added the part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) label Apr 30, 2024
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment while we wait for the release of rc6.post1 :)

pyproject.toml Outdated
@@ -96,6 +96,7 @@ dev-pytest = [
# For checking docstring code examples
"frequenz-sdk[dev-examples]",
"hypothesis == 6.100.0",
"sybil == 6.0.3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary anymore, we fixed it in repo-config.

src/frequenz/sdk/ml/_model_manager.py Outdated Show resolved Hide resolved
@cwasicki
Copy link
Collaborator

@idlir-shkurti-frequenz Could you rebase this PR please?

@github-actions github-actions bot added part:data-pipeline Affects the data pipeline part:actor Affects an actor ot the actors utilities (decorator, etc.) part:microgrid Affects the interactions with the microgrid labels May 27, 2024
…models

Signed-off-by: Idlir Shkurti <idlir.shkurti@frequenz.com>
@idlir-shkurti-frequenz
Copy link
Contributor Author

@idlir-shkurti-frequenz Could you rebase this PR please?

Done! Also addressed the custom exception point suggested by Luca.

@shsms
Copy link
Contributor

shsms commented May 28, 2024

Looks like the commit wasn't signed, someone might have to create a new PR, ideally with a shorter commit message.

@cwasicki
Copy link
Collaborator

Ok, then I will do some testing with this and re-create it afterwards.

@cwasicki
Copy link
Collaborator

#954

@llucax
Copy link
Contributor

llucax commented May 29, 2024

Looks like the commit wasn't signed, someone might have to create a new PR, ideally with a shorter commit message.

Didn't get what's going on, it needs to be re-created because Idlir can't do it himself now? We can still amend this PR and force-push to it. Also I don't get the comment about the commit message length.

@shsms
Copy link
Contributor

shsms commented May 29, 2024

Also I don't get the comment about the commit message length.

a shorter one that doesn't fold on gh would be nice.

@llucax
Copy link
Contributor

llucax commented May 29, 2024

Oh, OK, so unrelated to the signing part, I thought it was related. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmd:skip-release-notes It is not necessary to update release notes for this PR part:actor Affects an actor ot the actors utilities (decorator, etc.) part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:microgrid Affects the interactions with the microgrid part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)
Projects
Status: Review approved
Development

Successfully merging this pull request may close these issues.

None yet

4 participants