Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add InvokeAIAppConfig schema migration system #6243

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

lstein
Copy link
Collaborator

@lstein lstein commented Apr 19, 2024

Summary

This implements a straightforward system for migrating the invokeai.yaml configuration file between InvokeAIAppConfig schema versions.

To illustrate how it works, say that we are currently at config schema version 4.0.1, which includes an attention_type value xformers. But xformers becomes unsupported and we need to remove it. So we remove this choice from the list of valid values for attention_type, and update CONFIG_SCHEMA_VERSION to 4.0.2. But some users will have an old invokeai.yaml that contains attention_type: xformers and if they try to run the app it will fail at startup time with a pydantic schema validation error.

Here's what we do to fix that. In invokeai/app/services/config/migrations.py we add the following schema migration function to the load() method:

@migrator.register(from_version="4.0.1", to_version="4.0.2")
def migrate_3(config_dict: AppConfigDict) -> AppConfigDict:
   new_config_dict: AppConfigDict = {}
   for k, v in config_dict.items():
        if k == "xformers":
              continue
        else:
              new_config_dict[k] = v
   return new_config_dict

The migrator.register() decorator is defined in invokeai/app/services/config/config_migrate.py. You provide it with parameters indicating the incoming and outgoing version numbers, and define a function that takes a dictionary dump from the incoming schema and returns a dictionary compatible with the outgoing schema.

The function that calls this takes care of inserting the schema_version field and validating against the current schema after all the migrations are completed.

When 4.0.3 comes around, you add a similar function to config_default.py using the decorator @migrator.register(from_version="4.0.2", to_version="4.0.3")

Related Issues / Discussions

QA Instructions

First test whether it messes up your current up to date invokeai.yaml file (it shouldn't) by launching the web app. The code makes a backup in invokeai.yaml.bak, so you'll have backup copy just in case.

Next create an older version of invokeai.yaml. I have tested with an old 3.7-era file. Run the app. It should load fine, and invokeai.yaml will be brought up to date, preserving any customized settings from the 3.7 file.

Merge Plan

Squash merge when approved.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

@github-actions github-actions bot added python PRs that change python files services PRs that change app services labels Apr 19, 2024
@lstein lstein force-pushed the lstein/feat/config-migration branch from 6bc864c to 1171661 Compare April 19, 2024 03:00
@lstein lstein force-pushed the lstein/feat/config-migration branch from 1171661 to 36495b7 Compare April 19, 2024 03:08
We don't need separate implementations for this class, let's not complicate it with an ABC
TypeVar is for generics, but the usage here is as an alias
The only pydantic usage was to convert strings to `Version` objects. The reason to do this conversion was to allow the register decorator to accept strings. MigrationEntry is only created inside this class, so we can just create versions from each migration when instantiating MigrationEntry instead.

Also, pydantic doesn't provide runtime time checking for arbitrary classes like Version, so we don't get any real benefit.
This was checking a `Version` object against a `MigrationEntry`, but what we want is to check the version object against `MigrationEntry.from_version`
@github-actions github-actions bot added the python-tests PRs that change python tests label Apr 23, 2024
Copy link
Collaborator

@psychedelicious psychedelicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing I cleaned up and simplified a few things.

I noticed that the migrator looks for a version between a range to decide which migration to run.

I think we should be very strict here, more like the sqlite migrator class. Migrations work migrate data from an explicit version to one other explicit version, and any other versions result in an error.

I added a failing test case that should pass with stricter migration version logic. This test fails on main too, but I'd like this PR to improve on the current situation.

invokeai/app/services/config/config_default.py Outdated Show resolved Hide resolved
@lstein
Copy link
Collaborator Author

lstein commented Apr 25, 2024

I added a failing test case that should pass with stricter migration version logic. This test fails on main too, but I'd like this PR to improve on the current situation.

@psychedelicious Thanks for the tidying. I've made the migration logic strict by checking for contiguous ranges in the decorator parameters before any migrations begin.

In the new test case, I don't understand why this config is expected to fail:

invalid_v4_0_1_config = """
schema_version: 4.0.1

host: "192.168.1.1"
port: 8080
"""

It looks OK to me. Perhaps it was intended to fail when the config file version was 4.0.0? I've changed the port number to "ice cream", and the test correctly raises an invalid schema error now.

@lstein lstein force-pushed the lstein/feat/config-migration branch from 2ef6781 to 8144a26 Compare April 25, 2024 02:14
@lstein
Copy link
Collaborator Author

lstein commented Apr 25, 2024

@psychedelicious I updated the test_deny_nodes() test, which was marked as xfailed. It seems to work as expected now, both in isolation and as part of the full test suite. It looks like there used to be schema validation tests in the Graph class, but I don't see them in there any longer, so I removed the Graph assertions and kept the BaseInvocation.get_invocations() assertions.

@psychedelicious
Copy link
Collaborator

In the new test case, I don't understand why this config is expected to fail:

This should fail bc 4.0.1 isn't a valid registered config schema version

@lstein
Copy link
Collaborator Author

lstein commented Apr 28, 2024

I'm still not following; maybe it's too late at night. The current schema version in main (and this PR) is 4.0.1 and there are two migrators defined, one from 3.0.0 to 4.0.0 and a second from 4.0.0 to 4.0.1.

Oh sorry, that's my mistake. I thought the migration you added was 4.1.0.

My intention was to test having config schema version that is between two schema versions. In the first iteration of this PR, the migrator would have treated 4.0.1 the same as 4.0.0 and migrated it to 4.1.0. Hope that makes sense.

Now I understand. I've added a new test for missing intermediate steps in the config migration path.

Copy link
Collaborator

@psychedelicious psychedelicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in returning to this PR. I'm being very cautious of making changes to the config file handling because the it is very sensitive - a small error can turn into a lot of user pain, even if it is easy to fix. So I'm being very picky in this review.

Comment on lines 39 to 42
if any(from_version == m.from_version for m in cls._migrations):
raise ValueError(
f"function {function.__name__} is trying to register a migration for version {str(from_version)}, but this migration has already been registered."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the added strictness for contiguous migrations, this should probably raise when either from_version or to_version already have a registered migration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

function: MigrationFunction


class ConfigMigrator:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this class is a stateful singleton, testing is difficult. Right now, during tests, this class has all migrations registered in the "live" app. Test failures could possibly pop up over time as more migrations are registered.

Making this an instanced class would mitigate those issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. ConfigMigrator is now an instanced class, and test_config.py creates new instances of the migrator for testing decorator functionality, and continuity checking.

The migration steps themselves have been moved into their own module now: invokeai.app.services.config.migrations.

I also relocated get_config() into invokeai.app.services.config.config_migrate, and changed all places that are importing config_default to import from invokeai.app.services.config instead (get_config() is reexported from __init__.py) . This avoided circular import problems and makes config_default.py cleaner looking overall.

@pytest.mark.xfail(
reason="""
This test fails when run as part of the full test suite.
def test_migration_check() -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no test coverage of the register decorator here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test test_migration_check() checks that the decorator works when presented with contiguous from and to values and fails when there is a discontinuity. However, I have to rework it for the migrator being an instance rather than a class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check it out now.

temp_config_file.write_text(v4_config)

conf = load_and_migrate_config(temp_config_file)
assert Version(conf.schema_version) >= Version("4.0.1")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would the output version not being exactly 4.0.1 be acceptable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written so that the test would continue to work when we move to version 4.0.2 -- because the test is running on the "live" config. I'm reworking this as suggested so that the migrator is an instance, in which case the exact value check will work.

@lstein lstein requested a review from RyanJDick as a code owner May 3, 2024 03:45
@github-actions github-actions bot added invocations PRs that change invocations backend PRs that change backend files labels May 3, 2024
@lstein
Copy link
Collaborator Author

lstein commented May 13, 2024

@psychedelicious When you have a chance, please give this the once-over. I've converted the migration class into a migration instance, but I'm not sure I did this in the most elegant way.

@psychedelicious
Copy link
Collaborator

Ok, I've revisited this PR and made some changes, simplified a few things and made the tests more rigorous. I'm not entirely convinced the extra complexity in this change is warranted.

I'm running into a problem where the app creates the custom nodes dir & files in the repo root, whenever I run a test. I believe the reason is that my invoke venv is now inside the repo root, and the find_root method decides the repo root is the runtime root. Also, I noticed that my live config file is migrated on main, when running the config tests.

I added a workaround - setting the root to a temp dir when args weren't parsed. I'm really wary of this change... I'm out of time on this PR for now, I'll revisit later this week.

@lstein
Copy link
Collaborator Author

lstein commented May 19, 2024

Ok, I've revisited this PR and made some changes, simplified a few things and made the tests more rigorous. I'm not entirely convinced the extra complexity in this change is warranted.

I'm running into a problem where the app creates the custom nodes dir & files in the repo root, whenever I run a test. I believe the reason is that my invoke venv is now inside the repo root, and the find_root method decides the repo root is the runtime root. Also, I noticed that my live config file is migrated on main, when running the config tests.

I added a workaround - setting the root to a temp dir when args weren't parsed. I'm really wary of this change... I'm out of time on this PR for now, I'll revisit later this week.

Blah. This is very bad. We need to set up the test environment with a temporary root directory. For the model manager tests, I created two fixtures like this:

@pytest.fixture
def mm2_root_dir(tmp_path_factory) -> Path:
    root_template = Path(__file__).resolve().parent / "data" / "invokeai_root"
    temp_dir: Path = tmp_path_factory.mktemp("data") / "invokeai_root"
    shutil.copytree(root_template, temp_dir)
    return temp_dir

@pytest.fixture
def mm2_app_config(mm2_root_dir: Path) -> InvokeAIAppConfig:
    app_config = InvokeAIAppConfig(models_dir=mm2_root_dir / "models", log_level="info")
    app_config._root = mm2_root_dir
    return app_config

Then I use the mm2_app_config fixture to get the config rather than calling get_conf(). Maybe something like this would work here?

I'll revert to this on Monday to see if I can make the tests behave properly.

When you say "I'm not sure the extra complexity is worth it," what are you referring to? To me, the current code looks more complex than the original version I wrote using decorators. My goal in the original version was to make it so that you could register a new migration by just adding a single decorated function definition to the migrations.py file. In the current code, we have to make code changes in four places: two changes in migrations.py to define the migration function and then create a ConfigMigration object, followed two more changes in config_migrate.py to import the new ConfigMigration object and then to register it in load_and_migrate_config.

I'd like to rethink how this is done. It seems like a lot of work just to remove a key from a dict.

@psychedelicious
Copy link
Collaborator

By "I'm not sure the extra complexity is worth it", I meant having a migration system at all for the config file. A couple simple functions serve the need just fine.

Where I was coming from I added to the PR: We don't have a need to be able to crank out config migrations quickly, but we do have a need for simple and testable code.

It took me a while to understand the intention of ABC, its implementation, the decorator and migration functions. Because the migration functions were only defined inside another function, it was not easy to test each one individually. You had to test the whole migration chain at once. Removing this extra stuff removes a layer of indirection.

@psychedelicious
Copy link
Collaborator

The problem with interactions with the live filesystem aren't caused by tests calling get_config directly. It's that when the app's modules are loaded and classes instantiated in tests, get_config gets called at some point. Another issue is when we read the invocations folder's init, it creates a the custom nodes directory.

My last commit hijacks the root dir when the app isn't launched via script. This does work but I'm not confident in it. This issue is unrelated to the changes in this PR.

@psychedelicious
Copy link
Collaborator

Couple related issues with how tests are running and the config and custom nodes loader:

  • get_config() is called in several places in the module scope (example). These calls should all be moved inside classes.
  • The bulk download images test imports the whole live app. That test is very thorough but also, a lot of it feels like testing for testing's sake. Fixing the root dir issue should make this OK.
  • The custom nodes implementation relies on the invokeai.app.invocations module's init file. In hindsight, this is clearly terrible.

@lstein
Copy link
Collaborator Author

lstein commented May 24, 2024

@psychedelicious At this point I think it's easier just to add ad hoc code to the load_and_migrate_config() function. Unless you object, I'm just going to close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend PRs that change backend files invocations PRs that change invocations python PRs that change python files python-tests PRs that change python tests services PRs that change app services
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants