Allow embeddings and loras to have different names #6053
Conversation
Branch force-pushed from 97f0e7b to 3c8c944.
)
    return TextualInversionCheckpointProbe(path).get_base_type()
super().__init__(files.pop())
What if the first model file we find isn't the correct one?
I've seen a few TI models hosted on diffusers that have multiple checkpoints in them, presumably representing different stages of training.
What if the first model file we find isn't the correct one?
Never mind that, I misread the length check. The other issue still stands.
I don't believe any of those checkpoints are actually used for inference, but I could be wrong. This only scans the root directory of the folder, so as long as there's only one model there it loads fine. It does move it out of the folder, though.
What I mean is that this logic would appear to fail probing TIs that have multiple model files in the directory.
Some look like this (struggling to find an example, filenames probably aren't accurate):
embedding.pt
checkpoint-001.pt
checkpoint-002.pt
This is a valid TI and the model loader knows how to load it, but I believe the logic in this PR would see the multiple model files and consider it an invalid model.
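The failure mode described above can be sketched as follows. This is a hypothetical helper, not InvokeAI's actual probing code: a strict "exactly one model file" rule rejects valid TI folders that also contain training checkpoints, whereas preferring well-known filenames first would handle them.

```python
from pathlib import Path

# File extensions we treat as model checkpoints (illustrative set).
MODEL_SUFFIXES = {".pt", ".bin", ".safetensors", ".ckpt"}

def find_embedding_file(folder: Path) -> Path:
    """Pick the checkpoint to probe from a TI folder.

    A strict "exactly one model file" check fails on folders like:
        embedding.pt
        checkpoint-001.pt
        checkpoint-002.pt
    so we first prefer canonical filenames, then fall back to the
    single-file case, and only error out when the choice is ambiguous.
    """
    files = [p for p in folder.iterdir() if p.suffix in MODEL_SUFFIXES]
    # Prefer the well-known names used for TI weights.
    preferred = [p for p in files if p.stem in ("learned_embeds", "embedding")]
    if preferred:
        return preferred[0]
    if len(files) == 1:
        return files[0]
    raise ValueError(f"Cannot choose a model file among: {[p.name for p in files]}")
```

Under this sketch the multi-checkpoint folder above resolves to `embedding.pt` instead of being rejected as invalid.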
def get_base_type(self) -> BaseModelType:
    path = self.model_path / "learned_embeds.bin"
    if not path.exists():
class TextualInversionFolderProbe(TextualInversionCheckpointProbe):
I don't like how this results in a "folder" model having a path that points to a file...
Yeah, I'm not really sure what value storing these as folders even has. I haven't found any LoRAs or embeddings that have other contextual files that affect generation. It just seems to put us in a position where we have to enforce folder structure for no reason.
The folder was introduced by Hugging Face. TIs usually contain two files: one with the weights and another with the name of the concept.
For LoRAs, the folder is used to give the model a name.
I've totally lost track of all the model probing pathways at this point, but I see that
Yeah, this PR loads all LoRAs and embeddings as single-file checkpoint models. This bypasses all of the post-install checks, since those are tied to folder models. If you install a model while passing in the model type, that also bypasses the strict naming requirements. Currently there's no way to do that in the UI, but the import API call accepts it. All that said, I'm not sure whether there are unexpected consequences to treating a "diffusers LoRA/TI" as a single checkpoint model. I couldn't find examples of models requiring a config.json on Hugging Face.
Branch force-pushed from 3c8c944 to 653a8f6.
Indeed, most of these are just single-file checkpoints, but Hugging Face gives them all the same file name. The enclosing folder name is used during probing to get a name for the model and to prevent path collisions.
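The naming role of the folder can be made concrete with a small sketch. The helper and the set of "generic" stems below are illustrative assumptions, not InvokeAI's real implementation: when a checkpoint carries one of Hugging Face's generic filenames, the enclosing folder is the only thing that distinguishes one install from another.

```python
from pathlib import Path

# Filenames Hugging Face reuses across many different models (assumed set).
GENERIC_STEMS = {"learned_embeds", "pytorch_lora_weights"}

def model_name_for(path: Path) -> str:
    """Derive a display name for a checkpoint file.

    Every Hugging Face concept-library TI ships as learned_embeds.bin,
    so naming the model after the file stem would call every install
    "learned_embeds". Falling back to the enclosing folder name keeps
    installs distinguishable.
    """
    if path.stem in GENERIC_STEMS:
        return path.parent.name
    return path.stem
```

With this rule, `001glitch-core/learned_embeds.bin` is named `001glitch-core` rather than `learned_embeds`, avoiding the collision described in the next comment.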
Here's what I get when I probe one of HuggingFace's concept library textual inversions:
% probe-model.py 001glitch-core
001glitch-core:{
"key": "cfe3205d-b881-458f-8be7-99f875452039",
"hash": "blake3:7e62bfe8ff7c27a2747c146fe86afeda1ba3bba3db75d3644a40fe418b9b3d7e",
"path": "/home/lstein/invokeai-main/models/sd-1/embedding/001glitch-core/learned_embeds.bin",
"name": "learned_embeds",
"base": "sd-1",
"description": "sd-1 embedding model learned_embeds",
"source": "/home/lstein/invokeai-main/models/sd-1/embedding/001glitch-core/learned_embeds.bin",
"source_type": "path",
"source_api_response": null,
"cover_image": null,
"type": "embedding",
"format": "embedding_file"
}
As noted by @psychedelicious, the format is now embedding_file, which is deceptive. But the big issue is that the model is now named "learned_embeds". I confirmed this by installing a Hugging Face concept using its repo_id, and sure enough the system installed a model named learned_embeds. This will make it hard to distinguish one TI from another!
@brandonrising Are you still working on this?
Yeah, it seemed like an issue that could only really come up from manual manipulation of the models directory, so it's fallen down my priority list. I think we can safely close this PR unless you see value in it somewhere.
As in, models currently can't even be installed with different names without manual intervention. We probably want to find some way to support different filenames, but it seems like this isn't the right way.
Summary
Currently, LoRA folders will only load if the weights file is named precisely pytorch_lora_weights.bin or pytorch_lora_weights.safetensors, and embedding folders will only load if the file is named "learned_embeds.bin". This PR is an attempt to support differently named LoRA/embedding files when they are installed as a folder. Even though we have this naming convention for the files in the folder, our backend supports many more model types for embeddings, and LoRAs are rarely named "correctly" on Hugging Face.
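The relaxation the summary describes could look roughly like this. This is a hedged sketch with assumed names, not the PR's actual code: instead of requiring an exact filename, accept a folder whose root contains exactly one weights file of a known type.

```python
from pathlib import Path

# Weight-file extensions accepted for LoRAs/embeddings (assumed set).
WEIGHT_SUFFIXES = {".bin", ".safetensors"}

def resolve_folder_weights(folder: Path) -> Path:
    """Relax the strict-filename rule for folder-installed models.

    Rather than demanding pytorch_lora_weights.bin or learned_embeds.bin,
    accept any folder containing exactly one weights file of a known type.
    Only the root of the folder is scanned (no recursion), matching the
    behavior discussed in the review thread.
    """
    candidates = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )
    if len(candidates) != 1:
        raise ValueError(
            f"Expected exactly one weights file in {folder}, found {len(candidates)}"
        )
    return candidates[0]
```

Note the trade-off the reviewers raise above: this scheme rejects folders with multiple checkpoints (e.g. a TI shipped alongside training snapshots), so a real implementation would likely need to prefer canonical filenames before falling back to the single-file rule.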
QA Instructions
Test this by copying one of your existing embeddings into a folder under the same directory in your models. Ensure it's scanned correctly on startup and is usable for generation.
Merge Plan
Can be merged if approved by code owners. @lstein may know some reason why we had this naming requirement. If so, we can look for different ways to handle this.