
PrefixLM is loaded as CausalLM after HuggingFace export #739

Open
timsteuer opened this issue Nov 15, 2023 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@timsteuer

When loading a prefix-lm model trained with llm-foundry into HuggingFace, one is tempted to simply call AutoModelForCausalLM.from_pretrained(<snapshot>).

However, this loads the model not as a prefix-lm but as a causal LM (which I learned the hard way). As a consequence, the model's predictions are not outright random, just noticeably worse than its training state would suggest.

I consider this a bug because it leads the user to falsely believe that the model has been loaded correctly. It is also rather sneaky: only by comparing the model's predictions after loading with those from, e.g., training does one see that something is wrong.

The expected behavior would be to refuse to load a prefix-lm model as a causal LM with a pure left-to-right mask.

Environment

  • conceptual problem with the HuggingFace interfacing (tested with llm-foundry:main)

To reproduce

Steps to reproduce the behavior:

  1. Train a prefix-lm.
  2. Convert it to HuggingFace format with the llm-foundry scripts.
  3. Load it via AutoModelForCausalLM.from_pretrained().

The model is loaded in the wrong state.
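
For illustration, a minimal sketch of the loading call that triggers this (the checkpoint path is a placeholder for the output of the conversion script):

```python
# Without trust_remote_code=True, transformers silently falls back to its
# built-in MPT implementation, which applies a purely causal (left-to-right)
# attention mask.
from transformers import AutoModelForCausalLM

checkpoint_dir = "path/to/converted-hf-checkpoint"  # placeholder path

model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)  # loads without error, but as a plain causal LM
# The bidirectional attention over the prefix is lost, so generations look
# plausible yet are worse than during training.
```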

Expected behavior

The model is not loaded at all; instead, an error message reminds me to turn on trust_remote_code.

@timsteuer timsteuer added the bug Something isn't working label Nov 15, 2023
@dakinggg
Collaborator

Hey @timsteuer, just to confirm: if you specify trust_remote_code=True, does everything work the way you expect?

@timsteuer
Author

Yes, with trust_remote_code=True everything works fine.

@dakinggg
Collaborator

Got it. Unfortunately, there isn't anything we can do about this. When you call AutoModelForCausalLM.from_pretrained() on an MPT model with trust_remote_code=False, it never touches code that Mosaic controls; it only uses Hugging Face code, because they have created their own implementation of MPT within transformers itself.
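
For reference, a sketch of the loading path that runs the modeling code bundled with the checkpoint (the llm-foundry MPT implementation) rather than the transformers-native one; the checkpoint path is a placeholder:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/converted-hf-checkpoint",  # placeholder path
    trust_remote_code=True,  # required so the prefix-LM-aware code shipped with the checkpoint is used
)
```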

@timsteuer
Author

Oh, I see.

So from your side, it might be a good idea to document that somewhere very prominently.
I only got the idea to load the model with trust_remote_code=True after reading the usage tips in the HF documentation.

Also, do you think it would be worthwhile to raise an issue / pull request @ HuggingFace?

If I understand it correctly, they would just have to check the config for the model's training objective to decide whether they can load their own implementation or have to fall back to your implementation via trust_remote_code=True.
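
Roughly, the kind of guard I have in mind (field names are an assumption; I'm assuming the exported config.json records the prefix-LM setting under something like attn_config["prefix_lm"]):

```python
import json
from pathlib import Path

checkpoint_dir = Path("path/to/converted-hf-checkpoint")  # placeholder path
config = json.loads((checkpoint_dir / "config.json").read_text())

# Refuse the built-in causal-LM code path if the checkpoint was trained as a prefix LM.
if config.get("attn_config", {}).get("prefix_lm", False):
    raise ValueError(
        "This checkpoint was trained as a prefix LM; load it with "
        "trust_remote_code=True so the prefix-LM-aware implementation is used."
    )
```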

@dakinggg
Collaborator

Yeah, docs are a good idea. We also have an explicit error in foundry if you try to use MPT with trust_remote_code=False. Where would you have looked for the docs? That would help me decide where to put them.

As for an issue/PR on Hugging Face, sounds reasonable to me! I'm not sure if they will want to do this because there are actually quite a few things that our implementation supports that theirs does not, but no harm in asking.

@timsteuer
Author

Hm, documentation may be helpful at the following two places:

  1. In scripts/inference/convert_composer_to_hf.py: when converting a prefix-lm, show a warning that the exported model must be loaded with trust_remote_code=True to work as expected (a rough sketch of such a warning follows this list).
  2. Add an entry to the LLM Foundry Tutorial (something like "common pitfalls").
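
For the first suggestion, a hypothetical sketch of what the warning could look like (the config argument and field names are placeholders, not the script's actual variables):

```python
import warnings

def warn_if_prefix_lm(hf_config: dict) -> None:
    """Warn when the exported config indicates a prefix-LM checkpoint."""
    if hf_config.get("attn_config", {}).get("prefix_lm", False):
        warnings.warn(
            "This checkpoint was trained with the prefix-LM objective. Load the "
            "exported model with trust_remote_code=True; otherwise transformers' "
            "built-in MPT implementation will silently treat it as a plain causal LM."
        )
```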

@dakinggg
Collaborator

Thanks for the suggestions, will do!

@dakinggg dakinggg self-assigned this Nov 30, 2023