
Use generation_config.json #646

Merged
merged 9 commits into main from gen_conf on Mar 28, 2024

Conversation

haqishen (Collaborator) commented Mar 26, 2024

close #566
close #568

Check1

train->valid: output1
train->valid->Wave UI chat: output1.1
pull from HF->valid: output2
pull from HF->pipeline generation: output2.1
pull from HF->Wave UI chat: output2.2

With do_sample=False, I get output1.1 == output2 == output2.1 == output2.2, but these differ slightly from output1.

Check2

from transformers import pipeline

generate_text = pipeline(
    model="haqishen/genconf-test",
    torch_dtype="auto",
    trust_remote_code=True,
    use_fast=True,
    device_map={"": "cuda:2"},
    token=True,
)

res = generate_text(
    input_text,
    # min_new_tokens=2,
    # max_new_tokens=256,
    # do_sample=False,
    # num_beams=1,
    # temperature=float(0.0),
    # repetition_penalty=float(1.0),
    # renormalize_logits=True
)
print(res[0]["generated_text"])

In this code segment from the model card, the output remains the same whether these parameters are commented out or not.

Check3

generate_text.model.generation_config outputs:

GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_new_tokens": 256,
  "max_time": 120.0,
  "min_new_tokens": 2,
  "pad_token_id": 0,
  "temperature": null,
  "top_k": null,
  "top_p": null
}

This shows that generation_config.json is loaded automatically when the pipeline is created.
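
For reference, the same defaults can also be inspected without building a pipeline by loading the config straight from the Hub (a minimal sketch using the test repo above):

from transformers import GenerationConfig

# loads generation_config.json from the model repo on the Hub
gen_config = GenerationConfig.from_pretrained("haqishen/genconf-test")
print(gen_config)  # should match the pipeline's generation_config shown above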

Check4

Both DDP and DeepSpeed work well with generation_config.json.

pascal-pfeiffer (Collaborator) left a comment

Thank you @haqishen
Please see my in-code comments.

Please also adapt the model cards to reflect the use of the generation config, so that it is clear that the base settings come from the model (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model).

probably something like this:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("h2oai/h2o-danube-1.8b-chat")
model = AutoModelForCausalLM.from_pretrained("h2oai/h2o-danube-1.8b-chat")

print(model.generation_config)
# ... base settings output here

# change default parameters here:
model.generation_config.temperature = 0.8

print(model.generation_config)
# ... changed output here

inputs = tokenizer("Why is water good for you?", return_tensors="pt")
outputs = model.generate(**inputs)
# ...
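
For completeness, decoding the generated tokens could look like this (just a sketch to round out the example; not part of the proposed model card snippet):

print(tokenizer.decode(outputs[0], skip_special_tokens=True))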

Furthermore, this PR lays the groundwork for a larger change that brings the chat template into generation. #551
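
Purely as an illustration of what bringing the chat template into generation could look like on the user side (not part of this PR; the actual design is tracked in #551):

# hypothetical sketch, reusing tokenizer and model from the example above
messages = [{"role": "user", "content": "Why is water good for you?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)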

Comment on lines 171 to 178
# push generation_config to hub
if cfg.problem_type not in NON_GENERATION_PROBLEM_TYPES:
    model.backbone.generation_config.push_to_hub(
        repo_id=repo_id,
        private=True,
        commit_message="Upload generation_config.json",
    )

pascal-pfeiffer (Collaborator):

generation_config.json is pushed with the model automatically, so this is redundant and will result in a commit with 0 file changes.
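
As a minimal sketch of why the extra push is redundant (assuming a standard transformers backbone; the local path is only an example): saving or pushing the model already writes the generation config next to the weights.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("h2oai/h2o-danube-1.8b-chat")
# save_pretrained (and push_to_hub) write generation_config.json
# alongside config.json and the model weights
model.save_pretrained("./exported_model")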

Comment on lines 846 to 853
backbone.generation_config.temperature = (
    cfg.prediction.temperature if cfg.prediction.do_sample else None
)
backbone.generation_config.top_k = (
    cfg.prediction.top_k if cfg.prediction.do_sample else None
)
backbone.generation_config.top_p = (
    cfg.prediction.top_p if cfg.prediction.do_sample else None
)

pascal-pfeiffer (Collaborator):

let's not set these at all when do_sample is False
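
One possible shape for that suggestion (just a sketch, not the actual diff in this PR):

# only touch the sampling-related settings when sampling is enabled
if cfg.prediction.do_sample:
    backbone.generation_config.temperature = cfg.prediction.temperature
    backbone.generation_config.top_k = cfg.prediction.top_k
    backbone.generation_config.top_p = cfg.prediction.top_p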

pascal-pfeiffer (Collaborator) left a comment

lgtm, thanks for the quick changes

edit: actually, there are more model cards that may need to be touched. But let's merge and address that with the chat template change.

haqishen merged commit c732bad into main on Mar 28, 2024
5 checks passed
haqishen deleted the gen_conf branch on March 28, 2024 10:45

Successfully merging this pull request may close these issues:

[FEATURE] Add max_time generate setting
[FEATURE] Push generation_config to HF