
Asking about the return_scores during generation #1661

Open
freyaya123 opened this issue Apr 10, 2024 · 7 comments
freyaya123 commented Apr 10, 2024

Hi, I'm new to CTranslate2, and I'm confused about the scores returned by the generator.generate_batch() function. How do they correspond to the scores returned by the Hugging Face generate() function?

For example,

>>> text
'Question:\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\nAnswer reasoning:\ndef solution():    """Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"""\n'

in hf generation:

>>> tokenizer = AutoTokenizer.from_pretrained("xxx")
>>> model = AutoModelForCausalLM.from_pretrained("xxx", return_dict_in_generate=True)
>>> input_ids = tokenizer(text, return_tensors="pt").input_ids
>>> input_ids.shape
torch.Size([1, 100])
>>> stop = [13, tokenizer.eos_token_id]
>>> outputs = model.generate(input_ids, do_sample=True, num_return_sequences=3,
...                          output_scores=True, max_length=200, eos_token_id=stop,
...                          top_k=50, top_p=1.0, temperature=1.0,
...                          pad_token_id=tokenizer.pad_token_id)
>>> outputs.sequences.shape
torch.Size([3, 111])
>>> len(outputs.scores)
11
>>> outputs.scores[0].shape
torch.Size([3, 32016])
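For context on what those shapes contain: each element of outputs.scores is the processed logits over the full vocabulary for one decoding step, and compute_transition_scores (with normalize_logits=True) log-softmaxes them and picks out the entry of the token that was actually generated. A minimal sketch of that per-step computation, using a made-up 4-token vocabulary (plain Python, no model required):

```python
import math

def log_softmax(logits):
    # subtract the max for numerical stability, then normalize in log space
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

# one decoding step's logits over a toy 4-token vocabulary (made-up numbers)
step_logits = [2.0, 1.0, 0.5, -1.0]
step_log_probs = log_softmax(step_logits)

# suppose token 0 was the one actually sampled at this step:
chosen = 0
token_log_prob = step_log_probs[chosen]  # the per-step value transition_scores reports
```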

But if I use ctranslate2, for example:

>>> prompt_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
>>> step_results = ct2_generator.generate_batch([prompt_tokens], return_scores=True,
...                                             max_length=100, num_hypotheses=3,
...                                             sampling_topk=50, sampling_topp=1.0,
...                                             sampling_temperature=1.0,
...                                             include_prompt_in_result=False,
...                                             end_token=stop)
>>> step_results
[GenerationResult(sequences=[['▁▁▁', '▁cli', 'ps', '_', 's', 'old', '_', 'ap', 'ril', '▁=', '▁', '4', '8'], ['▁▁▁', '▁cli', 'ps', '_', 's', 'old', '_', 'ap', 'ril', '▁=', '▁', '4', '8'], ['▁▁▁', '▁cli', 'ps', '_', 'ap', 'ril', '▁=', '▁', '4', '8']], sequences_ids=[[1678, 9335, 567, 29918, 29879, 1025, 29918, 481, 4115, 353, 29871, 29946, 29947], [1678, 9335, 567, 29918, 29879, 1025, 29918, 481, 4115, 353, 29871, 29946, 29947], [1678, 9335, 567, 29918, 481, 4115, 353, 29871, 29946, 29947]], scores=[-0.0640094131231308, -0.0640094131231308, -0.09266181290149689])]

So I get a list of length 3 for step_results[0].scores (one score per hypothesis).

And I also noticed that there is another function in hf:

>>> transition_scores = model.compute_transition_scores(outputs.sequences,outputs.scores,normalize_logits=True) 
>>> transition_scores
tensor([[-1.0014e-05, -2.0167e-01, -3.2067e-05, -5.0068e-06, -6.9655e-02,
         -4.8401e+00, -1.8686e-02, -3.4571e-06, -1.5497e-06, -3.3379e-06,
         -6.1989e-06],
        [-1.0014e-05, -2.0167e-01, -3.2067e-05, -5.0068e-06, -6.9655e-02,
         -7.9593e-03, -6.5205e-05, -3.5763e-06, -1.6689e-06, -3.2186e-06,
         -5.3644e-06],
        [-1.0014e-05, -2.0167e-01, -3.2067e-05, -5.0068e-06, -6.9655e-02,
         -7.9593e-03, -6.5205e-05, -3.5763e-06, -1.6689e-06, -3.2186e-06,
         -5.3644e-06]])  # shape: [3, 11]

which is quite different from the scores in step_results.
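For what it's worth, summing one row of transition_scores gives the cumulative log-likelihood of that hypothesis, which is the kind of single scalar CTranslate2 reports per hypothesis. (Whether CTranslate2 also length-normalizes can depend on version and options, and the sampled sequences differ between the two runs, so treat this as a shape analogy rather than an exact match.) Using the second row above:

```python
# per-token log-probs copied from hypothesis 2 of the transition_scores above
row = [-1.0014e-05, -2.0167e-01, -3.2067e-05, -5.0068e-06, -6.9655e-02,
       -7.9593e-03, -6.5205e-05, -3.5763e-06, -1.6689e-06, -3.2186e-06,
       -5.3644e-06]

# seq_len per-token values collapse into one scalar per hypothesis
seq_log_likelihood = sum(row)  # about -0.2794
```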

So I have two questions here:

  1. What is the relationship between outputs.scores, transition_scores, and step_results[0].scores?
  2. A smaller question: I think I have set the same parameters for HF and CTranslate2, but I definitely get different generations. For example, when the stop token id is 13 (\n), the HF output contains the \n, but the CTranslate2 output doesn't. How can I include the stop token in the output?
#ctranslate2
>>> step_results[0].sequences_ids
[[1678, 9335, 567, 29918, 29879, 1025, 29918, 481, 4115, 353, 29871, 29946, 29947], [1678, 9335, 567, 29918, 29879, 1025, 29918, 481, 4115, 353, 29871, 29946, 29947], [1678, 9335, 567, 29918, 481, 4115, 353, 29871, 29946, 29947]]

#hf
>>> outputs.sequences[:, len(input_ids[0]):]
tensor([[ 1678,  9335,   567, 29918,   481, 29878,   353, 29871, 29946, 29947,
            13],
        [ 1678,  9335,   567, 29918,   481,  4115,   353, 29871, 29946, 29947,
            13],
        [ 1678,  9335,   567, 29918,   481,  4115,   353, 29871, 29946, 29947,
            13]])

Why is that? Are there parameters I'm not aware of?

@minhthuc2502
Collaborator

Hello,
1/ It seems that HF returns the scores over the whole vocabulary at each step, whereas CTranslate2 returns the sum of the highest score at each step. The 3 in your case is the batch size.
2/ Could you set include_eos_in_hypotheses to True? The EOS token should then be added at the end.

@freyaya123
Author

> It seems that HF returns the scores over the whole vocabulary at each step, whereas CTranslate2 returns the sum of the highest score at each step. The 3 in your case is the batch size. Could you set include_eos_in_hypotheses to True? The EOS token should then be added at the end.

What do you mean by "CTranslate2 returns the sum of the highest score at each step"?
For example, if we assume bs = 1:
HF score: seq_len * [1, vocab]
CTranslate2 score: a list of length 1, [num]

What is num equal to?

@minhthuc2502
Collaborator

minhthuc2502 commented Apr 11, 2024

For example, with bs = 1, the HF score has shape seq_len x 1 x vocab. In contrast, the CTranslate2 score has shape 1: it is (max score in vocab) of token 1 + (max score in vocab) of token 2 + ... + (max score in vocab) of token seq_len.
If you want the score of each individual token, you can use the async function and then read the score of each token.
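The shape contrast described above can be sketched with toy numbers (nothing here comes from a real model):

```python
# Toy per-step log-prob distributions over a 4-token vocabulary (made-up numbers).
# HF-style scores: one row per generated token -> shape seq_len x vocab.
hf_style = [
    [-0.10, -2.5, -3.0, -4.2],  # step 1
    [-1.20, -0.4, -2.9, -3.3],  # step 2
    [-0.05, -3.1, -4.0, -5.5],  # step 3
]

# CTranslate2-style score: a single scalar per hypothesis, the sum over
# steps of the selected token's log-prob (the maximum of each row here).
ct2_style = sum(max(step) for step in hf_style)  # -0.10 + -0.40 + -0.05
```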

@freyaya123
Author

> For example, with bs = 1, the HF score has shape seq_len x 1 x vocab. In contrast, the CTranslate2 score is a single number: (max score in vocab) of token 1 + ... + (max score in vocab) of token seq_len. If you want the score of each individual token, you can use the async function.

Thank you! Another question: given the autoregressive scores after the linear layer and the chain rule, why is it a sum here rather than a product? If we want the score of the generated sequence, it should be P(x1)*P(x2|x1)*P(x3|x1,x2)*...*P(xn|x1,...,x_{n-1}) = P(x1,...,xn). I remember there is no log operation in the returned HF scores.

@minhthuc2502
Collaborator

In CTranslate2, the score at each step is a log-likelihood. That's why we do a sum.
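A quick numeric check that summing log-scores is the chain-rule product in disguise (toy probabilities, nothing model-specific):

```python
import math

# hypothetical per-token probabilities P(x1), P(x2|x1), P(x3|x1,x2)
probs = [0.9, 0.8, 0.95]

# product of probabilities = joint probability of the sequence
joint = 1.0
for p in probs:
    joint *= p

# sum of log-probabilities = log of that same joint probability
log_joint = sum(math.log(p) for p in probs)
```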

@freyaya123
Author

Oh I see! Thanks so much!

@freyaya123
Author

> include_eos_in_hypotheses

Sorry, I can't find the parameter include_eos_in_hypotheses in the generate_batch function. Where do I set it?
