
Batching for speed. #45

Open
wj210 opened this issue Apr 14, 2024 · 0 comments
wj210 commented Apr 14, 2024

Hi, I would like to ask whether batch inference has been tested with LLaMA.

I followed https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side, where both the input_ids and the attention_mask are passed to the model, but I got 'nan' values. If I only pass in the input_ids it works, but I'm not sure whether omitting the attention mask affects the final output.
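For context, here is roughly the batched-generation setup I am using, following the linked tutorial. This is only a sketch: the model name, dtype, example prompts, and max_new_tokens are placeholders, not the repo's actual settings.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"            # decoder-only models should be left-padded
tokenizer.pad_token = tokenizer.eos_token  # llama has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompts = ["Paris is the capital of", "2 + 2 ="]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

# Passing both input_ids and attention_mask here is where I see 'nan' values;
# dropping attention_mask avoids the nans, but then the pad tokens are attended to.
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))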

Also, there seems to be a bug in

if prompt.endswith(" True or False?\nAnswer:"):

The suffix " True or False?\nAnswer:" is never detected, because the prompt is built as

prompt = "{}\n\nInput: {} True or False?\nOutput:".format(definition.strip(), atom.strip())

which ends with "\nOutput:" instead. As a result, the generated length is 128 tokens instead of 1. This wastes cost, and if GPT-3.5 is used, the 128 generated tokens may contain both 'True' and 'False', leading to wrong decisions.
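A possible fix, just as a sketch (definition and atom below are placeholder strings, not the repo's actual data): checking for the suffix the prompt actually ends with would let the code cap generation at a single token.

definition = "Answer the question about the subject."  # placeholder
atom = "He was born in 1990."                           # placeholder

prompt = "{}\n\nInput: {} True or False?\nOutput:".format(definition.strip(), atom.strip())

# The current check never fires: the prompt ends with "\nOutput:", not "\nAnswer:".
assert not prompt.endswith(" True or False?\nAnswer:")

# Checking the suffix that is actually used allows generation to stop after one token.
if prompt.endswith(" True or False?\nOutput:"):
    max_new_tokens = 1    # only "True" / "False" is needed
else:
    max_new_tokens = 128
print(max_new_tokens)     # -> 1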
