
Inference Experiments #58

Open

JMMackenzie opened this issue Mar 11, 2024 · 2 comments

Comments

@JMMackenzie (Contributor)

Hey all,

I'm looking at the Efficiency Study paper and I'd like to replicate the query encoding numbers. Could you please share your pipeline, or any other pointers, so I can make sure my measurement is correct?

Thanks a lot!

@carlos-lassance

Hey Joel,

So at the time, I think I basically tokenized everything up front (without counting the tokenization time) and then ran inference one query at a time on a single CPU core (set via SLURM). I can try spinning up a similar pipeline, but I'd love to hear your thoughts.
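
For reference, a minimal sketch of what that single-core, one-query-at-a-time measurement might look like. The model checkpoint and queries below are placeholders, not necessarily the exact ones used in the paper:

```python
import time

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

torch.set_num_threads(1)  # restrict PyTorch to a single CPU core
                          # (the description above pinned this via SLURM)

model_name = "naver/splade-cocondenser-ensembledistil"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

queries = ["what is the capital of france", "how do transformers work"]

# Tokenize everything up front so tokenization is excluded from the timing.
encoded = [tokenizer(q, return_tensors="pt") for q in queries]

latencies = []
with torch.no_grad():
    for inputs in encoded:
        start = time.perf_counter()
        model(**inputs)  # forward pass only, one query at a time
        latencies.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.2f} ms/query")
```

In practice you would run this over the full query set and add a few warm-up queries before timing, but the core idea is the same: batch size 1, one thread, tokenization excluded.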

For later papers I started using a benchmarking tool from Hugging Face, but I can't find it right now. I can dig deeper if needed.

@JMMackenzie (Contributor, Author)

Thanks Carlos! I was looking for the test setup you used to measure query inference latency so I could replicate your numbers, but I think I can manage without it if you don't have it. I'll have a dig on HF and see if I can find anything. Thanks again!
