Is Gemma on device really this slow? #379
Oh boy, no, definitely not. It's not really intended to be run on the emulator, so your results are going to vary wildly. Here's a presentation I gave last week with a slide showing Gemma running on a device in real time (not sped up or altered, just recorded and turned into a GIF): https://docs.google.com/presentation/d/1uetAcmkNWDXHEJaCt6WoBflDM1iMUU1N1ahzQof6PLM/edit#slide=id.g26cd5c56ad9_1_30
I saw a post suggesting that an emulator with increased RAM performs similarly. What makes a physical device so much faster? Is it specifically customized for Gemma? Thanks for the prompt response!
No idea at that level of detail. My general experience over the last 10+ years of Android development, though, has always been: "Eh, emulators are OK, but never as good as a real device."
Time to first token is still pretty slow compared to the video you shared: it takes around 15 seconds for both the 4-bit and 8-bit CPU versions of Gemma 2B.
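When comparing a figure like the ~15 second time to first token against a real device, it can help to report prefill latency (time to first token) separately from decode speed (tokens per second after the first), since the two are often reported separately. Below is a minimal sketch in Python, assuming the generated tokens can be consumed as an iterable (for example, partial results from a streaming listener collected in order). `measure_stream` and `fake_clock` are hypothetical helpers for illustration, not part of MediaPipe or the llm_inference sample.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time a streaming generation (hypothetical helper).

    ttft_s: seconds from start until the first token arrives
            (mostly prompt prefill).
    decode_tok_per_s: tokens after the first, divided by the time
            between the first and last token; None if fewer than 2 tokens.
    """
    start = clock()
    first: Optional[float] = None
    last: Optional[float] = None
    count = 0
    for _ in tokens:
        now = clock()  # timestamp each token as it arrives
        if first is None:
            first = now
        last = now
        count += 1
    if first is None or last is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    decode_rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else None
    return {"ttft_s": first - start, "tokens": count, "decode_tok_per_s": decode_rate}


def fake_clock(ticks: Iterable[float]) -> Callable[[], float]:
    """Deterministic clock for testing: returns the given ticks in order."""
    it: Iterator[float] = iter(ticks)
    return lambda: next(it)


if __name__ == "__main__":
    # Simulated run: first token after 15 s, then two more tokens 0.5 s apart.
    stats = measure_stream(["a", "b", "c"], clock=fake_clock([0.0, 15.0, 15.5, 16.0]))
    print(stats)
```

Injecting the clock keeps the timing logic testable without a device; on a real run you would keep the default `time.perf_counter` and feed in the actual token stream.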
I used the llm_inference sample with gemma-2b-it-cpu-int4.bin on a Pixel 8 Pro emulator. Prefill seems to take on the order of minutes.
Pixel 8 Pro emulator configuration:
RAM: 22 GB, VM heap: 512 MB
Reference video
https://github.com/googlesamples/mediapipe/assets/22965002/c7730dba-48e8-4eec-ae68-fe847d2778f2