Is Gemma on device really this slow? #379

Open
MJ1998 opened this issue Apr 30, 2024 · 4 comments

Comments

@MJ1998
Contributor

MJ1998 commented Apr 30, 2024

I ran the llm_inference sample with gemma-2b-it-cpu-int4.bin on a Pixel 8 Pro emulator.

Prefill seems to take minutes.

Pixel 8 Pro configuration:
RAM: 22 GB, VM heap: 512 MB

Reference video
https://github.com/googlesamples/mediapipe/assets/22965002/c7730dba-48e8-4eec-ae68-fe847d2778f2

@PaulTR
Collaborator

PaulTR commented Apr 30, 2024

Oh boy, no, definitely not. It's not really intended to be run on the emulator, so your results are going to vary wildly. Here's a presentation I did last week with a slide showing Gemma running on a device in real time (not sped up or altered, just recorded and turned into a GIF): https://docs.google.com/presentation/d/1uetAcmkNWDXHEJaCt6WoBflDM1iMUU1N1ahzQof6PLM/edit#slide=id.g26cd5c56ad9_1_30

@MJ1998
Contributor Author

MJ1998 commented Apr 30, 2024

I saw a post suggesting that an emulator with increased RAM works similarly to a physical device.
Here it is - link - search for "Creating an Android Emulator with Increased RAM".

What makes a physical device so much faster? Is it specifically customized for Gemma?

Thanks for the prompt response!

@PaulTR
Collaborator

PaulTR commented Apr 30, 2024

No idea at that level of detail. My general experience over the last 10+ years of Android development, though, has always been: "Eh, emulators are OK, but never as good as a real device."

@MJ1998
Contributor Author

MJ1998 commented May 2, 2024

Time to first token is still pretty slow compared to the video you shared. It takes around 15 seconds for both the 4-bit and 8-bit CPU versions of Gemma 2B.
The physical device I am using is a Pixel 7 Pro.
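
For anyone comparing numbers like these, it may help to record time to first token (dominated by prefill) separately from decode speed. Below is a minimal Python sketch, assuming the runtime exposes its output as an iterable of tokens. `measure_stream` and `StreamTiming` are hypothetical names made up for illustration and are not part of MediaPipe or the llm_inference sample.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamTiming(NamedTuple):
    """Timing summary for one streamed generation (hypothetical helper type)."""
    time_to_first_token_s: float
    decode_tokens_per_s: float
    token_count: int


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Consume a token stream and time it.

    The wait before the first token is reported as time to first token.
    Decode speed is computed only over the tokens that follow the first one,
    so a slow prefill does not drag the tokens-per-second figure down.
    """
    start = clock()
    first = None
    last = None
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # With a single token there is no decode interval to measure.
    rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return StreamTiming(first - start, rate, count)
```

To use it, pass whatever streaming iterator the runtime under test provides; the timing starts when `measure_stream` is called, so the prompt should be submitted lazily by that iterator for the first-token figure to include prefill.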

2 participants