New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bounds-checks for large prompts. Refs #99 #119
Conversation
66a4955
to
594613a
Compare
gemma.cc
Outdated
void RangeChecks(size_t& max_tokens, size_t& max_generated_tokens, | ||
size_t& prompt_size) { | ||
if (max_tokens > TConfig::kSeqLen) { | ||
fprintf(stderr, "WARNING: max_tokens %zu > kSeqLen %d, truncating.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond the scope of this PR but one ambitious TODO (https://github.com/google/gemma.cpp/wiki/Notes-on-Tasks-to-be-Done-(03%E2%80%9015%E2%80%902024)) is to unify error handling as return values beyond printing and/or immediately exiting.
Ideally failure/warning states can be passed back to application code to handle in a that's appropriate for the use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, not sure I've seen anything more fine-grained than "user error" (like here) or "generic fail". Is there an example of something we could recover from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some examples off the top of my head:
- In an application / API service, loading the model file fails you want to give the user feedback (eg an onscreen message or popup) and give the user a chance to retry with a different file without exiting the application.
- In a bulk inference use case, the frontend/application may want to take all the inferences that failed and retry them using a different model or different configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These situations come into play more in library use cases moreso than the interactive run.cc
app, but then most productive use cases will probably be in this category.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, kLoadFailed and kSeqTooShort seem like they could be actionable, yes. I'll add a TODO for that.
An AuxOut with timing/stats also seems like a useful direction. Let's add those in 2 weeks or so.
af7e620
to
d2b272c
Compare
d2b272c
to
bb767d7
Compare
Bounds-checks for large prompts. Refs #99
Also remove init placeholder and move Sqrt to ops.h.