Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounds-checks for large prompts. Refs #99 #119

Closed
wants to merge 0 commits into from

Conversation

copybara-service[bot]
Copy link

Bounds-checks for large prompts. Refs #99
Also remove init placeholder and move Sqrt to ops.h.

gemma.cc Outdated Show resolved Hide resolved
ops.h Outdated Show resolved Hide resolved
gemma.cc Outdated
void RangeChecks(size_t& max_tokens, size_t& max_generated_tokens,
size_t& prompt_size) {
if (max_tokens > TConfig::kSeqLen) {
fprintf(stderr, "WARNING: max_tokens %zu > kSeqLen %d, truncating.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the scope of this PR but one ambitious TODO (https://github.com/google/gemma.cpp/wiki/Notes-on-Tasks-to-be-Done-(03%E2%80%9015%E2%80%902024)) is to unify error handling as return values beyond printing and/or immediately exiting.

Ideally failure/warning states can be passed back to application code to handle in a that's appropriate for the use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, not sure I've seen anything more fine-grained than "user error" (like here) or "generic fail". Is there an example of something we could recover from?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some examples off the top of my head:

  • In an application / API service, loading the model file fails you want to give the user feedback (eg an onscreen message or popup) and give the user a chance to retry with a different file without exiting the application.
  • In a bulk inference use case, the frontend/application may want to take all the inferences that failed and retry them using a different model or different configuration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These situations come into play more in library use cases moreso than the interactive run.cc app, but then most productive use cases will probably be in this category.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, kLoadFailed and kSeqTooShort seem like they could be actionable, yes. I'll add a TODO for that.
An AuxOut with timing/stats also seems like a useful direction. Let's add those in 2 weeks or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants