
Exposing grammar as a request parameter in completion/chat with go-side grammar validation #4525

Open · wants to merge 3 commits into main

Conversation


richardanaya commented May 19, 2024

Why is passing down grammars needed?

Relying on the context of a prompt to dictate structure can be unreliable (it depends on the model and on generation randomness) and takes up context space. Grammars are a well-proven way to constrain generated output; in fact, format="JSON" already depends on one. But format="JSON" allows no reliable specification of large, complex structures and can even be tricked with prompt attacks.


Why grammar and not JSON schema?

While JSON schema support would make a nice future addition, there's interest in data structures outside of JSON (simple enum values, programming languages, etc.). Also, JSON schema generators fundamentally rely on grammars, so validating the grammar generated from a JSON schema will also benefit from grammar checking.

Why not just pass along the grammar to llama.cpp?

I looked into the complexities of passing the grammar along to the llama.cpp server. There are a few challenges:

  • the llama.cpp server doesn't return errors when a bad grammar is passed to it with streaming mode on; it gives an incomprehensible "unexpected EOF"
  • the in-memory model is reused if the grammar is valid or changed, BUT the in-memory model appears to get reloaded if you give it a bad grammar and then follow up with a good one
  • reusing in-memory models appears to work perfectly when passing along a completely valid grammar (even a variety of valid grammars)

My conclusion from this, given the advice of the community, is that we do indeed have to do our own GBNF grammar validation on the Go server side to do our best at preventing bad grammar from being passed down.


In this PR I've created:

  • the functionality to pass along a grammar in chat and completion mode
  • documentation in the README for the new property
  • prevention of using the grammar and format parameters at the same time
  • validation code for grammars
  • an extensive set of 30+ tests for grammars, ranging over character classes, strings, internationalization, comments, etc.
  • tests of every known grammar in llama.cpp, plus individual unit tests
  • no usage of regexes, to keep the parsing clear and understandable

Edge cases:

  • I've probably not implemented the entirety of what's possible in character classes, but I have a limited subset compatible with the grammars listed on llama.cpp. My assumption is most people's grammars will be less complex than these.
  • there might be some valid grammars I don't currently support (though to the best of my knowledge we support all the major publicly available ones, including ones as complex as the C programming language). I chose not to use a full-on Go parser library because I wanted the cognitive load of this code to be approachable initially (rather than requiring every reader of this code to learn a new library). If in the future we want to replace it with a more formal technology, we can, and the tests can be reused.

Examples of success:

(two screenshots of successful grammar-constrained responses)

Example of failure:

(screenshot of a request rejected with a grammar error)

I believe this PR satisfies #4074 with an acceptable amount of protection against sending invalid GBNF grammars, along with useful error messages.

@richardanaya
Author

richardanaya commented May 20, 2024

I think I've added as many tests as I can think of that meaningfully add coverage. I'll await feedback. @jmorganca

@mitar

mitar commented May 31, 2024

Have you seen #3618? It adds both grammar and JSON schema options, which are then passed to llama.cpp. I think it would be nice to combine the two PRs (especially the tests from this one).

return fmt.Errorf("grammar and format cannot be used together")
}

err := ValidateGrammar(req.Grammar)

Is it really necessary to validate the grammar? Llama.cpp does that anyway?

Author

I showed in my investigation above that bad grammars eject the model from memory and force a reload. The streaming llama.cpp server has bad error handling. The Go-side grammar check prevents that.


Hm, where is go-side in this PR? I see that you implemented your own validator?

Author

Yah, I wrote my own validator in the grammar.go file of the commit. I didn't want to obligate this project to a particular parsing library, so I tried to be as straightforward as possible to get some initial validation going. I was a bit paranoid the PR might seem too strange if I did something too esoteric. The suite of tests, I think, could be useful for whatever the Go validation evolves into.

I was aiming for something broadly adequate at validation, to protect the llama.cpp server from model ejections. I noticed that a lot of PRs didn't get accepted because they were pretty simplistic pass-throughs. Someone I talked with on Discord said that the lack of even basic protection of the server's state might have been the reason. That's sort of why this PR evolved the way it did.


Oh, I thought "go-side" is some library for grammar checking. :-) Lol.

Author

S'all good

@richardanaya
Author

richardanaya commented Jun 1, 2024

Have you seen #3618? It adds both grammar and JSON schema options, which are then passed to llama.cpp. I think it would be nice to combine the two PRs (especially the tests from this one).

I have, but I think it suffers from the same problem as bad grammars: erroneous JSON schemas cause model ejections. I think it would require a JSON schema validator. I think that's a big enough task that it'd make sense to either leave it to another wrapper project to convert JSON schema to grammar, or put it in a separate PR. I'm interested in writing that separate PR, but I'd like to at least get this one finished. I think a lot of folks have been waiting on even just basic grammar support. Thanks!

@mitar

mitar commented Jun 1, 2024

I think it suffers from the same problem as bad grammars: erroneous JSON schemas cause model ejections.

I am not sure why I would have to pay for grammar and JSON schema validation at every API request; I find this strange. In general, in my case the JSON schema and/or grammar is part of trusted input and not provided by the user. At the least, validation should be cached or something.

Also, is the point of this validation to prevent malicious values for grammar, or just accidental erroneous values? Because if the goal is to prevent malicious values, then the validator should match exactly what llama.cpp would reject. Otherwise an attacker could bypass this validator by crafting a value which passes it but is still rejected by llama.cpp.

So I am not sure exactly why this validation is needed.

@richardanaya
Author

These are two valid concerns:

  • Caching could definitely help; I could add something small.
  • You're right that a malicious hacker could find some way to bust the internal model. The goal of this PR isn't to be perfect security; it's to get the ball rolling on getting grammar into the project, get feedback, and be generally aligned with the desired principles. As I said above, my validator isn't a perfect representation of llama.cpp's capabilities. To my knowledge, a spec of their grammar support doesn't even exist, so my validation is a subset, may have holes, and is based on what I could determine from their public documentation and publicly available grammars.

Again :) I know nothing about the mindset of the project owners on what's holding back merging grammar support. I did my best to make a PR in line with conversations with older members on Discord, to speculatively address their issues while not making something too esoteric.

@richardanaya
Author

@mitar added simple caching and some simple sanity checking around the size of the grammar

@richardanaya
Author

@mitar I thought about your concern: I now only process grammars if OLLAMA_GRAMMAR is set to "true". That way custom grammar is opt-in for people okay with its trade-offs.

@mitar

mitar commented Jun 1, 2024

I now only process grammars if OLLAMA_GRAMMAR is set to "true". That way custom grammar is opt-in for people okay with its trade-offs.

I do not get why this would be useful. You should maybe only make the validation optional. But you should always pass the grammar through if the user wants to use it?

@richardanaya richardanaya requested a review from mitar June 1, 2024 20:12
@richardanaya
Author

richardanaya commented Jun 1, 2024

I now only process grammars if OLLAMA_GRAMMAR is set to "true". That way custom grammar is opt-in for people okay with its trade-offs.

I do not get why this would be useful. You should maybe only make the validation optional. But you should always pass the grammar through if the user wants to use it?

My understanding is that the goal is to protect the model from being ejected by the llama.cpp server. We should never pass down invalid grammar, to the best of our ability. Therefore passing grammar in is opt-in until the community is comfortable with the validation. In other words, we shouldn't let a vulnerability be the default behavior.

@MHugonKaliop

Would like to see this merged; using grammar, plus having a warning when the syntax is wrong, is highly interesting for me ;)

@mann1x
Contributor

mann1x commented Jun 7, 2024

@BruceMacD
This seems a pretty solid implementation; can someone review it and consider merging?

@ketsapiwiq

ketsapiwiq commented Jun 7, 2024

If the maintainers are afraid to add this grammar property to the API, maybe we can merge this PR with the one exposing OpenAI-like function calling (defining tools) in the API? (#3303)
I think that grammar constraints are mostly for JSON / function calling anyway.

@richardanaya
Author

Just for clarity, I'm not interested in @ketsapiwiq's PR.

@mitar

mitar commented Jun 8, 2024

Personally, I think the only thing which should be done is https://gitlab.com/peerdb/ollama/-/commit/f1f7b7ea0cc1c582fdb122b69646fc1b5661c9c8 (+ a documentation update). This adds little code for ollama to maintain, and I think its simplicity is better than the potential downsides of passing invalid data. (Passing invalid data could also be handled in the future by llama.cpp.)

@richardanaya
Author

@mitar

I would actually prefer this as well, out of simplicity. But there have been countless PRs in Ollama that have suggested that already and never made it. The reason this PR is so complex is that I'm addressing what I assume are objections that were concerns.

@mitar

mitar commented Jun 9, 2024

I assume are objections that were concerns.

But those objections you assume were never stated.

@richardanaya
Author

richardanaya commented Jun 9, 2024

@mitar I'm just trying to do whatever possible to make this happen. I did not do it on a whim; I took my best guess at an implementation based on advice from @mann1x, who is more familiar with the community than I am. But you are correct that there is a lot of silence around this issue; trust me, I've tried to get answers.

@mann1x
Contributor

mann1x commented Jun 9, 2024

@richardanaya @mitar

I don't know why there's no interest from the maintainers in this. I can only speculate that they have something different in mind, like an implementation that requires changes in the library.

I think a big factor is that grammars can easily tank performance; until a little while ago this could easily get so bad that llama.cpp would become unresponsive. You don't want to implement something that brings more support headaches than anything else. Now that the situation is better in llama.cpp, maybe they are more open to considering it.

Personally, I much prefer the complex implementation. It's not that complex, and pre-filtering could be used to mitigate the issues, which will come up for sure.

But the specifics of the implementation are irrelevant if there's no feedback from the maintainers. Nothing can be done without it.

@richardanaya
Author

@mann1x yah, something I'll offer about this specific PR to emphasize is that it's only enabled by an env flag. Even if the maintainers are nervous about making it the default, I see no reason why we can't enable it for those who voluntarily opt in.

@mann1x
Contributor

mann1x commented Jun 9, 2024

@mann1x yah, something I'll offer about this specific PR to emphasize is that it's only enabled by an env flag. Even if the maintainers are nervous about making it the default, I see no reason why we can't enable it for those who voluntarily opt in.

This is indeed a plus.

@MHugonKaliop

MHugonKaliop commented Jun 9, 2024

Some comments, as I asked for this merge first ;) (well, as a simple user).

This feature (from what I have seen) allows asking for an answer in a specific format. I know we can do it in the prompt, but from time to time the answer doesn't follow the rules, and we have to code extra verification and "manual" correction (or rerun). I already "cheated" by using the function format, but even then, from time to time I get answers not following the rules I need.

With a grammar, if I want a JSON with a value chosen from three options, I'll have it. That's why this feature is very interesting for my use cases.

Concerning the validation, I understand the point. There are other ways to check the syntax, and I can agree that it may not be ollama's role. I can deal without ;)

@zutto

zutto commented Jun 10, 2024

Looked briefly at some of the other pull requests related to this grammar change.

I think some effort from the maintainers is required to get this going; this change seems to be stagnating, and it would be great to get some insight into why that is and why the efforts are being ignored.
If the first version of the feature is not perfect, it can and will be improved in the future. But for now, it would be very good to get grammar support in Ollama.

--
Similar/related pull requests:
