
--format code only works sometimes, leads to syntax errors #751

Open
jaanli opened this issue Apr 26, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@jaanli

jaanli commented Apr 26, 2024

Description

When I teach courses, it is hard to get students who are new to programming to understand the limitations of LLMs, so they often take the output at face value.

The --format code directive in the cells is very helpful to illustrate what the expected output is.

[screenshot: example of expected output with --format code]

However, if I write the prompt incorrectly or make typos, the --format code directive should (I think? I'm new to this) raise an exception if the output is not formatted as code.

Here is an example:

[screenshot: prompt that leads to a syntax error]

Reproduce

See above for the prompt that leads to a syntax error, running in https://colab.research.google.com/github/jaanli/language-model-notebooks/blob/main/notebooks/getting-started.ipynb

Expected behavior

When the user declares an intent such as --format code, an exception should be raised if the output does not adhere to the user's intent or stated format.

For example, in SGLang (https://github.com/sgl-project/sglang) it is possible to declare intents and then catch errors if the output is not in the expected format. Is support for these types of frameworks planned? I'd love to see what this would take and would be happy to try to contribute :)

@jaanli jaanli added the bug Something isn't working label Apr 26, 2024
@JasonWeill
Collaborator

You can modify prompt templates, as described in #498 and implemented in #581, to attempt to get better quality code from your language models. From my experience, even the most highly trained large language models often make mistakes with programming languages, such as relying on modules that may not be installed, or calling functions that look plausible but do not exist. Some LLMs can purport to run code in a sandboxed environment, but these, too, are not guaranteed to work with any particular programming language or real-world runtime.

In addition, I've found that even after giving a prompt template telling an LLM not to include anything other than code, many LLMs include explanatory text, such as the text in your screenshot. Prompt engineering may fix this, but future LLM developments may render your prompt ineffective.

@jaanli
Author

jaanli commented Apr 26, 2024

Sounds good, thank you!!

Sorry if I didn't make it clear - do you think raising an exception when code is not generated would be feasible?

For example, if parsing with Tree Sitter (https://tree-sitter.github.io/tree-sitter/) does not work, then I would raise an error if the user has passed in --format code.
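To make the idea concrete, here is a minimal sketch of that validation step. It uses Python's built-in ast module as a stand-in for Tree Sitter (which would generalize the check to other languages); the function name validate_code_output is hypothetical, not part of Jupyter AI's API:

```python
import ast

def validate_code_output(output: str) -> None:
    """Raise ValueError if `output` is not syntactically valid Python.

    Sketch only: a real implementation could parse with Tree Sitter
    instead, to support languages other than Python.
    """
    try:
        ast.parse(output)
    except SyntaxError as exc:
        raise ValueError(
            "--format code was requested, but the model output "
            f"is not valid code: {exc}"
        ) from exc

# Valid code passes silently:
validate_code_output("x = [i ** 2 for i in range(10)]")

# Explanatory prose mixed into the output raises:
try:
    validate_code_output("Sure! Here is the code:\nx = 1")
except ValueError as e:
    print(e)
```

If the check fails while --format code is in effect, the cell could surface that ValueError instead of handing students syntactically broken output.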

Appreciate the discussion! I don't think this issue is about generating better quality code, but simply about the unreliability of language models, and ways to constrain that unreliability. For example, I have tried setting the temperature parameter to zero, but that still leads to irreproducible results: in the example I shared above, sometimes code is generated and sometimes it isn't.
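One way to constrain that unreliability is a validate-and-retry loop: call the model, check whether the output parses, and retry a bounded number of times before giving up. A sketch, where `generate` is a hypothetical callable wrapping the LLM (not a Jupyter AI API) and ast.parse stands in for a Tree Sitter check:

```python
import ast

def generate_until_valid(generate, prompt, max_retries=3):
    """Call `generate(prompt)` until the result parses as Python code.

    Raises RuntimeError if no syntactically valid output is produced
    within `max_retries` attempts.
    """
    last_error = None
    for _ in range(max_retries):
        output = generate(prompt)
        try:
            ast.parse(output)
            return output  # valid code: stop retrying
        except SyntaxError as exc:
            last_error = exc
    raise RuntimeError(
        f"No valid code after {max_retries} attempts: {last_error}"
    )

# Stub model that fails once, then returns valid code:
responses = iter(["Here is the code: x = 1", "x = 1"])
result = generate_until_valid(lambda p: next(responses), "assign 1 to x")
print(result)  # x = 1
```

This doesn't make the model reproducible, but it turns silent format violations into either a valid result or an explicit error.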

Hope that helps for additional context!

@JasonWeill
Collaborator

Thank you for mentioning Tree Sitter! I haven't tried that myself. Our general guidance is that whenever an AI model generates code for a person, the person needs to do a code review as diligently as if another person had written it. If you have a code change that would improve the quality of code generated via LLMs in Jupyter AI, I encourage you to open a pull request against Jupyter AI, or to build a new extension to further improve our code. Thanks again!
