Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifiy post processing in chat #698

Closed
wants to merge 2 commits into from

Conversation

srdas
Copy link
Collaborator

@srdas srdas commented Mar 23, 2024

Fixes #686

Removed spurious fragments from the suggestion, like backticks

    While most models (especially instruct and infill models do not require
    any pre-processing), some models such as gpt-4 which only have chat APIs
    may require removing spurious fragments (such as backticks).
    This function uses heuristics and request data to remove such fragments.
    It comments out all lines that are not code, irrespective of programming language.

Changes made to packages/jupyter-ai/jupyter_ai/tests/completions/test_handlers.py

  1. All lines with backticks have been removed.
  2. All text outside code blocks is commented out.

Result looks like the following:
backticks example

srdas and others added 2 commits March 23, 2024 13:05
Remove spurious fragments from the suggestion.

        While most models (especially instruct and infill models do not require
        any pre-processing), some models such as gpt-4 which only have chat APIs
        may require removing spurious fragments (such as backticks).
        This function uses heuristics and request data to remove such fragments.
        It comments out all lines that are not code, irrespective of programming language.
@srdas srdas added enhancement New feature or request good first issue Good for newcomers bug Something isn't working and removed enhancement New feature or request labels Mar 23, 2024
@krassowski krassowski removed the good first issue Good for newcomers label Mar 24, 2024
Copy link
Member

@krassowski krassowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many languages where comments are written differently. This would generate invalid code for these languages.

Why not just modify the previous logic to strip the triple backtick suffix? Why do you want to ignore language? There is a very different use case of generating suggestions for markdown where having triple backtick is desired and of generating Python (or other) code.

@krassowski
Copy link
Member

In general, if an LLM wants to generate comments, it should generate comments as valid language-specific syntax. It is not responsibility of jupyter-ai to transform any spurious text to comments. If jupyter-ai were to do that, one would need to write (and maintain) a very extensive multi-language logic to handle it. Instead, a LLM should be prompted differently (to only return non-code as comments with syntax matching given language). If a given LLM cannot do that, user should be advised that this is not a good LLM for code completions (ideally we would have two lists, one for chat and one for code completion; some models would be on both, but many are fine tuned for one or the other task).

@srdas
Copy link
Collaborator Author

srdas commented Apr 12, 2024

Closing this PR as discussed here: #726 (review)

@srdas srdas closed this Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inline completer displays ``` at the end of suggested code when using provider
2 participants