
Inline completer displays ``` at the end of suggested code when using provider #686

Open
bensonlee5 opened this issue Mar 10, 2024 · 5 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@bensonlee5

bensonlee5 commented Mar 10, 2024

As flagged in this JupyterLab forum post, the trailing ``` should be removed in post-processing.
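For illustration, this is the kind of raw completion a chat model can return for an inline completion request (hypothetical example; the exact text varies by model). The fenced Markdown wrapper, including the closing ```, is what leaks into the editor:

```python
# Hypothetical raw model output for the prefix "def add_two_numbers(a, b):".
# The Markdown fences are part of the model's reply, not of the code itself:
raw_completion = "```python\ndef add_two_numbers(a, b):\n    return a + b\n```"
```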

bensonlee5 added the bug label Mar 10, 2024

welcome bot commented Mar 10, 2024

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@dlqqq
Collaborator

dlqqq commented Mar 11, 2024

@krassowski Do you want to own this issue? 👀

@krassowski
Member

It looks like the problem is two-fold:

  1. when streaming, post-processing is only used when a problematic prefix is detected:

     ```python
     async for fragment in self.llm_chain.astream(input=model_arguments):
         suggestion += fragment
         if suggestion.startswith("```"):
             if "\n" not in suggestion:
                 # we are not ready to apply post-processing
                 continue
             else:
                 suggestion = self._post_process_suggestion(suggestion, request)
         self.write_message(
             InlineCompletionStreamChunk(
                 type="stream",
                 response={"insertText": suggestion, "token": token},
                 reply_to=request.number,
                 done=False,
             )
         )
     ```
  2. post-processing does not look for suffixes at all (a minimal repro follows this list):

     ```python
     def _post_process_suggestion(
         self, suggestion: str, request: InlineCompletionRequest
     ) -> str:
         """Remove spurious fragments from the suggestion.

         While most models (especially instruct and infill models) do not
         require any pre-processing, some models such as gpt-4 which only
         have chat APIs may require removing spurious fragments. This
         function uses heuristics and request data to remove such fragments.
         """
         # gpt-4 tends to add "```python" or similar
         language = request.language or "python"
         markdown_identifiers = {"ipython": ["ipython", "python", "py"]}
         bad_openings = [
             f"```{identifier}"
             for identifier in markdown_identifiers.get(language, [language])
         ] + ["```"]
         for opening in bad_openings:
             if suggestion.startswith(opening):
                 suggestion = suggestion[len(opening) :].lstrip()
                 # check for the prefix inclusion (only if there was a bad opening)
                 if suggestion.startswith(request.prefix):
                     suggestion = suggestion[len(request.prefix) :]
                 break
         return suggestion
     ```
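A minimal repro of (2), using a standalone copy of the opening-fence logic above (a hypothetical harness for illustration, not jupyter-ai code):

```python
from types import SimpleNamespace

def post_process(suggestion: str, request) -> str:
    # standalone copy of the opening-fence logic from _post_process_suggestion
    language = request.language or "python"
    markdown_identifiers = {"ipython": ["ipython", "python", "py"]}
    bad_openings = [
        f"```{identifier}"
        for identifier in markdown_identifiers.get(language, [language])
    ] + ["```"]
    for opening in bad_openings:
        if suggestion.startswith(opening):
            suggestion = suggestion[len(opening):].lstrip()
            if suggestion.startswith(request.prefix):
                suggestion = suggestion[len(request.prefix):]
            break
    return suggestion

request = SimpleNamespace(language="python", prefix="def add(a, b):\n")
print(repr(post_process("```python\nreturn a + b\n```", request)))
# prints 'return a + b\n```' (opening fence stripped, closing fence survives)
```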

I think (2) is easy to solve; I am happy to review a PR from new contributors here if anyone is interested. Ideally the PR would come with a new test case in packages/jupyter-ai/jupyter_ai/tests/completions/test_handlers.py.
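For anyone picking this up, here is a sketch of what the suffix handling and a matching test could look like (illustrative only; the helper name, its placement, and the test structure are assumptions, not the merged fix):

```python
def strip_closing_fence(suggestion: str) -> str:
    """Hypothetical helper: drop a trailing Markdown fence, mirroring the
    existing handling of bad openings in _post_process_suggestion."""
    stripped = suggestion.rstrip()
    if stripped.endswith("```"):
        return stripped[: -len("```")].rstrip()
    return suggestion

# A test case in the spirit of what could be added to
# packages/jupyter-ai/jupyter_ai/tests/completions/test_handlers.py:
def test_closing_fence_is_stripped():
    assert strip_closing_fence("return a + b\n```") == "return a + b"
    # completions without a fence must be left untouched
    assert strip_closing_fence("return a + b") == "return a + b"
```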

krassowski added the good first issue label Mar 12, 2024
@srdas
Collaborator

srdas commented Mar 20, 2024

@dlqqq @krassowski Thanks for bringing up this issue. I did a detailed analysis and provide my thoughts below.
The issue is that, for the example shown, the JupyterLab completer inserts ``` at the end of the generated code snippet. Here is a visual:
[screenshot: completion with a trailing ``` inserted at the end of the snippet]

Installed jupyterlab-transformers-completer based on https://github.com/krassowski/jupyterlab-transformers-completer.

Replicated this error on macOS for the example above using both openai-chat:gpt-3.5-turbo and bedrock-chat:anthropic.claude-instant-v1. As shown below for the Claude case:
[screenshot: the same trailing ``` with Claude Instant]

But it does not always give the same result (see the two examples below):
[screenshot: two examples with differing results]

Here we mimic the add_two_numbers case but get different results:
[screenshot: a similar prompt where the completion keeps the backticks]
Here we actually want the backticks. I replicated the same results on Linux as well, so at least we know this is neither LLM-specific nor OS-specific.

Clearly, adding special code to remove the trailing ``` as suggested in https://github.com/jupyterlab/jupyter-ai/issues/686#issuecomment-1991202710 will only handle one edge case, which may be the outcome of some LLM completions but not all. It may not be desirable to build special-case code for a single edge case into jupyter-ai. Maybe this needs to be handled in JupyterLab, if at all.

Further, the outcome of the transformer completion also depends on the Settings in JupyterLab. In the Settings dropdown, select Settings Editor.
[screenshot: Settings menu with Settings Editor highlighted]
Then select Inline Completer.
[screenshot: Settings Editor with the Inline Completer section]
If you scroll down the Inline Completer panel, you get three boxes where transformer completion can be “Enabled”, as shown below. The issue can be reproduced when only the third box is checked:
[screenshot: Inline Completer settings with only the third “Enabled” checkbox ticked]

Checking any other configuration of the checkboxes generates other behavior; for example, checking the second and third boxes gives the following examples:
[screenshot: completions with the second and third boxes checked]

Checking the first and third boxes gives different behavior as well:
[screenshot: completions with the first and third boxes checked]

Checking all three boxes again alters behavior:
[screenshot: completions with all three boxes checked]

A slightly different experience may arise depending on machine, environment, and code completion use cases.

We may not want to change jupyter-ai code to handle varying responses from the transformer used for JupyterLab completions. Also, since completions alter the UX, it may be better to have all three “Enabled” checkboxes off by default, as some users will want completions and others will not.

We need to think through and discuss the UX and the generality of the code changes before going ahead.

@krassowski
Member

> Maybe this needs to be handled in JupyterLab, if at all

No, we do not want to special-case anything in JupyterLab.

I don't think the LLMs should add examples like the one in your second snapshot.

Here is my point of view: there are two issues:

  1. specific LLMs need a different template to coerce them into producing only code, and the template is configurable
  2. the default post-processing should handle the case of returned code being wrapped in extra backticks; nothing more, nothing less

> Also, since completions alter the UX, it may be better to have all three “Enabled” checkboxes off by default, as some users will want completions and others will not.

I do not understand how this relates to this issue.

bartleusink added a commit to bartleusink/jupyter-ai that referenced this issue Apr 11, 2024
bartleusink added a commit to bartleusink/jupyter-ai that referenced this issue Apr 11, 2024
dlqqq pushed a commit to bartleusink/jupyter-ai that referenced this issue Apr 15, 2024
dlqqq pushed a commit to bartleusink/jupyter-ai that referenced this issue Apr 22, 2024
dlqqq pushed a commit to bartleusink/jupyter-ai that referenced this issue Apr 22, 2024
dlqqq pushed a commit that referenced this issue Apr 22, 2024
* Remove closing markdown identifiers (#686)

* Remove whitespace after closing markdown identifier

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>