Capital letters breaks autocomplete in VS Code Extension #1347

drhagen · 2024-01-18T19:52:26Z

A grammar of a keyword followed by /[A-Z]+/ will not correctly autocomplete the keyword, but the same keyword followed by /[a-z]+/ will autocomplete just fine. This might be a bug on the VS Code side because the same grammar in the Langium Playground autocompletes fine.

Langium version: 2.1.3
Package name: hello-world

Steps To Reproduce

npm install -g yo generator-langium
yo langium

Keep the defaults, except do not create the CLI or web worker
Accept opening the project in VS Code

Replace hello-world.langium with:

grammar HelloWorld

entry Model:
    'header' value=ID;

hidden terminal WS: /\s+/;
terminal ID: /[A-Z]+/;

Purge validation in hello-world-validator.ts because we don't need it:

import type { HelloWorldServices } from './hello-world-module.js';
export function registerValidationChecks(services: HelloWorldServices) { }
export class HelloWorldValidator { }

npm run langium:generate
npm run build
Run extension in Code to open a new window with the extension installed
Create a file test.hello
In the file, try to auto-complete the first keyword he<tab>

The current behavior

When starting to type the keyword, the correct completion appears. But when pressing Tab or Enter to accept the autocomplete, it types in the whole keyword again instead of the remainder of the word.

Now switch ID from /[A-Z]+/ to /[a-z]+/. Rebuild and restart the extension. With this grammar autocomplete works as expected.

The expected behavior

Autocomplete completes the keyword instead of typing the whole keyword in again regardless of the token that follows.

msujew · 2024-01-18T20:59:52Z

Ok, fascinating. This is a really hard-to-catch edge case that only shows up in very special grammars, for completions within the first token of a file. I'm honestly surprised someone was able to create reproduction steps for this. Kudos, I guess. We basically run into this branch, which then later assumes that no tokens have been parsed. As a consequence, it doesn't even attempt to fuzzy match the text before the cursor in order to replace it. This logic was added to Langium fairly recently, whereas the playground lags behind by a minor version, which is why it doesn't exhibit the behavior.

I'm not sure whether we can actually change this part of the logic though. The fuzzy matcher isn't allowed to look too far back in the token stream to find the text to replace. It should only look for the current token, which is exactly what's happening right now. In some cases, the current token just cannot be lexed, which leads to the behavior you're experiencing.

drhagen · 2024-01-19T14:00:35Z

> within the first token of a file

I minimized this down, but failed autocompletion can trigger further than the first token, unless we have different definitions of "token".

For example, using this grammar:

grammar ReactionModel

entry ReactionModel:
    EOL? '%%' 'ReactionModel@2' EOL
    'initialization' '=' initialization=Initialization EOL
    '%' 'components' EOL
;

Initialization:
    InitialValue | SteadyState;

InitialValue:
    {infer InitialValue} 'initial_value' '(' ')';

SteadyState:
    'steady_state' '(' 'time_scale' '=' time_scale=FLOAT (',' 'max_scale' '=' FLOAT )? ')';

hidden terminal WS: /[ \t]+/;
terminal EOL: /((#.*)?\n[ \t]*)*(#.*)?((\n[ \t]*)|\Z)/;
terminal FLOAT returns number: /[+-]?\d+(\.\d+)?([Ee][+-]?\d+)?/;

with this valid file

%% ReactionModel@2
initialization = steady_state(time_scale = 1.0, max_scale=1.0)
% components

not a single keyword autocompletes correctly while typing it in or when going back to edit it. It knows what can be autocompleted there (e.g. after "initialization =" then "steady_state" or "initial_value" are valid autocompletes), but it types in the whole word instead of completing the word.

msujew · 2024-01-25T14:52:14Z

@drhagen Let me rephrase: in your language, initial (for example) isn't actually a token, even though initial_value is, since there's neither a keyword nor something like an ID terminal that could lex it. Instead, the lexer simply ignores the characters. Since we can only know where a token starts and ends if the lexer recognizes it as a token, the completion provider assumes that the characters before the cursor position are invalid and ignores them as well. This is actually independent of the issue that we don't lex any tokens at all; the real issue is that we have no idea "how much" of a token already exists at a given point.

In order to successfully perform completion, even "broken" keywords need to be recognized as tokens by the lexer. Most languages (i.e. all that I've encountered so far) have an ID terminal that can be expressed as /\w+/, which automatically fixes this issue.
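To make the mechanism above concrete, here is a minimal, self-contained sketch. It is a toy model and not Langium's lexer or completion provider: the `lex` and `complete` helpers are hypothetical names invented for illustration, and the lexer simply skips any character that no rule matches. Under those assumptions, it reproduces the reported symptom for `/[A-Z]+/` and shows why `/[a-z]+/` or `/\w+/` avoids it.

```typescript
// Toy model only: NOT Langium's lexer or completion provider.
interface Token {
  type: "keyword" | "ID";
  text: string;
  start: number;
}

// Hypothetical helper: lexes `input` using the given keywords plus a single
// ID terminal. Characters that no rule matches are silently skipped.
// Flags on `id` are ignored; only its source pattern is used.
function lex(input: string, keywords: string[], id: RegExp): Token[] {
  const tokens: Token[] = [];
  const sticky = new RegExp(id.source, "y"); // match only at `lastIndex`
  let pos = 0;
  while (pos < input.length) {
    sticky.lastIndex = pos;
    const idText = sticky.exec(input)?.[0] ?? "";
    const kw = keywords.find((k) => input.startsWith(k, pos)) ?? "";
    if (kw.length > 0 && kw.length >= idText.length) {
      tokens.push({ type: "keyword", text: kw, start: pos });
      pos += kw.length;
    } else if (idText.length > 0) {
      tokens.push({ type: "ID", text: idText, start: pos });
      pos += idText.length;
    } else {
      pos++; // unrecognized character (or whitespace): skipped, no token
    }
  }
  return tokens;
}

// Hypothetical helper: accepts `keyword` as a completion with the cursor at
// the end of `input`. The text before the cursor is replaced only if the
// lexer produced a token that ends at the cursor and is a prefix of the keyword.
function complete(input: string, keyword: string, id: RegExp): string {
  const tokens = lex(input, [keyword], id);
  const last: Token | undefined = tokens[tokens.length - 1];
  const endsAtCursor =
    last !== undefined && last.start + last.text.length === input.length;
  const start =
    last !== undefined && endsAtCursor && keyword.startsWith(last.text)
      ? last.start
      : input.length; // nothing known to replace: insert the whole keyword
  return input.slice(0, start) + keyword;
}

console.log(complete("he", "header", /[A-Z]+/)); // prints "heheader"
console.log(complete("he", "header", /[a-z]+/)); // prints "header"
console.log(complete("he", "header", /\w+/));    // prints "header"
```

With `/[A-Z]+/`, the text `he` yields no token at all, so the sketch has no start offset to replace from and inserts the full keyword after it. With a terminal that can lex `he`, the partial keyword becomes a token and is replaced.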

I don't think we can fix this as part of our framework. You are free to override how the completion provider attempts its fuzzy matching, so you should be able to fix this behavior for your language yourself.
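One possible shape for such language-specific fuzzy matching is sketched below. This is only an illustration of the idea, not Langium's API: `findReplaceStart` and `applyCompletion` are hypothetical helpers, and how to wire them into the completion provider is not shown because the override points depend on the Langium version. The sketch ignores the lexer and instead looks backward from the cursor for the longest run of text that is a prefix of the proposed keyword.

```typescript
// Hypothetical helpers for illustration; not part of Langium's API.

// Returns the offset from which the text before `cursor` should be replaced
// when accepting `keyword`. It looks back at most `keyword.length` characters
// and picks the longest suffix of the typed text that is a keyword prefix.
function findReplaceStart(text: string, cursor: number, keyword: string): number {
  const maxLen = Math.min(keyword.length, cursor);
  for (let len = maxLen; len > 0; len--) {
    const candidate = text.slice(cursor - len, cursor);
    if (keyword.startsWith(candidate)) {
      return cursor - len;
    }
  }
  return cursor; // nothing matches: plain insertion at the cursor
}

// Applies the completion by replacing the matched prefix with the keyword.
function applyCompletion(text: string, cursor: number, keyword: string): string {
  const start = findReplaceStart(text, cursor, keyword);
  return text.slice(0, start) + keyword + text.slice(cursor);
}

const typed = "initialization = ste";
console.log(applyCompletion(typed, typed.length, "steady_state"));
// prints "initialization = steady_state"
console.log(applyCompletion("he", 2, "header")); // prints "header"
```

Note the trade-off: this matcher is case-sensitive and does not check token boundaries, so it may also replace characters that belong to a preceding, unrelated word. That is the kind of looking "too far back" mentioned earlier in the thread, so a real implementation would likely need a language-specific boundary check.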

drhagen added the bug Something isn't working label Jan 18, 2024

msujew added the completion Completion related issue label Jan 18, 2024
