Guide on multi-mode lexing #132

Open
wants to merge 3 commits into
base: main

Contributor

@msujew msujew commented Feb 13, 2023

Effectively closes #70

@msujew msujew added the recipe Improvements or additions to recipes label Feb 13, 2023
@msujew msujew requested a review from montymxb February 13, 2023 18:15
Contributor

@montymxb montymxb left a comment

Cool, nice to see some stuff about multi-mode! However, this seems more about template literals than just multi-mode lexing. Not necessarily a problem, but I would recommend we change the title and make it clear that we're talking about implementing template literals through multi-mode lexing, as that appears to be the primary topic here.

It might even work better as a tutorial than as a guide, since this is a targeted application, but that's less important than the first point.


Many modern programming languages such as [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) or [C#](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated) support template literals.
They are a way to easily concatenate or interpolate string values while maintaining great code readability.
This guide will show you how to support template literals in Langium.
Contributor

Suggested change
This guide will show you how to support template literals in Langium.
This guide will show you how to support template literals in Langium through multi-mode lexing.

This paragraph is still a bit strange, as it reads more like the topic is template literals.

They are a way to easily concatenate or interpolate string values while maintaining great code readability.
This guide will show you how to support template literals in Langium.

For this specific example, our template literal starts and ends using backticks `` ` `` and are interupted by expressions that are wrapped in curly braces `{}`.
Contributor

Suggested change
For this specific example, our template literal starts and ends using backticks `` ` `` and are interupted by expressions that are wrapped in curly braces `{}`.
For this specific example, our template literal starts and ends with backticks `` ` ``, and is interrupted by expressions that are wrapped in curly braces `{}`.

hugo/content/guides/multi-mode-lexing.md (resolved)
@@ -0,0 +1,175 @@
---
title: "Multi-Mode Lexing"
Contributor

I would recommend changing the title here to something about template literals, possibly

Template Literals with Multi-Mode Lexing

```

Conceptually, template strings work by reading a start terminal which starts with `` ` `` and ends with `{`,
followed by an expression and then an end terminal which is effectively just the start terminal in reverse using `}` and `` ` ``.
Contributor

Suggested change
followed by an expression and then an end terminal which is effectively just the start terminal in reverse using `}` and `` ` ``.
followed by an expression and an end terminal, which is `}` and `` ` ``.

}
```

Of course, let's not forget to bind all of these services:
Contributor

Suggested change
Of course, let's not forget to bind all of these services:
Of course, let's not forget to bind all of these services in your **module.ts**:

hugo/content/guides/multi-mode-lexing.md (resolved)

export class CustomTokenBuilder extends DefaultTokenBuilder {

override buildTokens(grammar: GrammarAST.Grammar, options?: { caseInsensitive?: boolean }): TokenVocabulary {
Contributor

@montymxb montymxb Feb 17, 2023

From before, I would first break this out into a separate paragraph, explaining we need to first build up a multi-mode lexer definition that has various modes, which are pushed on by our special tokens.
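
For readers following this thread, here is a minimal sketch of what such a multi-mode definition might look like. It is not the code from this PR: `SketchTokenType`, `SketchMultiModeDefinition`, `buildModes`, and the two mode names are hypothetical stand-ins, and the `modes`/`defaultMode` shape only mirrors what a Chevrotain multi-mode lexer definition expects. The sketch also ignores the separate handling of the `}` keyword.

```typescript
// Hypothetical stand-ins for the token and lexer-definition shapes.
// In a real project these types would come from the `chevrotain` package.
interface SketchTokenType {
    name: string;
    PUSH_MODE?: string;
    POP_MODE?: boolean;
}

interface SketchMultiModeDefinition {
    modes: Record<string, SketchTokenType[]>;
    defaultMode: string;
}

// Mode names are illustrative assumptions.
const REGULAR_MODE = 'regular';
const TEMPLATE_MODE = 'template';

// Terminals that only make sense while inside a template literal.
const TEMPLATE_ONLY = ['TEMPLATE_LITERAL_MIDDLE', 'TEMPLATE_LITERAL_END'];

// Split one flat token list into two nearly identical modes.
function buildModes(tokens: SketchTokenType[]): SketchMultiModeDefinition {
    return {
        modes: {
            // Regular mode drops the template-only terminals.
            [REGULAR_MODE]: tokens.filter(token => TEMPLATE_ONLY.indexOf(token.name) === -1),
            // Template mode keeps every token in this simplified sketch.
            [TEMPLATE_MODE]: tokens.slice()
        },
        defaultMode: REGULAR_MODE
    };
}

const definition = buildModes([
    { name: 'TEMPLATE_LITERAL_START' },
    { name: 'TEMPLATE_LITERAL_MIDDLE' },
    { name: 'TEMPLATE_LITERAL_END' },
    { name: 'ID' }
]);
```

Keeping both modes derived from the same token list is what makes them "almost identical"; only the filter differs.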

Contributor

See above.

}
}

protected override buildKeywordToken(
Contributor

This would make a nice second part, indicating that we need to clean up our `}` token so regular mode doesn't get messed up.

Contributor

See above.

return tokenType;
}

protected override buildTerminalToken(terminal: GrammarAST.TerminalRule): TokenType {
Contributor

@montymxb montymxb Feb 17, 2023

Third part, we can add this & explain how we're associating a push/pop action for start/end literals (which chevrotain needs).
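
To illustrate the push/pop association mentioned here, the following self-contained sketch emulates a mode stack by hand instead of using Chevrotain. `PUSH_MODE` and `POP_MODE` are the property names the guide refers to; the `SketchToken` shape, the regular expressions for the template terminals, and the `tokenize` helper are assumptions made purely for illustration.

```typescript
// Hypothetical, simplified token shape; only PUSH_MODE and POP_MODE
// mirror names used in the guide.
interface SketchToken {
    name: string;
    pattern: RegExp; // anchored with ^ and applied to the remaining input
    PUSH_MODE?: string;
    POP_MODE?: boolean;
}

// Assumed terminal patterns: backtick ... {   } ... {   } ... backtick
const startToken: SketchToken = { name: 'TEMPLATE_LITERAL_START', pattern: /^`[^`{]*\{/, PUSH_MODE: 'template' };
const middleToken: SketchToken = { name: 'TEMPLATE_LITERAL_MIDDLE', pattern: /^\}[^`{]*\{/ };
const endToken: SketchToken = { name: 'TEMPLATE_LITERAL_END', pattern: /^\}[^`{]*`/, POP_MODE: true };
const idToken: SketchToken = { name: 'ID', pattern: /^[a-z]+/ };

const lexerModes: Record<string, SketchToken[]> = {
    regular: [startToken, idToken],
    template: [startToken, middleToken, endToken, idToken]
};

function tokenize(text: string): string[] {
    const modeStack: string[] = ['regular'];
    const result: string[] = [];
    let offset = 0;
    while (offset < text.length) {
        // Only the tokens of the mode on top of the stack are candidates.
        const currentMode = modeStack[modeStack.length - 1] ?? 'regular';
        let matched = false;
        for (const token of lexerModes[currentMode] ?? []) {
            const match = token.pattern.exec(text.slice(offset));
            if (match === null) {
                continue;
            }
            result.push(token.name);
            offset += match[0].length;
            // Entering a template pushes a mode; its end terminal pops it again.
            if (token.PUSH_MODE !== undefined) {
                modeStack.push(token.PUSH_MODE);
            }
            if (token.POP_MODE) {
                modeStack.pop();
            }
            matched = true;
            break;
        }
        if (!matched) {
            throw new Error(`No token matches at offset ${offset} in mode ${currentMode}`);
        }
    }
    return result;
}

const tokenNames = tokenize('`a{x}b{y}c`');
```

For this input the sketch yields a start terminal, an `ID`, a middle terminal, another `ID`, and the end terminal. A stray `}` in regular mode is rejected, because no token in that mode starts with it.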

Contributor

See above.

PR Preview Action v1.4.4
🚀 Deployed preview to https://eclipse-langium.github.io/langium-previews/pr-previews/pr-132/
on branch previews at 2023-12-14 14:04 UTC

Contributor

@montymxb montymxb left a comment

Went back through and resolved a number of discussions to try and make the outstanding points clearer. Most of the remaining suggestions are specifically for grammar or clarity, but the rest should be good.

hugo/content/guides/multi-mode-lexing.md (resolved)
hugo/content/guides/multi-mode-lexing.md (resolved)
The following implementation of a `TokenBuilder` will do the job for us. It creates two lexing modes, which are almost identical except for the `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END` terminals.
We will also need to make sure that the modes are switched based on the `TEMPLATE_LITERAL_START` and `TEMPLATE_LITERAL_END` terminals. We use `PUSH_MODE` and `POP_MODE` for this.

```ts
Contributor

As a step up from this, I still feel we should split this up. But in the interest of moving this along after some time, can we instead open a separate issue for a custom token builder guide?


export class CustomTokenBuilder extends DefaultTokenBuilder {

override buildTokens(grammar: GrammarAST.Grammar, options?: { caseInsensitive?: boolean }): TokenVocabulary {
Contributor

See above.

}
}

protected override buildKeywordToken(
Contributor

See above.

return tokenType;
}

protected override buildTerminalToken(terminal: GrammarAST.TerminalRule): TokenType {
Contributor

See above.

hugo/content/guides/multi-mode-lexing.md (resolved)
agacek commented May 17, 2024

The generated AstNode for TemplateLiteral looks like this:

export interface TemplateLiteral extends AstNode {
    ...
    content: Array<Expr> | Array<string>;
}

I would have expected to see content: Array<Expr | string>. Is this a bug or am I missing something?
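
The difference between the two shapes can be shown with a small standalone sketch. `ExprSketch`, `ContentAsGenerated`, `ContentAsExpected`, and `render` are hypothetical names introduced only for this illustration; they are not part of the generated AST.

```typescript
// Hypothetical stand-in for the generated expression node.
interface ExprSketch {
    $type: 'Expr';
    value: number;
}

// Shape quoted above: the whole array holds either only expressions or only strings.
type ContentAsGenerated = Array<ExprSketch> | Array<string>;

// Shape the comment expected: every element may be an expression or a string.
type ContentAsExpected = Array<ExprSketch | string>;

// A template literal naturally alternates text and expressions.
const mixed: ContentAsExpected = ['Hello ', { $type: 'Expr', value: 1 }, '!'];

// The same value is rejected by the generated shape, because a mixed array
// is neither ExprSketch[] nor string[]:
// const rejected: ContentAsGenerated = ['Hello ', { $type: 'Expr', value: 1 }, '!'];

// A homogeneous array satisfies both shapes.
const onlyStrings: ContentAsGenerated = ['Hello ', '!'];

// Narrow each element before using it.
function render(content: ContentAsExpected): string {
    return content
        .map(part => (typeof part === 'string' ? part : String(part.value)))
        .join('');
}
```

With the union-of-arrays shape, a literal that mixes text and expressions cannot be typed without a cast, which is why the element-level union is the more natural fit for this node.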

Contributor Author

msujew commented May 21, 2024

@agacek Thanks for the info, I've created eclipse-langium/langium#1506 for this.

Labels
recipe Improvements or additions to recipes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a guide on how to customize lexing
3 participants