Clarification on tab indentation rules #255

Open
herabit opened this issue Nov 12, 2023 · 3 comments

Comments

@herabit

herabit commented Nov 12, 2023

I was hoping for some clarification on the rules for parsing indentation that includes tabs. I am currently attempting (we'll see how far I get) to implement a parser for tree-sitter, and I am unsure whether I should follow what CommonMark does and "treat" a tab as four spaces when handling indentation levels, or whether djot somehow differs.

@jgm
Owner

jgm commented Nov 14, 2023

Probably we should add something about this. I am somewhat embarrassed to say that djot.js does something fairly crude:

  // move parser position to first nonspace, adjusting indent
  skipSpace(): void {
    const subject = this.subject;
    let newpos = this.pos;
    while (isSpaceOrTab(subject.codePointAt(newpos))) newpos++;
    this.indent = newpos - this.startline;
    this.pos = newpos;
  }

which amounts to a tab stop of 1!

I guess I'd be inclined to use a tab stop of 4 in computing this. (That's a bit different from treating a tab as four spaces, since SPACE + TAB and TAB might both put you in the same column.) The code would have to be modified a bit.
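
To make the difference concrete, here is a minimal sketch of indentation computed against a tab stop. It is not the djot.js implementation: `indentColumn` and its `tabStop` parameter are hypothetical names chosen for illustration. Each tab advances to the next multiple of the tab stop, so SPACE + TAB and TAB both land in column 4, while a tab stop of 1 reproduces the count-every-character behavior of `skipSpace` above.

```typescript
// Hypothetical helper, not part of djot.js: return the column reached after
// the leading spaces and tabs of a line. A space advances one column; a tab
// advances to the next multiple of `tabStop`.
function indentColumn(line: string, tabStop: number = 4): number {
  let column = 0;
  for (const ch of line) {
    if (ch === " ") {
      column += 1;
    } else if (ch === "\t") {
      column += tabStop - (column % tabStop);
    } else {
      break; // first non-space, non-tab character ends the indentation
    }
  }
  return column;
}

console.log(indentColumn("\titem"));     // TAB alone
console.log(indentColumn(" \titem"));    // SPACE + TAB reaches the same column
console.log(indentColumn(" \titem", 1)); // tab stop of 1: every character counts as one column
```

Treating a tab as exactly four spaces would instead put SPACE + TAB in column 5, which is the distinction drawn above.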

Anyone have feedback on this?

@vassudanagunta
Contributor

I believe for djot it only matters in the following two cases, and even then only when the actual tab stop is other than what djot assumes AND there is inconsistent use of spaces vs tabs:

(→ represents a tab char)

  1. multiple levels of nesting

    - parent
      - child
        - grandchild
    →→which list am I nested within?
      →what about me?
    →and me?
    

    A tab stop of 4 would make the last three lines, respectively: great-grandchild, grandchild, grandchild.
    A tab stop of 2: grandchild, grandchild, child.
    A tab stop of 8: great-grandchildren all.

  2. markers with insignificant leading whitespace

      - marker with two leading spaces
    →- nested if tab stop is 4, not nested if 2
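
The columns behind both cases can be checked with a short sketch. The names `leadingColumn` and `columnsFor` are hypothetical, and columns are measured from the example's own left margin (so `- parent` starts at column 0, `- child` at 2, `- grandchild` at 4); this is an assumption about how the examples are meant to be read, not djot's specified behavior.

```typescript
// Hypothetical helper: column reached after leading spaces and tabs,
// with each tab advancing to the next multiple of `tabStop`.
function leadingColumn(line: string, tabStop: number): number {
  let col = 0;
  for (const ch of line) {
    if (ch === " ") col += 1;
    else if (ch === "\t") col += tabStop - (col % tabStop);
    else break;
  }
  return col;
}

// The three ambiguous lines from case 1, written with real tab characters.
const ambiguousLines = [
  "\t\twhich list am I nested within?",
  "  \twhat about me?",
  "\tand me?",
];

function columnsFor(tabStop: number): number[] {
  return ambiguousLines.map((line) => leadingColumn(line, tabStop));
}

for (const tabStop of [2, 4, 8]) {
  console.log(`tab stop ${tabStop}:`, columnsFor(tabStop));
}
// tab stop 2: [4, 4, 2]
// tab stop 4: [8, 4, 4]
// tab stop 8: [16, 8, 8]

// Case 2: the marker with two leading spaces sits at column 2.
console.log(leadingColumn("\t- nested?", 4)); // 4, past the marker
console.log(leadingColumn("\t- nested?", 2)); // 2, level with the marker
```

Comparing these columns with the marker positions gives the nesting levels listed for each tab stop.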
    

If tabs are used consistently, either because the writer is disciplined or the editor automatically converts spaces to tabs, it doesn't matter if the tab stop differs from djot's assumption:

- parent
→- child regardless of tab stop value
→→- grandchild regardless of tab stop value
→→→great-grandchild regardless of tab stop value
→→ great-grandchild regardless of tab stop value
→→grandchild regardless of tab stop value
→ grandchild regardless of tab stop value
→child regardless of tab stop value
 child regardless of tab stop value

The last observation suggests an out-of-the-box idea: interpret a tab to mean "take me to the next level of nesting". It may be a bad idea, but I'm throwing it out for consideration. The motivation is that it might be more resilient than a fixed value of 4. I need to sleep now :)

@jgm
Owner

jgm commented Nov 15, 2023

> interpret a tab to mean "take me to the next level of nesting".

That's more or less how it works now. And, as you say, it won't cause a problem if tabs are used consistently and people don't put spaces before tabs. But in practice people aren't consistent, so I think a tab stop would be less confusing.
