Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect syntax highlighting of tag with note #74

Open
cypok opened this issue Apr 10, 2019 · 5 comments
Open

Incorrect syntax highlighting of tag with note #74

cypok opened this issue Apr 10, 2019 · 5 comments

Comments

@cypok
Copy link
Contributor

cypok commented Apr 10, 2019

Sometimes I have a tag with additional note on a single line (like in 2nd or 3rd transaction) and now they are highlighted not like normal tags (like in 1st transaction):
Снимок экрана 2019-04-10 в 9 51 00

Ledger in strict mode reports:

Unknown metadata tag 'aaa'
Unknown metadata tag 'bbb'
Unknown metadata tag 'eee'
Unknown metadata tag 'jjj'
Unknown metadata tag 'lll'

So these tags should be highlighted as tags.

@alerque
Copy link
Member

alerque commented Jul 1, 2019

I'd be happy to review any PRs related to this. Note that it would be nice to keep in mind that hledger's tag syntax is a bit different and it would be nice to support both.

@alerque
Copy link
Member

alerque commented Nov 4, 2019

See discussion in Neovim Gitter chat related to how to support two syntax variants in one filetype. See sh filetype support and it's support for bash variants.

@alerque
Copy link
Member

alerque commented Nov 4, 2019

I started working on this to include in PR #94, but I'm not quite sure how to fix this without gutting the current tag matching rules and I don't want to do that without an understanding of what I might be breaking.

@kljohann I'm having a really hard time making out why the tag parser is so complicated.

exe 'syn match ledgerTags '.
\ '/'.s:oe.'\%(\%(;\s*\|^tag\s\+\)\)\@<='.
\ ':[^:[:space:]][^:]*\%(::\?[^:[:space:]][^:]*\)*:\s*$/ '.
\ 'contained contains=ledgerTag'
syn match ledgerTag /:\zs[^:]\+\ze:/ contained

It seems to me this could be about 100 times simpler, since tags are only allowed in comments anyway we just have to match :\?[^\s:]\+: for the tag itself, then for ledger everything after is a value and for hledger everything up to another ; is a value. Because you've tweaked this logic several times maybe you could shed some light on what all the extra complexity is supposed to catch.

@alerque
Copy link
Member

alerque commented Nov 4, 2019

For ease of anybody else working on this, the approximate situation in the original issue can be copied from here:

2019/08/26 test  ; thing
	bar
	foo  1 ; :aaa:
	foo  1 ; :bbb: ccc
	foo  1 ; ddd :eee:
	foo  1 ; fff ggg
	foo  1 ; hhh :iii
	foo  1 ; jjj: kkk
	foo  1 ; lll: mmm:
	foo  1 ; nnn

@kljohann
Copy link
Member

@alerque I don't quite remember all the details, but a look at https://www.ledger-cli.org/3.0/doc/ledger3.html and :help reveals some of them:

  • s:oe selected the old regex engine for backtracking support. I don't think this is necessary any more.
  • \%(\) is just a non-capturing grouping.
  • \%(\%(;\s*\|^tag\s\+\)\)\@<= We only want to match a run of tags inside either a comment or a pre-declared tag name. I think as long as we can constrain the match to comments in some other way this can be removed. A look at the code suggests that :something: indeed counts as a tag even if it's preceded by arbitrary other text, see below. The ^tag part can be removed in any case, as pre-declared tags are not written using colons.
  • The rest of the regex (:[^:[:space:]][^:]*\%(::\?[^:[:space:]][^:]*\)*:) encodes the assumption that there should be no space between tags and that they should either look like this :tag::other-tag::third-tag: or :tag:other-tag:third-tag:. I think this isn't accurate, and a quick glance at the parsing code in ledger seems to confirm this.
  • The ;\s* and \s*$ parts were based on the belief that there should be only tags on a that particular line of a comment for them to count.

Note that the match rule you highlighted was only intended to highlight a series of tags, not the tag names of key-value metadata settings. There is ledgerValueTag and ledgerTypedTag for that.

So as you suggested the expression can likely be radically simplified. Here are some notes from a glance at the definition of item_t::parse_tags():

  • The comment text is split into tokens separated by spaces or tabs.
  • Iff a single token both begins with and ends with a : it is considered to be a "series of tags". (→ Tags cannot contain spaces.)
    • This series of tags is split into tokens separated by colons. These individual tokens are the tags.
  • Else if the token only ends with a : and is the first token on the line it is considered the start of a metadata setting. If the token even ends with two colons (key::) it is a special kind of metadata setting ("by value"). The latter part is covered by ledgerValueTag vs. ledgerTypedTag, though I think their names and definitions are swapped in ledger.vim.

I hope that helps. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants