Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python syntax file, support f-string #14033

Open
A4-Tacks opened this issue Feb 14, 2024 · 11 comments · May be fixed by #14048
Open

python syntax file, support f-string #14033

A4-Tacks opened this issue Feb 14, 2024 · 11 comments · May be fixed by #14048

Comments

@A4-Tacks
Copy link

A4-Tacks commented Feb 14, 2024

Vim syntax file for Python, with support for f-string added

  • f-string support
  • Fixed shortstring with multiple lines still highlighted
  • Update string prefix to Python3.10+ syntax

have contacted the maintainer zpetkovic@acm.org But there seems to be no response

python.vim.patch

--- /usr/share/vim/vim91/syntax/python.vim	2024-01-03 12:24:01.000000000 +0800
+++ python.vim	2024-02-14 12:03:07.910000000 +0800
@@ -143,17 +143,35 @@
 
 " Triple-quoted strings can contain doctests.
 syn region  pythonString matchgroup=pythonQuotes
-      \ start=+[uU]\=\z(['"]\)+ end="\z1" skip="\\\\\|\\\z1"
+      \ start=+[uUbB]\=\z(['"]\)+ end="$\|\z1" skip="\\\\\|\\\z1"
       \ contains=pythonEscape,@Spell
 syn region  pythonString matchgroup=pythonTripleQuotes
-      \ start=+[uU]\=\z('''\|"""\)+ end="\z1" keepend
+      \ start=+[uUbB]\=\z('''\|"""\)+ end="\z1" keepend
       \ contains=pythonEscape,pythonSpaceError,pythonDoctest,@Spell
 syn region  pythonRawString matchgroup=pythonQuotes
-      \ start=+[uU]\=[rR]\z(['"]\)+ end="\z1" skip="\\\\\|\\\z1"
+      \ start=+\%([bB][rR]\=\|[rR][bB]\=\)\z(['"]\)+
+      \ end="$\|\z1" skip="\\\\\|\\\r\=$\|\\\z1"
       \ contains=@Spell
 syn region  pythonRawString matchgroup=pythonTripleQuotes
-      \ start=+[uU]\=[rR]\z('''\|"""\)+ end="\z1" keepend
+      \ start=+\%([bB][rR]\=\|[rR][bB]\=\)\z('''\|"""\)+
+      \ end="\z1" keepend
       \ contains=pythonSpaceError,pythonDoctest,@Spell
+syn region  pythonFString matchgroup=pythonQuotes
+      \ start=+[fF]\z(['"]\)+
+      \ end="$\|\z1" skip="\\\\\|\\\z1"
+      \ contains=pythonEscape,pythonFStringEscapedBrace,pythonFStringValue,@Spell
+syn region  pythonFString matchgroup=pythonTripleQuotes
+      \ start=+[fF]\z('''\|"""\)+
+      \ end="\z1" keepend
+      \ contains=pythonEscape,pythonSpaceError,pythonFStringEscapedBrace,pythonFStringValue,pythonDoctest,@Spell
+syn region  pythonRawFString matchgroup=pythonQuotes
+      \ start=+\%([fF][rR]\|[rR][fF]\)\z(['"]\)+
+      \ end="$\|\z1" skip="\\\\\|\\\r\=$\|\\\z1"
+      \ contains=pythonFStringEscapedBrace,pythonFStringValue,@Spell
+syn region  pythonRawFString matchgroup=pythonTripleQuotes
+      \ start=+\%([fF][rR]\|[rR][fF]\)\z('''\|"""\)+
+      \ end="\z1" keepend
+      \ contains=pythonSpaceError,pythonFStringEscapedBrace,pythonFStringValue,pythonDoctest,@Spell
 
 syn match   pythonEscape	+\\[abfnrtv'"\\]+ contained
 syn match   pythonEscape	"\\\o\{1,3}" contained
@@ -161,7 +179,11 @@
 syn match   pythonEscape	"\%(\\u\x\{4}\|\\U\x\{8}\)" contained
 " Python allows case-insensitive Unicode IDs: http://www.unicode.org/charts/
 syn match   pythonEscape	"\\N{\a\+\%(\s\a\+\)*}" contained
-syn match   pythonEscape	"\\$"
+syn match   pythonEscape	"\\\r\=$"
+
+syn region  pythonFStringValue matchgroup=pythonFStringBrace
+      \ start=+{+ end=+=\=\%(![rsa]\)\=\%(:[^{}]*\%({[^{}]*\%({[^{}]*}[^{}]*\)*}[^{}]*\)*\)\=}+ contains=TOP contained excludenl
+syn match   pythonFStringEscapedBrace	"{{\|}}" contained
 
 " It is very important to understand all details before changing the
 " regular expressions below or their order.
@@ -312,9 +334,14 @@
 hi def link pythonTodo			Todo
 hi def link pythonString		String
 hi def link pythonRawString		String
+hi def link pythonFString		String
+hi def link pythonRawFString		String
 hi def link pythonQuotes		String
 hi def link pythonTripleQuotes		pythonQuotes
 hi def link pythonEscape		Special
+hi def link pythonFStringEscapedBrace	Special
+hi def link pythonFStringValue		Normal
+hi def link pythonFStringBrace		Include
 if !exists("python_no_number_highlight")
   hi def link pythonNumber		Number
 endif
@A4-Tacks
Copy link
Author

f"result: {value:{width}.{precision}}" seems to be unable to be handled well

@A4-Tacks
Copy link
Author

f"result: {value:{width}.{precision}}" seems to be unable to be handled well

Modified, hardcoded several layers, maybe enough

@dkearns
Copy link
Contributor

dkearns commented Feb 17, 2024

Thanks. Could you please create a PR?

f"result: {value:{width}.{precision}}" seems to be unable to be handled well

You can add the pythonFStringValue group to its own contains list to simplify matching these nested replacement fields. Then the end pattern can probably be reduced to a closing brace.

Ping @zvezdan

@A4-Tacks
Copy link
Author

Thanks. Could you please create a PR?

f"result: {value:{width}.{precision}}" seems to be unable to be handled well

You can add the pythonFStringValue group to its own contains list to simplify matching these nested replacement fields. Then the end pattern can probably be reduced to a closing brace.

Ping @zvezdan

I cannot add it to its own contains list, it should only take effect on the right side of the colon, and the expression needs to be highlighted on the left side

@A4-Tacks A4-Tacks linked a pull request Feb 17, 2024 that will close this issue
@zvezdan
Copy link

zvezdan commented Feb 19, 2024

have contacted the maintainer (...) But there seems to be no response

@A4-Tacks I found four emails from you in the spam quarantine of my email provider because they were ranked as spam at high-level of probability, due to the following facts:

  • several of them sent successively immediately one after another -- raises suspicion
  • they had an attachment encoded in base64 with rarely used extension (.patch) and type text/x-diff -- could have sent them as plain text instead
  • sender's name A4-Tacks sounds like a word Attacks -- why this choice of name/username/nickname?
  • sender's email address username part is a seemingly random selection of letters and numbers -- why the need to hide the identity?
  • the sender's domain 163.com was suspected as a possible source of spam

Each of these issues individually perhaps wouldn't be a problem, but taken together they resulted in your email messages being quarantined by my provider. I review quarantined messages once a week and I see them now. I allowed them to be delivered and carefully opened them (because I didn't trust them either) as raw messages, copied the base64 encoded content of the attachment into a file and then decoded it into an output file. It is indeed the patch similar to the one you have here.

As you can see, I did not ignore your messages, but, rather, you caused the issue by your choice to send the messages as shown above.

Lessons learned:

  • My identity is not hidden and if you're sending me a personal message you should not hide your identity either. Our communication remains between us only. It also helps with giving you credit later.
  • If you're sending patches, they're text and you can send them as plain text, to avoid spam filters ranking them.
  • If the message with attachment didn't receive response, re-sending it repeatedly is not going to help. Assume the spam filter got activated and send a simple message to start the communication. Once we exchange a few messages, the spam filter will also act differently.
  • Creating a GitHub issue or PR and tagging me on it as @dkearns did here is perhaps the simplest way and avoids all email spam filter issues. 🙂

@zvezdan
Copy link

zvezdan commented Feb 19, 2024

Ping @zvezdan

Thank you both @dkearns and @A4-Tacks!

I'm currently testing this change, but also a few other options that I worked on and some other people sent. Some of the options are whether the bytes should be treated separately from strings, allowing for separate highlight coloring for people who prefer that, etc. This patch/issue is not making that distinction. Also, getting the contains things right can be tricky. I have an implementation of similar handling already for docstrings and my personal reStructuredText syntax highlighting setup and I'm trying to get the best combination of options and functionality.

In the meantime, I'm inclined to send a simpler PR that would just support f-strings correctly without trying to do any interpretation inside them. That would resolve some other already open issues, as well as partially covering this one. Then, I am testing a setting similar to python_no_doctest_highlight that would allow switching ON/OFF the highlighting and interpretation of the content of f-strings. I believe that such a change would add more flexibility for users with different preferences.

In any case, I am working on this change and will incorporate all valid ideas including the ones from this issue and the corresponding patch. Thank you again and I'll keep you posted about the results of my testing soon.

@dkearns
Copy link
Contributor

dkearns commented Feb 19, 2024

I had a quick attempt at this on the back of @A4-Tacks' PR. This patch allows for replacement fields in the format spec and avoids premature termination of the expression by spurious matches for the conversion and format spec delimiters.

IIUC, the following are correctly matched:

# Lib/fractions.py
digits = f"{significand:0{point_pos + 1}d}"
# Lib/test/test_fstring.py
self.assertEqual(f"sadsd {1 + 1 =  :{1 + 1:1d}f}", "sadsd 1 + 1 =  2.000000")
# example from PEP - 701
f'Useless use of lambdas: { (lambda x: x*2) }'

@zvezdan and @A4-Tacks, this is just intended as a proof-of-concept but if you think this is worth pursuing I can make a PR for consideration as a possible alternative to #14048.

Edit: patch removed in favour of #14057

@A4-Tacks
Copy link
Author

I had a quick attempt at this on the back of @A4-Tacks' PR. This patch allows for replacement fields in the format spec and avoids premature termination of the expression by spurious matches for the conversion and format spec delimiters.

IIUC, the following are correctly matched:

# Lib/fractions.py
digits = f"{significand:0{point_pos + 1}d}"
# Lib/test/test_fstring.py
self.assertEqual(f"sadsd {1 + 1 =  :{1 + 1:1d}f}", "sadsd 1 + 1 =  2.000000")
# example from PEP - 701
f'Useless use of lambdas: { (lambda x: x*2) }'

Why is the def abnormal after this? Did I apply the damaged patch?

image

# Lib/fractions.py
digits = f"{significand:0{point_pos + 1}d}"
# Lib/test/test_fstring.py
self.assertEqual(f"sadsd {1 + 1 =  :{1 + 1:1d}f}", "sadsd 1 + 1 =  2.000000")
# example from PEP - 701
f'Useless use of lambdas: { (lambda x: x*2) }'

def foo() -> None:
    pass

@dkearns
Copy link
Contributor

dkearns commented Feb 19, 2024

Why is the def abnormal after this? Did I apply the damaged patch?

Sorry, I only tested against plain f-strings and overlooked the use of contains=ALLBUT,... in a few places throughout the syntax file which includes all contained groups.

pythonAttribute is using ALLBUT so pythonFStringExpression is being included there and matching from assertEqual onwards. You can probably get away with setting python_no_highlight_builtin to disable that for demonstration purposes but it would be better just to test against plain f-strings for now and I'll look at integrating it properly with the existing use of ALLBUT.

@zvezdan, this use of ALLBUT is generally a problem. E.g., self.TODO matches with pythonTodo which is obviously supposed to be contained to comments.

Thanks for testing it out.

dkearns added a commit to dkearns/vim that referenced this issue Feb 19, 2024
dkearns added a commit to dkearns/vim that referenced this issue Feb 19, 2024
@zvezdan
Copy link

zvezdan commented Feb 20, 2024

@dkearns The places where ALLBUT is used were added because they were needed to correctly display syntax in doctests, matrix multiplication, and when an object attribute name matches a builtin name. Any of these ALLBUT lists can and should be extended if needed when adding new features, such as f-strings. Of course, we can also have a separate conversation about the use ALLBUT in general and possible better solutions for the existing usage.

However, the PR associated with this issue is trying to do two things: add f-string support and change a definition of strings to include bytes. I prefer separate changes for separate purpose that can be also separately reverted if needed. Adding a full support for f-strings with all the correct contains lists is tricky by itself and, as I mentioned earlier, might be best done iteratively in separate changes. First a simple support for f-strings to be correctly highlighted as strings. Then more complex contains features.

There are other contributions and options that I'm looking into. Some of them prefer bytes to be a separate highlight category. People also prefer having an option to switch features ON/OFF. I doubt that any of my own or contributed changes, including this PR, will go in directly as is into the syntax file. A combination of them, that I'm working on, sent as smaller separate commits that add features and functionality with increasing usefulness and with switches for their use will be probably the result of this work.

Give me a couple of days to prepare the changes please. Then we can start closing the outstanding issues and PRs as we commit them.

Again, I truly appreciate all the contributions and ideas for improvements from all of you!

dkearns added a commit to dkearns/vim that referenced this issue Feb 20, 2024
dkearns added a commit to dkearns/vim that referenced this issue Feb 20, 2024
dkearns added a commit to dkearns/vim that referenced this issue Feb 20, 2024
dkearns added a commit to dkearns/vim that referenced this issue Feb 20, 2024
@dkearns
Copy link
Contributor

dkearns commented Feb 20, 2024

Sure, I just posted my PR for discussion purposes. Feel free to take as much or as little of it as you like.

@dkearns dkearns added the runtime label Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants