Adding support for Python code cells #2978

Closed
gerazov opened this issue Apr 27, 2021 · 12 comments · Fixed by #3000
Comments

@gerazov

gerazov commented Apr 27, 2021

Cells are useful for structuring code, and being able to navigate between them in Vim using Tagbar would be very useful. Tagbar and other outline plugins depend on ctags for the tags they show. I was wondering whether there is a way to enable cell tagging at the moment and, if not, whether it is planned for a future release.

I originally posted this issue on Tagbar; here it is for reference: preservim/tagbar#759

Cells can be defined in several ways, e.g. # %%:

import numpy as np
from matplotlib import pyplot as plt
from scipy.io import wavfile
import os

# %% generate sound
f = 12000
fs = 44100
t = np.arange(0, 1, 1/fs)
sound = np.sin(2*np.pi*f * t)

# %% plot sound
plt.plot(t, sound)

# %% play sound
wavfile.write('sound.wav', fs, np.int16(sound * 2**15))
os.system('play sound.wav')

At the moment, only the variables in this code show up in Tagbar.


@masatake
Member

Is there a definition of a cell?
@@#^^^:: , is this a cell?
How about ::->::?
Is this a cell?

Another question: is "cell" Python-specific, or language-neutral?

@gerazov
Author

gerazov commented Apr 27, 2021

There are several ways, e.g. vim-ipython-cell defines them as # %%, #%%, # <codecell>, or ##.

Spyder IDE supports #%%, # %% (linters demand a space after the #) and # <codecell> used in IPython notebooks. The IDE also supports cell hierarchy via the number of % signs i.e. #%%, #%%%, #%%%% etc. and this is reflected in the outline.

They are used in other languages too, e.g. Matlab uses %%.

Julia I guess uses ## as a cell delimiter, but I'm not sure if it's only limited to VSCode.
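
To make the marker variants above concrete, here is a minimal Python sketch of how an outline tool might read them. It assumes a marker is `#`, an optional space, then two or more `%` signs, with extra `%` signs meaning deeper nesting as described for Spyder. The names `CELL_MARKER` and `outline` are hypothetical and come from no existing tool; `# <codecell>` and `##` are deliberately left out.

```python
import re

# Assumption: "#", an optional space, then two or more "%" signs.
# Each extra "%" is read as one more nesting level.
CELL_MARKER = re.compile(r"^\s*# ?(%{2,})\s*(.*?)\s*$")


def outline(source):
    """Return (depth, title, line_number) for every cell marker in source."""
    cells = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = CELL_MARKER.match(line)
        if match:
            depth = len(match.group(1)) - 1  # "%%" -> 1, "%%%" -> 2
            cells.append((depth, match.group(2), number))
    return cells


example = """\
# %% load data
x = 1
#%%% clean data
y = 2
# <codecell>
z = 3
"""

for depth, title, number in outline(example):
    # Indent nested cells by two spaces per extra level.
    print("  " * (depth - 1) + f"{title} (line {number})")
```

The depth is only a guess at how a hierarchy could be exposed; a real outline would also need to decide where each cell ends.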

@alerque

alerque commented Apr 27, 2021

Not to clog up the works here, but after being confused about the Python-specific nomenclature, it finally dawned on me that this is a pretty old concept rebranded for a new language.

I think this is completely orthogonal to the language being used, with the caveat that the markers are usually buried behind the language's native comment syntax. Some editors have first-class, editor-specific support for this. Vim, for example, has marker folds, opened with {{{ foo and closed with }}}. Of course these are expected in comments, so for Python it would be # {{{ foo to # }}}. I believe some other editors have similar markers.

Additionally, many languages have ad hoc usages that are common to see but not standardized. For example, in HTML you might find <!-- BEGIN: foo --> and <!-- END: foo -->, but with no editor or tooling standards the only consistency that exists is inside projects.

I suggest this needs to be some kind of flag that can be enabled for any language based on its comment syntax, possibly with a second flag to specify the marker syntax. The ctags execution could then be tailored per project or language, while the output would be standardized, making it easy for tooling to build on.
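
As a rough illustration of that suggestion, the Python sketch below builds the marker pattern from a per-language comment leader plus a configurable marker. Everything here is hypothetical: the `COMMENT_LEADER` table, the `find_sections` function, and the default `{{{` marker are assumptions made for the example, not an existing or planned ctags option, and only line-comment languages are covered.

```python
import re

# Hypothetical table of line-comment leaders; not something ctags provides.
COMMENT_LEADER = {"python": "#", "c": "//", "lua": "--"}


def find_sections(source, language, marker="{{{"):
    """Return (title, line_number) for each comment line that opens a section."""
    leader = re.escape(COMMENT_LEADER[language])
    # Comment leader, optional blanks, the marker, then a title ending in non-blank.
    pattern = re.compile(rf"^\s*{leader}\s*{re.escape(marker)}\s*(.*\S)")
    sections = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            sections.append((match.group(1), number))
    return sections


print(find_sections("# {{{ setup\nx = 1\n# }}}\n", "python"))
print(find_sections("// {{{ helpers\nint x;\n// }}}\n", "c"))
```

The same function handles both inputs because only the comment leader changes between languages, which is the point being made above.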

@masatake
Member

$ cat /tmp/python-cell.ctags 
--langdef=PythonCell{base=Python}
--kinddef-PythonCell=c,cell,cells
--regex-PythonCell=/^# %%[ \t]*(.*[^ \t])/\1/c/
# YOU CAN ADD MORE PATTERNS HERE.
$ cat /tmp/input.py 
import numpy as np
from matplotlib import pyplot as plt
from scipy.io import wavfile
import os

# %% generate sound
f = 12000
fs = 44100
t = np.arange(0, 1, 1/fs)
sound = np.sin(2*np.pi*f * t)

# %% plot sound
plt.plot(t, sound)

# %% play sound
wavfile.write('sound.wav', fs, np.int16(sound * 2**15))
os.system('play sound.wav')
$ u-ctags --sort=no --fields=+'{language}' --extras=+'{subparser}' --options=/tmp/python-cell.ctags -o - /tmp/input.py 
np	/tmp/input.py	/^import numpy as np$/;"	I	language:Python	nameref:module:numpy
plt	/tmp/input.py	/^from matplotlib import pyplot as plt$/;"	x	language:Python	nameref:unknown:pyplot
generate sound	/tmp/input.py	/^# %% generate sound$/;"	c	language:PythonCell
f	/tmp/input.py	/^f = 12000$/;"	v	language:Python
fs	/tmp/input.py	/^fs = 44100$/;"	v	language:Python
t	/tmp/input.py	/^t = np.arange(0, 1, 1\/fs)$/;"	v	language:Python
sound	/tmp/input.py	/^sound = np.sin(2*np.pi*f * t)$/;"	v	language:Python
plot sound	/tmp/input.py	/^# %% plot sound$/;"	c	language:PythonCell
play sound	/tmp/input.py	/^# %% play sound$/;"	c	language:PythonCell

This is for Python.
However, as @alerque wrote, this approach supports only one language.
Supporting "language objects" in comments in a language-neutral way needs more thought.

Not to clog up the works here, but after being confused about the Python-specific nomenclature, it finally dawned on me that this is a pretty old concept rebranded for a new language.

Yes. What I would like to know is the name of the concept.
What do we call it?

@gerazov
Author

gerazov commented Apr 28, 2021

@masatake awesome! 😎

Yeah, as @alerque notes, it is rather editor-specific. But there is some language-specific convergence that we can use. For example, for Python a perfectly legitimate regex would be '\s*# %%.*\|\s*#%%.*\|\s*# <codecell>.*\|\s*##.*' to allow for indentation, as used in vim-ipython-cell for highlighting.
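
For anyone who wants to try that pattern outside Vim, here is a small Python translation of its four alternatives. It is a sketch under one assumption: that the pattern should be anchored at the start of the line, which `re.match` does here. `CELL_LINE` and `is_cell_line` are names made up for the example.

```python
import re

# The four alternatives of the Vim pattern, written as one Python regex.
# Assumption: the marker must start the line, apart from leading whitespace.
CELL_LINE = re.compile(r"\s*(?:# %%|#%%|# <codecell>|##).*")


def is_cell_line(line):
    """Return True if the line begins with one of the four cell markers."""
    return CELL_LINE.match(line) is not None


for sample in ["    # %% plot sound", "#%% play", "# <codecell>", "## notes", "# plain comment"]:
    print(repr(sample), is_cell_line(sample))
```

Note that the `##` alternative also matches ordinary comments that happen to start with two hashes, such as banner lines, so it may be too permissive for tagging.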

As for the name +1 for code cells 🙂

@masatake
Member

If we only have to think about Python, the PythonCell parser I showed in the comment above may be enough to fix your issue partially. I can include the PythonCell parser in u-ctags as a built-in parser after adding more patterns for cells.

I don't know Vim well. However, I guess my parser fixes your issue only partially.
I guess an issue remains in how your client tool uses the output.

$ cat /tmp/pytho-cell.ctags 
--langdef=PythonCell{base=Python}
--kinddef-PythonCell=c,cell,cells
--regex-PythonCell=/^# %%[ \t]*(.*[^ \t])/\1/c/

$ cat /tmp/input.py
# %% CELL FOR THE CLASS
class Foo:
    pass

$ u-ctags --options=/tmp/pytho-cell.ctags  -o - /tmp/input.py 
CELL FOR THE CLASS	/tmp/input.py	/^# %% CELL FOR THE CLASS$/;"	c
Foo	/tmp/input.py	/^class Foo:$/;"	c

Look at the c at the end of each line of the u-ctags output.
It represents a kind.
For Python input, c means the class kind.
A Vim plugin may know this.
For CELL FOR THE CLASS, however, c means the cell kind.

I guess your tool doesn't know that c has two meanings.
As a result, Vim and its plugins may not work as you expect.

If you pass more options when running u-ctags, you can give Vim hints for distinguishing the kinds:

$ u-ctags --options=/tmp/pytho-cell.ctags --fields='{language}K' -o - /tmp/input.py 
CELL FOR THE CLASS	/tmp/input.py	/^# %% CELL FOR THE CLASS$/;"	cell	language:PythonCell
Foo	/tmp/input.py	/^class Foo:$/;"	class	language:Python
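
A client tool could use those extra fields roughly as in the Python sketch below. It assumes the tab-separated layout shown in the output above (name, file, pattern, then extension fields after `;"`), and that a bare extension field is the kind. `parse_tag_line` and `is_cell` are hypothetical helpers, not part of ctags or any Vim plugin.

```python
def parse_tag_line(line):
    """Split one line of ctags output into name, file, pattern and extension fields."""
    name, path, rest = line.rstrip("\n").split("\t", 2)
    # The search pattern ends at ';"'; extension fields follow, separated by tabs.
    pattern, _, extension = rest.partition(';"\t')
    fields = {}
    for item in extension.split("\t"):
        key, sep, value = item.partition(":")
        if sep:
            fields[key] = value
        elif item:
            fields["kind"] = item  # assumption: a bare field is the kind
    return {"name": name, "file": path, "pattern": pattern, **fields}


def is_cell(tag):
    """A tag is a cell only if both its kind and its language say so."""
    return tag.get("language") == "PythonCell" and tag.get("kind") in ("c", "cell")


lines = [
    'CELL FOR THE CLASS\t/tmp/input.py\t/^# %% CELL FOR THE CLASS$/;"\tcell\tlanguage:PythonCell',
    'Foo\t/tmp/input.py\t/^class Foo:$/;"\tclass\tlanguage:Python',
]
for tag in map(parse_tag_line, lines):
    print(tag["name"], "->", "cell" if is_cell(tag) else tag["kind"])
```

Checking the language together with the kind is what removes the ambiguity: a `c` tag from Python stays a class, while a `c` tag from PythonCell is a cell.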

I wonder whether Vim can make enough use of this output for your purpose.
We are discussing this issue at #2976.

As @alerque wrote, I think this issue should be solved in a language-neutral way.
I was already aware of this issue, and I had a plan to introduce a concept named preparser.
A parser works on specific input files.
Unlike a parser, a preparser works across input files.

@masatake
Member

See #2980, too.

@gerazov
Author

gerazov commented May 4, 2021

Great - thanks! 👍

I hope this solves it on the VIM side.

Could I suggest the following regex update for cells:

--regex-PythonCell=/^[ \t]*# %[%]+[ \t]*(.*[^ \t])/\1/c/

This allows for tabs/spaces before the cell tag, and multiple % signs.
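
The proposed pattern can be checked quickly in Python before it goes into an optlib file. This is only a sketch: it assumes Python's `re` module treats this particular pattern the same way the ctags regex engine does, and `cell_name` is a hypothetical helper.

```python
import re

# The pattern from the --regex-PythonCell option above, reused as a Python regex.
# Assumption: this pattern behaves the same in Python's re as in ctags.
PROPOSED = re.compile(r"^[ \t]*# %[%]+[ \t]*(.*[^ \t])")


def cell_name(line):
    """Return the captured cell name, or None if the line is not a cell marker."""
    match = PROPOSED.match(line)
    return match.group(1) if match else None


for sample in ["# %% generate sound", "\t# %%% plot sound  ", "#%% no space", "# %%"]:
    print(repr(sample), "->", cell_name(sample))
```

Two things stand out when running it: the `#%%` form without a space is not matched, and a marker with no title is skipped because the capture group needs at least one non-blank character.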

masatake added a commit to masatake/ctags that referenced this issue May 4, 2021
Close universal-ctags#2978

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member

masatake commented May 4, 2021

I'm not good at English.

Could you review 7935a2c#diff-64a31aeda11dbbe0458851a2db50a7cc14894fe93fec938ce753e2144f2df80a ?

@gerazov
Author

gerazov commented May 4, 2021

Great stuff - thanks! 🙇

masatake added a commit to masatake/ctags that referenced this issue May 5, 2021
Close universal-ctags#2978

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member

masatake commented May 5, 2021

@gerazov, thank you very much for reviewing.

masatake added a commit to masatake/ctags that referenced this issue May 5, 2021
Close universal-ctags#2978

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@gerazov
Author

gerazov commented May 5, 2021

Thank you for your fast response 🤟

masatake added a commit to masatake/ctags that referenced this issue May 9, 2021
Close universal-ctags#2978

Signed-off-by: Masatake YAMATO <yamato@redhat.com>