Is there a way to enable support for Python Cells in the tagbar? #759

Open
gerazov opened this issue Apr 19, 2021 · 9 comments


@gerazov

gerazov commented Apr 19, 2021

Cells are useful for structuring code, and navigating through them with the tagbar would be very useful. They can be defined in several ways, e.g. with # %%:

import numpy as np
from matplotlib import pyplot as plt
from scipy.io import wavfile
import os

# %% generate sound
f = 12000
fs = 44100
t = np.arange(0, 1, 1/fs)
sound = np.sin(2*np.pi*f * t)

# %% plot sound
plt.plot(t, sound)

# %% play sound
wavfile.write('sound.wav', fs, np.int16(sound * 2**15))
os.system('play sound.wav')

Atm this code only shows variables in the tagbar.

[Screenshot: Tagbar window listing only the variables of the sample code]

There are some plugins that give cell support like vim-ipython-support that could be useful?

@alerque
Member

alerque commented Apr 19, 2021

Ad hoc ways of sectioning code that aren't part of the language itself won't be easy to handle. Not impossible, but not easy.

If there is a standardized way of doing this support could be added to Universal Ctags upstream and then we could easily add support for it.

Keep in mind this plugin doesn't parse the actual file content at all, it just parses the output of a tag provider. If you provide tags, you can set up Tagbar to give you an interface to them. But are there any tag providers out there that produce anything for "cells"?
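To make the "parses the output of a tag provider" point concrete, here is a minimal Python sketch of splitting one extended-format ctags line into its fields. It is only an illustration, not Tagbar's actual parser (which is written in Vimscript); the function name parse_tag_line and the dict layout are made up for this example, and it assumes the search pattern itself contains no tab characters.

```python
def parse_tag_line(line):
    """Split one extended-format ctags line into name, file, pattern, kind and fields."""
    # Name, file and search pattern come first; extension fields follow the ;" marker.
    head, _, ext = line.rstrip("\n").partition(';"\t')
    name, path, pattern = head.split("\t", 2)
    tag = {"name": name, "file": path, "pattern": pattern, "kind": None, "fields": {}}
    for field in ext.split("\t"):
        if not field:
            continue
        key, sep, value = field.partition(":")
        if sep:
            tag["fields"][key] = value   # e.g. line:12, class:some_class, end:13
        else:
            tag["kind"] = field          # a bare field is the one-letter tag kind
    return tag


sample = ('some_class_function\ttest.py\t/^    def some_class_function(another_arg):$/;"'
          '\tm\tline:12\tclass:some_class\taccess:public\tsignature:(another_arg)\tend:13')
tag = parse_tag_line(sample)
print(tag["name"], tag["kind"], tag["fields"]["line"], tag["fields"]["class"])
```

Anything a tag provider does not emit (such as a "cell" kind) simply never shows up in this data, which is why the support has to start on the ctags side.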

@gerazov
Author

gerazov commented Apr 27, 2021

Ok posted there - let's see what they say 😉

@alerque
Member

alerque commented May 5, 2021

Now that the upstream feature seems to be merged (see issue and pr), what needs to happen to Tagbar's tag maps to support this?

@raven42
Collaborator

raven42 commented May 5, 2021

From the issue and pr it looks like this is a variant on the python language? Or is this the python language itself with the cell definition used in comments to logically break up the code?

Standard Python

If this is straight up python, we might have a problem. The c tag kind is already reserved for the class kind for python, so changing the definition for that will be problematic at best. If this is the case, it might be best to update the ctags code to use a different tag kind identifier like a capital C instead so it can be unique.... this might be a good idea regardless. If this is changed to use a unique identifier for the tag kind, then the existing python definition could be updated like this possibly:

diff --git a/autoload/tagbar/types/uctags.vim b/autoload/tagbar/types/uctags.vim
index c1715ba..25abf73 100644
--- a/autoload/tagbar/types/uctags.vim
+++ b/autoload/tagbar/types/uctags.vim
@@ -814,15 +814,18 @@ function! tagbar#types#uctags#init(supported_types) abort
         \ {'short' : 'c', 'long' : 'classes',   'fold' : 0, 'stl' : 1},
         \ {'short' : 'f', 'long' : 'functions', 'fold' : 0, 'stl' : 1},
         \ {'short' : 'm', 'long' : 'members',   'fold' : 0, 'stl' : 1},
-        \ {'short' : 'v', 'long' : 'variables', 'fold' : 0, 'stl' : 0}
+        \ {'short' : 'v', 'long' : 'variables', 'fold' : 0, 'stl' : 0},
+        \ {'short' : 'C', 'long' : 'cell',      'fold' : 0, 'stl' : 1},
     \ ]
     let type_python.sro        = '.'
     let type_python.kind2scope = {
+        \ 'C' : 'cell',
         \ 'c' : 'class',
         \ 'f' : 'function',
         \ 'm' : 'function'
     \ }
     let type_python.scope2kind = {
+        \ 'cell'     : 'C',
         \ 'class'    : 'c',
         \ 'function' : 'f'
     \ }

New Language

If this is a new language which is based on python, then we would probably need to define a new language type and change the c tag kind for the new language to cell instead of class. Something like this might work:

    " IPythonCell {{{1
    let type_ipythoncell = tagbar#prototypes#typeinfo#new()
    let type_ipythoncell.ctagstype = 'IPythonCell'
    let type_ipythoncell.kinds     = [
        \ {'short' : 'i', 'long' : 'modules',   'fold' : 1, 'stl' : 0},
        \ {'short' : 'c', 'long' : 'cell',   'fold' : 0, 'stl' : 1},
        \ {'short' : 'f', 'long' : 'functions', 'fold' : 0, 'stl' : 1},
        \ {'short' : 'm', 'long' : 'members',   'fold' : 0, 'stl' : 1},
        \ {'short' : 'v', 'long' : 'variables', 'fold' : 0, 'stl' : 0}
    \ ]
    let type_ipythoncell.sro        = '.'
    let type_ipythoncell.kind2scope = {
        \ 'c' : 'cell',
        \ 'f' : 'function',
        \ 'm' : 'function'
    \ }
    let type_ipythoncell.scope2kind = {
        \ 'cell'    : 'c',
        \ 'function' : 'f'
    \ }
    let type_ipythoncell.kind2scope.m = 'member'
    let type_ipythoncell.scope2kind.member = 'm'
    let types.ipythoncell = type_ipythoncell

⚠️ This might require defining the new filetype in your .vimrc as well if vim does not actually detect this as a separate filetype.

Scope

The other thing to keep in mind for all this is scope... I'm not familiar with the ctags code, but the way tagbar uses ctags for the code hierarchy like this is mainly using the scope that ctags outputs marking when a tag scope ends. The tag scope begins on the line where the tag is found, and the tag scope ends based on the language parser in ctags. This is seen in the ctags output here for a .c file marking the end field:

#include <stdio.h>

typedef struct someStruct_s {
	int		s2_var1;
	int		s2_var2;
} someStruct_t;

int function1(int c) {
	return (0);
}

int main(int arvc, char *argv[]) {
	return (0);
}

  Option: --language-force=c
  Option: --c-kinds=hdpgetsumvf
Initialize parser: C
Reading command line arguments
OPENING simple.c as C language file [new]
Initialize parser: CPreProcessor
someStruct_s	simple.c	/^typedef struct someStruct_s {$/;"	s	line:3	file:	end:6
s2_var1	simple.c	/^	int		s2_var1;$/;"	m	line:4	struct:someStruct_s	typeref:typename:int	file:	access:public	end:4
s2_var2	simple.c	/^	int		s2_var2;$/;"	m	line:5	struct:someStruct_s	typeref:typename:int	file:	access:public	end:5
someStruct_t	simple.c	/^} someStruct_t;$/;"	t	line:6	typeref:struct:someStruct_s	file:
function1	simple.c	/^int function1(int c) {$/;"	f	line:8	typeref:typename:int	signature:(int c)	end:10
main	simple.c	/^int main(int arvc, char *argv[]) {$/;"	f	line:12	typeref:typename:int	signature:(int arvc,char * argv[])	end:14

Without an end field, tagbar assumes the scope runs to the end of the file. This might pose a problem for building the tag hierarchy in the tagbar window... but testing would show if it functions as needed. Then again, it might not be a problem either... just thought it might be worth mentioning.

@gerazov
Author

gerazov commented May 5, 2021

@raven42 thanks for the detailed analysis 🤟 This is standard Python, i.e. cells are not defined by the language itself, but they are a useful feature that a number of editors support. Vim does too, through plugins such as vim-ipython-cell.

In that line, we didn't realize that using c for a cell would shadow the class tag. I will raise a new issue at ctags and hopefully the cool people over there will fix the recent PR 😎

@gerazov
Author

gerazov commented May 6, 2021

Ok, as @masatake pointed out, it seems that it can still work with the c tag because the language descriptor for cells is PythonCell, as seen here:

$ cat /tmp/python-cell.ctags 
--langdef=PythonCell{base=Python}
--kinddef-PythonCell=c,cell,cells
--regex-PythonCell=/^# %%[ \t]*(.*[^ \t])/\1/c/
# YOU CAN ADD MORE PATTERNS HERE.
$ cat /tmp/input.py 
import numpy as np
from matplotlib import pyplot as plt
from scipy.io import wavfile
import os

# %% generate sound
f = 12000
fs = 44100
t = np.arange(0, 1, 1/fs)
sound = np.sin(2*np.pi*f * t)

# %% plot sound
plt.plot(t, sound)

# %% play sound
wavfile.write('sound.wav', fs, np.int16(sound * 2**15))
os.system('play sound.wav')
$ u-ctags --sort=no --fields=+'{language}' --extras=+'{subparser}' --options=/tmp/python-cell.ctags -o - /tmp/input.py 
np	/tmp/input.py	/^import numpy as np$/;"	I	language:Python	nameref:module:numpy
plt	/tmp/input.py	/^from matplotlib import pyplot as plt$/;"	x	language:Python	nameref:unknown:pyplot
generate sound	/tmp/input.py	/^# %% generate sound$/;"	c	language:PythonCell
f	/tmp/input.py	/^f = 12000$/;"	v	language:Python
fs	/tmp/input.py	/^fs = 44100$/;"	v	language:Python
t	/tmp/input.py	/^t = np.arange(0, 1, 1\/fs)$/;"	v	language:Python
sound	/tmp/input.py	/^sound = np.sin(2*np.pi*f * t)$/;"	v	language:Python
plot sound	/tmp/input.py	/^# %% plot sound$/;"	c	language:PythonCell
play sound	/tmp/input.py	/^# %% play sound$/;"	c	language:PythonCell

I hope this doesn't complicate things on your side too much 🤞
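For anyone who wants to check what that --regex-PythonCell pattern picks up without running u-ctags, here is a rough Python equivalent. It is a sketch only: it reuses the same expression with Python's re module (which behaves the same way for this simple pattern), and the helper name find_cells is made up for the example.

```python
import re

# Same expression as in python-cell.ctags: "# %%", optional blanks, then a title
# that must end in a non-blank character.
CELL_RE = re.compile(r"^# %%[ \t]*(.*[^ \t])")


def find_cells(source):
    """Return (line number, cell title) for every line the pattern matches."""
    cells = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = CELL_RE.match(line)
        if match:
            cells.append((lineno, match.group(1)))
    return cells


source = "\n".join([
    "import os",
    "",
    "# %% generate sound",
    "f = 12000",
    "",
    "# %% plot sound   ",      # trailing blanks are not part of the title
    "print(f)",
    "    # %% indented cell",  # not matched: the pattern is anchored at column 0
    "# %%",                    # not matched: the pattern requires a title
])
print(find_cells(source))
```

Note two consequences of the pattern as written: a bare # %% with no title produces no tag, and cell markers that are indented (for example inside a class) are skipped.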

@raven42
Collaborator

raven42 commented May 6, 2021

@gerazov based on that input, it looks like that is a new type of language though. It is based on python, but it looks like that definition is creating a new language definition called PythonCell. Though I could be wrong here... it just looks like there is the --langdef=PythonCell defined in the python-cell.ctags file in the example. From my understanding of the universal ctags options, this creates a new language type. Though maybe someone else knows better...

As a more general question, for a python script that has this cell architecture in the file, can it also have classes? Based on the previous comments, it sounds like this is just normal python, so classes would likely be allowed even with cell definitions in the file.

Tagbar parsing

From the tagbar perspective, it does not use the language descriptor in the tag info. If this is (as has been previously said) just a normal python file with only the cell definitions in the comments, then tagbar would have no way to differentiate between the two. At least from my knowledge anyway. If there is somebody else who knows the structure of tagbar that knows otherwise, I'm open to educating myself more.

The ctags command as tagbar would execute it currently is defined like this:

ctags --extras=+F -f - --format=2 --excmd=pattern --fields=nksSafet --sort=no --append=no -V --language-force=python --python-kinds=icfmv <file>

So in this instance it is forcing a language type of python, and the fields it prints do not currently include the l indicator for the language descriptor. This part would be easy enough to add, but then there would also have to be some significant changes to the tag parser in tagbar to interpret the tag kind differently depending on the language descriptor.
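As a rough illustration of what "interpret the tag kind differently depending on the language descriptor" would mean, here is a small Python sketch. It is not how tagbar works today; the KIND_NAMES table and the describe function are hypothetical, the first input line is copied from the u-ctags output earlier in this thread, and the second line is made up to show the clash.

```python
# Hypothetical lookup keyed on (language, kind) instead of kind alone.
KIND_NAMES = {
    ("Python", "c"): "class",
    ("Python", "f"): "function",
    ("Python", "v"): "variable",
    ("PythonCell", "c"): "cell",
}


def describe(line):
    """Return (tag name, kind label) using the language field to disambiguate 'c'."""
    head, _, ext = line.partition(';"\t')
    name = head.split("\t", 1)[0]
    fields = ext.split("\t")
    kind = fields[0]
    language = next(
        (f.split(":", 1)[1] for f in fields if f.startswith("language:")), None
    )
    return name, KIND_NAMES.get((language, kind), "unknown")


lines = [
    'generate sound\t/tmp/input.py\t/^# %% generate sound$/;"\tc\tlanguage:PythonCell',
    'some_class\t/tmp/input.py\t/^class some_class:$/;"\tc\tlanguage:Python',  # made-up line
]
for line in lines:
    print(describe(line))
```

Both tags carry kind c; only the language field tells them apart, so a parser that ignores that field would file cells under classes.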

Cell Tree Hierarchy

I guess one of the main questions to ask is how you envision the tagbar tree structure. Do you mean to have the variables / functions / classes and such all reside in a tree under the cell definition above them? Or do you just want a listing of all the cells? For the discussion below, let's look at the following example:

import os

# %% Cell_1 // start of 'Cell 1' scope
def some_function(some_arg):
    return

# %% Cell_2 // end of 'Cell 1' scope and start of 'Cell 2' scope
class some_class:
    # %% Cell_2.1 // start of 'Cell 2.1' scope still inside 'Cell 2' scope
    def some_class_function(another_arg):
        return

    # %% Cell_2.2 // end of 'Cell 2.1' scope, start of 'Cell 2.2' scope, but still inside 'Cell 2' scope
    def another_class_function():
        return

# %% Cell_3 // end of 'Cell 2.2' scope, end of 'Cell 2' scope, start of 'Cell 3' scope
some_value = "some string"

Scoped Cell Hierarchy

Then there is the question of scope again for the cells. If we look at the example of a scope for a class, there can be multiple functions that are part of a class. But a function cannot be a member of another class. There could be the same function name in another class, but that is a unique tag. For cells though, can a cell span across multiple classes? Or can there be multiple cells within a class as seen in the example python code above? If I'm understanding the feature request correctly, would you be expecting a tagbar tree view like this?

" Press <F1>, ? for help

> Cell_1 : cell
   +some_function(some_arg) : def

> Cell_2 : cell
   >+some_class : class
      > Cell_2.1 : cell
         +some_class_function(another_arg) : def
      > Cell_2.2 : cell
         +another_class_function() : def

> Cell_3 : cell
   +some_value

Would this be valid? And what would the scope of the different cells be? I could be way off here, but normal scope in python is defined by indentation, so my assumption would be that the scope of cells follows the same idea, as indicated in the above example by the comments after the //.

Maybe I am way off here (entirely possible)? From the little I've read about the cell structure in python, it is mostly used to execute a given block of code while running in an interactive mode. Would there ever be a need to have a cell inside a function? Most of the examples I've seen execute in global space rather than inside a function or class, so maybe this is a more specific use-case. Perhaps the scope is better defined like this: a given cell starts at the comment containing the cell definition and runs to the next cell definition, regardless of indentation. If this is the case, it would pose another problem for the tagbar hierarchy. If we consider the above example, how would we define the tree structure with the cells in there? Cell_1 is defined in global scope, as is Cell_2, so that would be easy enough... but Cell_2.1 and Cell_2.2 are defined inside a class. That could make it tricky to determine the hierarchy for a nested tree view.
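To see why the "cell runs until the next cell marker" rule is awkward for nesting, here is a small Python sketch. It is only an illustration under assumptions: the helper names are made up, and the line numbers are the ones ctags reports for test.py in this thread (cell markers assumed at lines 4, 9, 11, 15 and 20, some_class spanning lines 10 to 17, file ending at line 21).

```python
def cell_ranges(cell_lines, last_line):
    """Give each cell a (start, end) range that ends just before the next cell."""
    starts = sorted(cell_lines)
    ranges = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 1 if i + 1 < len(starts) else last_line
        ranges.append((start, end))
    return ranges


def cells_touched(ranges, first, last):
    """Return the cell ranges that overlap the line span first..last."""
    return [r for r in ranges if r[0] <= last and first <= r[1]]


ranges = cell_ranges([4, 9, 11, 15, 20], last_line=21)
print(ranges)                          # one (start, end) pair per cell
print(cells_touched(ranges, 10, 17))   # cells overlapping some_class (lines 10-17)
```

Under this rule some_class overlaps three different cells, so it is neither cleanly inside one cell nor a clean parent of the others.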

Tagbar uses the scope information from the ctags output to determine the parent. If we look at the example above, the ctags output that tagbar parses would be as follows:

  Option: --language-force=python
  Option: --python-kinds=icfmv
Initialize parser: Python
Reading command line arguments
OPENING test.py as Python language file [new]
some_function	test.py	/^def some_function(some_arg):$/;"	f	line:5	access:public	signature:(some_arg)	end:6
some_class	test.py	/^class some_class:$/;"	c	line:10	access:public	end:17
some_class_function	test.py	/^    def some_class_function(another_arg):$/;"	m	line:12	class:some_class	access:public	signature:(another_arg)	end:13
another_class_function	test.py	/^    def another_class_function():$/;"	m	line:16	class:some_class	access:public	signature:()	end:17
some_value	test.py	/^some_value = "some string"$/;"	v	line:21	access:public

So in this case, if we look at the tag for some_class_function, it has a parent definition of class:some_class. This lets tagbar know that it should build the hierarchy like this:

" Press <F1>, ? for help

>+some_class : class
   +another_class_function() : def
   +some_class_function(another_arg) : def

 +some_function(some_arg) : def

> variables (1)
   +some_value

If tagbar doesn't have the parent identifier, it doesn't know how to build the hierarchy. So in the case of the cell definitions, tagbar would need that info with something like this (assuming a capital C for the cell tag kind):

  Option: --language-force=python
  Option: --python-kinds=icfmv
Initialize parser: Python
Reading command line arguments
OPENING test.py as Python language file [new]
Cell_1	test.py	/^# %% Cell_1$/;"	C	line:4	access:public	
some_function	test.py	/^def some_function(some_arg):$/;"	f	line:5	cell:Cell_1	access:public	signature:(some_arg)	end:6
Cell_2	test.py	/^# %% Cell_2$/;"	C	line:9	access:public
some_class	test.py	/^class some_class:$/;"	c	line:10	cell:Cell_2	access:public	end:17
Cell_2.1	test.py	/^    # %% Cell_2.1$/;"	C	line:11	class:some_class	access:public
some_class_function	test.py	/^    def some_class_function(another_arg):$/;"	m	line:12	cell:Cell_2.1	access:public	signature:(another_arg)	end:13
Cell_2.2	test.py	/^    # %% Cell_2.2$/;"	C	line:15	class:some_class	access:public
another_class_function	test.py	/^    def another_class_function():$/;"	m	line:16	cell:Cell_2.2	access:public	signature:()	end:17
Cell_3	test.py	/^# %% Cell_3$/;"	C	line:20	access:public
some_value	test.py	/^some_value = "some string"$/;"	v	line:21	cell:Cell_3	access:public

Note all the changes in the --fields=s output for the tag scope. For example, the some_class_function tag now has a tag scope of cell:Cell_2.1 instead of class:some_class. And the Cell_2.1 and Cell_2.2 tags are defined with a scope of class:some_class. This would be needed to provide a proper tree hierarchy in tagbar.
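A minimal Python sketch of how a hierarchy can be derived purely from those scope fields is below. It is not tagbar's implementation: the names build_tree and render are made up, the tags are the (name, kind, scope) values from the hypothetical output above, and parents are looked up by bare name, which only works here because every name in the example is unique.

```python
from collections import defaultdict

# (name, kind, scope) from the hypothetical ctags output; scope is
# "scopekind:parentname", or None for tags without a parent.
TAGS = [
    ("Cell_1", "C", None),
    ("some_function", "f", "cell:Cell_1"),
    ("Cell_2", "C", None),
    ("some_class", "c", "cell:Cell_2"),
    ("Cell_2.1", "C", "class:some_class"),
    ("some_class_function", "m", "cell:Cell_2.1"),
    ("Cell_2.2", "C", "class:some_class"),
    ("another_class_function", "m", "cell:Cell_2.2"),
    ("Cell_3", "C", None),
    ("some_value", "v", "cell:Cell_3"),
]


def build_tree(tags):
    """Map each parent name (None for the root) to its children, in file order."""
    children = defaultdict(list)
    for name, _kind, scope in tags:
        parent = scope.split(":", 1)[1] if scope else None
        children[parent].append(name)
    return children


def render(children, parent=None, depth=0):
    """Return the tree as indented lines, depth-first."""
    lines = []
    for name in children.get(parent, []):
        lines.append("  " * depth + name)
        lines.extend(render(children, name, depth + 1))
    return lines


print("\n".join(render(build_tree(TAGS))))
```

If the scope column is set to None for every tag, the same code prints a flat list, which is the situation tagbar is in when ctags provides no scope.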

Unscoped Cell Hierarchy

Without the tag scope output from ctags, tagbar cannot properly identify the tag hierarchy. So all existing tag definitions would need to have the proper tag scope setup so they point to the new cell tag. Without that tag scope, they would fall into a generic category tree like this:


>+some_class : class
   +another_class_function() : def
   +some_class_function(another_arg) : def

 +some_function(some_arg) : def

> cells (5)
   +Cell_1
   +Cell_2
   +Cell_2.1
   +Cell_2.2
   +Cell_3

> variables (1)
   +some_value

So in this case, the tree hierarchy would be a little different given the same code example above. All cells would be in their own root level node, and you wouldn't be able to collapse the tree of everything inside a cell. You could only use it to quickly see the defined cells and jump to those cell lines.

Final thoughts

Anyway, sorry to make this such a long write up... I'm just trying to understand how best to make this work, including some of the edge cases like this. If there was a completely new language type, this might be easier... but I don't want to cause something to break for others. Maybe this could be simplified from the tagbar perspective if it was enabled via some configuration flag, so it would only parse the python cells if g:tagbar_python_cell was set to 1 or something. Though the desired tree hierarchy output would need to be determined so we know if the tag scope is needed or not.

@gerazov
Author

gerazov commented May 6, 2021

@raven42 that's an excellent write up highlighting the many problems and outlining possible solutions! 🤟

To iterate on a few things. Cells are mostly useful in scripts for executing/debugging parts of the code. You rarely use them inside function or class definitions, but it's not excluded as it could help debugging chunks of the code by allowing you to send them easily to the IPython console.

They can either be mixed in between other tags based on location, or placed in a tree of their own. I personally would prefer the former, as I use Tagbar as a sort of outline of my file, making it easy to navigate and to get a sense of the current cursor position. On the other hand, Tagbar's general approach appears to be the latter, I guess. In the end both work, so whichever is more convenient to implement. Alternatively it could be specified via a flag, e.g. g:tagbar_order_tags_based_on_line_number 🙂

This is how Spyder handles your sample code (not that it should be the gold standard, but just as an example):

[Screenshot: Spyder outline of the sample code]

In Spyder, cell hierarchy is handled via the number of % at the start of the tag, but it is not an important feature, and it would be hard to handle as ctags will not count the % in the cell headers, I guess. So I would not worry about it at this point.

Having cell tags controlled via a flag is a great idea 👍 I wouldn't impose them as the standard feature of the language.

@raven42
Collaborator

raven42 commented May 7, 2021

Thanks for the info @gerazov. Based on this, no matter what, the ctags output would need to change to provide a different tag kind for the cell definitions so we aren't overloading the c/class tag kind. Since this is standard python and cells can be interleaved with class definitions, we can't really have the same kind letter mean two different things within one language. So an option to add a config flag would not be a good approach. This support would have to come from ctags with a proper tag kind.

If that is done, then we should be able to implement the Unscoped Cell Hierarchy without too much issue.

If you would like to see the Scoped Cell Hierarchy, then we would need the ctags code to be updated to provide the proper scope information. If ctags is changing to do this, then the ctags code could also be updated to include the cell hierarchy based on the number of % characters at the start of the tag, as you indicated. Short version: if ctags can provide the scope, then tagbar can build the hierarchy.

So either way there is at least some change needed on the ctags side. At a minimum, the tag kind needs to change. Optionally scope can be provided.

Please work with the ctags team to get this update, and once that is supported, I'd be happy to help update the tagbar code accordingly. Feel free to reference this discussion in the ctags issue/PR if needed.
