Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support in third-party tools/lib and how to evolve the spec #463

Open
pawamoy opened this issue Jun 27, 2023 · 5 comments
Open

Support in third-party tools/lib and how to evolve the spec #463

pawamoy opened this issue Jun 27, 2023 · 5 comments

Comments

@pawamoy
Copy link

pawamoy commented Jun 27, 2023

Hello, I'm maintaining a tool called Griffe which among other things is capable of parsing Numpydoc styled docstrings. It's used by mkdocstrings, to auto-generate HTML docs from Python source code in MkDocs sites.

With time, we discovered a few subtleties that are not specified, and a few use-cases that are not supported by the spec. Our parser therefore takes some liberties, or allows configuration, and I guess it would nice to give some feedback upstream, i.e. to you Numpydoc maintainers.

But first, I'd like to know if there's a preferred way for you to get this feedback, and if you're even OK with the possibility of evolving the spec. Are issues here OK? Should I mention someone in particular?

@rgommers
Copy link
Member

Hi @pawamoy, thanks for opening this issue! I'm quite interested in Griffe; in particular because autodoc support is the main thing keeping many projects from moving away from Sphinx.

Evolving the spec to address gaps in the spec or new use cases should be possible, assuming it's not too disruptive for existing users. Opening issues on this repo would be the way to go. Only if there's something big-picture or bc-breaking, it needs to be run by the numpy-discussion mailing list.

@pawamoy
Copy link
Author

pawamoy commented Jun 27, 2023

Hey @rgommers, thanks for the very fast response 🙂

I can then try to summarize our findings in this issue first, and if needed move each item to its own issue.

  1. The first item is the elephant in the room 😄 so will probably need discussion on the mailing list: the Numpydoc style is based on reStruturedText, and even more than that, designed only to work with Sphinx. As you may know, many users prefer Markdown over RST. It is generally not an issue though, since the only hard dependency on RST in the spec is for deprecations. However the spec still only shows RST markup, tightly coupling it to RST/Sphinx. It would be great if the spec was fully markup-agnostic (it almost is already 👍) and if a Markdown version of the guide existed.

  2. The second item is about type annotations. Nowadays devs like to annotate their parameter types and their return types in the signature directly. The spec does not always take that into account and sometimes asks them to write the types again in the docstring, notably for the Returns section (see issue linked above: Additional support for numpy returns section with no annotation, or just type annotation mkdocstrings/griffe#173). Here I think the spec can evolve in a backward compatible way (see the syntax suggested in the mentioned issue, which I think doesn't clash with the already supported syntax).

  3. Arbitrary text is allowed for types in docstrings, making it difficult (if not impossible) to parse type, default=thing correctly. Since arbitrary text is allowed, we cannot restrict the set of accepted characters for the type, to make sure to exclude the default at the end. Instead, we must check if there's a default at the end using string operations. Then, it's easy to build an example that would trick a parser based on string operations: param : Literal["thing, default=0"]. Since a default value can also hold a string containing double-quotes or brackets, the parser would find the type to be Literal["thing and the default value to be 0"]. Instead, we should match opening and closing quotes/brackets (and I know it's not a trivial task...). In Griffe, I thought about re-using Python itself, parsing the type and default as Python code (see Docstring default values for mutable arguments. mkdocstrings/mkdocstrings#577 (comment)) for Google-style docstrings, but again that would not be possible for Numpydoc since arbitrary text is allowed for the type, which would not compile as Python code. How would Numpydoc parse param : Literal["thing, default=0"]? Maybe I'm just bad at parsing things 😅

  4. It is not clear to me whether prose is allowed in between sections. See numpy parser parameter table breaks if there is an empty line between parameters mkdocstrings/griffe#167. Generally speaking, some aspects of the guide are ambiguous and could be made more clear (absolutely no judgment here, the guide is great). Anyway the guide is a guide, not a spec as I call it, so maybe such details could be documented in another location. I wonder if such a document exist by the way? Maybe the code has comments explaining all these subtleties?

The rest I forgot or is just standardization of the standards: finding a common denominator for all styles, or expanding each style to support everything supported in the others. For example Numpydoc has "See also" and "Methods" sections, while the Google-style does not. Currently "See also" and "Methods" have no special treatment in Griffe and are just parsed as admonitions, but in the future I'll probably add support for "Methods" in both Numpydoc and Google styles.

@rgommers
Copy link
Member

  1. The first item is the elephant in the room [...]. It is generally not an issue though, since the only hard dependency on RST in the spec is for deprecations.

I agree, this should not be a blocker. The deprecation directly is a simple admonition, so it has a Markdown equivalent - and we can consider changing it to a more neutral syntax or to show the equivalent Markdown (colon fence I guess?) as well.

The second item is about type annotations.

This one is a little painful, also in Sphinx. Last times I tried things were not pretty. The most important thing that was missing was reasonable support for type aliases, because showing raw type annotations as html is often super unhelpful when you go beyond the basics and start needing overloads/unions/typevars.

I'm not sure about using : or : int, if that's the suggestion (not 100% sure that it is). It seems a little hard to understand from reading the source - you kinda have to be familiar with the convention already for it to make sense. One of the key goals for numpydoc is to make the source readable, both in an editor and interactively in a terminal. So whatever is chosen here, I hope that aspect can be preserved as much as possible.

Instead, we must check if there's a default at the end using string operations.

I agree that that's unhelpful. The spec has these examples:

copy : bool, default True
copy : bool, default=True
copy : bool, default: True

In practice I don't think that is done in NumPy or SciPy much (or at all), we've kind of settled on including the default value as part of the description rather than the type. We may be able to remove support for this.

It is not clear to me whether prose is allowed in between sections

In my understanding, it isn't.

@pawamoy
Copy link
Author

pawamoy commented Jun 29, 2023

we can consider changing it to a more neutral syntax or to show the equivalent Markdown (colon fence I guess?) as well.

Hmm, that will depend on the Markdown implementation 🤔 I don't think admonitions are part of the Markdown spec (at least they're not in CommonMark)

with markdown-callouts

> DEPRECATED: **Deprecated since 1.6.0**
> `ndobj_old` will be removed in NumPy 2.0.0, it is replaced by
> `ndobj_new` because the latter works also with array subclasses.

(chevrons not strictly necessary)

with pymdown-extensions

!!! danger "Deprecated since 1.6.0"
    `ndobj_old` will be removed in NumPy 2.0.0, it is replaced by
    `ndobj_new` because the latter works also with array subclasses.

or with pymdownx blocks

/// admonition | Deprecated since 1.6.0
    type: danger

`ndobj_old` will be removed in NumPy 2.0.0, it is replaced by
`ndobj_new` because the latter works also with array subclasses.
///

They're all extensions of Python-Markdown. I can't speak for other implementations. The most readable is the markdown-callouts one, in my opinion. I think GFM has a similar, experimental feature for admonitions.


because showing raw type annotations as html is often super unhelpful when you go beyond the basics and start needing overloads/unions/typevars.

Yep, long type annotations are not pleasant to read. Not sure about Sphinx, but with MkDocs+mkdocstrings, if you declare a type alias and use it as return annotation, it will not be expanded when rendering docs. Even better, if you document this type alias (docstring below it), then the rendered return type will link to it and users will see its value (the raw type annotation, as written in the source, linking again to sub types/aliases, recursively, as long as they're documented).


I'm not sure about using : or : int, if that's the suggestion

That's the suggestion yes. I agree, it's not obvious what it means when reading the docstring. I cannot think of a better alternative that fits into the current syntax though 🤔


we've kind of settled on including the default value as part of the description rather than the type. We may be able to remove support for this.

Ha, I personally support this way of documenting the default value, unfortunately some users would like to override the default value from the signature with some parse-able metadata in the docstring, notably for mutable defaults that are set in the function body instead of the signature to avoid bugs. The signature's default value is for example rendered in a "Default" table column with None, and they'd like to replace it with [1, 1, 2], without writing that in the description of the item itself. The current default= allows that, but cannot always be parsed correctly. I'm thinking, maybe there's a way to first consider the type to be valid Python syntax, and fallback on string operations if it's not. Or just keep things the way they are now, documenting their limits. The breaking case I mentioned above must not happen a lot, if it even happens.


In my understanding, it isn't.

This could be explained early and explicitly in the guide then!


Thanks for your answers, that helps clearing my thoughts 🙂
I'll have to open the same issue on Google's side 😄

@machow
Copy link

machow commented Jun 29, 2023

Hey! I'm working on supporting generating API documentation with quarto.org (e.g. the python shiny API docs), so have a lot of time I can throw at supporting this!

I wonder if a helpful approach might be framing issues in terms of what the numpydoc parser does right now.

Returns section syntax always expects a type

One question the shiny team asked was "how do we just document the already annotated return type, without having to write some arbitrary name in the returns section?". We settled on having them just use :, but it seems like the parsing of this is a bit funky in the numpydoc implementation:

from numpydoc.docscrape import NumpyDocString

doc = """
    Returns
    -------
    some_type
    x : int
    y : 
    : float
    :

"""

NumpyDocString(doc)["Returns"]

output:

[Parameter(name='', type='some_type', desc=[]),
 Parameter(name='x', type='int', desc=[]),
 Parameter(name='', type='y', desc=[]),
 Parameter(name='', type=': float', desc=[]),
 Parameter(name='', type=':', desc=[])]

Note that the last 3 return pieces are interpreted as a type description. Unlike the parameters section, there isn't really a way to say "I am not manually specifying the type" in the returns section.

RE @rgommers point:

I'm not sure about using : or : int, if that's the suggestion (not 100% sure that it is). It seems a little hard to understand from reading the source - you kinda have to be familiar with the convention already for it to make sense. One of the key goals for numpydoc is to make the source readable, both in an editor and interactively in a terminal. So whatever is chosen here, I hope that aspect can be preserved as much as possible.

I wonder if it's helpful to reframe the question to these two:

  • how do I specify a return description without a type (or name)?
  • Is parsing a single : or : float as the return type (e.g. a type of ": float") the intended behavior?

Like with the parameters section, users in an interactive terminal will be able to see the return type in the function signature, so hopefully using something like : to ride off the parameter convention for not specifying types isn't too distracting (edit: but also very happy to explore alternatives! :).

Prose between sections

RE @rgommers point to whether prose is allowed between sections

In my understanding, it isn't.

The numpydoc parser does not allow prose between sections, but the sphinx parser does (and sphinx is the one rendering the ultimate output).

e.g.

def f(x, y, z: str) -> int:
    """Some function

    Parameters
    ----------
    x: int
        the x parameter
    z:
        cool
    y: abc
        nice


    .. note::
       I am a note


    Returns
    -----------
    x :
        Some description
    """

renders in sphinx with a note between the Parameters and Returns section, while the numpydoc parser just puts the note text as a parameter name.

Sphinx example

image

numpydoc example

from numpydoc.docscrape import NumpyDocString

doc =  """Some function

    Parameters
    ----------
    x: int
        the x parameter
    z:
        cool
    y: abc
        nice


    .. note::
       I am a note


    Returns
    -----------
    x :
        Some description
    """

list(NumpyDocString(doc).items()
[('Signature', ''),
 ('Summary', ['Some function']),
 ('Extended Summary', []),
 ('Parameters',
  [Parameter(name='x: int', type='', desc=['the x parameter']),
   Parameter(name='z:', type='', desc=['cool']),
   Parameter(name='y: abc', type='', desc=['nice']),
   Parameter(name='.. note::', type='', desc=['I am a note'])]),
 ('Returns', [Parameter(name='', type='x', desc=['Some description'])]),
 ('Yields', []),
 ('Receives', []),
 ('Raises', []),
 ('Warns', []),
 ('Other Parameters', []),
 ('Attributes', []),
 ('Methods', []),
 ('See Also', []),
 ('Notes', []),
 ('Warnings', []),
 ('References', ''),
 ('Examples', ''),
 ('index', {})]

It seems fair that this is maybe a weird edge case, and the numpydoc spec doesn't allow prose between sections!

pawamoy added a commit to mkdocstrings/griffe that referenced this issue Jan 14, 2024
…ions on blank lines

Breaking change: This change removes the docstring option
`allow_section_blank_line` for the Numpydoc parser,
and changes the parsing logic so that blank lines are now *always* allowed,
in any number, between sections. Sections are now only delimited by
section headers themselves. The result of these changes is that:
**prose is not allowed in between Numpydoc sections anymore**,
making the parser compliant with the Numpydoc style guide and
its maintainers recommendations.

PR #220: #220
Related to PR #219: #219
Numpydoc discussion: numpy/numpydoc#463
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants