improvement: Implement support for NumPy-style docstrings #279

celsiusnarhwal · 2023-01-25T22:03:37Z

This PR implements support for NumPy-style docstrings via the new NumpyProcessor class. It does so with the help of the numpydoc package, on which this PR makes Pydoc-Markdown dependent.

In addition to the above, this PR:

- Adds a unit test for NumpyProcessor
- Updates SmartProcessor to support NumpyProcessor
- Updates pyproject.toml to reflect the addition of numpydoc as a dependency
- Updates readme.md to reflect the addition of NumPy-style docstring support

This PR resolves #251.

Caveats and Limitations

- NumpyProcessor.check_docstring_format() returns True if a docstring passes numpydoc's docstring validator without warnings or errors and False otherwise. Because SmartProcessor skips the call to check_docstring_format() if the format is explicitly indicated in the docstring (e.g., with @doc:fmt:numpy), a docstring that would fail numpydoc's validator but nonetheless explicitly identifies itself as a NumPy-style docstring may result in warnings or exceptions at processing time.
  - The processor converts docstrings to NumpyDocString objects before converting them to Markdown syntax. Instantiating a NumpyDocString object with an invalid docstring will result in warnings or exceptions.
- Reference indexes in a docstring's Notes section are not hyperlinked to their corresponding references in the References section, which departs from the numpydoc spec. Pydoc-Markdown's existing machinery rendered the HTML tags in a way that broke the hyperlinks in every attempt I made to implement this behavior. Examples of how NumpyProcessor renders reference indexes and references can be found below.
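
To make the first caveat concrete, here is a rough sketch of the selection order it describes. It is not the code in this PR: `choose_format` and the `detectors` mapping are hypothetical names, and the marker pattern is only inferred from the `@doc:fmt:numpy` example above.

```python
import re

# Assumed marker syntax, inferred from the `@doc:fmt:numpy` example.
_FORMAT_MARKER = re.compile(r"@doc:fmt:(\w+)")


def choose_format(docstring, detectors):
    """Return the name of the format to use for `docstring`, or None.

    `detectors` maps a format name to a callable standing in for that
    processor's check_docstring_format(). An explicit marker wins and
    skips detection entirely.
    """
    match = _FORMAT_MARKER.search(docstring)
    if match:
        # No validation happens on this path, so a malformed docstring
        # only surfaces later, at processing time.
        return match.group(1)
    for name, looks_like in detectors.items():
        if looks_like(docstring):
            return name
    return None
```

With this ordering, a docstring tagged `@doc:fmt:numpy` is handed to the NumPy processor even when its detector would have rejected it, which is exactly the situation the caveat warns about.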

Examples

Here are examples of how the various sections of a NumPy-style docstring are rendered by NumpyProcessor.

Summary / Extended Summary

The Summary and Extended Summary are rendered together as a single summary.

Input

Decode a string by shifting each character by a given offset.

Extended Summary
----------------
There's not much else to say about this function, but if there was, it would go here. Fun fact: you 
don't need to include the Extended Summary heading — if your summary spans multiple lines, everything after the 
first will be implicitly considered to be the Extended Summary. You can't have both an implicit *and* explicit 
Extended Summary, though — that causes an exception!

Output

Decode a string by shifting each character by a given offset.

There's not much else to say about this function, but if there was, it would go here. Fun fact: you don't need to include the Extended Summary heading — if your summary spans multiple lines, everything after the first will be implicitly considered to be the Extended Summary. You can't have both an implicit and explicit Extended Summary, though — that causes an exception!

Parameters / Other Parameters / Attributes / Receives

The Parameters, Other Parameters, Attributes, and Receives sections are all rendered similarly.

Input

Parameters
----------
string : str
    The string to decode.
   
Other Parameters
----------------
offset : int
    The offset by which to shift each character in the string. Defaults to 13.
    
Attributes
----------
attr : Any
    Functions don't have attributes, but if we were documenting a class, we'd put its attributes here. 
    Unfortunately, we are not. Too bad!
    
Receives
--------
param : Any
    If this was a generator, we'd document the parameters passed to its `send()` method here.
    Unfortunately, it is not. Too bad!

Output

Arguments

string (str): The string to decode.
offset (int): The offset by which to shift each character in the string. Defaults to 13.

Attributes

attr (Any): Functions don't have attributes, but if we were documenting a class, we'd put its attributes here. Unfortunately, we are not. Too bad!

Receives

param (Any): If this was a generator, we'd document the parameters passed to its send() method here. Unfortunately, it is not. Too bad!

Returns / Yields

The Returns and Yields sections are rendered similarly.

Input

Returns
-------
str
    The decoded string.

Yields
------
char : str
    The decoded string, one character at a time. By the way, you can optionally annotate your return and yield 
    values with names like I did here. The type annotation isn't optional, though.

Output

Returns

str: The decoded string.

Yields

char (str): The decoded string, one character at a time. By the way, you can optionally annotate your return and yield values with names like I did here. The type annotation isn't optional, though.

Raises / Warns

The Raises and Warns sections are rendered similarly.

Input

Raises
------
ValueError
    If the string contains non-alphabetic characters.

Warns
-----
UserWarning
    If I don't like you.

Output

Raises

ValueError: If the string contains non-alphabetic characters.

Warns

UserWarning: If I don't like you.
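
The entry-based sections above (Parameters through Warns) all follow the same pattern: a `name : type` or bare `type` header line followed by an indented description becomes a single `name (type): description` line. A minimal sketch of that transformation, using only the standard library, could look like the following. `render_entries` is a hypothetical helper for illustration; the PR itself relies on numpydoc for parsing.

```python
def render_entries(section_body):
    """Turn numpydoc-style entries into `name (type): description` lines."""
    entries = []
    for line in section_body.splitlines():
        if not line.strip():
            continue  # blank lines carry no information here
        if line[0].isspace():
            # Indented line: continuation of the current entry's description.
            if entries:
                entries[-1][2].append(line.strip())
        else:
            # Header line: either `name : type` or a bare name/type.
            name, _, type_ = line.partition(" : ")
            entries.append((name.strip(), type_.strip(), []))
    rendered = []
    for name, type_, description in entries:
        head = f"{name} ({type_})" if type_ else name
        rendered.append(f"{head}: {' '.join(description)}")
    return "\n".join(rendered)
```

Multi-line descriptions are joined with single spaces, matching how the wrapped descriptions in the inputs above appear on one line in the outputs.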

See Also

Input

See Also
--------
:func:`encode`
    Encode a string by shifting each character by a given offset.

Output

See Also

:func:`encode`: Encode a string by shifting each character by a given offset.

(The processor leaves the task of cross-referencing functions, classes, and methods in this section to Pydoc-Markdown's existing faculties.)

Notes

Input

Notes
-----
This function implements an inverse substitution cipher[1]_.

Output

Notes

This function implements an inverse substitution cipher¹.
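
For illustration, the `[1]_` to `¹` conversion shown here can be done with a small regular-expression substitution. This is only a sketch of one way to get that output; whether NumpyProcessor does it this way is an assumption, and `superscript_references` is a hypothetical name.

```python
import re

# Map ASCII digits to their Unicode superscript counterparts.
_SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def superscript_references(text):
    """Replace reST-style `[n]_` reference markers with superscript digits."""
    return re.sub(
        r"\[(\d+)\]_",
        lambda match: match.group(1).translate(_SUPERSCRIPTS),
        text,
    )
```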

References

Input

References
----------
.. [1] https://en.wikipedia.org/wiki/Substitution_cipher

Output

References

https://en.wikipedia.org/wiki/Substitution_cipher

Examples

The Examples section supports doctests. The processor renders doctests in code blocks and other content as plain text.

The processor considers the start of a doctest to be marked by a line beginning with >>> and the end of a doctest to be marked by a blank line. If multiple doctests are present, they are rendered in separate code blocks.

Input

Examples
--------
>>> decode("Qba'g nfx fghcvq dhrfgvbaf!")
"Don't ask stupid questions!"

This is a super simple function so I don't really know why you'd need more than one example but here's another one 
anyway.

>>> decode("Gunax lbh xvaqyl sbe lbhe nggragvba!")
"Thank you kindly for your attention!"

Output

Examples

>>> decode("Qba'g nfx fghcvq dhrfgvbaf!")
"Don't ask stupid questions!"

This is a super simple function so I don't really know why you'd need more than one example but here's another one anyway.

>>> decode("Gunax lbh xvaqyl sbe lbhe nggragvba!")
"Thank you kindly for your attention!"
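
The doctest rule described at the top of this section (a doctest starts at a line beginning with >>> and ends at the next blank line) can be sketched as a small splitter. `split_examples` is a hypothetical helper written for this illustration, not the function used in the PR.

```python
def split_examples(body):
    """Split an Examples body into ("doctest", text) and ("text", text) blocks."""
    blocks = []
    current_kind, current_lines = None, []

    def flush():
        # Close the block that is currently open, if any.
        nonlocal current_kind, current_lines
        if current_lines:
            blocks.append((current_kind, "\n".join(current_lines)))
        current_kind, current_lines = None, []

    for line in body.splitlines():
        is_prompt = line.lstrip().startswith(">>>")
        if not line.strip():
            flush()  # a blank line ends whatever block is open
        elif current_kind is None:
            current_kind = "doctest" if is_prompt else "text"
            current_lines.append(line)
        elif current_kind == "text" and is_prompt:
            flush()  # a prompt directly after prose starts a new doctest
            current_kind = "doctest"
            current_lines.append(line)
        else:
            current_lines.append(line)
    flush()
    return blocks
```

Each "doctest" block would then go into its own code block and each "text" block would be emitted as plain text, which gives the separate code blocks described above.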

NiklasRosenstein · 2023-01-30T22:15:16Z

Hey @celsiusnarhwal, thanks for this great PR! I'll be able to take a closer look at it next week.


NiklasRosenstein · 2023-05-27T21:43:12Z

Hey @celsiusnarhwal, sorry for the silence. I'm finally finding some time again to look at your PR.

I've made some minor adjustments, and I'd almost be happy to merge it as it is now! The one remaining problem is that two unit tests are failing: the NumpyProcessor identifies the examples below as being in the Numpy doc format when in reality they're not, and as a consequence they don't really get processed.

E.g. for the test_pydocmd_processor test:

# Arguments
s (str): A string.
b (int): An int.

It spits the same back out. I've added some logging so we can tell which processor the SmartProcessor is delegating to:

INFO     pydoc_markdown.contrib.processors.smart:smart.py:92 Using `numpy` processor for Module `test` (detected)

NumpyProcessor.check_docstring_format() returns True if a docstring passes numpydoc's docstring validator without warnings or errors and False otherwise

I'm also thinking that this, on the other hand, may be too restrictive. If I want to use the Numpy docstring format, I may still make mistakes, and I'd want the docstring to be identified as Numpy format even if it contains a minor formatting mistake. Getting a warning (although maybe not an exception) in this case would be desirable.

What do you think about checking for the presence of Numpy-doc-like sections (e.g. Raises\n-------) in the content of the docstring instead?
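
As a rough sketch of what such a check could look like (the function name, the section list, and the three-dash minimum are assumptions for illustration, not code from this PR):

```python
import re

# Section names taken from the examples in the PR description.
_SECTIONS = (
    "Parameters", "Other Parameters", "Attributes", "Receives", "Returns",
    "Yields", "Raises", "Warns", "See Also", "Notes", "References", "Examples",
)

# A known section name on its own line, followed by a line of dashes.
_SECTION_RE = re.compile(
    r"^[ \t]*(?:%s)[ \t]*\n[ \t]*-{3,}[ \t]*$"
    % "|".join(re.escape(name) for name in _SECTIONS),
    re.MULTILINE,
)


def looks_like_numpy(docstring):
    """Return True if the docstring contains an underlined Numpy-style section."""
    return _SECTION_RE.search(docstring) is not None
```

A check like this would reject the `# Arguments` docstring from test_pydocmd_processor while still accepting a Numpy docstring that has a minor formatting mistake elsewhere.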


improvement: Implement support for NumPy-style docstrings

c9e9180

celsiusnarhwal marked this pull request as ready for review January 25, 2023 23:46

celsiusnarhwal added 2 commits January 25, 2023 18:50

Formatting adjustments

f874940

Formatting adjustments

c4fd4d1

NiklasRosenstein and others added 4 commits May 27, 2023 21:06

Merge branch 'develop' into celsiusnarhwal/develop

b3d14f4

Updated PR references in 1 changelogs.

1de1dbf

skip-checks: true

fmt, import __future__.annotations and fix mypy lints

0156c47

Streamline SmartProcessor implementation and log which processor it picks for each ApiObject

11cbbe0

add print to assert_processor_result() which makes it easier to understand what is actually produced

a98a68f

NiklasRosenstein added the status/awaiting response label Nov 11, 2023
