New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: update parser to correctly parse desired tokens #55
Conversation
docfx_yaml/extension.py
Outdated
# Adding the extra space for non-colon ending types | ||
# helps determine if we simply ran into desired occurrence | ||
# or if we ran into a similar looking syntax but shouldn't | ||
# parse upon it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: extra space at the beginning of these comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
docfx_yaml/extension.py
Outdated
|
||
# Store the top summary separately. | ||
if index == 0: | ||
top_summary = summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return at this point and avoid needing the else
(and the indentation that comes with it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return summary
directly?
docfx_yaml/extension.py
Outdated
parsed_text = parsed_text[index:] | ||
|
||
# Clean up whitespace and other characters | ||
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this before -- why do we need \n
and
in addition to \s
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand " ".join(stuff).split(" ")
. Isn't that the same as stuff
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the order seemed to have mattered, no need for \n
and
if I put \s
in front. Fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for filter, it is slightly different.
list(filter(...))
simply turns the filtered object into a list, while " ".join(filter(...))
transforms the filter object further.
For example:
>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If stuff
items contain spaces, the resulting stuff
is not the same as the original one.
>>> stuff = ["one two", "three four"]
>>> " ".join(stuff).split(" ")
['one', 'two', 'three', 'four']
(no idea what the line does, though, not familiar with the extension 😄 )
self.assertEqual(summary_info1_got, summary_info1_want) | ||
|
||
|
||
## Test for input coming in mixed format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider creating a separate test case for each summary
. It's a bit hard to follow all of these as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Please take a look again :)
docfx_yaml/extension.py
Outdated
# Adding the extra space for non-colon ending types | ||
# helps determine if we simply ran into desired occurrence | ||
# or if we ran into a similar looking syntax but shouldn't | ||
# parse upon it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
docfx_yaml/extension.py
Outdated
|
||
# Store the top summary separately. | ||
if index == 0: | ||
top_summary = summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
docfx_yaml/extension.py
Outdated
parsed_text = parsed_text[index:] | ||
|
||
# Clean up whitespace and other characters | ||
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the order seemed to have mattered, no need for \n
and
if I put \s
in front. Fixed it.
docfx_yaml/extension.py
Outdated
parsed_text = parsed_text[index:] | ||
|
||
# Clean up whitespace and other characters | ||
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for filter, it is slightly different.
list(filter(...))
simply turns the filtered object into a list, while " ".join(filter(...))
transforms the filter object further.
For example:
>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']
self.assertEqual(summary_info1_got, summary_info1_want) | ||
|
||
|
||
## Test for input coming in mixed format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
docfx_yaml/extension.py
Outdated
|
||
# Store the top summary separately. | ||
if index == 0: | ||
top_summary = summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return summary
directly?
done! Updated to return summary directly. |
Before you open a pull request, note that this repository is forked from here.
Unless the issue you're trying to solve is unique to this specific repository,
please file an issue and/or send changes upstream to the original as well.
Updated the parser in
_extract_docstring_info
to correctly parse for tokens by specifically looking for strict match like:type
and not fail on input like<xref:type_>
.As well, updated the extractor to handle different ordering of docstring tokens. While
GoogleDocstring
only returns in specific order, for example:param
comes before:type
and:returns:
comes before:rtype:
but handwritten libraries sometimes flip these, and I don't see in Google Docstrings page that it should always come in specific order. Returns type is the only set that the extractor trips on different ordering, updated this bit.Adding unit tests for the above bits (and the function in general) as well.
Fixes #52