
fix: update parser to correctly parse desired tokens #55

Merged
merged 6 commits into from Jun 24, 2021
Merged

Conversation

dandhlee
Collaborator

Before you open a pull request, note that this repository is forked from here.
Unless the issue you're trying to solve is unique to this specific repository,
please file an issue and/or send changes upstream to the original as well.


Updated the parser in _extract_docstring_info to parse tokens correctly by looking for a strict match such as :type, so it no longer fails on input like <xref:type_>.

Also updated the extractor to handle different orderings of docstring tokens. GoogleDocstring returns tokens in a fixed order (for example, :param comes before :type, and :returns: comes before :rtype:), but handwritten libraries sometimes flip these, and I don't see anything on the Google docstrings page requiring a specific order. The returns/rtype pair was the only set the extractor tripped on when the order differed, so that part is updated.

Also added unit tests for the changes above and for the function in general.

Fixes #52
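The strict-match idea can be sketched roughly like this. This is a hypothetical, minimal illustration (the regex and function name are mine, not the plugin's actual code): only treat :type as a docstring field when it begins a field and is followed by an argument name and a colon, so embedded text like <xref:type_> is left alone.

```python
import re

# Hypothetical sketch of strict token matching (not the plugin's actual code):
# require ":type" to start a field ("name:" follows it) and not be embedded
# in other text, so "<xref:type_>" does not trigger a match.
TYPE_FIELD = re.compile(r'(?<![\w<:]):type\s+(\w+):')

def find_type_fields(docstring):
    """Return the argument names of every strict :type field."""
    return TYPE_FIELD.findall(docstring)

print(find_type_fields(":type timeout: float"))       # ['timeout']
print(find_type_fields("See <xref:type_> for more"))  # []
```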

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to README are included in PR

@dandhlee dandhlee requested a review from a team June 23, 2021 07:13
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 23, 2021
Comment on lines 296 to 299
# Adding the extra space for non-colon ending types
# helps determine if we simply ran into desired occurrence
# or if we ran into a similar looking syntax but shouldn't
# parse upon it.
Contributor

Minor: extra space at the beginning of these comments?

Collaborator Author

done.


# Store the top summary separately.
if index == 0:
top_summary = summary
Contributor

Can we return at this point and avoid needing the else (and the indentation that comes with it)?
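The suggested early-return refactor looks roughly like this. A minimal, illustrative sketch only; the function and variable names are mine, since only a fragment of the real function is quoted:

```python
# Before: the else branch adds a level of indentation.
def pick_before(index, summary, fallback):
    if index == 0:
        top_summary = summary
    else:
        top_summary = fallback
    return top_summary

# After: return early and drop the else entirely.
def pick_after(index, summary, fallback):
    if index == 0:
        return summary
    return fallback
```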

Collaborator Author

done.

Contributor

Just return summary directly?

parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Contributor

Missed this before -- why do we need \n and ' ' in addition to \s?

Contributor

I'm not sure I understand " ".join(stuff).split(" "). Isn't that the same as stuff?

Collaborator Author

The order seemed to matter: there's no need for \n and ' ' if I put \s in front. Fixed it.
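The redundancy can be checked directly. I can't see the final pattern in this diff, but assuming it becomes something like \s|\|, the leading \s alternative already covers both '\n' and ' ', so the results match once empty strings are filtered out:

```python
import re

text = "a b\nc|\td"
# Original pattern with separate '\n' and ' ' alternatives:
old = list(filter(None, re.split(r'\n| |\|\s', text)))
# Simplified pattern with \s in front (assumed final form):
new = list(filter(None, re.split(r'\s|\|', text)))
print(old)  # ['a', 'b', 'c', 'd']
print(new)  # ['a', 'b', 'c', 'd']
```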

Collaborator Author

For filter, it is slightly different: list(filter(...)) simply turns the filter object into a list, while " ".join(filter(...)) transforms it further by joining the pieces into one string before splitting again.
For example:

>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']


If the items of stuff contain spaces, the result is not the same as the original list.

>>> stuff = ["one two", "three four"]
>>> " ".join(stuff).split(" ")
['one', 'two', 'three', 'four']

(no idea what the line does, though, not familiar with the extension 😄 )
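Putting the two comments above together, the behavior can be checked end to end with the same example strings (illustrative values only):

```python
import re

f_line = "\thello.\n world."

# filter(None, ...) only drops empty strings; surviving pieces keep
# their internal whitespace.
pieces = list(filter(None, re.split(r'\|\s', f_line)))
print(pieces)                      # ['\thello.\n world.']

# Joining and re-splitting with no argument collapses every run of
# whitespace, which is the actual cleanup step.
print(" ".join(pieces).split())    # ['hello.', 'world.']

# And split(" ") is not a no-op round trip when items contain spaces:
stuff = ["one two", "three four"]
print(" ".join(stuff).split(" "))  # ['one', 'two', 'three', 'four']
```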

self.assertEqual(summary_info1_got, summary_info1_want)


## Test for input coming in mixed format.
Contributor

Consider creating a separate test case for each summary. It's a bit hard to follow all of these as-is.

Collaborator Author

done.

Collaborator Author

@dandhlee dandhlee left a comment

Thanks for the review! Please take a look again :)

@dandhlee dandhlee requested a review from tbpg June 23, 2021 18:55

# Store the top summary separately.
if index == 0:
top_summary = summary
Contributor

Just return summary directly?

@dandhlee
Collaborator Author

done! Updated to return summary directly.

@dandhlee dandhlee merged commit d1e18c7 into master Jun 24, 2021
@dandhlee dandhlee deleted the fix_parser branch June 24, 2021 18:10
Successfully merging this pull request may close these issues.

plugin parses docstring differently than expected