
fix: update parser to correctly parse desired tokens #55

Merged
merged 6 commits into from Jun 24, 2021
Merged

Conversation

dandhlee
Collaborator

Before you open a pull request, note that this repository is forked from here.
Unless the issue you're trying to solve is unique to this specific repository,
please file an issue and/or send changes upstream to the original as well.


Updated the parser in _extract_docstring_info to parse tokens correctly by looking for a strict match such as :type, so it no longer fails on input like <xref:type_>.

Also updated the extractor to handle different orderings of docstring tokens. GoogleDocstring returns tokens in a fixed order (for example, :param comes before :type, and :returns: comes before :rtype:), but handwritten libraries sometimes flip these, and I don't see anything on the Google docstrings page requiring a specific order. The returns/rtype pair was the only set the extractor tripped on when the order differed, so that part is updated.

Also added unit tests for the changes above and for the function in general.

Fixes #52
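The strict-match idea can be sketched roughly like this. This is a hypothetical, minimal illustration (the regex and function name are mine, not the plugin's actual code): only treat :type as a docstring field when it begins a field and is followed by an argument name and a colon, so embedded text like <xref:type_> is left alone.

```python
import re

# Hypothetical sketch of strict token matching (not the plugin's actual code):
# require ":type" to start a field ("name:" follows it) and not be embedded
# in other text, so "<xref:type_>" does not trigger a match.
TYPE_FIELD = re.compile(r'(?<![\w<:]):type\s+(\w+):')

def find_type_fields(docstring):
    """Return the argument names of every strict :type field."""
    return TYPE_FIELD.findall(docstring)

print(find_type_fields(":type timeout: float"))       # ['timeout']
print(find_type_fields("See <xref:type_> for more"))  # []
```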

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to README are included in PR

@dandhlee dandhlee requested a review from a team June 23, 2021 07:13
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 23, 2021
Comment on lines 296 to 299
# Adding the extra space for non-colon ending types
# helps determine if we simply ran into desired occurrence
# or if we ran into a similar looking syntax but shouldn't
# parse upon it.
Contributor

Minor: extra space at the beginning of these comments?

Collaborator Author

done.


# Store the top summary separately.
if index == 0:
top_summary = summary
Contributor

Can we return at this point and avoid needing the else (and the indentation that comes with it)?
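The suggested early-return refactor looks roughly like this. A minimal, illustrative sketch only; the function and variable names are mine, since only a fragment of the real function is quoted:

```python
# Before: the else branch adds a level of indentation.
def pick_before(index, summary, fallback):
    if index == 0:
        top_summary = summary
    else:
        top_summary = fallback
    return top_summary

# After: return early and drop the else entirely.
def pick_after(index, summary, fallback):
    if index == 0:
        return summary
    return fallback
```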

Collaborator Author

done.

Contributor

Just return summary directly?

parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Contributor

Missed this before -- why do we need \n and ' ' in addition to \s?

Contributor

I'm not sure I understand " ".join(stuff).split(" "). Isn't that the same as stuff?

Collaborator Author

The order seemed to matter: there's no need for \n and ' ' if I put \s in front. Fixed it.
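The redundancy can be checked directly. I can't see the final pattern in this diff, but assuming it becomes something like \s|\|, the leading \s alternative already covers both '\n' and ' ', so the results match once empty strings are filtered out:

```python
import re

text = "a b\nc|\td"
# Original pattern with separate '\n' and ' ' alternatives:
old = list(filter(None, re.split(r'\n| |\|\s', text)))
# Simplified pattern with \s in front (assumed final form):
new = list(filter(None, re.split(r'\s|\|', text)))
print(old)  # ['a', 'b', 'c', 'd']
print(new)  # ['a', 'b', 'c', 'd']
```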

Collaborator Author

For filter, it is slightly different: list(filter(...)) simply turns the filter object into a list, while " ".join(filter(...)) transforms it further by joining the pieces into one string before splitting again.
For example:

>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']


If the items of stuff contain spaces, the result is not the same as the original list.

>>> stuff = ["one two", "three four"]
>>> " ".join(stuff).split(" ")
['one', 'two', 'three', 'four']

(no idea what the line does, though, not familiar with the extension 😄 )
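Putting the two comments above together, the behavior can be checked end to end with the same example strings (illustrative values only):

```python
import re

f_line = "\thello.\n world."

# filter(None, ...) only drops empty strings; surviving pieces keep
# their internal whitespace.
pieces = list(filter(None, re.split(r'\|\s', f_line)))
print(pieces)                      # ['\thello.\n world.']

# Joining and re-splitting with no argument collapses every run of
# whitespace, which is the actual cleanup step.
print(" ".join(pieces).split())    # ['hello.', 'world.']

# And split(" ") is not a no-op round trip when items contain spaces:
stuff = ["one two", "three four"]
print(" ".join(stuff).split(" "))  # ['one', 'two', 'three', 'four']
```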

self.assertEqual(summary_info1_got, summary_info1_want)


## Test for input coming in mixed format.
Contributor

Consider creating a separate test case for each summary. It's a bit hard to follow all of these as-is.

Collaborator Author

done.

Collaborator Author

@dandhlee dandhlee left a comment

Thanks for the review! Please take a look again :)

@dandhlee dandhlee requested a review from tbpg June 23, 2021 18:55

# Store the top summary separately.
if index == 0:
top_summary = summary
Contributor

Just return summary directly?

@dandhlee
Collaborator Author

done! Updated to return summary directly.

@dandhlee dandhlee merged commit d1e18c7 into master Jun 24, 2021
@dandhlee dandhlee deleted the fix_parser branch June 24, 2021 18:10
Successfully merging this pull request may close these issues.

plugin parses docstring differently than expected