[WIP] Matching content in our doctests #197

k0pernicus · 2020-12-04T19:17:45Z

This PR solves issue #189 in order to match the content in our doctests.

I updated all the sources in the texthero folder. The main remaining issue is in the scatterplot function in visualization.py, where the 3D representation in the browser does not show anything (WIP).

I also updated CONTRIBUTING.md to ask project contributors to make the doctests in their examples and tests match the actual output as closely as possible.

Finally, I updated some doctests to add a blank line between the documentation and the source code where it helps clarity, to remove extra whitespace, etc.
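
To illustrate what "matching content" means for a doctest, here is a minimal sketch using only the standard-library `doctest` module. The helper `collapse_whitespace` and the wrapper `run_doctest` are hypothetical names made up for this example; they are not texthero functions and do not reflect the changes in this PR.

```python
import doctest


def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces.

    Hypothetical helper for illustration; it is not a texthero function.

    >>> collapse_whitespace("hello    world")
    'hello world'
    """
    return " ".join(text.split())


def run_doctest(docstring: str, name: str) -> doctest.TestResults:
    """Execute the examples found in ``docstring`` and return the results."""
    globs = {"collapse_whitespace": collapse_whitespace}
    test = doctest.DocTestParser().get_doctest(docstring, globs, name, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


# The documented output matches the real output exactly, so the example passes.
matching = run_doctest(collapse_whitespace.__doc__, "matching")

# A stale expected output (double quotes instead of the repr's single quotes)
# fails, even though a human reader would consider it the same result.
stale = run_doctest('>>> collapse_whitespace("a  b")\n"a b"\n', "stale")

print(matching.failed, matching.attempted)
print(stale.failed, stale.attempted)
```

The second call prints a failure report because doctest compares the expected and actual output as text, which is why small differences in quoting, spacing, or punctuation matter.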

k0pernicus · 2020-12-05T19:32:07Z

Hi @jbesomi,
I have an issue with the return value of replace_stopwords on Python 3.6.

According to the CI runners, the doctests fail on Python 3.6 because replace_stopwords filters out punctuation: https://travis-ci.com/github/jbesomi/texthero/jobs/454997146.

However, the doctests pass on Python 3.X (with X >= 7) because replace_stopwords does not filter out punctuation there, which, judging from the regex itself, seems to be the correct behaviour.
As an example, this is the output of the runner for Python 3.8: https://travis-ci.com/github/jbesomi/texthero/jobs/454997148.

Do you have any idea about this issue?
From the code, it does not seem that there is one code path specific to Python 3.6 and another for the other Python versions...

jbesomi · 2020-12-08T07:39:56Z

Hi @k0pernicus, thank you for your PR! Amazing 🎉

Regarding the issue with replace_stopwords, this is probably due to the regex pattern. I will investigate and get back to you. For your information, we were thinking about adding a tokenization function and requiring all preprocessing functions to receive an already tokenized Series, to avoid this kind of problem (for this function, for instance, we would have to go through the list of tokens and remove the stopwords).
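
As a rough sketch of the token-based idea described above (not the actual texthero implementation), stopword replacement on an already tokenized input could look like this. The names `tokenize`, `replace_stopwords_tokens`, and the tiny `STOPWORDS` set are assumptions made up for the example, and it works on a plain list of tokens rather than a pandas Series to stay self-contained.

```python
import re
from typing import Iterable, List

# Tiny stand-in stopword set (assumption); a real list would be much larger.
STOPWORDS = frozenset({"the", "is", "a", "of"})


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def replace_stopwords_tokens(
    tokens: Iterable[str], symbol: str, stopwords: frozenset = STOPWORDS
) -> List[str]:
    """Replace every stopword token with ``symbol``; leave other tokens as-is."""
    return [symbol if tok.lower() in stopwords else tok for tok in tokens]


tokens = tokenize("The book of Mars is here, really!")
replaced = replace_stopwords_tokens(tokens, "X")
print(replaced)
# ['X', 'book', 'X', 'Mars', 'X', 'here', ',', 'really', '!']
```

Because punctuation marks are separate tokens, they pass through untouched, and the replacement step no longer depends on a regex matching word boundaries inside a raw string.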

k0pernicus · 2020-12-09T15:44:30Z

Hi @jbesomi, thank you for the update :)

> For your information, we were thinking about adding a tokenization function and requiring all preprocessing functions to receive an already tokenized Series, to avoid this kind of problem [...].

Great!
Do not hesitate to tell me if you want to test this feature and integrate it into this PR; I would be glad to help!

k0pernicus · 2021-01-09T16:15:34Z

Hi @jbesomi,
Do you have any news about the regex issue, please? :)

jbesomi · 2021-01-12T11:47:51Z

Hi @k0pernicus,

I've thought about this intensively and have come to the conclusion that it's better, both for us developers and for all Texthero users, to have all preprocessing functions accept an already tokenized Series. See #145 for a complete discussion on the subject. Would you like to help with #145 too? Once it is implemented, writing replace_stopwords will be much easier, and we will then be able to integrate this PR too.

k0pernicus · 2021-01-12T18:32:55Z

Hi @jbesomi,
I would be glad to help with this issue; I can start looking at #145 tomorrow.

jbesomi · 2021-01-12T21:29:27Z

Thank you, @k0pernicus!

k0pernicus added 6 commits December 4, 2020 18:35

Replacement in preprocessing.py

d07f6b8

Replacement in visualization.py

f7e3c56

Replacement in nlp.py

3eb1a41

Explained the matching content in CONTRIBUTING.md

cdb0d72

Updated representation.py to match the doctests

be92bb0

Updated visualization.py to match with representation.py

5850f37

vercel bot deployed to Preview December 4, 2020 19:17 View deployment

k0pernicus changed the title from "WiP Matching content in our doctests" to "[WIP] Matching content in our doctests" Dec 4, 2020

k0pernicus added 2 commits December 5, 2020 16:44

Fixed issue in doctest output

6db47e9

Fixed doctest issues in both preprocessing.py and representation.py

8eeac99

vercel bot deployed to Preview December 5, 2020 15:50 View deployment

Fixed some formatting in doctest

1a7de38

vercel bot deployed to Preview December 5, 2020 18:27 View deployment

Fixed trailing whitespaces in doctests

04ecf63

vercel bot deployed to Preview December 5, 2020 18:54 View deployment

k0pernicus added 2 commits December 5, 2020 19:58

Fixed trailing whitespace in doctest

e0d889f

Fixed an error in replace_stopwords example

9250b3e

vercel bot deployed to Preview December 5, 2020 18:58 View deployment

Fixed an output doctest issue in preprocessing.py

60c5740

vercel bot deployed to Preview December 5, 2020 19:02 View deployment

Tried to solve the order error in visualization.py

e6f1b15

vercel bot deployed to Preview December 5, 2020 19:11 View deployment

Performed tests with Python 3.8 to test the c-i results

0398f23

vercel bot deployed to Preview December 5, 2020 19:17 View deployment

Fixed comma issue in doctest

07b4afc

vercel bot deployed to Preview December 5, 2020 19:23 View deployment
