extract_dict and extract_regex_tok should return TokenSpanArray, not DataFrame #206

Open
frreiss opened this issue Jun 10, 2021 · 6 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

Member

frreiss commented Jun 10, 2021

For legacy reasons, the functions extract_dict() and extract_regex_tok() in spanner/extract.py return single-column DataFrames. These functions should return TokenSpanArray objects instead. Users who want a DataFrame can construct one on top of the returned array.

In addition to the testing code in test_extract.py, there is some downstream code in the notebooks that will need to be modified to deal with this API change.
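For downstream code that still expects a DataFrame, the wrapping step mentioned above might look roughly like the sketch below. It is not taken from the library: `extract_stub()` is a hypothetical stand-in for `extract_dict()` / `extract_regex_tok()` after the change, a pandas string array stands in for the `TokenSpanArray`, and the column name `"match"` is an assumption.

```python
import pandas as pd


def extract_stub() -> pd.api.extensions.ExtensionArray:
    # Hypothetical stand-in for extract_dict() / extract_regex_tok() after
    # the proposed change: it returns a bare extension array rather than a
    # single-column DataFrame. A string array is used here in place of a
    # real TokenSpanArray so the sketch runs with pandas alone.
    return pd.array(["New York", "Boston"], dtype="string")


# The function now hands back just the array of matches.
matches = extract_stub()

# Callers that want the old shape wrap the array themselves.
# The column name "match" is an assumption made for this sketch.
df = pd.DataFrame({"match": matches})
```

Because pandas keeps extension arrays intact when they are passed as DataFrame columns, the same one-line wrap should carry over to a real `TokenSpanArray`, though that would need to be checked against the notebooks.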

frreiss added the "good first issue" and "help wanted" labels Jun 10, 2021

lvntky commented Jun 10, 2021

Hello @frreiss, I can fork the project and take a look at this issue if you haven't started on it yet.

Member Author

frreiss commented Jun 11, 2021

Thanks for your interest, @lvntky! We'd be happy to have you work on this issue. You may want to wait until pull request #207, which contains other changes to spanner/extract.py, is merged.

Member Author

frreiss commented Jun 12, 2021

Update: PR #207 is merged now; this issue should be unblocked.


lvntky commented Jun 13, 2021

Sorry for the delay, @frreiss. I couldn't look at GitHub for two days because I was very busy at work. If there is anything I can help with, please let me know. I really like the project. Best wishes!

Member Author

frreiss commented Jun 14, 2021

@lvntky we welcome contributions of all sizes. We've prepared a list of small changes that would make good first issues for new contributors. Here's a link: https://github.com/CODAIT/text-extensions-for-pandas/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22


lvntky commented Jun 15, 2021

@frreiss thank you for the friendly welcome, I really appreciate it. I will go hunting for issues in the repo 😄


2 participants