Include support for spaCy v3 #1621

Open
rjurney opened this issue Dec 31, 2020 · 5 comments
Labels
feature request, help wanted, no-stale (Auto-stale bot skips this issue)

Comments

rjurney commented Dec 31, 2020

Is your feature request related to a problem? Please describe.

I want to use spaCy v3 to get transformer models. I want to use Snorkel to do text extraction by giving it examples created in spaCy's annotation format. This is just like the spouse demo, but why do in TensorFlow what spaCy does out of the box?

Describe the solution you'd like

Bump the spaCy entry in requirements.txt to 3.0, test the system, and then include support in the next release.

Describe alternatives you've considered

It's probably possible to generate the v3 annotation format in v2, then run spaCy v3 in a separate program.

Additional context

spaCy v3 is amazing! https://nightly.spacy.io/

Member

henryre commented Feb 14, 2021

Hi @rjurney, following up on #1630, we marked this as help wanted. If you want to contribute a PR to support spaCy 3.x, let us know and we can discuss the approach and review when the time comes.

Author

rjurney commented Feb 15, 2021

@henryre Sounds reasonable. I'm interested in doing this work. I'm looking at the spaCy v3 migration docs and I have a couple of questions:

  • Does Snorkel use any custom pipeline components or factories?
  • Is there a spaCy config file?
  • Does it use the standard tokenizer or is it custom? With standard or modified settings?
  • Does it use tag maps or morph rules?

If you aren't sure, I can figure the answers out myself, but it doesn't look like a difficult migration based on the spaCy code I've read in Snorkel in the past.
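
For context on the first question, here is a minimal sketch of why custom components matter for the migration. In spaCy v2, `nlp.add_pipe` accepts the component callable directly; in v3, the callable is registered first (for example with `Language.component`) and then added by its string name. The helper `register_component`, the stand-in class `RecordingPipeline`, and the component `passthrough` are hypothetical names made up for this illustration. None of them come from Snorkel, and whether Snorkel needs anything like this is exactly the open question.

```python
def register_component(nlp, name, func, spacy_major):
    """Add `func` to the pipeline `nlp` under `name` (hypothetical helper).

    spaCy v2 accepts the callable itself in nlp.add_pipe; spaCy v3 expects the
    callable to be registered first and then added by its string name.
    """
    if spacy_major >= 3:
        # Imported lazily so the v2 path does not require spaCy v3.
        from spacy.language import Language

        Language.component(name, func=func)
        nlp.add_pipe(name)
    else:
        nlp.add_pipe(func, name=name)


class RecordingPipeline:
    """Stand-in for a spaCy v2 `nlp` object; it only records add_pipe calls."""

    def __init__(self):
        self.calls = []

    def add_pipe(self, component, name=None):
        self.calls.append((component, name))


def passthrough(doc):
    # A pipeline component receives a Doc and must return it.
    return doc


# Exercise the v2 branch with the stand-in, so the sketch runs without spaCy.
demo = RecordingPipeline()
register_component(demo, "passthrough", passthrough, spacy_major=2)
```

With a real v3 pipeline, the same call with `spacy_major=3` would take the registration branch instead. If Snorkel only calls `spacy.load` and reads token attributes, none of this should be needed.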

Member

henryre commented Feb 15, 2021

@rjurney agreed, shouldn't be too difficult. The library isn't very opinionated about spaCy usage, so I don't expect any of the above to come into play. The spaCy-based wrappers are primarily contained in the following:

Author

rjurney commented May 2, 2022

@yinxiangshi I got tox -e complex to run. I am looking over the relevant files to see if there is anything we missed. I didn't quite get your comments about config; I am not sure how that changes things, unless we want to add spaCy config support to Snorkel. I suppose that is reasonable, so let me look!

Author

rjurney commented May 2, 2022

@yinxiangshi I found this, and am digging in... https://spacy.io/usage/v3#features-training
