Generate MsgPack export/import #50

Open
leonkunert opened this issue Oct 14, 2022 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@leonkunert
Collaborator

leonkunert commented Oct 14, 2022

We should try to reimplement the MsgPack format used by spaCy. https://msgpack.org/ should be helpful. We could also implement import.

@tecoholic
Owner

I think the current format that spaCy uses for NER data is DocBin. I don't know if there is an open spec that would allow reading and writing this format. Maybe reading the spaCy code will help.

Either way, I don't see a big need for msgpack.

@leonkunert
Collaborator Author

leonkunert commented Oct 14, 2022

The DocBin format is gzipped MsgPack: https://spacy.io/api/docbin
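To make the "compressed MsgPack" idea concrete, here is a minimal sketch using only the Python standard library. It encodes a small subset of MsgPack by hand and wraps the result with zlib. Several things here are assumptions rather than confirmed details: the field names in `msg` follow the layout described on the DocBin docs page, the version string is a placeholder, the helper names `pack` and `to_docbin_like` are hypothetical, and the compression is assumed to be a plain zlib stream, which should be checked against the spaCy source before relying on it.

```python
import struct
import zlib


def pack(obj):
    """Encode a small subset of Python values as MsgPack bytes.

    Always uses the widest length prefix (bin 32, array 32, map 32), which
    is valid MsgPack but not the most compact encoding.
    """
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):  # bool must be checked before int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj < 0x80:
            return struct.pack("B", obj)  # positive fixint
        if 0 <= obj < 2**64:
            return b"\xcf" + struct.pack(">Q", obj)  # uint 64
        if -(2**63) <= obj < 0:
            return b"\xd3" + struct.pack(">q", obj)  # int 64
        raise ValueError("integer out of MsgPack range")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) < 32:
            return struct.pack("B", 0xA0 | len(data)) + data  # fixstr
        return b"\xdb" + struct.pack(">I", len(data)) + data  # str 32
    if isinstance(obj, bytes):
        return b"\xc6" + struct.pack(">I", len(obj)) + obj  # bin 32
    if isinstance(obj, (list, tuple)):
        body = b"".join(pack(item) for item in obj)
        return b"\xdd" + struct.pack(">I", len(obj)) + body  # array 32
    if isinstance(obj, dict):
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return b"\xdf" + struct.pack(">I", len(obj)) + body  # map 32
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def to_docbin_like(msg):
    """Compress the MsgPack payload (assumed: a plain zlib stream)."""
    return zlib.compress(pack(msg))


# Placeholder message; field names assumed from the DocBin docs page,
# values are empty stand-ins rather than real annotation data.
msg = {
    "version": "0.1",  # placeholder, not a confirmed version string
    "attrs": [],
    "tokens": b"",
    "spaces": b"",
    "lengths": b"",
    "strings": [],
    "cats": [],
    "flags": [],
}

blob = to_docbin_like(msg)
```

Import would be the reverse: `zlib.decompress` followed by a MsgPack decoder. In a real implementation an existing MsgPack library is probably a better choice than a hand-written encoder.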

@tecoholic
Owner

@leonkunert Ah, I should have RTFD. Thanks for pointing that out. Then this is something that should definitely be implemented.

@tecoholic added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Oct 14, 2022
@leonkunert
Collaborator Author

The tokens, spaces and lengths fields may be difficult to implement, because they are serialized NumPy arrays.
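As a rough illustration of what "serialized NumPy arrays" could mean outside of Python, the sketch below writes and reads the raw buffers with the standard `struct` module alone. The layout is an assumption that should be verified against the spaCy source: tokens as a row-major (C order) array of uint64 attribute values, spaces as one boolean byte per token, lengths as int32 values, all little-endian. The function names are hypothetical.

```python
import struct


def encode_tokens(rows):
    """Flatten per-token attribute rows into little-endian uint64 bytes.

    rows: list of equal-length tuples of non-negative ints, one per token.
    Assumed layout: row-major, 8 bytes per value.
    """
    flat = [value for row in rows for value in row]
    return struct.pack(f"<{len(flat)}Q", *flat)


def decode_tokens(data, n_attrs):
    """Inverse of encode_tokens; n_attrs is the number of values per token."""
    flat = struct.unpack(f"<{len(data) // 8}Q", data)
    return [flat[i:i + n_attrs] for i in range(0, len(flat), n_attrs)]


def encode_spaces(spaces):
    """One byte per token: 1 if the token is followed by a space (assumed)."""
    return bytes(1 if s else 0 for s in spaces)


def decode_spaces(data):
    return [b != 0 for b in data]


def encode_lengths(lengths):
    """Token count of each document as little-endian int32 (assumed dtype)."""
    return struct.pack(f"<{len(lengths)}i", *lengths)


def decode_lengths(data):
    return list(struct.unpack(f"<{len(data) // 4}i", data))


# Two tokens with two attribute values each, in one document of length 2.
tokens_bytes = encode_tokens([(1, 2), (3, 4)])
spaces_bytes = encode_spaces([True, False])
lengths_bytes = encode_lengths([2])
```

The resulting byte strings would go into the MsgPack map as binary values. Because the buffers carry no shape or dtype information, a reader needs the attribute count (from the attrs field) and the lengths to split the flat token array back into documents.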
