Allow the user to upload Information Extraction labels #257

Open
DerKernigeFeuerpfeil opened this issue May 23, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@DerKernigeFeuerpfeil
Contributor

Is your feature request related to a problem? Please describe.
I want to use refinery to label data for information extraction, but I cannot upload my existing labels, which sets my project back by a large margin.

Describe the solution you'd like
I want to tokenize my data in a notebook with the same tokenizer that refinery uses and then match my labels to the respective tokens. Technically, this could be realised through a JSON attribute, e.g. label__headline__MANUAL, whose value is a list with one label per token, e.g. ["0", "0", "PERSON", "0"] (the "0" could also be null or any other placeholder specified in the docs). I want to upload this data to refinery. During the tokenization process, refinery should tell me if its internal tokenizer and my pre-tokenized data do not match. If they do not, I can imagine two levels of complexity:

  1. simple: it should stop the tokenization process and throw an error saying that the tokenization does not match my pre-provided tokens (in length)
  2. medium: it should additionally tell me which record caused the error and what the token counts were (e.g. refinery produced 200 tokens while I only provided a list of 193 tokens)
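A minimal sketch of the proposed workflow, assuming labels start out as (start, end, label) character spans: convert them into one label per token, store them under the label__headline__MANUAL attribute, and run the "medium" length check before upload. Everything here is hypothetical: the function names, the record keys "id" and "headline", the span format and the LOCATION label are illustrative, and the whitespace tokenizer is only a stand-in for whatever tokenizer refinery actually uses.

```python
import json

# Placeholder for tokens outside any entity ("0" as suggested in this issue; could be null).
OUTSIDE = "0"


def tokenize(text):
    # HYPOTHETICAL stand-in tokenizer (whitespace split). It does NOT reproduce
    # refinery's internal tokenizer, which is what you would use in practice.
    return text.split()


def to_token_labels(text, spans, tokenize_fn=tokenize):
    """Convert (start, end, label) character spans into one label per token."""
    labels = []
    pos = 0
    for tok in tokenize_fn(text):
        start = text.index(tok, pos)  # locate the token in the original text
        end = start + len(tok)
        pos = end
        label = OUTSIDE
        for span_start, span_end, name in spans:
            if span_start <= start and end <= span_end:  # token lies inside the span
                label = name
                break
        labels.append(label)
    return labels


class TokenizationMismatch(ValueError):
    """Raised when the provided label list and the tokenizer disagree in length."""


def check_alignment(records, text_attribute, label_attribute, tokenize_fn=tokenize):
    """'Medium' level check: name the offending record and both token counts."""
    for record in records:
        produced = len(tokenize_fn(record[text_attribute]))
        provided = len(record[label_attribute])
        if produced != provided:
            raise TokenizationMismatch(
                f"record {record['id']}: tokenizer produced {produced} tokens, "
                f"but {provided} labels were provided"
            )


if __name__ == "__main__":
    text = "Angela Merkel visited Paris"
    labels = to_token_labels(text, [(0, 13, "PERSON"), (22, 27, "LOCATION")])
    record = {"id": "r1", "headline": text, "label__headline__MANUAL": labels}
    check_alignment([record], "headline", "label__headline__MANUAL")
    print(json.dumps([record]))  # JSON that could then be uploaded
```

The "simple" level would raise the same error without the record id and counts. Note that a length check only catches count mismatches; two tokenizers that split differently but produce the same number of tokens would pass unnoticed.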

Describe alternatives you've considered
Hacking the project import/export functionality, which is rather complicated.

Additional context

DerKernigeFeuerpfeil added the enhancement (New feature or request) label on May 23, 2023