_TAGS for fine-grained tasks #703

Open: wants to merge 20 commits into main

@sg-wbi (Collaborator) commented Jun 8, 2022

INTRO

This PR introduces a new metadata attribute, _TAGS.

The values in _TAGS are, well, tags: they further classify the task.

These tags are meant to be used together with _SUPPORTED_TASKS.

I tried to keep the tags as generic as possible, so that a minimal set of them yields meaningful combinations with _SUPPORTED_TASKS.

For instance, bc5cdr will have the following combination of fine-grained tasks:

  • NER Chemical
  • NER Disease
  • NED Chemical
  • NED Disease
  • Relation Disease
  • Relation Chemical
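Each fine-grained task above is simply a task paired with a tag. As a minimal sketch, assuming plain strings stand in for bc5cdr's real _SUPPORTED_TASKS and _TAGS values, and using a hypothetical helper `compose_finegrained_tasks`, the list is the Cartesian product of the two:

```python
from itertools import product


def compose_finegrained_tasks(tasks, tags):
    """Pair every task with every tag, e.g. ("NER", "Chemical") -> "NER Chemical"."""
    return [f"{task} {tag}" for task, tag in product(tasks, tags)]


# Hypothetical string stand-ins for bc5cdr's _SUPPORTED_TASKS and _TAGS
BC5CDR_TASKS = ["NER", "NED", "Relation"]
BC5CDR_TAGS = ["Chemical", "Disease"]

print(compose_finegrained_tasks(BC5CDR_TASKS, BC5CDR_TAGS))
```

Because every task is combined with every tag, the number of fine-grained tasks grows multiplicatively, which is one reason to keep the tag set small and generic.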

This is just a temporary solution and we can discuss which tags make sense and which do not.

HOW:

You can access these tags by checking out this PR:

gh pr checkout https://github.com/bigscience-workshop/biomedical/pull/703

As mentioned, these tags are meant to be composed with _SUPPORTED_TASKS:

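from bigbio.dataloader import BigBioConfigHelpers

configs = BigBioConfigHelpers()
finegrained_tasks = set()

for config in configs:
    for task in config.tasks:
        # _TAGS is read from the module-level attribute of each dataset script
        for tag in config._py_module._TAGS:
            finegrained_tasks.add(f"{task} {tag}")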

TROUBLESHOOTING:

BigBioConfigHelpers should load just fine. If it does not, the cause is most likely a spelling error in one of the tags.

To fix this, edit the file

bigbio/utils/resources/tags.json

If you find such errors, please ping me on Slack and I will fix them right away.
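Spelling errors like these can also be caught before loading. The following is a minimal sketch, assuming (hypothetically) that tags.json holds a JSON array of allowed tag names; the helpers `load_allowed_tags` and `find_unknown_tags` and the example tag names are illustrative, not part of the PR. It flags each unknown tag and suggests the closest allowed spelling via `difflib.get_close_matches`:

```python
import difflib
import json
from pathlib import Path

# Assumption: tags.json is a JSON array of allowed tag names.
TAGS_FILE = Path("bigbio/utils/resources/tags.json")


def load_allowed_tags(path=TAGS_FILE):
    """Read the allowed tag names from the JSON file (hypothetical layout)."""
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def find_unknown_tags(tags, allowed):
    """Map each tag not in `allowed` to its closest allowed spelling, or None."""
    unknown = {}
    for tag in tags:
        if tag not in allowed:
            matches = difflib.get_close_matches(tag, sorted(allowed), n=1)
            unknown[tag] = matches[0] if matches else None
    return unknown


# Illustrative tag names; the real ones live in tags.json
allowed = {"CHEMICAL", "DISEASE", "MULTIPLE_CHOICE"}
print(find_unknown_tags(["CHEMICAL", "DIESASE"], allowed))
```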

@sg-wbi (Collaborator, Author) commented Jun 8, 2022

Moving forward, the idea would be to attach specific tags to specific tasks. This way we can write a test for this information, e.g. the "MULTIPLE_CHOICE" tag should be available only for "QA".
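One way such a test could look, as a minimal sketch: a hypothetical `TAG_ALLOWED_TASKS` mapping records which tasks each tag may accompany, and a hypothetical `find_misplaced_tags` helper reports tags used by a dataset that supports none of their allowed tasks. Task names are plain strings here for illustration; tags without a rule are treated as unrestricted.

```python
# Hypothetical rule table: tag -> tasks it may be combined with
TAG_ALLOWED_TASKS = {
    "MULTIPLE_CHOICE": {"QA"},
}


def find_misplaced_tags(tasks, tags, rules=TAG_ALLOWED_TASKS):
    """Return the tags whose allowed tasks do not overlap with the dataset's tasks.

    Tags that have no entry in `rules` are considered valid for any task.
    """
    supported = set(tasks)
    return [tag for tag in tags if tag in rules and not rules[tag] & supported]


# A QA dataset may carry MULTIPLE_CHOICE; an NER-only dataset may not.
print(find_misplaced_tags(["QA"], ["MULTIPLE_CHOICE"]))
print(find_misplaced_tags(["NER"], ["MULTIPLE_CHOICE"]))
```

In a test suite, asserting that `find_misplaced_tags(...)` returns an empty list for every dataset script would enforce the constraint.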

@sg-wbi (Collaborator, Author) commented Jun 8, 2022

One more thing: during the process I was tempted to create SOCIAL_MEDIA and CLINICAL tags, but I think we should instead have a separate metadata attribute dedicated to "domain"/"source".

@hakunanatasha self-assigned this Jul 7, 2022