spancat training not working with span group other than "sc" #13090

Open
nrodnova opened this issue Oct 29, 2023 · 3 comments
Labels
feat / spancat (Feature: Span Categorizer), feat / training (Feature: Training utils, Example, Corpus and converters), training (Training and updating models)

Comments

@nrodnova
Contributor

How to reproduce the behaviour

When span_key in the [components.spancat_singlelabel] or [components.spancat] section is set to anything other than "sc", the span scores stay at 0.00 for the whole run. The training output looks like this:

ℹ Pipeline: ['sentencizer', 'tok2vec', 'spancat_singlelabel']
ℹ Set annotations on update for: ['sentencizer']
ℹ Initial learn rate: 0.001

E    #       LOSS TOK2VEC  LOSS SPANC...  SENTS_F  SENTS_P  SENTS_R  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
---  ------  ------------  -------------  -------  -------  -------  ----------  ----------  ----------  ------
  0       0          0.00          19.33   100.00   100.00   100.00        0.00        0.00        0.00    0.50
  0     200          5.46         409.10   100.00   100.00   100.00        0.00        0.00        0.00    0.50
  0     400         10.51          96.83   100.00   100.00   100.00        0.00        0.00        0.00    0.50
  0     600         10.37          77.10   100.00   100.00   100.00        0.00        0.00        0.00    0.50
  0     800          8.99          99.86   100.00   100.00   100.00        0.00        0.00        0.00    0.50
  0    1000          9.52         100.14   100.00   100.00   100.00        0.00        0.00        0.00    0.50
  0    1200          6.18          62.30   100.00   100.00   100.00        0.00        0.00        0.00    0.50

I thought I was going insane :)
This looks like a bug in the evaluation step; I didn't investigate further, sorry.
Other parts seem to be working:

python -m spacy debug data config.cfg

is happy with a non-"sc" value.

Also, the labels get picked up from the training dataset and show up correctly in meta.json.

After I changed span_key back to the default "sc", I got this:

E    #       LOSS TOK2VEC  LOSS SPANC...  SENTS_F  SENTS_P  SENTS_R  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
---  ------  ------------  -------------  -------  -------  -------  ----------  ----------  ----------  ------
  0       0          0.00          19.33   100.00   100.00   100.00       96.81       96.81       96.81    0.98
  0     200          6.64         421.19   100.00   100.00   100.00       99.32       99.32       99.32    1.00
  0     400          8.43          82.30   100.00   100.00   100.00       99.43       99.43       99.43    1.00
  0     600          9.34          71.43   100.00   100.00   100.00       99.45       99.45       99.45    1.00
  0     800          9.28         107.05   100.00   100.00   100.00       99.59       99.59       99.59    1.00
  0    1000          8.78          89.08   100.00   100.00   100.00       98.95       98.95       98.95    0.99

Also, I'm not sure why sentence metrics show up: I am using the simple, non-trainable sentencizer. It's not that important, obviously.

Your Environment

  • spaCy version: 3.6.1
  • Platform: macOS-12.4-x86_64-i386-64bit
  • Python version: 3.10.11
  • Pipelines: en_core_web_sm (3.6.0)

nrodnova changed the title from "spancat not working with span group other than "sc"" to "spancat training not working with span group other than "sc"" on Oct 29, 2023

rmitsch added the labels training, feat / training, and feat / spancat on Oct 30, 2023

@rmitsch
Contributor

rmitsch commented Oct 30, 2023

Hi @nrodnova, thanks for reporting this! We'll look into it.

@rmitsch
Contributor

rmitsch commented Oct 30, 2023

This is likely because your score weights are not configured properly: the names of the score weight attributes are derived from the value of span_key. E.g. if span_key == "sc", the score weight config may look like this:

[training.score_weights]
spans_sc_f = 1.0
spans_sc_p = 0.0
spans_sc_r = 0.0

I recommend updating your config to include this snippet (or editing it if it's already in there), with score weights that reflect your actual span_key. E.g. if span_key == "myspankey":

[training.score_weights]
spans_myspankey_f = 1.0
spans_myspankey_p = 0.0
spans_myspankey_r = 0.0
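
The naming pattern in the two snippets above can be sketched in a few lines of Python. This is only an illustration of the `spans_<key>_f/p/r` pattern shown in this comment, not spaCy code; the helper names `spancat_score_weights` and `to_config_section` are made up for the example.

```python
def spancat_score_weights(span_key, f=1.0, p=0.0, r=0.0):
    """Build score weight names for a span key (hypothetical helper).

    Follows the pattern from the snippets above: spans_<key>_f/_p/_r.
    """
    prefix = f"spans_{span_key}"
    return {f"{prefix}_f": f, f"{prefix}_p": p, f"{prefix}_r": r}


def to_config_section(weights):
    """Render the weights as a [training.score_weights] block (hypothetical helper)."""
    lines = ["[training.score_weights]"]
    lines += [f"{name} = {value}" for name, value in weights.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Prints a block to paste into config.cfg for span key "myspankey".
    print(to_config_section(spancat_score_weights("myspankey")))
```

Calling it with "sc" gives the default names from the first snippet, so the same helper covers both cases.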

@nrodnova
Contributor Author

@rmitsch Thanks, that makes sense. I will test it out and comment back. If this works, it would be nice to add it to the documentation.
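
One way to check a config for this mismatch before re-running training is sketched below, using only the standard library. It rests on assumptions: the setting this thread calls span_key is spelled `spans_key` in spaCy config files, its value is a JSON-style quoted string, and the `spans_<key>_f/p/r` naming is the one described above. `missing_score_weights` and `EXAMPLE_CONFIG` are hypothetical names, and the example config is made up rather than taken from the reporter's project.

```python
import configparser
import json

# Made-up config fragment: custom span key, but default "sc" score weights.
EXAMPLE_CONFIG = """
[components.spancat]
factory = "spancat"
spans_key = "myspankey"

[training.score_weights]
spans_sc_f = 1.0
spans_sc_p = 0.0
spans_sc_r = 0.0
"""


def missing_score_weights(config_text):
    """Return score weight names implied by spans_key but absent from the config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)

    configured = set()
    if parser.has_section("training.score_weights"):
        configured = set(parser["training.score_weights"])

    missing = []
    for section in parser.sections():
        # Only component sections that define spans_key are relevant.
        if not section.startswith("components."):
            continue
        if not parser.has_option(section, "spans_key"):
            continue
        key = json.loads(parser.get(section, "spans_key"))  # strip the quotes
        for suffix in ("f", "p", "r"):
            name = f"spans_{key}_{suffix}"
            if name not in configured:
                missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_score_weights(EXAMPLE_CONFIG))
```

An empty list would mean every span key in the config has matching score weight entries; for the example above the three `spans_myspankey_*` names are reported.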
