
Spacy-transformers - update transformers compatibility #13322

Open
GennVa opened this issue Feb 12, 2024 · 4 comments
Labels
feat / transformer Feature: Transformer

Comments


GennVa commented Feb 12, 2024

I'm using version 1.3.4 of spacy-transformers but it has incompatibility with the latest version of transformers (4.37.2).
Is an update planned?
Thanks

svlandeg added the feat / transformer label on Feb 12, 2024

@svlandeg (Member)

Is it just a version incompatibility because we've pinned transformers to <4.37.0, or are you able to actually update your transformers install locally and does everything still work as expected?

Which version of spaCy are you on, if I may ask? Because from 3.7 onwards we've started switching towards https://github.com/explosion/spacy-curated-transformers instead - have you tried it?
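(Editor's note: one quick way to answer the version question, as a minimal sketch using only the standard library, is to read the installed versions directly:)

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

# Report the three packages whose version pins interact here.
for pkg in ("spacy", "spacy-transformers", "transformers"):
    print(f"{pkg}: {installed_version(pkg)}")
```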


GennVa commented Feb 12, 2024

@svlandeg I'm using spacy==3.7.3.
Can I replace spacy-transformers with spacy-curated-transformers?
With spacy-transformers==1.3.4 everything seems to work; I just get the version error.

Using spacy-curated-transformers I get this error:

ValueError: [E002] Can't find factory for 'transformer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, entity_ruler, tagger, morphologizer, ner, beam_ner, senter, sentencizer, spancat, spancat_singlelabel, span_finder, future_entity_ruler, span_ruler, textcat, textcat_multilabel, en.lemmatizer

Running:
spacy.load(path)
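(Editor's note: an E002 "can't find factory" error usually means the plugin package that registers the factory isn't visible to spaCy. A sketch of a quick check, under the assumption that spacy-curated-transformers, like other spaCy plugins, advertises its factories through the `spacy_factories` entry-point group:)

```python
from importlib.metadata import entry_points

def spacy_factory_entry_points():
    # spaCy plugin packages such as spacy-transformers and
    # spacy-curated-transformers advertise their pipeline factories via
    # the "spacy_factories" entry-point group; an empty list means no
    # plugin package is installed in the active environment.
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ API
        group = eps.select(group="spacy_factories")
    else:  # Python 3.8/3.9: entry_points() returns a dict
        group = eps.get("spacy_factories", [])
    return sorted(ep.name for ep in group)

print(spacy_factory_entry_points())
```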

@svlandeg (Member)

We had to yank 3.7.3 (for unrelated reasons - a bug in the multiprocessing code), so please update to 3.7.4 if you can.

> Can I replace spacy-transformers with spacy-curated-transformers?

Yes, but you'll then need to use curated_transformer as the factory instead of just transformer. You can see an example config here:

[components.transformer]
factory = "curated_transformer"

spacy.load(path)

Which model are you loading? If it's a pretrained model using the old spacy-transformers transformer factory, then you'll still need spacy-transformers. If it's a pretrained model from us, you can likely update.
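(Editor's note: with the curated package, pretrained Hugging Face weights are typically loaded at initialization time through the `model_loaders` registry rather than inside the model config. A hedged sketch, assuming the loader names from the spacy-curated-transformers README; verify them against your installed version:)

```ini
[initialize.components.transformer]

[initialize.components.transformer.encoder_loader]
@model_loaders = "spacy-curated-transformers.HFTransformerEncoderLoader.v1"
name = "roberta-base"

[initialize.components.transformer.piecer_loader]
@model_loaders = "spacy-curated-transformers.HFPieceEncoderLoader.v1"
name = "roberta-base"
```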


GennVa commented Feb 20, 2024

@svlandeg Thanks. I want to train a spancat (with transformers) pipeline. I installed spacy-curated-transformers and upgraded to spacy==3.7.4.

I got this error:

catalogue.RegistryError: [E892] Unknown function registry: 'span_getters'.

Available names: architectures, augmenters, batchers, callbacks, cli, datasets, displacy_colors, factories, initializers, languages, layers, lemmatizers, loggers, lookups, losses, misc, model_loaders, models, ops, optimizers, readers, schedules, scorers, tokenizers, vectors

I used the auto-generated partial config from the spaCy website, but it only covers spacy-transformers, so I tried to adapt it to spacy-curated-transformers.
This is my current cfg file, used in !python -m spacy init labels mycfg.cfg ...:

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer","spancat"]
batch_size = 512
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
vectors = {"@vectors":"spacy.Vectors.v1"}

[components]

[components.spancat]
factory = "spancat"
max_positive = null
scorer = {"@scorers":"spacy.spancat_scorer.v1"}
spans_key = "sc"
threshold = 0.5

[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"

[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 128

[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null

[components.spancat.model.tok2vec]
@architectures = "spacy-curated-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]

[components.transformer]
factory = "curated_transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-curated-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-curated-transformers.RobertaTransformer.v1"
name = "roberta-base"
mixed_precision = false

[components.transformer.model.get_spans]
@span_getters = "spacy-curated-transformers.strided_spans.v1"
window = 128
stride = 96

[components.transformer.model.grad_scaler_config]

[components.transformer.model.tokenizer_config]
use_fast = true

[components.transformer.model.transformer_config]

[corpora]
...other..
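(Editor's note: the traceback points at the `[components.transformer.model.get_spans]` block. `span_getters` is a spacy-transformers registry, and the error's list of available registries, which includes `model_loaders` but not `span_getters`, suggests only spacy-curated-transformers is installed. In the curated package, span handling is configured on the model itself; a hedged sketch assuming the `WithStridedSpans` architecture from the project's docs, replacing the `get_spans` block above:)

```ini
[components.transformer.model.with_spans]
@architectures = "spacy-curated-transformers.WithStridedSpans.v1"
stride = 96
window = 128
```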
