
Xrenner Module: 'BasicTokenizer' object has no attribute 'strip_accents' #18

Open
nitinvwaran opened this issue Dec 17, 2021 · 3 comments


@nitinvwaran
Contributor

nitinvwaran commented Dec 17, 2021

@amir-zeldes , @lgessler ,

I'm trying to run the xrenner module in amalgum; I get the error in the title after setting up with the env.yml file.

I saw this is related to #9, which seems fixed, so I double-checked and ensured that the versions are flair==0.6.1, transformers==3.5.1, and torch==1.5.1. I also confirmed that the updates to dplp++ make_merge.py are present on my machine, per 462de8a.

However, I still get the error. I also tried removing the cached model file 'eng_flair_nner_distilbert.pt' from the conda environment and re-downloading it, but I get the same error.

I'm not sure how to proceed, as I think the flair version is locked to 0.6.1. By any chance, is the model file on the server out of date? I see that its timestamp is July 2020.

I also see the same error in xrenner itself when I check out that codebase and run it (I tried the master branch). Flair is frozen to 0.6.1 there, but it installs the latest transformers / torch by default, so I explicitly install versions 1.5.1 and 3.5.1 and get the exact same error there too.
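For reference, the pinned versions mentioned above can be sanity-checked before running the pipeline. This is a small illustrative helper (the package names and pins are taken from this thread; the function name is made up for this sketch):

```python
# Sketch: verify that the installed package versions match the pins
# discussed in this thread (flair==0.6.1, transformers==3.5.1, torch==1.5.1).
from importlib.metadata import version, PackageNotFoundError

PINS = {"flair": "0.6.1", "transformers": "3.5.1", "torch": "1.5.1"}

def check_pins(pins):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for pkg, pinned in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[pkg] = (installed, installed == pinned)
    return report

if __name__ == "__main__":
    for pkg, (installed, ok) in check_pins(PINS).items():
        print(f"{pkg}: installed={installed} pinned={PINS[pkg]} ok={ok}")
```

Running this inside the conda environment makes it easy to spot a transformers or torch version that conda/pip silently upgraded.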

@amir-zeldes
Contributor

Interesting - it looks like either the model version doesn't play nicely with the tokenizer, or maybe the flair version with transformers. It's working for me with:

flair 0.6.1
torch 1.6.0+cu101
transformers 3.5.1

And it looks like my model is from 2020-11-24, but that might just be the date the library was installed. If it's the model, maybe you can try this one, which works well for me in standalone xrenner (I haven't tried it in the pipeline yet):

https://corpling.uis.georgetown.edu/amir/download/eng_flair_nner_electra_gum7.pt

Does that work?
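For anyone swapping in the linked model by hand, a minimal sketch of fetching it next to (or in place of) the cached distilbert model. The cache directory is an assumption here; point it at wherever eng_flair_nner_distilbert.pt lives on your machine:

```python
# Sketch: download the replacement electra model from the URL in this thread.
# CACHE_DIR is an assumption; substitute your actual xrenner/flair model cache.
import os
import urllib.request

MODEL_URL = ("https://corpling.uis.georgetown.edu/amir/download/"
             "eng_flair_nner_electra_gum7.pt")

def target_path(cache_dir):
    """Derive the local filename from the last path segment of the URL."""
    return os.path.join(cache_dir, MODEL_URL.rsplit("/", 1)[-1])

def fetch_model(cache_dir):
    """Download the model into cache_dir unless it is already present."""
    os.makedirs(cache_dir, exist_ok=True)
    dest = target_path(cache_dir)
    if not os.path.exists(dest):  # skip re-download on subsequent runs
        urllib.request.urlretrieve(MODEL_URL, dest)
    return dest
```

Depending on how the model path is configured, you may still need to rename the file or edit the config so xrenner loads it instead of the default distilbert model.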

@nitinvwaran
Contributor Author

nitinvwaran commented Dec 21, 2021

Thank you - it looks like it was a model file incompatibility. xrenner now works for me on torch==1.5.1, transformers==3.5.1, flair==0.6.1, and CUDA 11 with the electra model; I am able to convert a few GUM files to sgml with referent annotations.

I see a July 2020 timestamp on the server for the default 'eng_flair_nner_distilbert' model that xrenner downloads, so I believe this is what causes the incompatibility and the error. Replacing it with the Nov 2020 model should also resolve things.
[Screenshot: server directory listing showing the July 2020 timestamp on the distilbert model file]

I didn't test the amalgum pipeline, as I'm not sure how to change which model file is downloaded in its configuration.

@amir-zeldes
Contributor

OK, thanks! I'll reopen this as a reminder to update the model, but I think it's worth waiting for GUM8 to retrain a fresh one rather than using the GUM7 model, since V8 is right around the corner.

@amir-zeldes amir-zeldes reopened this Dec 21, 2021