
Xrenner Module: 'BasicTokenizer' object has no attribute 'strip_accents' #18

Open
nitinvwaran opened this issue Dec 17, 2021 · 3 comments


@nitinvwaran
Contributor

nitinvwaran commented Dec 17, 2021

@amir-zeldes , @lgessler ,

I'm trying to run the xrenner module in amalgum; I get the error in the title after setting up with the env.yml file.

I saw this is related to #9, which seems fixed, so I double-checked and ensured that the versions are flair==0.6.1, transformers==3.5.1, and torch==1.5.1. I also confirmed that the updates to dplp++ make_merge.py are present on my machine, per 462de8a.

However, I still get the error. I also tried removing the cached model file 'eng_flair_nner_distilbert.pt' from the conda environment and re-downloading it, but I get the same error.

I'm not sure how to proceed, as I think the flair version is locked to 0.6.1. By any chance, is the model file on the server out of date? I see that its timestamp is July 2020.

I also see the same error in xrenner itself when I check out that codebase and run it (I tried the master branch). Flair is frozen to 0.6.1 there, but it installs the latest transformers / torch by default, so I explicitly install versions 1.5.1 and 3.5.1 and get the exact same error there too.
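For reference, the pinned versions mentioned above can be sanity-checked before running the pipeline. This is a small illustrative helper (the package names and pins are taken from this thread; the function name is made up for this sketch):

```python
# Sketch: verify that the installed package versions match the pins
# discussed in this thread (flair==0.6.1, transformers==3.5.1, torch==1.5.1).
from importlib.metadata import version, PackageNotFoundError

PINS = {"flair": "0.6.1", "transformers": "3.5.1", "torch": "1.5.1"}

def check_pins(pins):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for pkg, pinned in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[pkg] = (installed, installed == pinned)
    return report

if __name__ == "__main__":
    for pkg, (installed, ok) in check_pins(PINS).items():
        print(f"{pkg}: installed={installed} pinned={PINS[pkg]} ok={ok}")
```

Running this inside the conda environment makes it easy to spot a transformers or torch version that conda/pip silently upgraded.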

@amir-zeldes
Contributor

Interesting - it looks like either the model version doesn't play nicely with the tokenizer, or maybe the flair version with transformers. It's working for me with:

flair 0.6.1
torch 1.6.0+cu101
transformers 3.5.1

And it looks like my model is from 2020-11-24, but that might just be the date the library was installed. If it's the model, maybe you can try this one, which works well for me in standalone xrenner (I haven't tried it in the pipeline yet):

https://corpling.uis.georgetown.edu/amir/download/eng_flair_nner_electra_gum7.pt

Does that work?
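For anyone swapping in the linked model by hand, a minimal sketch of fetching it next to (or in place of) the cached distilbert model. The cache directory is an assumption here; point it at wherever eng_flair_nner_distilbert.pt lives on your machine:

```python
# Sketch: download the replacement electra model from the URL in this thread.
# CACHE_DIR is an assumption; substitute your actual xrenner/flair model cache.
import os
import urllib.request

MODEL_URL = ("https://corpling.uis.georgetown.edu/amir/download/"
             "eng_flair_nner_electra_gum7.pt")

def target_path(cache_dir):
    """Derive the local filename from the last path segment of the URL."""
    return os.path.join(cache_dir, MODEL_URL.rsplit("/", 1)[-1])

def fetch_model(cache_dir):
    """Download the model into cache_dir unless it is already present."""
    os.makedirs(cache_dir, exist_ok=True)
    dest = target_path(cache_dir)
    if not os.path.exists(dest):  # skip re-download on subsequent runs
        urllib.request.urlretrieve(MODEL_URL, dest)
    return dest
```

Depending on how the model path is configured, you may still need to rename the file or edit the config so xrenner loads it instead of the default distilbert model.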

@nitinvwaran
Contributor Author

nitinvwaran commented Dec 21, 2021

Thank you - it looks like it was a model file incompatibility. xrenner now works for me on torch==1.5.1, transformers==3.5.1, flair==0.6.1, and CUDA 11 with the electra model; I am able to convert a few GUM files to sgml with referent annotations.

I see a July 2020 timestamp on the server for the default 'eng_flair_nner_distilbert' model that xrenner downloads, so I believe this is what causes the incompatibility and the error. Replacing it with the Nov 2020 model should also resolve things.
[Screenshot: server directory listing showing the July 2020 timestamp on the distilbert model file]

I didn't test the amalgum pipeline, as I'm not sure how to change which model file is downloaded in its configuration.

@amir-zeldes
Contributor

OK, thanks! I'll reopen this as a reminder to update the model, but I think it's worth waiting for GUM8 to retrain a fresh one rather than using the GUM7 model, since V8 is right around the corner.

@amir-zeldes amir-zeldes reopened this Dec 21, 2021