Problem running run_nlpbook.py #4

Open
denisfitz57 opened this issue Feb 12, 2022 · 4 comments
Comments

@denisfitz57

After running:
booknlp=BookNLP("en", model_params)

I get the error shown below.

It seems to refer to my model location as booknlps, but the directory that is actually created is booknlp_models. Tacking a local directory path onto the Hugging Face URL also looks like a problem. I'm glad to help and try things here, though my experience with big Python code bases is limited.

404 Client Error: Repository Not Found for url: https://huggingface.co/C:%5CUsers%5Cdenis%5Cbooknlps%5Centities_google/bert_uncased_L-6_H-768_A-12/resolve/main/tokenizer_config.json

RepositoryNotFoundError Traceback (most recent call last)
c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\transformers\file_utils.py in get_file_from_repo(path_or_repo, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only)
2241 local_files_only=local_files_only,
-> 2242 use_auth_token=use_auth_token,
2243 )

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\transformers\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
1853 use_auth_token=use_auth_token,
-> 1854 local_files_only=local_files_only,
1855 )

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\transformers\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
2049 r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
-> 2050 _raise_for_status(r)
2051 etag = r.headers.get("X-Linked-Etag") or r.headers.get("ETag")

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\transformers\file_utils.py in _raise_for_status(request)
1970 if error_code == "RepoNotFound":
-> 1971 raise RepositoryNotFoundError(f"404 Client Error: Repository Not Found for url: {request.url}")
1972 elif error_code == "EntryNotFound":

RepositoryNotFoundError: 404 Client Error: Repository Not Found for url: https://huggingface.co/C:%5CUsers%5Cdenis%5Cbooknlps%5Centities_google/bert_uncased_L-6_H-768_A-12/resolve/main/tokenizer_config.json

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2352\2094341818.py in <module>
4 }
5
----> 6 booknlp=BookNLP("en", model_params)

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\booknlp\booknlp.py in __init__(self, language, model_params)
12
13 if language == "en":
---> 14 self.booknlp=EnglishBookNLP(model_params)
15
16 def process(self, inputFile, outputFolder, idd):

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\booknlp\english\english_booknlp.py in __init__(self, model_params)
146
147 if self.doEntities:
--> 148 self.entityTagger=LitBankEntityTagger(self.entityPath, tagsetPath)
149 aliasPath = pkg_resources.resource_filename(__name__, "data/aliases.txt")
150 self.name_resolver=NameCoref(aliasPath)

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\booknlp\english\entity_tagger.py in __init__(self, model_file, model_tagset)
17 base_model=re.sub(".model", "", base_model)
18
---> 19 self.model = Tagger(freeze_bert=False, base_model=base_model, tagset_flat={"EVENT":1, "O":1}, supersense_tagset=self.supersense_tagset, tagset=self.tagset, device=device)
20
21 self.model.to(device)

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\booknlp\english\tagger.py in __init__(self, freeze_bert, base_model, tagset, supersense_tagset, tagset_flat, hidden_dim, flat_hidden_dim, device)
56 self.num_labels_flat=len(tagset_flat)
57
---> 58 self.tokenizer = BertTokenizer.from_pretrained(modelName, do_lower_case=False, do_basic_tokenize=False)
59 self.bert = BertModel.from_pretrained(modelName)
60

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\transformers\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
1662 use_auth_token=use_auth_token,
1663 revision=revision,
-> 1664 local_files_only=local_files_only,
1665 )
1666 if resolved_config_file is not None:

c:\Users\denis\Anaconda3\envs\booknlp\lib\site-packages\transformers\file_utils.py in get_file_from_repo(path_or_repo, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only)
2246 logger.error(err)
2247 raise EnvironmentError(
-> 2248 f"{path_or_repo} is not a local folder and is not a valid model identifier "
2249 "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to "
2250 "pass a token having permission to this repo with use_auth_token or log in with "

OSError: C:\Users\denis\booknlps\entities_google/bert_uncased_L-6_H-768_A-12 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True
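
The mangled path in the final OSError can be reproduced outside BookNLP. The following is a minimal sketch, not the library's actual code: only the `re.sub(".model", "", base_model)` line is visible in the traceback, while the `split("/")` step, the `google_bert` substitution, the function name `derive_base_model`, and the model file name are assumptions made for illustration.

```python
import re


def derive_base_model(model_file: str) -> str:
    """Hypothetical reconstruction of how a model path becomes a model name."""
    # Assumption: the last "/"-separated component is taken as the file name.
    name = model_file.split("/")[-1]
    # Assumption: "google_bert" in the file name maps to the hub id "google/bert".
    name = re.sub("google_bert", "google/bert", name)
    # This line is visible in the traceback. The "." is an unescaped regex
    # wildcard, so "_model" inside "booknlp_models" is removed as well.
    return re.sub(".model", "", name)


# Hypothetical file name; only the directory layout comes from the issue.
windows_path = r"C:\Users\denis\booknlp_models\entities_google_bert_uncased_L-6_H-768_A-12.model"
posix_style_path = windows_path.replace("\\", "/")

mangled = derive_base_model(windows_path)
clean = derive_base_model(posix_style_path)
print(mangled)
print(clean)
```

Under these assumptions the backslash path comes out as C:\Users\denis\booknlps\entities_google/bert_uncased_L-6_H-768_A-12, which matches the path in the error, while the forward-slash path reduces to a short relative name. That would explain both the "booknlps" directory and the local path ending up inside the Hugging Face URL.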

@wjbmattingly

wjbmattingly commented Feb 28, 2022

I spent a good part of the morning trying to solve this problem on Windows 10, and I have it working now. Everything works on Linux; the problem on Windows arises when BookNLP tries to resolve the directory for each model.

The key to solving it is this portion of the error message: OSError: C:\Users\denis\booknlps\entities_google/

BookNLP does not create the folder booknlps or its subfolders. You will need to do this manually.

Go to Users/{username}/ and create three subdirectories there: entities_google, coref_google, and speaker_google. Next, download the appropriate models from Hugging Face. You can use git clone (here is a good tutorial on how to do it: https://stackoverflow.com/questions/67595500/how-to-download-model-from-huggingface).

https://huggingface.co/google/bert_uncased_L-6_H-768_A-12 - entities
https://huggingface.co/google/bert_uncased_L-12_H-768_A-12 - coref
https://huggingface.co/google/bert_uncased_L-12_H-768_A-12 - speaker
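
The manual steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the helper name `prepare_model_dirs` is hypothetical, the choice to clone each model into a subfolder named after its repository is inferred from the path in the error message, and the base directory is left as a parameter because the thread does not pin down whether the folders sit directly under the user directory or under booknlps. The function only creates the folders and returns the git clone commands; it does not run them.

```python
from pathlib import Path
from typing import List

# Model repositories listed above, keyed by the folder each one goes into.
MODEL_REPOS = {
    "entities_google": "https://huggingface.co/google/bert_uncased_L-6_H-768_A-12",
    "coref_google": "https://huggingface.co/google/bert_uncased_L-12_H-768_A-12",
    "speaker_google": "https://huggingface.co/google/bert_uncased_L-12_H-768_A-12",
}


def prepare_model_dirs(base_dir: Path) -> List[List[str]]:
    """Create one folder per model and return the git clone commands to run."""
    commands = []
    for folder, url in MODEL_REPOS.items():
        parent = base_dir / folder
        parent.mkdir(parents=True, exist_ok=True)
        # Assumption: clone into a subfolder named after the repository,
        # e.g. entities_google/bert_uncased_L-6_H-768_A-12.
        target = parent / url.rsplit("/", 1)[-1]
        commands.append(["git", "clone", url, str(target)])
    return commands
```

For example, `prepare_model_dirs(Path.home() / "booknlps")` would create the three folders under booknlps in the home directory and return three command lists that can be passed to `subprocess.run`. Cloning the weight files from Hugging Face may also require git lfs to be installed.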

I am about to make a video on this whole process, as I am preparing a YouTube series on using BookNLP. Resolving Windows issues was priority number one, since most of my viewers use Windows.

@denisfitz57
Author

Thank you - I will try it out this weekend. Also glad to hear of the YouTube series.

@wjbmattingly

No problem! Please do let me know if it works/does not work for you. I have only tested it on one machine. Best of luck!

Nathanlauga added a commit to Nathanlauga/booknlp that referenced this issue May 22, 2022
@Nathanlauga

Nathanlauga commented May 22, 2022

Hey, I created a fix for this problem. It comes from the fact that os.path.join uses backslashes instead of forward slashes on Windows.

So I updated every os.path.join call to replace '\\' with '/' in the result.
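
The idea behind this fix can be sketched as below. This is an illustration rather than the code from the referenced commit: the helper name `join_with_forward_slashes` and the model file name are hypothetical, and `ntpath` (the Windows flavour of `os.path`) is used so the Windows behaviour can be observed on any platform.

```python
import ntpath  # Windows flavour of os.path, importable on any platform


def join_with_forward_slashes(*parts: str) -> str:
    """Join path parts the Windows way, then normalise the separators."""
    joined = ntpath.join(*parts)  # behaves like os.path.join on Windows
    return joined.replace("\\", "/")


# Hypothetical model file name, used only for illustration.
raw = ntpath.join(r"C:\Users\denis", "booknlp_models", "entities.model")
fixed = join_with_forward_slashes(r"C:\Users\denis", "booknlp_models", "entities.model")

print(raw.split("/")[-1])    # the whole path: there is no "/" to split on
print(fixed.split("/")[-1])  # just the file name
```

Once the separators are normalised, any later code that splits the path on "/" sees the file name alone instead of the full Windows path.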
