BookNLP crashes without internet access even when models are already downloaded #6

Open
quadrismegistus opened this issue Mar 30, 2022 · 2 comments

@quadrismegistus

I've been using BookNLP for the last couple weeks and love it; thanks for such a great package.

I realized working in the (wifi-less) subway today that even though I have the models downloaded, BookNLP crashes without internet access. That's unfortunate since there are of course many real-life situations in which internet access is impossible.

Here's the error (with internet turned off):

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Here's the full stack trace:

File ~/github/lltk/lltk/model/booknlp.py:436, in get_booknlp(language, pipeline, model, cache, quiet, **kwargs)
    434 if not key in booknlpd:
    435     from booknlp.booknlp import BookNLP
--> 436     booknlpd[key]=BookNLP(
    437         language=language,
    438         model_params=dict(pipeline=pipeline,model=model)
    439     )
    440 return booknlpd[key]

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/booknlp.py:14, in BookNLP.__init__(self, language, model_params)
     11 def __init__(self, language, model_params):
     13     if language == "en":
---> 14         self.booknlp=EnglishBookNLP(model_params)

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/english/english_booknlp.py:148, in EnglishBookNLP.__init__(self, model_params)
    145 self.quoteTagger=QuoteTagger()
    147 if self.doEntities:
--> 148     self.entityTagger=LitBankEntityTagger(self.entityPath, tagsetPath)
    149     aliasPath = pkg_resources.resource_filename(__name__, "data/aliases.txt")
    150     self.name_resolver=NameCoref(aliasPath)

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/english/entity_tagger.py:19, in LitBankEntityTagger.__init__(self, model_file, model_tagset)
     16 base_model=re.sub("google_bert", "google/bert", model_file.split("/")[-1])
     17 base_model=re.sub(".model", "", base_model)
---> 19 self.model = Tagger(freeze_bert=False, base_model=base_model, tagset_flat={"EVENT":1, "O":1}, supersense_tagset=self.supersense_tagset, tagset=self.tagset, device=device)
     21 self.model.to(device)
     22 self.model.load_state_dict(torch.load(model_file, map_location=device))

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/english/tagger.py:58, in Tagger.__init__(self, freeze_bert, base_model, tagset, supersense_tagset, tagset_flat, hidden_dim, flat_hidden_dim, device)
     54 self.rev_supersense_tagset[len(supersense_tagset)+1]="O"
     56 self.num_labels_flat=len(tagset_flat)
---> 58 self.tokenizer = BertTokenizer.from_pretrained(modelName, do_lower_case=False, do_basic_tokenize=False)
     59 self.bert = BertModel.from_pretrained(modelName)
     61 self.tokenizer.add_tokens(["[CAP]"], special_tokens=True)

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1724, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1722 else:
   1723     try:
-> 1724         resolved_vocab_files[file_id] = cached_path(
   1725             file_path,
   1726             cache_dir=cache_dir,
   1727             force_download=force_download,
   1728             proxies=proxies,
   1729             resume_download=resume_download,
   1730             local_files_only=local_files_only,
   1731             use_auth_token=use_auth_token,
   1732             user_agent=user_agent,
   1733         )
   1735     except FileNotFoundError as error:
   1736         if local_files_only:

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/transformers/file_utils.py:1921, in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
   1917     local_files_only = True
   1919 if is_remote_url(url_or_filename):
   1920     # URL, so get it from the cache (downloading if necessary)
-> 1921     output_path = get_from_cache(
   1922         url_or_filename,
   1923         cache_dir=cache_dir,
   1924         force_download=force_download,
   1925         proxies=proxies,
   1926         resume_download=resume_download,
   1927         user_agent=user_agent,
   1928         use_auth_token=use_auth_token,
   1929         local_files_only=local_files_only,
   1930     )
   1931 elif os.path.exists(url_or_filename):
   1932     # File, and it exists.
   1933     output_path = url_or_filename

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/transformers/file_utils.py:2177, in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
   2171                 raise FileNotFoundError(
   2172                     "Cannot find the requested files in the cached path and outgoing traffic has been"
   2173                     " disabled. To enable model look-ups and downloads online, set 'local_files_only'"
   2174                     " to False."
   2175                 )
   2176             else:
-> 2177                 raise ValueError(
   2178                     "Connection error, and we cannot find the requested files in the cached path."
   2179                     " Please try again or make sure your Internet connection is on."
   2180                 )
   2182 # From now on, etag is not None.
   2183 if os.path.exists(cache_path) and not force_download:

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

I turn wifi on and everything works normally.

@dbamman
Member

dbamman commented Mar 30, 2022

Yes, thanks for bringing this up -- this is something I've been wanting to look into in the transformers library (which seems to make HTTP calls for BERT-based models even when the original BERT model doesn't need to be accessed). Let me look into it (but if anyone else has seen this, let me know!)

@dbamman
Member

dbamman commented Mar 30, 2022

One quick solution is to use transformers' "offline mode" when executing your code, which involves setting the environment variable TRANSFORMERS_OFFLINE=1. In your case (within the lltk/model directory), from the command line, this would be:

TRANSFORMERS_OFFLINE=1 python booknlp.py
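
If launching from a notebook or another script rather than the shell, the same variable can probably be set from Python instead. This is only a minimal sketch: `enable_transformers_offline` is a hypothetical helper name, and it assumes the variable is set before `transformers` (and therefore `booknlp`) is first imported, since the flag appears to be read at import time. The BookNLP call is left commented out so the snippet runs without the models installed.

```python
import os
from typing import MutableMapping, Optional


def enable_transformers_offline(
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set TRANSFORMERS_OFFLINE=1 in the given environment mapping.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    if env is None:
        env = os.environ
    env["TRANSFORMERS_OFFLINE"] = "1"
    return env


# Call this before anything imports transformers (assumption: the flag
# is only read once, when the library is first imported).
enable_transformers_offline()

# Only now import BookNLP, so that transformers sees the variable:
# from booknlp.booknlp import BookNLP
# booknlp = BookNLP("en", {"pipeline": "entity,quote,supersense,event,coref", "model": "big"})
```

As with the command-line version, this only helps if the tokenizer and model files are already in the local cache from an earlier online run.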

This doesn't address why transformers isn't able to read from the cache (where it stores model/tokenizer files) when there's no internet (it seems to do so when there is internet access, without redownloading every time) -- I'll dig into that more.
