Google Authentication Credentials Issue when bert_deid download #20

Open
julianmricci opened this issue Jun 20, 2023 · 4 comments

@julianmricci

(deid) julianricci@Julians-MacBook-Pro-673 bert-deid-master % bert_deid download

06/20/2023 15:31:34 - INFO - bert_deid.download - Beginning download of model files to bert_deid_model
06/20/2023 15:31:34 - INFO - bert_deid.download - Downloading bert-deid/bert-i2b2-2014/added_tokens.json to bert_deid_model/added_tokens.json
06/20/2023 15:31:37 - WARNING - google.auth.compute_engine._metadata - Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
06/20/2023 15:31:40 - WARNING - google.auth.compute_engine._metadata - Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
06/20/2023 15:31:40 - WARNING - google.auth.compute_engine._metadata - Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: [Errno 64] Host is down
06/20/2023 15:31:40 - WARNING - google.auth._default - Authentication failed using Compute Engine authentication due to unavailable metadata server.
Traceback (most recent call last):
  File "/Users/julianricci/anaconda3/envs/deid/bin/bert_deid", line 8, in <module>
    sys.exit(main())
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/main.py", line 131, in main
    download(args)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/main.py", line 121, in download
    download_model(args.model_dir)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/download.py", line 41, in download_model
    download_blob(bucket_name, f'bert-i2b2-2014/{fn}', f'{model_dir}/{fn}')
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/download.py", line 16, in download_blob
    storage_client = storage.Client()
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/storage/client.py", line 119, in __init__
    _http=_http,
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/client.py", line 318, in __init__
    _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/client.py", line 266, in __init__
    project = self._determine_default(project)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/client.py", line 285, in _determine_default
    return _determine_default_project(project)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/_helpers.py", line 186, in _determine_default_project
    _, project = google.auth.default()
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/auth/_default.py", line 488, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

How would I resolve this?
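
The traceback shows storage.Client() being created with no credentials, so google.auth.default() falls back to the Compute Engine metadata server, which does not exist on a laptop. A minimal sketch of two things that might help: a check of how GOOGLE_APPLICATION_CREDENTIALS is set, and an anonymous download helper. The function names credentials_status and download_blob_anonymously are hypothetical and not part of bert_deid. The anonymous client only works if the bucket allows public reads, which nothing in this thread confirms, so treat that part as an assumption.

```python
import os
from pathlib import Path


def credentials_status(env=None):
    """Report how GOOGLE_APPLICATION_CREDENTIALS is configured.

    Returns "unset", "missing-file", or "ok". Hypothetical helper,
    not part of bert_deid or google-auth.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    if not Path(path).is_file():
        return "missing-file"
    return "ok"


def download_blob_anonymously(bucket_name, source_blob_name, destination_file_name):
    """Download one object without any credentials.

    Assumption: the bucket is publicly readable. If it is not, this
    raises an error from the storage library instead of
    DefaultCredentialsError.
    """
    # Imported lazily so the status check above runs without the library.
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()
    blob = client.bucket(bucket_name).blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
```

If credentials_status() returns "unset", exporting GOOGLE_APPLICATION_CREDENTIALS with the path to a service account key file, as the error message suggests, is the documented route. The anonymous helper takes the same three arguments as the download_blob call in the traceback.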

@wayneisaacuy

Hi, I'm facing the same problem. Were you able to resolve it?

@landiisotta

It looks like bert-deid and other models have been uploaded to PhysioNet (https://www.physionet.org/content/transformer-deid/1.0.0/) and can also be found on Hugging Face (https://huggingface.co/KindLab).

@wayneisaacuy

Thanks for the heads up! I have seen those, but were you able to make it run? The files that have to be downloaded for this code to work are specified here: https://github.com/alistairewj/bert-deid/blob/master/bert_deid/download.py. They are:
files = [ 'added_tokens.json', 'config.json', 'label_set.bin', 'pytorch_model.bin', 'special_tokens_map.json', 'tokenizer_config.json', 'training_args.bin', 'vocab.txt' ]

I still have to check the code to see whether all of the files are needed.
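
If the files are obtained some other way (for example from PhysioNet), a quick check of which ones are present in the model directory could look like the sketch below. The file list is the one from download.py quoted above, and bert_deid_model is the directory name from the log output. The function name missing_model_files is hypothetical, and the check only looks at file names, not whether bert_deid actually needs each file.

```python
from pathlib import Path

# File names listed in bert_deid/download.py, as quoted in this thread.
EXPECTED_FILES = [
    "added_tokens.json",
    "config.json",
    "label_set.bin",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.txt",
]


def missing_model_files(model_dir="bert_deid_model"):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [fn for fn in EXPECTED_FILES if not (root / fn).is_file()]
```

An empty list means every expected file name is present. A directory that does not exist yields the full list.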

I also saw the GitHub code from KindLab. Do you know whether a demo notebook exists that shows how to use the pre-trained model? The documentation at https://github.com/kind-lab/transformer-deid only covers how to train the model, and the link for evaluation doesn't work.

I'm currently using another de-identification model but it'd be nice to compare. Thanks!

@landiisotta

@wayneisaacuy I was able to run inference on one example using the function deid_example in the predict module. The performance was not good enough, though, so the model probably needs further fine-tuning anyway. Another option is the Perl-based de-identification software package by Neamatullah et al. (2008) (https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-8-32), which was used to de-identify the MIMIC dataset. It is a rule-based approach that works quite well as is, but you need to add rules to adapt it to your corpus. It is publicly available on PhysioNet: (1) https://www.physionet.org/content/deid/1.1/ ; (2) https://www.physionet.org/content/deidentifiedmedicaltext/1.0/
