This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

Cannot Connect To Local TPU-VM #323

Open
nikhilanayak opened this issue Feb 25, 2022 · 1 comment
Labels
bug Something isn't working.

Comments

@nikhilanayak

Describe the bug
When I try to connect to the TPU to finetune, it gives me this error:

Traceback (most recent call last):
  File "main.py", line 257, in <module>
    main(args)
  File "main.py", line 251, in main
    estimator.train(input_fn=partial(input_fn, global_step=current_step, eval=False), max_steps=params["train_steps"])
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3110, in train
    rendezvous.raise_errors()
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3100, in train
    return super(TPUEstimator, self).train(
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 346, in train
    hooks.extend(self._convert_train_steps_to_hooks(steps, max_steps))
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2973, in _convert_train_steps_to_hooks
    if ctx.is_running_on_cpu():
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 531, in is_running_on_cpu
    self._validate_tpu_configuration()
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 699, in _validate_tpu_configuration
    num_cores = self.num_cores
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 429, in num_cores
    metadata = self._get_tpu_system_metadata()
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 333, in _get_tpu_system_metadata
    tpu_system_metadata_lib._query_tpu_system_metadata(
  File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow/python/tpu/tpu_system_metadata.py", line 135, in _query_tpu_system_metadata
    raise RuntimeError(
RuntimeError: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, -3188567715276368833)]
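
The "Available devices" list in the error above contains only a CPU entry, which is why no TPU cores are found. As a rough diagnostic sketch (not part of this codebase), the snippet below counts device types from TensorFlow-style device names. The helper names `count_device_types`, `has_tpu_cores`, and `visible_device_names` are hypothetical; the last one assumes a TensorFlow 2.x install where `tf.config.list_logical_devices()` is available, and it does not attempt to fix the empty master address.

```python
import re

# Matches the device part of TensorFlow-style names, e.g.
# "/job:localhost/replica:0/task:0/device:CPU:0".
_DEVICE_RE = re.compile(r"/device:([A-Z_]+):(\d+)")


def count_device_types(device_names):
    """Count devices per type (CPU, TPU, ...) from a list of device names."""
    counts = {}
    for name in device_names:
        match = _DEVICE_RE.search(name)
        if match:
            kind = match.group(1)
            counts[kind] = counts.get(kind, 0) + 1
    return counts


def has_tpu_cores(device_names):
    """Return True if at least one name is a TPU core (TPU_SYSTEM does not count)."""
    return count_device_types(device_names).get("TPU", 0) > 0


def visible_device_names():
    """List the device names TensorFlow can see (assumes TensorFlow 2.x)."""
    import tensorflow as tf  # imported lazily so the helpers above work without it
    return [device.name for device in tf.config.list_logical_devices()]


if __name__ == "__main__":
    # Device name copied from the error message in this issue.
    reported = ["/job:localhost/replica:0/task:0/device:CPU:0"]
    print(count_device_types(reported))
    print(has_tpu_cores(reported))

    try:
        live = visible_device_names()
    except ImportError:
        print("TensorFlow is not installed; skipping the live check.")
    else:
        print(count_device_types(live))
```

If the live check on the TPU VM also shows only CPU devices, the process is not seeing the TPU at all, which is consistent with the empty master address in the error.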

To Reproduce
Steps to reproduce the behavior:
I followed the finetuning instructions on this GitHub page.

Expected behavior
The finetuning program should finetune on my dataset without errors.

Proposed solution
N/A

Environment (please complete the following information):

  • TPU Version: v2-alpha
  • TPU Type: v3-8
  • Architecture: TPU-VM
nikhilanayak added the "bug" label on Feb 25, 2022
@StellaAthena
Member

This codebase is not actively maintained and was created before TPU VMs existed. You’ll probably have to figure it out yourself, unfortunately.

You may want to check out Mesh Transformer JAX, which is a more actively maintained JAX-based TPU framework.
