
Missing required option: region – in Google Cloud #70

Open
drunkinlove opened this issue Sep 1, 2020 · 6 comments

Comments

@drunkinlove

Hello!

I get the following error when trying to execute the create_data.py script in the Google Cloud Shell:

Traceback (most recent call last):
  File "reddit/create_data.py", line 347, in <module>
    run()
  File "reddit/create_data.py", line 285, in run
    p = beam.Pipeline(options=pipeline_options)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 203, in __init__
    'Pipeline has validations errors: \n' + '\n'.join(errors))
ValueError: Pipeline has validations errors:
Missing required option: region.

I'm using the latest version of apache-beam, 2.23.0.

@drunkinlove
Author

Fixed that by installing the necessary dependencies through requirements.txt.
This is the error I get now:

user@cloudshell:~/conversational-datasets (reddit-data-288210)$ python reddit/create_data.py \
>   --output_dir ${DATADIR?} \
>   --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
>   --runner DataflowRunner \
>   --temp_location ${DATADIR?}/temp \
>   --staging_location ${DATADIR?}/staging \
>   --project ${PROJECT?} \
>   --dataset_format JSON
********************************************************************************
Python 2 is deprecated. Upgrade to Python 3 as soon as possible.
See https://cloud.google.com/python/docs/python2-sunset
To suppress this warning, create an empty ~/.cloudshell/no-python-warning file.
The command will automatically proceed in  seconds or on any key.
********************************************************************************
WARNING: Logging before flag parsing goes to stderr.
I0902 11:45:25.874641 140704769283904 apiclient.py:464] Starting GCS upload to gs://reddit-data-bucket/reddit/20200902/staging/beamapp-user-0902114525-652460.1599047125.652742/pipeline.pb...
I0902 11:45:25.880270 140704769283904 transport.py:157] Attempting refresh to obtain initial access_token
Traceback (most recent call last):
  File "reddit/create_data.py", line 347, in <module>
    run()
  File "reddit/create_data.py", line 341, in run
    result = p.run()
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 390, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 403, in run
    return self.runner.run_pipeline(self)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 364, in run_pipeline
    self.dataflow_client.create_job(self.job), self)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/utils/retry.py", line 180, in wrapper
    return fun(*args, **kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 485, in create_job
    self.create_job_description(job)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 511, in create_job_description
    StringIO(job.proto_pipeline.SerializeToString()))
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 467, in stage_file
    response = self._storage_client.objects.Insert(request, upload=upload)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 971, in Insert
    download=download)
  File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/base_api.py", line 720, in _RunMethod
    http, http_request, **opts)
  File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 356, in MakeRequest
    max_retry_wait, total_wait_sec))
  File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 304, in HandleExceptionsAndRebuildHttpConnections
    raise retry_args.exc
httplib2.SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)

@AntoineSimoulin

Hi, I got the same error and solved it by updating httplib2 to the latest version. Regarding the requirements, I also updated to tensorflow==1.15.0, since version 1.14.0 gives me the following error: "No module named deprecation_wrapper".
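The fix above corresponds roughly to the following change in requirements.txt (a sketch; the thread only says "latest" for httplib2, so no exact version is pinned here):

```
tensorflow==1.15.0   # was 1.14.0; avoids "No module named deprecation_wrapper"
httplib2             # unpinned so pip installs the latest; fixes the SSL handshake error
```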

@AntoineSimoulin

Regarding the region, you can add the --region flag on the command line:

python reddit/create_data.py \
  --output_dir ${DATADIR?} \
  --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
  --runner DataflowRunner \
  --temp_location ${DATADIR?}/temp \
  --staging_location ${DATADIR?}/staging \
  --project ${PROJECT?} \
  --dataset_format JSON \
  --region us-east1
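For reference, the command above expects a handful of environment variables to be set beforehand. A sketch with hypothetical example values (the variable names come from the command itself; the actual values depend on your GCP project and bucket):

```shell
# Placeholder values -- substitute your own GCP project, BigQuery
# dataset/table, and GCS bucket path.
export PROJECT=my-gcp-project
export DATASET=reddit
export TABLE=comments_2020
export DATADIR=gs://my-bucket/reddit/20200902
```

Note that DATADIR must be a gs:// path, since Dataflow stages and writes through Cloud Storage.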

@amorisot

amorisot commented Oct 2, 2021

Hi, I got the same error and solve it by updating the httplib2 to the latest version. Regarding the requirements, I also updated tensorflow==1.15.0 since version 1.14.0 gives me the following error: "No module named deprecation_wrapper".

Hmmm, did you change anything in the requirements.txt file other than updating httplib2 to the newest version and tensorflow to 1.15.0? I did both of those things but am now getting a "No module named module_wrapper" error :(

@alu13

alu13 commented Feb 9, 2022

I was wondering if there was an update to the "No module named module_wrapper" error. Thanks!

@pygongnlp

Regarding the region, you can add the --region flag on the command line:

python reddit/create_data.py \
  --output_dir ${DATADIR?} \
  --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
  --runner DataflowRunner \
  --temp_location ${DATADIR?}/temp \
  --staging_location ${DATADIR?}/staging \
  --project ${PROJECT?} \
  --dataset_format JSON \
  --region us-east1

hi,

I tried your method, but I found it cannot sign in to Google, and apitools has been deprecated.

Is there another way to download the Reddit dataset? Thanks @AntoineSimoulin
