This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Downloading the 95.7-hour Zeroth data with aws s3 fails: An error occurred (403) when calling the HeadObject operation: Forbidden #17

Open
whaozl opened this issue Jul 22, 2021 · 2 comments

whaozl commented Jul 22, 2021

  • 16 July 2018: 95.7 hours (46,347 utterances, 181 speakers, 27,330 uniq. sentences)

I used my own account to run aws s3 cp s3://zeroth-opensource/AUDIO_INFO AUDIO_INFO, but got the following error:

Traceback (most recent call last):
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/s3handler.py", line 173, in call
    for fileinfo in fileinfos:
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/fileinfobuilder.py", line 31, in call
    for file_base in files:
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/filegenerator.py", line 142, in call
    for src_path, extra_information in file_iterator:
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/filegenerator.py", line 318, in list_objects
    yield self._list_single_object(s3_path)
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/filegenerator.py", line 355, in _list_single_object
    response = self._client.head_object(**params)
  File "/home/kaldi/python3/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/kaldi/python3/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2021-07-22 16:40:56,020 - Thread-1 - awscli.customizations.s3.results - DEBUG - Shutdown request received in result processing thread, shutting down result thread.
Download from AWS is failed, check your credential and configure your aws CLI

Can you help me?

AUDIOINFO='AUDIO_INFO'
AUDIOLIST=$2
bucketname="zeroth-opensource"
# download audio info file
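# paths are quoted so that a $data directory containing spaces still works
if [ ! -f "$data/$AUDIOINFO" ]; then
    # test the exit status of aws directly instead of capturing $? via echo
    if ! aws s3 cp "s3://$bucketname/$AUDIOINFO" "$data/$AUDIOINFO"; then
        echo "Download from AWS is failed, check your credential and configure your aws CLI"
        exit 1
    fi
fi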

# download Audio
echo "Now download Audio ----------------------------------------------------"
for file in $AUDIOLIST
do
	echo "check if $file.tar.gz exist or not"
	if [ ! -f "$data/$file.tar.gz" ]; then
		aws s3 cp "s3://$bucketname/$file.tar.gz" "$data/$file.tar.gz"
	else
		echo "  $data/$file.tar.gz already exist"
	fi
done
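
The traceback above fails inside head_object, the lookup that aws s3 cp performs before copying a single object, yet the script prints the same "check your credential" message for every kind of failure. A minimal sketch of how the failure could be reported more precisely follows. The function names classify_aws_error and s3_cp_with_hint are hypothetical and not part of the repository, the matched strings are assumptions about typical aws CLI error text, and a 403 alone cannot tell a permission problem from other reasons for denial.

```shell
# Hypothetical helper (not part of the repository): map aws CLI error text
# to a coarse category. The patterns are assumptions about common messages.
classify_aws_error() {
    case "$1" in
        *"Unable to locate credentials"*)     echo "unconfigured" ;;
        *"(403)"*|*AccessDenied*)             echo "forbidden" ;;
        *"(404)"*|*NoSuchKey*|*NoSuchBucket*) echo "missing" ;;
        *)                                    echo "unknown" ;;
    esac
}

# Hypothetical wrapper: run "aws s3 cp" and print a hint when it fails.
s3_cp_with_hint() {
    local src="$1" dst="$2" err
    # Capture stderr only; normal progress output on stdout is discarded.
    if err=$(aws s3 cp "$src" "$dst" 2>&1 >/dev/null); then
        return 0
    fi
    echo "$err" >&2
    case "$(classify_aws_error "$err")" in
        unconfigured) echo "hint: no credentials found, run 'aws configure'" >&2 ;;
        forbidden)    echo "hint: access denied; these credentials may not read $src" >&2 ;;
        missing)      echo "hint: $src does not exist" >&2 ;;
        *)            echo "hint: unrecognized failure, see the output above" >&2 ;;
    esac
    return 1
}
```

In the script above, the first download could then be written as s3_cp_with_hint "s3://$bucketname/$AUDIOINFO" "$data/$AUDIOINFO" || exit 1.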

jty016 (Contributor) commented Jul 22, 2021

@whaozl
The 95.7-hour data has not been released to the public. For now, the public data is the 50-hour set at http://www.openslr.org/40/.
I may consider opening the larger set soon.
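
Since the public data is served over plain HTTP rather than S3, no AWS credentials are needed to fetch it. The sketch below keeps the "skip if the file is already there" behaviour of the script quoted earlier in this thread. The function names archive_name and fetch_if_missing are hypothetical, and the archive URL is left as a parameter because the exact file name has to be copied from the OpenSLR page.

```shell
# Hypothetical helper: derive the local file name from a download URL.
archive_name() {
    echo "${1##*/}"
}

# Hypothetical helper: download url into dir unless the file already exists.
fetch_if_missing() {
    local url="$1" dir="$2" target
    target="$dir/$(archive_name "$url")"
    if [ -f "$target" ]; then
        echo "$target already exists"
        return 0
    fi
    mkdir -p "$dir"
    # -f: fail on HTTP errors, -L: follow redirects.
    # Download to a temporary name so an interrupted transfer is never
    # mistaken for a complete archive on the next run.
    if curl -fL --retry 3 -o "$target.part" "$url"; then
        mv "$target.part" "$target"
    else
        rm -f "$target.part"
        echo "download of $url failed" >&2
        return 1
    fi
}
```

Usage would look like fetch_if_missing "$ARCHIVE_URL" "$data", where ARCHIVE_URL is the archive link taken from http://www.openslr.org/40/.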

@mrrostam

Hey @jty016, is the larger dataset with 95 hours of data open to the public now, or will it be in the near future? Interestingly, it seems that all significant Korean speech corpora are private or at least come with some unreasonable restrictions, like KoSpeech.
