After the interruption, download from beginning #15
Comments
Hi, there is a pending PR from my fork that solves this. In the meantime, you can install the fork with pip: > pip install git+git://github.com/tianhuil/google-ngram-downloader.git@master
@tianhuil If I install your fork above, will I be able to run
in a directory where I already have some of the length-3 ngrams downloaded? Or do I need to specify that I am using your version to get the behavior where downloads do not restart from the beginning?
In case anyone else happens upon this post: I wanted a way to stop the downloads and later continue where I had left off. This is useful when downloading any ngrams of length greater than 1, since they take many hours to download. The current implementation just restarts from the very first ngram file. If you update the
and re-run the command in an output directory where some ngrams have already been downloaded, it will continue at the next undownloaded ngram. Here is some sample output from a directory where I had downloaded the A, B and some of the C ngrams:
...continues downloading the rest of the ngrams beginning at co
I had downloaded some of the zip files when an error occurred partway through. When I restart the download, it starts again from the first file. Here is my temporary solution:
Inside the download function:
for fname, url, request in iter_google_store(ngram_len, verbose=verbose, lang=lang):
    # added: skip any file that already exists in the output directory
    if os.path.exists(str(output.join(fname))):
        print('already exist')
        continue
    else:
        with output.join(fname).open('wb') as f:
            print(output.join(fname), 'downloading...')
            for num, chunk in enumerate(request.iter_content(1024)):
                if verbose and not divmod(num, 1024)[1]:
                    sys.stderr.write('.')
                    sys.stderr.flush()
                f.write(chunk)
Maybe this has already been addressed, but is there a better solution? Thanks.
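One caveat with an existence check like the one above: a file that was only partly written before the interruption also exists, so it would be skipped on the next run. A minimal sketch of one way around this, assuming it is acceptable to re-download the interrupted file from scratch, is to write to a temporary name and rename only after the last chunk has been written. The function name `save_if_missing` and the `.part` suffix are hypothetical and not part of google-ngram-downloader.

```python
import os
from pathlib import Path
from typing import Iterable


def save_if_missing(target: Path, chunks: Iterable[bytes]) -> bool:
    """Write chunks to target unless target already exists.

    Returns True if the file was written, False if it was skipped.
    Hypothetical helper for illustration; not part of the library.
    """
    if target.exists():
        # Only completed downloads ever get the final name, so skip.
        return False

    # Write under a temporary name; an interrupted run leaves only
    # the ".part" file behind, never a truncated final file.
    partial = target.with_name(target.name + ".part")
    with partial.open("wb") as f:
        for chunk in chunks:
            f.write(chunk)

    # Rename once everything has been written.
    os.replace(partial, target)
    return True
```

In the loop above, this could presumably be called as `save_if_missing(Path(str(output.join(fname))), request.iter_content(1024))`, assuming `request` is a `requests` response as the snippet suggests. A leftover `.part` file from an interrupted run is simply overwritten on the next attempt.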