Error and termination when hitting an unavailable URL #9
Comments
Hi, thanks for the bug report. The issue is not trivial to fix because different languages are missing different indices. I would avoid a try ... except block because it might hide real issues, for example when a file that should be retrieved is not retrieved due to a poor connection. For the time being, you can pass indices to readline_google_store:
>>> from google_ngram_downloader import readline_google_store
>>> fname, url, records = next(readline_google_store(ngram_len=5, indices=['cd', 'ed'], lang='chi-sim'))
>>> fname
'googlebooks-chi-sim-all-5gram-20120701-cd.gz'
>>> url
'http://storage.googleapis.com/books/ngrams/books/googlebooks-chi-sim-all-5gram-20120701-cd.gz'
>>> next(records)
Record(ngram='CDP _NOUN_ _NOUN_ _NOUN_ _NOUN_', year=1983, match_count=1, volume_count=1)
It works this way; however, I have to make sure that I know all the indices, so in the long term it would still be handier if the script could check that itself (e.g. download the Google Ngram page and check whether it contains the links corresponding to the indices? That sounds a bit like overkill, though...)
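The check described above could look roughly like the following. This is only a sketch, not part of google_ngram_downloader: the helper name `available_indices` is hypothetical, the file-name pattern is inferred from the single URL shown earlier in this thread, and the `version` default is an assumption taken from that same URL. It takes the listing page's HTML as a string, so how the page is fetched is left open.

```python
import re


def available_indices(listing_html, lang, ngram_len, version="20120701"):
    """Return the set of indices that have a matching file link in the HTML.

    Hypothetical helper: the file-name layout is assumed from the URL
    'googlebooks-chi-sim-all-5gram-20120701-cd.gz' quoted in this thread.
    """
    pattern = re.compile(
        r"googlebooks-{lang}-all-{n}gram-{version}-([a-z0-9_]+)\.gz".format(
            lang=re.escape(lang),
            n=int(ngram_len),
            version=re.escape(version),
        )
    )
    # findall returns only the captured group, i.e. the index part of each name.
    return set(pattern.findall(listing_html))
```

The resulting set could then be passed on as the `indices` argument, so that combinations such as "bq" are never requested for languages that lack them.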
I'm very busy right now, but once I get time, I'll just copy the indices from the page.
Hi, there is a pending PR from my fork that solves this, but you can pip install it in the meantime:
pip install git+git://github.com/tianhuil/google-ngram-downloader.git@master
Background: For simplified Chinese, there is no "bq" combination, hence the downloader will quit with an error message when iterating through the data.
Suggestion: Wouldn't it be nicer if there were a try/except block around the data retrieval part, or if the assert were replaced by an if statement that prints an error message but allows jumping to the next file instead?
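One way to reconcile this suggestion with the concern about hiding real failures is to skip only files the server reports as missing and let every other error propagate. The sketch below is not the library's code: `iter_available` and its `opener` parameter are hypothetical names, and it assumes the server answers a request for a nonexistent file with HTTP 404.

```python
import urllib.error
import urllib.request


def iter_available(urls, opener=urllib.request.urlopen):
    """Yield (url, response) pairs, skipping only URLs that return HTTP 404.

    Hypothetical helper. `opener` is injectable so the logic can be
    exercised without network access.
    """
    for url in urls:
        try:
            response = opener(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                # Assumed to mean "this index does not exist for this language".
                print("skipping missing file: {}".format(url))
                continue
            # Any other HTTP status is treated as a real problem.
            raise
        yield url, response
```

Connection problems surface as `urllib.error.URLError` or other exceptions, which this loop does not catch, so a poor connection would still stop the download instead of silently dropping a file.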