Error: invalid signature when unzipping Geonames data file #26

Open
heffergm opened this issue Jan 9, 2016 · 10 comments · Fixed by #154

Comments

@heffergm

heffergm commented Jan 9, 2016

This occurs sporadically.

root@worker1:/mnt/pelias/logs# cat /mnt/pelias/logs/geonames_all.err
[Error: invalid signature: 0xc7cf971f]
_writableState.buffer is deprecated. Use _writableState.getBuffer() instead.
events.js:85
      throw er; // Unhandled 'error' event
            ^
Error: invalid signature: 0xc7cf971f
    at /mnt/pelias/pelias-geonames/releases/20160108001248/node_modules/geonames-stream/node_modules/unzip/lib/parse.js:63:13
    at processImmediate [as _immediateCallback] (timers.js:358:17)
@riordan riordan added the bug label Jan 9, 2016
@riordan riordan added this to the Who's on First milestone Jan 15, 2016
@orangejulius
Member

Since this ticket we've completely revamped the Geonames importer. This looks like a transient issue related to an invalid download file. If it happens again I will take another look.

@orangejulius
Member

Hello two years later: this issue has happened again, and it is a problem in the unzip utility we use. Fortunately, there is a replacement.

@orangejulius orangejulius removed this from the Who's on First milestone Dec 12, 2016
@orangejulius orangejulius self-assigned this Dec 12, 2016
@orangejulius
Member

This does not appear to be strictly an issue with the unzip NPM package. Something about the files downloaded to disk leaves them slightly corrupt, in a way that the unzip command-line program handles fine but the unzip NPM package does not. It's unclear whether our downloader causes the corruption or the Geonames server distributes the files in this state.
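
For anyone who wants a cheap sanity check on a downloaded file before handing it to the importer, here is a minimal sketch. It only tests whether the file begins with the zip local file header signature (the bytes `PK\x03\x04`), so it would catch gross failures such as a truncated download or a saved HTML error page, and probably not the subtler corruption described above. The helper name `startsWithZipSignature` is made up for illustration and is not part of this repository.

```javascript
const fs = require('fs');

// Every zip local file header begins with the bytes "PK\x03\x04",
// which read as the little-endian integer 0x04034b50.
const LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;

// Hypothetical helper (not from this repo): returns true when the file
// starts with a local file header signature. An empty archive starts
// with a different record, which this sketch does not handle.
function startsWithZipSignature(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(4);
    // Read the first four bytes of the file into the buffer.
    const bytesRead = fs.readSync(fd, header, 0, 4, 0);
    return bytesRead === 4 &&
      header.readUInt32LE(0) === LOCAL_FILE_HEADER_SIGNATURE;
  } finally {
    fs.closeSync(fd);
  }
}
```

Passing this check does not mean the archive is valid; it only rules out files that are obviously not zip archives.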

orangejulius added a commit that referenced this issue Dec 13, 2016
The downloader is now similar to the WOF downloader in that it uses
child_process.exec and curl to download data. This seems to be more
reliable than using request and piping to a file.

There were also issues with the progress bar package, so rather than
sort them out, this allows us to simply remove them.

This change probably will help fix #26,
but it's not 100% certain.
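
A rough sketch of what a curl-based download step could look like, for readers following along. This is not the code from the commit: the helper names are invented, and it uses `child_process.execFile` with an argument array rather than `child_process.exec`, purely to sidestep shell quoting. The URL and destination are whatever the caller passes in.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: builds the curl argument list for one download.
function buildCurlArgs(url, destination) {
  return [
    '--fail',        // exit non-zero on HTTP errors instead of saving the error page
    '--location',    // follow redirects
    '--silent',      // suppress the progress meter
    '--show-error',  // but still report errors on stderr
    '--output', destination,
    url
  ];
}

// Hypothetical helper: downloads `url` to `destination` and calls back
// with an error if curl could not be run or exited non-zero.
function download(url, destination, callback) {
  execFile('curl', buildCurlArgs(url, destination), (err, stdout, stderr) => {
    if (err) {
      return callback(new Error('curl failed: ' + (stderr || err.message)));
    }
    callback(null, destination);
  });
}
```

Using `--fail` matters here: without it, an HTTP error response body would be written to disk under the `.zip` name and only fail later at unzip time.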
@ghost ghost removed the in progress label Dec 13, 2016
@orangejulius
Member

This issue is still occurring in our builds, despite the attempts in #154 and #171 to solve or work around it. This has been happening periodically since the very creation of this repository (it is in fact a dupe of the very first issue in the repo).

We need to consider an alternate unzip method, such as a more robust command-line unzip, or some other solution.

@asdfasdafas

Does anyone know if there is a workaround for this? I'm encountering this issue currently, and I'm unable to complete the import.

@orangejulius
Member

Hey @asdfasdafas,
We have somewhat of a workaround, but it's not great. Since the problem is (we think) inherent in the zip file as published by Geonames, we get around it for Mapzen Search by caching old, valid zip files.

One possible alternative workaround would be to change our code to stop using the Node.js zip library and call the standard command-line unzip instead. This would require some reorganizing of the code in this importer, but if you were interested in taking a look at it I'd be happy to help point you in the right direction. We would gladly accept a PR that does that :)

@asdfasdafas

Ah I probably wouldn't be much help on the node.js code, but would you happen to know where I could download copies of the known-good geonames files?

@orangejulius
Member

orangejulius commented Nov 20, 2017

No worries. This is the one we have cached for Mapzen Search: https://s3.amazonaws.com/pelias-data/geonames/allCountries.zip

Its modification time is Nov 18, 2017 7:05:50 PM GMT-0500, so it's not TOO old.

@orangejulius
Member

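An update here: as it turns out, there is no correct way to stream a zip file without loading it into memory. This makes sense: the authoritative index of a zip archive (the central directory) sits at the end of the file, which is also why you can't pipe an archive into unzip on the command line.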

We have two options: switch to a library like yauzl, which implements a non-streaming API for reading zip files, or extract zip files after download to expose the underlying text file, which IS streamable.

My vote is for the second approach, since it would have the added benefit of removing code, whereas adjusting our existing code to use yauzl may be a bit of tedious work.

In either case, #297 is effectively a prerequisite.
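
To illustrate the second option, here is a minimal sketch of streaming an already-extracted text file line by line with Node's built-in `readline`. It is not the importer's code: `parseLine` and `streamRecords` are hypothetical names, and the column positions (id, name, latitude, longitude in columns 0, 1, 4, 5 of a tab-separated row) are an assumption about the Geonames dump layout that should be checked against the Geonames readme.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical parser. ASSUMPTION: rows are tab-separated with the id,
// name, latitude, and longitude in columns 0, 1, 4, and 5.
function parseLine(line) {
  const cols = line.split('\t');
  return {
    id: cols[0],
    name: cols[1],
    latitude: parseFloat(cols[4]),
    longitude: parseFloat(cols[5])
  };
}

// Hypothetical helper: reads the extracted text file one line at a time,
// so memory use stays flat regardless of file size.
function streamRecords(txtPath, onRecord, onDone) {
  const input = fs.createReadStream(txtPath, { encoding: 'utf8' });
  const rl = readline.createInterface({ input, crlfDelay: Infinity });

  // Make sure onDone fires exactly once, on error or on completion.
  let finished = false;
  const finish = (err) => {
    if (!finished) {
      finished = true;
      onDone(err || null);
    }
  };

  input.on('error', finish);
  rl.on('line', (line) => {
    if (line.length > 0) {
      onRecord(parseLine(line));
    }
  });
  rl.on('close', () => finish());
}
```

Because the input is a plain text file, no zip parsing happens inside Node at all, which is the part that has been failing.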

@orangejulius
Member

Update: a possible workaround here is to download the broken Geonames zip file, extract the data with unzip, and then re-compress it with zip. This seems to create archives that the importer can successfully read.
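
For anyone who wants to script that workaround, a minimal sketch follows. It assumes the Info-ZIP `unzip` and `zip` programs are on the PATH; the function names and the paths passed in are hypothetical, and this has not been run against a broken Geonames archive.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: describes the two commands of the workaround.
function buildRepackSteps(brokenZip, workDir, repackedZip) {
  const work = path.resolve(workDir);
  return [
    // -o overwrite existing files, -q quiet, -d extract into `work`
    { cmd: 'unzip', args: ['-o', '-q', path.resolve(brokenZip), '-d', work], cwd: process.cwd() },
    // -q quiet, -r recurse; run inside `work` so entry names stay relative
    { cmd: 'zip', args: ['-q', '-r', path.resolve(repackedZip), '.'], cwd: work }
  ];
}

// Hypothetical helper: extracts the downloaded archive and re-compresses it.
function repack(brokenZip, workDir, repackedZip) {
  // zip updates an existing archive in place, so start from a clean slate.
  fs.rmSync(path.resolve(repackedZip), { force: true });
  for (const step of buildRepackSteps(brokenZip, workDir, repackedZip)) {
    // execFileSync throws on any non-zero exit status. If unzip only
    // reports warnings for these files, this check may need relaxing.
    execFileSync(step.cmd, step.args, { cwd: step.cwd, stdio: 'inherit' });
  }
  return path.resolve(repackedZip);
}
```

A call such as `repack('allCountries.zip', 'extracted', 'allCountries.repacked.zip')` would leave the re-compressed archive next to the original; the importer would then need to be pointed at the new file.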

4 participants