unable to fetch libpostal data #620

Open
dart-mtucker opened this issue Mar 13, 2023 · 3 comments

@dart-mtucker

We've been using libpostal for several years. Building from the v1.1 tag no longer fetches the libpostal data into the DATADIR location, and manual updates with libpostal_data fail as well.


My country is

U.S.


Here's how I'm using libpostal

Using libpostal with perl code to do address matching.


Here's what I did

Ubuntu 22.04 (x86_64)

$ apt-get update
$ apt-get -y install \
    autoconf automake curl git libtool pkg-config wget make
$ mkdir /data
$ cd /tmp
$ git clone --depth 1 --branch v1.1 https://github.com/openvenues/libpostal.git
$ cd libpostal 
$ ./bootstrap.sh
$ ./configure --datadir=/data
$ make -j4
$ make install 
$ ldconfig

Here's what I got

$ libpostal_data download all /data/libpostal
Checking for new libpostal data file...
libpostal data file up to date
Checking for new libpostal encoding="UTF-8"?>...
libpostal encoding="UTF-8"?> up to date
Checking for new libpostal encoding="UTF-8"?>...
libpostal encoding="UTF-8"?> up to date
$ du -sxh /data/libpostal
16K     /data/libpostal
$ ls -lh /data/libpostal
total 16K
-rw-r--r--. 1 root root  3 Mar 13 12:54 data_version
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated_language_classifier
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated_parser

Here's what I was expecting

In the past, the data dir would hold approximately 2GB of data, and running either the installer or libpostal_data would populate it. Running the libpostal_data script with bash -x produces the output below (excerpt): the request for the `latest` file returns an S3 AccessDenied error, and the script then treats that XML body as a version string. I get the same error message when requesting the URL in a web browser.

$ bash -x libpostal_data download all /data/libpostal
...
++ curl --silent https://libpostal.s3.amazonaws.com/models/address_parser/latest
+ latest_parser='<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>'
+ parser_s3_prefix='models/address_parser/<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>'
+ download_file /data/libpostal/last_updated_parser /data/libpostal 'models/address_parser/<?xml' 'version="1.0"' 'encoding="UTF-8"?>' '<Error><Code>AccessDenied</Code><Message>Access' 'Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>' parser.tar.gz 'parser data file' address_parser
+ updated_path=/data/libpostal/last_updated_parser
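
The trace above shows the XML error body being spliced into the S3 prefix as if it were a version. A guard along the following lines would make that failure visible instead of reporting "up to date". This is a sketch, not the script's actual code: `looks_like_version` and `fetch_latest` are hypothetical helpers, and it assumes a valid `latest` file contains a single token with no spaces or markup.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of libpostal_data.

# Succeeds if the response looks like a bare version token.
# Fails if it is empty, contains a space, or contains "<"
# (as the S3 <Error> document in the trace does).
looks_like_version() {
    case "$1" in
        "" | *"<"* | *" "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Fetch the "latest" pointer from URL $1 and print it if it looks valid.
# --fail makes curl exit non-zero on HTTP errors such as 403
# instead of printing the error body to stdout.
fetch_latest() {
    latest=$(curl --silent --fail "$1") || return 1
    looks_like_version "$latest" || return 1
    printf '%s\n' "$latest"
}

# Usage (URL taken from the trace above):
#   latest_parser=$(fetch_latest \
#       "https://libpostal.s3.amazonaws.com/models/address_parser/latest") \
#       || { echo "could not resolve latest parser version" >&2; exit 1; }
```

Checking both the curl exit status and the shape of the response covers the case where a server returns an error page with a 200 status.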

For parsing issues, please answer "yes" or "no" to all that apply.

N/A

Here's what I think could be improved

Correct the download URLs or fix the S3 access permissions so the data files are publicly readable again.

@hendursaga

I also cannot fetch the data. In the meantime, I'll try using https://github.com/Senzing/libpostal-data instead.

@brianmacy

Let me know how the Senzing model goes.

@hendursaga

I, unfortunately, cannot remember how it went.
