Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a GeoJSON processing pipeline #32

Open
3 tasks
waldoj opened this issue May 11, 2017 · 9 comments
Open
3 tasks

Create a GeoJSON processing pipeline #32

waldoj opened this issue May 11, 2017 · 9 comments

Comments

@waldoj
Copy link
Owner

waldoj commented May 11, 2017

  • retrieve ZIP files
  • unzip them
  • convert each shapefile into GeoJSON
@waldoj
Copy link
Owner Author

waldoj commented May 12, 2017

This isn't working. The shapefiles use polygons, reasonably, so instead of having one file per point, we have one file per polygon. That's no good. It may be easier to go back to #31 and figure out how to get that projection working.

@waldoj
Copy link
Owner Author

waldoj commented May 13, 2017

I'm trying using SpatiaLite (the SQLite geodata extension), and loading the shapefile directly in there. The thinking is to then issue ~9M queries, one for each coordinate pair, creating a GeoJSON record for each. The catch is that each query for the metadata for a given point requires about 1.5 seconds, or 23 weeks of queries. This could be parallelized, but that hardly seems worth the trouble.

SpatiaLite supports spatial indices, but they're just not working for me. e.g.:

spatialite> SELECT CreateSpatialIndex('phz', 'Geometry');
CreateSpatialIndex() error: "no such table: geometry_columns"
0

There are no useful Google results when searching for information about this error. I've tried creating the table, but then it demands a particular columns, and presumably this would continue until I built an adequate table? Anyway, that doesn't seem like a functioning approach.

@waldoj
Copy link
Owner Author

waldoj commented May 13, 2017

I guess I could always go full AWS and use Postgres with PostGIS extensions, on RDS, to do this. It'd surely be faster. But I don't want to get that AWS lock-in.

@waldoj
Copy link
Owner Author

waldoj commented May 14, 2017

I'm trying out Postgres/PostGIS, using these data-loading instructions. That's gone fine so far (with the surprise requirement that I have to run CREATE EXTENSION postgis up front), but my efforts to perform a point-in-polygon query have so far failed.

@waldoj
Copy link
Owner Author

waldoj commented May 14, 2017

Ah-ha—this works:

SELECT gid, id, gridcode, zone
FROM conus
WHERE ST_Contains(
    geom, ST_Transform(
        ST_GeomFromText(
            'POINT(-78.2 38.5)', 4326)
        ,4269)
    )=true;

This is running 19–25 ms per query. That comes to 122 hours for 20 million queries, or just over 5 days, which seems pretty manageable.

Spot-checking a couple of queries, the geometry appears to be fine, presumably thanks to the 4326/4269 conversion.

I feel good about this.

@webmaven
Copy link

Any progress?

@waldoj
Copy link
Owner Author

waldoj commented Sep 27, 2017

No, this was going really badly, for reasons that I didn't document and cannot remember, though in my defense I was on some very powerful painkillers for the entire week, and it's kind of amazing that I wrote any useful code at all?

Coincidentally, I was working on this again last night, returning to where I was at with #31 before I closed it. I think the proper thing to do is to solicit help in adjusting the projection that doomed that approach, instead of moving to an entirely different approach. I'll have to set up some test data, document the problem, and tweet about it, in hopes that some clever geodata folks can propose a solution.

@webmaven
Copy link

I'll have to set up some test data, document the problem, and tweet about it, in hopes that some clever geodata folks can propose a solution.

Any progress?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants