Spatial Index for improving performance #120

Open
seamusdu opened this issue Nov 29, 2016 · 8 comments

Comments

@seamusdu

seamusdu commented Nov 29, 2016

I am using this spatial framework through HiveContext in Spark, and it works. However, performance declines dramatically once I use a large dataset. I am trying to count points within polygons. Have you run any performance tests that might explain how the framework behaves at this scale? Also, have you considered adding a spatial index, which might improve the performance of spatial operations?

Thanks.

@randallwhitman
Contributor

If your polygon dataset fits in memory, build an in-memory quadtree index on the polygons using the Geometry API, adapting the MapReduce sample in GIS-Tools-for-Hadoop for Spark.
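
To illustrate the idea behind the suggestion above, here is a minimal, dependency-free sketch in Python. It is not the Geometry API's quadtree and not the GIS-Tools-for-Hadoop sample. Every name in it (`QuadTree`, `bbox`, `point_in_polygon`, `count_points_in_polygons`) is hypothetical, polygons are assumed to be simple rings given as vertex lists, and the default extent assumes longitude/latitude coordinates. The point is only to show why an index helps: each point is tested exactly against the few polygons whose bounding boxes contain it, rather than against every polygon.

```python
from collections import Counter


def bbox(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt, polygon):
    """Ray-casting test; points exactly on an edge get no special handling."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class QuadTree:
    """Hypothetical quadtree over polygon bounding boxes (illustration only)."""

    def __init__(self, extent, max_depth=8, depth=0):
        self.extent = extent
        self.max_depth = max_depth
        self.depth = depth
        self.items = []       # (polygon_id, bbox) pairs stored at this node
        self.children = None  # four child nodes, created on first use

    def _quadrants(self):
        xmin, ymin, xmax, ymax = self.extent
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]

    def insert(self, item_id, box):
        # Push the box down while it fits entirely inside one quadrant.
        if self.depth < self.max_depth:
            quads = self._quadrants()
            for i, q in enumerate(quads):
                if q[0] <= box[0] and q[1] <= box[1] and box[2] <= q[2] and box[3] <= q[3]:
                    if self.children is None:
                        self.children = [QuadTree(c, self.max_depth, self.depth + 1)
                                         for c in quads]
                    self.children[i].insert(item_id, box)
                    return
        # Otherwise the box spans several quadrants (or we hit max depth).
        self.items.append((item_id, box))

    def query(self, pt):
        """Yield ids of polygons whose bounding box contains the point."""
        x, y = pt
        for item_id, b in self.items:
            if b[0] <= x <= b[2] and b[1] <= y <= b[3]:
                yield item_id
        if self.children:
            for child in self.children:
                e = child.extent
                if e[0] <= x <= e[2] and e[1] <= y <= e[3]:
                    yield from child.query(pt)


def count_points_in_polygons(points, polygons, extent=(-180.0, -90.0, 180.0, 90.0)):
    """Count points per polygon index, using the tree to prune candidates."""
    tree = QuadTree(extent)
    for pid, poly in enumerate(polygons):
        tree.insert(pid, bbox(poly))
    counts = Counter()
    for pt in points:
        for pid in tree.query(pt):
            if point_in_polygon(pt, polygons[pid]):
                counts[pid] += 1
    return counts


polygons = [[(0, 0), (10, 0), (10, 10), (0, 10)],   # square
            [(20, 20), (30, 20), (25, 30)]]         # triangle
points = [(5, 5), (1, 9), (25, 22), (50, 50)]
result = count_points_in_polygons(points, polygons)
```

With the sample data, two points fall in the square, one in the triangle, and `(50, 50)` matches nothing. When adapting this to Spark, one plausible arrangement (an assumption, not something confirmed in this thread) is to build the index once from the polygon set and share it with the workers, so each partition of points only runs lookups.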

@seamusdu
Author

Hi @randallwhitman

Thanks for your reply. The sample using quadtree index does help and I will try to use the Geometry API for Spark.

@stevebuckingham

@seamusdu How did you find running the Spatial Framework on Spark in the end? It is an option I'm looking at at the moment.

@randallwhitman
Contributor

randallwhitman commented Apr 20, 2017

Cross-reference re Spark: #97 (works with JsonSerde as of v1.2)

@harryprince

@seamusdu I am doing the same thing, and I have wrapped the spatial join query with an index in the geospark R package.

@guillemfrancisco

Has anyone run a benchmark relating the number of points to the time it took to process them? Or even a comparison between Hive and MapReduce (with spatial indexing)?

@randallwhitman
Contributor

5 participants