Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance: Python and C++ #72

Open
1 of 2 tasks
lexman opened this issue Oct 26, 2016 · 12 comments
Open
1 of 2 tasks

Improve performance: Python and C++ #72

lexman opened this issue Oct 26, 2016 · 12 comments
Milestone

Comments

@lexman
Copy link

lexman commented Oct 26, 2016

Hello,

in order to increase encoding performance, we'll to convert some part of the code to C/C++ in submodule.

We've already forked the project for working freely, but our purpose is to submit a pull request when we're done. That's why we want to share the design to make things right the first time.

According to our tests (in python 2), 40% of encoding occurs in the _geo_encode() function, and that's the first one we'll translate. We'll try to produce a drop-in
replacement for this function. This will allow to keep the python implementation for plateforms where C++ extensions can't be installed. We also want be compatible with both Python 2 and Python 3.

What do you think ? Do you have any suggestion or requirement ?

Alexandre

@flippmoke
Copy link

@lexman I am putting the final touches on https://github.com/mapbox/wagyu right now - which will be responsible for making all our geometry valid and supported for v1/v2. This is exciting because it is a header only library -- rather then having reliance on other large libraries. Which is really exciting because it will unlock me to start working on adding an encoder to https://github.com/mapbox/vector-tile.

I would highly recommend the python implementation to use these C++ libraries once they are available. This will help resolve #42 for this library as well.

@rmarianski
Copy link
Member

@lexman thanks for offering to do this work, and for letting us know in advance! We would appreciate a pr here, and here are my immediate thoughts:

  • the _geo_encode python function has not been optimized at all, and I'd expect that it can be sped up significantly with a relatively small amount of effort.
  • the major concern is that maintaining native code will be more effort than in python. I'd urge you to start with as small a pr as possible that can demonstrate an improvement, which will make it easier for us to review and feel comfortable about maintaining going forward. More involved work can be done in subsequent prs.

@flippmoke that looks interesting, and longer term I agree that it would be much better to converge on a shared implementation for this logic. @lexman for an initial pr, I'd rather that the scope be as limited as possible, and down the road we can consider replacing the validation logic that we currently have in place with wagyu.

@lexman
Copy link
Author

lexman commented Oct 28, 2016

Hello,

@rmarianski I perfectly understand your concerns about maintaining native code, and that's why I've made contact in the first place. If we go down that path, be sure we'll be willing to maintain the code for the 13 milion users of our french website http://mappy.com .

But for the moment I'm submiting a PR only for improving the python version of the code : simplier code will be easyer to translate. We'll see how it goes from there.

@flippmoke thanks for the tip ! We'll sure have a deep look on this promissing library...

@nvkelso nvkelso changed the title Encode in C++ Improve performance: Python and C++ Oct 28, 2016
@nvkelso nvkelso modified the milestone: v1.1.0 Dec 14, 2016
@nvkelso nvkelso added ready and removed in review labels Dec 15, 2016
@nvkelso nvkelso added nextnext and removed ready labels Jan 6, 2017
@nvkelso nvkelso removed the nextnext label Feb 14, 2017
@jalessio
Copy link
Contributor

@lexman what's the status of your performance improvements for this project? I looked around Mappy/mapbox-vector-tile but couldn't quite work out where things landed. Were you able to speed up any of the slower Python parts with your C++ code? If so, which branch should we be looking at? Any instructions you can provide? Thanks!

@jalessio
Copy link
Contributor

@nvkelso I was asking around at the Mapzen booth at SOTMUS and it was suggested that I ping you about this issue. I'm trying to sort out if anyone has made concrete progress on C-based performance improvements to this library.

I'm having trouble figuring out if anyone is really using the changes proposed by @lexman and the Mappy team. Do you have any info on that or other performance improvements that look promising? Many thanks!

@nvkelso
Copy link
Member

nvkelso commented Oct 26, 2017

@jalessio Hello again :) @rmarianski can speak to this when he's back in the office Friday.

@rmarianski
Copy link
Member

I can confirm that we are using the changes from #74 in production, which while had some improvements in performance, IIRC was more focused on preparing for a C based implementation. If there's been further improvements, we'd be happy to accept a pr for it. But at this time, we are using the v1.2.0 tag in production.

@flippmoke
Copy link

@rmarianski, at Mapbox are slowly working towards making some more generic pieces for making vector tiles. So that we can have individuals use smaller pieces that do exactly what is needed for encoding/decoding/creating vector tiles. If you would want to re-use any of our code or contribute both would be appreciated! Most of our code is being focused around using a specific internal C++ geometry structure (geometry.hpp). However, we are attempting to make more generic encoding/decoding libraries available as well. https://github.com/mapbox/vtzero

@rmarianski
Copy link
Member

@flippmoke thanks for the heads up. Are you planning on making a python wrapper? It would be interesting to look into what it would take to use vtzero as the implementation.

@flippmoke
Copy link

@rmarianski no we are not planning on making a python wrapper for it, but it should be really fast if it was wrapped in a python.

@lexman
Copy link
Author

lexman commented Feb 28, 2018

Hello @jalessio, sorry for this late answer but I've left the tiling team for a year now. At the time, we didn't go to the C++ implementation as advertised because the 25% performance we gained was enough at the time. This performance improvement is included version 1.1 and 1.2 that have been used in production at https://mappy.com/ for a year now and it works well.

If you need more speed and load improvements and if you are using a postgis database, I would strongly suggest to take a look at new Postgis function ST_AsMVT ( http://postgis.net/docs//ST_AsMVT.html ) that have been available since september (version 2.4). It directly encodes the vector tile in the database with a C library. Very efficent.

So it's clear now that we won't make the C++ implementation. Should I close the issue ?

@jalessio
Copy link
Contributor

jalessio commented Mar 3, 2018

@lexman thanks for the update!

I would like to use ST_AsMVT but we're using Postgres/Postgis on AWS RDS and as of now they don't have protobuf support on their instances :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants