-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPIKE: Address BCGW extract taking too long in the future #612
Comments
Information from Michelle: thanks to Annika on the quick response and intel from FAIB -- David Cromarty! -- They settled on the Douglas-Peucker algorithm with a tolerance of 2.5m, as it provide the best compromise for vertex removal, shape retention, and speed |
Use ST_SimplifyPreserveTopology in postgis with tolerance 2.5 (since geometrics are stored in BCAlbers which is units of meters, so using 2.5m as per Michelle's information above). Question - will this make a difference. fom=# select sum(ST_NPoints(geojson)) from app_fom.spatial_feature; So roughly a 30x reduction in the size of the data Option 1 quick fix - create a new view used by the BCGW extract adapted from existing spatial_feature view to remove centroid and use ST_SimplifyPreserveTopology function. Option 2 do data fix (via migration) to change all existing data to apply ST_SimplifyPreserveTopology function, and add logic on upload of submission to apply this function as well. This is preferred as it minimizes data stored and helps the entire system out, not just the BCGW extract. Next steps:
|
I have a sample view definition for option 1 (would also need associated code changes to use the view). Might want to further modify the view to add the joins being done to code tables by the bcgw_extract javascript logic create view app_fom.spatial_feature_bcgw_extract as |
Steps to verify the difference of shapes for original data vs simplified data use FOM PROD data in FOM TEST environment.
Please note: to run this query, you'll need to have POSTGIS installed, and run it outside the schema
I connected our prod database from local pgAdmin, as it's easier to copy the geoJson data from pgAdmin. Created a wiki page for how to connect remote Openshift database from local: https://github.com/bcgov/nr-fom/wiki/Connect-Openshift-database-from-local-pgAdmin |
Worked on two examples (prod FOM #1356 and #1513 ), links to FOM-TEST. Please note that the "Proposed" shapes are original ones, "Final" are simplified ones. From the screenshot below we can see the difference is actually small.
|
Met with business - Julius - to show this and got his approval to proceed. He was fine with the 2.5m tolerance, and stated there are no requirements in the regulations regarding tolerance/precision. I updated the follow-on ticket with his requests to add wording to the FOM UI describing what is being done and to provide wording regarding this to Julius for him to communicate with submitters. Julius indicated that the likely reason for the high-vertice spatial objects is having people walking the cutblocks/roads using GPS trackers. |
Describe the task
Long term solution to FOM's performance issue.
Acceptance Criteria
Additional context
the query of spatial_feature view is taking roughly twice as long in prod as test for the same # of spatial feature rows, and it appears that real production data has many more times the # of spatial features per project compared to test.
Explore possible options:
0. BCGW has 3 jobs for each spatial feature type (cut block, road section, retention area) so split the API into 3, querying the tables directly instead of the view.
Performance metrics - test environment
Data volume:
~3000 FOM projects
~14000 cut blocks (4 / FOM), ~3000 retention areas (1 / FOM), ~3000 road sections (1 / FOM)
941,494 vertices (313/FOM, mostly from some small number of realistic test files, 45/spatial feature)
(Use query select sum(ST_NPoints(geojson)) from app_fom.spatial_feature;)
Query response (doesn't include full data fetch time)
BCGW extract: ~17 seconds
Original view definition: ~8 seconds to return results
Without centroid: ~6 seconds
Performance metrics - prod environment
Data volume:
228 FOM projects
6680 cut blocks (29/FOM)
5583 retention areas (24/FOM)
5002 road sections (22 / FOM)
2,114,341 vertices (9273 / FOM, 122 / spatial feature).
Query response (doesn't include full data fetch time):
BCGW extract: ~40 seconds
Original view definition: ~12 seconds
Without centroid: ~9 seconds
Without centroid and without transform of geometry: ~5 seconds
Conclusions:
The text was updated successfully, but these errors were encountered: