
SPIKE: Address BCGW extract taking too long in the future #612

Closed
4 tasks
OlgaLiber2 opened this issue Apr 4, 2024 · 6 comments


OlgaLiber2 commented Apr 4, 2024

Describe the task
Long-term solution to FOM's performance issue.

Acceptance Criteria

  • Performance scales to handle at least 1 year of volume, ideally 3 years.
  • Determine how much vertex simplification (Douglas-Peucker) would improve performance.
  • Clarify the expected number of FOMs per year (so we can extrapolate the total number of vertices).
  • Assess splitting the API into one per feature type, and dropping the centroid calculation.

Additional context
The query of the spatial_feature view takes roughly twice as long in prod as in test for the same number of spatial feature rows, and real production data appears to have many times more spatial features per project than test.

Explore possible options:

  1. BCGW has 3 jobs, one per spatial feature type (cut block, road section, retention area), so split the API into 3, querying the tables directly instead of the view.
  2. Have a separate view for the BCGW extract that doesn't include the centroid calculation. Basil has confirmed that the centroid column is only used by the public interface to query the features of an individual FOM project (the centroid is used to place the spatial feature label on the screen when the object is clicked). The BCGW extract does not need the centroid.
  3. Cache the centroid calculation on the tables so the view doesn't have to calculate it.
  4. Cache the geoJSON calculation (coordinate unit transform) so the view doesn't have to calculate it.
  5. Have a DBA run an explain plan on the BCGW extract query.
  6. BCGW incremental load.

Performance metrics - test environment
Data volume:
~3,000 FOM projects
~14,000 cut blocks (4/FOM), ~3,000 retention areas (1/FOM), ~3,000 road sections (1/FOM)
941,494 vertices (313/FOM, 45/spatial feature; mostly from a small number of realistic test files)
(Measured with: select sum(ST_NPoints(geojson)) from app_fom.spatial_feature;)

Query response (doesn't include full data fetch time)
BCGW extract: ~17 seconds
Original view definition: ~8 seconds to return results
Without centroid: ~6 seconds

Performance metrics - prod environment
Data volume:
228 FOM projects
6680 cut blocks (29/FOM)
5583 retention areas (24/FOM)
5002 road sections (22 / FOM)
2,114,341 vertices (9273 / FOM, 122 / spatial feature).

Query response (doesn't include full data fetch time):
BCGW extract: ~40 seconds
Original view definition: ~12 seconds
Without centroid: ~9 seconds
Without centroid and without transform of geometry: ~5 seconds

Conclusions:

  • Query response time is largely a function of the size of the query result, which in turn is largely a function of the number of vertices.
  • We cannot extrapolate from the current number of FOM projects in prod; we need business feedback from Julius on whether we are on track for 1,000 in a year.
  • Assuming 1,000 FOMs/year, we can expect roughly 5x the volume of vertices in one year and 15x in three years. This would increase the BCGW extract time to ~200 seconds (3.5 min) in 1 year and ~600 seconds (10 min) in 3 years.
  • Splitting the view and removing the centroid will very likely be insufficient (maybe a 30% speedup).
  • An incremental pull by BCGW is likely required.
  • Investigate generalizing the input (what scale are we at when we measure geometry? how do we reduce the load?).
  • Can the Douglas-Peucker approach work for us? (https://stackoverflow.com/questions/1849928/how-to-intelligently-degrade-or-smooth-gis-data-simplifying-polygons)
  • Investigate whether we can simplify in JavaScript: https://mourner.github.io/simplify-js/
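For reference, the Douglas-Peucker algorithm mentioned above can be sketched in a few lines of Python. This is a hypothetical illustration of the technique only, not FOM code; in practice PostGIS (ST_SimplifyPreserveTopology) or simplify-js would do the work, and both guard against topology problems this naive sketch ignores:

```python
import math

def perpendicular_distance(pt, start, end):
    """Distance from pt to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Recursively drop points closer than `tolerance` to the chord."""
    if len(points) < 3:
        return list(points)
    # find the point farthest from the chord between the two endpoints
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        # everything between the endpoints is within tolerance: drop it all
        return [points[0], points[-1]]
    # otherwise keep the farthest point and recurse on both halves
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

# a wiggly, nearly straight line collapses to its endpoints at tolerance 0.1
print(douglas_peucker([(0, 0), (1, 0.05), (2, 0), (3, 0.05), (4, 0)], 0.1))
# → [(0, 0), (4, 0)]
```

With a 2.5m tolerance in a meter-based projection like BC Albers, any intermediate vertex within 2.5m of the simplified chord would be dropped the same way.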
@OlgaLiber2 OlgaLiber2 added the task label Apr 4, 2024
@OlgaLiber2 OlgaLiber2 changed the title Long term solution to FOM's performance issue Address BCGW extract taking too long in the future Apr 4, 2024
@gormless87 gormless87 changed the title Address BCGW extract taking too long in the future SPIKE: Address BCGW extract taking too long in the future Apr 23, 2024

basilv commented Apr 24, 2024

Information from Michelle: thanks to Annika for the quick response and intel from FAIB (David Cromarty!). They settled on the Douglas-Peucker algorithm with a tolerance of 2.5m, as it provides the best compromise between vertex removal, shape retention, and speed.


basilv commented May 10, 2024

Use ST_SimplifyPreserveTopology in PostGIS with tolerance 2.5 (geometries are stored in BC Albers, whose units are meters, so this is the 2.5m from Michelle's information above).

Question: will this make a difference?
Vertex count of all app_fom.spatial_feature geometries, with and without simplification:

fom=# select sum(ST_NPoints(geojson)) from app_fom.spatial_feature;
4139717
fom=# select sum(ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5))) from app_fom.spatial_feature;
129823

So the simplification yields roughly a 30x reduction in the size of the data.
This is the most beneficial optimization we can make, assuming it doesn't distort the data too much.
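As a quick sanity check on the vertex counts above (throwaway arithmetic, not project code):

```python
original_vertices = 4_139_717   # sum(ST_NPoints(geojson)), unsimplified
simplified_vertices = 129_823   # with ST_SimplifyPreserveTopology(geojson, 2.5)
ratio = original_vertices / simplified_vertices
print(round(ratio, 1))  # → 31.9, i.e. roughly a 30x reduction
```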

Option 1 (quick fix): create a new view for the BCGW extract, adapted from the existing spatial_feature view, that removes the centroid and applies the ST_SimplifyPreserveTopology function.

Option 2 (data fix): via a migration, apply the ST_SimplifyPreserveTopology function to all existing data, and add logic to apply it on submission upload as well. This is preferred, as it minimizes the data stored and helps the entire system, not just the BCGW extract.

Next steps:

  1. We need to test/demo that this ST_SimplifyPreserveTopology function does not distort the data too much. I suggest querying records in prod and copying them into test environment with the function applied to do a side-by-side comparison. Even better, create the original record as the initial submission and the simplified data as the final submission within the same FOM.

  2. We need Julius's approval for this.


basilv commented May 10, 2024

I have a sample view definition for option 1 (it would also need associated code changes to use the view). We might want to further modify the view to add the joins to the code tables currently done by the bcgw_extract JavaScript logic.

create view app_fom.spatial_feature_bcgw_extract as
select o.cut_block_id as feature_id, 'cut_block' as feature_type,
       p.project_id, p.forest_client_number, p.workflow_state_code,
       s.submission_type_code,
       o.create_timestamp,
       o.name,
       ST_AsGeoJson(ST_Transform(ST_SimplifyPreserveTopology(o.geometry, 2.5), 4326)) as geojson,
       o.planned_development_date, o.planned_area_ha, 0.0 as planned_length_km
  from app_fom.cut_block o
       inner join app_fom.submission s on o.submission_id = s.submission_id
       inner join app_fom.project p on s.project_id = p.project_id
union
select o.retention_area_id as feature_id, 'retention_area' as feature_type,
       p.project_id, p.forest_client_number, p.workflow_state_code,
       s.submission_type_code,
       o.create_timestamp,
       null as name,
       ST_AsGeoJson(ST_Transform(ST_SimplifyPreserveTopology(o.geometry, 2.5), 4326)) as geojson,
       null as planned_development_date, o.planned_area_ha, 0.0 as planned_length_km
  from app_fom.retention_area o
       inner join app_fom.submission s on o.submission_id = s.submission_id
       inner join app_fom.project p on s.project_id = p.project_id
union
select o.road_section_id as feature_id, 'road_section' as feature_type,
       p.project_id, p.forest_client_number, p.workflow_state_code,
       s.submission_type_code,
       o.create_timestamp,
       o.name,
       ST_AsGeoJson(ST_Transform(ST_SimplifyPreserveTopology(o.geometry, 2.5), 4326)) as geojson,
       o.planned_development_date, null as planned_area_ha, o.planned_length_km
  from app_fom.road_section o
       inner join app_fom.submission s on o.submission_id = s.submission_id
       inner join app_fom.project p on s.project_id = p.project_id


MCatherine1994 commented May 15, 2024

Steps to verify the difference in shape between original and simplified data, using FOM PROD data in the FOM TEST environment:

  • Run this query to get both the original geoJSON and the simplified geoJSON for cut blocks:
select *, ST_AsGeoJson(ST_Transform(ST_SimplifyPreserveTopology(geometry, 2.5), 4326)) as simplifygeojson, ST_AsGeoJson(ST_Transform(geometry, 4326)) as origingeojson from app_fom.cut_block where submission_id in (select submission_id from app_fom.submission where project_id=1356);

Please note: to run this query you'll need PostGIS installed, and you must run it outside the schema.

  • Format the geoJSON file to submit to FOM TEST, following the sample below for cut blocks. For example, the above query returns 3 records, so the geoJSON file will include three cut block Features:
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": 1,
            "properties": {
                "OBJECTID": 1,
                "FOM_OBJECT": "Cut Block",
                "GEOMETRY": "Polygon",
                "NAME": "<the name column from the cut_block table>",
                "DEV_DATE": "<the planned_development_date column from the cut_block table>",
                "SHAPE_Area": "<the planned_area_ha column from the cut_block table * 10000>"
            },
            "geometry": "<the JSON content of the simplifygeojson or origingeojson column>"
        }
    ]
}
  • Go to FOM TEST and submit a FOM, using the original spatial files. When we submit the spatial files, the screen should indicate this is a Proposed submission.
  • Go to the location tab and verify the shape is there. Write down the FOM id.
  • Connect to the FOM TEST db, update the commenting open date of the FOM we just created to today, and change the workflow_state_code to "COMMENT_CLOSED".
  • Go back to FOM TEST and submit the simplified spatial file. When we submit the spatial files, the screen should indicate this is a Final submission.
  • Go to the location tab and verify the shapes are there. Now we can compare the original shapes to the simplified shapes; zoom in to check details.

I connected to our prod database from local pgAdmin, as it's easier to copy the geoJSON data from there. I created a wiki page on how to connect to a remote OpenShift database from local pgAdmin: https://github.com/bcgov/nr-fom/wiki/Connect-Openshift-database-from-local-pgAdmin
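The formatting step above can be sketched in Python. This is a hypothetical helper, not part of the FOM codebase; `row` is assumed to be a dict keyed by the column names of the query above (name, planned_development_date, planned_area_ha, simplifygeojson):

```python
import json

def cut_block_feature(row, object_id):
    """Build one GeoJSON Feature for a cut block row (field names per the sample above)."""
    return {
        "type": "Feature",
        "id": object_id,
        "properties": {
            "OBJECTID": object_id,
            "FOM_OBJECT": "Cut Block",
            "GEOMETRY": "Polygon",
            "NAME": row["name"],
            "DEV_DATE": row["planned_development_date"],
            # planned_area_ha is in hectares; SHAPE_Area is in square meters
            "SHAPE_Area": row["planned_area_ha"] * 10000,
        },
        # simplifygeojson (or origingeojson) already holds GeoJSON text
        "geometry": json.loads(row["simplifygeojson"]),
    }

def feature_collection(rows):
    """Wrap one Feature per query row into a FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [cut_block_feature(r, i + 1) for i, r in enumerate(rows)],
    }
```

The resulting dict can be dumped with `json.dumps(...)` into the file submitted to FOM TEST.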

@MCatherine1994

Worked on two examples (prod FOM #1356 and #1513), with links to FOM-TEST. Please note that the "Proposed" shapes are the original ones and the "Final" shapes are the simplified ones. From the screenshot below we can see the difference is actually small.

[Screenshot: original vs simplified shapes side by side]


basilv commented May 16, 2024

Met with the business (Julius) to show this, and got his approval to proceed. He was fine with the 2.5m tolerance, and stated there are no requirements in the regulations regarding tolerance/precision.

I updated the follow-on ticket with his requests: add wording to the FOM UI describing what is being done, and provide wording about it to Julius for him to communicate with submitters.

Julius indicated that the likely reason for the high-vertex spatial objects is people walking the cut blocks/roads with GPS trackers.
