
ENH: helper functions part 1 #465

Open · wants to merge 25 commits into main
Conversation

gregmaya
Contributor

@gregmaya gregmaya commented Feb 1, 2023

This is the correction that I really wanted to make before moving back to fixing the tri_like_junctions.
Apologies for any code clumsiness. I'm afraid some of my python skills might need refreshing.

@gregmaya gregmaya changed the title [ENH] helper functions part 1 ENH: helper functions part 1 Feb 2, 2023
@codecov

codecov bot commented Feb 2, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@50f29d3). Click here to learn what that means.
The diff coverage is 100.0%.

❗ Current head 62eab1f differs from pull request most recent head f1d67a2. Consider uploading reports for the commit f1d67a2 to get more accurate results

Impacted file tree graph

@@          Coverage Diff           @@
##             main    #465   +/-   ##
======================================
  Coverage        ?   96.1%           
======================================
  Files           ?      13           
  Lines           ?    2974           
  Branches        ?       0           
======================================
  Hits            ?    2857           
  Misses          ?     117           
  Partials        ?       0           
Impacted Files Coverage Δ
momepy/preprocessing.py 91.9% <100.0%> (ø)

@gregmaya gregmaya marked this pull request as ready for review February 2, 2023 15:10
@gregmaya
Contributor Author

gregmaya commented Feb 2, 2023

Happy to discuss the new processes!
But actually more excited to be able to use the new helper function to identify tri_like_junctions... :P

@martinfleis
Member

Could you summarise what this does? Need to know the goal to properly review it :).

@gregmaya
Contributor Author

gregmaya commented Feb 2, 2023

In short, all I did was add a few attributes to the polygons before deciding whether they belong to the overall roundabout group, i.e. _create_shape_index() and _count_edges_per_polygon().
Then I use those two functions within the overall process that selects the adjacent polygons.
For a polygon to be considered adjacent, it now has to:

  • be touching a circle (roundabout)
  • be smaller than its paired circle OR
  • have 3 forming edges AND
  • be less than 4x the size of the roundabout (this number came from testing, but I still expose it as one of the function's adjustable parameters)

This just creates a simpler (and arguably more robust) way of dealing with roads approaching roundabouts than the current test on area and Hausdorff distance.
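A minimal sketch of how those criteria could combine (the function name, the `n_edges` input, and the exact grouping of the OR/AND conditions are assumptions, not the actual momepy code):

```python
from shapely.geometry import Polygon


def is_adjacent(poly, roundabout, n_edges, area_factor=4.0):
    """Hypothetical reading of the criteria above: the polygon must touch
    the roundabout and either be smaller than it, or have exactly three
    forming edges while staying under area_factor times the roundabout's
    area."""
    if not poly.touches(roundabout):
        return False
    if poly.area < roundabout.area:
        return True
    return n_edges == 3 and poly.area < area_factor * roundabout.area


rab = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])    # stand-in "roundabout", area 4
small = Polygon([(2, 0), (3, 0), (3, 2), (2, 2)])  # touching, area 2
big = Polygon([(2, 0), (7, 0), (7, 2), (2, 2)])    # touching, area 10

print(is_adjacent(small, rab, n_edges=4))  # True: touching and smaller
print(is_adjacent(big, rab, n_edges=3))    # True: 3 edges and under 4x area
print(is_adjacent(big, rab, n_edges=4))    # False
```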

Now that I have had time to detach myself from this a little, I still see one problem with the overall roundabout_simplification() function: when two roundabouts share the same adjacent polygon, the outputs are pretty bad and their link gets broken. See image:
[image]

Overall this is an enhancement, but the next step should be to fix those cases (luckily, that doesn't occur very often).

@gregmaya gregmaya marked this pull request as draft February 8, 2023 11:29
@gregmaya
Contributor Author

gregmaya commented Feb 8, 2023

Hey @martinfleis @jGaboardi
I believe I need your help here.
I went ahead and updated the function to disregard the adjacent polygons that are shared amongst two (or more) roundabouts. Up to that point all seems to be running OK.
However, for a reason that I cannot understand, I'm now getting a TypeError: 'NoneType' object is not subscriptable in a for loop (lines 979 - 983):

lines = []
for i, row in incoming.iterrows():
    if row.dist_first_pt < row.dist_last_pt:
        lines.append(LineString([row.first_pt, row.center_pt]))
    else:
        lines.append(LineString([row.last_pt, row.center_pt]))

This is trying to add a line from the center point of the roundabout to the closest end of the (previously selected) incoming lines.
I've tried to debug this several times, and even when I step through individual iterations of the loop, the lines do get added to the lines list. But when I run it all at once, I get the error. Could it be something with the way shapely creates coordinates?
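A common cause of that exact error in this pattern is a row whose point column holds None (e.g. a nearest-point computation that returned nothing), which LineString() then tries to index. A hypothetical reproduction and guard, with made-up data:

```python
import pandas as pd
from shapely.geometry import LineString, Point

# Made-up frame mimicking the columns in the snippet above; the second
# row has a missing endpoint, which would break LineString() if used.
incoming = pd.DataFrame(
    {
        "first_pt": [Point(0, 0), None],
        "last_pt": [Point(2, 0), Point(3, 1)],
        "center_pt": [Point(1, 1), Point(1, 1)],
        "dist_first_pt": [1.0, 5.0],
        "dist_last_pt": [2.0, 1.0],
    }
)

lines = []
for _, row in incoming.iterrows():
    if row.first_pt is None or row.last_pt is None or row.center_pt is None:
        continue  # skip rows whose endpoint geometries failed to compute
    start = row.first_pt if row.dist_first_pt < row.dist_last_pt else row.last_pt
    lines.append(LineString([start, row.center_pt]))
```

Printing which rows get skipped (instead of silently continuing) would narrow down where the None comes from.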

Anyway, one thing gets fixed/improved and something else breaks!

@martinfleis
Member

hey sorry, I hope to get to check the code this week. if you can ensure CI is green in the meantime, that'd be fab!

@gregmaya
Contributor Author

gregmaya commented Feb 13, 2023

hey sorry, I hope to get to check the code this week. if you can ensure CI is green in the meantime, that'd be fab!

Sure, I'm on it !

@jGaboardi
Member

@gregmaya Also, you can take care of the formatting issues/failures either by installing pre-commit in your environment or by running the following on preprocessing.py:

$ isort preprocessing.py
$ flake8 preprocessing.py
$ black preprocessing.py

Either option will require you to install isort, flake8, and black.

@jGaboardi
Member

xref #396

@gregmaya gregmaya marked this pull request as ready for review February 13, 2023 16:01
@gregmaya
Contributor Author

Hopefully, by keeping the PR as 'draft' I wasn't spamming you in every single commit 🙈

@jGaboardi thanks for the advice. Formatting is definitely giving me a hard time lately.

@jGaboardi
Member

Hopefully, by keeping the PR as 'draft' I wasn't spamming you in every single commit

All good. I don't mind it at all.

@gregmaya
Contributor Author

OK, so before you go ahead and review this code: I kept thinking that the hausdorff_distance bit (lines 860-869) was sub-optimal.
I know it runs, but it's not the most efficient and it's hard to explain. I therefore went another route and have an enhancement that is faster (as it uses arrays), but before committing it I thought I'd ask if you prefer to have small changes at a time...

@martinfleis
Member

Generally yes, but if waiting for a review would block you, then go ahead and make the bigger change.

@gregmaya gregmaya marked this pull request as draft February 16, 2023 10:12
@gregmaya gregmaya marked this pull request as ready for review February 16, 2023 11:41
@gregmaya
Contributor Author

Hey, @martinfleis and @jGaboardi just flagging that this is ready for review.
Once approved, I'll use some of the functions here to speed up the resolution of 'tri-junctions' which was started in PR #396.

Also, are you guys thinking of including this as part of GSoC 2023? I saw NumFOCUS there but couldn't find momepy in their list.

@martinfleis
Member

Thanks! Will have a look.

Also, are you guys thinking to include this as part of GSOC2023

We haven't discussed this yet.


@martinfleis martinfleis left a comment

Thanks. A few notes in the code.

@@ -756,6 +757,57 @@ def _polygonize_ifnone(edges, polys):
return polys


def _create_shape_index(gdf):
Member

There has been some development around this in the meantime that led to dropping this specific formula for the shape index. I don't want to go into details now, but the mathematical derivation of this one was flawed.

We now use this one in the paper (in progress)

from esda import shape

polygons["circular_compactness"] = shape.minimum_bounding_circle_ratio(polygons)
polygons["circular_compactness_index"] = polygons["circular_compactness"] * area

I suggest trying the same, but the threshold values will obviously change.

For sure, we should use esda to measure the actual index to avoid calling pygeos directly here (we'll need to get rid of pygeos soon).
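For readers without esda at hand, the same ratio (polygon area over the area of its minimum bounding circle) can be sketched with plain shapely 2; the function name here is made up:

```python
import math

import shapely


def circular_compactness(poly):
    # Ratio of the polygon's area to the area of its minimum bounding
    # circle (a Reock-style compactness measure, 1.0 for a perfect circle).
    r = shapely.minimum_bounding_radius(poly)
    return poly.area / (math.pi * r * r)


square = shapely.box(0, 0, 2, 2)     # area 4, bounding radius sqrt(2)
print(circular_compactness(square))  # 4 / (2 * pi) ~= 0.6366
```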

Contributor Author

I just reviewed this, and the reality is that we only use Reock as a selection metric for roundabouts. By using shape.minimum_bounding_circle_ratio(polygons) we would be simplifying the way we get those numbers (which is great), but the actual values are the same.
Either I'm missing something, or we could just make those changes and forget about the extra step of creating the index, which in reality never gets used.

Comment on lines 789 to 791
polygons_gdf["geometry"] = polygons_gdf["geometry"].apply(
lambda x: make_valid(x) if not x.is_valid else x
)
Member

This can be done as a vectorised call; make_valid is now available as GeoSeries.make_valid.

Comment on lines 863 to 874
max_dists = []
for i, row in rab_adj.iterrows():
    xs, ys = row.geometry.exterior.coords.xy
    poly = np.array(list(map(lambda x, y: (x, y), xs, ys)))

    cx, cy = row.savedgeom.centroid.coords.xy
    c_point = np.array(list(map(lambda x, y: (x, y), cx, cy)))

    # list of distances between centroid and each vertex of polygon
    ds = map(np.linalg.norm, c_point - poly)
    d_max = max(ds)
    max_dists.append(d_max)
Member

is this just a max distance between a centroid and a node? If so, you can avoid the loop and depend on shapely here for sure.

Contributor Author

All I could think of was this...

max_dists = []
for _, row in rab_adj.iterrows():
    dists = [
        row.savedgeom_right.centroid.distance(Point(x, y))
        for x, y in row.geometry.exterior.coords
    ]
    max_dists.append(max(dists))
rab_adj["max_dist"] = max_dists

but you probably had something else in mind.
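One fully vectorised option, assuming shapely 2: GEOS's discrete Hausdorff distance between a polygon's exterior ring and a single point is exactly the maximum vertex-to-point distance. The sketch below uses each polygon's own centroid as a stand-in for the savedgeom centroid column:

```python
import numpy as np

import shapely

# Hypothetical stand-ins for the rab_adj geometries.
polys = np.array([shapely.box(0, 0, 2, 2), shapely.box(0, 0, 4, 2)])
centres = shapely.centroid(polys)

# The whole loop collapses to one element-wise call: for a ring vs a
# point, the discrete Hausdorff distance is the farthest vertex distance.
max_dists = shapely.hausdorff_distance(shapely.get_exterior_ring(polys), centres)
print(max_dists)  # approximately [sqrt(2), sqrt(5)]
```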

@gregmaya
Contributor Author

Thanks, Martin.
I've seen your comments and hope to be back on this soon.

@gregmaya
Contributor Author

I tried to implement most of the suggestions. However, I must say that I couldn't figure out a way to avoid the for loop when calculating the max distance to all vertices of the adjacent polygons. I did change it to use the distance() function, but couldn't figure out how to avoid the loop.
As for the failed checks, I believe those are mostly related to esda. I'm pretty sure it needs to be included in the different environments. Are you happy to include that?

@gregmaya
Contributor Author

@martinfleis @jGaboardi just raising some awareness of this as I believe the updated functions work as long as esda is incorporated into the test environments. Thanks

@martinfleis
Member

I've added esda to CI in 7548af0. Let's see how it goes, I haven't tested it locally.

@gregmaya
Contributor Author

Hey! Any idea why the pre-commit checks remain 'pending'?
Is there anything I can do?

@martinfleis martinfleis reopened this Mar 30, 2023
@martinfleis
Member

@gregmaya I managed to make it work :)

Member

@martinfleis martinfleis left a comment

Can you make sure the tests pass?

Counts the number of linestrings that formed the polygons
"""
# Check if the polygons are valid geometries and fix them if not
polygons_gdf["geometry"] = polygons_gdf.geometry.make_valid()
Member

This will fail for older geopandas (which we still formally support).

Also, we should call make_valid only if there are actually some invalid polygons.
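A version-agnostic sketch of that guard, working on the underlying shapely 2 geometry array rather than GeoSeries.make_valid (the example geometries are made up):

```python
import numpy as np

import shapely

# A valid square plus a self-intersecting "bowtie" polygon.
geoms = np.array(
    [
        shapely.box(0, 0, 1, 1),
        shapely.Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]),
    ]
)

invalid = ~shapely.is_valid(geoms)
if invalid.any():  # only touch the geometries that actually need fixing
    geoms[invalid] = shapely.make_valid(geoms[invalid])
```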

area_threshold_val = gdf.area.quantile(area_threshold)
rab = rab[rab[area_col] < area_threshold_val]
if area_threshold_val > 2000:
Member

Any reason why 2000? Should this be controllable via a keyword?

@martinfleis
Member

@gregmaya can you update the code to depend on shapely 2 instead of pygeos? We've just merged that elsewhere. There are also some to-dos for when shapely 2 is available that can be tackled now.
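For reference, most pygeos calls map one-to-one onto shapely 2's vectorised top-level functions, so the migration is largely mechanical; a tiny illustration with made-up geometries:

```python
import numpy as np

import shapely

# pygeos.area(geoms) becomes shapely.area(geoms); the same pattern holds
# for get_coordinates, centroid, and most other element-wise functions.
geoms = np.array([shapely.box(0, 0, 1, 1), shapely.box(0, 0, 2, 2)])
areas = shapely.area(geoms)  # vectorised, no pygeos import needed
print(areas)  # [1. 4.]
```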

@gregmaya
Contributor Author

gregmaya commented Apr 8, 2023

Oh wow!
Yeah I'll try to find some time this upcoming week. Thanks for the heads up!
