[WIP] initial draft of network weights #356
base: main
Conversation
Some thoughts about handling situations where the origin or destination dataframes contain polygons rather than points. Currently, I think the approach is to use the geometry centroid as the point geometry and snap that to the closest node of the network, but one can imagine other strategies that might be more appropriate in other cases.

It would be interesting to include a mechanism for specifying different strategies for assigning polygon data to the network. Perhaps a function that can be passed into the from_dataframe function, with a signature along the lines of:

    def assign_polygon_data_to_network(polygon, intersecting_nodes):
        """
        Parameters
        ----------
        polygon : the origin or destination polygon and its properties, as a DataFrame row
        intersecting_nodes : a list of network nodes that intersect the polygon

        Returns
        -------
        A DataFrame of the nodes to be used, each with an associated weight value
        """
        return nodes, weights

An implementation of this function for centroid assignment would simply find the closest intersecting node to the polygon centroid and return it with a weight of 1, while an implementation assigning equal weights to each of the intersecting nodes would return the list of intersecting_nodes with equal weights. This probably isn't quite the right way to do this, but hopefully it gets the idea across and can spark some discussion.
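As a minimal pure-Python sketch of the two strategies described in this comment (the function names are hypothetical, and plain tuples stand in for GeoDataFrame rows and network nodes):

```python
import math

def centroid_assignment(polygon_centroid, intersecting_nodes):
    """Assign all weight to the node nearest the polygon centroid.

    polygon_centroid: (x, y) tuple for the polygon's centroid.
    intersecting_nodes: list of (node_id, x, y) tuples.
    Returns a list of (node_id, weight) pairs.
    """
    nearest = min(
        intersecting_nodes,
        key=lambda n: math.hypot(n[1] - polygon_centroid[0],
                                 n[2] - polygon_centroid[1]),
    )
    return [(nearest[0], 1.0)]

def equal_assignment(polygon_centroid, intersecting_nodes):
    """Spread the polygon's weight equally over every intersecting node."""
    w = 1.0 / len(intersecting_nodes)
    return [(n[0], w) for n in intersecting_nodes]
```

Either function could then be passed into from_dataframe as the polygon-assignment strategy; other weighting schemes (e.g. area-proportional) would follow the same signature.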
Codecov Report
@@ Coverage Diff @@
## master #356 +/- ##
==========================================
- Coverage 81.40% 80.48% -0.92%
==========================================
Files 117 118 +1
Lines 11920 12075 +155
==========================================
+ Hits 9703 9719 +16
- Misses 2217 2356 +139
Continue to review full report at Codecov.
As @stuartlynn raises above, centroids are sketchy, so it will be cool to consider other options for attaching polygons to the network.

This is what I did my dissertation on (see …)

@jGaboardi that looks great!

The error looks to be specific to windows/fiona/py3.6, but otherwise I think this is basically passing.

Yes, and I think it is specific to the feedstock. I opened an issue over there last week.
I feel that the most important part here will be the integration of NetworkW with the rest of the ecosystem, i.e. a simple way of creating a pandana.Network from a LineString GeoDataFrame, an OSMnx-like networkx.Graph, a momepy-like networkx.Graph, a spaghetti.Network, and likely some other options.

Btw, do we know how the performance of pandana's shortest paths compares with networkx? I would guess it is much faster since it is vectorised C code, but I haven't found any numbers.
libpysal/weights/network.py
Outdated
origins["osm_ids"] = network.get_node_ids(
    origins.centroid.x, origins.centroid.y
).astype(int)
destinations["osm_ids"] = network.get_node_ids(
    destinations.centroid.x, destinations.centroid.y
).astype(int)
We could check if origins == destinations and copy osm_ids in that case. I guess this can be a costly operation, so it is worth skipping if we can.
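A minimal sketch of that guard (snap_to_nodes is a hypothetical helper; get_node_ids here is any callable mapping x/y columns to node ids, standing in for pandana's network.get_node_ids on centroids):

```python
import pandas as pd

def snap_to_nodes(origins, destinations, get_node_ids):
    """Snap origin/destination points to network nodes, skipping the
    second (costly) lookup when both frames are the same."""
    same = destinations is origins or origins.equals(destinations)
    origins = origins.copy()
    origins["osm_ids"] = get_node_ids(origins["x"], origins["y"])
    if same:
        # origins == destinations: reuse the snapped ids
        destinations = origins.copy()
    else:
        destinations = destinations.copy()
        destinations["osm_ids"] = get_node_ids(destinations["x"], destinations["y"])
    return origins, destinations
```

The `is` check is free; `equals` costs a full comparison, but that is still far cheaper than a second spatial-index lookup on a large frame.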
> do we know what is the performance of pandana's shortest paths vs networkx? I guess it is much faster since it is vectorised C code, but I haven't found it.

I think I can find it somewhere. It's not only that pandana is vectorized C, but it also uses contraction hierarchies instead of e.g. Dijkstra.
> We could check if origins == destinations and copy osm_ids in that case. I guess that this can be a costly operation, so it is worth skipping if we can.

That's a really good point. I originally wrote this for use with access, where origins and destinations aren't always the same, but I think in the context of a W they are, so we can skip the second call to get_node_ids.
origins["osm_ids"] = network.get_node_ids(
    origins.centroid.x, origins.centroid.y
).astype(int)
destinations["osm_ids"] = network.get_node_ids(
    destinations.centroid.x, destinations.centroid.y
).astype(int)
destinations['idx'] = destinations.index.values
index_dest = 'idx'

# I don't think there's a way to do this in parallel, so we can at least show a progress bar
You could have origins["osm_ids"] as a dask.Series and apply over it in parallel, but that may be overkill. Isn't network.shortest_path_lengths parallelised in pandana?
I've been wondering whether this is dask-able for massive datasets. Still very curious to see if it works and how performant it could be, so I may try out some explorations.
Partially. You will always need to fit the whole Network object in memory, but you can use dask to parallelize that for loop. I don't think it is possible to actually distribute this computation to a cluster, but on a single machine the parallel loop may help. One way to do that is shown here: https://urbangrammarai.github.io/spatial_signatures/measuring/morphometrics.html#generate-spatial-weights-w
return feeds


class NetworkW(W):
I know this PR is a WIP, but would it be possible to clean up the docstrings to conform with pysal/pysal#1174 once it is nearing non-WIP status?
definitely
I can also work on this once things are more concrete.
libpysal/weights/network.py
Outdated
adj = compute_travel_cost_adjlist(df, df, network, index_orig=ids, index_dest=ids)
if max_dist:
    adj = adj[adj['cost'] <= max_dist]
One thing I just realised about this: adj is a dense matrix. If you set max_dist, you still have to create that dense matrix first, which means that you can quickly run out of memory. Would it make sense to do this check within the loop in compute_travel_cost_adjlist? It may be less efficient performance-wise, but it would not blow up on memory.

I was just thinking about my data with hundreds of thousands of buildings for which I'd like to get a network W, realising this would probably kill the machine.

On a related note, it may be worth checking the option to generate the W on demand (for all constructors) to avoid these memory issues altogether.
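A sketch of what the in-loop filtering could look like (sparse_cost_adjlist and cost are hypothetical stand-ins for the PR's compute_travel_cost_adjlist and a pandana shortest-path call):

```python
def sparse_cost_adjlist(origin_ids, dest_ids, cost, max_dist=None):
    """Build a travel-cost adjacency list, dropping rows over max_dist
    inside the loop so the dense n*m matrix is never materialised."""
    rows = []
    for o in origin_ids:
        for d in dest_ids:
            c = cost(o, d)
            if max_dist is None or c <= max_dist:
                # only within-threshold pairs are ever stored
                rows.append((o, d, c))
    return rows
```

Peak memory is then proportional to the number of surviving pairs rather than n*m, at the cost of filtering row by row instead of in one vectorised pass.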
Yes, definitely. This strategy works for sparse graphs, but any dense graph will have approximately n^2 entries. No fun....

To make these things built on the fly for a NetworkW, we'd need to make many attributes cached properties. It's definitely doable in the same pattern as the rasterW GSOC project in #343
Ah, thanks, good point. I think moving that check up into the loop is probably the best way to go, but it's worth noting that I sketched out a different way of doing this that used pandana's nearest_pois functionality before they added the shortest_path method.

The difference between the two approaches has to do with how/when the data get filtered. When using nearest_pois, pandana searches out to the specified distance and returns all the nodes, and their distances, that fall within it. That means we have to set a parameter for the maximum total neighbors (which isn't terribly easy to estimate beforehand), and it's calculating pairwise distances for every single node within the distance threshold.

With shortest_paths, we're first filtering out nodes we don't care about, only calculating pairwise distances between the objects we've snapped to the OSM network (so, e.g., going up to census tracts changes the resolution dramatically), but then we have to filter by distance afterward. So in the first case we're including irrelevant nodes inside the distance threshold, and in the second case we're including irrelevant nodes outside the threshold. I'm also not sure how precomputation may factor into the difference.
The raison d'être for pandana is, more or less, to create spatial lag variables that can be used in urbansim's hedonic and location choice models, and it's designed explicitly to work at the types of scales you describe @martinfleis -- only now we have to figure out how to translate that same efficiency to generate the full W object instead of just the lag. I wonder if @fscottfoti or @smmaurer have any thoughts about how best to do this? I think it could be useful for pandana users generally, because once you have the W object generated from the pandana.Network, it lets you calculate different accessibility metrics or change the decay function/constant (e.g. using the generated travel matrix as input to access, which would address issues like this), and those could be incorporated into the UDST workflow as well.
libpysal/weights/network.py
Outdated
adj = compute_travel_cost_adjlist(df, df, network, index_orig=ids, index_dest=ids)
if max_dist:
    adj = adj[adj['cost'] <= max_dist]
> Yes, definitely. This strategy works for sparse graphs, but any dense graph will have approximately n^2 entries. No fun....
> To make these things built on the fly for a NetworkW, we'd need to make many attributes be cached properties. it's definitely doable in the same pattern as the rasterW GSOC project in #343

Thanks to some fantastic help from the UDST folks, the next version of pandana will expose the vectorized range query, so this should reduce to a few lines of really fast code :). I'll take another pass once pandana 0.7 is out!
This starts a new weights module that creates a neighbor relationship based on the shortest path through a pandana network.

Currently the weights are set to raw distance, so we need to include some inverse distance weighting functions, but this is a start for generating feedback.
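A sketch of what such an inverse distance weighting step could look like, working from a (origin, destination, cost) adjacency list (inverse_distance_weights is a hypothetical helper, not code from this PR):

```python
def inverse_distance_weights(adjlist, alpha=1.0, eps=1e-12):
    """Convert a (origin, destination, cost) adjacency list into
    row-standardised inverse-distance weights:
    w_ij = cost_ij**-alpha / sum_j cost_ij**-alpha.

    eps guards against division by zero for zero-cost pairs.
    """
    raw = {}
    for o, d, c in adjlist:
        raw.setdefault(o, []).append((d, 1.0 / (c + eps) ** alpha))
    weights = {}
    for o, pairs in raw.items():
        total = sum(w for _, w in pairs)
        # row-standardise so each origin's weights sum to 1
        weights[o] = [(d, w / total) for d, w in pairs]
    return weights
```

Different decay behaviour then only means swapping the 1/(c + eps)**alpha kernel, e.g. for an exponential decay, without touching the travel-cost computation.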