
integrate_network returning floats instead of ints #64

Open
knaaptime opened this issue Jul 22, 2020 · 10 comments


@knaaptime
Contributor

knaaptime commented Jul 22, 2020

I'm unable to create a Pandana network object after processing OSM and GTFS data with UrbanAccess. I'm using this OSM network and this GTFS feed. If I use either of those data sources directly, I can successfully instantiate a pandana.Network.

When I create a multimodal network with `integrate_network`, it completes successfully:


```
Loaded UrbanAccess network components comprised of:
     Transit: 2,415 nodes and 8,385 edges;
     OSM: 486,514 nodes and 742,113 edges
Connector edges between the OSM and transit network nodes successfully completed. Took 1.16 seconds
Edge and node tables formatted for Pandana with integer node ids: id_int, to_int, and from_int. Took 3.15 seconds
Network edge and node network integration completed successfully resulting in a total of 488,929 nodes and 755,328 edges:
     Transit: 2,415 nodes 8,385 edges;
     OSM: 486,514 nodes 742,113 edges; and
     OSM/Transit connector: 4,830 edges.
<urbanaccess.network.urbanaccess_network at 0x7fb8d761df50>
```

However, if I try to create a pdna.Network from the integrated data, I get:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-2640ece2a378> in <module>
      4                                urbanaccess_net.net_edges["to_int"],
      5                                urbanaccess_net.net_edges[["weight"]],
----> 6                                twoway=False)

~/anaconda3/envs/healthacc/lib/python3.7/site-packages/pandana/network.py in __init__(self, node_x, node_y, edge_from, edge_to, edge_weights, twoway)
    101                                                           .astype('double')
    102                                                           .values,
--> 103                             twoway)
    104 
    105         self._twoway = twoway

src/cyaccess.pyx in pandana.cyaccess.cyaccess.__cinit__()

ValueError: Buffer dtype mismatch, expected 'long' but got 'double'
```

Looking closer at the ua_network.net_edges object, I can see that the two columns to_int and from_int are actually floats, though looking at the code I can't see why that would be the case. I'm guessing that's the underlying reason I can't build a network, since the docs seem to indicate Pandana needs integers in the from/to columns (though it also seems to work OK with strings if I build a network exclusively from the GTFS data). Curious if you have any insight.
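A minimal pandas sketch of the dtype behavior suspected here, assuming the floats come from missing values (not yet confirmed at this point in the thread): one NaN in an integer column makes pandas store the whole column as `float64`, which matches Pandana's `expected 'long' but got 'double'` error. The ids and the `lookup` dict are made-up values for illustration only.

```python
import pandas as pd

# Integer node ids, the kind Pandana expects for edge_from / edge_to.
node_ids = pd.Series([10, 20, 30], dtype="int64")

# Illustrative remapping table that is missing an entry for node 30.
lookup = {10: 0, 20: 1}

# Series.map returns NaN for keys absent from the dict; NaN cannot be
# stored in an int64 column, so pandas upcasts the result to float64.
remapped = node_ids.map(lookup)

print(remapped.dtype)         # float64
print(remapped.isna().sum())  # 1
```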

I could post the whole notebook if it's useful.

  • Operating system: macOS
  • Python version: 3.7
  • UrbanAccess version: 0.2.0 (albeit with a small local fix for the as_matrix issue)
@sablanchard
Contributor

Hi @knaaptime, thanks for the issue. Yes, a full notebook that reproduces the data you are using and the full workflow would be helpful so we can try to replicate the issue.

@knaaptime
Contributor Author

https://gist.github.com/knaaptime/3d195ce2e23a05aed3af603f75d40168

@smmaurer
Member

Hi @knaaptime, could you share a link to the GTFS zip file? I think we need that to run the notebook.

Your diagnosis sounds right to me. I guess the next step is to confirm that Pandana can create the network if the floats are cast to ints, and then track down why this is happening on the UrbanAccess side.
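For the first step suggested here, the float columns can be cast back to integers before they are passed as `edge_from`/`edge_to` to `pdna.Network`, but only if the cast is lossless. A sketch with a hypothetical helper `to_int_ids` (not part of UrbanAccess or Pandana) that refuses to cast missing or non-integral values, since silently filling NaNs would hide the upstream bug:

```python
import numpy as np
import pandas as pd


def to_int_ids(col: pd.Series) -> pd.Series:
    """Hypothetical helper: cast a float node-id column to int64 only if lossless."""
    values = col.to_numpy(dtype="float64")
    # NaN (missing ids) or inf cannot be represented as int64.
    if not np.isfinite(values).all():
        raise ValueError(f"{int((~np.isfinite(values)).sum())} node ids are missing or infinite")
    # Values like 3.5 would be truncated by astype, so reject them.
    if not (values == np.floor(values)).all():
        raise ValueError("node ids contain non-integral values")
    return col.astype("int64")


# Example usage on an illustrative edges table with float id columns.
edges = pd.DataFrame({"from_int": [0.0, 1.0], "to_int": [1.0, 2.0], "weight": [3.5, 2.0]})
edges["from_int"] = to_int_ids(edges["from_int"])
edges["to_int"] = to_int_ids(edges["to_int"])
print(edges.dtypes)
```

If the cast succeeds and Pandana then builds the network, that confirms the dtype is the only blocker; if it raises, the NaNs point back to a join problem upstream.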

@knaaptime
Contributor Author

Oops, sorry about that. Links updated.

@knaaptime
Contributor Author

OK, so the issue was that I was using Pandana's Network.from_hdf5 method to read from an existing file, then passing network.nodes_df to the UrbanAccess OSM network creator. The problem was that id was still set as the nodes_df index rather than being available as a column on the dataframe, which generated NaNs during the call to integrate_network and upcast to_int/from_int to floats.
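One generic way this kind of NaN can appear, sketched with made-up frames (an assumption for illustration, not UrbanAccess's actual internals): if one node table carries `id` only as its index while another has an `id` column, combining them with `pd.concat(..., ignore_index=True)` fills the missing column with NaN and upcasts it to float.

```python
import pandas as pd

# Illustrative OSM nodes shaped like Pandana's nodes_df: "id" is the index, not a column.
osm_nodes = pd.DataFrame(
    {"x": [-122.27, -122.26], "y": [37.80, 37.81]},
    index=pd.Index([101, 102], name="id"),
)

# Illustrative transit nodes that do carry an explicit integer "id" column.
transit_nodes = pd.DataFrame(
    {"id": [9001, 9002], "x": [-122.25, -122.24], "y": [37.82, 37.83]}
)

# ignore_index=True discards the OSM index, so those ids are lost entirely:
# the "id" column is NaN for the OSM rows and the column becomes float64.
combined = pd.concat([osm_nodes, transit_nodes], ignore_index=True)
print(combined["id"].isna().sum())  # 2
print(combined["id"].dtype)         # float64
```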

I can go ahead and close this, though it was tough to track down. Should I do a PR to add an explicit check for the required columns or something?

@knaaptime
Contributor Author

Maybe I spoke too soon. I think this line needs a .reset_index().

@knaaptime
Contributor Author

I think id needs to be both the index and a column on nodes_df.
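A small pandas sketch of the difference between the two fixes discussed here, on an illustrative `nodes_df` with `id` as its index: `reset_index()` moves `id` into a column but replaces the index with a fresh 0..n-1 range, while copying the index into a column keeps both, which is the shape described as needed.

```python
import pandas as pd

# Illustrative nodes_df in the shape Pandana returns: node ids only in the index.
nodes_df = pd.DataFrame(
    {"x": [-122.27, -122.26, -122.25], "y": [37.80, 37.81, 37.82]},
    index=pd.Index([101, 102, 103], name="id"),
)

# Option 1: reset_index() turns "id" into a column, but the index becomes 0..n-1.
reset = nodes_df.reset_index()
print("id" in reset.columns, list(reset.index))  # True [0, 1, 2]

# Option 2: copy the index into an "id" column; index and column now match.
both = nodes_df.copy()
both["id"] = both.index
print((both["id"].to_numpy() == both.index.to_numpy()).all())  # True
```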

@smmaurer
Member

smmaurer commented Aug 3, 2020

I wish this stuff was better documented in Pandana. I might have a helpful code example, though.

Last week I made a new Pandana demo notebook, and in Section 1 there's an example of what a typical network.nodes_df looks like, and then how to build a new network directly from its columns. Maybe we can compare this to what's happening in UrbanAccess.

https://github.com/UDST/pandana/blob/master/examples/Pandana-demo.ipynb

@smmaurer
Member

Hi, finally had a chance to look at this more closely.

To summarize what's happening: running ua.integrate_network() with these data files appears to work, but it actually creates an edges table where from_int and to_int are incorrect and sometimes missing. The missing values mean they end up as floats rather than ints, which causes a Pandana error when trying to load the integrated network.

I think we need @sablanchard's eyes on this. @knaaptime reports that it might have to do with the id format in this OSM data, but comparing that data to the Pandana demo material, it looks completely standard.

@sablanchard, here's a single zip file with everything you need (the notebook, which I've updated a bit, plus the data files). Environment info is at the top of the notebook. urbanaccess-issue.zip

I'm thinking we should move ahead with the release as-is, rather than waiting for a fix here. It will be no problem to put out subsequent updates.

@knaaptime
Contributor Author

Thanks for looking into this. I think it comes down to the way integrate_network expects the input dfs to be formatted. I can fix the issue by inserting this line before cell 9 in the linked gist:

```python
osm_network.nodes_df['id'] = osm_network.nodes_df.index
```

It's not enough to reset the index (which makes the 'id' column available on the df). Instead, both the index and the id column need to be identical and formatted as ints. osmnet returns data in this format, but if you have a network you've already used with Pandana sitting around (as in my case), you may have the index but not the column. I think the easiest way to ensure this would be to check for the necessary columns/index on the input to integrate_network, and I could add that check if it's of interest.
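A sketch of the kind of input check proposed here, written as a hypothetical standalone function (`check_osm_nodes` is not an existing UrbanAccess API). It assumes the requirements exactly as stated in this comment: an integer `id` column identical to an integer index, the format osmnet produces.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def check_osm_nodes(nodes_df: pd.DataFrame) -> None:
    """Hypothetical pre-check for the OSM nodes passed to integrate_network.

    Raises ValueError unless nodes_df has an integer "id" column that
    holds the same values as its integer index.
    """
    if "id" not in nodes_df.columns:
        raise ValueError(
            "nodes_df has no 'id' column; if the ids are in the index, "
            "try nodes_df['id'] = nodes_df.index"
        )
    if not is_integer_dtype(nodes_df["id"]):
        raise ValueError(f"'id' column must be integer, got {nodes_df['id'].dtype}")
    if not is_integer_dtype(nodes_df.index):
        raise ValueError(f"index must be integer, got {nodes_df.index.dtype}")
    if not (nodes_df.index.to_numpy() == nodes_df["id"].to_numpy()).all():
        raise ValueError("index and 'id' column must hold the same node ids")


# Usage: a Pandana-style nodes_df is rejected until the index is copied into a column.
nodes_df = pd.DataFrame(
    {"x": [-122.27, -122.26], "y": [37.80, 37.81]},
    index=pd.Index([101, 102], name="id"),
)
try:
    check_osm_nodes(nodes_df)
except ValueError as err:
    print("rejected:", err)
nodes_df["id"] = nodes_df.index
check_osm_nodes(nodes_df)  # passes silently
```

Failing early with a message that names the fix is the point of the design: the original symptom only surfaced much later, as a dtype error inside Pandana's Cython layer.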

Btw, thanks for the recent dev pushes and the new example notebook. I think the new shortest path stuff is fantastic; it makes it really easy to 1) create a PySAL spatial weights object based on network distance and 2) integrate the UDST stack with PySAL's new access module (which goes a long way toward addressing this and this). I have some new demos that are just about ready to share, so I'll ping you when I post them.
