Skip to content

Refactor: spaghetti v2.0

eli knaap edited this page Aug 9, 2021 · 6 revisions

PLEASE ADD TO THIS DOCUMENT AS NECESSARY


Initial ideas (James)

Intro:

The pysal submodule spaghetti originated from network before pandas had been considered as a dependency in pysal and prior to the reorganization that transformed the project from a single monolithic package into a group of confederated submodules linked with libpysal. Through the new structure submodule maintainers are given the freedom to add external dependencies, a practice that was previously discouraged.

Potential inclusions as dependencies include:

Keys:

  • maintain support for native pysal objects/geometries while including the option for serious performance improvements made possible through external libraries
    • Improved support for PySAL geometries in v1.4.1
  • simplify and generally refactor code base
  • strengthen ties to libpysal where possible to ensure DRY programming

2020-01-31 thoughts (James)

  • store all network segments (one per record) (pandas.DataFrame)
  • network segment endpoints as graph nodes
  • do not record degree-2 vertices (check Okabe naming)
    • network retains original spatial structure (libpysal.cg.Chain / shapely.geometry.LineString)
    • graph expresses only
      1. relationship between nodes; and
      2. distance between nodes (sure to non-euclidean nature of network (segments?)
  • snapping should be available to libpysal.cg.Chain
  • snapping can be a lookup by nearest endpoint (dissertation code: snap_to_nearest
  • quadrat search for intersections (dissertation code: get_intersections)
  • must include node insertion for snapping/network voronoi
  • ek: the data structure you describe above is basically what pandana uses, and it also implements network node snapping and shortest pathfinding via contraction hierarchies rather than dijkstra (which is what give it such outstanding performance). Might we want to just build on those structures? Once 167 is merged, it should also make the library more flexible to work with

2021-08-09 thoughts (James)

  • Giving this another push see issue here

  • See notes from PySAL meeting here

  • Thinking a potential solution will :

    • rely on shapely (pygeos) geometries through GeoPandas objects
      • spatial queries .sindex.nearest() (currently being worked out?)
    • represent networks as two GeoDataFrame objects (nodes
    • include components of...
      • momepy --> networkx
        • graph generation
        • gdf --> nx (see here)
        • nx --> gdf (see here)
      • networkx
        • connected components
        • connectivity stats, etc.
        • shortest paths (see here)
        • spanning trees (see here)
        • etc.
      • snkit
        • clean implementation (see here)
      • pyrosm and/or osmnx
        • fetching data when not provided? -- probably best to leave this separate?
      • pandana
  • Challenges

    • Associating observations with the network is probably the important aspect of spaghetti
      • The current implementation treats observations (PointPattern) separately than networks nodes. Since the observations are not network nodes, we can't use the traditional implementation of the dijkstra, which causes a HUGE time and memory leak.
    • Licensing
  • Conceptual workflow

    • Data Input
      • streets/paths (lines) geodataframe or path {.shp, .gpkg+layer, etc.}
      • observations (points) geodataframe or path {.shp, .gpkg+layer, etc.}
    • Preprocess
      • remove non-articulation points from lines data (degree-2 nodes) (see momepy here and snkit here)
      • allocate observations to line segments...
        • actual vs. abstract
          • if observations are actual (fire station) generate single connector to nearest location along nearest segment
          • if observations are abstract (census block centroid) generate n connector(s) to nearest n endpoints
        • In both actual and abstract scenarios the observations become nodes in the network and connectors become edges. This allows for using dijkstra to create cost matrices between only observations in the network (which is currently not the case in spaghetti), and will allow for multi-source dijkstra (see 374) and network Voronoi diagrams (see 292)
        • This also means the degree-2 nodes can exist in the network/graph when the nearest location along the network is a degree-1 node (where the connector edge has to be retained and labeled a special network segment).
    • Create Network Topology
      • store as an object with nodes and edges attributes as geodataframes?
      • contiguity from PySAL W?
      • ...
    • Generate Observation-Observation cost matrices
    • Perform Network Analysis
    • etc...