Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proportional distances along defined kinds of ways #144

Closed
mpadge opened this issue Oct 13, 2020 · 16 comments
Closed

proportional distances along defined kinds of ways #144

mpadge opened this issue Oct 13, 2020 · 16 comments

Comments

@mpadge
Copy link
Member

mpadge commented Oct 13, 2020

Develop a new function modified from dodgr_distances to accept one additional parameter specifying a binary "flag" variable. Use this to aggregate two distances, rather than just one in current dodgr_distances, with the second variable only aggregated along edges with the flag variable set to TRUE.

Use cases:

Generate a sample of journeys between random points in a given area/city, and calculate things like:

  1. Average proportional distance that pedestrians are forced to walk along major motorways
  2. Average distance of journeys along streets named after men compared with streets named after women.

The latter would provide more insight into equalstreetnames (with more cities via github repo linked at top of that page). While those websites are really cool, it's also very obvious that streets named after women tend to be both peripheral and minor. To work out the exposure of actual people traversing a city to female versus male streets requires a realistic routing of random (or all) journeys through a city. That in turn could be achieved with this functionality.

ping @Robinlovelace @mem48 Any thoughts on the usefulness of this functionality for your work? Should be pretty easy to implement, if you think it would help.

@Robinlovelace
Copy link
Contributor

I think this would be useful for sure, great idea.

@mpadge
Copy link
Member Author

mpadge commented Oct 13, 2020

Another great use case: Take OD data and say you want to examine bicycle flows between all pairs. For each pair, this function would enable you to calculate the average length between all points in O and all in D that actually travels along dedicated bicycle paths (or whatever). So your OD flows would be able to be directly related to infrastructure connecting each pair.

@mpadge
Copy link
Member Author

mpadge commented Oct 13, 2020

Steps to implement this:

  1. Make a new DGraphDualEdge class which inherits from current DGraphEdge, but has an additional <bool> flag vector
    https://github.com/ATFutures/dodgr/blob/f59b049b23f0919f88ad0a466057dee991db4cbc/src/dgraph.h#L23-L28
  2. Adapt DGraph::addNewEdge to addNewDualEdge which also includes the flag vector
  3. Duplicate PathFinder::scan_edges as scan_dual_edges, and include vectors of <bool> flag and <double> d_flagged.
  4. Duplicate and modify all relevant bits in run_sp, including OneDist -> OneDistDual, and rcpp_get_sp_dists as rcpp_get_sp_dists_dual.
  5. Export rcpp_get_sp_dists_dual to R and write a wrapper for that function interface.

@mem48
Copy link

mem48 commented Oct 13, 2020

Definitely sounds useful, only thought is what if you wanted more than a binary breakdown, for example, % of the journey on a 20 mph road, 30 mph road, 40 mph road, etc

@mpadge
Copy link
Member Author

mpadge commented Oct 13, 2020

Yeah, @mem48, I had a think about that too - being able to submit a single factor column, and getting a breakdown for all specified factors. But the mechanics have to be propagated to the innermost depths of the underlying C++ Dijkstra algorithms, so even just doing it for binary is pretty non-trivial. Recorded here for posterity nonetheless, so thanks!

@mpadge
Copy link
Member Author

mpadge commented Sep 12, 2021

This is now almost fully implemented in a branch (#166). That will be merged when the TODO list has been completed - Done! Leaving vignette should be about as the only major remaining task.


Edit:

TODO

  • vignette
  • Option to return aggregate measure of relative proportions only, in order to allow really large jobs without having to hold all matrices in memory - each map-reduce job can then reduce down to a simple vector of category proportions prior to return.
  • Add extra parameter of distance threshold to override to and directly calculate proportions from each from point along all paths out to specified threshold
  • Refine the proportions_only argument to return individual proportions for each from value in that case.

mpadge added a commit that referenced this issue Sep 13, 2021
mpadge added a commit that referenced this issue Sep 13, 2021
add proportions_only parameter for #144
@mpadge mpadge pinned this issue Sep 14, 2021
@mpadge mpadge changed the title New function for proportional distances along defined kinds of ways proportional distances along defined kinds of ways Sep 14, 2021
@mpadge
Copy link
Member Author

mpadge commented Oct 4, 2021

@mem48 @Robinlovelace This function now works. It's called dodgr_dists_categorical, with a few types of interface implemented. The input graphs need to have an edge_type column with some number of discrete values (integers, characters, whatever). Distances are then aggregated along each of these categories, and final results given for each component. There's also a dlimit parameter which aggregates distances along all categories from a defined set of "from" points. The following code demonstrates that:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.2.10.11'
f <- "/<path>/<to>/leeds-sc.Rds"
graph <- readRDS (f) %>%
    weight_streetnet (wt_profile = "bicycle") %>%
    dodgr_contract_graph ()
#> Loading required namespace: geodist
#> Loading required namespace: dplyr

# Some processing of "highways" into a restricted set of `edge_type` categories:
graph$edge_type <- graph$highway
graph$edge_type [grep ("^(primary|trunk)", graph$edge_type)] <- "primary"
graph$edge_type [grep ("^secondary", graph$edge_type)] <- "secondary"
graph$edge_type [grep ("^tertiary", graph$edge_type)] <- "tertiary"
graph$edge_type [graph$edge_type == "path"] <- "pedestrian"
graph$edge_type [graph$edge_type == "living_street"] <- "residential"
not_these_types <- c ("service", "footway", "unclassified", "track",
                      "steps", "bridleway")
graph$edge_type [graph$edge_type %in% not_these_types] <- NA_character_
sort (table (graph$edge_type), decreasing = TRUE)
#> 
#> residential    tertiary  pedestrian     primary    cycleway   secondary 
#>      113813       23543       18702       16689        9979        5231

v <- dodgr_vertices (graph)
from <- v$id

system.time (
    d <- dodgr_dists_categorical (graph, from = from, dlimit = 2000)
)
#>     user   system  elapsed 
#> 2106.168    3.884  341.442
for (n in names (d) [-1])
    d [[n]] <- d [[n]] / d$distance
d$x <- v$x [match (rownames (d), v$id)]
d$y <- v$y [match (rownames (d), v$id)]

Created on 2021-10-04 by the reprex package (v2.0.1.9000)

All of Leeds takes just over 5 minutes, and the results look like this:

Proportion of all bicycle trips along dedicated cycleways

image

Proportion of all bicycle trips forced to traverse primary and secondary roads

image

Those maps provide an immediate overview of locations of relatively good and bad cycling infrastructure. Each point represents the proportion of bicycle-routed trips along all possible paths outward from that point that traverse the given kinds of ways out to an arbitrary threshold of 2km. Increasing that to a larger distance (like, say 5km) generates smoother results, but simply takes longer to calculate.

@Robinlovelace
Copy link
Contributor

Hi Mark, first impressions from that and the maps covering a huge area: seriously impressive stuff!

@mpadge
Copy link
Member Author

mpadge commented Oct 4, 2021

That's partly why this has taken so long - i knew if was possible to do this extremely efficiently, but required completely designing the path finding algorithms at the lowest level. I only recently found time to do that, and am very happy with the result. Every path from every OSM street junction in Leeds in 5 minutes.

@Robinlovelace
Copy link
Contributor

Follow-up question on reproducibility, what was the command used to generate leeds-sc.Rds? Will give it a spin in a free moment...

@Robinlovelace
Copy link
Contributor

Full reprex for my hometown of Hereford:

remotes::install_github("atfutures/dodgr")
library (dodgr)
packageVersion ("dodgr")
sc = dodgr::dodgr_streetnet("hereford UK")
graph = sc %>% 
  weight_streetnet (wt_profile = "bicycle") %>%
  dodgr_contract_graph ()
graph$edge_type <- graph$highway
graph$edge_type [grep ("^(primary|trunk)", graph$edge_type)] <- "primary"
graph$edge_type [grep ("^secondary", graph$edge_type)] <- "secondary"
graph$edge_type [grep ("^tertiary", graph$edge_type)] <- "tertiary"
graph$edge_type [graph$edge_type == "path"] <- "pedestrian"
graph$edge_type [graph$edge_type == "living_street"] <- "residential"
not_these_types <- c ("service", "footway", "unclassified", "track",
                      "steps", "bridleway")
graph$edge_type [graph$edge_type %in% not_these_types] <- NA_character_
sort (table (graph$edge_type), decreasing = TRUE)
#> 
#> residential    tertiary  pedestrian     primary    cycleway   secondary 
#>      113813       23543       18702       16689        9979        5231

v <- dodgr_vertices (graph)
from <- v$id

system.time (
  d <- dodgr_dists_categorical (graph, from = from, dlimit = 2000)
)
#>     user   system  elapsed 
#> 2106.168    3.884  341.442
for (n in names (d) [-1])
  d [[n]] <- d [[n]] / d$distance
d$x <- v$x [match (rownames (d), v$id)]
d$y <- v$y [match (rownames (d), v$id)]
head(d)
dsf = dodgr_to_sf(d)
dsf2 = sf::st_as_sf(d, coords = c("x", "y"), crs = 4326)

image

@mpadge
Copy link
Member Author

mpadge commented Oct 4, 2021

Sweet! If you know precisely what you want, it'll generally be quicker just to keep that edge type (say, "cycleway"), and set all others to NA, which are then ignored and not aggregated. All will be documented asap in vignette, which will close this issue.

@mem48
Copy link

mem48 commented Oct 4, 2021

As Robin said, this looks great. Can't wait to try it out.

@mpadge
Copy link
Member Author

mpadge commented Oct 5, 2021

Thanks @mem48, it was essentially all motivated by your incidental comment:

only thought is what if you wanted more than a binary breakdown, for example, % of the journey on a 20 mph road, 30 mph road, 40 mph road, etc

That got me thinking ... for a long time ... that such an approach must indeed be possible. Turns out that it is, so thanks!

@mem48
Copy link

mem48 commented Oct 5, 2021

I'll have to be careful what I say in the future, I might accidentally send you off on another long project.

@mpadge mpadge closed this as completed in 58a3a74 Oct 5, 2021
@mpadge
Copy link
Member Author

mpadge commented Oct 5, 2021

New vignette for you reading pleasure gentlemen! 📖 👀 Thanks both for the input 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants