Feature Request: Support for left joining sf objects, preserving multiple geometry columns #2337

Open
AarshBatra opened this issue Feb 12, 2024 · 0 comments
Labels
feature a feature request or enhancement

@AarshBatra

Have you ever needed two geometry columns in the same sf object?

Suppose the geometries initially reside in separate sf objects (say sf_1 and sf_2), and the goal is to left_join them (an attribute join, not a spatial join) on a common key (e.g. uid). The result should be a new sf object (sf_12) with two geometry columns. For example, in sf_12 you may want to calculate the row-wise distance between the first geometry (a village centroid, geometry type POINT) and the second geometry (the irrigation canal nearest to that village, geometry type LINESTRING).

However, attempting to perform this operation as follows:

library(sf)
library(dplyr)

# read in the village centroids (GeoPackage, POINT geometries)
sf_1 <- st_read("path/to/sf1.gpkg")

# read in the irrigation canals (GeoPackage, LINESTRING geometries)
sf_2 <- st_read("path/to/sf2.gpkg")

# join sf_1 and sf_2 by "uid", a non-geometry column that uniquely identifies each row in both objects,
# then calculate the row-wise distance from each village centroid to its nearest irrigation canal
sf_12 <- sf_1 %>%
  left_join(sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

Results in this error:

Error: y should not have class sf; for spatial joins, use st_join.

sf's left_join() method assumes that joining two sf objects is meant to be a spatial join, which makes sense in general, but it is not the behavior we want for the use case above. Ideally, sf_12 would be an sf object with two geometry columns (one from sf_1 and one from sf_2) plus all the non-geometry columns of sf_2.
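A minimal reproducible example of the error, using small hypothetical toy data in place of the GeoPackage files (the coordinates, the `canal` column and its values are made up for illustration):

```r
library(sf)
library(dplyr)

# Hypothetical toy data standing in for the two GeoPackages:
# two village centroids (POINT) and one canal per village (LINESTRING)
sf_1 <- st_sf(
  uid = c(1, 2),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(5, 0)))
)
sf_2 <- st_sf(
  uid = c(1, 2),
  canal = c("A", "B"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 1), c(1, 1))),
    st_linestring(rbind(c(5, 3), c(6, 3)))
  )
)

# Attempt an attribute join between two sf objects and capture the error message
err <- tryCatch(
  left_join(sf_1, sf_2, by = "uid"),
  error = function(e) conditionMessage(e)
)
print(err)
```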

As a workaround, coercing each sf object to a data frame before the left join and then converting the result back to an sf object works. Here is the code:

# coerce both sf objects to data frames, join them, then convert the result back to sf
# so that the row-wise distances can be calculated
sf_12 <- sf_1 %>%
  as.data.frame() %>%
  left_join(sf_2 %>% as.data.frame(), by = "uid") %>%
  st_as_sf() %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

However, this may not be the most efficient approach, or maybe I am missing something. Is there a better way to handle this in sf?

If not, here is a proposed solution:

Perhaps a dedicated function in the sf package for left joining two sf objects while retaining both geometry columns, something like:

# Hypothetical function to perform left join with multiple geometry columns (from above example)
sf_12 <- sf_left_join_multi_geom(sf_1, sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
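A rough sketch of what such a helper could look like, assuming it simply wraps dplyr's left_join(): since sf's left_join() method only rejects a `y` of class sf, coercing `y` alone to a data frame should be enough, and `x`'s geometry stays the active one. `sf_left_join_multi_geom()` is the hypothetical name from the proposal above, not an existing sf function, and the toy data is made up for illustration.

```r
library(sf)
library(dplyr)

# Hypothetical helper sketching the proposed sf_left_join_multi_geom();
# it is NOT an existing sf function. x keeps its active geometry, and y's
# geometry is carried along as an ordinary sfc column.
sf_left_join_multi_geom <- function(x, y, by, suffix = c(".x", ".y")) {
  stopifnot(inherits(x, "sf"), inherits(y, "sf"))
  # sf's left_join() method rejects a y of class sf, so drop that class;
  # as.data.frame() keeps y's geometry as an sfc list column
  left_join(x, as.data.frame(y), by = by, suffix = suffix)
}

# Hypothetical toy data: two village centroids and one canal per village
sf_1 <- st_sf(
  uid = c(1, 2),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(5, 0)))
)
sf_2 <- st_sf(
  uid = c(1, 2),
  canal = c("A", "B"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 1), c(1, 1))),
    st_linestring(rbind(c(5, 3), c(6, 3)))
  )
)

# Join, then compute the row-wise point-to-linestring distance
sf_12 <- sf_left_join_multi_geom(sf_1, sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

print(sf_12$dist_g1_g2)
```

When both inputs name their geometry column `geometry`, the usual `.x`/`.y` suffixes apply, so the result carries `geometry.x` (active) and `geometry.y`, the same names used in the examples above.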

Happy to hear thoughts!

Thanks,
Aarsh

edzer added the feature label (a feature request or enhancement) on Apr 13, 2024