Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use EPSG codes instead of proj4strings #426

Open
albhasan opened this issue Sep 23, 2023 · 1 comment
Open

Use EPSG codes instead of proj4strings #426

albhasan opened this issue Sep 23, 2023 · 1 comment

Comments

@albhasan
Copy link
Contributor

How could the content be improved?

Good evening,

the Data Carpentry Geospatial Curriculum Advisory Committee (CAC) has adviced to reference coordinate systems by their EPSG codes instead of using proj4strings (see #368). Some of the lesson' data (downloaded using /episodes/setup.R) doesn't meet this advice.

Regarding this issue, I found, after run the code below, that:

  • Some raster files are missing EPSG codes.
  • Some rasters are compatible with EPSG codes. I compared their CRSs' text descriptions before and after re-projecting them using the EPSG code corresponding to their original CRS.
  • Most vector files' CRSs contain EPSG codes (the only exception is newPlots_latLon.shp but it easy to infer it). However, after applying the same procedure used for the raster data, I found differences in their CRSs' WKT before and after applying a projection. This implies the their CRS were build either using a PROJ string or an old EPSG database.
  • Some vector files have Z coordinates but they are all set to 0. This could make the sf package produce warnings or even throw errors in lessons' code.

If we assign EPSG codes (to both raster and vector files) and remove their Z coordinates, we can simplify the lesson's code by, for example, removing workarounds and validation code.

Regarding the lesson's data, which is currently hosted in FigShare, I think we could:

  • Host the lesson's data along the lesson's repo,
  • Or we could build an R package with the data and publish it to CRAN,
  • Or we could drop the current data and replace it using data already hosted in a CRAN package.

Any of these alternatives would simplify managing the lesson and also installing and running its code.

I'm opening this issue following the suggestion by @tobyhodges in order to close #368 and move on with the discussion regarding the CRSs of this lesson's data.

Bests,

Alber

###############################################################################
# Check the data for the Carpentries lesson r-raster-vector-geospatial
###############################################################################

library(dplyr); library(purrr); library(sf); 
library(terra); library(tibble); library(utils);


#---- Setup ----

#NOTE: This variable should point to the episodes' directory.
episode_dir <- "~/Documents/github/datacarpentry/r-raster-vector-geospatial/episodes"

setwd(episode_dir)

# Download the lesson data using the script setup.R
source(file.path(episode_dir, "setup.R"))

# Find the directory with the data.
data_dir <- file.path(episode_dir, "data")
stopifnot("Data directory not found!" = dir.exists(data_dir))


#--- Utilitary ----

get_crs_wkt <- function(crs) { return(crs$wkt) }

get_epsg <- function(crs) { return(crs$epsg) }

has_z <- function(obj_sf) { return(!is.null(sf::st_z_range(obj_sf))) }

has_m <- function(obj_sf) { return(!is.null(sf::st_m_range(obj_sf))) }

get_input <- function(crs) { return(crs$input) }


#---- Raster data ----

raster_tb <-
    data_dir %>%
    list.files(pattern = "*.tif$", full.names = TRUE, recursive = TRUE) %>%
    tibble::as_tibble() %>%
    dplyr::rename(file_path = value) %>%
    dplyr::mutate(r = purrr::map(file_path, terra::rast),
                  crs_terra = purrr::map_chr(r, terra::crs),
                  crs_sf = purrr::map(r, sf::st_crs),
                  epsg = purrr::map_int(crs_sf, get_epsg),
                  epsg = dplyr::if_else(is.na(epsg), NA, paste("EPSG:", epsg)))

#NOTE: Some (30) rasters have NA instead of a valid CRS.
raster_tb %>%
    dplyr::count(epsg)

#NOTE: Re-projecting rasters using EPSG codes doesn't change their CRSs at all.
#      So, the lesson data can be re-projected using EPSG codes without
#      breaking the code in the lessons. 
raster_tb %>%
    dplyr::filter(!is.na(epsg)) %>%
    dplyr::mutate(r_proj = purrr::map2(r, epsg, terra::project),
                  r_proj_crs = purrr::map_chr(r_proj, terra::crs),
                  crs_diff = purrr::map2_dbl(crs_terra, r_proj_crs, 
                                             utils::adist)) %>%
    dplyr::count(crs_diff)


#---- Vector data ----

vector_tb <-
    data_dir %>%
    list.files(pattern = "*.shp$", full.names = TRUE, recursive = TRUE) %>%
    tibble::as_tibble() %>%
    dplyr::rename(file_path = value) %>%
    dplyr::mutate(v = purrr::map(file_path, sf::read_sf), 
                  v_crs = purrr::map(v, sf::st_crs),
                  v_wkt = purrr::map(v_crs, get_crs_wkt),
                  epsg = purrr::map_int(v_crs, get_epsg),
                  has_z = purrr::map_lgl(v, has_z),
                  has_m = purrr::map_lgl(v, has_m),
                  v_no_z = purrr::map(v, sf::st_zm),
                  crs_input = purrr::map_chr(v_crs, get_input))

#NOTE: There is a vector missing EPSG code.
vector_tb %>% 
    dplyr::filter(is.na(epsg)) %>%
    dplyr::select(file_path, epsg) %>%
    dplyr::mutate(file_path = basename(file_path)) %>%
    print(n = Inf)

#NOTE: There some vectors with Z coordinates, but all of them are 0s.
vector_tb %>%
    dplyr::filter(has_z) %>%
    dplyr::mutate(file_path = basename(file_path), 
                  z_range = purrr::map(v, sf::st_z_range)) %>%
    dplyr::select(file_path, has_z, z_range) %>%
    print(n = Inf) %>%
    pull(z_range)

# Add missing EPSG by hand.
vector_tb <-
    vector_tb %>%
    dplyr::mutate(epsg = dplyr::if_else((crs_input == "WGS 84" & is.na(epsg)), 
                                        4326, epsg)) %>%
    # Re-project using EPSGs.
    dplyr::mutate(new_v = purrr::map2(v_no_z, epsg, sf::st_transform), 
                  new_crs = purrr::map(new_v, sf::st_crs),
                  new_crs_wkt = purrr::map(new_crs, get_crs_wkt), 
                  crs_diff = purrr::map2_dbl(v_wkt, new_crs_wkt,
                  utils::adist))

#NOTE: The CRSs' WKT change after projection using EPSG codes.
vector_tb %>% 
    dplyr::select(file_path, crs_diff)
@tobyhodges
Copy link
Member

The example dataset is published on FigShare, where there is the option of creating a new version of the record if and when the file change. I think the record is owned by NEON, but I would be happy to try to coordinate with them to publish a new version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants