Curriculum Advisory Committee Recommendations from 2022 Q1 Meeting (Help wanted!) #368

Open
srappel opened this issue May 19, 2022 · 6 comments
Labels
help wanted Looking for Contributors

Comments

@srappel
Member

srappel commented May 19, 2022

Hello maintainers and fellow Carpentries community members!

The Data Carpentry Geospatial Curriculum Advisory Committee had our 1st quarter meeting on March 29th and we have posted our minutes on the Curriculum Advisors repo.

You can view the minutes here, but I've copied the relevant recommendations below. We are calling for volunteers to help implement these important changes to the lessons. I will label this issue as "Help Wanted" and look forward to your contributions!

Please feel free to reach out to me or my co-chair Jeff Hollister (@jhollist) if you have questions about these recommendations or if you would like to bring something to our attention for future meetings of the CAC. We also encourage you to reach out to the maintainers of the lessons as you develop.

  1. Transition from PROJ and proj4strings
    • Lessons should be updated to reference coordinate systems by their EPSG codes from the EPSG Geodetic Parameter Dataset instead of using proj4strings. The Committee agreed that the alternative Well-known text (WKT) representation is unwieldy and unnecessary for most common use cases. WKT should be mentioned as an alternative to EPSG codes, especially where there is no existing EPSG standard. Lessons should include examples of converting between the EPSG and WKT representations.
  2. Deprecation of the sp, rgeos, and rgdal packages
    • The Committee agreed that the references to rgdal and rgeos be removed or replaced with references to the equivalent sf and terra functions as appropriate. This decision is closely paired with the rationale for choosing terra as the replacement for the raster package and aims to avoid code-breaking deprecations coming some time in 2023.
  3. Transition from raster to terra or stars
    • The terra package appears to be the most direct replacement for raster as it uses language which is similar to raster and common to other GIS. The Committee recommends that terra be adopted as a replacement to raster. Stars should be presented as an alternative to terra that may be faster in some cases or more appropriate for analyses with longitudinal elements.
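As a rough illustration of recommendations 1 and 3, the sketch below converts between an EPSG code and its WKT representation with sf, and builds a small raster with terra whose CRS is given by an EPSG code rather than a proj4string. The EPSG code 32618 and the toy raster dimensions are assumptions chosen only for the example; they are not taken from the lessons.

```r
library(sf)
library(terra)

# EPSG -> WKT: build the CRS from its EPSG code, then read the WKT text.
# 32618 (WGS 84 / UTM zone 18N) is only an illustrative code.
crs_utm <- sf::st_crs(32618)
wkt_utm <- crs_utm$wkt

# WKT -> EPSG: parse the WKT back into a CRS object and query its code.
crs_from_wkt <- sf::st_crs(wkt_utm)
print(crs_from_wkt$epsg)

# terra in place of raster: terra::rast() plays the role of raster::raster(),
# and the CRS can be passed as an "EPSG:<code>" string.
r <- terra::rast(nrows = 10, ncols = 10,
                 xmin = 0, xmax = 100, ymin = 0, ymax = 100,
                 crs = "EPSG:32618")
terra::values(r) <- seq_len(terra::ncell(r))

# terra::crs() returns WKT by default; describe = TRUE summarises the
# CRS name, authority, and code.
print(terra::crs(r, describe = TRUE)[, c("name", "authority", "code")])
```

Where no EPSG code exists for a CRS, the WKT string itself can be passed to sf::st_crs() in the same way.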
@srappel srappel added the help wanted Looking for Contributors label May 19, 2022
@srappel srappel changed the title Curriculum Advisory Committee Recomemdations from 2022 Q1 meeting (Help wanted!) Curriculum Advisory Committee Recommendations from 2022 Q1 Meeting (Help wanted!) May 19, 2022
@gklarenberg

Is anyone already working on the raster to terra conversions? I can help with that (I have been doing the conversions for my own classes as well, and I am very familiar with the NEON data). I also support using terra here rather than stars. The latter is good for more advanced users, but terra is appropriate for an intro course.

@jebyrnes

jebyrnes commented Feb 22, 2023 via email

albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Feb 25, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Feb 25, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Feb 25, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Feb 25, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 6, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 9, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 11, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 14, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 14, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 14, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 14, 2023
…ges with calls to terra package. datacarpentry#368 datacarpentry#363" because it was included by accident.
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 16, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 16, 2023
albhasan added a commit to albhasan/r-raster-vector-geospatial that referenced this issue Mar 17, 2023
@srappel
Member Author

srappel commented Mar 17, 2023

It's so exciting to watch all these changes roll in!

@albhasan
Contributor

Good evening,

The changes made so far address @srappel's issues 2 and 3. Regarding issue 1, Transition from PROJ and proj4strings, I took a look at the lesson's data and found the following:

  • The raster files are compatible with EPSG codes. I tested this by projecting each raster using its own EPSG code and then comparing the text descriptions of the CRSs before and after; I found no difference (see code below).
  • Most vector files' CRSs contain an EPSG code (the only exception is newPlots_latLon.shp, but it is easy to infer). However, after applying the same procedure used for the raster data, I found differences in their CRSs' WKT before and after applying a projection. This implies that their CRSs were built using either a PROJ string or an old EPSG database.
  • Some vector files have Z coordinates but they are all set to 0. This could make the sf package produce warnings or even throw errors.
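The Z-coordinate point can be illustrated with a minimal sketch that uses made-up points rather than the lesson's shapefiles. The helper `has_z_coord()` is hypothetical (it is not part of sf); it simply checks whether sf::st_coordinates() reports a Z column.

```r
library(sf)

# Hypothetical toy geometries: two points whose Z coordinate is fixed at 0.
pts_xyz <- sf::st_sfc(sf::st_point(c(1, 2, 0)),
                      sf::st_point(c(3, 4, 0)),
                      crs = 4326)

# Illustrative helper (not an sf function): TRUE when a Z column is present.
has_z_coord <- function(x) {
    "Z" %in% colnames(sf::st_coordinates(x))
}

# Confirm that Z exists and that every Z value is 0.
print(has_z_coord(pts_xyz))
print(all(sf::st_coordinates(pts_xyz)[, "Z"] == 0))

# Drop the Z (and M) dimension, leaving plain XY geometries.
pts_xy <- sf::st_zm(pts_xyz, drop = TRUE, what = "ZM")
print(has_z_coord(pts_xy))
```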

I guess we could replace the vector data with re-projected versions without the Z coordinates. That way, we could start looking at issue 1 knowing the lesson's data isn't a source of PROJ strings. However, the lesson data isn't currently under version control, which raises a new question: could we host the lesson's data alongside the lesson? Or even better, could we build an R package with the data? It wouldn't need to be on CRAN; it could be hosted in a public git repository and installed using devtools::install_github.
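A sketch of what regenerating one vector file might look like, assuming a layer whose CRS was originally defined by a PROJ string. The layer, its `plot_id` column, and the coordinates are invented for illustration; the real lesson files would be read with sf::read_sf() instead.

```r
library(sf)

# Hypothetical stand-in for a lesson layer whose CRS came from a PROJ string.
plots <- sf::st_sf(
    plot_id = c("a", "b"),
    geometry = sf::st_sfc(sf::st_point(c(-72.17, 42.53)),
                          sf::st_point(c(-72.18, 42.54)),
                          crs = "+proj=longlat +datum=WGS84 +no_defs")
)

# Re-project using the EPSG code so the CRS is tied to the EPSG database.
plots_epsg <- sf::st_transform(plots, 4326)

# Write the cleaned layer to a temporary shapefile and read it back.
out_file <- tempfile(fileext = ".shp")
sf::st_write(plots_epsg, out_file, quiet = TRUE)
plots_back <- sf::read_sf(out_file)
print(sf::st_crs(plots_back))
```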

Bests,

Alber

#!/usr/bin/Rscript --vanilla
###############################################################################
# Check the data for the Carpentries lesson r-raster-vector-geospatial 
###############################################################################

library(dplyr)
library(purrr)
library(sf)
library(terra)
library(tibble)
library(utils)


#---- Setup ----

data_dir <- "~/Documents/github/datacarpentry/r-raster-vector-geospatial/episodes/data"

stopifnot("Data directory not found!" = dir.exists(data_dir))


#---- Utility functions ----

get_crs_wkt <- function(crs) {
    return(crs$wkt)
}

get_epsg <- function(crs) {
    return(crs$epsg)
}

get_input <- function(crs) {
    return(crs$input)
}

has_z <- function(obj_sf) {
    return(!is.null(sf::st_z_range(obj_sf)))
}

has_m <- function(obj_sf) {
    return(!is.null(sf::st_m_range(obj_sf)))
}


#---- Raster data ----

raster_tb <-
    data_dir %>%
    # The pattern is a regular expression, so escape the dot.
    list.files(pattern = "\\.tif$",
               full.names = TRUE,
               recursive = TRUE) %>%
    tibble::as_tibble() %>%
    dplyr::rename(file_path = value) %>%
    dplyr::mutate(obj = purrr::map(file_path, terra::rast),
                  obj_crs = purrr::map_chr(obj, terra::crs),
                  obj_crs1 = purrr::map(obj_crs, sf::st_crs),
                  epsg = purrr::map_int(obj_crs1, get_epsg),
                  epsg = purrr::map2_chr("EPSG:", epsg, paste0),
                  new_obj = purrr::map2(obj, epsg, terra::project),
                  new_crs = purrr::map_chr(new_obj, terra::crs),
                  crs_diff = purrr::map2_dbl(obj_crs, new_crs, utils::adist))

print("NOTE: Re-projecting rasters using EPSG codes doesn't change their CRSs at all.")
raster_tb %>%
    dplyr::select(obj_crs, new_crs, crs_diff) %>%
    print(n = Inf)


#---- Vector data ----

vector_tb <-
    data_dir %>%
    # The pattern is a regular expression, so escape the dot.
    list.files(pattern = "\\.shp$",
               full.names = TRUE,
               recursive = TRUE) %>%
    tibble::as_tibble() %>%
    dplyr::rename(file_path = value) %>%
    dplyr::mutate(obj = purrr::map(file_path, sf::read_sf),
                  obj_crs = purrr::map(obj, sf::st_crs),
                  crs_wkt = purrr::map(obj_crs, get_crs_wkt),
                  epsg = purrr::map_int(obj_crs, get_epsg),
                  has_z = purrr::map_lgl(obj, has_z),
                  has_m = purrr::map_lgl(obj, has_m),
                  obj_no_z = purrr::map(obj, sf::st_zm),
                  crs_input = purrr::map_chr(obj_crs, get_input))

print("NOTE: There is a vector missing EPSG code.")
vector_tb %>%
    dplyr::filter(is.na(epsg)) %>%
    dplyr::select(file_path, epsg) %>%
    dplyr::mutate(file_path = basename(file_path)) %>%
    print(n = Inf)

print("NOTE: There are some vectors with Z coordinates, but all of them are 0.")
vector_tb %>%
    dplyr::filter(has_z) %>%
    dplyr::mutate(file_path = basename(file_path),
                  z_range = purrr::map(obj, sf::st_z_range)) %>%
    dplyr::select(file_path, has_z, z_range) %>%
    print(n = Inf) %>%
    dplyr::pull(z_range)

# Add missing EPSG by hand.
vector_tb <-
    vector_tb %>%
    # Use the integer literal 4326L so both branches of if_else() share a type.
    dplyr::mutate(epsg = dplyr::if_else((crs_input == "WGS 84" & is.na(epsg)),
                                        4326L, epsg)) %>%
    # Re-project using EPSGs.
    dplyr::mutate(new_obj = purrr::map2(obj_no_z, epsg, sf::st_transform),
                  new_crs = purrr::map(new_obj, sf::st_crs),
                  new_crs_wkt = purrr::map(new_crs, get_crs_wkt),
                  crs_diff = purrr::map2_dbl(crs_wkt, new_crs_wkt,
                                             utils::adist))

print("NOTE: The CRSs' WKT change after projection using EPSG codes.")
vector_tb %>%
    dplyr::select(file_path, crs_diff)

@tobyhodges
Member

Thanks so much for the fantastic work here, @albhasan.

Regarding the versioning of the example data: the example dataset is published on FigShare, where there is the option of creating a new version of the record if and when the files change. I think the record is owned by NEON, but I would be happy to try to coordinate with them to publish a new version.

Finally, a suggestion: as you have addressed most of the points raised by the CAC, it might be best to close this issue and open a new one where the specific question of how to update the dataset can be discussed further. I'll be happy to re-post my comment there if you do.

@albhasan
Contributor

Hi @tobyhodges,

Following your suggestion, I have opened #426. Could you please re-post your comment there?

Thanks,
