cdc.csv not up to date with BC Conservation Data Centre (CDC) fish data #13

Open
lucy-schick opened this issue Apr 7, 2024 · 5 comments

Comments

@lucy-schick

The Bug

The data in cdc.csv is not up to date with the BC Conservation Data Centre (CDC).

A Reprex

Using Oncorhynchus nerka as an example, in the CDC database Oncorhynchus nerka and many of its populations have a COSEWIC status (screenshot 1), but this is not present in cdc.csv (screenshot 2).

Other information is not up to date either, including review dates and Provincial statuses, which makes me think cdc.csv just hasn't been updated in a while (GitHub says it hasn't been edited in 4 years). Any guidance on an update workflow would be much appreciated. Thanks!

[Screenshot 1: Screen Shot 2024-04-07 at 3 10 10 PM]

[Screenshot 2: Screen Shot 2024-04-07 at 3 15 27 PM]

@NewGraphEnvironment
Contributor

NewGraphEnvironment commented Apr 8, 2024

This is an extremely useful package that we use all the time. Thank you so much for having it available. We are keen to contribute if we can be helpful.

Exports of both Results and Conservation Status Data look slightly different from cdc.csv (see below), so we are guessing that some wrangling was required originally and/or is required now due to changes in how BC Species & Ecosystems Explorer exports.

I looked for a record of the wrangling in the repo but did not find one, so either I missed it or perhaps someone talented (i.e. Evan) has something locally? We can look into putting together a PR with a workflow for updating the current csv using the same column names, with or without some kind of template of past moves to work from, but figured we should check in first as there is likely a lot we don't know.

It's good to see that BC Species & Ecosystems Explorer has been updating its COSEWIC info, and it makes me wonder whether the raw data is accessible through an API. I did a quick search of https://search.open.canada.ca/data/ through the front door and the API to no avail, so I'm guessing BC Species & Ecosystems Explorer is the best option...

library("rgovcan")
library("ckanr")
library("tidyverse")

# set up the connection to the data portal
ckanr_setup(url = "https://open.canada.ca/data/en")

govcan_search(keywords = c("COSEWIC"), records = 100, format_results = TRUE) %>% 
  pull(resources) %>% 
  bind_rows()

[image]

[image]

Exported and converted to csv:
[image]

@joethorley
Member

@newgraph-lschick and @NewGraphEnvironment - thanks for the interest in this package.

We are keen to keep it up to date. I'm the maintainer while @evanamiesgalonski is on leave, so I will make decisions.

I agree with your suggested outline, i.e.

  1. Ideally we would pull the data using an API, but we will settle for a manual download if that is all that is available.
  2. We should wrangle the downloaded file (saved as csv) into the format for import as a data frame in the package, in a data-raw.R script in the data-raw directory, so that there is a record of changes (I couldn't locate any record of how we made changes previously).
  3. As much as possible, we want the CDC data as is on the CDC site.
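To make step 2 concrete, here is a minimal sketch of what such a data-raw.R script could look like. It is only an illustration under stated assumptions: the export file path, the export headers ("Scientific Name", "COSEWIC", "BC List") and the package column names on the left of `col_map` are all hypothetical and would need to be checked against a real export and the existing cdc.csv. The values in the demo row are placeholders, not CDC data.

```r
# data-raw/data-raw.R (sketch; every column name below is an assumption)

# Hypothetical mapping: package column name = header in the CDC export.
col_map <- c(
  scientific_name = "Scientific Name",
  cosewic_status  = "COSEWIC",
  bc_list         = "BC List"
)

# Keep only the mapped columns, rename them, and trim stray whitespace.
# Values are otherwise left as they appear in the export (point 3 above).
wrangle_cdc <- function(raw, col_map) {
  missing <- setdiff(unname(col_map), names(raw))
  if (length(missing) > 0) {
    stop("Export is missing expected columns: ",
         paste(missing, collapse = ", "))
  }
  out <- raw[, unname(col_map), drop = FALSE]
  names(out) <- names(col_map)
  is_chr <- vapply(out, is.character, logical(1))
  out[is_chr] <- lapply(out[is_chr], trimws)
  out
}

# In the real script the export would be read from disk, for example:
# raw <- read.csv("data-raw/cdc-export.csv", check.names = FALSE,
#                 stringsAsFactors = FALSE)  # hypothetical path
# Here a one-row stand-in with placeholder values keeps the sketch runnable.
raw <- data.frame(
  "Scientific Name" = "Oncorhynchus nerka ",
  "COSEWIC" = "placeholder",
  "BC List" = "placeholder",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

cdc <- wrangle_cdc(raw, col_map)

# The real script would then store the result in the package, e.g.:
# usethis::use_data(cdc, overwrite = TRUE)
```

Failing loudly on missing columns means a change in the export format shows up as an error rather than as silently dropped data.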

If you are able to do a PR that fits with these requirements that would be fantastic. Let me know if I missed anything. Thanks!

@NewGraphEnvironment
Contributor

Had an initial look at what is going on. Not surprisingly, there is a lot of complexity here related to past and current formats and even within the content (species present before but no longer detailed, etc.).

We will continue to work on this as time allows, and we will likely produce some sort of markdown review to communicate the design decisions that were made or will need to be made, and potentially short-term workarounds to fulfill current reporting obligations.

It is worth some more effort to determine whether the point-and-click BC Ecosystem Explorer interface really is our only viable (and sad) option, since the wrangle is not trivial (as usual).

@joethorley
Member

Yes, an API on the government sites would allow data use to be automated, minimizing errors and ensuring the information is up to date...

@NewGraphEnvironment
Contributor

NewGraphEnvironment commented May 15, 2024

Looks like API access is not yet possible :<

The answers are from a contact at the BC Conservation Data Centre.

[image]
