result() throws an error with Notaro GCMs for more than one variable #399

Open
lindsayplatt opened this issue Nov 15, 2021 · 2 comments

@lindsayplatt

When I call result() on a job that queried more than one variable, it throws a delimiter parsing error.

library(geoknife)
query_pts <- structure(list(
  `1` = c(-88.4467147238926, 42.7996962344843), 
  `2` = c(-88.6432114598417, 43.6283635061587), 
  `3` = c(-91.7772962678006, 45.6126497556779), 
  `4` = c(-90.290648760424, 44.2691685292945), 
  `5` = c(-91.5175899000678, 45.7450307015114)), 
  class = "data.frame", row.names = c("X", "Y"))

# More than one var fails:
gcm_job <- geoknife(
  stencil = simplegeom(query_pts),
  fabric = webdata(
    url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
    variables = c("evspsbl", "hfss", "mrso"),
    times = c('1999-01-01', '1999-01-15')
  ),
  wait = TRUE
)

# I get a parsing error:
# Error in value[[3L]](cond) : Delimiter parse fail.
my_data <- result(gcm_job)

# Just one var works:
gcm_job <- geoknife(
  stencil = simplegeom(query_pts),
  fabric = webdata(
    url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
    variables = c("evspsbl"),
    times = c('1999-01-01', '1999-01-15')
  ),
  wait = TRUE
)

my_data <- result(gcm_job)
@dblodgett-usgs
Collaborator

Huh, I thought this would have worked.

The mrso data have soil levels, i.e. an extra z dimension. The problem isn't that you're asking for three variables; it's that one of them (mrso) breaks the parser.

The parsing function is here: https://github.com/USGS-R/geoknife/blob/master/R/parseTimeseries.R#L19

This line doesn't handle the soil levels appropriately: https://github.com/USGS-R/geoknife/blob/master/R/parseTimeseries.R#L23
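For illustration, here is a minimal sketch of one way a parser could detect an extra dimension column like `soil_layer(layer)` from the header row instead of relying on a fixed column layout. It is not geoknife's code: `classify_header()` is a hypothetical helper, and the list of statistic names is an assumption based on the statistics the Geo Data Portal offers.

```r
# Hypothetical helper (not part of geoknife): sort header columns into the
# time column, any extra dimension columns, and statistic columns.
# ASSUMPTION: statistic columns are named "STAT(units)" with STAT in `stats`.
classify_header <- function(col_names,
                            stats = c("MEAN", "MINIMUM", "MAXIMUM", "VARIANCE",
                                      "STD_DEV", "SUM", "COUNT")) {
  # Drop a trailing "(units)" suffix to get each column's bare name
  bare <- sub("\\(.*\\)$", "", col_names)
  is_time <- bare == "TIMESTEP"
  is_stat <- bare %in% stats
  list(
    time  = col_names[is_time],
    extra = col_names[!is_time & !is_stat],  # e.g. the soil_layer z dimension
    stats = col_names[is_stat]
  )
}

# Header as it appears in the mrso output (one MEAN column per point)
hdr <- c("TIMESTEP", "soil_layer(layer)", rep("MEAN(kg m-2)", 5))
str(classify_header(hdr))
```

A non-empty `extra` element would signal that the rows need to be grouped by that dimension before the statistic columns are treated as a plain timeseries.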


I don't have the brain space to work up a fix right now, but maybe you want to modify the parser to work the way you want? I'm not sure what the right way is to handle this z dimension in the timeseries.

The reprex below gets you the data you need and shows how to get as close as possible to the geoknife code that's failing. In essence, the output has an additional column, soil_layer, which is the z dimension and sits in there much like the time dimension does.

library(geoknife)
#> 
#> Attaching package: 'geoknife'
#> The following object is masked from 'package:stats':
#> 
#>     start
#> The following object is masked from 'package:graphics':
#> 
#>     title
#> The following object is masked from 'package:base':
#> 
#>     url
query_pts <- structure(list(
  `1` = c(-88.4467147238926, 42.7996962344843), 
  `2` = c(-88.6432114598417, 43.6283635061587), 
  `3` = c(-91.7772962678006, 45.6126497556779), 
  `4` = c(-90.290648760424, 44.2691685292945), 
  `5` = c(-91.5175899000678, 45.7450307015114)), 
  class = "data.frame", row.names = c("X", "Y"))

gcm_job <- geoknife(
  stencil = simplegeom(query_pts),
  fabric = webdata(
    url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
    variables = c("mrso"),
    times = c('1999-01-01', '1999-01-15')
  ),
  wait = TRUE
)
#> Process Accepted

my_data <- result(gcm_job)
#> Error in value[[3L]](cond): Delimiter parse fail.

(my_job <- check(gcm_job))
#> $status
#> [1] "Process successful"
#> 
#> $URL
#> [1] "https://cida.usgs.gov:443/gdp/process/RetrieveResultServlet?id=12169137-96be-4d7e-8a6b-de88ee4f602cOUTPUT"
#> 
#> $statusType
#> [1] "ProcessSucceeded"
#> 
#> $percentComplete
#> [1] "100"

my_data <- readr::read_csv(my_job$URL, skip = 2)
#> New names:
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...3`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...4`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...5`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...6`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...7`
#> Rows: 674 Columns: 7
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> dbl  (6): soil_layer(layer), MEAN(kg m-2)...3, MEAN(kg m-2)...4, MEAN(kg m-2...
#> dttm (1): TIMESTEP
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

my_data
#> # A tibble: 674 x 7
#>    TIMESTEP            `soil_layer(layer)` `MEAN(kg m-2)...3` `MEAN(kg m-2)...4`
#>    <dttm>                            <dbl>              <dbl>              <dbl>
#>  1 1999-01-01 00:00:00                   0               47.9               47.6
#>  2 1999-01-01 00:00:00                   1              620.               620. 
#>  3 1999-01-01 01:00:00                   0               47.9               47.6
#>  4 1999-01-01 01:00:00                   1              620.               620. 
#>  5 1999-01-01 02:00:00                   0               47.9               47.6
#>  6 1999-01-01 02:00:00                   1              620.               620. 
#>  7 1999-01-01 03:00:00                   0               47.9               47.6
#>  8 1999-01-01 03:00:00                   1              620.               620. 
#>  9 1999-01-01 04:00:00                   0               47.9               47.6
#> 10 1999-01-01 04:00:00                   1              620.               620. 
#> # ... with 664 more rows, and 3 more variables: MEAN(kg m-2)...5 <dbl>,
#> #   MEAN(kg m-2)...6 <dbl>, MEAN(kg m-2)...7 <dbl>


# It's failing in here.
parseTimeseries(my_job$URL, delim = ",")
#> Error in value[[3L]](cond): Delimiter parse fail.

Created on 2021-11-15 by the reprex package (v2.0.0)
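Once the table is read with readr as in the reprex, one possible way to handle the z dimension is to reshape it into a long table (one row per timestep, soil layer, and point) or to split it into one wide table per layer. The sketch below uses base R only and a small made-up table shaped like `my_data` (two points instead of five; the values are illustrative, not real output). It assumes the MEAN columns appear in the same order as the points in `query_pts`, so `feature_ids` is a hypothetical mapping, not something read from the file.

```r
# Illustrative stand-in for readr::read_csv(my_job$URL, skip = 2) output:
# TIMESTEP, the soil_layer z column, then one MEAN column per point.
raw <- data.frame(
  TIMESTEP = as.POSIXct(rep(c("1999-01-01 00:00:00", "1999-01-01 01:00:00"), each = 2),
                        tz = "UTC"),
  `soil_layer(layer)` = c(0, 1, 0, 1),
  `MEAN(kg m-2)...3`  = c(47.9, 620, 47.9, 620),
  `MEAN(kg m-2)...4`  = c(47.6, 620, 47.6, 620),
  check.names = FALSE
)

# Hypothetical mapping: assume the MEAN columns follow the order of the points
feature_ids <- c("1", "2")
names(raw)[grep("^MEAN", names(raw))] <- feature_ids

# Long form: one row per timestep x soil layer x point
long <- data.frame(
  DateTime   = rep(raw$TIMESTEP, times = length(feature_ids)),
  soil_layer = rep(raw[["soil_layer(layer)"]], times = length(feature_ids)),
  feature    = rep(feature_ids, each = nrow(raw)),
  value      = unlist(raw[feature_ids], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Alternative: one wide table (TIMESTEP + one column per point) per soil layer
by_layer <- split(raw[c("TIMESTEP", feature_ids)], raw[["soil_layer(layer)"]])
```

The long form keeps every layer in one table, while the per-layer split keeps the one-column-per-point layout; which shape the parser should return is the open design question above.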

@lindsayplatt
Author

I actually don't need mrso, so we can just skip that! No rush on a fix for us :)
