read_vpts() cannot read bioRad created CSV #654

Open
CeciliaNilsson709 opened this issue Feb 19, 2024 · 4 comments

@CeciliaNilsson709
Collaborator

Related to #653 and #635, there is still a discrepancy between the VPTS CSV format and what a VPTS data frame looks like in bioRad. As a result, writing a VPTS data frame to CSV and then reading it with read_vpts() doesn't work (the columns are not the same).

library(bioRad)
#> Welcome to bioRad version 0.7.3
#> using vol2birdR version 1.0.1 (MistNet not installed)
vpts_df <- as.data.frame(example_vpts)
write.csv(vpts_df, "vpts_df.csv")
read_vpts("vpts_df.csv")
#> New names:
#> • `` -> `...1`
#> Error: Field names in `schema` must match column names in data:
#> ℹ Field names: `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `eta`, `dens`, `dbz`, `dbz_all`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `vcp`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `source_file`
#> ℹ Column names: `...1`, `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `dbz`, `eta`, `dens`, `DBZH`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `day`, `sunrise`, `sunset`

vpts_df <- as.data.frame(example_vpts, geo = FALSE, suntime = FALSE)
write.csv(vpts_df, "vpts_df.csv")
read_vpts("vpts_df.csv")
#> New names:
#> • `` -> `...1`
#> Error: Field names in `schema` must match column names in data:
#> ℹ Field names: `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `eta`, `dens`, `dbz`, `dbz_all`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `vcp`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `source_file`
#> ℹ Column names: `...1`, `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `dbz`, `eta`, `dens`, `DBZH`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`
Created on 2024-02-19 with reprex v2.1.0
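
To make the mismatch in the second error easier to see, here is a small base R sketch that compares the two name lists copied from that message. It only uses setdiff(); the variable names are illustrative. Note that the `...1` column is just the row names that write.csv() writes by default, so write.csv(..., row.names = FALSE) would avoid that one, but not the other differences.

```r
# Field names expected by the VPTS CSV schema (copied from the error message)
schema_fields <- c(
  "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp", "gap",
  "eta", "dens", "dbz", "dbz_all", "n", "n_dbz", "n_all", "n_dbz_all", "rcs",
  "sd_vvp_threshold", "vcp", "radar_latitude", "radar_longitude",
  "radar_height", "radar_wavelength", "source_file"
)

# Column names found in the CSV written with geo = FALSE, suntime = FALSE
csv_columns <- c(
  "...1", "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp",
  "gap", "dbz", "eta", "dens", "DBZH", "n", "n_dbz", "n_all", "n_dbz_all",
  "rcs", "sd_vvp_threshold"
)

# Schema fields that the CSV lacks
missing_from_csv <- setdiff(schema_fields, csv_columns)
missing_from_csv
#> [1] "dbz_all"          "vcp"              "radar_latitude"   "radar_longitude"
#> [5] "radar_height"     "radar_wavelength" "source_file"

# CSV columns that the schema does not know
extra_in_csv <- setdiff(csv_columns, schema_fields)
extra_in_csv
#> [1] "...1" "DBZH"
```

The column order also differs (`dbz` comes before `eta` and `dens` in the data frame), which setdiff() does not report.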
@peterdesmet
Collaborator

@adokter @iskandari @bart1 @CeciliaNilsson709 I think we need a roadmap to tackle this, since the VPTS format is now half supported in bioRad:

  1. Reading from hdf5 or csv results in different vpts objects (Combining hdf5 and csv VP files #653). This should be patched.
  2. The columns in a bioRad vpts object are not consistent with the VPTS CSV format. I think we should have one format throughout. This is a major change.
  3. Ideally the change to one format throughout is combined with making bioRad vpts objects a data.frame directly (VPTS objects as data.frame #568). This is a major change.
  4. as.data.frame() currently has options (suntime and geo) resulting in different flavours of data frames. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), no longer providing those options and/or having vpts objects be data.frames.
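
To illustrate point 4, a write_vpts() along these lines might look roughly like the sketch below. This is not bioRad code: write_vpts_sketch(), its arguments and the toy data are hypothetical, the field list is shortened, and the DBZH to dbz_all rename is an assumption about how the two naming schemes correspond. Missing schema fields are filled with NA here, which the real schema may not accept.

```r
# Hypothetical helper (not part of bioRad): write a data frame with only the
# schema fields, in schema order, and without row names
write_vpts_sketch <- function(df, file, fields, rename = character()) {
  # Rename columns, given as c(old_name = "new_name")
  hits <- names(df) %in% names(rename)
  names(df)[hits] <- unname(rename[names(df)[hits]])
  # Add schema fields that are absent from the data as NA columns
  for (f in setdiff(fields, names(df))) df[[f]] <- NA
  # Drop everything else (e.g. day, sunrise, sunset) and fix the column order
  df <- df[, fields, drop = FALSE]
  write.csv(df, file, row.names = FALSE, na = "")
  invisible(df)
}

# Toy data and a shortened field list, for illustration only
fields <- c("radar", "datetime", "height", "dbz_all", "vcp")
toy <- data.frame(
  radar = "bejab",
  datetime = "2024-02-19T00:00:00Z",
  height = c(0, 200),
  DBZH = c(-3.2, 1.5),
  day = FALSE
)

out <- tempfile(fileext = ".csv")
# Assumed mapping: DBZH in bioRad corresponds to dbz_all in VPTS CSV
write_vpts_sketch(toy, out, fields, rename = c(DBZH = "dbz_all"))
written_names <- names(read.csv(out))
```

Passing the field list as an argument keeps the sketch independent of where the schema is actually stored.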

My 2 cents is that we should move towards this situation:

[Image: functions]

@iskandari
Collaborator

iskandari commented Feb 20, 2024

  1. as.data.frame() currently has options (suntime and geo) resulting in different flavours of data frames. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), no longer providing those options and/or having vpts objects be data.frames.

Agreed that it would be simpler to have vpts objects as data frames, but then a side effect is that we lose metadata relevant to vp file creation. From the example in #653:

vpts_hdf5 <- bind_into_vpts(read_vpfiles(c(hdf5_1, hdf5_2)))
vpts_hdf5$attributes$how$task_args
[1] "azimMax=360.000000,azimMin=0.000000,layerThickness=200.000000,nLayers=25,rangeMax=35000.000000,rangeMin=5000.000000,elevMax=90.000000,elevMin=0.000000,radarWavelength=5.300000,useClutterMap=0,clutterMap=,fitVrad=1,exportBirdProfileAsJSONVar=0,minNyquist=5.000000,maxNyquistDealias=25.000000,birdRadarCrossSection=11.000000,stdDevMinBird=2.000000,cellEtaMin=11500.000000,etaMax=36000.000000,dbzType=DBZH,requireVrad=0,dealiasVrad=1,dealiasRecycle=1,dualPol=0,singlePol=1,rhohvThresMin=0.950000,resample=0,resampleRscale=500.000000,resampleNbins=100,resampleNrays=360,mistNetNElevs=5,mistNetElevsOnly=1,useMistNet=0,mistNetPath=/MistNet/mistnet_nexrad.pt,areaCellMin=0.500000,cellClutterFractionMax=0.500000,chisqMin=0.000010,clutterValueMin=0.100000,dbzThresMin=0.000000,fringeDist=5000.000000,nBinsGap=8,nPointsIncludedMin=25,nNeighborsMin=5,nObsGapMin=5,nAzimNeighborhood=3,nRangNeighborhood=3,nCountMin=4,refracIndex=0.964000,cellStdDevMax=5.000000,absVDifMax=10.000000,vradMin=1.000000"

This irreversibility should be communicated clearly. Maybe vpts data frames should be able to retain other information, in which case we would implement a write_vpts() that is called, for example, through an extended write.csv(format = 'vpts').

@iskandari iskandari self-assigned this Feb 20, 2024
@iskandari iskandari added this to the 0.8.0 milestone Feb 20, 2024
@peterdesmet
Collaborator

You can add custom attributes to data.frames, which we could use for the vpts metadata. These attributes are not lost when using other functions, like dplyr's filter(). Note: I'm not very familiar with attributes, but it seems useful here.

library(dplyr)

# A dataframe has default attributes
df <- iris
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150

# One can add custom attributes (here the list "metadata")
metadata <- list(radar = "bejab", regular = FALSE)
attr(df, "metadata") <- metadata
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150
#> 
#> $metadata
#> $metadata$radar
#> [1] "bejab"
#> 
#> $metadata$regular
#> [1] FALSE

# If the data frame is handled by other functions, the attributes are retained
df %>%
  filter(Species == "virginica") %>%
  attr("metadata")
#> $radar
#> [1] "bejab"
#> 
#> $regular
#> [1] FALSE

Created on 2024-02-20 with reprex v2.1.0

In addition, we should add a vpts class to the data.frame, so it is easy to recognize:

library(dplyr)
df <- iris
class(df) <- c("vpts", class(df))
class(df)
#> [1] "vpts"       "data.frame"

# Class is retained by dplyr
df %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts"       "data.frame"

# Class can also be added to a tibble
df <- iris
dft <- as_tibble(df)
class(dft) <- c("vpts", class(dft))
class(dft)
#> [1] "vpts"       "tbl_df"     "tbl"        "data.frame"

# Class is retained by dplyr
dft %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts"       "tbl_df"     "tbl"        "data.frame"

Created on 2024-02-20 with reprex v2.1.0
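
Putting the two ideas together (a metadata attribute plus a vpts class), a constructor could look roughly like this. It is only a base R sketch: new_vpts_df() and vpts_metadata() are hypothetical names, not existing bioRad functions, and the attribute name "metadata" is taken from the example above.

```r
# Hypothetical constructor: attach metadata and put the vpts class on top
new_vpts_df <- function(x, metadata = list()) {
  stopifnot(is.data.frame(x), is.list(metadata))
  attr(x, "metadata") <- metadata
  # setdiff() avoids adding the class twice when x already is a vpts
  class(x) <- c("vpts", setdiff(class(x), "vpts"))
  x
}

# Hypothetical accessor for the stored metadata
vpts_metadata <- function(x) {
  stopifnot(inherits(x, "vpts"))
  attr(x, "metadata")
}

x <- new_vpts_df(iris, list(radar = "bejab", regular = FALSE))
class(x)
#> [1] "vpts"       "data.frame"
vpts_metadata(x)$radar
#> [1] "bejab"
```

Because "data.frame" stays in the class vector, is.data.frame(x) is still TRUE and existing data frame methods keep working.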

@bart1
Collaborator

bart1 commented Feb 26, 2024

A few remarks from my side. For move2 I do pretty much everything the way @peterdesmet describes (based on sf). Some extra properties are retained as attributes, which works quite well, and you can update these when required using custom methods. For that it is indeed important to add the vpts class on top of a data frame. For sf/move2 it does not matter whether the underlying data.frame is a tbl or a real data.frame. This helps, as tbls are sometimes considerably faster, for example when reading with readr or vroom.

This is also quite interesting as a read for restoring objects after dplyr operations: https://dplyr.tidyverse.org/reference/dplyr_extending.html
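
As a rough sketch of what that page describes: dplyr calls the dplyr_reconstruct() generic to rebuild a subclass after verbs such as filter(), so a method for the class can restore whatever should survive. The class name vpts and the "metadata" attribute below are illustrative assumptions carried over from the examples earlier in this thread, not bioRad's implementation.

```r
library(dplyr)

# Method for the (assumed) vpts class: copy the metadata attribute and the
# class vector from the original object (template) onto the result (data)
dplyr_reconstruct.vpts <- function(data, template) {
  attr(data, "metadata") <- attr(template, "metadata")
  class(data) <- class(template)
  data
}

# Toy object built on iris, as in the examples above
x <- iris
attr(x, "metadata") <- list(radar = "bejab", regular = FALSE)
class(x) <- c("vpts", class(x))

res <- filter(x, Species == "virginica")
class(res)
attr(res, "metadata")
```

In a package the method would be registered as an S3 method rather than defined in the global environment, and it is also the place to update metadata that a row subset makes stale.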

Maybe, if it requires too many breaking changes, you could also call it vpts_df.
