How to select specific Argo BGC dataset (BR, BD, SR, SD) for a given avh.query? #6

Open
SBS-EREHM opened this issue Dec 7, 2023 · 2 comments


SBS-EREHM commented Dec 7, 2023

Consider the following Argo BGC query params:

params = {'startDate': '2023-01-01T00:00:00Z', 'endDate': '2023-12-06T20:51:28Z',
          'platform': '6903578', 'data': 'cdom', 'source': 'argo_bgc'}

d = avh.query('argo', options=params, apikey=API_KEY, apiroot=API_PREFIX)

Question: How do I control which data is returned? For example, for profile 167, the following data is available from the Coriolis DAC:

  • BR6903578_167D.nc (BioArgo real-time data, descending profile)
  • BD6903578_167.nc (BioArgo delayed-mode data, ascending profile)
  • SR6903578_167D.nc (Synthetic real-time data, descending profile, includes interpolated values)
  • SD6903578_167.nc (Synthetic delayed-mode data, ascending profile, includes interpolated values)
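As a minimal sketch, the naming convention visible in the four filenames above can be split into its parts with a regular expression. The pattern is inferred only from these four examples, and the names `ProfileFile` and `parse_profile_filename` are hypothetical helpers for illustration, not part of Argovis or of any Argo tooling.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the four filenames listed above, e.g. "BR6903578_167D.nc".
# It is an assumption and may not cover every file a DAC publishes.
_PROFILE_FILE = re.compile(
    r"^(?P<kind>[BS])(?P<mode>[RD])(?P<platform>\d+)_(?P<cycle>\d+)(?P<direction>D?)\.nc$"
)


class ProfileFile(NamedTuple):
    kind: str         # "B" (BioArgo) or "S" (synthetic)
    mode: str         # "R" (real-time) or "D" (delayed-mode)
    platform: str     # float identifier, e.g. "6903578"
    cycle: int        # profile number, e.g. 167
    descending: bool  # True when the filename carries the trailing "D"


def parse_profile_filename(name: str) -> Optional[ProfileFile]:
    """Split a profile filename into its parts, or return None if it does not match."""
    m = _PROFILE_FILE.match(name)
    if m is None:
        return None
    return ProfileFile(
        m["kind"], m["mode"], m["platform"], int(m["cycle"]), m["direction"] == "D"
    )
```

For example, `parse_profile_filename("BR6903578_167D.nc")` yields a BioArgo, real-time, descending entry for cycle 167 of platform 6903578.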

It appears that the query is returning data from both B* and S* data sets (semi-visible via the plotly hover text's id parameter: some ids include a "D" suffix). How do I select, for example, just BR or just BD data, excluding possibly interpolated S* data? Do I do this in the query, or by somehow selecting the desired data from the dataset returned by the query?

(Sent via email too, but realized git Issue was probably the right place to raise this.)

Thanks,

-eric rehm
sea-bird scientific

@SBS-EREHM SBS-EREHM changed the title How to select specific dataset (BR, BD, SR, SD) for a given avh.query? How to select specific Argo BGC dataset (BR, BD, SR, SD) for a given avh.query? Dec 7, 2023
@bkatiemills
Member

Hi @SBS-EREHM, thanks for your question! Argovis returns an opinionated 'best choice' representation of every BGC profile:

  • If available, a BGC profile will always be derived from delayed mode synthetic (SD) files.
  • If no delayed synthetic file is available, we present the realtime synthetic (SR) profile.
  • Under no circumstances do we draw from the bioargo (B*) profiles.
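The three rules above amount to a simple priority order. The following is only an illustrative sketch of that order applied to a list of candidate filenames for one profile; `pick_preferred_file` is a hypothetical helper, not the code Argovis actually runs.

```python
from typing import Iterable, Optional


def pick_preferred_file(filenames: Iterable[str]) -> Optional[str]:
    """Return the SD file if one is present, else the SR file, else None."""
    names = list(filenames)
    for prefix in ("SD", "SR"):  # priority order described in the list above
        for name in names:
            if name.startswith(prefix):
                return name
    return None  # BioArgo (B*) files are never selected


# Hypothetical candidate list for a single profile.
candidates = ["BR6903578_167.nc", "BD6903578_167.nc", "SR6903578_167.nc", "SD6903578_167.nc"]
preferred = pick_preferred_file(candidates)
```

With all four files present, the delayed-mode synthetic file is chosen; with only B* files, nothing is returned.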

You can always tell which synthetic mode you have by inspecting the upstream filename in the URL properties of the source key array elements on the data documents. For example, profile 6903578_167 was built from the synthetic delayed file at ftp://ftp.ifremer.fr/ifremer/argo/dac/coriolis/6903578/profiles/SD6903578_167.nc.
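A rough sketch of that inspection is shown here. It assumes each data document is a dictionary with a `source` list whose elements carry a `url` string, as described above; the exact field names and the hand-built sample document are assumptions for illustration, so check them against a real query result.

```python
from posixpath import basename
from urllib.parse import urlparse


def upstream_prefixes(profile: dict) -> list:
    """Return the two-letter file prefix (e.g. "SD" or "SR") for each source URL."""
    prefixes = []
    for entry in profile.get("source", []):    # assumed key: "source"
        url = entry.get("url", "")             # assumed key: "url"
        filename = basename(urlparse(url).path)
        prefixes.append(filename[:2])
    return prefixes


# Hand-built stand-in for one data document, using the URL quoted above.
sample_profile = {
    "source": [
        {
            "url": "ftp://ftp.ifremer.fr/ifremer/argo/dac/coriolis/6903578/profiles/SD6903578_167.nc"
        }
    ]
}
modes = upstream_prefixes(sample_profile)
```

Applied to each document returned by a query, this would show whether a given profile came from a delayed-mode or real-time synthetic file.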

Please let me know if this makes sense; if you have a use case for requesting realtime data when delayed mode data is available, I'd love to hear about it.

@SBS-EREHM
Author

SBS-EREHM commented Dec 16, 2023

Hi @BillMills. Thanks for the response and the opportunity to discuss this. I'll try to make my case here.

'Best data' is in the eye of the beholder. ;-) Synthetic data is not what I need for my use case.

My use case is that I want to evaluate the proper functioning of sensors on Argo floats. So, I want data as close to "raw" as possible for evaluating sensor operation, the adjustments made by RTQC, and eventually, adjustments made by DMQC. Interpolated synthetic data does not represent the precise data that was collected in the field by the BGC float and makes it harder to evaluate proper sensor operation.

So, for example, in oligotrophic waters, CDOM concentrations are quite low, and the CDOM sensor raw counts (FLUORESCENCE_CDOM) and engineering values (CDOM = FLUORESCENCE_CDOM passed through the PREDEPLOYMENT_CALIB_EQUATION) show distinct digitization (see BR/BD data below: dot, plus). The synthetic data (SR/SD below: circle, X) no longer represents sensor output. The key reason for using BioArgo data to evaluate sensor operation is that the CDOM sensor does not produce the intermediate values found in SR/SD files; they arise from the interpolation to other (synthesis) pressure levels. From (possibly) interpolated data, I can no longer produce proper sensor operation statistics.
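To illustrate the point with a toy example: interpolating digitized counts onto different pressure levels produces values the sensor could never have reported. The numbers here are made up, and plain linear interpolation is used only as a stand-in; it is not claimed to be the procedure used to build the synthetic files.

```python
def interp_linear(x_new, xs, ys):
    """Linearly interpolate ys (sampled at increasing xs) at the point x_new."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x_new <= x1:
            return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)
    raise ValueError("x_new is outside the sampled range")


# Made-up values for illustration, not real float data.
pressure = [10, 20, 30, 40, 50]        # dbar, levels where the sensor sampled
counts = [50, 50, 51, 51, 52]          # digitized raw counts (integers only)
synthetic_pressure = [15, 25, 35, 45]  # different levels, as on a synthesis axis

interpolated = [interp_linear(p, pressure, counts) for p in synthetic_pressure]

# Values that fall between integer counts cannot be direct sensor output.
off_grid = [v for v in interpolated if v != int(v)]
```

Here two of the four interpolated values (50.5 and 51.5) fall between integer counts, so statistics such as the digitization step can no longer be recovered from them.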

So, it would be most useful if Argovis gave me the choice of what I determine is "best" for my analysis.

I found the Argovis query facilities so useful for my use case: the delightful ability to query by date, region, BGC variable, and platform is a powerful tool for my "data forensics" and for monitoring the operation of literally tens of thousands of sensors. I hope I can have the best of both worlds: the choice of selecting the data I want via query parameters, but also the choice of the data processing level (BR vs. BD vs. SR vs. SD) that I feel suits my application best.

Perhaps an analogous approach is how scientists can choose which NASA "product levels" they use for ocean color satellite data analysis and algorithm development.
https://oceancolor.gsfc.nasa.gov/resources/docs/product-levels/

Thanks!

[image: CDOM data from BR/BD files (dot, plus) and SR/SD files (circle, X)]
