This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Implement Level3 and Level4 subsetting logic #128

Open
lewismc opened this issue Feb 12, 2019 · 14 comments

@lewismc
Member

lewismc commented Feb 12, 2019

Right now there is no standardized, user-friendly mechanism for subsetting Level 3 or Level 4 data from PO.DAAC. This is a major gap, and it is an area for podaacpy to address.
In order to subset, typical parameters include a dataset name OR short name, AND spatial, temporal, AND variable constraints.
We do currently have a function for retrieving the variables of a dataset; however, this service is only available for a very small number of PO.DAAC datasets.
To address this, we would need to obtain the variables from an OPeNDAP DDX response for a given datasetId OR shortName.
We should implement one utility function that obtains the variables for a given dataset, and a second utility function that accepts the relevant parameters and performs the subsetting operation.

@lewismc lewismc added this to the 2.3.0 milestone Feb 12, 2019
@lewismc lewismc self-assigned this Feb 12, 2019
@lewismc
Member Author

lewismc commented Feb 12, 2019

Thinking about L3 and L4 subsetting again.
Considering only the variable names we would want to retrieve from Hyrax, take the following DDX response.
Should we use the ‘standard_name’ attribute, e.g. ‘sea_surface_subskin_temperature’, or the plain ‘name’, e.g. ‘sea_surface_temperature’? Which one would we pass when calling the subset request against Hyrax?

<?xml version="1.0"?>
<Array name="sea_surface_temperature">
    <Attribute name="long_name" type="String">
        <value>sea surface sub-skin temperature</value>
    </Attribute>
    <Attribute name="standard_name" type="String">
        <value>sea_surface_subskin_temperature</value>
    </Attribute>
    <Attribute name="units" type="String">
        <value>K</value>
    </Attribute>
    <Attribute name="_FillValue" type="Int16">
        <value>-32768</value>
    </Attribute>
    <Attribute name="add_offset" type="Float32">
        <value>273.149994</value>
    </Attribute>
    <Attribute name="scale_factor" type="Float32">
        <value>0.00999999978</value>
    </Attribute>
    <Attribute name="valid_min" type="Int16">
        <value>-5000</value>
    </Attribute>
    <Attribute name="valid_max" type="Int16">
        <value>5000</value>
    </Attribute>
    <Attribute name="coordinates" type="String">
        <value>lon lat</value>
    </Attribute>
    <Attribute name="source" type="String">
        <value>REMSS AMSR2 L2B Version-8</value>
    </Attribute>
    <Attribute name="comment" type="String">
        <value>Microwave SST = approximately the top 1 milimeter</value>
    </Attribute>
    <Int16/>
    <dimension name="time" size="1"/>
    <dimension name="nj" size="4193"/>
    <dimension name="ni" size="243"/>
</Array>

@lewismc
Member Author

lewismc commented Feb 12, 2019

As it turns out, we want to retrieve the name attribute of
<Array name="sea_surface_temperature">
because there is no guarantee that standard_name or long_name matches the variable name.
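A minimal Python sketch of that rule, using only the standard library: it reads one DDX Array element and returns the element's own name attribute next to its standard_name attribute, so the two can be compared. The helper names are hypothetical, and the namespace stripping is there on the assumption that real DDX responses may declare a default XML namespace, which the snippet above does not show.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    # Strip any "{namespace}" prefix so the code works with or without a default namespace.
    return tag.rsplit("}", 1)[-1]

def array_names(array_xml):
    """Return (variable name, standard_name or None) for one DDX <Array> element (hypothetical helper)."""
    elem = ET.fromstring(array_xml)
    standard_name = None
    for child in elem:
        if _local(child.tag) == "Attribute" and child.get("name") == "standard_name":
            value = next((v for v in child if _local(v.tag) == "value"), None)
            standard_name = value.text if value is not None else None
    # The name used in OPeNDAP requests is the element's own "name" attribute.
    return elem.get("name"), standard_name

snippet = """<Array name="sea_surface_temperature">
    <Attribute name="standard_name" type="String">
        <value>sea_surface_subskin_temperature</value>
    </Attribute>
</Array>"""

print(array_names(snippet))  # ('sea_surface_temperature', 'sea_surface_subskin_temperature')
```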

@Omkar20895
Collaborator

Omkar20895 commented Feb 13, 2019

Hi @lewismc, can you please point me to a link that would help me retrieve an OPeNDAP DDX response, or maybe an API link? I would like to explore more. Is pydap one of the utilities for accessing the data? When I googled it, all I could find were some documentation links. Thanks.

@lewismc
Member Author

lewismc commented Feb 13, 2019

Hi @Omkar20895, yes, one resides here. It's very simple XML.

@Omkar20895
Collaborator

Hi @lewismc,

From the attached XML data, I see that the variables in the data are:

  • lat
  • lon
  • time
  • sea_surface_temperature
  • sst_dtime
  • dt_analysis
    .
    .
    .
  • cloud_liquid_water
  • rain_rate

Correct me if I am getting something wrong. We can use XPath expressions via the lxml.etree/xpath module in the variable utility function to get the variable names. I will start working on it and write a prototype. Can you please give me a link to the API that the prototype function should call with the dataset name or ID to get the response? Please let me know if you have any questions or concerns about the approach.
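As a sketch of that idea, here is the same XPath approach using the standard library's xml.etree.ElementTree (swapped in for lxml, which offers full XPath) to list the name of each top-level Array. The helper name and the namespace URI in the sample are placeholders, and the `{*}` namespace wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def top_level_array_names(ddx_text):
    """List the name attribute of every top-level <Array> in a DDX document (hypothetical helper)."""
    root = ET.fromstring(ddx_text)
    # "{*}Array" matches Array in any namespace or none.
    # Only direct children of the root are matched, so Arrays nested inside other elements are skipped.
    return [arr.get("name") for arr in root.findall("{*}Array")]

# Trimmed, illustrative DDX; the namespace URI is a placeholder.
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="urn:example:dap-namespace">
  <Array name="lat"><Float32/><dimension name="nj" size="2"/></Array>
  <Array name="lon"><Float32/><dimension name="ni" size="2"/></Array>
  <Array name="sea_surface_temperature"><Int16/></Array>
</Dataset>"""

print(top_level_array_names(SAMPLE_DDX))  # ['lat', 'lon', 'sea_surface_temperature']
```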

I will also look for documentation on L2, L3 and L4 subsetting in the PO.DAAC forums. I need to read more on this; honestly, I have forgotten a lot, so please suggest any documentation you think would be helpful.

Thanks.

@lewismc
Member Author

lewismc commented Feb 14, 2019

@Omkar20895 thanks for stepping up here.

We can use XPath expressions via the lxml.etree/xpath module in the variable utility function to get the variable names.

The only issue just now is that the Podaac.dataset_variables function is only available for a handful of datasets... this means that, by and large, Level 3 and 4 subsetting is unavailable through the Web Services API. We need to be more creative in the implementation!

I think we need to do the following:

Edit the function called `dataset_variables` to do the following:

Execute a granule_search (because we can only obtain a DDX for an OPeNDAP granule), e.g.

p.granule_search(dataset_id='PODAAC-GHGMB-3CO02', start_time='2019-02-12T01:30:00Z', end_time='2019-02-12T01:30:00Z')

This will return an Atom XML response that includes the OPeNDAP URL, as follows:

<entry>
...
   <link href="https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.html" rel="enclosure" title="OPeNDAP URL" type="text/html"/>

From that, we can replace the trailing .html with the .ddx suffix we want. This lets us retrieve the following: https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.ddx
We can then parse out the variables from that XML response and return them to the user as a list.
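A rough sketch of the link-extraction and suffix-swap steps, assuming the granule_search response is an Atom feed (namespace http://www.w3.org/2005/Atom) whose entries carry the link titled "OPeNDAP URL" shown above. The helper name and the example.org URL are placeholders; fetching the resulting .ddx and parsing its variables are left out. It raises an error instead of guessing when a link does not end in .html.

```python
import xml.etree.ElementTree as ET

def opendap_ddx_urls(atom_xml):
    """Return a .ddx URL for every entry link titled "OPeNDAP URL" (hypothetical helper)."""
    root = ET.fromstring(atom_xml)
    urls = []
    # ".//{*}entry/{*}link" finds the link elements of every entry, whatever the namespace.
    for link in root.findall(".//{*}entry/{*}link"):
        if link.get("title") != "OPeNDAP URL":
            continue
        href = link.get("href", "")
        if not href.endswith(".html"):
            # Only apply the .html -> .ddx swap to the URL shape we expect.
            raise ValueError("unexpected OPeNDAP URL form: " + href)
        urls.append(href[: -len(".html")] + ".ddx")
    return urls

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link href="https://example.org/opendap/granule.nc.html" rel="enclosure" title="OPeNDAP URL" type="text/html"/>
    <link href="https://example.org/granule.nc" rel="enclosure" title="HTTP URL" type="application/x-netcdf"/>
  </entry>
</feed>"""

print(opendap_ddx_urls(FEED))  # ['https://example.org/opendap/granule.nc.ddx']
```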

Add a new function called subset_L3_L4_granules()

Essentially, we design the function as follows:

subset_L3_L4_granules(dataset_id='', short_name='', start_time='', end_time='', bbox='', path='', variables='')

This essentially lets us execute a granule_search, extract the OPeNDAP URL, and then execute the OPeNDAP request with all of the parameters. The response can be saved to wherever path points.
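To make "execute the OPeNDAP request with all of the parameters" more concrete, one option is to turn the bbox into index ranges on a granule's 1-D lat/lon coordinate arrays and then build a DAP2 constraint expression of the form var[start:stop]. This is a sketch with hypothetical helper names: it assumes ascending coordinate arrays (some L3 grids store latitude in descending order), a (time, lat, lon) dimension order, and that every requested variable shares the same dimensions. Which response suffix to request from Hyrax when saving to path should be checked against the Hyrax documentation.

```python
from bisect import bisect_left, bisect_right

def index_range(coords, low, high):
    """Inclusive (start, stop) indices of the ascending values in coords that lie in [low, high]."""
    start = bisect_left(coords, low)
    stop = bisect_right(coords, high) - 1
    if start > stop:
        raise ValueError("no coordinate values in [%s, %s]" % (low, high))
    return start, stop

def constraint_expression(variables, ranges):
    """Build a DAP2 projection such as 'sst[0:0][1:3][1:2]' (hypothetical helper).

    ranges holds one inclusive (start, stop) pair per dimension and is applied
    to every variable, which assumes all variables share the same dimensions.
    """
    hyperslab = "".join("[%d:%d]" % r for r in ranges)
    return ",".join(name + hyperslab for name in variables)

# Illustrative 1-D coordinates; the (time, lat, lon) dimension order is an assumption.
lats = [-10.0, -5.0, 0.0, 5.0, 10.0]
lons = [100.0, 110.0, 120.0, 130.0]
ranges = [(0, 0), index_range(lats, -6.0, 6.0), index_range(lons, 105.0, 125.0)]
ce = constraint_expression(["sea_surface_temperature"], ranges)
print(ce)  # sea_surface_temperature[0:0][1:3][1:2]
```

The resulting string would then be appended after a "?" to the granule's OPeNDAP data URL.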

Does this make sense?

@Omkar20895
Collaborator

@lewismc yes, it makes sense to me.

I have a question: the example you mentioned above returns only one entry because it is subset by start time and end time. When I removed the start and end times, it returned multiple entries for the dataset, and the set of variables was the same in all of them.

For example, I used:

p.granule_search(dataset_id='PODAAC-GHGMB-3CO02')

Then I replaced the .html suffix of each entry's OPeNDAP URL with .ddx and looked at the set of variables for each entry.

I see that the entries are basically a time series, measuring the same set of variables at different time instances. But still, is there a case where different entries have different variables?
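One way to check this empirically, as a sketch: collect each granule's variable list (for example, parsed from its .ddx) and compute the names that are not shared by every granule. The helper name and sample lists are hypothetical. An empty result only shows that the granules you checked agree; it cannot prove that every granule in the dataset does.

```python
def variables_not_in_all(variable_lists):
    """Return the variable names that are missing from at least one granule (hypothetical helper)."""
    sets = [set(names) for names in variable_lists]
    if not sets:
        return set()
    return set.union(*sets) - set.intersection(*sets)

# Hypothetical per-granule variable lists.
granules = [
    ["lat", "lon", "time", "sea_surface_temperature"],
    ["lat", "lon", "time", "sea_surface_temperature"],
    ["lat", "lon", "sea_surface_temperature"],
]
print(variables_not_in_all(granules[:2]))  # set()
print(variables_not_in_all(granules))      # {'time'}
```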

@lewismc
Member Author

lewismc commented Feb 16, 2019 via email

@Omkar20895
Collaborator

Hi @lewismc, I have one last question, please bear with me here. Each XML tag (related to a variable) in the .ddx response has either Array or Grid as its tag name. Are these in any way associated with different dataset levels (L1, L2, L3)? Both tags rarely appear together: in most cases the .ddx response has only Array tags, but some datasets have both Array and Grid tags, for example, the .ddx response for the dataset PODAAC-GHGMB-3CO02: here.

If they are associated with different levels, what other tags (besides Array or Grid) are possible?
Please let me know. Once we are clear on these, I can start writing a prototype.

Thanks.

@lewismc
Member Author

lewismc commented Feb 22, 2019

@Omkar20895

Each XML tag (related to a variable) in the .ddx response either has Array or Grid as the tag name, are these in any way associated with different levels of datasets (l1, l2, l3)?

No.

If you look at the following collapsed XML snippet, you will see that the Grid child elements are the variables we are interested in. The top three Array elements define the structural dimensions for the Grids.

[Screenshot, 2019-02-22: collapsed DDX XML snippet]

@lewismc
Member Author

lewismc commented Feb 22, 2019

So really, it is the Grids that we are interested in extracting.
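A small sketch of that rule with the standard library: return the names of the top-level Grid elements, and fall back to top-level Arrays only when a DDX contains no Grids (as in the Array-only example earlier in this thread). The fallback, the helper name, and the trimmed sample layout (a Grid wrapping one Array plus Map elements) are illustrative assumptions, not podaacpy code.

```python
import xml.etree.ElementTree as ET

def ddx_data_variables(ddx_text):
    """Return top-level Grid names, or top-level Array names if the DDX has no Grids (hypothetical helper)."""
    root = ET.fromstring(ddx_text)
    # "{*}" matches the tag in any namespace or none (Python 3.8+).
    grids = [g.get("name") for g in root.findall("{*}Grid")]
    if grids:
        return grids
    # Assumed fallback for Array-only DDX responses.
    return [a.get("name") for a in root.findall("{*}Array")]

GRID_DDX = """<Dataset name="example.nc">
  <Array name="time"><Int32/><dimension name="time" size="1"/></Array>
  <Array name="lat"><Float32/><dimension name="lat" size="2"/></Array>
  <Array name="lon"><Float32/><dimension name="lon" size="2"/></Array>
  <Grid name="sea_surface_temperature">
    <Array name="sea_surface_temperature"><Int16/></Array>
    <Map name="time"><Int32/></Map>
    <Map name="lat"><Float32/></Map>
    <Map name="lon"><Float32/></Map>
  </Grid>
</Dataset>"""

print(ddx_data_variables(GRID_DDX))  # ['sea_surface_temperature']
```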

@Omkar20895
Collaborator

@lewismc I am almost done writing new code (a rewrite of the original dataset_variables function) that no longer uses the Web Services API, since that API does not support all datasets. However, this adds a dependency, because we are basically providing a workaround: for example, what if replacing .html with .ddx stops working in the future? Feel free to correct me if I am missing something.

Please let me know your thoughts on this. In the meantime, I will send a pull request for review.

@ShubhamShaswat

ShubhamShaswat commented Apr 21, 2019

Hi, I'd like to help. Going through the comments, my understanding is that the function 'dataset_variables' isn't working for all L3 and L4 datasets. For example, in issue #129 (Updating dataset_variable to support L3 and L4 datasets), the dataset ID PODAAC-SASSX-L3UCD doesn't have OPeNDAP URL links, so the check at line 224 of the code never matches:
    for link in dataset_links:
        if link.attrib['title'] == "OPeNDAP URL":
            ...  # dataset_url is only set inside this branch
Therefore dataset_url stays empty, which raises the error:

requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?

Do we want to handle this error, or do we want to find the variables for this dataset using other methods?
If I am wrong, please correct me.
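If we choose to handle it, a minimal sketch (a hypothetical helper, not the current podaacpy code) is to make the lookup return None when no link matches, so the caller can raise a clear error or fall back to another method instead of requesting an empty URL. It assumes dataset_links holds ElementTree-style elements with an attrib dict.

```python
import xml.etree.ElementTree as ET

def find_opendap_url(links):
    """Return the href of the first link titled "OPeNDAP URL", or None if there is none.

    Using attrib.get() also avoids a KeyError for links without a title.
    """
    for link in links:
        if link.attrib.get("title") == "OPeNDAP URL":
            return link.attrib.get("href")
    return None

# Hypothetical granule links for a dataset without an OPeNDAP endpoint.
dataset_links = [ET.Element("link", title="HTTP URL", href="https://example.org/granule.nc")]
dataset_url = find_opendap_url(dataset_links)
if dataset_url is None:
    # Report the problem clearly instead of passing '' to requests (MissingSchema).
    print("No OPeNDAP URL found; cannot read variables from a DDX for this dataset.")
```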

@lewismc
Member Author

lewismc commented Apr 23, 2019

Hi @ShubhamShaswat, did you see the proposed solution in PR #129 (comment)?

@lewismc lewismc modified the milestones: 2.3.0, 2.4.0 Aug 7, 2019
@lewismc lewismc modified the milestones: 2.4.0, 2.5.0 Aug 23, 2019