Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Types not the same for indexing vs. iteration of catalog.datasets #361

Open
deeplycloudy opened this issue Apr 8, 2021 · 6 comments · May be fixed by #362
Open

Types not the same for indexing vs. iteration of catalog.datasets #361

deeplycloudy opened this issue Apr 8, 2021 · 6 comments · May be fixed by #362
Milestone

Comments

@deeplycloudy
Copy link
Contributor

Try to include:

  • Code Sample: here the URL doesn't matter, but is provided for convenience.
from datetime import datetime, timedelta
from siphon.catalog import TDSCatalog

day = datetime(2018, 5, 24)
start = day + timedelta(hours=16)
end = day + timedelta(hours=24)
url = f"https://data.nssl.noaa.gov/thredds/catalog/WRDD/OKLMA/{day:%Y/%m/%d}/catalog.xml"

cat = TDSCatalog(url)

print(type(cat.datasets))
# <class 'siphon.catalog.DatasetCollection'>

print(type(cat.datasets[0]))
# <class 'siphon.catalog.Dataset'>

for ds in cat.datasets:
    print(type(ds))
#<class 'str'>
#<class 'str'>
# etc.
  • Problem description: When indexing cat.datasets, the return type is a Dataset object, but when iterating the return type is a string.
  • Expected output: I expected the same return type, preferably a Dataset object. It would seem to be more Pythonic if DatasetCollection behaved like a list or tuple of objects.
  • Background: Since this catalog does not index the time of these data, I had to write a manual loop to parse the time from the dataset name. I used the indexing method to figure out what was in cat.datasets, and developed a parsing method based on that return type. That information was inaccurate when I converted the logic to a function to loop over datasets.
  • Which platform: Mac
  • Versions. Include the output of:
    • python --version: Python 3.6.12
    • `python -c 'import siphon; print(siphon.version)': 0.9
@dopplershift dopplershift added this to the 0.10 milestone Apr 8, 2021
@dopplershift
Copy link
Member

100% chance this is because .datasets is effectively a OrderedDict where we override __getitem__. Easiest fix is probably to also override __iter__ to iterate over values().

@dopplershift
Copy link
Member

Hrmmm... now that I think about this with just a bit more thought, this is standard dict behavior. For this code:

d = {'a': 1, 'b':2, 'c':3}

print(type(d['a']))
# <class 'int'>

for i in d:
    print(type(i))
# <class 'str'>
...

So right now DatasetCollection works exactly like a python dictionary. This is because you can get items out of DatasetCollection by position (i.e. [0]) or by name (i.e. my_data_file_name.nc). And to be clear, it's always had the dictionary semantics--we added later on the ability to use [0]. I'm not sure whether it'd be a good idea to fundamentally change those semantics...

@deeplycloudy
Copy link
Contributor Author

deeplycloudy commented Apr 8, 2021 via email

@dopplershift
Copy link
Member

👍 to updating the docstring. I can't think of anything other place unless we add some proper narrative documentation about TDSCatalog or an example explicitly about working with .datasets.

@deeplycloudy
Copy link
Contributor Author

OK, PR for the doc fix is above. I suggest also updating section 3. of the workshop tutorial on Siphon, which is the code I was copying when I confused myself.

@deeplycloudy
Copy link
Contributor Author

deeplycloudy commented Apr 9, 2021

This continues to be subtle and surprising to me. I thought I understood well enough to write this up for the tutorial, and then cat.datasets[1:3] gave me a list of datasets, as though cat.datasets were a list of Datasets. I thought I'd get the first three keys! In retrospect it makes sense… slicing is just fancy integers, so it switches to giving values instead of keys.

My troubleshooting was made further confusing by the fact that cat.datasets[1:3] looks like a list of keys, and so the repr might be a target for a usability fix?

I actually built in this latest confusion as a teachable moment in the siphon tutorial notebook.

The slicing behavior is not explicitly documented in my docstring change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants