TDSCatalog not returning all datasets #759

Closed
briantjacobs opened this issue Apr 25, 2024 · 2 comments · Fixed by #760

Comments


briantjacobs commented Apr 25, 2024

Hi, I ran the following siphon code yesterday and received a full response. However, the catalog changed during the day and one new dataset, epic_1b_20240413222222_03.h5, was added. When I reran the code, it did not return the additional dataset. Is there any local caching going on? I'm missing the final dataset:

	from siphon.catalog import TDSCatalog
	remote_cat = TDSCatalog('https://opendap.larc.nasa.gov/opendap/DSCOVR/EPIC/L1B/2024/04/catalog.xml')
	print(remote_cat.datasets[-1].name)
	# 'epic_1b_20240413203419_03.h5' <-- not the most recent

When I request and parse the data without siphon, I can see the latest dataset:

	import requests
	import xmltodict
	response = requests.get("https://opendap.larc.nasa.gov/opendap/DSCOVR/EPIC/L1B/2024/04/catalog.xml")
	data = xmltodict.parse(response.text)	
	print(data["thredds:catalog"]["thredds:dataset"]["thredds:dataset"][-2]["@name"])
	# 'epic_1b_20240413203419_03.h5' <-- 2nd most recent, same as above
	print(data["thredds:catalog"]["thredds:dataset"]["thredds:dataset"][-1]["@name"])
	# 'epic_1b_20240413222222_03.h5' <-- most recent
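
The comparison above can be turned into a small check that lists every leaf dataset name in the raw catalog XML and reports any that a parser missed. This is only a sketch using the standard library: the helper names `leaf_dataset_names` and `missing_names` are made up for illustration, and `SAMPLE` is a hand-written miniature catalog (its `ID` and `urlPath` values are invented), not the real server response.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def leaf_dataset_names(xml_text):
    """Return names of <dataset> elements that contain no nested <dataset>."""
    root = ET.fromstring(xml_text)
    names = []
    for el in root.iter():
        if _local(el.tag) != "dataset":
            continue
        has_child_dataset = any(_local(child.tag) == "dataset" for child in el)
        if not has_child_dataset and el.get("name"):
            names.append(el.get("name"))
    return names


def missing_names(parsed_names, xml_text):
    """Names present in the raw XML but absent from parsed_names."""
    seen = set(parsed_names)
    return [name for name in leaf_dataset_names(xml_text) if name not in seen]


# Hand-written miniature catalog for illustration (IDs and urlPaths are invented).
SAMPLE = """<thredds:catalog xmlns:thredds="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <thredds:dataset name="/DSCOVR/EPIC/L1B/2024/04" ID="parent">
    <thredds:dataset name="epic_1b_20240413203419_03.h5" ID="a">
      <thredds:access serviceName="dap" urlPath="/a"/>
    </thredds:dataset>
    <thredds:dataset name="epic_1b_20240413222222_03.h5" ID="b">
      <thredds:access serviceName="dap" urlPath="/b"/>
    </thredds:dataset>
  </thredds:dataset>
</thredds:catalog>"""

# Pretend the parser only returned the first file, as in the report above.
print(missing_names(["epic_1b_20240413203419_03.h5"], SAMPLE))
```

Against the live catalog, the first argument would presumably be something like `[ds.name for ds in remote_cat.datasets.values()]` and the second `response.text` from the `requests` call above.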

Using:
MacOS 14.4.1
python --version: Python 3.11.3
python -c 'import siphon; print(siphon.__version__)': 0.9

Edit: revised due to a misunderstanding.

dopplershift (Member) commented Apr 25, 2024

It's not a caching issue, but a problem in how we parse the catalog, specifically when individual datasets list their own access methods, as the NASA Hyrax server does. Essentially, we never properly set up access methods for the last dataset, so it gets dropped.
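
For readers unfamiliar with this class of bug, here is a hypothetical illustration. It is not siphon's actual parsing code, and the mechanism shown (finalizing each dataset only when the next one begins) is an assumption chosen because it produces the same symptom: every dataset is returned except the last. The event tuples and the `collect` function are invented for the example.

```python
# Hypothetical illustration only; this is NOT siphon's parsing code.
def collect(events, flush_last):
    """Group ("access", ...) events under the preceding ("dataset", ...) event.

    A dataset is stored only once the next dataset starts. Without a final
    flush after the loop, the last dataset is silently dropped.
    """
    done = {}
    current = None
    for kind, value in events:
        if kind == "dataset":
            if current is not None:
                done[current["name"]] = current  # finalize the previous dataset
            current = {"name": value, "access": []}
        elif kind == "access" and current is not None:
            current["access"].append(value)
    if flush_last and current is not None:
        done[current["name"]] = current  # the step the buggy variant omits
    return done


# Made-up event stream: two datasets, each with its own access method.
events = [
    ("dataset", "epic_1b_20240413203419_03.h5"), ("access", "dap"),
    ("dataset", "epic_1b_20240413222222_03.h5"), ("access", "dap"),
]

print(list(collect(events, flush_last=False)))  # last dataset missing
print(list(collect(events, flush_last=True)))   # both datasets present
```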

dopplershift added this to the 0.10 milestone Apr 25, 2024
briantjacobs changed the title from "TDSCatalog not returning all datasets when re-fetching data" to "TDSCatalog not returning all datasets" Apr 25, 2024
briantjacobs (Author) commented

Gotcha, thanks. Looking back at the metadata, I never had the final dataset, so this has nothing to do with re-fetching the data. Glad I could surface the bug, though. I'm revising the bug report and title for clarity.
