[5.x] and up: Open dap4 or other alternative protocol address via NetcdfDataset(s) #985

Open
1 task done
rschmunk opened this issue Feb 24, 2022 · 17 comments
Labels
bug Something isn't working

Comments

@rschmunk
Contributor

rschmunk commented Feb 24, 2022

Versions impacted by the bug

v5.x, v6.x, v7.x

What went wrong?

A developer at OpenDAP queried about using Panoply for opening a remote dataset that uses a non http/https/ftp protocol, and more specifically the dap4 protocol, e.g. dap4://test.opendap.org/opendap/some/path/to/fnoc1.nc

The NJ methods in NetcdfDataset and NetcdfDatasets that Panoply uses for acquiring a dataset expect a DatasetUrl rather than a URL, and the DatasetUrl constructor accepts such alternative protocols (*).

But if a DatasetUrl using a dap4 protocol is passed to one of the acquireDataset methods, the result is "Unknown service type".

So is there some alternative way to acquire (enhanced) a dap4 protocol DatasetUrl? Or is this an accidental or planned omission?

(*) Actually, it appears a DatasetUrl can be constructed for just about any supposed protocol, including "foo". However the start of that class defines arrays FRAGPROTOCOLS and FRAGPROTOSVCTYPE with what I presume are acceptable alternatives.

Relevant stack trace

No response

Relevant log messages

No response

If you have an example file that you can share, please attach it to this issue.

If so, may we include it in our test datasets to help ensure the bug does not return once fixed?
Note: the test datasets are publicly accessible without restriction.

N/A

Code of Conduct

  • I agree to follow the UCAR/Unidata Code of Conduct
@rschmunk rschmunk added the bug Something isn't working label Feb 24, 2022
@haileyajohnson
Member

When you get the "Unknown service type" error message, does it include the name of the service or just an empty string? (This helps diagnose whether the failure is coming from assigning the service or from finding the correct class to handle it)

@rschmunk
Contributor Author

@haileyajohnson, I just belatedly noticed you had commented on this. The error message is actually "Unknown service type: DAP4".

In both NetcdfDataset and NetcdfDatasets, when durl.getServicetype() is called, the only two allowed cases in the following switch block are File and HTTPServer.

BTW: This issue tangentially came up today because developers at GSFC and JPL were having trouble using Panoply to access data on an OpenDAP server because of DAP2 vs DAP4 protocol confusion. Panoply passes an https:// address to NJ to acquire the remote dataset, and it seems that NJ is trying to use the DAP2 protocol but the server wants DAP4.

@haileyajohnson
Member

So that means it's getting the correct service type, but not finding the right NetcdfFileProvider, in this case, DapNetcdfFileProvider. Is it possible the build isn't bringing in the dap4 module?

In both NetcdfDataset and NetcdfDatasets, when durl.getServicetype() is called, the only two allowed cases in the following switch block are File and HTTPServer.

DAP4 actually shouldn't be reaching that block at all, it should be returning a provider from the first for loop in that method.
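
The lookup order described above can be modelled with a minimal sketch. Everything in it (ServiceType, FileProvider, Dap4Provider, ProviderLookup) is a hypothetical stand-in written for illustration, not the netCDF-Java source. It only shows why a missing provider would surface as "Unknown service type: DAP4": the loop finds no owner, so control falls through to a switch that handles just File and HTTPServer.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the netCDF-Java classes.
enum ServiceType { File, HTTPServer, DAP4 }

interface FileProvider {
    boolean isOwnerOf(ServiceType type);
    String open(String location);
}

class Dap4Provider implements FileProvider {
    public boolean isOwnerOf(ServiceType type) { return type == ServiceType.DAP4; }
    public String open(String location) { return "dap4 dataset at " + location; }
}

class ProviderLookup {
    // Ask every registered provider first; use the built-in cases only if none claims the type.
    static String open(List<FileProvider> providers, ServiceType type, String location) {
        for (FileProvider p : providers) {
            if (p.isOwnerOf(type)) {
                return p.open(location);
            }
        }
        switch (type) {
            case File:
            case HTTPServer:
                return "built-in handler for " + location;
            default:
                // Reached when no provider for the type is available to the lookup.
                throw new IllegalStateException("Unknown service type: " + type);
        }
    }
}
```

With a Dap4Provider in the list the DAP4 case never reaches the switch; with an empty list it throws, which matches the symptom reported in this thread.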

@rschmunk
Contributor Author

@haileyajohnson, So apparently the dap4 package is not included when I build netcdfAll?

I see it is included when I build toolsUI, and that toolsUI will try to open a remote dataset whose URL begins with the dap4 protocol. However, both cases I just tested failed: one with some sort of Container parsing error, and in the other, toolsUI simply locked up.

@haileyajohnson
Member

@rschmunk I asked around and got some background on this issue - apparently the dap4 package is intentionally left out of netcdfAll because it has some major bugs in v5+ (including what you're seeing in toolsui). To be honest, it doesn't look like it will be a quick fix, but it is back on our radar now.

@rschmunk
Contributor Author

@haileyajohnson, I am informed that NASA's NGAP project is pushing to switch over to DAP4, so sooner is better than later for anyone using NJ to access one of the associated dataset repos.

@rschmunk
Contributor Author

What is the approved way of opening a remote DAP4 file or catalog?

It appears that as of a recent snapshot commit, the DAP4 code had been updated and is now included by default in the netcdfAll build.

However, I don't seem to be able to use it.

I have been trying to use NetcdfFiles.open(String) to open an example remote dataset on a DAP4 server maintained by the OpenDAP developers. If that String begins with "dap4://", then I just get back a "No such file or directory" exception. See the stack trace below, produced with an NJ snapshot from May 30.

I can access that file if I replace "dap4://" with "http://", but then I just get errors when trying to read variables within.

Maybe there's a problem with the server config or the example file address I have been given (I'll query someone at OpenDAP about that shortly), but I'd like to be sure that I am at least starting with the correct scheme for attempting to open the file.

--
Sample stacktrace using "dap4://" at start of remote file address:

java.io.FileNotFoundException: dap4:/test.opendap.org:8080/opendap/dmrpp_test_files/ATL03_20181228015957_13810110_003_01.1var.h5.dmrpp (No such file or directory)
	at java.base/java.io.RandomAccessFile.open0(Native Method)
	at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:346)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:260)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:128)
	at ucar.unidata.io.RandomAccessFile.<init>(RandomAccessFile.java:331)
	at ucar.unidata.io.RandomAccessFile.acquire(RandomAccessFile.java:192)
	at ucar.nc2.NetcdfFiles.getRaf(NetcdfFiles.java:465)
	at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:274)
	at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:243)
	at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:216)
	at gov.nasa.giss.data.nc.NcDataset.init(NcDataset.java:458)
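
One possible reading of this trace (an inference, not something confirmed in the thread): the frames end in java.io.RandomAccessFile, so the "dap4://" string appears to have been handled as a local file name rather than as a remote address. That would also explain the single slash in "dap4:/test.opendap.org", because java.io.File collapses repeated separators. The small sketch below uses only the JDK; SchemeAsPathDemo and asLocalPath are names made up for this illustration.

```java
import java.io.File;

class SchemeAsPathDemo {
    // Returns the path string that java.io.File stores for the given name.
    static String asLocalPath(String name) {
        return new File(name).getPath();
    }

    public static void main(String[] args) {
        // On Unix-like systems the doubled slash after "dap4:" is collapsed to one,
        // which is the form that shows up in the FileNotFoundException message.
        System.out.println(asLocalPath("dap4://test.opendap.org/opendap/some/path/to/fnoc1.nc"));
    }
}
```

If that reading is right, the failure happens before any DAP4 code is involved, which fits the "No such file or directory" message.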

@DennisHeimbigner
Collaborator

try this URL:

http:/test.opendap.org:8080/opendap/dmrpp_test_files/ATL03_20181228015957_13810110_003_01.1var.h5.dmrpp#dap4

@rschmunk
Contributor Author

Hah! That's actually one of the addresses I have been trying to test accessing. It's the "#dap4" appended to the URL that makes a difference.

@rschmunk
Contributor Author

Okay, just loaded the catalog http://test.opendap.org/opendap/. Navigating down the tree to some random HDF5 file, the http:// URL that is reported for the file is no good. It does work if I copy the address and append "#dap4" to it.

That's... awkward.

@DennisHeimbigner
Collaborator

I will investigate why "dap4:" does not work.
The problem is that the URL must somehow inform your client program what protocol to use to access the data: DAP4 in this case.
Two hints available are the "dap4:" protocol or appending "#dap4" to the URL.
Not sure why "dap4:" is not working. I thought I was testing for that.
One other thing you need to be aware of has to do with accessing Hyrax servers.
There is an ongoing issue about how to handle checksums WRT the DAP4 specification.
See OPENDAP/dap4-specification#1
and
OPENDAP/dap4-specification#6.
I have a temporary fix as follows. Append to your URL, the following string: "#hyrax".
So, for example you should specify something like this:

dap4:....#hyrax -- assuming the dap4: protocol was being recognized.
or
http:....#dap4&hyrax
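
For client code that currently passes plain http(s) addresses, the fragment hints described above could be added with a small helper along these lines. This is only a sketch: Dap4UrlHints and withDap4Fragment are hypothetical names, the helper does nothing but string manipulation, and it assumes hints are joined with "&" as in the "#dap4&hyrax" example. Whether a given NJ version honors the hints is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

class Dap4UrlHints {
    // Ensures the URL fragment carries the "dap4" hint and, if requested, "hyrax".
    // Hints already present in the fragment are kept and not duplicated.
    static String withDap4Fragment(String url, boolean hyrax) {
        int hash = url.indexOf('#');
        String base = hash < 0 ? url : url.substring(0, hash);
        List<String> hints = new ArrayList<>();
        if (hash >= 0 && hash + 1 < url.length()) {
            for (String h : url.substring(hash + 1).split("&")) {
                if (!h.isEmpty()) {
                    hints.add(h);
                }
            }
        }
        if (!hints.contains("dap4")) {
            hints.add("dap4");
        }
        if (hyrax && !hints.contains("hyrax")) {
            hints.add("hyrax");
        }
        return base + "#" + String.join("&", hints);
    }
}
```

For example, withDap4Fragment("http://test.opendap.org/opendap/", true) would produce an address ending in "#dap4&hyrax".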

@DennisHeimbigner
Collaborator

One other point. The client program can interrogate the server
and determine the proper protocol by looking at the response.
But this functionality is not yet published for thredds, and I do not
think it works for Hyrax.

@rschmunk
Contributor Author

rschmunk commented Nov 3, 2023

@DennisHeimbigner, Did you ever look at this any further? I see no related commits, so maybe not?

I recently heard from a couple people at NASA/JPL placing data on an agency DAAC who wanted to know more about Panoply's (and hence netCDF-Java) ability to access data on a DAP4 archive. As an example, they cited a dataset described at an opendap.earthdata.nasa.gov address. The actual data URL as given there is the same minus the .dmr.html extensions.

In testing again a moment ago, I'm using an NJ 5.5.4 snapshot from a week ago.

That address, as before, gets back a 405 server response when Panoply feeds it as is to the NJ library.

If I change the address so that it starts with dap4: rather than https:, the response is a FileNotFound, as follows, when my code attempts NetcdfFiles.open ( dap4addressStr );

java.io.FileNotFoundException: dap4:/opendap.earthdata.nasa.gov/collections/C2706510710-POCLOUD/granules/measures_esdr_as_metopb_l2_wind_stress_48195_v1.1_s20220101-000357-e20220101-014518_ancillary (No such file or directory)
	at java.base/java.io.RandomAccessFile.open0(Native Method)
	at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:344)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:213)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:127)
	at ucar.unidata.io.RandomAccessFile.<init>(RandomAccessFile.java:331)
	at ucar.unidata.io.RandomAccessFile.acquire(RandomAccessFile.java:192)
	at ucar.nc2.NetcdfFiles.getRaf(NetcdfFiles.java:465)
	at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:274)
	at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:243)
	at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:216)
	at mycode...

So using the https address but appending #dap4 as you previously suggested, the remote dataset is apparently successfully loaded. However, if I try to extract a variable and make a plot... a DapException results due to a "Malformed chunked source".

dap4.core.util.DapException: dap4.core.util.DapException: dap4.core.util.DapException: Malformed chunked source
	at dap4.dap4lib.HttpDSP.loadDAP(HttpDSP.java:113)
	at dap4.dap4lib.cdm.nc2.DapNetcdfFile.ensuredata(DapNetcdfFile.java:351)
	at dap4.dap4lib.cdm.nc2.DapNetcdfFile.readData(DapNetcdfFile.java:277)
	at ucar.nc2.Variable.reallyRead(Variable.java:797)
	at ucar.nc2.Variable._read(Variable.java:736)
	at ucar.nc2.Variable.read(Variable.java:614)
	at ucar.nc2.dataset.VariableDS.reallyRead(VariableDS.java:471)
	at ucar.nc2.dataset.VariableDS._read(VariableDS.java:444)
	at ucar.nc2.dataset.VariableDS._read(VariableDS.java:454)
	at ucar.nc2.Variable.read(Variable.java:600)
	at ucar.nc2.Variable.read(Variable.java:546)
        ...
Caused by: dap4.core.util.DapException: dap4.core.util.DapException: Malformed chunked source
	at dap4.dap4lib.D4DSP.loadDAP(D4DSP.java:200)
	at dap4.dap4lib.HttpDSP.loadDAP(HttpDSP.java:111)
	... 29 more
Caused by: dap4.core.util.DapException: Malformed chunked source
	at dap4.dap4lib.DeChunkedInputStream.readChunk(DeChunkedInputStream.java:228)
	at dap4.dap4lib.DeChunkedInputStream.read(DeChunkedInputStream.java:146)
	at dap4.dap4lib.DeChunkedInputStream.read(DeChunkedInputStream.java:135)
	at dap4.dap4lib.D4DataCompiler.compileAtomicVar(D4DataCompiler.java:169)
	at dap4.dap4lib.D4DataCompiler.compileVar(D4DataCompiler.java:131)
	at dap4.dap4lib.D4DataCompiler.compile(D4DataCompiler.java:108)
	at dap4.dap4lib.D4DSP.loadDAP(D4DSP.java:195)
	... 30 more

The DAAC is apparently not a Hyrax server, which you indicated might be another potential problem to cope with.

In your last comment, you mentioned "client program can interrogate the server and determine the proper protocol by looking at the response". How? Is that a matter of decrypting an error message, or is there an actual method one can call to get that info?

I should probably ask, do my troubles above interconnect with #1232?

/thx

@DennisHeimbigner
Collaborator

Sorry, this may have got lost in my stack. Let me do some checking.

@DennisHeimbigner
Collaborator

I had the fix, but apparently I got sidetracked. Anyway, see PR #1255

@rschmunk
Contributor Author

rschmunk commented Nov 8, 2023

@DennisHeimbigner, After changing one line of my code in Panoply that calls NJ and using the NJ 5.5.4 snapshot with yesterday's commits, I successfully acquired a remote DAP4 file using the dap4://... prefix.

However, I am still getting Malformed Chunk exceptions trying to read the actual data from the JPL earthdata.nasa.gov address I mentioned above. There are also a couple other sample datasets served via the earthdata.nasa.gov proxy that when trying to get the data, the process just never comes back with an answer. One of these samples is a DAP4 trajectory dataset, and the other is a non-DAP4 file.

Also, please double-check lines 132 and 133 of the updated DapNetcdfFile. The code is setting the xuri scheme to "https", but the comments suggest it's supposed to be set to "http" because test.opendap.org doesn't speak https. Is there an error there, or am I just reading the comments wrong?

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Nov 8, 2023

OOPS!
I was testing if the test.opendap.org test server was now accepting https requests (it is still not doing so).
I apparently forgot to switch back to http. I will put up a pr for it shortly.
As to the malformed chunks, my best guess is that this is the checksum problem I described above.
If you are accessing a Hyrax-based server, then this is not an easy fix.
If you are using the latest master that includes PR #1211,
then this may work; append "?checksumdap4.checksum=true" to the end of your url.
It probably won't because of this issue: OPENDAP/dap4-specification#6.
I have no current fix for this problem yet.

3 participants