Handle dryad URLs #50

isteves · 2018-06-29T17:23:13Z

Currently, Dryad data URLs don't work with our function because of the way they are constructed. check_version ends up searching for nonsensical matches: it keeps chunking the URL and eventually looks for anything that matches 1. I've changed the stopping condition to nchar(pid) > 5 (instead of 0) to account for this to some extent (4163fb9).

I'm not sure what the logic behind Dryad URLs is, so more investigation is needed!

download_d1_data("https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1", ".")
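To make the failure mode concrete, here is a minimal sketch of the kind of chunking described above. It is not the actual check_version code: the helper name candidate_pids, the min_chars argument, and the choice of "/", "?" and "=" as chunk delimiters are all assumptions made for illustration.

```r
# Hypothetical helper (not the package's implementation): list the candidate
# identifiers produced by repeatedly dropping the leading chunk of a URL.
# Assumption: chunks are delimited by "/", "?" or "=".
candidate_pids <- function(url, min_chars = 0) {
  pid <- url
  candidates <- character(0)
  # Stop once the remaining string is too short to be a meaningful identifier
  while (nchar(pid) > min_chars) {
    candidates <- c(candidates, pid)
    # Nothing left to strip: this was the last chunk
    if (!grepl("[/?=]", pid)) break
    # Drop everything up to and including the first run of delimiters
    pid <- sub("^[^/?=]*[/?=]+", "", pid)
  }
  candidates
}

url <- "https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1"

# With a threshold of 0, the last candidate searched is just "1"
tail(candidate_pids(url, min_chars = 0), 1)

# With a threshold of 5, the loop stops before reaching "1"
tail(candidate_pids(url, min_chars = 5), 1)
```

Under these assumptions, raising the threshold only prevents the search for very short fragments; it still does not yield a valid identifier for the Dryad URL.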


mbjones · 2018-06-29T22:46:22Z

For some related issues on the structure of Dryad identifiers in DataONE, see https://redmine.dataone.org/issues/7896

gothub · 2018-10-23T03:04:40Z

@brunj7 what is the origin of the URL in the above example from @isteves? It doesn't look like a DataONE Dryad identifier or a DataONE URL. The changes that we discussed to make check_version more efficient would only work for DataONE identifiers or DataONE URLs.

brunj7 · 2018-10-23T04:58:41Z

@gothub sorry for the confusion. The idea is that scientists could also go to each data repository and get the URL from there. The KNB case, check_version("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/msleckman.40.1"), seems to conform to what we discussed, but we should also handle PASTA, e.g. check_version("https://pasta.lternet.edu/package/data/eml/edi/195/2/51abf1c7a36a33a2a8bb05ccbf8c81c6").

The Dryad URL comes from this package, https://datadryad.org/resource/doi:10.5061/dryad.7ns4pk2, for the dataset experiment_1.txt. It seems that https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt will also resolve, and if I search for dryad.181477 on their repo I find the corresponding data package, so it is most likely their internal identifier?

Side note: when I search on DataONE for this DOI (10.5061/dryad.7ns4pk2) I get 5 hits, most likely related to the problem Matt mentioned, but if I search for the Dryad dataset identifier (dryad.181477) I get 0 hits.

So we might have to understand the URL logic behind Dryad if we want to support it.

gothub · 2018-10-23T17:08:37Z

Here is the corresponding DataONE URL for the above Dryad id: https://cn.dataone.org/cn/v2/resolve/https://doi.org/10.5061/dryad.7ns4pk2/1/bitstream

brunj7 · 2018-10-26T18:40:57Z

@gothub following our discussion, I think it would make sense to make the function more efficient by adding a rule that prioritizes DataONE URLs and falls back to the current system if that fails.

That being said, this does not solve the mapping problem between Dryad URLs and the corresponding DataONE ones.
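The prioritization rule proposed here could look roughly like the sketch below. The function names extract_d1_pid and resolve_pid, the regular expression, and the fallback argument are hypothetical; the pattern is inferred only from the two DataONE-style URLs quoted in this thread (.../v2/object/<pid> and .../v2/resolve/<pid>) and is not a confirmed description of the DataONE API.

```r
# Hypothetical sketch: try a fast DataONE URL match first, then fall back.
# Assumption: DataONE URLs end in /v1|v2/object/<pid> or /v1|v2/resolve/<pid>.
d1_pattern <- "^https?://[^/]+(/.*)?/v[12]/(object|resolve)/(.+)$"

extract_d1_pid <- function(url) {
  # Return NA when the URL does not look like a DataONE URL
  if (!grepl(d1_pattern, url)) return(NA_character_)
  # The third capture group holds everything after object/ or resolve/
  sub(d1_pattern, "\\3", url)
}

resolve_pid <- function(url, fallback) {
  pid <- extract_d1_pid(url)
  # Only run the slower existing search when the fast rule fails
  if (is.na(pid)) fallback(url) else pid
}

# A KNB URL matches the fast rule directly
extract_d1_pid("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/msleckman.40.1")

# A Dryad repository URL does not, so the fallback (a stand-in here) is used
resolve_pid(
  "https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1",
  fallback = function(u) NA_character_
)
```

In this sketch the fallback is passed in as a function so the existing chunking logic can stay untouched. As noted above, repository-specific URLs such as the Dryad one would still go through the fallback until the mapping question is resolved.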

gothub self-assigned this Aug 10, 2018

brunj7 added the enhancement label Oct 26, 2018
