Handle dryad URLs #50

Open
isteves opened this issue Jun 29, 2018 · 5 comments
Labels
enhancement New feature or request

Comments

@isteves
Collaborator

isteves commented Jun 29, 2018

Currently, because of the way Dryad data URLs are constructed, they don't work with our function. check_version ends up searching for nonsensical identifiers because it keeps chunking the URL into shorter and shorter pieces and eventually looks for anything that matches 1. To mitigate this to some extent, I've changed the stopping condition to nchar(pid) > 5 (instead of 0). 4163fb9

I'm not sure what the logic behind Dryad URLs is, so more investigation is needed! Example call:

download_d1_data("https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1", ".")
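
To make the failure mode concrete, here is a minimal sketch of the chunking behaviour described above. It is written in Python purely for illustration and is not the package's actual check_version code: the function name candidate_pids, the min_chars parameter, and the choice of "/", "?" and "=" as chunk delimiters are all assumptions.

```python
import re

def candidate_pids(url, min_chars=0):
    """Hypothetical sketch: yield ever-shorter suffixes of a URL as candidate identifiers.

    Each step drops everything up to and including the next run of
    delimiter characters. The delimiters are an assumption for illustration.
    """
    candidates = []
    pid = url
    # Stop once the remaining chunk is too short to be a meaningful identifier.
    while len(pid) > min_chars:
        candidates.append(pid)
        match = re.search(r"[/?=]+", pid)
        if match is None:
            break  # nothing left to strip
        pid = pid[match.end():]
    return candidates

DRYAD_URL = ("https://datadryad.org/bitstream/handle/10255/"
             "dryad.181477/experiement1.txt?sequence=1")

# With a cutoff of 0 the last candidate is the single character "1".
print(candidate_pids(DRYAD_URL, min_chars=0)[-1])
# With a cutoff of 5 the loop stops before reaching it.
print(candidate_pids(DRYAD_URL, min_chars=5)[-1])
```

Under these assumptions, a cutoff of 5 only prevents the search for very short chunks; every remaining candidate is still a fragment of a Dryad URL rather than a DataONE identifier.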
@mbjones
Member

mbjones commented Jun 29, 2018

For some related issues on the structure of Dryad identifiers in DataONE, see https://redmine.dataone.org/issues/7896

@gothub gothub self-assigned this Aug 10, 2018
@gothub
Collaborator

gothub commented Oct 23, 2018

@brunj7 What is the origin of the URL in the above example from @isteves? It doesn't look like a DataONE Dryad identifier or a DataONE URL. The changes that we discussed to make check_version more efficient would only work for DataONE identifiers or DataONE URLs.

@brunj7
Collaborator

brunj7 commented Oct 23, 2018

@gothub Sorry for the confusion. The idea is that scientists could also go to each data repository and get the URL from there. The KNB call check_version("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/msleckman.40.1") seems to conform to what we discussed, but we should also handle PASTA URLs, e.g. check_version("https://pasta.lternet.edu/package/data/eml/edi/195/2/51abf1c7a36a33a2a8bb05ccbf8c81c6").

The Dryad URL comes from this package, https://datadryad.org/resource/doi:10.5061/dryad.7ns4pk2, for the dataset experiment_1.txt. It seems that https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt will also resolve, and if I search for dryad.181477 on their repository I find the corresponding data package, so that is most likely their internal identifier?

Side note: when I search DataONE for this DOI (10.5061/dryad.7ns4pk2) I get 5 hits, most likely related to the problem Matt mentioned, but if I search for the Dryad dataset identifier (dryad.181477) I get 0 hits.

So we might have to understand the URL logic behind DRYAD if we want to support it.

@gothub
Collaborator

gothub commented Oct 23, 2018

Here is the corresponding DataONE URL for the above Dryad id: https://cn.dataone.org/cn/v2/resolve/https://doi.org/10.5061/dryad.7ns4pk2/1/bitstream
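
For illustration, a resolve URL of this shape can be assembled from an identifier as sketched below (in Python, not the package's R code). The helper names are hypothetical, and treating "https://doi.org/10.5061/dryad.7ns4pk2/1/bitstream" as the identifier is an inference from the URL above. Percent-encoding the identifier is a defensive assumption; the URL quoted in this comment is unencoded.

```python
from urllib.parse import quote, unquote

# Base of the resolve endpoint, taken from the URL quoted above.
RESOLVE_BASE = "https://cn.dataone.org/cn/v2/resolve/"

def build_resolve_url(pid, encode=True):
    """Hypothetical helper: append an identifier to the resolve endpoint.

    encode=True percent-encodes reserved characters such as ":" and "/"
    (an assumption); encode=False reproduces the unencoded form from the thread.
    """
    return RESOLVE_BASE + (quote(pid, safe="") if encode else pid)

def pid_from_resolve_url(url):
    """Hypothetical inverse: recover the identifier from a resolve URL."""
    if not url.startswith(RESOLVE_BASE):
        return None
    return unquote(url[len(RESOLVE_BASE):])

# Identifier inferred from the resolve URL in this comment (an assumption).
pid = "https://doi.org/10.5061/dryad.7ns4pk2/1/bitstream"
print(build_resolve_url(pid, encode=False))
print(build_resolve_url(pid))
```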

@brunj7
Collaborator

brunj7 commented Oct 26, 2018

@gothub Following our discussion, I think it would make sense to make the function more efficient by adding a rule that prioritizes DataONE URLs and falls back to the current system if that fails.

That said, this does not solve the problem of mapping Dryad URLs to the corresponding DataONE ones.
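
A rough sketch of that rule, in Python for illustration only: try to read the identifier directly from a DataONE-style URL, and fall back to the existing search otherwise. The regular expression is an assumption based on the KNB and resolve URLs quoted in this thread, and extract_dataone_pid, find_pid and the fallback argument are hypothetical names, not functions from the package.

```python
import re
from urllib.parse import unquote

# Assumed pattern: DataONE REST paths of the form .../v1|v2/(object|resolve|meta)/<pid>
DATAONE_PATH = re.compile(r"/v[12]/(?:object|resolve|meta)/(.+)$")

def extract_dataone_pid(url):
    """Return the identifier embedded in a DataONE-style URL, or None."""
    match = DATAONE_PATH.search(url)
    if match is None:
        return None
    return unquote(match.group(1))

def find_pid(url, fallback):
    """Prefer the direct DataONE match; otherwise defer to the current approach."""
    pid = extract_dataone_pid(url)
    if pid is not None:
        return pid
    return fallback(url)

# 'fallback' stands in for the existing chunk-and-search logic.
print(find_pid("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/msleckman.40.1",
               fallback=lambda u: None))
```

With this pattern the PASTA and Dryad URLs above do not match, so they would still go through the fallback path.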

@brunj7 brunj7 added the enhancement New feature or request label Oct 26, 2018
4 participants