Adding New Databases #181

Open
elifcevrim opened this issue Feb 14, 2022 · 1 comment
Labels
help wanted User needs help

Comments

@elifcevrim
Collaborator

Hi Denes,

We are working on adding new databases to pypath and have completed one of them, DrugCentral. How would you like us to proceed with merging new scripts or updates into the repo? Can we open a pull request directly, or would you prefer to discuss them in issues first, as we did before?

DrugBank requires a username and password. We tried to implement this by looking at similar databases in pypath, but I think we are missing some points. Could you help us with this? Here is DrugBank's documentation on programmatic access: https://go.drugbank.com/releases/help I guess the -L option is part of an authentication procedure. We are not sure how to add these options to the current curl script in pypath.

@elifcevrim added the bug (Problem in the code) and help wanted (User needs help) labels and removed the bug (Problem in the code) label Feb 14, 2022
@deeenes
Member

deeenes commented Feb 14, 2022

Hi Elif,

Thanks, sounds really great!

I see you have write access to this repo, so feel free to merge directly to master. New modules in pypath.inputs don't break the rest of the module, and the day after merging you can check in the report whether the new functions also run without error on the test server: https://status.omnipathdb.org/inputs/latest/

Alternatively, you can open pull requests, in case you want me to review the code first.

About DrugBank: I would suggest checking the legal notes and the license first; we need to know whether it's okay to redistribute the data. As for the -L option of curl, it corresponds to CURLOPT_FOLLOWLOCATION, which means following HTTP 30x redirects; it is not related to authentication. This is enabled by default in pypath.share.curl. Downloads that require cookies, custom HTTP headers or password authentication are often tricky to implement, and it is often guesswork to find out which headers matter to the server. Here are a few examples:

https://github.com/saezlab/pypath/blob/master/pypath/inputs/cell.py
This function downloads supplementary files from journals of the Cell publisher. The logic is the following:

  • Create a Curl instance which we do not execute, but only obtain the cache path
  • Check if the cache path exists, if it does, go ahead and use the final Curl instance to access the cache content
  • Otherwise, create a Curl instance with another URL init_url, where a user-agent header must be present, and this Curl instance must bypass the cache because we must get a valid cookie from the server
  • Then we process the cookies and include them in the request headers of the final request
  • Finally we create a Curl object for the URL of the supplementary file that we want to download, using the headers which contain the cookie
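
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `fetch` is a hypothetical stand-in for a request that bypasses the cache (it is not the pypath.share.curl API), the user-agent string is a placeholder, and the cache is reduced to a single file path.

```python
import os
from http.cookies import SimpleCookie

# Placeholder browser user-agent; the real value is whatever the server accepts.
USER_AGENT = 'User-Agent: Mozilla/5.0'


def cookie_header(response_headers):
    """Turn the Set-Cookie lines of a response into one Cookie request header."""
    jar = SimpleCookie()
    for line in response_headers:
        name, _, value = line.partition(':')
        if name.strip().lower() == 'set-cookie':
            jar.load(value.strip())
    pairs = '; '.join(
        '%s=%s' % (key, morsel.value) for key, morsel in jar.items()
    )
    return 'Cookie: ' + pairs


def download(url, init_url, cache_path, fetch):
    """
    `fetch(url, req_headers)` is a hypothetical callable that returns
    `(response_headers, body)`; it stands in for an uncached request.
    """
    # If the file is already in the cache, use it and skip the network.
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as fp:
            return fp.read()
    # Initial request with a user-agent header, only to obtain a cookie.
    response_headers, _ = fetch(init_url, [USER_AGENT])
    # Process the cookies and include them in the final request headers.
    req_headers = [USER_AGENT, cookie_header(response_headers)]
    # Final request for the file itself; store the result in the cache.
    _, body = fetch(url, req_headers)
    with open(cache_path, 'wb') as fp:
        fp.write(body)
    return body
```

Passing the request function in as an argument keeps the sketch independent of any particular HTTP library; in practice its role would be played by the Curl instances described in the list.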

https://github.com/saezlab/pypath/blob/master/pypath/inputs/cosmic.py
This is an example of password authentication. The user has several ways to provide their password: in a file in the config directory or elsewhere, through the pypath.share.settings module, or by passing it directly to the function.
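
A minimal sketch of this kind of credential lookup follows. All function and parameter names are hypothetical, the `user:password` one-line file format is an assumption made for illustration, and the header helper assumes the server accepts HTTP Basic authentication (which is what curl's -u option sends by default). It is not the code from cosmic.py.

```python
import base64
import os


def resolve_credentials(user=None, passwd=None, settings=None, secrets_file=None):
    """
    Look up credentials in this order: function arguments, a settings
    mapping, then a secrets file. The file is assumed (for illustration
    only) to hold a single `user:password` line.
    """
    if user and passwd:
        return user, passwd
    settings = settings or {}
    if settings.get('user') and settings.get('passwd'):
        return settings['user'], settings['passwd']
    if secrets_file and os.path.exists(secrets_file):
        with open(secrets_file) as fp:
            user, _, passwd = fp.readline().strip().partition(':')
        if user and passwd:
            return user, passwd
    raise ValueError('No credentials provided.')


def basic_auth_header(user, passwd):
    """Build the request header for HTTP Basic authentication."""
    token = base64.b64encode(('%s:%s' % (user, passwd)).encode('utf-8'))
    return 'Authorization: Basic ' + token.decode('ascii')
```

Checking the explicit arguments first lets a caller override whatever is stored in the settings or on disk.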

https://github.com/saezlab/pypath/blob/master/pypath/inputs/innatedb.py
And here, we just have to add a browser user-agent header, otherwise the server doesn't respond properly.
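
To illustrate the idea outside pypath, here is a small sketch using only the Python standard library. The URL and the user-agent string are placeholders, and `browser_request` is a hypothetical helper; the value that a given server actually accepts has to be found by trial.

```python
import urllib.request

# Placeholder user-agent string imitating a browser.
BROWSER_UA = (
    'Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0'
)


def browser_request(url, user_agent=BROWSER_UA):
    """Prepare a request that identifies itself as a browser."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


req = browser_request('https://example.org/data.tsv')
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header('User-agent'))
```

The request is only prepared here, not sent; passing it to `urllib.request.urlopen` would perform the download.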

https://github.com/saezlab/pypath/blob/master/pypath/inputs/protmapper.py
Sometimes it's quite difficult to find out why a request fails. Here, for example, disabling ALPN was the solution.

A very useful tool is your browser's Inspector: on the Network tab you can inspect the request and response headers of each request and, by right-clicking, copy the request as a curl command line call.
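
Once a request has been copied as a curl command, its headers can be extracted programmatically and tried one by one. The helper below is a hypothetical sketch using only the standard library; it assumes a POSIX-style command in which headers are given with -H or --header, and the example command is made up.

```python
import shlex


def headers_from_curl(command):
    """Return the values of all -H/--header options in a curl command."""
    # Join lines that were split with a trailing backslash.
    command = command.replace('\\\n', ' ')
    tokens = shlex.split(command)
    headers = []
    # Each header value is the token that follows its -H/--header flag.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ('-H', '--header'):
            headers.append(value)
    return headers


copied = (
    "curl 'https://example.org/file' \\\n"
    "  -H 'User-Agent: Mozilla/5.0' \\\n"
    "  -H 'Cookie: sid=abc123' --compressed"
)
print(headers_from_curl(copied))
```

Starting from the full list and removing headers until the request fails is one way to find out which of them the server actually requires.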

I hope this helps. If you experience any difficulties, just let me know.

Best,

Denes

@deeenes deeenes self-assigned this Feb 14, 2022