Accessing "staffwww.dcs.sheffield.ac.uk/people/J.Hensman" data #8

Open
finmod opened this issue Dec 15, 2015 · 8 comments
@finmod

finmod commented Dec 15, 2015

There is a common problem accessing compbio and other datasets: drosophila, Spellman yeast, Lab3.zip and others. This is in addition to migrating matplotlib and pods to Python 3. Shouldn't these datasets be integrated into pods, to provide a homogeneous set of testing notebooks (gprs, gpss) and a "datasets" folder?

The error is:
C:\Users\Denis\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
587 class HTTPDefaultErrorHandler(BaseHandler):
588 def http_error_default(self, req, fp, code, msg, hdrs):
--> 589 raise HTTPError(req.full_url, code, msg, hdrs, fp)
590
591 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden
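
One way to keep a notebook cell from dying on this 403 is to prefer a local copy and only download when the file is missing. The following is a minimal sketch, assuming Python 3; the helper name `fetch_or_use_local` and its `retrieve` parameter are hypothetical and are not part of pods or GPy.

```python
import os
import urllib.error
import urllib.request


def fetch_or_use_local(url, local_path, retrieve=urllib.request.urlretrieve):
    """Return local_path, downloading from url only if the file is missing.

    `retrieve` is injectable (hypothetical parameter) so the function can
    be exercised without network access.
    """
    if os.path.exists(local_path):
        return local_path  # a local copy wins; no request is made
    try:
        retrieve(url, local_path)
    except urllib.error.HTTPError as err:
        # e.g. "HTTP Error 403: Forbidden" when the host no longer serves the file
        raise FileNotFoundError(
            "Could not download {} (HTTP {}); place a copy at {}".format(
                url, err.code, local_path)
        ) from err
    return local_path
```

With this, a cell would call `fetch_or_use_local(url, 'kalinka09_mel.csv')` and then load the returned path, so a manually placed file is picked up without touching the network.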

@mzwiessele
Member

mzwiessele commented Mar 6, 2016

@jameshensman do you still have the files?

@lawrennd
Member

lawrennd commented Mar 6, 2016

I tried to move most of these types of things across as I found them.
Certainly spellman is in pods, but I'm not sure about drosophila.

It's a good example of why we developed pods!

If we can recover the datasets let's try and get them integrated.


@magnusrattray

Is there any news on the drosophila data?

@finmod
Author

finmod commented May 3, 2016

No. I established that using pods is better than using GPy.utils to access the dataset files (this is with GPy-devel). I also assembled a complete "datasets" folder from various sources and packages in SheffieldML. From that, I was able to build the drosophila.knirps file required by Hierarchical.ipynb and eliminate direct access to Lab3 in that notebook.

@lawrennd
Member

lawrennd commented May 4, 2016

That's great. Yes, pods is the right place to do this.

Did you do a pull request for an updated version of the notebook?


@jameshensman
Contributor

jameshensman commented May 5, 2016

Here's the drosophila data if someone wants to add it.
dros.zip https://github.com/SheffieldML/notebook/files/250051/dros.zip

@finmod
Author

finmod commented May 10, 2016

Thank you, James, for this data file. With the kalinka09_mel.csv and kalinka09_mel_pdata.csv files extracted into the compbio folder, Hierarchical.ipynb now runs fine.

Note that kalinka09_mel is a lighter version than the one I downloaded from the original source using pods.

To recap the fix:

  1.  Extract the two kalinka09 files to the compbio folder;
  2.  Comment out urllib in Hierarchical.ipynb as follows:

import numpy as np

# The original download calls are commented out because the host now returns 403:
#import urllib
#urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/J.Hensman/data/kalinka09_mel.csv', 'kalinka_data.csv')
#urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/J.Hensman/data/kalinka09_mel_pdata.csv', 'kalinka_pdata.csv')

# Load the locally extracted files instead (plain str replaces the deprecated np.str alias)
expression = np.loadtxt('kalinka09_mel.csv', delimiter=',', usecols=range(1, 57))
gene_names = np.loadtxt('kalinka09_mel.csv', delimiter=',', usecols=[0], dtype=str)
replicates, times = np.loadtxt('kalinka09_mel_pdata.csv', delimiter=',').T

# normalize data row-wise
expression -= expression.mean(1)[:, np.newaxis]
expression /= expression.std(1)[:, np.newaxis]
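
The row-wise normalization at the end of that cell can be checked on a toy array before running it on the real data. This is only a sketch: the function name `normalize_rows` and the toy values are made up for illustration and do not come from the notebook.

```python
import numpy as np


def normalize_rows(expression):
    """Center each row to zero mean and scale it to unit standard deviation."""
    expression = np.asarray(expression, dtype=float)
    centered = expression - expression.mean(axis=1, keepdims=True)
    # Centering does not change the standard deviation, so this matches
    # the in-place two-step version used in the notebook cell.
    return centered / expression.std(axis=1, keepdims=True)


# Toy data standing in for the genes-by-samples expression matrix
toy = np.array([[1.0, 2.0, 3.0],
                [10.0, 20.0, 60.0]])
normalized = normalize_rows(toy)
```

Each row of `normalized` should then have mean 0 and standard deviation 1. Note that a gene with constant expression has zero standard deviation and would produce NaN values, so such rows may need filtering first.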

Running the complete compbio folder (8 out of 8 notebooks) requires a similar data file for

Y=np.load("/users/suraalrashid/expression.npy") in TFA_with_Coregion-1.ipynb.

I could not locate the suraalrashid data anywhere.
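
Independently of where that file ends up, the hard-coded absolute path would break on any other machine. A possible sketch of a more portable loader follows; the function name `load_expression` and the `COMPBIO_DATA_DIR` environment variable are hypothetical and are not used by the notebook.

```python
import os

import numpy as np


def load_expression(filename="expression.npy", data_dir=None):
    """Load an .npy array from data_dir instead of a hard-coded user path.

    If data_dir is not given, fall back to the (hypothetical)
    COMPBIO_DATA_DIR environment variable, then to the current directory.
    """
    if data_dir is None:
        data_dir = os.environ.get("COMPBIO_DATA_DIR", os.getcwd())
    path = os.path.join(data_dir, filename)
    if not os.path.isfile(path):
        # Fail with the exact location that was searched
        raise FileNotFoundError("Expected data file at {}".format(path))
    return np.load(path)
```

The notebook line would then become `Y = load_expression()`, with the file placed next to the notebook or in the configured directory.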


@finmod
Author

finmod commented May 10, 2016

Hello James,

As a logical step after running Hierarchical.ipynb, I moved on to deepGPy (configuration: Ubuntu Linux in a VirtualBox VM, Python 2.7 and Anaconda 2.5). Two questions arise about plotting:

  •  The Nested Deep GP.ipynb notebook stops abruptly when producing Fig. 4 of the paper (the robot wireless data). This has been raised as issue #5 in deepGPy;
  •  The same problem occurs with Figure 3, the two-dimensional toy demo, in the Gaussian Processes for Big Data paper.


It would be nice if you could make available the code for these two plots because they convey a telling message for otherwise complex processes.

Thank you.

