rdpclassifiertraindata download timeouts and breaks the build #10

Open
EricDeveaud opened this issue Feb 25, 2016 · 6 comments
Comments

@EricDeveaud

Hello,

While trying to build a Docker image for RDPTools, I have a problem with the download of the classifier training set, which times out.
See:

download-traindata:
      [get] Getting: http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
      [get] To: /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
    [untar] Expanding: /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz into /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes

BUILD FAILED
/local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build.xml:112: Error while expanding /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)

Running wget on the same URL gives:

wget http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
--2016-02-25 11:43:38--  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Resolving rdp.cme.msu.edu... 35.8.164.79
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149530230 (143M) [application/x-gzip]
Saving to: 'data.tgz'

61% [================================>                     ] 91,435,408   255KB/s   in 3m 30s 

2016-02-25 11:47:08 (425 KB/s) - Connection closed at byte 91435408. Retrying.

It seems to me that the get method used by the build honours neither a timeout nor a retry.

best regards

Eric
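
The `EOFException: Unexpected end of ZLIB input stream` in the build log above is what a truncated gzip archive typically produces, which is consistent with the wget transfer stopping at 61%. As a minimal sketch (not part of RDPTools; the function name `is_complete_gzip` is made up for illustration), a download could be checked before the untar step like this:

```python
import gzip
import zlib


def is_complete_gzip(path, chunk_size=1024 * 1024):
    """Return True if the gzip stream at `path` can be read through to its end.

    Hypothetical helper for illustration; it is not part of the RDPTools build.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Decompress and discard the data; a truncated file fails before
            # the end-of-stream marker is reached.
            while stream.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early. OSError / zlib.error: the file is
        # missing, is not gzip data, or is corrupt.
        return False
```

Calling `is_complete_gzip("classifier/build/classes/data.tgz")` before expanding the archive would let a build script delete or re-fetch a partial file instead of failing inside the untar step. The check only proves the gzip layer is intact, not that the contents are the expected training data.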

@rdpstaffmsu

Hi, Eric,

We tried and were not able to replicate this problem using computers in
different locations. We will look into any adjustments that might remedy this
situation. For now, if the problem persists, would you mind downloading this
file manually and adding it to the folder? Thank you.

Benli

RDP Staff
Ribosomal Database Project
Center for Microbial Ecology
Michigan State University
567 Wilson Rd. Room 2225 A
East Lansing, MI 48824
(517) 353-3842

@EricDeveaud

For now I was able to build using the following workaround:

get data.tgz externally (wget)
host data.tgz on a localhost web server
patch classifier/build.xml to use localhost instead of rdp.cme.msu.edu

sed -i -e 's,http://rdp.cme.msu.edu/download,http://localhost,' classifier/build.xml

It's more or less what you suggested.

Two suggestions to fix the build process:

  1. (hard way) Check the get method used during the build to see whether it can handle timeouts.
  2. (easy way) What you suggested: remove the training data download from the build process and document that users must download the files on their own.

regards

Eric
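
For the "hard way" above, the behaviour being asked of the get step is roughly: time out, retry, and resume from the bytes already on disk. The following is a minimal sketch of that behaviour under stated assumptions, not the code the RDPTools build uses. The function name `download_with_resume` and its parameters are hypothetical, and it assumes the server answers range requests with 206.

```python
import http.client
import os
import time
import urllib.request


def download_with_resume(url, dest, tries=5, timeout=60, delay=2.0,
                         opener=urllib.request.urlopen):
    """Fetch `url` into `dest`, resuming from the bytes already on disk.

    Hypothetical helper for illustration. `opener` is injectable so the
    retry logic can be exercised without a network.
    """
    for attempt in range(1, tries + 1):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if offset:
            # Ask only for the missing tail; a range-capable server answers 206.
            request.add_header("Range", f"bytes={offset}-")
        try:
            with opener(request, timeout=timeout) as response:
                resumed = response.status == 206
                length = response.headers.get("Content-Length")
                expected = None
                if length is not None:
                    expected = int(length) + (offset if resumed else 0)
                # A plain 200 means the Range header was ignored: start over.
                with open(dest, "ab" if resumed else "wb") as out:
                    while True:
                        chunk = response.read(64 * 1024)
                        if not chunk:
                            break
                        out.write(chunk)
            # Accept the file only if its size matches what the server announced.
            if expected is None or os.path.getsize(dest) == expected:
                return dest
        except (OSError, http.client.HTTPException):
            # Timeouts, resets and short reads land here; keep the partial file.
            pass
        if attempt < tries:
            time.sleep(delay)
    raise RuntimeError(f"could not fetch {url} after {tries} tries")
```

A call such as `download_with_resume("http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz", "data.tgz")` would mirror `wget --tries=5 -c`. Two caveats: the `timeout` applies to each blocking socket operation rather than to the whole transfer, and the sketch does not special-case a file that is already complete, for which a server would be expected to reject the range request.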

@rdpstaffmsu

Hi, Eric,

Thank you for the suggestions. We will look into the options to get it
fixed.

Benli

@EricDeveaud

EricDeveaud commented Jul 7, 2016

Back at this.

I had to do a fresh install of RDPTools.
Here is some output from wget:

make[1]: Entering directory `/inst/RDPTools/RDPTools-2.0.2'
# java builder//installer tries to download data file and timeout
# donwload externaly
test -d /src/RDPTools/RDPTools-2.0.2/classifier/build/classes || mkdir -m 2775  -p /src/RDPTools/RDPTools-2.0.2/classifier/build/classes
test -f /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz || \
        wget --tries=5 -c  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz -O /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
--2016-07-07 18:02:02--  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Resolving rdp.cme.msu.edu... 35.8.164.79
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2016-07-07 18:03:33--  (try: 2)  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 181332714 (173M) [application/x-gzip]
Saving to: `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz'

 0% [                                                                                        ] 302,632      101K/s   in 47s     

2016-07-07 18:04:22 (6.26 KB/s) - Connection closed at byte 302632. Retrying.

--2016-07-07 18:04:24--  (try: 3)  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 181332714 (173M), 181030082 (173M) remaining [application/x-gzip]
Saving to: `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz'

100%[=======================================================================================>] 181,332,714 5.31M/s   in 36s     

2016-07-07 18:05:00 (4.81 MB/s) - `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz' saved [181332714/181332714]
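
In the log above, the third try does not restart the transfer: wget's `-c` option asks for the bytes after offset 302632, the server answers `206 Partial Content`, and only the remainder is sent. The arithmetic behind the "remaining" figure can be sketched as follows. The helper name `resume_request` is hypothetical, and the `Content-Range` value is what the HTTP range-request rules would lead one to expect, not something shown in the log.

```python
def resume_request(total_size, bytes_on_disk):
    """Describe the range request that fetches the rest of a partial download.

    Hypothetical helper for illustration only.
    """
    if not 0 <= bytes_on_disk < total_size:
        raise ValueError("bytes_on_disk must be in [0, total_size)")
    return {
        # Byte offsets are zero-based, so the first missing byte is bytes_on_disk.
        "request_range": f"bytes={bytes_on_disk}-",
        # Assumed reply header from a range-capable server (not shown in the log).
        "content_range": f"bytes {bytes_on_disk}-{total_size - 1}/{total_size}",
        "remaining": total_size - bytes_on_disk,
    }


# Figures taken from the wget output: 181332714 bytes total, 302632 on disk.
plan = resume_request(181332714, 302632)
print(plan["remaining"])  # 181030082, the "remaining" value wget reports
```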

@davidvilanova

I cannot download the training data either. Can you copy it somewhere else or fix the URL?

@cebercoto

Same problem here as of 14/05/2024
