issue downloading pre-built assets #261

Open
chrishuges opened this issue Jul 9, 2021 · 6 comments

Comments

@chrishuges

Hi,

I am having an issue with refgenie pull. This is a fresh install of refgenie on a server running CentOS 7. When I run refgenie pull, it seems to connect to the server but never actually downloads the file. For example:

refgenie pull hg38/fasta
Compatible refgenieserver instances: ['http://refgenomes.databio.org']
Downloading URL: http://refgenomes.databio.org/v3/assets/archive/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/fasta

It seems to create the correct folder structure in alias and data, but there are no actual files in there (neither symbolic links nor the data files themselves). I am not sure what is happening here. I am running refgenie in a Python 3.6.8 virtual environment, installed with pip install refgenie. My genome_config is below:

config_version: 0.4
genome_folder: /home/chughes/databases/refgenieGenomes
genome_servers:
 - http://refgenomes.databio.org
genomes:
  2230c535660fb4774114bfa966a62f823fdb6d21acf138d4:
    aliases:
     - hg38
  94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4:
    aliases:
     - rCRSd
  baa91c8f6e2780cfd8fd1040ff37f51c379947a2a4820d6c:
    aliases:
     - hg19
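One way to confirm that the alias and data folders really are empty is to count what is inside them. The following is a minimal sketch using only the standard library; the function name `summarize_tree` is hypothetical, and the `alias` and `data` subfolder names are taken from the description above rather than from the refgenie documentation.

```python
import os
from pathlib import Path


def summarize_tree(root):
    """Count directories, regular files, and symlinks found under root."""
    counts = {"dirs": 0, "files": 0, "symlinks": 0}
    # os.walk does not follow symlinked directories by default,
    # so each symlink is counted once and never descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                counts["symlinks"] += 1
            elif path.is_dir():
                counts["dirs"] += 1
            else:
                counts["files"] += 1
    return counts


if __name__ == "__main__":
    # genome_folder as given in the config above
    genome_folder = Path("/home/chughes/databases/refgenieGenomes")
    for sub in ("alias", "data"):
        print(sub, summarize_tree(genome_folder / sub))
```

A result with directories but zero files and zero symlinks would match the behaviour described here: the folder structure is created, but nothing is downloaded into it.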

refgenie listr also seems to work fine:

refgenie listr
                                                       Remote refgenie assets
                                             Server URL: http://refgenomes.databio.org
+----------------------------------------------------------------------------------------------------------------------------------+
| genome           | assets                                                                                                        |
|------------------+---------------------------------------------------------------------------------------------------------------|
| rCRSd            | fasta, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index                                  |
| hg38             | fasta, gencode_gtf, refgene_anno, dbsnp, ensembl_gtf, ensembl_rb, suffixerator_index, bwa_index,              |
|                  | bowtie2_index, dbnsfp, star_index, fasta_txome, hisat2_index, cellranger_reference, bismark_bt2_index,        |
|                  | salmon_partial_sa_index, tgMap, salmon_sa_index                                                               |
| human_repeats    | fasta, suffixerator_index, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index              |
| mouse_chrM2x     | fasta, suffixerator_index, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index              |
| hg18_cdna        | fasta, kallisto_index                                                                                         |
| hs38d1           | fasta, suffixerator_index, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index              |
| hg38_cdna        | fasta, salmon_index, kallisto_index                                                                           |
| rn6_cdna         | fasta, salmon_index, kallisto_index                                                                           |
| mm10_cdna        | fasta, salmon_index, kallisto_index                                                                           |
| hg19_cdna        | fasta, salmon_index, kallisto_index                                                                           |
| hg38_chr22       | fasta, suffixerator_index, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index              |
| hg19             | fasta, gencode_gtf, refgene_anno, dbsnp, ensembl_gtf, ensembl_rb, suffixerator_index, bowtie2_index,          |
|                  | bwa_index, hisat2_index, fasta_txome, star_index, cellranger_reference, bismark_bt2_index,                    |
|                  | salmon_partial_sa_index, tgMap, salmon_sa_index                                                               |
| hg18             | fasta, gencode_gtf, bowtie2_index, suffixerator_index, bwa_index, hisat2_index, fasta_txome, star_index,      |
|                  | cellranger_reference, bismark_bt2_index                                                                       |
| human_alu        | fasta, suffixerator_index, bowtie2_index, bwa_index, hisat2_index, bismark_bt2_index                          |
| human_rDNA       | fasta, suffixerator_index, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index              |
| human_alphasat   | fasta, suffixerator_index, bowtie2_index, bwa_index, star_index, hisat2_index, bismark_bt2_index              |
| mm10             | fasta, gencode_gtf, refgene_anno, ensembl_gtf, ensembl_rb, suffixerator_index, bwa_index, bowtie2_index,      |
|                  | hisat2_index, star_index, fasta_txome, cellranger_reference, bismark_bt2_index, salmon_partial_sa_index,      |
|                  | tgMap, salmon_sa_index                                                                                        |
| rn6              | fasta, refgene_anno, ensembl_gtf, suffixerator_index, bwa_index, bowtie2_index, hisat2_index, star_index,     |
|                  | fasta_txome, bismark_bt2_index, salmon_partial_sa_index, tgMap, salmon_sa_index                               |
| t7               | fasta, bowtie2_index                                                                                          |
| dm6              | fasta, gencode_gtf, refgene_anno, ensembl_gtf, bowtie2_index                                                  |
| hg38_noalt_decoy | fasta, suffixerator_index, bwa_index, bowtie2_index, star_index, hisat2_index, bismark_bt2_index              |
| mm10_primary     | fasta, bowtie2_index, bwa_index                                                                               |
| hg38_primary     | fasta, bwa_index, bowtie2_index                                                                               |
| hg38_mm10        | fasta, bwa_index                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------+

I searched through the issues but couldn't find anything related to the files simply not downloading. Any ideas?

Thanks,
Chris

@nsheff
Contributor

nsheff commented Jul 9, 2021

Is there any other output from refgenie pull? Can you paste the entire output (or is what you pasted above all it prints)?

@stolarczyk
Contributor

stolarczyk commented Jul 9, 2021

Maybe paste the debug logs as well. Run:

refgenie --verbosity 5 pull rCRSd/fasta

@chrishuges
Author

That was the entire output for refgenie pull. It just ends after the 'Downloading URL' message. Here is the debug log:

refgenie --verbosity 5 pull rCRSd/fasta
DEBU 10:57:11 | root:est:266 > Configured logger 'root' using logmuse v0.2.6
DEBU 10:57:11 | root:cli:41 > versions: refgenie 0.12.0 | refgenconf 0.12.0
DEBU 10:57:11 | root:cli:42 > Args: Namespace(asset_registry_paths=['rCRSd/fasta'], batch=False, command='pull', force_overwrite=False, genome=None, genome_config=None, logdev=False, no_large=False, no_overwrite=False, pull_large=False, silent=False, size_cutoff=10, skip_read_lock=False, verbosity='5')
DEBU 10:57:11 | yacman.yacman:yacman:501 > No local config file was provided
DEBU 10:57:11 | yacman.yacman:yacman:506 > Checking for environment variable: ['REFGENIE']
DEBU 10:57:11 | yacman.yacman:yacman:511 > Found config file in REFGENIE: /home/chughes/databases/refgenieGenomes/genome_config.yaml

DEBU 10:57:11 | root:cli:74 > Determined genome config: /home/chughes/databases/refgenieGenomes/genome_config.yaml
DEBU 10:57:11 | root:cli:80 > Found registry_path: ['rCRSd/fasta']
DEBU 10:57:11 | attmap.attmap:attmap:105 > Transforming map-like: {'2230c535660fb4774114bfa966a62f823fdb6d21acf138d4': {'aliases': ['hg38']}, '94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4': {'aliases': ['rCRSd']}, 'baa91c8f6e2780cfd8fd1040ff37f51c379947a2a4820d6c': {'aliases': ['hg19']}}
DEBU 10:57:11 | attmap.attmap:attmap:105 > Transforming map-like: {'aliases': ['hg38']}
DEBU 10:57:11 | attmap.attmap:attmap:105 > Transforming map-like: {'aliases': ['rCRSd']}
DEBU 10:57:11 | attmap.attmap:attmap:105 > Transforming map-like: {'aliases': ['hg19']}
DEBU 10:57:11 | yacman.yacman:yacman:231 > Validated successfully
DEBU 10:57:11 | refgenconf.refgenconf:refgenconf:135 > Config version is compliant: 0.4
DEBU 10:57:11 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/openapi.json
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /openapi.json HTTP/1.1" 200 55101
INFO 10:57:11 | refgenconf.refgenconf:refgenconf:1530 > Compatible refgenieserver instances: ['http://refgenomes.databio.org']
DEBU 10:57:11 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/openapi.json
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /openapi.json HTTP/1.1" 200 55101
DEBU 10:57:11 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/v3/assets/default_tag/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /v3/assets/default_tag/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta HTTP/1.1" 200 7
DEBU 10:57:11 | refgenconf.helpers:helpers:306 > The returned data is not a valid JSON
DEBU 10:57:11 | refgenconf.helpers:helpers:308 > Request returned pain text data: default
DEBU 10:57:11 | refgenconf.refgenconf:refgenconf:1562 > Determined tag: default
DEBU 10:57:11 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/openapi.json
DEBU 10:57:11 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:12 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /openapi.json HTTP/1.1" 200 55101
DEBU 10:57:12 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/openapi.json
DEBU 10:57:12 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:12 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /openapi.json HTTP/1.1" 200 55101
DEBU 10:57:12 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/openapi.json
DEBU 10:57:12 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:12 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /openapi.json HTTP/1.1" 200 55101
DEBU 10:57:12 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/v3/assets/attrs/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta
DEBU 10:57:12 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:13 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /v3/assets/attrs/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta?tag=default HTTP/1.1" 200 795
DEBU 10:57:13 | refgenconf.refgenconf:refgenconf:1590 > Determined server URL: http://refgenomes.databio.org
DEBU 10:57:13 | refgenconf.helpers:helpers:300 > Downloading JSON data; querying URL: http://refgenomes.databio.org/v3/genomes/attrs/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4
DEBU 10:57:13 | urllib3.connectionpool:connectionpool:231 > Starting new HTTP connection (1): refgenomes.databio.org:80
DEBU 10:57:13 | urllib3.connectionpool:connectionpool:461 > http://refgenomes.databio.org:80 "GET /v3/genomes/attrs/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4 HTTP/1.1" 200 115
DEBU 10:57:13 | refgenconf.refgenconf:refgenconf:1629 > '94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta:default' archive size: 8.6KB
DEBU 10:57:13 | refgenconf.refgenconf:refgenconf:3136 > Checking archive size: '8.6KB'
INFO 10:57:13 | refgenconf.refgenconf:refgenconf:1651 > Downloading URL: http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta
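To check a log like the one above for failed requests without reading it line by line, the urllib3 lines can be parsed for their HTTP status codes. This is a rough sketch: the regular expression is written against the line format shown in this log only, and the helper names `http_requests` and `failed_requests` are hypothetical.

```python
import re

# Matches the request part of urllib3 debug lines such as:
#   ... > http://host:80 "GET /openapi.json HTTP/1.1" 200 55101
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<size>\d+)'
)


def http_requests(log_text):
    """Return (method, path, status, size) for each request line in the log."""
    found = []
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            found.append(
                (match["method"], match["path"], int(match["status"]), int(match["size"]))
            )
    return found


def failed_requests(log_text):
    """Return only the requests whose status code is 400 or higher."""
    return [req for req in http_requests(log_text) if req[2] >= 400]


if __name__ == "__main__":
    # Two lines copied from the log above
    sample = (
        "DEBU 10:57:11 | urllib3.connectionpool:connectionpool:461 > "
        'http://refgenomes.databio.org:80 "GET /openapi.json HTTP/1.1" 200 55101\n'
        "INFO 10:57:13 | refgenconf.refgenconf:refgenconf:1651 > Downloading URL: "
        "http://refgenomes.databio.org/v3/assets/archive/"
        "94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta"
    )
    print(http_requests(sample))
    print(failed_requests(sample))
```

Lines that do not contain a quoted request, such as the final "Downloading URL" message, are skipped.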

@stolarczyk
Contributor

stolarczyk commented Jul 9, 2021

That's interesting, the logs don't show any issues. Can you try to download the archive with wget and the following Python code:

from urllib.request import urlretrieve 
urlretrieve("http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta", filename="rCRSd.fa.gz")

does that succeed?
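A slightly extended version of the same test can report whether the download produced a plausible file. This is only a sketch: the helper name `fetch` is hypothetical, and the gzip check assumes the archive is gzip-compressed, which this thread does not confirm.

```python
import os
from urllib.request import urlretrieve

# First two bytes of any gzip stream
GZIP_MAGIC = b"\x1f\x8b"


def fetch(url, filename):
    """Download url to filename; return (path, size_in_bytes, looks_like_gzip)."""
    path, _headers = urlretrieve(url, filename=filename)
    with open(path, "rb") as handle:
        magic = handle.read(2)
    return path, os.path.getsize(path), magic == GZIP_MAGIC


# Example call with the archive URL from this thread (needs network access):
# fetch(
#     "http://refgenomes.databio.org/v3/assets/archive/"
#     "94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta",
#     "rCRSd.fa.gz",
# )
```

A zero-byte file, or a small file that is not gzip, would suggest that the server returned something other than the archive.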

@chrishuges
Author

Not sure I did this correctly, but for the second command, I assume you meant within Python:

>>> from urllib.request import urlretrieve
>>> urlretrieve("http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta", filename="rCRSd.fa.gz")
('rCRSd.fa.gz', <http.client.HTTPMessage object at 0x7f14fa7c9240>)
>>>

It downloads the file fine, and it seems to be correct by manual inspection. wget just returns a 404 Not Found error, so I must have something wrong in the link.

wget http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/rCRSd.fa.gz
--2021-07-09 11:14:48--  http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/rCRSd.fa.gz
Resolving refgenomes.databio.org (refgenomes.databio.org)... 52.206.204.252
Connecting to refgenomes.databio.org (refgenomes.databio.org)|52.206.204.252|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-07-09 11:14:48 ERROR 404: Not Found.

With curl, I am able to download the prebuilt asset without issue:

curl -L http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta?tag=default -o rCRSd.fa.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  8787  100  8787    0     0  23844      0 --:--:-- --:--:-- --:--:-- 23844

This file looks identical to the one downloaded via python.
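The 404 from wget is probably explained by the URL rather than by wget itself: the wget command requests a path ending in `/fasta/rCRSd.fa.gz`, while the curl and urlretrieve requests stop at `/fasta`. A small sketch that makes the difference explicit (the helper name `extra_segments` is hypothetical):

```python
from urllib.parse import urlsplit

BASE = (
    "http://refgenomes.databio.org/v3/assets/archive/"
    "94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta"
)
WGET_URL = BASE + "/rCRSd.fa.gz"   # as passed to wget above (404)
CURL_URL = BASE + "?tag=default"   # as passed to curl above (worked)


def extra_segments(base_url, other_url):
    """Return the path segments other_url has beyond base_url, or None if unrelated."""
    base = [seg for seg in urlsplit(base_url).path.split("/") if seg]
    other = [seg for seg in urlsplit(other_url).path.split("/") if seg]
    if other[: len(base)] == base:
        return other[len(base):]
    return None


if __name__ == "__main__":
    print(extra_segments(CURL_URL, WGET_URL))  # segments only the wget URL has
    print(urlsplit(CURL_URL).query)            # query string only the curl URL has
```

Retrying wget with the same URL that curl used, and an explicit output file name, would be the like-for-like comparison.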

@nsheff
Contributor

nsheff commented Nov 23, 2021

Sorry for dropping the ball on this. Did you ever figure it out?

One thing I'm not clear on is: does the command just hang after that message, or does the process terminate?

I am guessing this has something to do with the rich progress reports, but it's going to be really difficult to debug without being able to reproduce it locally. What shell are you using? Do you know of any reason it would be incompatible with rich?

Here's what I see:

INFO 09:38:51 | refgenconf.refgenconf:refgenconf:1651 > Downloading URL: http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta 
94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta:default ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 24.0/8.6 KB • 4.1 MB/s • 0:00:00
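To help answer the shell and terminal question, a few environment details can be collected with the standard library alone. This is a sketch under stated assumptions: it does not reproduce how rich detects terminal capabilities, the helper name `terminal_report` is hypothetical, and the choice of variables (TERM, SHELL, LANG) is a guess at what is useful to report.

```python
import os
import sys


def terminal_report(stream=None, environ=None):
    """Collect basic facts about the output stream and shell environment."""
    stream = sys.stdout if stream is None else stream
    environ = os.environ if environ is None else environ
    isatty = getattr(stream, "isatty", None)
    return {
        # False when output is piped, redirected, or captured by a scheduler
        "isatty": bool(isatty()) if callable(isatty) else False,
        "encoding": getattr(stream, "encoding", None),
        "TERM": environ.get("TERM"),
        "SHELL": environ.get("SHELL"),
        "LANG": environ.get("LANG"),
    }


if __name__ == "__main__":
    for key, value in terminal_report().items():
        print(f"{key}: {value}")
```

Running this in the same virtual environment and session where refgenie pull stops would give values that can be pasted into the thread.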
