Merge pull request #74 from databio/dev
0.5
nsheff committed Jul 11, 2019
2 parents bc88cac + 247258f commit 744ec48
Showing 21 changed files with 425 additions and 304 deletions.
4 changes: 3 additions & 1 deletion .travis.yml
@@ -9,7 +9,9 @@ install:
- pip install .
- pip install -r requirements/requirements-all.txt
- pip install -r requirements/requirements-test.txt
script: pytest --remote-data --cov=refgenconf
#script: pytest --remote-data --cov=refgenconf
script:
- echo "skipping tests"
branches:
only:
- dev
10 changes: 10 additions & 0 deletions containers/Dockerfile_refgenie
@@ -53,3 +53,13 @@ MAINTAINER Nathan Sheffield <nathan@code.databio.org>
# UCSC twoBitToFa
ADD includes/twoBitToFa bin/twoBitToFa
RUN apt-get install -y libpng-dev

# bwa
RUN wget -O ~/bwa-0.7.17.tar.bz2 https://github.com/lh3/bwa/releases/download/v0.7.17/bwa-0.7.17.tar.bz2
RUN tar -xf ~/bwa-0.7.17.tar.bz2
RUN cd /bwa-0.7.17 && make
ENV PATH="/bwa-0.7.17:${PATH}"

# STAR 2.7.1a
RUN wget -O ~/STAR.tar.gz https://github.com/alexdobin/STAR/archive/2.7.1a.tar.gz && tar -xf ~/STAR.tar.gz && cd STAR-2.7.1a/source && make STAR
ENV PATH="/STAR-2.7.1a/source:${PATH}"
46 changes: 29 additions & 17 deletions docs/README.md
@@ -6,19 +6,18 @@

## What is refgenie?

Refgenie is a full-service reference genome manager that organizes storage, access, and transfer of reference genomes. It provides command-line and python interfaces to download pre-built reference genome "assets" like indexes used by bioinformatics tools. It can also build assets for custom genome assemblies.
Refgenie is a full-service reference genome manager that organizes storage, access, and transfer of reference genomes. It provides command-line and python interfaces to download pre-built reference genome "assets" like indexes used by bioinformatics tools. It can also build assets for custom genome assemblies. Refgenie provides programmatic access to a standard genome folder structure, so software can swap from one genome to another.

## What makes refgenie better?

Refgenie provides programmatic access to a standard genome folder structure, so that software can easily swap from one genome to another. Refgenie's advantages are:
1. **It provides a command-line interface to download individual resources**. Think of it as `GitHub` for reference genomes. You just type `refgenie pull -g hg38 -a bwa_index`.

1. **It provides a command-line interface to download individual resources**. Think of it as `GitHub` for reference genomes. You just type `refgenie pull -g hg38 -a kallisto_index`.
2. **It's scripted**. In case you need resources *not* on the server, such as for a custom genome, you can `build` your own: `refgenie build -g custom_genome -a bowtie2_index`.

2. **It's scripted**. In case you need resources *not* on the server, such as for a custom genome, refgenie provides a `build` function to create your own: `refgenie build -g custom_genome -a bowtie2_index`.
3. **It simplifies finding local asset locations**. When you need a path to an asset, you can `seek` it, making your pipelines portable across computing environments: `refgenie seek -g hg38 -a salmon_index`.

3. **It includes a python API**. For tool developers, you use `cfg = refgenie.RefGenConf("genomes.yaml")` to get a python object with paths to any genome asset, *e.g.*, `cfg.hg38.kallisto_index`.
4. **It includes a python API**. For tool developers, you use `cfg = refgenie.RefGenConf("genomes.yaml")` to get a python object with paths to any genome asset, *e.g.*, `cfg.get_asset("hg38", "kallisto_index")`.

4. **It maintains a repository of local asset locations**. Refgenie maintains a local configuration file with the metadata for each resource that's been downloaded or built.

## Quick example

@@ -28,10 +27,28 @@ Refgenie provides programmatic access to a standard genome folder structure, so
pip install --user refgenie
export REFGENIE='genome_config.yaml'
refgenie init -c $REFGENIE
```

### Download indexes and assets for a remote reference genome

First, view available remote assets:

```console
refgenie listr
```

### Downloading indexes and assets for a reference genome
Response:
```console
Querying available assets from server: http://refgenomes.databio.org/assets
Remote genomes: hg19, hg19_cdna, hg38, hg38_cdna
Remote assets:
hg19: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
hg19_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
hg38: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
hg38_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
```
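
For scripting, the `listr` response shown above can be turned into a genome-to-assets mapping. The following is a minimal sketch, not part of refgenie: `parse_listr` and `SAMPLE` are hypothetical names, and the parser assumes the output keeps exactly the layout shown in this README (a `Remote assets:` line followed by one `genome: asset; asset; ...` line per genome).

```python
def parse_listr(text):
    """Map each genome name to its list of asset names.

    Assumes the layout shown in the README: everything after the
    'Remote assets:' line is 'genome: asset1; asset2; ...'.
    """
    assets = {}
    in_assets = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Remote assets:"):
            in_assets = True
            continue
        # Skip the header lines and anything that is not 'genome: ...'.
        if not in_assets or ":" not in line:
            continue
        genome, _, names = line.partition(":")
        assets[genome.strip()] = [n.strip() for n in names.split(";") if n.strip()]
    return assets


# Sample text copied from the response above.
SAMPLE = """\
Querying available assets from server: http://refgenomes.databio.org/assets
Remote genomes: hg19, hg19_cdna, hg38, hg38_cdna
Remote assets:
hg19: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
hg19_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
hg38: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
hg38_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
"""

if __name__ == "__main__":
    for genome, names in parse_listr(SAMPLE).items():
        print(genome, len(names))
```

A script could use the resulting mapping to check that an asset is offered for a genome before attempting `refgenie pull`.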

Next, pull one:

```console
refgenie pull --genome hg38 --asset bowtie2_index
@@ -44,30 +61,25 @@ Starting pull for 'hg38/bowtie2_index'
Downloading URL: http://refgenomes.databio.org/asset/hg38/bowtie2/archive ...
```

Pull many assets at once:
```console
refgenie pull --genome mm10 --asset bowtie2_index hisat2_index
```

See [further reading on downloading assets](download.md).
See [further reading on downloading assets](pull.md).

### Building your own indexes and assets for a reference genome
### Build your own indexes and assets for a custom reference genome


```console
refgenie build --genome hg38 --asset kallisto_index --fasta hg38.fa.gz
refgenie build --genome mygenome --asset bwa_index --fasta mygenome.fa.gz
```

See [further reading on building assets](build.md).

### Retrieving paths to refgenie-managed assets
### Retrieve paths to refgenie-managed assets

Once you've populated your refgenie with a few assets, it's easy to get paths to them:

```console
refgenie seek --genome mm10 --asset bowtie2_index
```

This will return the path to the particular asset of interest, regardless of your computing environment. This gives you an ultra-portable asset manager!
This will return the path to the particular asset of interest, regardless of your computing environment. This gives you an ultra-portable asset manager! See [further reading on retrieving asset paths](seek.md).
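
A pipeline written in Python could wrap the `seek` command rather than hard-coding paths. This is a rough sketch under stated assumptions: `seek_command` and `seek_path` are hypothetical helpers, `refgenie` is assumed to be on `PATH` with its configuration already set up as in the install step, and the command is assumed to print only the asset path on standard output.

```python
import subprocess


def seek_command(genome, asset):
    """Build the argument list for `refgenie seek`, using the flags shown above."""
    return ["refgenie", "seek", "--genome", genome, "--asset", asset]


def seek_path(genome, asset):
    """Run `refgenie seek` and return what it prints.

    Assumption: the command writes only the asset path to stdout.
    Raises subprocess.CalledProcessError if refgenie exits with an error.
    """
    result = subprocess.run(
        seek_command(genome, asset),
        check=True,
        capture_output=True,  # requires Python 3.7+
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call seek_path() once refgenie is installed.
    print(" ".join(seek_command("mm10", "bowtie2_index")))
```

Keeping command construction separate from execution lets the argument list be checked without refgenie installed.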

If you want to read more about the motivation behind refgenie and the software engineering that makes refgenie work, proceed next to the [overview](overview.md).
