Merge pull request #240 from refgenie/dev
v0.11.0
stolarczyk committed Apr 27, 2021
2 parents 5bf3fdd + 3a65e2a commit de1ae09
Showing 32 changed files with 682 additions and 568 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/black.yml
@@ -8,4 +8,4 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: psf/black@stable
- uses: psf/black@20.8b1
17 changes: 4 additions & 13 deletions .github/workflows/build-package.yml
@@ -7,11 +7,11 @@ on:
branches: [master, dev]

jobs:
pytest:
build-package:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.6, 3.7, 3.8, 3.9]
os: [ubuntu-latest, macos-latest]

steps:
@@ -25,17 +25,8 @@ jobs:
- name: Install dev dependancies
run: if [ -f requirements/requirements-dev.txt ]; then pip install -r requirements/requirements-dev.txt; fi

# - name: Install test dependancies
# run: if [ -f requirements/requirements-test.txt ]; then pip install -r requirements/requirements-test.txt; fi
- name: Install test dependancies
run: if [ -f requirements/requirements-test.txt ]; then pip install -r requirements/requirements-test.txt; fi

- name: Install package
run: python -m pip install .

# - name: Run pytest tests
# run: pytest tests --remote-data --cov=./ --cov-report=xml

# - name: Upload coverage to Codecov
# uses: codecov/codecov-action@v1
# with:
# file: ./coverage.xml
# name: py-${{ matrix.python-version }}-${{ matrix.os }}
30 changes: 22 additions & 8 deletions .github/workflows/test-refgenie-cli.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.8]
python-version: [3.6, 3.9]
os: [ubuntu-latest, macos-latest]

steps:
@@ -26,6 +26,11 @@ jobs:
- name: Install package
run: python -m pip install .

- name: Set up Homebrew
if: startsWith(matrix.os, 'macOS')
id: set-up-homebrew
uses: Homebrew/actions/setup-homebrew@master

- name: install macOS-specific dependancies
if: startsWith(matrix.os, 'macOS')
run: brew install md5sha1sum
@@ -68,9 +73,6 @@ jobs:
working-directory: ./genomes
run: refgenie list -c g.yaml

- name: refgenie build fasta
run: refgenie build -c genomes/g.yaml t7/fasta --files fasta=tests/data/t7.fa.gz --recipe tests/data/recipe_parent.json

- name: refgenie set aliases
run: |
refgenie alias set -c genomes/g.yaml --aliases t7_new t7_new1 --digest 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905
@@ -96,18 +98,18 @@ jobs:
./tests/assert_in_file.sh genomes/g.yaml t7_new 1
./tests/assert_in_file.sh genomes/g.yaml t7_new1 1
./tests/assert_in_file.sh genomes/g.yaml t7_another 0
if [ -L genomes/alias/t7_new/fasta/default/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905.fa.gz ]; then
echo "'genomes/alias/t7_new/fasta/default/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905.fa.gz' exists."
if [ -L genomes/alias/t7_new/fasta/default/t7_new.fa.gz ]; then
echo "'genomes/alias/t7_new/fasta/default/t7_new.fa.gz' exists."
exit 1
else
echo "Error: 'genomes/alias/t7_new/fasta/default/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905.fa.gz' does not exist."
echo "Error: 'genomes/alias/t7_new/fasta/default/t7_new.fa.gz' does not exist."
fi
- name: refgenie get aliases
run: |
refgenie alias get -c genomes/g.yaml
- name: refgneie add asset
- name: refgenie add asset
run: |
refgenie add t7_another/test_asset -c genomes/g.yaml --path ../tests/data --seek-keys '{"recipe": "recipe_parent.json"}'
./tests/assert_in_file.sh genomes/g.yaml test_asset 0
@@ -138,3 +140,15 @@ jobs:
refgenie remove -c genomes/g.yaml t7_another/fasta_child -f
./tests/assert_in_file.sh genomes/g.yaml fasta_child 1
./tests/assert_in_file.sh genomes/g.yaml 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905/fasta_child:new_tag 1 # test if the entry was removed from the fasta children list
- name: refgenie populate
run: |
populate_path=`echo 'refgenie://t7_another/fasta:default' | refgenie populate -c genomes/g.yaml`
seek_path=`refgenie seek -c genomes/g.yaml t7_another/fasta:default`
if [[ "$populate_path" == "$seek_path" ]]; then
echo "seek and populate returned identical paths"
else
echo "Error: seek and populate returned different paths -- seek: ${seek_path}; populate: ${populate_path}"
exit 1
fi
20 changes: 20 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,20 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.4.0
hooks:
- id: trailing-whitespace
- id: check-yaml
- id: end-of-file-fixer
- id: requirements-txt-fixer
- id: trailing-whitespace

- repo: https://github.com/PyCQA/isort
rev: 5.7.0
hooks:
- id: isort
args: ["--profile", "black"]

- repo: https://github.com/psf/black
rev: 20.8b1
hooks:
- id: black
54 changes: 46 additions & 8 deletions docs/README.md
@@ -10,6 +10,8 @@

Refgenie manages storage, access, and transfer of reference genome resources. It provides command-line and Python interfaces to *download* pre-built reference genome "assets", like indexes used by bioinformatics tools. It can also *build* assets for custom genome assemblies. Refgenie provides programmatic access to a standard genome folder structure, so software can swap from one genome to another.

**In a hurry?** Check out the [demo videos](demo_videos.md) that present the most relevant refgenie features in 3 minutes!

## What makes refgenie better?

1. **It provides a command-line interface to download individual resources**. Think of it as `GitHub` for reference genomes. You just type `refgenie pull hg38/bwa_index`.
@@ -18,23 +20,46 @@ Refgenie manages storage, access, and transfer of reference genome resources. It

3. **It simplifies finding local asset locations**. When you need a path to an asset, you can `seek` it, making your pipelines portable across computing environments: `refgenie seek hg38/salmon_index`.

4. **It includes a python API**. For tool developers, you use `rgc = refgenconf.RefGenConf("genomes.yaml")` to get a Python object with paths to any genome asset, *e.g.*, `rgc.seek("hg38", "kallisto_index")`.
4. **It provides remote operation mode**, useful for cloud applications. Get a path to an asset file hosted on AWS S3: `refgenie seekr hg38/fasta --remote-class s3`.

5. **It strictly determines genome compatibility**. Users refer to genomes with arbitrary aliases, like "hg38", but refgenie uses sequence-derived identifiers to verify genome identity with asset servers.
5. **It includes a Python API**. For tool developers, you use `rgc = refgenconf.RefGenConf("genomes.yaml")` to get a Python object with paths to any genome asset, *e.g.*, `rgc.seek("hg38", "kallisto_index")`.

6. **It strictly determines genome compatibility**. Users refer to genomes with arbitrary aliases, like "hg38", but refgenie uses sequence-derived identifiers to verify genome identity with asset servers.
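
The Python API item in the list above can be sketched as follows. This is a minimal illustration, not code from the refgenie project: `resolve_assets` is a hypothetical helper name, and the only API it relies on is the `seek(genome, asset)` call quoted in the README. The configuration object is passed in, so the function itself does not import `refgenconf`.

```python
from typing import Dict, Iterable, Tuple


def resolve_assets(rgc, wanted: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map "genome/asset" registry paths to local file paths.

    `rgc` is expected to be a refgenconf.RefGenConf object; only its
    seek(genome, asset) method, as quoted in the README, is used.
    `resolve_assets` itself is a hypothetical helper, not part of refgenie.
    """
    return {
        f"{genome}/{asset}": rgc.seek(genome, asset)
        for genome, asset in wanted
    }


# Typical use (assumes refgenconf is installed and genomes.yaml is initialized):
#   import refgenconf
#   rgc = refgenconf.RefGenConf("genomes.yaml")
#   paths = resolve_assets(rgc, [("hg38", "kallisto_index")])
```

Passing the configuration object in, rather than constructing it inside the helper, keeps the config file location in one place and lets a tool's tests substitute a stand-in object.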

## Quick example

### Install and initialize
### Install

Refgenie keeps track of what's available using a configuration file initialized by `refgenie init`:
Refgenie is a Python package; install it from [PyPI](https://pypi.org/project/refgenie/):

```console
pip install --user refgenie
```

Or [conda](https://anaconda.org/bioconda/refgenie):

```console
conda install refgenie
```

And that's it! If you wish to use refgenie in *remote mode*, see [further reading on remote mode in refgenie](remote.md).

If you're connected to the Internet, call a test command, e.g.:

```console
refgenie seekr hg38/fasta
```

### Initialize to use refgenie locally

Refgenie keeps track of what's available using a configuration file initialized by `refgenie init`:

```console
export REFGENIE='genome_config.yaml'
refgenie init -c $REFGENIE
```


### Download indexes and assets for a remote reference genome

Use `refgenie pull` to download pre-built assets from a remote server. View available remote assets with `listr`:
@@ -45,8 +70,8 @@ refgenie listr

Response:
```console
Remote refgenie assets
Server URL: http://refgenomes.databio.org
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ genome ┃ assets ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
@@ -70,7 +95,7 @@ Downloading URL: http://rg.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a
Download complete: /Users/mstolarczyk/Desktop/testing/refgenie/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/bowtie2_index__default.tgz
Extracting asset tarball: /Users/mstolarczyk/Desktop/testing/refgenie/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/bowtie2_index__default.tgz
Default tag for '94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index' set to: default
Created alias directories:
- /Users/mstolarczyk/Desktop/testing/refgenie/alias/rCRSd/bowtie2_index/default
```

@@ -87,7 +112,7 @@ refgenie build mygenome/bwa_index

See [further reading on building assets](build.md).

### Retrieve paths to refgenie-managed assets
### Retrieve paths to *local* refgenie-managed assets

Once you've populated your refgenie with a few assets, use `seek` to retrieve their local file paths:

@@ -97,4 +122,17 @@ refgenie seek mm10/bowtie2_index

This will return the path to the particular asset of interest, regardless of your computing environment. This gives you an ultra-portable asset manager! See [further reading on retrieving asset paths](seek.md).
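
For pipelines written in Python that prefer to call the command-line tool, a sketch like the following may help. It assumes `refgenie seek` prints the asset path on standard output (the CI workflow in this commit captures it the same way) and that `-c` selects the config file, as used in that workflow. `seek_path` is a hypothetical helper name.

```python
import subprocess
from typing import Optional


def seek_path(registry_path: str, config: Optional[str] = None) -> str:
    """Return the local path that `refgenie seek` reports for an asset.

    `registry_path` is a string such as "mm10/bowtie2_index". When `config`
    is None, refgenie is left to locate its configuration itself (for
    example through the REFGENIE environment variable shown earlier).
    """
    cmd = ["refgenie", "seek"]
    if config is not None:
        cmd += ["-c", config]
    cmd.append(registry_path)
    # check=True raises CalledProcessError if refgenie exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Example (requires refgenie and a populated config):
#   index = seek_path("mm10/bowtie2_index")
```

Because the path is resolved at run time, the same pipeline code can run unchanged on machines that store their genome assets in different locations.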

### Retrieve paths to *remote* refgenie-managed assets

Use `seekr` (short for "seek remote") to retrieve remote `seek_key` targets:

```console
refgenie seekr mm10/fasta.fai
```

This will return the path to the remote file of interest, here the FASTA index file, which is part of the `mm10/fasta` asset.

See [further reading on using refgenie in remote mode](remote.md).

---
If you want to read more about the motivation behind refgenie and the software engineering that makes refgenie work, proceed next to the [overview](overview.md).
