Commit

Merge pull request #197 from refgenie/dev
v0.10.0
stolarczyk committed Mar 11, 2021
2 parents b47d37d + f68a1b6 commit 2b84ca5
Showing 46 changed files with 4,114 additions and 1,655 deletions.
11 changes: 11 additions & 0 deletions .github/workflows/black.yml
@@ -0,0 +1,11 @@
name: Lint

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: psf/black@stable
140 changes: 140 additions & 0 deletions .github/workflows/test-refgenie-cli.yml
@@ -0,0 +1,140 @@
name: Test refgenie CLI

on:
  push:
    branches: [master, dev]

jobs:
  test_CLI:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        python-version: [3.6, 3.8]
        os: [ubuntu-latest, macos-latest]

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dev dependencies
        run: if [ -f requirements/requirements-dev.txt ]; then pip install -r requirements/requirements-dev.txt; fi

      - name: Install package
        run: python -m pip install .

      - name: install macOS-specific dependencies
        if: startsWith(matrix.os, 'macOS')
        run: brew install md5sha1sum

      - name: create genomes dir
        run: mkdir genomes

      - name: refgenie init
        working-directory: ./genomes
        run: refgenie init -c g.yaml; cat g.yaml

      - name: refgenie list
        working-directory: ./genomes
        run: refgenie list -c g.yaml

      - name: refgenie build fasta (parent asset)
        run: |
          refgenie build -c genomes/g.yaml t7/fasta --files fasta=tests/data/t7.fa.gz --recipe tests/data/recipe_parent.json
          ./tests/assert_in_file.sh genomes/g.yaml t7 0
          ./tests/assert_in_file.sh genomes/g.yaml 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905 0 # this is a digest that should be produced from this FASTA file
      - name: refgenie build fasta_child (child asset)
        run: |
          refgenie build -c genomes/g.yaml t7/fasta_child --recipe tests/data/recipe_child.json
          ./tests/assert_in_file.sh genomes/g.yaml fasta_child 0
          if [ -L `refgenie seek -c genomes/g.yaml t7/fasta_child` ]; then
            echo "`refgenie seek -c genomes/g.yaml t7/fasta_child` exists."
          else
            echo "Error: `refgenie seek -c genomes/g.yaml t7/fasta_child` does not exist."
            exit 1
          fi
          if [ -d genomes/data/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905/fasta_child/default ]; then
            echo "'genomes/data/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905/fasta_child/default' exists."
          else
            echo "Error: 'genomes/data/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905/fasta_child/default' does not exist."
            exit 1
          fi
      - name: refgenie list
        working-directory: ./genomes
        run: refgenie list -c g.yaml

      - name: refgenie build fasta
        run: refgenie build -c genomes/g.yaml t7/fasta --files fasta=tests/data/t7.fa.gz --recipe tests/data/recipe_parent.json

      - name: refgenie set aliases
        run: |
          refgenie alias set -c genomes/g.yaml --aliases t7_new t7_new1 --digest 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905
          ./tests/assert_in_file.sh genomes/g.yaml t7_new 0
          ./tests/assert_in_file.sh genomes/g.yaml t7_new1 0
          if [ -L `refgenie seek -c genomes/g.yaml t7_new/fasta` ]; then
            echo "`refgenie seek -c genomes/g.yaml t7_new/fasta` exists."
          else
            echo "Error: `refgenie seek -c genomes/g.yaml t7_new/fasta` does not exist."
            exit 1
          fi
          if [ -L `refgenie seek -c genomes/g.yaml t7_new1/fasta` ]; then
            echo "`refgenie seek -c genomes/g.yaml t7_new1/fasta` exists."
          else
            echo "Error: `refgenie seek -c genomes/g.yaml t7_new1/fasta` does not exist."
            exit 1
          fi
      - name: refgenie remove aliases
        run: |
          refgenie alias set -c genomes/g.yaml --aliases t7_another --digest 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905
          refgenie alias remove -c genomes/g.yaml --aliases t7_new t7_new1 t7 --digest 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905
          ./tests/assert_in_file.sh genomes/g.yaml t7_new 1
          ./tests/assert_in_file.sh genomes/g.yaml t7_new1 1
          ./tests/assert_in_file.sh genomes/g.yaml t7_another 0
          if [ -L genomes/alias/t7_new/fasta/default/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905.fa.gz ]; then
            echo "Error: 'genomes/alias/t7_new/fasta/default/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905.fa.gz' exists."
            exit 1
          else
            echo "'genomes/alias/t7_new/fasta/default/6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905.fa.gz' does not exist."
          fi
      - name: refgenie get aliases
        run: |
          refgenie alias get -c genomes/g.yaml
      - name: refgenie add asset
        run: |
          refgenie add t7_another/test_asset -c genomes/g.yaml --path ../tests/data --seek-keys '{"recipe": "recipe_parent.json"}'
          ./tests/assert_in_file.sh genomes/g.yaml test_asset 0
          if [ -L `refgenie seek t7_another/test_asset.recipe:default -c genomes/g.yaml` ]; then
            echo "`refgenie seek t7_another/test_asset.recipe:default -c genomes/g.yaml` exists."
          else
            echo "Error: `refgenie seek t7_another/test_asset.recipe:default -c genomes/g.yaml` does not exist."
            exit 1
          fi
      - name: refgenie tag asset
        run: |
          refgenie tag -c genomes/g.yaml t7_another/fasta_child:default -t new_tag -f
          ./tests/assert_in_file.sh genomes/g.yaml new_tag 0
          if [ -f `refgenie seek t7_another/fasta_child:new_tag -c genomes/g.yaml` ]; then
            echo "`refgenie seek t7_another/fasta_child:new_tag -c genomes/g.yaml` exists."
          else
            echo "Error: `refgenie seek t7_another/fasta_child:new_tag -c genomes/g.yaml` does not exist."
            exit 1
          fi
      - name: refgenie id
        run: |
          ./tests/assert_in_file.sh genomes/g.yaml `refgenie id -c genomes/g.yaml t7_another/fasta_child:new_tag` 0
      - name: refgenie remove fasta_child
        run: |
          refgenie remove -c genomes/g.yaml t7_another/fasta_child -f
          ./tests/assert_in_file.sh genomes/g.yaml fasta_child 1
          ./tests/assert_in_file.sh genomes/g.yaml 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905/fasta_child:new_tag 1 # test if the entry was removed from the fasta children list
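
The workflow above relies on `tests/assert_in_file.sh`, which this diff does not show. Judging only from how it is called (`<file> <string> <expected>`), the third argument appears to be the expected `grep`-style exit status: `0` when the string should be present, `1` when it should be absent. The following is a minimal Python sketch of that check under that assumption; `assert_in_file` and `demo_assert_in_file` are hypothetical names, not the project's script.

```python
import tempfile
from pathlib import Path


def assert_in_file(path: str, needle: str, expected_status: int) -> bool:
    """Return True when the presence of `needle` matches the expectation.

    Assumption (inferred from the workflow, not from the real script):
    expected_status follows grep's convention, 0 = found, 1 = not found.
    """
    text = Path(path).read_text()
    status = 0 if needle in text else 1
    return status == expected_status


def demo_assert_in_file() -> tuple:
    """Write a tiny config-like file and check present and absent strings."""
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "g.yaml"
        cfg.write_text("genomes:\n  t7: {}\n")
        return (
            assert_in_file(str(cfg), "t7", 0),      # present, expected present
            assert_in_file(str(cfg), "t7_new", 1),  # absent, expected absent
            assert_in_file(str(cfg), "t7_new", 0),  # absent, expected present
        )


if __name__ == "__main__":
    print(demo_assert_in_file())
```

Read this way, the `t7_new 1` checks after `refgenie alias remove` assert that the alias no longer appears in the config, while `t7_another 0` asserts that the newly set alias does.
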
3 changes: 3 additions & 0 deletions .gitignore
@@ -80,3 +80,6 @@ refgenie.egg-info/
docs_jupyter/refgenie.yaml
docs_jupyter/rCRSd*
docs_jupyter/hs38d1*

# build dir
build/
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,4 +1,5 @@
include requirements/*
include refgenie/schemas/*
include README.md
include LICENSE.txt
include refgenie/refgenie.yaml
3 changes: 2 additions & 1 deletion README.md
@@ -1,4 +1,5 @@
![Build package](https://github.com/refgenie/refgenie/workflows/Build%20package/badge.svg)
[![Build package](https://github.com/refgenie/refgenie/workflows/Build%20package/badge.svg)](https://github.com/refgenie/refgenie/actions?query=workflow%3A%22Build+package%22)
[![Test refgenie CLI](https://github.com/refgenie/refgenie/workflows/Test%20refgenie%20CLI/badge.svg)](https://github.com/refgenie/refgenie/actions?query=workflow%3A%22Test+refgenie+CLI%22)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/refgenie/README.html)

<img src="https://raw.githubusercontent.com/databio/refgenie/master/docs/img/refgenie_logo.svg?sanitize=true" alt="Refgenie" height="70"/><br>
28 changes: 20 additions & 8 deletions docs/README.md
@@ -20,6 +20,8 @@ Refgenie manages storage, access, and transfer of reference genome resources. It

4. **It includes a Python API**. Tool developers can use `rgc = refgenconf.RefGenConf("genomes.yaml")` to get a Python object with paths to any genome asset, *e.g.*, `rgc.seek("hg38", "kallisto_index")`.

5. **It strictly determines genome compatibility**. Users refer to genomes with arbitrary aliases, like "hg38", but refgenie uses sequence-derived identifiers to verify genome identity with asset servers.
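
The idea behind sequence-derived identifiers can be illustrated with a small sketch. The digests that appear elsewhere in this commit are 48 hexadecimal characters long, which is consistent with a 24-byte truncated hash, but the exact algorithm refgenie uses is not described here. The function below is therefore a hypothetical illustration (`sequence_digest` and its truncated SHA-512 scheme are assumptions), showing only that an identifier computed from sequence names and content does not depend on the alias a user picks.

```python
import hashlib
from typing import Mapping


def sequence_digest(sequences: Mapping[str, str], digest_bytes: int = 24) -> str:
    """Hypothetical content-derived genome identifier (not refgenie's algorithm).

    Each sequence is uppercased and hashed; the sorted name/digest pairs
    are then hashed again, so input order and letter case do not matter.
    """
    parts = []
    for name in sorted(sequences):
        seq = sequences[name].upper().encode()
        seq_digest = hashlib.sha512(seq).digest()[:digest_bytes].hex()
        parts.append(f"{name}:{seq_digest}")
    combined = ",".join(parts).encode()
    return hashlib.sha512(combined).digest()[:digest_bytes].hex()


if __name__ == "__main__":
    # Two users may call this genome by different aliases,
    # but the identifier is computed from the content alone.
    genome = {"chr1": "ACGTACGT", "chrM": "GATTACA"}
    print(sequence_digest(genome))
```

Under this scheme, a client and a server that hold the same sequences compute the same identifier, whatever alias each side uses locally.
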


## Quick example

@@ -43,11 +45,16 @@ refgenie listr

Response:
```console
Querying available assets from server: http://refgenomes.databio.org/v2/assets
Remote genomes: mouse_chrM2x, rCRSd
Remote assets:
mouse_chrM2x/ bowtie2_index:default, fasta.chrom_sizes:default, fasta.fai:default, fasta:default
rCRSd/ bowtie2_index:default, fasta.chrom_sizes:default, fasta.chrom_sizes:test, fasta.fai:default, fasta.fai:test, fasta:default, fasta:test
Remote refgenie assets
Server URL: http://refgenomes.databio.org
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ genome ┃ assets ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ mouse_chrM2x │ fasta, bwa_index, bowtie2_index │
│ hg38 │ fasta, bowtie2_index │
│ rCRSd │ fasta, bowtie2_index │
│ human_repeats │ fasta, hisat2_index, bwa_index │
└─────────────────────┴──────────────────────────────────────────────┘
```

Next, pull one:
@@ -58,8 +65,13 @@ refgenie pull rCRSd/bowtie2_index

Response:
```console
'rCRSd/bowtie2_index:default' archive size: 116.8KB
Downloading URL: http://staging.refgenomes.databio.org/v2/asset/rCRSd/bowtie2_index/archive ...
Downloading URL: http://rg.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index
94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index:default ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 128.0/117.0 KB • 1.8 MB/s • 0:00:00
Download complete: /Users/mstolarczyk/Desktop/testing/refgenie/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/bowtie2_index__default.tgz
Extracting asset tarball: /Users/mstolarczyk/Desktop/testing/refgenie/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/bowtie2_index__default.tgz
Default tag for '94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index' set to: default
Created alias directories:
- /Users/mstolarczyk/Desktop/testing/refgenie/alias/rCRSd/bowtie2_index/default
```

See [further reading on downloading assets](pull.md).
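
The pull log above suggests a two-level layout: files are stored under `data/<digest>/<asset>/...`, and a human-readable `alias/<alias>/<asset>/<tag>` directory is created alongside. The CLI tests in this commit check that entries under `alias/` are symbolic links. The sketch below mimics that layout in a temporary directory; `link_alias` and `demo_link_alias` are hypothetical helpers written for illustration, not refgenie functions, and the real tool may organize the links differently.

```python
import os
import tempfile
from pathlib import Path


def link_alias(root: Path, digest: str, alias: str, asset: str,
               tag: str = "default") -> Path:
    """Create alias/<alias>/<asset>/<tag> with symlinks into data/<digest>/<asset>/<tag>.

    Assumption: every file in the data directory gets one symlink of the
    same name in the alias directory.
    """
    data_dir = root / "data" / digest / asset / tag
    alias_dir = root / "alias" / alias / asset / tag
    alias_dir.mkdir(parents=True, exist_ok=True)
    for target in data_dir.iterdir():
        link = alias_dir / target.name
        if not link.is_symlink():
            os.symlink(target, link)
    return alias_dir


def demo_link_alias() -> bool:
    """Build a toy data directory, link an alias to it, and read through the link."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data_dir = root / "data" / "abc123" / "fasta" / "default"
        data_dir.mkdir(parents=True)
        (data_dir / "genome.fa").write_text(">seq\nACGT\n")
        alias_dir = link_alias(root, "abc123", "my_genome", "fasta")
        link = alias_dir / "genome.fa"
        return link.is_symlink() and link.read_text() == ">seq\nACGT\n"


if __name__ == "__main__":
    print(demo_link_alias())
```

Keeping the data under the digest and exposing aliases as links means several aliases can point at one copy of an asset, and removing an alias only removes links.
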
@@ -70,7 +82,7 @@ Refgenie assets are scripted, so if what you need is not available remotely, you


```console
refgenie build mygenome/bwa_index --fasta mygenome.fa.gz
refgenie build mygenome/bwa_index
```

See [further reading on building assets](build.md).
