Merge pull request #69 from databio/dev
0.4.3
nsheff committed Jun 21, 2019
2 parents b4bab68 + 4a527fb commit 9756bcf
Showing 15 changed files with 558 additions and 297 deletions.
16 changes: 16 additions & 0 deletions .travis.yml
@@ -0,0 +1,16 @@
language: python
python:
- "2.7"
- "3.5"
- "3.6"
os:
- linux
install:
- pip install .
- pip install -r requirements/requirements-all.txt
- pip install -r requirements/requirements-test.txt
script: pytest --remote-data --cov=refgenconf
branches:
only:
- dev
- master
14 changes: 7 additions & 7 deletions docs/README.md
@@ -6,19 +6,19 @@

## What is refgenie?

Refgenie is a full-service reference genome manager. It provides command-line and Python interfaces to download pre-built reference genome "assets" like indexes used by different bioinformatics tools. It can also build assets for custom genome assemblies, and it facilitates systematic organization of, and access to, local genome "assets."
Refgenie is a full-service reference genome manager that organizes storage, access, and transfer of reference genomes. It provides command-line and Python interfaces to download pre-built reference genome "assets" like indexes used by bioinformatics tools. It can also build assets for custom genome assemblies.

## What makes refgenie better?

Refgenie provides programmatic access to a standard genome folder structure, so that software can easily swap from one genome to another. There are other similar projects, but Refgenie has a few advantages:
Refgenie provides programmatic access to a standard genome folder structure, so that software can easily swap from one genome to another. Refgenie's advantages are:

1. **It provides a command-line interface to download individual resources**. Think of it as `GitHub` for reference genomes. You just type `refgenie pull -g hg38 -a kallisto`.
1. **It provides a command-line interface to download individual resources**. Think of it as `GitHub` for reference genomes. You just type `refgenie pull -g hg38 -a kallisto_index`.

2. **It's scripted**. In case you need resources *not* on the server, such as for a custom genome, refgenie provides a `build` function to create your own: `refgenie build -i custom.fa.gz -a bowtie2`.
2. **It's scripted**. In case you need resources *not* on the server, such as for a custom genome, refgenie provides a `build` function to create your own: `refgenie build -g custom_genome -a bowtie2_index`.

3. **It includes a Python API**. Tool developers can use `cfg = refgenie.RefGenConf("genomes.yaml")` to get a Python object with paths to any genome asset, *e.g.*, `cfg.hg38.kallisto`.
3. **It includes a Python API**. Tool developers can use `cfg = refgenie.RefGenConf("genomes.yaml")` to get a Python object with paths to any genome asset, *e.g.*, `cfg.hg38.kallisto_index`.

4. When a new asset is downloaded, Refgenie can automatically update a local configuration file that acts as a sort of filesystem oracle for locally available genome assets. It's aware of the path to each resource that's been downloaded or otherwise declared.
4. **It maintains a repository of local asset locations**. Refgenie maintains a local configuration file with the metadata for each resource that's been downloaded or built.

## Quick example

@@ -56,7 +56,7 @@ See [further reading on downloading assets](download.md).


```console
refgenie build --input hg38.fa.gz --asset kallisto
refgenie build --genome hg38 --asset kallisto_index --fasta hg38.fa.gz
```

See [further reading on building assets](build.md).
124 changes: 89 additions & 35 deletions docs/autodoc_build/refgenconf.md
@@ -1,39 +1,20 @@
# Package refgenconf Documentation

## Class MissingAssetError
Error type for request of an unavailable genome asset.


## Class RefgenconfError
Base exception type for this package


## Class MissingGenomeError
Error type for request of unknown genome/assembly.


## Class UnboundEnvironmentVariablesError
Use of environment variable that isn't bound to a value.


## Class GenomeConfigFormatError
Exception for invalid genome config file format.


## Class MissingConfigDataError
Missing required configuration instance items


## Class RefGenConf
A sort of oracle of available reference genome assembly assets


### assets\_dict
Map each assembly name to a list of available asset names.
```python
def assets_dict(self)
def assets_dict(self, order=None)
```

#### Parameters:

- `order` -- `function(str) -> object`: how to key genome IDs for sorting


#### Returns:

`Mapping[str, Iterable[str]]`: mapping from assembly name to collection of available asset names.
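
The `order` argument is documented as a `function(str) -> object` used to key genome IDs for sorting. The sketch below illustrates that idea under the assumption that it behaves like the `key` argument of Python's built-in `sorted`. The helper `sorted_assets_dict` and the sample data are hypothetical and are not part of `refgenconf`.

```python
def sorted_assets_dict(genomes, order=None):
    """Return genome -> asset-name list, with genomes ordered by the `order` key.

    Hypothetical stand-in for illustration; not the refgenconf implementation.
    """
    # With order=None, sorted() falls back to plain string comparison.
    return {g: list(genomes[g]) for g in sorted(genomes, key=order)}


# Invented sample data: genome ID -> available asset names.
sample = {
    "mm10": ["bowtie2_index"],
    "hg38": ["kallisto_index"],
    "hg19": ["bowtie2_index"],
}

default_order = list(sorted_assets_dict(sample))
# Key on the digits after the two-letter prefix, so IDs sort by build number.
by_number = list(sorted_assets_dict(sample, order=lambda g: int(g[2:])))
```

A key function like this leaves the mapping's contents unchanged and only affects the order in which genomes are listed.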
@@ -44,14 +25,15 @@ def assets_dict(self)
### assets\_str
Create a block of text representing genome-to-asset mapping.
```python
def assets_str(self, offset_text=' ', asset_sep='; ', genome_assets_delim=': ')
def assets_str(self, offset_text=' ', asset_sep='; ', genome_assets_delim=': ', order=None)
```

#### Parameters:

- `offset_text` -- `str`: text that begins each line of the text representation that's produced
- `asset_sep` -- `str`: the delimiter between names of types of assets, within each genome line
- `genome_assets_delim` -- `str`: the delimiter to place between reference genome assembly name and its list of asset names
- `order` -- `function(str) -> object`: how to key genome IDs and asset names for sorting
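
As a rough illustration of how these four parameters could combine, here is a minimal sketch. The function `assets_text` is a hypothetical stand-in, not the `refgenconf` implementation; it assumes one line per genome and reuses the defaults shown in the signature above. The sample data is invented.

```python
def assets_text(genomes, offset_text=" ", asset_sep="; ",
                genome_assets_delim=": ", order=None):
    """Render a genome -> assets mapping as one line of text per genome.

    Hypothetical illustration only; the layout is an assumption.
    """
    lines = []
    for genome in sorted(genomes, key=order):
        # Join this genome's asset names with the asset separator.
        assets = asset_sep.join(sorted(genomes[genome], key=order))
        lines.append(offset_text + genome + genome_assets_delim + assets)
    return "\n".join(lines)


# Invented sample data.
text = assets_text({
    "hg38": ["kallisto_index", "bowtie2_index"],
    "hg19": ["bowtie2_index"],
})
```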


#### Returns:
@@ -61,10 +43,30 @@ def assets_str(self, offset_text=' ', asset_sep='; ', genome_assets_delim=': ')



### filepath
Determine path to a particular asset for a particular genome.
```python
def filepath(self, genome, asset, ext='.tar')
```

#### Parameters:

- `genome` -- `str`: reference genome ID
- `asset` -- `str`: asset name
- `ext` -- `str`: file extension


#### Returns:

`str`: path to asset for given genome and asset kind/name
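
One plausible way to derive such a path is sketched below. The folder layout (`<genome folder>/<genome>/<asset><ext>`) is an assumption made for illustration, and `asset_filepath` and its `genome_folder` argument are hypothetical names, not the `refgenconf` API.

```python
import os


def asset_filepath(genome_folder, genome, asset, ext=".tar"):
    """Build a path to an asset archive under an assumed folder layout.

    Hypothetical illustration; the real layout may differ.
    """
    # Assumed layout: one subfolder per genome, one archive file per asset.
    return os.path.join(genome_folder, genome, asset + ext)


path = asset_filepath("/data/genomes", "hg38", "kallisto_index")
```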




### genomes\_list
Get a list of this configuration's reference genome assembly IDs.
```python
def genomes_list(self)
def genomes_list(self, order=None)
```

#### Returns:
@@ -77,9 +79,14 @@ def genomes_list(self)
### genomes\_str
Get as single string this configuration's reference genome assembly IDs.
```python
def genomes_str(self)
def genomes_str(self, order=None)
```

#### Parameters:

- `order` -- `function(str) -> object`: how to key genome IDs for sorting


#### Returns:

`str`: single string that lists this configuration's known reference genome assembly IDs
@@ -90,7 +97,7 @@ def genomes_str(self)
### get\_asset
Get an asset for a particular assembly.
```python
def get_asset(self, genome_name, asset_name, strict_exists=True, check_exist=<function RefGenConf.<lambda> at 0x7fc96c0d7158>)
def get_asset(self, genome_name, asset_name, strict_exists=True, check_exist=<function RefGenConf.<lambda> at 0x7f9b5c8f9378>)
```

#### Parameters:
@@ -118,12 +125,13 @@ def get_asset(self, genome_name, asset_name, strict_exists=True, check_exist=<fu
### list\_assets\_by\_genome
List types/names of assets that are available for one--or all--genomes.
```python
def list_assets_by_genome(self, genome=None)
def list_assets_by_genome(self, genome=None, order=None)
```

#### Parameters:

- `genome` -- `str | NoneType`: reference genome assembly ID, optional; if omitted, the full mapping from genome to asset names
- `order` -- `function(str) -> object`: how to key genome IDs and asset names for sorting


#### Returns:
@@ -136,12 +144,13 @@ def list_assets_by_genome(self, genome=None)
### list\_genomes\_by\_asset
List assemblies for which a particular asset is available.
```python
def list_genomes_by_asset(self, asset=None)
def list_genomes_by_asset(self, asset=None, order=None)
```

#### Parameters:

- `asset` -- `str | NoneType`: name of type of asset of interest, optional
- `order` -- `function(str) -> object`: how to key genome IDs and asset names for sorting


#### Returns:
@@ -151,15 +160,34 @@ def list_genomes_by_asset(self, asset=None)



### list\_local
List locally available reference genome IDs and assets by ID.
```python
def list_local(self, order=None)
```

#### Parameters:

- `order` -- `function(str) -> object`: how to key genome IDs and asset names for sorting


#### Returns:

`str, str`: text reps of locally available genomes and assets




### list\_remote
List genomes and assets available remotely.
```python
def list_remote(self, get_url=<function RefGenConf.<lambda> at 0x7fc96c0d7378>)
def list_remote(self, get_url=<function RefGenConf.<lambda> at 0x7f9b5c8f9620>, order=None)
```

#### Parameters:

- `get_url` -- `function(refgenconf.RefGenConf) -> str`: how to determine URL request, given RefGenConf instance
- `order` -- `function(str) -> object`: how to key genome IDs and asset names for sorting


#### Returns:
@@ -172,7 +200,7 @@ def list_remote(self, get_url=<function RefGenConf.<lambda> at 0x7fc96c0d7378>)
### pull\_asset
Download and possibly unpack one or more assets for a given ref gen.
```python
def pull_asset(self, genome, assets, genome_config, unpack=True, get_json_url=<function RefGenConf.<lambda> at 0x7fc96c0d7488>, get_main_url=None)
def pull_asset(self, genome, assets, genome_config, unpack=True, force=None, get_json_url=<function RefGenConf.<lambda> at 0x7f9b5c8f9730>, get_main_url=None, build_signal_handler=<function _handle_sigint at 0x7f9b5ce178c8>)
```

#### Parameters:
@@ -181,8 +209,10 @@ def pull_asset(self, genome, assets, genome_config, unpack=True, get_json_url=<f
- `assets` -- `str`: name(s) of particular asset(s) to fetch
- `genome_config` -- `str`: path to genome configuration file to update
- `unpack` -- `bool`: whether to unpack a tarball
- `force` -- `bool | NoneType`: how to handle the case in which the asset path already exists; null to prompt (on a per-asset basis), False to effectively auto-reply No to the prompt to replace the existing file, and True to auto-reply Yes for existing asset replacement.
- `get_json_url` -- `function(str, str, str) -> str`: how to build URL from genome server URL base, genome, and asset
- `get_main_url` -- `function(str) -> str`: how to get archive URL from main URL
- `build_signal_handler` -- `function(str) -> function`: how to create a signal handler to use during the download; the single argument to this function factory is the download filepath
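
The three-way behavior described for `force` can be sketched as below. This is an illustration under stated assumptions, not the `refgenconf` code: `should_replace` and its `ask` argument are hypothetical, and the prompt wording is invented.

```python
import os
import tempfile


def should_replace(path, force=None, ask=input):
    """Decide whether to overwrite an asset path.

    Hypothetical helper: force=None prompts, False declines, True accepts.
    """
    if not os.path.exists(path):
        return True  # nothing to overwrite
    if force is None:
        # Prompt on a per-asset basis; anything but y/yes counts as No.
        reply = ask("Replace existing file {}? [y/N] ".format(path))
        return reply.strip().lower() in ("y", "yes")
    return bool(force)


# Exercise each branch against a file that exists, without real user input.
with tempfile.NamedTemporaryFile() as tmp:
    decisions = (
        should_replace(tmp.name, force=True),
        should_replace(tmp.name, force=False),
        should_replace(tmp.name, ask=lambda prompt: "y"),
        should_replace(tmp.name, ask=lambda prompt: ""),
    )
```

Passing the prompt function in as `ask` keeps the decision logic testable without a terminal.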


#### Returns:
@@ -218,10 +248,34 @@ def update_genomes(self, genome, asset=None, data=None)



## Class MissingGenomeError
Error type for request of unknown genome/assembly.


## Class MissingConfigDataError
Missing required configuration instance items


## Class UnboundEnvironmentVariablesError
Use of environment variable that isn't bound to a value.


## Class RefgenconfError
Base exception type for this package


## Class GenomeConfigFormatError
Exception for invalid genome config file format.


## Class MissingAssetError
Error type for request of an unavailable genome asset.


### select\_genome\_config
Get path to genome configuration file.
```python
def select_genome_config(filename, conf_env_vars=None)
def select_genome_config(filename, conf_env_vars=None, **kwargs)
```

#### Parameters:
@@ -238,4 +292,4 @@ def select_genome_config(filename, conf_env_vars=None)



**Version Information**: `refgenconf` v0.1.2, generated by `lucidoc` v0.4dev
**Version Information**: `refgenconf` v0.2.1-dev, generated by `lucidoc` v0.4dev
