-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Daniel López
committed
Mar 26, 2020
1 parent
a04e0d8
commit 4f76589
Showing
24 changed files
with
1,407 additions
and
2 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
FROM python:3 | ||
|
||
COPY requirements.txt ./ | ||
|
||
RUN pip install --upgrade pip && \ | ||
pip install --no-cache-dir -r requirements.txt && \ | ||
pip install --no-cache-dir smaca | ||
|
||
ENTRYPOINT ["smaca"] | ||
|
||
CMD ["--help"] |
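The Dockerfile above pairs an exec-form `ENTRYPOINT` with a default `CMD`. As a rough model of how the two combine (a sketch only; `container_argv` is a hypothetical helper, not part of Docker or SMAca), arguments given to `docker run` replace `CMD`, while `ENTRYPOINT` always stays in front:

```python
def container_argv(entrypoint, cmd, run_args):
    """Model the argv of a container built with exec-form ENTRYPOINT and CMD.

    Hypothetical helper for illustration: arguments passed after the image
    name on `docker run` replace CMD; otherwise CMD supplies the defaults.
    """
    return list(entrypoint) + (list(run_args) if run_args else list(cmd))


# No extra arguments: the container prints the smaca help text.
default_argv = container_argv(["smaca"], ["--help"], [])

# With arguments: CMD is dropped and the BAM files reach smaca directly.
custom_argv = container_argv(["smaca"], ["--help"], ["a.bam", "b.bam"])
```

Under this model, running the image with no arguments is equivalent to `smaca --help`, and any arguments supplied at run time are handed to `smaca` unchanged.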
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
========================================================================================== | ||
SMAca: SMN1 copy-number and sequence variant analysis from next generation sequencing data | ||
========================================================================================== | ||
|
||
* `summary`_ | ||
* `usage`_ | ||
* `output`_ | ||
* `interpretation`_ | ||
* `instalation`_ | ||
* `citation`_ | ||
|
||
|
||
summary | ||
------- | ||
|
||
Spinal Muscular Atrophy (SMA) is a severe neuromuscular autosomal recessive disorder affecting 1/10,000 live births. Most SMA patients present homozygous deletion of SMN1, while most SMA carriers present only a single SMN1 copy. The sequence similarity between SMN1 and SMN2, and the complexity of the SMN locus, make the estimation of the SMN1 copy-number difficult by next generation sequencing (NGS). | ||
|
||
SMAca is a python tool to detect putative SMA carriers and estimate the absolute SMN1 copy-number in a population. Moreover, SMAca takes advantage of the knowledge of certain variants specific to SMN1 duplication to also identify the so-called “silent carriers”. | ||
|
||
This tool supports multithreading for high performance and is designed for easy installation. This combination makes it especially well suited for integration into production NGS pipelines. | ||
|
||
|
||
|
||
|
||
|
||
usage | ||
----- | ||
|
||
You can run SMAca by typing at the terminal: | ||
|
||
:: | ||
|
||
$ smaca sample1.bam sample2.bam sample3.bam | ||
|
||
|
||
|
||
For a large number of samples, the **ncpus** option is recommended: | ||
|
||
:: | ||
|
||
$ smaca --output results.batch1.csv --ncpus 24 $(cat samplelist.batch1.txt) | ||
|
||
|
||
|
||
For additional options use: | ||
|
||
:: | ||
|
||
$ smaca --help | ||
|
||
|
||
|
||
|
||
output | ||
------ | ||
|
||
SMAca outputs a number of statistics for each sample: | ||
|
||
:Pi_p: scaled proportion of SMN1 reads at position *p*. | ||
|
||
:cov_x_p: raw coverage of gene *x* at position *p*. | ||
|
||
:avg_cov_x: average coverage for the whole gene *x*. | ||
|
||
:std_control: standard deviation of the average coverage of the 20 control genes. | ||
|
||
:g.27134T>G: consensus sequence at position 27134 as well as counts for "A", "C", "G" and "T". | ||
|
||
:g.27706_27707delAT: consensus sequence at positions 27706-27707 as well as counts for "A", "C", "G" and "T". | ||
|
||
:scale_factor: scale factor proportional to the total SMN1 and SMN2 copy number. | ||
|
||
|
||
|
||
|
||
interpretation | ||
-------------- | ||
|
||
SMA carriers with a single SMN1 copy are expected to have **Pi_b** values under 1/3. However, complex SMN reorganizations may lead to large differences between **Pi_a**, **Pi_b** and **Pi_c**. These cases should be analyzed carefully. | ||
|
||
The **scale_factor**, which is proportional to the absolute number of SMN1 and SMN2 copies, and **cov_x_p** can be used to estimate the absolute SMN1:SMN2 copy-number as follows: | ||
|
||
======== ============ ===================== | ||
genotype scale_factor cov_SMN1_p/cov_SMN2_p | ||
======== ============ ===================== | ||
1:3      1            1/3 | ||
1:2      0.75         1/2 | ||
1:1      0.5          1 | ||
======== ============ ===================== | ||
|
||
In order to detect the so-called *silent carriers* (i.e. individuals with two copies of SMN1 on one chromosome, but none on the other), the consensus sequence at the two variant positions, **g.27134T>G** and **g.27706_27707delAT**, should also be taken into account. Depending on the number of SMN2 copies, the expected **scale_factor** should be close to 0.75 (2:1) or 0.5 (2:0) and, in both cases, the scaled proportion of SMN1 reads **Pi_p** should be close to 1/2 in each position. | ||
|
||
|
||
|
||
|
||
instalation | ||
----------- | ||
|
||
SMAca is available through PyPI: | ||
|
||
:: | ||
|
||
$ pip install smaca | ||
|
||
If you are using the conda package manager (e.g. miniconda or anaconda), you can install SMAca from the bioconda channel: | ||
|
||
:: | ||
|
||
$ conda config --add channels defaults | ||
$ conda config --add channels conda-forge | ||
$ conda config --add channels bioconda | ||
$ conda install smaca | ||
|
||
Developers can clone the repository, create a conda/pip environment and install in editable mode: | ||
|
||
:: | ||
|
||
$ git clone https://www.github.com/babelomics/SMAca.git | ||
$ cd SMAca | ||
$ python -m venv smaca_venv | ||
$ source smaca_venv/bin/activate | ||
$ pip install --editable=. | ||
|
||
|
||
|
||
citation | ||
-------- | ||
|
||
Daniel Lopez-Lopez, Rosario Carmona, Carlos Loucera, Virginia Aquino, Josefa Salgado, Angel Alonso, Joaquín Dopazo (2020). SMAca: SMN1 copy-number and sequence variant analysis from next generation sequencing data, XXX |
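The interpretation rules in the README above can be sketched in a few lines of Python. This is an illustration only and not part of SMAca: the helper name `interpret_row`, the tolerance `tol`, and the two example rows are assumptions. Only the 1/3 threshold on **Pi_b** and the expected **scale_factor** values per genotype come from the README.

```python
import csv
import io

# Expected scale_factor per SMN1:SMN2 genotype, taken from the README table.
EXPECTED_SCALE = {"1:3": 1.0, "1:2": 0.75, "1:1": 0.5}


def interpret_row(row, tol=0.1):
    """Return a rough label for one output row (hypothetical helper).

    `tol` is an assumed tolerance on scale_factor, not a value from SMAca.
    """
    pi_b = float(row["Pi_b"])
    scale = float(row["scale_factor"])
    if pi_b >= 1 / 3:
        return "no single-copy signal"
    # Pick the genotype whose expected scale_factor is closest to the observed one.
    genotype, expected = min(EXPECTED_SCALE.items(),
                             key=lambda kv: abs(kv[1] - scale))
    if abs(expected - scale) <= tol:
        return f"putative carrier, SMN1:SMN2 ~ {genotype}"
    return "putative carrier, genotype unclear"


# Made-up rows that reuse column names from the SMAca output header.
example = io.StringIO(
    "id,Pi_b,scale_factor\n"
    "sampleA,0.24,0.98\n"
    "sampleB,0.50,1.02\n"
)
labels = {r["id"]: interpret_row(r) for r in csv.DictReader(example)}
```

Samples whose **Pi_a**, **Pi_b** and **Pi_c** disagree strongly, and possible silent carriers, still need manual review of the full output as the README describes; this sketch does not cover them.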
Submodule SMA_test
added at
63c356
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
click | ||
numpy | ||
pysam | ||
joblib |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
from setuptools import setup, find_packages | ||
|
||
setup( | ||
name='smaca', | ||
version='1.0', | ||
packages=find_packages(), | ||
url='git@gitlab.cbra.com:dlopez/SMA_test.git', | ||
license='GNU General Public License v.3.0', | ||
author='Daniel López López, Carlos Loucera', | ||
author_email='daniel.lopez.lopez@juntadeandalucia.es, carlos.loucera@juntadeandalucia.es', | ||
long_description='A python module for detecting spinal muscular atrophy carriers', | ||
python_requires='>=3.6', | ||
install_requires=[ | ||
'click', | ||
'cython', | ||
'numpy', | ||
'pysam', | ||
'joblib' | ||
], | ||
entry_points={ | ||
'console_scripts': ['smaca = smaca.cli:main'] | ||
} | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
Metadata-Version: 1.0 | ||
Name: smaca | ||
Version: 1.0 | ||
Summary: UNKNOWN | ||
Home-page: git@gitlab.cbra.com:dlopez/SMA_test.git | ||
Author: Daniel López López | ||
Author-email: daniel.lopez.lopez@juntadeandalucia.es | ||
License: GNU General Public License v.3.0 | ||
Description: A python module for detecting spinal muscular atrophy carriers | ||
Platform: UNKNOWN |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
README.rst | ||
setup.py | ||
smaca/__init__.py | ||
smaca/cli.py | ||
smaca/constants.py | ||
smaca/sma.py | ||
smaca/test_results.py | ||
smaca/utils.py | ||
smaca.egg-info/PKG-INFO | ||
smaca.egg-info/SOURCES.txt | ||
smaca.egg-info/dependency_links.txt | ||
smaca.egg-info/entry_points.txt | ||
smaca.egg-info/requires.txt | ||
smaca.egg-info/top_level.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[console_scripts] | ||
smaca = smaca.cli:main | ||
|
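The `entry_points.txt` file above is what maps the `smaca` command to `smaca.cli:main`. A minimal sketch of reading that mapping with the standard library follows; `parse_console_scripts` is a hypothetical helper written for this illustration, not how pip or setuptools resolve entry points internally.

```python
import configparser

# Contents of smaca.egg-info/entry_points.txt as shown in this commit.
ENTRY_POINTS_TXT = """\
[console_scripts]
smaca = smaca.cli:main
"""


def parse_console_scripts(text):
    """Map each console script name to its (module, function) pair."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    scripts = {}
    for name, target in parser["console_scripts"].items():
        # "smaca.cli:main" -> module "smaca.cli", function "main"
        module, _, func = target.partition(":")
        scripts[name] = (module, func)
    return scripts


scripts = parse_console_scripts(ENTRY_POINTS_TXT)
```

Here the `smaca` command resolves to the `main` function in the `smaca.cli` module, which is the click command defined in `smaca/cli.py`.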
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
click | ||
numpy | ||
pysam | ||
joblib |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
smaca |
Empty file.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# coding: utf-8 | ||
""" | ||
author: Daniel López | ||
email: daniel.lopez.lopez@juntadeandalucia.es | ||
SMA carrier Test program logic. | ||
""" | ||
|
||
import atexit | ||
import cProfile | ||
import io | ||
import pstats | ||
|
||
import click | ||
|
||
from smaca.sma import SmaCalculator | ||
|
||
|
||
@click.command() | ||
@click.option("--profile", is_flag=True) | ||
@click.option('--output', | ||
              default="output.csv", | ||
              type=click.Path(writable=True), | ||
              help='output file') | ||
@click.option('--ncpus', | ||
              default=1, | ||
              type=int, | ||
              help='number of cores to use') | ||
@click.argument("bam_list", | ||
                type=click.Path(exists=True), | ||
                nargs=-1, | ||
                required=True) | ||
def main(profile, output, bam_list, ncpus): | ||
    """ | ||
    Predict proportion of SMN1:SMN2 for a set of BAM files. | ||
    bam_list: input BAM files list | ||
    """ | ||
    if not bam_list: | ||
        # get_help() only returns the help text, so echo it before exiting | ||
        ctx = click.get_current_context() | ||
        click.echo(ctx.get_help()) | ||
        ctx.exit() | ||
|
||
    if profile: | ||
        print("Profiling...") | ||
        prf = cProfile.Profile() | ||
        prf.enable() | ||
|
||
        def exit(): | ||
            # stop the profiler and print cumulative stats at interpreter exit | ||
            prf.disable() | ||
            print("Profiling completed") | ||
            ios = io.StringIO() | ||
            pstats.Stats(prf, | ||
                         stream=ios).sort_stats("cumulative").print_stats() | ||
            print(ios.getvalue()) | ||
|
||
        atexit.register(exit) | ||
|
||
    res = SmaCalculator(bam_list, n_jobs=ncpus) | ||
    res.write_stats(output) | ||
|
||
|
||
if __name__ == "__main__": | ||
    # pylint: disable=no-value-for-parameter | ||
    main() |
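The `--profile` flag above enables `cProfile` and prints cumulative statistics from an `atexit` hook. The same pattern can be tried in isolation with the standard library alone. In this sketch, `profile_call` and `busy_sum` are hypothetical names, and the report is returned as a string rather than printed at exit so that it is easy to inspect.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    prf = cProfile.Profile()
    prf.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always stop the profiler, even if func raises.
        prf.disable()
    buf = io.StringIO()
    pstats.Stats(prf, stream=buf).sort_stats("cumulative").print_stats()
    return result, buf.getvalue()


def busy_sum(n):
    # Stand-in workload (hypothetical); the CLI runs SmaCalculator instead.
    return sum(i * i for i in range(n))


total, report = profile_call(busy_sum, 1000)
```

Returning the report instead of registering an `atexit` hook is a design choice for this sketch only. The CLI's approach has the advantage of covering everything that runs until the interpreter exits.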
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# coding: utf-8 | ||
""" | ||
author: Daniel López | ||
email: daniel.lopez.lopez@juntadeandalucia.es | ||
SMA carrier Test samtools constants file. Genomic ranges are 0-based, stop-excluded | ||
""" | ||
|
||
SMN = {"SMN1": [5, 70220767, 70249769], "SMN2": [5, 69345349, 69374349]} | ||
|
||
GENES = { | ||
"ACAD9": [3, 128598332, 128634910], | ||
"ATR": [3, 142168076, 142297668], | ||
"CYP11B1": [8, 143953771, 143961262], | ||
"EDNRB": [13, 78469615, 78493903], | ||
"FASTKD2": [2, 207630080, 207660913], | ||
"FOXN1": [17, 26833260, 26865914], | ||
"HEXB": [5, 73935847, 74018472], | ||
"IQCB1": [3, 121488609, 121553926], | ||
"ITGA6": [2, 173292081, 173371181], | ||
"IVD": [15, 40697685, 40728146], | ||
"LMNA": [1, 156052363, 156109880], | ||
"LRPPRC": [2, 44113362, 44223144], | ||
"NTRK1": [1, 156785431, 156851642], | ||
"PTEN": [10, 89622869, 89731687], | ||
"RAB3GAP1": [2, 135809834, 135933964], | ||
"RAPSN": [11, 47459307, 47470730], | ||
"SIL1": [5, 138282408, 138629246], | ||
"SLC22A5": [5, 131705400, 131731306], | ||
"SLC35D1": [1, 67465014, 67520080], | ||
"STIM1": [11, 3875756, 4114440] | ||
} | ||
|
||
SMN1_POS = { | ||
"SMN1_a": [5, 70247723, 70247724], | ||
"SMN1_b_e7": [5, 70247772, 70247773], | ||
"SMN1_c": [5, 70247920, 70247921] | ||
} | ||
|
||
SMN2_POS = { | ||
"SMN2_a": [5, 69372303, 69372304], | ||
"SMN2_b_e7": [5, 69372352, 69372353], | ||
"SMN2_c": [5, 69372500, 69372501] | ||
} | ||
|
||
DUP_MARK = { | ||
"g.27134T>G": [5, 70247900, 70247901], | ||
"g.27706_27707delAT": [5, 70248472, 70248474] | ||
} | ||
|
||
HEADER_FILE = \ | ||
"id," \ | ||
"Pi_a," \ | ||
"Pi_b," \ | ||
"Pi_c," \ | ||
"cov_SMN1_a," \ | ||
"cov_SMN1_b," \ | ||
"cov_SMN1_c," \ | ||
"cov_SMN2_a," \ | ||
"cov_SMN2_b," \ | ||
"cov_SMN2_c," \ | ||
"avg_cov_SMN1," \ | ||
"avg_cov_SMN2," \ | ||
"scale_factor," \ | ||
"std_control," \ | ||
"g.27134T>G," \ | ||
"g.27706_27707delAT," \ | ||
"avg_cov_ACAD9," \ | ||
"avg_cov_ATR," \ | ||
"avg_cov_CYP11B1," \ | ||
"avg_cov_EDNRB," \ | ||
"avg_cov_FASTKD2," \ | ||
"avg_cov_FOXN1," \ | ||
"avg_cov_HEXB," \ | ||
"avg_cov_IQCB1," \ | ||
"avg_cov_ITGA6," \ | ||
"avg_cov_IVD," \ | ||
"avg_cov_LMNA," \ | ||
"avg_cov_LRPPRC," \ | ||
"avg_cov_NTRK1," \ | ||
"avg_cov_PTEN," \ | ||
"avg_cov_RAB3GAP1," \ | ||
"avg_cov_RAPSN," \ | ||
"avg_cov_SIL1," \ | ||
"avg_cov_SLC22A5," \ | ||
"avg_cov_SLC35D1," \ | ||
"avg_cov_STIM1" |
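The constants file states that genomic ranges are 0-based with the stop excluded. A short sketch of what that means in practice, using two intervals copied from the constants above: `to_region` and `span` are hypothetical helpers, not functions from SMAca. They convert a `[chrom, start, stop]` triple into the 1-based, inclusive `chrom:start-end` notation used by samtools-style region strings and compute its length.

```python
def to_region(interval):
    """Convert [chrom, start, stop] (0-based, stop excluded) into a
    1-based, inclusive 'chrom:start-end' region string."""
    chrom, start, stop = interval
    return f"{chrom}:{start + 1}-{stop}"


def span(interval):
    """Number of bases covered by a 0-based, stop-excluded interval."""
    _, start, stop = interval
    return stop - start


# Values copied from SMN1_POS["SMN1_b_e7"] and DUP_MARK["g.27706_27707delAT"].
SMN1_B_E7 = [5, 70247772, 70247773]
DEL_AT = [5, 70248472, 70248474]

single_base = to_region(SMN1_B_E7)   # one position in 1-based notation
two_bases = to_region(DEL_AT)        # the two deleted bases
```

With this convention, a single-position entry has `stop == start + 1`, and the two-base deletion marker spans exactly two positions.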