MiniMotif (beta version)

Project Description

MiniMotif is a tool that detects transcription factor binding sites in a given genome.

MiniMotif detects transcription factor binding sites (TFBSs) by combining Position Weight Matrices (PWMs) and profile Hidden Markov Models (pHMMs). If the binding site of interest is gapless, MiniMotif creates a PWM and uses the tool MOODS to find occurrences of the motif in the genome. If the binding site contains gaps (e.g. sigma factor binding sites with variable spacer lengths), MiniMotif instead constructs pHMMs and scans the genome with the nhmmscan program from HMMER. In addition, it can scan a genome with a premade set of TFBS profiles (precalculated PWMs).

Requirements

The installation instructions below require the following tools on your machine (links point to their installation guides):

Git ( https://github.com/git-guides/install-git )
conda ( https://conda.io/projects/conda/en/latest/user-guide/install/index.html )

Installation

Download MiniMotif with the following command:

git clone https://github.com/HAugustijn/MiniMotif.git

Note: Requires installation of git.

Then create a conda environment with all the dependencies listed in minimotif.yml:

cd MiniMotif
conda env create -f minimotif.yml -n MiniMotif
conda activate MiniMotif

Note: Requires installation of conda.

Note 2: Remember to activate the MiniMotif environment every time you use MiniMotif!

Quick usage

Generally, MiniMotif can be used with the following command:

python3 minimotif.py [optional arguments] -i [binding site fasta] -G [genome_file] -O [output_directory]

Example: Given an input genome file test_genome.gb and a binding site file test.fasta, the following command writes the results to the directory "output_dir":
python3 minimotif.py -i test.fasta -G test_genome.gb -O output_dir

You can run MiniMotif on the test case included in this repository with the following command:

python3 minimotif.py -pc -G test_data/test_genome.gb -O test_out
# This will scan the test genome with precalculated PWMs, and store the output in a directory called test_out
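To process several genomes in one go, the documented command can be wrapped in a small Python script. This is a minimal sketch and not part of MiniMotif: build_command and run_batch are hypothetical helpers, the genomes/ and results/ directories are placeholders, and the script assumes it is run from the MiniMotif directory with the conda environment activated. It only uses the -i, -pc, -G and -O flags described in this README.

```python
import subprocess
import sys
from pathlib import Path


def build_command(genome, output_dir, binding_sites=None):
    """Assemble a MiniMotif command line from the documented flags.

    Hypothetical helper: uses -pc (precalculated PWMs) when no binding
    site FASTA is given, otherwise -i with the custom FASTA file.
    """
    # sys.executable is the interpreter running this script; run it from
    # the activated conda environment so MiniMotif's dependencies are found.
    cmd = [sys.executable, "minimotif.py"]
    if binding_sites is None:
        cmd.append("-pc")
    else:
        cmd += ["-i", str(binding_sites)]
    cmd += ["-G", str(genome), "-O", str(output_dir)]
    return cmd


def run_batch(genome_dir, output_root, binding_sites=None):
    """Run MiniMotif once per *_genome.gb file, with one output directory each."""
    for genome in sorted(Path(genome_dir).glob("*_genome.gb")):
        # A separate output directory per genome keeps the results apart.
        out_dir = Path(output_root) / genome.stem
        subprocess.run(build_command(genome, out_dir, binding_sites), check=True)


if __name__ == "__main__":
    # Placeholder paths: adjust to where your genomes live.
    run_batch("genomes", "results")
```

Passing binding_sites="test.fasta" switches every run from the precalculated PWMs to a custom binding site profile.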

For further information, please read the following sections carefully.

1) Query a genome with precalculated PWMs

MiniMotif requires a genome file in GenBank (.gb) format and can automatically scan it with a set of precalculated PWMs derived from Streptomyces coelicolor transcription factors, using the following command:

python3 minimotif.py -pc -G [genome_file] -O [output_directory] 

Example: Given an input genome file test_genome.gb, the following command will output the results in the directory "output_dir":
python3 minimotif.py -pc -G test_genome.gb -O output_dir

Notes:

The genome filename has to be formatted as [organism]_genome.gb, e.g. scoe_genome.gb.
Specifying an output directory is mandatory.
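Conceptually, a PWM scan slides the matrix along both strands of the genome and reports every window whose summed log-odds score reaches a threshold. The sketch below illustrates only that idea; it is not MOODS, which uses much faster algorithms and converts the -p p-value threshold into a score threshold for each matrix. The toy consensus matrix, sequence and score threshold are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def consensus_pwm(consensus, match=2.0, mismatch=-2.0):
    """Toy log-odds PWM: one dict per column, rewarding the consensus base."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def score_window(pwm, window):
    """Sum the column scores; bases outside ACGT (e.g. N) rule out a hit."""
    return sum(col.get(base, float("-inf")) for col, base in zip(pwm, window))


def scan(pwm, genome, threshold):
    """Return (start, strand, score) for windows scoring >= threshold.

    Starts are 0-based positions on the forward strand; a minus-strand hit
    means the reverse complement of the window matches the motif.
    """
    width = len(pwm)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = score_window(pwm, site)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


if __name__ == "__main__":
    # "TGAC" occurs on the forward strand at 2 and as "GTCA" (minus strand) at 8.
    print(scan(consensus_pwm("TGAC"), "AATGACTTGTCAGG", threshold=8.0))
    # [(2, '+', 8.0), (8, '-', 8.0)]
```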

2) Query a genome using custom binding site sequences

The user can specify a binding site file in FASTA format for a given transcription factor. Each sequence in the multi-FASTA file corresponds to one binding site, and together the sequences are used to construct a binding site profile.

Example: test.fasta

>1
ACTGGTCTAGACAACT
>2
ACTGGTCTAGACAAGA
>3
ACTGGTCTACACCAGT
>4
ACAGGTCTACACCACT
>5
AGTGGTGTAGACCACC
>6
ATTGGTCTAAACCACA
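The profile construction for an ungapped motif can be pictured with a minimal Python sketch that turns the sequences above into a log-odds PWM. It assumes a uniform background (0.25 per base), log2 odds and a pseudocount added to every base count, mirroring the role of the -ps option (default 0.1); MiniMotif's actual implementation may differ, and read_fasta and build_pwm are hypothetical helpers.

```python
import math

# The six example binding sites from test.fasta (all 16 nt, ungapped).
FASTA_TEXT = """>1
ACTGGTCTAGACAACT
>2
ACTGGTCTAGACAAGA
>3
ACTGGTCTACACCAGT
>4
ACAGGTCTACACCACT
>5
AGTGGTGTAGACCACC
>6
ATTGGTCTAAACCACA
"""

BASES = "ACGT"


def read_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def build_pwm(sites, pseudocount=0.1, background=None):
    """Build a log2-odds PWM (one dict per column) from equal-length sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("an ungapped PWM requires equal-length sites")
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


if __name__ == "__main__":
    sites = [seq for _, seq in read_fasta(FASTA_TEXT)]
    pwm = build_pwm(sites, pseudocount=0.1)
    # First column is an A in every site, so A scores high and C, G, T low.
    print(len(pwm), {b: round(v, 2) for b, v in pwm[0].items()})
```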

With the following command, the tool decides whether the profile is gapped or ungapped based on its Shannon information content:

python3 minimotif.py -i test.fasta -G test_genome.gb -O output_dir
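The per-position Shannon information content that such a decision relies on can be computed as 2 - H bits per column for DNA. This minimal sketch assumes no pseudocounts and a uniform background, and compares each column against 1.0 bits, the default of the -ic threshold listed under the optional arguments. How MiniMotif turns these values into its gapped/ungapped decision is not described in this README, so the sketch stops at the per-position values; column_information_content is a hypothetical helper.

```python
import math
from collections import Counter

# The six example binding sites from test.fasta.
SITES = [
    "ACTGGTCTAGACAACT",
    "ACTGGTCTAGACAAGA",
    "ACTGGTCTACACCAGT",
    "ACAGGTCTACACCACT",
    "AGTGGTGTAGACCACC",
    "ATTGGTCTAAACCACA",
]


def column_information_content(sites):
    """Information content (bits) per column: 2 - Shannon entropy for DNA."""
    ic = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


if __name__ == "__main__":
    ic = column_information_content(SITES)
    threshold = 1.0  # same value as the -ic default
    low = [i for i, value in enumerate(ic) if value < threshold]
    print([round(v, 2) for v in ic])
    print("columns below threshold:", low)  # [1, 9, 15]
```

Fully conserved columns reach the maximum of 2 bits, while the weakly conserved columns 1, 9 and 15 fall below the 1.0-bit threshold.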

If you already know whether the motif is gapped or ungapped, or want to run both the PWM and pHMM branches, set the analysis mode with the -am (--analysis-mode) flag:

python3 minimotif.py -i test.fasta -am gapped -G test_genome.gb -O output_dir

Notes: -am can be set to "ungapped" (PWMs), "gapped" (pHMMs), "both" (PWMs and pHMMs), or "auto" (default).

Here's a full description of all the optional arguments:

Optional arguments:
    -i  Binding site sequences in FASTA format
    -w  Minimal motif width used by the MEME detection module. Default: 10
    -ps Pseudocount used to generate the PWMs. Default: 0.1
    -l  Include this flag to output sequence logos as .png files
    -co Include this flag to detect TFBS occurrences in coding regions
    -r  Range of the regulatory region. Default: -350 50
    -c  Range between genes that are considered to be co-regulated. Default: -50 40
    -p  P-value threshold used for the PWM detection module. Default: 0.00001
    -pc Include this flag to run on precalculated PWMs
    -b  Run MOODS in batch mode. In this mode, the p-value is not calculated
        separately, which increases the run speed. Default: True
    -m  Mode for the HMM detection module. Options: spacer_masking or positional_masking.
        positional_masking masks nucleotides individually if their information content
        is over the given threshold. spacer_masking assumes that nucleotides belonging
        to the -10 and -35 regions are significantly more conserved than the spacer nucleotides.
        Default: spacer_masking
    -ic Information content threshold. Default: 1.0
    -la Adjust the length of the alignments output by the script relative to the
        full alignments. The default is 1 nucleotide shorter than the global
        alignment between the pHMMs and the query sequence. Default: 1
    -am Analysis mode. Options: auto, gapped, ungapped, both. Default: auto
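The -r range can be read as a window around each gene's start. The sketch below shows one strand-aware way to turn such a range into genome coordinates; it assumes the range is measured relative to the annotated gene start (negative values upstream), uses 0-based half-open coordinates and clips to the genome ends. These conventions are assumptions, as this README does not state MiniMotif's exact rule, and regulatory_region is a hypothetical helper.

```python
def regulatory_region(gene_start, gene_end, strand, genome_length, window=(-350, 50)):
    """Return the (start, end) window around a gene's start, 0-based half-open.

    Assumption: window = (upstream, downstream) offsets relative to the gene
    start, like the -r default of -350 50. Genes are given as [gene_start,
    gene_end) on the forward-strand coordinate system.
    """
    upstream, downstream = window
    if strand == "+":
        # The gene starts at gene_start; upstream lies at lower coordinates.
        start, end = gene_start + upstream, gene_start + downstream
    elif strand == "-":
        # The gene starts at its highest coordinate; upstream lies beyond gene_end.
        start, end = gene_end - downstream, gene_end - upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip to the genome so windows near the ends stay valid.
    return max(0, start), min(genome_length, end)


if __name__ == "__main__":
    print(regulatory_region(1000, 2000, "+", 10_000))  # (650, 1050)
    print(regulatory_region(1000, 2000, "-", 10_000))  # (1950, 2350)
```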
