A small bioinformatics command-line tool for filtering and extracting UniProt FASTA sequences from a file.
For more information about the UniProt FASTA format, take a look at the official UniProt guide.
Run the following command to view all the available filtering options:
sieve --help
- Using conda (recommended)
conda install -c ehatton sieve
- Using pip
Download the latest wheel file from the releases section and then install with the following pip command:
pip install sieve-1.0.0-py3-none-any.whl
Filter for human sequences (taxonomy id 9606):
sieve uniprot.fasta out.fasta -t 9606
Filter for human sequences with a maximum length of 100:
sieve uniprot.fasta out.fasta -t 9606 -max 100
Filter for human sequences with a length between 50 and 100:
sieve uniprot.fasta out.fasta -t 9606 -min 50 -max 100
Filter for sequences with gene name BRCA1 or BRCA2, reading from stdin and writing to stdout:
sieve - - -g BRCA1 -g BRCA2 < uniprot.fasta > out.fasta
Convert UniProt text format (flatfile) to FASTA format:
sieve uniprot.txt uniprot.fasta
Get help:
sieve --help
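As a rough illustration of what the taxonomy, gene name, and length filters above select, here is a minimal standard-library sketch. It does not use sieve and is not how sieve is implemented. The names `Record`, `read_fasta`, `header_field`, `keep`, and `SAMPLE` are hypothetical, and the sample entries are invented for the example. It assumes the standard UniProt header layout, in which the taxonomy id appears as `OX=` and the gene name as `GN=`.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    header: str    # header line without the leading ">"
    sequence: str  # sequence with line breaks removed


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per FASTA entry (hypothetical helper, not sieve's parser)."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield Record(header, "".join(chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield Record(header, "".join(chunks))


def header_field(header: str, key: str) -> Optional[str]:
    """Return the value of a KEY=value header field such as OX or GN, if present."""
    match = re.search(r"(?:^|\s)" + re.escape(key) + r"=(\S+)", header)
    return match.group(1) if match else None


def keep(record, taxid=None, genes=None, min_len=None, max_len=None) -> bool:
    """Return True if the record passes every filter that was supplied."""
    if taxid is not None and header_field(record.header, "OX") != str(taxid):
        return False
    if genes and header_field(record.header, "GN") not in genes:
        return False
    length = len(record.sequence)
    if min_len is not None and length < min_len:
        return False
    if max_len is not None and length > max_len:
        return False
    return True


# Invented entries in UniProt FASTA layout, for illustration only.
SAMPLE = """\
>sp|X00001|DEMO1_HUMAN Example protein one OS=Homo sapiens OX=9606 GN=BRCA1 PE=1 SV=1
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|DEMO2_MOUSE Example protein two OS=Mus musculus OX=10090 GN=Trp53 PE=1 SV=1
MKTAYIAKQRQI
>sp|X00003|DEMO3_HUMAN Example protein three OS=Homo sapiens OX=9606 GN=BRCA2 PE=1 SV=1
MPIGSKERPT
FFEIF
"""

if __name__ == "__main__":
    # Roughly analogous to: -t 9606 -max 15
    for record in read_fasta(SAMPLE.splitlines()):
        if keep(record, taxid=9606, max_len=15):
            print(record.header.split()[0], len(record.sequence))
```

Each filter is optional and they combine with a logical AND, while several gene names act as alternatives. Whether sieve treats gene names case-sensitively is not stated here, so the exact-match comparison in `keep` is an assumption of this sketch.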
The parse_fasta function (for UniProt FASTA files) and the parse_text function (for UniProt text files) can also be used in your own Python scripts:
from sieve import parse_fasta

with open("my_proteins.fasta", "r") as infile:
    for protein in parse_fasta(infile):
        print(protein)  # or do your custom filtering here
Python version 3.6 or above.
It may also work with earlier versions of Python 3, but this has not been tested.
unittest is used for the test suite. To run tests:
python -m unittest discover
- 1.0.0
- First release
Emma Hatton-Ellis – ehattonellis@gmail.com
Distributed under the MIT license. See LICENSE for more information.