ProPhex

ProPhex is an efficient k-mer index with a small memory footprint. It uses the BWA implementation of the BWT-index. ProPhex is designed as a core computational component of ProPhyle, a phylogeny-based metagenomic classifier allowing fast and accurate read assignment.
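For readers unfamiliar with BWT indexes, the toy sketch below shows the backward-search idea that a BWT (FM) index uses to count the occurrences of a k-mer. It is a textbook illustration written for this README, not code from ProPhex or BWA. It builds the BWT naively from sorted rotations and counts characters by scanning, whereas BWA uses a 2-bit alphabet and sampled occurrence tables. The function names `build_bwt` and `count_kmer` are hypothetical.

```python
"""Toy FM-index backward search (illustration only, not ProPhex or BWA code)."""


def build_bwt(text: str) -> str:
    """Naive BWT of text + '$' from sorted rotations ('$' sorts before A, C, G, T)."""
    t = text + "$"
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def count_kmer(text: str, kmer: str) -> int:
    """Count occurrences of kmer in text by backward search over the BWT."""
    bwt = build_bwt(text)
    # C[c] = number of characters in text + '$' that are smaller than c.
    C, total = {}, 0
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)
    # [lo, hi) = rows of the sorted suffixes starting with the part of kmer read so far.
    lo, hi = 0, len(bwt)
    for c in reversed(kmer):
        if c not in C:
            return 0  # character absent from the text
        # Occ(c, i) = occurrences of c in bwt[:i]; a real index samples these counts.
        lo = C[c] + bwt[:lo].count(c)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    for kmer in ("ACG", "GTA", "TT"):
        print(kmer, count_kmer(reference, kmer))
```

Running it prints `ACG 2`, `GTA 2`, and `TT 0`. Each pattern character narrows the suffix range [lo, hi) once, so with proper occurrence tables a k-mer query costs k steps regardless of the reference length.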

Getting started

git clone https://github.com/prophyle/prophex
cd prophex && make -j

Alternative ways of installation

conda install prophex

Quick example

# Build a ProPhex index
./prophex index -k 25 index.fa

# Query reads from reads.fq for k=25 (with k-LCP)
./prophex query -k 25 -u index.fa reads.fq

# Query reads from reads.fq for k=20 (with 4 threads and without k-LCP)
./prophex query -k 20 -t 4 index.fa reads.fq

ProPhex commands

Program: prophex (a lossless k-mer index)
Version: 0.1.1
Authors: Kamil Salikhov, Karel Brinda, Simone Pignotti, Gregory Kucherov
Contact: kamil.salikhov@univ-mlv.fr

Usage:   prophex <command> [options]

Command: index           construct a BWA index and k-LCP
         query           query reads against index

         klcp            construct an additional k-LCP
         bwtdowngrade    downgrade .bwt to the old, more compact format without Occ
         bwt2fa          reconstruct FASTA from BWT

Usage:   prophex index [options] <idxbase>
Options: -k INT    k-mer length for k-LCP
         -s        construct k-LCP and SA in parallel
         -i        sampling distance for SA
         -h        print help message

Usage:   prophex query [options] <idxbase> <in.fq>

Options: -k INT    length of k-mer
         -u        use k-LCP for querying
         -v        output set of chromosomes for every k-mer
         -p        do not check whether k-mer is on border of two contigs, and show such k-mers in output
         -b        print sequences and base qualities
         -l STR    log file name to output statistics
         -t INT    number of threads [1]
         -h        print help message

Usage:   prophex klcp [options] <idxbase>

Options: -k INT    length of k-mer
         -s        construct k-LCP and SA in parallel
         -i        sampling distance for SA
         -h        print help message

Usage:   prophex bwtdowngrade <input.bwt> <output.bwt>
         -h        print help message

Usage:   prophex bwt2fa <idxbase> <output.fa>
         -h        print help message
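The `-i` option of `index` and `klcp` sets the sampling distance for the suffix array (SA). As a rough illustration of that trade-off, the sketch below keeps only every i-th SA row, similar in spirit to BWA's row-based sampling, and recovers any other entry by walking the BWT with LF-mapping until it reaches a sampled row. This is an assumption-level sketch of the general technique, not ProPhex's code; `build_index`, `lf`, and `locate` are hypothetical names. A larger interval stores fewer entries but needs more LF steps per lookup.

```python
"""Toy sampled suffix array with LF-mapping lookup (illustration only)."""


def build_index(text: str, interval: int):
    """Return (bwt, C, samples, n) for text + '$', keeping every interval-th SA row."""
    t = text + "$"
    n = len(t)
    sa = sorted(range(n), key=lambda i: t[i:])  # naive suffix array construction
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is '$' for the suffix at position 0
    C, total = {}, 0
    for c in sorted(set(t)):
        C[c] = total
        total += t.count(c)
    samples = [sa[row] for row in range(0, n, interval)]
    return bwt, C, samples, n


def lf(bwt: str, C: dict, row: int) -> int:
    """LF-mapping: row of the suffix that starts one text position earlier."""
    c = bwt[row]
    return C[c] + bwt[:row].count(c)


def locate(row: int, bwt: str, C: dict, samples: list, interval: int, n: int) -> int:
    """Recover SA[row] by walking LF until a sampled row is reached."""
    steps = 0
    while row % interval != 0:
        row = lf(bwt, C, row)
        steps += 1
    # Each LF step moves one position to the left in the text; the modulo
    # handles wrapping around the '$' sentinel.
    return (samples[row // interval] + steps) % n


if __name__ == "__main__":
    text, interval = "ACGTACGTAC", 4
    bwt, C, samples, n = build_index(text, interval)
    print("stored", len(samples), "of", n, "SA entries")
    print([locate(r, bwt, C, samples, interval, n) for r in range(n)])
```

With the text `ACGTACGTAC` and interval 4, only 3 of the 11 SA entries are stored; the remaining ones are reconstructed on demand.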

Output format

Matches are reported in an extended Kraken format. ProPhex produces a tab-delimited file with the following columns:

1. Category (unused; U as a legacy value)
2. Sequence name
3. Final decision (unused; 0 as a legacy value)
4. Sequence length
5. Assigned k-mers: a space-delimited list of blocks of consecutive k-mers with the same assignment. Each block has the form ASSIGNMENT:LENGTH, where ASSIGNMENT is a comma-delimited set of node IDs (or A for ambiguous, or 0 for no match) and LENGTH is the number of k-mers in the block. Example: 2157,393595:1 393595:1 0:16 (the first k-mer is assigned to nodes 2157 and 393595, the second k-mer to node 393595, and the subsequent 16 k-mers are unassigned)
6. Bases (optional; printed with -b)
7. Base qualities (optional; printed with -b)
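The sketch below parses one line of this output in Python. It is an illustration written for this README, not part of ProPhex: the names `parse_line`, `parse_blocks`, `KmerBlock`, and `Record` are hypothetical, node IDs are kept as strings, and the two optional columns are assumed to be present only when requested.

```python
"""Minimal parser for ProPhex's extended Kraken output (sketch; names are hypothetical)."""
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class KmerBlock:
    """A run of consecutive k-mers sharing the same assignment."""
    nodes: Tuple[str, ...]  # assigned node IDs; empty for "A" and "0"
    ambiguous: bool  # True when the assignment is "A"
    count: int  # number of k-mers in the run


@dataclass
class Record:
    """One output line; the legacy category and final-decision columns are skipped."""
    name: str
    length: int
    blocks: List[KmerBlock] = field(default_factory=list)
    bases: Optional[str] = None
    qualities: Optional[str] = None


def parse_blocks(column: str) -> List[KmerBlock]:
    """Parse the assigned k-mers column, e.g. '2157,393595:1 393595:1 0:16'."""
    blocks = []
    for token in column.split():
        # Split on the last colon: '2157,393595:1' -> ('2157,393595', '1').
        assignment, _, count = token.rpartition(":")
        if assignment == "A":
            blocks.append(KmerBlock((), True, int(count)))
        elif assignment == "0":
            blocks.append(KmerBlock((), False, int(count)))
        else:
            blocks.append(KmerBlock(tuple(assignment.split(",")), False, int(count)))
    return blocks


def parse_line(line: str) -> Record:
    """Parse one tab-delimited output line."""
    cols = line.rstrip("\n").split("\t")
    return Record(
        name=cols[1],
        length=int(cols[3]),
        blocks=parse_blocks(cols[4]),
        bases=cols[5] if len(cols) > 5 else None,
        qualities=cols[6] if len(cols) > 6 else None,
    )


if __name__ == "__main__":
    # Hypothetical line reusing the example block list from the column description.
    record = parse_line("U\tread1\t0\t42\t2157,393595:1 393595:1 0:16")
    print(record.name, record.length, sum(block.count for block in record.blocks))
```

For the hypothetical line above it prints `read1 42 18`, i.e. the block lengths of the example add up to 18 k-mers.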

FAQs

Can I remove duplicate k-mers from the index in order to use less memory when querying?

Yes, duplicate k-mers can be removed using ProphAsm, which assembles contigs by greedy enumeration of disjoint paths in the associated de Bruijn graph. BCalm is another tool that can be used with ProPhex. Compared to ProphAsm, BCalm has a smaller memory footprint. On the other hand, the resulting FASTA file can be significantly bigger (when assembling, BCalm stops at every branching k-mer).
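To illustrate why greedy path enumeration can give a more compact FASTA than stopping at branching k-mers, the following Python sketch assembles a set of equal-length k-mers into disjoint paths so that every distinct k-mer appears exactly once. It is a simplified illustration of the general idea, not ProphAsm's implementation: `greedy_assemble` is a hypothetical name, reverse complements are ignored, and the seed and extension order are arbitrary choices.

```python
"""Greedy k-mer assembly into disjoint paths (simplified sketch, not ProphAsm code)."""
from typing import Iterable, List


def greedy_assemble(kmers: Iterable[str]) -> List[str]:
    """Cover every distinct k-mer exactly once with greedily extended contigs.

    Simplifications: all k-mers have the same length, reverse complements are
    ignored, and extensions try A, C, G, T in a fixed order.
    """
    remaining = set(kmers)  # duplicates disappear here
    if not remaining:
        return []
    k = len(next(iter(remaining)))
    contigs = []
    while remaining:
        contig = min(remaining)  # deterministic choice of the next seed k-mer
        remaining.remove(contig)
        # Extend to the right while some successor k-mer is still unused.
        while True:
            suffix = contig[len(contig) - k + 1:]
            for base in "ACGT":
                if suffix + base in remaining:
                    remaining.remove(suffix + base)
                    contig += base
                    break
            else:
                break  # no unused successor
        # Extend to the left while some predecessor k-mer is still unused.
        while True:
            prefix = contig[:k - 1]
            for base in "ACGT":
                if base + prefix in remaining:
                    remaining.remove(base + prefix)
                    contig = base + contig
                    break
            else:
                break  # no unused predecessor
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "TACGGA"]
    k = 3
    kmers = [read[i:i + k] for read in reads for i in range(len(read) - k + 1)]
    contigs = greedy_assemble(kmers)
    print(len(kmers), "k-mers,", len(set(kmers)), "distinct ->", contigs)
```

For the branching set {ACG, CGA, CGT}, the sketch returns `["ACGA", "CGT"]` (7 bases), whereas compaction that stops at the branching k-mer ACG keeps three separate 3-mers (9 bases), again ignoring reverse complements.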

Issues

Please use GitHub issues.

Changelog

See Releases.

Licence

MIT

Authors

Kamil Salikhov <salikhov.kamil@gmail.com>

Karel Brinda <karel.brinda@inria.fr>

Simone Pignotti <pignottisimone@gmail.com>

Gregory Kucherov <gregory.kucherov@univ-mlv.fr>
