Add sequence query functionality
GuillaumeHolley committed Nov 21, 2016
1 parent f2c1e4a commit f671eb9
Showing 217 changed files with 1,353 additions and 1,197 deletions.
Empty file modified .gitignore 100644 → 100755
Empty file.
Empty file modified Doxyfile 100644 → 100755
Empty file.
Empty file modified LICENSE 100644 → 100755
Empty file.
Empty file modified Makefile.in 100644 → 100755
Empty file.
2 changes: 2 additions & 0 deletions README.md 100644 → 100755
@@ -110,6 +110,7 @@ bft build k treshold_compression {kmers|kmers_comp} list_genome_files output_fil
bft load file_bft [-add_genomes {kmers|kmers_comp} list_genome_files output_file] [Options]
Options:
[-query_sequences threshold list_sequence_files]
[-query_kmers {kmers|kmers_comp} list_kmer_files]
[-query_branching {kmers|kmers_comp} list_kmer_files]
[-extract_kmers {kmers|kmers_comp} kmers_file]
@@ -133,6 +134,7 @@ Command **load** loads a BFT from file *file_bft*.
### Options

* **-add_genomes** adds the genomes listed in *list_genome_files* to the BFT stored in *file_bft*; the new BFT is written to *output_file*
* **-query_sequences** queries the BFT for the sequences written in the files of *list_sequence_files*. A CSV file is output for each file of *list_sequence_files*: columns are the genomes represented in the BFT, rows are the queried sequences, and the intersection of a column and a row is a binary value indicating whether the sequence represented by the row is present in the genome represented by the column. *threshold* is a float (0 < threshold <= 1) giving the minimum fraction of *k*-mers from each query sequence that must occur in a genome for the sequence to be reported as present in that genome.
* **-query_kmers** queries the BFT for the *k*-mers written in the files of *list_kmer_files*. A CSV file is output for each file of *list_kmer_files*: columns are the genomes represented in the BFT, rows are the queried *k*-mers, and the intersection of a column and a row is a binary value indicating whether the *k*-mer represented by the row is present in the genome represented by the column.
* **-query_branching** queries the BFT for the number of *k*-mers written in the files of *list_kmer_files* that are branching in the colored de-Bruijn graph represented by the BFT.
* **-extract_kmers** extracts the *k*-mers stored in the BFT and writes them to a *k*-mers file named *kmers_file* (see below for input file types).
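
The threshold rule described for **-query_sequences** can be illustrated with a small sketch. This is not the BFT implementation: the helper names `count_kmers` and `sequence_is_present` are hypothetical, and the exact comparison (found *k*-mers divided by total *k*-mers, compared with `>=`) is an assumption drawn from the option description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of k-mers in a sequence of length seq_len.
 * Returns 0 when the sequence is shorter than k (or k is 0). */
static size_t count_kmers(size_t seq_len, size_t k)
{
    return (k == 0 || seq_len < k) ? 0 : seq_len - k + 1;
}

/* Hypothetical decision rule (assumption, not taken from the BFT sources):
 * a sequence is reported present in a genome when the fraction of its
 * k-mers found in that genome is at least `threshold`. */
static bool sequence_is_present(size_t kmers_found, size_t kmers_total,
                                double threshold)
{
    assert(threshold > 0.0 && threshold <= 1.0); /* range stated in the README */
    assert(kmers_found <= kmers_total);

    if (kmers_total == 0)
        return false; /* nothing to match, so nothing is reported */

    return (double)kmers_found >= threshold * (double)kmers_total;
}
```

Under these assumptions, a 10-base sequence with k = 4 has 7 *k*-mers; with a threshold of 0.8, a genome containing 6 of them would be reported and a genome containing 5 would not.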
14 changes: 13 additions & 1 deletion configure
@@ -650,6 +650,7 @@ infodir
docdir
oldincludedir
includedir
runstatedir
localstatedir
sharedstatedir
sysconfdir
@@ -721,6 +722,7 @@ datadir='${datarootdir}'
sysconfdir='${prefix}/etc'
sharedstatedir='${prefix}/com'
localstatedir='${prefix}/var'
runstatedir='${localstatedir}/run'
includedir='${prefix}/include'
oldincludedir='/usr/include'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -973,6 +975,15 @@ do
| -silent | --silent | --silen | --sile | --sil)
silent=yes ;;

-runstatedir | --runstatedir | --runstatedi | --runstated \
| --runstate | --runstat | --runsta | --runst | --runs \
| --run | --ru | --r)
ac_prev=runstatedir ;;
-runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
| --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
| --run=* | --ru=* | --r=*)
runstatedir=$ac_optarg ;;

-sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
ac_prev=sbindir ;;
-sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1110,7 +1121,7 @@ fi
for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
libdir localedir mandir
libdir localedir mandir runstatedir
do
eval ac_val=\$$ac_var
# Remove trailing slashes.
@@ -1263,6 +1274,7 @@ Fine tuning of the installation directories:
--sysconfdir=DIR read-only single-machine data [PREFIX/etc]
--sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
--localstatedir=DIR modifiable single-machine data [PREFIX/var]
--runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run]
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
Empty file modified doc/doxygen/doxygen_sqlite3.db 100644 → 100755
Empty file.
256 changes: 6 additions & 250 deletions doc/doxygen/html/Node_8h_source.html 100644 → 100755

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions doc/doxygen/html/annotated.html 100644 → 100755
@@ -3,7 +3,7 @@
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.9.1"/>
<meta name="generator" content="Doxygen 1.8.11"/>
<title>Bloom Filter Trie: Data Structures</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
@@ -22,7 +22,7 @@
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">Bloom Filter Trie
</div>
</td>
@@ -31,7 +31,7 @@
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.9.1 -->
<!-- Generated by Doxygen 1.8.11 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
@@ -95,9 +95,9 @@
</div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated on Wed Nov 9 2016 17:22:01 for Bloom Filter Trie by &#160;<a href="http://www.doxygen.org/index.html">
Generated on Mon Nov 21 2016 17:45:16 for Bloom Filter Trie by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.9.1
</a> 1.8.11
</small></address>
</body>
</html>
Empty file modified doc/doxygen/html/arrowdown.png 100644 → 100755
Empty file modified doc/doxygen/html/arrowright.png 100644 → 100755
Empty file modified doc/doxygen/html/bc_s.png 100644 → 100755
Empty file modified doc/doxygen/html/bdwn.png 100644 → 100755
83 changes: 66 additions & 17 deletions doc/doxygen/html/bft_8h.html 100644 → 100755
@@ -3,7 +3,7 @@
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.9.1"/>
<meta name="generator" content="Doxygen 1.8.11"/>
<title>Bloom Filter Trie: include/bft.h File Reference</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
@@ -22,7 +22,7 @@
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">Bloom Filter Trie
</div>
</td>
@@ -31,7 +31,7 @@
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.9.1 -->
<!-- Generated by Doxygen 1.8.11 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
@@ -192,12 +192,6 @@
<tr class="memitem:ac5c1b6b0b04abfdf205f966299f042fd"><td class="memItemLeft" align="right" valign="top">void&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#ac5c1b6b0b04abfdf205f966299f042fd">free_BFT_kmer_content</a> (<a class="el" href="structBFT__kmer.html">BFT_kmer</a> *bft_kmer, int nb_bft_kmer)</td></tr>
<tr class="memdesc:ac5c1b6b0b04abfdf205f966299f042fd"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function freeing the content of allocated BFT_kmers. <a href="#ac5c1b6b0b04abfdf205f966299f042fd">More...</a><br /></td></tr>
<tr class="separator:ac5c1b6b0b04abfdf205f966299f042fd"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr class="memitem:a67f15f4eeebe6fe620c467f8180e1f31"><td class="memItemLeft" align="right" valign="top"><a class="el" href="structBFT__kmer.html">BFT_kmer</a> *&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a67f15f4eeebe6fe620c467f8180e1f31">get_kmer</a> (const char *kmer, <a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft)</td></tr>
<tr class="memdesc:a67f15f4eeebe6fe620c467f8180e1f31"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function searching for a k-mer in a BFT. <a href="#a67f15f4eeebe6fe620c467f8180e1f31">More...</a><br /></td></tr>
<tr class="separator:a67f15f4eeebe6fe620c467f8180e1f31"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr class="memitem:a170137e867f0ba24423badc9e1db872b"><td class="memItemLeft" align="right" valign="top">bool&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a170137e867f0ba24423badc9e1db872b">is_kmer_in_cdbg</a> (<a class="el" href="structBFT__kmer.html">BFT_kmer</a> *bft_kmer)</td></tr>
<tr class="memdesc:a170137e867f0ba24423badc9e1db872b"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function testing if a k-mer is in a BFT. <a href="#a170137e867f0ba24423badc9e1db872b">More...</a><br /></td></tr>
<tr class="separator:a170137e867f0ba24423badc9e1db872b"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr class="memitem:ab9b8daf6a5685786083c258526ef87cb"><td class="memItemLeft" align="right" valign="top">void&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#ab9b8daf6a5685786083c258526ef87cb">extract_kmers_to_disk</a> (<a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft, char *filename_output, bool compressed_output)</td></tr>
<tr class="memdesc:ab9b8daf6a5685786083c258526ef87cb"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function extracting the k-mers of a BFT in a file. <a href="#ab9b8daf6a5685786083c258526ef87cb">More...</a><br /></td></tr>
<tr class="separator:ab9b8daf6a5685786083c258526ef87cb"><td class="memSeparator" colspan="2">&#160;</td></tr>
@@ -207,6 +201,24 @@
<tr class="memitem:a77a802e5edc240a7a6a705ef6593799d"><td class="memItemLeft" align="right" valign="top">size_t&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a77a802e5edc240a7a6a705ef6593799d">write_kmer_comp_to_disk</a> (<a class="el" href="structBFT__kmer.html">BFT_kmer</a> *bft_kmer, <a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft, va_list args)</td></tr>
<tr class="memdesc:a77a802e5edc240a7a6a705ef6593799d"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function writing a 2-bit encoded k-mer to a file. <a href="#a77a802e5edc240a7a6a705ef6593799d">More...</a><br /></td></tr>
<tr class="separator:a77a802e5edc240a7a6a705ef6593799d"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr><td colspan="2"><div class="groupHeader">Query functions</div></td></tr>
<tr><td colspan="2"><div class="groupText"><p>These functions query for k-mers or sequences. </p>
</div></td></tr>
<tr class="memitem:a67f15f4eeebe6fe620c467f8180e1f31"><td class="memItemLeft" align="right" valign="top"><a class="el" href="structBFT__kmer.html">BFT_kmer</a> *&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a67f15f4eeebe6fe620c467f8180e1f31">get_kmer</a> (const char *kmer, <a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft)</td></tr>
<tr class="memdesc:a67f15f4eeebe6fe620c467f8180e1f31"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function searching for a k-mer in a BFT. <a href="#a67f15f4eeebe6fe620c467f8180e1f31">More...</a><br /></td></tr>
<tr class="separator:a67f15f4eeebe6fe620c467f8180e1f31"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr class="memitem:a170137e867f0ba24423badc9e1db872b"><td class="memItemLeft" align="right" valign="top">bool&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a170137e867f0ba24423badc9e1db872b">is_kmer_in_cdbg</a> (<a class="el" href="structBFT__kmer.html">BFT_kmer</a> *bft_kmer)</td></tr>
<tr class="memdesc:a170137e867f0ba24423badc9e1db872b"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function testing if a k-mer is in a BFT. <a href="#a170137e867f0ba24423badc9e1db872b">More...</a><br /></td></tr>
<tr class="separator:a170137e867f0ba24423badc9e1db872b"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr class="memitem:aa6c30881d62f797a766d5ad9e1548b21"><td class="memItemLeft" align="right" valign="top">uint32_t *&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#aa6c30881d62f797a766d5ad9e1548b21">query_sequence</a> (<a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft, char *sequence, double threshold)</td></tr>
<tr class="memdesc:aa6c30881d62f797a766d5ad9e1548b21"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function querying a BFT for a sequence. <a href="#aa6c30881d62f797a766d5ad9e1548b21">More...</a><br /></td></tr>
<tr class="separator:aa6c30881d62f797a766d5ad9e1548b21"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr><td colspan="2"><div class="groupHeader">Pattern matching functions</div></td></tr>
<tr><td colspan="2"><div class="groupText"><p>These functions provide pattern matching functionalities over the k-mers or paths of a colored de Bruijn graph stored as a BFT. </p>
</div></td></tr>
<tr class="memitem:a6f93be4a2e37c7355c79079e8c38530a"><td class="memItemLeft" align="right" valign="top">bool&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a6f93be4a2e37c7355c79079e8c38530a">prefix_matching</a> (<a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft, char *prefix, <a class="el" href="bft_8h.html#aac9ee38523fda7ab4f8eb1238e98d315">BFT_func_ptr</a> f,...)</td></tr>
<tr class="memdesc:a6f93be4a2e37c7355c79079e8c38530a"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function for prefix matching over the k-mers of a BFT. <a href="#a6f93be4a2e37c7355c79079e8c38530a">More...</a><br /></td></tr>
<tr class="separator:a6f93be4a2e37c7355c79079e8c38530a"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr><td colspan="2"><div class="groupHeader">Marking functions</div></td></tr>
<tr><td colspan="2"><div class="groupText"><p>These functions mark k-mers of a colored de Bruijn graph with flags. </p>
</div></td></tr>
@@ -249,12 +261,6 @@
<tr class="memitem:aa616e55874abfd5811c8854586718c35"><td class="memItemLeft" align="right" valign="top">void&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#aa616e55874abfd5811c8854586718c35">v_iterate_over_kmers</a> (<a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft, <a class="el" href="bft_8h.html#aac9ee38523fda7ab4f8eb1238e98d315">BFT_func_ptr</a> f, va_list args)</td></tr>
<tr class="memdesc:aa616e55874abfd5811c8854586718c35"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function iterating over the k-mers of a BFT. <a href="#aa616e55874abfd5811c8854586718c35">More...</a><br /></td></tr>
<tr class="separator:aa616e55874abfd5811c8854586718c35"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr><td colspan="2"><div class="groupHeader">Pattern matching functions</div></td></tr>
<tr><td colspan="2"><div class="groupText"><p>These functions provide pattern matching functionalities over the k-mers or paths of a colored de Bruijn graph stored as a BFT. </p>
</div></td></tr>
<tr class="memitem:a6f93be4a2e37c7355c79079e8c38530a"><td class="memItemLeft" align="right" valign="top">bool&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="bft_8h.html#a6f93be4a2e37c7355c79079e8c38530a">prefix_matching</a> (<a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *bft, char *prefix, <a class="el" href="bft_8h.html#aac9ee38523fda7ab4f8eb1238e98d315">BFT_func_ptr</a> f,...)</td></tr>
<tr class="memdesc:a6f93be4a2e37c7355c79079e8c38530a"><td class="mdescLeft">&#160;</td><td class="mdescRight">Function for prefix matching over the k-mers of a BFT. <a href="#a6f93be4a2e37c7355c79079e8c38530a">More...</a><br /></td></tr>
<tr class="separator:a6f93be4a2e37c7355c79079e8c38530a"><td class="memSeparator" colspan="2">&#160;</td></tr>
<tr><td colspan="2"><div class="groupHeader">Disk I/O functions</div></td></tr>
<tr><td colspan="2"><div class="groupText"><p>These functions write and load a BFT from disk. </p>
</div></td></tr>
@@ -1224,6 +1230,49 @@ <h2 class="groupheader">Function Documentation</h2>
</dl>
<dl class="section return"><dt>Returns</dt><dd>a boolean indicating the presence (true) or absence (false) of the k-mer in the genome. </dd></dl>

</div>
</div>
<a class="anchor" id="aa6c30881d62f797a766d5ad9e1548b21"></a>
<div class="memitem">
<div class="memproto">
<table class="memname">
<tr>
<td class="memname">uint32_t* query_sequence </td>
<td>(</td>
<td class="paramtype"><a class="el" href="bft_8h.html#a2db0a80fe662d044625641531e5ce292">BFT</a> *&#160;</td>
<td class="paramname"><em>bft</em>, </td>
</tr>
<tr>
<td class="paramkey"></td>
<td></td>
<td class="paramtype">char *&#160;</td>
<td class="paramname"><em>sequence</em>, </td>
</tr>
<tr>
<td class="paramkey"></td>
<td></td>
<td class="paramtype">double&#160;</td>
<td class="paramname"><em>threshold</em>&#160;</td>
</tr>
<tr>
<td></td>
<td>)</td>
<td></td><td></td>
</tr>
</table>
</div><div class="memdoc">

<p>Function querying a BFT for a sequence. </p>
<dl class="params"><dt>Parameters</dt><dd>
<table class="params">
<tr><td class="paramname">bft</td><td>is a BFT to be queried. </td></tr>
<tr><td class="paramname">sequence</td><td>is a string to query. </td></tr>
<tr><td class="paramname">threshold</td><td>is a float (0 &lt; threshold &lt;= 1) indicating the minimum fraction of k-mers from the queried sequence that must be present in a genome to have the queried sequence reported present in this genome. </td></tr>
</table>
</dd>
</dl>
<dl class="section return"><dt>Returns</dt><dd>a pointer to a sorted array of genome identifiers in which the queried sequence occurs (according to parameter threshold) or NULL if the queried sequence is not present in at least one genome (according to parameter threshold). The first element of the array (position 0) indicates how many ids are in this array. </dd></dl>

</div>
</div>
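
The return layout documented for `query_sequence` (a count at position 0, followed by sorted genome identifiers, or NULL when nothing matches) could be consumed roughly as follows. This is a minimal sketch, not code from the BFT library: `result_count` and `print_genome_ids` are hypothetical helpers, and it assumes the count at position 0 does not include position 0 itself. How the returned array should be released is not stated in the documentation above, so the sketch does not free anything.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Number of genome ids in a result laid out as documented for
 * query_sequence(): result[0] is the count, ids start at result[1].
 * Assumption: the count excludes the slot at position 0.
 * A NULL result means the sequence was not reported in any genome. */
static uint32_t result_count(const uint32_t *result)
{
    return result == NULL ? 0 : result[0];
}

/* Writes the ids as one comma-separated line to `out`
 * (an empty line for a NULL result) and returns how many were written. */
static uint32_t print_genome_ids(const uint32_t *result, FILE *out)
{
    uint32_t n = result_count(result);

    for (uint32_t i = 1; i <= n; i++)
        fprintf(out, "%s%" PRIu32, i == 1 ? "" : ",", result[i]);
    fputc('\n', out);

    return n;
}
```

A caller would pass the pointer returned by `query_sequence(bft, sequence, threshold)` straight to `print_genome_ids`; checking for NULL inside the helper keeps the "not found" case out of the calling code.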
<a class="anchor" id="aec4a612e2beec69b717be9e972fa16cb"></a>
@@ -1621,9 +1670,9 @@ <h2 class="groupheader">Function Documentation</h2>
</div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated on Wed Nov 9 2016 17:22:01 for Bloom Filter Trie by &#160;<a href="http://www.doxygen.org/index.html">
Generated on Mon Nov 21 2016 17:45:16 for Bloom Filter Trie by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.9.1
</a> 1.8.11
</small></address>
</body>
</html>
