Clean-up (#14)
* Clean-up

* fixed tab in output.py

* removed old code/ folder

* more cleanup
kescobo committed May 27, 2016
1 parent be74847 commit 47c75d8
Showing 19 changed files with 44 additions and 1,995 deletions.
4 changes: 0 additions & 4 deletions .gitignore
@@ -1,7 +1,3 @@
.DS_store
*ipynb*
*.sublime-project
*.sublime-workspace
*.pyc
.idea
user_settings.py
43 changes: 18 additions & 25 deletions README.md
@@ -7,36 +7,29 @@ Dependencies:
* MongoDB
* pymongo
* BioPython
* ~~Scikit Bio~~
* ~~Pandas~~
* BLAST+ CLI

**Identification of horizontal gene transfer between sequenced microbial genomes**

##The following is out of date... Will get back to this soon
Kvasir takes as input a folder containing genomes in GenBank format. The protein-coding genes from these genomes are loaded into a database and BLASTed against each other.

##~~Running Kvasir~~
~~With dependencies installed, fire up a Mongod instance. In the terminal:~~
### Usage

~~`mongod --dbpath path/to/db`~~
Change the values in `settings.py` to point at your input folder, output folder, and the name you want for your database.

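As a rough illustration only, `settings.py` might look something like the sketch below; the variable names here are assumptions, so check the file shipped with the repository for the real ones.

```python
# settings.py -- illustrative values only; the actual variable names in the
# project's settings.py may differ.
INPUT_FOLDER = "/path/to/genbank_files"   # folder of annotated genomes to import
OUTPUT_FOLDER = "/path/to/output"         # where results will be written
MONGODB_NAME = "kvasir_db"                # name of the Mongo database to use
```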
~~Run Kvasir by invoking run_kvasir.py in your working directory:~~
Launch a local `mongod` instance:
```
$ mongod --dbpath path/to/db
```

~~`python run_kvasir.py /path/to/gb_files name_of_mongoDB`~~
Run the functions in `run.py`. Eventually this will be more streamlined; for now, call them from an interactive Python session as shown below.

####~~DataImport~~:
* ~~Imports GenBank-formatted annotated genomes into the Mongo database.~~
* ~~.gb files require a "locus_tag" feature. If your genomes don't have it, FixGbk.py should take care of it for you.~~
* ~~The Mongo database has "collections" and "documents": a separate collection is generated for each species (each separate GenBank file), with documents representing each CDS.~~
* ~~CDS documents are like Python dictionaries, and contain entries for species, DNA and amino acid sequences, contig and location info, and annotation information.~~
* ~~Each document is assigned a unique `_id` attribute within the species, so every gene is uniquely identified by a `(species, _id)` tuple (a rough sketch of this import step follows below).~~

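For orientation, here is a minimal sketch of the import step described above, using BioPython and pymongo. The per-species collection layout and field names are assumptions based on the description, not the project's actual code.

```python
from Bio import SeqIO
from pymongo import MongoClient

def import_genbank(gb_path, species_name, db_name="kvasir_db"):
    """Load every CDS from one GenBank file into a per-species collection."""
    collection = MongoClient()[db_name][species_name]
    for record in SeqIO.parse(gb_path, "genbank"):         # one record per contig
        for feature in record.features:
            if feature.type != "CDS":
                continue
            quals = feature.qualifiers
            collection.insert_one({                        # MongoDB assigns a unique _id
                "species": species_name,
                "locus_tag": quals.get("locus_tag", [None])[0],
                "annotation": quals.get("product", [None])[0],
                "contig": record.id,
                "location": str(feature.location),
                "dna_seq": str(feature.extract(record.seq)),
                "aa_seq": quals.get("translation", [None])[0],
            })
```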
####~~MakeBlastDb~~
~~Generates a multi-fasta file containing every gene in the Mongo database, builds a BLASTable database from it, and then deletes the temporary file (sketched below).~~

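A sketch of that dump-and-index step under the same assumptions, relying on the `makeblastdb` tool from the BLAST+ CLI; the `species|_id` fasta header format is an illustrative choice.

```python
import os
import subprocess
from tempfile import NamedTemporaryFile

def make_blast_db(db, out_path="kvasir_blastdb"):
    """Write every gene to a temporary multi-fasta, index it, then delete the fasta."""
    with NamedTemporaryFile("w", suffix=".faa", delete=False) as tmp:
        for species in db.list_collection_names():
            for gene in db[species].find():
                tmp.write(">{}|{}\n{}\n".format(species, gene["_id"], gene["aa_seq"]))
        fasta_path = tmp.name
    subprocess.check_call([
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "prot",        # protein database, to be searched with blastp
        "-out", out_path,
    ])
    os.remove(fasta_path)         # clean up the temporary multi-fasta
```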
####~~KvasirBlast~~
* ~~For each species, generates a temporary fasta file and BLASTs it against every other gene in the database.~~
* ~~BLAST produces an XML document, which is parsed for unique hits.~~
* ~~A new "hits" entry is added to each gene document in MongoDB, containing a list of `(species, _id)` tuples for each hit (these are used in the next script to gather info about hits; see the sketch below).~~

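And a sketch of the all-vs-all search and hit recording, again only an approximation of what the description says: `blastp` writes XML (`-outfmt 5`), BioPython's `NCBIXML` parses it, and hits from other species are written back to each gene document.

```python
import subprocess
from tempfile import NamedTemporaryFile

from Bio.Blast import NCBIXML
from bson import ObjectId

def blast_species(db, species, blast_db="kvasir_blastdb"):
    """BLAST one species' proteins against the shared database and record the hits."""
    with NamedTemporaryFile("w", suffix=".faa", delete=False) as query:
        for gene in db[species].find():
            query.write(">{}|{}\n{}\n".format(species, gene["_id"], gene["aa_seq"]))
        query_path = query.name

    xml_path = query_path + ".xml"
    subprocess.check_call([
        "blastp", "-query", query_path, "-db", blast_db,
        "-outfmt", "5", "-out", xml_path,          # outfmt 5 = XML
    ])

    with open(xml_path) as handle:
        for blast_record in NCBIXML.parse(handle):
            _, query_id = blast_record.query.split("|", 1)
            hits = []
            for alignment in blast_record.alignments:
                hit_species, hit_id = alignment.hit_def.split("|", 1)
                if hit_species != species:           # ignore within-species self-hits
                    hits.append((hit_species, hit_id))
            db[species].update_one({"_id": ObjectId(query_id)},
                                   {"$set": {"hits": hits}})
```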
####~~Outputs~~
~~Still a work in progress. So far, a bunch of output formats are working... will detail later.~~
```
Python 2.7.11 (default, Dec 14 2015, 10:44:13)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import run
>>> run.import_data()
>>> run.blast_db()
>>> run.analyze(0.99)
```
130 changes: 0 additions & 130 deletions code/DataImport.py

This file was deleted.
