One of the challenges computational biologists face during genome assembly projects is choosing among the many available assemblers. This is time-consuming because each assembler has its own set of parameters that the user needs to learn. Even after learning those parameters, users still have to run several assemblers and compare their statistics to identify the best assembly. GenoAssist helps computational biologists by centralizing the assemblers, their parameters, their running environments, and results reporting in a single place.
You can either use go get (the package will be added to $GOPATH/):
$ go get -u github.com/genoassist/genoassist
Or clone the repository:
$ git clone https://github.com/genoassist/genoassist
Build the main.go file:
$ go build main.go
If you are missing packages, run go mod vendor to collect the necessary packages.
GenoAssist only requires a YAML file that contains the configuration it should use to run its processes. A template can be found in this repository. For convenience, here's an example specification:
assemblers:
  megahit:
    kmers: "27"
  abyss:
    kmers: "27"
genoassist:
  assemblers: ['abyss','megahit','flye']
  inputFilePath: "/test/raw_sequences.fastq"
  outputPath: "/test/output"
  threads: 2
  prep: true
  qualityControl: true
  fileType: "fasta"
- All paths used with GenoAssist must be absolute (a Docker requirement).
- The accepted assembler values are:
  - 'abyss'
  - 'megahit'
  - 'flye'
- The accepted file types are:
  - FASTA
  - FASTQ
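The rules above can be checked before any container is started. The following is a minimal Go sketch, not GenoAssist's actual code: the Config type, its field names, and the Validate method are hypothetical, modeled on the genoassist section of the example YAML file. Matching file types case-insensitively is also an assumption, since the example uses "fasta" while the list above says FASTA.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Config is a hypothetical mirror of the "genoassist" section of the YAML
// file; the real GenoAssist types may be named and structured differently.
type Config struct {
	Assemblers     []string
	InputFilePath  string
	OutputPath     string
	Threads        int
	Prep           bool
	QualityControl bool
	FileType       string
}

// Accepted values, taken from the documented rules.
var (
	acceptedAssemblers = map[string]bool{"abyss": true, "megahit": true, "flye": true}
	acceptedFileTypes  = map[string]bool{"fasta": true, "fastq": true}
)

// Validate checks the documented rules and returns the first violation found.
func (c Config) Validate() error {
	// All paths must be absolute (a Docker requirement).
	for _, p := range []string{c.InputFilePath, c.OutputPath} {
		if !filepath.IsAbs(p) {
			return fmt.Errorf("path %q must be absolute (a Docker requirement)", p)
		}
	}
	// Only the listed assemblers are accepted.
	for _, a := range c.Assemblers {
		if !acceptedAssemblers[a] {
			return fmt.Errorf("unsupported assembler %q", a)
		}
	}
	// Assumption: file types are compared case-insensitively.
	if !acceptedFileTypes[strings.ToLower(c.FileType)] {
		return fmt.Errorf("unsupported file type %q", c.FileType)
	}
	return nil
}

func main() {
	cfg := Config{
		Assemblers:    []string{"abyss", "megahit", "flye"},
		InputFilePath: "/test/raw_sequences.fastq",
		OutputPath:    "/test/output",
		Threads:       2,
		FileType:      "fasta",
	}
	fmt.Println(cfg.Validate()) // <nil>

	// A relative output path violates the absolute-path rule.
	cfg.OutputPath = "test/output"
	fmt.Println(cfg.Validate())
}
```

Returning the first violation keeps the sketch short; a friendlier tool might collect every problem and report them together.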
If you are encountering problems with Docker, make sure that:
- The Docker daemon is running in the background.
- You have the necessary Docker images. GenoAssist can install them if you specify prep: true under genoassist in the YAML configuration; this installs the Docker images for the containers that GenoAssist runs.
The overall design follows a primary/replica architecture. Users interact with the primary, specifying the files containing their sequencing reads and the read type (e.g., Illumina). The primary takes this input and schedules assembly, parsing of results, and reporting, in that order.
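The stage ordering described above can be sketched in Go as follows. This is an illustration under stated assumptions, not GenoAssist's implementation: it assumes replicas are workers that each execute one stage for one assembler, and the Stage, Job, replica, and primary names are hypothetical. For simplicity every stage, including reporting, is fanned out per assembler, and a replica only returns a label instead of launching a container.

```go
package main

import (
	"fmt"
	"sync"
)

// Stage is one step the primary schedules, in the order given in the text.
type Stage string

const (
	Assembly  Stage = "assembly"
	Parsing   Stage = "parsing"
	Reporting Stage = "reporting"
)

// Job is a hypothetical unit of work: one assembler applied to the input.
type Job struct {
	Assembler string
	InputPath string
}

// replica executes one stage for one job. A real system would start a
// Docker container here; this sketch only returns a label.
func replica(stage Stage, job Job) string {
	return fmt.Sprintf("%s:%s", stage, job.Assembler)
}

// primary runs the stages strictly in order. Within a stage, jobs for
// different assemblers run concurrently on replicas, and the primary waits
// for all of them before starting the next stage.
func primary(jobs []Job) [][]string {
	var history [][]string
	for _, stage := range []Stage{Assembly, Parsing, Reporting} {
		results := make([]string, len(jobs)) // one slot per job, no shared appends
		var wg sync.WaitGroup
		for i, job := range jobs {
			wg.Add(1)
			go func(i int, stage Stage, job Job) {
				defer wg.Done()
				results[i] = replica(stage, job)
			}(i, stage, job)
		}
		wg.Wait() // barrier: the next stage starts only after this one finishes
		history = append(history, results)
	}
	return history
}

func main() {
	jobs := []Job{
		{Assembler: "abyss", InputPath: "/test/raw_sequences.fastq"},
		{Assembler: "megahit", InputPath: "/test/raw_sequences.fastq"},
	}
	for _, stageResults := range primary(jobs) {
		fmt.Println(stageResults)
	}
}
```

Waiting on the sync.WaitGroup before moving on is what enforces the assembly, parsing, reporting order, while independent assemblers still run in parallel within each stage.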
Feel free to contact any of the maintainers if you would like to be an active maintainer and contributor to GenoAssist! If you would like to contribute only, you are encouraged to grab an issue and submit a pull request with proposed changes for review!
Submit feedback and bug reports by using the Issues section of the repository.