audy/dna

A biological sequence file (fasta, fastq, qseq) parser for Ruby

DNA

A biological sequence file parser for Ruby

Austin G. Davis-Richardson

Features

Installation

Tested on Ruby 1.9.3 and 2.0.0

$ (sudo) gem install dna

Usage

require 'dna'

# Automatic Format Detection 

File.open('sequences.fasta') do |handle|
  records = Dna.new handle

  records.each do |record|
    puts record.length
  end
end

File.open('sequences.fastq') do |handle|
  records = Dna.new handle

  records.each do |record|
    puts record.quality
  end
end

File.open('sequences.qseq') do |handle|
  records = Dna.new handle
  puts records.first.inspect
end

# Caveat: if you are reading from a compressed file or from stdin,
# you must specify the sequence format explicitly:

require 'zlib'

Zlib::GzipReader.open('sequences.fasta.gz') do |handle|
  records = Dna.new handle, :format => :fasta

  records.each do |record|
    puts record.length
  end
end
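The three formats can be told apart by their first line: FASTA headers start with '>', FASTQ headers start with '@', and QSEQ records are lines of 11 tab-separated fields. The sketch below shows one way such sniffing could be done. guess_format and sniff_format are hypothetical helpers, not part of the dna gem, whose own detection logic may differ. The sketch peeks at the first non-blank line and then rewinds the handle, which only works on seekable input. A pipe such as stdin cannot be rewound, which may be one reason the gem asks for an explicit :format there.

```ruby
require 'stringio'

# Hypothetical helper (not part of the dna gem): guess the format of a
# sequence file from its first non-blank line.
def guess_format(first_line)
  line = first_line.to_s.chomp
  if line.start_with?('>')
    :fasta # FASTA headers begin with '>'
  elsif line.start_with?('@')
    :fastq # FASTQ headers begin with '@'
  elsif line.split("\t", -1).length == 11
    :qseq  # QSEQ records have 11 tab-separated fields
  else
    raise ArgumentError, "cannot detect format from #{line.inspect}"
  end
end

# Hypothetical helper: read ahead to the first non-blank line, then
# rewind so a parser can start from the beginning. Rewinding requires
# a seekable handle; on a pipe such as stdin it would typically raise
# Errno::ESPIPE.
def sniff_format(handle)
  first = handle.each_line.find { |l| !l.strip.empty? }
  handle.rewind
  guess_format(first)
end

handle = StringIO.new(">1\nGAGAGATCTCATGACACAGCCGAAG\n")
puts sniff_format(handle) # prints "fasta"
puts handle.gets          # prints ">1" (the handle was rewound)
```

Peeking only at the first line keeps detection cheap, but a file that starts with an unexpected line cannot be classified, so the helper raises instead of guessing.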

Support for PHRED score parsing

# Illumina >= 1.3

record.illumina_qualities # => [31, ..., 37]

# Error probabilities

record.illumina_probabilities
# => [1.0, 0.7943282347242815, ..., 0.3981071705534972]

# Solexa and Illumina < 1.3

record.solexa_qualities
record.solexa_probabilities

# Sanger

record.sanger_qualities
record.sanger_probabilities
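These methods correspond to the standard FASTQ quality encodings. Each quality character maps to a score Q = ord(c) - offset, with an offset of 33 for Sanger and 64 for Illumina 1.3+ and Solexa. Phred scores give an error probability p = 10^(-Q/10); the sample values shown above (1.0, 0.794..., 0.398...) are exactly what this formula gives for Q = 0, 1 and 4. Solexa scores are log-odds instead, so p = 10^(-Q/10) / (1 + 10^(-Q/10)). The sketch below uses these published conventions; decode_scores, phred_probabilities and solexa_probabilities are hypothetical names, not the gem's code, so its results may differ in detail.

```ruby
# Sketch of the standard FASTQ quality encodings; these helpers are
# hypothetical and are not the dna gem's implementation.
SANGER_OFFSET   = 33 # Phred+33
ILLUMINA_OFFSET = 64 # Phred+64 (Illumina 1.3 and later)
SOLEXA_OFFSET   = 64 # Solexa+64 (Solexa and Illumina before 1.3)

# Turn a quality string into integer scores: Q = ord(c) - offset
def decode_scores(quality_string, offset)
  quality_string.each_char.map { |c| c.ord - offset }
end

# Phred scores: error probability p = 10^(-Q/10)
def phred_probabilities(scores)
  scores.map { |q| 10**(-q / 10.0) }
end

# Solexa scores are log-odds, Q = -10 * log10(p / (1 - p)), so
# p = 10^(-Q/10) / (1 + 10^(-Q/10))
def solexa_probabilities(scores)
  scores.map do |q|
    odds = 10**(-q / 10.0)
    odds / (1 + odds)
  end
end

scores = decode_scores('5888', SANGER_OFFSET)
p scores                    # => [20, 23, 23, 23]
p phred_probabilities([0])  # => [1.0]
p solexa_probabilities([0]) # => [0.5]
```

The two scales agree closely for high scores but diverge near zero: a Phred score of 0 means p = 1.0, while a Solexa score of 0 means p = 0.5.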

Bonus Feature

The DNA gem also provides a command-line tool, dna, with grep-like capabilities: it prints every record whose header matches a given (Ruby) regular expression.

$ dna spec/data/input.fastq "[1-2]"

@1
TGAAACTTATTGATCACCCCGCTTGGCGTTGGGGAGAAATTCAGAAAAGAGTGCTTGATGGGGCGCCACATGCCGTGCAACCCACTCTCTTTCACGCAGCGCGCCCCA
+1
5888.6778888650/-//&,(,./*-11'//0&,-0.(.,,,,/2/&-,,,,,.(.,(,..&---&-,,,((*-----*+.&,,,,,(//&,,,-(,,+(,,,--&(
@2
GTCGCGGCTTACCACCCAACGATTTTTTTTAGAGGTGCTGGTTTCA
+2
2550//*-1./4.--/'+.2.,,,,,,,,&(/00.11426554+13

$ dna spec/data/test.fasta "\d"

>1
GAGAGATCTCATGACACAGCCGAAG
>2
GAGACAUAUCCNNNAA
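Conceptually, such a filter reads records one at a time and keeps those whose header matches the pattern. The sketch below handles FASTA input only, assuming the header is the text after '>' and that sequences may span several lines. FastaRecord, each_fasta_record and grep_records are hypothetical names; this is not how the dna command is implemented.

```ruby
require 'stringio'

# Hypothetical record type for this sketch (not the dna gem's class).
FastaRecord = Struct.new(:name, :sequence) do
  def to_s
    ">#{name}\n#{sequence}"
  end
end

# Yield one FastaRecord per '>' header; sequence lines are joined so
# multi-line FASTA records come out as a single string.
def each_fasta_record(io)
  return enum_for(:each_fasta_record, io) unless block_given?

  name = nil
  lines = []
  io.each_line do |raw|
    line = raw.chomp
    next if line.empty?

    if line.start_with?('>')
      yield FastaRecord.new(name, lines.join) if name
      name = line[1..-1]
      lines = []
    else
      lines << line
    end
  end
  yield FastaRecord.new(name, lines.join) if name
end

# Print every record whose header matches a Ruby regexp, grep-style.
def grep_records(io, pattern, out = $stdout)
  regexp = Regexp.new(pattern)
  each_fasta_record(io) do |record|
    out.puts record.to_s if record.name =~ regexp
  end
end

input = StringIO.new(">1\nGAGAGATCTCATGACACAGCCGAAG\n>2\nGAGACAUAUCCNNNAA\n>x\nACGT\n")
grep_records(input, '\d')
```

With the pattern '\d', this prints only records 1 and 2, mirroring the second dna command above. Streaming records through a block keeps memory use flat regardless of file size.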
