Skip to content

Latest commit

 

History

History
115 lines (89 loc) · 3.88 KB

genebank.md

File metadata and controls

115 lines (89 loc) · 3.88 KB

Reading and writing a Genbank file

There are multiple ways how to read a Genbank file.

Method 1: Read a Genbank file using the GenbankProxySequenceReader

GenbankProxySequenceReader<AminoAcidCompound> genbankProteinReader =
    new GenbankProxySequenceReader<AminoAcidCompound>("/tmp", "NP_000257",
        AminoAcidCompoundSet.getAminoAcidCompoundSet());
ProteinSequence proteinSequence = new ProteinSequence(genbankProteinReader);
genbankProteinReader.getHeaderParser().parseHeader(
    genbankProteinReader.getHeader(), proteinSequence);
System.out.format("Sequence(%s,%d)=%s...",
    proteinSequence.getAccession(),
    proteinSequence.getLength(),
    proteinSequence.getSequenceAsString().substring(0, 10));

GenbankProxySequenceReader<NucleotideCompound> genbankDNAReader =
    new GenbankProxySequenceReader<NucleotideCompound>("/tmp", "NM_001126",
        DNACompoundSet.getDNACompoundSet());
DNASequence dnaSequence = new DNASequence(genbankDNAReader);
genbankDNAReader.getHeaderParser().parseHeader(genbankDNAReader.getHeader(), dnaSequence);
System.out.format("Sequence(%s,%d)=%s...", dnaSequence.getAccession(),
    dnaSequence.getLength(),
    dnaSequence.getSequenceAsString().substring(0, 10));

Method 2: Read a Genbank file using GenbankReaderHelper

File dnaFile = new File("src/test/resources/NM_000266.gb");
File protFile = new File("src/test/resources/BondFeature.gb");

LinkedHashMap<String, DNASequence> dnaSequences =
    GenbankReaderHelper.readGenbankDNASequence( dnaFile );
for (DNASequence sequence : dnaSequences.values()) {
  System.out.println( sequence.getSequenceAsString() );
}

LinkedHashMap<String, ProteinSequence> protSequences =
    GenbankReaderHelper.readGenbankProteinSequence(protFile);
for (ProteinSequence sequence : protSequences.values()) {
  System.out.println( sequence.getSequenceAsString() );

Method 3: Read a Genbank file using the GenbankReader Object

FileInputStream is = new FileInputStream(dnaFile);
GenbankReader<DNASequence, NucleotideCompound> dnaReader =
    new GenbankReader<DNASequence, NucleotideCompound>(
        is,
        new GenericGenbankHeaderParser<DNASequence,NucleotideCompound>(),
        new DNASequenceCreator(DNACompoundSet.getDNACompoundSet())
        );
dnaSequences = dnaReader.process();
is.close();
System.out.println(dnaSequences);

is = new FileInputStream(protFile);
GenbankReader<ProteinSequence, AminoAcidCompound> protReader =
    new GenbankReader<ProteinSequence, AminoAcidCompound>(
        is,
        new GenericGenbankHeaderParser<ProteinSequence,AminoAcidCompound>(),
        new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet())
        );
protSequences = protReader.process();
is.close();
System.out.println(protSequences);

Write a Genbank file

Use the GenbankWriterHelper to write DNA sequences into a Genbank file.

// First let's read some DNA sequences from a genbank file

File dnaFile = new File("src/test/resources/NM_000266.gb");
LinkedHashMap<String, DNASequence> dnaSequences =
    GenbankReaderHelper.readGenbankDNASequence( dnaFile );
ByteArrayOutputStream fragwriter = new ByteArrayOutputStream();
ArrayList<DNASequence> seqs = new ArrayList<DNASequence>();
for(DNASequence seq : dnaSequences.values()) {
  seqs.add(seq);
}

// ok now we got some DNA sequence data. Next step is to write it

GenbankWriterHelper.writeNucleotideSequence(fragwriter, seqs,
    GenbankWriterHelper.LINEAR_DNA);

// the fragwriter object now contains a string representation in the Genbank format
// and you could write this into a file
// or print it out on the console
System.out.println(fragwriter.toString());

Navigation: Home | Book 4: The Genomics Module | Chapter 5 : Genebank

Prev: Chapter 4 : GTF and GFF files

Next: Chapter 5 : karyotype (cytoband)