nRo/DataFrame-GTF

GTF parser for Java Dataframes

A GTF Reader and Writer for Java DataFrames.

The GTF format is implemented according to the "GFF/GTF File Format" documentation.

Documentation

Javadocs

Install

The library is available on Maven Central. Add this to your pom.xml:

<dependencies>
...
    <dependency>
        <groupId>de.unknownreality</groupId>
        <artifactId>dataframe-gtf</artifactId>
        <version>0.2.4</version>
    </dependency>
...
</dependencies>

Build

To build the library from sources:

  1. Clone the GitHub repository

    $ git clone https://github.com/nRo/DataFrame-GTF.git

  2. Change to the created folder and run mvn install

    $ cd DataFrame-GTF

    $ mvn install

  3. Include it by adding the following to your project's pom.xml:

<dependencies>
...
    <dependency>
        <groupId>de.unknownreality</groupId>
        <artifactId>dataframe-gtf</artifactId>
        <version>0.2.4-SNAPSHOT</version>
    </dependency>
...
</dependencies>

Usage

Create a DataFrame from a GTF file

File gtfFile = new File("genome.gtf");
DataFrame df = DataFrame.load(gtfFile, GTFFormat.GTF);
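Conceptually, every feature line of a GTF file becomes one row, and lines starting with `#` are comments. The standalone sketch below illustrates only that mapping: it does not use the DataFrame library, and the class and method names (`GtfRowsSketch`, `readRows`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of the dataframe-gtf API.
public class GtfRowsSketch {

    // Returns one String[] of nine columns per feature line; comment and blank lines are skipped.
    static List<String[]> readRows(BufferedReader reader) {
        List<String[]> rows = new ArrayList<>();
        reader.lines()
              .filter(line -> !line.isEmpty() && !line.startsWith("#"))
              .forEach(line -> {
                  String[] columns = line.split("\t", -1); // -1 keeps trailing empty columns
                  if (columns.length != 9) {
                      throw new IllegalArgumentException(
                              "Expected 9 tab-separated columns, got " + columns.length);
                  }
                  rows.add(columns);
              });
        return rows;
    }

    public static void main(String[] args) {
        String gtf = "#!genome-build example\n"
                + "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tgene_id \"g1\";\n";
        List<String[]> rows = readRows(new BufferedReader(new StringReader(gtf)));
        System.out.println(rows.size() + " row(s), feature = " + rows.get(0)[2]); // 1 row(s), feature = gene
    }
}
```

For a real file, a reader obtained from `java.nio.file.Files.newBufferedReader(path)` could replace the in-memory `StringReader`.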

By default, all GTF fields are included in the resulting DataFrame. Attributes from the ninth (attribute) column can be included as additional columns by adding them to the GTF reader:

GTFReader gtfReader = GTFReaderBuilder.create()
                .withAttribute("gene_id")
                .build();
DataFrame df = DataFrame.load(gtfFile, gtfReader);
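GTF stores attributes in the ninth column as `key "value";` pairs. To illustrate roughly what selecting an attribute such as `gene_id` involves, here is a hypothetical standalone sketch (not the library's implementation) that parses that column into a map and keeps only the requested keys. It assumes attribute values contain no semicolons.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the dataframe-gtf implementation.
public class GtfAttributeSketch {

    // Parses e.g.  gene_id "g1"; transcript_id "t1";  into an ordered key -> value map.
    // Naive split on ';' assumes no value contains a semicolon.
    static Map<String, String> parseAttributes(String column) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String entry : column.split(";")) {
            String trimmed = entry.trim();
            int space = trimmed.indexOf(' ');
            if (trimmed.isEmpty() || space < 0) {
                continue; // empty or value-less entries are skipped in this sketch
            }
            String key = trimmed.substring(0, space);
            String value = trimmed.substring(space + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip surrounding quotes
            }
            attributes.put(key, value);
        }
        return attributes;
    }

    // Keeps only the explicitly requested attributes; missing ones map to null.
    static Map<String, String> select(Map<String, String> attributes, List<String> requested) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String name : requested) {
            selected.put(name, attributes.get(name));
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> all = parseAttributes("gene_id \"g1\"; transcript_id \"t1\";");
        System.out.println(select(all, Arrays.asList("gene_id"))); // {gene_id=g1}
    }
}
```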

The column types of the GTF fields are predefined:

GTF field   Type
seqname     String
source      String
feature     String
start       Long
end         Long
score       Double
strand      String
frame       Integer
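The table corresponds to a simple per-column conversion of the first eight fields. The following sketch is hypothetical and independent of the library; it assumes that GTF's `.` placeholder in the score and frame columns maps to `null`.

```java
// Hypothetical sketch of the field-to-type mapping in the table above.
public class GtfFieldSketch {

    String seqname;
    String source;
    String feature;
    Long start;
    Long end;
    Double score;   // null when the column is "."
    String strand;
    Integer frame;  // null when the column is "."

    // Converts the first eight tab-separated columns of one GTF line.
    static GtfFieldSketch parse(String line) {
        String[] c = line.split("\t", -1);
        GtfFieldSketch f = new GtfFieldSketch();
        f.seqname = c[0];
        f.source = c[1];
        f.feature = c[2];
        f.start = Long.valueOf(c[3]);
        f.end = Long.valueOf(c[4]);
        f.score = ".".equals(c[5]) ? null : Double.valueOf(c[5]);
        f.strand = c[6];
        f.frame = ".".equals(c[7]) ? null : Integer.valueOf(c[7]);
        return f;
    }

    public static void main(String[] args) {
        GtfFieldSketch f = parse("chr1\texample\texon\t1000\t1200\t.\t+\t0\tgene_id \"g1\";");
        System.out.println(f.start + "-" + f.end + " score=" + f.score + " frame=" + f.frame);
        // prints 1000-1200 score=null frame=0
    }
}
```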

The column type of an attribute can be specified when it is added:

GTFReader gtfReader = GTFReaderBuilder.create()
                .withAttribute("gene_id")
                .withAttribute("test_value", DoubleColumn.class)
                .build();
DataFrame df = DataFrame.load(gtfFile, gtfReader);
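Attribute values are plain text in the file, so declaring `test_value` as a `DoubleColumn` implies a numeric conversion. A minimal standalone sketch of such a conversion follows; the helper name is hypothetical, and returning `null` for missing or non-numeric values is an assumption of the sketch, not documented library behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: how a typed attribute such as test_value might be converted.
public class TypedAttributeSketch {

    // Returns the attribute as a Double, or null if it is absent or not numeric
    // (the null policy is an assumption of this sketch).
    static Double asDouble(Map<String, String> attributes, String name) {
        String raw = attributes.get(name);
        if (raw == null) {
            return null;
        }
        try {
            return Double.valueOf(raw);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("gene_id", "g1");
        attributes.put("test_value", "0.75");
        System.out.println(asDouble(attributes, "test_value")); // 0.75
    }
}
```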

A DataFrame can also be written to a file in GTF format:

dataFrame.write(new File("result.gtf"), GTFFormat.GTF);
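Writing reverses the mapping: each row becomes one line of nine tab-separated columns, with `.` for empty values and attributes serialized as `key "value";`. The hypothetical sketch below shows that output format for a single row; it is not how `dataFrame.write` is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of formatting one row as a GTF line.
public class GtfWriteSketch {

    // GTF uses "." for an empty score, strand or frame.
    static String orDot(Object value) {
        return value == null ? "." : value.toString();
    }

    static String formatLine(String seqname, String source, String feature, long start, long end,
                             Double score, String strand, Integer frame,
                             Map<String, String> attributes) {
        // Attributes become space-separated  key "value";  pairs in the ninth column.
        StringJoiner attrs = new StringJoiner(" ");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            attrs.add(e.getKey() + " \"" + e.getValue() + "\";");
        }
        return String.join("\t", seqname, source, feature,
                Long.toString(start), Long.toString(end),
                orDot(score), orDot(strand), orDot(frame), attrs.toString());
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("gene_id", "g1");
        attributes.put("transcript_id", "t1");
        System.out.println(formatLine("chr1", "example", "exon", 1000, 1200,
                null, "+", 0, attributes));
    }
}
```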