Comparative Toxicogenomics Database (CTD) to BioPAX Level3 data converter.

PathwayCommons/ctd-to-biopax

ctd-to-biopax

Originated from https://bitbucket.org/armish/gsoc14; development will continue here (ToDo).

Comparative Toxicogenomics Database (CTD) to BioPAX Level3 data converter

Unlike many other drug-target databases, CTD has a controlled vocabulary that can be mapped to BioPAX, for example: 'nutlin 3 results in increased expression of BAX'. Implementing a converter therefore first requires a manual mapping from CTD terms to the BioPAX ontology. Once the mapping is done, the actual conversion requires parsing and integrating several data files that are distributed by the provider.

Data source

Implementation details

The converter is structured as a Java Maven project whose only major dependencies are the Paxtools and JAXB libraries. The project can be compiled into an executable 'fat' JAR file that can be used as a command line utility (described below).

For the conversion, the utility uses three different input files:

  1. Chemical-Gene Interactions (XML)
  2. Gene Vocabulary (CSV)
  3. Chemical Vocabulary (CSV)

All three can be downloaded from the CTD Downloads page. Users can provide any of these files as input and get a BioPAX file as the result of the conversion. If more than one input is provided, the converted models are merged and a single BioPAX file is written as output.

The gene and chemical vocabulary converters produce BioPAX files that contain only EntityReferences. Each entity reference in these converted models includes the external references provided in the vocabulary file. From the chemical vocabulary, SmallMoleculeReferences are produced; from the gene vocabulary, several types of references are produced for the corresponding CTD gene forms: ProteinReference, DnaReference, RnaReference, DnaRegionReference and RnaRegionReference.

The interactions file contains all detailed interactions between chemicals and genes, but no background information on the chemical/gene entities.

Any or all of these three files can be converted in a single run and merged into one BioPAX model.
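The merge step can be pictured with a small, library-free sketch. It is not the converter's code: it assumes that elements from different inputs are matched by a shared identifier, and it models each partial result as a plain map from a made-up ID to a set of made-up cross-references. The class name `MergeSketch`, the IDs and the `xref:` strings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MergeSketch {

    /**
     * Merges partial models keyed by element ID (an assumption of this sketch).
     * Elements that share an ID are combined by taking the union of their xrefs.
     */
    @SafeVarargs
    static Map<String, Set<String>> merge(Map<String, Set<String>>... parts) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, Set<String>> part : parts) {
            for (Map.Entry<String, Set<String>> e : part.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Interactions input: mentions the entities but carries no background information.
        Map<String, Set<String>> interactions = new LinkedHashMap<>();
        interactions.put("gene:BAX", new TreeSet<>());
        interactions.put("chemical:nutlin 3", new TreeSet<>());

        // Gene vocabulary input: supplies the (made-up) external reference for the same ID.
        Map<String, Set<String>> genes = new LinkedHashMap<>();
        genes.put("gene:BAX", new TreeSet<>(Set.of("xref:made-up-1")));

        Map<String, Set<String>> model = merge(interactions, genes);
        model.forEach((id, xrefs) -> System.out.println(id + " -> " + xrefs));
    }
}
```

In this simplified picture, the entry for `gene:BAX` ends up with the cross-reference from the vocabulary even though the interactions input knew nothing about it, which is why converting the vocabularies together with the interactions gives a richer model.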

The CTD data sets have nested interactions that are captured by the structured XML file and its XML schema: CTD_chem_gene_ixns_structured.xml.gz and CTD_chem_gene_ixns_structured.xsd. The converter uses the JAXB library to handle this structured data set. The automatically generated Java classes that correspond to this schema can be found under src/main/java/org/ctdbase/model. The overall flow of the conversion can be seen in the main executable class: CtdToBiopax.java.
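To illustrate what "nested" means here, the following is a minimal sketch of a recursive interaction structure. It does not use the JAXB-generated classes; `Node`, `Entity` and `Ixn` are hypothetical stand-ins, and "chemical X" and the action "inhibits the reaction" are placeholders chosen for the example. Only the 'nutlin 3' statement comes from the text above.

```java
public class NestedIxnSketch {

    /** Anything that can take part in an interaction: an entity or another interaction. */
    interface Node {
        String describe();
    }

    /** A named chemical or gene (hypothetical stand-in type). */
    static final class Entity implements Node {
        final String name;
        Entity(String name) { this.name = name; }
        public String describe() { return name; }
    }

    /** An interaction whose actor and target may themselves be interactions. */
    static final class Ixn implements Node {
        final Node actor;
        final String action;
        final Node target;

        Ixn(Node actor, String action, Node target) {
            this.actor = actor;
            this.action = action;
            this.target = target;
        }

        public String describe() {
            return render(actor) + " " + action + " " + render(target);
        }

        /** Nested interactions are wrapped in brackets to keep the sentence readable. */
        private static String render(Node child) {
            return child instanceof Ixn ? "[" + child.describe() + "]" : child.describe();
        }
    }

    /** Nesting depth: 0 for an entity, 1 for a flat interaction, and so on. */
    static int depth(Node node) {
        if (node instanceof Ixn) {
            Ixn ixn = (Ixn) node;
            return 1 + Math.max(depth(ixn.actor), depth(ixn.target));
        }
        return 0;
    }

    public static void main(String[] args) {
        Node flat = new Ixn(new Entity("nutlin 3"),
                "results in increased expression of", new Entity("BAX"));
        Node nested = new Ixn(new Entity("chemical X"), "inhibits the reaction", flat);

        System.out.println(flat.describe());
        System.out.println(nested.describe() + " (depth " + depth(nested) + ")");
    }
}
```

A flat CSV row cannot express the second statement without losing the inner interaction, which is the motivation for reading the structured XML file instead.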

Usage

Check out (clone) the repository and change into the project directory:

$ cd ctd-to-biopax

Build with Maven:

$ mvn clean package

This creates an executable JAR file, ctd-to-biopax.jar, under the target/ directory. Once you have this single JAR file, run it without any command line options to see the help text:

$ java -jar ctd-to-biopax.jar
usage: CtdToBiopax
 -c,--chemical <arg>      CTD chemical vocabulary (CSV) [optional]
 -g,--gene <arg>          CTD gene vocabulary (CSV) [optional]
 -o,--output <arg>        Output (BioPAX file) [required]
 -r,--remove-dangling     Remove dangling utility class entities [optional; use with -x -t]
 -t,--taxonomy <arg>      filter interactions by species, Taxonomy ID ('9606' for human);
                          can use special values: 'defined', 'undefined', and 'null') [optional]
 -x,--interaction <arg>   structured chemical-gene interaction file (XML)
                          [optional]
 Note: the input data files can be compressed, e.g. CTD_genes.csv.gz

To test the converter, you can download small (old) example files from goal2_ctd_smallSampleInputFiles-20140702.zip. To convert these sample files into a single BioPAX file, run the following command:

$ java -jar ctd-to-biopax.jar -x ctd_small.xml -c CTD_chemicals_small.csv -g CTD_genes_small.csv -r -t 9606 -o ctd.owl

which will create the ctd.owl file for you.
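The help text notes that input files can be compressed (e.g. CTD_genes.csv.gz). How the converter handles this internally is not described here; the sketch below shows one common way to do it, assuming gzip compression and detecting it from the two gzip magic bytes (0x1f 0x8b) rather than from the file name. The class `MaybeGzipSketch` and its methods are hypothetical, and the sample line is made up.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MaybeGzipSketch {

    /** Returns a stream that decompresses gzip input and passes plain input through unchanged. */
    static InputStream open(InputStream raw) {
        try {
            PushbackInputStream in = new PushbackInputStream(raw, 2);
            byte[] magic = new byte[2];
            int n = 0;
            while (n < 2) {                        // peek at up to two bytes
                int r = in.read(magic, n, 2 - n);
                if (r < 0) break;                  // stream shorter than two bytes
                n += r;
            }
            if (n > 0) in.unread(magic, 0, n);     // push the bytes back so nothing is lost
            boolean gzip = n == 2 && (magic[0] & 0xff) == 0x1f && (magic[1] & 0xff) == 0x8b;
            return gzip ? new GZIPInputStream(in) : in;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the first text line of a possibly compressed stream. */
    static String firstLine(InputStream raw) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(raw), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(data);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "made-up,sample,line\n".getBytes(StandardCharsets.UTF_8);
        // Both calls print the same line: once from plain bytes, once from gzipped bytes.
        System.out.println(firstLine(new ByteArrayInputStream(plain)));
        System.out.println(firstLine(new ByteArrayInputStream(gzip(plain))));
    }
}
```

Sniffing the content instead of trusting the extension means a renamed or extension-less download is still read correctly, at the cost of a small pushback buffer.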
