Skip to content

User's Guide

rmera edited this page Oct 30, 2013 · 45 revisions

#Introduction NOTE: This guide applies to the new version of goChem, which is in the devel branch of the repository. It does NOT apply to the current verion in the master branch (although the changes are not too radicals). Very soon, the new version of goChem will be available in the master branch. Meanwhile, if you want to use the new version, please git-clone the goChem repository by hand (instead of using go get) and switch to the devel branch

This guide assumes you can program in the Go language. It's easy! Take the Tour of go, and you will be pretty much ready to start. You can also go to golang.org for more documentation. In this guide, Go "slang" words are written in italics. Names of gochem types and functions are written in bold face, at least the first time they are mentioned.

##What is gochem?

Gochem is a library for chemistry, especially computational biochemistry, in go. It provides structures for handling atoms and molecules, reading/writing some common formats (currently xyz, PDB, xtc and dcd) and a few functions for geometrical manipulation of molecules.

##Design goals

  • Simplicity (of the code).
  • Readability.
  • Assume that users can code (dont overcomplicate the code with zillion simplifying functions. When some of these are provided they should be separated from the rest.) The helper functions provided are to be clearly identified and isolated from the core of the library.
  • Try to avoid the need for users to know gonum as much as possible -Supply convenience functions to deal with coordinates. -Of course users wanting to code calculations will need gonum functions anyway.
  • Fast and light
  • Concurrent when possible and necessary
  • Easy to extend
  • Useful for computational biochemistry at a classical and QM levels.

#Installation Assuming that you have installed Go, the installation of goChem is very simple. In addition to Go, you need to have the following installed:

*Git *Mercurial (only if you need Ramachandran plots) *The xdrfile libraty from the GROMACS project, which is open-source and can be obtained [here]ftp://ftp.gromacs.org/pub/contrib/xdrfile-1.1.1.tar.gz (only if you need to read XTC files).

If you need xdrfile, make sure that the include files and the library itself are available to the compiler/linker.

Having the needed software, the only command you need to issue is:

go get github.com/rmera/gochem

To have goChem installed in your machine. In order to have support for Ramachandran plots, you need, in addition, the following command:

go get -tags "plot" github.com/rmera/gochem

#Basic information

##A little Go reminber.

This reminder is not a replacement for taking the tour of Go, or reading documentation. It mostly hopes to allow you not to have to go back to the documentation every second. We do not explain Go here. We mostly give quick and dirty definitions, and rough equivalents in other programming languages, for key Go concepts used in the rest of this guide. The defined concepts are in logical, rather than in alphabetic order.

###Slice. A slice is an array of objects, all of the same type. Its contents can be altered, and its length can vary. you can obtain sub-slices of a slice. This sub-slices will be slices too, and any change in the sub-slice will be reflected in the original slice, and vice-versa. Slices are equivalent to Python lists (except that all the elements must be of the same type).

###Method. Similar to in C++, Python, etc, a method is a function that is attached to a type. If you have an object A of a type B, and type B has the method C, you can call the method C by doing:

 A.C()

In this case, A is said to be the received of the method.

###Receiver An object on which a method is called (see above).

###Interface An interface is an abstract type. It defines a set of methods. If any object has the methods defined in the interface, it is said that the object implements the interface. For instance, if you define an interface called Jumper which defines only the method Jump(), and then define a function that takes a Jumper interface as argument, it means that this function will accept any object that has a Jump() method. The function will not know anything about the object, except that it has a Jump() method.

##The basic types.

The library's core is one abstract type, the chem.Ref interface, which stores information of about a molecule which is not expected to change easily (mostly, atomic data other than coordinates), and a concrete type, chem.VecMatrix, which stores coordinates. chem.VecMatrix. The information stored in a chem.Ref interface is called the Topology of a molecule.

###The chem.Ref interface The main object stored in a chem.Ref is the non-geometrical atomic data for all the atoms in a molecule (although other data such as total charge and total multiplicity is also stored). The atomic data for each atom is stored in the type chem.Atom.

It could be said that a chem.Ref is little more than a collection of chem.Atoms with a little metadata, and several methods to view and manipulate the chem.Atom collection. As with every Go interface, you interact with a chem.Ref only via the exposed methods.

It is important to notice that there are several interfaces that contain some of the methods of chem.Ref allowing a more limited interaction with the atoms (read-only, write-only, etc.)

###The chem.VecMatrix type The type that stores the coordinates of the atoms in a molecule, chem.VecMatrix is different from chem.Ref in the sense that it is a concrete type, not an interface. chem.VecMatrix is simply a 3xN matrix, where N is the number of atoms in the molecule.

chem.VecMatrix implements most of the interfaces defined in the gonum/matrix. library for linear algebra, so it can be used with the functions of that library. chem.VecMatrix contains some specific methods not present in the gonum/matrix library which are especial for manipulations of atomic coordinates in 3D space.

Most functions in goChem require both a chem.VecMatrix and a chem.Ref (or a subset of chem.Ref) as parameters, although some functions which deal exclusively with geometry might require only a chem.VecMatrix (i.e. the coordinates of a molecule)

An additional type important in goChem is the chem.Traj interface. It represent a trajectory, or a collection of different geometries, or "frames" for the same molecule (for instance, the evolution in time of a molecule). As such, it consists of a collection of chem.VecMatrix objects. The main method of the chem.Traj interface is the Next method, which allows to obtain the next frame (i.e. the next chem.VecMatrix) of the trajectory.

###The *chem.Molecule type.

The most important concrete implementation of the chem.Ref interface is the chem.Molecule type. This is the type that you will get when you use goChem to read a PDB or XYZ file. It describes a Molecule, including the coordinates and the atomic data.

As for the coordinates, chem.Molecule can contain the geometries for different states of a molecule (for instance, if you read a multi-PDB or a multi-XYZ file). Because of this, chem.Molecule not only implements the chem.Ref interface but also the chem.Traj interface. In most cases, like when a single-PDB is read, chem.Molecule will contain a trajectory of only one frame.

In chem.Molecule, the trajectory is stored in a slice of chem.VecMatrix. The slice is in a field called Coords. This means that if you have a variable of type chem.VecMatrix called "A" and you want to obtain the first frame of the trajectory in "A" (which in many cases will be the only frame), you can use the following line:

 coords:=A.Coords[0]  //The method Add adds B and C, putting the result in A.

And you will have the coordinates (as a chem.VecMatrix) stored in the variable "coord".

The chem.Molecule type contains also additional data: It contains the B-factor, or temperature factors for each state of a molecule (i.e. one B-factor set for each frame), if applicable (only PDB files contain B-factor information).

##Method implementation.

###Normal method operation.

goChem aims to follow the style of the Go standard library and of the gonum set of libraries. For complex values (like atoms, coordinates and the like), the new value produced by the function is not returned, but instead is put in the received or the method. This means that if we want to add two vectors A and B and put the result in the variable B we do NOT do the following:

 C:=A.Add(B)  //This is incorrect!

Instead, we first prepare a C vector, and then do:

 C.Add(A,B)  //The result of A+B is stored in C

This approach allow to use the same variable C for several different calculations, which allows less memory to be used, making the faster and less demanding programs.

###Exceptions: The method Next.

The first exception to the previous rule is the method Next of the interface chem.Traj.

The method Next is also designed to save memory, as trajectories can be very large objects. As putting the information in the received would be senseless in this case, the method Next ask you for the coordinates as a parameter.

If we have a chem.Traj in a variable "B", and want to use Next to put the next frame in a variable "coords", we need to first build a chem.VecMatrix of size Nx3, where N is the number of atoms in the trajectory, put it in the variable "coords" and give it to Next() as a parameter. The matrix in the variable coords will be filled with the coordinates of the next frame in B.

 B.Next(coords)  //correct, make sure that coords is of the right size.

One might ask, how can we know the number of atoms in the trajectory? or, put in another way, how do we know which size should coord be?. The interface chem.Traj provides a method Len which returns precise this value:

 NAtoms:=B.Len()  //Now we now that we need a NAtomsx3 chem.VecMatrix to store the coordinates. 

If we are not interested in the next frame of B, we can discard it by giving nil as a parameter to Next (remember, nil is equivalent to NULL in c).

 B.Next(nil)  //correct, the next frame will be discarded

###Other exceptions Gochem methods that produce basic Go types, such as numbers (float64,int, etc), strings or similar, simply return their values, like the function Len just discussed. In addition, functions that produce a reference to an object instead of a copy may return the value, as a reference barely uses memory.

#Learning by example: Basic goChem usage All examples in this chapter can be downloaded, along with the data they need to operate, from the examples_gochem repository.

##First example workflow

The work made on a typical small goChem program could be sumarized in the following steps:

  • Load a system from a PDB or XYZ file, obtaining a chem.Molecule.
  • Collect a subset of coordinates which are of interest. (for instance, the backbone atoms of a protein)
  • Do something with the coordinates (study their distribution, or modify them as in rotating them about an axis)
  • If the coordinates were modified, replace coordinates for the atoms of interest in the original molecule by the modified ones.
  • Print the results and, if applicable, generate a PDB or XYZ file with the new coordinates.

The following code shows such a program. In this case, the program modifies the original coordinates.

func One() {
        pulled_atoms := []int{43, 41, 42, 40, 85, 86, 87} //indexes
        pulling_vector := []int{40, 88} //The direction in which we pull is given by atoms 40 and 88 counting from 0
        pull_ammount := 4.0 //how much do we want to pull, in Angstroms
        mol, err := chem.XYZRead("sample.xyz")
        if err != nil {
                panic(err.Error())
        }
        pulled := chem.ZeroVecs(7)
        pulled.SomeVecs(mol.Coords[0], pulled_atoms) //We use the pulled_atoms set of indexes to get the atoms we will be pulling
        at1 := mol.Coord(pulling_vector[0], 0)
        vector := chem.ZeroVecs(1)
        vector.Clone(mol.Coord(pulling_vector[1], 0)) //We copy to avoid altering the atom 88 coordinates
        vector.Sub(vector, at1)
        vector.Unit(vector)
        vector.Scale(pull_ammount, vector) //we started with an unit length vector and now multiply by the desired pull ammount
        pulled.AddRow(pulled, vector)
        if err != nil {
                panic(err.Error())
        }
        mol.Coords[0].SetVecs(pulled, pulled_atoms) //Now we put the pulled coordinates into the original molecule
        chem.XYZWrite("sample_pulled.xyz", mol.Coords[0], mol)
}

The most important functions used in this program are the methods .SomeVecs() and .SetVecs(). The first function takes a chem.VecMatrix and a slice of indexes (type int, starting from zero) and will put in the receiver a new matrix made from the rows corresponding to the indexes given, in the order given. In this way, first a chem.VecMatrix of the appropiate size, and filled with zeroes is produced by the function chem.ZeroVecs. After calling SomeVecs the zeroes in the matrix are replaced by the coordinates of the atoms of interest.

After transforming the coordinates obtained with SomeVecs, it is common to use SetVecs, with the same slice used by SomeVecs, to replace the modified coordinates into the original coordinates Notice that goChem uses Angstroms as units of length. Summarizing:

coords2transform:=chem.ZeroVecs(3) //We reserve space for the coordinates we are about to get.
indexes:=[]int{0,1,4}
coords2transform.SomeVecs(mollecule.Coods[0],indexes) //coords2transform contains a matrix with copies of
// the coordinates for atoms 0, 1 and 4 of molecule.

coords2transform:=RandomTransformationFunction(coords2transform)  //coords2transform have been modified.
//the corresponding coordinates in mollecule have not been altered.

molecule.Coords[0].SetVecs(coords2transform,indexes)  //Now the coordinates for
//atoms 0, 1 and 4 in molecule have been replaced by the modified versions in coords2transform

Note that the coordinates of the system are molecule.Coords[0] (or mol.Coords[0] in the main example). The 0 index means that we are taking the coordinates from the first state (frame) of the system.

##Second example workflow.

In the second example workflow we use a Ref interface to select atoms. Since everything except for the coordinates is stored in a Ref interface, we need Ref to decide which atoms we need to select (for instance, in order to use the chem.SomeRows function to obtain their coordinates). In the following example we get the indexes for all Histidine residues in a PDB file.

func Two() {
        mol, err := chem.PDBRead("2c9v.pdb", true)
        if err != nil {
                panic(err.Error())
        }
        //Use the reference to get the residue indexes for all histidine residues.
        //This is the function written below.
        ResSelected := SelectResidue(mol, "HIS")
        fmt.Println("The histidines in this proteins are residues number:", ResSelected)
        allowed_chains := []string{"A", "B"}
        //With the follwing we obtains all atoms that belong to the desired
        //molecules, in the allowed chains. In this case we allow both chains
        //(A and B) in the PDB file.
        HisAtoms := chem.Molecules2Atoms(mol, ResSelected, allowed_chains)
        //now you can use HisAtoms as indexes for chem.SomeCoords() and follow the first workflow.
        //I'm too lazy to do it here ;-)
        fmt.Println("Atom indexes for all histidine atoms:", HisAtoms)

}

//Uses a reference (mol) and a residue name (3-letter code) and returns a list with the ID of those residues in the molecule.
func SelectResidue(mol chem.Ref, residue string) []int {
        selected_residues := []int{} //we still dont select anything, so empty slice
        prevmol := -1 //This is the index of the molecular ID of the last atom read (a meaningless negative number initially)
        //it helps to make sure we process each molecule only once.
        for j := 0; j < mol.Len(); j++ {
                if mol.Atom(j).Molid != prevmol && mol.Atom(j).Molname == residue {
                        selected_residues = append(selected_residues, mol.Atom(j).Molid) //if the residue match and we have not processed
                        //this residue before, we add it to the list.
                }
                prevmol = mol.Atom(j).Molid
        }
        return selected_residues
}

The SelectResidue function takes a Ref interface and a residue name. It returns the ID of all residues with that name in the Ref. With this information, the function main calls the gochem function Molecules2Atoms which takes a list of residue (aka molecule) indexes and returns the indexes of all the atoms belonging to those residues. These indexes can be used, for instance, as an argument to SomeVecs to obtain the coordinates of the atoms in the desired residues (first example workflow).

##Third example workflow.

In this example we deal with trajectories and the Go interface to represent them, chem.Traj. The Traj interface allows to treat different kinds of trajectories in the same way. In the following example, only one line of the code would have to change if we wanted to process a DCD trajectory or a multiPDB instead of an XTC file. The most important function, ProcessTraj, is completely independent of the trajectory format. Note: This example requires compilation with the folowing option: -tags "xtc", and you must have the xdrfile installed in you system and available to the compiler/linker

func Three() {
        traj, err := chem.NewXTC("test.xtc")
        if err != nil {
                panic(err.Error())
        }
        ProcessTraj(traj)
}

//Obtains and prints the distance between atoms 2 and 10 (counting from zero) for each frame of trajectory
//traj.
func ProcessTraj(traj chem.Traj) {
        coords := chem.ZeroVecs(traj.Len())
        for i := 0; ; i++ { //infinite loop, we only break out of it by using "break"
                err := traj.Next(coords) //Obtain the next frame of the trajectory.
                if err != nil && err.Error() != "No more frames" {
                        panic(err.Error())
                        break
                } else if err == nil {
                        atom10 := coords.VecView(10)
                        atom2 := coords.VecView(2)
                        fmt.Println("Distance between the third and tenth atoms in the ", i+1, " frame: ", Distance(atom10, atom2), "A")
                } else {
                        break //leave the loop if the trajectory is over (i.e. if we get a "No more frames" error when trying to read a new fram)
                }
        }
}

//Calculates and returns the distance between two atoms.
func Distance(atom1, atom2 *chem.VecMatrix) float64 {
        res := chem.ZeroVecs(1)
        res.Sub(atom1, atom2)
        return res.Norm(0)
}

In the example, we use the Next() method of the interface Traj. This method reads the next frame in a trajectory. If we give it a pointer to a valid chem.VecMatrix as an argument it will put the frame in that chem.VecMatrix. If we give it nil, it will discard it. In this case we are interested in all the frames, so we always call Next with the same valid chem.VecMatrix object. The coordinates in the chem.VecMatrix object are substituted for those of the next frame in every call to Next.

Clone this wiki locally