Skip to content

User's Guide

Raúl Mera A edited this page Apr 9, 2024 · 45 revisions

Introduction

This guide assumes you can program in the Go language. It's easy! Take the Tour of go, and you will be pretty much ready to start. You can also go to golang.org for more documentation. In this guide, Go "slang" words are written in italics. Names of gochem types and functions are written in bold face, at least the first time they are mentioned.

What is goChem?

goChem is a library for chemistry, especially computational biochemistry, in Go. It provides structures for handling atoms and molecules, reading/writing some common formats (currently xyz, pdb, xtc and dcd), functions for geometrical analysis and manipulation of molecules, and functions for interaction with several quantum chemistry programs.

Design goals

  • Simplicity (of the code).
  • Readability.
  • Fast and light
  • Concurrent when possible and necessary
  • Easy to extend
  • Useful for computational biochemistry at a classical and QM levels.

Installation

goChem works with Go Modules, so you don't actually have to install anything. Simply add the relevant goChem modules to your program's go.mod file and issue the go tidy command.

If you need support for Gromacs' XTC trajectories, you will need the xdrfile library, which is open-source and can be obtained here. Make sure that the include files and the library itself are available to the compiler/linker.

Basic information

A little Go reminder

This reminder is not a replacement for taking the tour of Go, or reading documentation. It hopes to save you from having to go back to the documentation every second. We do not explain Go here. We mostly give quick and dirty definitions, and rough equivalents in other programming languages for key Go concepts used in the rest of this guide. The defined concepts are in logical, rather than in alphabetic order.

  • Reference A variable that does not have its own memory, but points to the same memory as another variable. This means that if A is a reference to B, A uses little additional memory, and every change in A will be reflected in B, and viceversa (see Slice).

  • Slice. A slice is an array of objects, all of the same type. Its contents can be altered, and its length can vary. You can obtain sub-slices of a slice. This sub-slices will be slices too, and any change in the sub-slice will be reflected in the original slice, and vice-versa. Slices are equivalent to Python lists (except that all the elements must be of the same type).

  • Method. Similar to in C++, Python, etc, a method is a function that is attached to a type. If you have an object A of a type B, and type B has the method C, you can call the method C by doing:

 A.C()

In this case, A is said to be the receiver of the method.

  • Receiver. An object on which a method is called (see above).

  • Interface. An interface is an abstract type. It defines a set of methods. If any object has the methods defined by the interface, it is said that the object implements the interface. For instance, if you define an interface called Jumper which defines only the method Jump(), and then define a function that takes a Jumper interface as argument, it means that this function will accept any object that has a Jump() method. The function will not know anything about the object, except that it has a Jump() method.

goChem units

goChem uses Angstroms for distance, radians for angles and kcal/mol for energy.

The Core goChem packages

The goChem library consists of several packages, but two of them form the core of the library, which are used in basically every functionality.

The v3 package.

This package which takes care of the spacial coordinates of molecules and the math related to that. Basically, the v3 package implements one type: v3.Matrix. v3.Matrix describes a Nx3 matrix, used by goChem to store Cartesian coordinates for N atoms. The v3.Matrix type implements most interfaces defined in gonum/matrix, and the v3 package depends on the gonum/matrix/mat64 library.

The chem package.

This package contains the core functionality of the library which is not directly dependent on the gonum/matrix/mat64 library. The chem package contains types for atoms, molecules and trajectories as well as functions to manipulate them.

Other packages for specific functions are available, such as the traj package for reading trajectory files, the qm package for interacting with quantum chemistry programs and the plot package to produce Ramachandran plots of proteins.

goChem basic types.

The library's core is one abstract type, the chem.Atomer interface, which, for a given molecule, stores information which is not expected to change easily (mostly, atomic data other than coordinates), and two concrete types: v3.Matrix, which stores coordinates, and chem.Atom wich stores information for a particular atom. The information stored in a chem.Ref interface is called the Topology of a molecule.

The chem.Atomer interface

The main object stored in a chem.Atomer is the non-geometrical atomic data for all the atoms in a molecule. chem.Atomer defines two functions: Atom(i int) which returns the chem.Atom object for the ith atom of the molecule and Len() which returns the number of atoms in the molecule.

A chem.Atomer is then merely a collection of chem.Atoms. When additional meta-data is needed, other interfaces such as chem.AtomMultiCharger which in addition to what Atomer can do allows to retrieve the charge and multiplicity of the molecule and chem.Masser which allows to obtain a vector with the masses of all atoms int he molecule.

The chem.Atom type

This concrete type, merely a go struct, stores non-coordinate data for one particular atom. The data includes the chemical symbol for the atom, the atom ID in the molecule, the ID of the sub-molecule it belongs, if any (i.e. the ID of the aminoacidic residue it belongs to if the molecule is a protein, or the equivalent for other polymers), among others.

The v3.Matrix type

This type that stores the coordinates of the atoms in a molecule. Unlike chem.Atomer and like chem.Atom, it is a concrete type, not an interface. v3.Matrix is simply a 3xN matrix, where N is the number of cartesian points in the system (i.e. atoms in the molecule).

As mentioned, v3.Matrix implements most of the interfaces defined in the gonum/matrix library for linear algebra, so it can be used with (at least most) functions of that library. v3.Matrix contains some specific methods not present in the gonum/matrix library which are useful for manipulations of atomic coordinates in 3D space.

Most functions in goChem require both a v3.Matrix and a chem.Atomer as parameters. Some functions some might require, for instance, a chem.AtomMultiCharger instead of a chem.Atomer, and some functions which deal exclusively with geometry might require only a v3.Matrix (i.e. the coordinates of a molecule)

An additional type important in goChem is the chem.Traj interface. It represent a trajectory, or a collection of different geometries, or "frames" for the same molecule (for instance, the evolution in time of a molecule). As such, it can be taken to represent a collection of v3.Matrix objects. The main method of the chem.Traj interface is the Next method, which allows to obtain the next frame (i.e. the next v3.Matrix) of the trajectory.

The chem.Molecule type

The most important concrete implementation of the chem.Atomer interface is the chem.Molecule type (which also implements chem.AtomMultiCharger and chem.Masser). This is the type that you will get when you use goChem to read a PDB or XYZ file. It describes a Molecule, including the coordinates and the atomic data.

As for the coordinates, chem.Molecule can contain the geometries for different states of a molecule (for instance, if you read a multi-PDB or a multi-XYZ file). Because of this, chem.Molecule not only implements the chem.Atomer and related interfaces but also the chem.Traj interface. In most cases, like when a single-PDB is read, chem.Molecule will contain a trajectory of only one frame.

In chem.Molecule, the trajectory is stored in a slice of v3.Matrix. The slice is in a field called Coords. This means that if you have a variable of type chem.Molecule called "A" and you want to obtain the first frame of the trajectory in "A" (which in many cases will be the only frame), you can use the following line:

 coords:=A.Coords[0]  

And you will have the coordinates (as a v3.Matrix) stored in the variable "coord".

The chem.Molecule type contains also additional data: It contains the B-factor, or temperature factors for each state of a molecule (i.e. one B-factor set for each frame), if applicable (only PDB files contain B-factor information).

Method implementation

Normal method operation

goChem aims to follow the style of the Go standard library and of the gonum set of libraries. For complex values (like atoms, coordinates and the like), the new value produced by the function is not returned, but instead is put in the receiver or the method. This means that if we want to add two vectors A and B and put the result in the variable B we do NOT do the following:

 C:=A.Add(B)  //This is incorrect!

Instead, we first prepare a C vector, and then do:

 C.Add(A,B)  //The result of A+B is stored in C

This approach allow to use the same variable C for several different calculations, which allows less memory to be used, resulting in faster and less memory-demanding programs.

Exceptions: The method Next.

The first exception to the previous rule is the method Next of the interface chem.Traj.

The method Next is also designed to save memory, as trajectories can be very large objects. As putting the information in the received would be senseless in this case, the method Next ask you for the coordinates as a parameter.

If we have a chem.Traj in a variable "B", and want to use Next to put the next frame in a variable "coords", we need to first build a v3.Matrix of size Nx3, where N is the number of atoms in the trajectory, put it in the variable "coords" and give it to Next() as a parameter. The matrix in the variable coords will be filled with the coordinates of the next frame in B.

 B.Next(coords)  //correct, 
//make sure that coords is of the right size. The variable coords now 
//contains the coordinates of the current frame.

One might ask, how can we know the number of atoms in the trajectory? or, put in another way, how do we know which size should coord be?. The interface chem.Traj provides a method Len which returns precisely this value:

 NAtoms:=B.Len()  //Now we now that we need a NAtomsx3 chem.VecMatrix to store the coordinates. 

If we are not interested in the next frame of B, we can discard it by giving nil as a parameter to Next (remember, nil is equivalent to NULL in C).

 B.Next(nil)  //correct, the next frame will be discarded

Other exceptions

goChem methods that produce basic Go types, such as numbers (float64,int, etc), strings or similar, simply return their values, like the function Len just discussed. In addition, functions that produce a reference to an object instead of a copy may return the value, as a reference barely uses memory.

Learning by example: Basic goChem usage

All examples in this chapter can be downloaded, along with the data they need to operate, from the examples_gochem repository.

First example workflow

The work made on a typical small goChem program could be summarized in the following steps:

  • Load a system from a PDB or XYZ file, obtaining a chem.Molecule.
  • Collect a subset of coordinates which are of interest. (for instance, the backbone atoms of a protein)
  • Do something with the coordinates (study their distribution, or modify them as in rotating them about an axis)
  • If the coordinates were modified, replace coordinates for the atoms of interest in the original molecule by the modified ones.
  • Print the results and, if applicable, generate a PDB or XYZ file with the new coordinates.

The following code shows such a program. In this case, the program modifies the original coordinates.

//One opens the sample.xyz file in the test directory, and pull a number of hardcoded atoms
//In the direction of a hardcoded vectos. It builds 12 files with the pulled atoms  displaced by
//different ammounts along the pulling vector
func One() {
	pulled_atoms := []int{43, 41, 42, 40, 85, 86, 87} //indexes
	pulling_vector := []int{40, 88}                   //The direction in which we pull is given by atoms 40 and 88 counting from 0
	pull_ammount := 4.0                               //how much do we want to pull, in Angstroms
	mol, err := chem.XYZFileRead("../FILES/sample.xyz")
	if err != nil {
		panic(err.Error())
	}
	pulled := v3.Zeros(7)
	pulled.SomeVecs(mol.Coords[0], pulled_atoms) //We use the pulled_atoms set of indexes to get the atoms we will be pulling
	at1 := mol.Coord(pulling_vector[0], 0)
	vector := v3.Zeros(1)
	vector.Copy(mol.Coord(pulling_vector[1], 0)) //We copy to avoid altering the atom 88 coordinates
	vector.Sub(vector, at1)
	vector.Unit(vector)
	vector.Scale(pull_ammount, vector) //we started with an unit lenght bector and now multiply by the desired pull ammount
	pulled.AddRow(pulled, vector)
	if err != nil {
		panic(err.Error())
	}
	mol.Coords[0].SetVecs(pulled, pulled_atoms) //Now we put the pulled coordinates into the original molecule
	chem.XYZFileWrite("../FILES/results/sample_pulled.xyz", mol.Coords[0], mol)
}

The most important functions used in this program are the methods .SomeVecs() and .SetVecs(). The first function takes a v3.Matrix and a slice of indexes (type int, starting from zero) and will fill each of the receiver's rows with the rows corresponding to the indexes given, in the order given. In this way, first a v3.Matrix of the appropriate size, and filled with zeroes is produced by the function v3.Zeros. After calling SomeVecs the zeroes in the matrix are replaced by the coordinates of the atoms of interest.

After transforming the coordinates obtained with SomeVecs, it is common to use SetVecs, with the same slice used by SomeVecs, to replace the modified coordinates into the original coordinates Notice that goChem uses Angstroms as units of length. Summarizing:

coords2transform:=v3.Zeros(3) //We reserve space for the coordinates we are about to get.
indexes:=[]int{0,1,4}
coords2transform.SomeVecs(molecule.Coods[0],indexes) //coords2transform contains a matrix with copies of
// the coordinates for atoms 0, 1 and 4 of molecule.

coords2transform:=RandomTransformationFunction(coords2transform)  //coords2transform has been modified.
//the corresponding coordinates in mollecule have not been altered.

molecule.Coords[0].SetVecs(coords2transform,indexes)  //Now the coordinates for
//atoms 0, 1 and 4 in molecule have been replaced by the modified versions in coords2transform

Note that the coordinates of the system are molecule.Coords[0] (or mol.Coords[0] in the main example). The 0 index means that we are taking the coordinates from the first state (frame) of the system.

Second example workflow

In the second example workflow we use a chem.Atomer interface to select atoms. Since everything except for the coordinates is stored in chem.Atoms objects, we can use chem.Atomer to decide which atoms we need to select (for instance, in order to use the chem.SomeVecs function to obtain their coordinates). In the following example we get the indexes for all Histidine residues in a PDB file.

func Two() {
	mol, err := chem.PDBFileRead("../FILES/test.pdb", true)
	if err != nil {
		panic(err.Error())
	}
	//Use the reference to get the residue indexes for all histidine residues.
	//This is the function written below.
	ResSelected := SelectResidue(mol, "HIS")
	fmt.Println("The histidines in this proteins are residues number:", ResSelected)
	allowed_chains := []string{"A", "B"}
	//With the follwing we obtains all atoms that belong to the desired
	//molecules, in the allowed chains. In this case we allow both chains
	//(A and B) in the PDB file.
	HisAtoms := chem.Molecules2Atoms(mol, ResSelected, allowed_chains)
	//now you can use HisAtoms as indexes for chem.SomeCoords() and follow the first workflow.
	//I'm too lazy to do it here ;-)
	fmt.Println("Atom indexes for all histidine atoms:", HisAtoms)

}

//Uses a reference (mol) and a residue name (3-letter code) and returns a list with the ID of those residues in the molecule.
func SelectResidue(mol chem.Atomer, residue string) []int {
	selected_residues := []int{} //we still dont select anything, so empty slice
	prevmol := -1                //This is the index of the molecular ID of the last atom read (a meaningless negative number initially)
	//it helps to make sure we process each molecule only once.
	for j := 0; j < mol.Len(); j++ {
		if mol.Atom(j).MolID != prevmol && mol.Atom(j).MolName == residue {
			selected_residues = append(selected_residues, mol.Atom(j).MolID) //if the residue match and we have not processed
			//this residue before, we add it to the list.
		}
		prevmol = mol.Atom(j).MolID
	}
	return selected_residues
}

The SelectResidue function takes a chem.Atomer interface and a residue name. It returns the ID of all residues with that name in the chem.Atomer. With this information, the function main calls the gochem function chem.Molecules2Atoms which takes a list of residue (aka molecule) indexes and returns the indexes of all the atoms belonging to those residues. These indexes can be used, for instance, as an argument to SomeVecs to obtain the coordinates of the atoms in the desired residues (first example workflow).

Third example workflow

In this example we deal with trajectories and the Go interface to represent them, chem.Traj. The Traj interface allows to treat different kinds of trajectories in the same way. In the following example, only one line of the code would have to change if we wanted to process a DCD trajectory or a multiPDB instead of an XTC file. The most important function, ProcessTraj, is completely independent of the trajectory format. Note: In addition to github/rmera/gochem, you need to import github.com/rmera/gochem/trajs in order to compile this example. The xdrfile library from GROMACS is also required

func Three() {
	traj, err := xtc.New("../FILES/test.xtc")
	if err != nil {
		panic(err.Error())
	}
	ProcessTraj(traj)
}

//Obtains and prints the distance between atoms 2 and 10 (counting from zero) for each frame of trajectory
//traj.
func ProcessTraj(traj chem.Traj) {
	coords := v3.Zeros(traj.Len())
	for i := 0; ; i++ { //infinite loop, we only break out of it by using "break"
		err := traj.Next(coords) //Obtain the next frame of the trajectory.
		if err != nil {
			_, ok := err.(chem.LastFrameError)
			if ok {
				break //We processed all frames and are ready, not a real error.

			} else {
				panic(err.Error)
			}
		} else {
			atom10 := coords.VecView(10)
			atom2 := coords.VecView(2)
			fmt.Println("Distance between the third and tenth atoms in the ", i+1, " frame: ", Distance(atom10, atom2), "A")
		}
	}
}

//Calculates and returns the distance between two atoms.
func Distance(atom1, atom2 *v3.Matrix) float64 {
	res := v3.Zeros(1)
	res.Sub(atom1, atom2)
	return res.Norm(2)
}

In the example, we use the Next() method of the interface chem.Traj. This method reads the next frame in a trajectory. If we give it a pointer to a valid v3.Matrix as an argument it will put the frame in that v3.Matrix. If we give it nil, it will discard it. In this case we are interested in all the frames, so we always call Next with the same valid v3.Matrix object. The coordinates in the v3.Matrix object are substituted for those of the next frame in every call to Next. The v3.Matrix method VecView takes the index of a row and returns a view of that row of the original v3.Matrix. The previous means that changes to the returned object will be reflected in the original v3.Matrix.