Tutorials for analysing CAGE and Deep-RACE data.

Various tutorials on how to analyse CAGE data.

Deep-RACE (work in progress)
CAGE differential analysis 1 (work in progress)
CAGE differential analysis 2
Simple use of FANTOM5 SDRF files
Normalisation of CAGE libraries by sub-sampling
Demultiplex nanoCAGE data using TagDust 2

These tutorials are designed to be executed on a Linux system's command line interface (also called Terminal or shell). To people not familiar with entering commands on the keyboard, I recommend the book The Linux Command Line, by William E. Shotts, Jr. (No Starch Press, January 2012). The Missing Semester course from MIT looks good as well.

The programs used are assumed to be installed in advance. On the Debian operating system, many of them (BWA, SAMtools, BEDTools, ...) are available pre-packaged and will be installed (together with many other programs) by the command apt-get install med-bio.

Other software has to be downloaded and installed by hand. Place the programs in the bin directory in your home directory, and set their executable permission in order to use them. If you had to create the bin directory, it will only be taken into account at your next login (see Stack Overflow for alternatives).

Here is, for example, how to download, compile and install the TagDust software. By convention, we will download the software into a directory called src. Compiling means producing the executable program suitable for your computer from the source code that was downloaded. On Debian systems, the programs necessary for compiling a program written in the C programming language can be installed through the build-essential package.

cd                    # move back to the home directory
mkdir -p src          # create the src directory if it did not exist.
cd src                # enter the src directory
wget http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz   # download TagDust
tar xvf tagdust.tgz   # unpack TagDust
cd tagdust            # enter the tagdust directory freshly created by tar
make                  # compile the program
mkdir -p ~/bin        # create the 'bin' directory if it did not exist
cp tagdust ~/bin/     # copy tagdust to the 'bin' directory in your home directory

Frequent problems

Command not found.

It is not enough to compile a program. The command-line interface needs to find it, and by default it does not search in the current working directory.

A very good explanation is in The Linux Command Line's chapter 24, section Script File Location. Here is a brief summary.

The standard way to make programs accessible is to add them to one of a set of pre-defined directories that are collectively called the PATH. For system-wide installations, the directory is usually /usr/bin. For local installations by a single user, the directory is usually called bin, in the home directory, also accessible via the shortcut ~/bin. If it does not exist, it can be created like any other directory, but it may be necessary to log out and in again in order for the system to recognise this directory in the PATH.
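As a rough sketch of the point above, the following checks whether the bin directory of the home directory is listed in the PATH, and adds it for the current session only if it is missing. The helper name path_contains and the variable bindir are made up for this illustration; a permanent change would normally go in a shell start-up file instead.

```shell
# path_contains DIR: succeed if DIR is one of the directories listed in PATH.
# (Hypothetical helper written for this example, not a standard command.)
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

bindir="$HOME/bin"    # the usual location for a single user's programs

if path_contains "$bindir"; then
    echo "$bindir is already in the PATH"
else
    echo "$bindir is not in the PATH; adding it for this session only"
    PATH="$bindir:$PATH"
    export PATH
fi
```

To simply look at the current contents of the PATH, echo "$PATH" prints the directories separated by colons.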

In addition, the program needs to have executable permissions. These can be given with the chmod command (see The Linux Command Line's chapter 24, section Executable Permissions), or via the file navigator of the desktop graphical interface.

Lastly, it is possible to run a program that is not in the PATH. For this, just indicate in which directory it is. The current directory is always aliased to . (a single dot), so to run a program called myscript that is in the current directory, type ./myscript. (The comment above about executable permissions still applies.)
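The two points above (executable permissions and the ./ prefix) can be tried safely with a toy script. This is only a sketch: the script contents and the temporary directory are invented for the demonstration, and it assumes that programs are allowed to run from the temporary directory.

```shell
workdir=$(mktemp -d)      # temporary directory, so that no existing file is touched
cd "$workdir"

# Create a tiny script called myscript (toy content, for illustration only).
printf '#!/bin/sh\necho "Hello from myscript"\n' > myscript

# A freshly created file is not executable, so the first attempt fails.
./myscript 2>/dev/null || echo "Cannot run myscript yet: it is not executable"

chmod +x myscript         # give the executable permission
./myscript                # now it runs, although its directory is not in the PATH
```

Typing myscript alone, without the ./ prefix, would still end with "command not found", because the temporary directory is not in the PATH.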

What is that sponge?

sponge is a command from the moreutils collection, that I use frequently. On Debian systems, it is easy to install via the moreutils package.

The goal of sponge is to solve the following problem: when one file is read, piped to a command, and the result is redirected to the file itself, the contents are not updated as expected, but the file ends up empty. This is because at the very beginning of the command, the file receiving the redirection is truncated to an empty file before its contents are even read. For example, with a file called example.fq:

cat example.fq | fastx_trimmer -f 11 > example.fq          # Empties the file.
cat example.fq | fastx_trimmer -f 11 | sponge example.fq   # Trims the first 10 nucleotides.

Without sponge, one would need to create a temporary file (which is actually what sponge does in a more proper way behind the scenes).

cat example.fq | fastx_trimmer -f 11 > example.tmp.fq
mv example.tmp.fq example.fq
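The behaviour described above can be reproduced without any sequence data. The sketch below is an illustration under assumptions: it uses tr on a toy text file instead of fastx_trimmer, and reads the file with < rather than cat so that the result does not depend on the timing between the two processes. The file names are invented, and the sponge step is skipped when moreutils is not installed.

```shell
workdir=$(mktemp -d)      # temporary directory, so that no real data is touched
cd "$workdir"

# 1. Reading and writing the same file: the shell empties example.txt before tr reads it.
printf 'acgt\n' > example.txt
tr 'a-z' 'A-Z' < example.txt > example.txt
if [ -s example.txt ]; then emptied=no; else emptied=yes; fi
echo "File emptied by the redirection: $emptied"

# 2. The workaround with a temporary file.
printf 'acgt\n' > example.txt
tr 'a-z' 'A-Z' < example.txt > example.tmp.txt
mv example.tmp.txt example.txt
cat example.txt

# 3. The same with sponge, if it is available.
if command -v sponge >/dev/null 2>&1; then
    printf 'acgt\n' > example.txt
    tr 'a-z' 'A-Z' < example.txt | sponge example.txt
    cat example.txt
fi
```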
